Study Plan & Record 02

Recently I have been learning Data Mining and reviewing Python fundamentals, so every blog post will be divided into two sections, one for each, covering what I learned and what I thought about that day. I am a native Chinese speaker, so some typos may seem a little ridiculous. Sorry for that (well, nobody will read it anyway, but I will try to do my best).

Data Mining: Concepts and Techniques

Today I read Chapter 3 on data preprocessing, which basically has four parts: data cleaning, data integration, data reduction, and data transformation. The central idea of the chapter is to make our dataset accurate (free of noisy data and of values that deviate from what is expected for the attribute), complete (containing all attributes of interest), consistent (e.g., no discrepancies in the categories used for attribute values), timely, believable, and interpretable (easy to understand).

Data Cleaning:

Data cleaning is normally the first step in data mining or data analysis. This step handles missing values and noisy data, and identifies and removes outliers. Missing values can be filled in with the attribute's mean (its expected value) or with values predicted by regression on other attributes, but these methods can bias the result. For noisy data, binning smooths the data by replacing the values in each bin with the bin mean. Outlier analysis relies on clustering, which will be explained in chapters 8 and 9.
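
To make the two cleaning ideas above concrete for myself, here is a minimal sketch in plain Python. The function names and the sample price list are my own choices for illustration, and I assume missing values are stored as None and that bins are equal-frequency (sorted, then cut into groups of the same size).

```python
from statistics import mean

def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in values]

def smooth_by_bin_means(values, bin_size):
    """Sort the values, cut them into bins of bin_size items,
    and replace every value by the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed

# Hypothetical sample data, just for illustration.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(fill_missing_with_mean([1, None, 3]))
print(smooth_by_bin_means(prices, 3))
```

With bins of three, the sorted prices fall into three bins whose means are 9, 22, and 29, so every value is replaced by one of those three numbers. Note that mean imputation shrinks the spread of the attribute, which is one way the result becomes biased.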

Data Integration:

My summary: remove redundant attributes and duplicate objects when combining different data sources. To identify redundant attributes, the chi-square test is effective for nominal data, while the correlation coefficient and covariance are useful for numeric data. As for redundant objects (duplicate tuples), the chapter connects them with denormalized tables, which as far as I understand are a source of such duplicates rather than a way to remove them (I do not really understand this yet; further study needed).
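
As a quick check of my understanding of these three measures, here is a small sketch using only the standard library. The function names and the numbers are hypothetical, I use the population form (dividing by n) for covariance, and the chi-square function takes a contingency table given as a list of rows of observed counts.

```python
from math import sqrt

def covariance(xs, ys):
    """Population covariance of two equally long numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

def correlation(xs, ys):
    """Pearson correlation coefficient: covariance scaled by both std devs."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

def chi_square(table):
    """Chi-square statistic for a contingency table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two attributes were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical data: ys is exactly 2 * xs, so the correlation should be about 1.
xs, ys = [1, 2, 3, 4], [2, 4, 6, 8]
print(covariance(xs, ys), correlation(xs, ys))
print(chi_square([[250, 200], [50, 1000]]))
```

A correlation close to 1 or -1 suggests one numeric attribute is nearly redundant given the other. For nominal attributes, a large chi-square value (compared against the critical value for the table's degrees of freedom) suggests the two attributes are not independent.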

Data Reduction:

Dimensionality reduction and numerosity reduction are two ways to avoid the inefficiency of analyzing too much data, much of which sits in attributes that are not important. Dimensionality reduction methods include the discrete wavelet transform, the discrete Fourier transform, principal components analysis, and attribute subset selection. Numerosity reduction methods include regression, histograms, clustering, and sampling.
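
Two of the numerosity reduction methods are simple enough to sketch with the standard library. This is only my own illustration: the function names, the seed parameter, and the value list are assumptions, the sampling is simple random sampling without replacement, and the histogram uses equal-width buckets labeled by their lower bound.

```python
import random
from collections import Counter

def simple_random_sample(data, k, seed=None):
    """Draw k items without replacement; seed makes the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(data, k)

def equal_width_histogram(values, width):
    """Count values per bucket; bucket b covers [b, b + width)."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical attribute values, just for illustration.
values = [1, 1, 5, 5, 5, 8, 10, 10, 12, 14, 14, 15, 18, 20, 21, 25, 30]
print(simple_random_sample(values, 5, seed=42))
print(equal_width_histogram(values, 10))
```

Both keep a much smaller representation of the data: the sample keeps a few real objects, while the histogram keeps only one count per bucket.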

Data Transformation and Data Discretization:

The idea is to put each attribute into a format suitable for analysis. The strategies are smoothing, attribute construction, aggregation, normalization, discretization, and concept hierarchy generation for nominal data. Take normalization as an example: the data can be transformed by min-max normalization, z-score normalization, or decimal scaling. This balances the weight that each attribute carries, so an attribute with a large range does not dominate the others. As for data discretization, I still need time to make every concept clearer.
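
Here is a rough sketch of the three normalization methods as I understand them, in plain Python. The function names and sample numbers are my own, I assume the values are not all identical (otherwise the first two would divide by zero), and the z-score version uses the population standard deviation.

```python
from statistics import mean, pstdev

def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly map [min, max] of the data onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min
            for v in values]

def z_score(values):
    """Center on the mean and scale by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def decimal_scaling(values):
    """Divide by 10**j, with j the smallest integer so that all |v| < 1."""
    biggest = max(abs(v) for v in values)
    j = 0
    while biggest / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]

# Hypothetical income values, just for illustration.
incomes = [12000, 73600, 98000]
print(min_max(incomes))
print(z_score([2, 4, 4, 4, 5, 5, 7, 9]))
print(decimal_scaling([-986, 917]))
```

Min-max keeps the shape of the data but is sensitive to outliers, since a single extreme value sets the range. The z-score is a safer choice when the true minimum and maximum are unknown.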

Python: Review

Basically, I reviewed OOP, things like __repr__(), __eq__(), and some related tricks, and wrote an MVC framework again... Tomorrow, I hope I can move on to the part on functional programming. Too many things to review!
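
For my own notes, a tiny example of the two special methods. The Point class is hypothetical, made up only for this post. One trick worth remembering: defining __eq__ sets __hash__ to None unless you define it too, so I add a matching __hash__ to keep instances usable in sets and as dict keys.

```python
class Point:
    """A made-up value class used only to illustrate __repr__ and __eq__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Aim for a string that looks like the constructor call.
        return f"Point(x={self.x!r}, y={self.y!r})"

    def __eq__(self, other):
        # Returning NotImplemented lets Python try the other operand
        # and finally fall back to "not equal".
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Keep hashing consistent with equality.
        return hash((self.x, self.y))


p, q = Point(1, 2), Point(1, 2)
print(p)              # Point(x=1, y=2)
print(p == q)         # True
print(p == (1, 2))    # False
print(len({p, q}))    # 1
```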


Moving on! I need to go to bed, bye!

TO BE CONTINUED...
