-
Missing data
Missing data can broadly be classified into three types:
- MCAR (Missing Completely At Random): there is nothing systematic about why some data is missing. That is, there is no relationship between the fact that data is missing and either the observed or the unobserved covariates.
- MAR (Missing At Random): resembles MCAR in that there is still an element of randomness, but the probability that a value is missing may depend on the observed data (though not on the missing value itself).
- MNAR (Missing Not At Random): the fact that data is missing is directly correlated with the value of the missing data.
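To make the three mechanisms concrete, here is a minimal simulation sketch. The variables `age` and `income`, the missingness probabilities, and the seed are all illustrative assumptions, not taken from any real data set.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
age = rng.normal(40, 10, n)                        # fully observed covariate
income = 1000 + 50 * age + rng.normal(0, 200, n)   # variable that will get gaps

# MCAR: every value has the same 20% chance of being missing.
mcar_mask = rng.random(n) < 0.2

# MAR: missingness depends only on the observed covariate (age).
mar_mask = rng.random(n) < np.where(age > 40, 0.35, 0.05)

# MNAR: missingness depends on the value that goes missing (income itself).
mnar_mask = rng.random(n) < np.where(income > np.median(income), 0.35, 0.05)

for name, mask in [("MCAR", mcar_mask), ("MAR", mar_mask), ("MNAR", mnar_mask)]:
    observed_mean = income[~mask].mean()
    print(f"{name}: mean of observed income = {observed_mean:.0f} "
          f"(true mean = {income.mean():.0f})")
```

In this setup the observed mean under MCAR should stay close to the true mean, while under MAR and MNAR it is pulled downward because high incomes are dropped more often. The difference is that under MAR the cause (age) is still visible in the data, so it can be modeled.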
-
How to deal with missing data
-
Just delete missing entries
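A minimal sketch of deleting incomplete rows (listwise deletion), assuming the data is a NumPy array in which `np.nan` marks a missing entry. The toy values and column meanings are hypothetical.

```python
import numpy as np

# Hypothetical toy data: columns are age and income; np.nan marks a gap.
data = np.array([
    [25.0, 3200.0],
    [31.0, np.nan],
    [np.nan, 4100.0],
    [47.0, 5300.0],
])

# Keep only the rows in which no column is missing.
complete_rows = ~np.isnan(data).any(axis=1)
cleaned = data[complete_rows]
print(cleaned)
```

With a pandas DataFrame, `DataFrame.dropna()` does the same thing. Deletion is simple, but it shrinks the sample and is generally only safe when the data is MCAR.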
-
Replacing missing values with the mean or median
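A minimal sketch of mean and median imputation for a single column, again assuming `np.nan` marks the missing entries. The `income` values are made up for illustration.

```python
import numpy as np

income = np.array([3200.0, np.nan, 4100.0, 5300.0, np.nan, 3900.0])
missing = np.isnan(income)

# nanmean / nanmedian ignore the NaN entries when computing the statistic.
mean_filled = np.where(missing, np.nanmean(income), income)
median_filled = np.where(missing, np.nanmedian(income), income)

print(mean_filled)
print(median_filled)
```

The median is usually the safer choice when the column is skewed or has outliers. Either way, every gap receives the same value, so the variance of the filled column is understated.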
-
Linear Regression
First, several predictors of the variable with missing values are identified using a correlation matrix. The best predictors are selected and used as independent variables in a regression equation.
The variable with missing data is used as the dependent variable.
Second, cases with complete data for the predictor variables are used to generate the regression equation.
Third, the equation is used to predict the missing values for the incomplete cases, in an iterative process.
The above describes simple (single-variable) linear regression.
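The three steps can be sketched roughly as follows. This is a single-pass illustration on synthetic data, not an iterative procedure. The variable names (`age`, `hours`, `income`), the coefficients, and the 20% missingness rate are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
age = rng.normal(40, 10, n)
hours = rng.normal(38, 5, n)
income = 1000 + 50 * age + rng.normal(0, 100, n)   # depends on age, not hours
income[rng.random(n) < 0.2] = np.nan               # knock out ~20% of incomes

complete = ~np.isnan(income)

# Step 1: correlation matrix on the complete cases; pick the strongest predictor.
corr = np.corrcoef(np.vstack([age[complete], hours[complete], income[complete]]))
predictors = {"age": age, "hours": hours}
names = list(predictors)
best_name = max(names, key=lambda k: abs(corr[names.index(k), 2]))
x = predictors[best_name]

# Step 2: fit income = slope * x + intercept on the complete cases only.
slope, intercept = np.polyfit(x[complete], income[complete], deg=1)

# Step 3: predict the missing incomes from the fitted equation.
imputed = income.copy()
imputed[~complete] = slope * x[~complete] + intercept

print(best_name, round(slope, 2), round(intercept, 2))
```

Because the imputed values lie exactly on the regression line, this approach tends to overstate the correlation between predictor and target. Adding random noise to the predictions is a common remedy.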
-
Multiple linear regression
Simple linear regression has significant limitations:
- It can’t easily match any data set that is non-linear
- It can only be used to make predictions that fit within the range of the training data set
- It can only be fit to data sets with a single dependent variable and a single independent variable
This is where multiple regression comes in. It is specifically designed to create regressions on models with a single dependent variable and multiple independent variables.
The equation for multiple regression takes the form:
$$y = b_1 x_1 + b_2 x_2 + \dots + b_n x_n + a$$
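As a rough sketch, the coefficients $b_1, \dots, b_n$ and the intercept $a$ can be estimated by least squares on the complete cases and then used to fill the gaps. The example uses `np.linalg.lstsq` with two predictors. The synthetic data and the names `x1`, `x2`, `y` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
x1 = rng.normal(40, 10, n)
x2 = rng.normal(38, 5, n)
y = 50 * x1 + 20 * x2 + 1000 + rng.normal(0, 100, n)
y[rng.random(n) < 0.2] = np.nan        # roughly 20% of y goes missing

complete = ~np.isnan(y)

# Design matrix [x1, x2, 1]: the column of ones carries the intercept a.
X = np.column_stack([x1, x2, np.ones(n)])

# Least-squares fit of y = b1*x1 + b2*x2 + a on the complete cases.
coef, *_ = np.linalg.lstsq(X[complete], y[complete], rcond=None)
b1, b2, a = coef

# Predict the missing y values from the fitted equation.
y_filled = y.copy()
y_filled[~complete] = X[~complete] @ coef

print(round(b1, 2), round(b2, 2), round(a, 2))
```

The estimated `b1` and `b2` should land near the values 50 and 20 used to generate the data. On real data the true coefficients are unknown, so the fit should be checked on held-out complete cases instead.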
-
Computing missing values with linear regression