Machine Learning in Practice Series [1]: Industrial Steam Output Prediction
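- Background
Thermal power generation works as follows: burning fuel heats water into steam, the steam pressure drives a turbine, and the turbine in turn drives a generator that produces electricity. Within this chain of energy conversions, the key factor for generation efficiency is the combustion efficiency of the boiler, that is, how effectively the burning fuel heats water into high-temperature, high-pressure steam. Many factors influence boiler combustion efficiency. They include the boiler's adjustable parameters, such as fuel feed rate, primary and secondary air, induced draft air, return-material air, and feedwater flow, as well as the boiler's operating conditions, such as bed temperature, bed pressure, furnace temperature and pressure, and superheater temperature.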
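- Task description
Given desensitized data collected from boiler sensors (sampled at minute-level frequency), predict the amount of steam produced from the boiler's operating conditions.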
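- Data description
The data is split into training data (train.txt) and test data (test.txt). The 38 fields "V0" through "V37" are the feature variables, and "target" is the target variable. Participants train a model on the training data and predict the target variable for the test data; ranking is based on the MSE (mean squared error) of the predictions.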
- Evaluation
Predictions are scored by mean squared error (MSE).
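As a reminder of what the metric measures, MSE is the average of the squared differences between the true and predicted target values. The snippet below is a minimal sketch of that definition; the function name and the numbers are illustrative and are not taken from the competition data.

```python
import numpy as np


def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.mean((y_true - y_pred) ** 2))


# Made-up values: errors are 0, -0.5 and 1, so the squared errors are 0, 0.25 and 1.
example_mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
```

Because the errors are squared, a few large misses raise the score much more than many small ones, so lower values are better.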
Original project link: https://www.heywhale.com/home/column/64141d6b1c8c8b518ba97dcc
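1. Exploratory data analysis
1.1 Inspecting the data
The training set contains 2888 samples with 38 feature variables, V0 through V37. All variables are numeric, and no feature has missing values.
Because the fields have been desensitized, the concrete meaning of each feature has been removed; the target field is the label variable.
The test set contains 1925 samples with the same 38 feature variables, V0 through V37, all numeric.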
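The checks described above (row and column counts, numeric types, missing values, summary statistics) can be sketched with pandas roughly as follows. This is a hedged sketch: the tab separator is an assumption about how train.txt is laid out, `load_table` is a hypothetical helper, and the in-memory sample merely reuses three rows of three columns from the training set so that the snippet runs without the real files.

```python
import io

import pandas as pd

# Tiny stand-in for train.txt; the tab-separated layout is an assumption.
sample_text = (
    "V0\tV1\ttarget\n"
    "0.566\t0.016\t0.175\n"
    "0.968\t0.437\t0.676\n"
    "1.013\t0.568\t0.633\n"
)


def load_table(source, sep="\t"):
    """Read a delimited text file (or file-like object) into a DataFrame."""
    return pd.read_csv(source, sep=sep)


train_sample = load_table(io.StringIO(sample_text))

n_rows, n_cols = train_sample.shape               # sample and column counts
missing_per_column = train_sample.isnull().sum()  # missing values per column
all_numeric = all(
    pd.api.types.is_numeric_dtype(dtype) for dtype in train_sample.dtypes
)
summary = train_sample.describe()                 # count, mean, std, min, quartiles, max
first_rows = train_sample.head()                  # first rows of the table
```

With the real files, passing the path of train.txt or test.txt to `load_table` in place of the `StringIO` object would be the equivalent call, provided the separator assumption holds.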
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>V0</th>
<th>V1</th>
<th>V2</th>
<th>V3</th>
<th>V4</th>
<th>V5</th>
<th>V6</th>
<th>V7</th>
<th>V8</th>
<th>V9</th>
<th>…</th>
<th>V29</th>
<th>V30</th>
<th>V31</th>
<th>V32</th>
<th>V33</th>
<th>V34</th>
<th>V35</th>
<th>V36</th>
<th>V37</th>
<th>target</th>
</tr>
</thead>
<tbody>
<tr>
<th>count</th>
<td>2888.000000</td>
<td>2888.000000</td>
<td>2888.000000</td>
<td>2888.000000</td>
<td>2888.000000</td>
<td>2888.000000</td>
<td>2888.000000</td>
<td>2888.000000</td>
<td>2888.000000</td>
<td>2888.000000</td>
<td>…</td>
<td>2888.000000</td>
<td>2888.000000</td>
<td>2888.000000</td>
<td>2888.000000</td>
<td>2888.000000</td>
<td>2888.000000</td>
<td>2888.000000</td>
<td>2888.000000</td>
<td>2888.000000</td>
<td>2888.000000</td>
</tr>
<tr>
<th>mean</th>
<td>0.123048</td>
<td>0.056068</td>
<td>0.289720</td>
<td>-0.067790</td>
<td>0.012921</td>
<td>-0.558565</td>
<td>0.182892</td>
<td>0.116155</td>
<td>0.177856</td>
<td>-0.169452</td>
<td>…</td>
<td>0.097648</td>
<td>0.055477</td>
<td>0.127791</td>
<td>0.020806</td>
<td>0.007801</td>
<td>0.006715</td>
<td>0.197764</td>
<td>0.030658</td>
<td>-0.130330</td>
<td>0.126353</td>
</tr>
<tr>
<th>std</th>
<td>0.928031</td>
<td>0.941515</td>
<td>0.911236</td>
<td>0.970298</td>
<td>0.888377</td>
<td>0.517957</td>
<td>0.918054</td>
<td>0.955116</td>
<td>0.895444</td>
<td>0.953813</td>
<td>…</td>
<td>1.061200</td>
<td>0.901934</td>
<td>0.873028</td>
<td>0.902584</td>
<td>1.006995</td>
<td>1.003291</td>
<td>0.985675</td>
<td>0.970812</td>
<td>1.017196</td>
<td>0.983966</td>
</tr>
<tr>
<th>min</th>
<td>-4.335000</td>
<td>-5.122000</td>
<td>-3.420000</td>
<td>-3.956000</td>
<td>-4.742000</td>
<td>-2.182000</td>
<td>-4.576000</td>
<td>-5.048000</td>
<td>-4.692000</td>
<td>-12.891000</td>
<td>…</td>
<td>-2.912000</td>
<td>-4.507000</td>
<td>-5.859000</td>
<td>-4.053000</td>
<td>-4.627000</td>
<td>-4.789000</td>
<td>-5.695000</td>
<td>-2.608000</td>
<td>-3.630000</td>
<td>-3.044000</td>
</tr>
<tr>
<th>25%</th>
<td>-0.297000</td>
<td>-0.226250</td>
<td>-0.313000</td>
<td>-0.652250</td>
<td>-0.385000</td>
<td>-0.853000</td>
<td>-0.310000</td>
<td>-0.295000</td>
<td>-0.159000</td>
<td>-0.390000</td>
<td>…</td>
<td>-0.664000</td>
<td>-0.283000</td>
<td>-0.170250</td>
<td>-0.407250</td>
<td>-0.499000</td>
<td>-0.290000</td>
<td>-0.202500</td>
<td>-0.413000</td>
<td>-0.798250</td>
<td>-0.350250</td>
</tr>
<tr>
<th>50%</th>
<td>0.359000</td>
<td>0.272500</td>
<td>0.386000</td>
<td>-0.044500</td>
<td>0.110000</td>
<td>-0.466000</td>
<td>0.388000</td>
<td>0.344000</td>
<td>0.362000</td>
<td>0.042000</td>
<td>…</td>
<td>-0.023000</td>
<td>0.053500</td>
<td>0.299500</td>
<td>0.039000</td>
<td>-0.040000</td>
<td>0.160000</td>
<td>0.364000</td>
<td>0.137000</td>
<td>-0.185500</td>
<td>0.313000</td>
</tr>
<tr>
<th>75%</th>
<td>0.726000</td>
<td>0.599000</td>
<td>0.918250</td>
<td>0.624000</td>
<td>0.550250</td>
<td>-0.154000</td>
<td>0.831250</td>
<td>0.782250</td>
<td>0.726000</td>
<td>0.042000</td>
<td>…</td>
<td>0.745250</td>
<td>0.488000</td>
<td>0.635000</td>
<td>0.557000</td>
<td>0.462000</td>
<td>0.273000</td>
<td>0.602000</td>
<td>0.644250</td>
<td>0.495250</td>
<td>0.793250</td>
</tr>
<tr>
<th>max</th>
<td>2.121000</td>
<td>1.918000</td>
<td>2.828000</td>
<td>2.457000</td>
<td>2.689000</td>
<td>0.489000</td>
<td>1.895000</td>
<td>1.918000</td>
<td>2.245000</td>
<td>1.335000</td>
<td>…</td>
<td>4.580000</td>
<td>2.689000</td>
<td>2.013000</td>
<td>2.395000</td>
<td>5.465000</td>
<td>5.110000</td>
<td>2.324000</td>
<td>5.238000</td>
<td>3.000000</td>
<td>2.538000</td>
</tr>
</tbody>
</table>
<p>8 rows × 39 columns</p>
</div>
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>V0</th>
<th>V1</th>
<th>V2</th>
<th>V3</th>
<th>V4</th>
<th>V5</th>
<th>V6</th>
<th>V7</th>
<th>V8</th>
<th>V9</th>
<th>…</th>
<th>V28</th>
<th>V29</th>
<th>V30</th>
<th>V31</th>
<th>V32</th>
<th>V33</th>
<th>V34</th>
<th>V35</th>
<th>V36</th>
<th>V37</th>
</tr>
</thead>
<tbody>
<tr>
<th>count</th>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>…</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
</tr>
<tr>
<th>mean</th>
<td>-0.184404</td>
<td>-0.083912</td>
<td>-0.434762</td>
<td>0.101671</td>
<td>-0.019172</td>
<td>0.838049</td>
<td>-0.274092</td>
<td>-0.173971</td>
<td>-0.266709</td>
<td>0.255114</td>
<td>…</td>
<td>-0.206871</td>
<td>-0.146463</td>
<td>-0.083215</td>
<td>-0.191729</td>
<td>-0.030782</td>
<td>-0.011433</td>
<td>-0.009985</td>
<td>-0.296895</td>
<td>-0.046270</td>
<td>0.195735</td>
</tr>
<tr>
<th>std</th>
<td>1.073333</td>
<td>1.076670</td>
<td>0.969541</td>
<td>1.034925</td>
<td>1.147286</td>
<td>0.963043</td>
<td>1.054119</td>
<td>1.040101</td>
<td>1.085916</td>
<td>1.014394</td>
<td>…</td>
<td>1.064140</td>
<td>0.880593</td>
<td>1.126414</td>
<td>1.138454</td>
<td>1.130228</td>
<td>0.989732</td>
<td>0.995213</td>
<td>0.946896</td>
<td>1.040854</td>
<td>0.940599</td>
</tr>
<tr>
<th>min</th>
<td>-4.814000</td>
<td>-5.488000</td>
<td>-4.283000</td>
<td>-3.276000</td>
<td>-4.921000</td>
<td>-1.168000</td>
<td>-5.649000</td>
<td>-5.625000</td>
<td>-6.059000</td>
<td>-6.784000</td>
<td>…</td>
<td>-2.435000</td>
<td>-2.413000</td>
<td>-4.507000</td>
<td>-7.698000</td>
<td>-4.057000</td>
<td>-4.627000</td>
<td>-4.789000</td>
<td>-7.477000</td>
<td>-2.608000</td>
<td>-3.346000</td>
</tr>
<tr>
<th>25%</th>
<td>-0.664000</td>
<td>-0.451000</td>
<td>-0.978000</td>
<td>-0.644000</td>
<td>-0.497000</td>
<td>0.122000</td>
<td>-0.732000</td>
<td>-0.509000</td>
<td>-0.775000</td>
<td>-0.390000</td>
<td>…</td>
<td>-0.453000</td>
<td>-0.818000</td>
<td>-0.339000</td>
<td>-0.476000</td>
<td>-0.472000</td>
<td>-0.460000</td>
<td>-0.290000</td>
<td>-0.349000</td>
<td>-0.593000</td>
<td>-0.432000</td>
</tr>
<tr>
<th>50%</th>
<td>0.065000</td>
<td>0.195000</td>
<td>-0.267000</td>
<td>0.220000</td>
<td>0.118000</td>
<td>0.437000</td>
<td>-0.082000</td>
<td>0.018000</td>
<td>-0.004000</td>
<td>0.401000</td>
<td>…</td>
<td>-0.445000</td>
<td>-0.199000</td>
<td>0.010000</td>
<td>0.100000</td>
<td>0.155000</td>
<td>-0.040000</td>
<td>0.160000</td>
<td>-0.270000</td>
<td>0.083000</td>
<td>0.152000</td>
</tr>
<tr>
<th>75%</th>
<td>0.549000</td>
<td>0.589000</td>
<td>0.278000</td>
<td>0.793000</td>
<td>0.610000</td>
<td>1.928000</td>
<td>0.457000</td>
<td>0.515000</td>
<td>0.482000</td>
<td>0.904000</td>
<td>…</td>
<td>-0.434000</td>
<td>0.468000</td>
<td>0.447000</td>
<td>0.471000</td>
<td>0.627000</td>
<td>0.419000</td>
<td>0.273000</td>
<td>0.364000</td>
<td>0.651000</td>
<td>0.797000</td>
</tr>
<tr>
<th>max</th>
<td>2.100000</td>
<td>2.120000</td>
<td>1.946000</td>
<td>2.603000</td>
<td>4.475000</td>
<td>3.176000</td>
<td>1.528000</td>
<td>1.394000</td>
<td>2.408000</td>
<td>1.766000</td>
<td>…</td>
<td>4.656000</td>
<td>3.022000</td>
<td>3.139000</td>
<td>1.428000</td>
<td>2.299000</td>
<td>5.465000</td>
<td>5.110000</td>
<td>1.671000</td>
<td>2.861000</td>
<td>3.021000</td>
</tr>
</tbody>
</table>
<p>8 rows × 38 columns</p>
</div>
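The tables above show summary statistics for the training and test sets, such as the sample count, mean, standard deviation (std), minimum, and maximum.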
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>V0</th>
<th>V1</th>
<th>V2</th>
<th>V3</th>
<th>V4</th>
<th>V5</th>
<th>V6</th>
<th>V7</th>
<th>V8</th>
<th>V9</th>
<th>…</th>
<th>V29</th>
<th>V30</th>
<th>V31</th>
<th>V32</th>
<th>V33</th>
<th>V34</th>
<th>V35</th>
<th>V36</th>
<th>V37</th>
<th>target</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>0.566</td>
<td>0.016</td>
<td>-0.143</td>
<td>0.407</td>
<td>0.452</td>
<td>-0.901</td>
<td>-1.812</td>
<td>-2.360</td>
<td>-0.436</td>
<td>-2.114</td>
<td>…</td>
<td>0.136</td>
<td>0.109</td>
<td>-0.615</td>
<td>0.327</td>
<td>-4.627</td>
<td>-4.789</td>
<td>-5.101</td>
<td>-2.608</td>
<td>-3.508</td>
<td>0.175</td>
</tr>
<tr>
<th>1</th>
<td>0.968</td>
<td>0.437</td>
<td>0.066</td>
<td>0.566</td>
<td>0.194</td>
<td>-0.893</td>
<td>-1.566</td>
<td>-2.360</td>
<td>0.332</td>
<td>-2.114</td>
<td>…</td>
<td>-0.128</td>
<td>0.124</td>
<td>0.032</td>
<td>0.600</td>
<td>-0.843</td>
<td>0.160</td>
<td>0.364</td>
<td>-0.335</td>
<td>-0.730</td>
<td>0.676</td>
</tr>
<tr>
<th>2</th>
<td>1.013</td>
<td>0.568</td>
<td>0.235</td>
<td>0.370</td>
<td>0.112</td>
<td>-0.797</td>
<td>-1.367</td>
<td>-2.360</td>
<td>0.396</td>
<td>-2.114</td>
<td>…</td>
<td>-0.009</td>
<td>0.361</td>
<td>0.277</td>
<td>-0.116</td>
<td>-0.843</td>
<td>0.160</td>
<td>0.364</td>
<td>0.765</td>
<td>-0.589</td>
<td>0.633</td>
</tr>
<tr>
<th>3</th>
<td>0.733</td>
<td>0.368</td>
<td>0.283</td>
<td>0.165</td>
<td>0.599</td>
<td>-0.679</td>
<td>-1.200</td>
<td>-2.086</td>
<td>0.403</td>
<td>-2.114</td>
<td>…</td>
<td>0.015</td>
<td>0.417</td>
<td>0.279</td>
<td>0.603</td>
<td>-0.843</td>
<td>-0.065</td>
<td>0.364</td>
<td>0.333</td>
<td>-0.112</td>
<td>0.206</td>
</tr>
<tr>
<th>4</th>
<td>0.684</td>
<td>0.638</td>
<td>0.260</td>
<td>0.209</td>
<td>0.337</td>
<td>-0.454</td>
<td>-1.073</td>
<td>-2.086</td>
<td>0.314</td>
<td>-2.114</td>
<td>…</td>
<td>0.183</td>
<td>1.078</td>
<td>0.328</td>
<td>0.418</td>
<td>-0.843</td>
<td>-0.215</td>
<td>0.364</td>
<td>-0.280</td>
<td>-0.028</td>
<td>0.384</td>
</tr>
</tbody>
</table>
<p>5 rows × 39 columns</p>
</div>
The table above shows the first 5 rows of the training set. All values are floating-point, so every feature is a continuous numeric feature.
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>V0</th>
<th>V1</th>
<th>V2</th>
<th>V3</th>
<th>V4</th>
<th>V5</th>
<th>V6</th>
<th>V7</th>
<th>V8</th>
<th>V9</th>
<th>…</th>
<th>V28</th>
<th>V29</th>
<th>V30</th>
<th>V31</th>
<th>V32</th>
<th>V33</th>
<th>V34</th>
<th>V35</th>
<th>V36</th>
<th>V37</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>0.368</td>
<td>0.380</td>
<td>-0.225</td>
<td>-0.049</td>
<td>0.379</td>
<td>0.092</td>
<td>0.550</td>
<td>0.551</td>
<td>0.244</td>
<td>0.904</td>
<td>…</td>
<td>-0.449</td>
<td>0.047</td>
<td>0.057</td>
<td>-0.042</td>
<td>0.847</td>
<td>0.534</td>
<td>-0.009</td>
<td>-0.190</td>
<td>-0.567</td>
<td>0.388</td>
</tr>
<tr>
<th>1</th>
<td>0.148</td>
<td>0.489</td>
<td>-0.247</td>
<td>-0.049</td>
<td>0.122</td>
<td>-0.201</td>
<td>0.487</td>
<td>0.493</td>
<td>-0.127</td>
<td>0.904</td>
<td>…</td>
<td>-0.443</td>
<td>0.047</td>
<td>0.560</td>
<td>0.176</td>
<td>0.551</td>
<td>0.046</td>
<td>-0.220</td>
<td>0.008</td>
<td>-0.294</td>
<td>0.104</td>
</tr>
<tr>
<th>2</th>
<td>-0.166</td>
<td>-0.062</td>
<td>-0.311</td>
<td>0.046</td>
<td>-0.055</td>
<td>0.063</td>
<td>0.485</td>
<td>0.493</td>
<td>-0.227</td>
<td>0.904</td>
<td>…</td>
<td>-0.458</td>
<td>-0.398</td>
<td>0.101</td>
<td>0.199</td>
<td>0.634</td>
<td>0.017</td>
<td>-0.234</td>
<td>0.008</td>
<td>0.373</td>
<td>0.569</td>
</tr>
<tr>
<th>3</th>
<td>0.102</td>
<td>0.294</td>
<td>-0.259</td>
<td>0.051</td>
<td>-0.183</td>
<td>0.148</td>
<td>0.474</td>
<td>0.504</td>
<td>0.010</td>
<td>0.904</td>
<td>…</td>
<td>-0.456</td>
<td>-0.398</td>
<td>1.007</td>
<td>0.137</td>
<td>1.042</td>
<td>-0.040</td>
<td>-0.290</td>
<td>0.008</td>
<td>-0.666</td>
<td>0.391</td>
</tr>
<tr>
<th>4</th>
<td>0.300</td>
<td>0.428</td>
<td>0.208</td>
<td>0.051</td>
<td>-0.033</td>
<td>0.116</td>
<td>0.408</td>
<td>0.497</td>
<td>0.155</td>
<td>0.904</td>
<td>…</td>
<td>-0.458</td>
<td>-0.776</td>
<td>0.291</td>
<td>0.370</td>
<td>0.181</td>
<td>-0.040</td>
<td>-0.290</td>
<td>0.008</td>
<td>-0.140</td>
<td>-0.497</td>
</tr>
</tbody>
</table>
<p>5 rows × 38 columns</p>
</div>
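1.2 Visual data exploration
Viewing data distributions
- Plot a histogram of feature variable 'V0' together with a Q-Q plot to check whether the data is approximately normally distributed.
Plot histograms and Q-Q plots for all features to check whether the training data is approximately normally distributed.
These distribution plots show that many feature variables (such as 'V1', 'V9', 'V24', and 'V28') are not normally distributed: their points do not follow the diagonal of the Q-Q plot. A data transformation can be applied to them later.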
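A Q-Q plot pairs the sorted sample values with the quantiles a normal distribution would have at the same positions; points close to a straight line suggest approximate normality. The sketch below computes those point pairs and summarizes them with a single correlation value. It is an illustration under assumptions, not necessarily how the original figures were produced: the function names are hypothetical, the plotting positions `(i - 0.5) / n` are one common convention, and the two samples are synthetic.

```python
from statistics import NormalDist

import numpy as np


def qq_points(values):
    """Return (theoretical, observed) quantile pairs for a normal Q-Q plot."""
    observed = np.sort(np.asarray(values, dtype=float))
    n = observed.size
    # Plotting positions (i - 0.5) / n stay strictly inside (0, 1).
    probs = (np.arange(1, n + 1) - 0.5) / n
    std_normal = NormalDist()
    theoretical = np.array([std_normal.inv_cdf(p) for p in probs])
    return theoretical, observed


def qq_correlation(values):
    """Correlation of the Q-Q points; values near 1 suggest near-normal data."""
    theoretical, observed = qq_points(values)
    return float(np.corrcoef(theoretical, observed)[0, 1])


# Synthetic samples for illustration only.
rng = np.random.default_rng(0)
normal_like = rng.normal(size=2000)
skewed = rng.exponential(size=2000)

r_normal = qq_correlation(normal_like)
r_skewed = qq_correlation(skewed)
```

The arrays returned by `qq_points` can be passed to any scatter-plot routine to draw the Q-Q plot itself; the skewed sample yields a visibly lower correlation than the normal one.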
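Compare the training-set and test-set distributions of the same feature variable 'V0' to check whether they are consistent.
Compare the training and test distributions for all feature variables to find the features whose distributions are inconsistent.
View the distributions of features 'V5', 'V17', 'V28', 'V22', 'V11', and 'V9'.
The plots show that for features 'V5', 'V9', 'V11', 'V17', 'V22', and 'V28' the training and test distributions are inconsistent. Such features would hurt the model's ability to generalize, so they are deleted.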
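As a numeric companion to the visual comparison, one option is the two-sample Kolmogorov-Smirnov statistic, the largest gap between the empirical distribution functions of two samples. Note that this is a swapped-in technique for illustration, not the plot-based inspection the text describes. The helper names, the threshold of 0.3, and the synthetic data are all assumptions.

```python
import numpy as np
import pandas as pd


def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the empirical CDFs."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


def shifted_features(train, test, threshold):
    """Names of shared columns whose KS statistic exceeds the threshold."""
    shared = [col for col in train.columns if col in test.columns]
    return [col for col in shared if ks_statistic(train[col], test[col]) > threshold]


# Synthetic frames: "stable" matches across both sets, "shifted" does not.
rng = np.random.default_rng(0)
train_demo = pd.DataFrame({
    "stable": rng.normal(0.0, 1.0, 1000),
    "shifted": rng.normal(0.0, 1.0, 1000),
})
test_demo = pd.DataFrame({
    "stable": rng.normal(0.0, 1.0, 1000),
    "shifted": rng.normal(1.5, 1.0, 1000),
})

to_drop = shifted_features(train_demo, test_demo, threshold=0.3)  # arbitrary cutoff
train_kept = train_demo.drop(columns=to_drop)
test_kept = test_demo.drop(columns=to_drop)
```

Dropping the same columns from both frames keeps the training and test feature sets aligned, which is what removing the six features above achieves.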
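Visualizing linear regression relationships
- View the linear regression relationship between feature variable 'V0' and the 'target' variable.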
1.2.1 Linear regression relationships between variables
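The relationship between a single feature and target can be summarized by fitting a straight line with least squares and reporting its slope, intercept, and Pearson correlation. The following is a minimal sketch on synthetic data; `fit_line` is a hypothetical helper, and the generated `feature` and `target` arrays only stand in for a column such as 'V0' and the real target.

```python
import numpy as np


def fit_line(x, y):
    """Least-squares line y = slope * x + intercept, plus Pearson r."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    r = np.corrcoef(x, y)[0, 1]
    return float(slope), float(intercept), float(r)


# Synthetic stand-ins: target depends linearly on the feature plus noise.
rng = np.random.default_rng(0)
feature = rng.normal(size=500)
target = 0.9 * feature + rng.normal(scale=0.3, size=500)

slope, intercept, r = fit_line(feature, target)
```

Drawing the fitted line over a scatter plot of the two columns gives the kind of regression plot this subsection refers to.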
1.2.2 Correlation between feature variables
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>V0</th>
<th>V1</th>
<th>V2</th>
<th>V3</th>
<th>V4</th>
<th>V6</th>
<th>V7</th>
<th>V8</th>
<th>V10</th>
<th>V12</th>
<th>…</th>
<th>V29</th>
<th>V30</th>
<th>V31</th>
<th>V32</th>
<th>V33</th>
<th>V34</th>
<th>V35</th>
<th>V36</th>
<th>V37</th>
<th>target</th>
</tr>
</thead>
<tbody>
<tr>
<th>V0</th>
<td>1.000000</td>
<td>0.908607</td>
<td>0.463643</td>
<td>0.409576</td>
<td>0.781212</td>
<td>0.189267</td>
<td>0.141294</td>
<td>0.794013</td>
<td>0.298443</td>
<td>0.751830</td>
<td>…</td>
<td>0.302145</td>
<td>0.156968</td>
<td>0.675003</td>
<td>0.050951</td>
<td>0.056439</td>
<td>-0.019342</td>
<td>0.138933</td>
<td>0.231417</td>
<td>-0.494076</td>
<td>0.873212</td>
</tr>
<tr>
<th>V1</th>
<td>0.908607</td>
<td>1.000000</td>
<td>0.506514</td>
<td>0.383924</td>
<td>0.657790</td>
<td>0.276805</td>
<td>0.205023</td>
<td>0.874650</td>
<td>0.310120</td>
<td>0.656186</td>
<td>…</td>
<td>0.147096</td>
<td>0.175997</td>
<td>0.769745</td>
<td>0.085604</td>
<td>0.035129</td>
<td>-0.029115</td>
<td>0.146329</td>
<td>0.235299</td>
<td>-0.494043</td>
<td>0.871846</td>
</tr>
<tr>
<th>V2</th>
<td>0.463643</td>
<td>0.506514</td>
<td>1.000000</td>
<td>0.410148</td>
<td>0.057697</td>
<td>0.615938</td>
<td>0.477114</td>
<td>0.703431</td>
<td>0.346006</td>
<td>0.059941</td>
<td>…</td>
<td>-0.275764</td>
<td>0.175943</td>
<td>0.653764</td>
<td>0.033942</td>
<td>0.050309</td>
<td>-0.025620</td>
<td>0.043648</td>
<td>0.316462</td>
<td>-0.734956</td>
<td>0.638878</td>
</tr>
<tr>
<th>V3</th>
<td>0.409576</td>
<td>0.383924</td>
<td>0.410148</td>
<td>1.000000</td>
<td>0.315046</td>
<td>0.233896</td>
<td>0.197836</td>
<td>0.411946</td>
<td>0.321262</td>
<td>0.306397</td>
<td>…</td>
<td>0.117610</td>
<td>0.043966</td>
<td>0.421954</td>
<td>-0.092423</td>
<td>-0.007159</td>
<td>-0.031898</td>
<td>0.080034</td>
<td>0.324475</td>
<td>-0.229613</td>
<td>0.512074</td>
</tr>
<tr>
<th>V4</th>
<td>0.781212</td>
<td>0.657790</td>
<td>0.057697</td>
<td>0.315046</td>
<td>1.000000</td>
<td>-0.117529</td>
<td>-0.052370</td>
<td>0.449542</td>
<td>0.141129</td>
<td>0.927685</td>
<td>…</td>
<td>0.659093</td>
<td>0.022807</td>
<td>0.447016</td>
<td>-0.026186</td>
<td>0.062367</td>
<td>0.028659</td>
<td>0.100010</td>
<td>0.113609</td>
<td>-0.031054</td>
<td>0.603984</td>
</tr>
<tr>
<th>V6</th>
<td>0.189267</td>
<td>0.276805</td>
<td>0.615938</td>
<td>0.233896</td>
<td>-0.117529</td>
<td>1.000000</td>
<td>0.917502</td>
<td>0.468233</td>
<td>0.415660</td>
<td>-0.087312</td>
<td>…</td>
<td>-0.467980</td>
<td>0.188907</td>
<td>0.546535</td>
<td>0.144550</td>
<td>0.054210</td>
<td>-0.002914</td>
<td>0.044992</td>
<td>0.433804</td>
<td>-0.404817</td>
<td>0.370037</td>
</tr>
<tr>
<th>V7</th>
<td>0.141294</td>
<td>0.205023</td>
<td>0.477114</td>
<td>0.197836</td>
<td>-0.052370</td>
<td>0.917502</td>
<td>1.000000</td>
<td>0.389987</td>
<td>0.310982</td>
<td>-0.036791</td>
<td>…</td>
<td>-0.311363</td>
<td>0.170113</td>
<td>0.475254</td>
<td>0.122707</td>
<td>0.034508</td>
<td>-0.019103</td>
<td>0.111166</td>
<td>0.340479</td>
<td>-0.292285</td>
<td>0.287815</td>
</tr>
<tr>
<th>V8</th>
<td>0.794013</td>
<td>0.874650</td>
<td>0.703431</td>
<td>0.411946</td>
<td>0.449542</td>
<td>0.468233</td>
<td>0.389987</td>
<td>1.000000</td>
<td>0.419703</td>
<td>0.420557</td>
<td>…</td>
<td>-0.011091</td>
<td>0.150258</td>
<td>0.878072</td>
<td>0.038430</td>
<td>0.026843</td>
<td>-0.036297</td>
<td>0.179167</td>
<td>0.326586</td>
<td>-0.553121</td>
<td>0.831904</td>
</tr>
<tr>
<th>V10</th>
<td>0.298443</td>
<td>0.310120</td>
<td>0.346006</td>
<td>0.321262</td>
<td>0.141129</td>
<td>0.415660</td>
<td>0.310982</td>
<td>0.419703</td>
<td>1.000000</td>
<td>0.140462</td>
<td>…</td>
<td>-0.105042</td>
<td>-0.036705</td>
<td>0.560213</td>
<td>-0.093213</td>
<td>0.016739</td>
<td>-0.026994</td>
<td>0.026846</td>
<td>0.922190</td>
<td>-0.045851</td>
<td>0.394767</td>
</tr>
<tr>
<th>V12</th>
<td>0.751830</td>
<td>0.656186</td>
<td>0.059941</td>
<td>0.306397</td>
<td>0.927685</td>
<td>-0.087312</td>
<td>-0.036791</td>
<td>0.420557</td>
<td>0.140462</td>
<td>1.000000</td>
<td>…</td>
<td>0.666775</td>
<td>0.028866</td>
<td>0.441963</td>
<td>-0.007658</td>
<td>0.046674</td>
<td>0.010122</td>
<td>0.081963</td>
<td>0.112150</td>
<td>-0.054827</td>
<td>0.594189</td>
</tr>
<tr>
<th>V13</th>
<td>0.185144</td>
<td>0.157518</td>
<td>0.204762</td>
<td>-0.003636</td>
<td>0.075993</td>
<td>0.138367</td>
<td>0.110973</td>
<td>0.153299</td>
<td>-0.059553</td>
<td>0.098771</td>
<td>…</td>
<td>0.008235</td>
<td>0.027328</td>
<td>0.113743</td>
<td>0.130598</td>
<td>0.157513</td>
<td>0.116944</td>
<td>0.219906</td>
<td>-0.024751</td>
<td>-0.379714</td>
<td>0.203373</td>
</tr>
<tr>
<th>V14</th>
<td>-0.004144</td>
<td>-0.006268</td>
<td>-0.106282</td>
<td>-0.232677</td>
<td>0.023853</td>
<td>0.072911</td>
<td>0.163931</td>
<td>0.008138</td>
<td>-0.077543</td>
<td>0.020069</td>
<td>…</td>
<td>0.056814</td>
<td>-0.004057</td>
<td>0.010989</td>
<td>0.106581</td>
<td>0.073535</td>
<td>0.043218</td>
<td>0.233523</td>
<td>-0.086217</td>
<td>0.010553</td>
<td>0.008424</td>
</tr>
<tr>
<th>V15</th>
<td>0.314520</td>
<td>0.164702</td>
<td>-0.224573</td>
<td>0.143457</td>
<td>0.615704</td>
<td>-0.431542</td>
<td>-0.291272</td>
<td>0.018366</td>
<td>-0.046737</td>
<td>0.642081</td>
<td>…</td>
<td>0.951314</td>
<td>-0.111311</td>
<td>0.011768</td>
<td>-0.104618</td>
<td>0.050254</td>
<td>0.048602</td>
<td>0.100817</td>
<td>-0.051861</td>
<td>0.245635</td>
<td>0.154020</td>
</tr>
<tr>
<th>V16</th>
<td>0.347357</td>
<td>0.435606</td>
<td>0.782474</td>
<td>0.394517</td>
<td>0.023818</td>
<td>0.847119</td>
<td>0.752683</td>
<td>0.680031</td>
<td>0.546975</td>
<td>0.025736</td>
<td>…</td>
<td>-0.342210</td>
<td>0.154794</td>
<td>0.778538</td>
<td>0.041474</td>
<td>0.028878</td>
<td>-0.054775</td>
<td>0.082293</td>
<td>0.551880</td>
<td>-0.420053</td>
<td>0.536748</td>
</tr>
<tr>
<th>V18</th>
<td>0.148622</td>
<td>0.123862</td>
<td>0.132105</td>
<td>0.022868</td>
<td>0.136022</td>
<td>0.110570</td>
<td>0.098691</td>
<td>0.093682</td>
<td>-0.024693</td>
<td>0.119833</td>
<td>…</td>
<td>0.053958</td>
<td>0.470341</td>
<td>0.079718</td>
<td>0.411967</td>
<td>0.512139</td>
<td>0.365410</td>
<td>0.152088</td>
<td>0.019603</td>
<td>-0.181937</td>
<td>0.170721</td>
</tr>
<tr>
<th>V19</th>
<td>-0.100294</td>
<td>-0.092673</td>
<td>-0.161802</td>
<td>-0.246008</td>
<td>-0.205729</td>
<td>0.215290</td>
<td>0.158371</td>
<td>-0.144693</td>
<td>0.074903</td>
<td>-0.148319</td>
<td>…</td>
<td>-0.205409</td>
<td>0.100133</td>
<td>-0.131542</td>
<td>0.144018</td>
<td>-0.021517</td>
<td>-0.079753</td>
<td>-0.220737</td>
<td>0.087605</td>
<td>0.012115</td>
<td>-0.114976</td>
</tr>
<tr>
<th>V20</th>
<td>0.462493</td>
<td>0.459795</td>
<td>0.298385</td>
<td>0.289594</td>
<td>0.291309</td>
<td>0.136091</td>
<td>0.089399</td>
<td>0.412868</td>
<td>0.207612</td>
<td>0.271559</td>
<td>…</td>
<td>0.016233</td>
<td>0.086165</td>
<td>0.326863</td>
<td>0.050699</td>
<td>0.009358</td>
<td>-0.000979</td>
<td>0.048981</td>
<td>0.161315</td>
<td>-0.322006</td>
<td>0.444965</td>
</tr>
<tr>
<th>V21</th>
<td>-0.029285</td>
<td>-0.012911</td>
<td>-0.030932</td>
<td>0.114373</td>
<td>0.174025</td>
<td>-0.051806</td>
<td>-0.065300</td>
<td>-0.047839</td>
<td>0.082288</td>
<td>0.144371</td>
<td>…</td>
<td>0.157097</td>
<td>-0.077945</td>
<td>0.053025</td>
<td>-0.159128</td>
<td>-0.087561</td>
<td>-0.053707</td>
<td>-0.199398</td>
<td>0.047340</td>
<td>0.315470</td>
<td>-0.010063</td>
</tr>
<tr>
<th>V23</th>
<td>0.231136</td>
<td>0.222574</td>
<td>0.065509</td>
<td>0.081374</td>
<td>0.196530</td>
<td>0.069901</td>
<td>0.125180</td>
<td>0.174124</td>
<td>-0.066537</td>
<td>0.180049</td>
<td>…</td>
<td>0.116122</td>
<td>0.363963</td>
<td>0.129783</td>
<td>0.367086</td>
<td>0.183666</td>
<td>0.196681</td>
<td>0.635252</td>
<td>-0.035949</td>
<td>-0.187582</td>
<td>0.226331</td>
</tr>
<tr>
<th>V24</th>
<td>-0.324959</td>
<td>-0.233556</td>
<td>0.010225</td>
<td>-0.237326</td>
<td>-0.529866</td>
<td>0.072418</td>
<td>-0.030292</td>
<td>-0.136898</td>
<td>-0.029420</td>
<td>-0.550881</td>
<td>…</td>
<td>-0.642370</td>
<td>0.033532</td>
<td>-0.202097</td>
<td>0.060608</td>
<td>-0.134320</td>
<td>-0.095588</td>
<td>-0.243738</td>
<td>-0.041325</td>
<td>-0.137614</td>
<td>-0.264815</td>
</tr>
<tr>
<th>V25</th>
<td>-0.200706</td>
<td>-0.070627</td>
<td>0.481785</td>
<td>-0.100569</td>
<td>-0.444375</td>
<td>0.438610</td>
<td>0.316744</td>
<td>0.173320</td>
<td>0.079805</td>
<td>-0.448877</td>
<td>…</td>
<td>-0.575154</td>
<td>0.088238</td>
<td>0.201243</td>
<td>0.065501</td>
<td>-0.013312</td>
<td>-0.030747</td>
<td>-0.093948</td>
<td>0.069302</td>
<td>-0.246742</td>
<td>-0.019373</td>
</tr>
<tr>
<th>V26</th>
<td>-0.125140</td>
<td>-0.043012</td>
<td>0.035370</td>
<td>-0.027685</td>
<td>-0.080487</td>
<td>0.106055</td>
<td>0.160566</td>
<td>0.015724</td>
<td>0.072366</td>
<td>-0.124111</td>
<td>…</td>
<td>-0.133694</td>
<td>-0.057247</td>
<td>0.062879</td>
<td>-0.004545</td>
<td>-0.034596</td>
<td>0.051294</td>
<td>0.085576</td>
<td>0.064963</td>
<td>0.010880</td>
<td>-0.046724</td>
</tr>
<tr>
<th>V27</th>
<td>0.733198</td>
<td>0.824198</td>
<td>0.726250</td>
<td>0.392006</td>
<td>0.412083</td>
<td>0.474441</td>
<td>0.424185</td>
<td>0.901100</td>
<td>0.246085</td>
<td>0.374380</td>
<td>…</td>
<td>-0.032772</td>
<td>0.208074</td>
<td>0.790239</td>
<td>0.095127</td>
<td>0.030135</td>
<td>-0.036123</td>
<td>0.159884</td>
<td>0.226713</td>
<td>-0.617771</td>
<td>0.812585</td>
</tr>
<tr>
<th>V29</th>
<td>0.302145</td>
<td>0.147096</td>
<td>-0.275764</td>
<td>0.117610</td>
<td>0.659093</td>
<td>-0.467980</td>
<td>-0.311363</td>
<td>-0.011091</td>
<td>-0.105042</td>
<td>0.666775</td>
<td>…</td>
<td>1.000000</td>
<td>-0.122817</td>
<td>-0.004364</td>
<td>-0.110699</td>
<td>0.035272</td>
<td>0.035392</td>
<td>0.078588</td>
<td>-0.099309</td>
<td>0.285581</td>
<td>0.123329</td>
</tr>
<tr>
<th>V30</th>
<td>0.156968</td>
<td>0.175997</td>
<td>0.175943</td>
<td>0.043966</td>
<td>0.022807</td>
<td>0.188907</td>
<td>0.170113</td>
<td>0.150258</td>
<td>-0.036705</td>
<td>0.028866</td>
<td>…</td>
<td>-0.122817</td>
<td>1.000000</td>
<td>0.114318</td>
<td>0.695725</td>
<td>0.083693</td>
<td>-0.028573</td>
<td>-0.027987</td>
<td>0.006961</td>
<td>-0.256814</td>
<td>0.187311</td>
</tr>
<tr>
<th>V31</th>
<td>0.675003</td>
<td>0.769745</td>
<td>0.653764</td>
<td>0.421954</td>
<td>0.447016</td>
<td>0.546535</td>
<td>0.475254</td>
<td>0.878072</td>
<td>0.560213</td>
<td>0.441963</td>
<td>…</td>
<td>-0.004364</td>
<td>0.114318</td>
<td>1.000000</td>
<td>0.016782</td>
<td>0.016733</td>
<td>-0.047273</td>
<td>0.152314</td>
<td>0.510851</td>
<td>-0.357785</td>
<td>0.750297</td>
</tr>
<tr>
<th>V32</th>
<td>0.050951</td>
<td>0.085604</td>
<td>0.033942</td>
<td>-0.092423</td>
<td>-0.026186</td>
<td>0.144550</td>
<td>0.122707</td>
<td>0.038430</td>
<td>-0.093213</td>
<td>-0.007658</td>
<td>…</td>
<td>-0.110699</td>
<td>0.695725</td>
<td>0.016782</td>
<td>1.000000</td>
<td>0.105255</td>
<td>0.069300</td>
<td>0.016901</td>
<td>-0.054411</td>
<td>-0.162417</td>
<td>0.066606</td>
</tr>
<tr>
<th>V33</th>
<td>0.056439</td>
<td>0.035129</td>
<td>0.050309</td>
<td>-0.007159</td>
<td>0.062367</td>
<td>0.054210</td>
<td>0.034508</td>
<td>0.026843</td>
<td>0.016739</td>
<td>0.046674</td>
<td>…</td>
<td>0.035272</td>
<td>0.083693</td>
<td>0.016733</td>
<td>0.105255</td>
<td>1.000000</td>
<td>0.719126</td>
<td>0.167597</td>
<td>0.031586</td>
<td>-0.062715</td>
<td>0.077273</td>
</tr>
<tr>
<th>V34</th>
<td>-0.019342</td>
<td>-0.029115</td>
<td>-0.025620</td>
<td>-0.031898</td>
<td>0.028659</td>
<td>-0.002914</td>
<td>-0.019103</td>
<td>-0.036297</td>
<td>-0.026994</td>
<td>0.010122</td>
<td>…</td>
<td>0.035392</td>
<td>-0.028573</td>
<td>-0.047273</td>
<td>0.069300</td>
<td>0.719126</td>
<td>1.000000</td>
<td>0.233616</td>
<td>-0.019032</td>
<td>-0.006854</td>
<td>-0.006034</td>
</tr>
<tr>
<th>V35</th>
<td>0.138933</td>
<td>0.146329</td>
<td>0.043648</td>
<td>0.080034</td>
<td>0.100010</td>
<td>0.044992</td>
<td>0.111166</td>
<td>0.179167</td>
<td>0.026846</td>
<td>0.081963</td>
<td>…</td>
<td>0.078588</td>
<td>-0.027987</td>
<td>0.152314</td>
<td>0.016901</td>
<td>0.167597</td>
<td>0.233616</td>
<td>1.000000</td>
<td>0.025401</td>
<td>-0.077991</td>
<td>0.140294</td>
</tr>
<tr>
<th>V36</th>
<td>0.231417</td>
<td>0.235299</td>
<td>0.316462</td>
<td>0.324475</td>
<td>0.113609</td>
<td>0.433804</td>
<td>0.340479</td>
<td>0.326586</td>
<td>0.922190</td>
<td>0.112150</td>
<td>…</td>
<td>-0.099309</td>
<td>0.006961</td>
<td>0.510851</td>
<td>-0.054411</td>
<td>0.031586</td>
<td>-0.019032</td>
<td>0.025401</td>
<td>1.000000</td>
<td>-0.039478</td>
<td>0.319309</td>
</tr>
<tr>
<th>V37</th>
<td>-0.494076</td>
<td>-0.494043</td>
<td>-0.734956</td>
<td>-0.229613</td>
<td>-0.031054</td>
<td>-0.404817</td>
<td>-0.292285</td>
<td>-0.553121</td>
<td>-0.045851</td>
<td>-0.054827</td>
<td>…</td>
<td>0.285581</td>
<td>-0.256814</td>
<td>-0.357785</td>
<td>-0.162417</td>
<td>-0.062715</td>
<td>-0.006854</td>
<td>-0.077991</td>
<td>-0.039478</td>
<td>1.000000</td>
<td>-0.565795</td>
</tr>
<tr>
<th>target</th>
<td>0.873212</td>
<td>0.871846</td>
<td>0.638878</td>
<td>0.512074</td>
<td>0.603984</td>
<td>0.370037</td>
<td>0.287815</td>
<td>0.831904</td>
<td>0.394767</td>
<td>0.594189</td>
<td>…</td>
<td>0.123329</td>
<td>0.187311</td>
<td>0.750297</td>
<td>0.066606</td>
<td>0.077273</td>
<td>-0.006034</td>
<td>0.140294</td>
<td>0.319309</td>
<td>-0.565795</td>
<td>1.000000</td>
</tr>
</tbody>
</table>
<p>33 rows × 33 columns</p>
</div>
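The table above gives the pairwise correlation coefficients among the retained feature variables and the target variable (the six features removed above, 'V5', 'V9', 'V11', 'V17', 'V22', and 'V28', are excluded). It shows both how strongly the features correlate with one another and how strongly each feature correlates with target.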
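A correlation table of this kind can be produced with pandas, whose `DataFrame.corr` computes Pearson correlations by default. The sketch below uses a small synthetic frame with hypothetical column names `A`, `B`, and `C`; it illustrates the call and is not a reproduction of the table above.

```python
import numpy as np
import pandas as pd

# Synthetic data: A and B are strongly related, C is independent noise.
rng = np.random.default_rng(0)
base = rng.normal(size=300)
demo = pd.DataFrame({
    "A": base,
    "B": base + rng.normal(scale=0.2, size=300),
    "C": rng.normal(size=300),
})
demo["target"] = 0.8 * demo["A"] + rng.normal(scale=0.3, size=300)

corr_matrix = demo.corr()  # pairwise Pearson correlation coefficients

# Correlation of each feature with target, strongest (by absolute value) first.
with_target = corr_matrix["target"].drop("target")
order = with_target.abs().sort_values(ascending=False).index
target_corr = with_target.loc[order]
```

The matrix is symmetric with ones on the diagonal, which is why the row and column for each variable in the table above mirror each other.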
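1.2.3 Finding important variables
Find the feature variables whose correlation coefficient with the target variable is greater than 0.5.
The features 'V14', 'V21', 'V25', 'V26', 'V32', 'V33', and 'V34' have correlation coefficients with target far below 0.5 (in the table above, each is below 0.1 in absolute value). They are therefore treated as having essentially no linear relationship with the predicted target value and are deleted.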
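Selecting features by their correlation with target comes down to comparing absolute correlation values against a cutoff. The sketch below applies that idea to a subset of the values copied from the correlation table above. The helper name is hypothetical, and the cutoff of 0.1 is an assumption chosen because it separates the seven listed features; it is not the 0.5 figure quoted in the text.

```python
import pandas as pd


def split_by_target_correlation(corr_with_target, threshold):
    """Split feature names by whether |correlation with target| reaches the threshold."""
    strength = corr_with_target.abs()
    keep = strength[strength >= threshold].index.tolist()
    drop = strength[strength < threshold].index.tolist()
    return keep, drop


# Correlations with target copied from the table above (subset of features).
corr_with_target = pd.Series({
    "V0": 0.873212, "V1": 0.871846, "V14": 0.008424, "V21": -0.010063,
    "V25": -0.019373, "V26": -0.046724, "V32": 0.066606,
    "V33": 0.077273, "V34": -0.006034, "V37": -0.565795,
})

keep, drop = split_by_target_correlation(corr_with_target, threshold=0.1)
```

Taking the absolute value matters: 'V37' has a negative coefficient of -0.565795 but is still strongly related to target, so it stays in `keep`.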
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>V0</th>
<th>V1</th>
<th>V2</th>
<th>V3</th>
<th>V4</th>
<th>V6</th>
<th>V7</th>
<th>V8</th>
<th>V10</th>
<th>V12</th>
<th>…</th>
<th>V27</th>
<th>V29</th>
<th>V30</th>
<th>V31</th>
<th>V32</th>
<th>V33</th>
<th>V34</th>
<th>V35</th>
<th>V36</th>
<th>V37</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>0.566</td>
<td>0.016</td>
<td>-0.143</td>
<td>0.407</td>
<td>0.452</td>
<td>-1.812</td>
<td>-2.360</td>
<td>-0.436</td>
<td>-0.940</td>
<td>-0.073</td>
<td>…</td>
<td>0.168</td>
<td>0.136</td>
<td>0.109</td>
<td>-0.615</td>
<td>0.327</td>
<td>-4.627</td>
<td>-4.789</td>
<td>-5.101</td>
<td>-2.608</td>
<td>-3.508</td>
</tr>
<tr>
<th>1</th>
<td>0.968</td>
<td>0.437</td>
<td>0.066</td>
<td>0.566</td>
<td>0.194</td>
<td>-1.566</td>
<td>-2.360</td>
<td>0.332</td>
<td>0.188</td>
<td>-0.134</td>
<td>…</td>
<td>0.338</td>
<td>-0.128</td>
<td>0.124</td>
<td>0.032</td>
<td>0.600</td>
<td>-0.843</td>
<td>0.160</td>
<td>0.364</td>
<td>-0.335</td>
<td>-0.730</td>
</tr>
<tr>
<th>2</th>
<td>1.013</td>
<td>0.568</td>
<td>0.235</td>
<td>0.370</td>
<td>0.112</td>
<td>-1.367</td>
<td>-2.360</td>
<td>0.396</td>
<td>0.874</td>
<td>-0.072</td>
<td>…</td>
<td>0.326</td>
<td>-0.009</td>
<td>0.361</td>
<td>0.277</td>
<td>-0.116</td>
<td>-0.843</td>
<td>0.160</td>
<td>0.364</td>
<td>0.765</td>
<td>-0.589</td>
</tr>
<tr>
<th>3</th>
<td>0.733</td>
<td>0.368</td>
<td>0.283</td>
<td>0.165</td>
<td>0.599</td>
<td>-1.200</td>
<td>-2.086</td>
<td>0.403</td>
<td>0.011</td>
<td>-0.014</td>
<td>…</td>
<td>0.277</td>
<td>0.015</td>
<td>0.417</td>
<td>0.279</td>
<td>0.603</td>
<td>-0.843</td>
<td>-0.065</td>
<td>0.364</td>
<td>0.333</td>
<td>-0.112</td>
</tr>
<tr>
<th>4</th>
<td>0.684</td>
<td>0.638</td>
<td>0.260</td>
<td>0.209</td>
<td>0.337</td>
<td>-1.073</td>
<td>-2.086</td>
<td>0.314</td>
<td>-0.251</td>
<td>0.199</td>
<td>…</td>
<td>0.332</td>
<td>0.183</td>
<td>1.078</td>
<td>0.328</td>
<td>0.418</td>
<td>-0.843</td>
<td>-0.215</td>
<td>0.364</td>
<td>-0.280</td>
<td>-0.028</td>
</tr>
</tbody>
</table>
<p>5 rows × 32 columns</p>
</div>
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>V0</th>
<th>V1</th>
<th>V2</th>
<th>V3</th>
<th>V4</th>
<th>V6</th>
<th>V7</th>
<th>V8</th>
<th>V10</th>
<th>V12</th>
<th>…</th>
<th>V27</th>
<th>V29</th>
<th>V30</th>
<th>V31</th>
<th>V32</th>
<th>V33</th>
<th>V34</th>
<th>V35</th>
<th>V36</th>
<th>V37</th>
</tr>
</thead>
<tbody>
<tr>
<th>count</th>
<td>4813.000000</td>
<td>4813.000000</td>
<td>4813.000000</td>
<td>4813.000000</td>
<td>4813.000000</td>
<td>4813.000000</td>
<td>4813.000000</td>
<td>4813.000000</td>
<td>4813.000000</td>
<td>4813.000000</td>
<td>…</td>
<td>4813.000000</td>
<td>4813.000000</td>
<td>4813.000000</td>
<td>4813.000000</td>
<td>4813.000000</td>
<td>4813.000000</td>
<td>4813.000000</td>
<td>4813.000000</td>
<td>4813.000000</td>
<td>4813.000000</td>
</tr>
<tr>
<th>mean</th>
<td>0.694172</td>
<td>0.721357</td>
<td>0.602300</td>
<td>0.603139</td>
<td>0.523743</td>
<td>0.748823</td>
<td>0.745740</td>
<td>0.715607</td>
<td>0.348518</td>
<td>0.578507</td>
<td>…</td>
<td>0.881401</td>
<td>0.388683</td>
<td>0.589459</td>
<td>0.792709</td>
<td>0.628824</td>
<td>0.458493</td>
<td>0.483790</td>
<td>0.762873</td>
<td>0.332385</td>
<td>0.545795</td>
</tr>
<tr>
<th>std</th>
<td>0.144198</td>
<td>0.131443</td>
<td>0.140628</td>
<td>0.152462</td>
<td>0.106430</td>
<td>0.132560</td>
<td>0.132577</td>
<td>0.118105</td>
<td>0.134882</td>
<td>0.105088</td>
<td>…</td>
<td>0.128221</td>
<td>0.133475</td>
<td>0.130786</td>
<td>0.102976</td>
<td>0.155003</td>
<td>0.099095</td>
<td>0.101020</td>
<td>0.102037</td>
<td>0.127456</td>
<td>0.150356</td>
</tr>
<tr>
<th>min</th>
<td>0.000000</td>
<td>0.000000</td>
<td>0.000000</td>
<td>0.000000</td>
<td>0.000000</td>
<td>0.000000</td>
<td>0.000000</td>
<td>0.000000</td>
<td>0.000000</td>
<td>0.000000</td>
<td>…</td>
<td>0.000000</td>
<td>0.000000</td>
<td>0.000000</td>
<td>0.000000</td>
<td>0.000000</td>
<td>0.000000</td>
<td>0.000000</td>
<td>0.000000</td>
<td>0.000000</td>
<td>0.000000</td>
</tr>
<tr>
<th>25%</th>
<td>0.626676</td>
<td>0.679416</td>
<td>0.514414</td>
<td>0.503888</td>
<td>0.478182</td>
<td>0.683324</td>
<td>0.696938</td>
<td>0.664934</td>
<td>0.284327</td>
<td>0.532892</td>
<td>…</td>
<td>0.888575</td>
<td>0.292445</td>
<td>0.550092</td>
<td>0.761816</td>
<td>0.562461</td>
<td>0.409037</td>
<td>0.454490</td>
<td>0.727273</td>
<td>0.270584</td>
<td>0.445647</td>
</tr>
<tr>
<th>50%</th>
<td>0.729488</td>
<td>0.752497</td>
<td>0.617072</td>
<td>0.614270</td>
<td>0.535866</td>
<td>0.774125</td>
<td>0.771974</td>
<td>0.742884</td>
<td>0.366469</td>
<td>0.591635</td>
<td>…</td>
<td>0.916015</td>
<td>0.375734</td>
<td>0.594428</td>
<td>0.815055</td>
<td>0.643056</td>
<td>0.454518</td>
<td>0.499949</td>
<td>0.800020</td>
<td>0.347056</td>
<td>0.539317</td>
</tr>
<tr>
<th>75%</th>
<td>0.790195</td>
<td>0.799553</td>
<td>0.700464</td>
<td>0.710474</td>
<td>0.585036</td>
<td>0.842259</td>
<td>0.836405</td>
<td>0.790835</td>
<td>0.432965</td>
<td>0.641971</td>
<td>…</td>
<td>0.932555</td>
<td>0.471837</td>
<td>0.650798</td>
<td>0.852229</td>
<td>0.719777</td>
<td>0.500000</td>
<td>0.511365</td>
<td>0.800020</td>
<td>0.414861</td>
<td>0.643061</td>
</tr>
<tr>
<th>max</th>
<td>1.000000</td>
<td>1.000000</td>
<td>1.000000</td>
<td>1.000000</td>
<td>1.000000</td>
<td>1.000000</td>
<td>1.000000</td>
<td>1.000000</td>
<td>1.000000</td>
<td>1.000000</td>
<td>…</td>
<td>1.000000</td>
<td>1.000000</td>
<td>1.000000</td>
<td>1.000000</td>
<td>1.000000</td>
<td>1.000000</td>
<td>1.000000</td>
<td>1.000000</td>
<td>1.000000</td>
<td>1.000000</td>
</tr>
</tbody>
</table>
<p>8 rows × 32 columns</p>
</div>
2. Feature Engineering
2.1 Data Preprocessing and Feature Processing
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>V0</th>
<th>V1</th>
<th>V2</th>
<th>V3</th>
<th>V4</th>
<th>V5</th>
<th>V6</th>
<th>V7</th>
<th>V8</th>
<th>V9</th>
<th>…</th>
<th>V29</th>
<th>V30</th>
<th>V31</th>
<th>V32</th>
<th>V33</th>
<th>V34</th>
<th>V35</th>
<th>V36</th>
<th>V37</th>
<th>target</th>
</tr>
</thead>
<tbody>
<tr>
<th>count</th>
<td>2888.000000</td>
<td>2888.000000</td>
<td>2888.000000</td>
<td>2888.000000</td>
<td>2888.000000</td>
<td>2888.000000</td>
<td>2888.000000</td>
<td>2888.000000</td>
<td>2888.000000</td>
<td>2888.000000</td>
<td>…</td>
<td>2888.000000</td>
<td>2888.000000</td>
<td>2888.000000</td>
<td>2888.000000</td>
<td>2888.000000</td>
<td>2888.000000</td>
<td>2888.000000</td>
<td>2888.000000</td>
<td>2888.000000</td>
<td>2888.000000</td>
</tr>
<tr>
<th>mean</th>
<td>0.123048</td>
<td>0.056068</td>
<td>0.289720</td>
<td>-0.067790</td>
<td>0.012921</td>
<td>-0.558565</td>
<td>0.182892</td>
<td>0.116155</td>
<td>0.177856</td>
<td>-0.169452</td>
<td>…</td>
<td>0.097648</td>
<td>0.055477</td>
<td>0.127791</td>
<td>0.020806</td>
<td>0.007801</td>
<td>0.006715</td>
<td>0.197764</td>
<td>0.030658</td>
<td>-0.130330</td>
<td>0.126353</td>
</tr>
<tr>
<th>std</th>
<td>0.928031</td>
<td>0.941515</td>
<td>0.911236</td>
<td>0.970298</td>
<td>0.888377</td>
<td>0.517957</td>
<td>0.918054</td>
<td>0.955116</td>
<td>0.895444</td>
<td>0.953813</td>
<td>…</td>
<td>1.061200</td>
<td>0.901934</td>
<td>0.873028</td>
<td>0.902584</td>
<td>1.006995</td>
<td>1.003291</td>
<td>0.985675</td>
<td>0.970812</td>
<td>1.017196</td>
<td>0.983966</td>
</tr>
<tr>
<th>min</th>
<td>-4.335000</td>
<td>-5.122000</td>
<td>-3.420000</td>
<td>-3.956000</td>
<td>-4.742000</td>
<td>-2.182000</td>
<td>-4.576000</td>
<td>-5.048000</td>
<td>-4.692000</td>
<td>-12.891000</td>
<td>…</td>
<td>-2.912000</td>
<td>-4.507000</td>
<td>-5.859000</td>
<td>-4.053000</td>
<td>-4.627000</td>
<td>-4.789000</td>
<td>-5.695000</td>
<td>-2.608000</td>
<td>-3.630000</td>
<td>-3.044000</td>
</tr>
<tr>
<th>25%</th>
<td>-0.297000</td>
<td>-0.226250</td>
<td>-0.313000</td>
<td>-0.652250</td>
<td>-0.385000</td>
<td>-0.853000</td>
<td>-0.310000</td>
<td>-0.295000</td>
<td>-0.159000</td>
<td>-0.390000</td>
<td>…</td>
<td>-0.664000</td>
<td>-0.283000</td>
<td>-0.170250</td>
<td>-0.407250</td>
<td>-0.499000</td>
<td>-0.290000</td>
<td>-0.202500</td>
<td>-0.413000</td>
<td>-0.798250</td>
<td>-0.350250</td>
</tr>
<tr>
<th>50%</th>
<td>0.359000</td>
<td>0.272500</td>
<td>0.386000</td>
<td>-0.044500</td>
<td>0.110000</td>
<td>-0.466000</td>
<td>0.388000</td>
<td>0.344000</td>
<td>0.362000</td>
<td>0.042000</td>
<td>…</td>
<td>-0.023000</td>
<td>0.053500</td>
<td>0.299500</td>
<td>0.039000</td>
<td>-0.040000</td>
<td>0.160000</td>
<td>0.364000</td>
<td>0.137000</td>
<td>-0.185500</td>
<td>0.313000</td>
</tr>
<tr>
<th>75%</th>
<td>0.726000</td>
<td>0.599000</td>
<td>0.918250</td>
<td>0.624000</td>
<td>0.550250</td>
<td>-0.154000</td>
<td>0.831250</td>
<td>0.782250</td>
<td>0.726000</td>
<td>0.042000</td>
<td>…</td>
<td>0.745250</td>
<td>0.488000</td>
<td>0.635000</td>
<td>0.557000</td>
<td>0.462000</td>
<td>0.273000</td>
<td>0.602000</td>
<td>0.644250</td>
<td>0.495250</td>
<td>0.793250</td>
</tr>
<tr>
<th>max</th>
<td>2.121000</td>
<td>1.918000</td>
<td>2.828000</td>
<td>2.457000</td>
<td>2.689000</td>
<td>0.489000</td>
<td>1.895000</td>
<td>1.918000</td>
<td>2.245000</td>
<td>1.335000</td>
<td>…</td>
<td>4.580000</td>
<td>2.689000</td>
<td>2.013000</td>
<td>2.395000</td>
<td>5.465000</td>
<td>5.110000</td>
<td>2.324000</td>
<td>5.238000</td>
<td>3.000000</td>
<td>2.538000</td>
</tr>
</tbody>
</table>
<p>8 rows × 39 columns</p>
</div>
2.1.1 Outlier Analysis
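Comparing the training-set summaries on either side of this heading, the row count falls from 2888 to 2886 and the minimum of V9 rises from -12.891 to -7.071, which is consistent with dropping a couple of extreme V9 readings. A minimal sketch of such a filter follows. The cutoff of -7.5, the helper name `drop_low_outliers`, and the toy values are illustrative assumptions, not the original notebook's code.

```python
import pandas as pd

def drop_low_outliers(df, column, lower):
    """Keep only rows where df[column] is strictly greater than `lower`."""
    return df[df[column] > lower].reset_index(drop=True)

# Toy frame standing in for the training data; the values are made up.
demo = pd.DataFrame({
    "V9": [0.042, -0.390, -12.891, 1.335, -7.071, -9.200],
    "target": [0.31, -0.35, 0.12, 0.79, -1.20, 0.05],
})

cleaned = drop_low_outliers(demo, "V9", lower=-7.5)  # -7.5 is an assumed cutoff
print(len(demo), "->", len(cleaned))
print("new V9 minimum:", cleaned["V9"].min())
```

If the same rule is applied to the test set, keep in mind that a competition submission usually needs a prediction for every test row, so filtering test data is something to check against the submission format first.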
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>V0</th>
<th>V1</th>
<th>V2</th>
<th>V3</th>
<th>V4</th>
<th>V5</th>
<th>V6</th>
<th>V7</th>
<th>V8</th>
<th>V9</th>
<th>…</th>
<th>V29</th>
<th>V30</th>
<th>V31</th>
<th>V32</th>
<th>V33</th>
<th>V34</th>
<th>V35</th>
<th>V36</th>
<th>V37</th>
<th>target</th>
</tr>
</thead>
<tbody>
<tr>
<th>count</th>
<td>2886.000000</td>
<td>2886.000000</td>
<td>2886.000000</td>
<td>2886.000000</td>
<td>2886.000000</td>
<td>2886.000000</td>
<td>2886.000000</td>
<td>2886.000000</td>
<td>2886.000000</td>
<td>2886.00000</td>
<td>…</td>
<td>2886.000000</td>
<td>2886.000000</td>
<td>2886.000000</td>
<td>2886.000000</td>
<td>2886.000000</td>
<td>2886.000000</td>
<td>2886.000000</td>
<td>2886.000000</td>
<td>2886.000000</td>
<td>2886.000000</td>
</tr>
<tr>
<th>mean</th>
<td>0.123725</td>
<td>0.056856</td>
<td>0.290340</td>
<td>-0.068364</td>
<td>0.012254</td>
<td>-0.558971</td>
<td>0.183273</td>
<td>0.116274</td>
<td>0.178138</td>
<td>-0.16213</td>
<td>…</td>
<td>0.097019</td>
<td>0.058619</td>
<td>0.127617</td>
<td>0.023626</td>
<td>0.008271</td>
<td>0.006959</td>
<td>0.198513</td>
<td>0.030099</td>
<td>-0.131957</td>
<td>0.127451</td>
</tr>
<tr>
<th>std</th>
<td>0.927984</td>
<td>0.941269</td>
<td>0.911231</td>
<td>0.970357</td>
<td>0.888037</td>
<td>0.517871</td>
<td>0.918211</td>
<td>0.955418</td>
<td>0.895552</td>
<td>0.91089</td>
<td>…</td>
<td>1.060824</td>
<td>0.894311</td>
<td>0.873300</td>
<td>0.896509</td>
<td>1.007175</td>
<td>1.003411</td>
<td>0.985058</td>
<td>0.970258</td>
<td>1.015666</td>
<td>0.983144</td>
</tr>
<tr>
<th>min</th>
<td>-4.335000</td>
<td>-5.122000</td>
<td>-3.420000</td>
<td>-3.956000</td>
<td>-4.742000</td>
<td>-2.182000</td>
<td>-4.576000</td>
<td>-5.048000</td>
<td>-4.692000</td>
<td>-7.07100</td>
<td>…</td>
<td>-2.912000</td>
<td>-4.507000</td>
<td>-5.859000</td>
<td>-4.053000</td>
<td>-4.627000</td>
<td>-4.789000</td>
<td>-5.695000</td>
<td>-2.608000</td>
<td>-3.630000</td>
<td>-3.044000</td>
</tr>
<tr>
<th>25%</th>
<td>-0.292000</td>
<td>-0.224250</td>
<td>-0.310000</td>
<td>-0.652750</td>
<td>-0.385000</td>
<td>-0.853000</td>
<td>-0.310000</td>
<td>-0.295000</td>
<td>-0.158750</td>
<td>-0.39000</td>
<td>…</td>
<td>-0.664000</td>
<td>-0.282000</td>
<td>-0.170750</td>
<td>-0.405000</td>
<td>-0.499000</td>
<td>-0.290000</td>
<td>-0.199750</td>
<td>-0.412750</td>
<td>-0.798750</td>
<td>-0.347500</td>
</tr>
<tr>
<th>50%</th>
<td>0.359500</td>
<td>0.273000</td>
<td>0.386000</td>
<td>-0.045000</td>
<td>0.109500</td>
<td>-0.466000</td>
<td>0.388500</td>
<td>0.345000</td>
<td>0.362000</td>
<td>0.04200</td>
<td>…</td>
<td>-0.023000</td>
<td>0.054500</td>
<td>0.299500</td>
<td>0.040000</td>
<td>-0.040000</td>
<td>0.160000</td>
<td>0.364000</td>
<td>0.137000</td>
<td>-0.186000</td>
<td>0.314000</td>
</tr>
<tr>
<th>75%</th>
<td>0.726000</td>
<td>0.599000</td>
<td>0.918750</td>
<td>0.623500</td>
<td>0.550000</td>
<td>-0.154000</td>
<td>0.831750</td>
<td>0.782750</td>
<td>0.726000</td>
<td>0.04200</td>
<td>…</td>
<td>0.745000</td>
<td>0.488000</td>
<td>0.635000</td>
<td>0.557000</td>
<td>0.462000</td>
<td>0.273000</td>
<td>0.602000</td>
<td>0.643750</td>
<td>0.493000</td>
<td>0.793750</td>
</tr>
<tr>
<th>max</th>
<td>2.121000</td>
<td>1.918000</td>
<td>2.828000</td>
<td>2.457000</td>
<td>2.689000</td>
<td>0.489000</td>
<td>1.895000</td>
<td>1.918000</td>
<td>2.245000</td>
<td>1.33500</td>
<td>…</td>
<td>4.580000</td>
<td>2.689000</td>
<td>2.013000</td>
<td>2.395000</td>
<td>5.465000</td>
<td>5.110000</td>
<td>2.324000</td>
<td>5.238000</td>
<td>3.000000</td>
<td>2.538000</td>
</tr>
</tbody>
</table>
<p>8 rows × 39 columns</p>
</div>
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>V0</th>
<th>V1</th>
<th>V2</th>
<th>V3</th>
<th>V4</th>
<th>V5</th>
<th>V6</th>
<th>V7</th>
<th>V8</th>
<th>V9</th>
<th>…</th>
<th>V28</th>
<th>V29</th>
<th>V30</th>
<th>V31</th>
<th>V32</th>
<th>V33</th>
<th>V34</th>
<th>V35</th>
<th>V36</th>
<th>V37</th>
</tr>
</thead>
<tbody>
<tr>
<th>count</th>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>…</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
</tr>
<tr>
<th>mean</th>
<td>-0.184404</td>
<td>-0.083912</td>
<td>-0.434762</td>
<td>0.101671</td>
<td>-0.019172</td>
<td>0.838049</td>
<td>-0.274092</td>
<td>-0.173971</td>
<td>-0.266709</td>
<td>0.255114</td>
<td>…</td>
<td>-0.206871</td>
<td>-0.146463</td>
<td>-0.083215</td>
<td>-0.191729</td>
<td>-0.030782</td>
<td>-0.011433</td>
<td>-0.009985</td>
<td>-0.296895</td>
<td>-0.046270</td>
<td>0.195735</td>
</tr>
<tr>
<th>std</th>
<td>1.073333</td>
<td>1.076670</td>
<td>0.969541</td>
<td>1.034925</td>
<td>1.147286</td>
<td>0.963043</td>
<td>1.054119</td>
<td>1.040101</td>
<td>1.085916</td>
<td>1.014394</td>
<td>…</td>
<td>1.064140</td>
<td>0.880593</td>
<td>1.126414</td>
<td>1.138454</td>
<td>1.130228</td>
<td>0.989732</td>
<td>0.995213</td>
<td>0.946896</td>
<td>1.040854</td>
<td>0.940599</td>
</tr>
<tr>
<th>min</th>
<td>-4.814000</td>
<td>-5.488000</td>
<td>-4.283000</td>
<td>-3.276000</td>
<td>-4.921000</td>
<td>-1.168000</td>
<td>-5.649000</td>
<td>-5.625000</td>
<td>-6.059000</td>
<td>-6.784000</td>
<td>…</td>
<td>-2.435000</td>
<td>-2.413000</td>
<td>-4.507000</td>
<td>-7.698000</td>
<td>-4.057000</td>
<td>-4.627000</td>
<td>-4.789000</td>
<td>-7.477000</td>
<td>-2.608000</td>
<td>-3.346000</td>
</tr>
<tr>
<th>25%</th>
<td>-0.664000</td>
<td>-0.451000</td>
<td>-0.978000</td>
<td>-0.644000</td>
<td>-0.497000</td>
<td>0.122000</td>
<td>-0.732000</td>
<td>-0.509000</td>
<td>-0.775000</td>
<td>-0.390000</td>
<td>…</td>
<td>-0.453000</td>
<td>-0.818000</td>
<td>-0.339000</td>
<td>-0.476000</td>
<td>-0.472000</td>
<td>-0.460000</td>
<td>-0.290000</td>
<td>-0.349000</td>
<td>-0.593000</td>
<td>-0.432000</td>
</tr>
<tr>
<th>50%</th>
<td>0.065000</td>
<td>0.195000</td>
<td>-0.267000</td>
<td>0.220000</td>
<td>0.118000</td>
<td>0.437000</td>
<td>-0.082000</td>
<td>0.018000</td>
<td>-0.004000</td>
<td>0.401000</td>
<td>…</td>
<td>-0.445000</td>
<td>-0.199000</td>
<td>0.010000</td>
<td>0.100000</td>
<td>0.155000</td>
<td>-0.040000</td>
<td>0.160000</td>
<td>-0.270000</td>
<td>0.083000</td>
<td>0.152000</td>
</tr>
<tr>
<th>75%</th>
<td>0.549000</td>
<td>0.589000</td>
<td>0.278000</td>
<td>0.793000</td>
<td>0.610000</td>
<td>1.928000</td>
<td>0.457000</td>
<td>0.515000</td>
<td>0.482000</td>
<td>0.904000</td>
<td>…</td>
<td>-0.434000</td>
<td>0.468000</td>
<td>0.447000</td>
<td>0.471000</td>
<td>0.627000</td>
<td>0.419000</td>
<td>0.273000</td>
<td>0.364000</td>
<td>0.651000</td>
<td>0.797000</td>
</tr>
<tr>
<th>max</th>
<td>2.100000</td>
<td>2.120000</td>
<td>1.946000</td>
<td>2.603000</td>
<td>4.475000</td>
<td>3.176000</td>
<td>1.528000</td>
<td>1.394000</td>
<td>2.408000</td>
<td>1.766000</td>
<td>…</td>
<td>4.656000</td>
<td>3.022000</td>
<td>3.139000</td>
<td>1.428000</td>
<td>2.299000</td>
<td>5.465000</td>
<td>5.110000</td>
<td>1.671000</td>
<td>2.861000</td>
<td>3.021000</td>
</tr>
</tbody>
</table>
<p>8 rows × 38 columns</p>
</div>
2.1.2 Normalization
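In the test-set summary for this step, several columns have a minimum below 0 and a maximum above 1 (V5 reaches about 2.006). That is what one would expect when the min-max parameters are learned from the training data only and then reused on the test data. The sketch below illustrates the idea with plain NumPy. The helper names and toy arrays are assumptions for illustration. scikit-learn's `MinMaxScaler` gives the same result when it is fitted on the training set and then used to transform the test set.

```python
import numpy as np

def fit_min_max(train):
    """Return the per-column minimum and range learned from the training array."""
    col_min = train.min(axis=0)
    col_range = train.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # avoid division by zero for constant columns
    return col_min, col_range

def apply_min_max(data, col_min, col_range):
    """Scale data with statistics that were learned on another array."""
    return (data - col_min) / col_range

train = np.array([[-4.0, 0.5], [0.0, 1.5], [2.0, 2.5]])
test = np.array([[-5.0, 1.0], [3.0, 2.0]])

col_min, col_range = fit_min_max(train)
train_scaled = apply_min_max(train, col_min, col_range)
test_scaled = apply_min_max(test, col_min, col_range)

print(train_scaled.min(axis=0), train_scaled.max(axis=0))
print(test_scaled)  # the first column falls outside [0, 1]
```

Reusing the training statistics keeps both sets on the same scale, at the cost of test values that can land outside the unit interval.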
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>V0</th>
<th>V1</th>
<th>V2</th>
<th>V3</th>
<th>V4</th>
<th>V5</th>
<th>V6</th>
<th>V7</th>
<th>V8</th>
<th>V9</th>
<th>…</th>
<th>V28</th>
<th>V29</th>
<th>V30</th>
<th>V31</th>
<th>V32</th>
<th>V33</th>
<th>V34</th>
<th>V35</th>
<th>V36</th>
<th>V37</th>
</tr>
</thead>
<tbody>
<tr>
<th>count</th>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>…</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
<td>1925.000000</td>
</tr>
<tr>
<th>mean</th>
<td>0.642905</td>
<td>0.715637</td>
<td>0.477791</td>
<td>0.632726</td>
<td>0.635558</td>
<td>1.130681</td>
<td>0.664798</td>
<td>0.699688</td>
<td>0.637926</td>
<td>0.871534</td>
<td>…</td>
<td>0.313556</td>
<td>0.369132</td>
<td>0.614756</td>
<td>0.719928</td>
<td>0.623793</td>
<td>0.457349</td>
<td>0.482778</td>
<td>0.673164</td>
<td>0.326501</td>
<td>0.577034</td>
</tr>
<tr>
<th>std</th>
<td>0.166253</td>
<td>0.152936</td>
<td>0.155176</td>
<td>0.161379</td>
<td>0.154392</td>
<td>0.360555</td>
<td>0.162899</td>
<td>0.149311</td>
<td>0.156540</td>
<td>0.120675</td>
<td>…</td>
<td>0.149752</td>
<td>0.117538</td>
<td>0.156533</td>
<td>0.144621</td>
<td>0.175284</td>
<td>0.098071</td>
<td>0.100537</td>
<td>0.118082</td>
<td>0.132661</td>
<td>0.141870</td>
</tr>
<tr>
<th>min</th>
<td>-0.074195</td>
<td>-0.051989</td>
<td>-0.138124</td>
<td>0.106035</td>
<td>-0.024088</td>
<td>0.379633</td>
<td>-0.165817</td>
<td>-0.082831</td>
<td>-0.197059</td>
<td>0.034142</td>
<td>…</td>
<td>0.000000</td>
<td>0.066604</td>
<td>0.000000</td>
<td>-0.233613</td>
<td>-0.000620</td>
<td>0.000000</td>
<td>0.000000</td>
<td>-0.222222</td>
<td>0.000000</td>
<td>0.042836</td>
</tr>
<tr>
<th>25%</th>
<td>0.568618</td>
<td>0.663494</td>
<td>0.390845</td>
<td>0.516451</td>
<td>0.571256</td>
<td>0.862598</td>
<td>0.594035</td>
<td>0.651593</td>
<td>0.564653</td>
<td>0.794789</td>
<td>…</td>
<td>0.278919</td>
<td>0.279498</td>
<td>0.579211</td>
<td>0.683816</td>
<td>0.555366</td>
<td>0.412901</td>
<td>0.454490</td>
<td>0.666667</td>
<td>0.256819</td>
<td>0.482353</td>
</tr>
<tr>
<th>50%</th>
<td>0.681537</td>
<td>0.755256</td>
<td>0.504641</td>
<td>0.651177</td>
<td>0.654017</td>
<td>0.980532</td>
<td>0.694483</td>
<td>0.727247</td>
<td>0.675796</td>
<td>0.888889</td>
<td>…</td>
<td>0.280045</td>
<td>0.362120</td>
<td>0.627710</td>
<td>0.756987</td>
<td>0.652605</td>
<td>0.454518</td>
<td>0.499949</td>
<td>0.676518</td>
<td>0.342977</td>
<td>0.570437</td>
</tr>
<tr>
<th>75%</th>
<td>0.756506</td>
<td>0.811222</td>
<td>0.591869</td>
<td>0.740527</td>
<td>0.720226</td>
<td>1.538750</td>
<td>0.777778</td>
<td>0.798593</td>
<td>0.745856</td>
<td>0.948727</td>
<td>…</td>
<td>0.281593</td>
<td>0.451148</td>
<td>0.688438</td>
<td>0.804116</td>
<td>0.725806</td>
<td>0.500000</td>
<td>0.511365</td>
<td>0.755580</td>
<td>0.415371</td>
<td>0.667722</td>
</tr>
<tr>
<th>max</th>
<td>0.996747</td>
<td>1.028693</td>
<td>0.858835</td>
<td>1.022766</td>
<td>1.240345</td>
<td>2.005990</td>
<td>0.943285</td>
<td>0.924777</td>
<td>1.023497</td>
<td>1.051273</td>
<td>…</td>
<td>0.997889</td>
<td>0.792045</td>
<td>1.062535</td>
<td>0.925686</td>
<td>0.985112</td>
<td>1.000000</td>
<td>1.000000</td>
<td>0.918568</td>
<td>0.697043</td>
<td>1.003167</td>
</tr>
</tbody>
</table>
<p>8 rows × 38 columns</p>
</div>
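Inspect the data distributions of features V5, V17, V28, V22, V11, and V9.
For these features the training-set and test-set distributions differ, which would hurt the model's ability to generalize, so they are dropped.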
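The check described above is made by looking at the distributions. As a numeric complement, a two-sample Kolmogorov-Smirnov test can flag features whose training and test samples differ. This is a swapped-in technique shown for illustration, not necessarily what the original notebook did. The helper name `shifted_features`, the cutoff of 0.2 on the KS statistic, and the synthetic columns are all assumptions.

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp

def shifted_features(train_df, test_df, columns, max_statistic=0.2):
    """Return the columns whose train/test KS statistic exceeds max_statistic."""
    flagged = []
    for col in columns:
        result = ks_2samp(train_df[col], test_df[col])
        if result.statistic > max_statistic:
            flagged.append(col)
    return flagged

rng = np.random.default_rng(0)
train_df = pd.DataFrame({
    "stable": rng.normal(0.0, 1.0, 2000),
    "drifting": rng.normal(-0.5, 0.5, 2000),
})
test_df = pd.DataFrame({
    "stable": rng.normal(0.0, 1.0, 1500),
    "drifting": rng.normal(0.8, 1.0, 1500),  # different mean and spread
})

to_drop = shifted_features(train_df, test_df, ["stable", "drifting"])
print("features to drop:", to_drop)

# Drop the flagged columns from both sets so they stay aligned.
train_kept = train_df.drop(columns=to_drop)
test_kept = test_df.drop(columns=to_drop)
```

A statistic-based rule is easier to automate than eyeballing plots, but the cutoff is a judgment call and should be sanity-checked against the plots.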
2.1.3 Feature Correlation
2.2 Dimensionality Reduction
2.2.1 Initial Correlation-Based Screening
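One common way to do a first screening is to drop features whose absolute Pearson correlation with the target falls below a cutoff. The sketch below assumes a cutoff of 0.1 and uses a hypothetical helper `weakly_correlated` on synthetic data. Neither the cutoff nor the helper comes from the original notebook.

```python
import numpy as np
import pandas as pd

def weakly_correlated(df, target="target", threshold=0.1):
    """List feature columns whose |Pearson r| with the target is below threshold."""
    corr_with_target = df.corr()[target].drop(target).abs()
    return corr_with_target[corr_with_target < threshold].index.tolist()

rng = np.random.default_rng(1)
n = 5000
signal = rng.normal(size=n)
demo = pd.DataFrame({
    "useful": signal + 0.3 * rng.normal(size=n),  # strongly tied to the target
    "noise": rng.normal(size=n),                  # unrelated to the target
    "target": signal,
})

drop_cols = weakly_correlated(demo, threshold=0.1)
reduced = demo.drop(columns=drop_cols)
print("dropped:", drop_cols)
print("kept:", list(reduced.columns))
```

Pearson correlation only measures linear association, so a feature with a strong nonlinear effect can be screened out by mistake. Treat this as a coarse first pass.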
2.2.2 Multicollinearity Analysis
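Multicollinearity is often quantified with the variance inflation factor, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing feature j on all the other features. With an intercept in each regression, the VIFs equal the diagonal of the inverse of the feature correlation matrix, which the NumPy sketch below uses. The helper name and synthetic columns are assumptions, and the frequently quoted "VIF above 10" warning level is a rule of thumb rather than something stated in this article.

```python
import numpy as np

def vif_scores(X):
    """Variance inflation factor per column: diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

rng = np.random.default_rng(2)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.1 * rng.normal(size=500)  # nearly a copy of `a`
X = np.column_stack([a, b, c])

vif = vif_scores(X)
for name, score in zip(["a", "b", "c"], vif):
    print(f"{name}: VIF = {score:.1f}")
```

Here `a` and `c` inflate each other while `b` stays close to 1. With strongly collinear features like these, PCA (next subsection) is one way to obtain uncorrelated inputs.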
2.2.3 Dimensionality Reduction with PCA
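The summary tables in this subsection contain 16 and 21 principal components respectively, each with a mean of roughly zero. A hedged sketch of how such outputs can be produced with scikit-learn: passing a float between 0 and 1 as `n_components` keeps just enough components to explain that fraction of the variance. The fractions 0.90 and 0.95 and the synthetic data below are assumptions. The fractions actually used are not stated here.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
# 300 samples, 10 features driven by 3 latent factors plus a little noise.
latent = rng.normal(size=(300, 3))
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(300, 10))

results = {}
for keep in (0.90, 0.95):  # assumed variance fractions
    pca = PCA(n_components=keep)
    X_reduced = pca.fit_transform(X)
    explained = pca.explained_variance_ratio_.sum()
    results[keep] = (pca.n_components_, explained, X_reduced)
    print(f"keep={keep}: {pca.n_components_} components, explained={explained:.3f}")
```

As with normalization, the PCA object should be fitted on the training features and then applied to the test features with `transform`, so both sets are projected onto the same axes.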
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>0</th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
<th>5</th>
<th>6</th>
<th>7</th>
<th>8</th>
<th>9</th>
<th>10</th>
<th>11</th>
<th>12</th>
<th>13</th>
<th>14</th>
<th>15</th>
<th>target</th>
</tr>
</thead>
<tbody>
<tr>
<th>count</th>
<td>2.886000e+03</td>
<td>2886.000000</td>
<td>2.886000e+03</td>
<td>2.886000e+03</td>
<td>2.886000e+03</td>
<td>2.886000e+03</td>
<td>2.886000e+03</td>
<td>2.886000e+03</td>
<td>2.886000e+03</td>
<td>2.886000e+03</td>
<td>2.886000e+03</td>
<td>2.886000e+03</td>
<td>2886.000000</td>
<td>2.886000e+03</td>
<td>2.886000e+03</td>
<td>2.886000e+03</td>
<td>2884.000000</td>
</tr>
<tr>
<th>mean</th>
<td>2.954440e-17</td>
<td>0.000000</td>
<td>3.200643e-17</td>
<td>4.924066e-18</td>
<td>7.139896e-17</td>
<td>-2.585135e-17</td>
<td>7.878506e-17</td>
<td>-5.170269e-17</td>
<td>-9.848132e-17</td>
<td>1.218706e-16</td>
<td>-7.016794e-17</td>
<td>1.181776e-16</td>
<td>0.000000</td>
<td>-3.446846e-17</td>
<td>-3.446846e-17</td>
<td>8.863319e-17</td>
<td>0.127274</td>
</tr>
<tr>
<th>std</th>
<td>3.998976e-01</td>
<td>0.350024</td>
<td>2.938631e-01</td>
<td>2.728023e-01</td>
<td>2.077128e-01</td>
<td>1.951842e-01</td>
<td>1.877104e-01</td>
<td>1.607670e-01</td>
<td>1.512707e-01</td>
<td>1.443772e-01</td>
<td>1.368790e-01</td>
<td>1.286192e-01</td>
<td>0.119330</td>
<td>1.149758e-01</td>
<td>1.133507e-01</td>
<td>1.019259e-01</td>
<td>0.983462</td>
</tr>
<tr>
<th>min</th>
<td>-1.071795e+00</td>
<td>-0.942948</td>
<td>-9.948314e-01</td>
<td>-7.103087e-01</td>
<td>-7.703987e-01</td>
<td>-5.340294e-01</td>
<td>-5.993766e-01</td>
<td>-5.870755e-01</td>
<td>-6.282818e-01</td>
<td>-4.902583e-01</td>
<td>-6.341045e-01</td>
<td>-5.906753e-01</td>
<td>-0.417515</td>
<td>-4.310613e-01</td>
<td>-4.170535e-01</td>
<td>-3.601627e-01</td>
<td>-3.044000</td>
</tr>
<tr>
<th>25%</th>
<td>-2.804085e-01</td>
<td>-0.261373</td>
<td>-2.090797e-01</td>
<td>-1.945196e-01</td>
<td>-1.315620e-01</td>
<td>-1.264097e-01</td>
<td>-1.236360e-01</td>
<td>-1.016452e-01</td>
<td>-9.662098e-02</td>
<td>-9.297088e-02</td>
<td>-8.202809e-02</td>
<td>-7.721868e-02</td>
<td>-0.071400</td>
<td>-7.474073e-02</td>
<td>-7.709743e-02</td>
<td>-6.603914e-02</td>
<td>-0.348500</td>
</tr>
<tr>
<th>50%</th>
<td>-1.417104e-02</td>
<td>-0.012772</td>
<td>2.112166e-02</td>
<td>-2.337401e-02</td>
<td>-5.122797e-03</td>
<td>-1.355336e-02</td>
<td>-1.747870e-04</td>
<td>-4.656359e-03</td>
<td>2.572054e-03</td>
<td>-1.479172e-03</td>
<td>7.286444e-03</td>
<td>-5.745946e-03</td>
<td>-0.004141</td>
<td>1.054915e-03</td>
<td>-1.758387e-03</td>
<td>-7.533392e-04</td>
<td>0.313000</td>
</tr>
<tr>
<th>75%</th>
<td>2.287306e-01</td>
<td>0.231772</td>
<td>2.069571e-01</td>
<td>1.657590e-01</td>
<td>1.281660e-01</td>
<td>9.993122e-02</td>
<td>1.272081e-01</td>
<td>9.657222e-02</td>
<td>1.002626e-01</td>
<td>9.059634e-02</td>
<td>8.833765e-02</td>
<td>7.148033e-02</td>
<td>0.067862</td>
<td>7.574868e-02</td>
<td>7.116829e-02</td>
<td>6.357449e-02</td>
<td>0.794250</td>
</tr>
<tr>
<th>max</th>
<td>1.597730e+00</td>
<td>1.382802</td>
<td>1.010250e+00</td>
<td>1.448007e+00</td>
<td>1.034061e+00</td>
<td>1.358962e+00</td>
<td>6.191589e-01</td>
<td>7.370089e-01</td>
<td>6.449125e-01</td>
<td>5.839586e-01</td>
<td>6.405187e-01</td>
<td>6.780732e-01</td>
<td>0.515612</td>
<td>4.978126e-01</td>
<td>4.673189e-01</td>
<td>4.570870e-01</td>
<td>2.538000</td>
</tr>
</tbody>
</table>
</div>
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>V0</th>
<th>V1</th>
<th>V2</th>
<th>V3</th>
<th>V4</th>
<th>V5</th>
<th>V6</th>
<th>V7</th>
<th>V8</th>
<th>V9</th>
<th>…</th>
<th>V29</th>
<th>V30</th>
<th>V31</th>
<th>V32</th>
<th>V33</th>
<th>V34</th>
<th>V35</th>
<th>V36</th>
<th>V37</th>
<th>target</th>
</tr>
</thead>
<tbody>
<tr>
<th>count</th>
<td>2886.000000</td>
<td>2886.000000</td>
<td>2886.000000</td>
<td>2886.000000</td>
<td>2886.000000</td>
<td>2886.000000</td>
<td>2886.000000</td>
<td>2886.000000</td>
<td>2886.000000</td>
<td>2886.000000</td>
<td>…</td>
<td>2886.000000</td>
<td>2886.000000</td>
<td>2886.000000</td>
<td>2886.000000</td>
<td>2886.000000</td>
<td>2886.000000</td>
<td>2886.000000</td>
<td>2886.000000</td>
<td>2886.000000</td>
<td>2884.000000</td>
</tr>
<tr>
<th>mean</th>
<td>0.690633</td>
<td>0.735633</td>
<td>0.593844</td>
<td>0.606212</td>
<td>0.639787</td>
<td>0.607649</td>
<td>0.735477</td>
<td>0.741354</td>
<td>0.702053</td>
<td>0.821897</td>
<td>…</td>
<td>0.401631</td>
<td>0.634466</td>
<td>0.760495</td>
<td>0.632231</td>
<td>0.459302</td>
<td>0.484489</td>
<td>0.734944</td>
<td>0.336235</td>
<td>0.527608</td>
<td>0.127274</td>
</tr>
<tr>
<th>std</th>
<td>0.143740</td>
<td>0.133703</td>
<td>0.145844</td>
<td>0.151311</td>
<td>0.119504</td>
<td>0.193887</td>
<td>0.141896</td>
<td>0.137154</td>
<td>0.129098</td>
<td>0.108362</td>
<td>…</td>
<td>0.141594</td>
<td>0.124279</td>
<td>0.110938</td>
<td>0.139037</td>
<td>0.099799</td>
<td>0.101365</td>
<td>0.122840</td>
<td>0.123663</td>
<td>0.153192</td>
<td>0.983462</td>
</tr>
<tr>
<th>min</th>
<td>0.000000</td>
<td>0.000000</td>
<td>0.000000</td>
<td>0.000000</td>
<td>0.000000</td>
<td>0.000000</td>
<td>0.000000</td>
<td>0.000000</td>
<td>0.000000</td>
<td>0.000000</td>
<td>…</td>
<td>0.000000</td>
<td>0.000000</td>
<td>0.000000</td>
<td>0.000000</td>
<td>0.000000</td>
<td>0.000000</td>
<td>0.000000</td>
<td>0.000000</td>
<td>0.000000</td>
<td>-3.044000</td>
</tr>
<tr>
<th>25%</th>
<td>0.626239</td>
<td>0.695703</td>
<td>0.497759</td>
<td>0.515087</td>
<td>0.586328</td>
<td>0.497566</td>
<td>0.659249</td>
<td>0.682314</td>
<td>0.653489</td>
<td>0.794789</td>
<td>…</td>
<td>0.300053</td>
<td>0.587132</td>
<td>0.722593</td>
<td>0.565757</td>
<td>0.409037</td>
<td>0.454490</td>
<td>0.685279</td>
<td>0.279792</td>
<td>0.427036</td>
<td>-0.348500</td>
</tr>
<tr>
<th>50%</th>
<td>0.727153</td>
<td>0.766335</td>
<td>0.609155</td>
<td>0.609855</td>
<td>0.652873</td>
<td>0.642456</td>
<td>0.767192</td>
<td>0.774189</td>
<td>0.728557</td>
<td>0.846181</td>
<td>…</td>
<td>0.385611</td>
<td>0.633894</td>
<td>0.782330</td>
<td>0.634770</td>
<td>0.454518</td>
<td>0.499949</td>
<td>0.755580</td>
<td>0.349860</td>
<td>0.519457</td>
<td>0.313000</td>
</tr>
<tr>
<th>75%</th>
<td>0.783922</td>
<td>0.812642</td>
<td>0.694422</td>
<td>0.714096</td>
<td>0.712152</td>
<td>0.759266</td>
<td>0.835690</td>
<td>0.837030</td>
<td>0.781029</td>
<td>0.846181</td>
<td>…</td>
<td>0.488121</td>
<td>0.694136</td>
<td>0.824949</td>
<td>0.714950</td>
<td>0.504261</td>
<td>0.511365</td>
<td>0.785260</td>
<td>0.414447</td>
<td>0.621870</td>
<td>0.794250</td>
</tr>
<tr>
<th>max</th>
<td>1.000000</td>
<td>1.000000</td>
<td>1.000000</td>
<td>1.000000</td>
<td>1.000000</td>
<td>1.000000</td>
<td>1.000000</td>
<td>1.000000</td>
<td>1.000000</td>
<td>1.000000</td>
<td>…</td>
<td>1.000000</td>
<td>1.000000</td>
<td>1.000000</td>
<td>1.000000</td>
<td>1.000000</td>
<td>1.000000</td>
<td>1.000000</td>
<td>1.000000</td>
<td>1.000000</td>
<td>2.538000</td>
</tr>
</tbody>
</table>
<p>8 rows × 39 columns</p>
</div>
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>0</th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
<th>5</th>
<th>6</th>
<th>7</th>
<th>8</th>
<th>9</th>
<th>…</th>
<th>12</th>
<th>13</th>
<th>14</th>
<th>15</th>
<th>16</th>
<th>17</th>
<th>18</th>
<th>19</th>
<th>20</th>
<th>target</th>
</tr>
</thead>
<tbody>
<tr>
<th>count</th>
<td>2.886000e+03</td>
<td>2886.000000</td>
<td>2.886000e+03</td>
<td>2.886000e+03</td>
<td>2.886000e+03</td>
<td>2.886000e+03</td>
<td>2.886000e+03</td>
<td>2.886000e+03</td>
<td>2.886000e+03</td>
<td>2.886000e+03</td>
<td>…</td>
<td>2886.000000</td>
<td>2.886000e+03</td>
<td>2.886000e+03</td>
<td>2.886000e+03</td>
<td>2.886000e+03</td>
<td>2.886000e+03</td>
<td>2.886000e+03</td>
<td>2.886000e+03</td>
<td>2.886000e+03</td>
<td>2884.000000</td>
</tr>
<tr>
<th>mean</th>
<td>2.954440e-17</td>
<td>0.000000</td>
<td>3.200643e-17</td>
<td>4.924066e-18</td>
<td>7.139896e-17</td>
<td>-2.585135e-17</td>
<td>7.878506e-17</td>
<td>-5.170269e-17</td>
<td>-9.848132e-17</td>
<td>1.218706e-16</td>
<td>…</td>
<td>0.000000</td>
<td>-3.446846e-17</td>
<td>-3.446846e-17</td>
<td>8.863319e-17</td>
<td>4.493210e-17</td>
<td>1.107915e-17</td>
<td>-1.908076e-17</td>
<td>7.293773e-17</td>
<td>-1.224861e-16</td>
<td>0.127274</td>
</tr>
<tr>
<th>std</th>
<td>3.998976e-01</td>
<td>0.350024</td>
<td>2.938631e-01</td>
<td>2.728023e-01</td>
<td>2.077128e-01</td>
<td>1.951842e-01</td>
<td>1.877104e-01</td>
<td>1.607670e-01</td>
<td>1.512707e-01</td>
<td>1.443772e-01</td>
<td>…</td>
<td>0.119330</td>
<td>1.149758e-01</td>
<td>1.133507e-01</td>
<td>1.019259e-01</td>
<td>9.617307e-02</td>
<td>9.205940e-02</td>
<td>8.423171e-02</td>
<td>8.295263e-02</td>
<td>7.696785e-02</td>
<td>0.983462</td>
</tr>
<tr>
<th>min</th>
<td>-1.071795e+00</td>
<td>-0.942948</td>
<td>-9.948314e-01</td>
<td>-7.103087e-01</td>
<td>-7.703987e-01</td>
<td>-5.340294e-01</td>
<td>-5.993766e-01</td>
<td>-5.870755e-01</td>
<td>-6.282818e-01</td>
<td>-4.902583e-01</td>
<td>…</td>
<td>-0.417515</td>
<td>-4.310613e-01</td>
<td>-4.170535e-01</td>
<td>-3.601627e-01</td>
<td>-3.432530e-01</td>
<td>-3.530609e-01</td>
<td>-3.908328e-01</td>
<td>-3.089560e-01</td>
<td>-2.867812e-01</td>
<td>-3.044000</td>
</tr>
<tr>
<th>25%</th>
<td>-2.804085e-01</td>
<td>-0.261373</td>
<td>-2.090797e-01</td>
<td>-1.945196e-01</td>
<td>-1.315620e-01</td>
<td>-1.264097e-01</td>
<td>-1.236360e-01</td>
<td>-1.016452e-01</td>
<td>-9.662098e-02</td>
<td>-9.297088e-02</td>
<td>…</td>
<td>-0.071400</td>
<td>-7.474073e-02</td>
<td>-7.709743e-02</td>
<td>-6.603914e-02</td>
<td>-6.064846e-02</td>
<td>-6.247177e-02</td>
<td>-5.357475e-02</td>
<td>-5.279870e-02</td>
<td>-4.930849e-02</td>
<td>-0.348500</td>
</tr>
<tr>
<th>50%</th>
<td>-1.417104e-02</td>
<td>-0.012772</td>
<td>2.112166e-02</td>
<td>-2.337401e-02</td>
<td>-5.122797e-03</td>
<td>-1.355336e-02</td>
<td>-1.747870e-04</td>
<td>-4.656359e-03</td>
<td>2.572054e-03</td>
<td>-1.479172e-03</td>
<td>…</td>
<td>-0.004141</td>
<td>1.054915e-03</td>
<td>-1.758387e-03</td>
<td>-7.533392e-04</td>
<td>-4.559279e-03</td>
<td>-2.317781e-03</td>
<td>-3.034317e-04</td>
<td>3.391130e-03</td>
<td>-1.703944e-03</td>
<td>0.313000</td>
</tr>
<tr>
<th>75%</th>
<td>2.287306e-01</td>
<td>0.231772</td>
<td>2.069571e-01</td>
<td>1.657590e-01</td>
<td>1.281660e-01</td>
<td>9.993122e-02</td>
<td>1.272081e-01</td>
<td>9.657222e-02</td>
<td>1.002626e-01</td>
<td>9.059634e-02</td>
<td>…</td>
<td>0.067862</td>
<td>7.574868e-02</td>
<td>7.116829e-02</td>
<td>6.357449e-02</td>
<td>5.732624e-02</td>
<td>6.139602e-02</td>
<td>5.068802e-02</td>
<td>5.084688e-02</td>
<td>4.693391e-02</td>
<td>0.794250</td>
</tr>
<tr>
<th>max</th>
<td>1.597730e+00</td>
<td>1.382802</td>
<td>1.010250e+00</td>
<td>1.448007e+00</td>
<td>1.034061e+00</td>
<td>1.358962e+00</td>
<td>6.191589e-01</td>
<td>7.370089e-01</td>
<td>6.449125e-01</td>
<td>5.839586e-01</td>
<td>…</td>
<td>0.515612</td>
<td>4.978126e-01</td>
<td>4.673189e-01</td>
<td>4.570870e-01</td>
<td>5.153325e-01</td>
<td>3.556862e-01</td>
<td>4.709891e-01</td>
<td>3.677911e-01</td>
<td>3.663361e-01</td>
<td>2.538000</td>
</tr>
</tbody>
</table>
<p>8 rows × 22 columns</p>
</div>
3. Model Training
3.1 Regression and Related Models
3.1.1 Multiple Linear Regression
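A minimal sketch of fitting a multiple linear regression and scoring it with MSE, the metric used to rank results in this task. The synthetic features, the 80/20 split, and the variable names are assumptions for illustration. In the real workflow `X` would be the reduced feature matrix and `y` the `target` column.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 5))
true_coef = np.array([1.5, -2.0, 0.0, 0.7, 0.3])
y = X @ true_coef + 0.1 * rng.normal(size=500)

# Hold out 20% of the rows to estimate generalization error.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)
mse = mean_squared_error(y_valid, model.predict(X_valid))
print(f"validation MSE: {mse:.4f}")
print("estimated coefficients:", np.round(model.coef_, 2))
```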
Define a function that plots a model's learning curve
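The sketch below computes the numbers behind a learning curve with scikit-learn's `learning_curve`. Plotting is left out so the example runs without a display, and the returned arrays can be passed to any plotting library. The helper name `learning_curve_mse`, the five training sizes, and the synthetic data are assumptions, not the original function.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import learning_curve

def learning_curve_mse(estimator, X, y, cv=5, n_points=5):
    """Return training sizes with the mean train/validation MSE at each size."""
    sizes, train_scores, valid_scores = learning_curve(
        estimator, X, y,
        cv=cv,
        train_sizes=np.linspace(0.1, 1.0, n_points),
        scoring="neg_mean_squared_error",
    )
    # Scores are negated MSE, so flip the sign before averaging over the folds.
    return sizes, -train_scores.mean(axis=1), -valid_scores.mean(axis=1)

rng = np.random.default_rng(5)
X = rng.normal(size=(400, 8))
y = X @ rng.normal(size=8) + 0.5 * rng.normal(size=400)

sizes, train_mse, valid_mse = learning_curve_mse(LinearRegression(), X, y)
for n, tr, va in zip(sizes, train_mse, valid_mse):
    print(f"n={n:4d}  train MSE={tr:.3f}  validation MSE={va:.3f}")
```

A large, persistent gap between the two curves points to overfitting. Two curves that converge at a high error point to underfitting.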
3.1.2 K-Nearest Neighbors (KNN) Regression
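KNN regression predicts the average target of the k closest training points, so the main knob is `n_neighbors`. A small sketch comparing a few values on synthetic data. The candidate values and the data are arbitrary assumptions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(6)
X = rng.uniform(-3, 3, size=(600, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.normal(size=600)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=0
)

knn_mse = {}
for k in (1, 5, 15):  # candidate neighbor counts, chosen arbitrarily
    knn = KNeighborsRegressor(n_neighbors=k)
    knn.fit(X_train, y_train)
    knn_mse[k] = mean_squared_error(y_valid, knn.predict(X_valid))
    print(f"k={k:2d}  validation MSE={knn_mse[k]:.4f}")
```

Because KNN relies on distances, it benefits from features on a common scale, which is one reason the normalization step matters for this model.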
3.1.3 Decision Tree Regression
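A decision tree's depth controls how closely it follows the training data. The sketch below, on assumed synthetic data, contrasts a shallow tree, a medium tree, and a fully grown one by training and validation MSE.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(7)
X = rng.uniform(-3, 3, size=(600, 3))
y = np.where(X[:, 0] > 0, 2.0, -1.0) + X[:, 1] ** 2 + 0.2 * rng.normal(size=600)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=0
)

tree_mse = {}
for depth in (2, 5, None):  # None lets the tree grow until the leaves are pure
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    train_err = mean_squared_error(y_train, tree.predict(X_train))
    valid_err = mean_squared_error(y_valid, tree.predict(X_valid))
    tree_mse[depth] = (train_err, valid_err)
    print(f"max_depth={str(depth):>4}  train MSE={train_err:.3f}  validation MSE={valid_err:.3f}")
```

The unrestricted tree memorizes the training rows, so its training error says nothing about how it will do on unseen data. Only the validation column is informative.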
3.1.4 Random Forest Regression
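A random forest averages many decision trees trained on bootstrap samples, and it also reports how much each feature contributed to the splits. A minimal sketch on synthetic data where only the first two columns carry signal. The data and the choice of 100 trees are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(8)
X = rng.normal(size=(500, 4))
# Columns 2 and 3 are pure noise and do not influence the target.
y = 3.0 * X[:, 0] + np.sin(2 * X[:, 1]) + 0.1 * rng.normal(size=500)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=0
)

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
rf_mse = mean_squared_error(y_valid, forest.predict(X_valid))
importances = forest.feature_importances_

print(f"validation MSE: {rf_mse:.4f}")
for idx, score in enumerate(importances):
    print(f"feature {idx}: importance={score:.3f}")
```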
3.1.5 Gradient Boosting Regression
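Gradient boosting adds shallow trees one at a time, each fitted to the errors of the ensemble so far. scikit-learn's `staged_predict` exposes the prediction after every added tree, which makes it easy to see where validation error stops improving. The hyperparameters and synthetic data below are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(9)
X = rng.normal(size=(500, 4))
y = X[:, 0] ** 2 + 2.0 * X[:, 1] + 0.2 * rng.normal(size=500)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=0
)

gbr = GradientBoostingRegressor(
    n_estimators=200, learning_rate=0.1, max_depth=3, random_state=0
)
gbr.fit(X_train, y_train)

# Validation MSE after 1, 2, ..., 200 trees.
staged_mse = [
    mean_squared_error(y_valid, pred) for pred in gbr.staged_predict(X_valid)
]
best_n = int(np.argmin(staged_mse)) + 1
print(f"best number of trees: {best_n}, validation MSE: {min(staged_mse):.4f}")
```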
3.1.6 LightGBM Regression
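4. Summary of This Part
This first part of the industrial steam prediction series covered three things. Exploratory data analysis looked at the correlations between variables and identified the key variables. Feature engineering refined the data through outlier handling, normalization, and dimensionality reduction. Regression model training used mainstream ML models, including decision trees, random forests, and LightGBM. The next part will focus on model validation, feature optimization, and model ensembling.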
Original project link: https://www.heywhale.com/home/column/64141d6b1c8c8b518ba97dcc