Al-Cu-Mg-x alloys were chosen as the base material for this work.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
data=pd.read_excel('1-s2.0-S2352492820329081-mmc1.xlsx',header=1)
data.head()
| | Hardness | Al | Cu | Mg | Si | Zn | Zr | Mn | Ag | Li | ... | atomic radius | Specfic heat | Heat of Vaporization | thermal conductivity | group | period | Time (min) | Temearture (K) | Unnamed: 34 | Reference |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 169 | 0.9338 | 0.015 | 0.025 | 0.0 | 0.057 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 124.0824 | 897.5312 | 288.0864 | 236.283 | 13.0384 | 3.1644 | 60.0 | 403 | NaN | Scr mat 64, 21 |
1 | 174 | 0.9338 | 0.015 | 0.025 | 0.0 | 0.057 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 124.0824 | 897.5312 | 288.0864 | 236.283 | 13.0384 | 3.1644 | 180.0 | 403 | NaN | Scr mat 64, 21 |
2 | 180 | 0.9338 | 0.015 | 0.025 | 0.0 | 0.057 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 124.0824 | 897.5312 | 288.0864 | 236.283 | 13.0384 | 3.1644 | 360.0 | 403 | NaN | Scr mat 64, 21 |
3 | 196 | 0.9338 | 0.015 | 0.025 | 0.0 | 0.057 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 124.0824 | 897.5312 | 288.0864 | 236.283 | 13.0384 | 3.1644 | 720.0 | 403 | NaN | Scr mat 64, 21 |
4 | 187 | 0.9338 | 0.015 | 0.025 | 0.0 | 0.057 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 124.0824 | 897.5312 | 288.0864 | 236.283 | 13.0384 | 3.1644 | 900.0 | 403 | NaN | Scr mat 64, 21 |
5 rows × 36 columns
data.dropna(axis=1,inplace=True)
data.head()
| | Hardness | Al | Cu | Mg | Si | Zn | Zr | Mn | Ag | Li | ... | Fusion heat | atomic radius | Specfic heat | Heat of Vaporization | thermal conductivity | group | period | Time (min) | Temearture (K) | Reference |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 169 | 0.9338 | 0.015 | 0.025 | 0.0 | 0.057 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 10.82461 | 124.0824 | 897.5312 | 288.0864 | 236.283 | 13.0384 | 3.1644 | 60.0 | 403 | Scr mat 64, 21 |
1 | 174 | 0.9338 | 0.015 | 0.025 | 0.0 | 0.057 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 10.82461 | 124.0824 | 897.5312 | 288.0864 | 236.283 | 13.0384 | 3.1644 | 180.0 | 403 | Scr mat 64, 21 |
2 | 180 | 0.9338 | 0.015 | 0.025 | 0.0 | 0.057 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 10.82461 | 124.0824 | 897.5312 | 288.0864 | 236.283 | 13.0384 | 3.1644 | 360.0 | 403 | Scr mat 64, 21 |
3 | 196 | 0.9338 | 0.015 | 0.025 | 0.0 | 0.057 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 10.82461 | 124.0824 | 897.5312 | 288.0864 | 236.283 | 13.0384 | 3.1644 | 720.0 | 403 | Scr mat 64, 21 |
4 | 187 | 0.9338 | 0.015 | 0.025 | 0.0 | 0.057 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 10.82461 | 124.0824 | 897.5312 | 288.0864 | 236.283 | 13.0384 | 3.1644 | 900.0 | 403 | Scr mat 64, 21 |
5 rows × 35 columns
data=data.reset_index(drop=True)  # reset_index returns a new DataFrame, so assign it back
data.drop(axis=1,labels=data.columns[-1],inplace=True)
data.head()
| | Hardness | Al | Cu | Mg | Si | Zn | Zr | Mn | Ag | Li | ... | Electroaffinity | Fusion heat | atomic radius | Specfic heat | Heat of Vaporization | thermal conductivity | group | period | Time (min) | Temearture (K) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 169 | 0.9338 | 0.015 | 0.025 | 0.0 | 0.057 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 41.4625 | 10.82461 | 124.0824 | 897.5312 | 288.0864 | 236.283 | 13.0384 | 3.1644 | 60.0 | 403 |
1 | 174 | 0.9338 | 0.015 | 0.025 | 0.0 | 0.057 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 41.4625 | 10.82461 | 124.0824 | 897.5312 | 288.0864 | 236.283 | 13.0384 | 3.1644 | 180.0 | 403 |
2 | 180 | 0.9338 | 0.015 | 0.025 | 0.0 | 0.057 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 41.4625 | 10.82461 | 124.0824 | 897.5312 | 288.0864 | 236.283 | 13.0384 | 3.1644 | 360.0 | 403 |
3 | 196 | 0.9338 | 0.015 | 0.025 | 0.0 | 0.057 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 41.4625 | 10.82461 | 124.0824 | 897.5312 | 288.0864 | 236.283 | 13.0384 | 3.1644 | 720.0 | 403 |
4 | 187 | 0.9338 | 0.015 | 0.025 | 0.0 | 0.057 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 41.4625 | 10.82461 | 124.0824 | 897.5312 | 288.0864 | 236.283 | 13.0384 | 3.1644 | 900.0 | 403 |
5 rows × 34 columns
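The dataset records, for a range of Al-Cu-Mg-x based alloys, the composition, the aging conditions (temperature and time), and the resulting property (hardness).
Physical property $X_i$: for element $i$, the electronegativity, atomic number, atomic mass, atomic radius, number of valence electrons, boiling point, specific heat, melting point, heat of vaporization, heat of fusion, group, period, electron affinity, density, and thermal conductivity.
$C_i$: the concentration of element $i$ in the alloy.
$X = \sum_i C_i X_i$
$X$ is the concentration-weighted value of the property, which is used as a feature.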
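As a worked check of the weighted-sum formula, the sketch below recomputes the `period` and `group` features for the first row of the table shown earlier. It assumes that the elements hidden behind the `...` columns have zero concentration in that row; the helper name `weighted_feature` is made up for this illustration.

```python
import numpy as np

# Concentrations of the non-zero elements in the first row of the table.
# Assumption: the elements in the elided "..." columns are all zero here.
concentration = {"Al": 0.9338, "Cu": 0.015, "Mg": 0.025, "Zn": 0.057}

# Periodic-table period and group number of each element.
period = {"Al": 3, "Cu": 4, "Mg": 3, "Zn": 4}
group = {"Al": 13, "Cu": 11, "Mg": 2, "Zn": 12}


def weighted_feature(concentration, elemental_property):
    """Return X = sum_i C_i * X_i over the elements present in the alloy."""
    elements = list(concentration)
    c = np.array([concentration[e] for e in elements])
    x = np.array([elemental_property[e] for e in elements], dtype=float)
    return float(c @ x)


weighted_period = weighted_feature(concentration, period)
weighted_group = weighted_feature(concentration, group)
print(round(weighted_period, 4), round(weighted_group, 4))  # 3.1644 13.0384
```

Both results agree with the `period` and `group` columns of that row, which suggests the remaining physical columns were built the same way.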
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1591 entries, 0 to 1590
Data columns (total 34 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Hardness 1591 non-null int64
1 Al 1591 non-null float64
2 Cu 1591 non-null float64
3 Mg 1591 non-null float64
4 Si 1591 non-null float64
5 Zn 1591 non-null float64
6 Zr 1591 non-null float64
7 Mn 1591 non-null float64
8 Ag 1591 non-null float64
9 Li 1591 non-null float64
10 Ca 1591 non-null int64
11 Fe 1591 non-null float64
12 Ti 1591 non-null float64
13 Sn 1591 non-null float64
14 Cr 1591 non-null float64
15 Ge 1591 non-null float64
16 Sc 1591 non-null float64
17 EN 1591 non-null float64
18 VE 1591 non-null float64
19 Atomic number 1591 non-null float64
20 mass 1591 non-null float64
21 Melting point 1591 non-null float64
22 Boiling point 1591 non-null float64
23 Density 1591 non-null float64
24 Electroaffinity 1591 non-null float64
25 Fusion heat 1591 non-null float64
26 atomic radius 1591 non-null float64
27 Specfic heat 1591 non-null float64
28 Heat of Vaporization 1591 non-null float64
29 thermal conductivity 1591 non-null float64
30 group 1591 non-null float64
31 period 1591 non-null float64
32 Time (min) 1591 non-null float64
33 Temearture (K) 1591 non-null int64
dtypes: float64(31), int64(3)
memory usage: 422.7 KB
data.isnull().sum()
Hardness 0
Al 0
Cu 0
Mg 0
Si 0
Zn 0
Zr 0
Mn 0
Ag 0
Li 0
Ca 0
Fe 0
Ti 0
Sn 0
Cr 0
Ge 0
Sc 0
EN 0
VE 0
Atomic number 0
mass 0
Melting point 0
Boiling point 0
Density 0
Electroaffinity 0
Fusion heat 0
atomic radius 0
Specfic heat 0
Heat of Vaporization 0
thermal conductivity 0
group 0
period 0
Time (min) 0
Temearture (K) 0
dtype: int64
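LOF outlier detection
The anomaly score of each sample is called its local outlier factor (LOF). It measures how far the local density around a sample deviates from the density around its neighbours. The score is local because it depends on how isolated the sample is relative to its surrounding neighbourhood. More precisely, the neighbourhood is defined by the k nearest neighbours, and their distances are used to estimate the local density. Comparing a sample's local density with the local densities of its neighbours identifies the samples whose density is substantially lower than that of their neighbours; these are treated as outliers.
- n_neighbors: the number of neighbours k. If n_neighbors is larger than the number of samples provided, all samples are used.
- algorithm: the nearest-neighbour search method, one of {'auto', 'ball_tree', 'kd_tree', 'brute'}.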
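To make the density comparison concrete, here is a minimal brute-force sketch of the LOF score in plain NumPy. It is an illustration under simplifying assumptions (no duplicate points, ties in distance ignored), not scikit-learn's implementation; the function name `lof_scores` and the synthetic data are made up for this example. Scores near 1 mean a sample is about as dense as its neighbours, and much larger scores mean it is isolated. As far as I know, scikit-learn exposes the same quantity with the opposite sign as `negative_outlier_factor_`.

```python
import numpy as np


def lof_scores(points, k):
    """Return the local outlier factor of each row in `points` (brute force)."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Pairwise Euclidean distances; a point is never its own neighbour.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)
    # Indices of the k nearest neighbours and the k-distance of every point.
    neighbours = np.argsort(dist, axis=1)[:, :k]
    k_distance = dist[np.arange(n), neighbours[:, -1]]
    # Reachability distance of p from neighbour o: max(k_distance(o), d(p, o)).
    reach = np.maximum(k_distance[neighbours],
                       dist[np.arange(n)[:, None], neighbours])
    # Local reachability density: inverse of the mean reachability distance.
    lrd = 1.0 / reach.mean(axis=1)
    # LOF: mean density of the neighbours divided by the point's own density.
    return lrd[neighbours].mean(axis=1) / lrd


rng = np.random.default_rng(0)
cluster = rng.normal(size=(50, 2))   # 50 inliers around the origin
isolated = np.array([[8.0, 8.0]])    # one point far from the cluster
scores = lof_scores(np.vstack([cluster, isolated]), k=10)
print(int(np.argmax(scores)))        # index of the most isolated sample
```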
df=data.values
df.shape
(1591, 34)
from sklearn.neighbors import LocalOutlierFactor
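# use the 10 nearest neighbours; expected outlier fraction 0.055 (close to the paper's)
LOF=LocalOutlierFactor(n_neighbors=10,contamination=0.055)
# fit_predict returns 1 for inliers and -1 for outliers
labels=LOF.fit_predict(df)
# keep only the samples that LOF marks as inliers
df=df[labels>0]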
df.shape
(1503, 34)
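The drop from 1591 to 1503 rows follows from the `contamination` setting. The sketch below assumes, as I understand scikit-learn's behaviour, that the cut-off is placed at the `100 * contamination` percentile of the scores and that samples scoring below it are labelled -1. The random scores are only a stand-in for the real LOF scores.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, contamination = 1591, 0.055

# Stand-in scores (assumption): lower means more anomalous.
scores = rng.normal(size=n_samples)

# Cut-off at the contamination percentile; samples below it count as outliers.
threshold = np.percentile(scores, 100.0 * contamination)
n_outliers = int((scores < threshold).sum())
print(n_outliers, n_samples - n_outliers)  # 88 1503
```

With 1591 samples this removes 88 rows, which matches the shape change above.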
y=df[:,0]
y.shape
(1503,)
X=df[:,1:]
X.shape
(1503, 33)
Splitting and standardizing the data
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,
y,
test_size=0.4,
random_state=0)
X_train.shape,X_test.shape
((901, 33), (602, 33))
from sklearn.preprocessing import StandardScaler
sc1=StandardScaler()
X_train=sc1.fit_transform(X_train)
X_test=sc1.transform(X_test)
sc2=StandardScaler()
y_train=sc2.fit_transform(y_train.reshape(-1,1))
y_test=sc2.transform(y_test.reshape(-1,1))
y_test_inverse=sc2.inverse_transform(y_test)  # undo the standardization (back to the original hardness units)
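The scaling step can be sketched in plain NumPy to show what `fit_transform`, `transform` and `inverse_transform` do to the target. The short arrays reuse hardness values from the table earlier in the post, and the split between them is purely illustrative. The standard deviation is the population one (`ddof=0`), which is what `StandardScaler` uses.

```python
import numpy as np

# Illustrative split of a few hardness values (not the real train/test split).
y_train = np.array([169.0, 174.0, 180.0, 196.0])
y_test = np.array([187.0])

# Statistics come from the training split only, so the test split stays unseen.
mean = y_train.mean()
std = y_train.std()  # population standard deviation (ddof=0)

y_train_scaled = (y_train - mean) / std  # analogue of fit_transform
y_test_scaled = (y_test - mean) / std    # analogue of transform

# Analogue of inverse_transform: back to the original hardness units.
y_test_restored = y_test_scaled * std + mean
print(y_test_restored)
```

Fitting the scaler on the training split alone keeps information about the test split out of the preprocessing.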
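Feature selection
Filter
name_list=list(data.columns[17:])  # physical descriptors plus aging time and temperature
name_list.append('Hardness')  # append modifies the list in place and returns None, not a new list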
name=pd.Index