Reproducing Machine-learning-aided-design-of-aluminum-alloys

Al-Cu-Mg-x alloys were chosen as the base material for this work.

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
data=pd.read_excel('1-s2.0-S2352492820329081-mmc1.xlsx',header=1)
data.head()
Hardness Al Cu Mg Si Zn Zr Mn Ag Li ... atomic radius Specfic heat Heat of Vaporization thermal conductivity group period Time (min) Temearture (K) Unnamed: 34 Reference
0 169 0.9338 0.015 0.025 0.0 0.057 0.0 0.0 0.0 0.0 ... 124.0824 897.5312 288.0864 236.283 13.0384 3.1644 60.0 403 NaN Scr mat 64, 21
1 174 0.9338 0.015 0.025 0.0 0.057 0.0 0.0 0.0 0.0 ... 124.0824 897.5312 288.0864 236.283 13.0384 3.1644 180.0 403 NaN Scr mat 64, 21
2 180 0.9338 0.015 0.025 0.0 0.057 0.0 0.0 0.0 0.0 ... 124.0824 897.5312 288.0864 236.283 13.0384 3.1644 360.0 403 NaN Scr mat 64, 21
3 196 0.9338 0.015 0.025 0.0 0.057 0.0 0.0 0.0 0.0 ... 124.0824 897.5312 288.0864 236.283 13.0384 3.1644 720.0 403 NaN Scr mat 64, 21
4 187 0.9338 0.015 0.025 0.0 0.057 0.0 0.0 0.0 0.0 ... 124.0824 897.5312 288.0864 236.283 13.0384 3.1644 900.0 403 NaN Scr mat 64, 21

5 rows × 36 columns

data.dropna(axis=1,inplace=True)
data.head()
Hardness Al Cu Mg Si Zn Zr Mn Ag Li ... Fusion heat atomic radius Specfic heat Heat of Vaporization thermal conductivity group period Time (min) Temearture (K) Reference
0 169 0.9338 0.015 0.025 0.0 0.057 0.0 0.0 0.0 0.0 ... 10.82461 124.0824 897.5312 288.0864 236.283 13.0384 3.1644 60.0 403 Scr mat 64, 21
1 174 0.9338 0.015 0.025 0.0 0.057 0.0 0.0 0.0 0.0 ... 10.82461 124.0824 897.5312 288.0864 236.283 13.0384 3.1644 180.0 403 Scr mat 64, 21
2 180 0.9338 0.015 0.025 0.0 0.057 0.0 0.0 0.0 0.0 ... 10.82461 124.0824 897.5312 288.0864 236.283 13.0384 3.1644 360.0 403 Scr mat 64, 21
3 196 0.9338 0.015 0.025 0.0 0.057 0.0 0.0 0.0 0.0 ... 10.82461 124.0824 897.5312 288.0864 236.283 13.0384 3.1644 720.0 403 Scr mat 64, 21
4 187 0.9338 0.015 0.025 0.0 0.057 0.0 0.0 0.0 0.0 ... 10.82461 124.0824 897.5312 288.0864 236.283 13.0384 3.1644 900.0 403 Scr mat 64, 21

5 rows × 35 columns

data=data.reset_index(drop=True)  # reset_index returns a new DataFrame, so the result must be assigned
data.drop(axis=1,labels=data.columns[-1],inplace=True)
data.head()
Hardness Al Cu Mg Si Zn Zr Mn Ag Li ... Electroaffinity Fusion heat atomic radius Specfic heat Heat of Vaporization thermal conductivity group period Time (min) Temearture (K)
0 169 0.9338 0.015 0.025 0.0 0.057 0.0 0.0 0.0 0.0 ... 41.4625 10.82461 124.0824 897.5312 288.0864 236.283 13.0384 3.1644 60.0 403
1 174 0.9338 0.015 0.025 0.0 0.057 0.0 0.0 0.0 0.0 ... 41.4625 10.82461 124.0824 897.5312 288.0864 236.283 13.0384 3.1644 180.0 403
2 180 0.9338 0.015 0.025 0.0 0.057 0.0 0.0 0.0 0.0 ... 41.4625 10.82461 124.0824 897.5312 288.0864 236.283 13.0384 3.1644 360.0 403
3 196 0.9338 0.015 0.025 0.0 0.057 0.0 0.0 0.0 0.0 ... 41.4625 10.82461 124.0824 897.5312 288.0864 236.283 13.0384 3.1644 720.0 403
4 187 0.9338 0.015 0.025 0.0 0.057 0.0 0.0 0.0 0.0 ... 41.4625 10.82461 124.0824 897.5312 288.0864 236.283 13.0384 3.1644 900.0 403

5 rows × 34 columns

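The features are the composition and aging conditions (temperature and time) of various Al-Cu-Mg-x based alloys, together with the associated property (hardness).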

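Physical properties $X_i$: for element $i$, the electronegativity, atomic number, atomic mass, atomic radius, valence electrons, boiling point, specific heat, melting point, heat of vaporization, heat of fusion, group, period, electron affinity, density, and thermal conductivity.

$C_i$: the concentration of element $i$ in the alloy.

$X = \sum_i C_i X_i$

$X$ is the concentration-weighted value of the property, summed over the elements in the alloy.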
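The weighted sum can be checked against the table above. The sketch below recomputes the `group` and `period` features for the first data row (Al 0.9338, Cu 0.015, Mg 0.025, Zn 0.057). The elemental group and period numbers are taken from the periodic table; the helper name `weighted_feature` is hypothetical.

```python
# Concentrations C_i from the first data row (all other element columns are 0).
composition = {"Al": 0.9338, "Cu": 0.015, "Mg": 0.025, "Zn": 0.057}

# Elemental properties X_i from the periodic table.
group = {"Al": 13, "Cu": 11, "Mg": 2, "Zn": 12}
period = {"Al": 3, "Cu": 4, "Mg": 3, "Zn": 4}


def weighted_feature(concentrations, elemental_property):
    """Return X = sum_i C_i * X_i (hypothetical helper for illustration)."""
    return sum(c * elemental_property[el] for el, c in concentrations.items())


group_feature = weighted_feature(composition, group)
period_feature = weighted_feature(composition, period)
print(round(group_feature, 4), round(period_feature, 4))
```

Both results agree with the `group` (13.0384) and `period` (3.1644) columns of that row.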

data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1591 entries, 0 to 1590
Data columns (total 34 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Hardness               1591 non-null   int64  
 1   Al                     1591 non-null   float64
 2   Cu                     1591 non-null   float64
 3   Mg                     1591 non-null   float64
 4   Si                     1591 non-null   float64
 5   Zn                     1591 non-null   float64
 6   Zr                     1591 non-null   float64
 7   Mn                     1591 non-null   float64
 8   Ag                     1591 non-null   float64
 9   Li                     1591 non-null   float64
 10  Ca                     1591 non-null   int64  
 11  Fe                     1591 non-null   float64
 12  Ti                     1591 non-null   float64
 13  Sn                     1591 non-null   float64
 14  Cr                     1591 non-null   float64
 15  Ge                     1591 non-null   float64
 16  Sc                     1591 non-null   float64
 17  EN                     1591 non-null   float64
 18  VE                     1591 non-null   float64
 19  Atomic number          1591 non-null   float64
 20  mass                   1591 non-null   float64
 21  Melting point          1591 non-null   float64
 22  Boiling point          1591 non-null   float64
 23  Density                1591 non-null   float64
 24  Electroaffinity        1591 non-null   float64
 25  Fusion heat            1591 non-null   float64
 26  atomic radius          1591 non-null   float64
 27  Specfic heat           1591 non-null   float64
 28  Heat of Vaporization   1591 non-null   float64
 29  thermal conductivity   1591 non-null   float64
 30  group                  1591 non-null   float64
 31  period                 1591 non-null   float64
 32  Time  (min)            1591 non-null   float64
 33  Temearture (K)         1591 non-null   int64  
dtypes: float64(31), int64(3)
memory usage: 422.7 KB
data.isnull().sum()
Hardness                 0
Al                       0
Cu                       0
Mg                       0
Si                       0
Zn                       0
Zr                       0
Mn                       0
Ag                       0
Li                       0
Ca                       0
Fe                       0
Ti                       0
Sn                       0
Cr                       0
Ge                       0
Sc                       0
EN                       0
VE                       0
Atomic number            0
mass                     0
Melting point            0
Boiling point            0
Density                  0
Electroaffinity          0
Fusion heat              0
atomic radius            0
Specfic heat             0
Heat of Vaporization     0
thermal conductivity     0
group                    0
period                   0
Time  (min)              0
Temearture (K)           0
dtype: int64

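LOF outlier detection

The anomaly score of each sample is called its local outlier factor (LOF). It measures the local deviation of a sample's density with respect to its neighbors. The score is local because it depends on how isolated the sample is relative to its surrounding neighborhood. More precisely, locality is defined by the k nearest neighbors, whose distances are used to estimate the local density. Comparing a sample's local density with the local densities of its neighbors identifies samples whose density is substantially lower than that of their neighbors. These samples are considered outliers.

  • n_neighbors: if n_neighbors is larger than the number of samples provided, all samples are used.
  • algorithm: {‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, the algorithm used to compute the nearest neighbors.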
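To make the density comparison concrete, here is a minimal sketch on synthetic data (an assumption for illustration, not the alloy dataset): a tight 2-D cluster plus one distant point. `negative_outlier_factor_` stores the negated LOF score, so values of `-negative_outlier_factor_` well above 1 indicate a point that is much less dense than its neighbors.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
# Synthetic data (assumed for illustration): a dense cluster plus one far-away point.
cluster = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
isolated = np.array([[8.0, 8.0]])
points = np.vstack([cluster, isolated])

lof = LocalOutlierFactor(n_neighbors=10, contamination=0.055)
labels = lof.fit_predict(points)          # 1 for inliers, -1 for outliers
scores = -lof.negative_outlier_factor_    # LOF score; close to 1 means "as dense as its neighbors"

print("LOF of the isolated point:", scores[-1])
print("number flagged as outliers:", int((labels == -1).sum()))
```

Because `contamination` fixes the fraction of samples flagged, some cluster points with the highest scores are also labeled -1 here, even though only the isolated point is far from the rest.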
df=data.values
df.shape
(1591, 34)
from sklearn.neighbors import LocalOutlierFactor
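# use the 10 nearest neighbors; contamination (expected outlier fraction) of 0.055, close to the paper
LOF=LocalOutlierFactor(n_neighbors=10,contamination=0.055)
# inliers are labeled 1, outliers -1
labels=LOF.fit_predict(df)
# keep only the samples that LOF labels as inliers
df=df[labels>0]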
df.shape

(1503, 34)
y=df[:,0]
y.shape
(1503,)
X=df[:,1:]
X.shape
(1503, 33)

Splitting and standardizing the data

from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,
                                               y,
                                               test_size=0.4,
                                               random_state=0)
X_train.shape,X_test.shape
((901, 33), (602, 33))
from sklearn.preprocessing import StandardScaler
sc1=StandardScaler()
X_train=sc1.fit_transform(X_train)
X_test=sc1.transform(X_test)
sc2=StandardScaler()
y_train=sc2.fit_transform(y_train.reshape(-1,1))
y_test=sc2.transform(y_test.reshape(-1,1))
y_test_inverse=sc2.inverse_transform(y_test)  # undo the standardization to recover hardness in its original units
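The scalers are fit on the training split only, so the test split is transformed with the training mean and standard deviation, and `inverse_transform` maps standardized values back to the original units. A minimal sketch of that round trip, using made-up hardness values (hypothetical numbers, not from the dataset):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical hardness values, used only to illustrate the round trip.
y_train_demo = np.array([150.0, 170.0, 190.0]).reshape(-1, 1)
y_test_demo = np.array([160.0, 200.0]).reshape(-1, 1)

scaler = StandardScaler()
y_train_scaled = scaler.fit_transform(y_train_demo)  # learns mean and std from the training split only
y_test_scaled = scaler.transform(y_test_demo)        # reuses the training mean and std

# inverse_transform computes z * std + mean, recovering the original units.
y_test_recovered = scaler.inverse_transform(y_test_scaled)
print(scaler.mean_, scaler.scale_)
print(y_test_recovered.ravel())
```

The same `inverse_transform` call is what converts model predictions made in standardized space back to hardness values.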

Feature selection

filter
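A filter method scores each feature with a statistic computed directly from the data, without training a model, and keeps the features that pass a cutoff. The sketch below uses the absolute Pearson correlation with the target on synthetic data. The scoring criterion, the 0.3 threshold, and the column names are all assumptions for illustration, not necessarily the rule used in this post.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
# Synthetic stand-in data (hypothetical columns, not the alloy dataset).
demo = pd.DataFrame({
    "feat_a": rng.normal(size=n),
    "feat_b": rng.normal(size=n),
    "noise": rng.normal(size=n),
})
# The target depends on feat_a and feat_b but not on noise.
demo["target"] = demo["feat_a"] + demo["feat_b"] + 0.1 * rng.normal(size=n)

# Absolute Pearson correlation of every feature with the target.
corr_with_target = demo.corr()["target"].drop("target").abs()

threshold = 0.3  # assumed cutoff
selected = corr_with_target[corr_with_target > threshold].index.tolist()
print(corr_with_target.sort_values(ascending=False))
print(selected)
```

A filter of this kind is cheap to compute but only sees pairwise linear relationships, so features that matter only in combination can be missed.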

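name_list=list(data.columns[17:])  # physical-property and aging-condition columns (index 17 onward)
name_list.append('Hardness')  # append modifies the list in place and does not return a new list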
name=pd.Index