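Introduction
itertools is the Python standard-library toolkit for building and combining iterators.
For more of its features, see: http://www.wklken.me/posts/2013/08/20/python-extra-itertools.html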
Usage
Permutations (order matters): permutations
Combinations (order does not matter): combinations
# Permutations: permutations
# Combinations: combinations
from itertools import permutations
from itertools import combinations
col_list = ['col1', 'col2', 'col3']
print('All permutations:')
for value in permutations(col_list, 2):
    print(list(value))
print('All combinations:')
for value in combinations(col_list, 2):
    print(list(value))
The output is:
All permutations:
['col1', 'col2']
['col1', 'col3']
['col2', 'col1']
['col2', 'col3']
['col3', 'col1']
['col3', 'col2']
All combinations:
['col1', 'col2']
['col1', 'col3']
['col2', 'col3']
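As a quick sanity check on the output above, the number of results can be predicted in advance: choosing r items out of n gives n!/(n-r)! permutations and n!/(r!(n-r)!) combinations. The following is a minimal sketch, assuming Python 3.8 or later (for math.perm and math.comb); the names perm_list and comb_list are illustrative only.

```python
from itertools import combinations, permutations
from math import comb, perm  # both were added in Python 3.8

col_list = ['col1', 'col2', 'col3']
r = 2

perm_list = list(permutations(col_list, r))
comb_list = list(combinations(col_list, r))

# The observed counts match the closed-form formulas.
assert len(perm_list) == perm(len(col_list), r)  # 3!/(3-2)! = 6
assert len(comb_list) == comb(len(col_list), r)  # 3!/(2! * 1!) = 3

print(len(perm_list), len(comb_list))
```

Each combination corresponds to r! permutations of the same items, which is why the permutation list here is twice as long as the combination list.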
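Example
Compute the correlation coefficient for every pair of features, then filter the pairs by a threshold to keep the feature combinations you want.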
import pandas as pd
import numpy as np
from itertools import combinations
# Sample data: three columns of random integers in [1, 100)
df = pd.DataFrame({'A': np.random.randint(1, 100, 10),
                   'B': np.random.randint(1, 100, 10),
                   'C': np.random.randint(1, 100, 10)})
col_list = df.columns
print('All combinations:')
list_corr = []
for value in combinations(col_list, 2):
    # Off-diagonal entry of the 2x2 correlation matrix for this pair
    corr = df[list(value)].corr().iloc[1, 0]
    print([value[0], value[1], corr])
    list_corr.append([value[0], value[1], corr])
df_list_corr = pd.DataFrame(list_corr)
df_list_corr.columns = ['cols', 'cols2', 'corr']
# Keep only the pairs whose correlation exceeds the 0.2 threshold
df_list_corr_filter = df_list_corr[df_list_corr['corr'] > 0.2]
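One possible variation, offered as a sketch rather than as the approach used above: compute the full correlation matrix once with df.corr() and look each pair up in it, instead of calling corr() per pair. This version also filters on the absolute value, which is an assumption about intent, since strongly negative correlations are often of interest too. The fixed seed and the names col_a, col_b, df_pairs and df_strong are illustrative and do not come from the original code.

```python
from itertools import combinations

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, only to make the sketch reproducible
df = pd.DataFrame(rng.integers(1, 100, size=(10, 3)), columns=['A', 'B', 'C'])

corr_matrix = df.corr()  # pairwise Pearson correlations, computed once

# Look up each unordered pair of columns in the precomputed matrix.
rows = [(a, b, corr_matrix.loc[a, b]) for a, b in combinations(df.columns, 2)]
df_pairs = pd.DataFrame(rows, columns=['col_a', 'col_b', 'corr'])

threshold = 0.2  # same threshold as above; adjust as needed
# abs() also keeps strongly negative pairs, unlike a plain `> threshold` filter.
df_strong = df_pairs[df_pairs['corr'].abs() > threshold]
print(df_strong)
```

For a handful of columns the difference is negligible. With many features, a single corr() call avoids recomputing a small correlation matrix for every pair.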