Tips
from tabulate import tabulate
df.describe(include='all')
for example:
pandas.DataFrame.groupby
* DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=<object object>, observed=False, dropna=True)
as_index : bool, default True
For aggregated output, return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output.
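A small sketch of what as_index changes, using a made-up `sales` frame (the column names are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical toy data, not from the notes.
sales = pd.DataFrame({
    "region": ["east", "east", "west"],
    "amount": [10, 20, 5],
})

# as_index=True (default): the group labels become the index of the result.
indexed = sales.groupby("region").sum()

# as_index=False: the group labels stay as an ordinary column,
# similar to the output of a SQL GROUP BY.
flat = sales.groupby("region", as_index=False).sum()

print(indexed)
print(flat)
```

With as_index=False the result keeps a default integer index, which is usually easier to merge or plot directly.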
sns.FacetGrid
sns.FacetGrid provides an object-based way to draw the same kind of plot on different subsets of a dataset.
seaborn.FacetGrid(data, *, row=None, col=None, hue=None, col_wrap=None, sharex=True, sharey=True, height=3, aspect=1, palette=None, row_order=None, col_order=None, hue_order=None, hue_kws=None, dropna=False, legend_out=True, despine=True, margin_titles=False, xlim=None, ylim=None, subplot_kws=None, gridspec_kws=None, size=None)
FacetGrid.map()
FacetGrid.add_legend()
The grid offers three dimensions: row, col, and hue.
palette controls the colors used for hue.
pd.crosstab: cross-tabulation (contingency) table
pd.pivot_table: pivot table
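To make the difference concrete, here is a small sketch on an invented frame (the `sex`, `survived`, and `fare` columns are assumptions for illustration):

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "sex": ["male", "female", "female", "male"],
    "survived": [0, 1, 1, 1],
    "fare": [7.0, 70.0, 8.0, 50.0],
})

# crosstab counts how often each (sex, survived) combination occurs.
counts = pd.crosstab(df["sex"], df["survived"])

# pivot_table aggregates a value column per group (here the mean fare).
mean_fare = pd.pivot_table(df, values="fare", index="sex", aggfunc="mean")

print(counts)
print(mean_fare)
```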
convert categorical titles to ordinal
title_mapping = {"Mr": 1, "Miss": 2, "Mrs": 3, "Master": 4, "Rare": 5}
for dataset in combine:
    dataset['Title'] = dataset['Title'].map(title_mapping)
    dataset['Title'] = dataset['Title'].fillna(0)
train_df.head()
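The snippet above relies on `combine` and `train_df` being defined elsewhere. A self-contained sketch of the same idea, with two tiny stand-in frames (their contents are assumptions), might look like this:

```python
import pandas as pd

# Hypothetical stand-ins for the real train/test frames.
train_df = pd.DataFrame({"Title": ["Mr", "Miss", "Dr"]})
test_df = pd.DataFrame({"Title": ["Mrs", "Master", "Rare"]})
combine = [train_df, test_df]

title_mapping = {"Mr": 1, "Miss": 2, "Mrs": 3, "Master": 4, "Rare": 5}
for dataset in combine:
    # Titles missing from the mapping become NaN after map(), then 0 after fillna().
    dataset["Title"] = dataset["Title"].map(title_mapping).fillna(0).astype(int)

print(train_df)
print(test_df)
```

The final astype(int) is an addition here: map() produces floats once NaN is involved, and casting back keeps the codes as integers.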
- pandas.DataFrame.mode(axis=0, numeric_only=False, dropna=True)
Returns the mode(s) of each element along the selected axis.
e.g.
import numpy as np
import pandas as pd

df = pd.DataFrame([('bird', 2, 2),
                   ('mammal', 4, np.nan),
                   ('arthropod', 8, 0),
                   ('bird', 2, np.nan)],
                  index=('falcon', 'horse', 'spider', 'ostrich'),
                  columns=('species', 'legs', 'wings'))
df
           species  legs  wings
falcon        bird     2    2.0
horse       mammal     4    NaN
spider   arthropod     8    0.0
ostrich       bird     2    NaN
===========
df.mode()
  species  legs  wings
0    bird   2.0    0.0
1     NaN   NaN    2.0
plt.cm.RdBu is the colormap object that corresponds to the name RdBu.
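As a rough illustration, a colormap object can be called with a number between 0 and 1 and returns an RGBA tuple. The Agg backend below is only an assumption so the sketch runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs
import matplotlib.pyplot as plt

cmap = plt.cm.RdBu

# Calling the colormap maps a value in [0, 1] to an (R, G, B, A) tuple.
low = cmap(0.0)   # red end of the map
high = cmap(1.0)  # blue end of the map

print(low, high)
```

The same object can be passed wherever a `cmap` argument is accepted, for example `plt.imshow(data, cmap=plt.cm.RdBu)`.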
One way to iterate over all files in a folder
import os
for dirname, _, filenames in os.walk(os.getcwd()):
    for filename in filenames:
        print(os.path.join(dirname, filename))
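The same loop can be wrapped in a small helper that returns the paths instead of printing them. `list_files` and its `suffix` parameter are hypothetical names introduced for this sketch, and the temporary directory exists only to make the example self-contained:

```python
import os
import tempfile

def list_files(root, suffix=""):
    """Return sorted paths of all files under root whose names end with suffix."""
    paths = []
    for dirname, _, filenames in os.walk(root):
        for filename in filenames:
            if filename.endswith(suffix):
                paths.append(os.path.join(dirname, filename))
    return sorted(paths)

# Build a throwaway directory tree to demonstrate the helper.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    for rel in ("a.csv", os.path.join("sub", "b.csv"), os.path.join("sub", "c.txt")):
        open(os.path.join(root, rel), "w").close()
    csv_files = [os.path.relpath(p, root) for p in list_files(root, ".csv")]

print(csv_files)
```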
list.append VS list.extend
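x = [1, 2, 3]
x.append([2, 3])  # append adds its argument as a single element
print(x)
# [1, 2, 3, [2, 3]]
x = [1, 2, 3]
x.extend([2, 3])  # extend adds each element of the iterable
print(x)
# [1, 2, 3, 2, 3]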
from collections import Counter
A counter tool that provides fast and convenient tallies.
>>> from collections import Counter
>>> cnt = Counter()
>>> for word in ['red', 'blue', 'red', 'green', 'blue', 'blue']:
...     cnt[word] += 1
>>> cnt
Counter({'blue': 3, 'red': 2, 'green': 1})
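A short sketch of two other conveniences of Counter, using the same word list: counting an iterable in one call and asking for the most frequent items.

```python
from collections import Counter

words = ['red', 'blue', 'red', 'green', 'blue', 'blue']

cnt = Counter(words)          # count everything in one call, no explicit loop
top_two = cnt.most_common(2)  # the two most frequent items with their counts
missing = cnt['purple']       # missing keys return 0 instead of raising KeyError

print(top_two)
print(missing)
```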
sns.countplot
Setting the seaborn palette
sns.set_palette('Set1') # set the color palette
plt.rcParams['figure.figsize'] = (20, 8)
plt.rcParams['figure.dpi'] = 200
plt.subplots
plt.subplots() is a function that returns a tuple containing the figure and the axes object(s).
To work with just one of the subplots, use plt.subplot.
e.g.
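A minimal sketch contrasting the two calls. The plotted numbers are arbitrary, and the Agg backend is an assumption so the code runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs
import matplotlib.pyplot as plt

# plt.subplots creates the figure and all axes at once and returns both.
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(8, 3))
axes[0].plot([1, 2, 3], [1, 4, 9])
axes[0].set_title("left")
axes[1].bar(["a", "b"], [3, 5])
axes[1].set_title("right")

# plt.subplot adds (or selects) a single cell of a grid on the current figure.
plt.figure()
single = plt.subplot(1, 2, 1)  # 1 row, 2 columns, first cell
single.plot([0, 1], [0, 1])
```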
1. Create a new conda environment
# create a conda environment
conda create --name yourenvname python=3.6
# activate conda environment
conda activate yourenvname
# install pycaret
pip install pycaret
# create notebook kernel connected with the conda environment
python -m ipykernel install --user --name yourenvname --display-name "display-name"
2. Leave a conda virtual environment
conda env list # list the virtual environments
conda activate python36 # enter the python36 virtual environment
conda deactivate # leave the current virtual environment
Using a SOCKS5 proxy with pip
pip install pysocks
pip install "pycaret[full]" --proxy socks5://127.0.0.1:10808
Reordering the rows of a DataFrame
df = df.reindex(['this is the modified order'])
print(df)
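A concrete sketch of reindex with real labels; the frame and its index labels are invented for illustration:

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"legs": [2, 4, 8]}, index=["falcon", "horse", "spider"])

# Pass the index labels in the order you want the rows to appear.
reordered = df.reindex(["spider", "falcon", "horse"])

# Labels that are not in the original index produce rows filled with NaN.
with_missing = df.reindex(["horse", "cat"])

print(reordered)
print(with_missing)
```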
Private methods in Python
Python 3
class Site:
    def __init__(self, name, url, number):
        self.name = name      # public
        self.__url = url      # private
        self.number = number  # public

    def who(self):
        print('name: %s' % self.name)
        print('url: %s' % self.__url)

    def __foo(self):  # private method
        print('{} is greater than 10, so this can run'.format(self.number))
        print('This is a private method')

    def foo(self):  # public method
        print('This is a public method')
        if self.number > 10:  # private methods can be called from inside the class
            self.__foo()

x = Site('老马的程序人生', 'https://blog.youkuaiyun.com/LSGO_MYP', 11)
x.who()
# name: 老马的程序人生
# url: https://blog.youkuaiyun.com/LSGO_MYP
x.foo()
# This is a public method
# 11 is greater than 10, so this can run
# This is a private method
x.__foo()
---------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-12-6b29c9f2e0b8> in <module>
----> 1 x.__foo()
AttributeError: 'Site' object has no attribute '__foo'
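The AttributeError comes from name mangling: inside a class body, a name with two leading underscores is rewritten to `_ClassName__name`. A reduced sketch of the class shows that the attribute is hidden rather than truly inaccessible (the URL is a placeholder):

```python
class Site:
    def __init__(self, url):
        self.__url = url  # stored on the instance as _Site__url

    def __foo(self):      # stored on the class as _Site__foo
        return 'private method called'

x = Site('https://example.com')  # placeholder URL

# Outside the class no mangling happens, so the plain name is not found.
try:
    x.__foo()
    raised = False
except AttributeError:
    raised = True

# The mangled names are still reachable, which is why this is a convention
# for avoiding accidental access rather than real access control.
mangled_result = x._Site__foo()
mangled_url = x._Site__url

print(raised, mangled_result, mangled_url)
```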