P/M/K - cozythecool's blog - 优快云 Blog

P/M/K

Articles: 18 · Views: 5689 · Favorites: 0

Author: cozythecool

[P/M/K] How to select specific columns

When a dataframe contains columns of several types, DataFrame.select_dtypes() is helpful:
train.select_dtypes(np.int64)
for i, col in enumerate(train.select_dtypes('float')...
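
The excerpt is cut off, so here is a minimal sketch of how the two calls might be used. The `train` frame and its column names are made up for illustration; only `select_dtypes` itself comes from the post.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `train`; the real columns are not shown in the post.
train = pd.DataFrame({
    "id": np.array([1, 2, 3], dtype=np.int64),
    "price": [9.5, 3.2, 7.1],
    "weight": [1.0, 2.0, 3.0],
    "name": ["a", "b", "c"],
})

# Keep only the int64 columns.
int_part = train.select_dtypes(np.int64)

# Keep only the float columns, then walk over their names.
float_part = train.select_dtypes("float")
for i, col in enumerate(float_part.columns):
    print(i, col, float_part[col].mean())
```

Iterating over a dataframe yields its column labels, so `enumerate(train.select_dtypes('float'))` gives the same (index, name) pairs as the loop above.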

Original · 2018-09-05 10:30:41 · 146 views · 0 comments
[P/M/K] sns.countplot & df.value_counts().plot(kind='bar')

Difference between seaborn (sns) and matplotlib.pyplot:
sns.countplot(x=df_train.Census_OSInstallLanguageIdentifier, ax=axis1)
df_train['Census_OSInstallLanguageIdentifier'].value_counts().plot(kind='bar')
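
One difference that is easy to check without drawing anything is the bar order on the pandas side. The sketch below uses a made-up series in place of `Census_OSInstallLanguageIdentifier`: `value_counts()` returns the counts sorted by frequency, and that is the order `.plot(kind='bar')` would draw them in.

```python
import pandas as pd

# Hypothetical values standing in for the language identifier column.
lang = pd.Series([9, 8, 9, 10, 9, 8], name="lang_id")

# Most frequent value first: this is the order of the bars.
counts = lang.value_counts()

# Sort by label instead if the bars should follow the category values.
by_label = counts.sort_index()

print(counts)
print(by_label)
```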

Original · 2018-12-18 10:46:47 · 2730 views · 0 comments
[P/M/K] simple way to downcast type to reduce memory

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 11 columns):
Season_Year    5 non-null int64
GameKey        5 non-null int64
PlayID         5 non-null int...
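
The post body is not visible here, so the following is only one plausible way to downcast, using `pd.to_numeric` with its `downcast` argument. The column names `Season_Year` and `GameKey` come from the excerpt; the values and the `Score` column are invented.

```python
import numpy as np
import pandas as pd

# Small frame with 64-bit columns (values are hypothetical).
df = pd.DataFrame({
    "Season_Year": np.array([2016, 2016, 2017, 2017, 2017], dtype=np.int64),
    "GameKey": np.array([5, 21, 29, 45, 54], dtype=np.int64),
    "Score": np.array([0.5, 1.5, 2.5, 3.5, 4.5], dtype=np.float64),
})
before = df.memory_usage(deep=True).sum()

# Shrink each integer column to the smallest integer type that fits.
for col in df.select_dtypes(include=np.integer).columns:
    df[col] = pd.to_numeric(df[col], downcast="integer")

# Shrink float columns to float32 where the values allow it.
for col in df.select_dtypes(include=np.floating).columns:
    df[col] = pd.to_numeric(df[col], downcast="float")

after = df.memory_usage(deep=True).sum()
print(df.dtypes)
print(before, "->", after, "bytes")
```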

Original · 2018-12-10 21:31:49 · 166 views · 0 comments
[P/M/K] select specific type from dataframe

float_cols = df_temp.select_dtypes(include=['float'])
int_cols = df_temp.select_dtypes(include=['int'])

Original · 2018-12-10 21:29:12 · 183 views · 0 comments
[P/M/K] random in pandas.dataframe --sample

A simple way:
print(train['id'].sample(1))
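
A short sketch of the same call with a few common variations. The `train` frame is a made-up stand-in, and `random_state` is added only so the draw can be repeated.

```python
import pandas as pd

# Hypothetical stand-in for `train`.
train = pd.DataFrame({"id": [101, 102, 103, 104, 105]})

# One random id, as in the post.
one = train["id"].sample(1)

# Two ids, reproducible thanks to the fixed seed.
repeatable = train["id"].sample(n=2, random_state=0)

# A random 40% of the rows of the whole frame.
fraction = train.sample(frac=0.4, random_state=0)

print(one)
```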

Original · 2018-11-05 15:39:33 · 189 views · 0 comments
[P/M] One-hot encoding is BAD for Boosting

One-hot encoding is not required for tree-based models such as random forests and boosting. I would go further: one-hot encoded categorical variables do not help boosting, they hurt it. The main idea is that decision-tree based models have wa...

Original · 2018-10-24 14:06:45 · 114 views · 0 comments
[P/M/D] How to change order of a dataframe

The best way to do it:
order = ['date', 'time', 'open', 'high', 'low', 'close', 'volumefrom', 'volumeto']
df = df[order]
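
A runnable version of the snippet above, with a one-row frame invented so the effect is visible. Indexing with a list of names returns the columns in exactly that order.

```python
import pandas as pd

# Hypothetical frame whose columns arrive in an inconvenient order.
df = pd.DataFrame({
    "close": [10.5], "volumeto": [300.0], "date": ["2018-10-23"],
    "open": [10.0], "time": ["11:05"], "low": [9.8],
    "volumefrom": [30.0], "high": [10.9],
})

order = ['date', 'time', 'open', 'high', 'low', 'close', 'volumefrom', 'volumeto']
df = df[order]

print(list(df.columns))
```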

Original · 2018-10-23 11:05:33 · 127 views · 0 comments
[P/M/T] Select dataframe by multiple conditions

It’s easy to select part of a dataframe by one condition, like below:
pos = df_train[df_train['Date']>0]
But when you try to add conditions like this:
pos = df_train[df_train['Date']>0 and...
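
The excerpt stops before the fix, so this is a sketch of the usual pandas approach rather than the author's exact code. Python's `and` raises a ValueError on a Series because the truth value of a Series is ambiguous; the element-wise operators `&` and `|` work, provided each condition is wrapped in parentheses. The `Value` column and the second condition are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for `df_train`.
df_train = pd.DataFrame({"Date": [-1, 0, 3, 7], "Value": [10, 20, 30, 40]})

# Single condition, as in the post.
pos = df_train[df_train["Date"] > 0]

# Both conditions must hold: use & and parenthesize each comparison.
both = df_train[(df_train["Date"] > 0) & (df_train["Value"] > 30)]

# Either condition may hold: use |.
either = df_train[(df_train["Date"] > 0) | (df_train["Value"] < 15)]

print(both)
```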

Original · 2018-10-10 17:00:59 · 286 views · 0 comments
[P/M/K] Merge different dataframes

It is really confusing when one dataset comes as several related dataframes. Now I know how to merge them together:
train = train.set_in...
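
The author's code is truncated, so the following is only a guess at the general shape: index both frames on a shared key and join them, or equivalently call `pd.merge`. The frames, the key `item_id`, and the other column names are all invented for illustration.

```python
import pandas as pd

# Two hypothetical related tables sharing the key `item_id`.
train = pd.DataFrame({"item_id": [1, 2, 3], "sales": [5, 7, 9]})
items = pd.DataFrame({"item_id": [1, 2, 3], "category": ["a", "b", "a"]})

# Option 1: put the key in the index of both frames, then join on it.
joined = (
    train.set_index("item_id")
    .join(items.set_index("item_id"))
    .reset_index()
)

# Option 2: merge directly on the key column.
merged = pd.merge(train, items, on="item_id", how="left")

print(joined)
```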

Original · 2018-10-16 18:52:41 · 114 views · 0 comments
[P/M/K] Groupby

This comes up so often that I have to write it down.
Dataframe before:
   date        date_block_num  shop_id  item_id  item_price  item_cnt_day
0  2013-01-02  0               59       22154    999.00      1.0
1  2013-01-03  0               25       2552     8...
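
The aggregation the author applied is not visible in the excerpt. As an assumption, the sketch below sums `item_cnt_day` per month block, shop, and item, which is a common step with this column layout. The rows and the output name `item_cnt_month` are hypothetical.

```python
import pandas as pd

# Hypothetical rows using the column names from the excerpt.
sales = pd.DataFrame({
    "date_block_num": [0, 0, 0, 1],
    "shop_id": [59, 59, 25, 25],
    "item_id": [22154, 22154, 2552, 2552],
    "item_cnt_day": [1.0, 2.0, 1.0, 3.0],
})

# Sum the daily counts within each (month block, shop, item) group.
monthly = (
    sales.groupby(["date_block_num", "shop_id", "item_id"], as_index=False)["item_cnt_day"]
    .sum()
    .rename(columns={"item_cnt_day": "item_cnt_month"})
)

print(monthly)
```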

Original · 2018-10-16 13:47:47 · 198 views · 0 comments
[P/M/T] Timedelta to int

This is the only way that worked for me:
Y = (Y / np.timedelta64(1, 'D')).astype(int)
[1]: https://blog.youkuaiyun.com/xu200yang/article/details/70460592
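
A small runnable sketch of the conversion, with made-up dates. Dividing a timedelta series by one day gives a float number of days, which is then cast to int. For whole days, the `.dt.days` accessor gives the same result.

```python
import numpy as np
import pandas as pd

# Hypothetical start and end dates.
start = pd.to_datetime(pd.Series(["2018-10-01", "2018-10-10"]))
end = pd.to_datetime(pd.Series(["2018-10-14", "2018-10-11"]))

# Subtracting datetimes yields a timedelta series.
Y = end - start

# The approach from the post: divide by one day, then cast to int.
days = (Y / np.timedelta64(1, "D")).astype(int)

# Alternative for whole days.
same = Y.dt.days

print(days.tolist())
```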

Original · 2018-10-14 12:03:50 · 158 views · 0 comments
[P/M/K] How to see missing data percentage

See it as text:
percent = (100 * train_df.isnull().sum() / train_df.shape[0]).sort_values(ascending=False)
percent[:10]
trafficSource.adContent ...
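
The same expression on a tiny invented frame, so the numbers can be checked by hand. Taking the mean of the boolean mask is an equivalent shortcut.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `train_df`: 4 rows, with 3, 1 and 0 missing values.
train_df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, np.nan],
    "b": [1.0, 2.0, np.nan, 4.0],
    "c": [1.0, 2.0, 3.0, 4.0],
})

# Percentage of missing values per column, largest first.
percent = (100 * train_df.isnull().sum() / train_df.shape[0]).sort_values(ascending=False)

# Equivalent shortcut: the mean of a boolean mask is the fraction of True values.
shortcut = (100 * train_df.isnull().mean()).sort_values(ascending=False)

print(percent[:10])
```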

Original · 2018-09-25 11:57:44 · 134 views · 0 comments
[M/K] Scaling affects regression and decision trees differently

Scaling is an important preprocessing step: it helps eliminate the bias caused by variables with different scales. It works in SVM o...

Original · 2018-09-24 18:55:57 · 122 views · 0 comments
[P/M/K] sklearn.preprocessing.LabelEncoder() & pandas.factorize

I usually use
data.loc[:, "MSZoning"] = pd.factorize(data.MSZoning)[0]
In effect this does the same job as LabelEncoder: each category is mapped to an integer code. The ...
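
One detail worth knowing when comparing the two: `pd.factorize` numbers categories in order of first appearance by default, while LabelEncoder numbers its classes in sorted order. The sketch below stays on the pandas side and shows that `sort=True` produces sorted-order codes. The sample values are hypothetical.

```python
import pandas as pd

# Hypothetical values for a column like MSZoning.
zoning = pd.Series(["RL", "RM", "C", "RL", "FV"])

# Default: codes follow the order in which categories first appear.
codes, uniques = pd.factorize(zoning)

# sort=True: codes follow the sorted order of the categories.
sorted_codes, sorted_uniques = pd.factorize(zoning, sort=True)

print(list(codes), list(uniques))
print(list(sorted_codes), list(sorted_uniques))
```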

Original · 2018-09-23 11:54:06 · 483 views · 0 comments
[P/M/K] How to check NaN in df swiftly?

To check for NaN in a dataframe:
train.isnull().sum()
Id        0
v2a1      6860
hacdor    0
rooms     0
hacapo    ...
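
A small sketch built around the same call, on an invented frame that reuses a few column names from the output above. It also shows a yes/no check for the whole frame and a way to list only the affected columns.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `train`.
train = pd.DataFrame({
    "Id": [1, 2, 3],
    "v2a1": [np.nan, np.nan, 100.0],
    "rooms": [3, 4, 5],
})

# NaN count per column, as in the post.
nan_counts = train.isnull().sum()

# Single yes/no answer for the whole frame.
has_nan = train.isnull().values.any()

# Names of the columns that contain at least one NaN.
cols_with_nan = nan_counts[nan_counts > 0].index.tolist()

print(nan_counts)
```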

Original · 2018-09-05 11:55:24 · 136 views · 0 comments
[P/M/K] copy() & deepcopy()

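When we copy a column of a dataframe to work on it, we usually want a deep copy, that is, a new object independent of the original. In pandas, .copy() already does this by default (deep=True), and there is no .deepcopy() method on a DataFrame or Series. A plain assignment, by contrast, makes no copy and remains synchronized with...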
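
A minimal sketch of the difference between a second name for the same object and an independent copy. The series is made up for illustration.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# Plain assignment: `alias` is just another name for the same object.
alias = s

# .copy() makes an independent copy (deep=True is the default).
independent = s.copy()

# Writing through the alias changes `s`; the copy is unaffected.
alias.iloc[0] = 99

print(s.iloc[0], independent.iloc[0])
```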

Original · 2018-09-05 11:12:37 · 108 views · 0 comments
[P/M/K] Two ways to transform between types: .loc or mapping

When we have to transform a column from one type to another, here are two ways:
1. all_data.loc[all_data["edjefe"]=="yes","edjefe"...
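
The excerpt is cut off before the assigned value, so the sketch below assumes "yes" becomes 1 and "no" becomes 0, and that every other entry is already a number stored as text. The sample rows are hypothetical. Replacements are written as strings first and converted to int at the end, so the column never holds mixed types.

```python
import pandas as pd

# Hypothetical stand-in for `all_data`.
all_data = pd.DataFrame({"edjefe": ["yes", "no", "4", "yes"]})

# Way 1: overwrite matching rows with .loc, then convert the column.
by_loc = all_data.copy()
by_loc.loc[by_loc["edjefe"] == "yes", "edjefe"] = "1"
by_loc.loc[by_loc["edjefe"] == "no", "edjefe"] = "0"
by_loc["edjefe"] = by_loc["edjefe"].astype(int)

# Way 2: a mapping applied to every value; unmapped values pass through unchanged.
mapping = {"yes": "1", "no": "0"}
by_map = all_data.copy()
by_map["edjefe"] = by_map["edjefe"].map(lambda v: mapping.get(v, v)).astype(int)

print(by_loc["edjefe"].tolist(), by_map["edjefe"].tolist())
```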

Original · 2018-09-05 10:37:14 · 136 views · 0 comments
[P/M/K] measure error between 'y' and 'pred'

import numpy as np
from sklearn.metrics import mean_squared_error

def rmse(y, y_pred):
    return np.sqrt(mean_squared_error(y, y_pred))
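
For checking the result by hand, here is a numpy-only sketch of the same quantity: the square root of the mean squared difference. It is an illustrative equivalent, not the author's code, and the sample numbers are made up.

```python
import numpy as np

def rmse(y, y_pred):
    """Root mean squared error between two equally long sequences."""
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y - y_pred) ** 2)))

# Every prediction is off by exactly 1, so the RMSE is 1.
score = rmse([2, 4, 6, 8], [1, 3, 5, 7])
print(score)
```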

Original · 2019-07-13 15:55:59 · 161 views · 0 comments
