
P/M/K
cozythecool
[P/M/K] How to select specific columns
When a dataframe holds columns of several types, DataFrame.select_dtypes() is helpful.
train.select_dtypes(np.int64)
for i, col in enumerate(train.select_dtypes('float')...
Original · 2018-09-05 10:30:41 · 146 views · 0 comments
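The select_dtypes() calls in this excerpt can be tried on a small frame. This is only a sketch: the toy `train` frame and its column names are made up for illustration, and the loop body is an assumption because the original loop is cut off.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for `train`
train = pd.DataFrame({
    "id": np.array([1, 2, 3], dtype=np.int64),
    "price": [9.5, 3.0, 7.25],
    "city": ["a", "b", "c"],
})

int_part = train.select_dtypes(include=[np.int64])      # int64 columns only
float_part = train.select_dtypes(include=["float"])     # float64 columns
numeric_part = train.select_dtypes(include=["number"])  # every numeric column
non_numeric = train.select_dtypes(exclude=["number"])   # everything else

# Assumed loop body: print the position, name and mean of each float column
for i, col in enumerate(float_part.columns):
    print(i, col, float_part[col].mean())
```

Passing `exclude=` instead of `include=` is often the shorter way to grab "everything that is not numeric".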
[P/M/K] sns.countplot & df.value_counts().plot(kind='bar')
Difference between sns and matplotlib.pyplot:
sns.countplot(x=df_train.Census_OSInstallLanguageIdentifier, ax=axis1)
df_train['Census_OSInstallLanguageIdentifier'].value_counts().plot(kind='bar')
Original · 2018-12-18 10:46:47 · 2730 views · 0 comments
[P/M/K] simple way to downcast type to reduce memory
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 11 columns):
Season_Year    5 non-null int64
GameKey        5 non-null int64
PlayID         5 non-null int...
Original · 2018-12-10 21:31:49 · 166 views · 0 comments
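The excerpt is cut off before the downcasting step itself, so the following is not the post's method, only one common way to do it: pd.to_numeric with the downcast argument. The values and the `Speed` column are invented; `Season_Year` and `GameKey` are borrowed from the info() output.

```python
import numpy as np
import pandas as pd

# Hypothetical data; only two of the column names come from the excerpt
df = pd.DataFrame({
    "Season_Year": np.array([2016, 2016, 2017, 2017, 2017], dtype=np.int64),
    "GameKey": np.array([5, 21, 29, 45, 54], dtype=np.int64),
    "Speed": np.array([1.5, 2.25, 0.0, 3.75, 4.0], dtype=np.float64),
})

before = df.memory_usage(deep=True).sum()

# Shrink every integer column to the smallest integer type that fits its values
for col in df.select_dtypes(include=[np.integer]).columns:
    df[col] = pd.to_numeric(df[col], downcast="integer")

# Shrink float columns to float32 where possible
for col in df.select_dtypes(include=[np.floating]).columns:
    df[col] = pd.to_numeric(df[col], downcast="float")

after = df.memory_usage(deep=True).sum()
print(before, "->", after, "bytes")
```

Downcasting floats to float32 loses precision for values that float32 cannot represent exactly, so it is worth checking whether that matters for the columns involved.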
[P/M/K] select specific type from dataframe
float_cols = df_temp.select_dtypes(include=['float'])
int_cols = df_temp.select_dtypes(include=['int'])
Original · 2018-12-10 21:29:12 · 183 views · 0 comments
[P/M/K] random in pandas.dataframe --sample
A simple way:
print(train['id'].sample(1))
Original · 2018-11-05 15:39:33 · 189 views · 0 comments
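A slightly fuller sketch of sample(), assuming a made-up `train` frame with an `id` column. The `random_state` and `frac` arguments are standard pandas parameters, not something the post mentions.

```python
import pandas as pd

# Hypothetical frame with 100 ids
train = pd.DataFrame({"id": range(100)})

one = train["id"].sample(1, random_state=0)       # one random id, reproducible
ten_pct = train.sample(frac=0.1, random_state=0)  # 10% of the rows

print(one)
print(len(ten_pct))
```

Fixing `random_state` makes the draw repeatable, which helps when a notebook is rerun.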
[P/M] One-hot encoding is BAD for Boosting
One-hot encoding is not required for tree models such as random forests and boosting. I would go further: one-hot encoded categorical variables do not benefit boosting and can do the opposite. The main idea is that decision-tree-based models have wa...
Original · 2018-10-24 14:06:45 · 114 views · 0 comments
[P/M/D] How to change order of a dataframe
The best way:
order = ['date', 'time', 'open', 'high', 'low', 'close', 'volumefrom', 'volumeto']
df = df[order]
Original · 2018-10-23 11:05:33 · 127 views · 0 comments
[P/M/T] Select dataframe by multiple conditions
It's easy to select part of a dataframe by one condition, as below.
pos = df_train[df_train['Date']>0]
But when you try to add conditions like this
pos = df_train[df_train['Date']>0 and...
Original · 2018-10-10 17:00:59 · 286 views · 0 comments
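The excerpt stops right at the `and`, so how the post resolves it is not visible here. A common way to combine conditions in pandas is to use the element-wise operators `&`, `|` and `~` with each condition in parentheses. A sketch, with a made-up `Sales` column:

```python
import pandas as pd

# Hypothetical data; 'Sales' is not from the original post
df_train = pd.DataFrame({"Date": [-1, 2, 5, 8], "Sales": [10, 0, 7, 3]})

# Python's `and` asks for the truth value of a whole Series, which is ambiguous
try:
    df_train[df_train["Date"] > 0 and df_train["Sales"] > 0]
    and_failed = False
except ValueError:
    and_failed = True

# Element-wise operators work; the parentheses are required
pos = df_train[(df_train["Date"] > 0) & (df_train["Sales"] > 0)]
either = df_train[(df_train["Date"] > 5) | (df_train["Sales"] > 9)]
not_pos = df_train[~(df_train["Date"] > 0)]
```

The parentheses matter because `&` binds more tightly than `>` in Python.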
[P/M/K] Merge different dataframes
Merging gets really confusing when one dataset provides several related dataframes. Now I know how to merge them together.
train = train.set_in...
Original · 2018-10-16 18:52:41 · 114 views · 0 comments
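The post's own code is truncated after `train.set_in`, so what follows is only a guess at the general idea, not a reconstruction. Two common patterns are merging on a shared key column and joining on a shared index. The tables and column names are invented.

```python
import pandas as pd

# Hypothetical tables sharing an 'item_id' key
train = pd.DataFrame({"item_id": [1, 2, 3], "item_cnt": [5, 0, 2]})
items = pd.DataFrame({"item_id": [1, 2, 4], "category": ["a", "b", "c"]})

# Pattern 1: merge on a key column, keeping every row of `train`
left = train.merge(items, on="item_id", how="left")

# Pattern 2: move the key into the index, then join on the index
by_index = train.set_index("item_id").join(items.set_index("item_id"), how="inner")

print(left)
print(by_index)
```

With `how="left"`, rows of `train` that have no match in `items` are kept and get NaN in the new columns; `how="inner"` drops them.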
[P/M/K] Groupby
It comes up so often that I have to write it down.
dataframe before:
   date        date_block_num  shop_id  item_id  item_price  item_cnt_day
0  2013-01-02  0               59       22154    999.00      1.0
1  2013-01-03  0               25       2552     8...
Original · 2018-10-16 13:47:47 · 198 views · 0 comments
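The "after" half of the post is not in the excerpt. One plausible aggregation on a table shaped like this is a per-month total per shop and item, sketched below. The column names are the ones shown in the excerpt, the rows are invented, and `item_cnt_month` is a hypothetical output name.

```python
import pandas as pd

# Invented rows using the excerpt's column names
sales = pd.DataFrame({
    "date_block_num": [0, 0, 0, 1],
    "shop_id": [59, 59, 25, 25],
    "item_id": [22154, 22154, 2552, 2552],
    "item_cnt_day": [1.0, 2.0, 1.0, 3.0],
})

# Sum daily counts within each (month, shop, item) group
monthly = (
    sales.groupby(["date_block_num", "shop_id", "item_id"], as_index=False)["item_cnt_day"]
    .sum()
    .rename(columns={"item_cnt_day": "item_cnt_month"})
)

print(monthly)
```

`as_index=False` keeps the group keys as ordinary columns, which is usually easier to merge back onto other tables.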
[P/M/T] Datedelta to int
This is the only way that worked for me.
Y = (Y / np.timedelta64(1, 'D')).astype(int)
[1]: https://blog.youkuaiyun.com/xu200yang/article/details/70460592
Original · 2018-10-14 12:03:50 · 158 views · 0 comments
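A small worked example of the division trick, assuming `Y` is a Series of timedeltas obtained by subtracting two datetime columns (the dates here are made up). The `.dt.days` accessor is shown as an alternative that the post does not mention.

```python
import numpy as np
import pandas as pd

# Hypothetical start and end dates
start = pd.to_datetime(pd.Series(["2018-01-01", "2018-01-01"]))
end = pd.to_datetime(pd.Series(["2018-01-03", "2018-02-01"]))

Y = end - start  # Series of timedeltas

# Divide by one day to get a float number of days, then cast to int
days_div = (Y / np.timedelta64(1, "D")).astype(int)

# Alternative: the whole-day component of each timedelta
days_dt = Y.dt.days

print(days_div.tolist(), days_dt.tolist())
```

For whole-day differences the two agree. With partial days, the division gives a fractional number that `astype(int)` truncates, so it is worth deciding which behaviour is wanted.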
[P/M/K] How to see missing data percentage
See it in text:
percent = (100 * train_df.isnull().sum() / train_df.shape[0]).sort_values(ascending=False)
percent[:10]
trafficSource.adContent ...
Original · 2018-09-25 11:57:44 · 134 views · 0 comments
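The same expression on a toy frame, so the output is easy to check by hand. The frame is invented; the `isnull().mean()` form is an equivalent shortcut, not something taken from the post.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'a' is 75% missing, 'b' is 25% missing, 'c' is complete
train_df = pd.DataFrame({
    "a": [1, np.nan, np.nan, np.nan],
    "b": [1, 2, np.nan, 4],
    "c": [1, 2, 3, 4],
})

# Expression from the post: missing count per column as a percentage of rows
percent = (100 * train_df.isnull().sum() / train_df.shape[0]).sort_values(ascending=False)

# Equivalent shortcut: the mean of a boolean mask is the fraction of True values
percent_alt = (train_df.isnull().mean() * 100).sort_values(ascending=False)

print(percent[:10])
```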
[M/K] Scaling affects regression and decision trees differently
Scaling is a necessary preprocessing step; it helps eliminate the bias caused by variables with different scales. It works in SVM o...
Original · 2018-09-24 18:55:57 · 122 views · 0 comments
[P/M/K] sklearn.preprocessing.LabelEncoder() & pandas.factorize
I am used to using
data.loc[:, "MSZoning"] = pd.factorize(data.MSZoning)[0]
Actually, what it does is essentially the same as LabelEncoder. The ...
Original · 2018-09-23 11:54:06 · 483 views · 0 comments
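One detail worth checking when swapping between the two: by default pd.factorize numbers categories in order of first appearance, while LabelEncoder numbers its classes in sorted order. Passing sort=True to factorize gives sorted codes. The sketch uses pandas only, and the sample values are made up.

```python
import pandas as pd

# Made-up sample of MSZoning-like values
zoning = pd.Series(["RL", "RM", "C", "RL", "FV"])

# Default: codes follow the order in which categories first appear
codes_seen, uniques_seen = pd.factorize(zoning)

# sort=True: codes follow the sorted order of the categories
codes_sorted, uniques_sorted = pd.factorize(zoning, sort=True)

print(codes_seen.tolist(), list(uniques_seen))
print(codes_sorted.tolist(), list(uniques_sorted))
```

Either numbering works for tree models. factorize also marks missing values with -1 by default, which is worth keeping in mind before feeding the codes to a model.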
[P/M/K] How to check NaN in df swiftly?
To check for NaN in a dataframe:
train.isnull().sum()
Id        0
v2a1      6860
hacdor    0
rooms     0
hacapo    ...
Original · 2018-09-05 11:55:24 · 136 views · 0 comments
[P/M/K] copy() & deepcopy()
When we copy one column of a dataframe to work with, we are usually talking about deepcopy(), which makes the copy a new, independent object. As for copy(), it remains synchronized with...
Original · 2018-09-05 11:12:37 · 108 views · 0 comments
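The excerpt is truncated, so it is not clear which copy() it has in mind. In pandas itself, DataFrame.copy() defaults to deep=True and returns an independent object, whereas plain assignment only creates a second name for the same frame. A minimal sketch of that difference, on an invented frame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

alias = df               # plain assignment: same object under another name
independent = df.copy()  # pandas default is deep=True: separate data

alias.loc[0, "a"] = 100        # visible through `df`, since it is the same object
independent.loc[1, "a"] = -1   # does not touch `df`

print(df["a"].tolist())
print(independent["a"].tolist())
```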
[P/M/K] 2 ways to transform between types: .loc or mapping
When we have to transform a column from one type to another, here are two ways:
1. all_data.loc[all_data["edjefe"]=="yes","edjefe"...
Original · 2018-09-05 10:37:14 · 136 views · 0 comments
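Both ways, sketched on a made-up column. The excerpt is cut off before showing the assigned values, so the codes "yes" -> 1 and "no" -> 0 are assumptions, as is the idea that the remaining values are numeric strings.

```python
import pandas as pd

# Hypothetical column mixing labels and numeric strings
all_data = pd.DataFrame({"edjefe": ["yes", "no", "6", "yes"]})

# Way 1: overwrite matching rows with .loc, then cast the whole column
by_loc = all_data.copy()
by_loc.loc[by_loc["edjefe"] == "yes", "edjefe"] = "1"  # assumed code for "yes"
by_loc.loc[by_loc["edjefe"] == "no", "edjefe"] = "0"   # assumed code for "no"
by_loc["edjefe"] = by_loc["edjefe"].astype(int)

# Way 2: map the labels with a dict, keep unmapped values, then cast
mapping = {"yes": "1", "no": "0"}  # assumed codes
by_map = all_data["edjefe"].map(mapping).fillna(all_data["edjefe"]).astype(int)

print(by_loc["edjefe"].tolist(), by_map.tolist())
```

map() returns NaN for values missing from the dict, which is why the sketch fills those back in from the original column before casting.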
[P/M/K] measure error between 'y' and 'pred'
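import numpy as np
from sklearn.metrics import mean_squared_error  # assumed source of mean_squared_error

def rmse(y, y_pred):
    return np.sqrt(mean_squared_error(y, y_pred))
Original · 2019-07-13 15:55:59 · 161 views · 0 comments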
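For a quick sanity check without scikit-learn, the same quantity can be computed with numpy alone. `rmse_np` is a hypothetical helper name and the numbers are made up.

```python
import numpy as np

def rmse_np(y, y_pred):
    """Root mean squared error: square root of the mean squared difference."""
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y - y_pred) ** 2)))

# Errors are 2, 0 and -2, so the mean squared error is 8/3
score = rmse_np([3.0, 5.0, 2.0], [1.0, 5.0, 4.0])
print(score)
```

Because the errors are squared before averaging, a few large misses raise RMSE more than many small ones.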