Original [P/M/K] Measure error between 'y' and 'pred'

def rmse(y, y_pred): return np.sqrt(mean_squared_error(y, y_pred))

2019-07-13 15:55:59 186
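The `rmse` helper in the entry above relies on `mean_squared_error` from scikit-learn. As a rough sketch, assuming only NumPy is available, the same quantity can be computed directly; the sample values below are hypothetical.

```python
import numpy as np


def rmse(y, y_pred):
    """Root mean squared error, computed with NumPy only."""
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y - y_pred) ** 2)))


y_true = [3.0, 5.0, 7.0]     # hypothetical targets
y_hat = [3.0, 5.0, 10.0]     # hypothetical predictions
score = rmse(y_true, y_hat)  # sqrt((0 + 0 + 9) / 3) = sqrt(3)
```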

Original [P/M/K] sns.countplot & df.value_counts().plot(kind='bar')

Difference between seaborn and matplotlib.pyplot:
sns.countplot(x=df_train.Census_OSInstallLanguageIdentifier, ax=axis1)
df_train['Census_OSInstallLanguageIdentifier'].value_counts().plot(kind='bar')

2018-12-18 10:46:47 2769
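One difference that shows up on the pandas side of the entry above is bar order: `value_counts()` sorts by frequency, so that is the order `.plot(kind='bar')` draws. A minimal sketch, using a hypothetical `lang_id` series and no plotting, so it only shows the data the bar chart would receive:

```python
import pandas as pd

# Hypothetical stand-in for Census_OSInstallLanguageIdentifier.
lang_id = pd.Series([8, 8, 8, 9, 9, 1], name="lang_id")

# value_counts() sorts by count, descending: this is the bar order pandas plots.
by_frequency = lang_id.value_counts()

# sort_index() reorders the same counts by label instead.
by_label = by_frequency.sort_index()
```

Calling `by_label.plot(kind='bar')` would then draw the bars in label order rather than frequency order.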

Original [P/M/K] Simple way to downcast types to reduce memory

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 11 columns):
Season_Year 5 non-null int64
GameKey 5 non-null int64
PlayID 5 non-null int...

2018-12-10 21:31:49 198
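The body of the entry above is truncated, so the author's exact method is not visible. One common way to downcast is `pd.to_numeric` with its `downcast` argument; the sketch below assumes that approach, and the `Speed` column and all values are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Season_Year": np.array([2016, 2017, 2017], dtype="int64"),
    "GameKey": np.array([5, 21, 29], dtype="int64"),
    "Speed": np.array([1.5, 2.25, 3.0], dtype="float64"),  # hypothetical column
})
before = df.memory_usage(index=False).sum()

small = df.copy()
# Shrink each integer column to the smallest integer type that holds its values.
for col in small.select_dtypes(include=[np.integer]).columns:
    small[col] = pd.to_numeric(small[col], downcast="integer")
# Shrink float columns to float32 where the values allow it.
for col in small.select_dtypes(include=[np.floating]).columns:
    small[col] = pd.to_numeric(small[col], downcast="float")

after = small.memory_usage(index=False).sum()
```

Comparing `before` and `after`, or calling `small.info()`, shows how much memory the narrower dtypes save.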

Original [P/M/K] Select specific types from a dataframe

float_cols = df_temp.select_dtypes(include=['float'])
int_cols = df_temp.select_dtypes(include=['int'])

2018-12-10 21:29:12 204
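A small sketch of the selection in the entry above, using a hypothetical `df_temp`. It passes the NumPy abstract types `np.floating` and `np.integer`, which match every float or integer width, as an alternative to the string aliases.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one float, one integer and one text column.
df_temp = pd.DataFrame({
    "price": [1.5, 2.0],
    "count": [3, 4],
    "name": ["a", "b"],
})

float_cols = df_temp.select_dtypes(include=[np.floating])  # any float width
int_cols = df_temp.select_dtypes(include=[np.integer])     # any integer width
numeric_cols = df_temp.select_dtypes(include="number")     # floats and ints together
```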

Original [P/M/K] Random rows in a pandas DataFrame: sample

Simple way: print(train['id'].sample(1))

2018-11-05 15:39:33 207
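Beyond drawing a single value as in the entry above, `sample` can also take a fraction and a seed. A minimal sketch with a hypothetical `train` frame:

```python
import pandas as pd

# Hypothetical training frame.
train = pd.DataFrame({"id": range(100), "value": range(100, 200)})

# One random id; random_state makes the draw reproducible.
one_id = train["id"].sample(1, random_state=0)

# A random 10% of the rows.
ten_percent = train.sample(frac=0.1, random_state=0)

# frac=1.0 returns every row in random order, i.e. a shuffle.
shuffled = train.sample(frac=1.0, random_state=0).reset_index(drop=True)
```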

Original [P/M] One-hot encoding is BAD for Boosting

One-hot encoding is not required for tree-based models such as random forests and boosting. I would go further: one-hot encoded categorical variables do not help boosting, they hurt it. The main idea is that decision-tree-based models have wa...

2018-10-24 14:06:45 130
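The reasoning in the entry above is cut off, so the sketch below does not try to reproduce it. It only contrasts the two encodings being discussed: one-hot columns versus a single integer-coded column. The `city` data is hypothetical.

```python
import pandas as pd

# Hypothetical categorical column.
df = pd.DataFrame({"city": ["tokyo", "paris", "tokyo", "lima"]})

# One-hot: one indicator column per category (3 columns here).
one_hot = pd.get_dummies(df["city"], prefix="city")

# Integer codes: a single column, categories numbered in sorted order.
codes = df["city"].astype("category").cat.codes
```

With many categories, the one-hot frame grows by one sparse column per category, while the integer-coded version stays a single column that a tree can split on repeatedly.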

Original [P/M/D] How to change the column order of a dataframe

Best way to do it:
order = ['date', 'time', 'open', 'high', 'low', 'close', 'volumefrom', 'volumeto']
df = df[order]

2018-10-23 11:05:33 161
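The reordering in the entry above, run on a small hypothetical frame with a subset of those column names. The second part is an optional variant for moving one column to the front without spelling out the full list.

```python
import pandas as pd

# Hypothetical frame whose columns start out in an inconvenient order.
df = pd.DataFrame({
    "close": [3.0],
    "open": [1.0],
    "date": ["2018-10-23"],
    "high": [4.0],
})

# Indexing with a list of names returns the columns in that order.
order = ["date", "open", "high", "close"]
df = df[order]

# Variant: move chosen columns to the front and keep the rest as they are.
front = ["close"]
close_first = df[front + [c for c in df.columns if c not in front]]
```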

Original [P/M/K] Merge different dataframes

It's really confusing when one dataset provides several related dataframes. Now I know how to merge them together. train = train.set_in...

2018-10-16 18:52:41 144
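The code in the entry above is truncated, so the author's exact steps are unknown. As one possible approach, related frames can be combined with `DataFrame.merge` on a shared key; the frames and the `item_id` key below are hypothetical.

```python
import pandas as pd

# Hypothetical related frames sharing an item_id key.
train = pd.DataFrame({"item_id": [1, 2, 3], "sales": [10, 20, 30]})
items = pd.DataFrame({"item_id": [1, 2], "category": ["a", "b"]})

# A left merge keeps every row of train; items without a match get NaN.
merged = train.merge(items, on="item_id", how="left")
```

Using `how="left"` keeps the row count of `train` unchanged, which makes it easy to check that the merge did not drop or duplicate training rows.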

Original [P/M/K] Groupby

This comes up so often that I have to write it down.
Dataframe before:
  date date_block_num shop_id item_id item_price item_cnt_day
0 2013-01-02 0 59 22154 999.00 1.0
1 2013-01-03 0 25 2552 8...

2018-10-16 13:47:47 224
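The aggregation the entry above goes on to perform is not visible in the snippet. A typical one for data shaped like this is a per-month total per shop and item; the sketch below assumes that goal, and the rows and the `item_cnt_month` name are hypothetical.

```python
import pandas as pd

# Hypothetical daily sales rows using the column names from the snippet.
sales = pd.DataFrame({
    "date_block_num": [0, 0, 0, 1],
    "shop_id": [59, 59, 25, 25],
    "item_id": [22154, 22154, 2552, 2552],
    "item_cnt_day": [1.0, 2.0, 1.0, 4.0],
})

# Sum daily counts per (month, shop, item); as_index=False keeps the keys as columns.
monthly = (
    sales.groupby(["date_block_num", "shop_id", "item_id"], as_index=False)["item_cnt_day"]
    .sum()
    .rename(columns={"item_cnt_day": "item_cnt_month"})
)
```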

Original [P/M/T] Timedelta to int

This is the only way that worked for me:
Y = (Y / np.timedelta64(1, 'D')).astype(int)
[1]: https://blog.youkuaiyun.com/xu200yang/article/details/70460592

2018-10-14 12:03:50 178
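The conversion in the entry above, run end to end on hypothetical dates. The `.dt.days` accessor is shown as an alternative that gives the whole-day component directly.

```python
import numpy as np
import pandas as pd

# Hypothetical start and end dates.
start = pd.to_datetime(pd.Series(["2018-10-01", "2018-10-10"]))
end = pd.to_datetime(pd.Series(["2018-10-14", "2018-10-12"]))

Y = end - start  # timedelta64 Series

# Divide by one day to get a float count of days, then cast to int.
days = (Y / np.timedelta64(1, "D")).astype(int)

# Alternative: the .dt accessor exposes the whole-day component.
days_alt = Y.dt.days
```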

Original [P/M/T] Select dataframe by multiple conditions

It's easy to select part of a dataframe by one condition, as below. pos = df_train[df_train['Date']>0] But when you try to add conditions like this: pos = df_train[df_train['Date']>0 and...

2018-10-10 17:00:59 308
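The rest of the entry above is truncated, but the usual pandas idiom for combining conditions is to use the element-wise operators `&` and `|` with each condition in parentheses, since Python's `and` cannot be applied to a whole Series. A minimal sketch with a hypothetical `Value` column:

```python
import pandas as pd

# Hypothetical frame; only the 'Date' column name comes from the snippet.
df_train = pd.DataFrame({"Date": [-1, 2, 5, 8], "Value": [10, 20, 30, 40]})

# Element-wise AND: each condition needs its own parentheses.
pos = df_train[(df_train["Date"] > 0) & (df_train["Value"] < 40)]

# Element-wise OR works the same way.
either = df_train[(df_train["Date"] <= 0) | (df_train["Value"] >= 40)]

# query() accepts the plain words 'and' / 'or' inside its expression string.
via_query = df_train.query("Date > 0 and Value < 40")
```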

Original [P/M/K] How to see missing data percentage

See it as text:
percent = (100 * train_df.isnull().sum() / train_df.shape[0]).sort_values(ascending=False)
percent[:10]
trafficSource.adContent ...

2018-09-25 11:57:44 151
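The expression in the entry above, applied to a small hypothetical `train_df` so the result can be checked by hand. The second line is an equivalent shortcut, since the mean of a boolean mask is the fraction of missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with 3, 1 and 0 missing values per column.
train_df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, np.nan],
    "b": [1.0, 2.0, np.nan, 4.0],
    "c": [1, 2, 3, 4],
})

# Missing count per column, as a percentage of the row count, largest first.
percent = (100 * train_df.isnull().sum() / train_df.shape[0]).sort_values(ascending=False)

# Equivalent shortcut: mean of the boolean mask.
percent_alt = (100 * train_df.isnull().mean()).sort_values(ascending=False)
```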

Original [M/K] Scaling affects regression and decision trees differently

Scaling is a common preprocessing step; it helps eliminate the bias caused by variables on different scales. It works in SVM o...

2018-09-24 18:55:57 141
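The argument in the entry above is cut off, so this is only an illustration of the general point, with hypothetical `income` and `age` features. Standardizing a feature keeps the order of its values, which is all a threshold split in a tree depends on, but it changes the distances that scale-sensitive models work with.

```python
import numpy as np

# Hypothetical features on very different scales.
income = np.array([20_000.0, 35_000.0, 50_000.0, 120_000.0])
age = np.array([25.0, 40.0, 31.0, 58.0])


def standardize(x):
    """Z-score scaling: zero mean, unit standard deviation."""
    return (x - x.mean()) / x.std()


income_z = standardize(income)
age_z = standardize(age)

# The ordering of samples is unchanged, so threshold splits see the same partitions.
same_order = np.array_equal(np.argsort(income), np.argsort(income_z))

# Euclidean distance between samples 0 and 1 before and after scaling.
raw_dist = np.hypot(income[0] - income[1], age[0] - age[1])
scaled_dist = np.hypot(income_z[0] - income_z[1], age_z[0] - age_z[1])
```

Before scaling, `raw_dist` is dominated almost entirely by `income`; after scaling, both features contribute on a comparable footing.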

Original [P/M/K] sklearn.preprocessing.LabelEncoder() & pandas.factorize

I usually use data.loc[:, "MSZoning"] = pd.factorize(data.MSZoning)[0]. It actually does essentially the same job as LabelEncoder. The ...

2018-09-23 11:54:06 519
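One detail worth checking when swapping the two encoders in the entry above: `pd.factorize` numbers categories in order of first appearance by default, whereas `LabelEncoder` numbers them in sorted order. The sketch below uses hypothetical values and `np.unique(..., return_inverse=True)` as a stand-in for the sorted encoding, so that it needs only NumPy and pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical zoning values.
data = pd.DataFrame({"MSZoning": ["RL", "RM", "C", "RL"]})

# Default factorize: codes follow order of first appearance (RL=0, RM=1, C=2).
codes_first_seen, uniques = pd.factorize(data["MSZoning"])

# sort=True: codes follow sorted category order (C=0, RL=1, RM=2).
codes_sorted, uniques_sorted = pd.factorize(data["MSZoning"], sort=True)

# Stand-in for a sorted label encoding, without importing scikit-learn.
classes, codes_np = np.unique(data["MSZoning"].to_numpy(), return_inverse=True)
```

Passing `sort=True` makes the integer codes line up with a sorted encoding; without it, the two differ whenever the data is not already in sorted order.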

Original [P/M/K] How to see correlation with many or few variables

Sometimes when making predictions, we need to look at the correlation between the target and the other variables. When there are not too many variables,...

2018-09-05 16:43:31 123
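The entry above is truncated before it shows its method. One simple way to handle both cases is to take the target's column of the correlation matrix and, when there are many variables, keep only the strongest few by absolute value. The frame and column names below are hypothetical.

```python
import pandas as pd

# Hypothetical frame: a target and three candidate features.
df = pd.DataFrame({
    "target": [1.0, 2.0, 3.0, 4.0, 5.0],
    "up": [2.0, 4.0, 6.0, 8.0, 10.0],
    "down": [5.0, 4.0, 3.0, 2.0, 1.0],
    "noisy": [1.0, 3.0, 2.0, 5.0, 4.0],
})

# Few variables: read the target's column of the full correlation matrix.
corr_with_target = df.corr()["target"].drop("target")

# Many variables: rank by absolute correlation and keep only the top k.
top = corr_with_target.abs().sort_values(ascending=False).head(2)
```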

Original [P/M/K] How to check NaN in df swiftly?

To check for NaN in a dataframe:
train.isnull().sum()
Id 0
v2a1 6860
hacdor 0
rooms 0
hacapo ...

2018-09-05 11:55:24 153
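When the frame has many columns, the full `isnull().sum()` listing from the entry above gets long. A small sketch, with hypothetical values, of a yes/no check and of filtering the listing down to the columns that actually contain NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the column names are borrowed from the snippet.
train = pd.DataFrame({
    "Id": [1, 2, 3],
    "v2a1": [np.nan, 100.0, np.nan],
    "rooms": [3, 4, 5],
})

# Quick yes/no: is there any NaN anywhere in the frame?
has_nan = bool(train.isnull().values.any())

# Per-column counts, keeping only the columns that have missing values.
nan_counts = train.isnull().sum()
only_missing = nan_counts[nan_counts > 0]
```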

Original [P/M/K] copy() & deepcopy()

When we copy a column out of a dataframe for separate use, we usually want a deep copy, so that the copy is an independent new object. As for .copy(), it remains synchronization with...

2018-09-05 11:12:37 128
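The entry above is truncated, so it is not clear which `.copy()` it refers to. For reference, in the standard library `copy.copy` is shallow and `copy.deepcopy` is recursive, while the pandas method `.copy()` defaults to `deep=True`. The sketch below, with hypothetical data, shows both behaviours.

```python
import copy

import pandas as pd

# Standard library: a shallow copy shares the inner lists, a deep copy does not.
nested = [[1, 2], [3, 4]]
shallow = copy.copy(nested)
deep = copy.deepcopy(nested)
nested[0][0] = 99  # visible through `shallow`, not through `deep`

# pandas: Series.copy() defaults to deep=True, so edits do not reach the frame.
df = pd.DataFrame({"a": [1, 2, 3]})
col = df["a"].copy()
col.iloc[0] = 100  # df["a"] keeps its original value
```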

Original [P/M/K] Two ways to convert between types: .loc or mapping

When we have to transform a column from one type to another, here are two ways: 1 all_data.loc[all_data["edjefe"]=="yes","edjefe"...

2018-09-05 10:37:14 153
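The entry above is cut off before it shows what the "yes" values are replaced with. The sketch below assumes, purely for illustration, that "yes" becomes 1 and "no" becomes 0, and shows both the `.loc` route and a mapping route on hypothetical data.

```python
import pandas as pd

# Hypothetical column mixing flags and numeric strings, stored as object dtype.
all_data = pd.DataFrame({"edjefe": pd.Series(["yes", "no", "5", "yes"], dtype=object)})

# Way 1: boolean masks with .loc, one assignment per value to replace.
via_loc = all_data.copy()
via_loc.loc[via_loc["edjefe"] == "yes", "edjefe"] = 1  # assumed replacement value
via_loc.loc[via_loc["edjefe"] == "no", "edjefe"] = 0   # assumed replacement value
via_loc["edjefe"] = via_loc["edjefe"].astype(int)

# Way 2: a mapping, leaving values that are not in the mapping untouched.
mapping = {"yes": 1, "no": 0}  # assumed replacement values
via_map = all_data["edjefe"].map(lambda v: mapping.get(v, v)).astype(int)
```

The mapping route scales better when there are many values to replace, since each new value is one more dictionary entry rather than one more `.loc` line.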

Original [P/M/K] How to select specific columns

When a dataframe contains a variety of types, df.select_dtypes() is helpful. train.select_dtypes(np.int64) for i, col in enumerate(train.select_dtypes('float')...

2018-09-05 10:30:41 165
