
Book-feature engineer for ML
Sarah ฅʕ•̫͡•ʔฅ
Never forget why you started
Chapter 2: Numeric Feature Engineering
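Dealing with counts: Binarization. Example: suppose we are building a music recommender that uses each user's "listen count" to recommend songs to other users. One weakness of this recommender is that the listen count does not necessarily reflect how much a user actually likes a song. For instance: user1 listens to song1 10 times, user2 listens to song2... Original · 2018-11-14 11:50:53 · 161 reads · 0 comments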
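A minimal sketch of the binarization idea named in this entry, assuming the usual rule that any listen count of at least 1 becomes 1 and a count of 0 stays 0. The function name `binarize_counts` and every value other than user1's 10 listens are hypothetical, because the excerpt is cut off before the rest of the example.

```python
def binarize_counts(listen_counts):
    """Map each (user, song) listen count to 1 if the song was played at all, else 0."""
    return {pair: int(count > 0) for pair, count in listen_counts.items()}


# Hypothetical data: only user1's count of 10 comes from the excerpt.
listen_counts = {
    ("user1", "song1"): 10,
    ("user2", "song2"): 1,
    ("user3", "song1"): 0,
}

binary = binarize_counts(listen_counts)
for pair, flag in binary.items():
    print(pair, flag)
```

After binarization, a user who played a song 10 times and one who played it once look identical, which is the point: the raw count is no longer treated as a measure of preference.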
Chapter 3: Feature Engineering for Text
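Bag of X: Turning natural text into flat vectors. Bag of words: bag of words is a simple way to represent a document. It builds a dictionary from all the words in the corpus, and a document can then be represented as the set of one-hot vectors of its words over that dictionary. Bag of words takes a sentence and simply... Original · 2018-11-14 19:31:43 · 212 reads · 0 comments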
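A rough sketch of the bag-of-words representation described in this entry, assuming lowercase whitespace tokenization and count vectors (the sum of each word's one-hot vector). The helper names `build_vocabulary` and `bag_of_words` and the two-sentence corpus are made up for illustration.

```python
from collections import Counter


def build_vocabulary(corpus):
    """Assign an index to every distinct word in the corpus (sorted for stability)."""
    words = sorted({word for doc in corpus for word in doc.lower().split()})
    return {word: index for index, word in enumerate(words)}


def bag_of_words(doc, vocab):
    """Return a count vector: entry i is how often vocabulary word i occurs in doc."""
    vector = [0] * len(vocab)
    for word, count in Counter(doc.lower().split()).items():
        if word in vocab:  # words outside the dictionary are ignored
            vector[vocab[word]] = count
    return vector


# Hypothetical two-document corpus.
corpus = ["the dog chased the cat", "the cat slept"]
vocab = build_vocabulary(corpus)
vectors = [bag_of_words(doc, vocab) for doc in corpus]
print(vocab)
print(vectors)
```

Word order is discarded: "the dog chased the cat" and "the cat chased the dog" would map to the same vector.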
Chapter 4: The Effects of Feature Scaling: From Bag-of-Words to Tf-Idf
Tf-Idf, key point 1: bow(w, d) = number of times word w appears in document d; tf-idf(w, d) = bow(w, d) * log(N / number of documents in which word w appears), where N is the total number of documents in the dataset. When a word is a common word, N /... Original · 2018-11-14 22:36:02 · 87 reads · 0 comments
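The two formulas in this entry can be sketched directly, assuming whitespace tokenization and the natural logarithm (the excerpt does not state a base). The function name `tf_idf` and the three-document corpus are illustrative only.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return one {word: tf-idf score} dict per document, following
    tf-idf(w, d) = bow(w, d) * log(N / number of documents containing w)."""
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)
    # Document frequency: count each word once per document it appears in.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    scores = []
    for tokens in tokenized:
        bow = Counter(tokens)
        scores.append(
            {word: count * math.log(n_docs / doc_freq[word]) for word, count in bow.items()}
        )
    return scores


# Hypothetical corpus: "the" occurs in every document, "slept" in only one.
corpus = ["the dog chased the cat", "the cat slept", "the dog barked"]
scores = tf_idf(corpus)
for doc_scores in scores:
    print(doc_scores)
```

A word that appears in every document has N / document frequency = 1, so its logarithm, and therefore its tf-idf score, is 0 no matter how often it occurs.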
Chapter 5: Categorical Variables: Counting Eggs in the Age of Robotic Chickens
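Encoding categorical variables: three encoding schemes illustrated by example: one-hot, dummy coding, and effect coding. One-hot vector: given k categories, the feature vector for each category has k dimensions. In the example above, e1 + e2 + e3 = 1, so the 3 features are linearly dependent; therefore, with one-h... Original · 2018-11-16 19:05:03 · 407 reads · 0 comments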
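A small sketch contrasting the three encodings named in this entry. The category labels and the choice of the first category as the reference are assumptions made for illustration. Dummy coding is taken here as one-hot with the reference column dropped, and effect coding as dummy coding with the reference category encoded as all -1.

```python
def one_hot(categories, value):
    """k features for k categories; exactly one of them is 1."""
    return [1 if value == category else 0 for category in categories]


def dummy_code(categories, value):
    """k - 1 features; the reference category (categories[0]) is all zeros."""
    return one_hot(categories, value)[1:]


def effect_code(categories, value):
    """k - 1 features; the reference category (categories[0]) is all -1."""
    if value == categories[0]:
        return [-1] * (len(categories) - 1)
    return dummy_code(categories, value)


# Hypothetical categories; "a" serves as the reference category.
categories = ["a", "b", "c"]
for value in categories:
    print(value, one_hot(categories, value), dummy_code(categories, value), effect_code(categories, value))
```

The one-hot features always sum to 1, which is the linear dependence the entry points out; dropping one column, as dummy and effect coding do, removes it.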
Chapter 6: Dimensionality Reduction: Squashing the Data Pancake with PCA
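How PCA works: PCA. Notes on applying PCA: 1) PCA mainly addresses linear dependency between features; 2) the core idea of PCA is to maximize the variance of the data points in the new feature space; 3) before applying PCA, center the data first (subtract the mean of each feature). Hyperparameter k (principal compon... Original · 2018-11-16 21:53:36 · 128 reads · 0 comments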
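To illustrate notes 2) and 3), here is a minimal pure-Python sketch for the 2-D case only. It centers the data, builds the 2x2 covariance matrix, and uses the closed-form angle of its major axis to get the direction of maximum variance. The function name and the sample points are hypothetical; a real pipeline would normally use a linear algebra library instead.

```python
import math


def first_principal_component(points):
    """Return (unit direction, variance along it) for a list of 2-D points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Note 3): center the data before anything else.
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    # Entries of the (population) covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / n
    syy = sum(y * y for _, y in centered) / n
    sxy = sum(x * y for x, y in centered) / n
    # Note 2): the angle that maximizes the projected variance of a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    direction = (math.cos(theta), math.sin(theta))
    variance = (
        sxx * direction[0] ** 2
        + 2 * sxy * direction[0] * direction[1]
        + syy * direction[1] ** 2
    )
    return direction, variance


# Hypothetical points lying exactly on the line y = x + 5.
points = [(0, 5), (1, 6), (2, 7), (3, 8)]
direction, variance = first_principal_component(points)
print(direction, variance)
```

Because the two features here are perfectly linearly dependent, a single component along the diagonal captures all of the variance, which matches note 1).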
Chapter 7: Nonlinear Featurization via K-Means Model Stacking
Confused: If the data is distributed uniformly throughout the space, then picking the right k boils down to a sphere-packing problem. In d dimensions, one could fit roughly 1/r^d spheres of radius r. E... Original · 2018-11-16 22:58:41 · 161 reads · 0 comments
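One way to read the quoted estimate, as a rough sketch: assume the data fills a unit hypercube of volume 1. A sphere of radius r has volume proportional to r^d, so, ignoring constant factors, about 1/r^d of them fit. The function name `rough_sphere_count` and the chosen radii are illustrative assumptions.

```python
def rough_sphere_count(radius, dims):
    """Order-of-magnitude number of radius-r spheres in a unit hypercube: (1/r)^d."""
    return (1.0 / radius) ** dims


# The same radius needs far more clusters as the dimension grows.
for dims in (1, 2, 3, 10):
    print(dims, rough_sphere_count(0.1, dims))
```

The count grows exponentially with d, which suggests why covering uniformly distributed data at a fixed radius demands a very large k in high dimensions.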
Chapter 8: Automating the Featurizer: Image Feature Extraction and Deep Learning
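SIFT features in detail: Introduction to the SIFT feature-matching algorithm: how image feature points are found; SIFT feature point extraction; SIFT features in detail. HOG in detail: Histogram of Oriented Gradients (HOG); HOG features (Histogram of Gradient): study notes ... Original · 2018-11-17 23:32:02 · 177 reads · 0 comments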