Datacamp 笔记&代码 Unsupervised Learning in Python 第三章 Decorrelating your data and dimension reduction

通过PCA对谷物尺寸数据进行去相关分析,发现宽度和长度高度相关。使用PCA进一步降低鱼测量数据的维度,观察主成分变化。研究PCA特征的方差,确定鱼样本的内在维度为2。接着将PCA应用于鱼的测量数据,保留最重要的两个成分。构建tf-idf词频矩阵,对宠物主题的文档进行预处理。最后,利用TruncatedSVD和KMeans对Wikipedia文章进行聚类,探索知识的组织结构。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

更多原始数据文档和JupyterNotebook
Github: https://github.com/JinnyR/Datacamp_DataScienceTrack_Python

Datacamp track: Data Scientist with Python - Course 23 (3)

Exercise

Correlated data in nature

You are given an array grains giving the width and length of samples of grain. You suspect that width and length will be correlated. To confirm this, make a scatter plot of width vs length and measure their Pearson correlation.

Instruction

  • Import:
    • matplotlib.pyplot as plt.
    • pearsonr from scipy.stats.
  • Assign column 0 of grains to width and column 1 of grains to length.
  • Make a scatter plot with width on the x-axis and lengthon the y-axis.
  • Use the pearsonr() function to calculate the Pearson correlation of width and length.
import pandas as pd

grains = pd.read_csv('https://s3.amazonaws.com/assets.datacamp.com/production/course_2141/datasets/seeds-width-vs-length.csv', header=None).values

# Perform the necessary imports
import matplotlib.pyplot as plt
from scipy.stats import pearsonr

# Assign the 0th column of grains: width
width = grains[:,0]

# Assign the 1st column of grains: length
length = grains[:,1]

# Scatter plot width vs length
plt.scatter(width, length)
plt.axis('equal')
plt.show()

# Calculate the Pearson correlation
correlation, pvalue = pearsonr(width, length)

# Display the correlation
print(correlation)

[外链图片转存失败(img-fyXaqT3p-1564520846485)(output_2_0.png)]

0.8604149377143467

Exercise

Decorrelating the grain measurements with PCA

You observed in the previous exercise that the width and length measurements of the grain are correlated. Now, you’ll use PCA to decorrelate these measurements, then plot the decorrelated points and measure their Pearson correlation.

Instruction

  • Import PCA from sklearn.decomposition.
  • Create an instance of PCA called model.
  • Use the .fit_transform() method of model to apply the PCA transformation to grains. Assign the result to pca_features.
  • The subsequent code to extract, plot, and compute the Pearson correlation of the first two columns pca_features has been written for you, so hit ‘Submit Answer’ to see the result!
# Import PCA
from sklearn.decomposition import PCA

# Create PCA instance: model
model = PCA()

# Apply the fit_transform method of model to grains: pca_features
pca_features = model.fit_transform(grains)

# Assign 0th column of pca_features: xs
xs = pca_features[:,0]

# Assign 1st column of pca_features: ys
ys = pca_features[:,1]

# Scatter plot xs vs ys
plt.scatter(xs, ys)
plt.axis('equal')
plt.show()

# Calculate the Pearson correlation of xs and ys
correlation, pvalue = pearsonr(xs, ys)

# Display the correlation
print(correlation)
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值