More raw data files and Jupyter Notebooks
GitHub: https://github.com/JinnyR/Datacamp_DataScienceTrack_Python
DataCamp track: Data Scientist with Python - Course 21 (1)
Exercise
k-Nearest Neighbors: Fit
Having explored the Congressional voting records dataset, it is now time to build your first classifier. In this exercise, you will fit a k-Nearest Neighbors classifier to the voting dataset, which has once again been pre-loaded for you into a DataFrame df.
In the video, Hugo discussed the importance of ensuring your data adheres to the format required by the scikit-learn API. The features need to be in an array where each column is a feature and each row a different observation or data point - in this case, a Congressman’s voting record. The target needs to be a single column with the same number of observations as the feature data. We have done this for you in this exercise. Notice we named the feature array X and the response variable y: this is in accordance with common scikit-learn practice.
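As a rough illustration of that layout, the sketch below builds a small made-up DataFrame (the column names vote_a, vote_b, vote_c and the names toy, X_toy, y_toy are hypothetical, not taken from the real voting dataset) and checks that the feature array has one row per observation and that the target has the same number of observations.

```python
import pandas as pd

# Hypothetical toy stand-in for the voting data: 1 = yes, 0 = no
toy = pd.DataFrame({
    'party': ['democrat', 'republican', 'democrat', 'republican'],
    'vote_a': [1, 0, 1, 0],
    'vote_b': [0, 1, 1, 0],
    'vote_c': [1, 1, 0, 0],
})

# Target: a single column with one label per observation
y_toy = toy['party'].values

# Features: every column except the target, one row per observation
X_toy = toy.drop('party', axis=1).values

print(X_toy.shape)  # (4, 3): 4 observations, 3 features
print(y_toy.shape)  # (4,): 4 labels
```

The first dimension of both shapes must match; otherwise .fit() would raise an error.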
Your job is to create an instance of a k-NN classifier with 6 neighbors (by specifying the n_neighbors parameter) and then fit it to the data. The data has been pre-loaded into a DataFrame called df.
Instruction
- Import KNeighborsClassifier from sklearn.neighbors.
- Create arrays X and y for the features and the target variable. Here this has been done for you. Note the use of .drop() to drop the target variable 'party' from the feature array X as well as the use of the .values attribute to ensure X and y are NumPy arrays. Without using .values, X and y are a DataFrame and Series respectively; the scikit-learn API will accept them in this form also as long as they are of the right shape.
- Instantiate a KNeighborsClassifier called knn with 6 neighbors by specifying the n_neighbors parameter.
- Fit the classifier to the data using the .fit() method.
import pandas as pd
df = pd.read_csv('https://s3.amazonaws.com/assets.datacamp.com/production/course_1939/datasets/votes-ch1.csv')
# Import KNeighborsClassifier from sklearn.neighbors
from sklearn.neighbors import KNeighborsClassifier
# Create arrays for the features and the response variable
y = df['party'].values
X = df.drop('party', axis=1).values
# Create a k-NN classifier with 6 neighbors
knn = KNeighborsClassifier(n_neighbors=6)
# Fit the classifier to the data
knn.fit(X,y)
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=None, n_neighbors=6, p=2,
weights='uniform')
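The instructions note that scikit-learn also accepts a DataFrame and a Series directly, without .values. The following sketch checks this on a small made-up dataset; the data, the column names, and the choice of 3 neighbors are assumptions for illustration only, since the toy frame has just 8 rows.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data standing in for the voting records (1 = yes, 0 = no)
toy = pd.DataFrame({
    'party': ['democrat'] * 4 + ['republican'] * 4,
    'vote_a': [1, 1, 1, 0, 0, 0, 0, 1],
    'vote_b': [1, 1, 0, 1, 0, 0, 1, 0],
    'vote_c': [1, 0, 1, 1, 0, 1, 0, 0],
})

features = toy.drop('party', axis=1)  # pandas DataFrame
target = toy['party']                 # pandas Series

# Fit once on NumPy arrays and once on the pandas objects
knn_arrays = KNeighborsClassifier(n_neighbors=3).fit(features.values, target.values)
knn_frames = KNeighborsClassifier(n_neighbors=3).fit(features, target)

# Predict on the training data with each model, using the matching input type
pred_arrays = knn_arrays.predict(features.values)
pred_frames = knn_frames.predict(features)

# Both models saw the same numbers, so their predictions should agree
print((pred_arrays == pred_frames).all())
```

Passing the same input type to .predict() that was used in .fit() avoids the feature-name warning that recent scikit-learn versions emit when the two are mixed.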
Exercise
k-Nearest Neighbors: Predict
Having fit a k-NN classifier, you can now use it to predict the label of a new data point. However, there is no unlabeled data available since all of it was used to fit the model! You can still use the .predict() method on the X that was used to fit the model, but it