Kaggle in Practice: The Simplest Digit Recognizer

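This post introduces the Digit Recognizer problem on Kaggle, covering the dataset description, data preprocessing, feature extraction, and model selection. For feature extraction, the author discusses two linear dimensionality-reduction methods, PCA and LDA: PCA reduces dimensionality by keeping the principal components that retain most of the information, while LDA uses the class labels to maximize the separation between classes. In the end, the author chooses PCA dimensionality reduction combined with an SVM model.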
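
To make the PCA idea concrete, here is a minimal sketch of "keep the principal components that retain most of the information", written with plain NumPy. The helper names pca_fit and pca_transform, the 95% threshold, and the synthetic data are assumptions made for illustration, not the author's code; the SVM step is not shown.

```python
import numpy as np

def pca_fit(X, variance_to_keep=0.95):
    """Return (mean, components), keeping just enough principal
    components to explain `variance_to_keep` of the total variance."""
    mean = X.mean(axis=0)
    Xc = X - mean  # PCA operates on centered data
    # Rows of Vt are the principal directions, ordered by singular value
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)  # variance share per component
    k = int(np.searchsorted(np.cumsum(explained), variance_to_keep)) + 1
    k = min(k, len(s))  # guard against floating-point round-off
    return mean, Vt[:k]

def pca_transform(X, mean, components):
    """Project X onto the retained principal components."""
    return (X - mean) @ components.T

# Synthetic stand-in for the pixel matrix (an assumption, not the Kaggle data):
# 200 samples with 784 features that lie close to a 10-dimensional subspace.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 10))
mixing = rng.normal(size=(10, 784))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 784))

mean, components = pca_fit(X, variance_to_keep=0.95)
X_reduced = pca_transform(X, mean, components)
print(X.shape, '->', X_reduced.shape)
```

On the real data, the same two functions would be fitted on the training pixels only and then applied to both the training and test pixels, so that both sets are projected onto the same components.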


Digit Recognizer from Kaggle

link: https://www.kaggle.com/c/digit-recognizer

Digit Recognizer is one of the most basic problems on Kaggle.

Dataset description:

The data files train.csv and test.csv contain gray-scale images of hand-drawn digits, from zero through nine.

Each image is 28 pixels in height and 28 pixels in width, for a total of 784 pixels in total. Each pixel has a single pixel-value associated with it, indicating the lightness or darkness of that pixel, with higher numbers meaning darker. This pixel-value is an integer between 0 and 255, inclusive.

The training data set, (train.csv), has 785 columns. The first column, called “label”, is the digit that was drawn by the user. The rest of the columns contain the pixel-values of the associated image.

Each pixel column in the training set has a name like pixelx, where x is an integer between 0 and 783, inclusive. To locate this pixel on the image, suppose that we have decomposed x as x = i * 28 + j, where i and j are integers between 0 and 27, inclusive. Then pixelx is located on row i and column j of a 28 x 28 matrix, (indexing by zero).
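
The index arithmetic above can be checked with a short sketch. The function names pixel_position and pixel_index are hypothetical helpers introduced here for illustration; only the formula x = i * 28 + j comes from the dataset description.

```python
def pixel_position(x, width=28):
    """Map the index x of column `pixelx` to (row i, column j), zero-indexed."""
    if not 0 <= x < width * width:
        raise ValueError("x must be between 0 and %d" % (width * width - 1))
    # divmod returns (x // width, x % width), i.e. (i, j) with x = i * width + j
    return divmod(x, width)

def pixel_index(i, j, width=28):
    """Inverse mapping: (row i, column j) back to the column index x."""
    return i * width + j

print(pixel_position(0))    # (0, 0)
print(pixel_position(28))   # (1, 0)
print(pixel_position(783))  # (27, 27)
print(pixel_index(27, 27))  # 783
```

Because the pixels are stored row by row, reshaping one 784-value row into a 28 x 28 array recovers the image in the same layout.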

First, take a look at the dataset.

# coding: utf-8
%matplotlib inline
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
def opencsv():  # load both CSV files with pandas
    data = pd.read_csv('data/train.csv')
    data1 = pd.read_csv('data/test.csv')
    train_data = data.values[:, 1:]  # all training pixels (every column after "label")
    train_label = data.values[:, 0]  # the first column holds the digit label
    test_data = data1.values[:, :]  # all test pixels
    print('Data Load Done!')
    return train_data, train_label, test_data
train_data, train_label, test_data = opencsv()
# train_data holds the 784 features of the training set, test_data holds the 784 features of the test set, and train_label holds the training labels
# This is clearly a typical supervised learning problem
Data Load Done!
import matplotlib.pyplot as plt
from numpy import *
print(shape(train_data), shape(test_data))  # 42000 training samples, 28000 test samples
def showPic(data):
    plt.figure(figsize=(7,7))
    # show the first 70 images
    for digit_num in range(0,