Preparation of Data Analysis: Data from the Samsung Galaxy S Smartphone

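This document describes a project that uses R to analyze Samsung Galaxy smartphone data. The main steps are loading the data, merging the training and test sets, extracting the mean and standard deviation measurements, using descriptive activity names, labeling the data set with appropriate variable names, and creating a data set of averages.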

This is my "Getting and Cleaning Data" course project.

Contents

1.Load the data in R

2.Merges the training and the test sets to create one data set.

3.Extracts only the measurements on the mean and standard deviation for each measurement.

4.Uses descriptive activity names to name the activities in the data set

5.Appropriately labels the data set with descriptive variable names.

6.From the data set in step 5, creates a second, independent tidy data set with the average of each variable for each activity and each subject.


Here are the data for the project:

https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip


One of the most exciting areas in all of data science right now is wearable computing. Companies like Fitbit, Nike, and Jawbone Up are racing to develop the most advanced algorithms to attract new users. The data linked from the course website were collected from the accelerometers of the Samsung Galaxy S smartphone.

This time, I downloaded the file into my working directory so that README.txt is easier to read. If you want to know more about the download process in R, see: Getting the Raw Data You Want with R: How to Download (用R获得你想要的原始数据-如何下载).


 

1.Load the data in R

Here, I have already downloaded the dataset into my working directory. If you would rather download it from R code and want to know how, see: how to LOAD the data.
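For readers who prefer to fetch the archive from R, a minimal sketch follows. The helper name `download_if_missing` and the local file name `UCI_HAR_Dataset.zip` are hypothetical, chosen only for illustration; the URL is the one given above. It assumes the archive unpacks into a folder named "UCI HAR Dataset", as the working directory used in this post suggests.

```r
# Hypothetical helper (not part of the original script):
# download `url` to `destfile` only when the file is not already there.
download_if_missing <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # mode = "wb" writes the zip archive in binary mode, which matters on Windows
    download.file(url, destfile, mode = "wb")
  }
  destfile
}

# Usage (needs network access, so it is left commented out):
# zip_url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"
# zip_path <- download_if_missing(zip_url, "UCI_HAR_Dataset.zip")
# unzip(zip_path)  # assumed to create the "UCI HAR Dataset" folder
```

Skipping the download when the file already exists keeps repeated runs of the script fast and avoids fetching the archive again.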

#set the extracted dataset folder as the working directory (adjust the path to your machine)
setwd("C:/Users/zhong/Desktop/coursera/R/UCI HAR Dataset")

#load the data
train_x <- read.table("./train/X_train.txt")
train_y <- read.table("./train/y_train.txt")
train_subject <- read.table("./train/subject_train.txt")
test_x <- read.table("./test/X_test.txt")
test_y <- read.table("./test/y_test.txt")
test_subject <- read.table("./test/subject_test.txt")

 

2.Merges the training and the test sets to create one data set.

#combine subject, activity code, and measurements column-wise
trainData <- cbind(train_subject, train_y, train_x)
testData <- cbind(test_subject, test_y, test_x)

#stack the train and test rows into one data set
MergeData <- rbind(trainData, testData)
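To see what the two calls do, here is a small sketch on made-up tables. All `toy_*` names and values are assumptions for illustration and are not taken from the real files.

```r
# Toy stand-ins for the real tables (values are invented)
toy_train_subject <- data.frame(subject = c(1, 3))
toy_train_y       <- data.frame(activity = c(5, 4))
toy_train_x       <- data.frame(V1 = c(0.28, 0.27), V2 = c(-0.02, -0.01))
toy_test_subject  <- data.frame(subject = 2)
toy_test_y        <- data.frame(activity = 5)
toy_test_x        <- data.frame(V1 = 0.25, V2 = -0.02)

# cbind() puts the tables side by side: subject, activity, then the measurements
toy_train <- cbind(toy_train_subject, toy_train_y, toy_train_x)
toy_test  <- cbind(toy_test_subject, toy_test_y, toy_test_x)

# rbind() stacks the test rows under the training rows
toy_merged <- rbind(toy_train, toy_test)

dim(toy_merged)    # 3 rows, 4 columns
names(toy_merged)  # "subject" "activity" "V1" "V2"
```

In the real tables every file is read without a header, so the columns carry default names such as V1. That is why the next step selects columns by position rather than by name.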

 

3.Extracts only the measurements on the mean and standard deviation for each measurement.

#Extract only the measurements on the mean and standard deviation for each measurement. 
##read the feature names (second column of features.txt)
Feature <- read.table("./features.txt", stringsAsFactors = FALSE)[,2]

##find the mean() and std() features, keep them together with the subject and activity columns, and name the columns
FeatureIndex <- grep("mean\\(\\)|std\\(\\)", Feature)
DATA <- MergeData[, c(1, 2, FeatureIndex+2)]
colnames(DATA) <- c("subject", "activity", Feature[FeatureIndex])
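The escaped parentheses in the pattern matter. A short sketch on four feature names from the dataset shows that only names containing a literal `mean()` or `std()` are kept, while `meanFreq()` and the `angle(...)` features are left out. The vector `sample_features` is a hand-picked sample, not the full feature list.

```r
# A hand-picked sample of feature names (the real list has 561 entries)
sample_features <- c("tBodyAcc-mean()-X",
                     "tBodyAcc-std()-X",
                     "fBodyAcc-meanFreq()-X",
                     "angle(tBodyAccMean,gravity)")

# "\\(\\)" matches a literal "()", so "meanFreq()" does not match "mean\\(\\)"
keep <- grep("mean\\(\\)|std\\(\\)", sample_features)

keep                   # 1 2
sample_features[keep]  # the two matching names
keep + 2               # positions after the subject and activity columns: 3 4
```

The `+ 2` offset mirrors `FeatureIndex+2` above: the first two columns of the merged table hold the subject and the activity, so feature i sits in column i + 2.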


4.Uses descriptive activity names to name the activities in the data set

#Uses descriptive activity names to name the activities in the data set
## get activity name
ActivityName <- read.table("./activity_labels.txt")

##replace activity names
DATA$activity <- factor(DATA$activity, levels = ActivityName[,1], labels = ActivityName[,2])
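The `factor()` call maps each numeric code to its label. A minimal sketch with an in-memory copy of the six activity labels (the names `toy_labels` and `toy_codes` are assumptions for illustration):

```r
# In-memory stand-in for activity_labels.txt
toy_labels <- data.frame(
  id   = 1:6,
  name = c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
           "SITTING", "STANDING", "LAYING"),
  stringsAsFactors = FALSE
)

# Made-up activity codes, as they would appear in the activity column
toy_codes <- c(5, 1, 6, 1)

# levels = the codes to look for, labels = the names to show instead
toy_activity <- factor(toy_codes, levels = toy_labels$id, labels = toy_labels$name)

as.character(toy_activity)  # "STANDING" "WALKING" "LAYING" "WALKING"
```

Because all six levels are declared, the factor keeps the full set of activities even if some of them do not occur in a given subset of the data.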

5.Appropriately labels the data set with descriptive variable names.

#Appropriately labels the data set with descriptive variable names.

names(DATA) <- gsub("\\(\\)", "", names(DATA))       #drop the "()" suffix
names(DATA) <- gsub("^t", "time", names(DATA))        #leading t = time domain
names(DATA) <- gsub("^f", "frequency", names(DATA))   #leading f = frequency domain
names(DATA) <- gsub("-mean", "Mean", names(DATA))
names(DATA) <- gsub("-std", "Std", names(DATA))
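A quick way to check the renaming rules is to run them on a few column names. The sketch below covers time-domain names only and uses `fixed = TRUE` as an equivalent way to strip the literal "()" without regex escaping. The vector `sample_names` is a hand-picked sample.

```r
sample_names <- c("subject", "activity", "tBodyAcc-mean()-X", "tGravityAcc-std()-Z")

clean <- gsub("()", "", sample_names, fixed = TRUE)  # fixed = TRUE treats "()" as plain text
clean <- gsub("^t", "time", clean)                   # only a leading "t" is replaced
clean <- gsub("-mean", "Mean", clean)
clean <- gsub("-std", "Std", clean)

clean
# "subject" "activity" "timeBodyAccMean-X" "timeGravityAccStd-Z"
```

The `^` anchor is what keeps the id columns safe: "subject" and "activity" do not start with "t", so they pass through unchanged.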

 

6.From the data set in step 5, creates a second, independent tidy data set with the average of each variable for each activity and each subject.

#From the data set in step 5, creates a second, independent tidy data set with the average of each variable for each activity and each subject.
#aggregate() comes from the stats package that ships with R, so no extra library is needed
tidyData <- aggregate(. ~ subject + activity, DATA, mean)
tidyData <- tidyData[order(tidyData$subject, tidyData$activity), ]

#save the clean, tidy data
write.table(tidyData, file = "tidyData.txt", row.names = FALSE)
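The formula `. ~ subject + activity` means "average every other column within each subject and activity pair". A small sketch on made-up data (the `toy` table and its values are assumptions for illustration):

```r
# Made-up measurements: subject 1 walks twice and sits once, subject 2 walks once
toy <- data.frame(
  subject         = c(1, 1, 1, 2),
  activity        = factor(c("WALKING", "WALKING", "SITTING", "WALKING")),
  timeBodyAccMean = c(0.2, 0.4, 0.1, 0.3)
)

# One row per (subject, activity) pair, holding the mean of each remaining column
toy_tidy <- aggregate(. ~ subject + activity, toy, mean)

# Sort by subject first, then by activity
toy_tidy <- toy_tidy[order(toy_tidy$subject, toy_tidy$activity), ]

toy_tidy  # three rows: (1, SITTING), (1, WALKING), (2, WALKING)
```

The saved file can be read back with `read.table("tidyData.txt", header = TRUE)` to check that it has one row per subject and activity pair.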

More information and code updates: https://github.com/kidpea/Preparation-OF-Data-Analysis.Data-from-Samsung-Galaxy-S-smartphone-/blob/master/run_analysis.R
