Preparation of Data Analysis: Data from the Samsung Galaxy S Smartphone

kidpea_lau

Published 2018-09-21 14:19:03


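Copyright notice: This is an original article by the author, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.

Article link: https://blog.youkuaiyun.com/kidpea_lau/article/details/82792969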


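This post describes a project that uses R to prepare Samsung Galaxy smartphone data for analysis. The main steps are loading the data, merging the training and test sets, extracting the mean and standard deviation measurements, using descriptive activity names, labeling the data set with appropriate variable names, and creating a data set of averages.

This is my "Getting and Cleaning Data" course project.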

Contents

1. Load the data in R

2. Merges the training and the test sets to create one data set.

3. Extracts only the measurements on the mean and standard deviation for each measurement.

4. Uses descriptive activity names to name the activities in the data set.

5. Appropriately labels the data set with descriptive variable names.

6. From the data set in step 5, creates a second, independent tidy data set with the average of each variable for each activity and each subject.

Here are the data for the project:

https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip

One of the most exciting areas in all of data science right now is wearable computing. Companies like Fitbit, Nike, and Jawbone Up are racing to develop the most advanced algorithms to attract new users. The data linked to from the course website represent data collected from the accelerometers of the Samsung Galaxy S smartphone.

This time I downloaded the archive by hand into my working directory, which makes the README.txt easier to read. If you want to know more about downloading data from within R, see my post "Getting the Raw Data You Want with R: How to Download" (用R获得你想要的原始数据-如何下载).
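For readers who would rather fetch the archive from R, here is a minimal sketch of one way to do it. It is not the method used in this post. The function name `fetch_har_data`, the local zip file name, and the assumption that the archive unpacks into a folder called "UCI HAR Dataset" (the folder name that appears in the `setwd()` call in step 1) are all my own choices for illustration.

```r
# URL taken from the project description above.
har_url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"

# Hypothetical helper: download and unpack the archive only when needed.
# Assumes the zip unpacks into a folder named "UCI HAR Dataset".
fetch_har_data <- function(url = har_url, dest_dir = ".") {
  data_dir <- file.path(dest_dir, "UCI HAR Dataset")
  if (dir.exists(data_dir)) {
    return(data_dir)                      # already unpacked, nothing to do
  }
  zip_path <- file.path(dest_dir, "UCI_HAR_Dataset.zip")  # illustrative file name
  if (!file.exists(zip_path)) {
    # mode = "wb" keeps the binary zip intact on Windows
    download.file(url, destfile = zip_path, mode = "wb")
  }
  unzip(zip_path, exdir = dest_dir)
  data_dir
}
```

Calling `setwd(fetch_har_data())` would then leave you in the same folder that the rest of this post assumes.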

1. Load the data in R

Here I have already downloaded the dataset into my working directory. If you would rather download it from R code and want to know how, see: how to LOAD the data.

# the working directory is already set to the dataset folder
setwd("C:/Users/zhong/Desktop/coursera/R/UCI HAR Dataset")

#load the data
train_x <- read.table("./train/X_train.txt")
train_y <- read.table("./train/y_train.txt")
train_subject <- read.table("./train/subject_train.txt")
test_x <- read.table("./test/X_test.txt")
test_y <- read.table("./test/y_test.txt")
test_subject <- read.table("./test/subject_test.txt")
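To see what `read.table()` returns for files like these, here is a small sketch on invented numbers. The sample values and the `check_rows` helper are hypothetical; the point is only that a header-less, whitespace-separated file comes back with default column names (V1, V2, ...) and that the three pieces of one split should have the same number of rows.

```r
# Two rows in the same whitespace-separated, header-less layout as X_train.txt
# (the numbers are invented for illustration).
sample_text <- "  0.288 -0.020 -0.133\n  0.278 -0.016 -0.124"
toy_x <- read.table(text = sample_text)

names(toy_x)  # default names assigned by read.table: V1, V2, V3
dim(toy_x)    # 2 rows, 3 columns

# Hypothetical helper: stop early if the measurements, labels, and subjects
# of one split do not have the same number of rows.
check_rows <- function(x, y, subject) {
  stopifnot(nrow(x) == nrow(y), nrow(x) == nrow(subject))
  invisible(TRUE)
}

check_rows(toy_x, data.frame(V1 = c(5, 4)), data.frame(V1 = c(1, 1)))
```

With the real data, `check_rows(train_x, train_y, train_subject)` and the same call on the test objects would be a cheap sanity check before merging.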

2. Merges the training and the test sets to create one data set.

# put subject, activity label, and measurements side by side (column-wise)
trainData <- cbind(train_subject, train_y, train_x)
testData <- cbind(test_subject, test_y, test_x)

# stack the train and test rows into one data set
MergeData <- rbind(trainData, testData)
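The layout this produces is easier to see on toy data. The sketch below uses invented values and only two measurement columns; it shows that after `cbind()` the subject sits in column 1, the activity code in column 2, and measurement k in column k + 2, which is the offset the next step relies on.

```r
# Toy stand-ins for the objects loaded in step 1 (all values invented).
train_subject <- data.frame(V1 = c(1L, 1L))
train_y       <- data.frame(V1 = c(5L, 4L))
train_x       <- data.frame(V1 = c(0.1, 0.2), V2 = c(0.3, 0.4))
test_subject  <- data.frame(V1 = 2L)
test_y        <- data.frame(V1 = 3L)
test_x        <- data.frame(V1 = 0.5, V2 = 0.6)

# Column-wise: subject | activity | measurements
trainData <- cbind(train_subject, train_y, train_x)
testData  <- cbind(test_subject, test_y, test_x)

# cbind() keeps the default names untouched, so several columns are called "V1".
names(trainData)

# Row-wise: training rows first, then test rows.
MergeData <- rbind(trainData, testData)
dim(MergeData)   # 3 rows, 4 columns
MergeData[[1]]   # subjects, selected by position
```

Because the names are ambiguous at this point, selecting columns by position rather than by name is the safer choice until proper names are assigned.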

3. Extracts only the measurements on the mean and standard deviation for each measurement.

# read the feature names (second column of features.txt)
Feature <- read.table("./features.txt", stringsAsFactors = FALSE)[,2]

# find the features whose names contain mean() or std()
FeatureIndex <- grep("mean\\(\\)|std\\(\\)", Feature)

# keep subject (column 1), activity (column 2), and the matching measurements;
# the +2 shifts the feature positions past the two leading columns
DATA <- MergeData[, c(1, 2, FeatureIndex+2)]
colnames(DATA) <- c("subject", "activity", Feature[FeatureIndex])
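The escaped parentheses in the pattern matter. As a small sketch on four names of the kind found in features.txt, the pattern keeps `mean()` and `std()` but skips `meanFreq()` and the `angle(...)` features, because neither contains the literal text "mean()" or "std()".

```r
# A handful of feature names in the style of features.txt.
Feature <- c(
  "tBodyAcc-mean()-X",
  "tBodyAcc-std()-X",
  "fBodyAcc-meanFreq()-X",
  "angle(tBodyAccMean,gravity)"
)

# "\\(\\)" matches a literal "()", so only exact mean() / std() names qualify.
FeatureIndex <- grep("mean\\(\\)|std\\(\\)", Feature)
Feature[FeatureIndex]

# Positions of the same features inside the merged data set,
# where subject and activity occupy the first two columns.
FeatureIndex + 2
```

Whether `meanFreq()` should count as a "mean" measurement is a judgment call; this post takes the narrower reading.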

4. Uses descriptive activity names to name the activities in the data set

# read the activity labels: column 1 is the numeric code, column 2 the name
ActivityName <- read.table("./activity_labels.txt")

# turn the numeric codes into a factor labeled with the activity names
DATA$activity <- factor(DATA$activity, levels = ActivityName[,1], labels = ActivityName[,2])
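The sketch below shows the same `factor()` call on a few invented activity codes. The lookup table is typed in by hand here to mirror activity_labels.txt, so treat it as an illustration rather than a substitute for reading the file.

```r
# Hand-typed stand-in for activity_labels.txt (code, name).
ActivityName <- data.frame(
  V1 = 1:6,
  V2 = c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
         "SITTING", "STANDING", "LAYING"),
  stringsAsFactors = FALSE
)

# Invented activity codes for three observations.
activity_codes <- c(5, 1, 6)

# levels = the codes as stored in the data, labels = the names to show instead.
activity <- factor(activity_codes,
                   levels = ActivityName[, 1],
                   labels = ActivityName[, 2])

as.character(activity)
levels(activity)   # all six names, in code order
```

Using `factor()` this way keeps the original code order in the levels, which is why sorting by activity later follows the order of activity_labels.txt rather than the alphabet.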

5. Appropriately labels the data set with descriptive variable names.

# drop the "()", expand the leading t/f into time/frequency,
# and turn "-mean" / "-std" into "Mean" / "Std"
names(DATA) <- gsub("\\(\\)", "", names(DATA))
names(DATA) <- gsub("^t", "time", names(DATA))
names(DATA) <- gsub("^f", "frequency", names(DATA))
names(DATA) <- gsub("-mean", "Mean", names(DATA))
names(DATA) <- gsub("-std", "Std", names(DATA))
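To see what these substitutions do, here is a sketch that runs the time-domain part of the chain on two sample names. The wrapper function `clean_names` is hypothetical and exists only to make the example self-contained.

```r
# Hypothetical wrapper around the substitutions, restricted to the
# pieces that apply to time-domain ("t...") names.
clean_names <- function(x) {
  x <- gsub("\\(\\)", "", x)   # remove the literal "()"
  x <- gsub("^t", "time", x)   # "^" anchors the match to the first character
  x <- gsub("-mean", "Mean", x)
  x <- gsub("-std", "Std", x)
  x
}

raw <- c("tBodyAcc-mean()-X", "tBodyGyroJerkMag-std()")
clean_names(raw)
# → "timeBodyAccMean-X" "timeBodyGyroJerkMagStd"
```

Note that the anchor `^t` only touches a leading "t", and that the axis suffix keeps its hyphen ("-X"), so names with an axis are still not syntactic R names and may need backticks when referenced directly.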

6. From the data set in step 5, creates a second, independent tidy data set with the average of each variable for each activity and each subject.

# average every variable for each subject/activity pair;
# aggregate() is part of base R (stats), so no extra package is needed
tidyData <- aggregate(. ~ subject + activity, DATA, mean)
tidyData <- tidyData[order(tidyData$subject, tidyData$activity), ]

# save the clean, tidy data set
write.table(tidyData, file = "tidyData.txt", row.names = FALSE)
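As a sketch of what the aggregation does, the example below runs the same calls on a tiny invented data set and then reads the saved file back. The values, the single measurement column, and the temporary file are assumptions made for illustration.

```r
# Invented data: subject 1 walks twice and sits once, subject 2 walks once.
DATA <- data.frame(
  subject  = c(1, 1, 1, 2),
  activity = factor(c("WALKING", "WALKING", "SITTING", "WALKING")),
  timeBodyAccMeanX = c(0.2, 0.4, 0.1, 0.3)
)

# One row per subject/activity pair, holding the mean of every other column.
tidyData <- aggregate(. ~ subject + activity, DATA, mean)
tidyData <- tidyData[order(tidyData$subject, tidyData$activity), ]
nrow(tidyData)   # 3 distinct subject/activity pairs

# Round trip: write to a temporary file and read it back with a header row.
out_file <- tempfile(fileext = ".txt")
write.table(tidyData, file = out_file, row.names = FALSE)
check <- read.table(out_file, header = TRUE)
dim(check)       # 3 rows, 3 columns
```

Reading the file back with `header = TRUE` is a quick way to confirm that the saved table has one row per subject/activity pair and the expected columns.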

More information and code updates: https://github.com/kidpea/Preparation-OF-Data-Analysis.Data-from-Samsung-Galaxy-S-smartphone-/blob/master/run_analysis.R
