R Language Digest: Subsetting Data

License: CC 4.0 BY-SA

Source link: http://www.cnblogs.com/chickenwrap/p/10166562.html

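This article describes how to subset data in R: selecting and excluding variables, selecting observations, using the subset function, and random sampling. Examples show how to select observations based on variable values and how to draw a random sample from a dataset.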

Original article: https://www.statmethods.net/management/subset.html

R has powerful indexing features for accessing object elements. These features can be used to select and exclude variables and observations. The following code snippets demonstrate ways to keep or delete variables and observations and to take random samples from a dataset.

Selecting (Keeping) Variables

# select variables v1, v2, v3
myvars <- c("v1", "v2", "v3")
newdata <- mydata[myvars]

# another method
myvars <- paste("v", 1:3, sep="")
newdata <- mydata[myvars]

# select 1st and 5th thru 10th variables
newdata <- mydata[c(1,5:10)]
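The snippets above assume an existing data frame called mydata. As a minimal, self-contained sketch, the data frame below is made up for illustration (its column names and values are assumptions, not part of the original article). It shows that the three selection styles return the same columns, and that single-bracket indexing keeps the result a data frame even for one column.

```r
# Hypothetical data frame, invented for this example
mydata <- data.frame(
  id = 1:4,
  v1 = c(1.5, 2.0, 3.5, 4.0),
  v2 = c("a", "b", "c", "d"),
  v3 = c(TRUE, FALSE, TRUE, TRUE),
  v4 = 11:14
)

# Select by a character vector of names
by_name <- mydata[c("v1", "v2", "v3")]

# Build the same names programmatically
by_generated <- mydata[paste0("v", 1:3)]

# Select by position (columns 2 through 4 of this data frame)
by_position <- mydata[c(2, 3:4)]

names(by_name)                     # "v1" "v2" "v3"
identical(by_name, by_generated)   # TRUE
identical(by_name, by_position)    # TRUE

# One name in single brackets still returns a data frame;
# the [row, column] form drops a single column to a plain vector
one_col_df  <- mydata["v1"]
one_col_vec <- mydata[, "v1"]
is.data.frame(one_col_df)          # TRUE
is.numeric(one_col_vec)            # TRUE
```

Selecting by name is usually more robust than selecting by position, since positions shift when columns are added or reordered.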

To practice this interactively, try the selection of data frame elements exercises in the Data frames chapter of this introduction to R course.

Excluding (Dropping) Variables

# exclude variables v1, v2, v3
myvars <- names(mydata) %in% c("v1", "v2", "v3")
newdata <- mydata[!myvars]

# exclude 3rd and 5th variable
newdata <- mydata[c(-3,-5)]

# delete variables v3 and v5
mydata$v3 <- mydata$v5 <- NULL
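A character vector of names cannot be negated with a minus sign, which is why the first method builds a logical vector with %in% and inverts it with !. The sketch below runs the three exclusion methods on a made-up five-column data frame (the data is an assumption for illustration). Note that assigning NULL modifies the data frame in place, so the example works on a copy.

```r
# Hypothetical data frame, invented for this example
mydata <- data.frame(v1 = 1:3, v2 = 4:6, v3 = 7:9, v4 = 10:12, v5 = 13:15)

# Logical vector marking the columns to drop, then invert it
drop_flags <- names(mydata) %in% c("v1", "v2", "v3")
kept_by_name <- mydata[!drop_flags]
names(kept_by_name)        # "v4" "v5"

# Negative positions drop the 3rd and 5th columns
kept_by_position <- mydata[c(-3, -5)]
names(kept_by_position)    # "v1" "v2" "v4"

# Assigning NULL deletes columns in place, so use a copy
# to leave the original data frame untouched
trimmed <- mydata
trimmed$v3 <- trimmed$v5 <- NULL
names(trimmed)             # "v1" "v2" "v4"
ncol(mydata)               # 5, the original is unchanged
```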

Selecting Observations

# first 5 observations
newdata <- mydata[1:5,]

# based on variable values
newdata <- mydata[ which(mydata$gender=='F' & mydata$age > 65), ]

# or
attach(mydata)
newdata <- mydata[ which(gender=='F' & age > 65),]
detach(mydata)
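The which() wrapper matters when the data contains missing values: a logical index with NA entries returns extra all-NA rows, whereas which() keeps only the positions where the condition is TRUE. The sketch below illustrates this on a made-up data frame (an assumption for illustration). It also shows with() as one possible alternative to the attach()/detach() pair; this is a substitute technique, not the method used in the original article.

```r
# Hypothetical data frame with missing values, invented for this example
mydata <- data.frame(
  gender = c("F", "M", "F", "F", NA),
  age    = c(70, 80, 40, NA, 90)
)

# The raw condition contains NA wherever it cannot be decided
cond <- mydata$gender == "F" & mydata$age > 65
cond                                   # TRUE FALSE FALSE NA NA

# which() keeps only the positions where the condition is TRUE
with_which <- mydata[which(cond), ]
nrow(with_which)                       # 1

# Indexing with the logical vector directly adds a row of NAs
# for every NA in the condition
without_which <- mydata[cond, ]
nrow(without_which)                    # 3

# with() evaluates the condition inside the data frame
# without changing the search path
via_with <- mydata[with(mydata, which(gender == "F" & age > 65)), ]
identical(with_which, via_with)        # TRUE
```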

Selection using the Subset Function

The subset() function is the easiest way to select variables and observations. In the following example, we select all rows that have a value of age greater than or equal to 20 or age less than 10. We keep the ID and Weight columns.

# using subset function
newdata <- subset(mydata, age >= 20 | age < 10, select=c(ID, Weight))

In the next example, we select all men over the age of 25 and we keep variables weight through income (weight, income and all columns between them).

# using subset function (part 2)
newdata <- subset(mydata, sex=="m" & age > 25, select=weight:income)
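Both subset() calls can be tried on a small made-up data frame. The sketch below uses a hypothetical people data frame (its name, columns, and values are assumptions for illustration, with a lowercase weight column in both calls). Inside subset(), column names are written unquoted, and the weight:income range in select picks every column from weight through income in the data frame's column order.

```r
# Hypothetical data frame, invented for this example
people <- data.frame(
  ID     = 1:6,
  sex    = c("m", "f", "m", "m", "f", "m"),
  age    = c(5, 15, 30, 22, 40, 60),
  weight = c(20, 50, 80, 75, 60, 85),
  height = c(110, 160, 180, 175, 165, 170),
  income = c(0, 0, 52000, 31000, 64000, 48000)
)

# Rows with age >= 20 or age < 10, keeping two columns
age_filtered <- subset(people, age >= 20 | age < 10, select = c(ID, weight))
age_filtered$ID            # 1 3 4 5 6

# Men over 25, keeping the columns weight through income
men_over_25 <- subset(people, sex == "m" & age > 25, select = weight:income)
names(men_over_25)         # "weight" "height" "income"
nrow(men_over_25)          # 2
```

Because subset() evaluates its arguments inside the data frame, it is convenient for interactive work; the bracket forms shown earlier are more predictable inside functions.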

To practice the subset() function, try this interactive exercise on subsetting data.tables.
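Random samples, mentioned in the introduction, can be drawn with the same row indexing. The following is a minimal sketch using base R's sample() on a made-up data frame; the data, the sample size of 10, and the seed are assumptions chosen for illustration.

```r
# Fix the seed so the draw is reproducible
set.seed(42)

# Hypothetical data frame, invented for this example
mydata <- data.frame(id = 1:100, x = rnorm(100))

# Draw 10 distinct row numbers (sampling without replacement is the default)
rows <- sample(nrow(mydata), 10)

# Use the sampled row numbers as a row index
mysample <- mydata[rows, ]

nrow(mysample)                 # 10
anyDuplicated(mysample$id)     # 0, no row was drawn twice
```

Passing replace = TRUE to sample() would allow the same row to be drawn more than once, as needed for bootstrap-style resampling.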