R Learning: Strings

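This article covers basic string operations in R: creating and printing strings, formatting numbers, changing case, extracting substrings, and splitting strings. It focuses on extracting substrings with the `substring` and `substr` functions and on splitting strings at a given separator with `strsplit`.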

WeChat official account: "生信小课堂"

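Today we look at one of the most common data types: strings.

Strings

Text data is stored in character vectors (or, less commonly, in character arrays). Importantly, each element of a character vector is a whole string, not a single character.

Creating and printing strings

Character vectors can be created with the c function. Strings may be wrapped in either single or double quotes, as long as the opening and closing quotes match. Double quotes are the more standard choice.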

c(
  "You should use double quotes most of the time",
  'Single quotes are better for including " inside the string'
)

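The paste function combines strings. Each of its vector arguments is recycled to the length of the longest one, and the strings are then joined element by element, separated by a space by default. The sep argument changes the separator, and the related paste0 function uses no separator at all. After the strings have been combined, the collapse argument can reduce the result to a single string containing all the elements.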

paste(c("red", "yellow"), "lorry")
## [1] "red lorry" "yellow lorry"
paste(c("red", "yellow"), "lorry", sep = "-")
## [1] "red-lorry" "yellow-lorry"
paste(c("red", "yellow"), "lorry", collapse = ", ")
## [1] "red lorry, yellow lorry"
paste0(c("red", "yellow"), "lorry")
## [1] "redlorry" "yellowlorry"
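
As a small sketch of how recycling, sep, and collapse interact, the following combines them in one place. The vectors `colours` and `vehicles` are made up for illustration and do not come from the original examples.

```r
# Hypothetical example vectors (not from the original text)
colours <- c("red", "yellow", "green")
vehicles <- c("lorry", "van")

# sep joins the pieces inside each element; collapse then joins the elements
combined <- paste(colours, "lorry", sep = "-", collapse = " / ")
combined
## [1] "red-lorry / yellow-lorry / green-lorry"

# The shorter vector is recycled to the length of the longer one
recycled <- paste(colours, vehicles)
recycled
## [1] "red lorry"   "yellow van"  "green lorry"
```

Note that sep is applied first and collapse second, so the two arguments can be used together.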

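toString is a variation on paste that is handy when printing vectors. It separates the elements with a comma and a space, and it can limit how much is printed. In the example below, width = 40 limits the output to 40 characters.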

x <- (1:15)^2 # the squares shown in the output below
toString(x)
## [1] "1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225"
toString(x, width = 40)
## [1] "1, 4, 9, 16, 25, 36, 49, 64, 81, 100...."

By default, strings printed to the console are wrapped in double quotes. Calling noquote on them removes the quotes, which sometimes makes the text more readable.

x <- c(
  "I", "saw", "a", "saw", "that", "could", "out",
  "saw", "any", "other", "saw", "I", "ever", "saw"
)
y <- noquote(x)
x
## [1] "I" "saw" "a" "saw" "that" "could" "out" "saw"
## [9] "any" "other" "saw" "I" "ever" "saw"
y
## [1] I saw a saw that could out saw any other saw
## [12] I ever saw

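Formatting numbers

Several functions format numbers. formatC lets you use C-style formatting to specify fixed or scientific notation, the number of digits, and the width of the output. Whichever options you use, the input should be numeric (arrays included), and the output is a character vector or array.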

pow <- 1:3
(powers_of_e <- exp(pow))
## [1] 2.718 7.389 20.086
formatC(powers_of_e)
## [1] "2.718" "7.389" "20.09"
formatC(powers_of_e, digits = 3) # three significant digits
## [1] "2.72" "7.39" "20.1"
formatC(powers_of_e, digits = 3, width = 10) # pad with leading spaces to width 10
## [1] "      2.72" "      7.39" "      20.1"
formatC(powers_of_e, digits = 3, format = "e") # scientific notation
## [1] "2.718e+00" "7.389e+00" "2.009e+01"
formatC(powers_of_e, digits = 3, flag = "+") # prefix with a + sign
## [1] "+2.72" "+7.39" "+20.1"
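
The examples above use the default and scientific formats. As a hedged sketch of the fixed format mentioned earlier, the following uses format = "f", where digits counts decimal places. The vector `values` is a made-up input for illustration.

```r
# Hypothetical input values (not from the original text)
values <- c(3.14159, 42, 1234.5)

# format = "f" is fixed notation; digits is the number of decimal places
fixed <- formatC(values, digits = 2, format = "f")
fixed
## [1] "3.14"    "42.00"   "1234.50"

# The result is a character vector, not numeric
class(fixed)
## [1] "character"

# width pads every element to the same number of characters
padded <- formatC(values, digits = 1, format = "f", width = 8)
nchar(padded)
## [1] 8 8 8
```

Because the result is character, it is suited to display and should not be used for further arithmetic.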

Changing case

The toupper and tolower functions convert every character in a string to upper case or lower case.

toupper("I'm Shouting")
## [1] "I'M SHOUTING"
tolower("I'm Whispering")
## [1] "i'm whispering"

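Extracting substrings

Two functions extract substrings: substring and substr. In most cases it does not matter which one you use. They behave slightly differently, however, when you pass vector arguments of different lengths. For substring, the output is as long as the longest input; for substr, the output is as long as the first input: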

woodchuck <- c(
  "How much wood would a woodchuck chuck",
  "If a woodchuck could chuck wood?",
  "He would chuck, he would, as much as he could",
  "And chuck as much wood as a woodchuck would",
  "If a woodchuck could chuck wood."
)
substring(woodchuck, 1:6, 10)
## [1] "How much w" "f a woodc" " would c" " chuck " " woodc"
## [6] "uch w"
substr(woodchuck, 1:6, 10)
## [1] "How much w" "f a woodc" " would c" " chuck " " woodc"
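
The difference is easiest to see with a single input string and several start positions. This is a minimal sketch; the variable names `word`, `from_substring`, and `from_substr` are chosen here for illustration.

```r
word <- "woodchuck" # one input string

# substring: the result is as long as the longest argument (three starts)
from_substring <- substring(word, 1:3, 5)
from_substring
## [1] "woodc" "oodc"  "odc"

# substr: the result is as long as the first argument (one string)
from_substr <- substr(word, 1:3, 5)
from_substr
## [1] "woodc"
```

With substr, the extra start positions are simply ignored, so substring is the one to reach for when you want several pieces of the same string.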

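Splitting strings

paste and its relatives combine strings. strsplit does the opposite: it splits strings at specified points. We can split the woodchuck tongue twister from the previous example on spaces. In the example below, fixed = TRUE means that the split argument is a fixed (literal) string rather than a regular expression.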

strsplit(woodchuck, " ", fixed = TRUE)
## [[1]]
## [1] "How" "much" "wood" "would" "a" "woodchuck"
## [7] "chuck"
##
## [[2]]
## [1] "If" "a" "woodchuck" "could" "chuck" "wood?"
##
## [[3]]
## [1] "He" "would" "chuck," "he" "would," "as" "much"
## [8] "as" "he" "could"
##
## [[4]]
## [1] "And" "chuck" "as" "much" "wood" "as"
## [7] "a" "woodchuck" "would"
##
## [[5]]
## [1] "If" "a" "woodchuck" "could" "chuck" "wood."

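Note that strsplit returns a list, not a character vector or matrix. This is because its result may consist of character vectors of different lengths. This is easy to overlook when you pass in only a single string.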
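
A short sketch of the single-string case: even then the result is a list of length one, and the words have to be pulled out with [[1]]. The variable names here are chosen for illustration.

```r
sentence <- "How much wood would a woodchuck chuck"

pieces <- strsplit(sentence, " ", fixed = TRUE)
class(pieces)
## [1] "list"
length(pieces) # one list element per input string
## [1] 1

words <- pieces[[1]] # the character vector of words
length(words)
## [1] 7
```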

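Splitting on a space alone leaves the commas attached to some words ("chuck," and "would," in the third element). A better approach is to split on an optional comma followed by a space, which is easy with a regular expression. The ? means "the preceding character is optional".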

strsplit(woodchuck, ",? ")
## [[1]]
## [1] "How" "much" "wood" "would" "a" "woodchuck"
## [7] "chuck"
##
## [[2]]
## [1] "If" "a" "woodchuck" "could" "chuck" "wood?"
##
## [[3]]
## [1] "He" "would" "chuck" "he" "would" "as" "much" "as"
## [9] "he" "could"
##
## [[4]]
## [1] "And" "chuck" "as" "much" "wood" "as"
## [7] "a" "woodchuck" "would"
##
## [[5]]
## [1] "If" "a" "woodchuck" "could" "chuck" "wood."
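
To make the effect of the pattern concrete, this sketch compares the two ways of splitting on the one line that contains commas, and shows what happens if the same pattern is treated as a literal string. The variable names are chosen for illustration.

```r
line <- "He would chuck, he would, as much as he could"

by_space <- strsplit(line, " ", fixed = TRUE)[[1]]
by_regex <- strsplit(line, ",? ")[[1]]

# The words that differ between the two results
changed <- by_space[by_space != by_regex]
changed
## [1] "chuck," "would,"

# With fixed = TRUE the pattern ",? " is taken literally; it never occurs
# in the line, so nothing is split
literal <- strsplit(line, ",? ", fixed = TRUE)[[1]]
length(literal)
## [1] 1
```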

File paths

R has a working directory, which is the default location for reading and writing files. You can see where it is with getwd and change it with setwd.

getwd()
## [1] "C:/Users/liu/Desktop"
setwd("c:/windows")
getwd()
## [1] "c:/windows"

The directory components of each path are separated by forward slashes.
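
As a hedged sketch of working with such paths, base R's file.path, basename, and dirname build and take apart slash-separated paths. These functions are not used in the original text, and the directory and file names below are made up for illustration.

```r
# Hypothetical path components (not from the original text)
path <- file.path("data", "raw", "input.csv")
path
## [1] "data/raw/input.csv"

basename(path) # the last component
## [1] "input.csv"
dirname(path) # everything before the last component
## [1] "data/raw"
```

Building paths with file.path rather than pasting strings by hand avoids missing or doubled separators.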
