
WeChat official account: "生信小课堂"

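Today we look at one of the most common data types: strings.
Strings
Text data is stored in character vectors (or, less commonly, in character arrays). Importantly, each element of a character vector is a whole string, not a single character.
Creating and Printing Strings
Character vectors can be created with the c function. Strings can be enclosed in either single or double quotes, as long as the opening and closing quotes match; double quotes are the more standard choice.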
c(
"You should use double quotes most of the time",
'Single quotes are better for including " inside the string'
)
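As a small sketch of the quoting rule above: both quote styles produce identical strings, and a double quote can also be embedded in a double-quoted string by escaping it with a backslash. The variable names are illustrative only and do not come from the original post.

```r
# Single and double quotes create the same string
single_quoted <- 'lorry'
double_quoted <- "lorry"
identical(single_quoted, double_quoted)

# Two ways to embed a double quote inside a string
with_single <- 'She said "hi"'
with_escape <- "She said \"hi\""
identical(with_single, with_escape)

# cat writes the raw characters, without the escaping that print adds
cat(with_escape, "\n")
```

Printing with_escape at the console shows the backslashes, but they are not part of the string itself, which is why cat is used for the final line.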

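The paste function combines strings. Each vector passed to it is recycled to the length of the longest one, and the strings are then joined element by element, separated by a space by default. The sep argument changes the separator, and the related paste0 function uses no separator at all. After the element-wise joining, the collapse argument can merge the results into a single string.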
paste(c("red", "yellow"), "lorry")
## [1] "red lorry" "yellow lorry"
paste(c("red", "yellow"), "lorry", sep = "-")
## [1] "red-lorry" "yellow-lorry"
paste(c("red", "yellow"), "lorry", collapse = ", ")
## [1] "red lorry, yellow lorry"
paste0(c("red", "yellow"), "lorry")
## [1] "redlorry" "yellowlorry"
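The examples above use sep and collapse separately. The following sketch, with an illustrative colours vector that is not in the original, shows how the two combine: sep works element by element, and collapse is applied afterwards.

```r
colours <- c("red", "yellow", "green")

# "lorry" (length 1) is recycled to match the three colours
joined <- paste(colours, "lorry", sep = "-")
joined

# sep joins element by element; collapse then merges the results
one_string <- paste(colours, "lorry", sep = "-", collapse = "; ")
one_string

length(joined)      # three strings
length(one_string)  # a single string
```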
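The toString function is a variant of paste that is handy when printing vectors. It separates the elements with a comma and a space, and can limit the length of the output. In the example below, width = 40 limits the output to 40 characters.
x <- (1:15) ^ 2
toString(x)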
## [1] "1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225"
toString(x, width = 40)
## [1] "1, 4, 9, 16, 25, 36, 49, 64, 81, 100...."
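To make the relationship with paste concrete, here is a minimal check, assuming x holds the first fifteen square numbers as the printed output suggests. The name short is illustrative.

```r
x <- (1:15) ^ 2

# toString(x) is equivalent to collapsing with a comma and a space
identical(toString(x), paste(x, collapse = ", "))

# With width set, the result is truncated and ends in "...."
short <- toString(x, width = 40)
nchar(short)
endsWith(short, "....")
```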
By default, strings printed to the console are wrapped in double quotes. Applying the noquote function removes these quotes, which sometimes makes the text more readable.
x <- c(
"I", "saw", "a", "saw", "that", "could", "out",
"saw", "any", "other", "saw", "I", "ever", "saw"
)
y <- noquote(x)
x
## [1] "I" "saw" "a" "saw" "that" "could" "out" "saw"
## [9] "any" "other" "saw" "I" "ever" "saw"
y
## [1] I saw a saw that could out saw any other saw
## [12] I ever saw
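Formatting Numbers
Several functions format numbers. formatC uses C-style formatting to specify fixed or scientific notation, the number of digits, and the width of the output. Whatever the options, the input should be numeric (arrays included), and the output is a character vector or array.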
pow <- 1:3
(powers_of_e <- exp(pow))
## [1] 2.718 7.389 20.086
formatC(powers_of_e)
## [1] "2.718" "7.389" "20.09"
formatC(powers_of_e, digits = 3) # three significant digits
## [1] "2.72" "7.39" "20.1"
formatC(powers_of_e, digits = 3, width = 10) # pad with leading spaces to width 10
## [1] "      2.72" "      7.39" "      20.1"
formatC(powers_of_e, digits = 3, format = "e") # scientific notation
## [1] "2.718e+00" "7.389e+00" "2.009e+01"
formatC(powers_of_e, digits = 3, flag = "+") # prefix with a + sign
## [1] "+2.72" "+7.39" "+20.1"
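Two further formatC options are often useful in practice. The sketch below is an illustration rather than part of the original post: it shows fixed notation with format = "f", where digits means decimal places, and zero padding of integers with flag = "0". The ids vector is a made-up example.

```r
powers_of_e <- exp(1:3)

# Fixed notation: digits is the number of decimal places
fixed <- formatC(powers_of_e, digits = 2, format = "f")
fixed

# Zero-pad integers to a common width, e.g. for sample IDs
ids <- formatC(c(1L, 10L, 100L), width = 3, format = "d", flag = "0")
ids

# The result is always character, so it cannot be used in arithmetic
is.character(fixed)
```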
Changing Case
The toupper and tolower functions convert all the characters in a string to upper case or lower case.
toupper("I'm Shouting")
## [1] "I'M SHOUTING"
tolower("I'm Whispering")
## [1] "i'm whispering"
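Extracting Substrings
Two functions extract substrings: substring and substr. In most cases it does not matter which you use, but they behave slightly differently when given vector arguments of different lengths. For substring, the output is as long as the longest input; for substr, the output is as long as the first input: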
woodchuck <- c(
"How much wood would a woodchuck chuck",
"If a woodchuck could chuck wood?",
"He would chuck, he would, as much as he could",
"And chuck as much wood as a woodchuck would",
"If a woodchuck could chuck wood."
)
substring(woodchuck, 1:6, 10)
## [1] "How much w" "f a woodc" " would c" " chuck " " woodc"
## [6] "uch w"
substr(woodchuck, 1:6, 10)
## [1] "How much w" "f a woodc" " would c" " chuck " " woodc"
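The length difference is easier to see with a shorter input. This is a minimal sketch using an illustrative words vector of length 3 and four start positions:

```r
words <- c("woodchuck", "chuck", "wood")

# substring recycles its arguments to the longest input (here, 4 starts)
sub_long <- substring(words, 1:4, 5)
length(sub_long)

# substr returns one result per element of the first argument
sub_short <- substr(words, 1:4, 5)
length(sub_short)

# The first three results agree; substring adds a fourth from words[1]
sub_long
sub_short
```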
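Splitting Strings
paste and its relatives combine strings. strsplit does the opposite, splitting strings at specified points. We can split the woodchuck tongue twister from the previous example on spaces. In the example below, fixed = TRUE means that the split argument is a fixed (literal) string rather than a regular expression.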
strsplit(woodchuck, " ", fixed = TRUE)
## [[1]]
## [1] "How" "much" "wood" "would" "a" "woodchuck"
## [7] "chuck"
##
## [[2]]
## [1] "If" "a" "woodchuck" "could" "chuck" "wood?"
##
## [[3]]
## [1] "He" "would" "chuck," "he" "would," "as" "much"
## [8] "as" "he" "could"
##
## [[4]]
## [1] "And" "chuck" "as" "much" "wood" "as"
## [7] "a" "woodchuck" "would"
##
## [[5]]
## [1] "If" "a" "woodchuck" "could" "chuck" "wood."
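Note that strsplit returns a list, not a character vector or matrix, because its results may be character vectors of different lengths. This is easy to overlook when you pass in only a single string.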
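The single-string case mentioned above can be sketched as follows. The variable names are illustrative; lengths is a base R function that reports the length of each list element.

```r
sentence <- "How much wood would a woodchuck chuck"

# Even for one string, the result is a list of length 1
pieces <- strsplit(sentence, " ", fixed = TRUE)
class(pieces)
length(pieces)

# Use [[1]] to get the character vector of words
words <- pieces[[1]]
length(words)

# lengths() reports the number of pieces per input string
lengths(strsplit(c("a b c", "d e"), " ", fixed = TRUE))
```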
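Splitting on spaces alone leaves commas attached to some words, such as "chuck," and "would," in the third sentence. A better approach is to allow an optional comma before the space in the separator, which is easy with a regular expression. The ? means "the preceding character is optional".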
strsplit(woodchuck, ",? ")
## [[1]]
## [1] "How" "much" "wood" "would" "a" "woodchuck"
## [7] "chuck"
##
## [[2]]
## [1] "If" "a" "woodchuck" "could" "chuck" "wood?"
##
## [[3]]
## [1] "He" "would" "chuck" "he" "would" "as" "much" "as"
## [9] "he" "could"
##
## [[4]]
## [1] "And" "chuck" "as" "much" "wood" "as"
## [7] "a" "woodchuck" "would"
##
## [[5]]
## [1] "If" "a" "woodchuck" "could" "chuck" "wood."
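The difference between a regular-expression split and a fixed = TRUE split matters most when the separator is a regex metacharacter. The following sketch uses a made-up version string to show this; it is an illustration, not an example from the original post.

```r
version <- "4.3.1"

# As a regular expression, "." matches any character, so every
# character is a separator and only empty strings remain
strsplit(version, ".")[[1]]

# With fixed = TRUE, "." is matched literally
parts_fixed <- strsplit(version, ".", fixed = TRUE)[[1]]
parts_fixed

# Escaping the dot gives the same result in regex mode
parts_regex <- strsplit(version, "\\.")[[1]]
identical(parts_fixed, parts_regex)
```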
File Paths
R has a working directory, which is the default location for reading and writing files. You can see where it is with getwd and change it with setwd.
getwd()
## [1] "C:/Users/liu/Desktop"
setwd("c:/windows")
getwd()
## [1] "c:/windows"
The directory components of each path are separated by forward slashes, even though these are Windows paths.
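A common pattern is to save the working directory before changing it, so that it can be restored afterwards. The sketch below assumes tempdir() as a stand-in target directory and uses a hypothetical file path; file.path, basename and dirname are base R helpers that the original post does not cover.

```r
# Remember the current working directory so it can be restored
old_wd <- getwd()

# tempdir() is used here only because it is guaranteed to exist
setwd(tempdir())
getwd()

# file.path joins components with forward slashes on every platform
csv_path <- file.path("data", "raw", "counts.csv")  # hypothetical path
csv_path
basename(csv_path)
dirname(csv_path)

# Restore the original working directory
setwd(old_wd)
```

Building paths with file.path rather than pasting strings by hand avoids doubled or missing separators.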