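Goal: given a text file that holds a lower-triangular matrix, how can it be read into R?
There seems to be an R package for this (umx, with its umx_read_lower function):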
- https://search.r-project.org/CRAN/refmans/umx/html/umx_read_lower.html
- https://github.com/tbates/umx/blob/master/R/misc_and_utility.R
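I have not used it, though; this post writes its own reader function instead.
1. Two test files, both lower-triangular matrices
(1). test.data.txt, with row names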
a1 1
a2 2 5
a3 5 6 8
a4 2 3 5 6
(2). test2.data.txt, numbers only
8
1 1
2 2 5
3 5 6 8
4 2 3 5 6
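To make the examples easy to reproduce, the two files can be created from R itself. This is only a sketch: it writes to a temporary directory (an assumption; the post keeps its files under dustbin/), and the names tmp.dir, file1 and file2 are illustrative.

```r
# Write the two example files into a temporary directory
tmp.dir <- tempdir()
file1 <- file.path(tmp.dir, "test.data.txt")
file2 <- file.path(tmp.dir, "test2.data.txt")
writeLines(c("a1 1", "a2 2 5", "a3 5 6 8", "a4 2 3 5 6"), file1)
writeLines(c("8", "1 1", "2 2 5", "3 5 6 8", "4 2 3 5 6"), file2)

# Number of space-separated fields on each line: every line is one
# field longer than the previous one, which is what makes the file
# a lower triangle
fields1 <- count.fields(file1, sep = " ")
fields2 <- count.fields(file2, sep = " ")
print(fields1)
print(fields2)
```

With row names the first line has two fields (name plus one value); without them it has one.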
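2. Reader function v1
#' Read a lower-triangular matrix file and fill it out to a symmetric matrix
#'
#' @param fileName path to a space-separated text file, one matrix row per line
#' @param withRownames TRUE if the first field of every line is a row name
#'
#' @return a symmetric numeric matrix (with row names if withRownames is TRUE)
#' @export
#'
#' @examples
#' read_lower_triangle("dustbin/test.data.txt", TRUE)
#' read_lower_triangle("dustbin/test2.data.txt")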
read_lower_triangle=function(fileName, withRownames=FALSE){
  # step1: read the whole file, one matrix row per line
  text.list=as.list(readLines(fileName))
  # square matrix with one row per line, initially all NA
  mat=matrix( nrow=length(text.list), ncol=length(text.list))
  # step2: parse each line and write its values into the matching row
  for(k in seq_along(text.list)){
    # split on runs of spaces, ignoring leading/trailing whitespace
    arr=strsplit(trimws(text.list[[k]]), " +")[[1]]
    # drop the leading row name, if there is one
    if(withRownames){
      arr=arr[-1]
    }
    arr2=as.numeric(arr)
    # line k of a lower triangle holds columns 1..k of row k
    mat[k, seq_along(arr2)]=arr2
  }
  # step3: mirror the lower triangle into the upper triangle
  mat[upper.tri(mat)]=t(mat)[upper.tri(mat)]
  # step4: use the first field of every line as the row name
  if(withRownames){
    rownames(mat)=vapply(text.list, function(x){
      strsplit(trimws(x), " +")[[1]][1]
    }, character(1), USE.NAMES=FALSE)
  }
  return(mat)
}
Test results:
> read_lower_triangle("dustbin/test.data.txt", T)
   [,1] [,2] [,3] [,4]
a1    1    2    5    2
a2    2    5    6    3
a3    5    6    8    5
a4    2    3    5    6
> read_lower_triangle("dustbin/test2.data.txt")
     [,1] [,2] [,3] [,4] [,5]
[1,]    8    1    2    3    4
[2,]    1    1    2    5    2
[3,]    2    2    5    6    3
[4,]    3    5    6    8    5
[5,]    4    2    3    5    6
3. Reader function v2
- https://bbs.pinggu.org/thread-3077486-1-1.html
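read.table("dustbin/test2.data.txt", sep = " ", fill=T)
can read a numbers-only lower-triangular matrix; fill=T pads the short rows, and the padded cells come back as NA. Once those NA cells are set to 0,
R + t(R) - diag(R)
fills in the empty upper triangle: adding the transpose mirrors the lower triangle, and the diagonal, which is then counted twice, is subtracted once. This is shorthand, though, and still has to be turned into valid R: diag(R) returns a vector there, so the diagonal matrix is written diag(diag(R)).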
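The fill-in formula can be checked by hand on the 4 x 4 example from test.data.txt. The sketch below assumes the upper triangle has already been set to 0 and writes the formula in valid R as R + t(R) - diag(diag(R)); the names R and S are only illustrative.

```r
# Lower triangle of the 4 x 4 example; the empty upper cells are 0
R <- matrix(c(1, 0, 0, 0,
              2, 5, 0, 0,
              5, 6, 8, 0,
              2, 3, 5, 6), nrow = 4, byrow = TRUE)

# t(R) supplies the upper triangle; the diagonal appears in both R and
# t(R), so it is subtracted once. diag(R) extracts the diagonal as a
# vector and the outer diag() turns it back into a diagonal matrix.
S <- R + t(R) - diag(diag(R))
print(S)
```

The result has the same values as the symmetric matrix shown in the test output of v1.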
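#' Read a lower-triangular matrix file via read.table
#' (the new method only applies to numbers-only files)
#'
#' @param fileName path to a space-separated text file, one matrix row per line
#' @param withRownames TRUE if the first field of every line is a row name
#'
#' @return a symmetric numeric matrix
#' @export
#'
#' @examples
#' read_lower_triangle_v2("dustbin/test.data.txt", TRUE)
#' read_lower_triangle_v2("dustbin/test2.data.txt")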
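read_lower_triangle_v2=function(fileName, withRownames=FALSE){
  if(!withRownames){ # numbers only: let read.table pad the short rows
    message("new method")
    # read.table sizes the table from the first five lines only, so pass
    # col.names as wide as the longest line in the file
    n=max(count.fields(fileName, sep = " "))
    mat=read.table(fileName, sep = " ", fill=TRUE, col.names=paste0("V", seq_len(n)))
    mat[is.na(mat)] <- 0  # padded upper-triangle cells become 0
    mat=as.matrix(mat)
    mat=mat + t(mat) - diag( diag(mat) )  # the diagonal was added twice
  }else{ # with row names: fall back to v1
    mat=read_lower_triangle(fileName, withRownames)
  }
  return(mat)
}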
Files that are not purely numeric, i.e. those with row names, are still read with the previous version.
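One caveat with the read.table approach: according to the read.table documentation, the number of columns is taken from the first five lines of input unless col.names is given and is longer. A lower-triangular file with more than five rows therefore needs some help. The sketch below is one way to handle it, using a hypothetical 6 x 6 file; the names path, n.col, tab and m are illustrative.

```r
# Hypothetical 6 x 6 lower triangle, written to a temporary file
path <- tempfile(fileext = ".txt")
writeLines(c("1", "2 3", "4 5 6", "7 8 9 10",
             "11 12 13 14 15", "16 17 18 19 20 21"), path)

# The widest line in the file gives the true number of columns
n.col <- max(count.fields(path, sep = " "))

# Passing col.names of that length makes read.table allocate n.col columns
tab <- read.table(path, sep = " ", fill = TRUE,
                  col.names = paste0("V", seq_len(n.col)))
tab[is.na(tab)] <- 0             # padded cells become 0
m <- as.matrix(tab)
m <- m + t(m) - diag(diag(m))    # mirror, then remove the doubled diagonal
dimnames(m) <- NULL
print(dim(m))
print(m)
```

For files with five rows or fewer, such as test2.data.txt, the extra col.names argument should make no difference.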
The test results match those above:
> read_lower_triangle_v2("dustbin/test.data.txt", T)
   [,1] [,2] [,3] [,4]
a1    1    2    5    2
a2    2    5    6    3
a3    5    6    8    5
a4    2    3    5    6
> read_lower_triangle_v2("dustbin/test2.data.txt")
new method
     V1 V2 V3 V4 V5
[1,]  8  1  2  3  4
[2,]  1  1  2  5  2
[3,]  2  2  5  6  3
[4,]  3  5  6  8  5
[5,]  4  2  3  5  6
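For numbers-only files, the same result can probably be reached without any per-line parsing. The following is a sketch of an alternative that is not part of the two versions above: it reads all values with scan() and relies on the fact that the row-wise order of a lower triangle is the column-wise order of the upper triangle. The file path and variable names are assumptions made for the example.

```r
# Recreate the numbers-only example in a temporary file
path <- tempfile(fileext = ".txt")
writeLines(c("8", "1 1", "2 2 5", "3 5 6 8", "4 2 3 5 6"), path)

# scan() ignores line breaks and returns all values in reading order
vals <- scan(path, quiet = TRUE)

# A lower triangle with n rows holds n * (n + 1) / 2 values; solve for n
n <- (sqrt(8 * length(vals) + 1) - 1) / 2
stopifnot(n == round(n))

# R fills matrices column by column, so the values read row by row from
# the lower triangle land in the right cells of the upper triangle
mat <- matrix(0, nrow = n, ncol = n)
mat[upper.tri(mat, diag = TRUE)] <- vals
# Mirror the upper triangle back into the lower one
mat[lower.tri(mat)] <- t(mat)[lower.tri(mat)]
print(mat)
```

This only works when the file contains nothing but the triangle's values, so files with row names still need one of the functions above.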
== End ==