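1. Input: three columns
Only the first three columns are used: the first column gives the row name, the second the column name, and the third the value.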
> head(df.net2.order)
from to strength type
12439 CSTF2 ENST0000056844 -0.6859788 neg
12015 CSTF2 ENST0000056190 -0.5153181 neg
11208 CSTF2 GAPDH -0.4570489 neg
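2. Output: a data frame
Rows are gene regulators, columns are expressed genes or transcripts, and values are correlation coefficients.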
> df.net2.df=df3toMatrix(df.net2.order)
> dim(df.net2.df)
[1] 27 4022
> df.net2.df[df.net2.df==0]=NA
> df.net2.df[1:4,1:5]
ENST0000056844 ENST0000056190 GAPDH ENST0000063431 ARL6
CSTF2 -0.6859788 -0.5153181 -0.4570489 -0.4380417 -0.4351847
NUDT21 NA -0.4719560 -0.4080007 NA -0.4125685
CPSF3 -0.4883905 -0.3955025 -0.4318929 NA -0.4517824
CPSF1 NA -0.3722944 -0.3625508 NA -0.3016818
3. Conversion function (very slow)
# from 3 columns to matrix: col1-row, col2-col, col3-value
df3toMatrix=function(df3){
  rows.id=df3[,1] |> unique()
  cols.id=df3[,2] |> unique()
  # Start from an all-zero data frame; pairs absent from df3 stay 0
  output=data.frame(matrix(0, nrow=length(rows.id), ncol=length(cols.id)))
  rownames(output)=rows.id
  colnames(output)=cols.id
  # Fill one cell per input row; this per-element assignment is the bottleneck
  for(i in seq_len(nrow(df3))){
    output[df3[i, 1], df3[i, 2]]=df3[i,3]
  }
  print(dim(output))
  output
}
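The loop above is slow mainly because it assigns into a data frame one cell at a time, once per input row. A minimal sketch of a vectorized base-R alternative follows, assuming the (row, column) pairs in the input are unique. The name `df3toMatrixVec` and the toy data are hypothetical and not part of the original post.

```r
# Hypothetical helper (not from the original post): fills a plain numeric
# matrix in a single vectorized assignment instead of a per-row loop.
df3toMatrixVec <- function(df3) {
  row_keys <- as.character(df3[[1]])
  col_keys <- as.character(df3[[2]])
  rows.id <- unique(row_keys)
  cols.id <- unique(col_keys)
  output <- matrix(0, nrow = length(rows.id), ncol = length(cols.id),
                   dimnames = list(rows.id, cols.id))
  # A two-column integer matrix addresses (row, column) pairs element-wise
  idx <- cbind(match(row_keys, rows.id), match(col_keys, cols.id))
  output[idx] <- df3[[3]]
  output
}

# Toy input, made up for illustration
toy <- data.frame(from = c("A", "A", "B"),
                  to = c("x", "y", "y"),
                  strength = c(0.5, -0.2, 0.9),
                  stringsAsFactors = FALSE)
m <- df3toMatrixVec(toy)
print(m)
```

This returns a base matrix rather than a data frame; wrap the result in `as.data.frame()` if a data frame is needed downstream.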
4. Conversion function (using the sparse matrix Matrix::sparseMatrix, very fast)
Reference: https://blog.youkuaiyun.com/wangjunliang/article/details/126709378
#' From 3 columns to matrix: col1-row, col2-col, col3-value
#'
#' V2: use sparse matrix to be faster
#'
#' @param df3 a data frame with 3 columns; the 3rd column must be the value
#'
#' @return a sparse matrix with rownames(col1) and colnames(col2)
#' @export
#'
#' @examples
df3toMatrix2=function(df3){
  rows.id=df3[,1] |> unique() # row names
  cols.id=df3[,2] |> unique() # column names
  # match() turns each name into its row/column index
  output = Matrix::sparseMatrix(i=match(df3[,1], rows.id),
                                j=match(df3[,2], cols.id),
                                x=df3[,3])
  rownames(output)=rows.id
  colnames(output)=cols.id
  print(dim(output))
  output
}
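A small usage sketch on made-up toy data may help here, assuming the Matrix package is installed. It inlines the same `sparseMatrix()` call, converts the result to a dense matrix with `as.matrix()`, and shows one behavioral difference worth knowing: `sparseMatrix()` adds up the values of repeated (i, j) pairs, whereas the loop in section 3 keeps only the last one.

```r
library(Matrix)

# Toy input, made up for illustration; the pair (A, x) appears twice
toy <- data.frame(from = c("A", "A", "B", "A"),
                  to = c("x", "y", "y", "x"),
                  strength = c(0.5, -0.2, 0.9, 0.25),
                  stringsAsFactors = FALSE)

rows.id <- unique(toy$from)
cols.id <- unique(toy$to)
sp <- sparseMatrix(i = match(toy$from, rows.id),
                   j = match(toy$to, cols.id),
                   x = toy$strength,
                   dims = c(length(rows.id), length(cols.id)),
                   dimnames = list(rows.id, cols.id))
print(sp)

# Convert to an ordinary dense matrix when a downstream function needs one
dense <- as.matrix(sp)
print(dense["A", "x"])  # the two (A, x) values, 0.5 and 0.25, were summed
```

If the input may contain repeated pairs, deduplicating it first (for example with `duplicated()` on the first two columns) keeps both implementations in agreement.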
5. Performance comparison: the Matrix package's sparse matrix is far faster
Input data:
> head(all.eqtl_sig[, c(2, 1, 6)])
Exp APA beta
1 ANGPTL7 SLC35D2 -0.9441904
2 RALYL CD52 -1.3428530
3 GABRA4 SLC35D2 -1.1038045
5 CPNE7 THAP7-AS1 1.4633570
6 SYT3 UBE3D -0.8403139
10 SYT3 ZNF140 -0.7900230
> dim(all.eqtl_sig[, c(2, 1, 6)])
[1] 450933 3
The implementations in sections 3 and 4 differ in elapsed time by more than 1,000-fold (107.756 s vs. 0.080 s).
system.time({
dat.htmap0 = df3toMatrix(all.eqtl_sig[, c(2, 1, 6)])
})
#[1] 12651 3492
# user system elapsed
#100.692 6.903 107.756
system.time({
dat.htmap = df3toMatrix2(all.eqtl_sig[, c(2, 1, 6)])
})
#[1] 12651 3492
# user system elapsed
# 0.075 0.005 0.080
The outputs are identical (the sparse matrix prints zeros as "."):
> dat.htmap0[1:4, 1:5]
SLC35D2 CD52 THAP7-AS1 UBE3D ZNF140
ANGPTL7 -0.9441904 0.000000 0.000000 0 0
RALYL 0.0000000 -1.342853 0.000000 0 0
GABRA4 -1.1038045 0.000000 0.000000 0 0
CPNE7 0.0000000 0.000000 1.463357 0 0
> dat.htmap[1:4, 1:5]
4 x 5 sparse Matrix of class "dgCMatrix"
SLC35D2 CD52 THAP7-AS1 UBE3D ZNF140
ANGPTL7 -0.9441904 . . . .
RALYL . -1.342853 . . .
GABRA4 -1.1038045 . . . .
CPNE7 . . 1.463357 . .
> table(abs(as.matrix(dat.htmap0) - as.matrix(dat.htmap)) < 1e-10)
TRUE
44177292
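To compare a dense result with a sparse one, both can first be converted to base matrices. A minimal sketch on made-up toy data with unique (row, column) pairs, assuming the Matrix package is installed; the variable names are illustrative only:

```r
library(Matrix)

# Toy input, made up for illustration (unique pairs only)
toy <- data.frame(Exp = c("g1", "g2", "g3", "g2"),
                  APA = c("a1", "a2", "a1", "a3"),
                  beta = c(-0.9, -1.3, -1.1, 1.4),
                  stringsAsFactors = FALSE)
rows.id <- unique(toy$Exp)
cols.id <- unique(toy$APA)

# Dense result, filled row by row as in section 3
dense <- data.frame(matrix(0, nrow = length(rows.id), ncol = length(cols.id)))
rownames(dense) <- rows.id
colnames(dense) <- cols.id
for (k in seq_len(nrow(toy))) {
  dense[toy$Exp[k], toy$APA[k]] <- toy$beta[k]
}

# Sparse result, built as in section 4
sparse <- sparseMatrix(i = match(toy$Exp, rows.id),
                       j = match(toy$APA, cols.id),
                       x = toy$beta,
                       dimnames = list(rows.id, cols.id))

# Compare element-wise after converting both to base matrices
max_diff <- max(abs(as.matrix(dense) - as.matrix(sparse)))
same <- max_diff < 1e-10
print(same)
```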
ref
- R语言稀疏矩阵详解 (A detailed guide to sparse matrices in R) https://blog.youkuaiyun.com/jeffery0207/article/details/122507934