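Compiled by 探序基因肿瘤研究院
1. Export the expression matrix from the Python single-cell transcriptome data structure to a txt file:
In Python, single-cell transcriptome data (an AnnData object) is saved in the h5ad format.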
import scanpy as sc
import numpy as np
import sys
h5adfile = sys.argv[1]  # path to the input .h5ad file
workdir = sys.argv[2]  # existing directory that receives the output files
adata = sc.read_h5ad(h5adfile)
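# adata.X may be a scipy sparse matrix or already a dense array; handle both
matrix = adata.X.toarray() if hasattr(adata.X, "toarray") else np.asarray(adata.X)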
gene_names = np.array(adata.var_names)
cell_name = np.array(adata.obs_names)
matfile = "%s/expression_matrix.csv" % (workdir)  # tab-delimited despite the .csv extension
genefile = "%s/gene.txt" % (workdir)
ctfile = "%s/cellname.txt" % (workdir)
np.savetxt(matfile, matrix, delimiter='\t',fmt='%6.2f')
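# one gene name per line, in the column order of the matrix
with open(genefile, 'w') as file:
    for i in gene_names:
        file.write(i + '\n')
# one cell name per line, in the row order of the matrix
with open(ctfile, 'w') as file:
    for i in cell_name:
        file.write(i + '\n')
Note: fmt='%6.2f' writes each value with only two digits after the decimal point (padded to a minimum width of 6 characters), so finer precision is lost in the exported file.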
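To make the effect of fmt='%6.2f' concrete, here is a minimal sketch that writes a small matrix to an in-memory buffer and reads it back. The 2-cell by 3-gene matrix `demo` is a made-up stand-in for adata.X, not data from the original script.

```python
import io
import numpy as np

# Hypothetical 2-cell x 3-gene matrix standing in for adata.X
demo = np.array([[0.0, 1.23456, 10.5],
                 [3.14159, 0.0, 123.456]])

# Write with the same delimiter and format as the export script
buf = io.StringIO()
np.savetxt(buf, demo, delimiter="\t", fmt="%6.2f")
text = buf.getvalue()

# Each field is right-aligned to a minimum width of 6 with 2 decimals
fields = text.splitlines()[0].split("\t")

# Reading the text back shows how much precision the rounding discards
reloaded = np.loadtxt(io.StringIO(text), delimiter="\t")
max_error = np.abs(reloaded - demo).max()
print(fields, max_error)
```

If two decimals are too coarse for the downstream analysis, a format with more digits such as fmt='%.6f' is one option, at the cost of a larger file.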
2. Read the txt files into R as a matrix
Mat <- read.table("expression_matrix.csv",sep="\t") # Mind what the rows and columns mean: the Python export usually has cells as rows and genes as columns
gene <- readLines(con="/xxx/gene.txt") # gene names written by the Python script
cell <- readLines(con="/xxx/cellname.txt") # cell names written by the Python script
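library(Matrix) # provides the sparse matrix classes
Mat <- apply(Mat,2,as.numeric)
Mat <- as(Mat,"CsparseMatrix") # convert to sparse form (a dgCMatrix for numeric data)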
rownames(Mat) <- cell
colnames(Mat) <- gene
Mat <- t(Mat) # transpose to genes as rows and cells as columns, the orientation Seurat expects
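Once you have the matrix Mat, follow the Seurat workflow: create a Seurat object from it and continue with the downstream analysis.
3. Export a gene expression matrix from R to a txt file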
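Because a mismatch between the matrix orientation and the name files is easy to miss, the exported files can be sanity-checked on the Python side before moving to R. The following is only a sketch: the helper name `load_export` and the tiny demo data are assumptions for illustration, while the file names follow the export script in step 1.

```python
import os
import tempfile
import numpy as np

def load_export(workdir):
    """Read the three exported files; return (genes x cells matrix, genes, cells)."""
    mat = np.loadtxt(os.path.join(workdir, "expression_matrix.csv"),
                     delimiter="\t", ndmin=2)
    with open(os.path.join(workdir, "gene.txt")) as fh:
        genes = fh.read().splitlines()
    with open(os.path.join(workdir, "cellname.txt")) as fh:
        cells = fh.read().splitlines()
    # The export is cells x genes, so the shape must match the name lists
    if mat.shape != (len(cells), len(genes)):
        raise ValueError("matrix shape %s does not match %d cells x %d genes"
                         % (mat.shape, len(cells), len(genes)))
    # Transpose to genes x cells, mirroring Mat <- t(Mat) in R
    return mat.T, genes, cells

# Demo with made-up data: 2 cells, 3 genes
with tempfile.TemporaryDirectory() as tmp:
    np.savetxt(os.path.join(tmp, "expression_matrix.csv"),
               np.array([[0.0, 1.0, 2.0], [3.0, 0.0, 4.0]]),
               delimiter="\t", fmt="%6.2f")
    with open(os.path.join(tmp, "gene.txt"), "w") as fh:
        fh.write("GeneA\nGeneB\nGeneC\n")
    with open(os.path.join(tmp, "cellname.txt"), "w") as fh:
        fh.write("cell1\ncell2\n")
    gc_mat, genes, cells = load_export(tmp)

print(gc_mat.shape, genes, cells)
```

The check fails loudly when the name files and the matrix disagree, which is preferable to silently attaching gene names to the cell axis.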
# GCMat is a gene-by-cell expression matrix; a sparse dgCMatrix may need as.matrix(GCMat) first
write.table(GCMat,file="/xxx/xxx.GCMat.txt",row.names = TRUE,col.names = TRUE,sep = "\t",quote = FALSE)
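With row.names = TRUE and col.names = TRUE, write.table omits the top-left corner cell, so the header line has one field fewer than each data row. If the file is later read back in Python, the parser has to account for that. A minimal sketch follows, where the function name `read_r_table` and the sample gene and cell names are hypothetical:

```python
import io
import numpy as np

def read_r_table(handle):
    """Parse a tab-separated matrix written by R's write.table with row and column names."""
    lines = [ln.rstrip("\n") for ln in handle if ln.strip()]
    header = lines[0].split("\t")            # column names only, no corner cell
    rows = [ln.split("\t") for ln in lines[1:]]
    # Every data row carries its row name plus one value per column
    if any(len(r) != len(header) + 1 for r in rows):
        raise ValueError("header and data rows do not line up")
    row_names = [r[0] for r in rows]
    values = np.array([[float(v) for v in r[1:]] for r in rows])
    return row_names, header, values

# Made-up content in the layout write.table produces
sample = "cellA\tcellB\nGene1\t0\t1.5\nGene2\t2\t0\n"
genes, cells, values = read_r_table(io.StringIO(sample))
print(genes, cells, values.shape)
```

For a real file, pass an open file handle instead of the in-memory sample.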