Compiled by 探序基因肿瘤研究院
Reference manual for the Python version:
Welcome to velocyto.py! — velocyto 0.17.16 documentation
Command to run:
velocyto run10x [OPTIONS] SAMPLEFOLDER GTFFILE
Example: velocyto run10x -@ 6 /xxx/test-sample /xxxx/genes.gtf
Parameter description:
Runs the velocity analysis for a Chromium 10X Sample
10XSAMPLEFOLDER specifies the cellranger sample folder
GTFFILE genome annotation file
Options:
-s, --metadatatable FILE Table containing metadata of the various samples (csv formatted, rows are samples and cols are entries)
-m, --mask FILE .gtf file containing intervals to mask
-l, --logic TEXT The logic to use for the filtering (default: Default)
-M, --multimap Consider not unique mappings (not recommended)
-@, --samtools-threads INTEGER The number of threads to use to sort the bam by cellID file using samtools
--samtools-memory INTEGER The number of MB used for every thread by samtools to sort the bam file
-t, --dtype TEXT The dtype of the loom file layers - if more than 6000 molecules/reads per gene per cell are expected set uint32 to avoid truncation (default run_10x: uint16)
-d, --dump TEXT For debugging purposes only: it will dump a molecular mapping report to hdf5. --dump N, saves a cell every N cells. If p is prepended a more complete (but huge) pickle report is printed (default: 0)
-v, --verbose Set the verbosity level: -v (only warnings) -vv (warnings and info) -vvv (warnings, info and debug)
--help Show this message and exit.
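For scripted runs, the command line can be assembled programmatically. The following is a minimal sketch, assuming velocyto is installed and on the PATH. The helper names build_run10x_command and run_velocyto and their parameters are hypothetical; the flags (-@, --samtools-memory, -m, -t) are the ones from the option list above.

```python
import subprocess
from typing import List, Optional


def build_run10x_command(
    sample_folder: str,
    gtf_file: str,
    threads: Optional[int] = None,
    samtools_memory_mb: Optional[int] = None,
    mask_gtf: Optional[str] = None,
    dtype: Optional[str] = None,
) -> List[str]:
    """Assemble a `velocyto run10x` argument list (hypothetical helper)."""
    cmd = ["velocyto", "run10x"]
    if threads is not None:
        cmd += ["-@", str(threads)]
    if samtools_memory_mb is not None:
        cmd += ["--samtools-memory", str(samtools_memory_mb)]
    if mask_gtf is not None:
        cmd += ["-m", mask_gtf]
    if dtype is not None:
        cmd += ["-t", dtype]
    # Positional arguments come last: SAMPLEFOLDER, then GTFFILE.
    cmd += [sample_folder, gtf_file]
    return cmd


def run_velocyto(cmd: List[str]) -> None:
    """Run the assembled command; raises CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Same invocation as the example in the text; the paths are placeholders.
    print(" ".join(build_run10x_command("/xxx/test-sample", "/xxxx/genes.gtf", threads=6)))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when a sample folder name contains spaces.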
About the required input files
Looking at /root/miniconda3/lib/python3.12/site-packages/velocyto/commands/run10x.py shows:
bamfile = os.path.join(samplefolder, "outs", "possorted_genome_bam.bam")
bcmatches = glob.glob(os.path.join(samplefolder, os.path.normcase("outs/filtered_gene_bc_matrices/*/barcodes.tsv")))
if len(bcmatches) == 0:
    bcmatches = glob.glob(os.path.join(samplefolder, os.path.normcase("outs/filtered_feature_bc_matrix/barcodes.tsv.gz")))
if len(bcmatches) == 0:
    logging.error("Can not locate the barcodes.tsv file!")
bcfile = bcmatches[0]
outputfolder = os.path.join(samplefolder, "velocyto")
sampleid = os.path.basename(samplefolder.rstrip("/").rstrip("\\"))
assert not os.path.exists(os.path.join(outputfolder, f"{sampleid}.loom")), "The output already exist. Aborted!"
additional_ca = {}
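From this, the run needs the possorted_genome_bam.bam file under the outs folder, plus a barcodes file: barcodes.tsv.gz under outs/filtered_feature_bc_matrix. The code first tries outs/filtered_gene_bc_matrices/*/barcodes.tsv and falls back to the gzipped file only if nothing matches.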
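Since a missing file only surfaces once velocyto is already running, it can be convenient to check the inputs beforehand. The sketch below mirrors the lookup order of the run10x.py excerpt; the function name find_run10x_inputs is hypothetical, and raising FileNotFoundError is a choice made here, not what velocyto itself does.

```python
import glob
import os
from typing import Tuple


def find_run10x_inputs(samplefolder: str) -> Tuple[str, str]:
    """Return (bam_path, barcodes_path) or raise FileNotFoundError."""
    bamfile = os.path.join(samplefolder, "outs", "possorted_genome_bam.bam")
    if not os.path.isfile(bamfile):
        raise FileNotFoundError(f"Missing BAM file: {bamfile}")

    # Same order as the excerpt: plain barcodes.tsv first, then the gzipped file.
    patterns = [
        os.path.join(samplefolder, "outs", "filtered_gene_bc_matrices", "*", "barcodes.tsv"),
        os.path.join(samplefolder, "outs", "filtered_feature_bc_matrix", "barcodes.tsv.gz"),
    ]
    for pattern in patterns:
        # Sorting makes the choice deterministic if several genomes match.
        matches = sorted(glob.glob(pattern))
        if matches:
            return bamfile, matches[0]
    raise FileNotFoundError(f"No barcodes file found under {os.path.join(samplefolder, 'outs')}")
```

Calling find_run10x_inputs("/xxx/test-sample") before launching the run either returns both paths or fails immediately with a message naming the missing file.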
After the run finishes, a velocyto folder containing the loom file is created under /xxx/test-sample.
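The run10x.py excerpt also shows how the output path is derived and that the run aborts if the loom file already exists. A small sketch of that logic follows; the function names expected_loom_path and can_run are hypothetical.

```python
import os


def expected_loom_path(samplefolder: str) -> str:
    """Mirror the excerpt: <samplefolder>/velocyto/<sampleid>.loom."""
    # The sample ID is the last path component, ignoring trailing separators.
    sampleid = os.path.basename(samplefolder.rstrip("/").rstrip("\\"))
    return os.path.join(samplefolder, "velocyto", f"{sampleid}.loom")


def can_run(samplefolder: str) -> bool:
    """False if the loom file already exists, in which case run10x would abort."""
    return not os.path.exists(expected_loom_path(samplefolder))


if __name__ == "__main__":
    print(expected_loom_path("/xxx/test-sample"))
```

If can_run returns False, move or delete the existing loom file before rerunning.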
Installing velocyto.R in R:
library(devtools)
install_github("velocyto-team/velocyto.R")
This fails with: ERROR: dependency ‘pcaMethods’ is not available for package ‘velocyto.R’
So install the pcaMethods package first, from Bioconductor: Bioconductor - pcaMethods
Then retry installing velocyto.R, which fails with a new error:
/usr/bin/ld: cannot find -lboost_filesystem
/usr/bin/ld: cannot find -lboost_system
collect2: error: ld returned 1 exit status
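To see whether the two Boost libraries named in the error are visible on the system at all, a rough check with the standard library's ctypes.util.find_library can help. This is only a sketch: the function name locate_boost_libraries is hypothetical, and find_library does not search exactly the same locations as the linker does at build time, so treat the result as a hint rather than proof.

```python
from ctypes.util import find_library
from typing import Dict, Optional


def locate_boost_libraries() -> Dict[str, Optional[str]]:
    """Map each library name from the linker error to what find_library reports."""
    names = ("boost_filesystem", "boost_system")
    # find_library returns a library file name, or None if nothing was found.
    return {name: find_library(name) for name in names}


if __name__ == "__main__":
    for name, found in locate_boost_libraries().items():
        print(f"{name}: {found if found else 'not found'}")
```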
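So install the Boost libraries: download the source archive boost_1_87_0.tar.gz from Boost C++ Libraries and install it. The whole process is quick and uncomplicated.
The steps, run from inside the extracted source directory, are: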
./bootstrap.sh
./b2 install --prefix=/usr/local
Finally, set the environment variables, either for your own user (vi ~/.bashrc) or system-wide (vi /etc/profile):
export BOOST_ROOT=/usr/local
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
After reloading the environment variables (for example with source ~/.bashrc), velocyto.R installed successfully.
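If the installation still fails after editing the profile, it is worth confirming that the new values are actually visible in the current session. The sketch below checks the two variables set above; the helper name dir_in_path_list is hypothetical.

```python
import os
from typing import Optional


def dir_in_path_list(directory: str, path_list: Optional[str]) -> bool:
    """True if `directory` is one of the os.pathsep-separated entries."""
    if not path_list:
        return False
    target = os.path.normpath(directory)
    # Skip empty entries, which appear when the variable was previously unset.
    return any(os.path.normpath(p) == target for p in path_list.split(os.pathsep) if p)


if __name__ == "__main__":
    print("BOOST_ROOT =", os.environ.get("BOOST_ROOT"))
    print(
        "/usr/local/lib in LD_LIBRARY_PATH:",
        dir_in_path_list("/usr/local/lib", os.environ.get("LD_LIBRARY_PATH")),
    )
```

Run it from the same shell that will start R, since a terminal opened before the profile was edited does not pick up the new values.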