Scanpy study notes: analyzing single-cell data with Python

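This post shows how to analyze single-cell data with the Python library Scanpy, covering installation, data loading, preprocessing, visualization, clustering, and finding marker genes. A worked example on the 10X PBMC dataset walks through preprocessing, PCA, Leiden clustering, and differential gene analysis.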


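Scanpy is a Python package for analyzing single-cell data, covering preprocessing, visualization, clustering, pseudotime analysis, and differential expression analysis. This post is a translation of the official scanpy tutorial Preprocessing and clustering 3k PBMCs[1], which uses scanpy to reproduce most of the Seurat clustering tutorial[2].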

0. Installing scanpy

Anaconda

# scanpy
conda install -c bioconda scanpy
# Leiden clustering package
conda install -c conda-forge leidenalg

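Installing scanpy this way failed with an error. I spent a long time on it without success, and rebuilding the environment did not help either.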

conda install -c bioconda scanpy
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment:
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
failed

UnsatisfiableError: The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions

PyPI

Installing directly with pip succeeded.

pip install scanpy[louvain]

Docker

docker pull fastgenomics/scanpy:1.4-p368-v1-stretch-slim
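Whichever installation route is used, it can help to confirm that the packages are visible to the Python interpreter before going further. The following is a minimal sketch using only the standard library. The helper name `check_packages` is hypothetical, and the import names `scanpy`, `leidenalg`, and `louvain` are assumed to match the packages installed above.

```python
import importlib.util


def check_packages(names):
    """Return {name: bool} telling whether each top-level module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Assumed import names: scanpy, leidenalg (Leiden clustering),
# louvain (pulled in by the scanpy[louvain] extra).
status = check_packages(["scanpy", "leidenalg", "louvain"])
for name, found in status.items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

A package reported as missing here points to the install having gone into a different environment than the one running the notebook.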

1. Loading the data

# Download the PBMC dataset
## This is the same example dataset used in the Seurat tutorial; if you already have it, there is no need to download it again
!mkdir data
!wget http://cf.10xgenomics.com/samples/cell-exp/1.1.0/pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz -O data/pbmc3k_filtered_gene_bc_matrices.tar.gz
!cd data; tar -xzf pbmc3k_filtered_gene_bc_matrices.tar.gz
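The shell commands above rely on `wget` and `tar` being available. Where they are not (for example on Windows), the same download and extraction can be sketched in pure Python with the standard library. The helper names `fetch` and `extract` are hypothetical, and this is only one possible way to do it.

```python
import tarfile
import urllib.request
from pathlib import Path

URL = ("http://cf.10xgenomics.com/samples/cell-exp/1.1.0/pbmc3k/"
       "pbmc3k_filtered_gene_bc_matrices.tar.gz")


def fetch(url, dest):
    """Download url to dest unless dest already exists; return dest as a Path."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    if not dest.exists():  # skip the download if the file is already there
        urllib.request.urlretrieve(url, str(dest))
    return dest


def extract(archive, out_dir):
    """Unpack a .tar.gz archive into out_dir and return the member names."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(archive), "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(str(out_dir))
    return names


# Usage (requires network access):
# archive = fetch(URL, "data/pbmc3k_filtered_gene_bc_matrices.tar.gz")
# extract(archive, "data")
```

Skipping the download when the archive already exists mirrors the note above about not downloading the dataset twice.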

import numpy as np
import pandas as pd
import scanpy as sc

/Users/baozhiwei/anaconda3/lib/python3.7/site-packages/anndata/core/anndata.py:17: FutureWarning: pandas.core.index is deprecated and will be removed in a future version. The public classes are available in the top-level namespace.
  from pandas.core.index import RangeIndex

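# verbosity controls how detailed the logged output is; the larger the number, the more detail
## errors (0), warnings (1), info (2), hints (3)
sc.settings.verbosity = 3
# Print version numbers
sc.logging.print_versions()
# set_figure_params sets the figure resolution/size and other style options
sc.settings.set_figure_params(dpi=80)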

scanpy==1.4.5.post3 anndata==0.6.22.post1 umap==0.3.10 numpy==1.18.1 scipy==1.4.1 pandas==1.0.1 scikit-learn==0.22.1 statsmodels==0.11.0 python-igraph==0.7.1

# Path where the results file will be saved
results_file = './pbmc3k.h5ad'

# Load the 10X data
adata = sc.read_10x_mtx(
    './data/filtered_gene_bc_matrices/hg19/',  # directory containing the `.mtx` file
    var_names='gene_symbols',                  # use gene symbols as variable names (variables-axis index)
    cache=True)                                # write a cache file so later reads are faster

... writing an h5ad cache file to speedup reading next time
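If `sc.read_10x_mtx` fails, a common cause is that the directory passed to it does not contain the extracted files. The sketch below checks for them before loading. It assumes the file layout written by older Cell Ranger versions (`matrix.mtx`, `genes.tsv`, `barcodes.tsv`), which is what this dataset is expected to contain. The helper `missing_10x_files` is hypothetical and not part of scanpy.

```python
from pathlib import Path

# Assumed layout for this dataset (older Cell Ranger output); newer outputs
# use different, gzipped file names.
LEGACY_FILES = ("matrix.mtx", "genes.tsv", "barcodes.tsv")


def missing_10x_files(directory, expected=LEGACY_FILES):
    """Return the expected file names that are absent from directory."""
    directory = Path(directory)
    return [name for name in expected if not (directory / name).is_file()]


missing = missing_10x_files("./data/filtered_gene_bc_matrices/hg19/")
if missing:
    print("missing files:", ", ".join(missing))
else:
    print("directory looks complete")
```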