Tutorial: analyzing single-cell data in Python with scanpy (a first look)

License: CC 4.0 BY-SA

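This article walks through analyzing single-cell RNA sequencing (scRNA-seq) data with the Python library Scanpy. It covers data preprocessing, principal component analysis, clustering, and marker gene identification.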

Analyzing single-cell data in Python with scanpy

Data download
Analysis workflow
References

Data download

# !mkdir data
# !wget http://cf.10xgenomics.com/samples/cell-exp/1.1.0/pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz -O data/pbmc3k_filtered_gene_bc_matrices.tar.gz
# !cd data; tar -xzf pbmc3k_filtered_gene_bc_matrices.tar.gz
# !mkdir write
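
The notebook shell commands above can also be run from plain Python. The following is a minimal sketch using only the standard library; the helper names `extract_archive` and `prepare_pbmc3k` are hypothetical and not part of Scanpy, and the URL is the one from the `wget` line above.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL copied from the wget command above.
PBMC3K_URL = (
    "http://cf.10xgenomics.com/samples/cell-exp/1.1.0/pbmc3k/"
    "pbmc3k_filtered_gene_bc_matrices.tar.gz"
)


def extract_archive(archive_path, dest_dir):
    """Unpack a .tar.gz archive into dest_dir (the equivalent of `tar -xzf`)."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(archive_path), "r:gz") as tar:
        tar.extractall(str(dest))
    return dest


def prepare_pbmc3k(data_dir="data", write_dir="write", url=PBMC3K_URL):
    """Create data/ and write/, download the archive once, and unpack it."""
    data = Path(data_dir)
    data.mkdir(parents=True, exist_ok=True)
    Path(write_dir).mkdir(parents=True, exist_ok=True)
    archive = data / "pbmc3k_filtered_gene_bc_matrices.tar.gz"
    if not archive.exists():  # skip the download if the file is already there
        urllib.request.urlretrieve(url, str(archive))
    extract_archive(archive, data)
    # Directory layout assumed from the path passed to sc.read_10x_mtx later on.
    return data / "filtered_gene_bc_matrices" / "hg19"


# Usage (requires network access):
# hg19_dir = prepare_pbmc3k()
```

Calling `prepare_pbmc3k()` a second time reuses the archive already on disk and only repeats the extraction.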

Analysis workflow

Scanpy groups its functions into modules with short names:
Preprocessing: pp
Tools: tl
Plotting: pl

# -*- coding: utf-8 -*-
"""
Created on Thu Mar 18 14:15:49 2021

@author: dujidan
"""


import os
import numpy as np
import pandas as pd
import scanpy as sc


sc.settings.verbosity = 3             # verbosity: errors (0), warnings (1), info (2), hints (3)
sc.logging.print_header()
sc.settings.set_figure_params(dpi=80, facecolor='white')

results_file = 'write/pbmc3k.h5ad'  # the file that will store the analysis results


adata = sc.read_10x_mtx(
    './data/filtered_gene_bc_matrices/hg19/',  # the directory with the `.mtx` file
    var_names='gene_symbols',                # use gene symbols for the variable names (variables-axis index)
    cache=True)
adata.var_names_make_unique()  # this is unnecessary if using `var_names='gene_ids'` in `sc.read_10x_mtx`
adata
'''
Step 1: data preprocessing
'''
# =============================================================================
# Preprocessing
# =============================================================================
# Show the 20 genes that account for the highest fraction of counts
sc.pl.highest_expr_genes(adata, n_top=20, )
# Basic filtering
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)

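# The fraction of mitochondrial counts is an important QC criterion. A high fraction suggests that
# much of the cytoplasmic RNA has been lost (mitochondria are larger and less likely to leak through the cell membrane).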
adata.var['mt'] = adata.var_names.str.startswith('MT-')  # annotate the group of mitochondrial genes as 'mt'
sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], percent_top=None, log1p=False, inplace=True)

# Violin plots of the computed QC metrics
sc.pl.violin(adata, ['n_genes_by_counts', 'total_counts', 'pct_counts_mt'],  # ['n_genes', 'n_counts', 'percent_mito']
             jitter=0.4, multi_panel=True)

# A scatter plot also makes cells with abnormal values easy to spot
sc.pl.scatter(adata, x='total_counts', y='pct_counts_mt')
sc
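
To make the filtering and QC steps above concrete, here is a small pure-Python sketch on a made-up count matrix. It is not Scanpy's implementation; it only mirrors what `min_genes`, `min_cells`, and the columns `n_genes_by_counts`, `total_counts`, and `pct_counts_mt` mean. The gene list, the counts, and the scaled-down thresholds are all assumptions chosen for illustration.

```python
# Toy count matrix: rows are cells, columns are genes (made-up numbers).
gene_names = ["MT-CO1", "MT-ND1", "CD3D", "LYZ", "NKG7", "PPBP"]
counts = [
    [5, 0, 10, 0, 5, 0],    # cell 0
    [40, 10, 0, 50, 0, 0],  # cell 1
    [0, 0, 0, 0, 3, 1],     # cell 2: only two genes detected
]


def filter_cells(matrix, min_genes):
    """Keep cells (rows) in which at least min_genes genes have a non-zero count."""
    return [row for row in matrix if sum(1 for c in row if c > 0) >= min_genes]


def filter_genes(matrix, names, min_cells):
    """Keep genes (columns) detected in at least min_cells cells."""
    keep = [
        j for j in range(len(names))
        if sum(1 for row in matrix if row[j] > 0) >= min_cells
    ]
    return [[row[j] for j in keep] for row in matrix], [names[j] for j in keep]


def qc_metrics(row, names, mt_prefix="MT-"):
    """Return (n_genes_by_counts, total_counts, pct_counts_mt) for one cell."""
    n_genes = sum(1 for c in row if c > 0)
    total = sum(row)
    mt_total = sum(c for c, name in zip(row, names) if name.startswith(mt_prefix))
    pct_mt = 100.0 * mt_total / total if total else 0.0
    return n_genes, total, pct_mt


# Same order as in the script: filter cells, then genes, then compute metrics.
kept_cells = filter_cells(counts, min_genes=3)            # drops cell 2
kept_cells, kept_genes = filter_genes(kept_cells, gene_names, min_cells=1)  # drops PPBP
metrics = [qc_metrics(row, kept_genes) for row in kept_cells]

for n_genes, total, pct_mt in metrics:
    print(n_genes, total, pct_mt)
# prints:
# 3 20 25.0
# 3 100 50.0
```

In the real workflow the thresholds are `min_genes=200` and `min_cells=3`, and Scanpy stores the per-cell metrics in `adata.obs`, which is where the violin and scatter plots read them from.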
