Data download
# !mkdir data
# !wget http://cf.10xgenomics.com/samples/cell-exp/1.1.0/pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz -O data/pbmc3k_filtered_gene_bc_matrices.tar.gz
# !cd data; tar -xzf pbmc3k_filtered_gene_bc_matrices.tar.gz
# !mkdir write
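The shell commands above only work in a notebook with `wget` and `tar` available. As a rough alternative, the same steps can be sketched with the Python standard library. The helper names `download` and `extract` are made up for this illustration and are not part of Scanpy; the URL is copied from the `wget` line.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL copied from the wget command above.
PBMC3K_URL = (
    "http://cf.10xgenomics.com/samples/cell-exp/1.1.0/pbmc3k/"
    "pbmc3k_filtered_gene_bc_matrices.tar.gz"
)


def download(url, archive_path):
    """Fetch `url` into `archive_path` unless the file already exists."""
    archive_path = Path(archive_path)
    archive_path.parent.mkdir(parents=True, exist_ok=True)  # like `mkdir data`
    if not archive_path.exists():
        urllib.request.urlretrieve(url, str(archive_path))  # like `wget ... -O`
    return archive_path


def extract(archive_path, dest_dir):
    """Unpack a .tar.gz archive into `dest_dir`, like `tar -xzf`."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(archive_path), "r:gz") as tar:
        tar.extractall(str(dest_dir))
    return dest_dir
```

A call such as `extract(download(PBMC3K_URL, "data/pbmc3k_filtered_gene_bc_matrices.tar.gz"), "data")` followed by `Path("write").mkdir(exist_ok=True)` should leave the same directory layout as the commented commands.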
Analysis workflow
Scanpy module abbreviations:
Preprocessing: pp
Tools: tl
Plotting: pl
# -*- coding: utf-8 -*-
"""
Created on Thu Mar 18 14:15:49 2021
@author: dujidan
"""
import os
import numpy as np
import pandas as pd
import scanpy as sc
sc.settings.verbosity = 3 # verbosity: errors (0), warnings (1), info (2), hints (3)
sc.logging.print_header()
sc.settings.set_figure_params(dpi=80, facecolor='white')
results_file = 'write/pbmc3k.h5ad' # the file that will store the analysis results
adata = sc.read_10x_mtx(
    './data/filtered_gene_bc_matrices/hg19/',  # the directory with the `.mtx` file
    var_names='gene_symbols',  # use gene symbols for the variable names (variables-axis index)
    cache=True)
adata.var_names_make_unique() # this is unnecessary if using `var_names='gene_ids'` in `sc.read_10x_mtx`
adata
'''
Step 1: data preprocessing
'''
# =============================================================================
# Preprocessing
# =============================================================================
# Show the 20 genes that account for the highest fraction of counts
sc.pl.highest_expr_genes(adata, n_top=20)
# Basic filtering
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)
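# The mitochondrial fraction is an important QC criterion. A high proportion of
# mitochondrial counts suggests that much of the cytoplasmic RNA has leaked out of
# the cell (mitochondria are larger and less likely to escape through the membrane).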
adata.var['mt'] = adata.var_names.str.startswith('MT-') # annotate the group of mitochondrial genes as 'mt'
sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], percent_top=None, log1p=False, inplace=True)
# Violin plots of the QC metrics
# (older names for these columns: 'n_genes', 'n_counts', 'percent_mito')
sc.pl.violin(adata, ['n_genes_by_counts', 'total_counts', 'pct_counts_mt'],
             jitter=0.4, multi_panel=True)
# A scatter plot also makes abnormally distributed data points easy to spot
sc.pl.scatter(adata, x='total_counts', y='pct_counts_mt')
sc
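To make the three QC columns plotted above more concrete, here is a minimal pure-Python sketch of how they could be derived from a count matrix. It is an illustration only, not Scanpy's implementation: the toy counts are invented, `qc_per_cell` is a hypothetical helper, and returning 0.0 for a cell with no counts is a choice made for this sketch.

```python
# Toy count matrix: rows are cells, columns are genes (made-up numbers).
gene_names = ["MT-CO1", "MT-ND1", "CD3D", "LYZ"]
counts = [
    [5, 5, 80, 10],   # cell 0: few mitochondrial counts
    [30, 20, 0, 50],  # cell 1: half of the counts are mitochondrial
]


def qc_per_cell(counts, gene_names, mt_prefix="MT-"):
    """Return per-cell QC metrics for a cells x genes list of counts."""
    # Same rule as `adata.var_names.str.startswith('MT-')` in the script.
    is_mt = [name.startswith(mt_prefix) for name in gene_names]
    metrics = []
    for row in counts:
        total = sum(row)                        # total_counts
        n_genes = sum(1 for c in row if c > 0)  # n_genes_by_counts
        mt_total = sum(c for c, mt in zip(row, is_mt) if mt)
        # Guard against empty cells; 0.0 is an assumption of this sketch.
        pct_mt = 100.0 * mt_total / total if total > 0 else 0.0
        metrics.append({
            "n_genes_by_counts": n_genes,
            "total_counts": total,
            "pct_counts_mt": pct_mt,
        })
    return metrics


for i, m in enumerate(qc_per_cell(counts, gene_names)):
    print(i, m)
```

In this toy data, cell 1 would stand out in both the violin plot and the scatter plot because of its high mitochondrial percentage.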

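This article walks through analyzing single-cell RNA sequencing data with the Python library Scanpy. It covers data preprocessing, principal component analysis, clustering, and marker gene identification.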



