cf#-337 C. Harmony Analysis

该博客讨论了如何根据给定的n值,构造2^n个长度为2^n的向量,这些向量仅包含1和-1,并确保任意两个向量的乘积为零。提供了一个递归策略,即从n-1的情况出发,通过复制并反转部分向量来生成n的情况。

http://codeforces.com/contest/610/problem/C


题目要 给你n,

让你写出2^N 个长度为2^N的向量, (只含1,-1) 使得任意两个向量的乘积都是为零


+表示1,*表示-1

如n=2

一个合法的答案为  (任意两行乘积为零)

++**
+*+*
++++
+**+

构造的方法是,如果知道k-1的情况,那么k的情况就是 k-1的图复制4份,拼成新的一个正方形,对其中一小块取反,便可以得到k的答案;

#include <cstdio>
#include <cmath>
#include <cstring>
#include <string>
#include <algorithm>
#include <queue>
#include <map>
#include <set>
#include <vector>
#include <iostream>
using namespace std;

int n;
int mp[1<<10][1<<10];

void rec (int k)
{
    if (k==0)
    {
        mp[0][0]=1;
    }
    else
    {
        int i,j;
        rec(k-1);
        for (i=(1<<(k-1));i<(1<<k);i++)  //copyto down 
            for (j=0;j<(1<< (k-1));j++) 
                mp[i][j]=mp[i-(1<<(k-1))][j]; 
            
            for (j=0;j<(1<< (k-1));j++) 
                for (i=(1<<(k-1));i<(1<<k);i++)  //copyto right  
                    mp[j][i]=mp[j][i-(1<<(k-1))]; 
                
                for (i=(1<<(k-1));i<(1<<k);i++)  //copyto down 
                    for (j=(1<<(k-1));j<(1<<k);j++)  //copyto down 
                        mp[i][j]=!mp[i][j-(1<<(k-1))];  
                    
    }
    
}
int main()
{
    
    cin>>n;
    rec(n);
    int i,j;
    for (i=0;i<(1<<n);i++)
    {
        for (j=0;j<(1<<n);j++)
        {
            if (mp[i][j]) printf("+");
            else
                printf("*");
        }
        printf("\n");
    }
    return 0;
    
}


首页/编程语言 蛋白测序代码问题(SCTmodel.list包含多个SCT模型无法统一) r语言 本人用Seurat v5做多样本的单细胞分析,用SCT归一化,然后用harmony消除批次效应,一直做到降为聚类和umap图都没什么问题,接下来想做细胞注释,在寻找各簇的差异基因的时候报错显示: 警告: No DE genes identified 警告: The following tests were not performed: 警告: When testing 0 versus all: Object contains multiple models with unequal library sizes. Run PrepSCTFindMarkers() before running FindMarkers() 在尝试运行PrepSCTFindMarkers()后,又显示minimum UMI unchanged. Skiped re-correct. 现在的问题是PrepSCTFindMarkers()无法运行导致SCTmodel.list里包含的多喝SCT模型无法被统一,因此不能进行FindMakers。请求解决方法。 library(remotes) library(Seurat) library(tidyverse) library(DoubletFinder) library(harmony) library(SingleR) library(celldex) library(ggplot2) library(patchwork) library(glmGamPoi) library(devtools) # ----------------------------- # 1. 数据读取与元数据构建 # ----------------------------- data_dir <- "D:/GSEXXXX_RAW" h5_files <- list.files(data_dir, pattern = "\\.h5$", full.names = TRUE # 提取样本ID和实验条件 meta_data <- data.frame( file = h5_files, sample_id = str_extract(basename(h5_files), "GSM\\d+"), condition = str_extract(basename(h5_files), "(Ctrl|EcP|EcO)"), stringsAsFactors = FALSE ) # ----------------------------- # 2. 分样本创建Seurat对象及初步QC # ----------------------------- seurat_list <- lapply(seq_along(h5_files), function(i) { # 读取数据并创建对象 counts <- Read10X_h5(meta_data$file[i]) obj <- CreateSeuratObject( counts = counts, project = meta_data$sample_id[i], min.cells = 3, min.features = 200 ) # 添加元数据 obj$condition <- meta_data$condition[i] obj$sample_id <- meta_data$sample_id[i] # 计算QC指标 obj[["percent.mt"]] <- PercentageFeatureSet(obj, pattern = "^MT-") obj[["percent.rb"]] <- PercentageFeatureSet(obj, pattern = "^RP[SL]") obj[["percent.hb"]] <- PercentageFeatureSet(obj, pattern = "^HB[AB]") # 分样本初步QC过滤 obj <- subset(obj, subset = nFeature_RNA > 500 & nFeature_RNA < 6000 & percent.mt < 20 & percent.rb < 50 & percent.hb < 0.5 ) return(obj) }) # 为列表命名确保可追溯性 names(seurat_list) <- meta_data$sample_id saveRDS(seurat_list,file = "seurat_list.qc.rds" # ----------------------------- # 3. 双细胞检测(分样本SCTransform标准化) # ----------------------------- seurat_list <- lapply(seurat_list, function(obj) { # 使用SCTransform标准化 obj <- SCTransform( obj, vars.to.regress = c("percent.mt") ) # 运行PCA(基于SCTransform生成的SCT assay) obj <- RunPCA(obj, assay = "SCT", npcs = 20) # 自动检测双细胞 sweep.res <- paramSweep(obj, PCs = 1:20, sct = TRUE) sweep.stats <- summarizeSweep(sweep.res) bcmvn <- find.pK(sweep.stats) optimal_pK <- as.numeric(bcmvn$pK[which.max(bcmvn$BCmetric)]) # 动态计算双细胞率(按10x公式) n_cells <- ncol(obj) doublet_rate <- 0.008 * (n_cells / 1000) nExp <- round(doublet_rate * n_cells) nExp <- max(nExp, 50) # 运行DoubletFinder obj <- doubletFinder( obj, PCs = 1:20, pN = 0.25, pK = optimal_pK, nExp = nExp, sct = TRUE ) # 过滤双细胞 df_col <- grep("DF.classifications", colnames(obj@meta.data), value = TRUE) obj <- subset(obj, cells = colnames(obj)[obj[[df_col]] == "Singlet"]) return(obj) }) saveRDS(seurat_list,file = "seurat_list.qc.DF.rds") # ----------------------------- # 5. 跨样本整合(Harmony校正批次效应) # ----------------------------- # 合并所有样本为一个Seurat对象 merged_seurat <- merge(seurat_list[[1]], seurat_list[-1]) # 整合全局可变基因 var_features <- SelectIntegrationFeatures(object.list = seurat_list, nfeatures = 3000) VariableFeatures(merged_seurat) <- var_features # 运行PCA DefaultAssay(merged_seurat) <- "SCT" merged_seurat <- RunPCA(merged_seurat, npcs = 20, verbose = FALSE) # 运行Harmony校正批次效应 merged_seurat <- RunHarmony( merged_seurat, group.by.vars = "sample_id", assay.use = "SCT", project.dim = FALSE, plot_convergence = TRUE ) # 保存批次校正后的对象 saveRDS(merged_seurat, file = "merged_seurat.harmony.rds") merged_seurat = readRDS("merged_seurat.harmony.rds") # ----------------------------- # (1) 计算KNN图(基于Harmony嵌入) merged_seurat <- FindNeighbors( merged_seurat, reduction = "harmony", dims = 1:20, assay = "SCT", graph.name = "harmony_snn" ) # (2) 基于Leiden或Louvain算法聚类(分辨率决定簇数量) merged_seurat <- FindClusters( merged_seurat, graph.name = "harmony_snn", resolution = seq(0.2, 1.2, 0.2), algorithm = 1, verbose = FALSE ) # (3) 可视化聚类结果(UMAP) merged_seurat <- RunUMAP( merged_seurat, reduction = "harmony", dims = 1:20, reduction.name = "umap_harmony" ) # 绘制UMAP图(按聚类和样本分组) p1 <- DimPlot(merged_seurat, reduction = "umap_harmony", group.by = "seurat_clusters", label = TRUE) p2 <- DimPlot(merged_seurat, reduction = "umap_harmony", group.by = "sample_id") p1 | p2 # -------------------------------- # 6. 差异基因分析(识别各簇的标记基因) # -------------------------------- # (1) 设置默认Assay为RNA或SCT(根据标准化方法选择) DefaultAssay(merged_seurat) <- "SCT" # 若使用SCTransform # 步骤 2: 整合 SCT 模型 merged_seurat <- PrepSCTFindMarkers( merged_seurat, assay = "SCT") # (2) 寻找各簇的差异基因 markers <- FindAllMarkers( merged_seurat, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.5, test.use = "wilcox", assay = "SCT" ) # 过滤并排序(按平均logFC和p值) top_markers <- markers %>% group_by(cluster) %>% filter(p_val_adj < 0.05) %>% slice_max(n = 5, order_by = avg_log2FC) # 每簇取top5基因 # 查看标记基因 print(top_markers) saveRDS(merged_seurat, file = "top_markers.rds") # (2) 下载参考数据集(以Human Primary Cell Atlas为例) ref <- celldex::HumanPrimaryCellAtlasData() # (3) 提取单细胞表达矩阵(需与参考数据基因符号匹配) counts <- GetAssayData(merged_seurat, assay = "SCT", slot = "data") # 使用log-normalized数据 # (4) 运行SingleR预测 pred <- SingleR( test = counts, ref = ref, labels = ref$label.main, clusters = merged_seurat$seurat_clusters ) # (5) 将预测结果添加到metadata merged_seurat$singleR_labels <- pred$labels[match(merged_seurat$seurat_clusters, rownames(pred))] # ----------------------------- # 7. 结果保存与输出 # ----------------------------- # 保存完整对象 saveRDS(merged_seurat, file = "integrated_seurat_final.rds") # 导出标记基因表格 markers <- FindAllMarkers(merged_seurat, assay = "SCT", only.pos = TRUE) write.csv(markers, "cluster_markers.csv", row.names = FALSE) # 生成PDF报告 pdf("QC_plots.pdf") VlnPlot(merged_seurat, features = c("nFeature_RNA", "percent.mt"), group.by = "sample_id") DimPlot(merged_seurat, group.by = "celltype", label = TRUE) FeaturePlot(merged_seurat, features = c("CD3D", "CD19", "EPCAM")) dev.off()
最新发布
10-28
回答参考qwen 你遇到的问题是: > 在使用 Seurat v5 进行单细胞数据分析时,完成了 SCTransform 标准化、Harmony 批次校正、聚类和 UMAP 可视化后,在进行差异基因分析(`FindAllMarkers()`)时报错: > > ``` > 警告: No DE genes identified > 警告: The following tests were not performed: > 警告: When testing 0 versus all: Object contains multiple models with unequal library sizes. Run PrepSCTFindMarkers() before running FindMarkers() > ``` > > 尝试运行 `PrepSCTFindMarkers()` 后提示: > ``` > minimum UMI unchanged. Skipped re-correct. > ``` --- ## 🔍 问题本质 这个错误的核心在于:**你的 Seurat 对象中存在多个独立的 SCT 模型(SCT model),它们是在每个样本分别做 `SCTransform` 时生成的,且这些模型之间的“平均 UMI 数”不一致。** Seurat v5 的 `FindMarkers()` 函数在使用 `assay = "SCT"` 时要求: ✅ 必须对所有细胞统一使用一个 **全局共享的负二项回归模型(SCT model)** ❌ 如果存在多个不同 library size 假设下的模型 → 无法比较 → 报错跳过 而你目前的状态是: - 每个样本单独做了 `SCTransform(..., sct = TRUE)` → 产生了多个独立的 `SCT` assay 中的 `model` - 合并后没有重新拟合或整合这些模型 - 因此 `PrepSCTFindMarkers()` 检测到“已有模型”,但发现不能合并(因为 UMI 分布不同)→ 直接跳过 --- ## ✅ 正确解决方案(关键步骤) 你需要 **放弃早期分样本的 SCT 模型**,改为在合并后 **统一执行一次全局 SCTransform**。 ### ❌ 错误流程(你现在走的): ```r for each sample: CreateSeuratObject → SCTransform → DoubletFinder → merge ``` 👉 导致多模型残留 ### ✅ 正确流程(推荐做法): ```r for each sample: CreateSeuratObject → QC filtering ↓ merge all samples ↓ Run SCTransform ONCE on the merged object ↓ DoubletFinder (optional), PCA, Harmony, Clustering... ``` --- ## ✅ 修改后的代码修正建议 以下是对你原始代码的关键修改部分(重点在第3步和第5步): ```r # ----------------------------- # 1. 数据读取与元数据构建(不变) # ----------------------------- data_dir <- "D:/GSEXXXX_RAW" h5_files <- list.files(data_dir, pattern = "\\.h5$", full.names = TRUE) meta_data <- data.frame( file = h5_files, sample_id = str_extract(basename(h5_files), "GSM\\d+"), condition = str_extract(basename(h5_files), "(Ctrl|EcP|EcO)"), stringsAsFactors = FALSE ) # ----------------------------- # 2. 分样本创建Seurat对象及初步QC(只做QC,不做SCT!) # ----------------------------- seurat_list <- lapply(seq_along(h5_files), function(i) { counts <- Read10X_h5(meta_data$file[i]) obj <- CreateSeuratObject( counts = counts, project = meta_data$sample_id[i], min.cells = 3, min.features = 200 ) # 添加元数据 obj$condition <- meta_data$condition[i] obj$sample_id <- meta_data$sample_id[i] # 计算QC指标 obj[["percent.mt"]] <- PercentageFeatureSet(obj, pattern = "^MT-") obj[["percent.rb"]] <- PercentageFeatureSet(obj, pattern = "^RP[SL]") obj[["percent.hb"]] <- PercentageFeatureSet(obj, pattern = "^HB[AB]") # 初步QC过滤 obj <- subset(obj, subset = nFeature_RNA > 500 & nFeature_RNA < 6000 & percent.mt < 20 & percent.rb < 50 & percent.hb < 0.5 ) return(obj) }) names(seurat_list) <- meta_data$sample_id saveRDS(seurat_list, file = "seurat_list.qc.rds") # ----------------------------- # 3. 合并后再统一做 SCTransform(核心修改点!) # ----------------------------- merged_seurat <- merge(seurat_list[[1]], y = seurat_list[-1]) # 先合并 # 在合并后的对象上运行 SCTransform(仅一次) merged_seurat <- SCTransform( merged_seurat, vars.to.regress = c("percent.mt"), new.genes = TRUE, # 是否添加新基因到 variable features verbose = TRUE ) # 可选:再次过滤双细胞(现在可以用 SCT assay) merged_seurat <- RunPCA(merged_seurat, assay = "SCT", npcs = 20) sweep.res <- paramSweep_v3(merged_seurat, PCs = 1:20, sct = TRUE) sweep.stats <- summarizeSweep(sweep.res) bcmvn <- find.pK(sweep.stats) optimal_pK <- as.numeric(bcmvn$pK[which.max(bcmvn$BCmetric)]) n_cells <- ncol(merged_seurat) doublet_rate <- 0.008 * (n_cells / 1000) nExp <- max(round(doublet_rate * n_cells), 50) merged_seurat <- doubletFinder( merged_seurat, PCs = 1:20, pN = 0.25, pK = optimal_pK, nExp = nExp, sct = TRUE ) # 过滤双细胞 df_col <- grep("DF.classifications", colnames(merged_seurat@meta.data), value = TRUE) merged_seurat <- subset(merged_seurat, cells = colnames(merged_seurat)[merged_seurat[[df_col]] == "Singlet"]) # ----------------------------- # 4. 整合并降维聚类(Harmony + UMAP) # ----------------------------- # 提取可变特征用于整合 var_features <- SelectIntegrationFeatures(object.list = SplitObject(merged_seurat, split.by = "sample_id"), nfeatures = 3000) VariableFeatures(merged_seurat) <- var_features # 使用 SCT assay 运行 PCA merged_seurat <- RunPCA(merged_seurat, assay = "SCT", npcs = 20, verbose = FALSE) # 运行 Harmony 批次校正 merged_seurat <- RunHarmony( merged_seurat, group.by.vars = "sample_id", assay.use = "SCT", project.dim = FALSE, plot_convergence = TRUE ) # 构建邻居图 & 聚类 merged_seurat <- FindNeighbors( merged_seurat, reduction = "harmony", dims = 1:20, assay = "SCT", graph.name = "harmony_snn" ) merged_seurat <- FindClusters( merged_seurat, graph.name = "harmony_snn", resolution = seq(0.2, 1.2, 0.2), algorithm = 1, verbose = FALSE ) # UMAP merged_seurat <- RunUMAP( merged_seurat, reduction = "harmony", dims = 1:20, reduction.name = "umap_harmony" ) # 绘图 p1 <- DimPlot(merged_seurat, reduction = "umap_harmony", group.by = "seurat_clusters", label = TRUE) p2 <- DimPlot(merged_seurat, reduction = "umap_harmony", group.by = "sample_id") p1 | p2 # ----------------------------- # 5. 差异基因分析(现在可以正常运行!) # ----------------------------- DefaultAssay(merged_seurat) <- "SCT" # ✅ 现在 PrepSCTFindMarkers 应该会成功 merged_seurat <- PrepSCTFindMarkers( merged_seurat, assay = "SCT", verbose = TRUE ) # 查找所有标记基因 markers <- FindAllMarkers( merged_seurat, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.5, test.use = "wilcox", assay = "SCT" ) # 提取每簇 top5 top_markers <- markers %>% group_by(cluster) %>% filter(p_val_adj < 0.05) %>% slice_max(n = 5, order_by = avg_log2FC) print(top_markers) saveRDS(top_markers, file = "top_markers.rds") # ----------------------------- # 6. 细胞注释(SingleR) # ----------------------------- # 注意:SingleR 推荐使用 RNA assay 的 log-normalized data # 所以我们要先从 SCT 模型获取表达值 # 方法一:提取 SCT 校正后的数据(log-normalized) counts <- GetAssayData(merged_seurat, assay = "SCT", slot = "data") # log(x+1) # 或方法二:切换回 RNA assay 并标准化 if (!"RNA" %in% names(merged_seurat@assays)) { merged_seurat[['RNA']] <- CreateAssayObject(counts = GetAssayData(merged_seurat, assay = 'SCT', slot = 'counts')) merged_seurat <- NormalizeData(merged_seurat, assay = 'RNA') } counts_rna <- GetAssayData(merged_seurat, assay = "RNA", slot = "data") ref <- celldex::HumanPrimaryCellAtlasData() pred <- SingleR( test = counts, # 或用 counts_rna ref = ref, labels = ref$label.main, clusters = merged_seurat$seurat_clusters ) merged_seurat$singleR_labels <- pred$labels[match(merged_seurat$seurat_clusters, rownames(pred))] # ----------------------------- # 7. 结果保存 # ----------------------------- saveRDS(merged_seurat, file = "integrated_seurat_final.rds") write.csv(markers, "cluster_markers.csv", row.names = FALSE) pdf("QC_plots.pdf") VlnPlot(merged_seurat, features = c("nFeature_RNA", "percent.mt"), group.by = "sample_id") DimPlot(merged_seurat, group.by = "singleR_labels", label = TRUE) FeaturePlot(merged_seurat, features = c("CD3D", "CD19", "EPCAM")) dev.off() ``` --- ## 📌 关键要点总结 | 问题 | 解决方案 | |------|----------| | ❌ 多个 SCT 模型导致 `FindMarkers` 失败 | ✅ 合并后再统一只做一次 `SCTransform` | | ⚠️ `PrepSCTFindMarkers()` 被跳过 | ✅ 因为已有模型未更新 → 改为后期统一建模 | | 💡 是否还能用 Harmony? | ✅ 可以,只要在 `SCT` assay 上运行即可 | | 🧪 双细胞检测怎么做? | ✅ 在 `SCT` + PCA 后进行,使用 `doubletFinder(sct = TRUE)` | | 📊 SingleR 用什么数据? | ✅ 推荐用 `SCT` assay 的 `data` slot(log-normalized) | --- ## ✅ 额外建议 - 若你坚持想保留早期 SCT 模型(如资源有限无法重跑),可尝试: ```r # 强制清除旧模型 merged_seurat[['SCT']]@misc$prepped.markers <- NULL merged_seurat[['SCT']]@model <- NULL ``` 然后再运行 `PrepSCTFindMarkers()`,但风险较高,不如重做可靠。 - 推荐阅读 Seurat 官方最新教程(v5): https://satijalab.org/seurat/articles/integration_introduction.html --- 完成以上修改后,你的 `FindAllMarkers()` 将能正常运行并输出各簇的 marker genes。 ---
评论
成就一亿技术人!
拼手气红包6.0元
还能输入1000个字符
 
红包 添加红包
表情包 插入表情
 条评论被折叠 查看
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值