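A percent stacked bar chart has three places where the ordering can be controlled:
1. The order of the samples on the X axis
2. The order of the blocks inside each bar
3. The order of the legend entries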
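In ggplot2 these three orderings map onto three separate controls: the factor levels of the X variable, the factor levels of the fill variable, and the `limits` of the fill scale. A minimal sketch on a made-up two-sample table (the data frame `toy` and its values are assumptions for illustration, not the data used in this post):

```r
# Hypothetical long-format toy data: two samples, three genera
toy <- data.frame(
  sample = rep(c("s1", "s2"), each = 3),
  genus  = rep(c("A", "B", "C"), times = 2),
  count  = c(5, 3, 2, 1, 6, 3)
)

# 1. X-axis order: levels of the x variable
toy$sample <- factor(toy$sample, levels = c("s2", "s1"))
# 2. Stacking order inside each bar: levels of the fill variable
toy$genus <- factor(toy$genus, levels = c("C", "B", "A"))
# 3. Legend order: passed to the fill scale through limits
legend_order <- c("A", "B", "C")

# Build the plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(sample, count, fill = genus)) +
    geom_col(position = "fill") +
    scale_fill_discrete(limits = legend_order)
}
```

Each of the three settings can be changed without touching the other two, which is why the steps below handle them one at a time.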
The workflow has three steps:
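Step 1. Reading and preprocessing the data
# Load the packages used below (melt() is assumed to come from reshape2)
library(dplyr)
library(reshape2)
library(ggplot2)
library(RColorBrewer)
# Read each sample's table separately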
anabaena <- read.table("anabaena.txt", stringsAsFactors = FALSE, header = TRUE)
filter <- read.table("filter.txt", stringsAsFactors = FALSE, header = TRUE)
polysiphonia <- read.table("polysiphonia.txt", stringsAsFactors = FALSE, header = TRUE)
sediment <- read.table("sediment.txt", stringsAsFactors = FALSE, header = TRUE)
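# Merge the tables on the genus column (the first column of each table)
for(i in list(filter, polysiphonia, sediment)){
  anabaena <- anabaena %>% full_join(i, by = c("anabaena" = colnames(i)[1]))
}
# Rename the columns and replace missing values with 0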
sumall <- anabaena
colnames(sumall) <- c("genus", "anabaena", "filter", "polysiphonia", "sediment")
sumall[is.na(sumall)] <- 0
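The merge loop can also be written as a fold over the list of tables. The sketch below swaps in base R's `merge(all = TRUE)` for `full_join` and runs on made-up tables (`a`, `b`, `c3` and their values are hypothetical), to show how a genus that is missing from a sample first appears as NA and is then set to 0:

```r
# Hypothetical per-sample tables: a genus column plus one count column
a  <- data.frame(genus = c("g1", "g2"), a = c(10, 5), stringsAsFactors = FALSE)
b  <- data.frame(genus = c("g2", "g3"), b = c(7, 1),  stringsAsFactors = FALSE)
c3 <- data.frame(genus = c("g1", "g3"), c = c(2, 4),  stringsAsFactors = FALSE)

# Outer-join all tables on the shared genus column
merged <- Reduce(
  function(x, y) merge(x, y, by = "genus", all = TRUE),
  list(a, b, c3)
)

# Genera absent from a sample come back as NA; treat them as zero counts
merged[is.na(merged)] <- 0
```

Unlike the loop above, this version assumes the key column has the same name (`genus`) in every table.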
Step 2. Ordering each element
# Add a column holding each genus's total count across all samples
sumall$sum <- rowSums(sumall[ , 2:ncol(sumall)])
# Set the genus factor levels in ascending order of total; this controls the stacking order inside the bars
order1 <- sort(sumall$sum, index.return = TRUE, decreasing = FALSE )
sumall$genus <- factor(sumall$genus, levels = sumall$genus[order1$ix])
# Record the genus names in descending order of total; this is used for the legend order
sumall <- arrange(sumall, desc(sum))
genus_order <- as.vector(sumall$genus)
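# Drop the helper sum column (by name, so the step does not depend on the number of samples)
sumall$sum <- NULL
# Compute the share of the most abundant genus in each sample, then sort the samples by that share in descending order; this sets the X-axis order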
Per <- (as.matrix(sumall[1, 2:ncol(sumall)])) / t(as.matrix(colSums(sumall[ , 2:ncol(sumall)])))
order2 <- sort(as.numeric(Per), index.return = TRUE, decreasing = TRUE)
sumall <- sumall[ ,c(1, order2$ix+1)]
# Convert the wide table to long format
sumalls <- melt(sumall, id.vars = "genus")
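To see what the ordering steps produce, the sketch below applies the same logic to a small made-up table, using base R's `order()` and `sort()` rather than `arrange()`. The table `wide` and its counts are assumptions chosen so the result is easy to check by hand:

```r
# Hypothetical wide table: 3 genera x 3 samples
wide <- data.frame(
  genus = c("g1", "g2", "g3"),
  s1 = c(1, 8, 1),
  s2 = c(2, 3, 5),
  s3 = c(0, 9, 1),
  stringsAsFactors = FALSE
)
totals <- rowSums(wide[ , -1])   # g1 = 3, g2 = 20, g3 = 7

# Stacking order: factor levels ascending by total
wide$genus <- factor(wide$genus, levels = wide$genus[order(totals)])

# Legend order: genus names descending by total
legend_order <- as.character(wide$genus[order(totals, decreasing = TRUE)])

# X-axis order: share of the top genus in each sample, descending
top   <- legend_order[1]
share <- unlist(wide[wide$genus == top, -1]) / colSums(wide[ , -1])
wide  <- wide[ , c("genus", names(sort(share, decreasing = TRUE)))]
```

Here g2 is the most abundant genus, and its shares are 0.8, 0.3 and 0.9 in s1, s2 and s3, so the sample columns end up in the order s3, s1, s2.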
Step 3. Plotting
ggplot(data = sumalls, aes(variable, value, fill = genus)) +
  geom_bar(stat = "identity", position = "fill", color = "SlateGrey", width = 0.8, size = 0.25) +
  ylab("Relative Abundance") +
  scale_fill_discrete(limits = genus_order) +
  theme(
    axis.title.x = element_blank(),
    axis.title = element_text(size = 15, face = "plain", color = "black"),
    axis.text = element_text(size = 12, face = "plain", color = "black"),
    legend.title = element_text(size = 14, face = "plain", color = "black"),
    legend.position = "right"
  )
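`position = "fill"` rescales every bar so that its segments sum to 1, which is what turns raw counts into relative abundance. If the actual proportions are needed, for labels or a table, the same normalisation can be computed explicitly. A minimal sketch on made-up long data (the data frame `long` is hypothetical; its column names mirror the output of `melt()`):

```r
# Hypothetical long-format data: two genera in two samples
long <- data.frame(
  genus    = rep(c("g1", "g2"), times = 2),
  variable = rep(c("s1", "s2"), each = 2),
  value    = c(2, 6, 5, 5)
)

# Divide each count by the total of its own sample
long$prop <- long$value / ave(long$value, long$variable, FUN = sum)
```

In this example the proportions are 0.25 and 0.75 for s1, and 0.5 and 0.5 for s2.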

Step 4 (optional). Showing only the 10 most abundant genera
Before running step 2, apply the following:
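# Compute each genus's total first (step 2 has not run yet, so the sum column does not exist)
sumall$sum <- rowSums(sumall[ , 2:ncol(sumall)])
# Sort the genera in descending order of total, then drop the helper column
sumall <- arrange(sumall, desc(sum))
sumall$sum <- NULL
# Keep the 10 most abundant genera
sumall1 <- sumall[1:10, ]
# Collapse all remaining genera into a single "others" row
others <- colSums(sumall[11:nrow(sumall), 2:ncol(sumall)])
others <- as.data.frame(t(others))
others$genus <- "others"
# Combine the top 10 with the "others" row (rbind matches columns by name)
sumall <- rbind(sumall1, others)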
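The collapsing step can be checked on a small made-up table. The sketch below keeps the top 2 genera and folds the rest into "others"; the table `tab`, its counts, and the cutoff `n_top` are assumptions for illustration, and the code uses base R only:

```r
# Hypothetical table with 4 genera and 2 samples
tab <- data.frame(
  genus = c("g1", "g2", "g3", "g4"),
  s1 = c(1, 9, 4, 2),
  s2 = c(3, 7, 5, 1),
  stringsAsFactors = FALSE
)
n_top <- 2  # assumed cutoff for this example; the post uses 10

# Rank genera by total count, largest first
tab <- tab[order(rowSums(tab[ , -1]), decreasing = TRUE), ]

# Split into the genera to keep and the genera to fold
top  <- tab[seq_len(n_top), ]
rest <- tab[-seq_len(n_top), ]

# Sum the folded genera per sample and label the row "others"
others <- data.frame(genus = "others", t(colSums(rest[ , -1])),
                     stringsAsFactors = FALSE)
collapsed <- rbind(top, others)
```

The per-sample totals are unchanged by the collapse, so the relative abundances of the retained genera stay the same. The sketch assumes the table has more than `n_top` genera.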
Plotting (after running step 2 on the collapsed table):
# brewer.pal() comes from the RColorBrewer package; 11 colors cover the top 10 genera plus "others"
ggplot(data = sumalls, aes(variable, value, fill = genus)) +
  geom_bar(stat = "identity", position = "fill", color = "SlateGrey", width = 0.8, size = 0.25) +
  ylab("Relative Abundance") +
  scale_fill_manual(values = brewer.pal(11, "Set3"), limits = genus_order) +
  theme(
    axis.title.x = element_blank(),
    axis.title = element_text(size = 15, face = "plain", color = "black"),
    axis.text = element_text(size = 12, face = "plain", color = "black"),
    legend.title = element_text(size = 14, face = "plain", color = "black"),
    legend.position = "right"
  )
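With an unnamed color vector, `scale_fill_manual()` assigns colors in the order given by `limits`. A named vector instead ties each color to a genus, which makes it possible to give "others" a neutral color. A minimal sketch, assuming a made-up legend order and using base R's `hcl.colors()` (R 3.6 or later) in place of `brewer.pal()`:

```r
# Hypothetical legend order: a few genera plus the "others" category
genus_order <- c("g2", "g3", "g1", "others")

# One qualitative color per legend entry, named by genus
pal <- grDevices::hcl.colors(length(genus_order), palette = "Set 3")
names(pal) <- genus_order

# Give the collapsed category a neutral grey
pal["others"] <- "grey70"

# With ggplot2 the named vector would be passed as:
#   scale_fill_manual(values = pal, limits = genus_order)
```

Because the colors are matched by name, reordering `genus_order` changes the legend order without changing which color each genus gets.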
