Gapless genome assembly of azalea and multi-omics investigation into divergence between two species with distinct flower color
Abstract
The genus Rhododendron (Ericaceae), with more than 1000 species highly diverse in flower color, offers distinct ornamental value and a model system for flower color studies. Here, we investigated the divergence between two parental species with different flower colors that are widely used in azalea breeding. A gapless genome assembly was generated for the yellow-flowered azalea, Rhododendron molle. Comparative genomics revealed that recent proliferation of long terminal repeat retrotransposons (LTR-RTs), especially Gypsy elements, has resulted in a 125 Mb (19%) genome size increase in species-specific regions and in a large number of dispersed gene duplicates (13 402) and pseudogenes (17 437). Metabolomic assessment revealed that yellow flower coloration is attributable to dynamic changes in carotenoid/flavonol biosynthesis and chlorophyll degradation. Time-ordered gene co-expression networks (TO-GCNs) and their comparison between the two species corroborated the metabolomic results and uncovered the specific gene regulatory changes underpinning the distinct flower pigmentation. B3 and ERF transcription factors (TFs) were found to dominate the gene regulation of carotenoid/flavonol-characterized pigmentation in R. molle, while WRKY, ERF, WD40, C2H2, and NAC TFs collectively regulated the anthocyanin-characterized pigmentation in the red-flowered R. simsii. This study employed a multi-omics strategy to disentangle the complex divergence between two important azaleas and provides a reference for further functional genetics and molecular breeding.
Introduction
As the largest genus of woody plants, Rhododendron comprises more than 1000 species widely distributed in natural settings throughout the temperate and montane tropical world. Among the diverse lineages of angiosperms, the genus is best known for its great species diversity and magnificent flowers [1, 2]. Several high-quality Rhododendron genomes have been released recently, including one draft genome [3] and eight pseudochromosome-level genomes [4–10], but no gapless genome has been obtained to date. With the development of third-generation sequencing technologies, gap-free or gapless genome assemblies have been accomplished in banana [11], rice [12], and Arabidopsis [13]. A high-quality genome offers valuable opportunities to investigate a broader range of genomic variation and makes it easier to identify suitable mutants and develop them for genetic and breeding research.
Transposable elements (TEs), the most difficult parts of the genomic ‘dark matter’ regions, are ubiquitous and abundant sequences that can move to new locations in the host genome, multiply, and integrate there [14]. TEs can generate genetic novelty at the molecular level via both active and passive modes [15, 16], by causing de novo gene birth [17] and pseudogenization [18], or through homology-driven ectopic (non-allelic) DNA recombination. TEs can also passively cause large-scale chromosomal rearrangements such as inversions and fusions [19, 20]. These features make TEs ideal facilitators of genotypic evolution [21]. However, our understanding of how TEs contribute to genetic variation in azaleas is limited, partly because TEs are highly homologous and numerous, which makes them difficult to identify accurately with short-read sequencing.
In angiosperms, flower color is one of the most important and well-studied traits because it exhibits a bewildering diversity across evolutionary groups and individuals over a range of spatial, geographic, and temporal scales. Flower color shifts have occurred repeatedly in angiosperms, largely reflecting adaptation to novel pollinator regimes and thus facilitating speciation [22, 23]. Rhododendrons are valued for their marvelous range of flower colors, which are mainly determined by two major groups of pigments: flavonoids and carotenoids [5, 24–28]. Flower color shifts have occurred repeatedly in the main clades of Rhododendron and even within species [25, 29], as confirmed earlier by pigment analyses [25–28]. The complexity of flower color variation appears comparable to the diversity of the genus’s numerous species, and Rhododendron has the potential to serve as a model for probing the evolution of flower color. Rhododendron molle is a perennial shrub native to East Asia [10] and is the yellow-flowered parent of many cultivated hybrids. However, the biochemical and molecular basis of its petal color formation remains unclear, although the genome of a red-flowered parent species of widely cultivated azaleas was recently assembled and the entire metabolic pathways for flower pigmentation were reconstructed [5].
Here, we achieved a gapless genome assembly of R. molle, the highest-quality assembly publicly available in the genus to date. More than 99% of the assembled genome was anchored to 13 chromosomes, seven of which are gap-free. Comparative genomic analyses enabled us to track the highly dynamic expansion of specific LTR-RT superfamilies in R. molle. We discovered that LTR-RT proliferation was an important force underpinning genomic divergence between R. molle and the red-flowered R. simsii. Furthermore, we unraveled the metabolic dynamics of the yellow-flowered azalea, investigated the remodeling of gene regulation during flower pigmentation by integrating metabolomic and time-ordered gene co-expression analyses, and compared the results with those of the red-flowered azalea. The gapless genome and the insights into the molecular mechanisms of flower coloration provide valuable resources for further genomic investigation and genetic breeding in azalea.
Results
Gapless genome assembly and annotation for R. molle
We produced approximately 51.97 gigabases (Gb; 100× coverage) of Oxford Nanopore Technologies (ONT) long-read sequencing data, 49.12 Gb of PCR-free Illumina paired-end sequencing data, and 193.78 Gb of Hi-C sequencing data (Fig. S1;