Structural conservation of R2R3-MYBs
The R2R3-MYB protein consists of two major functional parts: a DNA-binding domain (MYB domain) at the N terminus and a regulatory region (non-MYB region) at the C terminus. The MYB domain is the highly conserved signature of the gene family, whereas the non-MYB regions have diverged among plant species (Fig. 2a). In addition, the intron patterns in the MYB domains of the R2R3-MYBs are highly conserved within each subfamily across land plants [6, 9, 10, 22, 49]. In most cases, the MYB domain has one or two conserved intron insertion sites; intron-less and multi-intron genes are rare [6, 10]. Although the number of intron patterns differs slightly among taxonomic groups, the MYB domain is clearly structurally conserved across land plants. In our previous study, we identified 12 highly conserved intron patterns across the plant kingdom [10]. Their conserved intron numbers, insertion positions, and intron phases suggest that these patterns arose early in the transition from aquatic to terrestrial plants (Supplemental Fig. 1). The majority (~70%) of the tested R2R3-MYBs share just three (patterns a, b, and c) of the 12 intron patterns (patterns a to l) [10], suggesting biased expansion during evolution. A recent study has shown that some charophyte R2R3-MYBs have intron–exon structures identical to those of their land plant homologs in the same subfamily, whereas others differ clearly, indicating differences among the ancestral genes [6].
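The intron patterns discussed above are defined by intron number, insertion position, and intron phase. As a minimal sketch of how those three properties can be derived for one gene, the snippet below uses the standard definition of intron phase (the number of coding bases before the intron, modulo 3). The function name `intron_pattern` and the exon lengths are hypothetical and chosen only for illustration; this is not the procedure used in [10].

```python
def intron_pattern(cds_exon_lengths):
    """Summarise the intron pattern of one gene.

    cds_exon_lengths: lengths (nt) of the coding part of each exon, 5' to 3'.
    Returns one (insertion_position, phase) tuple per intron, where
    insertion_position is the number of coding bases upstream of the intron
    and phase is that number modulo 3:
      0 = intron lies between two codons,
      1 = intron follows the first base of a codon,
      2 = intron follows the second base of a codon.
    """
    pattern = []
    coding_bases = 0
    # There is no intron after the last exon, so skip it.
    for length in cds_exon_lengths[:-1]:
        coding_bases += length
        pattern.append((coding_bases, coding_bases % 3))
    return pattern


# Hypothetical three-exon gene (lengths are illustrative, not real data).
example = intron_pattern([133, 130, 500])
print(len(example), "introns:", example)
```

Two genes can then be compared by checking whether their lists agree in length (intron number) and in each position and phase. An intron-less gene yields an empty list.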
Figure 2
Domain structure and functional characterization status of R2R3-MYB transcription factors. a. The R2R3-MYB transcription factor is composed of a MYB region and a non-MYB region. The DNA-binding domain, also called the MYB region, contains the conserved R2 and R3 repeats. In most activators and some repressors [43], a conserved bHLH-interacting motif ([D/E]Lx2[R/K]x3Lx6Lx3R) within the first two helices of the R3 domain enables interaction with bHLH proteins [48] to form the MYB-bHLH-WDR transcriptional complex. The protein sequences in the C-terminal region are often divergent and may contain one or a few typical repressor motifs, such as the EAR motif (ERF-associated amphiphilic repression), the SID motif (Sensitive to ABA and Drought 2 protein-interacting motif), and TLLLFR. H, helix; T, turn; W, tryptophan; x, any amino acid. * indicates that the motif is present in only some R2R3-MYBs. b. Number of species for which the functions of one or more R2R3-MYB genes had been identified as of 2020. In the last decade (2011–2020), the breadth of taxa for which R2R3-MYB data were available grew explosively. Notably, most of the increase in species number is attributable to the large number of orthologous MYB genes with similar functions, especially in phenylpropanoid biosynthesis, that were characterized in horticultural plants. c. Number of Arabidopsis R2R3-MYBs for which biological role(s) have been identified. In Arabidopsis, the entire increase is due to paralogs with new functions. When different functions of a gene were published in different years, we assigned the gene to the year of the first publication. The dark grey bar represents the number of R2R3-MYBs with unknown (i.e. not experimentally verified) functions.
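The bHLH-interacting motif given in the Figure 2 legend, [D/E]Lx2[R/K]x3Lx6Lx3R, maps directly onto a regular expression, which offers a quick way to screen protein sequences for it. The sketch below is illustrative only: the function name `find_bhlh_motifs` and the toy sequences are hypothetical (they are not real MYB proteins), and a pattern match does not by itself demonstrate a functional interaction with bHLH proteins.

```python
import re

# Regex form of [D/E]Lx2[R/K]x3Lx6Lx3R, where x is any residue.
# The lookahead wrapper lets overlapping occurrences be reported too.
BHLH_MOTIF = re.compile(r"(?=([DE]L.{2}[RK].{3}L.{6}L.{3}R))")


def find_bhlh_motifs(protein_seq):
    """Return (0-based start, matched residues) for every motif occurrence."""
    seq = protein_seq.upper()
    return [(m.start(), m.group(1)) for m in BHLH_MOTIF.finditer(seq)]


# Toy sequences built to contain / lack the motif (not real proteins).
toy_with_motif = "MGRS" + "DLAAKAAALAAAAAALAAAR" + "PW"
toy_without_motif = "MGRS" + "DLAAKAAALAAAAAALAAAA" + "PW"  # final R -> A

print(find_bhlh_motifs(toy_with_motif))
print(find_bhlh_motifs(toy_without_motif))
```

Because the motif lies within the R3 repeat, hits in a real sequence would normally be restricted to the annotated R3 region before being interpreted.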