1. Learning common and specific patterns from data of multiple interrelated biological scenarios with matrix factorization
Lihua Zhang, Shihua Zhang
Nucleic Acids Research, Volume 47, Issue 13, 26 July 2019, Pages 6606–6617, https://doi.org/10.1093/nar/gkz488
Published: 08 June 2019
Abstract
High-throughput biological technologies (e.g. ChIP-seq, RNA-seq and single-cell RNA-seq) rapidly accelerate the accumulation of genome-wide omics data in diverse interrelated biological scenarios (e.g. cells, tissues and conditions). Integration and differential analysis are two common paradigms for exploring and analyzing such data. However, current integrative methods usually ignore the differential part, and typical differential analysis methods either fail to identify combinatorial patterns of difference or require matched dimensions of the data. Here, we propose a flexible framework CSMF to combine them into one paradigm to simultaneously reveal Common and Specific patterns via Matrix Factorization from data generated under interrelated biological scenarios. We demonstrate the effectiveness of CSMF with four representative applications including pairwise ChIP-seq data describing the chromatin modification map between K562 and Huvec cell lines; pairwise RNA-seq data representing the expression profiles of two different cancers; RNA-seq data of three breast cancer subtypes; and single-cell RNA-seq data of human embryonic stem cell differentiation at six time points. Extensive analysis yields novel insights into hidden combinatorial patterns in these multi-modal data. Results demonstrate that CSMF is a powerful tool to uncover common and specific patterns with significant biological implications from data of interrelated biological scenarios.
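Note: the abstract does not give the CSMF objective or algorithm, so the following is only a minimal sketch of the general idea of factorizing two matrices into shared and scenario-specific non-negative factors. It assumes the two data sets share rows (features) but may differ in column count, and it swaps in plain Lee-Seung multiplicative updates with a zero mask. The function name `common_specific_nmf` and all its parameters are hypothetical and are not taken from the authors' software.

```python
import numpy as np

def common_specific_nmf(X1, X2, k_common, k_spec, n_iter=200, seed=0, eps=1e-9):
    """Illustrative sketch (not the published CSMF algorithm).

    Models X1 ~ Wc @ Hc1 + Ws1 @ Hs1 and X2 ~ Wc @ Hc2 + Ws2 @ Hs2 with all
    factors non-negative, by running ordinary NMF on [X1, X2] while forcing
    the "specific" rows of H to be zero on the other data set's columns.
    """
    rng = np.random.default_rng(seed)
    n, m1 = X1.shape
    m2 = X2.shape[1]
    X = np.hstack([X1, X2])
    k = k_common + 2 * k_spec

    W = rng.random((n, k))          # columns: [common | specific-1 | specific-2]
    H = rng.random((k, m1 + m2))

    # Zero mask: specific-1 patterns may not load on X2, and vice versa.
    mask = np.ones_like(H)
    mask[k_common:k_common + k_spec, m1:] = 0.0
    mask[k_common + k_spec:, :m1] = 0.0
    H *= mask

    for _ in range(n_iter):
        # Multiplicative updates keep entries non-negative and keep zeros at zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
    return W, H
```

The first `k_common` columns of `W` then play the role of common patterns, and the remaining columns the role of patterns specific to each data set. Choosing the ranks and handling scaling between data sets are left out of this sketch.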
2. Effect of mutations on binding of ligands to guanine riboswitch probed by free energy perturbation and molecular dynamics simulations
Jianzhong Chen, Xingyu Wang, Laixue Pang, John Z H Zhang, Tong Zhu
Nucleic Acids Research, Volume 47, Issue 13, 26 July 2019, Pages 6618–6631, https://doi.org/10.1093/nar/gkz499
Published: 07 June 2019
Abstract
Riboswitches can regulate gene expression by direct and specific interactions with ligands and have recently attracted interest as potential antibacterial drug targets. In this work, molecular dynamics (MD) simulations, free energy perturbation (FEP) and molecular mechanics generalized Born surface area (MM-GBSA) methods were integrated to probe the effect of mutations on the binding of ligands to guanine riboswitch (GR). The results not only show that binding free energies predicted by FEP and MM-GBSA are in excellent correlation, but also indicate that mutations involved in the current study can strengthen the binding affinity of ligands to GR. Residue-based free energy decomposition was applied to compute ligand-nucleotide interactions and the results suggest that mutations highly affect interactions of ligands with key nucleotides U22, U51 and C74. Dynamics analyses based on MD trajectories indicate that mutations not only regulate the structural flexibility but also change the internal motion modes of GR, especially for the structures J12, J23 and J31, which implies that the aptamer domain activity of GR is extremely plastic and thus readily tunable by nucleotide mutations. This study is expected to provide a useful molecular basis and dynamics information for understanding the function of GR and its potential as an antibacterial drug target.
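Note: the abstract does not say which FEP estimator the authors used. As a reminder of what a free energy perturbation calculation computes, the sketch below implements the textbook exponential-averaging (Zwanzig) identity, ΔG = -kT ln⟨exp(-ΔU/kT)⟩, over energy differences sampled in the reference state. The function name `fep_delta_g`, the units (kcal/mol) and the default temperature are assumptions made for illustration only.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def fep_delta_g(delta_u, temperature=300.0):
    """Zwanzig exponential-averaging estimate of a free energy difference.

    delta_u: energy differences U_B - U_A (kcal/mol), one per snapshot
             sampled from state A. Hypothetical input, for illustration.
    """
    kt = K_B * temperature
    x = [-du / kt for du in delta_u]
    m = max(x)  # log-sum-exp shift for numerical stability
    log_mean = m + math.log(sum(math.exp(v - m) for v in x) / len(x))
    return -kt * log_mean
```

If every snapshot gives the same energy difference, the estimate equals that value; when the differences fluctuate, the estimate falls below their arithmetic mean, which is why poorly overlapping states need intermediate windows in practice.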
3. Quantifying gene selection in cancer through protein functional alteration bias
Nadav Brandes, Nathan Linial, Michal Linial
Nucleic Acids Research, Volume 47, Issue 13, 26 July 2019, Pages 6642–6655, https://doi.org/10.1093/nar/gkz546
Published: 25 June 2019
Abstract
Compiling the catalogue of genes actively involved in cancer is an ongoing endeavor, with profound implications for the understanding and treatment of the disease. An abundance of computational methods have been developed to screen the genome for candidate driver genes based on genomic data of somatic mutations in tumors. Existing methods make many implicit and explicit assumptions about the distribution of random mutations. We present FABRIC, a new framework for quantifying the selection of genes in cancer by assessing the effects of de-novo somatic mutations on protein-coding genes. Using a machine-learning model, we quantified the functional effects of ∼3M somatic mutations extracted from over 10 000 human cancerous samples, and compared them against the effects of all possible single-nucleotide mutations in the coding human genome. We detected 593 protein-coding genes showing statistically significant bias towards harmful mutations. These genes, discovered without any prior knowledge, show an overwhelming overlap with known cancer genes, but also include many overlooked genes. FABRIC is designed to avoid false discoveries by comparing each gene to its own background model using rigorous statistics, making minimal assumptions about the distribution of random somatic mutations. The framework is an open-source project with a simple command-line interface.
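Note: the abstract does not spell out the statistic FABRIC uses, so the following is only a simplified sketch of the "compare each gene to its own background" idea. It assumes each mutation has a numeric effect score where higher means more damaging (a convention chosen here, not taken from the paper), treats all possible single-nucleotide mutations of the gene as an equally weighted background, and applies a one-sided z-test on the mean observed score. The function name `alteration_bias` is hypothetical.

```python
import math
import statistics

def alteration_bias(observed, background):
    """Z-score and one-sided p-value for a gene's observed mutation scores.

    observed:   effect scores of the somatic mutations seen in the gene
    background: effect scores of all possible single-nucleotide mutations
                in the same gene (assumed equally likely in this sketch)
    """
    n = len(observed)
    mu = statistics.fmean(background)
    sigma = statistics.pstdev(background)
    # Under the null, the mean of n random draws has standard error sigma/sqrt(n).
    z = (statistics.fmean(observed) - mu) / (sigma / math.sqrt(n))
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
    return z, p
```

A gene whose observed mutations score well above its own background gets a large positive z and a small p-value. A real analysis would also need a mutation-specific background (for example by nucleotide context) and multiple-testing correction across genes.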
4. Limits to a classic paradigm: most transcription factors in E. coli regulate genes involved in multiple biological processes
Daniela Ledezma-Tejeida, Luis Altamirano-Pacheco, Vicente Fajardo, Julio Collado-Vides
Nucleic Acids Research, Volume 47, Issue 13, 26 July 2019, Pages 6656–6667, https://doi.org/10.1093/nar/gkz525
Published: 13 June 2019
Abstract
Transcription factors (TFs) are important drivers of cellular decision-making. When bacteria encounter a change in the environment, TFs alter the expression of a defined set of genes in order to adequately respond. It is commonly assumed that genes regulated by the same TF are involved in the same biological process. Examples of this are methods that rely on coregulation to infer function of not-yet-annotated genes. We have previously shown that only 21% of TFs involved in metabolism regulate functionally homogeneous genes, based on the proximity of the gene products’ catalyzed reactions in the metabolic network. Here, we provide more evidence to support the claim that a 1-TF/1-process relationship is not a general property. We show that the observed functional heterogeneity of regulons is not a result of the quality of the annotation of regulatory interactions, nor the absence of protein–metabolite interactions, and that it is also present when function is defined by Gene Ontology terms. Furthermore, the observed functional heterogeneity is different from the one expected by chance, supporting the notion that it is a biological property. To further explore the relationship between transcriptional regulation and metabolism, we analyzed five other types of regulatory groups and identified complex regulons (i.e. genes regulated by the same combination of TFs) as the most functionally homogeneous, and this is supported by coexpression data. Whether higher levels of related functions exist beyond metabolism and current functional annotations remains an open question.
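Note: the abstract does not describe how the comparison with chance was carried out. One common way to make such a comparison is a permutation test, sketched below under stated assumptions: functional homogeneity is taken to be the fraction of gene pairs in a regulon that share at least one annotation term, and the null distribution comes from random gene sets of the same size. The function names `homogeneity` and `empirical_p` and the scoring rule are hypothetical, not the authors' method.

```python
import itertools
import random

def homogeneity(genes, annotations):
    """Fraction of gene pairs sharing at least one annotation term."""
    pairs = list(itertools.combinations(genes, 2))
    if not pairs:
        return 0.0
    shared = sum(1 for a, b in pairs if annotations[a] & annotations[b])
    return shared / len(pairs)

def empirical_p(regulon, annotations, n_perm=1000, seed=0):
    """Observed homogeneity and its empirical p-value against random gene sets.

    annotations: dict mapping gene name -> set of terms (e.g. GO terms)
    """
    rng = random.Random(seed)
    all_genes = sorted(annotations)
    observed = homogeneity(regulon, annotations)
    hits = sum(
        homogeneity(rng.sample(all_genes, len(regulon)), annotations) >= observed
        for _ in range(n_perm)
    )
    # Add-one correction so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

A small p-value here would mean the regulon is more homogeneous than random gene sets; testing for unusually low homogeneity would use the opposite tail.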