signature=3713db8ab6257c674f2ea1d338758145,Genomic signatures of mitonuclear coevolution across popu...-优快云博客

研究了加州海螺T.californicus不同性别和生活阶段的组织样本，采用抗生素处理减少杂质DNA，通过 Qiagen DNeasy 和 PacBio 技术进行高覆盖和长片段测序。后续进行了基因组组装、重复消除、功能注释和多个群体比较，揭示了物种的进化和基因家族扩展情况。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

Tissue sampling and DNA isolation

T. californicus of mixed sex and life stages were collected from high intertidal rocky pools in San Diego (32° 44′ 44′′ N, 117° 15′ 18′′ W) and kept en masse in 1 l beakers with filtered seawater (0.2 µm pore filter) at 20 °C. To reduce the contribution of gut contents and bacteria to the DNA extract, the seawater was treated with an antibiotic mixture (ampicillin, 200 mg l–1; streptomycin, 50 mg l–1; spectinomycin, 50 mg l–1; kanamycin, 50 mg l–1; penicillin, 200 units ml–1; neomycin, 50 mg l–1; and chloramphenicol, 20 mg l–1) and the animals were moved to clean filtered seawater daily for 3 days, without food, before extraction. DNA was isolated from groups of hundreds of copepods using a Qiagen DNeasy Blood and Tissue Kit, and samples were pooled to ~100 µg of DNA.

Genome sequencing and assembly

Cofactor Genomics generated 3 short-insert libraries (200, 500 and 800 bp) and 2 long-insert libraries (2 and 5 kb), and then barcoded and sequenced the libraries in 1 lane using the Illumina HiSeq 2000 platform to obtain 2× 100 bp paired-end (short inserts) or mate-paired sequences (Supplementary Table 1). Illumina sequences from the five libraries were first filtered using an internal quality control script to remove the low-quality reads. For the 200 and 500 bp insert libraries, which have high coverage, the quality control parameters were more stringent. All raw reads with unknown base 'N' were removed. In addition, raw reads were removed if the sum of per-base error probability across all bases was greater than two. For other libraries, whose coverage is relative lower, reads with 'N' were kept. However, reads were deleted if the sum of per-base error probability was greater than three. We only kept read pairs for which both read ends passed the quality filter. Duplicated read pairs were identified by CD-HIT-DUP in the CD-HIT package with the parameters '-u 50 –e'. Here, the read pairs having near-identical sequences with up to 2 mismatches within the first 50 bases on both ends were considered duplicated. The duplicated read pairs with lower-quality scores were removed. The processed reads from these five libraries were assembled using AllPaths-LG (release 46212) following AllPaths’ standard workflow, including initial input file preparation and final assembly runs. The assembled scaffold sequences were further extended using the GapCloser programme in the SOAPdenovo package.

To improve contiguity of the T. californicus assembly, proximity ligation libraries based on the Chicago and Hi-C methods were prepared by Dovetail Genomics as described previously

In an effort to fill assembly gaps, 1.68 million PacBio reads were obtained for the San Diego population and used to improve the final HiRise assembly using PBJellySupplementary Note) and removing those that appeared to be bacterial. Over 1.5 million PacBio reads with a mean read length of 5,470 bp remained. Next, these filtered reads were mapped to the HiRise assembly using BLASR via PBJelly, and gaps were filled or ends extended for regions for which there was a depth of at least ten PacBio reads. Overfilled gaps (gaps that were larger than the estimated size in the HiRise assembly) were set to an arbitrary size of 51 bp when they could not be completely filled with this set of PacBio reads at this minimum coverage threshold of 10. The resulting assembly then went through two more rounds of gap filling and end extension using this procedure, yielding our current T. californicus assembly, SDv2.1. These three rounds of PBJelly dropped the percentage of Ns from 4.67 to 1.97%, decreased the number of contigs in scaffolds from 7,105 to 4,789, and increased the overall size of the assembly by 4.7 megabase pairs (Mb) while increasing the sequence in contigs by 9.7 Mb.

Genome annotation

We used the pipeline in MAKER2 (ref. T. californicus assembly. First, we generated a de novo repeat library for T. californicus using the programme RepeatModelerT. californicus transcriptsT. californicus from the same populationT. californicus RNA-Seq reads and transcripts, and using the Benchmarking Universal Single-Copy Orthologs approach

Functional annotation of gene models was performed by BLASTP searches of the UniProt/Swiss-Prot database, followed by assignment of Gene Ontology terms and identification of protein motifs and domains from InterProScanSupplementary Note.

Orthology among arthropod models

We employed clustering algorithms to determine orthology relationships among T. californicus and metazoan model genomes, and to detect possible gene family expansions in the copepod. In OrthoVennDrosophila melanogaster and Apis mellifera), and a branchiopod crustacean (D. pulex) (Supplementary Table 12). To assess orthology among many taxa (Supplementary Table 13), we used OrthoMCLT. californicus proteins to pre-clustered orthologous groups curated in OrthoMCL-DB version 5 (ref. –5 and 1.5, respectively).

Sequencing and assembly of additional T. californicus populations

Copepods were collected from intertidal pools in San Roque (27° 11′ 12′′ N, 114° 23′ 52′′ W), Bird Rock (32° 48′ 54′′ N, 117° 16′ 23′′ W), Abalone Cove (33° 44′ 16′′ N, 118° 22′ 31′′ W), Catalina Island (33° 26.8′ N, 118° 28.6′ W), Santa Cruz (36° 56′ 58′′ N, 122° 02′ 49′′ W), Bodega Bay (38° 19′ 4′′ N, 123° 4′ 23′′ W) and Friday Harbor (48° 32′ 47′′ N, 123° 0′ 35′′ W). Animals were maintained as described previously. For each population, 250–300 adult individuals were pooled and DNA was isolated using the Qiagen DNeasy Blood and Tissue Kit or a phenol–chloroform procedure

Samples were prepared and sequenced by the University of North Carolina High Throughput Genomic Sequencing Facility as 100-bp paired-end libraries on the Illumina HiSeq 2000. Reads were trimmed for quality using PoPoolation

Assembly and annotation of mitochondrial genomes

Reads of mitochondrial origin were identified by mapping the Illumina samples above to the geographically nearest published Tigriopus mitochondrial genomeT. californicus mitochondrial genomes

Intraspecific evolution of nuclear coding regions

For each of the eight T. californicus populations, we used the ‘extractfeat’ script within GenomeTools

We used the programme PRANK-codon5). Poorly aligned regions were then removed with Gblocks version 0.91b (ref. dN/dS using the model M0 in the programme CODEML within PAML version 4.7 (ref. dN/dS estimates to test for general patterns consistent with the hypothesis of compensatory evolution. Using the RNA-Seq mapping performed above, we quantified transcription levels across the San Diego genes, and used ANCOVAs to account for expression in the hypothesis tests above.

We screened the genome for signals of positive selection on amino acid changes by employing codon-level evolutionary models of nucleotide substitution implemented in CODEML. We tested the ‘sites’ hypothesis by applying models M7 and M8 on each coding sequence alignment, and we compared the resulting log-likelihood ratios with a chi-squared distribution with two degrees of freedomSupplementary Note.

Molecular evolution and population genetic analysis of mtDNA

Multiple sequence alignment of each mtDNA gene was done in MEGA6 (ref. π, weighted for gene length, for each protein-coding gene using PoPoolation

Patterns of positive selection across the genus Tigriopus

We downloaded the annotated gene set of T. kingsejongensisT. japonicus (accession numbers SAMN05933003 and SAMN05933004) generated previouslyT. japonicus using TrinityTigriopus species, including T. japonicus, T. kingsejongensis, our reference T. californicus from San Diego and the reproductively isolated T. californicus from San Roque. Therefore, the orthologues identified above were aligned across the four taxa, then quality-trimmed with PRANKdS for all genes and discarded those with dS > 5 in order to reduce the influence of site saturation or poor alignments. The final set of alignments contained 4,863 genes.

We concatenated 20 randomly picked alignments and estimated a maximum likelihood phylogeny in MEGA6 (ref. dN/dS to vary among sites and among branches, and tested for evidence of positive selection on each of the 4,863 aligned coding sequences along each internal branch and tip. For each gene and each foreground branch, we calculated a log-likelihood ratio between a model allowing for positive selection and a null model in which dN/dS is fixed at 1, and these ratios were compared with a chi-squared distribution with one degree of freedom. After the analysis, we applied an FDR correction for multiple testing

Reporting Summary

Further information on experimental design is available in the Nature Research Reporting Summary linked to this article.

Data availability

Raw Illumina and PacBio DNA sequencing reads have been deposited in the NCBI Sequence Read Archive database (San Diego reference: SRX469409–SRX469413; additional populations: SRX2746698–SRX2746704; PacBio reads for the San Diego reference: SRX3778522–SRX3778523). The T. californicus annotated genome assembly was deposited in the i5k Workspace of the National Agricultural Library (US Department of Agriculture), where it can be browsed, searched and downloaded: https://i5k.nal.usda.gov/Tigriopus_californicus.