BnaScope · Data Notes

Name: BnaScope: A Micro-scope for Brassica napus Genomics
Creator: Huazhong Agricultural University

BnaScope integrates 22 Brassica napus assemblies (~2.1M genes, 3.4M transcripts).

Data Overview

Genome Assemblies and Upgrades
BnaScope integrates 22 reference-grade genome assemblies of Brassica napus. The cornerstone of this database is the upgrade of the ZS11 and Westar genomes to near-complete, gap-free versions (v1), providing an unprecedented resolution of the rapeseed genetic landscape (Figure 1).

Assembly Strategy & Quality Control
- Sequencing Technology: A hybrid assembly approach was utilized, combining Oxford Nanopore Technology (ONT) ultra-long reads (ZS11: ~175× coverage, 175.76 GB; Westar: ~105× coverage, 105.98 GB) with high-fidelity PacBio HiFi data.
- Gap Closure: Using the previous v0 assemblies as backbones, the v1 versions achieved gap-free status, anchoring an additional 63–75 Mb of previously "dark" sequence to chromosomes.
- T2T Achievement: By searching for telomere-specific tandem repeats, we confirmed that 15 chromosomes in both ZS11 v1 and Westar v1 achieved Telomere-to-Telomere (T2T) assembly.
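
The telomere check described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline used for BnaScope: it assumes the Arabidopsis-type plant telomere unit TTTAGGG, and the function names, the 1 kb window, and the copy-number threshold are hypothetical choices made for the example.

```python
# Assumption: Arabidopsis-type telomere repeat unit; the source does not
# state which motif or thresholds were actually used.
TELOMERE_UNIT = "TTTAGGG"


def revcomp(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_unit(window, unit=TELOMERE_UNIT):
    """Count non-overlapping copies of the repeat unit on either strand."""
    window = window.upper()
    return window.count(unit) + window.count(revcomp(unit))


def is_t2t(chrom_seq, window=1000, min_copies=20):
    """Call a chromosome T2T if both ends carry enough telomere repeats.

    `window` and `min_copies` are hypothetical thresholds for illustration.
    """
    start_copies = count_unit(chrom_seq[:window])
    end_copies = count_unit(chrom_seq[-window:])
    return start_copies >= min_copies and end_copies >= min_copies
```

Applied to each chromosome of an assembly, a check like this would flag the chromosomes whose two ends both terminate in telomeric repeats; a chromosome with repeats at only one end would not be counted.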
Figure 1. Comparison of different versions for ZS11 and Westar. A-B. Distribution of telomeres and gaps in the (A) ZS11 v1 and (B) Westar v1 genomes. Green highlights the chromosomes that have achieved Telomere-to-Telomere (T2T) continuity. C. Chromosomal anchoring rates across different assembly versions. Grey indicates the proportion of unanchored sequences, while orange and blue represent the proportions of sequences anchored to chromosomes in the v1 and v0 genomes, respectively. D. Synteny analysis between the different assembly versions of ZS11 and Westar. Grey shaded areas indicate the centromeric regions in the v1 genomes.

Table 1. Summary statistics of ONT sequencing data.
Comprehensive Gene Annotation & Discovery
The transition from v0 to v1 has led to a substantial increase in the identification of functional elements that were previously hidden in unassembled gaps.
- Improved Annotation Density: The refined assembly allowed for the annotation of thousands of new gene models, particularly in repetitive regions.
- Functional Enrichment: Newly discovered genes are significantly enriched in pathways related to energy metabolism and photosynthetic electron transport.
- Standardized Nomenclature: To ensure cross-cultivar compatibility, BnaScope utilizes a unified naming convention:
Bna[Chr]G[Number][Suffix]
- Expression Validation: Integrated transcriptomic data confirms that a high proportion of these newly annotated genes (up to 47%) are actively expressed.
Figure 2. Comprehensive benchmarking of assembly and annotation quality between the newly assembled ZS11 and Westar genomes and previously published versions. A. Summary statistics comparing the new ZS11 v1 and Westar v1 genomes against previously released versions (ZS11 v0, v2, v10, and Westar v0). B. Comparison of RNA-seq mapping rates to cDNA (quantified via Salmon) across different genome versions for ZS11 (v0, v2, v10, v1) and Westar (v0, v1).
Integration of the 22 Genome Assemblies
BnaScope serves as a centralized hub for the rapeseed research community by integrating 22 representative genome assemblies of Brassica napus. This collection encompasses eight genomes assembled in this study and 14 additional published genomes, including diverse accessions such as Da-Ae, Tapidor, Darmor, Express617, Gangan, No2127, P130, P202, Quinta, Shengli, GH06, ZY821, Xiaoyun, and Zheyou7.

Standardized and High-Quality Re-annotation

To overcome the challenges of inconsistent annotation standards across multiple genome versions, we implemented a uniform re-annotation pipeline for all 22 genomes.
- Consistent Workflow: All 14 previously published genomes, including those lacking prior annotations (e.g., GH06 and ZY821), were re-annotated using the same high-precision pipeline employed for our de novo assemblies.
- Superior Quality: BUSCO evaluation of the newly annotated protein sequences yielded completeness scores ranging from 98.45% to 99.31%, representing an improvement of 0.13% to 4.43% over previously public versions.
- Gene Density: The pipeline identified between 98,474 and 102,295 genes per genome.
- Conservation Metrics: OMArk analysis confirmed consistently high proportions of conserved genes (98.98%–99.37%) and consistent genes (95.21%–95.57%) across the entire dataset.
Pan-Genome and Core Genome Analysis

BnaScope provides a robust framework for exploring the pan-genome architecture of B. napus, revealing the complex composition of gene families across different accessions.
- Gene Family Composition: Analysis of the 22 genomes categorized gene families into:
  - Core: Found in all 22 accessions (23.20%, ~28,602 gene families).
  - Softcore: Present in most but not all accessions (16.50%, ~20,334 gene families).
  - Dispensable: Present in a subset of accessions (50.50%, ~62,157 gene families).
  - Private: Unique to a single accession (9.76%, ~12,014 gene families).
- Evolutionary Constraints: Distribution of Ka/Ks ratios indicates that core genes undergo the strongest purifying selection, while dispensable and private genes exhibit higher evolutionary rates.
- Functional Divergence: GO enrichment analysis shows that core gene families are significantly enriched in essential functions such as RNA processing and hydrolase activity, whereas dispensable and private families are more often associated with response to auxin and acyltransferase activity.
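
The four categories above can be expressed as a simple rule on the number of accessions in which a gene family is present. The sketch below is illustrative only: the source defines softcore as "most but not all" accessions without giving a cutoff, so `softcore_min=20` is an assumed threshold, and the function names are hypothetical.

```python
from collections import Counter

N_ACCESSIONS = 22  # number of genomes integrated in BnaScope


def classify_family(n_present, n_total=N_ACCESSIONS, softcore_min=20):
    """Assign a pan-genome category from a family's presence count.

    `softcore_min` is an assumed cutoff; the source does not specify one.
    """
    if not 1 <= n_present <= n_total:
        raise ValueError(f"presence count must be in 1..{n_total}")
    if n_present == n_total:
        return "core"
    if n_present >= softcore_min:
        return "softcore"
    if n_present >= 2:
        return "dispensable"
    return "private"


def summarize(presence_counts):
    """Return the percentage of families in each category.

    `presence_counts` maps a family ID to the number of accessions
    in which that family was found.
    """
    tally = Counter(classify_family(n) for n in presence_counts.values())
    total = sum(tally.values())
    return {cat: round(100 * n / total, 2) for cat, n in tally.items()}
```

Changing `softcore_min` shifts families between the softcore and dispensable classes, so any comparison with the reported percentages depends on the cutoff actually used in the study.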

Annotation Methods

To provide a high-quality, standardized functional interface, BnaScope utilizes a rigorous annotation pipeline for all 22 integrated genome assemblies. This ensures that researchers can accurately dissect functional divergence between subgenomes and identify key agronomic genes.

1. Unified Structural Annotation Pipeline

To resolve the issues of inconsistent annotation standards across various published versions, we implemented a uniform structural annotation pipeline.
- Standardized Nomenclature: All 22 collected reference genomes follow a unified naming convention: Bna[Chr]G[Number][Suffix].
- High-Precision Models: We identified between 98,474 and 102,295 genes per genome using this uniform pipeline.
- Quality Validation: Annotation quality was validated using BUSCO, yielding completeness scores of 98.45% to 99.31%, representing an improvement of up to 4.43% over previous versions.
- Conservation Analysis: OMArk analysis confirmed that the new annotations are of superior quality, with conserved gene proportions ranging from 98.98% to 99.37%.
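
As an illustration of how the unified naming convention could be handled programmatically, the sketch below splits an identifier of the form Bna[Chr]G[Number][Suffix] into its parts. The exact field widths and suffix alphabet are not given in the source, so the pattern assumes a chromosome token made of a subgenome letter (A or C) plus two digits, a numeric index, and a suffix that starts with a letter. The example ID used below is hypothetical.

```python
import re

# Assumed layout: Bna + [A|C]NN + G + digits + suffix starting with a letter.
GENE_ID = re.compile(
    r"^Bna(?P<chrom>[AC]\d{2})G(?P<number>\d+)(?P<suffix>[A-Za-z]\w*)$"
)


def parse_gene_id(gene_id):
    """Split a BnaScope-style gene ID into its fields, or return None."""
    match = GENE_ID.match(gene_id)
    if match is None:
        return None
    fields = match.groupdict()
    # The first character of the chromosome token names the subgenome.
    fields["subgenome"] = fields["chrom"][0]
    return fields
```

For a hypothetical ID such as `BnaA01G0000100ZS`, this yields chromosome `A01`, subgenome `A`, number `0000100`, and suffix `ZS`; identifiers that do not fit the assumed layout return `None`.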
2. Functional Annotation and Subcellular Prediction

For key variety assemblies such as ZS11, Westar, and Darmor, we performed deep functional characterization:
- Protein Domain Annotation: We systematically annotated 463,895 protein sequences using InterProScan v5.76-107.0, covering databases such as Pfam, PANTHER, CDD, and ProSite.
- GO Term Assignment: Genes were assigned GO term identifiers via InterProScan v5.76-107.0.
- KEGG KO Assignment: Genes were assigned KEGG Orthology (KO) identifiers via KofamScan.
- Subcellular Localization: We predicted protein localization using the language model-based DeepLoc v2.1 and signal peptides using SignalP v6.0.
3. Homoeolog Cluster and Functional Mapping

Addressing the genomic complexity of B. napus (allotetraploid, AACC), BnaScope provides specialized homoeolog evidence:
- Homoeologous Gene Clusters: We established BnaOG clusters across all 22 genomes using OrthoFinder to enable cross-variety comparison.
- Arabidopsis-based Functional Mapping: Protein sequences from all genomes were aligned to the Arabidopsis Araport11 database. Rapeseed copies aligning to the same Arabidopsis gene (e.g., FLC) are defined as functional homoeologs.
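
The Arabidopsis-based mapping above amounts to grouping rapeseed genes by the Arabidopsis gene they align to. The following sketch assumes each rapeseed protein is assigned to its single highest-scoring Arabidopsis hit, which is one plausible reading and not a rule stated in the source. The input format (query, subject, bit score) and the rapeseed IDs are hypothetical.

```python
from collections import defaultdict


def group_by_best_hit(hits):
    """Group query genes by their best-scoring Arabidopsis subject.

    `hits` is an iterable of (query_id, subject_id, bitscore) tuples,
    e.g. parsed from a tabular alignment report (assumed format).
    """
    best = {}
    for query, subject, bitscore in hits:
        # Keep only the highest-scoring subject for each query.
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)

    groups = defaultdict(list)
    for query, (subject, _) in best.items():
        groups[subject].append(query)
    return {subject: sorted(queries) for subject, queries in groups.items()}


# Illustrative input: AT5G10140 is the Arabidopsis FLC locus;
# the rapeseed IDs and scores are made up for the example.
example_hits = [
    ("BnaA10G0000100ZS", "AT5G10140", 410.0),
    ("BnaC09G0000200ZS", "AT5G10140", 395.5),
    ("BnaA10G0000100ZS", "AT1G77080", 120.0),
    ("BnaA03G0000300ZS", "AT1G77080", 300.0),
]
flc_group = group_by_best_hit(example_hits)["AT5G10140"]
```

In this toy input, the two genes whose best hit is AT5G10140 end up in the same group and would be treated as functional homoeologs of FLC.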
4. Integration of Multi-omics Evidence

For the ZS11 v1 reference, the platform integrates diverse omics layers to provide transcriptomic and regulatory evidence:
- Tissue-Specific Expression: We aligned 273 RNA-seq datasets (91 tissues) to construct expression patterns, providing direct transcriptomic evidence for gene function.
- Regulatory Landscapes: Multi-omics data including ATAC-seq (chromatin accessibility) and ChIP-seq (histone modifications) were aggregated to help dissect cis-regulatory mechanisms.
- Transposable Elements (TEs): TEs were annotated using the EDTA pipeline to facilitate exploration of their regulatory effects.

Data Sources

Content coming soon.
