Knowledge Base Linked Resources

gnomAD

gnomAD (the Genome Aggregation Database) is a large-scale resource that aggregates and harmonises sequencing data from diverse cohorts to provide a comprehensive view of human genetic variation. We used the v4 dataset, which comprises genomic data from 807,162 individuals. Beyond its extensive catalogue of variant calls, including over 786 million single-nucleotide variants and 122 million indels, the database offers detailed annotations that support analyses of population-specific allele frequencies.
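As an illustration of what a population-specific allele frequency is, the sketch below divides an allele count (AC) by the number of genotyped alleles (AN) for each population. The function name, the population labels and the counts are hypothetical and chosen only for illustration; they are not taken from the v4 dataset.

```python
from typing import Optional


def allele_frequency(allele_count: int, allele_number: int) -> Optional[float]:
    """Return AC / AN, or None when no alleles were genotyped."""
    if allele_number <= 0:
        return None
    return allele_count / allele_number


# Hypothetical per-population counts for a single variant (illustrative only).
counts = {
    "afr": {"ac": 12, "an": 40_000},
    "nfe": {"ac": 3, "an": 600_000},
    "eas": {"ac": 0, "an": 0},  # no genotyped alleles in this group
}

frequencies = {
    pop: allele_frequency(c["ac"], c["an"]) for pop, c in counts.items()
}

for pop, af in frequencies.items():
    print(pop, "not available" if af is None else f"{af:.2e}")
```

Returning `None` when AN is zero keeps "no data" distinct from a true frequency of zero, which matters when comparing populations with very different sample sizes.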

Citations:

  • Karczewski KJ, Francioli LC, Tiao G, et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020;581(7809):434-443. doi:10.1038/s41586-020-2308-7

OMIM - Online Mendelian Inheritance in Man

An online catalogue of human genes and genetic disorders, OMIM provides comprehensive information on gene-disease relationships, genetic mechanisms, and phenotype descriptions. Curated by the McKusick-Nathans Institute of Genetic Medicine at Johns Hopkins University, this resource is essential for clinicians, researchers, and geneticists.

Citations:

  • Online Mendelian Inheritance in Man, OMIM. McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University (Baltimore, MD). https://omim.org/

AlphaFold and AlphaMissense

We link to a page with detailed data for your gene, including gene structure and function, the AlphaFold predicted protein structure, and AlphaMissense pathogenicity predictions. The linked page provides the following views:

  • Gene Details: Basic gene data and annotations.
  • Protein Structure: 3D coordinates, per-residue confidence (pLDDT) and Predicted Aligned Error (PAE) for assessing domain packing.
  • Function: Annotations on protein function and regions of interest.
  • AlphaMissense Predictions: AI-driven scoring categorising missense mutations as likely pathogenic, likely benign or ambiguous, helping to highlight potentially disease-causing variants.
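The three AlphaMissense categories listed above can be sketched as a simple threshold rule on the per-variant pathogenicity score (a value between 0 and 1). The cutoffs used here (below 0.34 for likely benign, above 0.564 for likely pathogenic) are the ones we understand the AlphaMissense publication to report; treat them as an assumption and check them against the release you are using. The function and constant names are hypothetical.

```python
# Assumed cutoffs; verify against the AlphaMissense release in use.
LIKELY_BENIGN_MAX = 0.34
LIKELY_PATHOGENIC_MIN = 0.564


def classify_alphamissense(score: float) -> str:
    """Map a pathogenicity score in [0, 1] to one of three categories."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    if score < LIKELY_BENIGN_MAX:
        return "likely benign"
    if score > LIKELY_PATHOGENIC_MIN:
        return "likely pathogenic"
    # Everything between the two cutoffs is left unresolved.
    return "ambiguous"


for example_score in (0.10, 0.45, 0.90):
    print(example_score, classify_alphamissense(example_score))
```

The category is a computational prediction and is not equivalent to a clinical classification such as those recorded in ClinVar.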

Citations:

ClinVar: A Public Archive of Human Variation

ClinVar provides detailed classifications of variants (e.g., “Pathogenic,” “Likely pathogenic,” “Benign”) along with supporting evidence and review status. It aggregates submissions from multiple sources to present consensus and conflicting interpretations, maps variants to reference sequences per HGVS standards, and collaborates with expert panels like ClinGen for continual re-evaluation. Data is accessible via the website, FTP, and APIs for diverse clinical and research applications.
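As one example of programmatic access, the sketch below builds a search URL for the NCBI E-utilities `esearch` endpoint with `db=clinvar`. This assumes E-utilities is the API of interest, which the text above does not specify; the helper name and the example gene symbol are hypothetical. The code only constructs the URL and does not send a request.

```python
from urllib.parse import urlencode

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def clinvar_search_url(gene_symbol: str, max_results: int = 20) -> str:
    """Build an esearch URL that looks up ClinVar records for a gene symbol."""
    params = {
        "db": "clinvar",                  # search the ClinVar database
        "term": f"{gene_symbol}[gene]",   # restrict the term to the gene field
        "retmode": "json",                # ask for a JSON response
        "retmax": max_results,            # cap the number of returned IDs
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


url = clinvar_search_url("BRCA1")
print(url)
```

The returned identifiers can then be passed to a follow-up request for record summaries; NCBI's usage guidelines on request rates and API keys apply to any real client.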

Citations:

  • Landrum MJ, Chitipiralla S, Kaur K, et al. ClinVar: updates to support classifications of both germline and somatic variants. Nucleic Acids Res. 2024 Nov 23:gkae1090. doi: 10.1093/nar/gkae1090.

dbNSFP

For the research and testing phase we also used dbNSFP, a comprehensive database designed for the functional prediction and annotation of all potential non-synonymous single-nucleotide variants (nsSNVs) and splicing-site SNVs in human protein-coding genes. It includes over 120 million variant entries and aggregates prediction scores from 33 different sources and allele frequencies from major population datasets.

Citations:

  • Liu X, Li C, Mou C, Dong Y, Tu Y. dbNSFP v4: a comprehensive database of transcript-specific functional predictions and annotations for human nonsynonymous and splice-site SNVs. Genome Med. 2020;12(1):103. doi:10.1186/s13073-020-00803-9