DisGeNET Database Information

The DisGeNET database integrates human gene-disease associations (GDAs) from various expert-curated databases and from text mining, covering Mendelian, complex and environmental diseases (Piñero et al., 2015; Bauer-Mehren et al., 2011). The integration is performed by mapping gene and disease vocabularies and by using the DisGeNET association type ontology. For a detailed description of the methodology, see the original publications: Piñero et al., 2015; Bauer-Mehren et al., 2011; and Bauer-Mehren et al., 2010.

1. Original Data Sources

The data in DisGeNET is organized according to type and level of curation:

  • CURATED: GDAs from UniProt, ClinVar, Orphanet, the GWAS Catalog, and CTD (human data)
  • PREDICTED: GDAs from RGD, MGD, and CTD (mouse and rat data)
  • ALL: GDAs from all previous sources and from GAD, LHGDN and BeFree
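The grouping above can be sketched as a simple lookup table. This is an illustrative sketch only: the names `SOURCE_GROUPS`, `LITERATURE_SOURCES` and `groups_for`, and the exact source labels used as keys, are assumptions made for the example and are not part of any DisGeNET interface.

```python
# Hypothetical lookup table mirroring the source grouping described above.
SOURCE_GROUPS = {
    "CURATED": {"UNIPROT", "CLINVAR", "ORPHANET", "GWASCAT", "CTD_human"},
    "PREDICTED": {"RGD", "MGD", "CTD_mouse", "CTD_rat"},
}
# Sources that only appear in the ALL set.
LITERATURE_SOURCES = {"GAD", "LHGDN", "BEFREE"}
SOURCE_GROUPS["ALL"] = (
    SOURCE_GROUPS["CURATED"] | SOURCE_GROUPS["PREDICTED"] | LITERATURE_SOURCES
)


def groups_for(source):
    """Return the sorted names of every group that contains the given source."""
    return sorted(name for name, members in SOURCE_GROUPS.items() if source in members)
```

For example, `groups_for("MGD")` lists both PREDICTED and ALL, while `groups_for("GAD")` lists only ALL.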

Curated Data

    UNIPROT: UniProt/SwissProt is a database containing curated information about protein sequence, structure and function (The UniProt Consortium, 2014). Disease associated proteins were obtained from the ‘humsavar’ file, along with the dbSNP identifier(s) associated to the disease. UniProt GDAs are assigned to the type ‘Genetic Variation’ from the ‘DisGeNET association type ontology’.

    CTD™: The Comparative Toxicogenomics Database™ contains manually curated information about gene-disease relationships, with a focus on understanding the effects of environmental chemicals on human health (Davis, et al., 2014). GDAs obtained from CTD are classified as ‘Biomarker’ or ‘Therapeutic’ in the DisGeNET association type ontology, according to their labeling in the original source (‘Marker’ or ‘Therapeutic’).

    CLINVAR: ClinVar is a freely accessible, public archive of reports of the relationships among medically relevant variants and phenotypes, with supporting evidence (Landrum, et al., 2014).

    ORPHANET: Orphanet: an online rare disease and orphan drug data base (© INSERM 1997) is the reference portal for information on rare diseases and orphan drugs, for all audiences. Orphanet’s aim is to help improve the diagnosis, care and treatment of patients with rare diseases. Orphanet Data was accessed on January 25, 2016.

    GWAS CATALOG: The NHGRI-EBI GWAS Catalog is a quality controlled, manually curated, literature-derived collection of all published genome-wide association studies assaying at least 100,000 SNPs and all SNP-trait associations with p-values < 1.0 × 10^-5 (Hindorff et al., 2009; Welter et al., 2014).

Predicted Data

    CTD™: CTD data containing Rattus norvegicus and Mus musculus gene-disease associations.

    MGD: The Mouse Genome Database is the international community resource for integrated genetic, genomic and biological data about the laboratory mouse (Mouse Genome Database Group, 2014). MGD provides full annotation of phenotypes and human disease associations for mouse models (genotypes) using terms from the Mammalian Phenotype Ontology and disease names from OMIM®. GDAs obtained from MGD are assigned to the association type class ‘Biomarker’ from the DisGeNET association type ontology.

    RGD: The Rat Genome Database is a collaborative effort between leading research institutions involved in rat genetic and genomic research (Shimoyama, et al., 2015). We did not include the associations labeled as ‘resistance’, ‘induced’ or ‘no association’, nor those annotated with the evidence codes ‘Inferred from electronic annotation’, ‘Inferred from sequence or structural similarity’ or ‘Non-traceable author statement’. GDAs obtained from RGD are assigned to the association type class ‘Biomarker’ from the DisGeNET association type ontology, except for those labeled as ‘treatment’, which are classified as ‘Therapeutic’.
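The RGD filtering and classification rules in the paragraph above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `classify_rgd` is hypothetical, the qualifier strings are assumed to appear exactly as quoted in the text, and the evidence codes are written with their standard abbreviations (IEA, ISS, NAS). It covers only the rules stated in this paragraph, not the finer-grained RGD labels listed in the association type mapping table.

```python
# Qualifiers that cause an RGD annotation to be discarded (from the text above).
EXCLUDED_QUALIFIERS = {"resistance", "induced", "no association"}

# Standard abbreviations for the excluded evidence codes:
# IEA = Inferred from Electronic Annotation
# ISS = Inferred from Sequence or Structural Similarity
# NAS = Non-traceable Author Statement
EXCLUDED_EVIDENCE = {"IEA", "ISS", "NAS"}


def classify_rgd(qualifier, evidence_code):
    """Return the association type for an RGD annotation, or None if discarded."""
    if qualifier in EXCLUDED_QUALIFIERS or evidence_code in EXCLUDED_EVIDENCE:
        return None
    if qualifier == "treatment":
        return "Therapeutic"
    return "Biomarker"
```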

Literature Data

    GAD: The Genetic Association Database is an archive of human genetic association studies of complex diseases. GAD is primarily focused on archiving information on common complex human disease rather than rare Mendelian disorders as found in the OMIM® (Becker, et al., 2004). It includes curated summary data extracted from published papers in peer reviewed journals on candidate gene and Genome Wide Association Studies (GWAS). GDAs obtained from GAD are assigned the association type ‘Genetic Variation’ from the DisGeNET association type ontology.

    LHGDN: The literature-derived human gene-disease network (LHGDN) is a text-mining-derived database focused on extracting and classifying gene-disease associations with respect to several biomolecular conditions. It uses a machine-learning-based algorithm to extract semantic gene-disease relations from a textual source of interest. The semantic gene-disease relations were extracted with an F-measure of 78 (see Bundschus et al., 2008, for further details). More specifically, the textual source used here is Entrez Gene's GeneRIF (Gene Reference Into Function) database (Mitchell, et al., 2003). LHGDN was created from the GeneRIF version of March 31st, 2009, consisting of 414,241 phrases. These phrases were further restricted to the organism Homo sapiens, which resulted in a total of 178,004 phrases. We extracted all data from LHGDN and classified the original associations using the DisGeNET association type ontology. LHGDN GDAs have been annotated as ‘Biomarker’, ‘Genetic Variation’, ‘PostTranslational Modification’ or ‘Altered Expression’.

    BeFree Data: We extracted gene-disease associations from MEDLINE abstracts using the BeFree system. BeFree is composed of a Biomedical Named Entity Recognition (BioNER) module to detect diseases and genes (Bravo et al., 2014) and a relation extraction module based on morphosyntactic information (Bravo et al., 2015). The document set used to extract the gene-disease associations was defined by the following PubMed query:

    ("Psychiatry and Psychology Category"[Mesh] AND "genetics"[Subheading]) OR ("Diseases Category"[Mesh] AND "genetics"[Subheading]) AND (hasabstract[text] AND ("1980"[PDAT] : "2016"[PDAT]) AND "humans"[MeSH Terms] AND English[lang])

    BeFree GDAs are classified as ‘Biomarker’, ‘Genetic Variation’, ‘PostTranslational Modification’ or ‘Altered Expression’.

    After this processing, we removed negative associations using regular expressions. We also removed the text-mining errors that we detected.
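The regular-expression step for negative associations could look roughly like the sketch below. The actual patterns used for DisGeNET are not given in this document, so the pattern `NEGATION` and the function `is_negative` are purely hypothetical illustrations of the general idea: flag a sentence when a negation cue is closely followed by an association term.

```python
import re

# Hypothetical pattern: a negation cue followed, within the same sentence and
# at most 40 characters later, by a word that expresses an association.
NEGATION = re.compile(
    r"\b(?:no|not|lack of|absence of)\b[^.]{0,40}\b(?:associat\w*|link\w*|correlat\w*)",
    re.IGNORECASE,
)


def is_negative(sentence):
    """Return True if the sentence appears to state a negative association."""
    return NEGATION.search(sentence) is not None
```

A sentence such as "BRCA1 was not associated with asthma." is flagged, while "TNF is associated with asthma." is not. A simple pattern like this will miss many phrasings, so it only illustrates the approach.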

Variant Data

The variants in DisGeNET are obtained from ClinVar, the GWAS Catalog, UniProt, GAD, and BeFree data. For BeFree data, we apply SETH (Thomas, et al., 2016), a tool for the recognition of variations (SNPs) from text and their subsequent normalization to dbSNP, to the sentences describing the GDA. The tool assigns dbSNP identifiers corresponding to NCBI dbSNP Build 137 to the extracted variants. Additionally, variants in DisGeNET are annotated with data from:

    dbSNP: The NCBI Short Genetic Variations database catalogs short variations in nucleotide sequences from a wide range of organisms (Sherry, et al., 2001). From dbSNP, we obtained the chromosome, and position in the chromosome of the variant. The data was retrieved in May, 2016 (corresponding to NCBI dbSNP Human Build 141).

    EXAC: The Exome Aggregation Consortium aggregates and harmonizes exome sequencing data from a variety of large-scale sequencing projects (Exome Aggregation Consortium, 2016). The data contains information for 60,706 unrelated individuals sequenced as part of various disease-specific and population genetic studies. We downloaded the VCF for release 0.3 (updated 10-29-2014).

    1000 Genomes Project: The 1000 Genomes Project is a public catalogue of human variation and genotype data (1000 Genomes Project Consortium, 2015). We downloaded the data corresponding to the Phase 3 of the project containing information on 2504 individuals from 26 populations. For more information, please visit the 1000 Genomes portal.

    Ensembl: The Ensembl Project creates tools and data resources to facilitate genomic analysis in several species, with an emphasis on human (Yates, et al., 2016). The Ensembl Variant Effect Predictor (VEP) determines the effect of a variant, or a list of variants (SNPs, insertions, deletions, CNVs or structural variants), on genes, transcripts, and protein sequence, as well as regulatory regions (McLaren, et al., 2016). We use the Ensembl API (release 4.5) to obtain the most severe consequence type of the SNP. For more information on the effects that each allele of the variant may have on a particular transcript, check the Ensembl documentation.

2. Database Statistics

The current version of DisGeNET (v4.0) contains 429,036 associations between 17,381 genes and 15,093 diseases and phenotypes. The table below shows the number of genes, diseases and unique associations provided by each source. Curated data correspond to associations from CTD (human data), UniProt, ClinVar, Orphanet, and the GWAS Catalog. Predicted data are the associations from RGD, MGD, and CTD (rat and mouse data). Literature-based data comprise associations from LHGDN, BeFree, and GAD. See the Original Data Sources section for further details.

Source          Genes     Diseases*   GDAs
CTD human       7,690     4,893       25,106
UNIPROT         2,315     3,016       3,262
CLINVAR         2,921     4,127       5,237
ORPHANET        2,639     2,589       4,266
GWAS Catalog    2,113     285         3,149
CURATED         9,362     7,607       32,834
CTD mouse       53        103         145
CTD rat         11        13          16
MGD             1,390     1,257       1,890
RGD             1,777     826         8,247
PREDICTED       2,743     2,064       10,264
GAD             9,280     2,753       61,861
LHGDN           5,949     1,807       31,731
BeFree          14,416    10,757      354,386
LITERATURE      16,141    11,447      403,925
ALL             17,381    15,093      429,036

* Diseases, phenotypes, and disease groups. See more information here
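Because the group rows (CURATED, PREDICTED, LITERATURE, ALL) count unique associations, they are smaller than the sum of their member sources whenever sources overlap. The short sketch below works through that arithmetic with the GDA counts from the table; the function name `redundancy` is just a label chosen for this example.

```python
def redundancy(per_source_counts, unique_count):
    """Number of GDA records that are repeated across the sources of a group."""
    return sum(per_source_counts) - unique_count


# GDA counts per source and per group, taken from the table above.
curated = redundancy([25106, 3262, 5237, 4266, 3149], 32834)
predicted = redundancy([145, 16, 1890, 8247], 10264)
literature = redundancy([61861, 31731, 354386], 403925)
overall = redundancy([32834, 10264, 403925], 429036)

print(curated, predicted, literature, overall)  # 8186 34 44053 17987
```

For instance, the five curated sources list 41,020 GDAs in total, of which 32,834 are unique, so 8,186 records repeat an association already reported by another curated source.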

Distribution of clinical concepts, genes, and SNPs annotations according to the DisGeNET disease type.

Disease type    Number of clinical concepts   Number of associated genes   Number of associated SNPs
disease         13,418                        16,418                       43,122
disease group   309                           7,708                        3,838
phenotype       1,366                         10,614                       2,891



Venn diagram representing the overlaps among the different types of global sources (curated, predicted and text mining).

3. DisGeNET Score

We have developed a score to rank gene-disease associations according to their level of evidence. The DisGeNET gene-disease association score takes into account the number and type of sources (level of curation, model organisms) and the number of publications supporting the association. The score ranges from 0 to 1 and is computed according to:

[Equation images not preserved: the score formula and the definitions of its CURATED, MODEL and LITERATURE terms]

where:

      ngdk is the number of publications supporting a GDA in source k

      NLk is the total number of publications in source k



Distribution of the DisGeNET score according to the number of sources reporting the association

    In the graph, we show a boxplot of the distribution of the score of the GDAs versus the number of sources reporting the GDA. The number in the boxplot corresponds to the number of GDAs in each category. For example, only 4 GDAs in DisGeNET are supported by 9 sources, 63 by 8 and so on.

4. Disease Specificity Index


Some genes are associated with multiple diseases (e.g. TNF), while others are associated with a small set of diseases or even a single disease. The Disease Specificity Index (DSI) measures this property: it reflects whether a gene is associated with many or few diseases. It is computed according to:

[Equation image not preserved: definition of the Disease Specificity Index]

where:
    - Nd is the number of diseases associated to the gene
    - NT is the total number of diseases in DisGeNET (13,674)

The DSI ranges from 0 to 1.

DSI = 0 implies that the gene is associated only to phenotypes.

Example: TNF, associated to more than 1,500 diseases, has a DSI of 0.247, while IDH3A is associated to one disease, with a DSI of 1.
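As a rough illustration, the sketch below assumes the DSI has the form log2(Nd/NT) / log2(1/NT). That form is an assumption on our part, since the equation itself is not reproduced in this text. It is consistent with the properties stated above: the value lies between 0 and 1, and a gene with a single disease gets a DSI of 1. Treating genes with no associated diseases as DSI = 0 follows the convention described above. The function name is hypothetical.

```python
import math


def disease_specificity_index(n_d, n_t):
    """DSI under the assumed form log2(n_d / n_t) / log2(1 / n_t).

    n_d: number of diseases associated with the gene
    n_t: total number of diseases in the database
    """
    if n_t < 2:
        raise ValueError("n_t must be at least 2")
    if n_d == 0:
        # Convention from the text: genes associated only with phenotypes get 0.
        return 0.0
    return math.log2(n_d / n_t) / math.log2(1 / n_t)
```

With NT = 13,674, a gene linked to one disease scores 1, and the score decreases towards 0 as the number of associated diseases grows.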

5. Disease Pleiotropy Index


The rationale is similar to that of the DSI, but here we consider whether the multiple diseases associated with the gene are similar to each other (belong to the same MeSH disease class, e.g. Cardiovascular Diseases) or are completely different diseases that belong to different disease classes. The Disease Pleiotropy Index (DPI) is computed according to:

[Equation image not preserved: definition of the Disease Pleiotropy Index]

where:
    - Ndc is the number of the different MeSH disease classes of the diseases associated to the gene
    - NTC is the total number of MeSH diseases classes in DisGeNET (27)

The DPI ranges from 0 to 1.

DPI = 0 implies that the gene is associated only to phenotypes, or that the associated diseases do not map to any MeSH classes.


Example: gene KCNE2 is associated with 38 diseases and 10 phenotypes. 36 of the 38 diseases have a MeSH disease class, and these 36 diseases map to 10 different MeSH classes. The DPI for KCNE2 is therefore 10/27 ≈ 0.37. In contrast, gene APOE, associated with more than 700 diseases from different disease classes, has a DPI of 1.
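The KCNE2 example can be reproduced with a few lines of code. The sketch assumes the DPI is the plain ratio Ndc/NTC, which is what the stated 0 to 1 range and the KCNE2 value imply; the function name and the default of 27 classes are choices made for this illustration.

```python
def disease_pleiotropy_index(n_dc, n_tc=27):
    """DPI as the fraction of MeSH disease classes covered by a gene's diseases.

    n_dc: number of distinct MeSH disease classes among the gene's diseases
    n_tc: total number of MeSH disease classes (27 according to the text)
    """
    if n_tc <= 0:
        raise ValueError("n_tc must be positive")
    return n_dc / n_tc


kcne2_dpi = disease_pleiotropy_index(10)
print(round(kcne2_dpi, 2))  # 0.37
```

A gene whose diseases cover all 27 classes gets a DPI of 1, and a gene with no diseases mapped to a MeSH class gets 0.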

6. Vocabulary Mapping


Diseases:

The vocabulary used for diseases in the current release of DisGeNET is the Unified Medical Language System® (UMLS®). The source repositories of gene-disease associations use different disease vocabularies: OMIM® terms are used by UniProt, CTD™, and MGD; MeSH terms by CTD™, LHGDN, and RGD; and UMLS® Concept Unique Identifiers (CUIs) by ClinVar. Orphanet identifiers are mapped using Orphanet cross-references. Disease names from GAD and the GWAS Catalog are normalized using the UMLS® Metathesaurus®. We also used the UMLS® Metathesaurus® concept structure to map OMIM® (MIM) and MeSH terms to UMLS® CUIs.

Genes:

For human genes, HGNC symbols (used for some entries in GAD) and UniProt accession numbers (used by UniProt) are converted to NCBI Entrez Gene identifiers using an in-house dictionary that cross-references HGNC, UniProt and NCBI Gene information. To map mouse and rat Entrez Gene identifiers to human Entrez Gene identifiers, we used the orthology files ftp://ftp.informatics.jax.org/pub/reports/HOM_MouseHumanSequence.rpt (from MGD) and ftp://rgd.mcw.edu/pub/data_release/RGD_ORTHOLOGS.txt (from RGD). We discarded associations for which no human ortholog of the mouse or rat gene could be found.
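The ortholog mapping step can be sketched as a dictionary lookup that drops unmapped genes. This is a simplified illustration: the function `map_to_human`, the in-memory `orthologs` dictionary and all identifiers below are placeholders, and parsing of the MGD and RGD orthology files is not shown.

```python
def map_to_human(model_gdas, orthologs):
    """Map (model_gene_id, disease_id) pairs to human genes, dropping unmapped ones."""
    mapped = []
    for gene_id, disease_id in model_gdas:
        human_id = orthologs.get(gene_id)
        if human_id is None:
            continue  # no human ortholog found: discard the association
        mapped.append((human_id, disease_id))
    return mapped


# Placeholder identifiers, not real Entrez Gene IDs or UMLS CUIs.
orthologs = {"mouse:1": "human:10", "rat:2": "human:20"}
gdas = [("mouse:1", "CUI_A"), ("rat:2", "CUI_B"), ("rat:3", "CUI_C")]
human_gdas = map_to_human(gdas, orthologs)
```

Here the association for `rat:3` is discarded because the placeholder ortholog table has no human counterpart for it.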


7. The DisGeNET Association Type Ontology


For a seamless integration of gene-disease association data, we developed the DisGeNET association type ontology. All association types as found in the original source databases are formally structured from a parent GeneDiseaseAssociation class if there is a relationship between the gene/protein and the disease, and represented as ontological classes. The DisGeNET association type ontology is depicted below.



The description of each association type in our ontology is:

  • Therapeutic: This relationship indicates that the gene/protein has a therapeutic role in the amelioration of the disease.
  • Biomarker: This relationship indicates that the gene/protein either plays a role in the etiology of the disease (e.g. participates in the molecular mechanism that leads to disease) or is a biomarker for a disease.
  • Genomic Alterations: This relationship indicates that a genomic alteration is linked to the gene associated with the disease phenotype.
  • Genetic Variation: This relationship indicates that a sequence variation (a mutation, a SNP) is associated with the disease phenotype, but there is still no evidence to say that the variation causes the disease.
  • Causal Mutation: This relationship indicates that there are allelic variants or mutations known to cause the disease.
  • Germline Causal Mutation: This relationship indicates that there are germline allelic variants or mutations known to cause the disease, and they may be passed on to offspring.
  • Somatic Causal Mutation: This relationship indicates that there are somatic allelic variants or mutations known to cause the disease, but they may not be passed on to offspring.
  • Chromosomal Rearrangement: This relationship indicates that a gene is included in a chromosomal rearrangement associated with a particular manifestation of the disease.
  • Fusion Gene: This relationship indicates that the fusion between two different genes (between promoter and/or other coding DNA regions) is associated with the disease.
  • Susceptibility Mutation: This relationship indicates that a gene mutation in a germ cell predisposes to the development of a disorder, and that it is necessary but not sufficient for the manifestation of the disease.
  • Modifying Mutation: This relationship indicates that a gene mutation is known to modify the clinical presentation of the disease.
  • Germline Modifying Mutation: This relationship indicates that a germline gene mutation modifies the clinical presentation of the disease, and it may be passed on to offspring.
  • Somatic Modifying Mutation: This relationship indicates that a somatic gene mutation modifies the clinical presentation of the disease, but it may not be passed on to offspring.
  • Altered Expression: This relationship indicates that an altered expression of the gene is associated with the disease phenotype.
  • Post-translational Modification: This relationship indicates that alterations in the function of the protein by means of post-translational modifications (methylation or phosphorylation of the protein) are associated with the disease phenotype.

The labels from the original sources are mapped to the DisGeNET association type ontology according to:

Association Type                 Original Source Label
Altered Expression               BeFree, LHGDN
Biomarker                        BeFree, CTD (marker/mechanism), LHGDN, MGD, and RGD
Causal Mutation                  CLINVAR (Pathogenic)
Chromosomal Rearrangement        ORPHANET (Role in the phenotype of)
Fusion Gene                      ORPHANET (Part of a fusion gene in)
Genetic Variation                BeFree, CLINVAR (Affects, Likely pathogenic), GAD, GWASCAT, LHGDN, ORPHANET (Candidate gene tested in), UNIPROT
Germline Causal Mutation         ORPHANET (Disease-causing germline mutation(s) in, Disease-causing germline mutation(s) (gain of function) in, Disease-causing germline mutation(s) (loss of function) in)
Germline Modifying Mutation      ORPHANET (Modifying germline mutation in)
Modifying Mutation               RGD (severity, disease_progression, onset)
PostTranslational Modification   BeFree, LHGDN
Somatic Causal Mutation          ORPHANET (Disease-causing somatic mutation(s) in)
Somatic Modifying Mutation       ORPHANET (Modifying somatic mutation in)
Susceptibility Mutation          CLINVAR (confers sensitivity, risk factor), ORPHANET (Major susceptibility factor in), RGD (susceptibility)
Therapeutic                      CTD (therapeutic), RGD (treatment)
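For sources that carry an explicit label, the mapping above amounts to a lookup keyed on source and label. The sketch below encodes a handful of rows from the table; the dictionary `LABEL_TO_TYPE` and the function `association_type` are hypothetical names, and the case-insensitive matching is an assumption added for convenience.

```python
# A subset of the (source, original label) -> association type rows listed above.
LABEL_TO_TYPE = {
    ("CTD", "marker/mechanism"): "Biomarker",
    ("CTD", "therapeutic"): "Therapeutic",
    ("CLINVAR", "pathogenic"): "Causal Mutation",
    ("CLINVAR", "likely pathogenic"): "Genetic Variation",
    ("CLINVAR", "risk factor"): "Susceptibility Mutation",
    ("ORPHANET", "part of a fusion gene in"): "Fusion Gene",
    ("RGD", "treatment"): "Therapeutic",
    ("RGD", "severity"): "Modifying Mutation",
}


def association_type(source, label):
    """Look up the association type for a source label; None if not in the table."""
    return LABEL_TO_TYPE.get((source.upper(), label.strip().lower()))
```

For example, `association_type("ClinVar", "Pathogenic")` returns "Causal Mutation", and a label that is not in the subset returns `None`.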

8. Data attributes

In order to ease the interpretation and analysis of gene-disease associations, we provide the following information for the data.

Diseases:

  • the disease name, provided by the UMLS® Metathesaurus®
  • the UMLS® semantic types
  • the MeSH class: We classify the diseases according to the MeSH hierarchy using 23 upper level concepts of the MeSH tree branch C (Diseases) plus three concepts of the F branch (Psychiatry and Psychology: "Behavior and Behavior Mechanisms", "Psychological Phenomena and Processes", and "Mental Disorders").
  • The top level concepts from the Human Disease Ontology.
  • The DisGeNET disease type: disease, phenotype and group.

    We consider as diseases the entries mapping to the following UMLS® semantic types:

      - Disease or Syndrome
      - Neoplastic Process
      - Acquired Abnormality
      - Anatomical Abnormality
      - Congenital Abnormality
      - Mental or Behavioral Dysfunction

    We consider as phenotypes the entries mapping to the following UMLS® semantic types:

      - Pathologic Function
      - Sign or Symptom
      - Finding
      - Laboratory or Test Result
      - Individual Behavior
      - Clinical Attribute
      - Organism Attribute
      - Organism Function
      - Organ or Tissue Function
      - Cell or Molecular Dysfunction

    These classifications were manually checked. In addition, disease entries referring to disease groups, such as "Cardiovascular Diseases", "Autoimmune Diseases", "Neurodegenerative Diseases", and "Lung Neoplasms", were classified as disease group.

    Additionally, we have removed terms that are considered diseases by other sources but are not strictly diseases, such as terms belonging to the following UMLS® semantic types:

      - Gene or Genome
      - Genetic Function
      - Immunologic Factor
      - Injury or Poisoning
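The semantic-type rules listed above can be sketched as a small classifier. This is only an approximation of the described procedure: it handles a single semantic type per entry, it does not reproduce the manual checks or the separate "disease group" assignment, and the function name `disgenet_type` is hypothetical.

```python
DISEASE_TYPES = {
    "Disease or Syndrome", "Neoplastic Process", "Acquired Abnormality",
    "Anatomical Abnormality", "Congenital Abnormality",
    "Mental or Behavioral Dysfunction",
}
PHENOTYPE_TYPES = {
    "Pathologic Function", "Sign or Symptom", "Finding",
    "Laboratory or Test Result", "Individual Behavior", "Clinical Attribute",
    "Organism Attribute", "Organism Function", "Organ or Tissue Function",
    "Cell or Molecular Dysfunction",
}
# Examples of semantic types whose terms were removed (the text says "such as").
REMOVED_TYPES = {
    "Gene or Genome", "Genetic Function", "Immunologic Factor",
    "Injury or Poisoning",
}


def disgenet_type(semantic_type):
    """Return 'disease', 'phenotype', None (removed), or 'unclassified'."""
    if semantic_type in REMOVED_TYPES:
        return None
    if semantic_type in DISEASE_TYPES:
        return "disease"
    if semantic_type in PHENOTYPE_TYPES:
        return "phenotype"
    return "unclassified"
```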

    These attributes are shown in the different views of the browser, and they are all shown in the Disease Tab.

    Genes:


    Variants:

    • The position in the chromosome
    • The reference and alternative alleles
    • The class of the variant: SNP, deletion, insertion, indel, somatic SNV, substitution, sequence alteration, and tandem repeat
    • The allelic frequency according to the 1000 Genomes Project
    • The allelic frequency according to the Exome Aggregation Consortium
    • The most severe consequence type according to the VEP
    • Links to dbSNP
    • Links to ClinVar
    • Links to Ensembl

    Gene-disease associations

    • the DisGeNET score
    • the DisGeNET Gene-Disease Association Type
    • the publication(s) that report the gene-disease association, with the PubMed Identifier
    • a representative sentence from the publication describing the association between the gene and the disease (If a representative sentence is not found, we provide the title of the paper)
    • the original source reporting the Gene-Disease Association
    • For some sources, we provide the variant(s) associated to the gene-disease association

9. DisGeNET presentations

  • Slides of the DisGeNET tutorial at the ECCB 2016 in The Hague, Netherlands (September, 2016)
  • Poster of disgenet2r, an R package to explore the molecular underpinnings of human diseases at JBI in Valencia, Spain (May, 2016)
  • Poster of DisGeNET at the conference Linking Life Science Data: Design to Implementation, and Beyond, Vienna, Austria (February, 2016)
  • Slides of the DisGeNET presentation at the BioHackathon 2015 in Nagasaki, Japan (September, 2015)
  • Slides of the DisGeNET tutorial at the SWAT4LS 2015 in Cambridge, UK (December, 2015)
  • Slides of the DisGeNET presentation at the Big Data in Biomedicine debate, Barcelona, Spain (November, 2014)

10. Papers citing DisGeNET

  1. Allele, phenotype and disease data at Mouse Genome Informatics: improving access and analysis. Bello, S. M., Smith, C. L., & Eppig, J. T. Mammalian Genome (2015) doi: 10.1007/s00335-015-9582-y
  2. Interoperability of text corpus annotations with the semantic web. Verspoor, K., Kim, J. D., & Dumontier, M. BMC Proceedings (2015) doi:10.1186/1753-6561-9-S5-A2
  3. How to build personalised multi-omics comorbidity profiles Moni MA, & Lio P. Front. Cell Dev. Biol. (2015) doi: 10.3389/fcell.2015.00028
  4. Integrating proteomics profiling data sets: a network perspective. Bhat A, Dakna M, Mischak H. Methods Mol Biol. (2015) 10.1007/978-1-4939-1872-0_14
  5. Network Analysis in the Investigation of Chronic Respiratory Diseases. From Basics to Application Diez D, Agustí A, and Wheelock CE. American Journal of Respiratory and Critical Care Medicine, (2014) doi: 10.1164/rccm.201403-0421PP
  6. Associating disease-related genetic variants in intergenic regions to the genes they impact. Macintyre G, Jimeno Yepes A, Ong CS, Verspoor K. PeerJ. (2014) 10.7717/peerj.639
  7. Clinical proteomic biomarkers: relevant issues on study design & technical considerations in biomarker development. Frantzi M, Bhat A, Latosinska A. Clin Transl Med. (2014) 10.1186/2001-1326-3-7
  8. Structure and dynamics of molecular networks: a novel paradigm of drug discovery: a comprehensive review. Csermely P, Korcsmáros T, Kiss HJ, London G, Nussinov R. Pharmacol Ther. (2013) 10.1016/j.pharmthera.2013.01.016
  9. ChemProt-2.0: visual navigation in a disease chemical biology database. Kim Kjærulff S, Wich L, Kringelum J, Jacobsen UP, Kouskoumvekaki I, Audouze K, Lund O, Brunak S, Oprea TI, Taboureau O. Nucleic Acids Res. (2013) 10.1093/nar/gks1166
  10. State of the art in silico tools for the study of signaling pathways in cancer. Villaamil VM, Gallego GA, Cainzos IS, Valladares-Ayerbes M, Antón Aparicio LM. Int J Mol Sci. (2012) 10.3390/ijms13066561
  11. iCTNet: a Cytoscape plugin to produce and analyze integrative complex traits networks. Wang L, Khankhanian P, Baranzini SE, Mousavi P. BMC Bioinformatics (2011) 10.1186/1471-2105-12-380.

Papers using DisGeNET data

  1. Cell type-selective disease-association of genes under high regulatory load Galhardo M, Berninger P, Nguyen T, Sauter T, Sinkkonen L. Nucl. Acids Res. (2015) doi:10.1093/nar/gkv863
  2. Exploring the cellular basis of human disease through a large-scale mapping of deleterious genes to cell types Cornish AJ, Filippis I, David A, Sternberg MJE Genome Medicine (2015) doi:10.1186/s13073-015-0212-9
  3. Dissecting Xuesaitong's mechanisms on preventing stroke based on the microarray and connectivity map Wang L, Yu Y, Yang J, Zhao X, Li Z. Mol Biosyst. (2015) doi:10.1039/c5mb00379b
  4. MicroRNA and Transcription Factor Mediated Regulatory Network Analysis Reveals Critical Regulators and Regulatory Modules in Myocardial Infarction. Zhang G, Shi H, Wang L, Zhou M, Wang Z, Liu X, Cheng L, Li W, & Li X. PLoS One (2015) doi:10.1371/journal.pone.0135339
  5. Insights from Chromosome-Centric Mapping of Disease-Associated Genes: Chromosome 12 Perspective. Jayaram S, Gupta MK, Shivakumar BM, Ghatge M, Sharma A, Vangala RK, & Sirdeshmukh R. J Proteome Res (2015) doi:10.1021/acs.jproteome.5b00488
  6. Inferring disease associations of the long non-coding RNAs through non-negative matrix factorization. Biswas AK, Kang M, Kim DC, Ding CH, Zhang B, Wu X, & Gao JX. Netw Model Anal Health Inform Bioinforma. (2015) doi:10.1007/s13721-015-0081-6
  7. Genetic mutations associated with status epilepticus. Bhatnagar, M, & Shorvon, S. Epilepsy Behav. (2015) doi:10.1016/j.yebeh.2015.04.013
  8. Novel scripts for improved annotation and selection of variants from whole exome sequencing in cancer research. Hansen MC, Nederby L, Roug A, Villesen P, Kjeldsen E, Nyvold CG, & Hokland P. MethodsX (2015) doi:10.1016/j.mex.2015.03.003
  9. Molecular Architecture of Spinal Cord Injury Protein Interaction Network. Alawieh A, Sabra M, Sabra Z, Tomlinson S, & Zaraket FA PLoS One (2015) doi:10.1371/journal.pone.0135024
  10. A pipeline for the systematic identification of non-redundant full-ORF cDNAs for polymorphic and evolutionary divergent genomes: Application to the ascidian Ciona intestinalis. Gilchrist, MJ, Sobral D, Khoueiry P, Daian F, Laporte B, Patrushev I, Matsumoto J, Dewar K, Hastings KEM, Satou Y, Lemaire P & Rothbächer U. Dev Biol. (2015) doi:10.1016/j.ydbio.2015.05.014
  11. Analysis of Deregulated microRNAs and Their Target Genes in Gastric Cancer Juzėnas S, Saltenienė V, Kupcinskas J, Link A, Kiudelis G, Jonaitis G, Jarmalaite S Kupcinskas L, Malfertheiner P, Skieceviciene J PLoS One (2015) doi:10.1371/journal.pone.0132327
  12. Nature and nurture: a case of transcending haematological pre-malignancies in a pair of monozygotic twins adding possible clues on the pathogenesis of B-cell proliferations. Hansen MC, Nyvold CG, Roug AS, Kjeldsen E, Villesen P, Nederby L, Hokland P. Br J Haematol. (2015) doi:10.1111/bjh.13305
  13. Pathway reporter genes define molecular phenotypes of human cells Zhang JD, Küng E, Boess F, Certa U and Ebeling M BMC Genomics (2015) doi:10.1186/s12864-015-1532-2
  14. Global Mapping of Herpesvirus-Host Protein Complexes Reveals a Transcription Strategy for Late Genes. Davis ZH, Verschueren E, Jang GM, Kleffman K, Johnson JR, Park J, Von Dollen J, Maher MC, Johnson T, Newton W, Jäger S, Shales M, Horner J, Hernandez RD, Krogan NJ, Glaunsinger BA Mol Cell. (2015) doi:10.1016/j.molcel.2014.11.026
  15. Integromics network meta-analysis on cardiac aging offers robust multi-layer modular signatures and reveals micronome synergism. Dimitrakopoulou K, Vrahatis AG, and Bezerianos A. BMC Genomics (2015) doi:10.1186/s12864-015-1256-3
  16. Discovery of new candidate genes related to brain development using protein interaction information. Chen L, Chu C, Kong X, Huang T, Cai YD. PLoS One. (2015) 10.1371/journal.pone.0118003
  17. ncRNA-Disease association prediction through tripartite network based inference Alaimo S, Giugno R, & Pulvirenti A Front. Bioeng. Biotechnol. (2014) 10.3389/fbioe.2014.00071
  18. A network approach to clinical intervention in neurodegenerative diseases. Santiago, JA & Potashkin JA. Trends Mol Med. (2014) 10.1016/j.molmed.2014.10.002
  19. Control of VEGF-A transcriptional programs by pausing and genomic compartmentalization. Kaikkonen MU, Niskanen H, Romanoski CE, Kansanen E, Kivelä AM, Laitalainen J, Heinz S, Benner C, Glass CK, Ylä-Herttuala S. Nucleic Acids Res. (2014) 10.1093/nar/gku1036
  20. Network medicine analysis of COPD multimorbidities. Grosdidier S, Ferrer A, Faner R, Piñero J, Roca J, Cosío B, Agustí A, Gea J, Sanz F, Furlong LI. Respir Res. (2014) 10.1186/s12931-014-0111-4
  21. An R-based tool for miRNA data analysis and correlation with clinical ontologies. Cristiano F, Veltri P. Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (2014) 10.1145/2649387.2660847
  22. Using 2-node hypergraph clustering coefficients to analyze disease-gene networks. Renick Gallagher S, Dombrower M, Goldberg DS. Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (2014) 10.1145/2649387.2660817
  23. Organ system heterogeneity DB: a database for the visualization of phenotypes at the organ system level. Mannil D, Vogt I, Prinz J, Campillos M. Nucleic Acids Res. (2014) 10.1093/nar/gku948
  24. Molecularly and clinically related drugs and diseases are enriched in phenotypically similar drug-disease pairs. Vogt I, Prinz J, Campillos M. Genome Med. (2014) 10.1186/s13073-014-0052-z
  25. System-based approaches to decode the molecular links in Parkinson's disease and diabetes. Santiago, JA, Potashkin JA. Neurobiol Dis. (2014) 10.1016/j.nbd.2014.03.019
  26. Prioritizing Disease‐Linked Variants, Genes, and Pathways with an Interactive whole Genome Analysis Pipeline. Lee IH, Lee K, Hsing M, Choe Y, Park JH, Kim SH, Bohn JM, Neu MB, Hwang KB, Green RC, Kohane IS, Kong SW. Hum Mutat. (2014) 10.1002/humu.22520
  27. A Computational Framework to Infer Human Disease-Associated Long Noncoding RNAs. Liu MX, Chen X, Chen G, Cui QH, Yan GY. PloS One 9.1 (2014) 10.1371/journal.pone.0084408
  28. Choline protects against cardiac hypertrophy induced by increased after-load. Zhao Y, Wang C, Wu J, Wang Y, Zhu W, Zhang Y, Du Z. Int J Biol Sci. (2013) 10.7150/ijbs.5976
  29. Detection of differentially methylated gene promoters in failing and nonfailing human left ventricle myocardium using computation analysis. Koczor CA, Lee EK, Torres RA, Boyd A, Vega JD, Uppal K, Yuan F, Fields EJ, Samarel AM, Lewis W. Physiol Genomics. (2013) 10.1152/physiolgenomics.00013.2013

  30. Global DNA methylation and transcriptional analyses of human ESC-derived cardiomyocytes. Gu Y, Liu GH, Plongthongkum N, Benner C, Yi F, Qu J, Suzuki K, Yang J, Zhang W, Li M, Montserrat N, Crespo I, Del Sol A, Esteban CR, Zhang K, Belmonte JC. Protein Cell. (2013) 10.1007/s13238-013-0016-x
  31. Integrated analysis of transcript-level regulation of metabolism reveals disease-relevant nodes of the human metabolic network. Galhardo M, Sinkkonen L, Berninger P, Lin J, Sauter T, Heinäniemi M. Nucleic Acids Res. (2013) doi:10.1093/nar/gkt989
  32. Charting the NF-κB Pathway Interactome Map. Tieri P, Termanini A, Bellavista E, Salvioli S, Capri M, Franceschi C. PLoS One (2012) doi: 10.1371/journal.pone.0032678

Version History

DisGeNET 4.0 - October, 2016

  • 254 clinical concepts were reclassified as "group"

DisGeNET 4.0 - June, 2016

  • New entry point in the web interface for variants
  • New data for variants: the chromosomal coordinates, and the reference and alternative alleles
  • New data for variants: the class of the variant: SNP, deletion, insertion, indel, somatic SNV, substitution, sequence alteration, and tandem repeat
  • New data for variants: the allelic frequency according to the 1000 Genomes Project and Exome Aggregation Consortium
  • New data for variants: the most severe consequence type according to the VEP

DisGeNET 4.0 - April 15, 2016

  • All data sources were updated
  • New data sources added: Orphanet, GWAS Catalog
  • New association types added to the DisGeNET GDA ontology
  • New disease annotations added to the browser: HPO and HDO
  • New disease classification: disease, phenotype, and group
  • New: Specificity and Pleiotropy indexes for genes were added
  • Information on SNP-gene and SNP-disease association is now available

DisGeNET 3.0 - May 15, 2015

  • All data sources were updated
  • New data source added: ClinVar
  • Improved text mining data: GDAs from BeFree classified by association type
  • More information on SNPs: links to dbSNP, ENSEMBL, and ClinVar

DisGeNET 2.1 - May 5, 2014

  • Second release of DisGeNET as Linked Data (DisGeNET RDF v2.1.0)
  • New text mining information using the BeFree System

DisGeNET 2.0 - February 5, 2014

  • First release of DisGeNET as Linked Data (DisGeNET RDF v1.0)
  • Added information about rat disease models from CTD™ and RGD
  • New Text mining information

DisGeNET - July 20, 2012

  • Added information about mouse disease models from CTD™ and MGD
  • Changed disease identifiers from MeSH, OMIM® to UMLS® CUIs
  • DisGeNET web interface is launched

DisGeNET 1.02 - Oct 7th 2010

  • added README.txt
  • changed citation of application note
  • fixed a bug in build script which did not copy the images
  • added plugin.props so DisGeNET properly shows up in the Cytoscape plugin manager

DisGeNET 1.01 - Oct 4th 2010

  • fixed minor bug, two columns (pmids, sentence) were mixed up in the database table geneDiseaseNetwork
  • build script added to source code

DisGeNET 1.0 - Sep 21st 2010

  • initial release of DisGeNET as a Cytoscape plugin and SQLite database