DisGeNET RDF

The DisGeNET-RDF Linked Dataset is an alternative way to access the DisGeNET data and provides new opportunities for data integration, querying and integrating DisGeNET data to other external RDF datasets.

The DisGeNET RDF API has been selected among the 10 ELIXIR Recommended Interoperability Resources ELIXIR announced its first portfolio of Recommended Interoperability Resources (RIRs) to facilitate interoperability and reusability of life science data and support the principles of FAIR data management on December 2018. The full list of ELIXIR Recommended Interoperability Resources is available here.

The RDF version of DisGeNET has been developed in the context of the Open PHACTS project to provide disease relevant information to the knowledge base on pharmacological data. DisGeNET-RDF has been integrated in the Open PHACTS Discovery Platform among other resources such as ChEMBL, WikiPathways and neXtProt. Aimed at exploring and querying DisGeNET data across the linked data in the platform, APIs are currently available in the Open PHACTS API v1.5 (see the OPS API Web site for up to date information).

To perform faceted and precise searches the DisGeNET-RDF linked data is accessible via a Faceted browser.

In addition, DisGeNET-RDF linked data can be accessed for question-answering via a SPARQL endpoint . An alternative SPARQL interaction with the DisGeNET-RDF data is via a LODEStar interface here, which is a SPARQL endpoint and linked data browser for querying and browsing RDF datasets developed in the EBI. Furthermore, some DisGeNET queries are available at Bioqueries. See the SPARQL Endpoint Example Queries section for more details and query examples.

The RDF Linked Dataset is accompanied with a full dataset description, which is compliant with the W3C HCLS specification. For more information on the dataset description of the RDF Dataset go to Metadata Description section.

To download the dump files please, go to the Data Downloads section.

Release Information

DisGeNET-RDF v6.0

The RDF distribution of DisGeNET includes new annotation and new linksets:

  • All linksets updated, and all ontologies updated.
  • Changed the data model for the variant attributes

Linked Dataset Description

There are four main components in the RDF dataset: GDA content, VDA content, metadata description of the RDF dataset (VoID description), and linkouts to other Linked Datasets. The current RDF representation of DisGeNET (v6.0.0) has 62,359,775 triples serialized in Turtle syntax. The triples annotate 628,685 gene-disease associations (GDAs), 17,549 genes, and 24,166 diseases, disorders, traits, and clinical or abnormal human phenotypes and 210,498 variant-disease associations (VDAs), between 117,337 variants and 10,358 diseases, traits, and phenotypes. The RDF graph model is centered around two concepts the GDA concept and the VDA concept and their attributes. The genes are identified by the NCBI Entrez identifiers and the diseases are identified by the UMLS CUI and the variants are identified by the dbSNP id. Entities and properties are semantically defined using standard ontologies such as the National Cancer Institute thesaurus (NCIt), and resources identified by using de-referenceable IRIs. GDAs are integrated using the DisGeNET Association Type Ontology and they are semantically harmonized using SIO classes (see the DisGeNET ontology section below).

A full dataset description of the RDF Linked Dataset is provided using among others the Vocabulary of Interlinked Datasets (VoID), an RDF Schema W3C recommended vocabulary for expressing metadata about RDF datasets. This dataset description, which is compliant with the W3C HCLS specification and the Open PHACTS specification, includes the provenance of the DisGeNET relational database, the primary databases, and the BeFree text mining tool (see the DisGeNET VoID file description). The type of curation and level of evidence of each original database are also tracked and annotated. Each data instance in DisGeNET is explicitly referenced to this dataset description in order to granulate and trace back the provenance to the instance level.

In addition, linkouts to the LOD are set in order to both enrich DisGeNET GDAs annotations with external Semantic Web resources, and to extend the current GDAs content of the Web of knowledge. Specifically, a total number of 4,962,315 linksets to the LOD through Bio2RDF, linked life data network projects among others exists in the current version. All entities linked are related using the same SKOS predicate skos:exactMatch. Other linkset statistics between entities can be found at the DisGeNET DataHub site in the DataHub registry. Consequently, DisGeNET appears in the last update of the LOD cloud diagram (2014-08-30 update). This diagram shows datasets published in Linked Data format and it is built based on their metadata description on the DataHub as well as on metadata extracted from a crawl of the Linked Data Web.

Metadata Description

The RDF Linked Dataset is accompanied with a full dataset description, which is compliant with the W3C HCLS specification. The full VoID description at DisGeNET_VoID.ttl.gz.

DisGeNET-RDF Schema

The data model of the RDF representation of DisGeNET is shown below. Click on the picture to zoom in.

In this new release, GDAs are now identified by "303 URIs" following the W3C recommendation to build URIs for the Semantic Web. Each GDA is defined by a unique combination of a gene (NCBI GeneID), a disease (UMLS CUI), an association type defined by our ontology (see section below), a data source of provenance, and a PubMed article (PMID) giving evidence to the gene-disease association. A unique identifier based on Universally Unique Identifiers (UUID) generated by a cryptographic hash function, is established for each GDA. The DisGeNET GDA ID is composed by: 'DGN' + UUID, e.g. DGN7ab3d8cae0c9f1150cb65a985aa8c0a1. The new namespace is 'http://rdf.disgenet.org/resource/gda/'. The new GDA IRI pattern is: namespace + DisGeNET ID,

e.g. 'http://rdf.disgenet.org/resource/gda/DGN7ab3d8cae0c9f1150cb65a985aa8c0a1'.

For an example of triples related to a single gene-disease association in DisGeNET, see here.

The DisGeNET Association Type Ontology

The DisGeNET Association Type Ontology was developed in our group to fill the gap in formal semantics for the definition of types of associations described between a gene and a disease in biological databases. This ontology was generated using all terms provided by the GDAs original databases. It is an OWL ontology that can be accessed at GeneDiseaseAssociation.owl. The DisGeNET ontology is integrated into the Sematicscience Integrated Ontology (SIO), which is an OWL ontology that provides essential types and relations for the rich description of objects, processes and their attributes [PDF]. You can check SIO gene-disease association classes from this URL or download the entire SIO OWL-DL ontology file . The SIO ontology can be also accessed at the NCBO Bioportal. DisGeNET GDAs in RDF are semantically harmonized using SIO classes.

Gene-Disease Ontology

Access to the RDF Linked Dataset

Data Downloads

The DisGeNET-RDF data dump and the VoID description file are accessible to download.

Faceted Browser

DisGeNET-RDF linked data can be navigated via a Faceted browser.

SPARQL Endpoint

DisGeNET-RDF data are accessible using the query language SPARQL via our public SPARQL endpoint. The dataset is stored in a Virtuoso's QUAD Store in which the name of the graph is 'http://rdf.disgenet.org'. It is powered by Virtuoso open-source v7.1.0.

An alternative SPARQL interaction with the DisGeNET-RDF data is via a LODEStar interface at the DisGeNET LODEStar Endpoint, which is a SPARQL endpoint and linked data browser for querying and browsing RDF datasets developed in the EBI.

DisGeNET GRAPH

The DisGeNET-RDF dataset is deployed in the graph: 'http://rdf.disgenet.org'.

DisGeNET NAMESPACES*

The namespaces required to query DisGeNET are:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX sio: <http://semanticscience.org/resource/>
PREFIX so: <http://purl.obolibrary.org/obo/SO_>
PREFIX ncit: <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#>
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dctypes: <http://purl.org/dc/dcmitype/>
PREFIX wi: <http://http://purl.org/ontology/wi/core#>
PREFIX eco: <http://http://purl.obolibrary.org/obo/eco.owl#>
PREFIX prov: <http://http://http://www.w3.org/ns/prov#>
PREFIX pav: <http://http://http://purl.org/pav/>
PREFIX obo: <http://purl.obolibrary.org/obo/>

*Our SPARQL endpoint is configured with these prefixes, thus their definition is not required when executing queries from our endpoint.

RDF Entity Examples

In order to help the user to query DisGeNET RDF data, for each type of entity represented in DisGeNET we provide an example of its RDF annotation serialized in Turtle syntax, see here.

Access DisGeNET via ontology

To facilitate the retrieval of data, several ontologies are deployed in the quad store in order to perform question/answering walking the ontologies. The deployed ontologies are:

  • The Semanticscience Integrated Ontology (SIO),
  • the Human Disease Ontology (DO),
  • the Orphanet Rare Disease Ontology (ORDO),
  • the NCI thesaurus (NCIt),
  • the Human Phenotype Ontology (HPO),
  • the Experiment Factor Ontology (EFO).
  • the Evidence Code Ontology (ECO).

Please, note the coverage of DisGeNET with other disease terminologies summarized in the disease table in downloads.

SPARQL Endpoint Example Queries

The purpose of DisGeNET linked dataset is to enable richer queries over the data. Below we provide examples of how to explore DisGeNET data.

Examples

Query 1.1: Retrieve all the gene-disease associations (GDAs) and their general description

# Retrieve all the GDAs of type 'Therapeutic' (sio:SIO_001120) and their general description. SELECT ?gda ?sio_type ?label ?comment ?title ?id ?voidSubset WHERE { ?gda rdf:type ?type ; rdfs:label ?label ; rdfs:comment ?comment ; dcterms:title ?title ; dcterms:identifier ?id ; void:inDataset ?voidSubset . ?type rdfs:label ?sio_type FILTER(?type=sio:SIO_001122) } LIMIT 20

Execute


Query 1.2: Retrieve all the GDAs and their related gene and disease entities

# Retrieve all the GDAs, associated gene and disease URIs based on the DisGeNET ID, NCBI GeneID, and UMLS CUI, respectively. SELECT ?gda ?gene ?disease WHERE { ?gda sio:SIO_000628 ?gene,?disease . ?gene rdf:type ncit:C16612 . ?disease rdf:type ncit:C7057 } LIMIT 20

Execute


Query 1.3: Retrieve the association evidences of an specific gene-disease association

# Retrieve the supporting evidences in DisGeNET, for the association between the "Rett Syndrome" disease (umls:C0035372) and the MECP2 gene (ncbigene:4204). SELECT DISTINCT ?gda <http://linkedlifedata.com/resource/umls/id/C0035372> as ?disease <http://identifiers.org/ncbigene/4204> as ?gene ?score ?source ?associationType ?pmid ?sentence WHERE { ?gda sio:SIO_000628 <http://linkedlifedata.com/resource/umls/id/C0035372>, <http://identifiers.org/ncbigene/4204> ; rdf:type ?associationType ; sio:SIO_000216 ?scoreIRI ; sio:SIO_000253 ?source . ?scoreIRI sio:SIO_000300 ?score . OPTIONAL { ?gda sio:SIO_000772 ?pmid . ?gda dcterms:description ?sentence . } }

Execute


Query 1.4: Retrieve all the GDAs from CURATED sources and with a score greater than or equal to 0.4

# Retrieve all the GDAs from CURATED sources (UNIPROT, CTD_human, PSYGENET, ORPHANET,CLINGEN, GENOMICS_ENGLAND, CGI, CLINGEN) with a score greater than or equal to 0.4. SELECT DISTINCT ?gda ?disease ?source ?score WHERE { ?gda sio:SIO_000628 ?gene, ?disease ; sio:SIO_000253 ?source ; sio:SIO_000216 ?scoreIRI . ?disease a ncit:C7057 . ?scoreIRI sio:SIO_000300 ?score . FILTER regex( ?source, "UNIPROT|CTD_human|PSYGENET|ORPHANET|CLINGEN|GENOMICS_ENGLAND|CGI|CLINGEN" ) FILTER (?score >= 0.4) } ORDER BY DESC(?score) LIMIT 100

Execute


Query 1.5: Retrieve all the diseases associated with transporters

# Retrieve all the diseases associated with proteins classified as 'transporter' according to the PANTHER classification. SELECT DISTINCT ?disease ?diseaselabel ?diseasename WHERE { ?panther rdfs:subClassOf sio:SIO_000275 ; dcterms:title ?panthername . ?protein sio:SIO_000095 ?panther . ?gene sio:SIO_010078 ?protein . ?gda sio:SIO_000628 ?gene, ?disease . ?disease dcterms:title ?diseasename . ?disease rdfs:label ?diseaselabel . FILTER regex(?panthername, "transporter") FILTER regex(STR(?disease), "umls/id") } LIMIT 100

Execute


Query 1.6: Retrieve the genes associated with Alzheimer disease

# Retrieve all genes associated to Alzheimer Disease with a score greater or equal to 0.4. SELECT DISTINCT ?gene str(?geneName) as ?name ?score WHERE { ?gda sio:SIO_000628 ?gene,?disease ; sio:SIO_000216 ?scoreIRI . ?gene rdf:type ncit:C16612 ; dcterms:title ?geneName . ?disease rdf:type ncit:C7057 ; dcterms:title "Alzheimer's Disease"@en . ?scoreIRI sio:SIO_000300 ?score . FILTER (?score >= 0.4) } ORDER BY DESC(?score)

Execute


Query 1.7: Retrieve the number of publications supporting the gene-disease associations

# Retrieve the top ten gene-disease association by number of publications. SELECT DISTINCT str(?dName) as ?dName str(?gSymbol) as ?geneSymbol count(?article) as ?publications WHERE { ?gda rdf:type ?type; sio:SIO_000628 ?gene,?disease; sio:SIO_000772 ?article . ?disease a ncit:C7057; dcterms:title ?dName . ?gene a ncit:C16612; sio:SIO_000205 ?symbolUri . ?symbolUri dcterms:title ?gSymbol . } ORDER BY DESC (?publications) LIMIT 10

Execute


Query 1.8: Retrieve the diseasome

# Retrieve the associations between diseases (diseasome) based on shared genes. SELECT DISTINCT ?disease ?diseaseName ?gene ?disease2 ?diseaseName2 WHERE { ?gda sio:SIO_000628 ?disease,?gene . ?gda2 sio:SIO_000628 ?disease2,?gene . ?disease dcterms:title ?diseaseName . ?disease2 dcterms:title ?diseaseName2 . FILTER regex(?gene, "ncbigene") FILTER regex(?disease, "umls/id") FILTER regex(?disease2, "umls/id") FILTER (?disease != ?disease2) FILTER (?gda != ?gda2) } LIMIT 50

Execute


Query 1.9: Retrieve the gene-gene network

# Retrieve the associations between genes based on shared diseases. SELECT DISTINCT ?gene ?geneName ?disease ?gene2 ?geneName2 WHERE { ?gda sio:SIO_000628 ?disease,?gene . ?gda2 sio:SIO_000628 ?disease,?gene2 . ?gene dcterms:title ?geneName . ?gene2 dcterms:title ?geneName2 . FILTER regex(?disease, "umls/id") FILTER regex(?gene, "ncbigene") FILTER regex(?gene2, "ncbigene") FILTER (?gene != ?gene2) FILTER (?gda != ?gda2) } LIMIT 50

Execute


Query 1.10: Retrieve diseases by DO class.

# Retrieve all diseases belonging to 'Ovarian cancer' class in the Human Disease Ontology (DOID:2394). SELECT DISTINCT ?umls ?umlsTerm ?doid ?doTerm WHERE { ?gda sio:SIO_000628 ?umls . ?umls dcterms:title ?umlsTerm ; skos:exactMatch ?doid . ?doid rdfs:label ?doTerm ; rdfs:subClassOf+ <http://purl.obolibrary.org/obo/DOID_2394> . FILTER regex(?umls, "umls/id") } LIMIT 20

Execute


Query 1.11: Retrieve all variants, its associated diseases and associated genes

# Retrieve all variants, their associated genes (if any) and the diseases associated to these variants. SELECT distinct ?disease str(?diseaseTitle) as ?diseaseName ?variant str(?variantTitle) as ?variantName ?gene str(?geneTitle) as ?geneName str(?chrValue) as ?chromosome str(?posValue) as ?position str(?refValue) as ?refAllele str(?altValue) as ?altAllele ?type as ?mostSevereConsequence str(?speValue) as ?specificity str(?pleioValue) as ?pleiotropy WHERE { ?vda sio:SIO_000628 ?variant,?disease . ?disease a ncit:C7057 . ?disease dcterms:title ?diseaseTitle . ?variant a ?type . ?variant dcterms:title ?variantTitle . ?variant sio:SIO_000216 ?spe,?pleio . ?spe a sio:SIO_001351 ; sio:SIO_000300 ?speValue . ?pleio a sio:SIO_001352 ; sio:SIO_000300 ?pleioValue . ?variant sio:SIO_000061 ?chr . ?chr sio:SIO_000300 ?chrValue . ?pos sio:SIO_000300 ?posValue . ?variant sio:SIO_000791 ?pos . ?variant sio:SIO_000223 ?ref,?alt . ?ref a geno:0000152 ; sio:SIO_000300 ?refValue . ?alt a geno:0000476 ; sio:SIO_000300 ?altValue . FILTER (?type!=so:0001060) OPTIONAL { ?variant so:associated_with ?gene . ?gene a ncit:C16612 . ?gene dcterms:title ?geneTitle } } ORDER BY ASC(?disease) ASC(?variant) LIMIT 20

Execute


Query 1.12: Retrieve the reference allele and all its alternatives for an specific variant

#Retrieve the reference allele the possible alternatives and the the alternative frequency if available for the variant rs5035.

SELECT DISTINCT ?variant ?refAlVal ?altAlVal ?source ?altAlFreqVal WHERE { ?variant a so:0001060 ; sio:SIO_000223 ?refAl,?altAl . ?refAl rdf:type geno:0000152; sio:SIO_000300 ?refAlVal; sio:SIO_000253 ?source; sio:SIO_000628 ?vsap . ?altAl rdf:type geno:0000476; sio:SIO_000300 ?altAlVal; sio:SIO_000253 ?source; sio:SIO_000628 ?vsap; sio:SIO_000900 ?altAlFreq . OPTIONAL { ?altAlFreq a sio:SIO_001367; sio:SIO_000300 ?altAlFreqVal . } FILTER (?variant=<http://identifiers.org/dbsnp/rs5035>) } LIMIT 20

Execute


SPARQL Endpoint Example Federated Queries

The purpose of the Federated queries is to integrate DisGeNET data with other Linked Datasets in the LOD cloud. The DisGeNET SPARQL endpoint supports federated queries over other linked datasets such as UniProt, Gene Expression Atlas (GXA), and WikiPathways. The DisGeNET SPARQL endpoint supports the syntax and semantics of SPARQL 1.1 for executing queries distributed over different SPARQL endpoints. Below we provide examples of how to explore perform federated queries.

DisGeNET + WikiPathways DisGeNET + other SPARQL endpoints (ChEMBL, Ensembl, Uniprot,..)

Examples

FED1: DisGeNET + WikiPathways (queries made in collaboration with the WikiPathways RDF team. Thanks!!!)

NAMESPACE

PREFIX wp:<http://vocabularies.wikipathways.org/wp#>

Query 2.1: Retrieve the genes and pathways associated with 'Marfan Syndrome'

# Give me all disease genes for 'Marfan Syndrome' (MeSH:D008382 or OMIM:601665) in DisGeNET and the pathways for these genes from WikiPathways. Output the disease name, NCBI Gene ID, HGNC gene name, gene label, WikiPathways pathway ID and name. PREFIX wp:<http://vocabularies.wikipathways.org/wp#> SELECT DISTINCT str(?DiseaseName) as ?DiseaseName ?gene str(?GeneName) as ?GeneTitle ?PathwayID str(?PathwayName) as ?PathwayName WHERE { # Query DisGeNET for disease-genes ?disease skos:exactMatch <http://id.nlm.nih.gov/mesh/D008382> . # alternatively, searching by MIM term: # ?disease skos:exactMatch <http://bio2rdf.org/omim:601665> ?gda sio:SIO_000628 ?gene,?disease . ?gene rdf:type ncit:C16612 ; dcterms:title ?GeneName . ?disease rdf:type ncit:C7057 ; dcterms:title ?DiseaseName . # Query WikiPathways for gene-pathways SERVICE <http://sparql.wikipathways.org/> { ?geneProduct a wp:GeneProduct ; dc:identifier ?gene ; rdfs:label ?GeneLabel ; dcterms:isPartOf ?pathway . ?pathway dc:identifier ?PathwayID ; dc:title ?PathwayName } } ORDER BY DESC(?GeneName)

Execute


Query 2.2: Retrieve the pathways associated with 'Pulmonary Emphysema'

# Give me all pathways in WikiPathways for CURATED disease genes associated with 'Pulmonary Emphysema' (MeSH:D011656) in DisGeNET. Output the disease name, WikiPathways pathway ID and name. PREFIX wp: <http://vocabularies.wikipathways.org/wp#> SELECT DISTINCT str(?DiseaseName) as ?DiseaseName ?PathwayID str(?PathwayName) as ?PathwayName WHERE { # Query DisGeNET for disease-genes ?gda sio:SIO_000628 ?gene,?disease ; sio:SIO_000253 ?source . ?disease rdf:type ncit:C7057 ; skos:exactMatch <http: //id.nlm.nih.gov/mesh/D011656> ; dcterms:title ?DiseaseName . ?gene rdf:type ncit:C16612 ; dcterms:title ?GeneName . FILTER regex(?source, "uniprot|ctd_human|clinvar") # Query WikiPathways for gene-pathways SERVICE <http: //sparql.wikipathways.org /> { ?geneProduct a wp:GeneProduct ; dc:identifier ?gene ; rdfs:label ?GeneLabel ; dcterms:isPartOf ?pathway . ?pathway dc:identifier ?PathwayID ; dc:title ?PathwayName . } } ORDER BY DESC(?PathwayName)

Execute


Query 2.3: Retrieve the pathways associated with 'Schizophrenia', and show the number Schizophrenia genes in each pathway

# Give me all pathways in WikiPathways and the total number of disease genes in each pathway for 'Schizophrenia' (MeSH:D012559). We will consider associations from CURATED sources with DisGeNET score greater than 0.35. Output the disease name, WikiPathways pathway ID, pathway name, and the number of disease genes. PREFIX wp:<http://vocabularies.wikipathways.org/wp#> SELECT DISTINCT str(?DiseaseName) as ?DiseaseName ?PathwayID str(?PathwayName) as ?PathwayName count(DISTINCT ?gene) AS ?genes WHERE { # DisGeNET: get disease-genes ?gda sio:SIO_000628 ?gene,?disease ; sio:SIO_000216 ?scoreIRI . ?gene rdf:type ncit:C16612 . ?disease rdf:type ncit:C7057 ; skos:exactMatch <http://id.nlm.nih.gov/mesh/D012559> ; dcterms:title ?DiseaseName . ?scoreIRI sio:SIO_000300 ?score . FILTER (?score > "0.45"^^xsd:decimal) # WikiPathways: get gene-pathways SERVICE <http://sparql.wikipathways.org/> { ?geneProduct a wp:GeneProduct ; dc:identifier ?gene ; dcterms:isPartOf ?pathway . ?pathway dc:identifier ?PathwayID ; dc:title ?PathwayName . } # end of service } # end of query ORDER BY DESC(?genes)DESC(?PathwayName) LIMIT 100

Execute


Query 2.4: Retrieve the genes and pathways associated with 'Diabetes Mellitus, Type 2'

# Give me all disease genes for 'Diabetes Mellitus, Type 2' (MeSH:D003924) with DisGeNET score greater than 0.35, that are involved in pathways and the number of pathways in WikiPathways in which each gene is involved. Output the disease name, gene URI, gene name, and the number of pathways. Please, be aware that this query takes some time due to the amount of data crossed. PREFIX wp:<http://vocabularies.wikipathways.org/wp#> SELECT DISTINCT ?diseaseName ?gene ?geneName ?nPathways WHERE { ?disease skos:exactMatch <http://id.nlm.nih.gov/mesh/D003924> ; rdf:type ncit:C7057 ; dcterms:title ?diseaseName . ?gda sio:SIO_000628 ?gene,?disease ; sio:SIO_000216 ?scoreIRI . ?gene rdf:type ncit:C16612 ; dcterms:title ?geneName . ?scoreIRI sio:SIO_000300 ?score . { SELECT ?gene ?nPathways WHERE { # WikiPathways: get gene-pathways SERVICE <http://sparql.wikipathways.org/> { SELECT ?gene COUNT(DISTINCT ?pathway) as ?nPathways WHERE { ?geneProduct a wp:GeneProduct ; dc:identifier ?gene ; dcterms:isPartOf ?pathway . ?pathway dc:identifier ?pathwayid } GROUP BY ?gene } # end of service } # end of where } # end of subquery FILTER (?score > "0.35"^^xsd:decimal) } # end of query LIMIT 100

Execute


Query 2.5: Retrieve the number of genes for 'Bardet-Biedl Syndrome' disease and indicate the number of genes present in pathways

# For 'Bardet-Biedl Syndrome' disease (MeSH:D020788), give me the total number of associated genes in DisGeNET and the total number of these genes in WikiPathways. Output the disease name, the total number of disease genes in Wikipathways and the total number of disease genes in DisGeNET. PREFIX wp:<http://vocabularies.wikipathways.org/wp#> SELECT DISTINCT ?DiseaseName count(distinct ?gene) as ?GeneInPathway ?TotalGene WHERE { SELECT * WHERE { # Total # of DisGeNET genes in WikiPathways ?gda sio:SIO_000628 ?gene,?disease . ?disease rdf:type ncit:C7057 ; skos:exactMatch <http://id.nlm.nih.gov/mesh/D020788> ; dcterms:title ?DiseaseName . ?gene rdf:type ncit:C16612 . # WikiPathways: get gene-pathways SERVICE <http://sparql.wikipathways.org/> { SELECT * WHERE { ?geneProduct a wp:GeneProduct ; dc:identifier ?gene ; dcterms:isPartOf ?pathway } } # Total # genes in DisGeNET { SELECT DISTINCT ?DiseaseName ount(distinct ?gene2) as ?TotalGene WHERE { ?gda sio:SIO_000628 ?gene2,?disease . ?disease rdf:type ncit:C7057 ; skos:exactMatch <http://id.nlm.nih.gov/mesh/D020788> ; dcterms:title ?DiseaseName . ?gene2 rdf:type ncit:C16612 } } } }

Execute


Query 2.6: Retrieve the genes and the pathways associated with 'Bardet-Biedl Syndrome' disease. In addition, list all the genes involved in each of the pathways found

# For 'Bardet-Biedl Syndrome' disease (MeSH:D020788), retrieve from DisGeNET the genes, the pathway(s) in which the gene is involved from WikiPathways, and all the genes present in each of these pathways. Output the disease name, gene in DisGeNET, pathway ID, and gene in Wikipathways. PREFIX wp: <http://vocabularies.wikipathways.org/wp#> SELECT DISTINCT ?DiseaseName ?gene ?PathwayID ?allGeneInPw WHERE { # DisGeNET: get disease-genes ?disease rdf:type ncit:C7057 ; skos:exactMatch <http://id.nlm.nih.gov/mesh/D020788> ; dcterms:title ?DiseaseName . ?gda sio:SIO_000628 ?gene,?disease . ?gene rdf:type ncit:C16612 . # WikiPathways: get gene-pathways SERVICE <http://sparql.wikipathways.org/> { ?geneProduct a wp:GeneProduct ; dc:identifier ?gene ; dcterms:isPartOf ?pathway . ?pathway dc:identifier ?PathwayID ; dc:title ?PathwayName . { SELECT ?PathwayID ?allGeneInPw WHERE { ?geneProduct a wp:GeneProduct ; dc:identifier ?allGeneInPw ; dcterms:isPartOf ?pathway . ?pathway dc:identifier ?PathwayID ; dc:title ?PathwayName . FILTER regex(str(?allGeneInPw), "ncbigene") } # end of select } # end of subquery } # end of service } # end of query ORDER BY DESC(?gene) DESC(?PathwayID) DESC(?allGeneInPw)

Execute


Query 2.7: Retrieve the total number of both disease genes and all genes involved in each pathway for the 'Bardet-Biedl Syndrome'

# For 'Bardet-Biedl Syndrome' disease (MeSH:D020788), retrieve from DisGeNET the number of disease genes in a pathway in WikiPathways, the pathway ID, and the number of all genes in each of these pathway. Output the disease name, DisGeNET genes in the pathway, pathway ID, and all genes in the Wikipathways pathway. PREFIX wp:<http://vocabularies.wikipathways.org/wp#> SELECT DISTINCT ?DiseaseName count(distinct ?gene) as ?diseasegenesinthepathway ?PathwayID count(distinct ?allGeneInPw) as ?allgenesinthepathway WHERE { # DisGeNET: get disease-genes ?disease rdf:type ncit:C7057 ; skos:exactMatch <http://id.nlm.nih.gov/mesh/D020788> ; dcterms:title ?DiseaseName . ?gda sio:SIO_000628 ?gene,?disease . ?gene rdf:type ncit:C16612 . # WikiPathways: get gene-pathways SERVICE <http://sparql.wikipathways.org/> { ?geneProduct a wp:GeneProduct ; dc:identifier ?gene ; dcterms:isPartOf ?pathway . ?pathway dc:identifier ?PathwayID ; dc:title ?PathwayName . { SELECT ?PathwayID ?allGeneInPw WHERE { ?geneProduct a wp:GeneProduct ; dc:identifier ?allGeneInPw ; dcterms:isPartOf ?pathway . ?pathway dc:identifier ?PathwayID ; dc:title ?PathwayName . FILTER regex(str(?allGeneInPw), "ncbigene") } # end of select } # end of subquery } # end of service } # end of query ORDER BY DESC(?diseasegenesinthepathway) DESC(?allgenesinthepathway)

Execute


Query 2.8: Retrieve the total number of both disease genes and all genes involved in each disease pathway and secondary pathways for 'Bardet-Biedl Syndrome'

# For 'Bardet-Biedl Syndrome' disease (MeSH:D020788), retrieve from DisGeNET the number of disease genes in a pathway in WikiPathways, the pathway ID or let's call it disease pathway, the number of all genes shared between the disease pathway and a secondary pathway, and the secondary pathway ID. Output the disease name, DisGeNET genes in the pathway, disease pathway ID, and all genes in the disease pathway shared with another pathway, and the secondary pathway ID. PREFIX wp:<http://vocabularies.wikipathways.org/wp#> SELECT DISTINCT ?DiseaseName count(distinct ?gene) as ?diseasegenesinthepathway ?PathwayID count(distinct ?allGeneInPw) as ?allgenesinthepathway ?PathwayID2 WHERE { # DisGeNET: get disease-genes ?disease rdf:type ncit:C7057 ; skos:exactMatch <http://id.nlm.nih.gov/mesh/D020788> ; dcterms:title ?DiseaseName . ?gda sio:SIO_000628 ?gene,?disease . ?gene rdf:type ncit:C16612 . # WikiPathways: get gene-pathways SERVICE <http://sparql.wikipathways.org/> { ?geneProduct a wp:GeneProduct ; dc:identifier ?gene ; dcterms:isPartOf ?pathway . ?pathway dc:identifier ?PathwayID ; dc:title ?PathwayName . { SELECT ?PathwayID ?allGeneInPw WHERE { ?geneProduct a wp:GeneProduct ; dc:identifier ?allGeneInPw ; dcterms:isPartOf ?pathway . ?pathway dc:identifier ?PathwayID ; dc:title ?PathwayName . FILTER regex(str(?allGeneInPw), "ncbigene") } } { SELECT ?allGeneInPw ?PathwayID2 WHERE { ?geneProduct a wp:GeneProduct ; dc:identifier ?allGeneInPw ; dcterms:isPartOf ?pathway2 . ?pathway2 dc:identifier ?PathwayID2 ; dc:title ?PathwayName2 . FILTER regex(str(?allGeneInPw), "ncbigene") } # end of select } # end of subquery } # end of service } # end of query ORDER BY DESC(?diseasegenesinthepathway) DESC(?allgenesinthepathway)

Execute


Query 2.9: Retrieve the number of disease genes, the number of all genes, and the number of secondary pathways in each disease pathway for 'Bardet-Biedl Syndrome'

# For 'Bardet-Biedl Syndrome' disease (MeSH:D020788), retrieve from DisGeNET the number of disease genes in a pathway in WikiPathways, the pathway ID or let's call it disease pathway, the number of all genes in the disease pathway, and the number of secondary pathways annotated to genes in each disease pathway. Output the disease name, DisGeNET genes in the pathway, disease pathway ID, the number of all genes in the disease pathway, and the number of secondary pathways. PREFIX wp:<http://vocabularies.wikipathways.org/wp#> SELECT DISTINCT ?DiseaseName count(distinct ?gene) as ?diseasegenesinthepathway ?PathwayID count(distinct ?allGeneInPw) as ?allgenesinthepathway count(distinct ?PathwayID2) as ?totalPathways WHERE { # DisGeNET: get disease-genes ?disease rdf:type ncit:C7057 ; skos:exactMatch <http://id.nlm.nih.gov/mesh/D020788> ; dcterms:title ?DiseaseName . ?gda sio:SIO_000628 ?gene,?disease . ?gene rdf:type ncit:C16612 . # WikiPathways: get gene-pathways SERVICE <http://sparql.wikipathways.org/> { ?geneProduct a wp:GeneProduct ; dc:identifier ?gene ; dcterms:isPartOf ?pathway . ?pathway dc:identifier ?PathwayID ; dc:title ?PathwayName . { SELECT ?PathwayID ?allGeneInPw WHERE { ?geneProduct a wp:GeneProduct ; dc:identifier ?allGeneInPw ; dcterms:isPartOf ?pathway . ?pathway dc:identifier ?PathwayID ; dc:title ?PathwayName . FILTER regex(str(?allGeneInPw), "ncbigene") } } { SELECT ?allGeneInPw ?PathwayID2 WHERE { ?geneProduct a wp:GeneProduct ; dc:identifier ?allGeneInPw ; dcterms:isPartOf ?pathway2 . ?pathway2 dc:identifier ?PathwayID2 ; dc:title ?PathwayName2 . FILTER regex(str(?allGeneInPw), "ncbigene") } # end of select } # end of subquery } # end of service } # end of query ORDER BY DESC(?diseasegenesinthepathway) DESC(?allgenesinthepathway)

Execute


Query 2.10: Retrieve the pathways associated with 'Lafora Disease'

# For 'Lafora Disease' (MeSH:D020192), give me the associated genes from LITERATURE sources in DisGeNET with a score less or equal than 0.2, and the pathways annotated to these disease genes in WikiPathways. Output the disease name, the gene URI, the score, the number of publications, the pathway URI, and the pathway name. PREFIX wp:<http://vocabularies.wikipathways.org/wp#> SELECT DISTINCT str(?DiseaseName) as ?DiseaseName ?gene ?score count(distinct ?publication) as ?numberOfPublications ?PathwayID str(?PathwayName) as ?PathwayName WHERE { # Query DisGeNET for disease-genes ?gda sio:SIO_000628 ?gene,?disease ; sio:SIO_000772 ?publication ; sio:SIO_000253 ?source ; sio:SIO_000216 ?scoreIRI . ?disease rdf:type ncit:C7057 ; dcterms:title ?DiseaseName ; skos:exactMatch <http://id.nlm.nih.gov/mesh/D020192> . ?gene rdf:type ncit:C16612 ; dcterms:title ?GeneName . ?source wi:evidence ?evidence . ?evidence rdfs:label ?label . ?scoreIRI sio:SIO_000300 ?score . FILTER (?score < "0.2"^^xsd:decimal || ?score = "0.2"^^xsd:decimal) FILTER regex(?label, "literature", "i") # Query WikiPathways for gene-pathways SERVICE <http://sparql.wikipathways.org/> { ?geneProduct a wp:GeneProduct ; dc:identifier ?gene ; rdfs:label ?GeneLabel ; dcterms:isPartOf ?pathway . ?pathway dc:identifier ?PathwayID ; dc:title ?PathwayName } } ORDER BY DESC(?score) DESC(?numberOfPublications) DESC(?PathwayName)

Execute


EBI RDF Source

Query 3.1: Retrieve the potential drug targets for 'Aarskog Syndrome'

# For 'Aarskog Syndrome' disease (UMLS_CUI:C0175701), give me the associated proteins from CURATED sources that are targets for molecules in ChEMBL. Output the disease name, the source, the number of supporting evidences for each GDA, the gene, the target, the molecule and the activity. PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#> SELECT DISTINCT str(?diseaseName) as ?diseasename ?source count(distinct ?gda) as ?evidences ?gene ?target ?molecule ?activity WHERE { ?gda sio:SIO_000628 ?gene,?disease ; sio:SIO_000253 ?source . ?disease rdf:type ncit:C7057 ; dcterms:title ?diseaseName . ?gene rdf:type ncit:C16612 ; sio:SIO_010078 ?protein . ?protein skos:exactMatch ?uniprot . FILTER regex(?source, "UNIPROT|CTD_human|CLINVAR") FILTER (?disease = <http://linkedlifedata.com/resource/umls/id/C0175701>) FILTER regex(?uniprot, "http://purl.uniprot.org/uniprot/") # Query ChEMBL for activity data { SELECT DISTINCT * WHERE { SERVICE <http://www.ebi.ac.uk/rdf/services/chembl/sparql> { ?uniprot a cco:UniprotRef . ?targetcmpt cco:targetCmptXref ?uniprot . ?target cco:hasTargetComponent ?targetcmpt . ?assay cco:hasTarget ?target . ?activity a cco:Activity ; cco:hasMolecule ?molecule ; cco:hasAssay ?assay . FILTER ( ?target = <http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2862> ) } # end of service } # end of select } # end of subquery } # end of query

Execute


Query 3.2: Retrieve diseases associated with genes in Ensembl

Attention this example must be executed from the EBI-Ensembl sparql endpoint.

# Give me all diseases in DisGeNET associated with a gene in Ensembl with EnsemblGeneId ENSG00000169057 (MECP2). Output the Ensemble gene ID, the disease, the gene-disease association and the score for this association.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX dc: <http://purl.org/dc/elements/1.1/> PREFIX dcterms: <http://purl.org/dc/terms/> PREFIX dbpedia2: <http://dbpedia.org/property/> PREFIX dbpedia: <http://dbpedia.org/> PREFIX foaf: <http://xmlns.com/foaf/0.1/> PREFIX skos: <http://www.w3.org/2004/02/skos/core#> PREFIX sio: <http://semanticscience.org/resource/> PREFIX ensembl: <http://rdf.ebi.ac.uk/resource/ensembl/> PREFIX ncit: <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#> PREFIX ensemblterms: <http://rdf.ebi.ac.uk/terms/ensembl/> PREFIX core: <http://purl.uniprot.org/core/> SELECT DISTINCT ?gda ?ncbigene ?disease ?scoreValue { ensembl:ENSG00000169057 ensemblterms:DEPENDENT ?ncbigene . FILTER regex(?ncbigene,'ncbigene') . SERVICE <http://rdf.disgenet.org/sparql/>{ ?gda sio:SIO_000628 ?ncbigene,?disease; sio:SIO_000216 ?score . ?disease a ncit:C7057 . ?score sio:SIO_000300 ?scoreValue FILTER (?scoreValue >= 0.8) } }LIMIT 20

Execute


Query 3.3: Retrieve disease coding genes in DisGeNET with disease annotation in UniProt

# Give me all proteins in DisGeNET encoded by disease genes that have disease annotation in UniProt. Output the protein and the disease annotation. SELECT ?protein ?comment WHERE { ?protein a ncit:C17021; skos:exactMatch ?uniprot . FILTER(strstarts(str(?uniprot), "http://purl.uniprot.org/uniprot")) # Query UniProt for proteins with disease annotation SERVICE <http://sparql.uniprot.org/sparql> { ?uniprot up:annotation ?annotation . ?annotation a up:Disease_Annotation ; rdfs:comment ?comment . } } LIMIT 10

Execute


Documentation

About RDF, Linked Data, Semantic Web technologies

Good introductions to the field are:

  • Wikipedia, always a good place to start!
  • W3C, look at whom develops the standards. The World Wide Web Consortium (W3C) is an international community where Member organizations, a full-time staff, and the public work together to develop Web standards.
  • EBI RDF Platform documentation, the EBI is one of the major Linked Data providers of Life Sciences. There is a comprehensive documentation in its website that is worth reading.

Nanopublications

The Integrative Biomedical Informatics Group is pleased to announce the second publication of the DisGeNET Nanopublications that is a Linked Dataset implemented in combination of the nanopublication approach [nanopub.org] and the Trusty URIs technique [PDF]. It is an alternative way to mine statements about gene-disease associations contained in DisGeNET. Nanopublications are a new way of publishing structured data that allows the tracking of provenance along with the scientific statement. The Trusty URIs is a novel technique to make resources in the Web immutable and verifiable, and to ensure the unambiguity of the data linking in the (semantic) Web. This new Linked Dataset provides nanopublications about scientific statements of human GDAs. These GDAs published as Trusty URI nanopublications are machine-interpretable, immutable, permanent, and verifiable. Each GDA statement has its provenance description providing evidence, attribution, creation time, and further context of its creation. Each GDA is classified as “CURATED”, “PREDICTED”, or “LITERATURE” in the DisGeNET context to categorize the evidence of the statement based on the type of assertion and curation made in the original databases. DisGeNET nanopublications include metadata annotations about the general topic of the nanopublications, i.e. ‘Gene-Disease Association’, semantically described by SIO to facilitate its discoverability in the Semantic Web (see PDF).

Linked Dataset Description

The third release of DisGeNET published as nanopublications is a distribution of DisGeNET v4.0 (Nanopublications version v4.0.0.0). The dataset consists of 1,414,902 nanopublications, representing the same number of scientific statements for 429,036 different GDAs with their detailed provenance, levels of evidence and publication information descriptions, all annotated as RDF statements and encapsulated into the nanopublication RDF graphs (5,659,608 graphs in total). Specifically, the dataset is composed of 48,106,668 N-Quads, i.e. RDF triples with their graph (or “context”) added as the fourth member in the tuple (Subject, Predicate, Object, Context), everything being serialized in TriG syntax.

DisGeNET Nanopublication Schema

The official guidelines to create nanopublications were used. A DisGeNET nanopublication is modeled by 4 named graphs: head, assertion, provenance and publication information. The head graph defines the structure of the nanopublication by linking to the other graph URIs. The assertion graph contains the description for a specific single GDA assertion. The provenance graph includes provenance, evidence and attribution statements that were directly mapped from the VoID description of the RDF dataset. Finally, the publication information graph includes all the metadata information regarding the nanopublication itself, see figure below (Click on the image to zoom in). The source of data for the DisGeNET nanopublications set is the RDF Linked Dataset version of DisGeNET. To implement Trusty URIs, the GitHub Java implementation was used.

Nanopublication Example

Access to the Nanopublications Linked Dataset

DisGeNET nanopublications can be accessed in two ways: they can be downloaded as a file in TriG format from the download section, and they are deployed in a new decentralized nanopublication server network, which is a distributed server network with a REST API to provide and propagate nanopublications identified by trusty URIs [ref( Kuhn et. al. 2015 )]. DisGeNET nanopublications are registered in datahub with other datasets formatted as nanopubublications. New: For performance reasons DisGeNET nanopublications are not accessible anymore via our SPARQL endpoint. To download the current dataset, which is the nanopublication distribution of the DisGeNET v4.0: nanopubs-v4.0.0.0.

For performance reasons DisGeNET nanopublications are not accessible anymore via our SPARQL endpoint.

To download the current dataset, which is the nanopublication distribution of the DisGeNET v4.0: nanopubs-v4.0.0.0.

SPARQL Example queries

DisGeNET nanopublications can be explored using the query language SPARQL via a SPARQL endpoint. With illustrative queries we show how to explore GDAs with DisGeNET nanopublications and how to integrate them with relationships published in other LOD sources. As example we can query DisGeNET nanopubs to answer the following question:

What are the proteins (and their protein interactions) associated to Alzheimer Disease with curated evidence?


Query 1.1: Retrieving Gene-Disease Associations

# First, we query DisGeNET for all the genes associated to Alzheimer Disease (umls:C0002395). This query only involves the assertion graph. SELECT DISTINCT ?gene FROM <http: //rdf.disgenet.org/nanopubs> WHERE { GRAPH ?head { ?assertion a np:Assertion . } GRAPH ?assertion { ?gda sio:SIO_000628 ?gene, ?disease . ?gene rdf:type ncit:C16612 . ?disease rdf:type ncit:C7057 . FILTER regex(?disease, "umls/id/C0002395") } } LIMIT 10

Query 1.2: Filtering By Evidence

# Second, we filter the prior results with those assertions annotated as CURATED DisGeNET evidence. This query involves the provenance graph. SELECT DISTINCT ?gene ?evidence FROM <http: //rdf.disgenet.org/nanopubs> WHERE { GRAPH ?head { ?assertion a np:Assertion . ?provenance a np:Provenance . } GRAPH ?assertion { ?gda sio:SIO_000628 ?gene, ?disease . ?gene rdf:type ncit:C16612 . ?disease rdf:type ncit:C7057 . FILTER regex(?disease, "umls/id/C0002395") } GRAPH ?provenance { ?assertion wi:evidence ?evidence . FILTER regex(?evidence, "curated") } } LIMIT 10

Query 1.3: Linking with Other LOD Resources

# Finally, we cross DisGeNET prior results with the Interaction Reference Index database data, which contains protein-protein interactions (PPI) annotations, through Bio2RDF::irefindex SPARQL endpoint, federating the query. Since in DisGeNET-RDF is also represented the relation between gene and the protein/s that encodes, we are able to cross DisGeNET with Bio2RDF::irefindex by Protein resources through the corresponding linkset to 'http://bio2rdf.org/uniprot:UniProtID'. PREFIX bio2rdf-ifx: <http: //bio2rdf.org/irefindex_vocabulary:> SELECT DISTINCT ?gene ?protein ?protein_dgn ?evidence ?ppi ?protein_irx WHERE { ?gene sio:SIO_010078 ?protein . ?protein skos:exactMatch ?protein_dgn . FILTER regex(?protein_dgn, "bio2rdf.org/uniprot:") GRAPH ?head { ?assertion a np:Assertion . ?provenance a np:Provenance . } GRAPH ?assertion { ?gda sio:SIO_000628 ?gene, ?disease . ?gene rdf:type ncit:C16612 . ?disease rdf:type ncit:C7057 . FILTER regex(?disease, "umls/id/C0002395") } GRAPH ?provenance { ?assertion wi:evidence ?evidence . FILTER regex(?evidence, "curated") } # Get the interactome data from Bio2RDF::irefindex SERVICE <http: //irefindex.bio2rdf.org/sparql> { OPTIONAL { ?ppi a bio2rdf-ifx:Pairwise-Interaction ; bio2rdf-ifx:interactor_a ?protein_dgn ; bio2rdf-ifx:interactor_b ?protein_irx . } } } LIMIT 100

Version History

DisGeNET-RDF v6.0

UPDATE December - 2019 - Protein URIs update.

  • Changed the Protein (ncit:C17021) URIs from <http://identifiers.org/uniprot/> to <http://purl.uniprot.org/uniprot/>
  • Removed the linkout via skos:exactMatch to <http://purl.uniprot.org/uniprot/> no longer needed as this URIs act as the main URI for the protein class.

UPDATE November - 2019 - Bugfix release.

  • Fixed a bug with the sequence ontology URIs for the variant_consequence type, not mapping correctly the subclasses.
    • Removed the sequence ontology URIs.
    • Added obo library URIs to map variant consequence types.
    • The PREFIX so:<http://www.sequenceontology.org/miso/current_svn/term/SO:> changed to so:<http://purl.obolibrary.org/obo/SO_>

RELEASE July - 2019 - The RDF distribution of DisGeNET includes new annotation and new linksets:

  • All linksets updated, and all ontologies updated.
  • Changed the data model for the variant attributes

DisGeNET-RDF v5.0

The RDF distribution of DisGeNET includes new annotation and new linksets:

  • All linksets updated, i.e. all ontologies updated.
  • Disease-phenotype annotation data have been integrated from 3 different sources:
    • Annotations from the Human Phenotype Ontology
    • Text-mined annotations from The Human Phenotype Ontology: Semantic Unification of Common and Rare Disease. Groza et al., 2015
    • Text-mined annotations from Analysis of the human diseasome using phenotype similarity between common, genetic, and infectious diseases. Hoehndorf et al., 2015.
  • RDF enhancement and data model changes

DisGeNET v4.0 RDF Release Information

The RDF distribution of DisGeNET includes all DisGeNET v4.0 new content, besides new annotation and new linksets:

  • More GDAs comprising more than 17,000 genes and 15,000 diseases as linked data in the Semantic Web.
  • All linksets updated, i.e. all ontologies updated.
  • Disease-phenotype annotation data from the Human Phenotype Ontology.
  • New linksets to the Experimental Factor Ontology (EFO).
  • New annotation: all diseases annotated to the original term(s) of provenance.
  • EFO is also deployed in our SPARQL endpoint such as the Human Disease Ontology, the Human Phenotype Ontology and ORDO, in order to perform queries walking the ontology hierarchy. See examples in the SPARQL section.
  • RDF enhancement, data model changes, and fixed bugs:
    • Updated RDF Schema to encompass new annotations.
    • Changed formal description of the property linking diseases to their phenotypic profile: sio:'is manifested as' (sio:SIO_000341) replaced by sio:'has phenotype' (sio:SIO_001279).
    • Fixed data typing language: language tags correctly added on RDF Literal data types.

DisGeNET v3.0 RDF Release Information

The RDF distribution of DisGeNET includes new annotation and new linksets:
  • More GDAs comprising 17 000 genes and more than 14 000 diseases as linked data in the Semantic Web.
  • All linksets updated, i.e. all ontologies updated.
  • New disease-phenotype annotation data from the Human Phenotype Ontology.
  • New linksets to NCI Thesaurus, Orphanet Rare Disease Ontologies (ORDO), and DECIPHER.
  • New taxonomic annotation: all GDAs annotated to the Homo sapiens (Human) taxon.
  • New full metadata description of the dataset compliant with the W3C HCLS and the Open PHACTS specifications (the Open PHACTS specifications specially used for linkset descriptions).
  • More mappings to the Linked Open Data cloud.
  • New and alternative LODEStar SPARQL access.
  • New types of searches: six ontologies are deployed in our SPARQL endpoint such as the Human Disease Ontology, the Human Phenotype Ontology and ORDO, in order to perform queries walking the ontology hierarchy. See an example in the SPARQL section.
  • RDF enhancement, data model changes, and fixed bugs:
    • New "303 URIs" for DisGeNET GDAs and PANTHER class entities.
    • New labels.
    • Primary source evidence better described with the Evidence Code Ontology and new properties.
    • New name descriptions: foaf:name predicate replaced by dcterms:title.
    • Fixed formal description of the DisGeNET Score: Score described as an object property, and not as a datatype property.
    • Fixed formal description of gene-disease association type 'label' from original source attribute: now described as a datatype property by a new predicate: sio:SIO_000255 replaced by sio:SIO_000300.

DisGeNET Nanopublication v5.0

The nanopublication distribution of DisGeNET includes all DisGeNET v5.0 gene-disease association statements along with its provenance, evidence, and attribution structured as nanopublications. Please, refer to the release notes and the RDF section in this page for more details.

DisGeNET v4.0 Nanopublication release:

The nanopublication distribution of DisGeNET includes all DisGeNET v4.0 gene-disease association statements along with its provenance, evidence, and attribution structured as nanopublications. Please, refer to the release notes and the RDF section in this page for more details.

DisGeNET v3.0 Nanopublication release:

The nanopublication distribution of DisGeNET includes all DisGeNET v3.0 gene-disease association statements along with its provenance, evidence, and attribution structured as nanopublications. Please, refer to the release notes and the RDF section in this page for more details.