DisGeNET-RDF - gene-disease associations for the Web of data

NEWS

DisGeNET v4.0 RDF is here!

Following the release of DisGeNET v4.0, its RDF distribution is now available!

Go to the DisGeNET-RDF v4.0.0 release information to see the new features.

RELEASE INFORMATION

DisGeNET v4.0 RDF Release Information

Following the release of DisGeNET v4.0, its RDF distribution is now available!

What's New, Release Notes and Bugs

* DisGeNET v4.0:
  • New text mined dataset from MEDLINE (BeFree source v4.0).
  • New data sources: Orphanet and NHGRI-EBI GWAS Catalog.
  • All data from authoritative sources updated.
  • Updated the DisGeNET association type ontology with new gene-disease association types.
  • New GDA annotation: publication year.
  • New annotations of genes to phenotypes.
  • New disease annotations:
    • Disease type: disease, phenotype and group
    • DO and HPO first level branches
  • New gene annotations: DisGeNET Disease Specificity Index (DSI) and DisGeNET Disease Pleiotropy Index (DPI).
  • Information on more than 45,000 SNPs associated with diseases.

* DisGeNET v4.0 RDF release:
The RDF distribution of DisGeNET includes all new DisGeNET v4.0 content, as well as new annotations and new linksets:
  • More GDAs, covering more than 17,000 genes and 15,000 diseases, as linked data in the Semantic Web.
  • All linksets updated, i.e. all ontologies updated.
  • Disease-phenotype annotation data from the Human Phenotype Ontology.
  • New linksets to the Experimental Factor Ontology (EFO).
  • New annotation: all diseases annotated to the original term(s) of provenance.
  • EFO is now deployed in our SPARQL endpoint, alongside the Human Disease Ontology, the Human Phenotype Ontology and ORDO, so that queries can walk the ontology hierarchy. See examples in the SPARQL section.
  • RDF enhancement, data model changes, and fixed bugs:
    • Updated RDF Schema to encompass new annotations.
    • Changed formal description of the property linking diseases to their phenotypic profile: sio:'is manifested as' (sio:SIO_000341) replaced by sio:'has phenotype' (sio:SIO_001279).
    • Fixed language typing: language tags are now correctly added to RDF literals.

* DisGeNET v4.0 Nanopublication release:
The nanopublication distribution of DisGeNET includes all DisGeNET v4.0 gene-disease association statements along with their provenance, evidence, and attribution, structured as nanopublications.

Please refer to the release notes and the RDF section on this page for more details.

RELEASE INFORMATION HISTORY

DisGeNET v3.0 RDF Release Information

What's New, Release Notes and Bugs

* DisGeNET v3.0:
  • All data sources were updated
  • New data source added: ClinVar
  • New text mined dataset from MEDLINE (BeFree source), which includes:
    • GDA association type classified (GeneticVariation, AlteredExpression, PostTranslationalModification)
    • SNPs identified
  • More information on SNPs: links to dbSNP, ENSEMBL, and ClinVar

* DisGeNET v3.0 RDF release:
The RDF distribution of DisGeNET includes all new DisGeNET v3.0 content, as well as new annotations and new linksets:
  • More GDAs, covering 17,000 genes and more than 14,000 diseases, as linked data in the Semantic Web.
  • All linksets updated, i.e. all ontologies updated.
  • New disease-phenotype annotation data from the Human Phenotype Ontology.
  • New linksets to NCI Thesaurus, Orphanet Rare Disease Ontologies (ORDO), and DECIPHER.
  • New taxonomic annotation: all GDAs annotated to the Homo sapiens (Human) taxon.
  • New full metadata description of the dataset, compliant with the W3C HCLS and the Open PHACTS specifications (the Open PHACTS specifications are used in particular for linkset descriptions).
  • More mappings to the Linked Open Data cloud.
  • New and alternative LODEStar SPARQL access.
  • New types of searches: six ontologies, such as the Human Disease Ontology, the Human Phenotype Ontology and ORDO, are deployed in our SPARQL endpoint so that queries can walk the ontology hierarchy. See an example in the SPARQL section.
  • RDF enhancement, data model changes, and fixed bugs:
    • New "303 URIs" for DisGeNET GDAs and PANTHER class entities.
    • New labels.
    • Primary source evidence better described with the Evidence Code Ontology and new properties.
    • New name descriptions: foaf:name predicate replaced by dcterms:title.
    • Fixed formal description of the DisGeNET Score: Score described as an object property, and not as a datatype property.
    • Fixed formal description of gene-disease association type 'label' from original source attribute: now described as a datatype property by a new predicate: sio:SIO_000255 replaced by sio:SIO_000300.

* DisGeNET v3.0 Nanopublication release:
The nanopublication distribution of DisGeNET includes all DisGeNET v3.0 gene-disease association statements along with their provenance, evidence, and attribution, structured as nanopublications.

Please refer to the release notes and the RDF section on this page for more details.

DisGeNET-RDF

The DisGeNET-RDF Linked Dataset is an alternative way to access the DisGeNET data and provides new opportunities for data integration and querying, and for combining DisGeNET data with other external RDF datasets. The RDF version of DisGeNET has been developed in the context of the Open PHACTS project to provide disease-relevant information to the knowledge base on pharmacological data. DisGeNET-RDF has been integrated into the Open PHACTS Discovery Platform alongside other resources such as ChEMBL, WikiPathways and neXtProt. To explore and query DisGeNET data across the linked data in the platform, APIs are currently available in the Open PHACTS API v1.5 (see the OPS API Web site for up-to-date information).

To perform faceted and precise searches, the DisGeNET-RDF linked data is accessible via a Faceted browser.

In addition, DisGeNET-RDF linked data can be queried via a SPARQL endpoint. An alternative way to interact with the DisGeNET-RDF data through SPARQL is the LODEStar interface, a SPARQL endpoint and linked data browser for querying and browsing RDF datasets developed at the EBI. Furthermore, some DisGeNET queries are available at Bioqueries. See section 4.3 SPARQL Endpoint and Example Queries for more details and query examples.

The RDF Linked Dataset is accompanied by a full dataset description, which is compliant with the W3C HCLS specification. For more information on the dataset description of the RDF Dataset, go to the 1.1 Metadata Description section.

To download the dump files, please go to the 4.1 Data Downloads section.

1. Linked Dataset Description

There are three main components in the RDF dataset: GDA content, a metadata description of the RDF dataset (VoID description), and linkouts to other Linked Datasets. The current RDF representation of DisGeNET (v4.0.0) has 30,506,021 triples serialized in Turtle syntax that annotate 429,036 gene-disease associations (GDAs), together with the 17,381 genes and 15,093 diseases involved in these associations. The RDF graph model is centered on the GDA concept: each GDA is described by the gene and disease involved, the type of association, and related information. In addition, the gene (identified by its Entrez or NCBI GeneID) and the disease (identified by its UMLS CUI) carry their own annotated attributes (see the Schema below). Entities and properties are semantically defined using standard ontologies such as the National Cancer Institute thesaurus (NCIt), and resources are identified by dereferenceable IRIs. GDAs are integrated using the DisGeNET Association Type Ontology and are semantically harmonized using SIO classes (see the DisGeNET ontology section below).

A full dataset description of the RDF Linked Dataset is provided using, among others, the Vocabulary of Interlinked Datasets (VoID), a W3C-recommended RDF Schema vocabulary for expressing metadata about RDF datasets. This dataset description, which is compliant with the W3C HCLS specification and the Open PHACTS specification, includes the provenance of the DisGeNET relational database, the primary databases, and the BeFree text mining tool (see the DisGeNET VoID file description). The type of curation and level of evidence of each original database are also tracked and annotated. Each data instance in DisGeNET is explicitly linked to this dataset description so that provenance can be traced back at the level of individual instances.

In addition, linkouts to the Linked Open Data (LOD) cloud are set both to enrich DisGeNET GDA annotations with external Semantic Web resources and to extend the current GDA content of the Web of knowledge. Specifically, the current version contains a total of 4,962,315 linksets to the LOD through Bio2RDF and Linked Life Data network projects, among others. All linked entities are related using the same SKOS predicate, skos:exactMatch. Other linkset statistics between entities can be found at the DisGeNET DataHub site in the DataHub registry. Consequently, DisGeNET appears in the latest update of the LOD cloud diagram (2014-08-30 update). This diagram shows datasets published in Linked Data format and is built from their metadata descriptions on the DataHub as well as from metadata extracted from a crawl of the Linked Data Web.

1.1 Metadata Description

The RDF Linked Dataset is accompanied by a full dataset description, which is compliant with the W3C HCLS specification. The full VoID description is available at DisGeNET_VoID.ttl.

2. RDF Schema

The data model of the RDF representation of DisGeNET is shown below.

GDAs are identified by "303 URIs", following the W3C recommendation for building URIs for the Semantic Web. Each GDA is defined by a unique combination of a gene (NCBI GeneID), a disease (UMLS CUI), an association type defined by our ontology (see section below), a data source of provenance, and a PubMed article (PMID) that provides evidence for the gene-disease association. Each GDA is assigned a unique identifier based on a Universally Unique Identifier (UUID) generated by a cryptographic hash function. The DisGeNET GDA ID is composed of 'DGN' + UUID, e.g. DGN7ab3d8cae0c9f1150cb65a985aa8c0a1. The new namespace is 'http://rdf.disgenet.org/resource/gda/'. The new GDA IRI pattern is namespace + DisGeNET ID,

e.g. 'http://rdf.disgenet.org/resource/gda/DGN7ab3d8cae0c9f1150cb65a985aa8c0a1'.
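The IRI pattern above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the check that the UUID part is 32 lowercase hexadecimal characters is an assumption inferred from the single example ID, not a documented rule.

```python
import re

# Namespace and 'DGN' prefix as stated in the text above.
GDA_NAMESPACE = "http://rdf.disgenet.org/resource/gda/"

# Assumption: the UUID part is 32 lowercase hex characters, as in the example ID.
GDA_ID_PATTERN = re.compile(r"^DGN[0-9a-f]{32}$")


def gda_iri(gda_id: str) -> str:
    """Return the GDA IRI for a DisGeNET GDA ID (namespace + ID)."""
    if not GDA_ID_PATTERN.match(gda_id):
        raise ValueError(f"not a DisGeNET GDA ID: {gda_id!r}")
    return GDA_NAMESPACE + gda_id


def gda_id_from_iri(iri: str) -> str:
    """Return the DisGeNET GDA ID contained in a GDA IRI."""
    if not iri.startswith(GDA_NAMESPACE):
        raise ValueError(f"not a DisGeNET GDA IRI: {iri!r}")
    return iri[len(GDA_NAMESPACE):]


example_iri = gda_iri("DGN7ab3d8cae0c9f1150cb65a985aa8c0a1")
print(example_iri)
```

The two helpers are inverses of each other, which is convenient when a SPARQL result returns GDA IRIs and only the identifiers are needed.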

For an example of triples related to a single gene-disease association in DisGeNET, see here.





3. The DisGeNET Association Type Ontology

The DisGeNET Association Type Ontology was developed in our group to fill the gap in formal semantics for defining the types of association described between a gene and a disease in biological databases. This ontology was generated using all the terms provided by the original GDA databases. It is an OWL ontology that can be accessed at GeneDiseaseAssociation.owl. The DisGeNET ontology is integrated into the Semanticscience Integrated Ontology (SIO), an OWL ontology that provides essential types and relations for the rich description of objects, processes and their attributes [PDF]. You can check the SIO gene-disease association classes from this URL or download the entire SIO OWL-DL ontology file. The SIO ontology can also be accessed at the NCBO BioPortal. DisGeNET GDAs in RDF are semantically harmonized using SIO classes.

Gene-Disease Ontology

4. Access to the RDF dataset

4.1 Data Downloads

The DisGeNET-RDF data dump and the VoID description file are accessible to download.

4.2 Faceted Browser

DisGeNET-RDF linked data can be navigated via a Faceted browser.

4.3 SPARQL Endpoint and Example Queries

4.3.1 SPARQL Endpoint

DisGeNET-RDF data are accessible using the SPARQL query language via our public SPARQL endpoint. The dataset is stored in a Virtuoso quad store, in which the name of the graph is 'http://rdf.disgenet.org'. The endpoint is powered by Virtuoso open-source v7.1.0.

An alternative way to interact with the DisGeNET-RDF data through SPARQL is the LODEStar interface at the DisGeNET LODEStar Endpoint, a SPARQL endpoint and linked data browser for querying and browsing RDF datasets developed at the EBI.
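For programmatic access, a query can be sent to a SPARQL endpoint as an HTTP GET request with the `query` and `default-graph-uri` parameters defined by the SPARQL 1.1 Protocol. The sketch below only builds the request URL. The endpoint address is a placeholder (an assumption, to be replaced with the address of the DisGeNET endpoint), the helper name is hypothetical, and the example query assumes the rdf, sio and ncit prefixes are preconfigured on the endpoint.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: placeholder address; replace with the DisGeNET SPARQL endpoint address.
ENDPOINT = "https://example.org/sparql"
# Graph name given in the text above.
GRAPH = "http://rdf.disgenet.org"

EXAMPLE_QUERY = """\
SELECT ?gda ?gene ?disease
WHERE {
  ?gda sio:SIO_000628 ?gene, ?disease .
  ?gene rdf:type ncit:C16612 .
  ?disease rdf:type ncit:C7057
}
LIMIT 5"""


def sparql_get_url(query: str, endpoint: str = ENDPOINT, graph: str = GRAPH) -> str:
    """Build a SPARQL 1.1 Protocol GET URL for `query` against `endpoint`."""
    params = {"default-graph-uri": graph, "query": query}
    return endpoint + "?" + urlencode(params)


url = sparql_get_url(EXAMPLE_QUERY)
# Decoding the query string again recovers the original parameters.
decoded = parse_qs(urlsplit(url).query)
print(url)
```

The resulting URL can be passed to any HTTP client; the response format depends on the Accept header that the client sends.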

4.3.2 Example Queries

DisGeNET GRAPH

The DisGeNET-RDF dataset is deployed in the graph: 'http://rdf.disgenet.org'.


DisGeNET NAMESPACES*

The namespaces required to query DisGeNET are:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX sio: <http://semanticscience.org/resource/>
PREFIX ncit: <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#>
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dctypes: <http://purl.org/dc/dcmitype/>
PREFIX wi: <http://purl.org/ontology/wi/core#>
PREFIX eco: <http://purl.obolibrary.org/obo/eco.owl#>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX pav: <http://purl.org/pav/>
PREFIX obo: <http://purl.obolibrary.org/obo/>

*Our SPARQL endpoint is configured with these prefixes, thus their definition is not required when executing queries from our endpoint.
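When the same queries are run from a client that does not have these prefixes preconfigured, the declarations have to be added to the query text. A minimal sketch of one way to do this follows. The helper name is hypothetical, only a subset of the prefixes listed above is included, and a PREFIX declaration is detected only when it starts a line.

```python
import re

# Subset of the namespaces listed above (extend as needed).
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "dcterms": "http://purl.org/dc/terms/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "void": "http://rdfs.org/ns/void#",
    "sio": "http://semanticscience.org/resource/",
    "ncit": "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#",
}


def with_prefixes(query: str, prefixes: dict = PREFIXES) -> str:
    """Prepend a PREFIX line for every known prefix the query does not declare."""
    # Prefix names already declared at the start of a line in the query.
    declared = set(re.findall(r"(?im)^\s*PREFIX\s+([\w-]*):", query))
    header = [
        f"PREFIX {name}: <{iri}>"
        for name, iri in prefixes.items()
        if name not in declared
    ]
    return "\n".join(header + [query])


bare_query = "SELECT ?gda WHERE { ?gda rdf:type sio:SIO_001120 } LIMIT 5"
full_query = with_prefixes(bare_query)
print(full_query)
```

Unused PREFIX declarations are harmless in SPARQL, so the helper adds every missing prefix rather than trying to detect which ones the query uses.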


RDF Entity Examples

In order to help the user to query DisGeNET RDF data, for each type of entity represented in DisGeNET we provide an example of its RDF annotation serialized in Turtle syntax, see here.


Access DisGeNET via ontology

New: To facilitate data retrieval, several ontologies are deployed in the quad store so that queries can walk the ontology hierarchies. The deployed ontologies are:

  • The Semanticscience Integrated Ontology (SIO),
  • the Human Disease Ontology (DO),
  • the Orphanet Rare Disease Ontology (ORDO),
  • the NCI thesaurus (NCIt),
  • the Human Phenotype Ontology (HPO),
  • the Experimental Factor Ontology (EFO),
  • the Evidence Code Ontology (ECO).
Please note the coverage of DisGeNET diseases by other disease terminologies, summarized in the following table.

             UMLS  MeSH  OMIM  NCIt  DO  ORDO  ICD9CM  EFO  HPO  DECIPHER
% Diseases   100   57    40    34    20  14    11      11   8    0.4


Federated Queries

The purpose of the DisGeNET linked dataset is to enable richer queries over the data. The DisGeNET SPARQL endpoint also supports federated queries over other linked datasets such as UniProt, Gene Expression Atlas (GXA), and WikiPathways. It supports the syntax and semantics of SPARQL 1.1 for executing queries distributed over different SPARQL endpoints. See the following section for some examples.



1. Exploring DisGeNET Data

Query 1.1: Retrieve all the gene-disease associations (GDAs) and their general description

# Get all the GDAs of type 'Therapeutic' (sio:SIO_001120) and their related annotation to general description.
SELECT ?gda ?label ?comment ?title ?id ?voidSubset
FROM <http://rdf.disgenet.org>
WHERE {
  ?gda rdf:type sio:SIO_001120 ;
       rdfs:label ?label ;
       rdfs:comment ?comment ;
       dcterms:title ?title ;
       dcterms:identifier ?id ;
       void:inDataset ?voidSubset
}
LIMIT 20


Query 1.2: Retrieve all the GDAs and their related gene and disease entities

# Get all the GDAs, associated gene and disease URIs based on the DisGeNET ID, NCBI GeneID, and UMLS CUI, respectively.
SELECT ?gda ?gene ?disease
FROM <http://rdf.disgenet.org>
WHERE {
  ?gda sio:SIO_000628 ?gene, ?disease .
  ?gene rdf:type ncit:C16612 .
  ?disease rdf:type ncit:C7057
}
LIMIT 20


Query 1.3: Retrieve all the supporting evidences for the association between Rett Syndrome and the MECP2 gene

# Give me all the supporting evidences in DisGeNET, for the association between the "Rett Syndrome" disease (umls:C0035372) and the MECP2 gene (ncbigene:4204).
SELECT DISTINCT ?gda
  <http://linkedlifedata.com/resource/umls/id/C0035372> as ?disease
  <http://identifiers.org/ncbigene/4204> as ?gene
  ?score ?source ?associationType ?pmid ?sentence
WHERE {
  ?gda sio:SIO_000628 <http://linkedlifedata.com/resource/umls/id/C0035372>, <http://identifiers.org/ncbigene/4204> ;
       rdf:type ?associationType ;
       sio:SIO_000216 ?scoreIRI ;
       sio:SIO_000253 ?source .
  ?scoreIRI sio:SIO_000300 ?score .
  OPTIONAL {
    ?gda sio:SIO_000772 ?pmid .
    ?gda dcterms:description ?sentence .
  }
}


Query 1.4: Retrieve all the GDAs from CURATED sources and with a score greater than or equal to 0.4

# Give me all the GDAs from CURATED sources (uniprot, ctd_human, clinvar) with a score greater than or equal to 0.4.
SELECT DISTINCT ?gda ?disease ?source ?score
WHERE {
  ?gda sio:SIO_000628 ?gene, ?disease ;
       sio:SIO_000253 ?source ;
       sio:SIO_000216 ?scoreIRI .
  ?scoreIRI sio:SIO_000300 ?score .
  FILTER regex(?source, "uniprot|ctd_human|clinvar")
  FILTER (?score > "0.4"^^xsd:decimal || ?score = "0.4"^^xsd:decimal)
}
ORDER BY DESC(?score)
LIMIT 100


Query 1.5: Retrieve all the diseases associated with transporters

# Give me all the diseases associated with proteins classified as 'transporter' according to the PANTHER classification.
SELECT DISTINCT ?disease ?diseaselabel ?diseasename
WHERE {
  ?panther rdfs:subClassOf sio:SIO_000275 ;
           dcterms:title ?panthername .
  FILTER regex(?panthername, "transporter")
  ?gene sio:SIO_000095 ?panther .
  ?gda sio:SIO_000628 ?gene, ?disease .
  FILTER regex(STR(?disease), "umls/id")
  ?disease dcterms:title ?diseasename .
  ?disease rdfs:label ?diseaselabel
}
LIMIT 100


Query 1.6: Retrieve the genes associated with Alzheimer disease

# For Alzheimer Disease, give me all the genes associated with the disease with a score greater than 0.29.
SELECT DISTINCT ?gene str(?geneName) as ?name ?score
WHERE {
  ?gda sio:SIO_000628 ?gene, ?disease ;
       sio:SIO_000216 ?scoreIRI .
  ?gene rdf:type ncit:C16612 ;
        dcterms:title ?geneName .
  ?disease rdf:type ncit:C7057 ;
           dcterms:title "Alzheimer's Disease"@en .
  ?scoreIRI sio:SIO_000300 ?score .
  FILTER (?score > "0.29"^^xsd:decimal)
}
ORDER BY DESC(?score)


# For Alzheimer Disease, give me all the genes associated with the disease with a score greater than 0.29, and gene-related annotation such as the protein they encode, and the REACTOME pathway in which they are known to be involved.
SELECT DISTINCT ?gene str(?geneName) as ?name ?score ?protein ?proteinlinkout str(?pathwayname) as ?pathwayname
WHERE {
  ?gda sio:SIO_000628 ?gene, ?disease ;
       sio:SIO_000216 ?scoreIRI .
  ?gene rdf:type ncit:C16612 ;
        dcterms:title ?geneName ;
        sio:SIO_010078 ?protein ;
        sio:SIO_000062 ?pathway .
  ?disease rdf:type ncit:C7057 ;
           dcterms:title "Alzheimer's Disease"@en .
  ?scoreIRI sio:SIO_000300 ?score .
  ?protein skos:exactMatch ?proteinlinkout .
  ?pathway dcterms:title ?pathwayname .
  FILTER (?score > "0.29"^^xsd:decimal)
}
ORDER BY DESC(?score)


Query 1.7: Retrieve all the GDAs classified according to the association types of the DisGeNET ontology

# Give me all GDAs in DisGeNET and the type of relationship between genes and diseases.
SELECT DISTINCT ?gda ?type ?label
WHERE {
  ?gda rdf:type ?type .
  ?type rdfs:subClassOf+ sio:SIO_000983 .
  ?type rdfs:label ?label
}
LIMIT 50


Query 1.8: Retrieve the diseasome

# Give me all the associations between diseases (diseasome) based on shared genes.
SELECT DISTINCT ?disease ?diseaseName ?gene ?disease2 ?diseaseName2
WHERE {
  ?gda sio:SIO_000628 ?disease, ?gene .
  ?gda2 sio:SIO_000628 ?disease2, ?gene .
  ?disease dcterms:title ?diseaseName .
  ?disease2 dcterms:title ?diseaseName2 .
  FILTER regex(?gene, "ncbigene")
  FILTER regex(?disease, "umls/id")
  FILTER regex(?disease2, "umls/id")
  FILTER (?disease != ?disease2)
  FILTER (?gda != ?gda2)
}
LIMIT 50


Query 1.9: Retrieve the gene-gene network

# Give me all the associations between genes based on shared diseases.
SELECT DISTINCT ?gene ?geneName ?disease ?gene2 ?geneName2
WHERE {
  ?gda sio:SIO_000628 ?disease, ?gene .
  ?gda2 sio:SIO_000628 ?disease, ?gene2 .
  ?gene dcterms:title ?geneName .
  ?gene2 dcterms:title ?geneName2 .
  FILTER regex(?disease, "umls/id")
  FILTER regex(?gene, "ncbigene")
  FILTER regex(?gene2, "ncbigene")
  FILTER (?gene != ?gene2)
  FILTER (?gda != ?gda2)
}
LIMIT 50


Query 1.10: Retrieve the number of phenotypes annotated for each rare disease

# Give me the number of phenotypes from HPO annotated for each rare disease from Orphanet.
SELECT ?orphanet ?orphanetName count(DISTINCT ?phenotype) as ?phenotypes
WHERE {
  ?orphanet sio:SIO_001279 ?phenotype ;
            dcterms:title ?orphanetName .
  FILTER regex(?orphanet, "identifiers.org/orphanet")
}
GROUP BY ?orphanet ?orphanetName
ORDER BY DESC(?phenotypes)
LIMIT 50


Query 1.11: Retrieve all diseases in DisGeNET classified as 'Ovarian cancer'

# Give me all diseases in DisGeNET that belong to 'Ovarian cancer' class in the Human Disease Ontology (DOID:2394).
SELECT DISTINCT ?umls ?umlsTerm ?doid ?doTerm
WHERE {
  ?gda sio:SIO_000628 ?umls .
  ?umls dcterms:title ?umlsTerm ;
        skos:exactMatch ?doid .
  ?doid rdfs:label ?doTerm ;
        rdfs:subClassOf+ <http://purl.obolibrary.org/obo/DOID_2394> .
  FILTER regex(?umls, "umls/id")
}
LIMIT 20


Query 1.12: Retrieve the number of genes and the number of phenotypes of the disease 'Nodular lymphocyte predominant Hodgkin lymphoma'

# Give me the number of genes and the number of phenotypes associated with 'Nodular lymphocyte predominant Hodgkin lymphoma' (Orphanet_86893).
SELECT count(DISTINCT ?gene) as ?genes ?disease ?diseaseName
  <http://identifiers.org/orphanet/86893> as ?orphanet
  count(DISTINCT ?phenotype) as ?phenotypes
WHERE {
  ?gda sio:SIO_000628 ?disease, ?gene .
  ?disease dcterms:title ?diseaseName ;
           skos:exactMatch <http://identifiers.org/orphanet/86893> .
  <http://identifiers.org/orphanet/86893> dcterms:title ?orphanetName ;
                                          sio:SIO_001279 ?phenotype .
  ?phenotype dcterms:title ?phenotypeName
  FILTER regex(?gene, 'ncbigene')
  FILTER regex(?disease, 'umls/id')
}
GROUP BY ?disease ?diseaseName




2. Federated Queries (FED): integrating DisGeNET data with other Linked Datasets in the LOD cloud.

Under Construction!

FED1: DisGeNET + WikiPathways (queries made in collaboration with the WikiPathways RDF team. Thanks!!!)

NAMESPACE

PREFIX wp: <http://vocabularies.wikipathways.org/wp#>

Query 2.1.1: Retrieve the genes and pathways associated with 'Marfan Syndrome'

# Give me all disease genes for 'Marfan Syndrome' (MeSH:D008382 or OMIM:601665) in DisGeNET and the pathways for these genes from WikiPathways. Output the disease name, NCBI Gene ID, HGNC gene name, gene label, WikiPathways pathway ID and name.
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
SELECT DISTINCT str(?DiseaseName) as ?DiseaseName ?gene str(?GeneName) as ?GeneTitle ?PathwayID str(?PathwayName) as ?PathwayName
WHERE {
  # Query DisGeNET for disease-genes
  ?disease skos:exactMatch <http://id.nlm.nih.gov/mesh/D008382> .
  # alternatively, searching by MIM term:
  # ?disease skos:exactMatch <http://bio2rdf.org/omim:601665>
  ?gda sio:SIO_000628 ?gene, ?disease .
  ?gene rdf:type ncit:C16612 ;
        dcterms:title ?GeneName .
  ?disease rdf:type ncit:C7057 ;
           dcterms:title ?DiseaseName .
  # Query WikiPathways for gene-pathways
  SERVICE <http://sparql.wikipathways.org/> {
    ?geneProduct a wp:GeneProduct ;
                 dc:identifier ?gene ;
                 rdfs:label ?GeneLabel ;
                 dcterms:isPartOf ?pathway .
    ?pathway dc:identifier ?PathwayID ;
             dc:title ?PathwayName
  }
}
ORDER BY DESC(?GeneName)


Query 2.1.2: Retrieve the pathways associated with 'Pulmonary Emphysema'

# Give me all pathways in WikiPathways for CURATED disease genes associated with 'Pulmonary Emphysema' (MeSH:D011656) in DisGeNET. Output the disease name, WikiPathways pathway ID and name.
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
SELECT DISTINCT str(?DiseaseName) as ?DiseaseName ?PathwayID str(?PathwayName) as ?PathwayName
WHERE {
  # Query DisGeNET for disease-genes
  ?gda sio:SIO_000628 ?gene, ?disease ;
       sio:SIO_000253 ?source .
  ?disease rdf:type ncit:C7057 ;
           skos:exactMatch <http://id.nlm.nih.gov/mesh/D011656> ;
           dcterms:title ?DiseaseName .
  ?gene rdf:type ncit:C16612 ;
        dcterms:title ?GeneName .
  FILTER regex(?source, "uniprot|ctd_human|clinvar")
  # Query WikiPathways for gene-pathways
  SERVICE <http://sparql.wikipathways.org/> {
    ?geneProduct a wp:GeneProduct ;
                 dc:identifier ?gene ;
                 rdfs:label ?GeneLabel ;
                 dcterms:isPartOf ?pathway .
    ?pathway dc:identifier ?PathwayID ;
             dc:title ?PathwayName .
  }
}
ORDER BY DESC(?PathwayName)


Query 2.1.3: Retrieve the pathways associated with 'Schizophrenia', and show the number of Schizophrenia genes in each pathway

# Give me all pathways in WikiPathways and the total number of disease genes in each pathway for 'Schizophrenia' (MeSH:D012559). We will consider associations from CURATED sources with DisGeNET score greater than 0.35. Output the disease name, WikiPathways pathway ID, pathway name, and the number of disease genes.
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
SELECT DISTINCT str(?DiseaseName) as ?DiseaseName ?PathwayID str(?PathwayName) as ?PathwayName count(DISTINCT ?gene) AS ?genes
WHERE {
  # DisGeNET: get disease-genes
  ?gda sio:SIO_000628 ?gene, ?disease ;
       sio:SIO_000216 ?scoreIRI .
  ?gene rdf:type ncit:C16612 .
  ?disease rdf:type ncit:C7057 ;
           skos:exactMatch <http://id.nlm.nih.gov/mesh/D012559> ;
           dcterms:title ?DiseaseName .
  ?scoreIRI sio:SIO_000300 ?score .
  FILTER (?score > "0.35"^^xsd:decimal)
  # WikiPathways: get gene-pathways
  SERVICE <http://sparql.wikipathways.org/> {
    ?geneProduct a wp:GeneProduct ;
                 dc:identifier ?gene ;
                 dcterms:isPartOf ?pathway .
    ?pathway dc:identifier ?PathwayID ;
             dc:title ?PathwayName .
  } # end of service
} # end of query
ORDER BY DESC(?genes) DESC(?PathwayName)
LIMIT 100


Query 2.1.4: Retrieve the genes and pathways associated with 'Diabetes Mellitus, Type 2'

# Give me all disease genes for 'Diabetes Mellitus, Type 2' (MeSH:D003924) with DisGeNET score greater than 0.35, that are involved in pathways and the number of pathways in WikiPathways in which each gene is involved. Output the disease name, gene URI, gene name, and the number of pathways. Please, be aware that this query takes some time due to the amount of data crossed.
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
SELECT DISTINCT ?diseaseName ?gene ?geneName ?nPathways
WHERE {
  ?disease skos:exactMatch <http://id.nlm.nih.gov/mesh/D003924> ;
           rdf:type ncit:C7057 ;
           dcterms:title ?diseaseName .
  ?gda sio:SIO_000628 ?gene, ?disease ;
       sio:SIO_000216 ?scoreIRI .
  ?gene rdf:type ncit:C16612 ;
        dcterms:title ?geneName .
  ?scoreIRI sio:SIO_000300 ?score .
  {
    SELECT ?gene ?nPathways
    WHERE {
      # WikiPathways: get gene-pathways
      SERVICE <http://sparql.wikipathways.org/> {
        SELECT ?gene COUNT(DISTINCT ?pathway) as ?nPathways
        WHERE {
          ?geneProduct a wp:GeneProduct ;
                       dc:identifier ?gene ;
                       dcterms:isPartOf ?pathway .
          ?pathway dc:identifier ?pathwayid
        }
        GROUP BY ?gene
      } # end of service
    } # end of where
  } # end of subquery
  FILTER (?score > "0.35"^^xsd:decimal)
} # end of query
LIMIT 100


Query 2.1.5: Retrieve the number of genes for 'Bardet-Biedl Syndrome' disease and indicate the number of genes present in pathways

# For 'Bardet-Biedl Syndrome' disease (MeSH:D020788), give me the total number of associated genes in DisGeNET and the total number of these genes in WikiPathways. Output the disease name, the total number of disease genes in Wikipathways and the total number of disease genes in DisGeNET.
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
SELECT DISTINCT ?DiseaseName count(distinct ?gene) as ?GeneInPathway ?TotalGene
WHERE {
  SELECT *
  WHERE {
    # Total # of DisGeNET genes in WikiPathways
    ?gda sio:SIO_000628 ?gene, ?disease .
    ?disease rdf:type ncit:C7057 ;
             skos:exactMatch <http://id.nlm.nih.gov/mesh/D020788> ;
             dcterms:title ?DiseaseName .
    ?gene rdf:type ncit:C16612 .
    # WikiPathways: get gene-pathways
    SERVICE <http://sparql.wikipathways.org/> {
      SELECT *
      WHERE {
        ?geneProduct a wp:GeneProduct ;
                     dc:identifier ?gene ;
                     dcterms:isPartOf ?pathway
      }
    }
    # Total # genes in DisGeNET
    {
      SELECT DISTINCT ?DiseaseName count(distinct ?gene2) as ?TotalGene
      WHERE {
        ?gda sio:SIO_000628 ?gene2, ?disease .
        ?disease rdf:type ncit:C7057 ;
                 skos:exactMatch <http://id.nlm.nih.gov/mesh/D020788> ;
                 dcterms:title ?DiseaseName .
        ?gene2 rdf:type ncit:C16612
      }
    }
  }
}


Query 2.1.6: Retrieve the genes and the pathways associated with 'Bardet-Biedl Syndrome' disease. In addition, list all the genes involved in each of the pathways found

# For 'Bardet-Biedl Syndrome' disease (MeSH:D020788), retrieve from DisGeNET the genes, the pathway(s) in which the gene is involved from WikiPathways, and all the genes present in each of these pathways. Output the disease name, gene in DisGeNET, pathway ID, and gene in Wikipathways.
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
SELECT DISTINCT ?DiseaseName ?gene ?PathwayID ?allGeneInPw
WHERE {
  # DisGeNET: get disease-genes
  ?disease rdf:type ncit:C7057 ;
           skos:exactMatch <http://id.nlm.nih.gov/mesh/D020788> ;
           dcterms:title ?DiseaseName .
  ?gda sio:SIO_000628 ?gene, ?disease .
  ?gene rdf:type ncit:C16612 .
  # WikiPathways: get gene-pathways
  SERVICE <http://sparql.wikipathways.org/> {
    ?geneProduct a wp:GeneProduct ;
                 dc:identifier ?gene ;
                 dcterms:isPartOf ?pathway .
    ?pathway dc:identifier ?PathwayID ;
             dc:title ?PathwayName .
    {
      SELECT ?PathwayID ?allGeneInPw
      WHERE {
        ?geneProduct a wp:GeneProduct ;
                     dc:identifier ?allGeneInPw ;
                     dcterms:isPartOf ?pathway .
        ?pathway dc:identifier ?PathwayID ;
                 dc:title ?PathwayName .
        FILTER regex(str(?allGeneInPw), "ncbigene")
      } # end of select
    } # end of subquery
  } # end of service
} # end of query
ORDER BY DESC(?gene) DESC(?PathwayID) DESC(?allGeneInPw)


Query 2.1.7: Retrieve the total number of both disease genes and all genes involved in each pathway for the 'Bardet-Biedl Syndrome'

# For 'Bardet-Biedl Syndrome' disease (MeSH:D020788), retrieve from DisGeNET the number of disease genes in a pathway in WikiPathways, the pathway ID, and the number of all genes in each of these pathways. Output the disease name, DisGeNET genes in the pathway, pathway ID, and all genes in the Wikipathways pathway.
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
SELECT DISTINCT ?DiseaseName count(distinct ?gene) as ?diseasegenesinthepathway ?PathwayID count(distinct ?allGeneInPw) as ?allgenesinthepathway
WHERE {
  # DisGeNET: get disease-genes
  ?disease rdf:type ncit:C7057 ;
           skos:exactMatch <http://id.nlm.nih.gov/mesh/D020788> ;
           dcterms:title ?DiseaseName .
  ?gda sio:SIO_000628 ?gene, ?disease .
  ?gene rdf:type ncit:C16612 .
  # WikiPathways: get gene-pathways
  SERVICE <http://sparql.wikipathways.org/> {
    ?geneProduct a wp:GeneProduct ;
                 dc:identifier ?gene ;
                 dcterms:isPartOf ?pathway .
    ?pathway dc:identifier ?PathwayID ;
             dc:title ?PathwayName .
    {
      SELECT ?PathwayID ?allGeneInPw
      WHERE {
        ?geneProduct a wp:GeneProduct ;
                     dc:identifier ?allGeneInPw ;
                     dcterms:isPartOf ?pathway .
        ?pathway dc:identifier ?PathwayID ;
                 dc:title ?PathwayName .
        FILTER regex(str(?allGeneInPw), "ncbigene")
      } # end of select
    } # end of subquery
  } # end of service
} # end of query
ORDER BY DESC(?diseasegenesinthepathway) DESC(?allgenesinthepathway)


Query 2.1.8: Retrieve the total number of both disease genes and all genes involved in each disease pathway and secondary pathways for 'Bardet-Biedl Syndrome'

# For 'Bardet-Biedl Syndrome' disease (MeSH:D020788), retrieve from DisGeNET the number of disease genes in a pathway in WikiPathways, the pathway ID or let's call it disease pathway, the number of all genes shared between the disease pathway and a secondary pathway, and the secondary pathway ID. Output the disease name, DisGeNET genes in the pathway, disease pathway ID, and all genes in the disease pathway shared with another pathway, and the secondary pathway ID.
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
SELECT DISTINCT ?DiseaseName count(distinct ?gene) as ?diseasegenesinthepathway ?PathwayID count(distinct ?allGeneInPw) as ?allgenesinthepathway ?PathwayID2
WHERE {
  # DisGeNET: get disease-genes
  ?disease rdf:type ncit:C7057 ;
           skos:exactMatch <http://id.nlm.nih.gov/mesh/D020788> ;
           dcterms:title ?DiseaseName .
  ?gda sio:SIO_000628 ?gene, ?disease .
  ?gene rdf:type ncit:C16612 .
  # WikiPathways: get gene-pathways
  SERVICE <http://sparql.wikipathways.org/> {
    ?geneProduct a wp:GeneProduct ;
                 dc:identifier ?gene ;
                 dcterms:isPartOf ?pathway .
    ?pathway dc:identifier ?PathwayID ;
             dc:title ?PathwayName .
    {
      SELECT ?PathwayID ?allGeneInPw
      WHERE {
        ?geneProduct a wp:GeneProduct ;
                     dc:identifier ?allGeneInPw ;
                     dcterms:isPartOf ?pathway .
        ?pathway dc:identifier ?PathwayID ;
                 dc:title ?PathwayName .
        FILTER regex(str(?allGeneInPw), "ncbigene")
      }
    }
    {
      SELECT ?allGeneInPw ?PathwayID2
      WHERE {
        ?geneProduct a wp:GeneProduct ;
                     dc:identifier ?allGeneInPw ;
                     dcterms:isPartOf ?pathway2 .
        ?pathway2 dc:identifier ?PathwayID2 ;
                  dc:title ?PathwayName2 .
        FILTER regex(str(?allGeneInPw), "ncbigene")
      } # end of select
    } # end of subquery
  } # end of service
} # end of query
ORDER BY DESC(?diseasegenesinthepathway) DESC(?allgenesinthepathway)


Query 2.1.9: Retrieve the number of disease genes, the number of all genes, and the number of secondary pathways in each disease pathway for 'Bardet-Biedl Syndrome'

# For 'Bardet-Biedl Syndrome' disease (MeSH:D020788), retrieve from DisGeNET the number of disease genes in a pathway in WikiPathways, the pathway ID (the 'disease pathway'), the number of all genes in the disease pathway, and the number of secondary pathways annotated to genes in each disease pathway.
# Output the disease name, DisGeNET genes in the pathway, disease pathway ID, the number of all genes in the disease pathway, and the number of secondary pathways.
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
SELECT DISTINCT ?DiseaseName
  count(distinct ?gene) as ?diseasegenesinthepathway
  ?PathwayID
  count(distinct ?allGeneInPw) as ?allgenesinthepathway
  count(distinct ?PathwayID2) as ?totalPathways
WHERE {
  # DisGeNET: get disease-genes
  ?disease rdf:type ncit:C7057 ;
    skos:exactMatch <http://id.nlm.nih.gov/mesh/D020788> ;
    dcterms:title ?DiseaseName .
  ?gda sio:SIO_000628 ?gene,?disease .
  ?gene rdf:type ncit:C16612 .
  # WikiPathways: get gene-pathways
  SERVICE <http://sparql.wikipathways.org/> {
    ?geneProduct a wp:GeneProduct ;
      dc:identifier ?gene ;
      dcterms:isPartOf ?pathway .
    ?pathway dc:identifier ?PathwayID ;
      dc:title ?PathwayName .
    {
      SELECT ?PathwayID ?allGeneInPw
      WHERE {
        ?geneProduct a wp:GeneProduct ;
          dc:identifier ?allGeneInPw ;
          dcterms:isPartOf ?pathway .
        ?pathway dc:identifier ?PathwayID ;
          dc:title ?PathwayName .
        FILTER regex(str(?allGeneInPw), "ncbigene")
      }
    }
    {
      SELECT ?allGeneInPw ?PathwayID2
      WHERE {
        ?geneProduct a wp:GeneProduct ;
          dc:identifier ?allGeneInPw ;
          dcterms:isPartOf ?pathway2 .
        ?pathway2 dc:identifier ?PathwayID2 ;
          dc:title ?PathwayName2 .
        FILTER regex(str(?allGeneInPw), "ncbigene")
      } # end of select
    } # end of subquery
  } # end of service
} # end of query
ORDER BY DESC(?diseasegenesinthepathway) DESC(?allgenesinthepathway)


Query 2.1.10: Retrieve the total number of phenotypes, genes, and pathways for 'Nodular Lymphocyte Predominant Hodgkin Lymphoma'

# For 'Nodular Lymphocyte Predominant Hodgkin Lymphoma' disease (Orpha:86893), give me the number of associated phenotypes and disease genes in DisGeNET, and pathways for these disease genes in WikiPathways.
# Output the disease URI, disease name, number of phenotypes, genes, and pathways.
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
SELECT DISTINCT ?disease ?diseaseName
  count(distinct ?phenotype) as ?phenotypes
  count(distinct ?gene) as ?genes
  count(distinct ?pathwayid) as ?pathways
WHERE {
  ?gda sio:SIO_000628 ?disease,?gene .
  ?gene rdf:type ncit:C16612 .
  ?disease rdf:type ncit:C7057 ;
    skos:exactMatch <http://identifiers.org/orphanet/86893> ;
    dcterms:title ?diseaseName .
  <http://identifiers.org/orphanet/86893> sio:SIO_001279 ?phenotype .
  OPTIONAL {
    {
      SELECT DISTINCT ?gene ?pathwayid
      WHERE {
        ?disease rdf:type ncit:C7057 ;
          skos:exactMatch <http://identifiers.org/orphanet/86893> .
        ?gda sio:SIO_000628 ?disease,?gene .
        ?gene rdf:type ncit:C16612 .
        SERVICE <http://sparql.wikipathways.org/> {
          ?geneProduct a wp:GeneProduct ;
            dc:identifier ?gene ;
            dcterms:isPartOf ?pathway .
          ?pathway dcterms:identifier ?pathwayid .
        } # end of service
      } # end of service query
    } # end of subquery
  } # end of optional
} # end of query
LIMIT 100


Query 2.1.11: Retrieve the phenotypes, genes, and pathways for 'Nodular Lymphocyte Predominant Hodgkin Lymphoma'

# For 'Nodular Lymphocyte Predominant Hodgkin Lymphoma' disease (Orpha:86893), give me the associated phenotypes and disease genes in DisGeNET, and pathways for these disease genes in WikiPathways.
# Output the disease URI, disease name, phenotype name, gene name, and pathway name.
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
SELECT DISTINCT ?disease ?diseaseName
  str(?phenotypeName) as ?phenotypeName
  str(?geneName) as ?geneName
  str(?pathwayName) as ?pathwayName
WHERE {
  ?gda sio:SIO_000628 ?disease,?gene .
  ?gene rdf:type ncit:C16612 ;
    dcterms:title ?geneName .
  ?disease rdf:type ncit:C7057 ;
    skos:exactMatch <http://identifiers.org/orphanet/86893> ;
    dcterms:title ?diseaseName .
  <http://identifiers.org/orphanet/86893> sio:SIO_001279 ?phenotype .
  ?phenotype dcterms:title ?phenotypeName .
  OPTIONAL {
    {
      SELECT DISTINCT ?gene ?pathwayName
      WHERE {
        ?gda sio:SIO_000628 ?disease,?gene .
        ?disease rdf:type ncit:C7057 ;
          skos:exactMatch <http://identifiers.org/orphanet/86893> .
        ?gene rdf:type ncit:C16612 .
        SERVICE <http://sparql.wikipathways.org/> {
          ?geneProduct a wp:GeneProduct ;
            dc:identifier ?gene ;
            dcterms:isPartOf ?pathway .
          ?pathway dcterms:identifier ?pathwayid ;
            dc:title ?pathwayName .
        } # end of service
      } # end of service query
    } # end of subquery
  } # end of optional
} # end of query
ORDER BY DESC(?phenotypeName) DESC(?geneName) DESC(?pathwayName)
LIMIT 100


Query 2.1.12: Retrieve the pathways associated with 'Lafora Disease'

# For 'Lafora Disease' (MeSH:D020192), give me the associated genes from LITERATURE sources in DisGeNET with a score less than or equal to 0.2, and the pathways annotated to these disease genes in WikiPathways.
# Output the disease name, the gene URI, the score, the number of publications, the pathway URI, and the pathway name.
PREFIX wp: <http://vocabularies.wikipathways.org/wp#>
SELECT DISTINCT str(?DiseaseName) as ?DiseaseName
  ?gene
  ?score
  count(distinct ?publication) as ?numberOfPublications
  ?PathwayID
  str(?PathwayName) as ?PathwayName
WHERE {
  # Query DisGeNET for disease-genes
  ?gda sio:SIO_000628 ?gene,?disease ;
    sio:SIO_000772 ?publication ;
    sio:SIO_000253 ?source ;
    sio:SIO_000216 ?scoreIRI .
  ?disease rdf:type ncit:C7057 ;
    dcterms:title ?DiseaseName ;
    skos:exactMatch <http://id.nlm.nih.gov/mesh/D020192> .
  ?gene rdf:type ncit:C16612 ;
    dcterms:title ?GeneName .
  ?source wi:evidence ?evidence .
  ?evidence rdfs:label ?label .
  ?scoreIRI sio:SIO_000300 ?score .
  FILTER (?score < "0.2"^^xsd:decimal || ?score = "0.2"^^xsd:decimal)
  FILTER regex(?label, "literature", "i")
  # Query WikiPathways for gene-pathways
  SERVICE <http://sparql.wikipathways.org/> {
    ?geneProduct a wp:GeneProduct ;
      dc:identifier ?gene ;
      rdfs:label ?GeneLabel ;
      dcterms:isPartOf ?pathway .
    ?pathway dc:identifier ?PathwayID ;
      dc:title ?PathwayName
  }
}
ORDER BY DESC(?score) DESC(?numberOfPublications) DESC(?PathwayName)
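The score filter in the query above combines "less than" and "equal to", which amounts to "score at most 0.2". The same threshold can be applied client-side to results already fetched in the SPARQL 1.1 JSON results layout. The sketch below assumes that layout; the rows are made-up placeholders (not real DisGeNET output), and the helper names `rows` and `at_most` are hypothetical.

```python
from decimal import Decimal

# Made-up placeholder rows in the SPARQL 1.1 JSON results layout.
# They are NOT real DisGeNET output.
sample = {
    "head": {"vars": ["gene", "score"]},
    "results": {"bindings": [
        {"gene": {"type": "uri", "value": "http://example.org/gene/A"},
         "score": {"type": "literal", "value": "0.2"}},
        {"gene": {"type": "uri", "value": "http://example.org/gene/B"},
         "score": {"type": "literal", "value": "0.35"}},
        {"gene": {"type": "uri", "value": "http://example.org/gene/C"},
         "score": {"type": "literal", "value": "0.08"}},
    ]},
}


def rows(results):
    """Flatten result bindings into plain {variable: value} dictionaries."""
    return [{var: cell["value"] for var, cell in binding.items()}
            for binding in results["results"]["bindings"]]


def at_most(results, var, threshold):
    """Keep rows whose `var` is <= threshold (same as `< t || = t`)."""
    limit = Decimal(threshold)
    return [row for row in rows(results) if Decimal(row[var]) <= limit]


selected = at_most(sample, "score", "0.2")
print([row["gene"] for row in selected])
```

`Decimal` is used instead of `float` to mirror the `xsd:decimal` comparison in the query, so a score of exactly 0.2 is kept.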




EBI RDF Source

FED2: DisGeNET + Gene Expression Atlas (GXA)

NAMESPACE

PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX efo: <http://www.ebi.ac.uk/efo/>
PREFIX atlas: <http://rdf.ebi.ac.uk/resource/atlas/>
PREFIX atlasterms: <http://rdf.ebi.ac.uk/terms/atlas/>


Query 2.2.1: Retrieve the disease genes with differential gene expression

# Give me all genes in DisGeNET that have differential gene expression data in the Gene Expression Atlas (GXA) database.
# Output the gene, expression value, p-value, and EFO factor associated.
PREFIX atlasterms: <http://rdf.ebi.ac.uk/terms/atlas/>
SELECT DISTINCT ?gene ?expressionValue ?pvalue ?propertyValue
WHERE {
  ?gene sio:SIO_010078 ?protein .
  ?protein skos:exactMatch ?uniprot .
  FILTER regex(?uniprot, "http://purl.uniprot.org/uniprot/")
  # Query GXA for differential expression values
  {
    SELECT ?uniprot ?expressionValue ?pvalue ?propertyValue
    WHERE {
      SERVICE <http://www.ebi.ac.uk/rdf/services/atlas/sparql> {
        ?probe atlasterms:dbXref ?uniprot .
        ?uniprot rdf:type atlasterms:UniprotDatabaseReference .
        ?value atlasterms:isMeasurementOf ?probe ;
          atlasterms:pValue ?pvalue ;
          rdfs:label ?expressionValue ;
          atlasterms:hasFactorValue ?factor .
        ?factor a <http://www.ebi.ac.uk/efo/EFO_0001073> ; # obesity
          atlasterms:propertyValue ?propertyValue ;
          atlasterms:propertyType ?propertyType
        #filter regex(?propertyType, "organism_part|disease")
      } # end of service
    } # end of select
  } # end of subquery
} # end of fed
ORDER BY ASC(?pvalue)
LIMIT 10


Query 2.2.2: Retrieve genes with differential gene expression for the 'Obesity' disease

# Give me all genes from DisGeNET associated with 'Obesity' from CURATED sources that have differential gene expression data in the Gene Expression Atlas (GXA) database.
# Output the gene, expression value, p-value, and tissue/organism part.
PREFIX atlasterms: <http://rdf.ebi.ac.uk/terms/atlas/>
SELECT DISTINCT ?gene
  str(?expressionValue) as ?expressionValue
  ?pvalue
  str(?propertyValue) as ?propertyValue
WHERE {
  ?gda sio:SIO_000628 ?gene, <http://linkedlifedata.com/resource/umls/id/C0028754> ;
    sio:SIO_000253 ?source .
  ?gene rdf:type ncit:C16612 ;
    sio:SIO_010078 ?protein .
  ?protein skos:exactMatch ?uniprot .
  FILTER regex(?source, "uniprot|ctd_human|clinvar")
  FILTER regex(?uniprot, "^http://purl.uniprot.org/uniprot/")
  {
    # Get the expression for the genes differentially expressed in GXA
    SERVICE <http://www.ebi.ac.uk/rdf/services/atlas/sparql> {
      ?probe atlasterms:dbXref ?uniprot .
      ?uniprot rdf:type atlasterms:UniprotDatabaseReference .
      ?value atlasterms:isMeasurementOf ?probe ;
        atlasterms:pValue ?pvalue ;
        rdfs:label ?expressionValue ;
        atlasterms:hasFactorValue ?factor .
      ?factor a <http://www.ebi.ac.uk/efo/EFO_0001073> ; # obesity
        atlasterms:propertyType ?propertyType ;
        atlasterms:propertyValue ?propertyValue .
      FILTER regex(?propertyType, "organism_part|disease")
    } # end of service
  } # end of subquery
} # end of query
ORDER BY ASC(?pvalue)
LIMIT 10


Query 2.2.3: Retrieve all the diseases associated with genes differentially expressed in 'Pancreatic Cancer'

# For genes differentially expressed in 'Pancreatic Cancer' (efo:EFO_0002618), give me the associated diseases in DisGeNET with a score greater than or equal to 0.1.
# Output the disease name, the gene name, the score for GDA, the expression value, the p-value, and the value of the experimental factor in GXA.
PREFIX atlasterms: <http://rdf.ebi.ac.uk/terms/atlas/>
SELECT DISTINCT str(?diseasename) as ?diseasename
  str(?genename) as ?genename
  ?score
  str(?expressionValue) as ?expressionValue
  ?pvalue
  str(?propertyValue) as ?propertyValue
WHERE {
  {
    # Get the expression for the genes differentially expressed in GXA
    SELECT DISTINCT *
    WHERE {
      SERVICE <http://www.ebi.ac.uk/rdf/services/atlas/sparql> {
        SELECT *
        WHERE {
          ?probe atlasterms:dbXref ?uniprot .
          ?uniprot rdf:type atlasterms:UniprotDatabaseReference .
          ?value atlasterms:isMeasurementOf ?probe ;
            atlasterms:pValue ?pvalue ;
            rdfs:label ?expressionValue ;
            atlasterms:hasFactorValue ?factor .
          ?factor a <http://www.ebi.ac.uk/efo/EFO_0002618> ;
            atlasterms:propertyType ?propertyType ;
            atlasterms:propertyValue ?propertyValue
        }
        LIMIT 100
      }
    }
  }
  ?gda sio:SIO_000628 ?gene,?disease ;
    sio:SIO_000216 ?scoreIRI .
  ?scoreIRI sio:SIO_000300 ?score .
  FILTER (?score > "0.1"^^xsd:decimal || ?score = "0.1"^^xsd:decimal)
  ?gene rdf:type ncit:C16612 ;
    rdfs:label ?genename ;
    sio:SIO_010078 ?protein .
  ?protein skos:exactMatch ?uniprot .
  ?disease rdf:type ncit:C7057 ;
    rdfs:label ?diseasename .
  FILTER regex(?uniprot, "http://purl.uniprot.org/uniprot/")
} # end of query
ORDER BY DESC(?score) ASC(?pvalue)
LIMIT 100



# For genes differentially expressed in 'Pancreatic Cancer' (efo:EFO_0002618), give me the associated diseases in DisGeNET with a score greater than or equal to 0.1 and the original source of provenance.
# Output the disease name, the gene name, the score for GDA, the source, the expression value, the p-value, and the value of the experimental factor in GXA.
PREFIX atlasterms: <http://rdf.ebi.ac.uk/terms/atlas/>
SELECT DISTINCT str(?diseasename) as ?diseasename
  str(?genename) as ?genename
  ?score
  ?source
  str(?expressionValue) as ?expressionValue
  ?pvalue
  str(?propertyValue) as ?propertyValue
WHERE {
  {
    # Get the expression for the genes differentially expressed in GXA
    SELECT DISTINCT *
    WHERE {
      SERVICE <http://www.ebi.ac.uk/rdf/services/atlas/sparql> {
        SELECT *
        WHERE {
          ?probe atlasterms:dbXref ?uniprot .
          ?uniprot rdf:type atlasterms:UniprotDatabaseReference .
          ?value atlasterms:isMeasurementOf ?probe ;
            atlasterms:pValue ?pvalue ;
            rdfs:label ?expressionValue ;
            atlasterms:hasFactorValue ?factor .
          ?factor a <http://www.ebi.ac.uk/efo/EFO_0002618> ;
            atlasterms:propertyType ?propertyType ;
            atlasterms:propertyValue ?propertyValue
        }
        LIMIT 100
      }
    }
  }
  ?gda sio:SIO_000628 ?gene,?disease ;
    sio:SIO_000216 ?scoreIRI ;
    sio:SIO_000253 ?source .
  ?gene rdf:type ncit:C16612 ;
    rdfs:label ?genename ;
    sio:SIO_010078 ?protein .
  ?protein skos:exactMatch ?uniprot .
  ?disease rdf:type ncit:C7057 ;
    rdfs:label ?diseasename .
  ?scoreIRI sio:SIO_000300 ?score .
  FILTER (?score > "0.1"^^xsd:decimal || ?score = "0.1"^^xsd:decimal)
  FILTER regex(?uniprot, "http://purl.uniprot.org/uniprot/")
} # end of query
ORDER BY DESC(?score) ASC(?pvalue)
LIMIT 100



FED3: DisGeNET + ChEMBL

NAMESPACE

PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>
PREFIX chembl_molecule: <http://rdf.ebi.ac.uk/resource/chembl/molecule/>
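These prefixes are abbreviations for full IRIs: a prefixed name such as chembl_molecule:CHEMBL941 (Gleevec, used in a query of this section) stands for the namespace IRI followed by the local part. A minimal sketch of that expansion, and of building the PREFIX header for a query, is given below; the helper names `expand` and `prefix_header` are hypothetical.

```python
# The two prefixes declared in the NAMESPACE block above.
PREFIXES = {
    "cco": "http://rdf.ebi.ac.uk/terms/chembl#",
    "chembl_molecule": "http://rdf.ebi.ac.uk/resource/chembl/molecule/",
}


def expand(prefixed_name, prefixes=PREFIXES):
    """Expand a prefixed name such as 'cco:Activity' into a full IRI."""
    prefix, separator, local = prefixed_name.partition(":")
    if not separator or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {prefixed_name!r}")
    return prefixes[prefix] + local


def prefix_header(prefixes=PREFIXES):
    """Render the PREFIX declarations to place at the top of a query."""
    return "\n".join(f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items())


print(expand("chembl_molecule:CHEMBL941"))
print(prefix_header())
```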


Query 2.3.1: Retrieve all disease genes in DisGeNET whose proteins are targeted by molecules in ChEMBL

# Give me all genes in DisGeNET that encode proteins targeted by compounds in the ChEMBL database with experimental activity evidence.
# Output the gene, encoded protein, and compound.
PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>
SELECT DISTINCT ?gene ?uniprot ?molecule
WHERE {
  ?gene sio:SIO_010078 ?identifiers .
  ?identifiers skos:exactMatch ?uniprot .
  # Query ChEMBL for active molecules
  {
    SELECT ?uniprot ?molecule
    WHERE {
      SERVICE <http://www.ebi.ac.uk/rdf/services/chembl/sparql> {
        ?activity a cco:Activity ;
          cco:hasMolecule ?molecule ;
          cco:hasAssay ?assay .
        ?assay cco:hasTarget ?target .
        ?target cco:hasTargetComponent ?targetcmpt .
        ?targetcmpt cco:targetCmptXref ?uniprot .
        ?uniprot a cco:UniprotRef
      } # end of service
    } # end of select
  } # end of subquery
} # end of query
LIMIT 10


Query 2.3.2: Retrieve the protein targets of Gleevec (CHEMBL941), the genes that encode these protein targets and their associated diseases

# Give me all gene(s), associated disease(s), and encoded protein(s) in DisGeNET targeted by the drug Gleevec (CHEMBL941).
# Output the activity, target and target component, encoded protein, gene, and disease.
PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>
PREFIX chembl_molecule: <http://rdf.ebi.ac.uk/resource/chembl/molecule/>
SELECT DISTINCT ?activity ?target ?targetcmpt ?uniprot ?gene ?diseaseName
WHERE {
  ?gda sio:SIO_000628 ?gene,?disease .
  ?disease rdf:type ncit:C7057 ;
    dcterms:title ?diseaseName .
  ?gene rdf:type ncit:C16612 ;
    sio:SIO_010078 ?protein .
  ?protein skos:exactMatch ?uniprot .
  FILTER regex(?uniprot, "http://purl.uniprot.org/uniprot/")
  # Query ChEMBL for activity data for Gleevec
  {
    SELECT *
    WHERE {
      SERVICE <http://www.ebi.ac.uk/rdf/services/chembl/sparql> {
        SELECT ?activity ?assay ?target ?targetcmpt ?uniprot
        WHERE {
          ?activity a cco:Activity ;
            cco:hasMolecule chembl_molecule:CHEMBL941 ;
            cco:hasAssay ?assay .
          ?assay cco:hasTarget ?target .
          ?target cco:hasTargetComponent ?targetcmpt .
          ?targetcmpt cco:targetCmptXref ?uniprot .
          ?uniprot a cco:UniprotRef
        } # end of chembl query
      } # end of service
    }
  }
} # end of query
LIMIT 10


Query 2.3.3: Retrieve potential drug targets for 'Brain Infarction'

# Give me all protein(s) associated with 'Brain Infarction' disease in DisGeNET that are hits of molecules in ChEMBL.
# Output the activity, molecule, target and target component, encoded protein, gene, and disease.
PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>
SELECT DISTINCT ?activity ?molecule ?target ?targetcmpt ?uniprot ?gene ?diseaseName
WHERE {
  # Query ChEMBL for activity data
  {
    SELECT *
    WHERE {
      SERVICE <http://www.ebi.ac.uk/rdf/services/chembl/sparql> {
        ?activity a cco:Activity ;
          cco:hasMolecule ?molecule ;
          cco:hasAssay ?assay .
        ?assay cco:hasTarget ?target .
        ?target cco:hasTargetComponent ?targetcmpt .
        ?targetcmpt cco:targetCmptXref ?uniprot .
        ?uniprot a cco:UniprotRef
      } # end of service
    } # end of select
  } # end of subquery
  ?gda sio:SIO_000628 ?gene,?disease .
  ?disease rdf:type ncit:C7057 ;
    dcterms:title ?diseaseName .
  ?gene rdf:type ncit:C16612 ;
    sio:SIO_010078 ?protein .
  ?protein skos:exactMatch ?uniprot .
  FILTER (str(?diseaseName) = "Brain Infarction")
  FILTER regex(?uniprot, "http://purl.uniprot.org/uniprot/")
} # end of query
LIMIT 10


# The same query, ranked by the evidence for each gene-disease association: give me all drug target candidates ordered by the number of evidences that support each GDA.
PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>
SELECT DISTINCT ?activity ?molecule ?target ?targetcmpt ?uniprot ?gene ?diseaseName
  count(distinct ?gda) as ?numberOfEvidences
WHERE {
  # Query ChEMBL for activity data
  {
    SELECT DISTINCT *
    WHERE {
      SERVICE <http://www.ebi.ac.uk/rdf/services/chembl/sparql> {
        ?activity a cco:Activity ;
          cco:hasMolecule ?molecule ;
          cco:hasAssay ?assay .
        ?assay cco:hasTarget ?target .
        ?target cco:hasTargetComponent ?targetcmpt .
        ?targetcmpt cco:targetCmptXref ?uniprot .
        ?uniprot a cco:UniprotRef
      } # end of service
    } # end of select
  } # end of subquery
  ?gda sio:SIO_000628 ?gene,?disease .
  ?disease rdf:type ncit:C7057 ;
    dcterms:title ?diseaseName .
  ?gene rdf:type ncit:C16612 ;
    sio:SIO_010078 ?protein .
  ?protein skos:exactMatch ?uniprot .
  FILTER regex(?uniprot, "http://purl.uniprot.org/uniprot/")
  FILTER (str(?diseaseName) = "Brain Infarction")
} # end of query
ORDER BY DESC(?numberOfEvidences)
LIMIT 50


Query 2.3.4: Retrieve the number of molecules for 'Brain Infarction'

# Give me the number of molecules in ChEMBL that target disease proteins for 'Brain Infarction' disease.
# Output the target and target component, encoded protein, gene, and the total number of molecules.
PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>
SELECT DISTINCT ?target ?targetcmpt ?uniprot ?gene
  count(distinct ?molecule) as ?molecules
WHERE {
  # Query ChEMBL for active molecules
  {
    SELECT *
    WHERE {
      SERVICE <http://www.ebi.ac.uk/rdf/services/chembl/sparql> {
        ?activity a cco:Activity ;
          cco:hasMolecule ?molecule ;
          cco:hasAssay ?assay .
        ?assay cco:hasTarget ?target .
        ?target cco:hasTargetComponent ?targetcmpt .
        ?targetcmpt cco:targetCmptXref ?uniprot .
        ?uniprot a cco:UniprotRef
      } # end of service
    } # end of select
  } # end of subquery
  ?gda sio:SIO_000628 ?gene,?disease .
  ?disease rdf:type ncit:C7057 ;
    dcterms:title ?diseaseName .
  ?gene rdf:type ncit:C16612 ;
    sio:SIO_010078 ?protein .
  ?protein skos:exactMatch ?uniprot .
  FILTER regex(?uniprot, "http://purl.uniprot.org/uniprot/")
  FILTER (str(?diseaseName) = "Brain Infarction")
} # end of query
LIMIT 10


Query 2.3.5: Retrieve the potential drug targets for 'Aarskog Syndrome'

# For 'Aarskog Syndrome' disease (UMLS_CUI:C0175701), give me the associated proteins from CURATED sources that are targets for molecules in ChEMBL.
# Output the disease name, the source, the number of supporting evidences for each GDA, the gene, the target, the molecule and the activity.
PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>
SELECT DISTINCT str(?diseaseName) as ?diseasename
  ?source
  count(distinct ?gda) as ?evidences
  ?gene ?target ?molecule ?activity
WHERE {
  ?gda sio:SIO_000628 ?gene,?disease ;
    sio:SIO_000253 ?source .
  ?disease rdf:type ncit:C7057 ;
    dcterms:title ?diseaseName .
  ?gene rdf:type ncit:C16612 ;
    sio:SIO_010078 ?protein .
  ?protein skos:exactMatch ?uniprot .
  FILTER regex(?source, "uniprot|ctd_human|clinvar")
  FILTER (?disease = <http://linkedlifedata.com/resource/umls/id/C0175701>)
  FILTER regex(?uniprot, "http://purl.uniprot.org/uniprot/")
  # Query ChEMBL for activity data
  {
    SELECT DISTINCT *
    WHERE {
      SERVICE <http://www.ebi.ac.uk/rdf/services/chembl/sparql> {
        ?uniprot a cco:UniprotRef .
        ?targetcmpt cco:targetCmptXref ?uniprot .
        ?target cco:hasTargetComponent ?targetcmpt .
        ?assay cco:hasTarget ?target .
        ?activity a cco:Activity ;
          cco:hasMolecule ?molecule ;
          cco:hasAssay ?assay .
        FILTER (?target = <http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2862>)
      } # end of service
    } # end of select
  } # end of subquery
} # end of query



FED4: DisGeNET + Ensembl

NAMESPACE

PREFIX ensembl: <http://rdf.ebi.ac.uk/resource/ensembl/>
PREFIX ensembltranscript: <http://rdf.ebi.ac.uk/resource/ensembl.transcript/>
PREFIX ensemblexon: <http://rdf.ebi.ac.uk/resource/ensembl.exon/>
PREFIX ensemblprotein: <http://rdf.ebi.ac.uk/resource/ensembl.protein/>
PREFIX ensemblterms: <http://rdf.ebi.ac.uk/terms/ensembl/>
PREFIX faldo: <http://biohackathon.org/resource/faldo#>


Query 2.4.1: Retrieve diseases associated with genes in Ensembl

# Give me all diseases in DisGeNET associated with a gene in Ensembl with NCBI Gene ID 675.
# Output the Ensembl gene ID, the disease, and the disease name ordered alphabetically.
PREFIX ensemblterms: <http://rdf.ebi.ac.uk/terms/ensembl/>
SELECT DISTINCT ?ensemblg ?disease str(?diseasename) as ?diseaseName
WHERE {
  # Query Ensembl for genes
  SERVICE <https://www.ebi.ac.uk/rdf/services/ensembl/sparql> {
    ?ensemblg ensemblterms:DEPENDENT ?gene ;
      obo:RO_0002162 <http://identifiers.org/taxonomy/9606> .
    FILTER regex(str(?ensemblg), 'ensg', 'i')
    FILTER (?gene = <http://identifiers.org/ncbigene/675>)
  } # end of service
  # Query DisGeNET for associated diseases
  ?gda sio:SIO_000628 ?gene, ?disease .
  ?disease a <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C7057> ;
    dcterms:title ?diseasename .
} # end of query
ORDER BY ASC(UCASE(str(?diseaseName)))
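The query above is written for a single gene (NCBI Gene ID 675). To reuse it for other genes, the identifier can be substituted into the query text. The sketch below assumes, like the original query, that the `obo:`, `sio:` and `dcterms:` prefixes are predefined by the endpoint; the function name `diseases_for_gene_query` is hypothetical. The ID is validated as digits only so that arbitrary text cannot be injected into the query.

```python
# Query text from Query 2.4.1 with the gene ID left as a placeholder.
# Doubled braces are literal braces for str.format.
QUERY_TEMPLATE = """\
PREFIX ensemblterms: <http://rdf.ebi.ac.uk/terms/ensembl/>
SELECT DISTINCT ?ensemblg ?disease str(?diseasename) as ?diseaseName
WHERE {{
  SERVICE <https://www.ebi.ac.uk/rdf/services/ensembl/sparql> {{
    ?ensemblg ensemblterms:DEPENDENT ?gene ;
      obo:RO_0002162 <http://identifiers.org/taxonomy/9606> .
    FILTER regex(str(?ensemblg), 'ensg', 'i')
    FILTER (?gene = <http://identifiers.org/ncbigene/{gene_id}>)
  }}
  ?gda sio:SIO_000628 ?gene, ?disease .
  ?disease a <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C7057> ;
    dcterms:title ?diseasename .
}}
ORDER BY ASC(UCASE(str(?diseaseName)))
"""


def diseases_for_gene_query(ncbi_gene_id):
    """Return the query text for one NCBI Gene ID (digits only)."""
    gene_id = str(ncbi_gene_id)
    if not (gene_id.isascii() and gene_id.isdigit()):
        raise ValueError(f"not a numeric NCBI Gene ID: {ncbi_gene_id!r}")
    return QUERY_TEMPLATE.format(gene_id=gene_id)


print(diseases_for_gene_query(675))
```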



FED5: DisGeNET + UniProt

Query 2.5.1: Retrieve disease coding genes in DisGeNET with disease annotation in UniProt

# Give me all proteins in DisGeNET encoded by disease genes that have disease annotation in UniProt.
# Output the protein and the disease annotation.
SELECT ?protein ?comment
WHERE {
  ?protein a ncit:C17021;
    skos:exactMatch ?uniprot .
  FILTER(strstarts(str(?uniprot), "http://purl.uniprot.org/uniprot"))
  # Query UniProt for proteins with disease annotation
  SERVICE <http://sparql.uniprot.org/sparql> {
    ?uniprot up:annotation ?annotation .
    ?annotation a up:Disease_Annotation ;
      rdfs:comment ?comment .
  }
}
LIMIT 10



FED6: DisGeNET + Biomodels

NAMESPACE

PREFIX sbmlrdf: <http://identifiers.org/biomodels.vocabulary#>


Query 2.6.1: Retrieve disease proteins that are involved in computational models

# Give me all proteins in DisGeNET encoded by disease genes that are participants in Biomodels.
# Output the protein, the model element URI, the type of model element URI, and the qualifier.
PREFIX sbmlrdf: <http://identifiers.org/biomodels.vocabulary#>
SELECT *
WHERE {
  ?protein a ncit:C17021;
    skos:exactMatch ?uniprot .
  FILTER(strstarts(str(?uniprot), "http://purl.uniprot.org/uniprot"))
  # Query biomodels
  {
    SELECT *
    WHERE {
      SERVICE <https://www.ebi.ac.uk/rdf/services/biomodels/sparql> {
        ?modelElement rdf:type ?elementType ;
          ?qualifier ?protein .
        ?qualifier rdfs:subPropertyOf sbmlrdf:sbmlAnnotation .
        FILTER (strstarts(str(?protein), "http://identifiers.org/uniprot/"))
      }
    }
  }
}
LIMIT 20


Query 2.6.2: Retrieve disease proteins and the computational models in which they are modelled

# Give me all proteins in DisGeNET encoded by disease genes and the models from Biomodels in which they participate.
# Output the protein, the model URI, and the model element URI.
PREFIX sbmlrdf: <http://identifiers.org/biomodels.vocabulary#>
SELECT *
WHERE {
  ?protein a ncit:C17021;
    skos:exactMatch ?uniprot .
  FILTER(strstarts(str(?uniprot), "http://purl.uniprot.org/uniprot"))
  # Query biomodels
  {
    SELECT *
    WHERE {
      SERVICE <https://www.ebi.ac.uk/rdf/services/biomodels/sparql> {
        ?modelElement rdf:type ?elementType ;
          ?qualifier ?protein .
        ?qualifier rdfs:subPropertyOf sbmlrdf:sbmlAnnotation .
        FILTER (strstarts(str(?protein), "http://identifiers.org/uniprot/"))
      }
    }
  }
}
LIMIT 20




5. Documentation

This section provides supporting material for new DisGeNET-RDF users. Please give us your feedback and comments.


5.1 About RDF, Linked Data, Semantic Web technologies

Good introductions to the field are:

  • Wikipedia, always a good place to start!
  • W3C, to see who develops the standards. The World Wide Web Consortium (W3C) is an international community where Member organizations, a full-time staff, and the public work together to develop Web standards.
  • EBI RDF Platform documentation. The EBI is one of the major Linked Data providers in the Life Sciences, and the comprehensive documentation on its website is worth reading.



5.2 DisGeNET-RDF Getting Started

Don't get scared, get started: because getting started can be difficult, we provide documents and tutorials that help users take their first steps and learn more about DisGeNET-RDF features, making the resource easier to understand and apply.


5.3 Do you want to provide Linked Data? The best place to start is reading the guidelines!

In this subsection we provide links to documents that were very useful for developing DisGeNET-RDF. HOW-TO:






DisGeNET Nanopublications

The Integrative Biomedical Informatics Group is pleased to announce the second publication of the DisGeNET Nanopublications, a Linked Dataset implemented by combining the nanopublication approach [nanopub.org] with the Trusty URIs technique [PDF]. It is an alternative way to mine statements about gene-disease associations contained in DisGeNET. Nanopublications are a new way of publishing structured data that allows provenance to be tracked along with the scientific statement. Trusty URIs are a novel technique that makes resources on the Web immutable and verifiable, and ensures unambiguous data linking in the (semantic) Web.

This new Linked Dataset provides nanopublications about scientific statements of human GDAs. Published as Trusty URI nanopublications, these GDAs are machine-interpretable, immutable, permanent, and verifiable. Each GDA statement has a provenance description providing evidence, attribution, creation time, and further context of its creation. Each GDA is classified as “CURATED”, “PREDICTED”, or “LITERATURE” in the DisGeNET context, which categorizes the evidence of the statement by the type of assertion and curation made in the original databases. DisGeNET nanopublications include metadata annotations about their general topic, i.e. ‘Gene-Disease Association’, semantically described by SIO to facilitate their discoverability in the Semantic Web (see PDF).

1. Linked Dataset Description

The third release of DisGeNET published as nanopublications is a distribution of DisGeNET v4.0 (Nanopublications version v4.0.0.0). The dataset consists of 1,414,902 nanopublications, representing the same number of scientific statements for 429,036 different GDAs with their detailed provenance, levels of evidence and publication information descriptions, all annotated as RDF statements and encapsulated into the nanopublication RDF graphs (5,659,608 graphs in total). Specifically, the dataset is composed of 48,106,668 N-Quads, i.e. RDF triples with their graph (or “context”) added as the fourth member in the tuple (Subject, Predicate, Object, Context), everything being serialized in TriG syntax.
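The figures quoted above can be cross-checked with simple arithmetic. The sketch below uses only the numbers stated in this section; the variable names are illustrative. It shows that the graph count is exactly four named graphs per nanopublication, and that the quad count is an exact multiple of the number of nanopublications.

```python
# Figures quoted in the dataset description above.
NANOPUBLICATIONS = 1_414_902
DISTINCT_GDAS = 429_036
NAMED_GRAPHS = 5_659_608
N_QUADS = 48_106_668

graphs_per_nanopub = NAMED_GRAPHS / NANOPUBLICATIONS
quads_per_nanopub = N_QUADS / NANOPUBLICATIONS
statements_per_gda = NANOPUBLICATIONS / DISTINCT_GDAS

print(graphs_per_nanopub)            # 4.0
print(quads_per_nanopub)             # 34.0
print(round(statements_per_gda, 1))  # 3.3
```

On average, then, each distinct GDA is supported by a little over three nanopublications, each of which carries 34 quads spread over its four graphs.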

2. DisGeNET Nanopublication Schema

The official guidelines for creating nanopublications were followed. A DisGeNET nanopublication is modeled by four named graphs: head, assertion, provenance, and publication information. The head graph defines the structure of the nanopublication by linking to the other graph URIs. The assertion graph contains the description of a single, specific GDA assertion. The provenance graph includes provenance, evidence, and attribution statements that were directly mapped from the VoID description of the RDF dataset. Finally, the publication information graph includes all the metadata about the nanopublication itself (see the figure below). The source of data for the DisGeNET nanopublications set is the RDF Linked Dataset version of DisGeNET. To implement Trusty URIs, the GitHub Java implementation was used.

Nanopublication Example
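The four-graph layout described in this section can also be sketched in code. The example below represents quads as (subject, predicate, object, graph) tuples and uses the head graph's links to find the other three graphs. It is only an illustration: every `http://example.org/` URI is a placeholder rather than a real DisGeNET or Trusty URI, the provenance and publication-info predicates are invented, and the helper names are hypothetical. The `np:` terms are those of the nanopublication schema.

```python
from collections import Counter

NP = "http://www.nanopub.org/nschema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SIO = "http://semanticscience.org/resource/"
EX = "http://example.org/np1#"        # placeholder namespace
VOCAB = "http://example.org/vocab#"   # placeholder predicates

# One toy nanopublication: (subject, predicate, object, graph).
quads = [
    (EX + "np", RDF_TYPE, NP + "Nanopublication", EX + "Head"),
    (EX + "np", NP + "hasAssertion", EX + "assertion", EX + "Head"),
    (EX + "np", NP + "hasProvenance", EX + "provenance", EX + "Head"),
    (EX + "np", NP + "hasPublicationInfo", EX + "pubinfo", EX + "Head"),
    (EX + "gda", SIO + "SIO_000628", EX + "gene", EX + "assertion"),
    (EX + "gda", SIO + "SIO_000628", EX + "disease", EX + "assertion"),
    (EX + "assertion", VOCAB + "evidence", VOCAB + "curated", EX + "provenance"),
    (EX + "np", VOCAB + "created", "2016-01-01", EX + "pubinfo"),
]

ROLE_LINKS = {
    NP + "hasAssertion": "assertion",
    NP + "hasProvenance": "provenance",
    NP + "hasPublicationInfo": "pubinfo",
}


def graphs_by_role(quads):
    """Map each role to its graph URI, following the links in the head graph."""
    found = {}
    for _subject, predicate, obj, graph in quads:
        if predicate in ROLE_LINKS:
            found["head"] = graph
            found[ROLE_LINKS[predicate]] = obj
    return found


def quads_per_graph(quads):
    """Count how many quads each named graph contains."""
    return dict(Counter(graph for *_triple, graph in quads))


print(graphs_by_role(quads))
print(quads_per_graph(quads))
```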

3. Access to the Nanopublications Linked Dataset

DisGeNET nanopublications can be accessed in two ways. They can be downloaded as a file in TriG format from the download section, and they are deployed in a new decentralized nanopublication server network: a distributed network of servers with a REST API that provides and propagates nanopublications identified by Trusty URIs [ref]. DisGeNET nanopublications are registered in datahub along with other datasets formatted as nanopublications.

New: For performance reasons, DisGeNET nanopublications are no longer accessible via our SPARQL endpoint.


3.1 Data Downloads

The current dataset, which is the nanopublication distribution of DisGeNET v4.0, is available for download: nanopubs-v4.0.0.0.


3.2 SPARQL Example queries: Exploring DisGeNET as Nanopublications

DisGeNET nanopublications can be explored using the query language SPARQL via a SPARQL endpoint. The illustrative queries below show how to explore GDAs with DisGeNET nanopublications and how to integrate them with relationships published in other LOD sources. As an example, we can query DisGeNET nanopublications to answer the following question:

What are the proteins (and their protein interactions) associated with Alzheimer Disease with curated evidence?


Query 1.1: Retrieving Gene-Disease Associations

# First, we query DisGeNET for all the genes associated with Alzheimer Disease (umls:C0002395). This query only involves the assertion graph.
SELECT DISTINCT ?gene
FROM <http://rdf.disgenet.org/nanopubs>
WHERE {
  GRAPH ?head {
    ?assertion a np:Assertion .
  }
  GRAPH ?assertion {
    ?gda sio:SIO_000628 ?gene, ?disease .
    ?gene rdf:type ncit:C16612 .
    ?disease rdf:type ncit:C7057 .
    FILTER regex(?disease, "umls/id/C0002395")
  }
}
LIMIT 10

Query 1.2: Filtering By Evidence

# Second, we restrict the previous results to assertions annotated with CURATED DisGeNET evidence. This query involves the provenance graph.
SELECT DISTINCT ?gene ?evidence
FROM <http://rdf.disgenet.org/nanopubs>
WHERE {
  GRAPH ?head {
    ?assertion a np:Assertion .
    ?provenance a np:Provenance .
  }
  GRAPH ?assertion {
    ?gda sio:SIO_000628 ?gene, ?disease .
    ?gene rdf:type ncit:C16612 .
    ?disease rdf:type ncit:C7057 .
    FILTER regex(?disease, "umls/id/C0002395")
  }
  GRAPH ?provenance {
    ?assertion wi:evidence ?evidence .
    FILTER regex(?evidence, "curated")
  }
}
LIMIT 10

Query 1.3: Linking with Other LOD Resources

# Finally, we combine the previous DisGeNET results with data from the Interaction Reference Index database, which contains protein-protein interaction (PPI) annotations, by federating the query to the Bio2RDF::irefindex SPARQL endpoint.
# Because DisGeNET-RDF also represents the relation between a gene and the protein(s) it encodes, DisGeNET can be joined with Bio2RDF::irefindex on protein resources through the corresponding linkset to 'http://bio2rdf.org/uniprot:UniProtID'.
PREFIX bio2rdf-ifx: <http://bio2rdf.org/irefindex_vocabulary:>
SELECT DISTINCT ?gene ?protein ?protein_dgn ?evidence ?ppi ?protein_irx
WHERE {
  ?gene sio:SIO_010078 ?protein .
  ?protein skos:exactMatch ?protein_dgn .
  FILTER regex(?protein_dgn, "bio2rdf.org/uniprot:")
  GRAPH ?head {
    ?assertion a np:Assertion .
    ?provenance a np:Provenance .
  }
  GRAPH ?assertion {
    ?gda sio:SIO_000628 ?gene, ?disease .
    ?gene rdf:type ncit:C16612 .
    ?disease rdf:type ncit:C7057 .
    FILTER regex(?disease, "umls/id/C0002395")
  }
  GRAPH ?provenance {
    ?assertion wi:evidence ?evidence .
    FILTER regex(?evidence, "curated")
  }
  # Get the interactome data from Bio2RDF::irefindex
  SERVICE <http://irefindex.bio2rdf.org/sparql> {
    OPTIONAL {
      ?ppi a bio2rdf-ifx:Pairwise-Interaction ;
        bio2rdf-ifx:interactor_a ?protein_dgn ;
        bio2rdf-ifx:interactor_b ?protein_irx .
    }
  }
}
LIMIT 100



The Open PHACTS Project

DisGeNET-RDF has been implemented in the Open PHACTS Discovery Platform (OPS), which is a Semantic Web platform developed under the Innovative Medicines Initiative (IMI; http://www.imi.europa.eu) funded Open PHACTS project. Remarkably, the integration of DisGeNET in OPS is essential to answer important research questions such as which compounds could effectively inhibit targets involved in a key pathway for the development of a disease. Please, see the Open PHACTS website for more details.




Contact Us

If you have questions or feedback about DisGeNET-RDF resources, please, don't hesitate to contact us at support(at)disgenet(dot)org