Introduction

The disgenet2r package contains a set of functions to retrieve, visualize and expand DisGeNET data. DisGeNET is a discovery platform that contains information about the genetic basis of human diseases (J. Piñero et al. 2015, 2017; Janet Piñero et al. 2019). DisGeNET integrates data from several expert curated databases and from text-mining the biomedical literature.

The current version of DisGeNET (v7.0) contains 1134942 gene-disease associations (GDAs), between 21671 genes and 30170 diseases, disorders, traits, and clinical or abnormal human phenotypes, and 369554 variant-disease associations (VDAs), between 194515 variants and 14155 diseases, traits, and phenotypes.

The information in DisGeNET is organized according to the original data source (Table 1). Diseases are identified using the UMLS concept unique identifier (CUI), but mappings to commonly employed biomedical vocabularies such as MeSH, OMIM, DO, HPO, and ICD-9 are also provided. The genes are identified using the NCBI Entrez Identifier, but annotations to the official gene symbol, the UniProt identifier, and the Panther Protein Class are also supplied. Finally, the GDAs and VDAs can be ranked using the DisGeNET score. The DisGeNET score ranges from 0 to 1, and takes into account the evidence supporting the association (See more information at http://disgenet.org/dbinfo/)

DisGeNET data is also represented as a Resource Description Framework (RDF), which provides new opportunities for data integration, making possible to link DisGeNET data to other external RDF datasets (Queralt-Rosinach et al. 2016).

Table: Sources of DisGeNET data

Source_Name Type_of_data Description
CTD_human GDAs The Comparative Toxicogenomics Database, human data
CGI GDAs The Cancer Genome Interpreter
CLINGEN GDAs The Clinical Genome Resource
GENOMICS_ENGLAND GDAs The Genomics England PanelApp
ORPHANET GDAs The portal for rare diseases and orphan drugs
PSYGENET GDAs Psychiatric disorders Gene association NETwork
HPO GDAs Human Phenotype Ontology
UNIPROT GDAs/VDAs The Universal Protein Resource
CLINVAR GDAs/VDAs ClinVar, public archive of relationships among sequence variation and human phenotype
GWASCAT GDAs/VDAs The NHGRI-EBI GWAS Catalog
GWASDB GDAs/VDAs The GWas Database
CTD_mouse GDAs The Comparative Toxicogenomics Database, Mus musculus data
MGD GDAs The Mouse Genome Database
CTD_rat GDAs The Comparative Toxicogenomics Database, Rattus norvergicus data
RGD GDAs The Rat Genome Database
BEFREE GDAs/VDAs Data from text mining medline abstracts using the BeFree System (Bravo et al. 2015)
LHGDN GDAs Literature-derived human gene-disease network generated by text mining NCBI GeneRIFs (Bundschus et al. 2008)
CURATED GDAs/VDAs Human curated sources: CTD, ClinGen, CGI, UniProt, Orphanet, PsyGeNET, Genomics England PanelApp
INFERRED GDAs Inferred data from: HPO,ClinVar, GWASCat, GwasDB
ANIMAL_MODELS GDAs Data from animal models: CTD_rat, RGD, CTD_mouse, MGD
ALL GDAs/VDAs All data sources

Contact

For questions regarding disgenet2r, contact our support account at .

Installation and first run

The package disgenet2r is available through Bitbucket. The package requires an R version > 3.5. Additionally, the following packages are needed: VennDiagram, stringr, tidyr, SPARQL, RCurl, igraph, ggplot2, and reshape2.

Install disgenet2r by typing in R:

library(devtools)
install_bitbucket("ibi_group/disgenet2r")

To load the package:

library(disgenet2r)

To be able to use the disgenet2r package, you need to register for a free account here. Once you have completed the registration process, use the get_disgenet_api_key to retrieve your API key.

disgenet_api_key <- get_disgenet_api_key(
                  email = "user@email.com", 
                  password = "myspwd" )

After retrieving the API key, run the line below so the key is available for all the disgenet2r functions.

Sys.setenv(DISGENET_API_KEY= disgenet_api_key)

In the following document, we illustrate how to use the disgenet2r package through a series of examples.

Retrieving Gene-Disease Associations from DisGeNET

Searching by gene

The gene2disease function retrieves the GDAs in DisGeNET for a given gene, or a for a list of genes. The gene(s) can be identified by either the NCBI gene identifier, or the official Gene Symbol, and the type of identifier used must be specified using the parameter vocabulary. By default, vocabulary = "HGNC", to switch to Entrez NCBI Gene identifiers, set vocabulary to ENTREZ.

The function also requires the user to specify the source database using the argument database. By default, all the functions in the disgenet2r package use as source database CURATED, which includes GDAs from CTD (human data), PsyGeNET, the HPO, Genomics England PanelApp, ClinGen, CGI, UniProt, and Orphanet.

The information can be filtered using the DisGeNET score. The argument score is filled with a range of score to perform the search. The score is entered as a vector which first position is the initial value of score, and the second argument is the final value of score. Both values will always be included. By default, score=c(0,1).

In the example, the query for the Leptin Receptor (Gene Symbol LEPR, and Entrez NCBI Identifier 3953) is performed in all databases in DisGeNET (database = "ALL").

data1 <- gene2disease( gene = 3953, vocabulary = "ENTREZ",
                       database = "CURATED")

The function gene2disease produces an object DataGeNET.DGN that contains the results of the query.

class(data1)
## [1] "DataGeNET.DGN"
## attr(,"package")
## [1] "disgenet2r"

Type the name of the object to display its attributes: the input parameters such as whether a single entity, or a list were searched (single or list), the type of entity (gene-disease), the selected database (ALL), the score range used in the search (0-1), and the gene NCBI identifier (3953).

data1
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        gene-disease 
##  . Database:     CURATED 
##  . Score:        0-1 
##  . Term:        3953 
##  . Results:  89

To obtain the data frame with the results of the query, apply the extract function:

results <- extract(data1)
head( results, 3 )
##   year_initial protein_class     el disease_type
## 1         1998  DTO:05007599 strong      disease
## 2         1966  DTO:05007599             disease
## 3         1966  DTO:05007599             disease
##                                                                      disease_class_name protein_class_name gene_dpi year_final
## 1                                                                                                Signaling    0.434       2019
## 2    Pathological Conditions, Signs and Symptoms;    Nutritional and Metabolic Diseases          Signaling    0.434       2021
## 3                      Nutritional and Metabolic Diseases;    Endocrine System Diseases          Signaling    0.434       2020
##   score disease_class disease_semantic_type    ei gene_symbol  source gene_dsi                             disease_name geneid
## 1  0.82                 Disease or Syndrome 1.000        LEPR CURATED  0.99475               LEPTIN RECEPTOR DEFICIENCY   3953
## 2  0.80       C23;C18   Disease or Syndrome 0.937        LEPR CURATED  0.99475                                  Obesity   3953
## 3  0.60       C18;C19   Disease or Syndrome 0.975        LEPR CURATED  0.99475 Diabetes Mellitus, Non-Insulin-Dependent   3953
##   gene_pli uniprotid diseaseid
## 1     0.84    P48357  C3554225
## 2     0.84    P48357  C0028754
## 3     0.84    P48357  C0011860

The same query can be performed using the Gene Symbol (LEPR). Additionally, a minimum threshold for the score can be defined. In the example, a cutoff of score=c(0.2,1) is used. Notice how the number of diseases associated to the Leptin Receptor drops from 264 to 68 when the score is restricted.

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        gene-disease 
##  . Database:     ALL 
##  . Score:        0.3-1 
##  . Term:        LEPR 
##  . Results:  89

Visualizing the diseases associated to a single gene

The disgenet2r package offers two options to visualize the results of querying DisGeNET for a single gene: a network showing the diseases associated to the gene of interest (Gene-Disease Network), and a network showing the MeSH Disease Classes of the diseases associated to the gene (Gene-Disease Class Network). These graphics can be obtained by changing the class argument in the plot function.

By default, the plot function produces a Gene-Disease Network on a DataGeNET.DGN object (Figure 1). In the Gene-Disease Network the blue nodes are diseases, the pink nodes are genes, and the width of the edges is proportional to the score of the association. The prop parameter allows to adjust the width of the edges while keeping the proportionality to the score.

plot( data1,
      class = "Network",
      prop = 20)
Figure 1: The **Gene-Disease Network** for the Leptin Receptor gene

Figure 1: The Gene-Disease Network for the Leptin Receptor gene

The results can also be visualized in a network in which diseases are grouped by the MeSH Disease Class if the class argument is set to “DiseaseClass” (Gene-Disease Class Network, Figure 2). In the Gene-Disease Class Network, the node size of is proportional to the fraction of diseases in the disease class, with respect to the total number of diseases with disease classes associated to the gene. In the example, the Lepin Receptor is associated mainly to Nutritional and Metabolic Diseases. There is 1 disease in the example that does not have annotations to MeSH disease class (Shown as a warning).

plot( data1,
      class = "DiseaseClass",
      prop = 3)
## [1] "warning: 1 disease(s) not shown in the plot"
Figure 2: The **Disease Class Network** for the Leptin Receptor Gene

Figure 2: The Disease Class Network for the Leptin Receptor Gene

Exploring the evidences associated to a gene

You can extract the evidences associated to a particular gene using the function gene2evidence. Additionally, you can explore the evidences for a specific gene-disease pair by specifying the disease identifier using the argument disease.

data1 <- gene2evidence( gene = "LEPR",
                        vocabulary = "HGNC",
               disease ="C3554225",
                       database = "ALL",
                       score =c(0.3,1))
data1
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        evidence 
##  . Database:     ALL 
##  . Score:        0.3-1 
##  . Term:        LEPR
results <- extract(data1)
The results are shown in the table below.
Evidences supporting the association between LEPR & C3554225
pmid year sentence
16284652 2005 Complete rescue of obesity, diabetes, and infertility in db/db mice by neuron-specific LEPR-B transgenes.
17229951 2007 Clinical and molecular genetic spectrum of congenital deficiency of the leptin receptor.
23275530 2013 Homozygous leptin receptor mutation due to uniparental disomy of chromosome 1: response to bariatric surgery.
25751111 2015 Seven novel deleterious LEPR mutations found in early-onset obesity: a deltaExon6-8 shared by subjects from Reunion Island, France, suggests a founder effect.
29545012 2018 Potential role of gender specific effect of leptin receptor deficiency in an extended consanguineous family with severe early-onset obesity.
30847515 2019 Whole genome sequencing reveals that genetic conditions are frequent in intensively ill children.
9537324 1998 A mutation in the human leptin receptor gene causes obesity and pituitary dysfunction.
9537324 1998 A mutation in the human leptin receptor gene causes obesity and pituitary dysfunction.
9537324 1998 A mutation in the human leptin receptor gene causes obesity and pituitary dysfunction.
29568105 2018 A cohort of n = 21 children living in Germany or Austria with monogenic obesity due to congenital leptin deficiency (group LEP, n = 6), leptin receptor deficiency (group LEPR, n = 6) and primarily heterozygous MC4 receptor deficiency (group MC4R, n = 9) was analyzed.
31702041 2019 In conclusion, activation of the leptin receptor ob‑R is an important pathogenic mechanism of UC, and leptin receptor deficiency may provide resistance against TNBS‑induced colitis by inhibiting the NF‑κB and RhoA signaling pathways.

Searching multiple genes

The gene2disease function can also receive a list of genes as input, either as NCBI Gene Identifiers or Gene Symbols. In the example, we show how to create a vector with the Gene Symbols of several genes belonging to the family of voltage-gated potassium channels (Table 2) and then, we apply the function gene2disease.

Table 2: Example of voltage-gated potassium channel family members

Name Description
KCNE1 potassium channel, voltage gated subfamily E regulatory beta subunit 1
KCNE2 potassium channel, voltage gated subfamily E regulatory beta subunit 2
KCNH1 potassium channel, voltage gated eag related subfamily H, member 1
KCNH2 potassium channel, voltage gated eag related subfamily H, member 2
KCNG1 potassium voltage-gated channel modifier subfamily G member 1

Creating the vector with the list of genes belonging to the voltage-gated potassium channel family.

myListOfGenes <- c( "KCNE1", "KCNE2", "KCNH1", "KCNH2", "KCNG1")

The gene2disease function also requires the user to specify the source database using the argument database, and optionally, the DisGeNET score can also be applied to filter the results.

data2 <- gene2disease(
  gene     = myListOfGenes,
  score =c(0.2, 1),
  verbose  = TRUE
)
## Warning in gene2disease(gene = myListOfGenes, score = c(0.2, 1), verbose = TRUE): 
##  One or more of the genes in the list is not in DisGeNET ( 'CURATED' ):
##    - KCNG1
data2
## Object of class 'DataGeNET.DGN'
##  . Search:      list 
##  . Type:        gene-disease 
##  . Database:     CURATED 
##  . Score:        0.2-1 
##  . Term:       KCNE1 ... KCNH2 
##  . Results:  71

Visualizing the diseases associated to multiple genes

By default, plotting a DataGeNET.DGN resulting of the query with a list of genes produces a Gene-Disease Network where the blue nodes are diseases, the pink nodes are genes, and the width of the edges is proportional to the score of the association (Figure 3).

plot( data2,
      class = "Network",
      prop = 10)
Figure 3: The **Gene-Disease Network** for a list of genes belonging to the voltage-gated potassium channel family

Figure 3: The Gene-Disease Network for a list of genes belonging to the voltage-gated potassium channel family

Setting the argument class to “Heatmap” produces a Gene-Disease Heatmap (Figure 4), where the scale of colors is proportional to the score of the GDA. The argument limit can be used to limit the number of rows to the top scoring GDAs. The argument nchars can be used to limit the length of the name of the disease. By default, the plot shows the 50 highest scoring GDAs.

plot( data2,
      class  ="Heatmap",
      limit  = 100, nchars = 50 )
Figure 4: The **Gene-Disease Heatmap** for a list of genes belonging to the voltage-gated potassium channel family

Figure 4: The Gene-Disease Heatmap for a list of genes belonging to the voltage-gated potassium channel family

These results can also be visualized as a Gene-Disease Class Heatmap by setting the argument class to “DiseaseClass” (Figure 5). In this case, diseases are grouped by the their MeSH disease classes, and the color scale is proportional to the percentage of diseases in each MeSH disease class. In the example, genes are associated mainly to Cardiovascular Diseases, and to Congenital, Hereditary, and Neonatal Diseases and Abnormalities.

plot( data2,
      class="DiseaseClass", nchars=60)
## [1] "warning: 13 disease(s) not shown in the plot"
Figure 5: The **Gene-Disease Class Heatmap** for a list of genes belonging to the voltage-gated potassium channel family

Figure 5: The Gene-Disease Class Heatmap for a list of genes belonging to the voltage-gated potassium channel family

Searching by disease

The disease2gene function allows to retrieve the genes associated to a disease, or a list of diseases. The function uses as input the disease, or list of diseases of interest (as UMLS CUI, MeSH, OMIM, Disease Ontology, ICD9CM, NCIt, EFO, or Orphanet Identifiers), the disease vocabulary employed (OMIM (OMIM), MESH (MeSH),ICD9CM (ICD9-CM), DO (Disease Ontology), NCI (NCI thesaurus), ORDO (Orphanet), or EFO (EFO) and the database (by default, CURATED). A threshold value for the score can be set, like in the gene2disease function.

In the example, we will use the disease2gene function to retrieve the genes associated to the UMLS CUI C0036341. This function also receives as input the database, in the example, CURATED, and a score range, in the example, from 0.4 to 1.

data3 <- disease2gene( disease  = "C0036341",
                          database = "CURATED",
                          score    = c( 0.4,1 ) )
data3
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-gene 
##  . Database:     CURATED 
##  . Score:        0.4-1 
##  . Term:        C0036341 
##  . Results:  227

The same results are obtained when querying DisGeNET with the MeSH identifier for Schizophrenia (D012559).

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-gene 
##  . Database:     CURATED 
##  . Score:        0.4-1 
##  . Term:        D012559 
##  . Results:  227

The same results are obtained when querying DisGeNET with the OMIM identifier for Schizophrenia (181500).

data4 <- disease2gene( disease  = "181500", vocabulary = "OMIM",
                          database = "CURATED",
                          score    = c(0.4,1 ) )
data4
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-gene 
##  . Database:     CURATED 
##  . Score:        0.4-1 
##  . Term:        181500 
##  . Results:  227

The same results are obtained when querying DisGeNET with the ICD9-CM identifier for Schizophrenia (295).

data4 <- disease2gene( disease  = "295", vocabulary = "ICD9CM",
                          database = "CURATED",
                          score    = c(0.4,1 ) )
data4
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-gene 
##  . Database:     CURATED 
##  . Score:        0.4-1 
##  . Term:        295 
##  . Results:  227

The same results are obtained when querying DisGeNET with the NCI identifier for Schizophrenia (C3362).

data4 <- disease2gene( disease  = "C3362", vocabulary = "NCI",
                          database = "CURATED",
                          score    = c(0.4,1 ) )
data4
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-gene 
##  . Database:     CURATED 
##  . Score:        0.4-1 
##  . Term:        C3362 
##  . Results:  227

The same results are obtained when querying DisGeNET with the DO identifier for Schizophrenia (5419).

data4 <- disease2gene( disease  = "HP:0100753", vocabulary = "HPO",
                          database = "CURATED",
                          score    = c(0.4,1 ) )
data4
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-gene 
##  . Database:     CURATED 
##  . Score:        0.4-1 
##  . Term:        HP:0100753 
##  . Results:  227

Visualizing the genes associated to a single disease

There are two options to visualize the results from searching a single disease: a Gene-Disease Network showing the genes related to the disease of interest (Figure 6), and a Disease-Protein Class Network with the genes grouped by Panther Protein Class (Figure 7).

Figure 6 shows the default Gene-Disease Network for Schizophrenia. As in the case of the gene2disease function, the blue nodes is the disease, the pink nodes are genes, and the width of the edges is proportional to the score of the association.

plot ( data3,
       prop = 10)
Figure 6: The **Gene-Disease Network** for genes associated to Schizophrenia

Figure 6: The Gene-Disease Network for genes associated to Schizophrenia

Alternatively, in the Disease-Protein Class Network, genes are grouped by the Panther Protein Class (Figure 7). This is a better choice when there is a large number of genes associated to the disease. This plot uses as class argument “ProteinClass.” The resulting network will show in blue the disease, and in green the Protein Classes of the genes associated to the disease. The node size is proportional to the number of genes in the Panther Protein Class. In the example, the largest proportion of the genes associated to Schizophrenia are receptors. Notice again that not all genes have annotations to Panther Protein classes (69 genes in Figure 7)

plot( data3,
      class="ProteinClass"
)
## [1] "warning: 86 gene(s) not shown in the plot"
Figure 7: The **Protein Class-Disease Network** for genes associated to Schizophrenia

Figure 7: The Protein Class-Disease Network for genes associated to Schizophrenia

Exploring the evidences associated to a disease

To explore the evidences supporting the associations for Schizophrenia using the function disease2evidence.

data3 <- disease2evidence( disease  = "C0036341",
                           type = "GDA",
                          database = "CURATED",
                          score    = c( 0.4,1 ) )
data3
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        evidence 
##  . Database:     CURATED 
##  . Score:        0.4-1 
##  . Term:        C0036341

Additionally, you can explore the evidences for a specific gene-disease pair by specifying the gene identifier using the argument gene.

data3 <- disease2evidence( disease  = "C0036341",
                           gene = c("DRD2", "DRD3"),
                           type = "GDA",
                          database = "CURATED",
                          score    = c( 0.4,1 ) )
data3
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        evidence 
##  . Database:     CURATED 
##  . Score:        0.4-1 
##  . Term:        C0036341
results <- extract(data3)

The results are shown in the table below.

Evidences supporting the association between C0036341 & DRD2,DRD3
pmid year sentence
10831489 2000 Subjective experience and striatal dopamine D(2) receptor occupancy in patients with schizophrenia stabilized by olanzapine or risperidone.
11245917 2001 Because dopamine D2 receptors are the primary targets for antipsychotic drugs, including clozapine and quetiapine, and because some studies have found D2 receptors to be elevated in schizophrenia, we examined the mRNA of three forms of the D2 receptor, particularly the new form of the dopamine D2 receptor, D2(Longer), in post-mortem brains from patients who died with schizophrenia.
18583979 2008 Systematic meta-analyses and field synopsis of genetic association studies in schizophrenia: the SzGene database.
21187413 2011 DRD2/AKT1 interaction on D2 c-AMP independent signaling, attentional processing, and response to olanzapine treatment in schizophrenia.
10523822 1999 Homozygosity at the dopamine DRD3 receptor gene in cocaine dependence.

Searching multiple diseases

The disease2gene function also accepts as input a list of diseases (as UMLS CUI, MeSH, OMIM, Disease Ontology, Orphanet, Dechipher, or ICD9CM Identifiers), the database (by default, CURATED), and optionally, a value range for the score. In the example, we have selected a list of 10 diseases. Table 3 shows the UMLS CUIs and the corresponding disease names.

Table 3: Disease list selected for illustrating the disease2gene multiple search

UMLS_CUI Disease_Name
C0036341 Schizophrenia
C0036341 Alzheimer’s Disease
C0030567 Parkinson Disease
C0005586 Bipolar Disorder

Creating the vector with the list of diseases.

diseasesOfInterest <- c("C0036341", "C0002395", "C0030567","C0005586")

In the example, we will search in CURATED data, using a score range of 0.4-1.

data5 <- disease2gene(
  disease = diseasesOfInterest,
  database = "CURATED",
  score =c(0.4,1),
  verbose  = TRUE )

Visualizing the genes associated to multiple diseases

The default plot of the results of querying DisGeNET with a list of diseases produces a Gene-Disease Network where the blue nodes are diseases, the pink nodes are genes, and the width of the edges is proportional to the score of the association (Figure 8).

plot( data5,
      class = "Network",
      prop = 10)
Figure 8: **The Gene-Disease Network** for genes associated to a list of diseases

Figure 8: The Gene-Disease Network for genes associated to a list of diseases

To visualize the results as a Gene-Disease Heatmap (Figure 9) change the argument class to “Heatmap.” In this plot, the scale of colors is proportional to the score of the GDA. The argument limit can be used to limit the number of rows to the top scoring GDAs when the results are large. By default, the plot shows the 50 highest scoring GDAs.

plot( data5,
      class="Heatmap",
      limit =30,
      cutoff=0.2)
## [1] "Dataframe of 440 rows has been reduced to 30 rows."
Figure 9: The **Gene-Disease Heatmap** for genes associated to a list of diseases

Figure 9: The Gene-Disease Heatmap for genes associated to a list of diseases

A third visualization option is a Protein Class-Disease Heatmap (Figure 10), in which genes are grouped by protein class. This plot is obtained by setting the class argument to “ProteinClass.” In this case, the color of the heatmap is proportional to the percentage of genes for each disease in each protein class. This heatmap displays the protein classes associated to each disease.

plot( data5,
      class="ProteinClass" )
## [1] "warning: 143 gene(s) not shown in the plot"
Figure 10: The **Protein Class-Disease** Heatmap for genes associated to a list of diseases

Figure 10: The Protein Class-Disease Heatmap for genes associated to a list of diseases

Retrieving Variant-Disease Associations from DisGeNET

Searching by variant

The variant2disease function receives a variant, or a list of variants as input, identified by the dbSNP identifier. It produces an object DataGeNET.DGN, with Type = "variant-disease".

data6 <- variant2disease( variant= "rs121913279",
                         database = "CURATED")
data6
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        variant-disease 
##  . Database:     CURATED 
##  . Score:        0-1 
##  . Term:        rs121913279 
##  . Results:  44

Visualizing the diseases associated to a single variant

The disgenet2r package offers several options to visualize the results of querying DisGeNET for a single variant: a Variant-Disease Network showing the diseases associated to the variant of interest, a Variant-Gene-Disease Network showing the genes, diseases, and variant of interest, and a network showing the MeSH Disease Classes of the diseases associated to the variant (Variant-Disease Class Network). These graphics can be obtained by changing the class argument in the plot function.

By default, the plot function produces a Variant-Disease Network on a DataGeNET.DGN object (Figure 11). In the Variant-Disease Network the blue nodes are diseases, the yellow nodes are variants, the blue nodes are diseases, and the width of the edges is proportional to the score of the association.

plot( data6, 
      class = "Network",
      prop  = 10)
Figure 11: The **Variant-Disease Network** for the variant rs121913279

Figure 11: The Variant-Disease Network for the variant rs121913279

The results can also be visualized in a network in which diseases are grouped by the MeSH Disease Class if the class argument is set to “DiseaseClass” (Figure 12). In the Variant-Disease Class Network, the node size of is proportional to the fraction of diseases in the disease class, with respect to the total number of diseases with disease class associated to the gene. In the example, the variant is associated mainly to Neoplasms.

plot( data6,
      class = "DiseaseClass",
      prop = 3)
## [1] "warning: 2 disease(s) not shown in the plot"
## [1] "warning: 2 disease(s) not shown in the plot"
Figure 12: The **Variant- Disease Class Network** for the variant rs121913279

Figure 12: The Variant- Disease Class Network for the variant rs121913279

Exploring the evidences associated to a variant

You can extract the evidences associated to a particular variant using the function variant2evidence. Additionally, you can explore the evidences for a specific variant-disease pair by specifying the argument disease.

## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        evidence 
##  . Database:     ALL 
##  . Score:        0-1 
##  . Term:        rs121913279

The results are shown in the table below.

Evidences supporting the association between C0009402 & rs121913279
pmid year sentence
23826249 2013 Collectively, our data suggest that PF-04691502 exhibits potent anticancer activity in colorectal cancer by targeting both PIK3CA (H1047R) mutant CSCs and their derivatives.
27405731 2016 We developed a sensitive, simple and rapid approach to detect the low-abundance PIK3CA (H1047R) mutation in real CRC specimens, providing an effective tool for guiding cancer targeted therapy.

Searching multiple variants

The variant2disease function retrieves the information in DisGeNET for a list of variants identified by the dbSNP identifier. The function also requires the user to specify the source database using the argument database. By default, variant2disease function uses as source database CURATED.

data7 <- variant2disease(
         variant = c("rs121913013", "rs1060500621",
              "rs199472709", "rs72552293",
              "rs74315445", "rs199472795"),
         database = "ALL")

Visualizing the diseases associated to multiple variants

The results of querying DisGeNET with a list of variants can be visualized as a Variant-Disease Network (Figure 13), as a Variant-Gene-Disease Network (Figure 15) and as heatmap (Variant-Disease Class Heatmap, Figure 16). Below, the Variant-Disease Network.

plot( data7,
      class = "Network",
      prop = 10)
Figure 13: The **Variant-Disease Network** for a list of variants

Figure 13: The Variant-Disease Network for a list of variants

To obtain the Variant-Gene-Disease Network, Figure 14), change the showGenes argument to “TRUE.”

plot( data7,
      class = "Network",
      showGenes = T)
Figure 14: The **Variant-Gene-Disease Network** for a list of variants

Figure 14: The Variant-Gene-Disease Network for a list of variants

The results of querying DisGeNET variant information with a list of diseases can be visualized as a Variant-Disease Heatmap by changing the class argument to Heatmap (Figure 15).

plot( data7,
      class = "Heatmap",
      prop = 10)
Figure 15: The **Variant-Disease Heatmap** for a list of variants

Figure 15: The Variant-Disease Heatmap for a list of variants

The results of querying DisGeNET variant information with a list of diseases can also be visualized as a Variant-Disease Class Network.

plot( data7,
      class = "DiseaseClass")
Figure 16: The **Variant-Disease Class Heatmap** for a list of variants

Figure 16: The Variant-Disease Class Heatmap for a list of variants

Searching by disease

The disease2variant function allows to retrieve the variants associated to a disease, or a list of diseases. The function uses as input the disease, or list of diseases of interest (as UMLS CUI, MeSH, OMIM, Disease Ontology, ICD9CM, NCIt, EFO, or Orphanet Identifiers), the disease vocabulary employed (OMIM (OMIM), MESH (MeSH),ICD9CM (ICD9-CM), DO (Disease Ontology), NCI (NCI thesaurus), ORDO (Orphanet), or EFO (EFO) and the database (by default, CURATED). A threshold value for the score can be set, like in the gene2disease function.

data8 <- disease2variant(disease = c("C0752166"),
                       database = "CLINVAR", 
                       score = c(0, 1) )
data8
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-variant 
##  . Database:     CLINVAR 
##  . Score:        0-1 
##  . Term:        C0752166 
##  . Results:  219

Visualizing the variants associated to a single disease

The results of querying DisGeNET variant information with a list of diseases can be visualized as a Variant-Disease Network (Figure 17).

plot( data8,     
      class = "Network")
Figure 17: The **Variant-Disease Network** for a single disease

Figure 17: The Variant-Disease Network for a single disease

Explore the evidences associated to a single disease

To explore the evidences supporting the VDAs for Schizophrenia, run the disease2evidence function. You can use the argument variant to inspect the evidences for a particular variant and Schizophrenia. You can also specify a year range for the publications

data8 <- disease2evidence( disease  = "C0036341",
                               variant = "rs1344706",
                           type = "VDA",
                           year = c(2019, 2020),
                          database = "ALL",
                          score    = c( 0.1,1 ) )
data8
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        evidence 
##  . Database:     ALL 
##  . Score:        0.1-1 
##  . Term:        C0036341
Evidences supporting associations between C0036341 & rs1344706
pmid year sentence
5 32544854 2020 ZNF804A has now been recognized as a schizophrenia risk gene by multiple genome-wide association studies with its intronic polymorphism rs1344706 being reported as the first genome-wide significant risk variant for schizophrenia.
1 30079586 2019 CACNA1C-rs1006737 and ZNF804A-rs1344706 polymorphisms are among the most robustly associated with schizophrenia (SCZ) and bipolar disorder (BD), and recently with brain phenotypes.
2 30852803 2019 ZNF804A rs1344706 has been identified as one of the risk genes for schizophrenia.
3 30962687 2019 rs1344706 risk allele carriers of schizophrenia had increased gray matter in the brain regions including frontal lobe, temporal lobe, and other brain regions, but the carriers of healthy individuals had decreased gray matter and white matter integrity in the frontal lobe, central network, and other brain regions.
4 31122238 2019 Previous studies indicated that single-nucleotide polymorphism (SNP) rs1344706 in ZNF804A was strongly associated with schizophrenia and might influence social interaction.

If you want to inspect the evidences for Schizophrenia, and all the variants in a particular gene, use the argument gene.

data8 <- disease2evidence( disease  = "C0036341",
                               gene = "CACNA1C",
                           type = "VDA",
                          database = "BEFREE",
                          score    = c( 0.1,1 ) )
data8
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        evidence 
##  . Database:     BEFREE 
##  . Score:        0.1-1 
##  . Term:        C0036341
results <- extract(data8)
Selection of evidences supporting associations between C0036341 & CACNA1C
pmid year sentence
24 30607529 2020 Both CACNA1C (rs1006737) and KCNH2 (rs3800779) were genotyped in 101 controls and 50 schizophrenia patients.
26 32770953 2020 However, the influence model for rs1006737 on schizophrenia in Asians and Europeans demonstrated both similarities and differences between the two ancestors.
29 32770953 2020 Rs1006737, rs2007044, and rs4765905 of the CACNA1C gene were associated with susceptibility to schizophrenia.
33 32770953 2020 Rs1006737, rs2007044, and rs4765905 of the CACNA1C gene were associated with susceptibility to schizophrenia.
23 30079586 2019 CACNA1C-rs1006737 and ZNF804A-rs1344706 polymorphisms are among the most robustly associated with schizophrenia (SCZ) and bipolar disorder (BD), and recently with brain phenotypes.

Searching multiple diseases

data8 <- disease2variant(
              disease = c("C3150943",  "C1859062", "C2678485", "C4015695"),
              database = "CURATED", 
              score = c(0.75, 1) )
data8
## Object of class 'DataGeNET.DGN'
##  . Search:      list 
##  . Type:        disease-variant 
##  . Database:     CURATED 
##  . Score:        0.75-1 
##  . Term:       C3150943 ... C4015695 
##  . Results:  67

Visualizing the variants associated to multiple diseases

The results of querying DisGeNET variant information with a list of diseases can be visualized as a Variant-Disease Network, or as a Variant-Disease Heatmap, Figure 18), by changing the class argument from “Network” to “Heatmap.”

plot( data8,     
      class = "Network")
Figure 18: The **Variant-Disease Network** for a list of diseases

Figure 18: The Variant-Disease Network for a list of diseases

The Variant-Disease Network can be displayed as a Variant-Disease-Gene Network, by setting the showGenes parameter to TRUE.

plot( data8, 
      class = "Network",
      showGenes = T)
Figure 19: The **Variant-Gene-Disease Network** for a list of diseases

Figure 19: The Variant-Gene-Disease Network for a list of diseases

The results can be visualized as a Heatmap.

plot( data8,
      class = "Heatmap",
      prop = 10)
## [1] "warning: dataframe of 67 rows has been reduced to 50 rows."
Figure 20: The **Variant-Disease Heatmap** for a list of diseases

Figure 20: The Variant-Disease Heatmap for a list of diseases

Retrieving Disease-Disease Associations from DisGeNET

The disgenet2r package also allows to obtain a list of diseases that share genes or variants with a particular disease, or disease list (disease-disease associations, or DDAs).

Searching DDAs by genes for a single disease

To obtain disease-disease associations, use the disease2disease_by_gene function. This function uses as input a disease, in the same format that in disease2gene, the database to perform the search (by default, CURATED), and the argument ndiseases, the number of diseases that share genes with the disease(s) of interest. By default ndiseases = 10. The output is a DataGeNET.DGN object that contains the top 10 diseases that share genes with the disease that has been searched.

data9 <- disease2disease_by_gene(
  disease  = "C0010674",
  database = "CURATED", ndiseases = 5 )
data9
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-disease-gene 
##  . Database:     CURATED 
##  . Score:         
##  . Term:        C0010674 
##  . Results:  5
qr <- extract(data9)
head(qr[c("disease1_name", "disease2_name","jaccard_genes","ngenes", "pvalue")])
##     disease1_name                                            disease2_name jaccard_genes ngenes pvalue
## 1 Cystic Fibrosis                                Pulmonary Cystic Fibrosis         0.375      9      0
## 2 Cystic Fibrosis                          Fibrocystic Disease of Pancreas         0.375      9      0
## 3 Cystic Fibrosis BRONCHIECTASIS WITH OR WITHOUT ELEVATED SWEAT CHLORIDE 1         0.292      7      0
## 4 Cystic Fibrosis             Congenital bilateral aplasia of vas deferens         0.280      7      0
## 5 Cystic Fibrosis                                  Hereditary pancreatitis         0.200      6      0

The DataGeNET.DGN object produced by the disease2disease_by_gene function also contains the Jaccard Index, also known as the Jaccard similarity coefficient for each disease pair. The Jaccard Coefficient is a similarity metric, computed as as the size of the intersection divided by the size of the union of two sample sets, in this case, the genes associates to each disease:

\[\begin{equation*} J(A, B) = \frac{\mid A \cap B \mid}{\mid A \cup B \mid} \end{equation*}\]

The pvalue column estimates the significance of the Jaccard coefficient for a list of disease pairs. The estimation of the pvalue is estimated by randomizing 1000 times.

Visualizing the diseases associated to a single disease

The plot function applied to the DataGeNET.DGN object generated by the disease2disease_by_gene function results in a Disease-Disease Network, where the node in dark blue is the disease of interest and nodes in ligth blue are the diseases that share genes with it (Figure 21). The node size is proportional to the number of genes associated to each disease.

plot( data9, 
      class = "Network",
      layout="layout.lgl" ,
      prop = 0.4 )
Figure 21: The **Disease-Disease Network** by shared genes for MELAS Syndrome

Figure 21: The Disease-Disease Network by shared genes for MELAS Syndrome

To obtain the fully connected network, set the class argument of the plot function to Diseasome(Figure 22). In this network, the nodes are the diseases of interest, and the node size is proportional to the number of genes associated with them, and the edges size is proportional to the number of genes that are shared between the diseases they are connecting. The nodes in dark blue are the query diseases, and in light blue, the diseases that share genes with them.

plot( data9, 
      class = "Diseasome",
      layout="layout.lgl" ,
      prop = 5 )
Figure 22: The **Diseasome plot** for MELAS Syndrome

Figure 22: The Diseasome plot for MELAS Syndrome

A second option to visualize the results of disease2disease_by_gene is as a Disease-Disease Barplot (Figure 23), obtained by setting the class argument to Barplot. The Disease-Disease Barplot shows the top n diseases that share genes with the disease of interest. This barplot also provides the jaccard index for each disease pair.

plot( data9,
      class="Barplot")
Figure 23: **Disease-Disease Barplot** visualizing the top 5 diseases associated to MELAS Syndrome

Figure 23: Disease-Disease Barplot visualizing the top 5 diseases associated to MELAS Syndrome

Searching DDAs via genes for multiple diseases

Visualizing the diseases associated to multiple diseases

The function disease2disease_by_gene can receive a list of diseases as input, in any of the previously described vocabularies. It will retrieve the top 10 diseases that share genes with each of the diseases in the input list. This results can be visualized as a Disease-Disease Network, a Diseasome plot, or as a Barplot.

Table 4: Disease list selected for illustrating the disease2disease_by_gene function

UMLS_CUI Disease_Name
C0162671 MELAS Syndrome
C0023264 Leigh Disease
C0917796 Optic Atrophy, Hereditary, Leber
diseasesOfInterest <-  c("C0013182", "C0013221", "C3658290", "C0860207", "C1274933")
data10 <- disease2disease_by_gene(
  disease = diseasesOfInterest,
  database = "CURATED", ndiseases =  5)

To obtain the network, set the class argument of the plot function to Network(Figure 24). In this network, the nodes are the diseases of interest, and the node size is proportional to the number of genes associated with them. On the other hand, the edges size is proportional to the number of genes that are shared between the diseases they are connecting.

plot( data10,
      class = "Network",
      prop  = 0.1)
Figure 24: The **Disease-Disease Network** by shared genes for a list of diseases

Figure 24: The Disease-Disease Network by shared genes for a list of diseases

To obtain the fully connected network, set the class argument of the plot function to Diseasome(Figure 25). In this network, the nodes are the diseases of interest, and the node size is proportional to the number of genes associated with them. On the other hand, the edges size is proportional to the number of genes that are shared between the diseases they are connecting.

plot( data10,
      class = "Diseasome",
      prop  = 0.1)
Figure 25: The **Diseasome plot** for a list of diseases

Figure 25: The Diseasome plot for a list of diseases

To further inspect the relationship between a pair of diseases, create a Venn Diagram by setting the class argument of the plot function to Venn (Figure 26).

plot( data10,
      class="Venn")
Figure 26: The **Venn Diagram** for 3 diseases

Figure 26: The Venn Diagram for 3 diseases

The Venn diagram for 4 diseases

data11 <- disease2disease_by_gene(
  disease =  c("C0018801", "C0028754", "C0004153", "C0011849"),
  database = "CURATED",
  ndiseases = 5)
plot( data11,
      class="Venn"
)
Figure 27: The **Venn Diagram** for a list of 4 diseases

Figure 27: The Venn Diagram for a list of 4 diseases

The Venn diagram for 5 diseases

data11 <- disease2disease_by_gene(
  disease =  c("C0018801", "C0028754", "C0004153", "C0011849", "C0021368"),
  database = "CURATED",
  ndiseases = 5)
plot( data11,
      class="Venn"
)
Figure 28: The **Venn Diagram** for a list of 5 diseases

Figure 28: The Venn Diagram for a list of 5 diseases

Searching DDAs via shared variants for a single disease

To obtain disease-disease associations, use the disease2disease_by_variant function. This function uses as input a disease, in the same format that in disease2disease_by_gene, the database to perform the search (by default, CURATED), and the argument ndiseases, the number of diseases that share genes with the disease(s) of interest. By default ndiseases = 10. The output is a DataGeNET.DGN object that contains the top 10 diseases that share genes with the disease that has been searched.

data12 <- disease2disease_by_variant(
  disease  = "C0011860",
  database = "CURATED", ndiseases = 3 )
## A total of 168051 variants obtained from DisGeNET database CURATED are
##                  being used for the bootstrap process
## Pvalue estimation
data12
## Object of class 'DataGeNET.DGN'
##  . Search:      single 
##  . Type:        disease-disease-variant 
##  . Database:     CURATED 
##  . Score:         
##  . Term:        C0011860 
##  . Results:  3
plot( data12, layout="layout.lgl" ,
      prop = 0.2 )
Figure 29: An example of **Disease-Disease Network**

Figure 29: An example of Disease-Disease Network

data13 <- disease2disease_by_variant(
  disease  = c("C0018801", "C0020538"),
  database = "CURATED", ndiseases = 5 )

data13
plot( data13, layout="layout.lgl" ,
      class = "Diseasome",
      prop = 0.1)

Performing a disease enrichment

The disease_enrichment function receives a list of genes and performs an enrichment analysis over the diseases in DisGeNET. The input list of genes should be identified with HGNC symbols, or Entrez Gene Identifiers. The vocabulary should be specified using the parameter vocabulary. By default, vocabulary = "HGNC". The function has other optional argument: the source database (by default, database = “CURATED”). The background for the Fisher test are all the genes in DisGeNET. The p-values resulting from the multiple Fisher tests are corrected for false discovery rate using the Benjamini-Hochberg method.

To perform the enrichment, run:

list_of_genes <- disease2gene(disease = "C0004352", database = "GENOMICS_ENGLAND")

list_of_genes <- list_of_genes@qresult$gene_symbol

res_enrich <-disease_enrichment( entities =list_of_genes, vocabulary = "HGNC",
                             database = "CURATED" )

table1 <- res_enrich@qresult[1:10, c("Description", "FDR", "Ratio",  "BgRatio")]

The column Ratio represents the number of genes from the input list that are associated to the disease. The BgRatio are the number of genes associated to the disease from the length of the universe, in the example, all genes in DisGeNET curated (9703).

Description FDR Ratio BgRatio
Autistic Disorder 0.0000000 28/28 298/10654
Intellectual Disability 0.0000000 20/28 649/10654
Global developmental delay 0.0000000 11/28 338/10654
Neurodevelopmental Disorders 0.0000000 8/28 114/10654
Abnormal behavior 0.0000006 5/28 30/10654
Seizures 0.0000077 9/28 357/10654
Cardiovascular Abnormalities 0.0003252 3/28 17/10654
Autism Spectrum Disorders 0.0003252 5/28 114/10654
Epilepsy 0.0004925 5/28 131/10654
Abnormality of the urinary system 0.0004925 2/28 3/10654

To visualize the results of the enrichment, use the function plot. Use the argument cutoff to set a minimum p value threshold, and the argument limit to reduce the number of records shown. By default, limit=50. Use the argument nchars to control the lenght of the disease name.

plot( res_enrich, class = "Enrichment", count =3,  cutoff= 0.05, nchars=70)
Figure : The **Enrichment plot** for a list of genes

Figure : The Enrichment plot for a list of genes

res_enrich <-disease_enrichment( entities =list_of_genes, vocabulary = "HGNC",
                             database = "ALL" )

table1 <- res_enrich@qresult[1:10, c("Description", "FDR", "Ratio",  "BgRatio")]
Description FDR Ratio BgRatio
Autistic Disorder 0.0e+00 28/28 1119/22929
Autism Spectrum Disorders 0.0e+00 22/28 1296/22929
Intellectual Disability 0.0e+00 23/28 2299/22929
Neurodevelopmental Disorders 0.0e+00 15/28 556/22929
Pervasive Development Disorder 0.0e+00 14/28 545/22929
Autistic behavior 0.0e+00 12/28 375/22929
Global developmental delay 0.0e+00 19/28 1958/22929
Profound Mental Retardation 3.0e-07 8/28 297/22929
Seizures 1.8e-06 15/28 2331/22929
Dysmorphic facies 1.8e-06 8/28 378/22929

A disease_enrichment for a list of variants can also be performed. The input list of variants should be identified with dbSNP identifiers and the vocabulary should be specified using the parameter vocabulary = "DBSNP". The function has other optional argument: the source database (by default, database = “CURATED”). The background for the Fisher test are all the genes in DisGeNET. The p-values resulting from the multiple Fisher tests are corrected for false discovery rate using the Benjamini-Hochberg method.

To perform the enrichment, run:

list_of_variants <- disease2variant(disease = "C0004352", database = "CLINVAR")

list_of_variants <- as.character(list_of_variants@qresult$variantid)

res_enrich <-disease_enrichment( entities =list_of_variants, vocabulary = "DBSNP",
                             database = "CURATED" )

table1 <- res_enrich@qresult[1:10, c("Description", "FDR", "Ratio",  "BgRatio")]
Description FDR Ratio BgRatio
Autistic Disorder 0e+00 33/33 151/183851
Seizures 0e+00 11/33 255/183851
Global developmental delay 0e+00 10/33 287/183851
Delayed speech and language development 0e+00 6/33 52/183851
Abnormality of vision 0e+00 3/33 5/183851
Coloboma of iris 0e+00 3/33 6/183851
Cryptorchidism 1e-07 3/33 19/183851
Motor delay 1e-07 3/33 18/183851
Uranostaphyloschisis 1e-07 3/33 19/183851
Impaired cognition 1e-07 4/33 122/183851

To visualize the results of the enrichment, use the function plot. Use the argument cutoff to set a minimum p value threshold, and the argument limit to reduce the number of records shown. By default, limit=50.

plot( res_enrich, class = "Enrichment", count =6,  cutoff= 0.05)
Figure : The **Enrichment plot** for a list of genes

Figure : The Enrichment plot for a list of genes

Expanding DisGeNET data with other resources available in the LOD cloud

Currently, Wikipathways-RDF has issues and these federated queries do not work!

Retrieving the pathways associated to a disease

To obtain the pathways from WikiPathways associated to a disease, use the disease2pathway function.

dis2path <- disease2pathway( disease = "C0018801" ,
                             database = "ALL", score = c(0.5, 1))
dis2path
# head(dis2path@qresult)
# qr <- extract(dis2path)
# head(qr, 3)

Retrieving the diseases associated to a given pathway

To obtain the pathways from WikiPathways associated to a disease, use the disease2pathway function.

path2dis <- pathway2disease( pathway = "WP1591" ,
                             database = "CURATED",
                             score = c(0.9, 1))
path2dis
head(path2dis@qresult)
qr <- extract(path2dis)
head(qr, 3)

Retrieve the drug targets for a disease

The function disease2compound allows retrieved the chemicals in ChEMBL that target the genes associated to the given disease, or list of diseases. The function produces a DataGeNET.RDF object with the input disease(s), the associated genes, the DisGeNET score, and the chemicals associated to the disease proteins. If there were no chemicals found, the genes are still included in the output. It is important to higlight that because of ChEMBL restrictions for their SPARQL endpoint if 1000 compounds are retrieved for a single protein it probably means that there are more than 1000. (See more information about ChEMBL SPARQL endpoint at https://www.ebi.ac.uk/rdf/what-are-limitations-sparql-endpoints).

library("SPARQL")
dis2com <- disease2compound(
  disease = c("C0020538"),
  database = "CURATED" )
dis2com
qr <- extract(dis2com)
head(qr, 3)

Retrieving Gene Ontology data for a gene

library("SPARQL")
gene2bp <- gene2biologicalprocess(gene = "351")
head(gene2bp@qresult)
gene2cc <- gene2cellcomponent(gene = "351")
head(gene2cc@qresult)
gene2mf <- gene2molecularfunction(gene = "351")
head(gene2mf@qresult)
gene2indi <- gene2indication( gene = "1588" )
head(gene2indi@qresult)

Retrieving Gene Ontology data for a disease

The functions disease2biologicalprocess, disease2molecularfunction, and disease2cellcomponent can retrieve the Gene Ontology (GO) molecular function, GO biological process and GO cellular component of the genes associated to a disease or a list of diseases.

Retrieving the biological processes associated to a disease

For a disease, the biological process information can be retrieved in GO terms using the function disease2biologicalprocess that receives as input an NCBI gene identfier. The function produces a DataGeNET.RDF object containing the disease, the genes and their GOBP terms.

disease2bp <- disease2biologicalprocess(
  disease = "C0036341",
  database = "CURATED",
  score = c( 0.6,1)
)
disease2bp
disease2bp <- extract(disease2bp)
head(disease2bp[c("gene_product_label","go_label")])

Retrieving the molecular functions of the genes associated to a disease}

For a particular disease, the molecular activities of associated genes can be retrieved as GO terms using the function disease2molecularfunction. The function produces a DataGeNET.RDF object containing the disease, the genes and their GOMF terms.

disease2mf <- disease2molecularfunction(
  disease = "C0002395",
  database = "UNIPROT"
)

disease2mf
disease2mf <- extract(disease2mf)
head(disease2mf[c("gene_product_label", "go_label")])

Retrieving the molecular functions of the genes associated to a disease

For a given gene, the cell component information (i.e. where gene products are active) can be retrieved as GO terms using the function gene2cellcomponent that receives as input an NCBI gene identfier. The function produces a DataGeNET.RDF object containing the disease, the genes and their GO CC terms.

disease2cc <- disease2cellcomponent(  disease = "C0002395",
                                      database = "MGD")

disease2cc <- extract(disease2cc)
head(disease2cc[c("gene_product_label", "go_label")])

Get DisGeNET data version

get_disgenet_version()
## [1] "{ api_version : 1.2.0 , database_version : DisGeNET Database v7 , lastUpdate : May 2021 , comments : - Added GDA and VDA evidences endpoints.- Added the disease similarity endpoint.- Added the disease mapping by name endpoint. }"

How to cite disgenet2r

Piñero, J., Ramírez-Anguita, J. M., Saüch-Pitarch, J., Ronzano, F., Centeno, E., Sanz, F., & Furlong, L. I. (2020). The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Research, 48(D1), D845-D855.

References

Bravo, J. Piñero, N. Queralt-Rosinach, M. Rautschka, and L. I. Furlong. 2015. Extraction of relations between genes and diseases from text and large-scale data analysis: Implications for translational research.” BMC Bioinformatics 16 (1). https://doi.org/10.1186/s12859-015-0472-9.
Bundschus, Markus, Mathaeus Dejori, Martin Stetter, Volker Tresp, and Hans-Peter Kriegel. 2008. Extraction of semantic biomedical relations from text using conditional random fields.” BMC Bioinformatics 9 (1): 207. https://doi.org/10.1186/1471-2105-9-207.
Piñero, J., A. Bravo, N. Queralt-Rosinach, A. Gutiérrez-Sacristán, J. Deu-Pons, E. Centeno, J. García-García, F. Sanz, and L. I. Furlong. 2017. DisGeNET: A comprehensive platform integrating information on human disease-associated genes and variants.” Nucleic Acids Research 45 (D1). https://doi.org/10.1093/nar/gkw943.
Piñero, J., N. Queralt-Rosinach, À. Bravo, J. Deu-Pons, A. Bauer-Mehren, M. Baron, F. Sanz, and L. I. Furlong. 2015. DisGeNET: A discovery platform for the dynamical exploration of human diseases and their genes.” Database 2015. https://doi.org/10.1093/database/bav028.
Piñero, Janet, Juan Manuel Ramírez-Anguita, Josep Saüch-Pitarch, Francesco Ronzano, Emilio Centeno, Ferran Sanz, and Laura I Furlong. 2019. The DisGeNET knowledge platform for disease genomics: 2019 update.” Nucleic Acids Research, November. https://doi.org/10.1093/nar/gkz1021.
Queralt-Rosinach, N., J. Piñero, À. Bravo, F. Sanz, and L. I. Furlong. 2016. DisGeNET-RDF: Harnessing the innovative power of the Semantic Web to explore the genetic basis of diseases.” Bioinformatics 32 (14). https://doi.org/10.1093/bioinformatics/btw214.