The disgenet2r package

The disgenet2r package contains a set of functions to retrieve, visualize and expand DisGeNET data. Its functions allow filtering the information using several DisGeNET metrics (for example, the score range or the source database of the data). The package offers different types of plots (heatmaps, Venn diagrams, networks) to visualize the data.

Installation

The package disgenet2r is available through Bitbucket. The package requires an R version > 3.5. Additionally, the following packages are needed: VennDiagram, stringr, tidyr, SPARQL, RCurl, igraph, ggplot2, and reshape2.

Install disgenet2r by typing in R:

library(devtools)

install_bitbucket("ibi_group/disgenet2r")
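Before installing, it can help to check which of the packages listed above are already available. The following is a minimal sketch in base R only; the helper name missing_packages is hypothetical and is not part of disgenet2r, and devtools is included because install_bitbucket() comes from it.

```r
# Dependencies named in the Installation section, plus devtools.
required <- c("devtools", "VennDiagram", "stringr", "tidyr", "SPARQL",
              "RCurl", "igraph", "ggplot2", "reshape2")

# Hypothetical helper: returns the subset of `pkgs` whose namespace cannot be loaded.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
}
```

Any package reported as missing would need to be installed before loading disgenet2r.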

To load the package:

library(disgenet2r)

Retrieving GDAs

To retrieve the diseases associated with a list of genes, use the following function:

results <- gene2disease( gene = c( "KCNE1", "KCNE2", "KCNH1", "KCNH2", "KCNG1"), verbose = TRUE)

To retrieve the genes associated with a list of diseases, use the following function:

results <- disease2gene( disease = c("C0036341", "C0002395", "C0030567","C0005586"), database = "CURATED", verbose = TRUE )

Retrieving VDAs

To retrieve the diseases associated with a list of variants, use the following function:

results <- variant2disease( variant = "rs121913279", database = "CURATED" )

To retrieve the variants associated with a list of diseases, use the following function:

results <- disease2variant( disease = c("C3150943", "C1859062", "C2678485", "C4015695"), database = "CURATED", score = c(0.75, 1) )
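To illustrate what a score range such as c(0.75, 1) selects, here is a minimal sketch on a made-up table. It assumes the two values are a lower and an upper bound and that both are inclusive; the data frame, its column names and the placeholder variant identifiers are hypothetical and do not come from disgenet2r.

```r
# Hypothetical result table: column names and values are made up for illustration.
vda <- data.frame(
  variant = c("rsA", "rsB", "rsC", "rsD"),
  disease = c("C3150943", "C1859062", "C2678485", "C4015695"),
  score   = c(0.40, 0.75, 0.90, 1.00)
)

score_range <- c(0.75, 1)

# Keep rows whose score lies within the range (bounds assumed inclusive).
kept <- vda[vda$score >= score_range[1] & vda$score <= score_range[2], ]
```

Under these assumptions, only the row with score 0.40 is dropped.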

Performing a Disease Enrichment

The disease_enrichment function receives a list of genes and performs an enrichment analysis over the diseases in DisGeNET. The genes in the input list should be identified by HGNC symbols or Entrez Gene identifiers, and the identifier type should be specified with the parameter vocabulary (by default, vocabulary = "HGNC"). The function has other optional arguments: the source database (by default, database = "CURATED") and a universe for the Fisher test. Three predefined gene sets can be used as the universe, and a custom gene list can be supplied instead:

  • DISGENET - All genes in DisGeNET
  • HUMAN - All genes according to the NCBI
  • HUMAN_CODING - All protein-coding genes according to the NCBI
  • CUSTOM - A list of genes supplied by the user

If no universe is supplied, by default the function will use all the genes in DisGeNET.

The p-values resulting from the multiple Fisher tests are corrected for false discovery rate using the Benjamini-Hochberg method.
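The idea behind the enrichment can be sketched in base R with fisher.test and p.adjust. This is only an illustration under stated assumptions, not the package's implementation: the gene and disease identifiers are made up, the helper enrich_one is hypothetical, and a one-sided test for over-representation is assumed.

```r
# Toy data: all identifiers below are made up for illustration.
universe <- paste0("G", 1:100)        # background gene set
query    <- paste0("G", 1:10)         # the user's list of genes
diseases <- list(
  D1 = paste0("G", 1:8),              # overlaps the query heavily
  D2 = paste0("G", c(5, 20:40)),      # overlaps the query by one gene
  D3 = paste0("G", 50:70)             # no overlap with the query
)

# Hypothetical helper: one-sided Fisher's exact test for one disease gene set.
enrich_one <- function(disease_genes, query, universe) {
  in_query   <- universe %in% query
  in_disease <- universe %in% disease_genes
  # 2x2 contingency table over the universe, with TRUE/TRUE in the first cell.
  tab <- table(factor(in_query, levels = c(TRUE, FALSE)),
               factor(in_disease, levels = c(TRUE, FALSE)))
  fisher.test(tab, alternative = "greater")$p.value
}

# One test per disease, then Benjamini-Hochberg correction across all tests.
p_values <- vapply(diseases, enrich_one, numeric(1),
                   query = query, universe = universe)
fdr <- p.adjust(p_values, method = "BH")

toy_result <- data.frame(disease = names(diseases), p_value = p_values, fdr = fdr)
```

The choice of universe changes the contingency table and therefore the p-values, which is why the function lets it be set explicitly.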

To perform the enrichment, run:

res_enrich <- disease_enrichment( entities = list_of_genes, vocabulary = "HGNC", database = "CURATED", universe = "HUMAN_CODING" )

More info here

COPYRIGHT

Copyright (C) 2020 IBI group.

disgenet2r is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

disgenet2r is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

How to cite disgenet2r

Piñero, J., Ramírez-Anguita, J. M., Saüch-Pitarch, J., Ronzano, F., Centeno, E., Sanz, F., & Furlong, L. I. (2020). The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Research, 48(D1), D845-D855.