HPG BigData project

The HPG BigData project aims to provide tools for processing genomic big data in a Hadoop cluster.

License	Apache License, Version 2
Categories	Data
GroupId	org.opencb.hpg-bigdata
ArtifactId	hpg-bigdata
Last Version	1.0.0-beta4
Release Date	Nov 21, 2017
Type	pom
Description	HPG BigData project aims to provide tools for processing genomic big data in a Hadoop cluster
Project URL	https://github.com/opencb/hpg-bigdata
Source Code Management	https://github.com/opencb/cellbase

Download hpg-bigdata

Filename	Size
hpg-bigdata-1.0.0-beta4.pom	12 KB

How to add to project

Apache Maven

<!-- https://jarcasting.com/artifacts/org.opencb.hpg-bigdata/hpg-bigdata/ -->
<dependency>
    <groupId>org.opencb.hpg-bigdata</groupId>
    <artifactId>hpg-bigdata</artifactId>
    <version>1.0.0-beta4</version>
    <type>pom</type>
</dependency>

Gradle Groovy

// https://jarcasting.com/artifacts/org.opencb.hpg-bigdata/hpg-bigdata/
implementation 'org.opencb.hpg-bigdata:hpg-bigdata:1.0.0-beta4'

Gradle Kotlin

// https://jarcasting.com/artifacts/org.opencb.hpg-bigdata/hpg-bigdata/
implementation ("org.opencb.hpg-bigdata:hpg-bigdata:1.0.0-beta4")

Apache Buildr

'org.opencb.hpg-bigdata:hpg-bigdata:pom:1.0.0-beta4'

Apache Ivy

<dependency org="org.opencb.hpg-bigdata" name="hpg-bigdata" rev="1.0.0-beta4">
  <artifact name="hpg-bigdata" type="pom" />
</dependency>

Groovy Grape

@Grapes(
@Grab(group='org.opencb.hpg-bigdata', module='hpg-bigdata', version='1.0.0-beta4')
)

Scala SBT

libraryDependencies += "org.opencb.hpg-bigdata" % "hpg-bigdata" % "1.0.0-beta4"

Leiningen

[org.opencb.hpg-bigdata/hpg-bigdata "1.0.0-beta4"]

Dependencies

There are no dependencies for this project. It is a standalone project that does not depend on any other jars.

Project Modules

hpg-bigdata-app
hpg-bigdata-core
hpg-bigdata-analysis

Description
Welcome to CellBase!

Overview

In recent years, advances in high-throughput technologies in biology have produced unprecedented growth in the repositories and databases storing relevant biological data. Today there is more biological information than ever, but the current state of many of these repositories is often far from optimal. Some of the most common problems are: a) information is spread across many small repositories and databases, b) lack of standards between repositories, c) unsupported databases, d) specific and unconnected information, etc.

All these problems make it very difficult: a) to integrate or join many different sources into a single database for working on or analyzing experiments; b) to access and query this information programmatically.

To cope with these problems we have designed and developed a NoSQL database that integrates the most relevant biological information about genomic features and proteins, gene expression regulation, functional annotation, genomic variation and systems biology. We use the most relevant repositories such as Ensembl, UniProt, ClinVar, COSMIC or IntAct among many others (you can browse them in Data sources and species). The integrated information covers:

Core features: genes, transcripts, exons, proteins, genome sequence, etc.
Regulatory: Ensembl regulatory, TFBS, miRNA targets, CTCF, Open chromatin, etc.
Functional annotation: OBO ontologies (Gene Ontology, Human Disease Ontology), etc.
Genomic variation: Ensembl Variation, ClinVar, COSMIC, etc.
Systems biology: IntAct, Reactome, gene co-expression, etc.

To make the entire database accessible to researchers, an exhaustive RESTful Web service API has been implemented. This API contains many methods that let researchers query and obtain different biological information from a single database, saving a lot of time. Another benefit is that researchers can easily make queries about different biological topics and link the results together, since all the information is integrated.

Currently Homo sapiens, Mus musculus and 20 other species are available, and many others will be included soon. Results are offered in JSON format, making all this information accessible to both software and web applications.
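
As an illustration of how a client might use a RESTful API that returns JSON, here is a minimal Python sketch. It only builds a query URL and parses a sample response body; it makes no network call. The host name, the version string, the path layout (version/species/category/subcategory/ids/resource) and the response envelope with "response" and "result" keys are all assumptions made for this example and are not taken from this page, so check the CellBase documentation for the real endpoints.

```python
import json
from urllib.parse import quote, urlencode

# Hypothetical base URL: placeholder host, not a real CellBase server.
BASE_URL = "https://cellbase.example.org/webservices/rest"


def build_query_url(species, category, subcategory, ids, resource,
                    version="v4", **params):
    """Compose a URL of the assumed form base/version/species/category/subcategory/ids/resource."""
    # Several identifiers are joined with commas; each one is percent-encoded.
    id_part = ",".join(quote(i, safe="") for i in ids)
    path = "/".join([BASE_URL, version, species, category,
                     subcategory, id_part, resource])
    # Sort the query parameters so the resulting URL is deterministic.
    query = urlencode(sorted(params.items()))
    return f"{path}?{query}" if query else path


def extract_results(payload):
    """Parse a JSON body and flatten the records found under the assumed envelope."""
    doc = json.loads(payload)
    return [record
            for block in doc.get("response", [])
            for record in block.get("result", [])]


url = build_query_url("hsapiens", "feature", "gene", ["BRCA2"], "info", limit=1)

# Hand-written sample body used in place of a real server response.
sample = '{"response": [{"id": "BRCA2", "result": [{"name": "BRCA2", "chromosome": "13"}]}]}'
genes = extract_results(sample)
```

Because every result comes back as plain JSON, the same parsing step would work whether the caller is a script, a pipeline or a web application.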

Availability

CellBase is a centralised database that integrates a large amount of information from several major genomic and biological databases used for genomic annotation and clinical variant prioritisation. See Overview for details.

CellBase is open-source and freely available at https://github.com/opencb/cellbase

You can search CellBase using your favourite programming language:

	installation	API	docs	tutorials
REST API			RESTful Web Services
Python	pypi
R	Bioconductor			Vignette
Java	Installation	Javadoc

Publications

CellBase was published in Nucleic Acids Research (2012):

http://nar.oxfordjournals.org/content/40/W1/W609.short

Open source for Computational Biology

Versions

Version
1.0.0-beta4 Nov 21, 2017
1.0.0-beta3 Aug 1, 2017
1.0.0-beta2 May 24, 2017
1.0.0-beta1 Jan 12, 2017
1.0.0-alpha Nov 7, 2016
0.6.0 Aug 31, 2016
0.5.1 Aug 26, 2016
0.5.0 Sep 19, 2015
