GTF parser for Java DataFrames
A GTF Reader and Writer for Java DataFrames.
The GTF format is implemented according to this documentation:
Documentation
Install
Add this to your pom.xml:
```xml
<dependencies>
    ...
    <dependency>
        <groupId>de.unknownreality</groupId>
        <artifactId>dataframe-gtf</artifactId>
        <version>0.2.4</version>
    </dependency>
    ...
</dependencies>
```
Build
To build the library from source:

- Clone the GitHub repository:

```shell
$ git clone https://github.com/nRo/DataFrame-GTF.git
```

- Change to the created folder and run `mvn install`:

```shell
$ cd DataFrame-GTF
$ mvn install
```

- Include it by adding the following to your project's pom.xml:

```xml
<dependencies>
    ...
    <dependency>
        <groupId>de.unknownreality</groupId>
        <artifactId>dataframe-gtf</artifactId>
        <version>0.2.4-SNAPSHOT</version>
    </dependency>
    ...
</dependencies>
```
Usage
Create a DataFrame from a GTF file
```java
File gtfFile = new File("genome.gtf");
// Load all GTF fields (seqname through frame) into a DataFrame
DataFrame df = DataFrame.load(gtfFile, GTFFormat.GTF);
```
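For illustration only, the first step of reading any GTF file, splitting each data line into its nine tab-separated columns while skipping `#` comment lines, can be sketched in plain Java. `GtfLineSplitter` is a hypothetical class, not part of dataframe-gtf, and this sketch makes no claim about how `GTFReader` is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper, not part of dataframe-gtf: splits raw GTF lines into
 * records of nine tab-separated columns.
 */
public class GtfLineSplitter {

    /** A GTF record has eight fixed fields plus the attribute column. */
    public static final int GTF_COLUMNS = 9;

    public static List<String[]> split(List<String> lines) {
        List<String[]> records = new ArrayList<>();
        for (String line : lines) {
            // Skip blank lines and '#' comment/header lines
            if (line.isBlank() || line.startsWith("#")) {
                continue;
            }
            // A limit of -1 keeps trailing empty columns instead of discarding them
            String[] columns = line.split("\t", -1);
            if (columns.length != GTF_COLUMNS) {
                throw new IllegalArgumentException(
                        "Expected " + GTF_COLUMNS + " columns but found "
                                + columns.length + ": " + line);
            }
            records.add(columns);
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "# hypothetical example input",
                "chr1\texample\tgene\t100\t200\t.\t+\t.\tgene_id \"g1\";");
        for (String[] record : split(lines)) {
            // Prints "chr1 gene 100-200"
            System.out.println(record[0] + " " + record[2] + " " + record[3] + "-" + record[4]);
        }
    }
}
```

Rejecting lines with the wrong column count is a design choice for this sketch; a real reader might instead log and skip malformed lines.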
By default, all GTF fields are included in the resulting DataFrame. To also include attributes, add them to the GTF reader:
```java
// Build a reader that also extracts the "gene_id" attribute as a column
GTFReader gtfReader = GTFReaderBuilder.create()
        .withAttribute("gene_id")
        .build();
DataFrame df = DataFrame.load(gtfFile, gtfReader);
```
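Attributes such as `gene_id` live in the ninth GTF column as `key "value";` pairs. A minimal sketch of pulling them into a map, assuming no value contains a `;` and keeping only the first occurrence of a repeated key, might look like this. `GtfAttributes` is hypothetical and is not the library's attribute parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper, not part of dataframe-gtf: parses the GTF attribute
 * column (e.g. gene_id "g1"; transcript_id "t1";) into a key/value map.
 * Assumes no value contains a ';' character.
 */
public class GtfAttributes {

    public static Map<String, String> parse(String attributeColumn) {
        // LinkedHashMap preserves the order in which attributes appear
        Map<String, String> attributes = new LinkedHashMap<>();
        for (String entry : attributeColumn.split(";")) {
            String trimmed = entry.trim();
            int space = trimmed.indexOf(' ');
            // Skip empty entries (e.g. after the final ';') and entries without a value
            if (space < 0) {
                continue;
            }
            String key = trimmed.substring(0, space);
            String value = trimmed.substring(space + 1).trim();
            // Strip surrounding double quotes from quoted values
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            // Keep the first occurrence if a key is repeated
            attributes.putIfAbsent(key, value);
        }
        return attributes;
    }

    public static void main(String[] args) {
        Map<String, String> attrs = parse("gene_id \"g1\"; transcript_id \"t1\"; level 2;");
        System.out.println(attrs.get("gene_id")); // prints g1
    }
}
```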
The column types of the GTF fields are predefined:
| GTF field | Type |
|---|---|
| seqname | String |
| source | String |
| feature | String |
| start | Long |
| end | Long |
| score | Double |
| strand | String |
| frame | Integer |
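The following sketch shows how the eight fixed fields of a GTF line could be converted into the types listed in the table. It assumes that GTF's empty-value placeholder `.` maps to `null` for `score` and `frame`; how dataframe-gtf actually represents missing values is not covered here. `GtfRecordParser` and `GtfRecord` are hypothetical names.

```java
/**
 * Hypothetical sketch, not the dataframe-gtf implementation: converts the
 * first eight columns of a GTF line into the types listed in the table.
 * Assumes "." (GTF's placeholder for an empty value) maps to null for the
 * numeric score and frame fields.
 */
public class GtfRecordParser {

    /** The eight fixed GTF fields with their column types. */
    public record GtfRecord(String seqname, String source, String feature,
                            Long start, Long end, Double score,
                            String strand, Integer frame) { }

    /** Returns null for the GTF placeholder ".", otherwise the value itself. */
    private static String emptyToNull(String value) {
        return ".".equals(value) ? null : value;
    }

    public static GtfRecord parse(String line) {
        String[] c = line.split("\t", -1);
        if (c.length < 8) {
            throw new IllegalArgumentException("Expected at least 8 columns: " + line);
        }
        String score = emptyToNull(c[5]);
        String frame = emptyToNull(c[7]);
        return new GtfRecord(
                c[0], c[1], c[2],
                Long.valueOf(c[3]), Long.valueOf(c[4]),
                score == null ? null : Double.valueOf(score),
                c[6],
                frame == null ? null : Integer.valueOf(frame));
    }

    public static void main(String[] args) {
        GtfRecord r = parse("chr1\texample\tCDS\t100\t200\t.\t-\t0\tgene_id \"g1\";");
        System.out.println(r);
    }
}
```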
The column type of an attribute can be specified by passing a column class to `withAttribute`:
```java
GTFReader gtfReader = GTFReaderBuilder.create()
        // Attribute without an explicit column type
        .withAttribute("gene_id")
        // Attribute read into a DoubleColumn
        .withAttribute("test_value", DoubleColumn.class)
        .build();
DataFrame df = DataFrame.load(gtfFile, gtfReader);
```
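Conceptually, a typed attribute means converting the raw attribute text into the requested type. The sketch below does this for a few boxed Java types; treating `DoubleColumn.class` as "parse to `Double`" is an assumption made for illustration, and `AttributeConverter` is a hypothetical class rather than part of the library. Malformed numbers surface as a `NumberFormatException`.

```java
/**
 * Hypothetical sketch, not the dataframe-gtf implementation: converts a raw
 * attribute value to a requested Java type, similar in spirit to passing
 * DoubleColumn.class to withAttribute. The mapping to boxed Java types is an
 * assumption made for illustration.
 */
public class AttributeConverter {

    public static <T> T convert(String raw, Class<T> type) {
        // A missing attribute stays missing
        if (raw == null) {
            return null;
        }
        String value = raw.trim();
        Object converted;
        if (type == String.class) {
            converted = value;
        } else if (type == Integer.class) {
            converted = Integer.valueOf(value);
        } else if (type == Long.class) {
            converted = Long.valueOf(value);
        } else if (type == Double.class) {
            converted = Double.valueOf(value);
        } else if (type == Boolean.class) {
            converted = Boolean.valueOf(value);
        } else {
            throw new IllegalArgumentException("Unsupported type: " + type.getName());
        }
        // Safe: each branch above produced an instance of the requested type
        return type.cast(converted);
    }

    public static void main(String[] args) {
        Double testValue = convert("0.75", Double.class);
        System.out.println(testValue); // prints 0.75
    }
}
```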
DataFrames can also be written in GTF format:
```java
// Write the DataFrame back out as a GTF file
df.write(new File("result.gtf"), GTFFormat.GTF);
```
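To illustrate what a written GTF line looks like, here is a minimal sketch that formats one record: nine tab-separated columns, `.` for missing values, and attributes written as `key "value";` pairs separated by spaces. `GtfLineWriter` is hypothetical and does not reflect how the library's writer works; it also assumes attribute values contain no double quotes, since it does no escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/**
 * Hypothetical sketch, not the dataframe-gtf writer: formats a single GTF
 * line. Assumes attribute values contain no double quotes (no escaping).
 */
public class GtfLineWriter {

    /** GTF uses "." for an empty value. */
    private static String field(Object value) {
        return value == null ? "." : value.toString();
    }

    public static String format(String seqname, String source, String feature,
                                long start, long end, Double score,
                                String strand, Integer frame,
                                Map<String, String> attributes) {
        // Attributes are written as: key "value"; key "value";
        StringJoiner attrs = new StringJoiner(" ");
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            attrs.add(e.getKey() + " \"" + e.getValue() + "\";");
        }
        return String.join("\t",
                field(seqname), field(source), field(feature),
                Long.toString(start), Long.toString(end),
                field(score), field(strand), field(frame),
                attrs.toString());
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("gene_id", "g1");
        attributes.put("transcript_id", "t1");
        System.out.println(format("chr1", "example", "exon", 100, 200, null, "+", null, attributes));
    }
}
```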