tabula-js

WebJar for tabula-js

License	MIT
Categories	JavaScript Languages Tabula Data PDF
GroupId	org.webjars.npm
ArtifactId	tabula-js
Last Version	1.0.1
Release Date	Oct 4, 2020
Type	jar
Description	tabula-js WebJar for tabula-js
Project URL	https://www.webjars.org
Source Code Management	https://github.com/ezodude/tabula-js

Download tabula-js

Filename	Size
tabula-js-1.0.1.pom
tabula-js-1.0.1.jar	7 MB
tabula-js-1.0.1-sources.jar	22 bytes
tabula-js-1.0.1-javadoc.jar	22 bytes

How to add to project

Apache Maven

<!-- https://jarcasting.com/artifacts/org.webjars.npm/tabula-js/ -->
<dependency>
    <groupId>org.webjars.npm</groupId>
    <artifactId>tabula-js</artifactId>
    <version>1.0.1</version>
</dependency>

Gradle Groovy

// https://jarcasting.com/artifacts/org.webjars.npm/tabula-js/
implementation 'org.webjars.npm:tabula-js:1.0.1'

Gradle Kotlin

// https://jarcasting.com/artifacts/org.webjars.npm/tabula-js/
implementation("org.webjars.npm:tabula-js:1.0.1")

Apache Buildr

'org.webjars.npm:tabula-js:jar:1.0.1'

Apache Ivy

<dependency org="org.webjars.npm" name="tabula-js" rev="1.0.1">
  <artifact name="tabula-js" type="jar" />
</dependency>

Groovy Grape

@Grapes(
@Grab(group='org.webjars.npm', module='tabula-js', version='1.0.1')
)

Scala SBT

libraryDependencies += "org.webjars.npm" % "tabula-js" % "1.0.1"

Leiningen

[org.webjars.npm/tabula-js "1.0.1"]

Dependencies

compile (3)

Group / Artifact	Type	Version
org.webjars.npm : highland	jar	[2.8.1,3)
org.webjars.npm : highland-process	jar	[1.0.5,2)
org.webjars.npm : lodash	jar	[4.13.1,5)

Project Modules

There are no modules declared in this project.

tabula-js

PLEASE NOTE I AM NOT ACTIVELY MAINTAINING THIS REPO - however, I am humbled by all the interest and PRs to date.

Helps you extract CSV data tables from PDF files. It is a Node.js wrapper for the mighty tabula-java 1.0.2.

Options

Not all tabula-java options are exposed. In particular, writing to a file is not supported, but any extracted data is available through a callback or a stream.

Here are the options (for options with no value, pass true as the value):

Options:

area <AREA>           Portion of the page to analyze (top,left,bottom,right).
                      Example: "269.875,12.75,790.5,561". Default is the entire page.
                      To analyze multiple areas, pass an array.
                      Example: ["269.875,12.75,790.5,561", "132.45,23.2,256.3,534"]

columns <COLUMNS>     X coordinates of column boundaries.
                      Example: "10.1,20.2,30.3"

debug                 Print detected table areas instead of processing.

guess                 Guess the portion of the page to analyze per page.

silent                Suppress all stderr output.

noSpreadsheet         Force PDF not to be extracted using spreadsheet-style extraction
                      (if there are ruling lines separating each cell, as in a PDF
                      of an Excel spreadsheet).

pages <PAGES>         Comma-separated list of ranges, or all.
                      Examples: pages: "1-3,5-7", pages: "3" or pages: "all".
                      Default is pages: "1".

spreadsheet           Force PDF to be extracted using spreadsheet-style extraction
                      (if there are ruling lines separating each cell, as in a PDF
                      of an Excel spreadsheet).

password <PASSWORD>   Password to decrypt the document. Default is empty.

useLineReturns        Use embedded line returns in cells. (Only in spreadsheet mode.)
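
As a minimal sketch of how these options might be assembled, the snippet below builds an options object in plain JavaScript. The `formatArea` helper is hypothetical and not part of tabula-js; it only produces the "top,left,bottom,right" string format described above. The option names (`area`, `pages`, `spreadsheet`) are taken from the list.

```javascript
// Hypothetical helper (not part of tabula-js): format one area as the
// "top,left,bottom,right" string described in the options list.
function formatArea({ top, left, bottom, right }) {
  const coords = [top, left, bottom, right];
  if (!coords.every(Number.isFinite)) {
    throw new TypeError('area coordinates must be finite numbers');
  }
  return coords.join(',');
}

// Options object using only option names from the list above.
const options = {
  area: formatArea({ top: 269.875, left: 12.75, bottom: 790.5, right: 561 }),
  pages: '1-3,5-7',
  spreadsheet: true, // options with no value are passed as true
};

console.log(options.area); // "269.875,12.75,790.5,561"
```

An object like this would then be passed as the second argument to `tabula(...)`, as in the "extractCsv with options" example.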

Getting started

extractCsv no options

This is the simplest use case. It uses a classic Node-style callback (err, data). The extracted CSV is an array of all rows found in the data table, including any headers.

const tabula = require('tabula-js');
// The PDF path is passed as a string.
const t = tabula('source.pdf');
t.extractCsv((err, data) => console.log(data));
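
Since the callback receives an array of rows with the headers first, a common next step is turning those rows into records. The sketch below assumes each row is a single comma-separated string; `splitCsvRow`, `rowsToRecords`, and the sample rows are all hypothetical and only illustrate the idea.

```javascript
// Hypothetical helper: split one CSV row into cells, honoring double quotes.
function splitCsvRow(row) {
  const cells = [];
  let cell = '';
  let inQuotes = false;
  for (let i = 0; i < row.length; i++) {
    const ch = row[i];
    if (inQuotes) {
      if (ch === '"' && row[i + 1] === '"') {
        cell += '"'; // escaped quote inside a quoted cell
        i++;
      } else if (ch === '"') {
        inQuotes = false;
      } else {
        cell += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ',') {
      cells.push(cell);
      cell = '';
    } else {
      cell += ch;
    }
  }
  cells.push(cell);
  return cells;
}

// Treat the first row as headers and map the remaining rows to objects.
function rowsToRecords(rows) {
  const [header, ...body] = rows.map(splitCsvRow);
  return body.map(cells =>
    Object.fromEntries(header.map((name, i) => [name, cells[i]])));
}

// Made-up sample rows standing in for the `data` passed to the callback.
const sample = ['Name,Amount', '"Smith, J.",12.50', 'Lee,7'];
console.log(rowsToRecords(sample));
```

Inside the callback, `rowsToRecords(data)` could be called once `err` has been checked.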

extractCsv with options

Here we use the area option to zero in on the data.

const tabula = require('tabula-js');
// The area is given as "top,left,bottom,right".
const t = tabula('source.pdf', {area: "269.875,150,690,545"});
t.extractCsv((err, data) => console.log(data));
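
Because extractCsv takes a Node-style (err, data) callback, it can in principle be wrapped in a Promise with Node's built-in `util.promisify`. The sketch below is an assumption about usage, not something tabula-js provides: `extractCsvAsync` is a hypothetical helper, and `fakeExtractor` is a stand-in object so the example runs without a PDF or Java.

```javascript
const { promisify } = require('util');

// Hypothetical helper: wrap any object exposing extractCsv(callback)
// so the result can be consumed as a Promise.
function extractCsvAsync(extractor) {
  return promisify(extractor.extractCsv.bind(extractor))();
}

// Stand-in for a tabula-js instance; it calls back with made-up rows.
const fakeExtractor = {
  extractCsv(callback) {
    callback(null, ['Name,Amount', 'Lee,7']);
  },
};

const result = extractCsvAsync(fakeExtractor);
result
  .then(rows => console.log(`${rows.length} rows extracted`))
  .catch(err => console.error('extraction failed:', err));
```

With a real instance, `extractCsvAsync(tabula('source.pdf', options))` would be the equivalent call; errors passed to the callback surface as Promise rejections.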

streamCsv

Similar to the callback version, but the data is extracted as a stream.

const tabula = require('tabula-js');
// Stream the extracted CSV straight to standard output.
const stream = tabula('source.pdf').streamCsv();
stream.pipe(process.stdout);

streamCsv uses highland streams

In reality the library is built on the notion of streams all the way down. Highland.js is used to make this a breeze.

This also means the returned stream can readily perform highland.js style transformations and operations.

const tabula = require('tabula-js');
const stream = tabula('source.pdf').streamCsv();
stream
  .split()                 // split the stream into lines
  .doto(console.log)       // log each line as it passes through
  .done(() => console.log('ALL DONE!'));
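
To illustrate why a line-splitting step such as `.split()` is useful, here is a rough, library-free sketch of the idea: stream chunks need not end on row boundaries, so partial lines have to be buffered until the rest arrives. `createLineSplitter` is a hypothetical helper, not Highland's implementation, and the chunks are made up.

```javascript
// Hypothetical helper: accumulate arbitrary text chunks and emit complete lines.
function createLineSplitter(onLine) {
  let buffer = '';
  return {
    write(chunk) {
      buffer += String(chunk);
      const pieces = buffer.split(/\r?\n/);
      buffer = pieces.pop(); // the last piece may be an incomplete line
      pieces.forEach(piece => onLine(piece));
    },
    end() {
      if (buffer !== '') onLine(buffer); // flush any trailing partial line
      buffer = '';
    },
  };
}

const lines = [];
const splitter = createLineSplitter(line => lines.push(line));

// Made-up chunks whose boundaries fall in the middle of rows.
['a,b\nc,', 'd\ne', ',f'].forEach(chunk => splitter.write(chunk));
splitter.end();

console.log(lines); // [ 'a,b', 'c,d', 'e,f' ]
```

With Highland, `.split()` covers this step, so a helper like this is only needed when consuming raw chunks by hand.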

Thank yous

This library would not be possible without the amazing effort of the tabula-java team. Thank you!

Versions

Version
1.0.1 Oct 4, 2020
