dap

Document Analysis Platform

License	The Apache License, Version 2.0
GroupId	com.github.document-analysis
ArtifactId	dap
Last Version	0.1.1
Release Date	Oct 10, 2017
Type	jar
Description	dap Document Analysis Platform
Project URL	https://github.com/document-analysis/dap
Source Code Management	https://github.com/document-analysis/dap

Download dap

Filename	Size
dap-0.1.1.pom
dap-0.1.1.jar	26 KB
dap-0.1.1-sources.jar	19 KB
dap-0.1.1-javadoc.jar	170 KB

How to add to project

Apache Maven

<!-- https://jarcasting.com/artifacts/com.github.document-analysis/dap/ -->
<dependency>
    <groupId>com.github.document-analysis</groupId>
    <artifactId>dap</artifactId>
    <version>0.1.1</version>
</dependency>

Gradle Groovy

// https://jarcasting.com/artifacts/com.github.document-analysis/dap/
implementation 'com.github.document-analysis:dap:0.1.1'

Gradle Kotlin

// https://jarcasting.com/artifacts/com.github.document-analysis/dap/
implementation ("com.github.document-analysis:dap:0.1.1")

Apache Buildr

'com.github.document-analysis:dap:jar:0.1.1'

Apache Ivy

<dependency org="com.github.document-analysis" name="dap" rev="0.1.1">
  <artifact name="dap" type="jar" />
</dependency>

Groovy Grape

@Grapes(
@Grab(group='com.github.document-analysis', module='dap', version='0.1.1')
)

Scala SBT

libraryDependencies += "com.github.document-analysis" % "dap" % "0.1.1"

Leiningen

[com.github.document-analysis/dap "0.1.1"]

Dependencies

test (1)

Group / Artifact	Type	Version
junit : junit	jar	4.12

Project Modules

There are no modules declared in this project.

Document Analysis Platform

What it is:

The Document-Analysis Platform, or DAP, is a programming platform for integrating several NLP tools, so that they interact with each other and conform to the same interface.

DAP is a lightweight, simple, and easy-to-use alternative to UIMA. While UIMA is a revolutionary and strong platform, it suffers from significant drawbacks, which have become high barriers for newcomers.

The need for a simple, easy-to-learn, and easy-to-use alternative that preserves only the core ideas of UIMA is the motivation behind DAP's development.

The advantages of DAP over UIMA are:

UIMA takes several weeks to learn and requires reading hundreds of pages of user manuals. Getting started with DAP takes no longer than 5-10 minutes. Learning DAP from A to Z takes only 20 minutes.
UIMA requires long and hard-to-maintain XML files. DAP requires nothing but plain Java programming.
UIMA employs unusual paradigms for exception throwing, logging, constructing objects, etc. DAP follows normal Java conventions.

The core idea

NLP tools tend to depend on each other. Part-of-speech taggers operate over tokenized texts. Syntactic parsers operate over part-of-speech annotations. Coreference resolvers operate over syntactic analyses, and so on. In short, higher-level tools rely on the output of lower-level ones.

This brings up the challenge of integration. The syntactic parser and the part-of-speech tagger must agree on the data structures and the format of a POS-tagged text. In other words, the POS tagger's output should be what the syntactic parser expects. This requirement applies to every set of tools with dependencies between them.

Moreover, if all POS taggers conform to the same format, then replacing one tagger with another is transparent to the syntactic parser. Similarly, if all parsers conform to the same format, then replacing one parser with another is transparent to the coreference resolver.
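The interchangeability described above can be illustrated with a minimal Java sketch. Every name in it (TaggedToken, PosTagger, CapitalizationTagger, BaselineTagger, ProperNounCounter) is a hypothetical illustration and not part of DAP's actual API; the taggers are toys. The point is only that the downstream tool compiles against the shared format, so either tagger can be plugged in.

```java
import java.util.ArrayList;
import java.util.List;

// All names below are hypothetical illustrations, not DAP's actual API.

/** The shared format: a token paired with its part-of-speech tag. */
final class TaggedToken {
    final String text;
    final String pos;

    TaggedToken(String text, String pos) {
        this.text = text;
        this.pos = pos;
    }
}

/** The common interface that every tagger conforms to. */
interface PosTagger {
    List<TaggedToken> tag(List<String> tokens);
}

/** Toy tagger: capitalized words become PROPN, everything else NOUN. */
final class CapitalizationTagger implements PosTagger {
    @Override
    public List<TaggedToken> tag(List<String> tokens) {
        List<TaggedToken> out = new ArrayList<>();
        for (String t : tokens) {
            boolean capitalized = !t.isEmpty() && Character.isUpperCase(t.charAt(0));
            out.add(new TaggedToken(t, capitalized ? "PROPN" : "NOUN"));
        }
        return out;
    }
}

/** Another toy tagger: labels every token NOUN. */
final class BaselineTagger implements PosTagger {
    @Override
    public List<TaggedToken> tag(List<String> tokens) {
        List<TaggedToken> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(new TaggedToken(t, "NOUN"));
        }
        return out;
    }
}

/** A downstream tool that depends only on the shared format, not on a specific tagger. */
final class ProperNounCounter {
    private final PosTagger tagger;

    ProperNounCounter(PosTagger tagger) {
        this.tagger = tagger;
    }

    int count(List<String> tokens) {
        int n = 0;
        for (TaggedToken t : tagger.tag(tokens)) {
            if ("PROPN".equals(t.pos)) {
                n++;
            }
        }
        return n;
    }
}
```

Swapping `new ProperNounCounter(new CapitalizationTagger())` for `new ProperNounCounter(new BaselineTagger())` requires no change to ProperNounCounter itself, which is the transparency the paragraph describes.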

DAP targets this integration challenge. It provides data structures with characteristics and utilities that make them fit for virtually every standard NLP tool. The two main data structures are document and annotation. The output of every NLP tool can be stored as annotations in documents, with features, attributes, and inter-annotation relations.
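To make the document/annotation idea concrete, here is a rough sketch of how such structures might look. It is written from the description above only: the class names, fields, and methods (Document, Annotation, annotate, ofType, coveredText) are assumptions for illustration and should not be read as DAP's real classes. Inter-annotation relations are omitted for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; these are not DAP's actual classes.

/** A typed span over the document text, carrying free-form features. */
final class Annotation {
    final String type;
    final int begin; // inclusive character offset
    final int end;   // exclusive character offset
    final Map<String, String> features = new HashMap<>();

    Annotation(String type, int begin, int end) {
        this.type = type;
        this.begin = begin;
        this.end = end;
    }
}

/** The text plus every annotation that tools have added to it. */
final class Document {
    final String text;
    private final List<Annotation> annotations = new ArrayList<>();

    Document(String text) {
        this.text = text;
    }

    /** Stores a new annotation and returns it so features can be attached. */
    Annotation annotate(String type, int begin, int end) {
        Annotation a = new Annotation(type, begin, end);
        annotations.add(a);
        return a;
    }

    /** Returns the annotations of one type, in insertion order. */
    List<Annotation> ofType(String type) {
        List<Annotation> out = new ArrayList<>();
        for (Annotation a : annotations) {
            if (a.type.equals(type)) {
                out.add(a);
            }
        }
        return out;
    }

    String coveredText(Annotation a) {
        return text.substring(a.begin, a.end);
    }
}

final class AnnotationDemo {
    public static void main(String[] args) {
        Document doc = new Document("Dogs bark");

        // A lower-level tool (a tokenizer) writes Token annotations.
        doc.annotate("Token", 0, 4);
        doc.annotate("Token", 5, 9);

        // A higher-level tool (a toy tagger) reads tokens and adds a feature.
        for (Annotation token : doc.ofType("Token")) {
            String word = doc.coveredText(token);
            token.features.put("pos", word.endsWith("s") ? "NOUN" : "VERB");
        }

        for (Annotation token : doc.ofType("Token")) {
            System.out.println(doc.coveredText(token) + "/" + token.features.get("pos"));
        }
    }
}
```

In this sketch the tagger never talks to the tokenizer directly; both communicate only through annotations stored in the document, which is the kind of decoupling the section describes.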

In addition to data structures, an actual set of part-of-speech tags, syntactic phrase types, syntactic dependency relations, etc. is required. The project DAP-DKPro_1_8 provides a standard set of NLP types, borrowing them from the DKPro project.

Batteries included

Users can start working with DAP right away with dozens of state-of-the-art NLP tools for several languages by using the DAP-DKPro_1_8 library, which wraps DKPro tools inside DAP.

A demo is provided in DAP-DKPro_1_8-demo.

Usage in Maven

The project has been uploaded to the Maven Central repository.

In a Maven project, add the following:

<dependency>
  <groupId>com.github.document-analysis</groupId>
  <artifactId>dap</artifactId>
  <version>0.1.1</version>
</dependency>

To get started, related projects should be imported as well; see "Your first steps" below.

Your first steps

Start by reading the 20-minutes-tutorial.

Then jump to the demo.

License

DAP is licensed under the Apache 2.0 license, a permissive license that is also suitable for commercial use.

Note that DAP-DKPro_1_8-demo depends on external libraries, which have more restrictive licenses.

Versions

Version	Release Date
0.1.1	Oct 10, 2017
0.1	Oct 10, 2017
