Concrete Gigaword

Library providing utilities for converting English Gigaword v5 to the Concrete NLP data schema.

GroupId: edu.jhu.hlt
ArtifactId: concrete-gigaword
Last Version: 4.4.0
Type: jar
Description: Library providing utilities for converting English Gigaword v5 to the Concrete NLP data schema.
Project URL: https://github.com/hltcoe/concrete-gigaword
Project Organization: Johns Hopkins University HLTCOE
Source Code Management: https://github.com/hltcoe/concrete-gigaword

How to add to project

Maven:

<dependency>
    <groupId>edu.jhu.hlt</groupId>
    <artifactId>concrete-gigaword</artifactId>
    <version>4.4.0</version>
</dependency>

Gradle (Groovy DSL):

implementation 'edu.jhu.hlt:concrete-gigaword:4.4.0'

Gradle (Kotlin DSL):

implementation("edu.jhu.hlt:concrete-gigaword:4.4.0")

Buildr:

'edu.jhu.hlt:concrete-gigaword:jar:4.4.0'

Ivy:

<dependency org="edu.jhu.hlt" name="concrete-gigaword" rev="4.4.0">
  <artifact name="concrete-gigaword" type="jar" />
</dependency>

Grape:

@Grapes(
  @Grab(group='edu.jhu.hlt', module='concrete-gigaword', version='4.4.0')
)

SBT:

libraryDependencies += "edu.jhu.hlt" % "concrete-gigaword" % "4.4.0"

Leiningen:

[edu.jhu.hlt/concrete-gigaword "4.4.0"]

Dependencies

compile (3)

Group / Artifact                   Type  Version
edu.jhu.hlt : concrete-util        jar   4.4.3
edu.jhu.hlt : concrete-validation  jar   4.4.3
gigaword : gigaword                jar   2.0.1

test (1)

Group / Artifact  Type  Version
junit : junit     jar   4.11

Project Modules

There are no modules declared in this project.

Deprecated

This library has been deprecated. The latest Concrete Gigaword ingester is maintained in the main concrete-java project.

If you are starting a project using Concrete and Gigaword, please use concrete-java instead of this library.

Concrete Gigaword

Library to take Gigaword documents and convert them to Concrete Communication objects.

Quick start / API Usage

Create a converter object:

ConcreteGigawordDocumentFactory factory = new ConcreteGigawordDocumentFactory();

Convert a gzipped SGML file (.gz) to an Iterator<Communication>:

Path gzPath = Paths.get("path/to/sgml/file.gz");
Iterator<Communication> iter = factory.iterator(gzPath);
while (iter.hasNext()) {
  Communication c = iter.next();
  // process c
}
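
To make it clearer what kind of input the iterator above walks over, here is a minimal sketch that uses only the Java standard library. It does not call concrete-gigaword and does not produce Communication objects. The class GigawordSgmlSketch and its methods readGzip, splitDocs, and docId are hypothetical names introduced for illustration. The sketch assumes the input is gzipped SGML in which each document is wrapped in a <DOC id="..."> ... </DOC> block, and it reads the whole file into memory, which may not suit very large files.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.zip.GZIPInputStream;

/** Hypothetical helper for illustration; not part of concrete-gigaword. */
public class GigawordSgmlSketch {

  // Matches one <DOC ...> ... </DOC> block; DOTALL lets '.' span newlines.
  private static final Pattern DOC =
      Pattern.compile("<DOC\\b[^>]*>.*?</DOC>", Pattern.DOTALL);

  // Captures the value of the id attribute in the opening <DOC> tag.
  private static final Pattern DOC_ID =
      Pattern.compile("<DOC\\b[^>]*\\bid=\"([^\"]*)\"");

  /** Decompresses a .gz file and returns its contents as one string. */
  public static String readGzip(Path gzPath) {
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(
        new GZIPInputStream(Files.newInputStream(gzPath)),
        StandardCharsets.UTF_8))) {
      return reader.lines().collect(Collectors.joining("\n"));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Splits SGML text into one string per <DOC> block, in file order. */
  public static List<String> splitDocs(String sgml) {
    List<String> docs = new ArrayList<>();
    Matcher m = DOC.matcher(sgml);
    while (m.find()) {
      docs.add(m.group());
    }
    return docs;
  }

  /** Returns the id attribute of a <DOC> block, or null if there is none. */
  public static String docId(String doc) {
    Matcher m = DOC_ID.matcher(doc);
    return m.find() ? m.group(1) : null;
  }

  public static void main(String[] args) {
    // Usage: java GigawordSgmlSketch path/to/sgml/file.gz
    for (String doc : splitDocs(readGzip(Paths.get(args[0])))) {
      System.out.println(docId(doc));
    }
  }
}
```

Running the class with the path to a gzipped SGML file prints one document id per line. Turning each block into a Communication is what the library's factory is for; this sketch stops at splitting the raw text.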

Concretely Annotated Gigaword

See GIGAWORD.md for instructions about how to reproduce the Concrete representation of English Gigaword v5, one of the data sets described in the publication Concretely Annotated Corpora.

License

Apache 2

edu.jhu.hlt

JHU Human Language Technology Center of Excellence

Versions

Version
4.4.0
4.3.1
4.2.1