Boilerpipe

A fork of the Boilerpipe Text Extraction library

License:
GroupId: com.robbypond
ArtifactId: boilerpipe
Last Version: 1.2.3
Release Date:
Type: jar
Description: Boilerpipe. A fork of the Boilerpipe Text Extraction library
Project URL: https://github.com/robbypond/boilerpipe
Source Code Management: https://github.com/robbypond/boilerpipe

How to add to project

Maven:

<!-- https://jarcasting.com/artifacts/com.robbypond/boilerpipe/ -->
<dependency>
    <groupId>com.robbypond</groupId>
    <artifactId>boilerpipe</artifactId>
    <version>1.2.3</version>
</dependency>

Gradle (Groovy):

// https://jarcasting.com/artifacts/com.robbypond/boilerpipe/
implementation 'com.robbypond:boilerpipe:1.2.3'

Gradle (Kotlin):

// https://jarcasting.com/artifacts/com.robbypond/boilerpipe/
implementation ("com.robbypond:boilerpipe:1.2.3")

Buildr:

'com.robbypond:boilerpipe:jar:1.2.3'

Ivy:

<dependency org="com.robbypond" name="boilerpipe" rev="1.2.3">
  <artifact name="boilerpipe" type="jar" />
</dependency>

Grape:

@Grapes(
@Grab(group='com.robbypond', module='boilerpipe', version='1.2.3')
)

SBT:

libraryDependencies += "com.robbypond" % "boilerpipe" % "1.2.3"

Leiningen:

[com.robbypond/boilerpipe "1.2.3"]

Dependencies

compile (3)

Group / Artifact                      Type  Version
net.sourceforge.nekohtml : nekohtml   jar   1.9.21
xerces : xercesImpl                   jar   2.11.0
net.htmlparser.jericho : jericho-html jar   3.3

test (3)

Group / Artifact            Type  Version
junit : junit               jar   4.12
org.hamcrest : hamcrest-all jar   1.3
commons-io : commons-io     jar   2.4

Project Modules

There are no modules declared in this project.

# boilerpipe 1.2.2

## Changes in this Version of Boilerpipe

This is an extended version of Boilerpipe 1.2.0 on Google Code.

### New features:

  • Media extraction (YouTube videos, Vimeo videos, and images) within an article
  • Extract an article with its HTML structure

Example

License

[Apache License 2.0](http://www.apache.org/licenses/LICENSE-2.0)

Changes

  • mavenized

<dependency>
  <groupId>de.l3s.boilerpipe</groupId>
  <artifactId>boilerpipe-core</artifactId>
  <version>1.2.2</version>
</dependency>

Classes added:

  • de.l3s.boilerpipe.document.Media
  • de.l3s.boilerpipe.document.Video
  • de.l3s.boilerpipe.document.VimeoVideo
  • de.l3s.boilerpipe.document.YoutubeVideo
  • de.l3s.boilerpipe.extractors.HtmlArticleExtractor - this extractor extracts the article content, with its basic HTML structure, from the document
  • de.l3s.boilerpipe.sax.MediaExtractor - this extractor returns a list of the media items contained in the document's article
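
The media classes listed above suggest that the fork detects embedded videos inside an article. As a rough, library-independent illustration of that idea, the sketch below finds YouTube and Vimeo embed IDs in an HTML string using only `java.util.regex`. It is not the fork's implementation and does not use its API: the class `MediaSketch`, its method names, and the two URL patterns are assumptions made for this example, and real pages use more URL variants than these.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of boilerpipe.
class MediaSketch {
    // Assumed embed URL shapes; only the most common iframe forms are covered.
    private static final Pattern YOUTUBE =
        Pattern.compile("youtube\\.com/embed/([A-Za-z0-9_-]+)");
    private static final Pattern VIMEO =
        Pattern.compile("player\\.vimeo\\.com/video/(\\d+)");

    // Collects the first capture group of every match, in document order.
    static List<String> findIds(Pattern pattern, String html) {
        List<String> ids = new ArrayList<>();
        Matcher m = pattern.matcher(html);
        while (m.find()) {
            ids.add(m.group(1));
        }
        return ids;
    }

    static List<String> findYoutubeIds(String html) {
        return findIds(YOUTUBE, html);
    }

    static List<String> findVimeoIds(String html) {
        return findIds(VIMEO, html);
    }

    public static void main(String[] args) {
        String html = "<p>Intro</p>"
            + "<iframe src=\"https://www.youtube.com/embed/dQw4w9WgXcQ\"></iframe>"
            + "<iframe src=\"https://player.vimeo.com/video/12345\"></iframe>";
        System.out.println(findYoutubeIds(html)); // prints [dQw4w9WgXcQ]
        System.out.println(findVimeoIds(html));   // prints [12345]
    }
}
```

A regex scan like this ignores where the embed sits on the page, whereas the classes above are described as returning only media contained in the article itself.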

Files/folders removed (because of the change to Maven):

  • /build.xml
  • /eclipse-build
  • /lib

Versions

Version
1.2.3