Simple pipeline parser for HTML

Simple parser for HTML, using the pipelines library. This is not intended to be a strict parser of HTML5; the main planned use is to help with "screen-scraping" of HTML websites. It may also find use as a tool for testing HTML generation.

Categories: Net
GroupId: net.pwall.html
ArtifactId: html-pipeline
Last Version: 0.1
Type: jar
Project URL: https://github.com/pwall567/html-pipeline
Source Code Management: https://github.com/pwall567/html-pipeline

How to add to project

Maven

    <dependency>
        <groupId>net.pwall.html</groupId>
        <artifactId>html-pipeline</artifactId>
        <version>0.1</version>
    </dependency>

Gradle

    implementation 'net.pwall.html:html-pipeline:0.1'

Gradle (kts)

    implementation("net.pwall.html:html-pipeline:0.1")

Buildr

    'net.pwall.html:html-pipeline:jar:0.1'

Ivy

    <dependency org="net.pwall.html" name="html-pipeline" rev="0.1">
      <artifact name="html-pipeline" type="jar" />
    </dependency>

Grape

    @Grapes(
        @Grab(group='net.pwall.html', module='html-pipeline', version='0.1')
    )

SBT

    libraryDependencies += "net.pwall.html" % "html-pipeline" % "0.1"

Leiningen

    [net.pwall.html/html-pipeline "0.1"]

Dependencies

compile (3)

Group / Artifact                            Type  Version
net.pwall.util : pipelines                  jar   0.8
net.pwall.html : htmlutil                   jar   1.1
org.jetbrains.kotlin : kotlin-stdlib-jdk8   jar   1.3.50

test (2)

Group / Artifact                            Type  Version
org.jetbrains.kotlin : kotlin-test-junit    jar   1.3.50
net.pwall.dom : dom-kotlin                  jar   0.1.1

Project Modules

There are no modules declared in this project.

html-pipeline

Quick Start

Create a pipeline that feeds data into an HTMLPipeline object. The result of the pipeline is an org.w3c.dom.Document.

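    // Decode the incoming bytes as UTF-8 and pass the characters to the HTML parser.
    // inputStream is assumed to be a java.io.InputStream supplied by the caller.
    val htmlPipeline = DecoderFactory.getDecoder(Charsets.UTF_8, HTMLPipeline()).apply {
        accept(inputStream)
    }
    // The parsed org.w3c.dom.Document
    val document = htmlPipeline.result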
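
Since the result is a standard org.w3c.dom.Document, the "screen-scraping" use case can be sketched with nothing but the JDK's DOM API. The example below is a minimal illustration, not part of the library: `extractLinks` and `sampleDocument` are hypothetical names, and the sample document is built with the JDK's XML parser as a stand-in for the pipeline result so that the sketch runs without html-pipeline on the classpath. It also assumes element names are reported in lower case; that may need adjusting for a document produced by HTMLPipeline.

```kotlin
import java.io.ByteArrayInputStream
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Document
import org.w3c.dom.Element

// Hypothetical helper (not part of html-pipeline): collect the href attribute
// of every <a> element in a DOM document, in document order.
fun extractLinks(document: Document): List<String> {
    val anchors = document.getElementsByTagName("a")
    val links = mutableListOf<String>()
    for (i in 0 until anchors.length) {
        val element = anchors.item(i) as Element
        // getAttribute returns an empty string when the attribute is absent
        val href = element.getAttribute("href")
        if (href.isNotEmpty())
            links.add(href)
    }
    return links
}

// Stand-in for the pipeline result: a small well-formed document parsed with
// the JDK's XML parser, so the sketch does not depend on the library.
fun sampleDocument(): Document {
    val html = "<html><body>" +
        "<a href=\"https://example.com/\">One</a>" +
        "<a name=\"x\">Two</a>" +
        "<p><a href=\"/local\">Three</a></p>" +
        "</body></html>"
    return DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(ByteArrayInputStream(html.toByteArray(Charsets.UTF_8)))
}

fun main() {
    // prints [https://example.com/, /local]
    println(extractLinks(sampleDocument()))
}
```

In real use, the `document` value from the Quick Start code would be passed to `extractLinks` in place of `sampleDocument()`.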

Dependency Specification

The latest version of the library is 0.1, and it may be obtained from the Maven Central repository.

Maven

    <dependency>
      <groupId>net.pwall.html</groupId>
      <artifactId>html-pipeline</artifactId>
      <version>0.1</version>
    </dependency>

Gradle

    implementation 'net.pwall.html:html-pipeline:0.1'

Gradle (kts)

    implementation("net.pwall.html:html-pipeline:0.1")

Peter Wall

2020-03-01

Versions

Version
0.1