html-encoding-sniffer

WebJar for html-encoding-sniffer

License	MIT
GroupId	org.webjars.npm
ArtifactId	html-encoding-sniffer
Last Version	2.0.1
Release Date	May 27, 2020
Type	jar
Description	WebJar for html-encoding-sniffer
Project URL	https://www.webjars.org
Source Code Management	https://github.com/jsdom/html-encoding-sniffer

Download html-encoding-sniffer

Filename	Size
html-encoding-sniffer-2.0.1.pom
html-encoding-sniffer-2.0.1.jar	6 KB
html-encoding-sniffer-2.0.1-sources.jar	22 bytes
html-encoding-sniffer-2.0.1-javadoc.jar	22 bytes

How to add to project

Apache Maven

<!-- https://jarcasting.com/artifacts/org.webjars.npm/html-encoding-sniffer/ -->
<dependency>
    <groupId>org.webjars.npm</groupId>
    <artifactId>html-encoding-sniffer</artifactId>
    <version>2.0.1</version>
</dependency>

Gradle Groovy

// https://jarcasting.com/artifacts/org.webjars.npm/html-encoding-sniffer/
implementation 'org.webjars.npm:html-encoding-sniffer:2.0.1'

Gradle Kotlin

// https://jarcasting.com/artifacts/org.webjars.npm/html-encoding-sniffer/
implementation("org.webjars.npm:html-encoding-sniffer:2.0.1")

Apache Buildr

'org.webjars.npm:html-encoding-sniffer:jar:2.0.1'

Apache Ivy

<dependency org="org.webjars.npm" name="html-encoding-sniffer" rev="2.0.1">
  <artifact name="html-encoding-sniffer" type="jar" />
</dependency>

Groovy Grape

@Grapes(
    @Grab(group='org.webjars.npm', module='html-encoding-sniffer', version='2.0.1')
)

Scala SBT

libraryDependencies += "org.webjars.npm" % "html-encoding-sniffer" % "2.0.1"

Leiningen

[org.webjars.npm/html-encoding-sniffer "2.0.1"]

Dependencies

compile (1)

Group / Artifact	Type	Version
org.webjars.npm : whatwg-encoding	jar	[1.0.5,2)

Project Modules

There are no modules declared in this project.

Determine the Encoding of an HTML Byte Stream

This package implements the HTML Standard's encoding sniffing algorithm in all its glory. The most interesting part of this is how it pre-scans the first 1024 bytes in order to search for certain <meta charset>-related patterns.

const htmlEncodingSniffer = require("html-encoding-sniffer");
const fs = require("fs");

// Read the raw bytes; the sniffer works on bytes, not on a decoded string.
const htmlBuffer = fs.readFileSync("./html-page.html");
const sniffedEncoding = htmlEncodingSniffer(htmlBuffer);

The returned value will be a canonical encoding name (not a label). You might then combine this with the whatwg-encoding package to decode the buffer into a string:

const whatwgEncoding = require("whatwg-encoding");
const htmlString = whatwgEncoding.decode(htmlBuffer, sniffedEncoding);
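The 1024-byte pre-scan for <meta charset>-related patterns is the most interesting part of the algorithm. The following is a deliberately simplified JavaScript illustration of the idea only. It is neither the HTML Standard's full pre-scan algorithm nor this package's code: the function name naivePrescanCharset is hypothetical, it searches with a regular expression instead of the spec's byte-by-byte tag and attribute parser, and it returns the raw label rather than a canonical encoding name.

```javascript
// Hypothetical, simplified helper; NOT html-encoding-sniffer's implementation.
// It skips the spec's handling of comments, other tags, attribute-parsing
// rules and the http-equiv="content-type" check.
function naivePrescanCharset(bytes) {
  // Examine no more than the first 1024 bytes, as the real pre-scan does.
  let head = "";
  for (const byte of bytes.subarray(0, 1024)) {
    // One byte becomes one char, so ASCII markup survives in any
    // ASCII-compatible encoding.
    head += String.fromCharCode(byte);
  }
  // Match <meta ... charset=LABEL>. As a side effect this also catches
  // "charset=" inside a content attribute, without checking http-equiv.
  const match = /<meta\s[^>]*?\bcharset\s*=\s*["']?([^"'\s>\/;]+)/i.exec(head);
  // Return the lowercased label, not a canonical encoding name.
  return match ? match[1].toLowerCase() : null;
}

const encoder = new TextEncoder();
// prints shift_jis
console.log(naivePrescanCharset(encoder.encode('<!doctype html><meta charset="Shift_JIS"><title>x</title>')));
// prints null: the <meta> starts after the first 1024 bytes
console.log(naivePrescanCharset(encoder.encode("x".repeat(1100) + '<meta charset="utf-8">')));
```

Limiting the search to a fixed-size prefix is what lets a browser commit to an encoding before the whole document has arrived.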

Options

You can pass two options to htmlEncodingSniffer in an optional second argument:

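const sniffedEncoding = htmlEncodingSniffer(htmlBuffer, {
  // An encoding label from the transport layer, e.g. from Content-Type
  transportLayerEncodingLabel: "utf-8",
  // Used only if nothing else determines the encoding
  defaultEncoding: "windows-1252"
});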

These represent two possible inputs into the encoding sniffing algorithm:

transportLayerEncodingLabel is an encoding label obtained from the "transport layer" (typically the charset parameter of an HTTP Content-Type header). It overrides everything except a byte order mark (BOM).
defaultEncoding is the ultimate fallback encoding used if no valid encoding is supplied by the transport layer, and no encoding is sniffed from the bytes. It defaults to "windows-1252", as recommended by the algorithm's table of suggested defaults for "All other locales" (including the en locale).
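Taken together, these inputs are consulted in a fixed order: a BOM first, then the transport-layer label, then the <meta> pre-scan, then defaultEncoding. The sketch below models that order in plain JavaScript under stated assumptions. It is not this package's source: sniffSketch, bomEncoding, labelToNameSketch and the third-argument prescanHook callback are hypothetical names, the label table is a small hand-picked subset of the WHATWG Encoding Standard's table, and the real pre-scan is replaced by whatever prescanHook returns (nothing, by default).

```javascript
// Hypothetical, incomplete subset of the Encoding Standard's label -> name table.
const LABELS = new Map([
  ["utf-8", "UTF-8"], ["utf8", "UTF-8"], ["unicode-1-1-utf-8", "UTF-8"],
  ["windows-1252", "windows-1252"], ["latin1", "windows-1252"],
  ["iso-8859-1", "windows-1252"], ["us-ascii", "windows-1252"], ["ascii", "windows-1252"],
  ["utf-16le", "UTF-16LE"], ["utf-16be", "UTF-16BE"], ["shift_jis", "Shift_JIS"],
]);

// Labels are compared after trimming and lowercasing (String#trim is slightly
// broader than the spec's ASCII-whitespace strip). Unknown labels yield null.
function labelToNameSketch(label) {
  const name = LABELS.get(String(label).trim().toLowerCase());
  return name === undefined ? null : name;
}

// Recognize the three byte order marks the HTML algorithm checks for.
function bomEncoding(bytes) {
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) return "UTF-8";
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) return "UTF-16BE";
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) return "UTF-16LE";
  return null;
}

function sniffSketch(bytes, { transportLayerEncodingLabel, defaultEncoding = "windows-1252" } = {}, prescanHook = () => null) {
  // 1. A BOM wins over everything else.
  let encoding = bomEncoding(bytes);
  // 2. Then the transport-layer label, if it names a known encoding.
  if (encoding === null && transportLayerEncodingLabel !== undefined) {
    encoding = labelToNameSketch(transportLayerEncodingLabel);
  }
  // 3. Then whatever label the (stand-in) pre-scan finds in the first 1024 bytes.
  if (encoding === null) {
    const found = prescanHook(bytes.subarray(0, 1024));
    encoding = found === null ? null : labelToNameSketch(found);
  }
  // 4. Finally the fallback.
  return encoding === null ? defaultEncoding : encoding;
}

console.log(sniffSketch(Uint8Array.of(0xef, 0xbb, 0xbf, 0x3c), { transportLayerEncodingLabel: "latin1" })); // UTF-8
console.log(sniffSketch(Uint8Array.of(0x3c), { transportLayerEncodingLabel: " Latin1 " })); // windows-1252
console.log(sniffSketch(Uint8Array.of(0x3c), { defaultEncoding: "UTF-8" })); // UTF-8
```

In this sketch an unrecognized transport-layer label does not stop the search; the function simply falls through to the next step, so a bogus Content-Type charset still lets the pre-scan or the default decide.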

Credits

This package was originally based on the excellent work of @nicolashenry, in jsdom. It has since been pulled out into this separate package.

Versions

Version	Release Date
2.0.1	May 27, 2020
1.0.2	Mar 1, 2018
1.0.1	Nov 3, 2016
