html-encoding-sniffer

WebJar for html-encoding-sniffer

License: MIT
GroupId: org.webjars.npm
ArtifactId: html-encoding-sniffer
Last Version: 2.0.1
Release Date:
Type: jar
Description: WebJar for html-encoding-sniffer
Project URL: https://www.webjars.org
Source Code Management: https://github.com/jsdom/html-encoding-sniffer

How to add to project

Maven:

<!-- https://jarcasting.com/artifacts/org.webjars.npm/html-encoding-sniffer/ -->
<dependency>
    <groupId>org.webjars.npm</groupId>
    <artifactId>html-encoding-sniffer</artifactId>
    <version>2.0.1</version>
</dependency>

Gradle (Groovy DSL):

// https://jarcasting.com/artifacts/org.webjars.npm/html-encoding-sniffer/
implementation 'org.webjars.npm:html-encoding-sniffer:2.0.1'

Gradle (Kotlin DSL):

// https://jarcasting.com/artifacts/org.webjars.npm/html-encoding-sniffer/
implementation("org.webjars.npm:html-encoding-sniffer:2.0.1")

Buildr:

'org.webjars.npm:html-encoding-sniffer:jar:2.0.1'

Ivy:

<dependency org="org.webjars.npm" name="html-encoding-sniffer" rev="2.0.1">
  <artifact name="html-encoding-sniffer" type="jar" />
</dependency>

Grape:

@Grapes(
  @Grab(group='org.webjars.npm', module='html-encoding-sniffer', version='2.0.1')
)

sbt:

libraryDependencies += "org.webjars.npm" % "html-encoding-sniffer" % "2.0.1"

Leiningen:

[org.webjars.npm/html-encoding-sniffer "2.0.1"]

Dependencies

compile (1)

Group / Artifact                     Type   Version
org.webjars.npm : whatwg-encoding    jar    [1.0.5,2)

Project Modules

There are no modules declared in this project.

Determine the Encoding of an HTML Byte Stream

This package implements the HTML Standard's encoding sniffing algorithm in all its glory. The most interesting part is the pre-scan: the algorithm searches the first 1024 bytes for certain <meta charset>-related patterns.

const htmlEncodingSniffer = require("html-encoding-sniffer");
const fs = require("fs");

const htmlBuffer = fs.readFileSync("./html-page.html");
const sniffedEncoding = htmlEncodingSniffer(htmlBuffer);

The returned value will be a canonical encoding name (not a label). You might then combine this with the whatwg-encoding package to decode the bytes:

const whatwgEncoding = require("whatwg-encoding");
const htmlString = whatwgEncoding.decode(htmlBuffer, sniffedEncoding);
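
To give a feel for the pre-scan mentioned above, here is a minimal sketch of the idea. It is a rough approximation only: the function name roughMetaCharsetPrescan is hypothetical, the regular expression stands in for the HTML Standard's much more careful attribute parsing, and none of this is taken from the package's source. It returns the raw label it finds (lowercased), not a canonical encoding name.

```javascript
// Hypothetical helper: a rough approximation of the pre-scan idea.
// This is NOT the HTML Standard's algorithm and NOT this package's code.
function roughMetaCharsetPrescan(bytes) {
  // Only the first 1024 bytes are examined; latin1 maps each byte to one character.
  const head = Buffer.from(bytes).subarray(0, 1024).toString("latin1");

  // Matches <meta charset="x"> as well as the charset=x part of
  // <meta http-equiv="Content-Type" content="text/html; charset=x">.
  const match = /<meta[^>]*?charset\s*=\s*["']?([^"'\s;\/>]+)/i.exec(head);

  // Return the label as written (lowercased), or null if nothing was found.
  return match ? match[1].toLowerCase() : null;
}

// Usage: a label declared within the first 1024 bytes is picked up.
const sample = Buffer.from('<!doctype html><meta charset="ISO-8859-2"><p>hi</p>');
console.log(roughMetaCharsetPrescan(sample));
```

A declaration that starts after the first 1024 bytes is not seen by this sketch, which mirrors the size limit described above.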

Options

You can pass two options to htmlEncodingSniffer:

const sniffedEncoding = htmlEncodingSniffer(htmlBuffer, {
  transportLayerEncodingLabel,
  defaultEncoding
});

These represent two possible inputs into the encoding sniffing algorithm:

  • transportLayerEncodingLabel is an encoding label obtained from the "transport layer" (probably an HTTP Content-Type header), which overrides everything but a BOM.
  • defaultEncoding is the ultimate fallback encoding used if no valid encoding is supplied by the transport layer, and no encoding is sniffed from the bytes. It defaults to "windows-1252", as recommended by the algorithm's table of suggested defaults for "All other locales" (including the en locale).
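
The precedence described in the two bullets above can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: detectBom and chooseEncoding are hypothetical names, and the sketch assumes the transport-layer label and the pre-scan result have already been resolved to encoding names (or to null when missing or invalid).

```javascript
// Hypothetical sketch of the precedence order; not the package's source.

// Returns an encoding name if the bytes start with a byte order mark, else null.
function detectBom(bytes) {
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return "UTF-8";
  }
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
    return "UTF-16BE";
  }
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
    return "UTF-16LE";
  }
  return null;
}

// Assumption: transportLayerEncoding and prescannedEncoding are already
// resolved encoding names, or null when missing or invalid.
function chooseEncoding(bytes, options = {}) {
  const {
    transportLayerEncoding = null,
    prescannedEncoding = null,
    defaultEncoding = "windows-1252"
  } = options;

  // A BOM wins over everything, then the transport layer,
  // then whatever was sniffed from the bytes, then the fallback.
  return detectBom(bytes) || transportLayerEncoding || prescannedEncoding || defaultEncoding;
}

// Usage: the BOM overrides a conflicting transport-layer encoding.
const withBom = Buffer.from([0xef, 0xbb, 0xbf, 0x61]);
console.log(chooseEncoding(withBom, { transportLayerEncoding: "windows-1252" }));
```

Writing the order as a single fallback chain keeps each source's priority visible in one line.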

Credits

This package was originally based on the excellent work of @nicolashenry, in jsdom. It has since been pulled out into this separate package.

Versions

Version
2.0.1
1.0.2
1.0.1