chardet

WebJar for chardet

License: MIT
GroupId: org.webjars.npm
ArtifactId: chardet
Last Version: 0.8.0
Type: jar
Description: WebJar for chardet
Project URL: http://webjars.org
Source Code Management: https://github.com/runk/node-chardet

How to add to project

Maven:

<!-- https://jarcasting.com/artifacts/org.webjars.npm/chardet/ -->
<dependency>
    <groupId>org.webjars.npm</groupId>
    <artifactId>chardet</artifactId>
    <version>0.8.0</version>
</dependency>

Gradle (Groovy DSL):

// https://jarcasting.com/artifacts/org.webjars.npm/chardet/
implementation 'org.webjars.npm:chardet:0.8.0'

Gradle (Kotlin DSL):

// https://jarcasting.com/artifacts/org.webjars.npm/chardet/
implementation ("org.webjars.npm:chardet:0.8.0")

Buildr:

'org.webjars.npm:chardet:jar:0.8.0'

Ivy:

<dependency org="org.webjars.npm" name="chardet" rev="0.8.0">
  <artifact name="chardet" type="jar" />
</dependency>

Grape:

@Grapes(
    @Grab(group='org.webjars.npm', module='chardet', version='0.8.0')
)

SBT:

libraryDependencies += "org.webjars.npm" % "chardet" % "0.8.0"

Leiningen:

[org.webjars.npm/chardet "0.8.0"]

Dependencies

There are no dependencies for this project. It is a standalone project that does not depend on any other jars.

Project Modules

There are no modules declared in this project.

chardet

Chardet is a character encoding detection module written in pure JavaScript (TypeScript). It uses occurrence analysis to determine the most probable encoding.

  • Packed size is only 22 KB
  • Works in all environments: Node / Browser / Native
  • Works on all platforms: Linux / Mac / Windows
  • No dependencies
  • No native code / bindings
  • 100% written in TypeScript
  • Extensive code coverage
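
To give a feel for what occurrence analysis means, here is a toy sketch that counts how often each byte value appears in a buffer and reports the share of bytes outside 7-bit ASCII. It is not chardet's actual algorithm; the helper names byteHistogram and highByteRatio are hypothetical and exist only for illustration.

```javascript
// Toy illustration only, not chardet's implementation.
// Count how often each of the 256 possible byte values occurs.
function byteHistogram(bytes) {
  const counts = new Array(256).fill(0);
  for (const byte of bytes) counts[byte] += 1;
  return counts;
}

// Fraction of bytes outside the 7-bit ASCII range (0x80..0xFF).
// A value of 0 means the input is plain ASCII.
function highByteRatio(bytes) {
  if (bytes.length === 0) return 0;
  const counts = byteHistogram(bytes);
  let high = 0;
  for (let value = 0x80; value <= 0xff; value += 1) high += counts[value];
  return high / bytes.length;
}

console.log(highByteRatio(Buffer.from('hello there!'))); // 0
```

A real detector compares statistics like these against per-encoding and per-language profiles; this sketch only shows the counting step.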

Installation

npm i chardet

Usage

To return the encoding with the highest confidence:

const chardet = require('chardet');

chardet.detect(Buffer.from('hello there!'));
// or
chardet.detectFile('/path/to/file').then(encoding => console.log(encoding));
// or
chardet.detectFileSync('/path/to/file');
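
Once an encoding name is known, a common next step is decoding the bytes into a string. The sketch below uses the standard TextDecoder rather than anything from chardet. The helper name decodeWith is hypothetical, and the sketch assumes the detected name may be missing or may not be a label the runtime's TextDecoder accepts, in which case it falls back to UTF-8.

```javascript
// Hypothetical helper: decode bytes using an encoding name such as the one
// returned by a detector. Falls back to UTF-8 when the name is missing or is
// not a label this runtime's TextDecoder recognises.
function decodeWith(encodingName, bytes) {
  let decoder;
  try {
    decoder = new TextDecoder(encodingName || 'utf-8');
  } catch (err) {
    // TextDecoder throws for unsupported labels; use UTF-8 instead.
    decoder = new TextDecoder('utf-8');
  }
  return decoder.decode(bytes);
}

console.log(decodeWith('UTF-8', Buffer.from('hello there!'))); // hello there!
```

With chardet installed, the call would look like decodeWith(chardet.detect(buffer), buffer). Which labels TextDecoder supports depends on the runtime, so the fallback branch matters for less common encodings.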

To return the full list of possible encodings, use the analyse method:

const chardet = require('chardet');
chardet.analyse(Buffer.from('hello there!'));

The returned value is an array of objects sorted by confidence in descending order:

[
  { confidence: 90, name: 'UTF-8' },
  { confidence: 20, name: 'windows-1252', lang: 'fr' }
];
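
Because the result is a plain array, it can be post-processed with ordinary array methods. The sketch below keeps only candidates at or above a confidence threshold. The helper name confidentEncodings and the default threshold of 50 are assumptions made for illustration; the input is the sample array shown above rather than live chardet output.

```javascript
// Hypothetical helper: keep only candidates at or above a confidence threshold.
// `matches` is assumed to have the shape shown above: { confidence, name, lang? }.
function confidentEncodings(matches, minConfidence = 50) {
  return matches
    .filter((match) => match.confidence >= minConfidence)
    .map((match) => match.name);
}

// Sample data copied from the example result above.
const sample = [
  { confidence: 90, name: 'UTF-8' },
  { confidence: 20, name: 'windows-1252', lang: 'fr' },
];

console.log(confidentEncodings(sample)); // [ 'UTF-8' ]
```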

Working with large data sets

When the data set is huge and you want to optimize performance at the cost of some accuracy, you can sample only the first N bytes of the buffer:

chardet
  .detectFile('/path/to/file', { sampleSize: 32 })
  .then(encoding => console.log(encoding));
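
For data that is already in memory, the same idea can be applied by hand: pass only a leading slice of the buffer to the detector. The sketch below shows the slicing step only. The helper name firstBytes is hypothetical, and the text above does not say whether the in-memory detect call accepts a sampleSize option, so the sketch does not rely on one.

```javascript
// Hypothetical helper: take at most `sampleSize` leading bytes of a buffer.
// subarray() returns a view on the same memory, so nothing is copied.
function firstBytes(buffer, sampleSize) {
  return buffer.subarray(0, sampleSize);
}

const data = Buffer.from('hello there!');
console.log(firstBytes(data, 5).toString()); // hello
```

A small sample is faster to analyse but gives the detector less evidence, so very short samples can produce a different result than the full input.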

Supported Encodings:

  • UTF-8
  • UTF-16 LE
  • UTF-16 BE
  • UTF-32 LE
  • UTF-32 BE
  • ISO-2022-JP
  • ISO-2022-KR
  • ISO-2022-CN
  • Shift_JIS
  • Big5
  • EUC-JP
  • EUC-KR
  • GB18030
  • ISO-8859-1
  • ISO-8859-2
  • ISO-8859-5
  • ISO-8859-6
  • ISO-8859-7
  • ISO-8859-8
  • ISO-8859-9
  • windows-1250
  • windows-1251
  • windows-1252
  • windows-1253
  • windows-1254
  • windows-1255
  • windows-1256
  • KOI8-R

Currently only these encodings are supported.

TypeScript?

Yes. Type definitions are included.

Versions

Version
0.8.0
0.7.0
0.4.2