line-parser

A line parser for Java based on mmap()

License: MIT
GroupId: com.github.marschall
ArtifactId: line-parser
Last Version: 0.5.0
Release Date:
Type: jar
Description: A line parser for Java based on mmap()
Project URL: https://github.com/marschall/line-parser
Source Code Management: https://github.com/marschall/line-parser

Download line-parser

How to add to project

Maven

<!-- https://jarcasting.com/artifacts/com.github.marschall/line-parser/ -->
<dependency>
    <groupId>com.github.marschall</groupId>
    <artifactId>line-parser</artifactId>
    <version>0.5.0</version>
</dependency>

Gradle (Groovy DSL)

// https://jarcasting.com/artifacts/com.github.marschall/line-parser/
implementation 'com.github.marschall:line-parser:0.5.0'

Gradle (Kotlin DSL)

// https://jarcasting.com/artifacts/com.github.marschall/line-parser/
implementation ("com.github.marschall:line-parser:0.5.0")

Buildr

'com.github.marschall:line-parser:jar:0.5.0'

Ivy

<dependency org="com.github.marschall" name="line-parser" rev="0.5.0">
  <artifact name="line-parser" type="jar" />
</dependency>

Grape

@Grapes(
@Grab(group='com.github.marschall', module='line-parser', version='0.5.0')
)

SBT

libraryDependencies += "com.github.marschall" % "line-parser" % "0.5.0"

Leiningen

[com.github.marschall/line-parser "0.5.0"]

Dependencies

test (4)

Group / Artifact                            Type  Version
junit : junit                               jar   4.12
org.openjdk.jmh : jmh-core                  jar   1.19
org.openjdk.jmh : jmh-generator-annprocess  jar   1.19
org.openjdk.jol : jol-core                  jar   0.8

Project Modules

There are no modules declared in this project.

line-parser

An mmap() based line parser for cases when:

  • the start byte position of a line in the file is required
  • the length in bytes of a line is required
  • only a few character of every line are required

In these cases this library can theoretically be more efficient than BufferedReader because:

  • the copy operations of buffered IO are avoided
  • the allocation and resizing of an intermediate StringBuffer is avoided
  • the allocation of the final String is avoided, only the required substrings are allocated

The performance may still be slower than a BufferedReader based approach, but it should consume much less memory bandwidth and produce only a fraction of the garbage.
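
To make the idea concrete, here is a minimal sketch of scanning a memory-mapped file for line boundaries using only JDK classes. It is not this library's implementation: the names MmapLineScanner and LineSpan are hypothetical, and the sketch assumes a file smaller than 2 GiB, LF or CRLF line terminators, and an ASCII-compatible encoding such as UTF-8 (it would not work for UTF-16).

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of mmap() based line scanning, not the library's code. */
final class MmapLineScanner {

  /** Byte offset and byte length of one line, excluding its terminator. */
  static final class LineSpan {
    final long offset;
    final int length;

    LineSpan(long offset, int length) {
      this.offset = offset;
      this.length = length;
    }
  }

  /** Maps the whole file read-only and records where every line starts and ends. */
  static List<LineSpan> scan(Path path) throws IOException {
    List<LineSpan> spans = new ArrayList<>();
    try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
      long size = channel.size();
      if (size > Integer.MAX_VALUE) {
        throw new IOException("file too large for a single mapping in this sketch");
      }
      MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
      int limit = buffer.limit();
      int lineStart = 0;
      for (int i = 0; i < limit; i++) {
        // absolute get(): bytes are read from the mapping, nothing is copied to a buffer
        if (buffer.get(i) == '\n') {
          int end = i;
          if (end > lineStart && buffer.get(end - 1) == '\r') {
            end--; // treat CRLF as a single terminator
          }
          spans.add(new LineSpan(lineStart, end - lineStart));
          lineStart = i + 1;
        }
      }
      if (lineStart < limit) {
        // last line without a trailing terminator
        spans.add(new LineSpan(lineStart, limit - lineStart));
      }
    }
    return spans;
  }
}
```

No String is created for any line here; a caller that needs only a few characters would decode just those bytes from the mapping.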

As this project gives you a CharSequence instead of a String, you may want to have a look at the charsequences project, which gives you some of the String convenience methods while avoiding allocation.
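
As an illustration of working on a CharSequence directly, the following sketch shows two helpers that avoid creating an intermediate String. They are hypothetical and written for this example only; they are not the API of the charsequences project.

```java
/** Hypothetical helpers for this example, not the charsequences API. */
final class CharSequenceSketch {

  /** Like String.startsWith, but for any CharSequence and without allocation. */
  static boolean startsWith(CharSequence sequence, CharSequence prefix) {
    int prefixLength = prefix.length();
    if (sequence.length() < prefixLength) {
      return false;
    }
    for (int i = 0; i < prefixLength; i++) {
      if (sequence.charAt(i) != prefix.charAt(i)) {
        return false;
      }
    }
    return true;
  }

  /**
   * Parses the ASCII decimal digits in [start, end) as a non-negative int.
   * Throws NumberFormatException for an empty range or a non-digit,
   * and ArithmeticException if the value overflows an int.
   */
  static int parseUnsignedInt(CharSequence sequence, int start, int end) {
    if (start >= end) {
      throw new NumberFormatException("empty range");
    }
    int result = 0;
    for (int i = start; i < end; i++) {
      char c = sequence.charAt(i);
      if (c < '0' || c > '9') {
        throw new NumberFormatException("not a digit at index " + i);
      }
      result = Math.addExact(Math.multiplyExact(result, 10), c - '0');
    }
    return result;
  }
}
```

For a line such as "id=42;" the call parseUnsignedInt(line, 3, 5) reads the number without a substring() call.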

Misc

  • the main parsing loop is likely to benefit from on-stack replacement (OSR)
  • if you're using UTF-8 with a BOM then the BOM is returned as well
  • if you're using UTF-16 with a BOM then the BOM is returned as well
  • the library runs on Java 8 but is also a Java 9 module that only requires the jdk.unsupported module besides the java.base module
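
Since the BOM is returned as part of the content, callers may want to drop it from the first line themselves. The sketch below assumes the BOM shows up as the single character U+FEFF at the start of the decoded first line; the class and method names are hypothetical.

```java
/** Hypothetical helper for this example; assumes the BOM is decoded as U+FEFF. */
final class BomSketch {

  /** Returns the line without a leading byte order mark, or the line itself if there is none. */
  static CharSequence stripBom(CharSequence firstLine) {
    if (firstLine.length() > 0 && firstLine.charAt(0) == '\uFEFF') {
      // whether subSequence allocates depends on the CharSequence implementation
      return firstLine.subSequence(1, firstLine.length());
    }
    return firstLine;
  }
}
```

Only the first line of a file needs this check, so the cost is negligible.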

Usage

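Path path = Paths.get("input.txt");   // example path, adjust to your file
Charset cs = StandardCharsets.UTF_8;  // encoding of the file
LineParser parser = new LineParser();
parser.forEach(path, cs, (line) -> {
  // prints the byte offset, the byte length and the content of every line
  System.out.printf("[%d,%d]%s%n", line.getOffset(), line.getLength(), line.getContent());
});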
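
One possible use of the reported offset and length is to store them in an index and read a single line back later without parsing the file again. The sketch below uses only JDK classes. It assumes the offset is a byte position from the start of the file and the length is a byte count; LineReread and readLineAt are hypothetical names, not part of this library.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper for this example, not part of line-parser. */
final class LineReread {

  /** Reads length bytes starting at the given byte offset and decodes them with cs. */
  static String readLineAt(Path path, Charset cs, long offset, int length) throws IOException {
    ByteBuffer buffer = ByteBuffer.allocate(length);
    try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
      long position = offset;
      while (buffer.hasRemaining()) {
        // positional read: does not change the channel's own position
        int read = channel.read(buffer, position);
        if (read < 0) {
          throw new EOFException("file ends before offset + length");
        }
        position += read;
      }
    }
    buffer.flip();
    return cs.decode(buffer).toString();
  }
}
```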

Versions

Version
0.5.0
0.4.2
0.4.1
0.4.0
0.3.0
0.2.0
0.1.0