A fast event based JSON parser for Java
Overview
This library provides a pair of lightweight event based JSON parsers (a pull parser and a push parser) for JSON as specified in RFC 7159 and ECMA-404. An event based parser just emits a stream of events and doesn't create a document model for the parsed JSON document.
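To make the idea of an event stream concrete, here is a minimal sketch that does not use this library at all. The interface NumberArrayEvents and the class TinyEventParser are hypothetical names invented for this illustration, and the parser only understands a flat JSON array of non-negative integers. It shows how a consumer can compute a result from callbacks alone, without ever holding the parsed document in memory.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical callback interface for this sketch; NOT the library's JsonHandler.
interface NumberArrayEvents {
    void onArrayBegin();
    void onNumber(long value);
    void onArrayEnd();
}

final class TinyEventParser {

    // Streams a flat JSON array of non-negative integers, e.g. [1, 20, 3],
    // and reports each element as an event instead of collecting them.
    static void parse(Reader in, NumberArrayEvents events) throws IOException {
        int c = nextNonWhitespace(in);
        if (c != '[') throw new IOException("expected '['");
        events.onArrayBegin();
        c = nextNonWhitespace(in);
        if (c == ']') { events.onArrayEnd(); return; }
        while (true) {
            if (c < '0' || c > '9') throw new IOException("expected a digit");
            long value = 0;
            while (c >= '0' && c <= '9') {
                value = value * 10 + (c - '0');
                c = in.read();
            }
            events.onNumber(value);
            if (isWhitespace(c)) c = nextNonWhitespace(in);
            if (c == ']') { events.onArrayEnd(); return; }
            if (c != ',') throw new IOException("expected ',' or ']'");
            c = nextNonWhitespace(in);
        }
    }

    private static boolean isWhitespace(int c) {
        return c == ' ' || c == '\n' || c == '\r' || c == '\t';
    }

    private static int nextNonWhitespace(Reader in) throws IOException {
        int c;
        do { c = in.read(); } while (isWhitespace(c));
        return c;
    }

    // Sums all elements; only the running total is kept in memory.
    static long sum(String json) {
        final long[] total = {0};
        try {
            parse(new StringReader(json), new NumberArrayEvents() {
                @Override public void onArrayBegin() { }
                @Override public void onNumber(long value) { total[0] += value; }
                @Override public void onArrayEnd() { }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total[0];
    }
}
```

Calling TinyEventParser.sum("[1, 20, 3]") adds the elements as they are reported. The parsers in this library follow the same principle for the full JSON grammar.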
Consult the documentation and usage description for further information:
Maven
This library is hosted in the Maven Central Repository. You can use it with the following coordinates:
<dependency>
    <groupId>net.markenwerk</groupId>
    <artifactId>utils-json-parser</artifactId>
    <version>3.0.1</version>
</dependency>
Motivation
The original intention was to create a pure Java implementation of an event based JSON parser that (like Android's JsonReader) can stream through a character stream and process the contained JSON document without first reading the whole character stream into a string.
As a pure Java implementation, this library is available on all Java based execution environments, not just on Android.
An event based parser that (unlike most JSON libraries) doesn't create a document model may be used to
- efficiently process huge JSON documents on the fly,
- create a document model from another JSON library, if that library can only process a string (like the version of the JSON library that comes with Android), thus removing the need to create a string,
- or automagically create Java objects (using some reflective witchcraft), thus removing the need to create a document model.
This library is intended to be as lightweight as possible and makes no assumptions about the tasks it may be used for.
Usage
Sources of JSON text
Both parsers consume a JsonSource, which represents a character stream that contains a JSON text and provides the methods necessary to consume it.
This library provides the ReaderJsonSource, which processes the characters that are yielded by a given Reader.
Reader reader = ...
// create a new json source for reader
JsonSource jsonSource = new ReaderJsonSource(reader);
Additionally, this library provides the StringJsonSource and the CharacterArrayJsonSource, which process the characters from an existing string or an existing char[] respectively.
String string = ...
char[] chars = ...
// create new json sources for string and chars
JsonSource jsonSource1 = new StringJsonSource(string);
JsonSource jsonSource2 = new CharacterArrayJsonSource(chars);
It's usually not necessary to create a JsonSource directly, because both parsers have convenience constructors that create an appropriate JsonSource.
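The idea behind such a source abstraction can be sketched without the library. The interface CharSource and its two implementations below are hypothetical stand-ins written for this illustration; they are not the library's JsonSource, and the single nextChar() method is an assumption, not its actual contract. The sketch shows that a consumer sees the same characters whether they come from a buffered Reader or from a String.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a character source; NOT the library's JsonSource.
interface CharSource {
    // Returns the next character, or -1 once the input is exhausted.
    int nextChar();
}

// Pulls characters from a Reader through a small buffer.
final class ReaderCharSource implements CharSource {
    private final Reader reader;
    private final char[] buffer = new char[4]; // tiny on purpose, to force refills
    private int position = 0;
    private int limit = 0;

    ReaderCharSource(Reader reader) { this.reader = reader; }

    @Override
    public int nextChar() {
        if (position == limit) {
            try {
                limit = reader.read(buffer, 0, buffer.length);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            position = 0;
            if (limit <= 0) { limit = 0; return -1; }
        }
        return buffer[position++];
    }
}

// Serves characters directly from an existing String; no buffering needed.
final class StringCharSource implements CharSource {
    private final String text;
    private int position = 0;

    StringCharSource(String text) { this.text = text; }

    @Override
    public int nextChar() {
        return position < text.length() ? text.charAt(position++) : -1;
    }
}

final class CharSourceDemo {
    // Consumes a source to its end, the way a parser would.
    static String drain(CharSource source) {
        StringBuilder result = new StringBuilder();
        for (int c = source.nextChar(); c != -1; c = source.nextChar()) {
            result.append((char) c);
        }
        return result.toString();
    }

    static String drainViaReader(String text) {
        return drain(new ReaderCharSource(new StringReader(text)));
    }

    static String drainViaString(String text) {
        return drain(new StringCharSource(text));
    }
}
```

Because the consumer only depends on the interface, a parser written against it never needs to know whether the whole text is already in memory.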
Push parser
A JsonPushParser takes a JsonHandler and calls the appropriate callback methods while processing a JSON text.
This library provides the default implementation DefaultJsonPushParser, which consumes a JsonSource to process a JSON text.
JsonSource jsonSource = ...
JsonHandler jsonHandler = ...
// creates a new parser for jsonSource
JsonPushParser jsonPushParser = new DefaultJsonPushParser(jsonSource);
// consumes jsonSource and reports events to jsonHandler
jsonPushParser.handle(jsonHandler);
Creating a document model
This gist shows a pair of simple JsonHandlers, which can be used to create a document model that consists of the well known JSONArrays and JSONObjects from the reference JSON library.
If the root structure of the JSON document is a JSON array:
Reader reader = ...;
// create a new json array from the file content
JSONArray jsonArray = new DefaultJsonPushParser(reader).handle(new ArrayHandler());
If the root structure of the JSON document is a JSON object:
Reader reader = ...;
// create a new json object from the file content
JSONObject jsonObject = new DefaultJsonPushParser(reader).handle(new ObjectHandler());
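How a handler can turn events into a document model is sketched below, using plain java.util lists and maps instead of JSONArray and JSONObject. The class ModelBuildingHandler and its callback names (onArrayBegin, onName, onValue and so on) are assumptions made for this illustration and are not the JsonHandler interface of this library or the handlers from the gist. The technique is a stack of open containers: every new value is attached to the container on top of the stack.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical handler for this sketch; the callback names are assumptions.
final class ModelBuildingHandler {
    private final Deque<Object> openContainers = new ArrayDeque<>();
    private String pendingName; // name waiting for its value inside an object
    private Object result;

    void onArrayBegin() { open(new ArrayList<Object>()); }
    void onArrayEnd() { openContainers.pop(); }
    void onObjectBegin() { open(new LinkedHashMap<String, Object>()); }
    void onObjectEnd() { openContainers.pop(); }
    void onName(String name) { pendingName = name; }
    void onValue(Object value) { attach(value); }

    // A new container is attached to its parent first and then becomes the top.
    private void open(Object container) {
        attach(container);
        openContainers.push(container);
    }

    @SuppressWarnings("unchecked")
    private void attach(Object value) {
        Object parent = openContainers.peek();
        if (parent == null) {
            result = value; // the root of the document
        } else if (parent instanceof List) {
            ((List<Object>) parent).add(value);
        } else {
            ((Map<String, Object>) parent).put(pendingName, value);
            pendingName = null;
        }
    }

    Object getResult() { return result; }
}

final class ModelBuildingDemo {
    // Replays the events for {"tags":["a","b"],"size":2} by hand.
    static Object build() {
        ModelBuildingHandler handler = new ModelBuildingHandler();
        handler.onObjectBegin();
        handler.onName("tags");
        handler.onArrayBegin();
        handler.onValue("a");
        handler.onValue("b");
        handler.onArrayEnd();
        handler.onName("size");
        handler.onValue(2L);
        handler.onObjectEnd();
        return handler.getResult();
    }
}
```

In this sketch the events are replayed by hand; with a push parser, the parser would be the one calling the callbacks while it reads the JSON text.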
Pull parser
A JsonPullParser, when asked, reports its JsonState, which reflects the immediate future (ARRAY_BEGIN, NULL, BOOLEAN, ..., ARRAY_END, ...) of the processed JSON text and determines the appropriate method to be called on the JsonPullParser.
This library provides the default implementation DefaultJsonPullParser, which consumes a JsonSource to process a JSON text.
JsonSource jsonSource = ...
// creates a new parser for jsonSource
JsonPullParser jsonPullParser = new DefaultJsonPullParser(jsonSource);
// consumes jsonSource and handles events
loop: while(true) {
    // examine current state
    switch(jsonPullParser.currentState()) {
    case ARRAY_BEGIN:
        jsonPullParser.beginArray();
        ...
        break;
    ...
    case SOURCE_END:
        break loop;
    }
}
Assuming the structure of the JSON document is known beforehand (e.g. a JSON array containing JSON strings), it isn't necessary to call currentState():
// consumes a JSON array of strings
List<String> tags = new ArrayList<>();
jsonPullParser.beginDocument();
jsonPullParser.beginArray();
while(jsonPullParser.hasNextElement()) {
    tags.add(jsonPullParser.nextString());
}
jsonPullParser.endArray();
jsonPullParser.endDocument();
Creating a document model
This gist shows a pair of simple helper methods that can be used to create a document model that consists of the well known JSONArrays and JSONObjects from the reference JSON library.
If the root structure of the JSON document is a JSON array:
Reader reader = ...;
// create a new json array from the file content
JSONArray jsonArray = JsonUtil.readArray(reader);
If the root structure of the JSON document is a JSON object:
Reader reader = ...;
// create a new json object from the file content
JSONObject jsonObject = JsonUtil.readObject(reader);
Skipping values
A JsonPullParser can be instructed to skip the current value. If the current state is ARRAY_BEGIN or OBJECT_BEGIN, the whole JSON array or JSON object is skipped. Inside a JSON object, a value may be skipped before or after the name has been pulled.
while(jsonPullParser.hasNext()) {
    switch(jsonPullParser.currentState()) {
    case NAME:
        String name = jsonPullParser.nextName();
        if(name.equals("optional_content")) {
            // ignore optional content
            jsonPullParser.skipValue();
        }
        ...
        break;
    ...
    }
}
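One common way to skip a whole nested value is to count nesting depth until the value that was opened first has been closed again. The sketch below demonstrates this on a pre-tokenized list. The enum Token and the class TokenCursor are hypothetical names for this illustration; whether DefaultJsonPullParser implements skipValue() this way is not stated here and should not be assumed.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical token kinds, loosely modelled on the states named above.
enum Token { ARRAY_BEGIN, ARRAY_END, OBJECT_BEGIN, OBJECT_END, NAME, VALUE }

final class TokenCursor {
    private final List<Token> tokens;
    private int position = 0;

    TokenCursor(List<Token> tokens) { this.tokens = tokens; }

    Token current() { return tokens.get(position); }

    Token next() { return tokens.get(position++); }

    int position() { return position; }

    // Skips one complete value. If a name has not been pulled yet,
    // the name is skipped together with its value.
    void skipValue() {
        if (current() == Token.NAME) {
            position++;
        }
        int depth = 0;
        do {
            switch (next()) {
                case ARRAY_BEGIN:
                case OBJECT_BEGIN:
                    depth++;
                    break;
                case ARRAY_END:
                case OBJECT_END:
                    depth--;
                    break;
                default:
                    break; // names and simple values do not change the depth
            }
        } while (depth > 0);
    }
}

final class TokenCursorDemo {
    // Tokens for {"a":[1,[2]],"b":3}
    static TokenCursor cursor() {
        return new TokenCursor(Arrays.asList(
                Token.OBJECT_BEGIN,
                Token.NAME, Token.ARRAY_BEGIN, Token.VALUE,
                Token.ARRAY_BEGIN, Token.VALUE, Token.ARRAY_END, Token.ARRAY_END,
                Token.NAME, Token.VALUE,
                Token.OBJECT_END));
    }

    // Enters the object, skips the member "a" and reports what comes next.
    static Token tokenAfterSkippingFirstMember() {
        TokenCursor cursor = cursor();
        cursor.next();      // consume OBJECT_BEGIN
        cursor.skipValue(); // skips the name "a" and the nested array
        return cursor.current();
    }
}
```

A simple value leaves the depth at zero, so the loop consumes exactly one token, while an array or object keeps the loop running until its matching end token.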
Reading strings efficiently
A JsonPullParser can be used to create a Reader for a JSON string. This may yield considerable performance improvements if a JSON document contains large strings that need to be further processed (e.g. a Base64-encoded image), because no String object has to be created for the intermediate value.
while(jsonPullParser.hasNext()) {
    switch(jsonPullParser.currentState()) {
    case NAME:
        String name = jsonPullParser.nextName();
        if(name.equals("image")) {
            Reader imageReader = jsonPullParser.readString();
            image = ImageIO.read(new Base64InputStream(new ReaderInputStream(imageReader)));
        }
        ...
        break;
    ...
    }
}
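The snippet above relies on Base64InputStream and ReaderInputStream, which are not part of the JDK. As a rough sketch of the same idea using only the standard library, the hypothetical class AsciiReaderInputStream below adapts a Reader of Base64 text to an InputStream, and java.util.Base64 decodes it on the fly. The class names are invented for this illustration, and the adapter assumes the Reader yields ASCII characters only, which holds for Base64 text.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical adapter: exposes a Reader of ASCII-only text as an InputStream.
final class AsciiReaderInputStream extends InputStream {
    private final Reader reader;

    AsciiReaderInputStream(Reader reader) { this.reader = reader; }

    @Override
    public int read() throws IOException {
        int c = reader.read();
        if (c > 0x7F) {
            throw new IOException("non-ASCII character in input");
        }
        return c; // -1 signals the end of the stream
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }
}

final class Base64StreamingDemo {
    // Decodes Base64 text character by character, without building a String
    // of the encoded text first.
    static byte[] decode(Reader base64Text) {
        try (InputStream in = Base64.getDecoder().wrap(new AsciiReaderInputStream(base64Text))) {
            // The bytes are collected here only to keep the sketch observable;
            // a real consumer would read from the stream directly.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[1024];
            for (int n = in.read(chunk); n != -1; n = in.read(chunk)) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String decodeToString(String base64Text) {
        return new String(decode(new StringReader(base64Text)), StandardCharsets.UTF_8);
    }
}
```

In a real application the Reader would come from the pull parser rather than from a StringReader, and the decoding stream would be handed straight to a consumer such as an image decoder.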
Performance comparison
The following table shows the results of a performance test with 1000 iterations that compares multiple methods of creating a document model:
- Using a DefaultJsonPushParser with a ReaderJsonSource for a FileReader and the abovementioned ArrayHandler.
- Using a DefaultJsonPushParser with a StringJsonSource for a preloaded String.
- Using a DefaultJsonPullParser with a ReaderJsonSource for a FileReader and the abovementioned JsonUtil.
- Using a DefaultJsonPullParser with a StringJsonSource for a preloaded String.
- Using the constructor for a JSONArray with a JSONTokener for a FileReader.
- Using the constructor for a JSONArray with a JSONTokener for a preloaded String.
Parser | Source | test0.json (~2.5kB) | test1.json (~25.0kB) | test2.json (~250.0kB) | test3.json (~2.5MB) |
---|---|---|---|---|---|
DefaultJsonPushParser | Reader | 0.15 | 0.37 | 3.36 | 33.18 |
DefaultJsonPushParser | String | 0.05 | 0.34 | 3.15 | 31.51 |
DefaultJsonPullParser | Reader | 0.09 | 0.39 | 3.56 | 35.57 |
DefaultJsonPullParser | String | 0.05 | 0.36 | 3.46 | 34.71 |
JSONArray | Reader | 0.08 | 0.59 | 5.74 | 57.88 |
JSONArray | String | 0.06 | 0.54 | 5.31 | 53.48 |
The four JSON files of different size have been created with a random JSON generator.
Both parsers provided by this library are, for sufficiently large JSON documents, about 42% faster than the reference implementation. The relation between the size of the JSON input and the parsing duration is linear. Preloading the content of a JSON file into a String instead of using a FileReader yields no significant performance improvement.
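The figures in the table can be checked with a little arithmetic. The sketch below takes the Reader rows for test2.json and test3.json and computes the share of the reference implementation's duration that each parser saves, as well as how the duration scales when the input grows tenfold. The class BenchmarkArithmetic is a hypothetical helper for this illustration. Reading "faster" as time saved relative to the reference implementation is an interpretation, and the table does not state a unit, so none is assumed.

```java
// Hypothetical helper for checking the table above; not part of the library.
final class BenchmarkArithmetic {

    // Share of the baseline duration that the candidate saves.
    static double timeSaved(double candidate, double baseline) {
        return (baseline - candidate) / baseline;
    }

    // Factor by which the duration grows between two input sizes.
    static double growth(double smallerInput, double largerInput) {
        return largerInput / smallerInput;
    }

    public static void main(String[] args) {
        // test3.json, Reader rows
        double push = 33.18;
        double pull = 35.57;
        double reference = 57.88;
        System.out.printf("push parser saves %.1f%%%n", 100 * timeSaved(push, reference));
        System.out.printf("pull parser saves %.1f%%%n", 100 * timeSaved(pull, reference));

        // test2.json -> test3.json is roughly ten times more input
        System.out.printf("push parser growth: %.2fx%n", growth(3.36, 33.18));
        System.out.printf("pull parser growth: %.2fx%n", growth(3.56, 35.57));
    }
}
```

Under this reading, the push parser with a Reader saves a little under 43% of the reference duration on the largest file and the pull parser a little under 39%, while a tenfold larger input takes close to ten times as long, which is consistent with the linear relation described above.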