+++ 2015-03-24: Streamflyer 1.2.0 released with a new groupId. New package names everywhere! +++
+++ 2015-03-24: Streamflyer has a new home on GitHub because Google Code is closing. +++
+++ 2014-10-08: Streamflyer 1.1.3 released. Available in Maven Central. +++
+++ 2013-11-10: New wiki page: How to implement a custom modifier for release 1.1.1 +++
+++ 2013-03-10: Regular expression on InputStream: Differences to Java Regex explained +++
What it does
Streamflyer wraps Java's Reader and Writer to modify the characters in a stream: to apply regular expressions, to fix XML documents, or to do whatever else you need. It is a convenient alternative to Java's FilterReader and FilterInputStream.
Usage
An example:
```java
// choose the character stream to modify
Reader originalReader = ... // this reader is connected to the original data source
// select the modifier of your choice
Modifier myModifier = new RegexModifier("edit(\\s+)stream", Pattern.CASE_INSENSITIVE, "modify$1stream");
// create the modifying reader that wraps the original reader
Reader modifyingReader = new ModifyingReader(originalReader, myModifier);
... // use the modifying reader instead of the original reader
```
In this example, the chosen Modifier replaces "edit stream" with "modify stream". Because of the flag Pattern.CASE_INSENSITIVE and the capturing group (\s+), it also matches variants such as "Edit  Stream", and the replacement keeps the original white space between the two words. You can write your own custom modifier or use one of the modifiers shipped with Streamflyer, like the RegexModifier, which replaces characters by using regular expressions.
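Since RegexModifier is built on Java's Regex package, you can preview what the pattern, flag, and replacement from the example above do by applying them to an in-memory String with java.util.regex. This is only a sketch for experimenting with the expression; it does not use Streamflyer, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class RegexPreview {

    // Hypothetical helper: same regex, flag and replacement as in the RegexModifier example,
    // applied to a String instead of a stream
    static String preview(String input) {
        return Pattern.compile("edit(\\s+)stream", Pattern.CASE_INSENSITIVE)
                .matcher(input)
                .replaceAll("modify$1stream"); // $1 re-inserts the captured white space
    }

    public static void main(String[] args) {
        System.out.println(preview("Please EDIT \t Stream now"));
    }
}
```

Note that the replacement text is literal, so a case-insensitive match such as "EDIT" still becomes lowercase "modify".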
The same can be done with a Writer instead of a Reader.
You can find more information about usage in the API documentation.
Implement custom modifiers
Read ImplementCustomModifier.
Compatibility to Java's Regular Expressions package
RegexModifier internally uses Java's Regex package. It therefore supports pattern flags, quantifiers, and capturing groups the same way Java does. Look-behind constructs are an exception; see the section Known limitations.
There is a small tutorial: AdvancedRegularExpressionsExample
Speed up your regular expressions
Have a look at streamflyer-regex-fast.
Fix invalid characters in XML streams
Sometimes you have to open XML documents that contain characters that are allowed in XML 1.1 but not in XML 1.0. And sometimes you have to open XML documents that contain characters that are forbidden entirely. For these kinds of documents, Streamflyer provides predefined modifiers so that the modified stream can be opened by standard XML parsers:
- InvalidXmlCharacterModifier - replaces the invalid characters
- XmlVersionModifier - fixes the XML version in the prolog of the XML stream
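To illustrate which characters count as invalid, here is a sketch that applies the XML 1.0 Char production (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]) to a String with java.util.regex. It is an assumption-based illustration, not the implementation of InvalidXmlCharacterModifier; replaceInvalid is a hypothetical helper.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XmlCharFilter {

    // Negated XML 1.0 "Char" production: matches every character NOT allowed in XML 1.0.
    // Java regex works on code points, so valid surrogate pairs (e.g. emoji) are kept,
    // while unpaired surrogates fall outside all ranges and are matched.
    private static final Pattern INVALID_XML_10 = Pattern.compile(
            "[^\\x09\\x0A\\x0D\\x20-\\x{D7FF}\\x{E000}-\\x{FFFD}\\x{10000}-\\x{10FFFF}]");

    // Hypothetical helper: replaces each invalid character with the given literal text
    static String replaceInvalid(String xml, String replacement) {
        return INVALID_XML_10.matcher(xml).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String broken = "<a>bad\u0001char</a>";
        System.out.println(replaceInvalid(broken, " ")); // prints <a>bad char</a>
    }
}
```

On a stream, the same task has to be done chunk by chunk, which is the kind of work the predefined modifiers take off your hands.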
Modify byte streams
Streamflyer does not support modifications of byte streams out of the box. But you can convert your byte stream to a character stream, wrap the character stream with a modifying character stream, and then convert the character stream back to a byte stream. Don't expect outstanding performance from this approach.
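The three conversion steps can be sketched with JDK classes only. To keep the example self-contained, a hypothetical UpperCaseReader (a plain FilterReader subclass) stands in for the modifying reader; with Streamflyer you would wrap the reader with a ModifyingReader instead. UTF-8 is assumed as the encoding on both sides.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

public class ByteStreamRoundTrip {

    // Hypothetical stand-in for a modifying reader: upper-cases every character
    static class UpperCaseReader extends FilterReader {
        UpperCaseReader(Reader in) { super(in); }

        @Override
        public int read() throws IOException {
            int c = super.read();
            return c == -1 ? -1 : Character.toUpperCase(c);
        }

        @Override
        public int read(char[] cbuf, int off, int len) throws IOException {
            int n = super.read(cbuf, off, len);
            for (int i = off; i < off + Math.max(n, 0); i++) {
                cbuf[i] = Character.toUpperCase(cbuf[i]);
            }
            return n;
        }
    }

    static byte[] modify(InputStream byteInput) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // 1. bytes -> characters, 2. wrap with the modifying reader, 3. characters -> bytes
        try (Reader modifying = new UpperCaseReader(
                     new InputStreamReader(byteInput, StandardCharsets.UTF_8));
             Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            char[] buf = new char[4096];
            int n;
            while ((n = modifying.read(buf, 0, buf.length)) != -1) {
                writer.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

The extra decoding and encoding of every byte is where the performance cost of this approach comes from.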
You can find examples for modifying both InputStream and OutputStream on the page HowToModifyByteStreams.
Download
Go to the Installation page to get the latest release. This page also provides the Maven coordinates, prerequisites, and information about dependencies on other libraries.
Known limitations
RegexModifier
Look-behind constructs
If your regular expression contains look-behind constructs like
- ^
- \b
- \B
- (?<=X)
- (?<!X)
then Streamflyer's behaviour (version 1.1.1) differs from the behaviour of Java's Regex package.
What exactly is the difference? Java's String.replaceAll() finds all matches in the original string and builds the modified string from them. Streamflyer, in contrast, looks for the next match, applies the replacement, and then looks for the next match behind the replacement. A look-behind construct can therefore see text that has already been replaced, which can lead to different results.
Examples:
Regex | Replacement | Input | Output (Java Regex) | Output (Streamflyer) |
---|---|---|---|---|
^a | (the empty string) | aaabb | aabb | bb |
(?<=foo)bar | foo | foobarbar | foofoobar | foofoofoo |
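The sequential strategy described above can be reproduced on an in-memory String to see where the table rows come from. This is a sketch under stated assumptions, not Streamflyer's implementation: Streamflyer works on a buffer of a stream, and details such as how ^ is treated at buffer boundaries may differ. replaceSequentially is a hypothetical helper.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SequentialReplace {

    // Hypothetical helper: find the next match, replace it in the text,
    // then continue searching behind the inserted replacement
    static String replaceSequentially(String input, String regex, String replacement) {
        Pattern pattern = Pattern.compile(regex);
        String text = input;
        int pos = 0;
        while (pos <= text.length()) {
            Matcher m = pattern.matcher(text);
            // find(int) resets the matcher, so look-behinds and ^ see the whole
            // (already modified) text, not only the part starting at pos
            if (!m.find(pos)) {
                break;
            }
            int start = m.start();
            int end = m.end();
            // expand $1 etc.: after the reset, appendReplacement copies text[0, start)
            // followed by the expanded replacement
            StringBuilder sb = new StringBuilder();
            m.appendReplacement(sb, replacement);
            String expanded = sb.substring(start);
            text = text.substring(0, start) + expanded + text.substring(end);
            pos = start + expanded.length();
            if (start == end) {
                pos++; // skip one character after an empty match to avoid an endless loop
            }
        }
        return text;
    }

    public static void main(String[] args) {
        // rows from the table above: regex, replacement, input
        String[][] rows = {
            {"^a", "", "aaabb"},
            {"(?<=foo)bar", "foo", "foobarbar"},
        };
        for (String[] row : rows) {
            String java = row[2].replaceAll(row[0], row[1]);
            String sequential = replaceSequentially(row[2], row[0], row[1]);
            System.out.println(row[0] + " -> Java: " + java + ", sequential: " + sequential);
        }
        // prints:
        // ^a -> Java: aabb, sequential: bb
        // (?<=foo)bar -> Java: foofoobar, sequential: foofoofoo
    }
}
```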
Streamflyer's behaviour is unexpected for Java users and might therefore be changed in the next major release. However, as long as nobody asks for a new release, no new major release is planned.
If you want to use look-behind constructs, keep in mind that in many cases you can replace them with other expressions. As Streamflyer reads the entire stream, look-behind constructs are of limited use anyway.
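As an illustration, the look-behind example from the table can be rewritten so that "foo" is consumed and written back: (?<=foo)bar with replacement foo becomes foobar with replacement foofoo. The sketch below compares both variants under Java's replaceAll and under a hypothetical sequential loop that models the strategy described above (literal replacements only). It assumes the rewrite is equivalent for this particular expression; it is not a general recipe.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookBehindRewrite {

    // Hypothetical helper: replace a match, then continue searching behind the
    // inserted replacement. The replacement is used literally ($1 is not expanded).
    static String replaceSequentially(String input, String regex, String literal) {
        Pattern pattern = Pattern.compile(regex);
        String text = input;
        int pos = 0;
        while (pos <= text.length()) {
            Matcher m = pattern.matcher(text);
            if (!m.find(pos)) {
                break;
            }
            text = text.substring(0, m.start()) + literal + text.substring(m.end());
            // m still refers to the previous text, so start()/end() remain valid here
            pos = m.start() + literal.length() + (m.start() == m.end() ? 1 : 0);
        }
        return text;
    }

    public static void main(String[] args) {
        String input = "foobarbar";
        // with look-behind: the results differ
        System.out.println(input.replaceAll("(?<=foo)bar", "foo"));           // foofoobar
        System.out.println(replaceSequentially(input, "(?<=foo)bar", "foo")); // foofoofoo
        // without look-behind: the results agree
        System.out.println(input.replaceAll("foobar", "foofoo"));             // foofoobar
        System.out.println(replaceSequentially(input, "foobar", "foofoo"));   // foofoobar
    }
}
```

Without a look-behind, a match can only depend on text at or after its start position, and that text has not been modified yet, so both strategies find the same matches.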
Boundary matcher \G
The boundary matcher that matches the end of the previous match (\G) is not supported yet.
XmlVersionModifier
This modifier does not work for XML documents with a prolog that contains more than 4096 characters.
Questions, Suggestions, Issues
Questions and suggestions are welcome and can be sent to the discussion group. Issues can be reported on the Issues page of this project.
Some answered questions can be found in the FAQ.
Please give me feedback of any kind. It is highly appreciated!
Future enhancements, third party modifiers
The next major release will change the behaviour of RegexModifier regarding look-behind constructs.
Please let us know if you have written a modifier that could be useful to others. Such modifiers could, for example:
- normalize Unicode, i.e. transform characters into their canonical composed or decomposed form
- include nested content, i.e. markup in the stream is replaced with the content of another stream which itself can contain such markup
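For the first idea, the JDK already offers java.text.Normalizer for Strings; the sketch below shows the canonical composed (NFC) and decomposed (NFD) forms it produces. The compose/decompose helpers are hypothetical, and a stream-based modifier would presumably also have to avoid splitting a base character from its combining marks at a buffer boundary.

```java
import java.text.Normalizer;

public class NormalizeDemo {

    // Hypothetical helper: canonical composition (NFC)
    static String compose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // Hypothetical helper: canonical decomposition (NFD)
    static String decompose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        String decomposed = "e\u0301"; // 'e' followed by COMBINING ACUTE ACCENT
        System.out.println(compose(decomposed).equals("\u00e9"));   // true: single code point
        System.out.println(decompose("\u00e9").equals(decomposed)); // true: two code points
    }
}
```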
If you find typos in the API documentation, please let me know.
Acknowledgments
The logo is based on drafts by K. Dabels.