Scala Burst Trie

Scala implementation of a Burst Trie

License	Apache 2
Categories	Scala Languages
GroupId	com.nefariouszhen.trie
ArtifactId	scala-burst-trie_2.9.3
Last Version	0.1
Release Date	Apr 4, 2014
Type	jar
Description	Scala Burst Trie: Scala implementation of a Burst Trie
Project URL	https://github.com/nbauernfeind/scala-burst-trie
Project Organization	com.nefariouszhen.trie
Source Code Management	https://github.com/nbauernfeind/scala-burst-trie

Download scala-burst-trie_2.9.3

Filename	Size
scala-burst-trie_2.9.3-0.1.pom
scala-burst-trie_2.9.3-0.1.jar	48 KB
scala-burst-trie_2.9.3-0.1-sources.jar	2 KB
scala-burst-trie_2.9.3-0.1-javadoc.jar	249 KB

How to add to project

Apache Maven

<!-- https://jarcasting.com/artifacts/com.nefariouszhen.trie/scala-burst-trie_2.9.3/ -->
<dependency>
    <groupId>com.nefariouszhen.trie</groupId>
    <artifactId>scala-burst-trie_2.9.3</artifactId>
    <version>0.1</version>
</dependency>

Gradle Groovy

// https://jarcasting.com/artifacts/com.nefariouszhen.trie/scala-burst-trie_2.9.3/
implementation 'com.nefariouszhen.trie:scala-burst-trie_2.9.3:0.1'

Gradle Kotlin

// https://jarcasting.com/artifacts/com.nefariouszhen.trie/scala-burst-trie_2.9.3/
implementation ("com.nefariouszhen.trie:scala-burst-trie_2.9.3:0.1")

Apache Buildr

'com.nefariouszhen.trie:scala-burst-trie_2.9.3:jar:0.1'

Apache Ivy

<dependency org="com.nefariouszhen.trie" name="scala-burst-trie_2.9.3" rev="0.1">
  <artifact name="scala-burst-trie_2.9.3" type="jar" />
</dependency>

Groovy Grape

@Grapes(
@Grab(group='com.nefariouszhen.trie', module='scala-burst-trie_2.9.3', version='0.1')
)

Scala SBT

libraryDependencies += "com.nefariouszhen.trie" % "scala-burst-trie_2.9.3" % "0.1"

Leiningen

[com.nefariouszhen.trie/scala-burst-trie_2.9.3 "0.1"]

Dependencies

compile (1)

Group / Artifact	Type	Version
org.scala-lang : scala-library	jar	2.9.3

Project Modules

There are no modules declared in this project.

Scala Burst Trie

This is an implementation of Burst Tries, enhanced with techniques used in GWT's implementation of PrefixTree. I used it for level 3 of Stripe's CTF 3, in my fastest multi-host solution.

Maven Setup

<dependency>
  <groupId>com.nefariouszhen.trie</groupId>
  <artifactId>scala-burst-trie_${scala.binary.version}</artifactId>
  <version>0.2</version>
</dependency>

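Note: This artifact is cross-compiled against multiple Scala versions and follows the standard Scala naming convention: the artifact ID carries the Scala binary version as a suffix, so scala.binary.version must be set to a version the artifact was published for. This page lists only version 0.1, built for Scala 2.9.3.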
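For sbt users, the same naming convention can be expressed with `%%`, which appends the Scala binary-version suffix taken from `scalaVersion`. This is a build-file sketch, assuming a Scala version for which the artifact was actually published; for Scala versions before 2.10, sbt uses the full version as the suffix, which matches the `_2.9.3` artifact listed on this page.

```scala
// build.sbt (sketch): %% derives the artifact suffix from scalaVersion.
scalaVersion := "2.9.3"

// Resolves to com.nefariouszhen.trie:scala-burst-trie_2.9.3:0.1
libraryDependencies += "com.nefariouszhen.trie" %% "scala-burst-trie" % "0.1"
```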

Getting Started

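import com.nefariouszhen.trie.BurstTrie // package assumed to match the groupId

// Index every item under the string key produced by getKey.
def indexContent[T](getKey: T => String, content: Iterable[T]): BurstTrie[T] = {
  val trie = BurstTrie.newMap[T]()
  content.foreach(c => trie.put(getKey(c), c))
  trie
}

// Print every indexed item whose key starts with prefixToFind.
def queryContent[T](prefixToFind: String, trie: BurstTrie[T]): Unit = {
  trie.query(prefixToFind).foreach(println)
}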

The library also provides the factory methods newMultiMap[T], newSuffixMap[T], newSet, and newSuffixSet.

Parameterization

The burstFactor (default = 10000, must be >= 0) is the number of entries a container node (i.e. a leaf) may hold before it is converted into an access node, which splits the container's entries across many new containers. Note that a traditional trie has a burstFactor of 0 (i.e. no container nodes).
The growthFactor (default = 2, must be > 0) controls how quickly the length of the prefix chunk grows with depth. Each depth of the tree strips a fixed number of characters off the key and uses them as a key slice into that node's internal structure(s). GWT's implementation uses a fixed factor of 2; a traditional trie uses a fixed factor of 1.
The allowDuplicateKeys flag (default = true) applies to the multiMap variants and determines whether inserting an identical (key, value) pair a second time stores an additional entry. It defaults to true because turning it off has a performance cost: to prevent duplicates, internal nodes store their values in a HashSet instead of an Array. Turning this feature off is therefore much more costly than leaving duplicate prevention to the caller that builds the index.
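To make burstFactor and growthFactor concrete, here is a simplified, self-contained sketch of the burst trie technique. It is not this library's code: `SketchBurstTrie`, `put`, and `query` are hypothetical names, and the sketch assumes that the key slice at depth d has length growthFactor^d (one character at the root), which is one reading of the description above. A container bursts once it holds more than burstFactor entries. Values whose key ends exactly at an access node are kept in a separate buffer, a design choice of this sketch.

```scala
import scala.collection.mutable

// Hypothetical, simplified burst trie (illustration only, not the library's code).
// Assumption: the key slice at depth d has length growthFactor^d (1 char at the root).
final class SketchBurstTrie[V](burstFactor: Int = 10000, growthFactor: Int = 2) {
  require(burstFactor >= 0, "burstFactor must be >= 0")
  require(growthFactor > 0, "growthFactor must be > 0")

  private sealed trait Node
  // Container (leaf): an unsorted bucket of (remaining key suffix, value) pairs.
  private final class Container extends Node {
    val entries = mutable.ArrayBuffer.empty[(String, V)]
  }
  // Access node: values whose key ends here, plus children keyed by a key slice.
  private final class Access extends Node {
    val here = mutable.ArrayBuffer.empty[V]
    val children = mutable.HashMap.empty[String, Node]
  }

  private var root: Node = new Container

  // Slice length for the next depth, clamped to avoid Int overflow.
  private def nextLen(len: Int): Int =
    if (len > Int.MaxValue / growthFactor) Int.MaxValue else len * growthFactor

  def put(key: String, value: V): Unit = root = insert(root, key, value, 1)

  // Inserts (rest -> value) below `node`; returns the node, which is new if it burst.
  private def insert(node: Node, rest: String, value: V, sliceLen: Int): Node = node match {
    case c: Container =>
      c.entries += ((rest, value))
      if (c.entries.size > burstFactor) burst(c, sliceLen) else c
    case a: Access =>
      if (rest.isEmpty) a.here += value
      else {
        val slice = rest.take(sliceLen)
        val child = a.children.getOrElse(slice, new Container)
        a.children(slice) = insert(child, rest.drop(sliceLen), value, nextLen(sliceLen))
      }
      a
  }

  // Burst: replace an overfull container by an access node that redistributes
  // its entries into child containers (which may burst in turn).
  private def burst(c: Container, sliceLen: Int): Access = {
    val a = new Access
    for ((rest, value) <- c.entries) insert(a, rest, value, sliceLen)
    a
  }

  // All values whose key starts with `prefix`, in no particular order.
  def query(prefix: String): Seq[V] = {
    val out = mutable.ArrayBuffer.empty[V]
    find(root, prefix, 1, out)
    out.toSeq
  }

  private def find(node: Node, q: String, sliceLen: Int, out: mutable.ArrayBuffer[V]): Unit =
    node match {
      case c: Container =>
        for ((rest, v) <- c.entries if rest.startsWith(q)) out += v
      case a: Access if q.isEmpty =>
        collectAll(a, out)
      case a: Access if q.length >= sliceLen =>
        a.children.get(q.take(sliceLen)).foreach(find(_, q.drop(sliceLen), nextLen(sliceLen), out))
      case a: Access =>
        // The query ends inside this slice: every child whose slice extends q matches.
        for ((slice, child) <- a.children if slice.startsWith(q)) collectAll(child, out)
    }

  private def collectAll(node: Node, out: mutable.ArrayBuffer[V]): Unit = node match {
    case c: Container => out ++= c.entries.map(_._2)
    case a: Access =>
      out ++= a.here
      a.children.values.foreach(collectAll(_, out))
  }
}

// Usage: a small burstFactor forces bursts even on a tiny input.
val trie = new SketchBurstTrie[String](burstFactor = 2)
Seq("car", "cart", "carbon", "cat", "dog").foreach(w => trie.put(w, w))
println(trie.query("car").sorted.mkString(", ")) // car, carbon, cart
println(trie.query("ca").size) // 4
```

With growthFactor = 1 every level consumes one character, as in a traditional trie; with burstFactor = 0 every container bursts as soon as it receives an entry, so the structure degenerates into access nodes only.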

Suffix Trees

val suffixMap = BurstTrie.newSuffixMap[T]()

Suffix maps let you find content by any substring of its key very efficiently. However, the traditional use case, indexing word positions in documents, needs to be implemented with care. Specifically, don't map each key directly to a position. Instead, map each key to an object that represents the word and holds all of its positions. That way each position is stored once rather than once for every suffix, which reduces the memory needed for positions by a factor of roughly key.length.
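The following sketch illustrates the recommended pattern without depending on the library: a sorted `TreeMap` stands in for the suffix map, and `WordEntry` and `SuffixIndex` are hypothetical names. Every suffix of a word points at the same `WordEntry` object, so a word's positions live in one place no matter how many suffixes reference it. A substring query becomes a prefix scan over the sorted suffixes.

```scala
import scala.collection.immutable.TreeMap
import scala.collection.mutable

// Hypothetical record: one object per distinct word, holding all of its positions.
final class WordEntry(val word: String) {
  val positions = mutable.ArrayBuffer.empty[Int]
}

// Stand-in for a suffix map: every suffix of a word maps to the SAME WordEntry,
// so positions are stored once rather than once per suffix.
final class SuffixIndex {
  private val entries = mutable.HashMap.empty[String, WordEntry]
  private var suffixes = TreeMap.empty[String, List[WordEntry]]

  def addOccurrence(word: String, position: Int): Unit = {
    val entry = entries.getOrElseUpdate(word, {
      val e = new WordEntry(word)
      // Index each suffix exactly once, when the word is first seen.
      for (i <- 0 until word.length) {
        val s = word.substring(i)
        suffixes = suffixes.updated(s, e :: suffixes.getOrElse(s, Nil))
      }
      e
    })
    entry.positions += position
  }

  // Words containing `sub`: suffixes starting with `sub` are contiguous in sorted order.
  def containing(sub: String): Set[WordEntry] =
    suffixes.iteratorFrom(sub)
      .takeWhile { case (s, _) => s.startsWith(sub) }
      .flatMap(_._2)
      .toSet
}

// Usage: index word positions in a short text.
val index = new SuffixIndex
"the cat sat on the mat".split(" ").zipWithIndex.foreach { case (w, i) => index.addOccurrence(w, i) }
index.containing("at").map(_.word).toList.sorted.foreach(println) // prints cat, mat, sat (one per line)
println(index.containing("he").head.positions.mkString(",")) // 0,4
```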

Thread-Safety Warning

Don't use multithreaded writers, and don't read while you're writing. The code is currently not safe for such operations. If you'd be interested in a thread-safe implementation of this, please let me know and I can try to work something out.
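If a single structure must be shared across threads anyway, one option is to guard it externally with a read-write lock, so that writers are serialized and no read overlaps a write. This is a hypothetical wrapper (`Guarded` is not part of the library), and allowing several concurrent readers assumes that queries do not mutate the trie internally; if that is not guaranteed, route reads through `write` as well. A mutable `HashMap` stands in for the trie in the usage lines.

```scala
import java.util.concurrent.locks.ReentrantReadWriteLock
import scala.collection.mutable

// Hypothetical wrapper: serializes writers and keeps readers out during a write.
final class Guarded[A](underlying: A) {
  private val lock = new ReentrantReadWriteLock()

  // Many readers may hold the read lock at once, but never alongside a writer.
  def read[R](f: A => R): R = {
    lock.readLock().lock()
    try f(underlying) finally lock.readLock().unlock()
  }

  // Only one writer at a time, and no concurrent readers.
  def write[R](f: A => R): R = {
    lock.writeLock().lock()
    try f(underlying) finally lock.writeLock().unlock()
  }
}

// Usage: four threads insert distinct keys through the write lock.
val guarded = new Guarded(mutable.HashMap.empty[String, Int])
val threads = (1 to 4).map { t =>
  new Thread(() => (1 to 1000).foreach(i => guarded.write(_.update(s"$t-$i", i))))
}
threads.foreach(_.start())
threads.foreach(_.join())
println(guarded.read(_.size)) // 4000
```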

Versions

Version
0.1 Apr 4, 2014
