annoy4s
A JNA wrapper around spotify/annoy which calls the C++ library of annoy directly from Scala/JVM.
Installation
For linux-x86-64 or Mac users, just add the library directly as:
libraryDependencies += "net.pishen" %% "annoy4s" % "0.10.0"
If you encounter an error like the one below when using annoy4s, you may have to compile the native library yourself.
java.lang.UnsatisfiedLinkError: Unable to load library 'annoy': Native library
To compile the native library and install annoy4s on your local machine:
- Clone this repository.
- Check the values of organization and version in build.sbt. You may change them to the values you want; it is recommended to give version the -SNAPSHOT suffix.
- Run compileNative in sbt (note that g++ must be installed).
- Run test in sbt to check that the native library was compiled successfully.
- Run publishLocal in sbt to install annoy4s on your machine.
Now you can add the library dependency as (organization and version may be different according to your settings):
libraryDependencies += "net.pishen" %% "annoy4s" % "0.10.0-SNAPSHOT"
The library file generated by the g++ command in compileNative can also be installed independently on your machine. Please refer to the JNA library search paths for more details on how to make JNA able to load the library.
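As a minimal sketch of one such search path, JNA consults the jna.library.path system property when it looks for a native library. The build.sbt fragment below passes that property to a forked JVM. The directory /opt/annoy/lib is a placeholder, not something this project defines; replace it with wherever you copied the compiled library file.

```scala
// build.sbt (fragment)

// javaOptions only apply to forked JVMs, so fork the run task.
run / fork := true

// Tell JNA where to look for the native annoy library.
// "/opt/annoy/lib" is a placeholder directory (assumption).
run / javaOptions += "-Djna.library.path=/opt/annoy/lib"
```

If you start the JVM without sbt, the same property can be passed on the command line as -Djna.library.path=<directory>.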
Usage
Create and query the index in memory mode:
import annoy4s._
val annoy = Annoy.create[Int]("./input_vectors", numOfTrees = 10, metric = Euclidean, verbose = true)
val result: Option[Seq[(Int, Float)]] = annoy.query(itemId, maxReturnSize = 30)
- The format of ./input_vectors is <item id> <vector> for each line. Here is an example:
3 0.2 -1.5 0.3
5 0.4 0.01 -0.5
0 1.1 0.9 -0.1
2 1.2 0.8 0.2
- <item id> could be Int, Long, String, or UUID; just change the type parameter at Annoy.create[T]. You can also implement a KeyConverter[T] yourself to support your own type.
- metric could be Euclidean, Angular, Manhattan, or Hamming.
- result is a list of (id, distance) tuples, which includes the query item itself.
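If your vectors live in memory, the input file can be produced with plain Scala. The sketch below is only an illustration: the object name WriteInputVectors, the helper formatLine, and the sample data are hypothetical, and it assumes that components are separated by single spaces, as in the example above.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

object WriteInputVectors {
  // Render one item as "<item id> <vector>", components separated by single spaces
  // (separator assumed from the example file above).
  def formatLine(id: Int, vector: Seq[Float]): String =
    (id.toString +: vector.map(_.toString)).mkString(" ")

  def main(args: Array[String]): Unit = {
    // Hypothetical sample data: item id -> vector
    val items: Seq[(Int, Seq[Float])] = Seq(
      3 -> Seq(0.2f, -1.5f, 0.3f),
      5 -> Seq(0.4f, 0.01f, -0.5f)
    )
    // One item per line, written to the path used in the usage example.
    val content = items.map { case (id, v) => formatLine(id, v) }.mkString("\n")
    Files.write(Paths.get("./input_vectors"), content.getBytes(StandardCharsets.UTF_8))
  }
}
```

Note that Float.toString switches to scientific notation for very small or very large values; whether the input parser accepts that form is not confirmed here, so you may want to format such values explicitly.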
To use the index in disk mode, you need to provide an outputDir:
val annoy = Annoy.create[Int]("./input_vectors", 10, outputDir = "./annoy_result/", Euclidean)
val result: Option[Seq[(Int, Float)]] = annoy.query(itemId, maxReturnSize = 30)
annoy.close()
// load a previously created index
val reloadedAnnoy = Annoy.load[Int]("./annoy_result/")
val reloadedResult: Option[Seq[(Int, Float)]] = reloadedAnnoy.query(itemId, 30)
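Because the result includes the query item itself and is wrapped in an Option, a small helper can make it easier to consume. The following is a sketch that works on the result type alone and does not call annoy4s; the names QueryResults and neighborsOnly are hypothetical, and treating None as "no neighbors" is an assumption about how you want to handle a missing result.

```scala
object QueryResults {
  // Drop the query item from a result shaped like the one returned by query.
  // A missing result (None) is treated as an empty list (assumption).
  def neighborsOnly(itemId: Int, result: Option[Seq[(Int, Float)]]): Seq[(Int, Float)] =
    result.getOrElse(Seq.empty).filterNot { case (id, _) => id == itemId }
}
```

For example, QueryResults.neighborsOnly(itemId, result) applied to the values from the usage above yields only the other items and their distances.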