spark-mapper


License: (not listed)
GroupId: com.github.log0ymxm
ArtifactId: spark-mapper_2.11
Last Version: 1.0.0
Release Date: (not listed)
Type: jar
Description: spark-mapper
Project URL: https://github.com/log0ymxm/spark-mapper
Project Organization: com.github.log0ymxm
Source Code Management: https://github.com/log0ymxm/spark-mapper

How to add to project

Maven:

<!-- https://jarcasting.com/artifacts/com.github.log0ymxm/spark-mapper_2.11/ -->
<dependency>
    <groupId>com.github.log0ymxm</groupId>
    <artifactId>spark-mapper_2.11</artifactId>
    <version>1.0.0</version>
</dependency>

Gradle (Groovy DSL):

// https://jarcasting.com/artifacts/com.github.log0ymxm/spark-mapper_2.11/
implementation 'com.github.log0ymxm:spark-mapper_2.11:1.0.0'

Gradle (Kotlin DSL):

// https://jarcasting.com/artifacts/com.github.log0ymxm/spark-mapper_2.11/
implementation ("com.github.log0ymxm:spark-mapper_2.11:1.0.0")

Buildr:

'com.github.log0ymxm:spark-mapper_2.11:jar:1.0.0'

Ivy:

<dependency org="com.github.log0ymxm" name="spark-mapper_2.11" rev="1.0.0">
  <artifact name="spark-mapper_2.11" type="jar" />
</dependency>

Grape:

@Grapes(
@Grab(group='com.github.log0ymxm', module='spark-mapper_2.11', version='1.0.0')
)

SBT:

libraryDependencies += "com.github.log0ymxm" % "spark-mapper_2.11" % "1.0.0"

Leiningen:

[com.github.log0ymxm/spark-mapper_2.11 "1.0.0"]

Dependencies

compile (6)

Group / Artifact                        Type   Version
org.scala-lang : scala-library          jar    2.11.8
org.apache.spark : spark-core_2.11      jar    2.1.0
org.apache.spark : spark-mllib_2.11     jar    2.1.0
org.apache.spark : spark-graphx_2.11    jar    2.1.0
org.apache.spark : spark-sql_2.11       jar    2.1.0
org.scalanlp : breeze_2.11              jar    0.12

test (3)

Group / Artifact                             Type   Version
com.holdenkarau : spark-testing-base_2.11    jar    2.1.0_0.6.0
org.scalatest : scalatest_2.11               jar    2.2.4
org.specs2 : specs2-core_2.11                jar    3.8.7

Project Modules

There are no modules declared in this project.

Spark Mapper

Mapper is a topological data analysis technique for estimating a lower-dimensional simplicial complex from a dataset. It was initially described in the paper "Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition" [1].
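
To make the idea concrete, here is a minimal single-machine sketch of the Mapper construction in plain Python. It is not spark-mapper's API: the function names (`interval_cover`, `single_linkage`, `mapper`), the parameters, and the choice of a fixed distance cutoff for clustering are all assumptions made for illustration. The sketch applies a one-dimensional filter function, covers its range with overlapping intervals, clusters the points falling in each interval, and connects clusters that share points.

```python
import math
from itertools import combinations


def interval_cover(lo, hi, n_intervals, overlap):
    """Cover [lo, hi] with n equal-length intervals; neighbours share `overlap` of their length."""
    # n intervals of length L stepping by L * (1 - overlap) span L * (n - (n - 1) * overlap).
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap)
    step = length * (1 - overlap)
    return [(lo + i * step, lo + i * step + length) for i in range(n_intervals)]


def single_linkage(points, idx, cutoff):
    """Group the indices in idx into connected components, linking points within `cutoff`."""
    parent = {i: i for i in idx}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(idx, 2):
        if math.dist(points[i], points[j]) <= cutoff:
            parent[find(i)] = find(j)

    clusters = {}
    for i in idx:
        clusters.setdefault(find(i), set()).add(i)
    return list(clusters.values())


def mapper(points, filter_fn, n_intervals=4, overlap=0.5, cutoff=0.5):
    """Return (nodes, edges): nodes are sets of point indices, edges link nodes sharing a point."""
    values = [filter_fn(p) for p in points]
    nodes = []
    for a, b in interval_cover(min(values), max(values), n_intervals, overlap):
        # Small tolerance so floating-point rounding does not drop the extreme points.
        idx = [i for i, v in enumerate(values) if a - 1e-9 <= v <= b + 1e-9]
        nodes.extend(single_linkage(points, idx, cutoff))
    edges = [(s, t) for s, t in combinations(range(len(nodes)), 2) if nodes[s] & nodes[t]]
    return nodes, edges


# Example: 40 points on the unit circle, filtered by their x-coordinate.
circle = [(math.cos(2 * math.pi * i / 40), math.sin(2 * math.pi * i / 40)) for i in range(40)]
nodes, edges = mapper(circle, lambda p: p[0], n_intervals=4, overlap=0.5, cutoff=0.3)
print(len(nodes), "nodes,", len(edges), "edges")
```

With these settings the circle comes back as a closed loop of nodes, which is the kind of shape summary Mapper is meant to recover. The cover resolution, overlap, and cutoff all influence the result, so they usually need tuning per dataset.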

[Figures: Mapper output for "Concentric Circles" and "MNIST Twos"]

Things to do

  • Improve the handling of pairwise distances. This is likely the largest bottleneck for large datasets.
  • Implement some useful filter functions: Gaussian density, graph Laplacian, etc.
  • Implement different methods for choosing the cluster cutoff. There are a few simple ones we can try, as well as the scale graph idea.
  • Explore using a distributed clustering algorithm. Currently clustering is local to each cover segment, which means that as the data grows you need to increase the number of cover intervals proportionally to keep each partition within memory. A distributed clustering algorithm would remove this requirement.
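
As an illustration of the first two items, a Gaussian density filter could look roughly like the plain-Python sketch below. This is not code from spark-mapper: the function name `gaussian_density`, the bandwidth parameter `sigma`, and the unnormalised kernel are assumptions for illustration. The sketch also shows why pairwise distances become a bottleneck, since it evaluates a distance for every pair of points.

```python
import math


def gaussian_density(points, sigma=1.0):
    """Unnormalised Gaussian kernel density at each point.

    Evaluates all n * n pairwise distances, so the cost grows quadratically with n.
    """
    two_sigma_sq = 2.0 * sigma * sigma
    return [
        sum(math.exp(-math.dist(p, q) ** 2 / two_sigma_sq) for q in points)
        for p in points
    ]


# Three nearby points and one far-away point: the isolated point scores lowest.
sample = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
density = gaussian_density(sample, sigma=1.0)
print(density)
```

Each returned value can be used directly as the filter value of the corresponding point. A smaller `sigma` makes the filter more sensitive to local structure, while a larger one smooths it out.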

Related Software

References

  1. G. Singh, F. Mémoli, G. Carlsson (2007). Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition, Point Based Graphics 2007, Prague, September 2007.
  2. Daniel Müllner and Aravindakshan Babu, Python Mapper: An open-source toolchain for data exploration, analysis and visualization, 2013, URL http://danifold.net/mapper

Versions

Version
1.0.0