Sparql Integrate

Sparql-based tool for the integration of heterogeneous data

License	Apache License 2.0
GroupId	org.aksw.sparql-integrate
ArtifactId	sparql-integrate-parent
Last Version	1.0.0
Release Date	Jun 19, 2019
Type	pom
Description	Sparql Integrate: Sparql-based tool for the integration of heterogeneous data
Project URL	https://github.com/SmartDataAnalytics/SparqlIntegrate
Source Code Management	https://github.com/SmartDataAnalytics/SparqlIntegrate

Download sparql-integrate-parent

Filename	Size
sparql-integrate-parent-1.0.0.pom	13 KB

How to add to project

Apache Maven

<!-- https://jarcasting.com/artifacts/org.aksw.sparql-integrate/sparql-integrate-parent/ -->
<dependency>
    <groupId>org.aksw.sparql-integrate</groupId>
    <artifactId>sparql-integrate-parent</artifactId>
    <version>1.0.0</version>
    <type>pom</type>
</dependency>

Gradle Groovy

// https://jarcasting.com/artifacts/org.aksw.sparql-integrate/sparql-integrate-parent/
implementation 'org.aksw.sparql-integrate:sparql-integrate-parent:1.0.0'

Gradle Kotlin

// https://jarcasting.com/artifacts/org.aksw.sparql-integrate/sparql-integrate-parent/
implementation ("org.aksw.sparql-integrate:sparql-integrate-parent:1.0.0")

Apache Buildr

'org.aksw.sparql-integrate:sparql-integrate-parent:pom:1.0.0'

Apache Ivy

<dependency org="org.aksw.sparql-integrate" name="sparql-integrate-parent" rev="1.0.0">
  <artifact name="sparql-integrate-parent" type="pom" />
</dependency>

Groovy Grape

@Grapes(
@Grab(group='org.aksw.sparql-integrate', module='sparql-integrate-parent', version='1.0.0')
)

Scala SBT

libraryDependencies += "org.aksw.sparql-integrate" % "sparql-integrate-parent" % "1.0.0"

Leiningen

[org.aksw.sparql-integrate/sparql-integrate-parent "1.0.0"]

Dependencies

There are no dependencies for this project. It is a standalone project that does not depend on any other jars.

Project Modules

sparql-integrate-cli
sparql-integrate-debian-cli
sparql-integrate-web-service

RDF Processing Toolkit

RDF/SPARQL workflows on the command line made easy. The toolkit provides the following commands for running SPARQL queries on triple- and quad-based data:

sparql-integrate: Ad-hoc querying and transformation of datasets, featuring SPARQL extensions for CSV, XML and JSON processing, plus JSON output that makes it easy to build bash pipes.
ngs: Processor for named graph streams (ngs), which enables processing collections of named graphs in a streaming fashion. Process huge datasets without running into memory issues.

Example Usage

sparql-integrate can load multiple RDF files and run multiple queries on them in a single invocation. In addition, prefixes from a snapshot of prefix.cc are predefined, and the SELECT keyword of SPARQL is optional in order to make scripting less verbose. The --jq flag enables JSON output for interoperability with the conventional jq tool.

sparql-integrate loadFile.rdf update.sparql loadAnotherFile.rdf query.sparql

sparql-integrate --jq file.ttl '?s { ?s a foaf:Person }' | jq '.[].s'
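
As a rough illustration of the shorthand, the query in the second command could also be spelled out in full SPARQL. This is a sketch under assumptions: the QUERY variable is only illustrative, the foaf namespace shown is the standard FOAF one that the predefined prefix is assumed to resolve to, and the invocation is left commented out because it needs an installed sparql-integrate and jq.

```shell
# Longhand form of the shorthand query '?s { ?s a foaf:Person }'.
# QUERY is an illustrative variable name, not something the tool requires.
QUERY='PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?s WHERE { ?s a foaf:Person }'

# Assumed to behave like the shorthand invocation:
# sparql-integrate --jq file.ttl "$QUERY" | jq '.[].s'
```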

ngs adapts well-known bash tooling such as head, tail and wc to named graphs instead of lines of text.

# Group RDF into graphs based on consecutive subjects and, for each named graph, count the number of triples
cat file.ttl | ngs subjects | ngs map --sparql 'CONSTRUCT { ?s eg:triples ?c } { SELECT ?s (COUNT(*) AS ?c) { ?s ?p ?o } GROUP BY ?s }'

# Count number of named graphs
ngs wc file.trig

# Output the first 3 graphs produced by another command
./produce-graphs.sh | ngs head -n 3
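
To make the analogy with head and wc concrete, the following sketch mimics graph-wise head and wc on plain text using awk. It is not how ngs is implemented. It assumes simplified N-Quads input with one quad per line, no whitespace inside terms, the graph name as the fourth field, and all quads of a graph on consecutive lines. The function names graph_head and graph_wc are hypothetical.

```shell
#!/bin/sh
# Illustration only: this does NOT reflect the ngs implementation.
# Assumed input: simplified N-Quads, one quad per line, terms without
# spaces, graph name in field 4, quads of one graph on consecutive lines.

# graph_head N: print all quads of the first N graphs read from stdin.
graph_head() {
  awk -v n="$1" '
    $4 != prev { seen++; prev = $4 }  # a new graph starts when field 4 changes
    seen > n   { exit }               # stop as soon as graph N+1 begins
    { print }
  '
}

# graph_wc: count graphs, i.e. runs of consecutive lines sharing field 4.
graph_wc() {
  awk '$4 != prev { seen++; prev = $4 } END { print seen + 0 }'
}
```

For example, `graph_head 2 < data.nq` (data.nq being a hypothetical input file) would print every quad of the first two graphs, however many lines that spans, whereas `head -n 2` would stop after two lines.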

Example Use Cases

Lodservatory implements SPARQL endpoint monitoring and uses these tools in a script called from a git action.
Linked Sparql Queries provides tools to RDFize SPARQL query logs and run benchmarks on the resulting RDF. The triples related to a query represent an instance of a sophisticated domain model and are grouped in a named graph. Depending on the input size, one can end up with millions of named graphs describing queries, amounting to billions of triples. With ngs one can easily extract complete samples of the queries' models without a related triple being left behind.

Building

The build requires Maven.

mvn clean install

The all-in-one jar is built in the rdf-processing-toolkit-bundle folder; it is the same jar file that is available in the Releases section.

java -cp rdf-processing-toolkit-bundle/target/rdf-processing-toolkit-bundle-VERSION-jar-with-dependencies.jar rpt
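
Because VERSION depends on the build, a small helper can locate the jar instead of hard-coding the file name. The following is a minimal sketch, assuming the folder layout and file name pattern from the command above. find_bundle_jar is a hypothetical helper and not part of the toolkit.

```shell
#!/bin/sh
# Sketch: locate the bundle jar under a checkout root (default: current dir).
# find_bundle_jar is a hypothetical helper, not shipped with the toolkit.
find_bundle_jar() {
  root="${1:-.}"
  for jar in "$root"/rdf-processing-toolkit-bundle/target/rdf-processing-toolkit-bundle-*-jar-with-dependencies.jar; do
    # An unmatched glob stays literal, so check that the file really exists.
    if [ -f "$jar" ]; then
      printf '%s\n' "$jar"
      return 0
    fi
  done
  echo "bundle jar not found; run 'mvn clean install' first" >&2
  return 1
}

# Usage, with the main class name taken from the command above:
#   java -cp "$(find_bundle_jar)" rpt
```

If several versions have been built, this sketch simply returns the first match in glob order.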

Installing the Debian packages can be easily accomplished using:

sudo dpkg -i $(find . -name "rdf-processing-toolkit*.deb")

The bare-metal approach is to manually start the tool from the 'rdf-processing-toolkit-cli/target' folder using:

java -cp ".:lib/*" "-Dloader.main=org.aksw.rdf_processing_toolkit.cli.main.MainCliRdfProcessingToolkit" "org.springframework.boot.loader.PropertiesLauncher" "your" "args"

License

The source code of this repo is published under the Apache License Version 2.0. Dependencies may be licensed under different terms. When in doubt please refer to the licenses of the dependencies declared in the pom.xml files.

Acknowledgements

This project is developed with funding from the QROWD H2020 project. Visit the QROWD GitHub Organization for more Open Source tools!

Smart Data Analytics

Software Projects by the Smart Data Analytics (SDA) Research Group - the README project contains links to other core projects

Versions

Version
1.0.0 Jun 19, 2019
