repodriller

Framework for researchers in MSR

GroupId: org.repodriller
ArtifactId: repodriller
Last Version: 2.0.1
Type: jar
Description: Framework for researchers in MSR
Project URL: http://github.com/mauricioaniche/repodriller
Source Code Management: http://github.com/mauricioaniche/repodriller

How to add to project

Maven:

<!-- https://jarcasting.com/artifacts/org.repodriller/repodriller/ -->
<dependency>
    <groupId>org.repodriller</groupId>
    <artifactId>repodriller</artifactId>
    <version>2.0.1</version>
</dependency>

Gradle (Groovy DSL):

// https://jarcasting.com/artifacts/org.repodriller/repodriller/
implementation 'org.repodriller:repodriller:2.0.1'

Gradle (Kotlin DSL):

// https://jarcasting.com/artifacts/org.repodriller/repodriller/
implementation ("org.repodriller:repodriller:2.0.1")

Buildr:

'org.repodriller:repodriller:jar:2.0.1'

Ivy:

<dependency org="org.repodriller" name="repodriller" rev="2.0.1">
  <artifact name="repodriller" type="jar" />
</dependency>

Grape:

@Grapes(
    @Grab(group='org.repodriller', module='repodriller', version='2.0.1')
)

sbt:

libraryDependencies += "org.repodriller" % "repodriller" % "2.0.1"

Leiningen:

[org.repodriller/repodriller "2.0.1"]

Dependencies

compile (9)

Group / Artifact                             Type  Version
com.google.guava : guava                     jar   18.0
org.apache.logging.log4j : log4j-slf4j-impl  jar   2.10.0
org.slf4j : slf4j-api                        jar   1.7.25
org.apache.logging.log4j : log4j-core        jar   2.10.0
org.apache.commons : commons-lang3           jar   3.3.2
commons-io : commons-io                      jar   2.4
org.eclipse.jgit : org.eclipse.jgit          jar   4.8.0.201706111038-r
com.thoughtworks.xstream : xstream           jar   1.4.7
org.tmatesoft.svnkit : svnkit                jar   1.8.10

test (2)

Group / Artifact           Type  Version
org.mockito : mockito-all  jar   1.10.8
junit : junit              jar   4.12-beta-3

Project Modules

There are no modules declared in this project.

(Before looking into RepoDriller, I suggest you check out PyDriller, a Python version of RepoDriller, which is now faster and easier to use! I am keeping this repo here for historical purposes, but I don't plan to update it anymore.)

RepoDriller

RepoDriller is a Java framework that helps developers mine software repositories. With it, you can easily extract information from any Git repository, such as commits, developers, modifications, diffs, and source code, and quickly export it to CSV files.
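To make the commit-to-CSV idea concrete, here is a minimal sketch that uses only the Java standard library. The names `CommitInfo` and `CommitCsvExporter` are hypothetical and invented for this illustration; they are not RepoDriller classes, and the real API is described in the manual folder.

```java
import java.util.List;

// Hypothetical stand-in for the per-commit data a mining tool might expose.
// This is NOT a RepoDriller class.
final class CommitInfo {
    final String hash;
    final String author;
    final String message;

    CommitInfo(String hash, String author, String message) {
        this.hash = hash;
        this.author = author;
        this.message = message;
    }
}

// Hypothetical helper that turns a list of commits into CSV text.
final class CommitCsvExporter {

    // Quote a field when it contains a comma, a double quote, or a line break,
    // doubling any embedded double quotes (the usual RFC 4180 convention).
    static String escape(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Emit a header row followed by one row per commit.
    static String toCsv(List<CommitInfo> commits) {
        StringBuilder out = new StringBuilder("hash,author,message\n");
        for (CommitInfo c : commits) {
            out.append(escape(c.hash)).append(',')
               .append(escape(c.author)).append(',')
               .append(escape(c.message)).append('\n');
        }
        return out.toString();
    }
}
```

Commit messages often contain commas and quotes, which is why the sketch escapes every field rather than joining strings directly.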

Take a look at our manual folder and our many examples. You can also reach us on our mailing list.

Advice to researchers

Difficulties in mining git

You should read this paper:

  • Bird, Christian, et al. "The promises and perils of mining git." Mining Software Repositories, 2009. MSR'09. 6th IEEE International Working Conference on. IEEE, 2009.

FAQs

Why use an MSR framework?

There's no question that Mining Software Repositories (MSR) studies benefit from automation. The datasets are too large to analyze manually.

So the choice is whether to use an MSR framework or to write your own scripts. An MSR framework offers two benefits:

  • The researcher can focus on their questions and not on the infrastructure.
  • Coding against a framework improves standardization and therefore reproducibility (see Robles, Gregorio. "Replicating MSR: A study of the potential replicability of papers published in the Mining Software Repositories proceedings." Mining Software Repositories (MSR), 2010 7th IEEE Working Conference on. IEEE, 2010.).

How is RepoDriller different from other MSR frameworks?

RepoDriller is a minimalist's MSR framework, a lightweight tool for flexible analysis.

  • RepoDriller is lightweight:
    1. It's a straightforward Java framework with the APIs you need -- no more, no less.
    2. You pay for storage and computation when you need to. No significant pre-processing stage, no giant database.
  • RepoDriller is flexible:
    1. Write arbitrary analyses in the popular Java programming language.
    2. RepoDriller has the right knobs -- tune which commits you visit, how much concurrency you want, etc.
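
The two knobs mentioned above (which commits are visited, and how much concurrency is used) can be pictured with a small standard-library sketch. `TunableMiner` and its `mine` method are hypothetical names made up for this illustration and do not reflect RepoDriller's actual API; the sketch only assumes that each commit can be processed independently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical illustration of "tunable" mining; NOT a RepoDriller class.
final class TunableMiner {

    // range   : decides which commits are visited at all
    // threads : size of the worker pool (the concurrency knob)
    // visitor : the analysis applied to each selected commit
    static <C, R> List<R> mine(List<C> commits, Predicate<C> range,
                               int threads, Function<C, R> visitor) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<R>> pending = new ArrayList<>();
            for (C commit : commits) {
                if (range.test(commit)) {
                    Callable<R> task = () -> visitor.apply(commit);
                    pending.add(pool.submit(task));
                }
            }
            // Collect in submission order so results line up with the input order.
            List<R> results = new ArrayList<>();
            for (Future<R> future : pending) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Mining was interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A visitor failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

For example, `TunableMiner.mine(commits, c -> true, 4, c -> c.toString())` would visit every commit using four worker threads, while a narrower predicate would skip commits before any work is scheduled for them.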

Here's how it compares to some other MSR frameworks and tools:

  • GHTorrent lets you query GitHub events.
    1. You are restricted to querying projects on GitHub.
    2. You are restricted to the information exposed by the GitHub API.
  • Boa lets you query ASTs on a pre-defined set of repositories.
    1. You are restricted to the repositories tracked by Boa.
    2. You must write queries in the Boa language, largely against ASTs.
    3. If you roll your own Boa cluster, you are restricted to repositories with languages that Boa can import (i.e. parse into ASTs).
  • Alitheia Core is a scalable platform for MSR.
    1. Alitheia Core is a heavyweight approach. You pay a lot of up-front costs (configuration, pre-processing, etc.) in exchange for a scalable analysis. If you're doing exploratory research, the overhead may not be worth it.
    2. Alitheia Core is no longer being maintained.

How do I cite RepoDriller?

For now, cite the repository.

Is there a discussion forum?

You can subscribe to our mailing list: https://groups.google.com/forum/#!forum/repodriller.

How do I contribute?

Required: Git, Maven.

git clone https://github.com/mauricioaniche/repodriller.git
cd repodriller/test-repos
unzip \*.zip

Then, you can:

  • compile : mvn clean compile
  • test : mvn test
  • eclipse : mvn eclipse:eclipse
  • build : mvn clean compile assembly:single

License

This software is licensed under the Apache 2.0 License.

Versions

Version
2.0.1
2.0.0
1.5.0
1.4.0
1.3.1
1.3.0
1.2.1
1.2.0
1.1.0
1.0.0