Mozkito is a general purpose framework that allows developers and data miners to effectively mine their own software archives without writing the same mining scripts over and over again.

The Brief

Mozkito is not a full-fledged mining tool that supports everything out of the box. Instead, our goal is to provide a uniform platform for many state-of-the-art approaches and techniques for mining version archives. Its modular architecture makes it easy for users, miners, and researchers to extend and improve it. With Mozkito, we explicitly target researchers: it provides a tool set of standard mining techniques, but also a platform that makes newly developed (and possibly published) mining approaches reproducible.

The Whole Story

Mining version archives has become a popular research field. Version archive miners collect and analyze software development artifacts to gather evidence: evidence on the effectiveness of testing, the quality of bug reports, the role of software metrics, and so on. Once such evidence is found, it is frequently used to build prediction or recommendation systems that support developers and quality managers.

But evidence found on one project does not necessarily hold for other projects. Prediction models and recommendation systems often depend on the software system, the programming language, the development process, or the software architecture. Thus, replicating published studies on mining version archives is important [1][2][3]. Replications confirm or contradict evidence presented by earlier studies. It is good practice to publish the data sets containing the evidence to allow exact replication. But what can be done if the data sets were not published (e.g., because of confidential content) or if the replication study uses different subject projects? The only solution is to re-implement the original study, including the data collection routines and any supporting tools.

The goal of Mozkito is to provide a flexible and extensible framework for mining software archives that allows researchers and developers to share their mining experiences and code bases. Why re-implement standard data mining techniques over and over again? Why not share source code so that the community can benefit from already published approaches and advance more quickly? Implement your mining approaches and techniques in Mozkito and share your code base. This allows exact replication on different projects, whether or not you publish your data sets.

Where to go from here

Mozkito is a module-based framework. Each Mozkito-Module is self-contained and implements new functionality based on other Mozkito-Modules. Beginners should refer to the "Getting started" guide on our documentation page.

Open Source

Mozkito is open source software published under the Apache License, Version 2.0.