Getting Started

A central aim of git2net is to let you conveniently obtain and visualise network projections of editing activity in git repositories. Let’s look at an example of how you can achieve this:

import git2net
import pathpy as pp

github_url = 'gotec/git2net'
git_repo_dir = 'git2net4analysis'
sqlite_db_file = 'git2net4analysis.db'

# Clone and mine repository from GitHub
git2net.mine_github(github_url, git_repo_dir, sqlite_db_file)

# Disambiguate author aliases in the resulting database
git2net.disambiguate_aliases_db(sqlite_db_file)

# Obtain temporal bipartite network representing authors editing files over time
# (optionally pass time_from=... to only include edits after a given point in time)
t, node_info, edge_info = git2net.get_bipartite_network(sqlite_db_file)

# Aggregate to a static network
n = pp.Network.from_temporal_network(t)

# Visualise the resulting network
colour_map = {'author': '#73D2DE', 'file': '#2E5EAA'}
node_color = {node: colour_map[node_info['class'][node]] for node in n.nodes}
pp.visualisation.plot(n, node_color=node_color)

In the example above, we used three functions of git2net. First, we extract edits from the repository using mine_github. Then, we disambiguate author identities using disambiguate_aliases_db. Finally, we obtain the bipartite author-file network with get_bipartite_network, aggregate it to a static network, and plot it with pathpy.
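To make the aggregation step more concrete, the following is a minimal sketch in plain Python of what collapsing a temporal bipartite network into a static one can look like. It does not use git2net or pathpy: the edit records, the `aggregate` helper, and the choice to count repeated edits as edge weights are all assumptions made for illustration, not a description of how either library works internally.

```python
from collections import Counter

# Hypothetical edit records (author, file, timestamp). This is made-up data,
# not output produced by git2net.
edits = [
    ("alice", "core.py", 100),
    ("bob", "core.py", 150),
    ("alice", "core.py", 200),
    ("alice", "README.md", 250),
]


def aggregate(temporal_edges):
    """Collapse time-stamped (author, file, t) edges into weighted static edges.

    Each static edge (author, file) is weighted by the number of
    time-stamped edges it replaces (an assumption of this sketch).
    """
    static = Counter()
    for author, filename, _timestamp in temporal_edges:
        static[(author, filename)] += 1
    return dict(static)


# Record which side of the bipartite network each node belongs to,
# similar in spirit to the 'class' lookup used for colouring above.
node_class = {}
for author, filename, _timestamp in edits:
    node_class[author] = "author"
    node_class[filename] = "file"

static_edges = aggregate(edits)
print(static_edges)
print(node_class)
```

The static network keeps one edge per author-file pair and discards the order of edits, which is why the temporal network is the richer starting point for analyses that depend on timing.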

git2net’s functionality is partitioned into four modules: extraction, disambiguation, visualisation, and complexity. The first three correspond to the calls above. We outline the most important functions of each module here. For comprehensive details on all functions of git2net, we refer to the API reference.
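Every call in the example above receives the same SQLite database file, so it can be helpful to check what such a file contains before moving on. The sketch below uses only Python's standard sqlite3 module to list the tables in a database. The `list_tables` helper and the `demo_edits` table are hypothetical and exist only for this demonstration; they are not part of git2net and do not reflect its actual schema.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def list_tables(db_path):
    """Return the names of all tables in an SQLite database, sorted by name."""
    with closing(sqlite3.connect(db_path)) as con:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


# Build a small throwaway database so the sketch runs on its own.
# The table name and columns are made up for illustration.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_db = os.path.join(tmp_dir, "demo.db")
    with closing(sqlite3.connect(demo_db)) as con:
        con.execute("CREATE TABLE demo_edits (author TEXT, filename TEXT)")
        con.commit()
    tables = list_tables(demo_db)

print(tables)
```

For a mined repository, the same helper could be called as `list_tables(sqlite_db_file)` with the path used in the example above.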

Tutorials

To help you get started, we provide an extensive set of tutorials covering different aspects of analysing your repository with git2net. You can directly interact with the notebooks in Binder, or view them in NBViewer via the links below.

In addition, the individual tutorial notebooks are linked separately. For instance, one notebook shows how to clone and prepare a git repository for analysis with git2net.

Usage Examples

We have published some motivating results as well as details on the mining algorithm in “git2net - Mining Time-Stamped Co-Editing Networks from Large git Repositories”.

In “Analysing Time-Stamped Co-Editing Networks in Software Development Teams using git2net”, we use git2net to mine more than 1.2 million commits of over 25,000 developers. We use this data to test a hypothesis on the relation between developer productivity and co-editing patterns in software teams.

Finally, in “Big Data = Big Insights? Operationalising Brooks’ Law in a Massive GitHub Data Set”, we mine a corpus containing over 200 GitHub repositories using git2net. Based on the resulting data, we study the relationship between team size and productivity in open-source software (OSS) development teams. If you want to use this extensive data set for your own study, we have made it publicly available on zenodo.org.