BanzaiDB Developer HOWTO

In addition to what is described here, this document by Jeff Forcier and this talk from Carl Meyer provide wonderful footings for developing on/in open source projects.

Maintaining a consistent development environment

1) Ensure all development is performed within a virtualenv. A good way to bootstrap this is via virtualenv-burrito.

Execute the installation using:

$ curl -sL | $SHELL

2) Make a virtualenv called BanzaiDB:

$ mkvirtualenv BanzaiDB

3) Install autoenv:

$ git clone git:// ~/.autoenv
$ echo 'source ~/.autoenv/' >> ~/.bashrc

Get the current code from GitHub

Something like this:

$ git clone

Install dependencies

Something like this:

$ cd BanzaiDB
$ # Assuming you installed autoenv -
$ # You'll want to say 'y' as this will activate the virtualenv each time you enter the code directory
$ # Otherwise -
$ # workon BanzaiDB
$ pip install -r requirements.txt
$ pip install -r requirements-dev.txt

Familiarise yourself with the code

The BanzaiDB/ is the core module. It handles database insertion, deletion and updating.

For example:

$ ~/BanzaiDB/BanzaiDB$ python -h
usage: [-h] [-v] {init,populate,update,query} ...

BanzaiDB v 0.3.0 - Database for Banzai NGS pipeline tool (

positional arguments:
                        Available commands:
    init                Initialise a DB
    populate            Populates a database with results of an experiment
    update              Updates a database with results from a new experiment
    query               List available or provide database query functions

optional arguments:
    -h, --help            show this help message and exit
    -v, --verbose         verbose output

Licence: ECL 2.0 by Mitchell Stanton-Cook <>

Listing help on populate:

$ python populate -h
usage: populate [-h] {qc,mapping,assembly,ordering,annotation} run_path

positional arguments:
                    Populate the database with data from the given pipeline step
    run_path              Full path to a directory containing finished experiments from a pipeline run

optional arguments:
    -h, --help            show this help message and exit
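
The help output above has the shape of an argparse parser with sub-commands. As a rough illustration for anyone adding a new sub-command, here is a minimal sketch that reproduces that shape. It is not the actual BanzaiDB source: the program name "banzaidb-sketch", the function build_parser and the argument name run_type are assumptions made for this example, and the update and query sub-commands are left as bare placeholders because their arguments are not shown here.

```python
import argparse


def build_parser():
    """Build a parser shaped like the help output above (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="banzaidb-sketch",  # hypothetical name, not the real script
        description="Sketch of a CLI with sub-commands",
    )
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="verbose output")
    subparsers = parser.add_subparsers(dest="command",
                                       help="Available commands:")

    subparsers.add_parser("init", help="Initialise a DB")

    populate = subparsers.add_parser(
        "populate",
        help="Populates a database with results of an experiment")
    # "run_type" is an assumed name for the pipeline-step argument.
    populate.add_argument(
        "run_type",
        choices=["qc", "mapping", "assembly", "ordering", "annotation"],
        help="Populate the database with data from the given pipeline step")
    populate.add_argument(
        "run_path",
        help="Full path to a directory containing finished experiments "
             "from a pipeline run")

    # Placeholders: the real arguments of these sub-commands are not shown.
    subparsers.add_parser(
        "update",
        help="Updates a database with results from a new experiment")
    subparsers.add_parser(
        "query",
        help="List available or provide database query functions")
    return parser


# Parse an explicit argument list instead of sys.argv to keep the demo stable.
args = build_parser().parse_args(["-v", "populate", "mapping", "/path/to/run"])
print(args.command, args.run_type, args.run_path, args.verbose)
```

Passing a pipeline step outside the listed choices makes argparse exit with a usage error, which is one reason to declare the steps as choices rather than validating them by hand.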

The fabfile (Fabric file) in the fabfile directory contains pre-written query functions.

You can list them like this:

$ ~/BanzaiDB$ fab -l
Available commands:

    variants.get_variants_by_keyword           Return variants with a match in the "Product" with the regular_expression
    variants.get_variants_in_range             Return all the variants in given [start:end] range (inclusive of)
    variants.plot_variant_positions            Generate a PDF of SNP positions for given strains using GenomeDiagram
    variants.strain_variant_stats              Print the number of variants and variant classes for all strains
    variants.variant_hotspots                  Return the (default = 100) prevalent variant positions
    variants.variant_positions_within_atleast  Return positions that have at least this many variants
    variants.what_differentiates_strains       Provide variant positions that differentiate two given sets of strains

Note: python query simply calls the fabfile discussed above.
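
To give a feel for what two of the listed queries compute (the most prevalent variant positions, and positions with at least N variants), here is a small sketch over in-memory data. This is not how BanzaiDB implements them: the real functions query the database, and the names VARIANTS, hotspots and positions_with_at_least, as well as the (strain, position) record layout, are assumptions made for this illustration.

```python
from collections import Counter

# Hypothetical records as (strain, position) pairs; real data lives in the DB.
VARIANTS = [
    ("strain_A", 120), ("strain_A", 450), ("strain_B", 120),
    ("strain_B", 900), ("strain_C", 120), ("strain_C", 450),
]


def hotspots(variants, top=100):
    """Return up to `top` (position, count) pairs, most frequent first."""
    counts = Counter(position for _strain, position in variants)
    return counts.most_common(top)


def positions_with_at_least(variants, minimum):
    """Return the sorted positions that carry at least `minimum` variants."""
    counts = Counter(position for _strain, position in variants)
    return sorted(pos for pos, n in counts.items() if n >= minimum)


print(hotspots(VARIANTS, top=2))             # [(120, 3), (450, 2)]
print(positions_with_at_least(VARIANTS, 2))  # [120, 450]
```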

Development workflow

Use GitHub. You will have already cloned the BanzaiDB repo (if you followed the instructions above). To make things easier, please fork the repo and update your local copy to point to your fork.

Something like this:

$ # Assuming your fork is like this
$ vi .git/config
$ # Replace:
$ # url =
$ #  with:
$ # url =$YOUR_USERNAME/BanzaiDB.git

With this setup you will be able to push development changes to your fork and submit Pull Requests to the core BanzaiDB repo when you’re happy.

Important Note: Upstream changes will not be synced to your fork by default. Before submitting a pull request, please sync your fork with any upstream changes (in particular, resolve any merge conflicts). Info on syncing a fork can be found here.

Code style/testing/Continuous Integration

We try to make joining and/or modifying the BanzaiDB project simple.

  • As close to PEP8 as possible, but I ain’t no Saint. Just as long as it’s clean and readable.
  • Using the standard library unittest module. There are convenience functions & tests/ respectively. We prefer SMART tests over 100% coverage.

In the master GitHub repository we use hooks that call:
  • (code QC)
  • (continuous integration)
  • ReadTheDocs (documentation building)
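
As an illustration of the testing style described in this section (standard library unittest, a few well-chosen cases rather than chasing coverage), here is a minimal sketch. The helper variants_in_range is hypothetical and written only for this example; it is not part of BanzaiDB. The tests target the inclusive boundaries, since off-by-one errors are the likely failure for a range query.

```python
import unittest


def variants_in_range(positions, start, end):
    """Return the positions p with start <= p <= end (both ends inclusive)."""
    return [p for p in positions if start <= p <= end]


class TestVariantsInRange(unittest.TestCase):

    def test_both_boundaries_are_included(self):
        # 10 and 20 sit exactly on the boundaries and must be kept.
        self.assertEqual(
            variants_in_range([9, 10, 15, 20, 21], 10, 20), [10, 15, 20])

    def test_empty_when_nothing_matches(self):
        self.assertEqual(variants_in_range([1, 2, 3], 10, 20), [])
```

Saved in a file whose name starts with test_, a module like this can be picked up with python -m unittest discover.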