KCBO – A Bayesian Data Analysis Toolkit

The goal of KCBO is to provide an easy-to-use Bayesian framework to the masses.

KCBO is a toolkit for anyone who wants to do Bayesian data analysis without worrying about the implementation details of each test.

The Bayesian school of statistics offers a lot of incredible benefits. Bayesian analysis yields rich, meaningful results and provides intuitive answers that should be available to anyone.

Currently KCBO is very much alpha/pre-alpha software and only implements three tests. There is a long list of future objectives for the project, and I have high hopes that KCBO will soon implement all the usual tests and grow into something very powerful for analysts.

You can check out the GitHub repo here. Feature requests, issues, criticisms, and contributions are welcome.


KCBO is available through PyPI and on GitHub here.

Installation from PyPI:

pip install kcbo  

Installation from Source:

git clone https://github.com/HHammond/kcbo  
cd kcbo  
python setup.py sdist install  

If any of this fails, you may need to install NumPy first (pip install numpy) so that KCBO's other dependencies can build, then retry the installation.

Currently Available Tests

There are currently three tests implemented in the KCBO library:

  • Lognormal-Difference of medians: used to compare medians of log-normally distributed data. The one caveat to this test is that because KCBO uses a conjugate prior to the lognormal, it assumes that all groups have the same variance.

  • Bayesian t-Test: an implementation of Kruschke's t-Test used to compare differences in mean and standard deviation for normally distributed samples.

  • Conversion Test or Beta-Binomial difference test: a test of conversion-to-success rates using the Beta-Binomial model. Popular in A/B testing and estimation.
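To give a flavour of the kind of answer the t-Test produces, here is a minimal NumPy sketch of comparing the posterior means of two normal samples under a Jeffreys prior. This is a simplification, not Kruschke's full model (which uses a Student-t likelihood and MCMC), and it is independent of KCBO's API:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two simulated groups of normally distributed data
x = rng.normal(10.0, 2.0, size=200)   # group 1
y = rng.normal(10.8, 2.0, size=200)   # group 2

def posterior_mean_samples(data, n_samples=50_000, rng=rng):
    """Draws from p(mu | data) for a normal model with the Jeffreys
    prior p(mu, sigma^2) proportional to 1/sigma^2."""
    n, xbar, s2 = len(data), data.mean(), data.var(ddof=1)
    # sigma^2 | data ~ Scaled-Inv-chi^2(n - 1, s^2)
    sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=n_samples)
    # mu | sigma^2, data ~ Normal(xbar, sigma^2 / n)
    return rng.normal(xbar, np.sqrt(sigma2 / n))

# Posterior distribution of the difference in means
delta = posterior_mean_samples(y) - posterior_mean_samples(x)
p_greater = (delta > 0).mean()   # P(mean of y > mean of x | data)
```

This draw-from-each-posterior-and-compare pattern is the common thread in all three tests above.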

Documentation for these tests will be available on GitHub soon.

Example Usage

from kcbo import lognormal_comparison_test, t_test, conversion_test  

Lognormal-Difference of Medians test:

Because this test uses a Monte Carlo simulation on the conjugate prior to the lognormal distribution, it assumes that both distributions have the same variance.
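As a sketch of the idea (independent of KCBO's lognormal_comparison_test, whose exact signature isn't shown here): the median of a lognormal distribution is exp(mu), so comparing medians amounts to comparing posterior draws of mu on the log-scale data. Assuming a flat prior on mu and plugging in a shared sample variance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated log-normal samples: group A has median exp(1.0), group B exp(1.1)
A = rng.lognormal(mean=1.0, sigma=0.5, size=2000)
B = rng.lognormal(mean=1.1, sigma=0.5, size=2000)

def posterior_mu_samples(x, n_samples=100_000, rng=rng):
    """Posterior draws of mu for log(x) ~ Normal(mu, sigma^2), with a
    flat prior on mu and sigma plugged in from the data (the shared-
    variance assumption mentioned above)."""
    logx = np.log(x)
    n = len(logx)
    sigma = logx.std(ddof=1)
    # mu | data ~ Normal(mean(logx), sigma^2 / n)
    return rng.normal(logx.mean(), sigma / np.sqrt(n), size=n_samples)

# The median of a lognormal is exp(mu), so compare exp of the draws
med_A = np.exp(posterior_mu_samples(A))
med_B = np.exp(posterior_mu_samples(B))
p_b_greater = (med_B > med_A).mean()   # P(median of B > median of A | data)
```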

Conversion Test:

  1. Generate some data:
    import pandas as pd

    A = {'group': 'A', 'trials': 10000, 'successes': 5000}  
    B = {'group': 'B', 'trials': 8000, 'successes': 4090}  
    df = pd.DataFrame([A, B])  
  2. Run the test:
    summary, data = conversion_test(df, groupcol='group', successcol='successes', totalcol='trials')  
  3. Print your summary output:
    print(summary)  
                       Beta-Binomial Conversion Rate Test

    Groups: A, B

    | Group   |   Estimate |   95% CI Lower |   95% CI Upper |
    |:--------|-----------:|---------------:|---------------:|
    | A       |   0.49998  |       0.490183 |       0.509817 |
    | B       |   0.511237 |       0.500258 |       0.522202 |

    | Hypothesis   |   Difference |   P.Value |   95% CI Lower |   95% CI Upper |
    |:-------------|-------------:|----------:|---------------:|---------------:|
    | A < B        |    0.0112434 |   0.93361 |     -0.0033844 |      0.0259017 |
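The P.Value column here is the posterior probability of the hypothesis. Independently of KCBO's internals, the Beta-Binomial computation behind it can be sketched with NumPy, assuming a uniform Beta(1, 1) prior (KCBO's actual prior choice may differ):

```python
import numpy as np

rng = np.random.default_rng(42)

# Observed data from the example above
trials_A, successes_A = 10_000, 5_000
trials_B, successes_B = 8_000, 4_090

# Beta(1, 1) prior + binomial likelihood -> Beta posterior (conjugacy)
post_A = rng.beta(1 + successes_A, 1 + trials_A - successes_A, size=100_000)
post_B = rng.beta(1 + successes_B, 1 + trials_B - successes_B, size=100_000)

# Monte Carlo estimate of P(A's conversion rate < B's | data)
p_a_less = (post_A < post_B).mean()
```

With the data above this estimate lands close to the 0.93361 reported in the table.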

And an example of what you can do with the raw data returned:

import seaborn as sns  
from matplotlib import pyplot as plt  
%matplotlib inline

A = data['A']['distribution']  
B = data['B']['distribution']  
diff = data[('A','B')]['distribution']

f, axes = plt.subplots(1, 2, figsize=(12, 7))  
sns.set(style="white", palette="muted")  

sns.distplot(A, ax=axes[0], label='Density Estimate for A')  
sns.distplot(B, ax=axes[0], label='Density Estimate for B')  
sns.distplot(diff, ax=axes[1], label='Difference of Densities (B-A)')  
axes[0].legend()  
axes[1].legend()  

A more complete API will be coming to the GitHub repo soon.

I'm looking forward to any feedback and criticisms :)