binney.run.run package

The binney.run.run.BinneyRun class is what you use to run models. All settings for the model are encapsulated when you initialize a BinneyRun.

Run Module

class BinneyRun(df, col_success, col_total, covariates=None, splines=None, solver_method='scipy', solver_options=None, data_type='bernoulli', col_group=None, coefficient_prior_var=1.0)
__init__(df, col_success, col_total, covariates=None, splines=None, solver_method='scipy', solver_options=None, data_type='bernoulli', col_group=None, coefficient_prior_var=1.0)

Create a model run with binney. The model can handle either binomial or Bernoulli data. Binomial data looks like “k successes out of n trials”, and binney needs to know both k and n. Bernoulli data is “individual-record” or “unit-record” data with 1’s and 0’s. It needs to be in the same form as the binomial type, except that every “n trials” value is 1 and the outcome in “success” is either 1 or 0. See the Jupyter notebooks in this repository for an example of binomial data. The model looks like this in either case:

\[k_i \sim \mathrm{Binomial}(n_i, p_i)\]

where \(n_i = 1\) if you have Bernoulli data. The goal is to estimate \(p_i\), the expit (inverse logit) of a linear predictor, which may also include splines for different covariates. The linear predictor automatically includes an intercept, so do not specify one in your covariates.
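The link between the linear predictor and the probability can be sketched in a few lines. This is a minimal illustration of the expit transform only, not binney's internal code; the function names and the coefficient values are made up for the example.

```python
import math

def expit(x):
    """Inverse logit: maps any real number to a probability in (0, 1)."""
    # Branch on the sign so math.exp never overflows for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def success_probability(intercept, coefs, x):
    """p = expit(intercept + sum_j coefs[j] * x[j])."""
    # The intercept is a separate term, which is why it is not listed as a covariate.
    eta = intercept + sum(b * v for b, v in zip(coefs, x))
    return expit(eta)

# Hypothetical coefficients and covariate values.
p_zero = success_probability(0.0, [0.5, -0.5], [1.0, 1.0])  # linear predictor is 0
p_high = success_probability(2.0, [1.0], [3.0])             # linear predictor is 5
```

A linear predictor of 0 corresponds to a probability of 0.5, and large positive values push the probability towards 1.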

This run class quantifies uncertainty with the bootstrap. The type of bootstrap re-sampling depends on whether you have binomial or Bernoulli data. This is not strictly enforced, but do not mix the two types of data, because doing so gives inaccurate uncertainty quantification.
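To make the two data layouts concrete, the sketch below collapses unit-record (Bernoulli) rows into binomial counts. The column names `x1`, `success` and `total` and the helper `to_binomial` are hypothetical and are not part of binney; plain dictionaries stand in for a pandas data frame.

```python
from collections import defaultdict

# Bernoulli (unit-record) layout: one row per trial, so "total" is always 1.
bernoulli_rows = [
    {"x1": 0, "success": 1, "total": 1},
    {"x1": 0, "success": 0, "total": 1},
    {"x1": 0, "success": 1, "total": 1},
    {"x1": 1, "success": 0, "total": 1},
    {"x1": 1, "success": 1, "total": 1},
]

def to_binomial(rows, key="x1"):
    """Collapse unit records into one 'k successes out of n trials' row per key value."""
    counts = defaultdict(lambda: [0, 0])  # key value -> [successes, trials]
    for row in rows:
        counts[row[key]][0] += row["success"]
        counts[row[key]][1] += row["total"]
    return [
        {key: value, "success": k, "total": n}
        for value, (k, n) in sorted(counts.items())
    ]

# Binomial layout: one row per covariate pattern.
binomial_rows = to_binomial(bernoulli_rows)
```

Both layouts describe the same observations, but a data frame should use one layout throughout, matching the data_type that is passed in.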

If you pass in a group column name, binney fits multiple models. First, it fits a model with all of the data. Then it uses those parameter estimates as Gaussian priors for each of the individual groups in the data. You can put more or less weight on these priors using the coefficient prior variance argument, coefficient_prior_var. To give more weight to the prior, decrease coefficient_prior_var. To give more weight to the group-specific data, increase it.
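The effect of the prior variance can be seen from the penalty that a Gaussian prior adds to a negative log-likelihood. The sketch below assumes independent Gaussian priors centred on the all-data estimates; the function name and all numbers are hypothetical, and binney's actual objective may differ in its details.

```python
def gaussian_prior_penalty(coefs, prior_means, prior_var):
    """Negative log density, up to an additive constant, of independent Gaussian priors."""
    squared_distance = sum((b - m) ** 2 for b, m in zip(coefs, prior_means))
    return squared_distance / (2.0 * prior_var)

pooled = [0.2, -1.0]     # hypothetical estimates from the all-data fit
candidate = [0.7, -0.5]  # hypothetical coefficients for one group

# The same departure from the pooled estimates costs more under a small variance.
tight = gaussian_prior_penalty(candidate, pooled, prior_var=0.1)
loose = gaussian_prior_penalty(candidate, pooled, prior_var=10.0)
```

With a variance of 0.1 the penalty is 100 times larger than with a variance of 10, so the group fit is pulled much harder towards the pooled estimates.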

Parameters
  • df (DataFrame) – A pandas data frame containing every column named in covariates and splines, plus col_success and col_total.

  • col_success (str) – The column name of the number of successes (\(k\)).

  • col_total (str) – The column name of the number of trials (\(n\)).

  • covariates (Optional[List[str]]) – A list of column names for covariates to use.

  • splines (Optional[Dict[str, Dict[str, Any]]]) –

    A dictionary of spline covariates, each of which is a dictionary of spline specifications. For example,

    splines = {
        'x1': {
            'knots_type': 'domain',
            'knots_num': 3,
            'degree': 3,
            'convex': True
        }
    }
    

    The list of available options for splines is:

    • knots_type (str): type of knots, one of “domain” or “frequency”

    • knots_num (int): number of knots

    • degree (int): degree of the spline

    • r_linear (bool): include linear tails on the right

    • l_linear (bool): include linear tails on the left

    • increasing (bool): impose monotonic increasing constraint on spline shape

    • decreasing (bool): impose monotonic decreasing constraint on spline shape

    • concave (bool): impose concavity constraint on spline shape

    • convex (bool): impose convexity constraint on spline shape

  • solver_method (str) – Type of solver to use, one of “ipopt” (interior point optimizer; use this if you have spline shape constraints) or “scipy”.

  • solver_options (Optional[Dict[str, Any]]) – A dictionary of options to pass to your desired solver.

  • data_type (str) – The data type, one of “bernoulli” or “binomial”.

  • col_group (Optional[str]) – An optional column name to define data groups.

  • coefficient_prior_var (float) – The variance assigned to the priors passed down from the all-data model to each group. Only used if you pass in a col_group.
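As a rough illustration of how these arguments fit together, the sketch below assembles keyword arguments for a run. The helper `build_run_kwargs`, the column names `success`, `total` and `x2`, and the rule of switching to “ipopt” whenever a shape constraint is set are assumptions made for this example; only the option names and the solver advice come from the parameter list above.

```python
# Spline options named in the parameter list above.
SPLINE_OPTIONS = {
    "knots_type", "knots_num", "degree", "r_linear", "l_linear",
    "increasing", "decreasing", "concave", "convex",
}
SHAPE_CONSTRAINTS = {"increasing", "decreasing", "concave", "convex"}

def build_run_kwargs(splines, covariates=None):
    """Hypothetical helper: reject unknown spline options and pick a solver."""
    for name, spec in splines.items():
        unknown = set(spec) - SPLINE_OPTIONS
        if unknown:
            raise ValueError(f"unknown spline options for {name!r}: {sorted(unknown)}")
    # The docs suggest "ipopt" when spline shape constraints are present.
    constrained = any(
        spec.get(option, False)
        for spec in splines.values()
        for option in SHAPE_CONSTRAINTS
    )
    return {
        "col_success": "success",  # hypothetical column names
        "col_total": "total",
        "covariates": covariates,
        "splines": splines,
        "solver_method": "ipopt" if constrained else "scipy",
        "data_type": "binomial",
    }

kwargs = build_run_kwargs(
    {"x1": {"knots_type": "domain", "knots_num": 3, "degree": 3, "convex": True}},
    covariates=["x2"],
)
# With binney installed and a matching data frame `df`, these would be passed as:
#     run = BinneyRun(df=df, **kwargs)
```

Checking the option names up front catches typos such as a misspelled key before any model fitting starts.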

self.params_init

Initial parameters for the optimization

self.params_opt

Optimal parameters found through the fitting process

self.bootstrap

Bootstrap class that creates uncertainty. After running the BinneyRun.make_uncertainty() method you can access the parameter estimates across bootstrap replicates in self.bootstrap.parameters.

fit()

Fit the binney model after initialization. Optimal parameters are stored in BinneyRun.params_opt.

Return type

None

make_uncertainty(n_boots=100)

Runs bootstrap re-sampling to get uncertainty in the parameters. Access parameters in self.bootstrap.parameters.

Parameters

n_boots (int) – Number of bootstrap replicates
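For readers unfamiliar with the bootstrap, the idea can be sketched generically: resample the data with replacement, re-estimate, and repeat n_boots times. The example below is not binney's implementation. It uses a plain success rate in place of the model fit and simple row resampling of 0/1 outcomes, and every name in it is hypothetical.

```python
import random

def bootstrap_estimates(outcomes, n_boots=100, seed=0):
    """Resample 0/1 outcomes with replacement and re-estimate the success rate."""
    rng = random.Random(seed)  # seeded so the replicates are reproducible
    estimates = []
    for _ in range(n_boots):
        # Each replicate has the same size as the original data set.
        sample = rng.choices(outcomes, k=len(outcomes))
        estimates.append(sum(sample) / len(sample))
    return estimates

outcomes = [1, 0, 1, 1, 0, 1, 0, 1]  # hypothetical unit-record data
estimates = bootstrap_estimates(outcomes, n_boots=100)
```

The spread of the replicate estimates is what stands in for uncertainty; more replicates give a smoother picture at the cost of more fits.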

predict(new_df=None)

Make predictions based on optimal parameter values.

Parameters

new_df (Optional[DataFrame]) – A pandas data frame to make predictions for. Must have all of the covariates used in the fitting.

Returns

A numpy array of predictions for the data frame.

predict_draws(df)

Make draws based on the bootstrap parameters.

Parameters

df (DataFrame) – A pandas data frame to make predictions for. Must have all of the covariates used in the fitting.

Returns

A stacked numpy array of draws for each row in the df.
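Draws are usually reduced to a point estimate and an interval for each row. The sketch below assumes the draws are indexed as `draws[replicate][row]`; the orientation of the array that predict_draws returns is not stated here, so check it before adapting this. The helper `summarize_draws` and the numbers are hypothetical, and nested lists stand in for a numpy array.

```python
def summarize_draws(draws, lower=0.025, upper=0.975):
    """Per-row mean and percentile interval from draws[replicate][row]."""
    n_rows = len(draws[0])
    summary = []
    for i in range(n_rows):
        column = sorted(replicate[i] for replicate in draws)
        last = len(column) - 1
        summary.append({
            "mean": sum(column) / len(column),
            # Nearest-rank percentiles; coarse when there are few replicates.
            "lower": column[round(lower * last)],
            "upper": column[round(upper * last)],
        })
    return summary

# Five hypothetical replicates of predictions for two rows.
draws = [[0.10, 0.80], [0.20, 0.70], [0.30, 0.90], [0.20, 0.80], [0.20, 0.60]]
summary = summarize_draws(draws)
```

With only a handful of replicates the interval endpoints are just the smallest and largest draws, which is one reason to use a larger n_boots in practice.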