robustipy package

Submodules

robustipy.figures module

robustipy.models module

robustipy.prototypes module

class robustipy.prototypes.BaseRobust(*, y: list[str], x: list[str], data: DataFrame, model_name: str = 'BaseRobust')[source]

Bases: Protomodel

Base class for robust model estimation, including OLS and logistic.

Provides shared validation, bootstrapping, cross-validation, and composite outcome support.

y

Dependent variable column names.

Type:

list of str

x

Independent variable column names.

Type:

list of str

data

Input dataset containing variables in y, x, controls.

Type:

pandas.DataFrame

model_name

Custom label for the model run.

Type:

str

results

Fitted result object populated after fit().

Type:

object

parameters

Stores initialization parameters and any derived settings.

Type:

dict

fit(*, controls: List[str], group: str | None = None, draws: int = 500, kfold: int = 5, oos_metric: str = 'r-squared', n_cpu: int | None = None, seed: int | None = None) None[source]

Abstract fit method; must be overridden by subclasses.

Parameters:
  • controls (List[str]) – Optional control variable names to include in specifications.

  • group (str, optional) – Column name for grouping (fixed effects) variable.

  • draws (int, default=500) – Number of bootstrap draws.

  • kfold (int, default=5) – Number of cross-validation folds.

  • oos_metric (str, default='r-squared') – Out-of-sample metric (‘r-squared’, ‘rmse’, etc.).

  • n_cpu (int, optional) – Number of CPU cores for parallel computation.

  • seed (int, optional) – Random seed for reproducibility.

Raises:

NotImplementedError – Always, since this method must be implemented by subclasses.

get_results()[source]
multiple_y() None[source]
Build the lists
  • self.y_composites – pandas Series, one per composite Y

  • self.y_specs – tuple[str], names that form that composite

If self.composite_sample is a positive int, draw that many random non-empty subsets of the raw Y columns before we create any Series. Otherwise enumerate all non-empty subsets (original behaviour).

exception robustipy.prototypes.MissingValueWarning[source]

Bases: UserWarning

class robustipy.prototypes.Protomodel[source]

Bases: ABC

Prototype class, intended to be used in inheritance, not to be called.

abstract fit()[source]
class robustipy.prototypes.Protoresult[source]

Bases: ABC

Prototype class for results object, intended to be used in inheritance, not to be called.

abstract plot()[source]
abstract summary()[source]
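
Both prototypes are abstract base classes, so they cannot be instantiated until a subclass supplies the abstract methods. The sketch below illustrates the pattern with hypothetical class names (SketchModel and ConstantModel are not part of robustipy).

```python
from abc import ABC, abstractmethod


class SketchModel(ABC):
    """Hypothetical stand-in for a prototype model class."""

    @abstractmethod
    def fit(self):
        """Subclasses must provide an estimation routine."""


class ConstantModel(SketchModel):
    """Hypothetical concrete subclass that 'fits' by storing the mean."""

    def __init__(self, values):
        self.values = list(values)
        self.mean_ = None

    def fit(self):
        # Overriding the abstract method makes the class instantiable.
        self.mean_ = sum(self.values) / len(self.values)
        return self
```

Instantiating SketchModel directly raises TypeError, while ConstantModel can be created and fitted.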

robustipy.utils module

class robustipy.utils.IntegerRangeValidator(min_value, max_value)[source]

Bases: object

Validator that checks if an input value is an integer within a specified range.

Parameters:
  • min_value (int) – The minimum allowed integer value (inclusive).

  • max_value (int) – The maximum allowed integer value (inclusive).

Raises:

ValidationError – If the input is not an integer or is outside the specified range.

Usage:

    validator = IntegerRangeValidator(1, 10)
    validator(_, current_value)  # Returns True if valid, raises ValidationError otherwise.
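
A callable range validator of this kind can be sketched as follows. The names IntRangeCheck and RangeCheckError are hypothetical, and accepting integer-like strings (as typed at an interactive prompt) is an assumption rather than documented behaviour.

```python
class RangeCheckError(Exception):
    """Hypothetical error type standing in for ValidationError."""


class IntRangeCheck:
    """Hypothetical callable validator for an inclusive integer range."""

    def __init__(self, min_value, max_value):
        self.min_value = min_value
        self.max_value = max_value

    def __call__(self, _, current_value):
        # Assumption: strings that parse as integers are accepted.
        if isinstance(current_value, str):
            try:
                value = int(current_value.strip())
            except ValueError:
                raise RangeCheckError(f"{current_value!r} is not an integer") from None
        elif isinstance(current_value, int) and not isinstance(current_value, bool):
            value = current_value
        else:
            raise RangeCheckError(f"{current_value!r} is not an integer")
        # Both bounds are inclusive.
        if not self.min_value <= value <= self.max_value:
            raise RangeCheckError(
                f"{value} is outside [{self.min_value}, {self.max_value}]"
            )
        return True
```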

exception robustipy.utils.ValidationError(*args, reason: str = '')[source]

Bases: Exception

Fallback so IntegerRangeValidator can raise a typed error safely.

robustipy.utils.all_subsets(ss)[source]

Generate all subsets of a given iterable.

Parameters:

ss (iterable) – Input iterable.

Returns:

A chain object containing all subsets of the input iterable.

Return type:

itertools.chain
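
One common way to build such a chain uses itertools.combinations for every subset size. The sketch below uses the hypothetical name power_set and assumes that the empty subset is included, which the description above does not state explicitly.

```python
from itertools import chain, combinations


def power_set(items):
    """Lazily yield every subset of items as a tuple, smallest first."""
    pool = list(items)
    # One combinations() iterator per subset size 0..len(pool), chained together.
    return chain.from_iterable(
        combinations(pool, size) for size in range(len(pool) + 1)
    )
```

Because the result is lazy, it can be consumed once; wrap it in list() if the subsets are needed repeatedly.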

robustipy.utils.calculate_imv_score(y_true, y_enhanced)[source]

Calculates the IMV (Information Metric Value) score.

Parameters:
  • y_true (array-like) – Binary true labels (0 or 1).

  • y_enhanced (array-like) – Predicted probabilities from an enhanced model.

Returns:

IMV score: the relative improvement of the enhanced model over the null model.

robustipy.utils.concat_results(objs: List[OLSResult], de_dupe=True) OLSResult[source]

Core routine: take a list of OLSResult objects and stack them into one OLSResult. All per-spec fields (estimates, p_values, specs_names, etc.) are concatenated. When de_dupe is True, any exact duplicates in (y_name, x_name, spec) are dropped in lockstep across those fields.

Parameters:
  • objs (List[OLSResult]) – Result objects to stack. Each element must already be an OLSResult (not the wrapper class).

  • de_dupe (bool, default True) – If True, drop exact duplicates in (y_name, x_name, spec) triplets.

The import of OLSResult is deferred until inside the function to avoid circular-import errors.

robustipy.utils.decorator_timer(func: callable) callable[source]

Decorator to time function execution.

Parameters:

func (callable) – Function to wrap.

Returns:

Wrapped function returning (result, elapsed_seconds).

Return type:

callable
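
A timing decorator with this return convention might look like the sketch below. The name timed and the use of time.perf_counter are assumptions for illustration.

```python
import functools
import time


def timed(func):
    """Wrap func so that calls return (result, elapsed_seconds)."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        return result, elapsed

    return wrapper


@timed
def add(a, b):
    return a + b
```

Callers must unpack the tuple, for example `total, seconds = add(2, 3)`.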

robustipy.utils.get_colormap_colors(num_colors: int = 3, colormap: str | Colormap = 'viridis') List[str][source]

Return num_colors evenly spaced colors from a Matplotlib colormap.

Parameters:
  • num_colors (int, optional) – The number of colors to return. Must satisfy num_colors ≥ 1. Defaults to 3.

  • colormap (str or matplotlib.colors.Colormap, optional) – Colormap name or object to sample from. Defaults to ‘viridis’.

Returns:

A list of hexadecimal color strings of length exactly num_colors.

Return type:

List[str]

Raises:
  • TypeError – If num_colors is not an integer.

  • ValueError – If num_colors < 1.

robustipy.utils.get_colors(specs: List[List[str]], color_set_name: str | None = 'Set1') List[Tuple[float, float, float, float]][source]

Generate a palette of colors for a list of specifications using a categorical colormap.

Parameters:
  • specs (list of list of str) – Each inner list represents one specification (set of variable names).

  • color_set_name (str, optional) – Name of a Matplotlib qualitative colormap (default ‘Set1’).

Returns:

A list of RGBA tuples, one per specification.

Return type:

List[Tuple[float, float, float, float]]

Raises:

ValueError – If specs is not a list of lists.

robustipy.utils.get_selection_key(specs: List[List[str]]) List[frozenset][source]

Convert list of spec lists into list of frozensets.

Parameters:

specs (list of list of str) – Each inner list is one specification.

Returns:

Immutable keys for each specification.

Return type:

list of frozenset

Raises:

ValueError – If specs is not a list of lists.
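
The conversion can be sketched in a few lines. The function name selection_keys is hypothetical. Using frozensets means two specifications with the same variables in a different order produce equal, hashable keys.

```python
def selection_keys(specs):
    """Return one frozenset per specification, validating the input shape."""
    if not isinstance(specs, list) or not all(isinstance(s, list) for s in specs):
        raise ValueError("specs must be a list of lists")
    # frozenset ignores ordering and is hashable, so it can key a dict or set.
    return [frozenset(spec) for spec in specs]
```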

robustipy.utils.group_demean(x: DataFrame, group: str | None = None) DataFrame[source]

Demean the input data within groups.

Parameters:
  • x (pd.DataFrame) – Input DataFrame.

  • group (str, optional) – Column name for grouping. Default is None.

Returns:

Demeaned DataFrame.

Return type:

pd.DataFrame

robustipy.utils.is_interactive() bool[source]
Return True if either:
  1. we are inside a Jupyter notebook/lab, OR

  2. we are running from a real terminal (both stdin and stdout are TTYs).

robustipy.utils.join_sig_test(*, results_target, results_shuffled, sig_level, positive)[source]

Calculate joint significance test for the entire specification curve.

Parameters:
  • results_target (OLSResult) – Results object from the original analysis.

  • results_shuffled (OLSResult) – Results object from shuffled analysis.

  • sig_level (float) – Significance level threshold for specifications.

  • positive (bool) – Direction of the joint significance test.

Returns:

Estimated p-value for the joint significance test.

Return type:

float

robustipy.utils.logistic_regression_sm(y, x) dict[source]

Perform logistic regression based on statsmodels.Logit.

Parameters:
  • y (array-like) – Dependent variable values.

  • x (array-like) – Independent variable values. The matrix should be shaped as (number of observations, number of independent variables).

Returns:

Dictionary containing regression results, including coefficients, p-values, log-likelihood, AIC, BIC, and HQIC.

Return type:

dict

robustipy.utils.logistic_regression_sm_stripped(y, x) dict[source]

Perform logistic regression using statsmodels with stripped output.

Parameters:
  • y (array-like) – Dependent variable values.

  • x (array-like) –

    Independent variable values. The matrix should be shaped as

    (number of observations, number of independent variables).

Returns:

A dictionary containing regression coefficients (‘b’) and corresponding p-values (‘p’) for each independent variable.

Return type:

dict

robustipy.utils.make_aic(ll: float, k: int) float[source]
robustipy.utils.make_bic(ll: float, n: int, k: int) float[source]
robustipy.utils.make_hqic(ll: float, n: int, k: int) float[source]
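
These three helpers carry no description here. Assuming they follow the standard textbook definitions, with ll the log-likelihood, n the number of observations, and k the number of estimated parameters, the criteria can be sketched as below. The function names are hypothetical, and whether the library counts the intercept in k is not stated.

```python
import math


def aic_sketch(ll, k):
    """Akaike information criterion: 2k - 2*ll."""
    return 2 * k - 2 * ll


def bic_sketch(ll, n, k):
    """Bayesian information criterion: k*ln(n) - 2*ll."""
    return k * math.log(n) - 2 * ll


def hqic_sketch(ll, n, k):
    """Hannan-Quinn information criterion: 2k*ln(ln(n)) - 2*ll."""
    return 2 * k * math.log(math.log(n)) - 2 * ll
```

Lower values indicate a better trade-off between fit and complexity under each criterion.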
robustipy.utils.make_inquiry(model_name, y, data, draws, kfolds, oos_metric, n_cpu, seed)[source]

Prompt the user for missing inputs if in an interactive environment; otherwise, silently fall back to default values.

Returns:

(draws, kfolds, oos_metric, n_cpu, seed)

Return type:

tuple[int, int, str, int, int]

robustipy.utils.mcfadden_r2(y_true, y_prob, insample_mean)[source]

Compute McFadden’s pseudo R-squared for logistic regression.
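
McFadden's measure is conventionally defined as 1 - LL_model / LL_null, where LL_null is the log-likelihood of a model that predicts a constant probability. The sketch below assumes that insample_mean plays the role of that constant; the function name and the clipping constant eps are hypothetical.

```python
import math


def mcfadden_r2_sketch(y_true, y_prob, insample_mean, eps=1e-15):
    """Return 1 - LL_model / LL_null for binary outcomes."""
    y_true = list(y_true)

    def log_likelihood(probs):
        total = 0.0
        for y, p in zip(y_true, probs):
            # Clip probabilities away from 0 and 1 so log() stays finite.
            p = min(max(p, eps), 1 - eps)
            total += y * math.log(p) + (1 - y) * math.log(1 - p)
        return total

    ll_model = log_likelihood(list(y_prob))
    # Null model: the same constant probability for every observation.
    ll_null = log_likelihood([insample_mean] * len(y_true))
    return 1 - ll_model / ll_null
```

A model no better than the constant prediction scores 0, and values closer to 1 indicate a larger improvement over the null model.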

robustipy.utils.prepare_asc(asc_path: str) Tuple[str, List[str], List[str], str, DataFrame][source]

Load and preprocess the ASC example dataset for illustration.

Parameters:

asc_path (str) – Path to the Stata (.dta) file containing ASC data.

Returns:

  • y (str) – Dependent variable name.

  • x (List[str]) – Continuous predictor names.

  • c (List[str]) – Control variable names.

  • group (str) – Grouping variable name (‘pidp’).

  • ASC_df (pd.DataFrame) – Cleaned DataFrame.

Return type:

tuple

robustipy.utils.prepare_union(path_to_union: str) Tuple[str, List[str], str, DataFrame][source]

Load and preprocess the classic union dataset for example analyses.

Parameters:

path_to_union (str) – Path to the Stata (.dta) file containing union data.

Returns:

  • y (str) – Dependent variable name (‘log_wage’).

  • c (List[str]) – Control variable names.

  • x (str) – Treatment variable name (‘union’).

  • final_data (pd.DataFrame) – Cleaned DataFrame ready for modeling.

Return type:

tuple

Raises:

FileNotFoundError – If the specified file does not exist.

robustipy.utils.pseudo_r2(y_true: Sequence, y_pred: Sequence, mean_y_train: float) float[source]

Compute the pseudo-R² (1 - MSE_model / MSE_null), coercing inputs to floats.

Parameters:
  • y_pred (Sequence) – Model predictions (can be list/array of floats or strings convertible to float).

  • y_true (Sequence) – True target values (same length as y_pred).

  • mean_y_train (float) – The baseline prediction (e.g. the training‐set mean of y).

Returns:

Pseudo‐R² = 1 - (MSE_model / MSE_null).

Return type:

float

Raises:

ValueError – If the input lengths differ, if conversion to float fails, or if MSE_null is zero (which would require dividing by zero).
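
The formula and error conditions described above can be sketched in plain Python. The function name is hypothetical, and rejecting empty input is an added assumption.

```python
def pseudo_r2_sketch(y_true, y_pred, mean_y_train):
    """Return 1 - MSE_model / MSE_null after coercing inputs to float."""
    # float() raises ValueError for strings that are not numeric.
    truth = [float(v) for v in y_true]
    preds = [float(v) for v in y_pred]
    if len(truth) != len(preds):
        raise ValueError("y_true and y_pred must have the same length")
    if not truth:
        raise ValueError("inputs must not be empty")  # assumption
    baseline = float(mean_y_train)
    mse_model = sum((t - p) ** 2 for t, p in zip(truth, preds)) / len(truth)
    # The null model predicts the training mean for every observation.
    mse_null = sum((t - baseline) ** 2 for t in truth) / len(truth)
    if mse_null == 0:
        raise ValueError("MSE_null is zero; pseudo-R² is undefined")
    return 1 - mse_model / mse_null
```

Perfect predictions give 1, predicting the baseline everywhere gives 0, and predictions worse than the baseline give negative values.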

robustipy.utils.rescale(variable)[source]

Rescales the input variable to have zero mean and unit standard deviation.

Parameters:

variable (array-like) – Input data to be rescaled. Can be a list, NumPy array, or similar structure.

Returns:

out – The rescaled array with mean 0 and standard deviation 1. NaN values are ignored in the computation of mean and standard deviation.

Return type:

ndarray

Notes

This function uses np.nanmean and np.nanstd to ignore NaN values during scaling.
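
To illustrate the NaN-ignoring behaviour without NumPy, the sketch below standardizes a list in plain Python. The function name is hypothetical and it returns a list rather than an ndarray. It uses the population standard deviation, which matches the default (ddof=0) of np.nanstd.

```python
import math


def zscore_ignore_nan(values):
    """Standardize values, computing mean and std over non-NaN entries only."""
    data = [float(v) for v in values]
    valid = [v for v in data if not math.isnan(v)]
    mean = sum(valid) / len(valid)
    # Population standard deviation (divide by N, not N - 1).
    std = math.sqrt(sum((v - mean) ** 2 for v in valid) / len(valid))
    # NaN entries stay NaN because arithmetic on NaN yields NaN.
    return [(v - mean) / std for v in data]
```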

robustipy.utils.reservoir_sampling(generator: Iterable, k: int) List[source]

Uniformly sample k items from a streaming generator (reservoir sampling).

Parameters:
  • generator (Iterable) – An iterator or generator yielding items.

  • k (int) – Number of samples to retain.

Returns:

A list of k sampled items.

Return type:

List
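
The classic form of this technique (often called Algorithm R) keeps the first k items and then replaces a random slot with decreasing probability. The sketch below is a generic version with a hypothetical name and an optional rng argument added for reproducibility; returning fewer than k items for a short stream is an assumption about the edge case.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return k items chosen uniformly from stream in a single pass."""
    rng = rng or random.Random()
    reservoir = []
    for index, item in enumerate(stream):
        if index < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item number index+1 is kept with probability k / (index + 1).
            slot = rng.randint(0, index)
            if slot < k:
                reservoir[slot] = item
    return reservoir
```

Only k items are held in memory at any time, so the stream can be far larger than what would fit in a list.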

robustipy.utils.sample_y_masks(n_y: int, n_masks: int, seed: int | None = None) List[int][source]

Uniformly sample n_masks bit-masks from the non-empty power-set of n_y items without enumerating the 2^n_y possibilities.

Returns:

Each mask is an int whose binary representation tells which outcomes enter the composite.

Return type:

list[int]
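
Because every non-empty subset of n_y items corresponds to one integer between 1 and 2^n_y - 1, masks can be drawn directly as random integers. The sketch below uses hypothetical names and assumes sampling without replacement, capped at the number of available subsets; the library's exact behaviour may differ.

```python
import random


def sample_nonempty_masks(n_items, n_masks, seed=None):
    """Draw distinct bit-masks uniformly from 1 .. 2**n_items - 1."""
    rng = random.Random(seed)
    total = 2 ** n_items - 1
    n_masks = min(n_masks, total)  # cannot draw more distinct masks than exist
    seen = set()
    masks = []
    while len(masks) < n_masks:
        mask = rng.randrange(1, total + 1)  # excludes 0, the empty subset
        if mask not in seen:  # reject repeats to sample without replacement
            seen.add(mask)
            masks.append(mask)
    return masks


def decode_mask(mask, names):
    """Return the names whose bit is set in mask (bit i maps to names[i])."""
    return [name for i, name in enumerate(names) if (mask >> i) & 1]
```

For example, the mask 0b101 over ["a", "b", "c"] selects "a" and "c".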

robustipy.utils.sample_z_masks(n_z: int, n_masks: int, seed: int | None = None) List[int][source]

Uniformly sample n_masks bit-masks from the power-set of n_z items without enumerating the 2^n_z possibilities.

Returns:

Each mask is an int whose binary representation tells which specifications enter the composite.

Return type:

list[int]

robustipy.utils.simple_ols(y, x) dict[source]

Perform simple ordinary least squares regression.

Parameters:
  • y (array-like) – Dependent variable.

  • x (array-like) – Independent variables.

Returns:

Dictionary containing regression results, including coefficients, p-values, log-likelihood, AIC, BIC, and HQIC.

Return type:

dict

robustipy.utils.space_size(iterable) int[source]

Calculate the size of the power set of the given iterable.

Parameters:

iterable (iterable) – Input iterable.

Returns:

Size of the power set of the input iterable.

Return type:

int

robustipy.utils.stripped_ols(y, x, add_const: bool = True) dict[source]

Perform Ordinary Least Squares (OLS) regression analysis with stripped output.

Parameters:
  • y (array-like) – Dependent variable values.

  • x (array-like) –

    Independent variable values. The matrix should be shaped as

    (number of observations, number of independent variables).

  • add_const (bool, default True) – Whether to add a constant column for the intercept term. Set to False when using group-demeaned data (fixed effects), where the intercept is already absorbed by the demeaning.

Returns:

Regression coefficients (‘b’) and corresponding p-values (‘p’) for each independent variable.

Return type:

dict

Raises:

ValueError – If inputs x or y are empty.

Notes

  • Missing values in x or y are not handled, and the function may produce unexpected results if there are missing values in the input data.

  • By default (add_const=True), the function adds a constant column to the independent variables matrix x to represent the intercept term in the regression equation. Pass add_const=False to skip this.

Module contents

robustipy package initialization.

This module intentionally avoids global warning-hook side effects at import time. If compact robustipy-only warning formatting is desired, call enable_compact_warnings() explicitly.

robustipy.disable_compact_warnings() None[source]

Restore the original warning formatting handler.

robustipy.enable_compact_warnings() None[source]

Enable compact formatting for warnings emitted from robustipy modules.