robustipy package

Submodules

robustipy.figures module

robustipy.models module

robustipy.prototypes module

class robustipy.prototypes.BaseRobust(*, y: list[str], x: list[str], data: DataFrame, model_name: str = 'BaseRobust')[source]

Bases: Protomodel

Base class for robust model estimation, including OLS and logistic.

Provides shared validation, bootstrapping, cross-validation, and composite outcome support.

y

Dependent variable column names.

Type:: list of str

x

Independent variable column names.

Type:: list of str

data

Input dataset containing variables in y, x, controls.

Type:: pandas.DataFrame

model_name

Custom label for the model run.

Type:: str

results

Fitted result object populated after fit().

Type:: object

parameters

Stores initialization parameters and any derived settings.

Type:: dict

fit(*, controls: List[str], group: str | None = None, draws: int = 500, kfold: int = 5, oos_metric: str = 'r-squared', n_cpu: int | None = None, seed: int | None = None) → None[source]

Abstract fit method; must be overridden by subclasses.

Parameters:

controls (List[str]) – Optional control variable names to include in specifications.
group (str, optional) – Column name for grouping (fixed effects) variable.
draws (int, default=500) – Number of bootstrap draws.
kfold (int, default=5) – Number of cross-validation folds.
oos_metric (str, default='r-squared') – Out-of-sample metric (‘r-squared’, ‘rmse’, etc.).
n_cpu (int, optional) – Number of CPU cores for parallel computation.
seed (int, optional) – Random seed for reproducibility.

Raises:

NotImplementedError – Always, since this method must be implemented by subclasses.

get_results()[source]

multiple_y() → None[source]

Build two parallel lists:

self.y_composites – pandas Series, one per composite Y.
self.y_specs – tuples of str, the column names that form each composite.

If self.composite_sample is a positive int, draw that many random non-empty subsets of the raw Y columns before any Series is created. Otherwise, enumerate all non-empty subsets (the original behaviour).

exception robustipy.prototypes.MissingValueWarning[source]

Bases: UserWarning

class robustipy.prototypes.Protomodel[source]

Bases: ABC

Prototype class, intended to be used in inheritance, not to be called.

abstract fit()[source]

class robustipy.prototypes.Protoresult[source]

Bases: ABC

Prototype class for results object, intended to be used in inheritance, not to be called.

abstract plot()[source]

abstract summary()[source]
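
The prototype classes follow Python's standard abstract-base-class pattern: a class with abstract methods cannot be instantiated until a subclass implements them. A minimal sketch of that pattern, using hypothetical stand-in classes (SketchModel, ConstantModel) rather than the robustipy classes themselves:

```python
from abc import ABC, abstractmethod


class SketchModel(ABC):
    """Hypothetical stand-in for a prototype class with an abstract fit()."""

    @abstractmethod
    def fit(self):
        """Subclasses must provide their own fit()."""


class ConstantModel(SketchModel):
    """Hypothetical concrete subclass that satisfies the contract."""

    def fit(self):
        self.results = "fitted"
        return self


model = ConstantModel().fit()

try:
    SketchModel()  # abstract classes cannot be instantiated directly
    instantiated = True
except TypeError:
    instantiated = False
```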

robustipy.utils module

class robustipy.utils.IntegerRangeValidator(min_value, max_value)[source]

Bases: object

Validator that checks if an input value is an integer within a specified range.

Parameters:

min_value (int) – The minimum allowed integer value (inclusive).
max_value (int) – The maximum allowed integer value (inclusive).

Raises:

ValidationError – If the input is not an integer or is outside the specified range.

Usage:

validator = IntegerRangeValidator(1, 10)
validator(_, current_value)  # Returns True if valid, raises ValidationError otherwise.
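
The behaviour described above can be sketched as a small callable class. This is an illustration under assumptions, not the library's code: SketchValidationError and SketchIntegerRangeValidator are hypothetical names, and the sketch assumes the value may arrive as a string (for example from a prompt).

```python
class SketchValidationError(Exception):
    """Hypothetical stand-in for robustipy.utils.ValidationError."""

    def __init__(self, *args, reason=""):
        super().__init__(*args)
        self.reason = reason


class SketchIntegerRangeValidator:
    """Callable that accepts integers in [min_value, max_value]."""

    def __init__(self, min_value, max_value):
        self.min_value = min_value
        self.max_value = max_value

    def __call__(self, _, current):
        # The first argument is ignored, mirroring validator(_, current_value).
        try:
            value = int(str(current))  # rejects "3.7", "abc", None
        except ValueError:
            raise SketchValidationError(current, reason="Not an integer.")
        if not self.min_value <= value <= self.max_value:
            raise SketchValidationError(
                current,
                reason=f"Must be between {self.min_value} and {self.max_value}.",
            )
        return True


validator = SketchIntegerRangeValidator(1, 10)
accepted = validator(None, "5")
```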

exception robustipy.utils.ValidationError(*args, reason: str = '')[source]

Bases: Exception

Fallback so IntegerRangeValidator can raise a typed error safely.

robustipy.utils.all_subsets(ss)[source]

Generate all subsets of a given iterable.

Parameters:: ss (iterable) – Input iterable.
Returns:: A chain object containing all subsets of the input iterable.
Return type:: itertools.chain
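
A chain over all subsets is commonly built from itertools.combinations. The sketch below shows that construction; it assumes the empty subset is included, which the description above does not state explicitly, and all_subsets_sketch is a hypothetical name.

```python
from itertools import chain, combinations


def all_subsets_sketch(ss):
    """Return a lazy chain over every subset of ss, smallest first."""
    items = list(ss)
    return chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )


subsets = list(all_subsets_sketch(["age", "income"]))
```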

robustipy.utils.calculate_imv_score(y_true, y_enhanced)[source]

Calculates the IMV (InterModel Vigorish) score.

Parameters:

y_true (array-like) – Binary true labels (0 or 1).
y_enhanced (array-like) – Predicted probabilities from an enhanced model.

Returns:

IMV score: the relative improvement of the enhanced model over the null model.

robustipy.utils.concat_results(objs: List[OLSResult], de_dupe=True) → OLSResult[source]

Core routine: take a list of OLSResult objects and stack them into one OLSResult. All per-spec fields (estimates, p_values, specs_names, etc.) are concatenated, and any exact duplicates in (y_name, x_name, spec) are dropped in lockstep.

Parameters:

objs (List[OLSResult]) – Results objects to stack. Each element must already be an OLSResult (not the wrapper class).
de_dupe (bool, default True) – If True, drop exact duplicates in (y_name, x_name, spec) triplets.

The import of OLSResult is deferred until inside the function to avoid circular-import errors.

robustipy.utils.decorator_timer(func: callable) → callable[source]

Decorator to time function execution.

Parameters:: func (callable) – Function to wrap.
Returns:: Wrapped function returning (result, elapsed_seconds).
Return type:: callable
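
A timing decorator with this contract can be sketched as follows. The names timer_sketch and add are hypothetical, and the choice of time.perf_counter is an assumption; the source only states that the wrapped function returns (result, elapsed_seconds).

```python
import functools
import time


def timer_sketch(func):
    """Wrap func so that it returns (result, elapsed_seconds)."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        return result, elapsed

    return wrapper


@timer_sketch
def add(a, b):
    return a + b


total, seconds = add(1, 2)
```

Callers of a decorated function need to unpack the tuple, as in the last line.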

robustipy.utils.get_colormap_colors(num_colors: int = 3, colormap: str | Colormap = 'viridis') → List[str][source]

Return num_colors evenly spaced colors from a Matplotlib colormap.

Parameters:

num_colors (int, optional) – The number of colors to return. Must satisfy 1 ≤ num_colors. Defaults to 3.
colormap (str or matplotlib.colors.Colormap, optional) – Colormap name or object to sample from. Defaults to ‘viridis’.

Returns:

A list of hexadecimal color strings of length exactly num_colors.

Return type:

List[str]

Raises:

TypeError – If num_colors is not an integer.
ValueError – If num_colors < 1.

robustipy.utils.get_colors(specs: List[List[str]], color_set_name: str | None = 'Set1') → List[Tuple[float, float, float, float]][source]

Generate a palette of colors for a list of specifications using a categorical colormap.

Parameters:

specs (list of list of str) – Each inner list represents one specification (set of variable names).
color_set_name (str, optional) – Name of a Matplotlib qualitative colormap (default ‘Set1’).

Returns:

A list of RGBA tuples, one per specification.

Return type:

List[Tuple[float, float, float, float]]

Raises:

ValueError – If specs is not a list of lists.

robustipy.utils.get_selection_key(specs: List[List[str]]) → List[frozenset][source]

Convert list of spec lists into list of frozensets.

Parameters:: specs (list of list of str) – Each inner list is one specification.
Returns:: Immutable keys for each specification.
Return type:: list of frozenset
Raises:: ValueError – If specs is not list of lists.
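
Converting each specification to a frozenset gives a hashable, order-insensitive key. A minimal sketch of the idea, with selection_key_sketch as a hypothetical name and the validation rule inferred from the Raises entry above:

```python
def selection_key_sketch(specs):
    """Turn a list of spec lists into a list of frozensets."""
    if not isinstance(specs, list) or not all(isinstance(s, list) for s in specs):
        raise ValueError("specs must be a list of lists")
    return [frozenset(s) for s in specs]


keys = selection_key_sketch([["age", "income"], ["income", "age"]])

# frozensets are hashable, so they can serve as dictionary keys
lookup = {keys[0]: "spec with age and income"}
```

Because the two inner lists contain the same names, both keys compare equal and map to the same dictionary entry.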

robustipy.utils.group_demean(x: DataFrame, group: str | None = None) → DataFrame[source]

Demean the input data within groups.

Parameters:

x (pd.DataFrame) – Input DataFrame.
group (str, optional) – Column name for grouping. Default is None.

Returns:

Demeaned DataFrame.

Return type:

pd.DataFrame

robustipy.utils.is_interactive() → bool[source]

Return True if either:

we are inside a Jupyter notebook/lab, OR
we are running from a real terminal (both stdin and stdout are TTYs).

robustipy.utils.join_sig_test(*, results_target, results_shuffled, sig_level, positive)[source]

Calculate joint significance test for the entire specification curve.

Parameters:

results_target (OLSResult) – Results object from the original analysis.
results_shuffled (OLSResult) – Results object from shuffled analysis.
sig_level (float) – Significance level threshold for specifications.
positive (bool) – Direction of the joint significance test.

Returns:

Estimated p-value for the joint significance test.

Return type:

float

robustipy.utils.logistic_regression_sm(y, x) → dict[source]

Perform logistic regression based on statsmodels.Logit.

Parameters:

y (array-like) – Dependent variable values.
x (array-like) – Independent variable values. The matrix should be shaped as (number of observations, number of independent variables).

Returns:

Dictionary containing regression results, including coefficients, p-values, log-likelihood, AIC, BIC, and HQIC.

Return type:

dict

robustipy.utils.logistic_regression_sm_stripped(y, x) → dict[source]

Perform logistic regression using statsmodels with stripped output.

Parameters:

y (array-like) – Dependent variable values.
x (array-like) – Independent variable values. The matrix should be shaped as (number of observations, number of independent variables).

Returns:

A dictionary containing regression coefficients (‘b’) and corresponding p-values (‘p’) for each independent variable.

Return type:

dict

robustipy.utils.make_aic(ll: float, k: int) → float[source]

robustipy.utils.make_bic(ll: float, n: int, k: int) → float[source]

robustipy.utils.make_hqic(ll: float, n: int, k: int) → float[source]
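
These three helpers are listed without descriptions. Their signatures match the textbook information criteria, so the sketch below shows those standard definitions, assuming ll is the log-likelihood, n the number of observations, and k the number of estimated parameters. Whether the library uses exactly these forms is an assumption, and the *_sketch names are hypothetical.

```python
import math


def aic_sketch(ll, k):
    """Akaike information criterion: 2k - 2*ll."""
    return 2 * k - 2 * ll


def bic_sketch(ll, n, k):
    """Bayesian information criterion: k*ln(n) - 2*ll."""
    return k * math.log(n) - 2 * ll


def hqic_sketch(ll, n, k):
    """Hannan-Quinn information criterion: 2k*ln(ln(n)) - 2*ll."""
    return 2 * k * math.log(math.log(n)) - 2 * ll


aic = aic_sketch(-10.0, 2)
bic = bic_sketch(-10.0, 100, 2)
hqic = hqic_sketch(-10.0, 100, 2)
```

For the same fit, the three criteria differ only in how strongly they penalise the parameter count k.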

robustipy.utils.make_inquiry(model_name, y, data, draws, kfolds, oos_metric, n_cpu, seed)[source]

Prompt the user for missing inputs if in an interactive environment; otherwise, silently fall back to default values.

Returns:: (draws, kfolds, oos_metric, n_cpu, seed)
Return type:: tuple[int, int, str, int, int]

robustipy.utils.mcfadden_r2(y_true, y_prob, insample_mean)[source]

Compute McFadden’s pseudo R-squared for logistic regression.
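
McFadden's pseudo R-squared is defined as 1 - LL_model / LL_null, where LL is the Bernoulli log-likelihood. The sketch below is a standard-library illustration of that definition. It assumes the null model predicts insample_mean for every observation and clips probabilities away from 0 and 1; both are assumptions, and the function names are hypothetical.

```python
import math


def bernoulli_loglik(y_true, probs, eps=1e-15):
    """Sum of y*log(p) + (1-y)*log(1-p), with p clipped to (eps, 1-eps)."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


def mcfadden_r2_sketch(y_true, y_prob, insample_mean):
    ll_model = bernoulli_loglik(y_true, y_prob)
    # Assumed null model: the same probability for every observation.
    ll_null = bernoulli_loglik(y_true, [insample_mean] * len(y_true))
    return 1.0 - ll_model / ll_null


labels = [0, 1, 0, 1]
no_gain = mcfadden_r2_sketch(labels, [0.5, 0.5, 0.5, 0.5], 0.5)
some_gain = mcfadden_r2_sketch(labels, [0.1, 0.9, 0.2, 0.8], 0.5)
```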

robustipy.utils.prepare_asc(asc_path: str) → Tuple[str, List[str], List[str], str, DataFrame][source]

Load and preprocess the ASC example dataset for illustration.

Parameters:: asc_path (str) – Path to the Stata (.dta) file containing ASC data.
Returns:: A tuple (y, x, c, group, ASC_df), where y (str) is the dependent variable name, x (List[str]) the continuous predictor names, c (List[str]) the control variable names, group (str) the grouping variable name (‘pidp’), and ASC_df (pd.DataFrame) the cleaned DataFrame.
Return type:: tuple

robustipy.utils.prepare_union(path_to_union: str) → Tuple[str, List[str], str, DataFrame][source]

Load and preprocess the classic union dataset for example analyses.

Parameters:: path_to_union (str) – Path to the Stata (.dta) file containing union data.
Returns:: A tuple (y, c, x, final_data), where y (str) is the dependent variable name (‘log_wage’), c (List[str]) the control variable names, x (str) the treatment variable name (‘union’), and final_data (pd.DataFrame) the cleaned DataFrame ready for modeling.
Return type:: tuple
Raises:: FileNotFoundError – If the specified file does not exist.

robustipy.utils.pseudo_r2(y_true: Sequence, y_pred: Sequence, mean_y_train: float) → float[source]

Compute the pseudo-R² (1 - MSE_model / MSE_null), coercing inputs to floats.

Parameters:

y_true (Sequence) – True target values (same length as y_pred).
y_pred (Sequence) – Model predictions (can be list/array of floats or strings convertible to float).
mean_y_train (float) – The baseline prediction (e.g. the training‐set mean of y).

Returns:

Pseudo‐R² = 1 - (MSE_model / MSE_null).

Return type:

float

Raises:

ValueError – If lengths differ, if conversion to float fails, or if MSE_null is zero (division by zero for pseudo-R²).
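
The formula 1 - MSE_model / MSE_null can be worked through in plain Python. This is a sketch of the documented behaviour rather than the library's implementation: pseudo_r2_sketch is a hypothetical name, and rejecting empty input is an added assumption.

```python
def pseudo_r2_sketch(y_true, y_pred, mean_y_train):
    """Return 1 - MSE_model / MSE_null after coercing inputs to float."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")  # assumption
    true = [float(v) for v in y_true]  # float() raises ValueError on bad strings
    pred = [float(v) for v in y_pred]
    n = len(true)
    mse_model = sum((t - p) ** 2 for t, p in zip(true, pred)) / n
    mse_null = sum((t - mean_y_train) ** 2 for t in true) / n
    if mse_null == 0:
        raise ValueError("MSE_null is zero; pseudo-R² is undefined")
    return 1 - mse_model / mse_null


perfect = pseudo_r2_sketch(["1", "2", "3"], [1, 2, 3], 2.0)
baseline = pseudo_r2_sketch([1, 2, 3], [2, 2, 2], 2.0)
```

A model that matches the targets scores 1, and a model that only reproduces the baseline mean scores 0.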

robustipy.utils.rescale(variable)[source]

Rescales the input variable to have zero mean and unit standard deviation.

Parameters:: variable (array-like) – Input data to be rescaled. Can be a list, NumPy array, or similar structure.
Returns:: out – The rescaled array with mean 0 and standard deviation 1 along the specified axis. NaN values are ignored in the computation of mean and standard deviation.
Return type:: ndarray

Notes

This function uses np.nanmean and np.nanstd to ignore NaN values during scaling.
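
The NaN-aware scaling can be illustrated without NumPy. The sketch below mirrors the default behaviour of np.nanmean and np.nanstd (population standard deviation, NaN entries skipped when computing the statistics and left as NaN in the output). rescale_sketch is a hypothetical name, and the function returns a list rather than an ndarray.

```python
import math


def rescale_sketch(variable):
    """Z-score the input, ignoring NaN when computing mean and std."""
    values = [float(v) for v in variable]
    observed = [v for v in values if not math.isnan(v)]
    mean = sum(observed) / len(observed)
    # Population standard deviation (divide by n), as np.nanstd does by default.
    std = math.sqrt(sum((v - mean) ** 2 for v in observed) / len(observed))
    return [(v - mean) / std for v in values]


scaled = rescale_sketch([1.0, 2.0, 3.0, float("nan")])
```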

robustipy.utils.reservoir_sampling(generator: Iterable, k: int) → List[source]

Uniformly sample k items from a streaming generator (reservoir sampling).

Parameters:

generator (Iterable) – An iterator or generator yielding items.
k (int) – Number of samples to retain.

Returns:

A list of k sampled items.

Return type:

List
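
Reservoir sampling keeps a uniform sample of k items from a stream of unknown length using O(k) memory. The sketch below uses the classic Algorithm R; whether the library uses this exact variant is an assumption, and both the name reservoir_sketch and its seed parameter are hypothetical additions for reproducibility.

```python
import random


def reservoir_sketch(generator, k, seed=None):
    """Uniformly sample up to k items from an iterable of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(generator):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Replace a stored item with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sketch(iter(range(1000)), 5, seed=0)
short = reservoir_sketch(iter(range(3)), 5)
```

If the stream holds fewer than k items, this sketch returns all of them.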

robustipy.utils.sample_y_masks(n_y: int, n_masks: int, seed: int | None = None) → List[int][source]

Uniformly sample n_masks bit-masks from the non-empty power-set of n_y items without enumerating the 2^n_y possibilities.

Returns:: Sampled masks; each mask is an int whose binary representation tells which outcomes enter the composite.
Return type:: list[int]
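
Sampling bit-masks directly avoids materialising the power set: every integer from 1 to 2^n_y - 1 encodes one non-empty subset. The sketch below illustrates the idea under the assumption that masks are drawn without repeats, which the source does not specify; sample_masks_sketch and mask_to_names are hypothetical names.

```python
import random


def sample_masks_sketch(n_items, n_masks, seed=None):
    """Draw distinct non-empty subset masks without enumerating all subsets."""
    rng = random.Random(seed)
    total = 2 ** n_items - 1  # number of non-empty subsets
    target = min(n_masks, total)
    masks = set()
    while len(masks) < target:
        masks.add(rng.randrange(1, total + 1))  # uniform on 1..total
    return sorted(masks)


def mask_to_names(mask, names):
    """Decode a mask: bit i set means names[i] enters the composite."""
    return [name for i, name in enumerate(names) if (mask >> i) & 1]


masks = sample_masks_sketch(20, 10, seed=1)
chosen = mask_to_names(0b101, ["y1", "y2", "y3"])
```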

robustipy.utils.sample_z_masks(n_z: int, n_masks: int, seed: int | None = None) → List[int][source]

Uniformly sample n_masks bit-masks from the power-set of n_z items without enumerating the 2^n_z possibilities.

Returns:: Sampled masks; each mask is an int whose binary representation tells which specifications enter the composite.
Return type:: list[int]

robustipy.utils.simple_ols(y, x) → dict[source]

Perform simple ordinary least squares regression.

Parameters:

y (array-like) – Dependent variable.
x (array-like) – Independent variables.

Returns:

Dictionary containing regression results, including coefficients, p-values, log-likelihood, AIC, BIC, and HQIC.

Return type:

dict

robustipy.utils.space_size(iterable) → int[source]

Calculate the size of the power set of the given iterable.

Parameters:: iterable (iterable) – Input iterable.
Returns:: Size of the power set of the input iterable.
Return type:: int
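
The power set of n items has 2^n elements, counting the empty set. A short sketch of that arithmetic shows how quickly the specification space grows; space_size_sketch is a hypothetical name, and counting the empty set is an assumption about the library's convention.

```python
def space_size_sketch(iterable):
    """Return the number of subsets (including the empty one) of the input."""
    return 2 ** len(list(iterable))


sizes = {n: space_size_sketch(range(n)) for n in (3, 10, 20)}
```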

robustipy.utils.stripped_ols(y, x, add_const: bool = True) → dict[source]

Perform Ordinary Least Squares (OLS) regression analysis with stripped output.

Parameters:

y (array-like) – Dependent variable values.
x (array-like) – Independent variable values. The matrix should be shaped as (number of observations, number of independent variables).
add_const (bool, default True) – Whether to add a constant column for the intercept term. Set to False when using group-demeaned data (fixed effects), where the intercept is already absorbed by the demeaning.

Returns:

Dictionary containing regression coefficients (‘b’) and corresponding p-values (‘p’) for each independent variable.

Return type:

dict

Raises:

ValueError – If inputs x or y are empty.

Notes

Missing values in x or y are not handled, and the function may produce unexpected results if there are missing values in the input data.
By default (add_const=True), the function adds a constant column to the independent variables matrix x to represent the intercept term in the regression equation. Pass add_const=False to skip this step.
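
The reason add_const=False suits group-demeaned data can be shown with a one-regressor worked example: once y and x are demeaned within groups, each group's intercept is removed and a no-constant regression recovers the slope. The sketch uses plain Python, hypothetical helper names, and made-up illustrative data.

```python
from collections import defaultdict


def demean_by_group(values, groups):
    """Subtract each group's mean from its members."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def slope_no_constant(y, x):
    """OLS slope for a single regressor with no intercept: sum(xy) / sum(xx)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


# Illustrative data: y = 2*x plus a group-specific intercept (10 or -5).
groups = ["a", "a", "a", "b", "b", "b"]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0 * xi + (10.0 if g == "a" else -5.0) for xi, g in zip(x, groups)]

slope = slope_no_constant(demean_by_group(y, groups), demean_by_group(x, groups))
```

The recovered slope equals the true coefficient of 2 even though no constant was fitted, because demeaning has already absorbed both group intercepts.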

Module contents

robustipy package initialization.

This module intentionally avoids global warning-hook side effects at import time. If compact robustipy-only warning formatting is desired, call enable_compact_warnings() explicitly.

robustipy.disable_compact_warnings() → None[source]

Restore the original warning formatting handler.

robustipy.enable_compact_warnings() → None[source]

Enable compact formatting for warnings emitted from robustipy modules.