robustipy package
Submodules
robustipy.figures module
robustipy.models module
robustipy.prototypes module
- class robustipy.prototypes.BaseRobust(*, y: list[str], x: list[str], data: DataFrame, model_name: str = 'BaseRobust')[source]
Bases:
Protomodel
Base class for robust model estimation, including OLS and logistic.
Provides shared validation, bootstrapping, cross-validation, and composite outcome support.
- y
Dependent variable column names.
- Type:
list of str
- x
Independent variable column names.
- Type:
list of str
- data
Input dataset containing variables in y, x, controls.
- Type:
pandas.DataFrame
- model_name
Custom label for the model run.
- Type:
str
- results
Fitted result object populated after fit().
- Type:
object
- parameters
Stores initialization parameters and any derived settings.
- Type:
dict
- fit(*, controls: List[str], group: str | None = None, draws: int = 500, kfold: int = 5, oos_metric: str = 'r-squared', n_cpu: int | None = None, seed: int | None = None) None[source]
Abstract fit method; must be overridden by subclasses.
- Parameters:
controls (List[str]) – Optional control variable names to include in specifications.
group (str, optional) – Column name for grouping (fixed effects) variable.
draws (int, default=500) – Number of bootstrap draws.
kfold (int, default=5) – Number of cross-validation folds.
oos_metric (str, default='r-squared') – Out-of-sample metric (‘r-squared’, ‘rmse’, etc.).
n_cpu (int, optional) – Number of CPU cores for parallel computation.
seed (int, optional) – Random seed for reproducibility.
- Raises:
NotImplementedError – Always, since this method must be implemented by subclasses.
- multiple_y() None[source]
- Build the lists:
self.y_composites – list of pandas Series, one per composite Y
self.y_specs – list of tuple[str], the names that form each composite
If self.composite_sample is a positive int, draw that many random non-empty subsets of the raw Y columns before any Series is created. Otherwise, enumerate all non-empty subsets (the original behaviour).
robustipy.utils module
- class robustipy.utils.IntegerRangeValidator(min_value, max_value)[source]
Bases:
object
Validator that checks if an input value is an integer within a specified range.
- Parameters:
min_value (int) – The minimum allowed integer value (inclusive).
max_value (int) – The maximum allowed integer value (inclusive).
- Raises:
ValidationError – If the input is not an integer or is outside the specified range.
- Usage:
validator = IntegerRangeValidator(1, 10)
validator(_, current_value)  # Returns True if valid, raises ValidationError otherwise.
- exception robustipy.utils.ValidationError(*args, reason: str = '')[source]
Bases:
Exception
Fallback so IntegerRangeValidator can raise a typed error safely.
- robustipy.utils.all_subsets(ss)[source]
Generate all subsets of a given iterable.
- Parameters:
ss (iterable) – Input iterable.
- Returns:
A chain object containing all subsets of the input iterable.
- Return type:
itertools.chain
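The description above matches the standard itertools power-set recipe. A minimal sketch of that pattern follows; all_subsets_sketch is a hypothetical stand-in, and the subset ordering and the inclusion of the empty subset are assumptions rather than confirmed behaviour of robustipy.utils.all_subsets.

```python
from itertools import chain, combinations


def all_subsets_sketch(ss):
    """Hypothetical stand-in: lazily yield every subset of ss, grouped by size."""
    items = list(ss)
    # One combinations() iterator per subset size, from 0 up to len(items).
    return chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )


subsets = list(all_subsets_sketch(["a", "b", "c"]))
```

Because the result is a lazy chain object, subsets are produced only as they are consumed, which matters when the input has many elements.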
- robustipy.utils.calculate_imv_score(y_true, y_enhanced)[source]
Calculates the IMV (Information Metric Value) score.
- Parameters:
y_true (array-like) – Binary true labels (0 or 1).
y_enhanced (array-like) – Predicted probabilities from an enhanced model.
- Returns:
IMV score: the relative improvement of the enhanced model over the null model.
- robustipy.utils.concat_results(objs: List[OLSResult], de_dupe=True) OLSResult[source]
Core routine: take a list of OLSResult objects and stack them into one OLSResult. All per-spec fields (estimates, p_values, specs_names, etc.) are concatenated, and any exact duplicates in (y_name, x_name, spec) are dropped in lockstep.
- Parameters:
objs (List[OLSResult]) – Result objects to stack.
de_dupe (bool, default True) – If True, drop exact duplicates in (y_name, x_name, spec) triplets.
This function assumes each element of objs is already an OLSResult (not the wrapper class). The import of OLSResult is deferred until inside the function to avoid circular-import errors.
- robustipy.utils.decorator_timer(func: callable) callable[source]
Decorator to time function execution.
- Parameters:
func (callable) – Function to wrap.
- Returns:
Wrapped function returning (result, elapsed_seconds).
- Return type:
callable
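As an illustration of the return contract described above, here is a minimal sketch of a timing decorator whose wrapped function returns (result, elapsed_seconds). The name timer_sketch and the choice of time.perf_counter are assumptions; the library's own clock and internals may differ.

```python
import time
from functools import wraps


def timer_sketch(func):
    """Hypothetical stand-in: wrap func so it returns (result, elapsed_seconds)."""
    @wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        return result, elapsed
    return wrapper


@timer_sketch
def add(a, b):
    return a + b


# Callers must unpack the tuple, since the return value is no longer bare.
total, seconds = add(2, 3)
```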
- robustipy.utils.get_colormap_colors(num_colors: int = 3, colormap: str | Colormap = 'viridis') List[str][source]
Return num_colors evenly spaced colors from a Matplotlib colormap.
- Parameters:
num_colors (int, optional) – The number of colors to return. Must satisfy num_colors ≥ 1. Defaults to 3.
colormap (str or matplotlib.colors.Colormap, optional) – Colormap name or object to sample from. Defaults to ‘viridis’.
- Returns:
A list of hexadecimal color strings of length exactly num_colors.
- Return type:
List[str]
- Raises:
TypeError – If num_colors is not an integer.
ValueError – If num_colors < 1.
- robustipy.utils.get_colors(specs: List[List[str]], color_set_name: str | None = 'Set1') List[Tuple[float, float, float, float]][source]
Generate a palette of colors for a list of specifications using a categorical colormap.
- Parameters:
specs (list of list of str) – Each inner list represents one specification (set of variable names).
color_set_name (str, optional) – Name of a Matplotlib qualitative colormap (default ‘Set1’).
- Returns:
A list of RGBA tuples, one per specification.
- Return type:
List[Tuple[float, float, float, float]]
- Raises:
ValueError – If specs is not a list of lists.
- robustipy.utils.get_selection_key(specs: List[List[str]]) List[frozenset][source]
Convert list of spec lists into list of frozensets.
- Parameters:
specs (list of list of str) – Each inner list is one specification.
- Returns:
Immutable keys for each specification.
- Return type:
list of frozenset
- Raises:
ValueError – If specs is not a list of lists.
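The conversion described above can be sketched as follows. selection_key_sketch is a hypothetical stand-in, and the exact validation logic is an assumption based on the documented ValueError.

```python
def selection_key_sketch(specs):
    """Hypothetical stand-in: return one frozenset per specification."""
    if not isinstance(specs, list) or not all(isinstance(s, list) for s in specs):
        raise ValueError("specs must be a list of lists")
    return [frozenset(s) for s in specs]


keys = selection_key_sketch([["age", "income"], ["income", "age"], ["age"]])
# Frozensets ignore ordering, so the first two specifications compare equal.
same_spec = keys[0] == keys[1]
```

Frozensets are hashable, so the resulting keys can also serve as dictionary keys or set members when looking up a specification regardless of variable order.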
- robustipy.utils.group_demean(x: DataFrame, group: str | None = None) DataFrame[source]
Demean the input data within groups.
- Parameters:
x (pd.DataFrame) – Input DataFrame.
group (str, optional) – Column name for grouping. Default is None.
- Returns:
Demeaned DataFrame.
- Return type:
pd.DataFrame
- robustipy.utils.is_interactive() bool[source]
- Return True if either:
we are inside a Jupyter notebook/lab, OR
we are running from a real terminal (both stdin and stdout are TTYs).
- robustipy.utils.join_sig_test(*, results_target, results_shuffled, sig_level, positive)[source]
Calculate joint significance test for the entire specification curve.
- Parameters:
results_target (OLSResult) – Results object from the original analysis.
results_shuffled (OLSResult) – Results object from shuffled analysis.
sig_level (float) – Significance level threshold for specifications.
positive (bool) – Direction of the joint significance test.
- Returns:
Estimated p-value for the joint significance test.
- Return type:
float
- robustipy.utils.logistic_regression_sm(y, x) dict[source]
Perform logistic regression based on statsmodels.Logit.
- Parameters:
y (array-like) – Dependent variable values.
x (array-like) – Independent variable values. The matrix should be shaped as (number of observations, number of independent variables).
- Returns:
Dictionary containing regression results, including coefficients, p-values, log-likelihood, AIC, BIC, and HQIC.
- Return type:
dict
- robustipy.utils.logistic_regression_sm_stripped(y, x) dict[source]
Perform logistic regression using statsmodels with stripped output.
- Parameters:
y (array-like) – Dependent variable values.
x (array-like) – Independent variable values. The matrix should be shaped as (number of observations, number of independent variables).
- Returns:
A dictionary containing regression coefficients (‘b’) and corresponding p-values (‘p’) for each independent variable.
- Return type:
dict
- robustipy.utils.make_inquiry(model_name, y, data, draws, kfolds, oos_metric, n_cpu, seed)[source]
Prompt the user for missing inputs if in an interactive environment; otherwise, silently fall back to default values.
- Returns:
(draws, kfolds, oos_metric, n_cpu, seed)
- Return type:
tuple[int, int, str, int, int]
- robustipy.utils.mcfadden_r2(y_true, y_prob, insample_mean)[source]
Compute McFadden’s pseudo R-squared for logistic regression.
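The entry above gives no formula, so the following is a sketch of the textbook definition, 1 - LL_model / LL_null, under the assumption that insample_mean is used as the constant predicted probability of the null model. mcfadden_r2_sketch and its eps clipping parameter are hypothetical; the library's numerical safeguards may differ.

```python
import math


def mcfadden_r2_sketch(y_true, y_prob, insample_mean, eps=1e-15):
    """Hypothetical stand-in: 1 - LL_model / LL_null for binary outcomes."""
    def log_lik(probs):
        total = 0.0
        for y, p in zip(y_true, probs):
            p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
            total += y * math.log(p) + (1 - y) * math.log(1 - p)
        return total

    ll_model = log_lik(y_prob)
    # Assumption: the null model predicts the in-sample mean for every row.
    ll_null = log_lik([insample_mean] * len(y_true))
    return 1 - ll_model / ll_null


y = [1, 0, 1, 0]
r2_null = mcfadden_r2_sketch(y, [0.5] * 4, 0.5)             # no better than the null
r2_good = mcfadden_r2_sketch(y, [0.9, 0.1, 0.9, 0.1], 0.5)  # confident and correct
```

A model that only reproduces the null probabilities scores 0, and the score approaches 1 as the predicted probabilities approach the true labels.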
- robustipy.utils.prepare_asc(asc_path: str) Tuple[str, List[str], List[str], str, DataFrame][source]
Load and preprocess the ASC example dataset for illustration.
- Parameters:
asc_path (str) – Path to the Stata (.dta) file containing ASC data.
- Returns:
y (str): Dependent variable name.
x (List[str]): Continuous predictor names.
c (List[str]): Control variable names.
group (str): Grouping variable name (‘pidp’).
ASC_df (pd.DataFrame): Cleaned DataFrame.
- Return type:
tuple
- robustipy.utils.prepare_union(path_to_union: str) Tuple[str, List[str], str, DataFrame][source]
Load and preprocess the classic union dataset for example analyses.
- Parameters:
path_to_union (str) – Path to the Stata (.dta) file containing union data.
- Returns:
y (str): Dependent variable name (‘log_wage’).
c (List[str]): Control variable names.
x (str): Treatment variable name (‘union’).
final_data (pd.DataFrame): Cleaned DataFrame ready for modeling.
- Return type:
tuple
- Raises:
FileNotFoundError – If the specified file does not exist.
- robustipy.utils.pseudo_r2(y_true: Sequence, y_pred: Sequence, mean_y_train: float) float[source]
Compute the pseudo-R² (1 - MSE_model / MSE_null), coercing inputs to floats.
- Parameters:
y_true (Sequence) – True target values (same length as y_pred).
y_pred (Sequence) – Model predictions (can be list/array of floats or strings convertible to float).
mean_y_train (float) – The baseline prediction (e.g. the training‐set mean of y).
- Returns:
Pseudo‐R² = 1 - (MSE_model / MSE_null).
- Return type:
float
- Raises:
ValueError – If the lengths differ, if conversion to float fails, or if MSE_null is zero (which would cause division by zero in the pseudo-R² formula).
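The formula and error conditions documented above can be worked through in plain Python. pseudo_r2_sketch is a hypothetical stand-in written from the documented behaviour, not the library's source; the empty-input check is an added assumption.

```python
def pseudo_r2_sketch(y_true, y_pred, mean_y_train):
    """Hypothetical stand-in: 1 - MSE_model / MSE_null."""
    y_true = [float(v) for v in y_true]  # float() raises ValueError on bad strings
    y_pred = [float(v) for v in y_pred]
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    n = len(y_true)
    if n == 0:
        raise ValueError("inputs must not be empty")
    mse_model = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    # The null model predicts the training-set mean for every observation.
    mse_null = sum((t - mean_y_train) ** 2 for t in y_true) / n
    if mse_null == 0:
        raise ValueError("MSE_null is zero; pseudo-R² is undefined")
    return 1 - mse_model / mse_null


# MSE_model = 0.25 and MSE_null = 1.25, so the score is 1 - 0.2 = 0.8.
score = pseudo_r2_sketch([1, 2, 3, 4], ["1.0", "2.0", "3.0", "5.0"], mean_y_train=2.5)
```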
- robustipy.utils.rescale(variable)[source]
Rescales the input variable to have zero mean and unit standard deviation.
- Parameters:
variable (array-like) – Input data to be rescaled. Can be a list, NumPy array, or similar structure.
- Returns:
out – The rescaled array with mean 0 and standard deviation 1. NaN values are ignored in the computation of the mean and standard deviation.
- Return type:
ndarray
Notes
This function uses np.nanmean and np.nanstd to ignore NaN values during scaling.
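To make the NaN handling concrete, here is a pure-Python sketch of the same standardization. rescale_sketch is a hypothetical stand-in; it assumes the population standard deviation (division by n), which is the default of np.nanstd, and whether the library overrides that default is not stated above.

```python
import math


def rescale_sketch(values):
    """Hypothetical pure-Python stand-in: z-score values, skipping NaNs."""
    valid = [v for v in values if not math.isnan(v)]
    mean = sum(valid) / len(valid)
    # Population standard deviation (divide by n), the default of np.nanstd.
    std = math.sqrt(sum((v - mean) ** 2 for v in valid) / len(valid))
    # NaN inputs stay NaN in the output because NaN propagates through arithmetic.
    return [(v - mean) / std for v in values]


scaled = rescale_sketch([1.0, 2.0, float("nan"), 3.0])
```

The NaN entry is excluded from the mean and standard deviation but keeps its position in the output, so the result stays aligned with the input.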
- robustipy.utils.reservoir_sampling(generator: Iterable, k: int) List[source]
Uniformly sample k items from a streaming generator (reservoir sampling).
- Parameters:
generator (Iterable) – An iterator or generator yielding items.
k (int) – Number of samples to retain.
- Returns:
A list of k sampled items.
- Return type:
List
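Reservoir sampling is commonly implemented with Algorithm R, sketched below. reservoir_sketch and its rng parameter are hypothetical additions for reproducibility; that robustipy uses this exact variant is an assumption.

```python
import random


def reservoir_sketch(generator, k, rng=None):
    """Hypothetical stand-in: Algorithm R over a stream of unknown length."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(generator):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sketch((n * n for n in range(1000)), k=5, rng=random.Random(0))
```

The stream is consumed once and only k items are held in memory at a time. If the stream yields fewer than k items, this sketch returns all of them.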
- robustipy.utils.sample_y_masks(n_y: int, n_masks: int, seed: int | None = None) List[int][source]
Uniformly sample n_masks bit-masks from the non-empty power-set of n_y items without enumerating the 2^n_y possibilities.
- Returns:
Each mask is an int whose binary representation tells which outcomes enter the composite.
- Return type:
list[int]
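One way to sample masks without enumerating the power set is to draw random integers in [1, 2^n_y - 1] and reject repeats, as sketched below. sample_masks_sketch and decode_mask are hypothetical stand-ins; sampling without replacement, the sorted output, and the bit-to-column mapping are all assumptions.

```python
import random


def sample_masks_sketch(n_y, n_masks, seed=None):
    """Hypothetical stand-in: draw distinct non-empty bit-masks over n_y items."""
    rng = random.Random(seed)
    total = 2 ** n_y - 1           # number of non-empty subsets
    wanted = min(n_masks, total)   # cannot return more masks than exist
    masks = set()
    while len(masks) < wanted:
        # Start at 1 because mask 0 would encode the empty subset.
        masks.add(rng.randrange(1, 2 ** n_y))
    return sorted(masks)


def decode_mask(mask, names):
    """Return the names whose bit is set in mask (bit 0 maps to names[0])."""
    return [name for bit, name in enumerate(names) if mask >> bit & 1]


masks = sample_masks_sketch(n_y=20, n_masks=10, seed=1)
decoded = decode_mask(0b101, ["y1", "y2", "y3"])  # ['y1', 'y3']
```

Rejection sampling is cheap when n_masks is small relative to 2^n_y, which is the case the docstring targets.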
- robustipy.utils.sample_z_masks(n_z: int, n_masks: int, seed: int | None = None) List[int][source]
Uniformly sample n_masks bit-masks from the power-set of n_z items without enumerating the 2^n_z possibilities.
- Returns:
Each mask is an int whose binary representation tells which specifications enter the composite.
- Return type:
list[int]
- robustipy.utils.simple_ols(y, x) dict[source]
Perform simple ordinary least squares regression.
- Parameters:
y (array-like) – Dependent variable.
x (array-like) – Independent variables.
- Returns:
Dictionary containing regression results, including coefficients, p-values, log-likelihood, AIC, BIC, and HQIC.
- Return type:
dict
- robustipy.utils.space_size(iterable) int[source]
Calculate the size of the power set of the given iterable.
- Parameters:
iterable (iterable) – Input iterable.
- Returns:
Size of the power set of the input iterable.
- Return type:
int
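The power set of n items has 2^n members, so the size can be computed without enumerating anything. The sketch below uses a hypothetical stand-in, space_size_sketch, and assumes the count includes the empty subset, as the mathematical definition of a power set does.

```python
def space_size_sketch(iterable):
    """Hypothetical stand-in: number of subsets of the input, 2 ** n."""
    return 2 ** len(list(iterable))


small = space_size_sketch(["age", "income", "tenure"])  # 8
large = space_size_sketch(range(20))                     # 1048576
```

The jump from 8 to over a million subsets with only 20 items shows why sampling subsets can be preferable to full enumeration.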
- robustipy.utils.stripped_ols(y, x, add_const: bool = True) dict[source]
Perform Ordinary Least Squares (OLS) regression analysis with stripped output.
- Parameters:
y (array-like) – Dependent variable values.
x (array-like) – Independent variable values. The matrix should be shaped as (number of observations, number of independent variables).
add_const (bool, default True) – Whether to add a constant column for the intercept term. Set to False when using group-demeaned data (fixed effects), where the intercept is already absorbed by the demeaning.
- Returns:
A dictionary containing regression coefficients (‘b’) and corresponding p-values (‘p’) for each independent variable.
- Return type:
dict
- Raises:
ValueError – If inputs x or y are empty.
Notes
Missing values in x or y are not handled; the function may produce unexpected results if the input data contain missing values.
By default (add_const=True), the function adds a constant column to the independent variables matrix x to represent the intercept term in the regression equation. Pass add_const=False to skip this step.
Module contents
robustipy package initialization.
This module intentionally avoids global warning-hook side effects at import time. If compact robustipy-only warning formatting is desired, call enable_compact_warnings() explicitly.