haam.haam_package module

Core Analysis Module

This module implements the main HAAMAnalysis class, which performs the Double Machine Learning Lens Model Equation (DML-LME) analysis. It is the primary interface for decomposing human and AI judgment accuracy into direct and mediated components.

Key Concepts:

Direct Effects: Accuracy not explained by measured perceptual cues (unmeasured pathways)
Indirect Effects: Accuracy mediated through the high-dimensional cue space
PoMA (Percentage of Mediated Accuracy): Proportion of accuracy flowing through measured cues
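
The relationship between the three quantities above can be illustrated with a little arithmetic. This is a minimal sketch, assuming the total accuracy effect decomposes additively into direct and indirect parts; the helper name `poma` is hypothetical and is not part of the package API, and the package's exact formula is not stated here.

```python
def poma(total_effect: float, direct_effect: float) -> float:
    """Share of accuracy mediated by measured cues (hypothetical helper).

    Assumes total = direct + indirect, so indirect = total - direct.
    Returns a fraction; multiply by 100 for a percentage.
    """
    if total_effect == 0:
        raise ValueError("PoMA is undefined when the total effect is zero")
    indirect_effect = total_effect - direct_effect
    return indirect_effect / total_effect


# Example: total accuracy 0.5, of which 0.2 is not explained by the cues.
share = poma(total_effect=0.5, direct_effect=0.2)
print(f"PoMA = {share:.0%}")
```

Under this assumption, a direct effect equal to the total effect gives a PoMA of zero, and a direct effect of zero gives a PoMA of one.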

The analysis follows a four-stage process:

1. Feature Extraction: Convert raw inputs (text, embeddings) into principal components
2. Nuisance Estimation: Use ML to estimate conditional expectations with cross-fitting
3. Orthogonalization: Remove regularization bias via double ML
4. Inference: Bootstrap confidence intervals for all estimates
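
The four stages can be sketched end to end on synthetic data. This is an illustrative NumPy-only sketch, not the package's implementation: ridge regression stands in for the ML nuisance learner, the bootstrap resamples the cross-fitted residuals rather than refitting every stage, and all function names here are hypothetical.

```python
import numpy as np


def pca_scores(embeddings, n_components):
    """Stage 1: project centered embeddings onto their top principal components."""
    centered = embeddings - embeddings.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def ridge_predict(x_train, y_train, x_test, penalty=1.0):
    """Fit ridge regression on the training rows and predict the held-out rows."""
    x_mean, y_mean = x_train.mean(axis=0), y_train.mean()
    xc = x_train - x_mean
    gram = xc.T @ xc + penalty * np.eye(xc.shape[1])
    beta = np.linalg.solve(gram, xc.T @ (y_train - y_mean))
    return y_mean + (x_test - x_mean) @ beta


def cross_fit_residuals(x, y, n_folds=5, seed=0):
    """Stage 2: out-of-fold residuals; each row is predicted by a model that never saw it."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    residuals = np.empty(len(y))
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        prediction = ridge_predict(x[train], y[train], x[test_idx])
        residuals[test_idx] = y[test_idx] - prediction
    return residuals


def direct_effect(criterion, judgment, pcs):
    """Stage 3: residual-on-residual slope (orthogonalized estimate)."""
    r_c = cross_fit_residuals(pcs, criterion)
    r_j = cross_fit_residuals(pcs, judgment)
    return (r_c @ r_j) / (r_c @ r_c), r_c, r_j


def bootstrap_ci(r_c, r_j, n_boot=500, seed=0):
    """Stage 4: percentile bootstrap over residual pairs (a simplification)."""
    rng = np.random.default_rng(seed)
    n = len(r_c)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        draws[b] = (r_c[idx] @ r_j[idx]) / (r_c[idx] @ r_c[idx])
    return np.percentile(draws, [2.5, 97.5])


# Synthetic data with a known direct effect of 0.5.
rng = np.random.default_rng(1)
n = 500
embeddings = rng.normal(size=(n, 20))
pcs = pca_scores(embeddings, n_components=5)
criterion = pcs @ rng.normal(size=5) + rng.normal(size=n)
judgment = 0.5 * criterion + pcs @ rng.normal(size=5) + rng.normal(size=n)

theta, r_c, r_j = direct_effect(criterion, judgment, pcs)
ci_low, ci_high = bootstrap_ci(r_c, r_j)
print(f"direct effect: {theta:.3f}  95% CI: [{ci_low:.3f}, {ci_high:.3f}]")
```

Cross-fitting keeps the nuisance predictions out of sample, which is what allows a regularized learner to be used without its bias leaking into the final slope.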

This implementation handles the high-dimensional setting (p >> n) that breaks traditional mediation analysis.

HAAM: Human-AI Accuracy Model Analysis Package

A lightweight package for analyzing human-AI accuracy models with sample-split post-lasso regression and interactive visualizations.

Author: HAAM Development Team
License: MIT

class haam.haam_package.HAAMAnalysis(criterion: ndarray, ai_judgment: ndarray, human_judgment: ndarray, embeddings: ndarray | None = None, texts: List[str] | None = None, n_components: int = 200, random_state: int = 42, standardize: bool = False)[source]

Bases: object

Main class for Human-AI Accuracy Model analysis.

This class performs sample-split post-lasso regression analysis and generates various visualizations for understanding the relationships between human judgments, AI judgments, and a criterion variable.

Methods

`display_all_results`()	Display all HAAM results including coefficients and statistics in Colab.
`display_coefficient_tables`()	Display comprehensive LASSO and post-LASSO model outputs.
`display_global_statistics`()	Display comprehensive global statistics in organized sections.
`display_mediation_results`()	Display mediation analysis results with visualization in Colab.
`export_coefficients_with_inference`([output_dir])	Export both LASSO and post-LASSO coefficients with statistical inference.
`export_global_statistics`([output_dir])	Export comprehensive global statistics to CSV files.
`export_results`([output_dir, prefix])	Export results to CSV files.
`fit_debiased_lasso`([use_sample_splitting, alpha])	Fit debiased lasso models for all outcomes.
`generate_embeddings`(texts[, model_name, ...])	Generate embeddings with a sentence-transformers model (all-MiniLM-L6-v2 by default).
`get_top_pcs`([n_top, ranking_method])	Get top PCs based on ranking method.

__init__(criterion: ndarray, ai_judgment: ndarray, human_judgment: ndarray, embeddings: ndarray | None = None, texts: List[str] | None = None, n_components: int = 200, random_state: int = 42, standardize: bool = False)[source]

Initialize HAAM Analysis.

Parameters:

criterion (np.ndarray) – Criterion variable (e.g., social class)
ai_judgment (np.ndarray) – AI predictions/ratings
human_judgment (np.ndarray) – Human ratings
embeddings (np.ndarray, optional) – Pre-computed embeddings. If None, will be generated from texts
texts (List[str], optional) – Text data used to generate embeddings when embeddings is None
n_components (int, default=200) – Number of PCA components to extract
random_state (int, default=42) – Random state for reproducibility
standardize (bool, default=False) – Whether to standardize X and outcome variables for both total effects and DML calculations. When True, all coefficients will be in standardized units.
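
A minimal sketch of preparing inputs for the constructor, using only the parameters documented above. The synthetic data, the variable names, and the choice of `n_components=50` are assumptions made for illustration; the import is guarded so the snippet still runs where the package is not installed.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples, embedding_dim = 300, 384  # 384 is the all-MiniLM-L6-v2 output size

# Synthetic stand-ins for real data (illustration only).
embeddings = rng.normal(size=(n_samples, embedding_dim))
criterion = embeddings[:, 0] + rng.normal(size=n_samples)
ai_judgment = 0.6 * criterion + rng.normal(size=n_samples)
human_judgment = 0.4 * criterion + rng.normal(size=n_samples)

inputs = dict(
    criterion=criterion,
    ai_judgment=ai_judgment,
    human_judgment=human_judgment,
    embeddings=embeddings,   # pass texts=... instead to have embeddings generated
    n_components=50,         # must not exceed min(n_samples, embedding_dim)
    random_state=42,
    standardize=True,        # coefficients reported in standardized units
)

try:
    from haam.haam_package import HAAMAnalysis
except ImportError:
    HAAMAnalysis = None      # package not installed; inputs are still valid

if HAAMAnalysis is not None:
    analysis = HAAMAnalysis(**inputs)
```

All three outcome arrays should have one entry per row of the embedding matrix.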

static generate_embeddings(texts: List[str], model_name: str = 'sentence-transformers/all-MiniLM-L6-v2', batch_size: int = 32) → ndarray[source]

Generate embeddings with a sentence-transformers model (all-MiniLM-L6-v2 by default).

Parameters:

texts (List[str]) – List of text documents
model_name (str) – Name of the sentence transformer model
batch_size (int) – Batch size for encoding

Returns:

Embedding matrix (n_samples, embedding_dim)

Return type:

np.ndarray

fit_debiased_lasso(use_sample_splitting: bool = True, alpha: float | None = None) → Dict[str, Any][source]

Fit debiased lasso models for all outcomes.

Parameters:

use_sample_splitting (bool, default=True) – Whether to use sample splitting for valid inference
alpha (float, optional) – Regularization parameter. If None, it is chosen by cross-validation

Returns:

Dictionary containing all results

Return type:

Dict[str, Any]
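
To illustrate why sample splitting supports valid inference, here is a minimal NumPy-only sketch of sample-split post-lasso: one half of the data selects variables with the lasso, and the other half refits ordinary least squares on the selected variables to obtain coefficients and standard errors. The function names, the fixed `alpha`, and the coordinate-descent solver are assumptions for illustration and do not reflect the internals of `fit_debiased_lasso`.

```python
import numpy as np


def lasso_cd(x, y, alpha, n_sweeps=200):
    """Lasso by cyclic coordinate descent on (1/2n)||y - xb||^2 + alpha*||b||_1."""
    n, p = x.shape
    beta = np.zeros(p)
    col_scale = (x ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()
    for _ in range(n_sweeps):
        for j in range(p):
            resid += x[:, j] * beta[j]   # add back coordinate j's contribution
            rho = x[:, j] @ resid / n
            beta[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_scale[j]
            resid -= x[:, j] * beta[j]   # subtract the updated contribution
    return beta


def sample_split_post_lasso(x, y, alpha, seed=0):
    """Select on one half with the lasso, then refit OLS on the other half."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    half_a, half_b = order[: len(y) // 2], order[len(y) // 2:]

    def centered(idx):
        return x[idx] - x[idx].mean(axis=0), y[idx] - y[idx].mean()

    xa, ya = centered(half_a)
    xb, yb = centered(half_b)

    selected = np.flatnonzero(lasso_cd(xa, ya, alpha))
    xs = xb[:, selected]
    coef, *_ = np.linalg.lstsq(xs, yb, rcond=None)

    # Classical OLS standard errors; selection used independent data.
    resid = yb - xs @ coef
    dof = len(yb) - len(selected) - 1
    sigma2 = resid @ resid / dof
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(xs.T @ xs)))
    return selected, coef, se


# Synthetic data: only the first three of 30 predictors matter.
rng = np.random.default_rng(0)
x = rng.normal(size=(400, 30))
true_beta = np.zeros(30)
true_beta[:3] = [2.0, -1.5, 1.0]
y = x @ true_beta + rng.normal(size=400)

selected, coef, se = sample_split_post_lasso(x, y, alpha=0.2)
for j, b, s in zip(selected, coef, se):
    print(f"PC{j}: coef={b:+.3f}  se={s:.3f}")
```

Because the refit half played no part in selection, the usual OLS standard errors are not distorted by the selection step.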

get_top_pcs(n_top: int = 9, ranking_method: str = 'triple') → List[int][source]

Get top PCs based on ranking method.

Parameters:

n_top (int, default=9) – Number of top PCs to return
ranking_method (str, default='triple') – Method for ranking: ‘X’, ‘AI’, ‘HU’, or ‘triple’

Returns:

Indices of top PCs (0-based)

Return type:

List[int]
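
One plausible reading of the ranking methods is sketched below. It assumes PCs are ranked by absolute coefficient magnitude for a single outcome, and that 'triple' draws in turn from the 'X', 'AI', and 'HU' rankings; both are assumptions, since the criterion actually used is not described here. The function `top_pcs` and the `coefs` dictionary are hypothetical.

```python
import numpy as np


def top_pcs(coefs, n_top=9, ranking_method="triple"):
    """Return 0-based PC indices (hypothetical re-implementation).

    coefs maps 'X', 'AI', 'HU' to 1-D arrays with one coefficient per PC.
    """
    def ranked(outcome):
        # Largest absolute coefficient first.
        return [int(i) for i in np.argsort(-np.abs(coefs[outcome]))]

    if ranking_method in coefs:
        return ranked(ranking_method)[:n_top]
    if ranking_method != "triple":
        raise ValueError(f"unknown ranking_method: {ranking_method!r}")

    # 'triple': take the best remaining PC from each outcome in turn.
    orders = [ranked(name) for name in ("X", "AI", "HU")]
    chosen = []
    for rank in range(len(orders[0])):
        for order in orders:
            pc = order[rank]
            if pc not in chosen:
                chosen.append(pc)
            if len(chosen) == n_top:
                return chosen
    return chosen


coefs = {
    "X": np.array([0.1, -0.9, 0.3, 0.0]),
    "AI": np.array([0.8, 0.2, -0.1, 0.05]),
    "HU": np.array([0.0, 0.1, 0.2, -0.7]),
}
print(top_pcs(coefs, n_top=2, ranking_method="X"))       # [1, 2]
print(top_pcs(coefs, n_top=3, ranking_method="triple"))  # [1, 0, 3]
```

With the default `n_top=9`, this round-robin reading yields roughly three PCs per outcome when the rankings do not overlap.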

export_results(output_dir: str | None = None, prefix: str = 'haam_results') → Dict[str, str][source]

Export results to CSV files.

Parameters:

output_dir (str, optional) – Output directory. If None, uses current directory
prefix (str, default='haam_results') – Prefix for output files

Returns:

Dictionary of output file paths

Return type:

Dict[str, str]

display_mediation_results()[source]: Display mediation analysis results with visualization in Colab.

display_global_statistics()[source]: Display comprehensive global statistics in organized sections.

display_coefficient_tables()[source]: Display comprehensive LASSO and post-LASSO model outputs.

export_global_statistics(output_dir: str | None = None)[source]: Export comprehensive global statistics to CSV files.

export_coefficients_with_inference(output_dir: str | None = None)[source]: Export both LASSO and post-LASSO coefficients with statistical inference.

display_all_results()[source]: Display all HAAM results including coefficients and statistics in Colab.