haam.haam_package module
Core Analysis Module
This module implements the main HAAMAnalysis class, which performs the Double Machine Learning Lens Model Equation (DML-LME) analysis. It is the primary interface for decomposing human and AI judgment accuracy into direct and mediated components.
Key Concepts:
Direct Effects: Accuracy not explained by measured perceptual cues (unmeasured pathways)
Indirect Effects: Accuracy mediated through the high-dimensional cue space
PoMA (Percentage of Mediated Accuracy): Proportion of accuracy flowing through measured cues
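As a worked illustration of how these three quantities relate, the sketch below assumes that total accuracy splits additively into a direct and an indirect part, and that PoMA is the indirect share of that total. The function name `poma` and the input numbers are hypothetical; the package's own estimator may define or compute this quantity differently.

```python
def poma(direct_effect: float, indirect_effect: float) -> float:
    """Share of total accuracy that runs through the measured cues.

    Assumes total = direct + indirect (an illustrative assumption,
    not taken from the package source).
    """
    total = direct_effect + indirect_effect
    if total == 0:
        raise ValueError("total effect is zero; PoMA is undefined")
    return indirect_effect / total


# Hypothetical numbers: 0.10 direct, 0.30 mediated through the cues.
share = poma(direct_effect=0.10, indirect_effect=0.30)
print(f"PoMA = {share:.0%}")
```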
The analysis follows a four-stage process:
1. Feature Extraction: Convert raw inputs (text, embeddings) into principal components
2. Nuisance Estimation: Use ML to estimate conditional expectations with cross-fitting
3. Orthogonalization: Remove regularization bias via double ML
4. Inference: Bootstrap confidence intervals for all estimates
This implementation handles the high-dimensional setting (p >> n) that breaks traditional mediation analysis.
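The nuisance estimation, orthogonalization, and inference stages can be sketched on a toy problem. The example below is a generic illustration of cross-fitted partialling-out with a percentile bootstrap, not the code used by HAAMAnalysis: it swaps the ML learners for a one-variable least-squares fit, uses a single control `x` instead of a high-dimensional cue space, and bootstraps the residual pairs while holding the nuisance fits fixed. All function names and the synthetic data are assumptions made for this example.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def cross_fit_residuals(x, target, n_folds=5):
    """Residualize `target` on `x`, predicting each fold from the other folds."""
    n = len(x)
    resid = [0.0] * n
    for k in range(n_folds):
        test = [i for i in range(n) if i % n_folds == k]
        train = [i for i in range(n) if i % n_folds != k]
        a, b = fit_line([x[i] for i in train], [target[i] for i in train])
        for i in test:
            resid[i] = target[i] - (a + b * x[i])
    return resid


def theta_from_residuals(d_res, y_res):
    """Residual-on-residual slope (the orthogonalized estimate)."""
    return sum(d * y for d, y in zip(d_res, y_res)) / sum(d * d for d in d_res)


# Synthetic data: d = x + noise, y = 0.5*d + 2*x + noise (true effect 0.5).
rng = random.Random(0)
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]
d = [xi + rng.gauss(0, 1) for xi in x]
y = [0.5 * di + 2.0 * xi + rng.gauss(0, 1) for di, xi in zip(d, x)]

# Nuisance estimation with cross-fitting, then the orthogonalized estimate.
d_res = cross_fit_residuals(x, d)
y_res = cross_fit_residuals(x, y)
theta_hat = theta_from_residuals(d_res, y_res)

# Percentile bootstrap over residual pairs (nuisance fits held fixed).
n_boot = 200
boot = []
for _ in range(n_boot):
    idx = [rng.randrange(n) for _ in range(n)]
    boot.append(theta_from_residuals([d_res[i] for i in idx],
                                     [y_res[i] for i in idx]))
boot.sort()
ci_low = boot[round(0.025 * (n_boot - 1))]
ci_high = boot[round(0.975 * (n_boot - 1))]
print(f"theta_hat={theta_hat:.3f}, 95% CI=({ci_low:.3f}, {ci_high:.3f})")
```

Because each observation's nuisance prediction comes from a model that never saw that observation, overfitting in the nuisance step does not leak into the final slope. A full implementation would typically refit the nuisance models inside each bootstrap replicate, which this sketch skips for brevity.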
HAAM: Human-AI Accuracy Model Analysis Package
A lightweight package for analyzing human-AI accuracy models with sample-split post-lasso regression and interactive visualizations.
Author: HAAM Development Team
License: MIT
- class haam.haam_package.HAAMAnalysis(criterion: ndarray, ai_judgment: ndarray, human_judgment: ndarray, embeddings: ndarray | None = None, texts: List[str] | None = None, n_components: int = 200, random_state: int = 42, standardize: bool = False)
Bases: object
Main class for Human-AI Accuracy Model analysis.
This class performs sample-split post-lasso regression analysis and generates various visualizations for understanding the relationships between human judgments, AI judgments, and a criterion variable.
Methods
Display all HAAM results including coefficients and statistics in Colab.
Display comprehensive LASSO and post-LASSO model outputs.
Display comprehensive global statistics in organized sections.
display_mediation_results() – Display mediation analysis results with visualization in Colab.
export_coefficients_with_inference([output_dir]) – Export both LASSO and post-LASSO coefficients with statistical inference.
export_global_statistics([output_dir]) – Export comprehensive global statistics to CSV files.
export_results([output_dir, prefix]) – Export results to CSV files.
fit_debiased_lasso([use_sample_splitting, alpha]) – Fit debiased lasso models for all outcomes.
generate_embeddings(texts[, model_name, ...]) – Generate embeddings using MiniLM model.
get_top_pcs([n_top, ranking_method]) – Get top PCs based on ranking method.
- __init__(criterion: ndarray, ai_judgment: ndarray, human_judgment: ndarray, embeddings: ndarray | None = None, texts: List[str] | None = None, n_components: int = 200, random_state: int = 42, standardize: bool = False)
Initialize HAAM Analysis.
- Parameters:
criterion (np.ndarray) – Criterion variable (e.g., social class)
ai_judgment (np.ndarray) – AI predictions/ratings
human_judgment (np.ndarray) – Human ratings
embeddings (np.ndarray, optional) – Pre-computed embeddings. If None, embeddings are generated from texts
texts (List[str], optional) – Text data used to generate embeddings when embeddings is None
n_components (int, default=200) – Number of PCA components to extract
random_state (int, default=42) – Random state for reproducibility
standardize (bool, default=False) – Whether to standardize X and outcome variables for both total effects and DML calculations. When True, all coefficients will be in standardized units.
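A minimal construction sketch follows. It assumes, based on the parameter descriptions above, that the three outcome arrays must have the same length and that at least one of embeddings or texts must be supplied; the package may enforce different or additional rules. The helpers `check_haam_inputs` and `build_analysis` are hypothetical wrappers written for this example and are not part of the package. Only the HAAMAnalysis keyword arguments are taken from the documented signature.

```python
from typing import Optional, Sequence


def check_haam_inputs(
    criterion: Sequence[float],
    ai_judgment: Sequence[float],
    human_judgment: Sequence[float],
    embeddings: Optional[Sequence] = None,
    texts: Optional[Sequence[str]] = None,
) -> int:
    """Hypothetical pre-flight check; returns the common sample size."""
    n = len(criterion)
    if len(ai_judgment) != n or len(human_judgment) != n:
        raise ValueError(
            "criterion, ai_judgment and human_judgment must have equal length"
        )
    if embeddings is None and texts is None:
        raise ValueError("provide either embeddings or texts")
    source = embeddings if embeddings is not None else texts
    if len(source) != n:
        raise ValueError("embeddings/texts must have one row per observation")
    return n


def build_analysis(criterion, ai_judgment, human_judgment,
                   embeddings=None, texts=None):
    """Validate inputs, then construct HAAMAnalysis with the documented defaults.

    The outcome arguments are expected to be NumPy arrays, as in the signature.
    """
    check_haam_inputs(criterion, ai_judgment, human_judgment, embeddings, texts)
    # Imported lazily so the validation helper works without the package installed.
    from haam.haam_package import HAAMAnalysis

    return HAAMAnalysis(
        criterion=criterion,
        ai_judgment=ai_judgment,
        human_judgment=human_judgment,
        embeddings=embeddings,
        texts=texts,
        n_components=200,
        random_state=42,
        standardize=False,
    )
```

Passing `standardize=True` instead would put all coefficients in standardized units, as described for that parameter.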
- static generate_embeddings(texts: List[str], model_name: str = 'sentence-transformers/all-MiniLM-L6-v2', batch_size: int = 32) → ndarray
Generate embeddings with a sentence-transformers model (MiniLM by default).
- fit_debiased_lasso(use_sample_splitting: bool = True, alpha: float | None = None) → Dict[str, Any]
Fit debiased lasso models for all outcomes.
- get_top_pcs(n_top: int = 9, ranking_method: str = 'triple') → List[int]
Get top PCs based on ranking method.
- export_results(output_dir: str | None = None, prefix: str = 'haam_results') → Dict[str, str]
Export results to CSV files.
- display_mediation_results()
Display mediation analysis results with visualization in Colab.
- export_global_statistics(output_dir: str | None = None)
Export comprehensive global statistics to CSV files.
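The methods above might be chained as in the sketch below. It assumes that fitting has to happen before principal components are ranked or results exported, which this reference does not state explicitly. The wrapper `run_haam_workflow` is hypothetical; the method names, keyword arguments, and defaults it uses are the ones documented above, and `analysis` is expected to be an already constructed HAAMAnalysis instance.

```python
from typing import Any, Dict, List, Optional, Tuple


def run_haam_workflow(
    analysis: Any, output_dir: Optional[str] = None
) -> Tuple[List[int], Dict[str, str]]:
    """Fit, rank principal components, and export, in that (assumed) order."""
    # Fit the debiased lasso models for all outcomes.
    analysis.fit_debiased_lasso(use_sample_splitting=True, alpha=None)
    # Indices of the top principal components under the default ranking method.
    top_pcs = analysis.get_top_pcs(n_top=9, ranking_method="triple")
    # Write CSV files; the returned dict is passed through unchanged.
    exported = analysis.export_results(output_dir=output_dir, prefix="haam_results")
    return top_pcs, exported
```

Taking the analysis object as an argument keeps the wrapper independent of how the instance was built, whether from pre-computed embeddings or from raw texts.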