haam.haam_package module

Core Analysis Module

This module implements the main HAAMAnalysis class, which performs the Double Machine Learning Lens Model Equation (DML-LME) analysis. It is the primary interface for decomposing human and AI judgment accuracy into direct and mediated components.

Key Concepts:

  • Direct Effects: Accuracy not explained by measured perceptual cues (unmeasured pathways)

  • Indirect Effects: Accuracy mediated through the high-dimensional cue space

  • PoMA (Percentage of Mediated Accuracy): Proportion of accuracy flowing through measured cues
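As a worked illustration of PoMA, assume it is computed as the indirect (mediated) share of the total effect, with indirect = total − direct. This decomposition is an assumption for the sketch, not taken from the package source, and poma is a hypothetical helper rather than part of the haam API.

```python
def poma(total_effect: float, direct_effect: float) -> float:
    """Percentage of Mediated Accuracy, assuming indirect = total - direct.

    Hypothetical helper for illustration only; not part of the haam API.
    """
    if total_effect == 0:
        raise ValueError("PoMA is undefined when the total effect is zero")
    indirect_effect = total_effect - direct_effect
    return 100.0 * indirect_effect / total_effect


# A total effect of 0.40, of which 0.10 remains direct (unmediated)
print(poma(0.40, 0.10))  # approximately 75.0
```

Under this reading, a PoMA near 100 means almost all accuracy flows through the measured cues, and a PoMA near 0 means it flows through unmeasured pathways.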

The analysis follows a four-stage process:

  1. Feature Extraction: Convert raw inputs (text, embeddings) into principal components

  2. Nuisance Estimation: Use ML to estimate conditional expectations with cross-fitting

  3. Orthogonalization: Remove regularization bias via double ML

  4. Inference: Bootstrap confidence intervals for all estimates
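Stages 2 and 3 can be illustrated with the textbook cross-fitted partialling-out estimator for a partially linear model, judgment = θ·criterion + g(cues) + ε, where θ plays the role of the direct effect. This is a hedged sketch, not the package's implementation: the nuisance learner here is closed-form ridge regression in NumPy (the package may use lasso or another learner), inference uses an analytic standard error instead of the bootstrap in stage 4, and the function names are hypothetical.

```python
import numpy as np


def ridge_fit_predict(X_train, y_train, X_test, lam=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - x_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X_train.shape[1]),
                           Xc.T @ (y_train - y_mean))
    return y_mean + (X_test - x_mean) @ beta


def dml_partialling_out(y, d, X, n_folds=5, lam=1.0, seed=42):
    """Cross-fitted DML estimate of theta in y = theta * d + g(X) + e."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    y_res, d_res = np.empty(n), np.empty(n)
    for test_idx in folds:
        # Cross-fitting: nuisance models never see the fold they predict
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        y_res[test_idx] = y[test_idx] - ridge_fit_predict(
            X[train_idx], y[train_idx], X[test_idx], lam)
        d_res[test_idx] = d[test_idx] - ridge_fit_predict(
            X[train_idx], d[train_idx], X[test_idx], lam)
    # Orthogonalized moment: regress outcome residuals on criterion residuals
    theta = np.sum(d_res * y_res) / np.sum(d_res ** 2)
    # Heteroskedasticity-robust standard error from the score function
    psi = (y_res - theta * d_res) * d_res
    se = np.sqrt(np.mean(psi ** 2)) / (np.mean(d_res ** 2) * np.sqrt(n))
    return theta, se


rng = np.random.default_rng(0)
n, p = 2000, 20
X = rng.normal(size=(n, p))                            # stand-in for PC cues
d = X[:, 0] + rng.normal(size=n)                       # criterion, tied to cues
y = 0.5 * d + X[:, 0] - X[:, 1] + rng.normal(size=n)   # judgment
theta, se = dml_partialling_out(y, d, X)
print(f"theta = {theta:.3f} (SE {se:.3f})")
```

Residualizing both the judgment and the criterion on the cues is what makes the estimate of θ insensitive, to first order, to errors in the nuisance fits; that is the sense in which orthogonalization removes regularization bias.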

This implementation handles the high-dimensional setting (p >> n) that breaks traditional mediation analysis.

HAAM: Human-AI Accuracy Model Analysis Package

A lightweight package for analyzing human-AI accuracy models with sample-split post-lasso regression and interactive visualizations.

Author: HAAM Development Team

License: MIT

class haam.haam_package.HAAMAnalysis(criterion: ndarray, ai_judgment: ndarray, human_judgment: ndarray, embeddings: ndarray | None = None, texts: List[str] | None = None, n_components: int = 200, random_state: int = 42, standardize: bool = False)

Bases: object

Main class for Human-AI Accuracy Model analysis.

This class performs sample-split post-lasso regression analysis and generates various visualizations for understanding the relationships between human judgments, AI judgments, and a criterion variable.

Methods

display_all_results()

Display all HAAM results including coefficients and statistics in Colab.

display_coefficient_tables()

Display comprehensive LASSO and post-LASSO model outputs.

display_global_statistics()

Display comprehensive global statistics in organized sections.

display_mediation_results()

Display mediation analysis results with visualization in Colab.

export_coefficients_with_inference([output_dir])

Export both LASSO and post-LASSO coefficients with statistical inference.

export_global_statistics([output_dir])

Export comprehensive global statistics to CSV files.

export_results([output_dir, prefix])

Export results to CSV files.

fit_debiased_lasso([use_sample_splitting, alpha])

Fit debiased lasso models for all outcomes.

generate_embeddings(texts[, model_name, ...])

Generate sentence embeddings with a sentence-transformers model (all-MiniLM-L6-v2 by default).

get_top_pcs([n_top, ranking_method])

Get top PCs based on ranking method.

__init__(criterion: ndarray, ai_judgment: ndarray, human_judgment: ndarray, embeddings: ndarray | None = None, texts: List[str] | None = None, n_components: int = 200, random_state: int = 42, standardize: bool = False)

Initialize HAAM Analysis.

Parameters:
  • criterion (np.ndarray) – Criterion variable (e.g., social class)

  • ai_judgment (np.ndarray) – AI predictions/ratings

  • human_judgment (np.ndarray) – Human ratings

  • embeddings (np.ndarray, optional) – Pre-computed embeddings. If None, embeddings are generated from texts.

  • texts (List[str], optional) – Text data used to generate embeddings when embeddings is None.

  • n_components (int, default=200) – Number of principal components to extract

  • random_state (int, default=42) – Random state for reproducibility

  • standardize (bool, default=False) – Whether to standardize X and outcome variables for both total effects and DML calculations. When True, all coefficients will be in standardized units.
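To make n_components and standardize concrete, here is a minimal NumPy sketch of the feature-extraction stage under stated assumptions: principal components are obtained by SVD of the column-centered embedding matrix, and standardization is a plain z-score. The helper names are hypothetical, and the package may differ in details such as whitening or how scores are scaled.

```python
import numpy as np


def embeddings_to_pcs(embeddings: np.ndarray, n_components: int = 200) -> np.ndarray:
    """Project embeddings onto their first n_components principal components."""
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    k = min(n_components, vt.shape[0])
    return centered @ vt[:k].T


def zscore(a: np.ndarray) -> np.ndarray:
    """Standardize to mean 0 and (population) standard deviation 1 along axis 0."""
    return (a - a.mean(axis=0)) / a.std(axis=0)


rng = np.random.default_rng(42)
emb = rng.normal(size=(500, 384))  # 384 is the all-MiniLM-L6-v2 output size
pcs = embeddings_to_pcs(emb, n_components=200)
print(pcs.shape)  # (500, 200)
criterion_std = zscore(rng.normal(loc=3.0, scale=2.0, size=500))
```

With standardize=True, coefficients on z-scored variables are read as changes in standard deviations, which is what "standardized units" refers to.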

static generate_embeddings(texts: List[str], model_name: str = 'sentence-transformers/all-MiniLM-L6-v2', batch_size: int = 32) → ndarray

Generate sentence embeddings with a sentence-transformers model (all-MiniLM-L6-v2 by default).

Parameters:
  • texts (List[str]) – List of text documents

  • model_name (str, default='sentence-transformers/all-MiniLM-L6-v2') – Name of the sentence-transformers model

  • batch_size (int, default=32) – Number of texts encoded per batch

Returns:

Embedding matrix of shape (n_samples, embedding_dim)

Return type:

np.ndarray
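The documented contract (a list of n texts in, an (n_samples, embedding_dim) array out, encoded batch_size texts at a time) can be sketched without downloading a model. Here toy_encode is a hypothetical stand-in for the sentence-transformers model and produces non-semantic vectors; with the real library, the encoding step would typically be SentenceTransformer(model_name).encode(texts, batch_size=batch_size).

```python
from typing import Callable, List

import numpy as np


def toy_encode(batch: List[str], dim: int = 8) -> np.ndarray:
    """Hypothetical stand-in encoder: bucketed character counts, not semantic."""
    out = np.zeros((len(batch), dim))
    for i, text in enumerate(batch):
        for ch in text:
            out[i, ord(ch) % dim] += 1.0
    return out


def embed_in_batches(texts: List[str],
                     encode: Callable[[List[str]], np.ndarray] = toy_encode,
                     batch_size: int = 32) -> np.ndarray:
    """Encode texts batch_size at a time and stack into (n_samples, embedding_dim)."""
    if not texts:
        raise ValueError("texts must be non-empty")
    chunks = [encode(texts[i:i + batch_size])
              for i in range(0, len(texts), batch_size)]
    return np.vstack(chunks)


docs = [f"document number {i}" for i in range(70)]
emb = embed_in_batches(docs, batch_size=32)  # batches of 32, 32 and 6 texts
print(emb.shape)  # (70, 8)
```

batch_size affects only memory use and throughput; the stacked result is the same for any batch size.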

fit_debiased_lasso(use_sample_splitting: bool = True, alpha: float | None = None) → Dict[str, Any]

Fit debiased lasso models for all outcomes.

Parameters:
  • use_sample_splitting (bool, default=True) – Whether to use sample splitting for valid inference

  • alpha (float, optional) – Lasso regularization parameter. If None, it is selected by cross-validation.

Returns:

Dictionary containing all results

Return type:

Dict[str, Any]
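The idea behind use_sample_splitting can be sketched as sample-split post-lasso: select cues with the lasso on one half of the data, then refit ordinary least squares on the selected cues using the other half, so the standard errors are not distorted by the selection step. This is a minimal NumPy illustration under assumptions, not the package's code: the lasso is a plain coordinate-descent solver with a fixed alpha (the cross-validated choice used when alpha is None is omitted), the standard errors are classical OLS ones, and the helper names are hypothetical.

```python
import numpy as np


def lasso_cd(X, y, alpha, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + alpha*||b||_1 on centered data."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()  # current residual y - X @ b (b starts at zero)
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]  # remove feature j's contribution
            rho = X[:, j] @ r / n
            # Soft-thresholding update for coordinate j
            b[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]
    return b


def split_post_lasso(X, y, alpha=0.2, seed=42):
    """Select on half A with the lasso; refit OLS with classical SEs on half B."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    idx_a, idx_b = idx[: n // 2], idx[n // 2:]
    Xa = X[idx_a] - X[idx_a].mean(axis=0)
    ya = y[idx_a] - y[idx_a].mean()
    selected = np.flatnonzero(lasso_cd(Xa, ya, alpha))
    # OLS with an intercept on the held-out half, selected columns only
    Z = np.column_stack([np.ones(len(idx_b)), X[idx_b][:, selected]])
    coef, *_ = np.linalg.lstsq(Z, y[idx_b], rcond=None)
    resid = y[idx_b] - Z @ coef
    sigma2 = resid @ resid / (len(idx_b) - Z.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Z.T @ Z)))
    return selected, coef[1:], se[1:]


rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 50))
y = 2.0 * X[:, 3] - 1.5 * X[:, 7] + rng.normal(size=1000)
selected, coef, se = split_post_lasso(X, y, alpha=0.2)
print(selected, np.round(coef, 2))
```

Because half B is never used for selection, the refitted coefficients and their standard errors behave like ordinary OLS quantities for the chosen cues; the price is that each step uses only half of the sample.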

get_top_pcs(n_top: int = 9, ranking_method: str = 'triple') → List[int]

Get top PCs based on ranking method.

Parameters:
  • n_top (int, default=9) – Number of top PCs to return

  • ranking_method (str, default='triple') – Method for ranking: ‘X’, ‘AI’, ‘HU’, or ‘triple’

Returns:

Indices of top PCs (0-based)

Return type:

List[int]
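How the ranking methods score PCs is not specified in this reference. As a hedged sketch only, assume each PC has one coefficient per outcome ('X', presumably the criterion, plus 'AI' and 'HU'), single-outcome methods rank by absolute coefficient, and 'triple' ranks by the sum of absolute coefficients across the three outcomes. The 'triple' rule in particular is an assumption for illustration, and top_pcs is a hypothetical name.

```python
from typing import Dict, List

import numpy as np


def top_pcs(coefs: Dict[str, np.ndarray], n_top: int = 9,
            ranking_method: str = "triple") -> List[int]:
    """Return 0-based PC indices, highest-ranked first (hypothetical rule)."""
    if ranking_method == "triple":
        # Assumed rule: total absolute coefficient across the three outcomes
        score = sum(np.abs(coefs[k]) for k in ("X", "AI", "HU"))
    elif ranking_method in ("X", "AI", "HU"):
        score = np.abs(coefs[ranking_method])
    else:
        raise ValueError(f"unknown ranking_method: {ranking_method!r}")
    # Stable sort on the negated score keeps lower indices first on ties
    order = np.argsort(-score, kind="stable")
    return [int(i) for i in order[:n_top]]


coefs = {
    "X": np.array([0.1, 0.0, 0.7, 0.2]),
    "AI": np.array([0.3, 0.0, 0.1, -0.2]),
    "HU": np.array([0.2, 0.4, 0.0, 0.1]),
}
print(top_pcs(coefs, n_top=2))                       # [2, 0]
print(top_pcs(coefs, n_top=2, ranking_method="AI"))  # [0, 3]
```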

export_results(output_dir: str | None = None, prefix: str = 'haam_results') → Dict[str, str]

Export results to CSV files.

Parameters:
  • output_dir (str, optional) – Output directory. If None, the current working directory is used.

  • prefix (str, default='haam_results') – Prefix for output files

Returns:

Dictionary of output file paths

Return type:

Dict[str, str]
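A minimal standard-library sketch of the export contract described above: write one CSV per result table into output_dir (the current working directory when None), name each file with prefix, and return a dict mapping table names to paths. The table names and the <prefix>_<name>.csv naming scheme are hypothetical; the files the package actually writes are not listed in this reference.

```python
import csv
import os
import tempfile
from typing import Dict, List, Optional


def export_tables(tables: Dict[str, List[dict]], output_dir: Optional[str] = None,
                  prefix: str = "haam_results") -> Dict[str, str]:
    """Write each non-empty table to <output_dir>/<prefix>_<name>.csv; return paths."""
    output_dir = output_dir or os.getcwd()
    os.makedirs(output_dir, exist_ok=True)
    paths: Dict[str, str] = {}
    for name, rows in tables.items():
        path = os.path.join(output_dir, f"{prefix}_{name}.csv")
        with open(path, "w", newline="") as fh:
            # Column names are taken from the first row's keys
            writer = csv.DictWriter(fh, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)
        paths[name] = path
    return paths


out_dir = tempfile.mkdtemp()
paths = export_tables(
    {"coefficients": [{"pc": 1, "coef": 0.42}], "summary": [{"r2": 0.3}]},
    output_dir=out_dir,
)
print(paths["coefficients"])
```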

display_mediation_results()

Display mediation analysis results with visualization in Colab.

display_global_statistics()

Display comprehensive global statistics in organized sections.

display_coefficient_tables()

Display comprehensive LASSO and post-LASSO model outputs.

export_global_statistics(output_dir: str | None = None)

Export comprehensive global statistics to CSV files.

export_coefficients_with_inference(output_dir: str | None = None)

Export both LASSO and post-LASSO coefficients with statistical inference.

display_all_results()

Display all HAAM results including coefficients and statistics in Colab.