Specification Explorer — Milestone 2b-core Design

Type System Changes
Recognizer Patterns
2SLS Computation
Group Detection — Estimator Axis
Conditional Params in the Property Sheet
Spec Curve — Estimator Encoding
Results Display

Problem

Statipy currently only recognizes lm() (OLS) and has no support for instrumental variables estimation. In applied economics, IV/2SLS is the primary tool for addressing endogeneity — researchers routinely compare OLS and IV estimates side by side to assess whether endogeneity biases their treatment effect. Three major R packages handle this: AER (ivreg), fixest (feols), and lfe (felm). Without recognition of these functions, Statipy can’t parse the majority of applied econ scripts.

Additionally, the specification explorer has no concept of an “estimator” axis — there’s no way to compare OLS vs IV as a dimension of specification choice, even though “does instrumenting change the result?” is a core robustness question.

Target User

Same as prior milestones: experienced researchers fluent in R, using Statipy for visual model comparison. They paste scripts containing both lm() and ivreg()/feols()/felm() calls, typically with a stargazer() or modelsummary() comparing all columns.

Scope

2b-viz (done): Specification curve + focus coefficient selection
2b-core (this spec): IV/2SLS, estimator axis, fixest/AER/lfe recognition
2b-breadth (future): ANOVA, GLM, correlation, chi-square primitives

Changes span src/core/ (parser, pipeline, stats) and src/ui/ (spec curve, comparison table, property sheet).

Deferred

deferred Fixed effects computation — FE are parsed and displayed but the within-transformation is not computed. Execution uses pooled OLS and shows a warning.
deferred GMM/LIML estimators — Future estimator values. 2SLS via QR is sufficient for now.
deferred Robust/clustered standard errors — Requires GMM or sandwich estimator.

1. Type System Changes

LinearModelParams

interface LinearModelParams {
  formula: Formula;
  data: string;
  estimator: 'ols' | '2sls';       // default 'ols' for existing lm() nodes
  endogenous?: string[];             // variables being instrumented
  instruments?: string[];            // excluded instruments
  fixedEffects?: string[];           // parsed from feols/felm, not yet computed
}

Existing lm() nodes get estimator: 'ols'. All current code paths are unaffected — the new fields are additive.

IVDiagnostics

interface IVDiagnostics {
  firstStageF: number;
  firstStageFP: number;
  wuHausman: number;
  wuHausmanP: number;
  sarganStatistic?: number;          // only when overidentified
  sarganP?: number;
}

RegressionResult Extension

interface RegressionResult {
  // ... all existing fields unchanged ...
  ivDiagnostics?: IVDiagnostics;     // present only when estimator is '2sls'
}

The comparison table and spec curve read coefficients, rSquared, dfModel, dfResidual from RegressionResult. Note: RegressionResult has no nObs field; N is computed as dfModel + dfResidual + 1. This holds for both OLS and 2SLS, provided the model includes an intercept and dfModel counts the non-intercept terms. IV diagnostics are additive, so existing consumers don’t need changes to keep working.

ParamDef Extension

interface ParamDef {
  // ... existing fields (key, label, kind: ParamKind) ...
  showWhen?: Record<string, string | string[]>;
  options?: Array<{ value: string; label: string }>;  // for 'select' kind
  disabled?: boolean;
  disabledReason?: string;
}

The existing ParamKind union ('formula-outcome' | 'formula-terms' | 'identifier' | 'expression' | 'boolean' | 'string') needs one addition: 'select' for the estimator dropdown. The descriptive kind names used in this spec map onto the existing kinds as follows: 'formula' → 'formula-outcome' + 'formula-terms', 'data-ref' → 'identifier', 'term-set' → 'formula-terms'.

The property sheet checks showWhen against the current node’s params. If showWhen is set and the condition isn’t met, the param row is hidden. This is a general-purpose extension usable by future param types.

2. Recognizer Patterns

Three new function patterns, all producing AnalysisCall { kind: 'linear-model' } with estimator-specific args. The mapper and pipeline are unaware of which package was used.

`ivreg()` (AER package)

ivreg(y ~ x1 + endog | x1 + inst1, data=d)

Match function name ivreg
Formula string contains | separator
Part 1 (before |): full second-stage formula → parse with formula parser → Formula
Part 2 (after |): all instruments + exogenous regressors
Endogenous = variables in part 1 RHS but not in part 2
Instruments = variables in part 2 but not in part 1 RHS
Emit: { estimator: '2sls', endogenous, instruments }
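The set-difference rule above can be sketched as follows. This is a minimal illustration, not the recognizer itself: parseIvregFormula and rhsTerms are hypothetical names, and terms are assumed to be plain +-separated identifiers (no interactions or function calls).

```typescript
// Hypothetical helper: split "x1 + endog" into term names.
// Assumes plain "+"-separated identifiers only.
function rhsTerms(rhs: string): string[] {
  return rhs.split('+').map(t => t.trim()).filter(t => t.length > 0);
}

interface IvregParts {
  outcome: string;
  regressors: string[];  // part 1 RHS (second-stage regressors)
  endogenous: string[];  // in part 1 RHS but not in part 2
  instruments: string[]; // in part 2 but not in part 1 RHS
}

function parseIvregFormula(raw: string): IvregParts {
  const parts = raw.split('|');
  if (parts.length !== 2) throw new Error('ivreg formula needs exactly one "|"');
  const stage2 = parts[0];
  const tilde = stage2.indexOf('~');
  if (tilde < 0) throw new Error('missing "~" in second-stage formula');
  const regressors = rhsTerms(stage2.slice(tilde + 1));
  const part2 = rhsTerms(parts[1]);
  return {
    outcome: stage2.slice(0, tilde).trim(),
    regressors,
    endogenous: regressors.filter(t => part2.indexOf(t) < 0),
    instruments: part2.filter(t => regressors.indexOf(t) < 0),
  };
}
```

For the example call above, this yields endogenous = ['endog'] and instruments = ['inst1'], while x1 appears in both parts and is treated as exogenous.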

`feols()` (fixest package)

feols(y ~ x1 | fe1 + fe2 | endog ~ inst1 + inst2, data=d)

Match function names: feols, fepois, feglm, fenegbin (recognize all, but only feols is executable — others emit with a warning)
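Split formula on | → 1 to 3 parts
1 part: plain formula → { estimator: 'ols' }
2 parts: exogenous | FE → { estimator: 'ols', fixedEffects }. If the second part contains ~, it is an IV part with no FE (fixest accepts y ~ x1 | endog ~ inst1) → { estimator: '2sls', endogenous, instruments }
3 parts: exogenous | FE | IV → { estimator: '2sls', fixedEffects, endogenous, instruments }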
FE part: +-separated names → fixedEffects: string[]
IV part: endog ~ instruments → split on ~
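A possible shape for the feols split, as a sketch under stated assumptions: parseFeolsFormula and names are hypothetical, terms are plain +-separated identifiers, and a part is classified as the IV part whenever it contains ~ (which also covers fixest's two-part IV form without fixed effects).

```typescript
interface FeolsParts {
  estimator: 'ols' | '2sls';
  mainFormula: string;    // e.g. "y ~ x1", handed to the formula parser
  fixedEffects: string[];
  endogenous: string[];
  instruments: string[];
}

// Hypothetical helper: "+"-separated names to a string array.
function names(list: string): string[] {
  return list.split('+').map(s => s.trim()).filter(s => s.length > 0);
}

function parseFeolsFormula(raw: string): FeolsParts {
  const parts = raw.split('|').map(p => p.trim());
  if (parts.length > 3) throw new Error('feols formula has at most 3 parts');
  const result: FeolsParts = {
    estimator: 'ols', mainFormula: parts[0],
    fixedEffects: [], endogenous: [], instruments: [],
  };
  for (const part of parts.slice(1)) {
    const tilde = part.indexOf('~');
    if (tilde >= 0) {
      // IV part: endog ~ instruments
      result.estimator = '2sls';
      result.endogenous = names(part.slice(0, tilde));
      result.instruments = names(part.slice(tilde + 1));
    } else {
      // FE part: "+"-separated names
      result.fixedEffects = names(part);
    }
  }
  return result;
}
```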

`felm()` (lfe package)

felm(y ~ x1 | fe1 | (endog ~ inst1) | cluster, data=d)

Match function name felm
Split formula on | → up to 4 parts
Part 1: exogenous regressors → formula parser
Part 2: FE (0 means none) → fixedEffects: string[]
Part 3: IV specification (may be parenthesized) → endogenous, instruments
Part 4: cluster variable → stored in args but not used in computation
If part 3 is absent or 0, emit estimator: 'ols' (part 4 only names the cluster variable and does not affect the estimator)
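The felm convention could be handled along these lines. The sketch assumes that the estimator is decided by the IV part alone, that 0 or an empty string means "none", and that the IV part contains no nested |. parseFelmFormula and its field names are illustrative only.

```typescript
interface FelmParts {
  estimator: 'ols' | '2sls';
  mainFormula: string;
  fixedEffects: string[];
  endogenous: string[];
  instruments: string[];
  cluster: string[];      // stored, not used in computation
}

function plusList(s: string): string[] {
  return s.split('+').map(t => t.trim()).filter(t => t.length > 0);
}

function parseFelmFormula(raw: string): FelmParts {
  const parts = raw.split('|').map(p => p.trim());
  if (parts.length > 4) throw new Error('felm formula has at most 4 parts');
  // Returns the part text, or null when the part is absent, empty, or "0".
  const present = (i: number): string | null =>
    i < parts.length && parts[i] !== '' && parts[i] !== '0' ? parts[i] : null;

  const fe = present(1);
  const iv = present(2);
  const cl = present(3);
  const out: FelmParts = {
    estimator: 'ols', mainFormula: parts[0],
    fixedEffects: fe ? plusList(fe) : [],
    endogenous: [], instruments: [],
    cluster: cl ? plusList(cl) : [],
  };
  if (iv) {
    // Strip one pair of enclosing parentheses, if present.
    const inner = iv.replace(/^\(/, '').replace(/\)$/, '');
    const tilde = inner.indexOf('~');
    if (tilde < 0) throw new Error('IV part must look like "endog ~ instruments"');
    out.estimator = '2sls';
    out.endogenous = plusList(inner.slice(0, tilde));
    out.instruments = plusList(inner.slice(tilde + 1));
  }
  return out;
}
```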

Formula Parsing Strategy

The multi-part | splitting happens at the recognizer level using raw source text extraction, not AST-level parsing. The R lexer has no single | token (only |> pipe and || logical-or), so multi-part formulas cannot be parsed by the existing grammar.

Each recognizer function:

Extracts the raw source text of the formula argument using sourceSpan byte offsets on the AST node
Splits the raw string on | according to the package’s convention
Feeds each part to the formula parser’s existing y ~ terms machinery (or a new RHS-only helper for instrument/FE parts)
Computes endogenous/instruments by set difference

This approach avoids lexer/parser changes entirely — the | inside a formula argument is handled as raw text by the recognizer, which is the only code that knows the package-specific multi-part conventions.

The formula parser gets one small addition: a function to parse an RHS-only expression (a +-separated term list without the y ~ prefix), reused by all three recognizer patterns for the FE and instrument parts.
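As an illustration of the raw-text approach, the two helpers below split on single | characters at parenthesis depth zero and parse an RHS-only term list. Both names (splitOnTopLevelBar, parseRhsOnly) are hypothetical, and tracking parenthesis depth is an assumption added here so that a | or + nested inside a call or a parenthesized part is left intact.

```typescript
// Split on single "|" at parenthesis depth 0, skipping "||" and "|>".
function splitOnTopLevelBar(src: string): string[] {
  const parts: string[] = [];
  let depth = 0;
  let start = 0;
  for (let i = 0; i < src.length; i++) {
    const ch = src.charAt(i);
    if (ch === '(') depth++;
    else if (ch === ')') depth--;
    else if (ch === '|' && depth === 0) {
      const next = src.charAt(i + 1);
      if (next === '|' || next === '>') { i++; continue; } // not a separator
      parts.push(src.slice(start, i).trim());
      start = i + 1;
    }
  }
  parts.push(src.slice(start).trim());
  return parts;
}

// Parse an RHS-only term list ("fe1 + fe2") without a "y ~" prefix.
function parseRhsOnly(src: string): string[] {
  const terms: string[] = [];
  let depth = 0;
  let start = 0;
  for (let i = 0; i <= src.length; i++) {
    const ch = i < src.length ? src.charAt(i) : '+'; // sentinel flushes the last term
    if (ch === '(') depth++;
    else if (ch === ')') depth--;
    else if (ch === '+' && depth === 0) {
      const term = src.slice(start, i).trim();
      if (term.length > 0) terms.push(term);
      start = i + 1;
    }
  }
  return terms;
}
```

Each recognizer would then interpret the returned parts according to its own package convention, as described in the pattern lists above.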

3. 2SLS Computation

Algorithm

Standard two-stage least squares, reusing the existing QR-based OLS machinery in computeRegression.

Stage 1 — For each endogenous variable, regress it on all exogenous regressors + excluded instruments:

Build design matrix: [exogenous, instruments]
Run OLS (QR) → get fitted values for the endogenous variable
First-stage F-statistic: joint significance of excluded instruments in this regression

Stage 2 — Replace endogenous variables with their stage-1 fitted values in the original formula:

Build design matrix: [exogenous, fitted_endogenous]
Run OLS (QR) → coefficients are consistent 2SLS estimates

SE correction: the residuals from the stage-2 regression are computed with the fitted endogenous values, so the standard errors that regression reports are wrong. Correct procedure:

1. Let X = original design matrix (exogenous + original endogenous), Z = instrument matrix (exogenous + excluded instruments)
2. Compute correct residuals: e = y - X × β_2sls (using original endogenous, not fitted)
3. σ² = e′e / (n - k) where k = number of regressors
4. Compute projection: P_z = Z × (Z′Z)⁻¹ × Z′. Use QR on Z, then Q × Q′ gives P_z
5. Compute projected X: X̂ = P_z × X
6. Var(β) = σ² × (X̂′ × X̂)⁻¹. Use QR on X̂ to get the inverse

Steps 4–6 compose from existing QR/multiply operations. No new linear algebra primitives needed, but the implementation must construct these intermediate matrices explicitly.
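The estimator and the SE correction can be sketched end to end as below. To stay short and self-contained, this sketch swaps in normal equations with a Gauss-Jordan inverse instead of the QR machinery in computeRegression, and it computes X̂ = Z(Z′Z)⁻¹Z′X without forming the n × n matrix P_z. All function names here are hypothetical.

```typescript
type Matrix = number[][]; // row-major

// A'B for A (n x p) and B (n x q).
function transposeTimes(A: Matrix, B: Matrix): Matrix {
  const p = A[0].length, q = B[0].length;
  const out: Matrix = [];
  for (let a = 0; a < p; a++) {
    const row: number[] = [];
    for (let b = 0; b < q; b++) {
      let s = 0;
      for (let i = 0; i < A.length; i++) s += A[i][a] * B[i][b];
      row.push(s);
    }
    out.push(row);
  }
  return out;
}

function times(A: Matrix, B: Matrix): Matrix {
  return A.map(row => B[0].map((_, j) => row.reduce((s, v, k) => s + v * B[k][j], 0)));
}

// Gauss-Jordan inverse with partial pivoting (stand-in for QR).
function inverse(M: Matrix): Matrix {
  const n = M.length;
  const A = M.map((row, i) => row.concat(row.map((_, j) => (i === j ? 1 : 0))));
  for (let c = 0; c < n; c++) {
    let piv = c;
    for (let r = c + 1; r < n; r++) if (Math.abs(A[r][c]) > Math.abs(A[piv][c])) piv = r;
    if (Math.abs(A[piv][c]) < 1e-12) throw new Error('singular matrix');
    const tmp = A[c]; A[c] = A[piv]; A[piv] = tmp;
    const d = A[c][c];
    for (let j = 0; j < 2 * n; j++) A[c][j] /= d;
    for (let r = 0; r < n; r++) {
      if (r === c) continue;
      const f = A[r][c];
      for (let j = 0; j < 2 * n; j++) A[r][j] -= f * A[c][j];
    }
  }
  return A.map(row => row.slice(n));
}

// X: original design (exogenous + endogenous); Z: exogenous + excluded instruments.
function twoStageLeastSquares(X: Matrix, Z: Matrix, y: number[]) {
  const n = X.length, k = X[0].length;
  const Xhat = times(Z, times(inverse(transposeTimes(Z, Z)), transposeTimes(Z, X)));
  const XhXhInv = inverse(transposeTimes(Xhat, Xhat));
  const beta = times(XhXhInv, transposeTimes(Xhat, y.map(v => [v]))).map(r => r[0]);
  // Residuals use the ORIGINAL X, not the fitted values.
  const resid = y.map((yi, i) => yi - X[i].reduce((s, v, j) => s + v * beta[j], 0));
  const sigma2 = resid.reduce((s, e) => s + e * e, 0) / (n - k);
  const se = XhXhInv.map((row, j) => Math.sqrt(sigma2 * row[j]));
  return { beta, se, sigma2 };
}
```

In the exactly identified case with one endogenous regressor and one instrument, the slope returned by this sketch reduces to cov(z, y) / cov(z, x), which gives a quick sanity check alongside the ivreg() test vectors.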

Diagnostics

Computed after estimation using quantities already available from the two stages:

First-stage F-statistic: F-test for joint significance of excluded instruments in the stage-1 regression. Tests instrument relevance. Rule of thumb: F > 10 suggests instruments are not weak (Staiger & Stock 1997; Stock & Yogo 2005 give formal critical values).

Wu-Hausman test — Tests whether OLS and IV estimates differ significantly (i.e., whether endogeneity is actually present). Procedure: include stage-1 residuals as an additional regressor in the OLS equation; the F-statistic on that residual term is the Wu-Hausman statistic. Under H₀ (no endogeneity), OLS is efficient and consistent.

Sargan/Hansen J test: only computed when overidentified (number of excluded instruments > number of endogenous variables). Procedure: regress the 2SLS residuals (e = y - X × β_2sls, computed with the original endogenous variables, not the residuals of the stage-2 regression) on all exogenous regressors + instruments; the test statistic is n × R², distributed as χ²(q - m) under H₀, where q = number of excluded instruments and m = number of endogenous variables. Tests instrument validity (exclusion restriction). Requires a chiSquaredCDF function, implementable as a special case of the regularized incomplete gamma function using the existing lnGamma utility in distributions.ts.
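One way chiSquaredCDF could be built is shown below, using the identity CDF(x; df) = P(df/2, x/2), where P is the regularized lower incomplete gamma function. The lnGamma here is a local Lanczos approximation standing in for the existing utility in distributions.ts, whose actual signature is not shown in this spec; the series and continued-fraction split follows the standard textbook scheme.

```typescript
// Stand-in for the existing lnGamma utility (Lanczos approximation, x > 0).
function lnGamma(x: number): number {
  const cof = [76.18009172947146, -86.50532032941677, 24.01409824083091,
    -1.231739572450155, 0.1208650973866179e-2, -0.5395239384953e-5];
  let y = x;
  let tmp = x + 5.5;
  tmp -= (x + 0.5) * Math.log(tmp);
  let ser = 1.000000000190015;
  for (const c of cof) { y += 1; ser += c / y; }
  return -tmp + Math.log(2.5066282746310005 * ser / x);
}

// Regularized lower incomplete gamma P(a, x) via its power series (x < a + 1).
function gammaPSeries(a: number, x: number): number {
  let ap = a, del = 1 / a, sum = del;
  for (let n = 0; n < 500; n++) {
    ap += 1; del *= x / ap; sum += del;
    if (Math.abs(del) < Math.abs(sum) * 1e-14) break;
  }
  return sum * Math.exp(-x + a * Math.log(x) - lnGamma(a));
}

// Regularized upper incomplete gamma Q(a, x) via continued fraction (x >= a + 1).
function gammaQFraction(a: number, x: number): number {
  const tiny = 1e-300;
  let b = x + 1 - a, c = 1 / tiny, d = 1 / b, h = d;
  for (let i = 1; i <= 500; i++) {
    const an = -i * (i - a);
    b += 2;
    d = an * d + b; if (Math.abs(d) < tiny) d = tiny;
    c = b + an / c; if (Math.abs(c) < tiny) c = tiny;
    d = 1 / d;
    const del = d * c;
    h *= del;
    if (Math.abs(del - 1) < 1e-14) break;
  }
  return Math.exp(-x + a * Math.log(x) - lnGamma(a)) * h;
}

function chiSquaredCDF(x: number, df: number): number {
  if (x <= 0) return 0;
  const a = df / 2, half = x / 2;
  return half < a + 1 ? gammaPSeries(a, half) : 1 - gammaQFraction(a, half);
}

// Upper-tail p-value for the overidentification statistic n * R^2.
function sarganPValue(statistic: number, excludedInstruments: number, endogenous: number): number {
  return 1 - chiSquaredCDF(statistic, excludedInstruments - endogenous);
}
```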

Validation

Before running, the executor validates:

instruments is non-empty
endogenous is non-empty
All instrument and endogenous variable names exist in the dataset
Number of instruments ≥ number of endogenous variables (order condition)
If fixedEffects is present, emit a warning that FE are not yet computed
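The checks above might be expressed as a single function that separates hard errors from warnings. The names validateIVParams, IVParams, and ValidationReport are illustrative, and the dataset is represented simply by its list of column names.

```typescript
interface IVParams {
  estimator: 'ols' | '2sls';
  endogenous?: string[];
  instruments?: string[];
  fixedEffects?: string[];
}

interface ValidationReport { errors: string[]; warnings: string[]; }

function validateIVParams(p: IVParams, columns: string[]): ValidationReport {
  const errors: string[] = [];
  const warnings: string[] = [];
  const endog = p.endogenous || [];
  const inst = p.instruments || [];

  if (inst.length === 0) errors.push('instruments must be non-empty');
  if (endog.length === 0) errors.push('endogenous must be non-empty');
  for (const name of endog.concat(inst)) {
    if (columns.indexOf(name) < 0) errors.push('unknown column: ' + name);
  }
  // Order condition: at least as many instruments as endogenous variables.
  if (inst.length < endog.length) {
    errors.push('order condition failed: fewer instruments than endogenous variables');
  }
  // Deferred feature: warn, but do not block execution.
  if (p.fixedEffects && p.fixedEffects.length > 0) {
    warnings.push('fixed effects are declared but not yet computed');
  }
  return { errors, warnings };
}
```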

Test vectors are validated against the output of R’s ivreg() (AER package) on known datasets (e.g., Card 1995 schooling.csv, Mroz 1987 mroz.csv).

4. Group Detection — Estimator Axis

New Heuristic: `detectEstimatorVariants`

Input: All linear-model nodes in the pipeline.

Logic:

Partition nodes by (outcome, data, RHS exogenous terms, fixedEffects) — using formatTerms for canonical comparison. Including fixedEffects in the key prevents FE variants from being misclassified as estimator variants.
Within each partition, check if nodes differ by estimator (and related IV params: instruments, endogenous)
If a partition has both estimator: 'ols' and estimator: '2sls' nodes, create a group with an estimator axis

Axis values:

estimator: 'ols' → label “OLS”
estimator: '2sls' → label “2SLS” if all IV nodes share instruments, or “2SLS (nearc4)” / “2SLS (nearc2, nearc4)” if instrument sets differ between nodes
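A rough sketch of the partition-and-label logic follows. It makes three assumptions that the text does not settle: the partition key uses the full second-stage RHS (so an OLS node and its IV counterpart share a key), instrument sets are compared within a partition rather than globally, and a sorted join stands in for formatTerms. The node shape is simplified for illustration.

```typescript
interface ModelNode {
  id: string;
  outcome: string;
  data: string;
  rhsTerms: string[];      // full second-stage RHS
  fixedEffects: string[];
  estimator: 'ols' | '2sls';
  instruments: string[];
}

interface EstimatorGroup { nodeIds: string[]; axisKind: 'estimator'; labels: string[]; }

// Stand-in for formatTerms: order-insensitive canonical form.
function canonical(terms: string[]): string {
  return terms.slice().sort().join(' + ');
}

function detectEstimatorVariants(nodes: ModelNode[]): EstimatorGroup[] {
  const partitions: Record<string, ModelNode[]> = {};
  for (const n of nodes) {
    const key = JSON.stringify([n.outcome, n.data, canonical(n.rhsTerms), canonical(n.fixedEffects)]);
    (partitions[key] = partitions[key] || []).push(n);
  }
  const groups: EstimatorGroup[] = [];
  for (const key of Object.keys(partitions)) {
    const members = partitions[key];
    const iv = members.filter(m => m.estimator === '2sls');
    // A group needs both OLS and 2SLS nodes.
    if (iv.length === 0 || iv.length === members.length) continue;
    const first = canonical(iv[0].instruments);
    const shared = iv.every(m => canonical(m.instruments) === first);
    const labels = ['OLS'];
    for (const m of iv) {
      const label = shared ? '2SLS' : '2SLS (' + m.instruments.join(', ') + ')';
      if (labels.indexOf(label) < 0) labels.push(label);
    }
    groups.push({ nodeIds: members.map(m => m.id), axisKind: 'estimator', labels });
  }
  return groups;
}
```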

Ordering: Runs after the existing three heuristics (detectSpecificationVariants, detectSubsampleVariants, detectOutcomeVariants), before mergeOverlappingGroups.

Axis kind: 'estimator' — already defined in SpecAxis.kind but unused until now.

Composition: The estimator axis composes with other axes via the existing mergeOverlappingGroups union-find. A node that varies both controls and estimator gets a multi-axis group automatically.

Interaction with detectSpecificationVariants: The existing specification heuristic partitions by (outcome, data). To prevent OLS and 2SLS nodes with the same outcome/data but different RHS from being conflated into a single specification axis, detectSpecificationVariants must also partition by estimator. This ensures estimator and specification are detected as separate axes.

completeCrossProduct update: The completeCrossProduct function switches on axis.kind to generate missing cross-product cells. A new 'estimator' case must copy estimator, endogenous, instruments, and fixedEffects from the reference node for that axis value.

Example — Card (1995) script with 5 models:

Specification axis: {Base, + south + smsa} (2 values)
Estimator axis: {OLS, 2SLS (nearc4), 2SLS (nearc2, nearc4)} (3 values)
Cross-product: 6 cells, 1 disabled (2SLS (nearc2, nearc4) × Base not in original script)
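The cell count in this example can be reproduced with a small enumeration. This is only a sketch of the bookkeeping, not of completeCrossProduct itself; gridCells and the disabled flag are hypothetical names for "cell absent from the original script".

```typescript
interface GridCell { spec: string; estimator: string; disabled: boolean; }

// Enumerate spec x estimator cells and flag those absent from the script.
function gridCells(
  specs: string[],
  estimators: string[],
  present: Array<[string, string]>
): GridCell[] {
  const seen: Record<string, boolean> = {};
  for (const pair of present) seen[JSON.stringify(pair)] = true;
  const cells: GridCell[] = [];
  for (const spec of specs) {
    for (const estimator of estimators) {
      cells.push({ spec, estimator, disabled: !seen[JSON.stringify([spec, estimator])] });
    }
  }
  return cells;
}

// The five models of the Card (1995) example.
const cardCells = gridCells(
  ['Base', '+ south + smsa'],
  ['OLS', '2SLS (nearc4)', '2SLS (nearc2, nearc4)'],
  [
    ['Base', 'OLS'],
    ['Base', '2SLS (nearc4)'],
    ['+ south + smsa', 'OLS'],
    ['+ south + smsa', '2SLS (nearc4)'],
    ['+ south + smsa', '2SLS (nearc2, nearc4)'],
  ]
);
```

Here cardCells has six entries, and the only disabled one is Base × 2SLS (nearc2, nearc4).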

5. Conditional Params in the Property Sheet

Param Schema for `linear-model`

getParamSchema('linear-model') → [
  { key: 'formula', label: 'Formula', kind: 'formula' },
  { key: 'data', label: 'Data', kind: 'data-ref' },
  { key: 'estimator', label: 'Estimator', kind: 'select',
    options: [{ value: 'ols', label: 'OLS' }, { value: '2sls', label: '2SLS' }] },
  { key: 'endogenous', label: 'Endogenous', kind: 'term-set',
    showWhen: { estimator: '2sls' } },
  { key: 'instruments', label: 'Instruments', kind: 'term-set',
    showWhen: { estimator: '2sls' } },
  { key: 'fixedEffects', label: 'Fixed Effects', kind: 'term-set',
    disabled: true, disabledReason: 'FE computation not yet supported' },
]

Property Sheet Rendering

Before rendering a param row, check showWhen:

If showWhen is absent, always show
If showWhen is present, read the node’s current param value for each key and check if it matches (string match or inclusion in array)
Hidden params are not rendered at all (not greyed out — absent)
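The visibility rule could look like the following. isParamVisible is a hypothetical name; the sketch assumes every key in showWhen must match (logical AND) and that only string-valued params take part in the comparison.

```typescript
type ShowWhen = Record<string, string | string[]>;

// True when the param row should be rendered for the given node params.
function isParamVisible(
  showWhen: ShowWhen | undefined,
  params: Record<string, unknown>
): boolean {
  if (!showWhen) return true; // no condition: always show
  for (const key of Object.keys(showWhen)) {
    const expected = showWhen[key];
    const actual = params[key];
    const matches = Array.isArray(expected)
      ? typeof actual === 'string' && expected.indexOf(actual) >= 0 // inclusion in array
      : actual === expected;                                         // exact string match
    if (!matches) return false;
  }
  return true;
}
```

With the schema above, the endogenous and instruments rows would be rendered only for nodes whose estimator param is '2sls'.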

Spec Grid Behavior

In the spec grid, conditional params apply per-cell:

OLS columns: endogenous and instruments rows show “—” (non-editable)
2SLS columns: endogenous and instruments are editable
Fixed effects: shown as read-only for all estimators with “not yet supported” indicator

Vectorization

The Estimator param is vectorizable like any other axis dimension. When vectorized:

User selects estimator values (OLS, 2SLS)
Each value becomes a column (or row) in the spec grid
2SLS columns show the additional instrument/endogenous fields
Cross-product completion generates cells for all specification × estimator combinations

6. Spec Curve — Estimator Encoding

Change: Single Focus Coefficient

The spec curve shows one focus coefficient at a time, not multiple. This matches the standard spec curve format (Simonsohn et al. 2020, specr package). The focusCoefficients: Record<string, string[]> store field is simplified to focusCoefficient: Record<string, string | null> (one per group). The clickable coefficient list in the comparison table is single-select.

Downstream changes from this rename:

Store: focusCoefficients → focusCoefficient, toggleFocusCoefficient becomes setFocusCoefficient (set, not toggle)
spec-curve-data.ts: focusCoefficients: string[] parameter → focusCoefficient: string
spec-curve.tsx: auto-detect logic and data assembly call updated
spec-comparison-view.tsx: click handler changes from toggle to set
detect-key-coefficients.ts: return type changes from string[] to string (pick best single coefficient)
Store tests: update all focusCoefficients references

Estimator Visual Encoding

With the color channel freed from multi-coefficient encoding, estimator gets dual encoding — both color and shape:

OLS: blue circle (COLORS.blue, 'circle')
2SLS: amber diamond (COLORS.amber, 'diamond')

Dual encoding (color + shape) ensures the distinction is readable even in grayscale and answers the core research question — “does IV shift the estimate?” — at a glance.

Observable Plot supports the symbol channel, mapping directly to this design.
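The dual encoding can be captured in one lookup so that the top panel and the indicator grid stay in sync. The sketch below is illustrative: estimatorEncoding is a hypothetical helper, and the hex values are placeholders, since the real COLORS table is not shown in this spec.

```typescript
type Estimator = 'ols' | '2sls';

interface MarkEncoding {
  color: string;
  symbol: 'circle' | 'diamond';
}

// Placeholder palette; the real COLORS.blue / COLORS.amber values are not given here.
const COLORS = { blue: '#3b82f6', amber: '#f59e0b' };

function estimatorEncoding(estimator: Estimator): MarkEncoding {
  return estimator === '2sls'
    ? { color: COLORS.amber, symbol: 'diamond' }
    : { color: COLORS.blue, symbol: 'circle' };
}
```

With Observable Plot, these values could be supplied to a dot mark's fill and symbol channels, so a single function drives both color and shape.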

Indicator Grid

The estimator axis appears as a row in the bottom indicator panel like any other axis. The indicator dots also use the estimator color (blue/amber) for consistency with the top panel.

When No Estimator Axis Exists

If all specs use the same estimator (e.g., all OLS), color and shape are uniform. The spec curve looks identical to the current 2b-viz implementation.

7. Results Display

IV Diagnostics Section

When a linear-model node with estimator: '2sls' is selected, the results panel shows the standard regression output plus an “IV Diagnostics” block:

IV Diagnostics
--------------------------------------
First-stage F         23.41  (p < 0.001)
Wu-Hausman            4.82   (p = 0.028)
Sargan J              1.23   (p = 0.267)

Endogenous:    educ
Instruments:   nearc2, nearc4

First-stage F is always shown
Wu-Hausman is always shown
Sargan J only shown when overidentified (#instruments > #endogenous)
Endogenous/Instruments listed for reference
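The display rules above could be implemented with a small formatter such as this one. formatIVDiagnostics is a hypothetical name, the column widths are arbitrary, and the IVDiagnostics interface is repeated from section 1 so the sketch stands alone.

```typescript
interface IVDiagnostics {
  firstStageF: number;
  firstStageFP: number;
  wuHausman: number;
  wuHausmanP: number;
  sarganStatistic?: number; // only when overidentified
  sarganP?: number;
}

function formatP(p: number): string {
  return p < 0.001 ? 'p < 0.001' : 'p = ' + p.toFixed(3);
}

function padRight(s: string, width: number): string {
  let out = s;
  while (out.length < width) out += ' ';
  return out;
}

// Returns one display line per diagnostic; Sargan J is omitted when absent.
function formatIVDiagnostics(d: IVDiagnostics): string[] {
  const rows: Array<[string, number, number]> = [
    ['First-stage F', d.firstStageF, d.firstStageFP],
    ['Wu-Hausman', d.wuHausman, d.wuHausmanP],
  ];
  if (d.sarganStatistic !== undefined && d.sarganP !== undefined) {
    rows.push(['Sargan J', d.sarganStatistic, d.sarganP]);
  }
  return rows.map(([label, stat, p]) =>
    padRight(label, 22) + padRight(stat.toFixed(2), 7) + '(' + formatP(p) + ')');
}
```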

Comparison Table

Additions to the existing comparison table:

Estimator row — at the bottom of the table, showing “OLS” or “2SLS” per column. Analogous to the N row.
IV diagnostic rows — shown when any column is an IV estimate. First-stage F, Wu-Hausman p-value. Columns without IV show “—”.
Fixed effects checkmarks — when any column has fixedEffects, show a “Fixed Effects” section with checkmarks per FE variable per column. Standard in econ tables (stargazer, modelsummary, esttab all do this).

FE Warning

Because FE computation is deferred, the results panel shows the following message whenever a node has fixedEffects:

Fixed effects declared but not yet computed. Estimates use pooled OLS. FE computation will be available in a future milestone.

This appears as a warning banner, not an error — the model still runs, just without the FE transformation.