Specification Explorer — Milestone 2b-core Design

Contents

  1. Type System Changes
  2. Recognizer Patterns
  3. 2SLS Computation
  4. Group Detection — Estimator Axis
  5. Conditional Params in the Property Sheet
  6. Spec Curve — Estimator Encoding
  7. Results Display

Problem

Statipy currently only recognizes lm() (OLS) and has no support for instrumental variables estimation. In applied economics, IV/2SLS is the primary tool for addressing endogeneity — researchers routinely compare OLS and IV estimates side by side to assess whether endogeneity biases their treatment effect. Three major R packages handle this: AER (ivreg), fixest (feols), and lfe (felm). Without recognition of these functions, Statipy can’t parse the majority of applied econ scripts.

Additionally, the specification explorer has no concept of an “estimator” axis — there’s no way to compare OLS vs IV as a dimension of specification choice, even though “does instrumenting change the result?” is a core robustness question.

Target User

Same as prior milestones: experienced researchers fluent in R, using Statipy for visual model comparison. They paste scripts containing both lm() and ivreg()/feols()/felm() calls, typically with a stargazer() or modelsummary() comparing all columns.

Scope

Changes span src/core/ (parser, pipeline, stats) and src/ui/ (spec curve, comparison table, property sheet).

Deferred


1. Type System Changes

LinearModelParams

interface LinearModelParams {
  formula: Formula;
  data: string;
  estimator: 'ols' | '2sls';       // default 'ols' for existing lm() nodes
  endogenous?: string[];             // variables being instrumented
  instruments?: string[];            // excluded instruments
  fixedEffects?: string[];           // parsed from feols/felm, not yet computed
}

Existing lm() nodes get estimator: 'ols'. All current code paths are unaffected — the new fields are additive.

IVDiagnostics

interface IVDiagnostics {
  firstStageF: number;
  firstStageFP: number;
  wuHausman: number;
  wuHausmanP: number;
  sarganStatistic?: number;          // only when overidentified
  sarganP?: number;
}

RegressionResult Extension

interface RegressionResult {
  // ... all existing fields unchanged ...
  ivDiagnostics?: IVDiagnostics;     // present only when estimator is '2sls'
}

The comparison table and spec curve read coefficients, rSquared, dfModel, dfResidual from RegressionResult. Note: RegressionResult has no nObs field — N is computed as dfModel + dfResidual + 1, which holds for both OLS and 2SLS. IV diagnostics are additive — existing consumers don’t need changes to keep working.

ParamDef Extension

interface ParamDef {
  // ... existing fields (key, label, kind: ParamKind) ...
  showWhen?: Record<string, string | string[]>;
  options?: Array<{ value: string; label: string }>;  // for 'select' kind
  disabled?: boolean;
  disabledReason?: string;
}

The existing ParamKind union ('formula-outcome' | 'formula-terms' | 'identifier' | 'expression' | 'boolean' | 'string') needs one addition: 'select' for the estimator dropdown. The existing kinds map to the spec’s descriptive names: 'formula' → 'formula-outcome' + 'formula-terms', 'data-ref' → 'identifier', 'term-set' → 'formula-terms'.

The property sheet checks showWhen against the current node’s params. If showWhen is set and the condition isn’t met, the param row is hidden. This is a general-purpose extension usable by future param types.
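The visibility check described above could look roughly like the following. This is a minimal sketch, not the actual property sheet code: the helper name isParamVisible, the ParamDefLike shape, and the rule that an array in showWhen means "any of these values" are all assumptions made for illustration.

```typescript
// Hypothetical helper: decides whether a param row should be rendered.
// Assumption: a string[] condition matches when the current value equals any entry.
type ShowWhen = Record<string, string | string[]>;

interface ParamDefLike {
  key: string;
  showWhen?: ShowWhen;
}

function isParamVisible(
  def: ParamDefLike,
  params: Record<string, unknown>,
): boolean {
  if (!def.showWhen) return true; // no condition: always visible
  return Object.entries(def.showWhen).every(([key, expected]) => {
    const actual = params[key];
    if (typeof actual !== "string") return false; // missing or non-string value hides the row
    return Array.isArray(expected)
      ? expected.includes(actual)
      : actual === expected;
  });
}
```

With this rule, a row declared with showWhen: { estimator: '2sls' } is hidden on an OLS node and shown as soon as the estimator param switches to '2sls'.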


2. Recognizer Patterns

Three new function patterns, all producing AnalysisCall { kind: 'linear-model' } with estimator-specific args. The mapper and pipeline are unaware of which package was used.

ivreg() (AER package)

ivreg(y ~ x1 + endog | x1 + inst1, data=d)

feols() (fixest package)

feols(y ~ x1 | fe1 + fe2 | endog ~ inst1 + inst2, data=d)

felm() (lfe package)

felm(y ~ x1 | fe1 | (endog ~ inst1) | cluster, data=d)

Formula Parsing Strategy

The multi-part | splitting happens at the recognizer level using raw source text extraction, not AST-level parsing. Statipy’s R lexer has no single | token (only |> pipe and || logical-or), so multi-part formulas cannot be parsed by the existing grammar.

Each recognizer function:

  1. Extracts the raw source text of the formula argument using sourceSpan byte offsets on the AST node
  2. Splits the raw string on | according to the package’s convention
  3. Feeds each part to the formula parser’s existing y ~ terms machinery (or a new RHS-only helper for instrument/FE parts)
  4. Computes endogenous/instruments by set difference

This approach avoids lexer/parser changes entirely — the | inside a formula argument is handled as raw text by the recognizer, which is the only code that knows the package-specific multi-part conventions.

The formula parser gets one small addition: a function to parse an RHS-only expression (a +-separated term list without the y ~ prefix), reused by all three recognizer patterns for the FE and instrument parts.
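As an illustration of the raw-text strategy, here is a minimal sketch for the ivreg() convention. The function names (splitFormulaParts, parseRhsTerms, parseIvregFormula) are hypothetical, and the term splitter is deliberately naive: it assumes plain +-separated names and does not handle constructs such as I(x + z).

```typescript
// Split raw formula text on single top-level "|", leaving "||" and "|>" intact
// and ignoring bars nested inside parentheses, e.g. felm's "(endog ~ inst1)".
function splitFormulaParts(raw: string): string[] {
  const parts: string[] = [];
  let depth = 0;
  let current = "";
  for (let i = 0; i < raw.length; i++) {
    const ch = raw[i];
    if (ch === "(") depth++;
    else if (ch === ")") depth--;
    if (ch === "|" && depth === 0) {
      const next = raw[i + 1];
      if (next === "|" || next === ">") {
        current += ch + next; // not a part separator
        i++;
        continue;
      }
      parts.push(current.trim());
      current = "";
      continue;
    }
    current += ch;
  }
  parts.push(current.trim());
  return parts;
}

// RHS-only helper (simplified): a +-separated list of term names.
function parseRhsTerms(rhs: string): string[] {
  return rhs.split("+").map((t) => t.trim()).filter((t) => t.length > 0);
}

// ivreg convention: y ~ regressors | all instruments (exogenous + excluded).
function parseIvregFormula(raw: string) {
  const [main, instrumentPart] = splitFormulaParts(raw);
  const [outcome, rhs] = main.split("~").map((s) => s.trim());
  const regressors = parseRhsTerms(rhs);
  const allInstruments = parseRhsTerms(instrumentPart ?? "");
  // Set differences: regressors not in the instrument list are endogenous;
  // instruments not among the regressors are the excluded instruments.
  const endogenous = regressors.filter((t) => !allInstruments.includes(t));
  const instruments = allInstruments.filter((t) => !regressors.includes(t));
  return { outcome, regressors, endogenous, instruments };
}
```

For the ivreg() example above, parseIvregFormula("y ~ x1 + endog | x1 + inst1") yields endogenous ["endog"] and instruments ["inst1"]. The feols and felm recognizers would reuse splitFormulaParts and assign the parts according to their own conventions.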


3. 2SLS Computation

Algorithm

Standard two-stage least squares, reusing the existing QR-based OLS machinery in computeRegression.

Stage 1 — For each endogenous variable, regress it on all exogenous regressors + excluded instruments:

Stage 2 — Replace endogenous variables with their stage-1 fitted values in the original formula:

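SE correction — The residuals from the stage-2 regression are not valid for inference, because they are computed with the fitted endogenous values rather than the observed ones. Correct procedure: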

  1. Let X = original design matrix (exogenous + original endogenous), Z = instrument matrix (exogenous + excluded instruments)
  2. Compute correct residuals: e = y - X × β2sls (using original endogenous, not fitted)
  3. σ² = e′e / (n - k) where k = number of regressors
  4. Compute projection: Pz = Z × (Z′Z)⁻¹ × Z′; use the thin QR of Z, then Q × Q′ gives Pz
  5. Compute projected X: X̂ = Pz × X
  6. Var(β) = σ² × (X̂′ × X̂)⁻¹; use QR on X̂ to get the inverse

Steps 4–6 compose from existing QR/multiply operations. No new linear algebra primitives needed, but the implementation must construct these intermediate matrices explicitly.
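The procedure above can be sketched end to end as follows. To stay self-contained, this sketch swaps in normal equations with a Gauss-Jordan inverse instead of the QR routines the design calls for, so it is less numerically robust than the intended implementation. All names here (Matrix, invert, twoStageLeastSquares) are hypothetical and do not reflect the computeRegression API.

```typescript
type Matrix = number[][];

const transpose = (a: Matrix): Matrix =>
  a[0].map((_, j) => a.map((row) => row[j]));

const multiply = (a: Matrix, b: Matrix): Matrix =>
  a.map((row) =>
    b[0].map((_, j) => row.reduce((sum, v, k) => sum + v * b[k][j], 0)),
  );

// Gauss-Jordan inverse with partial pivoting; a stand-in for the QR machinery.
function invert(m: Matrix): Matrix {
  const n = m.length;
  const a = m.map((row, i) => [...row, ...row.map((_, j) => (i === j ? 1 : 0))]);
  for (let col = 0; col < n; col++) {
    let pivot = col;
    for (let r = col + 1; r < n; r++) {
      if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
    }
    if (Math.abs(a[pivot][col]) < 1e-12) throw new Error("singular matrix");
    [a[col], a[pivot]] = [a[pivot], a[col]];
    const p = a[col][col];
    a[col] = a[col].map((v) => v / p);
    for (let r = 0; r < n; r++) {
      if (r === col) continue;
      const f = a[r][col];
      a[r] = a[r].map((v, j) => v - f * a[col][j]);
    }
  }
  return a.map((row) => row.slice(n));
}

// X: original design matrix (exogenous + original endogenous columns).
// Z: instrument matrix (exogenous + excluded instruments).
function twoStageLeastSquares(X: Matrix, Z: Matrix, y: number[]) {
  const n = X.length;
  const k = X[0].length;
  const Zt = transpose(Z);
  // Stage 1 for every column at once: Xhat = Pz X = Z (Z'Z)^-1 Z'X.
  const Xhat = multiply(Z, multiply(invert(multiply(Zt, Z)), multiply(Zt, X)));
  const XhatT = transpose(Xhat);
  const XhXhInv = invert(multiply(XhatT, Xhat));
  // Stage 2: beta = (Xhat'Xhat)^-1 Xhat'y.
  const beta = multiply(XhXhInv, multiply(XhatT, y.map((v) => [v]))).map((r) => r[0]);
  // Corrected residuals use the ORIGINAL X, not Xhat.
  const resid = y.map((yi, i) => yi - X[i].reduce((s, x, j) => s + x * beta[j], 0));
  const sigma2 = resid.reduce((s, e) => s + e * e, 0) / (n - k);
  // Var(beta) = sigma2 * (Xhat'Xhat)^-1; standard errors from its diagonal.
  const se = XhXhInv.map((row, j) => Math.sqrt(sigma2 * row[j]));
  return { beta, se, resid, sigma2 };
}
```

Exogenous columns of X lie in the column space of Z, so projecting all of X through Pz leaves them unchanged and replaces only the endogenous columns with their stage-1 fitted values.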

Diagnostics

Computed after estimation using quantities already available from the two stages:

First-stage F-statistic — F-test for joint significance of excluded instruments in the stage-1 regression. Tests instrument relevance. Rule of thumb: F > 10 suggests instruments are not weak (Stock & Yogo 2005).

Wu-Hausman test — Tests whether OLS and IV estimates differ significantly (i.e., whether endogeneity is actually present). Procedure: include stage-1 residuals as an additional regressor in the OLS equation; the F-statistic on that residual term is the Wu-Hausman statistic. Under H0 (no endogeneity), OLS is efficient and consistent.

Sargan/Hansen J test — Only computed when overidentified (number of excluded instruments > number of endogenous variables). Procedure: regress the 2SLS residuals e (computed with the original endogenous variables, as in the SE correction, not the raw stage-2 residuals) on all exogenous regressors + excluded instruments; the test statistic is n × R², distributed as χ²(q - m), where q = number of excluded instruments and m = number of endogenous variables (m is used here because k already denotes the number of regressors in the SE correction). Tests instrument validity (exclusion restriction). Requires a chiSquaredCDF function, implementable as a special case of the incomplete gamma function using the existing lnGamma utility in distributions.ts.
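One way chiSquaredCDF could be built on a log-gamma function is sketched below. The lnGamma defined here is a local Lanczos approximation standing in for the utility in distributions.ts, whose actual signature is not shown in this document. regularizedGammaP and sarganPValue are hypothetical names. The sketch uses the identity that the χ² CDF with df degrees of freedom at x equals the regularized lower incomplete gamma function P(df/2, x/2).

```typescript
// Lanczos approximation of ln Γ(x) for x > 0 (local stand-in, see lead-in).
function lnGamma(x: number): number {
  const coeffs = [76.18009172947146, -86.50532032941677, 24.01409824083091,
    -1.231739572450155, 0.1208650973866179e-2, -0.5395239384953e-5];
  let y = x;
  const tmp = x + 5.5 - (x + 0.5) * Math.log(x + 5.5);
  let series = 1.000000000190015;
  for (const c of coeffs) series += c / ++y;
  return -tmp + Math.log((2.5066282746310005 * series) / x);
}

// Regularized lower incomplete gamma P(a, x): series expansion for x < a + 1,
// continued fraction (modified Lentz) for the upper tail otherwise.
function regularizedGammaP(a: number, x: number): number {
  if (x <= 0) return 0;
  const logPrefix = -x + a * Math.log(x) - lnGamma(a);
  const EPS = 1e-14;
  if (x < a + 1) {
    let term = 1 / a;
    let sum = term;
    for (let n = 1; n < 500; n++) {
      term *= x / (a + n);
      sum += term;
      if (Math.abs(term) < Math.abs(sum) * EPS) break;
    }
    return sum * Math.exp(logPrefix);
  }
  const TINY = 1e-300;
  let b = x + 1 - a;
  let c = 1 / TINY;
  let d = 1 / b;
  let h = d;
  for (let i = 1; i < 500; i++) {
    const an = -i * (i - a);
    b += 2;
    d = an * d + b;
    if (Math.abs(d) < TINY) d = TINY;
    c = b + an / c;
    if (Math.abs(c) < TINY) c = TINY;
    d = 1 / d;
    const delta = d * c;
    h *= delta;
    if (Math.abs(delta - 1) < EPS) break;
  }
  return 1 - Math.exp(logPrefix) * h;
}

function chiSquaredCDF(x: number, df: number): number {
  return regularizedGammaP(df / 2, x / 2);
}

// Sargan p-value: statistic n * R^2 with (excluded instruments - endogenous) df.
function sarganPValue(n: number, rSquared: number, nExcluded: number, nEndogenous: number): number {
  return 1 - chiSquaredCDF(n * rSquared, nExcluded - nEndogenous);
}
```

The degrees of freedom passed to sarganPValue are positive only in the overidentified case, which matches the rule that the test is skipped otherwise.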

Validation

Before running, the executor validates:

Test vectors validated against R’s ivreg() output (AER package) with known datasets (e.g., Card 1995 schooling.csv, Mroz 1987 mroz.csv).


4. Group Detection — Estimator Axis

New Heuristic: detectEstimatorVariants

Input: All linear-model nodes in the pipeline.

Logic:

  1. Partition nodes by (outcome, data, RHS exogenous terms, fixedEffects) — using formatTerms for canonical comparison. Including fixedEffects in the key prevents FE variants from being misclassified as estimator variants.
  2. Within each partition, check if nodes differ by estimator (and related IV params: instruments, endogenous)
  3. If a partition has both estimator: 'ols' and estimator: '2sls' nodes, create a group with an estimator axis
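The three steps above might be sketched as follows. The ModelNode shape with pre-extracted key fields, the EstimatorGroup result type, and the canonical helper (a stand-in for formatTerms) are assumptions for illustration. The sketch also assumes that rhsTerms for a 2SLS node still lists the endogenous regressor, so that the node lands in the same partition as its OLS counterpart.

```typescript
// Hypothetical flattened view of a linear-model node; the real node type
// stores a Formula object and may differ.
interface ModelNode {
  id: string;
  outcome: string;
  data: string;
  rhsTerms: string[];
  fixedEffects?: string[];
  estimator: "ols" | "2sls";
}

interface EstimatorGroup {
  nodeIds: string[];
  axisKind: "estimator";
}

// Order-insensitive canonical form; stands in for formatTerms.
const canonical = (terms: string[] = []): string => [...terms].sort().join("+");

function detectEstimatorVariants(nodes: ModelNode[]): EstimatorGroup[] {
  // Step 1: partition by (outcome, data, RHS terms, fixedEffects).
  const partitions = new Map<string, ModelNode[]>();
  for (const node of nodes) {
    const key = JSON.stringify([
      node.outcome,
      node.data,
      canonical(node.rhsTerms),
      canonical(node.fixedEffects),
    ]);
    const bucket = partitions.get(key) ?? [];
    bucket.push(node);
    partitions.set(key, bucket);
  }
  // Steps 2-3: a partition containing both estimators becomes a group.
  const groups: EstimatorGroup[] = [];
  partitions.forEach((bucket) => {
    const estimators = new Set(bucket.map((n) => n.estimator));
    if (estimators.has("ols") && estimators.has("2sls")) {
      groups.push({ nodeIds: bucket.map((n) => n.id), axisKind: "estimator" });
    }
  });
  return groups;
}
```

Because fixedEffects is part of the key, an OLS node and a 2SLS node that differ in their fixed effects fall into different partitions and produce no estimator group.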

Axis values:

Ordering: Runs after the existing three heuristics (detectSpecificationVariants, detectSubsampleVariants, detectOutcomeVariants), before mergeOverlappingGroups.

Axis kind: 'estimator' — already defined in SpecAxis.kind but unused until now.

Composition: The estimator axis composes with other axes via the existing mergeOverlappingGroups union-find. A node that varies both controls and estimator gets a multi-axis group automatically.

Interaction with detectSpecificationVariants: The existing specification heuristic partitions by (outcome, data). To prevent OLS and 2SLS nodes with the same outcome/data but different RHS from being conflated into a single specification axis, detectSpecificationVariants must also partition by estimator. This ensures estimator and specification are detected as separate axes.

completeCrossProduct update: The completeCrossProduct function switches on axis.kind to generate missing cross-product cells. A new 'estimator' case must copy estimator, endogenous, instruments, and fixedEffects from the reference node for that axis value.

Example — Card (1995) script with 5 models:


5. Conditional Params in the Property Sheet

Param Schema for linear-model

getParamSchema('linear-model') → [
  { key: 'formula', label: 'Formula', kind: 'formula' },
  { key: 'data', label: 'Data', kind: 'data-ref' },
  { key: 'estimator', label: 'Estimator', kind: 'select',
    options: [{ value: 'ols', label: 'OLS' }, { value: '2sls', label: '2SLS' }] },
  { key: 'endogenous', label: 'Endogenous', kind: 'term-set',
    showWhen: { estimator: '2sls' } },
  { key: 'instruments', label: 'Instruments', kind: 'term-set',
    showWhen: { estimator: '2sls' } },
  { key: 'fixedEffects', label: 'Fixed Effects', kind: 'term-set',
    disabled: true, disabledReason: 'FE computation not yet supported' },
]

Property Sheet Rendering

Before rendering a param row, check showWhen:

Spec Grid Behavior

In the spec grid, conditional params apply per-cell:

Vectorization

The Estimator param is vectorizable like any other axis dimension. When vectorized:


6. Spec Curve — Estimator Encoding

Change: Single Focus Coefficient

The spec curve shows one focus coefficient at a time, not multiple. This matches the standard spec curve format (Simonsohn et al. 2020, specr package). The focusCoefficients: Record<string, string[]> store field is simplified to focusCoefficient: Record<string, string | null> (one per group). The clickable coefficient list in the comparison table is single-select.
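If persisted state in the old shape needs to be carried over, a conversion along these lines is one option. This is a sketch only: the function name is hypothetical, and keeping the first previously selected coefficient is an assumption, not a decision stated in this design.

```typescript
// Hypothetical migration from the multi-select store field to the single-select one.
// Assumption: the first previously selected coefficient is kept; empty lists map to null.
function migrateFocusCoefficients(
  old: Record<string, string[]>,
): Record<string, string | null> {
  const next: Record<string, string | null> = {};
  for (const [groupId, names] of Object.entries(old)) {
    next[groupId] = names.length > 0 ? names[0] : null;
  }
  return next;
}
```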

Downstream changes from this rename:

Estimator Visual Encoding

With the color channel freed from multi-coefficient encoding, estimator gets dual encoding — both color and shape:

Dual encoding (color + shape) ensures the distinction is readable even in grayscale and answers the core research question — “does IV shift the estimate?” — at a glance.

Observable Plot supports the symbol channel, mapping directly to this design.

Indicator Grid

The estimator axis appears as a row in the bottom indicator panel like any other axis. The indicator dots also use the estimator color (blue/amber) for consistency with the top panel.

When No Estimator Axis Exists

If all specs use the same estimator (e.g., all OLS), color and shape are uniform. The spec curve looks identical to the current 2b-viz implementation.


7. Results Display

IV Diagnostics Section

When a linear-model node with estimator: '2sls' is selected, the results panel shows the standard regression output plus an “IV Diagnostics” block:

IV Diagnostics
--------------------------------------
First-stage F         23.41  (p < 0.001)
Wu-Hausman            4.82   (p = 0.028)
Sargan J              1.23   (p = 0.267)

Endogenous:    educ
Instruments:   nearc4

Comparison Table

Additions to the existing comparison table:

  1. Estimator row — at the bottom of the table, showing “OLS” or “2SLS” per column. Analogous to the N row.
  2. IV diagnostic rows — shown when any column is an IV estimate. First-stage F, Wu-Hausman p-value. Columns without IV show “—”.
  3. Fixed effects checkmarks — when any column has fixedEffects, show a “Fixed Effects” section with checkmarks per FE variable per column. Standard in econ tables (stargazer, modelsummary, esttab all do this).

FE Warning

When a node has fixedEffects but FE computation is deferred, the results panel shows:

Fixed effects declared but not yet computed. Estimates use pooled OLS. FE computation will be available in a future milestone.

This appears as a warning banner, not an error — the model still runs, just without the FE transformation.