Statipy currently only recognizes lm() (OLS) and has no support for instrumental variables estimation. In applied economics, IV/2SLS is the primary tool for addressing endogeneity — researchers routinely compare OLS and IV estimates side by side to assess whether endogeneity biases their treatment effect. Three major R packages handle this: AER (ivreg), fixest (feols), and lfe (felm). Without recognition of these functions, Statipy can’t parse the majority of applied econ scripts.
Additionally, the specification explorer has no concept of an “estimator” axis — there’s no way to compare OLS vs IV as a dimension of specification choice, even though “does instrumenting change the result?” is a core robustness question.
Same as prior milestones: experienced researchers fluent in R, using Statipy for visual model comparison. They paste scripts containing both lm() and ivreg()/feols()/felm() calls, typically with a stargazer() or modelsummary() comparing all columns.
Changes span src/core/ (parser, pipeline, stats) and src/ui/ (spec curve, comparison table, property sheet).
estimator values. 2SLS via QR is sufficient for now.
interface LinearModelParams {
formula: Formula;
data: string;
estimator: 'ols' | '2sls'; // default 'ols' for existing lm() nodes
endogenous?: string[]; // variables being instrumented
instruments?: string[]; // excluded instruments
fixedEffects?: string[]; // parsed from feols/felm, not yet computed
}
Existing lm() nodes get estimator: 'ols'. All current code paths are unaffected — the new fields are additive.
interface IVDiagnostics {
firstStageF: number;
firstStageFP: number;
wuHausman: number;
wuHausmanP: number;
sarganStatistic?: number; // only when overidentified
sarganP?: number;
}
interface RegressionResult {
// ... all existing fields unchanged ...
ivDiagnostics?: IVDiagnostics; // present only when estimator is '2sls'
}
The comparison table and spec curve read coefficients, rSquared, dfModel, and dfResidual from RegressionResult. Note: RegressionResult has no nObs field. N is computed as dfModel + dfResidual + 1, which holds for both OLS and 2SLS; the + 1 accounts for the intercept, so it assumes the model includes one. IV diagnostics are additive, so existing consumers keep working without changes.
interface ParamDef {
// ... existing fields (key, label, kind: ParamKind) ...
showWhen?: Record<string, string | string[]>;
options?: Array<{ value: string; label: string }>; // for 'select' kind
disabled?: boolean;
disabledReason?: string;
}
The existing ParamKind union ('formula-outcome' | 'formula-terms' | 'identifier' | 'expression' | 'boolean' | 'string') needs one addition: 'select' for the estimator dropdown. The existing kinds map to the spec’s descriptive names: 'formula' → 'formula-outcome' + 'formula-terms', 'data-ref' → 'identifier', 'term-set' → 'formula-terms'.
The property sheet checks showWhen against the current node’s params. If showWhen is set and the condition isn’t met, the param row is hidden. This is a general-purpose extension usable by future param types.
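Here is a minimal sketch of the showWhen check. It assumes the condition map is compared against a plain record of the node's current params. The spec does not say how several keys combine, so the sketch assumes every key must match. isParamVisible is a hypothetical name.

```typescript
// Condition map as declared on ParamDef.showWhen.
type ShowWhen = Record<string, string | string[]>;

// Hypothetical helper: should this param row be rendered for the node's current params?
// A key matches on string equality, or on inclusion when the condition is an array.
// All keys must match (AND semantics is an assumption; the spec does not say).
function isParamVisible(showWhen: ShowWhen | undefined, params: Record<string, unknown>): boolean {
  if (!showWhen) return true; // no condition: always show
  return Object.entries(showWhen).every(([key, expected]) => {
    const actual = params[key];
    if (typeof actual !== 'string') return false; // missing or non-string values never match
    return Array.isArray(expected) ? expected.includes(actual) : actual === expected;
  });
}

// Usage with the linear-model schema's IV rows:
const ivRow: ShowWhen = { estimator: '2sls' };
console.log(isParamVisible(ivRow, { estimator: 'ols' }));  // false
console.log(isParamVisible(ivRow, { estimator: '2sls' })); // true
console.log(isParamVisible(undefined, {}));                // true
```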
Three new function patterns, all producing AnalysisCall { kind: 'linear-model' } with estimator-specific args. The mapper and pipeline are unaware of which package was used.
ivreg() (AER package)
ivreg(y ~ x1 + endog | x1 + inst1, data=d)
- Function names: ivreg
- Split on: the | separator
- Left of |: full second-stage formula → parse with formula parser → Formula
- Right of |: all instruments + exogenous regressors
- Derived sets: endogenous = regressors on the left that do not appear on the right; instruments = terms on the right that do not appear on the left
- Emits: { estimator: '2sls', endogenous, instruments }
feols() (fixest package)
feols(y ~ x1 | fe1 + fe2 | endog ~ inst1 + inst2, data=d)
- Function names: feols, fepois, feglm, fenegbin (recognize all, but only feols is executable — others emit with a warning)
- Split on: | → 2 or 3 parts
- exogenous | FE → { estimator: 'ols', fixedEffects }
- exogenous | IV (the second part contains ~) → { estimator: '2sls', endogenous, instruments }
- exogenous | FE | IV → { estimator: '2sls', fixedEffects, endogenous, instruments }
- FE part: +-separated names → fixedEffects: string[]
- IV part: endog ~ instruments → split on ~
felm() (lfe package)
felm(y ~ x1 | fe1 | (endog ~ inst1) | cluster, data=d)
- Function names: felm
- Split on: | → up to 4 parts
- FE part (0 means none) → fixedEffects: string[]
- IV part, (endog ~ inst1) → endogenous, instruments
- If the IV part is 0 or absent, emit estimator: 'ols'
The multi-part | splitting happens at the recognizer level using raw source text extraction, not AST-level parsing. Statipy's R lexer has no single | token (only the |> pipe and the || logical-or), so the existing grammar cannot parse multi-part formulas.
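The part-to-param mappings above can be sketched as three small functions. Each one takes the already-split formula parts as strings.

This is illustrative only:
- The function names (ivregArgs, feolsArgs, felmArgs) are hypothetical.
- terms() is a naive + splitter standing in for the formula parser.
- felm's cluster part is left unmapped, because the spec does not say where it goes.

```typescript
// Subset of LinearModelParams emitted by the recognizers (formula and data omitted).
interface EstimatorArgs {
  estimator: 'ols' | '2sls';
  endogenous?: string[];
  instruments?: string[];
  fixedEffects?: string[];
}

// Sketch-only splitter: '+'-separated names, trimmed, empties dropped.
function terms(s: string): string[] {
  return s.split('+').map((t) => t.trim()).filter((t) => t.length > 0);
}

// RHS terms of "lhs ~ rhs".
function rhsOf(formula: string): string[] {
  const i = formula.indexOf('~');
  return terms(i >= 0 ? formula.slice(i + 1) : formula);
}

// IV part "endog ~ inst1 + inst2", optionally wrapped in parentheses (felm style).
function ivPart(part: string): { endogenous: string[]; instruments: string[] } {
  const inner = part.trim().replace(/^\(([\s\S]*)\)$/, '$1');
  const i = inner.indexOf('~');
  if (i < 0) throw new Error(`IV part without '~': ${part}`);
  return { endogenous: terms(inner.slice(0, i)), instruments: terms(inner.slice(i + 1)) };
}

// ivreg: set differences between the two sides of '|'.
function ivregArgs(parts: string[]): EstimatorArgs {
  const regressors = rhsOf(parts[0]);
  const zSide = terms(parts[1] ?? '');
  return {
    estimator: '2sls',
    endogenous: regressors.filter((t) => !zSide.includes(t)),
    instruments: zSide.filter((t) => !regressors.includes(t)),
  };
}

// feols: a trailing part containing '~' is the IV part; a part before it is the FE part.
function feolsArgs(parts: string[]): EstimatorArgs {
  const rest = parts.slice(1);
  const last = rest[rest.length - 1];
  const iv = last !== undefined && last.includes('~') ? ivPart(last) : undefined;
  const fePart = iv ? rest[rest.length - 2] : last;
  const fixedEffects = fePart !== undefined ? terms(fePart) : [];
  return { estimator: iv ? '2sls' : 'ols', ...iv, ...(fixedEffects.length > 0 ? { fixedEffects } : {}) };
}

// felm: formula | FE | IV | cluster, where '0' means "none".
function felmArgs(parts: string[]): EstimatorArgs {
  const isNone = (p: string | undefined) => p === undefined || p.trim() === '0';
  const fixedEffects = isNone(parts[1]) ? [] : terms(parts[1]);
  const iv = isNone(parts[2]) ? undefined : ivPart(parts[2]);
  // parts[3] (cluster) is intentionally not mapped in this sketch.
  return { estimator: iv ? '2sls' : 'ols', ...iv, ...(fixedEffects.length > 0 ? { fixedEffects } : {}) };
}
```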
Each recognizer function:
1. Extracts the raw formula text using the sourceSpan byte offsets on the AST node
2. Splits on | according to the package's convention
3. Parses each part with the existing y ~ terms machinery (or a new RHS-only helper for instrument/FE parts)
This approach avoids lexer/parser changes entirely — the | inside a formula argument is handled as raw text by the recognizer, which is the only code that knows the package-specific multi-part conventions.
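Step 2 needs a splitter that ignores a | inside parentheses (e.g. I(a | b)) and does not break on || or |>. Below is a sketch. It assumes the extracted text is only the formula argument and contains no string literals. splitTopLevelPipes is a hypothetical name.

One caveat: if sourceSpan really stores byte offsets, slicing a JavaScript string with them is only correct for ASCII source. JavaScript string indices count UTF-16 code units, not bytes.

```typescript
// Splits raw formula text on '|' at bracket depth 0.
// '||' (logical or) and '|>' (native pipe) are skipped, not treated as separators.
function splitTopLevelPipes(text: string): string[] {
  const parts: string[] = [];
  let depth = 0;
  let start = 0;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch === '(' || ch === '[' || ch === '{') depth++;
    else if (ch === ')' || ch === ']' || ch === '}') depth = Math.max(0, depth - 1);
    else if (ch === '|' && depth === 0) {
      const next = text[i + 1];
      if (next === '|' || next === '>') {
        i++; // consume the second character of '||' or '|>'
        continue;
      }
      parts.push(text.slice(start, i).trim());
      start = i + 1;
    }
  }
  parts.push(text.slice(start).trim());
  return parts;
}
```

The recognizer would then hand parts[0] to the existing formula parser and the remaining parts to the package-specific mapping.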
The formula parser gets one small addition: a function to parse an RHS-only expression (a +-separated term list without the y ~ prefix), reused by all three recognizer patterns for the FE and instrument parts.
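One possible shape for the RHS-only helper splits terms on + at parenthesis depth 0, so that calls like log(inc + 1) stay whole. The name parseRhsTerms is hypothetical. The sketch deliberately leaves - terms, interactions, and other formula operators to the existing parser.

```typescript
// Hypothetical RHS-only helper: "a + log(b + 1) + I(c^2)" -> ["a", "log(b + 1)", "I(c^2)"].
function parseRhsTerms(text: string): string[] {
  if (text.includes('~')) throw new Error(`expected an RHS-only term list, got "${text}"`);
  const terms: string[] = [];
  let depth = 0;
  let start = 0;
  // Iterate one past the end so the final term is flushed.
  for (let i = 0; i <= text.length; i++) {
    const ch = text[i];
    if (ch === '(') depth++;
    else if (ch === ')') depth--;
    else if ((ch === '+' && depth === 0) || i === text.length) {
      const term = text.slice(start, i).trim();
      if (term.length === 0) throw new Error(`empty term in "${text}"`);
      terms.push(term);
      start = i + 1;
    }
  }
  return terms;
}
```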
Standard two-stage least squares, reusing the existing QR-based OLS machinery in computeRegression.
Stage 1 — For each endogenous variable, regress it on all exogenous regressors + excluded instruments:
[exogenous, instruments]
Stage 2 — Replace endogenous variables with their stage-1 fitted values in the original formula:
[exogenous, fitted_endogenous]
SE correction — The residuals from the stage-2 regression are wrong for inference, because they are computed from the fitted endogenous values. Correct procedure:
1. X = original design matrix (exogenous + original endogenous), Z = instrument matrix (exogenous + excluded instruments)
2. e = y − X × β_2SLS (using the original endogenous values, not the fitted ones)
3. σ² = e′e / (n − k), where k = number of regressors (columns of X, including the intercept)
4. Pz = Z × (Z′Z)⁻¹ × Z′ — use QR on Z, then Q × Q′ gives Pz
5. X̂ = Pz × X
6. Var(β) = σ² × (X̂′ × X̂)⁻¹ — use QR on X̂ to get the inverse
Steps 4–6 compose from existing QR/multiply operations. No new linear algebra primitives are needed, but the implementation must construct these intermediate matrices explicitly.
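The estimator and its SE correction can be checked end to end with the sketch below. It follows the six steps with two substitutions:
- Matrix inverses use Gauss-Jordan elimination on the small cross-product matrices instead of the QR routines in computeRegression. QR is the numerically safer choice.
- X̂ is computed as Z((Z′Z)⁻¹Z′X). This equals Pz × X without materializing the n × n projection matrix.

The coefficients are β = (X̂′X̂)⁻¹X̂′y, the stage-2 OLS fit. twoStageLeastSquares is a hypothetical name.

```typescript
type Matrix = number[][]; // row-major: Matrix[row][col]

const transpose = (A: Matrix): Matrix => A[0].map((_, j) => A.map((row) => row[j]));

function matmul(A: Matrix, B: Matrix): Matrix {
  return A.map((row) => B[0].map((_, j) => row.reduce((sum, a, k) => sum + a * B[k][j], 0)));
}

// Gauss-Jordan inverse with partial pivoting (stand-in for the QR-based inverse).
function inverse(A: Matrix): Matrix {
  const n = A.length;
  const M = A.map((row, i) => [...row, ...row.map((_, j) => (i === j ? 1 : 0))]);
  for (let c = 0; c < n; c++) {
    let p = c;
    for (let r = c + 1; r < n; r++) if (Math.abs(M[r][c]) > Math.abs(M[p][c])) p = r;
    if (Math.abs(M[p][c]) < 1e-12) throw new Error('matrix is singular');
    [M[c], M[p]] = [M[p], M[c]];
    const pivot = M[c][c];
    for (let j = 0; j < 2 * n; j++) M[c][j] /= pivot;
    for (let r = 0; r < n; r++) {
      if (r === c) continue;
      const f = M[r][c];
      for (let j = 0; j < 2 * n; j++) M[r][j] -= f * M[c][j];
    }
  }
  return M.map((row) => row.slice(n));
}

interface TwoSlsResult { beta: number[]; se: number[]; sigma2: number; }

// y: outcome (length n)
// X: n×k design with the ORIGINAL endogenous columns (plus intercept column)
// Z: n×l instrument matrix (exogenous columns + excluded instruments), l >= k
function twoStageLeastSquares(y: number[], X: Matrix, Z: Matrix): TwoSlsResult {
  const n = y.length;
  const k = X[0].length;
  const Zt = transpose(Z);
  // Steps 4-5: X̂ = Z (Z′Z)⁻¹ Z′X, i.e. Pz × X without forming Pz.
  const Xhat = matmul(Z, matmul(inverse(matmul(Zt, Z)), matmul(Zt, X)));
  const XhatT = transpose(Xhat);
  const XhatTXhatInv = inverse(matmul(XhatT, Xhat)); // (X̂′X̂)⁻¹
  const beta = matmul(XhatTXhatInv, matmul(XhatT, y.map((v) => [v]))).map((r) => r[0]);
  // Step 2: residuals from the ORIGINAL X, not X̂.
  const e = X.map((row, i) => y[i] - row.reduce((s, x, j) => s + x * beta[j], 0));
  const sigma2 = e.reduce((s, r) => s + r * r, 0) / (n - k); // step 3
  const se = XhatTXhatInv.map((row, j) => Math.sqrt(sigma2 * row[j])); // step 6
  return { beta, se, sigma2 };
}
```

Take a just-identified model with an intercept and one instrument z for one regressor x. There the slope reduces to Σ(z − z̄)y / Σ(z − z̄)x, which makes a convenient hand-checkable test case.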
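Computed after estimation, reusing quantities from the two stages plus small auxiliary OLS fits (the restricted first-stage regression, the residual-augmented OLS equation, and the Sargan regression):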
First-stage F-statistic — F-test for joint significance of excluded instruments in the stage-1 regression. Tests instrument relevance. Rule of thumb: F > 10 suggests instruments are not weak (Stock & Yogo 2005).
Wu-Hausman test — Tests whether OLS and IV estimates differ significantly (i.e., whether endogeneity is actually present). Procedure: include the stage-1 residuals as additional regressors in the OLS equation; the F-statistic on those residual terms (a joint test when there are several endogenous variables) is the Wu-Hausman statistic. Under H0 (no endogeneity), both OLS and 2SLS are consistent and OLS is efficient.
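Both the first-stage F and the Wu-Hausman statistic are partial F-tests. You fit the regression with and without the tested terms and compare residual sums of squares.

The sketch below assumes the two RSS values come from existing OLS fits. partialF is a hypothetical name. The p-value, from the F(q, dfResidual) distribution, is left to whatever F CDF the stats module provides.

```typescript
// Partial F statistic for q restrictions.
// rssRestricted:   residual sum of squares WITHOUT the tested terms
// rssUnrestricted: residual sum of squares WITH them
// dfResidual:      residual df of the unrestricted fit (n minus its number of regressors)
function partialF(rssRestricted: number, rssUnrestricted: number, q: number, dfResidual: number): number {
  if (q <= 0 || dfResidual <= 0) throw new Error('q and dfResidual must be positive');
  return (rssRestricted - rssUnrestricted) / q / (rssUnrestricted / dfResidual);
}

// Rule-of-thumb screen from the text: F > 10 suggests the instruments are not weak.
const passesWeakInstrumentRule = (firstStageF: number): boolean => firstStageF > 10;
```

The two uses differ only in their inputs:
- First-stage F: the restricted model regresses the endogenous variable on the exogenous regressors only, and q is the number of excluded instruments.
- Wu-Hausman: the restricted model is the plain OLS equation, the unrestricted one adds the stage-1 residuals, and q is the number of endogenous variables.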
Sargan/Hansen J test — Only computed when overidentified (number of excluded instruments > number of endogenous variables). Procedure: regress the 2SLS residuals e = y − X × β_2SLS (computed with the original endogenous regressors, not the stage-2 fitted values) on all exogenous regressors + excluded instruments. The test statistic is n × R², distributed as χ²(q − p), where q = number of excluded instruments and p = number of endogenous variables. It tests instrument validity (the exclusion restriction). This requires a chiSquaredCDF function, implementable as a special case of the regularized incomplete gamma function (the χ²(ν) CDF at x equals P(ν/2, x/2)) using the existing lnGamma utility in distributions.ts.
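Here is a sketch of chiSquaredCDF and the Sargan statistic. It uses the identity F(x; ν) = P(ν/2, x/2), where P is the regularized lower incomplete gamma function. P is evaluated with the classic series expansion for x < a + 1 and a continued fraction (modified Lentz) otherwise.

The lnGamma below is a self-contained Lanczos stand-in for the existing one in distributions.ts. regularizedGammaP and sarganTest are hypothetical names.

```typescript
// Stand-in for lnGamma in distributions.ts: Lanczos approximation (g = 7).
// Accurate for x >= 0.5, which covers a = df / 2 for any df >= 1.
function lnGamma(x: number): number {
  const c = [
    0.99999999999980993, 676.5203681218851, -1259.1392167224028, 771.32342877765313,
    -176.61502916214059, 12.507343278686905, -0.13857109526572012, 9.9843695780195716e-6,
    1.5056327351493116e-7,
  ];
  const z = x - 1;
  let a = c[0];
  for (let i = 1; i < c.length; i++) a += c[i] / (z + i);
  const t = z + 7.5;
  return 0.5 * Math.log(2 * Math.PI) + (z + 0.5) * Math.log(t) - t + Math.log(a);
}

// Regularized lower incomplete gamma P(a, x).
function regularizedGammaP(a: number, x: number): number {
  if (x <= 0) return 0;
  const lnPrefix = a * Math.log(x) - x - lnGamma(a);
  if (x < a + 1) {
    // Series: P = e^(lnPrefix) * sum_{n>=0} x^n / (a (a+1) ... (a+n))
    let term = 1 / a;
    let sum = term;
    for (let n = 1; n < 500; n++) {
      term *= x / (a + n);
      sum += term;
      if (Math.abs(term) < Math.abs(sum) * 1e-15) break;
    }
    return sum * Math.exp(lnPrefix);
  }
  // Continued fraction for Q = 1 - P (modified Lentz).
  const tiny = 1e-300;
  let b = x + 1 - a;
  let c = 1 / tiny;
  let d = 1 / b;
  let h = d;
  for (let i = 1; i < 500; i++) {
    const an = -i * (i - a);
    b += 2;
    d = an * d + b;
    if (Math.abs(d) < tiny) d = tiny;
    c = b + an / c;
    if (Math.abs(c) < tiny) c = tiny;
    d = 1 / d;
    const delta = d * c;
    h *= delta;
    if (Math.abs(delta - 1) < 1e-14) break;
  }
  return 1 - Math.exp(lnPrefix) * h;
}

function chiSquaredCDF(x: number, df: number): number {
  return regularizedGammaP(df / 2, x / 2);
}

// n × R² from regressing the 2SLS residuals on all exogenous regressors + excluded instruments.
// q = number of excluded instruments, p = number of endogenous variables.
function sarganTest(n: number, r2: number, q: number, p: number) {
  const df = q - p;
  if (df <= 0) throw new Error('Sargan test requires an overidentified model (q > p)');
  const statistic = n * r2;
  return { statistic, df, pValue: 1 - chiSquaredCDF(statistic, df) };
}
```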
Before running, the executor validates:
- For estimator: '2sls', instruments is non-empty
- For estimator: '2sls', endogenous is non-empty
- If fixedEffects is present, emit a warning that FE are not yet computed
Test vectors are validated against R's ivreg() output (AER package) with known datasets (e.g., Card 1995 schooling.csv, Mroz 1987 mroz.csv).
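The validation rules can be sketched as a function that separates blocking errors from warnings. ModelParamsSubset is a trimmed-down stand-in for LinearModelParams, and validateLinearModel is a hypothetical name.

The order-condition check (at least as many excluded instruments as endogenous variables) is an addition not listed above. It is included because such a model cannot be estimated.

```typescript
// Trimmed stand-in for LinearModelParams: only the fields validation reads.
interface ModelParamsSubset {
  estimator: 'ols' | '2sls';
  endogenous?: string[];
  instruments?: string[];
  fixedEffects?: string[];
}

interface ValidationResult {
  errors: string[];   // block execution
  warnings: string[]; // shown as a banner; the model still runs
}

function validateLinearModel(p: ModelParamsSubset): ValidationResult {
  const errors: string[] = [];
  const warnings: string[] = [];
  if (p.estimator === '2sls') {
    const nEndog = p.endogenous?.length ?? 0;
    const nInst = p.instruments?.length ?? 0;
    if (nInst === 0) errors.push('2SLS requires at least one excluded instrument.');
    if (nEndog === 0) errors.push('2SLS requires at least one endogenous variable.');
    // Order condition: an extra check assumed by this sketch, not one of the listed rules.
    if (nInst > 0 && nInst < nEndog) {
      errors.push(`Underidentified: ${nInst} instrument(s) for ${nEndog} endogenous variable(s).`);
    }
  }
  if (p.fixedEffects && p.fixedEffects.length > 0) {
    warnings.push('Fixed effects declared but not yet computed. Estimates use pooled OLS.');
  }
  return { errors, warnings };
}
```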
detectEstimatorVariants
Input: All linear-model nodes in the pipeline.
Logic:
1. Partition nodes by (outcome, data, RHS terms, fixedEffects), using formatTerms for canonical comparison. For 2SLS nodes, the RHS terms are the second-stage regressors including the instrumented ones, which is what lets an IV model share a key with its OLS counterpart. Including fixedEffects in the key prevents FE variants from being misclassified as estimator variants.
2. Within each partition, find nodes that differ only in estimator (and the related IV params: instruments, endogenous).
3. If a partition contains both estimator: 'ols' and estimator: '2sls' nodes, create a group with an estimator axis.
Axis values:
- estimator: 'ols' → label “OLS”
- estimator: '2sls' → label “2SLS” if all IV nodes share instruments, or “2SLS (nearc4)” / “2SLS (nearc2, nearc4)” if instrument sets differ between nodes
Ordering: Runs after the existing three heuristics (detectSpecificationVariants, detectSubsampleVariants, detectOutcomeVariants), before mergeOverlappingGroups.
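A compact sketch of the partition-and-label logic follows. It makes these assumptions:
- A simplified node shape, in which rhs holds the second-stage regressors (for 2SLS nodes, exogenous plus endogenous).
- A sorted join substituted for formatTerms.
- Node-id-to-label maps as the return value.

All of these names are hypothetical. The real function would emit SpecAxis groups.

```typescript
// Simplified node shape for the sketch (hypothetical field names).
interface LmNode {
  id: string;
  outcome: string;
  data: string;
  rhs: string[]; // second-stage regressors; for 2SLS this includes the endogenous ones
  fixedEffects?: string[];
  estimator: 'ols' | '2sls';
  instruments?: string[];
}

interface EstimatorGroup {
  nodeIds: string[];
  axis: { kind: 'estimator'; labels: Record<string, string> }; // node id -> axis label
}

// Stand-in for formatTerms: order-insensitive canonical string.
const canon = (ts?: string[]): string => (ts ?? []).slice().sort().join(' + ');

function detectEstimatorVariants(nodes: LmNode[]): EstimatorGroup[] {
  const partitions = new Map<string, LmNode[]>();
  for (const n of nodes) {
    const key = JSON.stringify([n.outcome, n.data, canon(n.rhs), canon(n.fixedEffects)]);
    partitions.set(key, [...(partitions.get(key) ?? []), n]);
  }
  const groups: EstimatorGroup[] = [];
  partitions.forEach((members) => {
    const iv = members.filter((n) => n.estimator === '2sls');
    if (iv.length === 0 || iv.length === members.length) return; // need both OLS and 2SLS
    const sharedInstruments = new Set(iv.map((n) => canon(n.instruments))).size === 1;
    const labels: Record<string, string> = {};
    for (const n of members) {
      labels[n.id] =
        n.estimator === 'ols' ? 'OLS'
        : sharedInstruments ? '2SLS'
        : `2SLS (${(n.instruments ?? []).slice().sort().join(', ')})`;
    }
    groups.push({ nodeIds: members.map((n) => n.id), axis: { kind: 'estimator', labels } });
  });
  return groups;
}
```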
Axis kind: 'estimator' — already defined in SpecAxis.kind but unused until now.
Composition: The estimator axis composes with other axes via the existing mergeOverlappingGroups union-find. A node that varies both controls and estimator gets a multi-axis group automatically.
Interaction with detectSpecificationVariants: The existing specification heuristic partitions by (outcome, data). To prevent OLS and 2SLS nodes with the same outcome/data but different RHS from being conflated into a single specification axis, detectSpecificationVariants must also partition by estimator. This ensures estimator and specification are detected as separate axes.
completeCrossProduct update: The completeCrossProduct function switches on axis.kind to generate missing cross-product cells. A new 'estimator' case must copy estimator, endogenous, instruments, and fixedEffects from the reference node for that axis value.
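Below is a sketch of what the new case could do when filling one missing cell. The spec only names the fields to copy, so the sketch assumes formula and data come from the base cell. fillEstimatorCell and NodeParams are hypothetical names. The sketch copies arrays rather than sharing them, so that later edits to one cell cannot leak into another.

```typescript
interface NodeParams {
  formula: string;
  data: string;
  estimator: 'ols' | '2sls';
  endogenous?: string[];
  instruments?: string[];
  fixedEffects?: string[];
}

const ESTIMATOR_FIELDS = ['endogenous', 'instruments', 'fixedEffects'] as const;

// Hypothetical body of the 'estimator' case: keep the base cell's formula and data,
// take estimator and its related fields from the reference node for this axis value.
function fillEstimatorCell(base: NodeParams, reference: NodeParams): NodeParams {
  const cell: NodeParams = { ...base, estimator: reference.estimator };
  for (const key of ESTIMATOR_FIELDS) {
    const value = reference[key];
    if (value) cell[key] = [...value]; // copy, do not share the reference's array
    else delete cell[key]; // e.g. an OLS reference clears IV fields
  }
  return cell;
}
```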
Example — Card (1995) script with 5 models:
linear-model
getParamSchema('linear-model') → [
{ key: 'formula', label: 'Formula', kind: 'formula' },
{ key: 'data', label: 'Data', kind: 'data-ref' },
{ key: 'estimator', label: 'Estimator', kind: 'select',
options: [{ value: 'ols', label: 'OLS' }, { value: '2sls', label: '2SLS' }] },
{ key: 'endogenous', label: 'Endogenous', kind: 'term-set',
showWhen: { estimator: '2sls' } },
{ key: 'instruments', label: 'Instruments', kind: 'term-set',
showWhen: { estimator: '2sls' } },
{ key: 'fixedEffects', label: 'Fixed Effects', kind: 'term-set',
disabled: true, disabledReason: 'FE computation not yet supported' },
]
Before rendering a param row, check showWhen:
- If showWhen is absent, always show the row
- If showWhen is present, read the node's current param value for each key and check whether it matches (string match, or inclusion when the condition is an array); show the row only if every key matches
In the spec grid, conditional params apply per-cell:
The Estimator param is vectorizable like any other axis dimension. When vectorized:
The spec curve shows one focus coefficient at a time, not multiple. This matches the standard spec curve format (Simonsohn et al. 2020, specr package). The focusCoefficients: Record<string, string[]> store field is simplified to focusCoefficient: Record<string, string | null> (one per group). The clickable coefficient list in the comparison table is single-select.
Downstream changes from this rename:
- focusCoefficients → focusCoefficient, toggleFocusCoefficient becomes setFocusCoefficient (set, not toggle)
- spec-curve-data.ts: focusCoefficients: string[] parameter → focusCoefficient: string
- spec-curve.tsx: auto-detect logic and data assembly call updated
- spec-comparison-view.tsx: click handler changes from toggle to set
- detect-key-coefficients.ts: return type changes from string[] to string (pick the best single coefficient)
- Remaining focusCoefficients references updated
With the color channel freed from multi-coefficient encoding, the estimator gets dual encoding — both color and shape:
- OLS → (COLORS.blue, 'circle')
- 2SLS → (COLORS.amber, 'diamond')
Dual encoding (color + shape) keeps the distinction readable even in grayscale and answers the core research question — “does IV shift the estimate?” — at a glance.
Observable Plot supports the symbol channel, mapping directly to this design.
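The dual encoding maps onto Plot's color and symbol scales, each configured with a domain and a range. The sketch assumes the scales are built from the estimators present in the current group, so an all-OLS curve gets one color and one shape. The hex values in COLORS are placeholders for the app's real palette, and estimatorScales is a hypothetical name.

```typescript
type Estimator = 'ols' | '2sls';

// Placeholder palette; the real values come from the app's COLORS constant.
const COLORS = { blue: '#4e79a7', amber: '#f28e2b' } as const;

const STYLE: Record<Estimator, { color: string; symbol: string }> = {
  ols: { color: COLORS.blue, symbol: 'circle' },
  '2sls': { color: COLORS.amber, symbol: 'diamond' },
};

// Color and symbol scale options ({ domain, range }) for the estimators present,
// in a fixed OLS-then-2SLS order so the legend order is stable.
function estimatorScales(present: Estimator[]) {
  const domain = (['ols', '2sls'] as Estimator[]).filter((e) => present.includes(e));
  return {
    color: { domain, range: domain.map((e) => STYLE[e].color) },
    symbol: { domain, range: domain.map((e) => STYLE[e].symbol) },
  };
}
```

The two objects could be passed as the color and symbol options of Plot.plot, with a dot mark that sets both fill and symbol to the estimator field.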
The estimator axis appears as a row in the bottom indicator panel like any other axis. The indicator dots also use the estimator color (blue/amber) for consistency with the top panel.
If all specs use the same estimator (e.g., all OLS), color and shape are uniform. The spec curve looks identical to the current 2b-viz implementation.
When a linear-model node with estimator: '2sls' is selected, the results panel shows the standard regression output plus an “IV Diagnostics” block:
IV Diagnostics
--------------------------------------
First-stage F 23.41 (p < 0.001)
Wu-Hausman 4.82 (p = 0.028)
Sargan J 1.23 (p = 0.267)
Endogenous: educ
Instruments: nearc2, nearc4
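Below is a sketch of how the diagnostics block could be rendered from IVDiagnostics. The column widths and the p < 0.001 cutoff are read off the mockup and are assumptions. formatIvDiagnostics and formatP are hypothetical names. The Sargan row is emitted only when both optional fields are present, matching the overidentified-only rule.

```typescript
interface IVDiagnostics {
  firstStageF: number;
  firstStageFP: number;
  wuHausman: number;
  wuHausmanP: number;
  sarganStatistic?: number;
  sarganP?: number;
}

// "p < 0.001" below the cutoff, otherwise three decimals.
function formatP(p: number): string {
  return p < 0.001 ? 'p < 0.001' : `p = ${p.toFixed(3)}`;
}

function formatIvDiagnostics(d: IVDiagnostics, endogenous: string[], instruments: string[]): string {
  // Label padded to a fixed width, statistic right-aligned with two decimals.
  const row = (label: string, stat: number, p: number): string =>
    `${label.padEnd(16)}${stat.toFixed(2).padStart(8)}  (${formatP(p)})`;
  const lines = [
    'IV Diagnostics',
    '-'.repeat(38),
    row('First-stage F', d.firstStageF, d.firstStageFP),
    row('Wu-Hausman', d.wuHausman, d.wuHausmanP),
  ];
  // Sargan J exists only for overidentified models.
  if (d.sarganStatistic !== undefined && d.sarganP !== undefined) {
    lines.push(row('Sargan J', d.sarganStatistic, d.sarganP));
  }
  lines.push(`Endogenous: ${endogenous.join(', ')}`, `Instruments: ${instruments.join(', ')}`);
  return lines.join('\n');
}
```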
Additions to the existing comparison table:
- When any column has fixedEffects, show a “Fixed Effects” section with checkmarks per FE variable per column. This is standard in econ tables (stargazer, modelsummary, and esttab all do this).
When a node has fixedEffects but FE computation is deferred, the results panel shows:
Fixed effects declared but not yet computed. Estimates use pooled OLS. FE computation will be available in a future milestone.
This appears as a warning banner, not an error — the model still runs, just without the FE transformation.