TFM Analysis – Step by Step

This notebook implements a step-by-step transfer function modeling (TFM) workflow to relate national fentanyl powder purity to regional overdose mortality. It loads a prepared CSV, prewhitens the input/output series with ARIMA, estimates contemporaneous transfer gains via OLS, compares simple lag structures using BIC, and summarizes results including a Geographic Concordance Index (GCI) and heterogeneity statistics. To run end-to-end, ensure the required CSV (correlation.csv) is present in /work, verify configuration constants in the next section, and execute the notebook top-to-bottom (or run the final cell under "Final Model Runs"). Outputs are saved as CSVs in the output directory.

Notebook run timestamp and environment

Notebook was run on a 4 vCPU environment with 16 GB memory.

import sys, platform, datetime, os
from pathlib import Path

# Compose timestamp and environment info
# Original UTC timestamp retained and additionally compute New York time
run_timestamp_utc = datetime.datetime.now(datetime.timezone.utc)
try:
    # Python 3.9+ zoneinfo
    from zoneinfo import ZoneInfo
    ny_tz = ZoneInfo("America/New_York")
    run_timestamp_nyc_dt = run_timestamp_utc.astimezone(ny_tz)
except Exception:
    # Fallback: naive conversion using fixed offset (not DST aware).
    # Kept for robustness if zoneinfo unavailable.
    # New York is UTC-5 or UTC-4 depending on DST; here we default to -5 hours.
    run_timestamp_nyc_dt = (run_timestamp_utc - datetime.timedelta(hours=5)).replace(tzinfo=None)

run_timestamp = run_timestamp_utc.isoformat()
run_timestamp_nyc = run_timestamp_nyc_dt.isoformat()
python_version = sys.version.replace('\n', ' ')
platform_info = platform.platform()
virtual_env = os.environ.get('VIRTUAL_ENV') or os.environ.get('CONDA_DEFAULT_ENV') or 'none'
working_dir = str(Path.cwd())

env_summary = {
    'run_timestamp_utc': run_timestamp,
    'run_timestamp_nyc': run_timestamp_nyc,
    'python_version': python_version,
    'platform': platform_info,
    'environment': virtual_env,
    'working_directory': working_dir
}

# Print a concise summary
print(f"Notebook run at (UTC): {run_timestamp}")
print(f"Notebook run at (America/New_York): {run_timestamp_nyc}")
print(f"Python: {python_version}")
print(f"Platform: {platform_info}")
print(f"Environment: {virtual_env}")
print(f"Working directory: {working_dir}")

# Store in a dataframe for optional logging/export
import pandas as pd
run_info_df = pd.DataFrame([env_summary])

Imports and configuration

Purpose and libraries: pandas and numpy are used for data manipulation and numerics; statsmodels provides ARIMA for prewhitening, OLS for estimating the transfer function, and residual diagnostics (Ljung–Box) to assess noise adequacy; scipy supplies chi-square functions for heterogeneity testing; pathlib and os handle file paths and outputs; warnings is used to suppress convergence and statistical warnings to keep output readable. The core workflow needs no plotting libraries; matplotlib is imported separately in the ARIMA diagnostics section for the diagnostic figures.

# Import functions
import pandas as pd
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.regression.linear_model import OLS
from statsmodels.tools import add_constant
from statsmodels.stats.diagnostic import acorr_ljungbox
from scipy import stats
import warnings
import os
from pathlib import Path

Set Path

This section defines key paths and constants used throughout the analysis. DATA_PATH/INPUT_FILE point to the input CSV containing regional mortality and national purity series. OUTPUT_DIR controls where result CSVs are written. REGIONS lists the U.S. Census regions analyzed. BLOCK1_END and BLOCK2_START define two analytic time blocks used to compare relationships over time. REFERENCE_REGION selects the denominator (West by default) for computing the Geographic Concordance Index. ARIMA_ORDER and ARIMA_FALLBACK specify the prewhitening filter for the input series and a simpler fallback if estimation fails.

# Data loading function(s)
DATA_PATH = Path('/work/correlation.csv')

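No major warnings were raised in this run. The blanket filter below suppresses all warnings so that deprecation notices from future library versions do not clutter the output.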

warnings.filterwarnings('ignore')

Configuration

The configuration cell below sets filenames and analysis parameters used by all subsequent functions. INPUT_FILE must match the uploaded CSV (correlation.csv). OUTPUT_DIR is created if it doesn’t exist. REGIONS enumerates the four regions evaluated. BLOCK1_END and BLOCK2_START split the series into two periods for comparison. REFERENCE_REGION determines the baseline region for GCI ratios. ARIMA_ORDER and ARIMA_FALLBACK control the ARIMA-based prewhitening filter and its fallback if the primary model fails to converge.
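The configuration cell itself is not reproduced in this export. A minimal sketch of what it might look like follows. Every value here is an assumption inferred from the descriptions above and from the block labels used later (Jul 2022-Sep 2023 and Oct 2023-Sep 2024): the region spellings, the exact cut-off dates, and both ARIMA orders should be checked against the actual notebook and CSV before use.

```python
from pathlib import Path
import pandas as pd

# All values below are assumptions for illustration, not the notebook's confirmed settings.
INPUT_FILE = Path('/work/correlation.csv')   # uploaded CSV named in the text
OUTPUT_DIR = Path('output')                  # result CSVs and figures go here

# The four U.S. Census regions; spellings must match the 'region' column in the CSV.
REGIONS = ['Northeast', 'Midwest', 'South', 'West']

# Block boundaries inferred from the block labels used in main().
BLOCK1_END = pd.Timestamp('2023-09-30')
BLOCK2_START = pd.Timestamp('2023-10-01')

# Baseline region for GCI ratios (West by default, per the text).
REFERENCE_REGION = 'West'

# Prewhitening filter and a simpler fallback; orders assumed from the
# (1 - phi*B)(1 - B) filter described in the Prewhiten section.
ARIMA_ORDER = (1, 1, 0)
ARIMA_FALLBACK = (0, 1, 0)

# Create the output directory if it does not exist.
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
```

Keeping the two block boundaries as separate constants means the blocks do not overlap, provided BLOCK1_END falls before BLOCK2_START.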

Data loading

The load_data function reads the input CSV into a DataFrame, trims any whitespace from column names, parses the date column into pandas Timestamps, and sorts the data by region and date. It returns a tidy DataFrame ready for region-wise time series analysis.

def load_data(filepath):
    """Load and prepare the purity-mortality dataset."""
    df = pd.read_csv(filepath)
    df.columns = df.columns.str.strip()
    df['date'] = pd.to_datetime(df['date'])
    df = df.sort_values(['region', 'date']).reset_index(drop=True)
    return df

Descriptives

Basic descriptive statistics, computed before modeling: purity and overdose rates at the start and end of each time block, by region.

# Load data and compute table of purity and OD rates at start/end of each block by region
# Load
df = load_data(INPUT_FILE)

# Inspect columns to infer OD and purity names
columns = df.columns.tolist()
print(columns)

# Try to find purity and OD columns heuristically
purity_cols = [c for c in columns if 'purity' in c.lower() or 'powder' in c.lower()]
od_cols = [
    c for c in columns
    if ('od' in c.lower() or 'mort' in c.lower() or 'death' in c.lower())
    and ('rate' in c.lower() or 'per' in c.lower() or 'mortality' in c.lower())
]
print('Purity candidates:', purity_cols)
print('OD candidates:', od_cols)

# Choose first matches as defaults
purity_col = purity_cols[0] if purity_cols else None
od_col = od_cols[0] if od_cols else None
if purity_col is None or od_col is None:
    raise ValueError('Could not infer purity or OD rate columns from correlation.csv. Columns: ' + ', '.join(columns))

# Determine start/end dates for each block present in data
block1_mask = df['date'] <= BLOCK1_END
block2_mask = df['date'] >= BLOCK2_START

# For each region, get first and last observation within each block
def block_endpoints(group):
    res = {}
    # Block 1
    g1 = group[block1_mask.loc[group.index]]
    if not g1.empty:
        start1 = g1.iloc[0]
        end1 = g1.iloc[-1]
        res.update({
            'block1_start_date': start1['date'],
            'block1_start_purity': start1[purity_col],
            'block1_start_od_rate': start1[od_col],
            'block1_end_date': end1['date'],
            'block1_end_purity': end1[purity_col],
            'block1_end_od_rate': end1[od_col]
        })
    else:
        res.update({
            'block1_start_date': pd.NaT,
            'block1_start_purity': np.nan,
            'block1_start_od_rate': np.nan,
            'block1_end_date': pd.NaT,
            'block1_end_purity': np.nan,
            'block1_end_od_rate': np.nan
        })
    # Block 2
    g2 = group[block2_mask.loc[group.index]]
    if not g2.empty:
        start2 = g2.iloc[0]
        end2 = g2.iloc[-1]
        res.update({
            'block2_start_date': start2['date'],
            'block2_start_purity': start2[purity_col],
            'block2_start_od_rate': start2[od_col],
            'block2_end_date': end2['date'],
            'block2_end_purity': end2[purity_col],
            'block2_end_od_rate': end2[od_col]
        })
    else:
        res.update({
            'block2_start_date': pd.NaT,
            'block2_start_purity': np.nan,
            'block2_start_od_rate': np.nan,
            'block2_end_date': pd.NaT,
            'block2_end_purity': np.nan,
            'block2_end_od_rate': np.nan
        })
    return pd.Series(res)

summary_df = df.groupby('region').apply(block_endpoints).reset_index()

# Order regions consistently
summary_df = summary_df.set_index('region').reindex(REGIONS).reset_index()

# Preview the table
summary_df

Transfer Function Model

Prewhiten

Prewhitening removes autocorrelation from the input (x) and applies the same filter to the output (y) so that their contemporaneous relationship can be estimated without serial correlation bias. The function fits ARIMA(order) to x, falling back to a simpler model if needed. It then differences both series and applies the estimated AR(1) filter when present. It returns alpha (filtered x), beta (filtered y), the estimated AR parameter phi (0 if no AR term), and the actual ARIMA order used.

def prewhiten(x, y, order=ARIMA_ORDER, fallback=ARIMA_FALLBACK):
    """
    Prewhiten input series x with ARIMA filter, apply same filter to output y.
    Returns prewhitened alpha, beta, estimated phi, and the actual ARIMA order.
    """
    try:
        model = ARIMA(x, order=order).fit()
    except Exception:
        model = ARIMA(x, order=fallback).fit()

    actual_order = model.model.order
    # Read the AR(1) coefficient by name rather than by position: with d >= 1
    # statsmodels fits no constant by default, so params[1] would be sigma2, not phi.
    phi = float(model.arparams[0]) if actual_order[0] == 1 else 0.0

    dx = np.diff(x)
    dy = np.diff(y)

    if actual_order[0] == 1:
        # Apply (1 - phi*B)(1 - B) filter
        alpha = dx[1:] - phi * dx[:-1]
        beta = dy[1:] - phi * dy[:-1]
    else:
        # Simple first difference only
        alpha = dx
        beta = dy

    return alpha, beta, phi, actual_order
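To see what the (1 - phi*B)(1 - B) filter does, the following sketch applies it to synthetic data with a known phi, skipping the ARIMA estimation step. The helper name `apply_prewhitening_filter`, the series, and all numeric values are hypothetical and exist only for this illustration.

```python
import numpy as np

def apply_prewhitening_filter(x, y, phi):
    """Apply (1 - phi*B)(1 - B) to x and y, with phi taken as given (not estimated)."""
    dx = np.diff(np.asarray(x, dtype=float))
    dy = np.diff(np.asarray(y, dtype=float))
    alpha = dx[1:] - phi * dx[:-1]
    beta = dy[1:] - phi * dy[:-1]
    return alpha, beta

# Synthetic input whose first differences follow an AR(1) with known phi.
rng = np.random.default_rng(42)
phi_true = 0.6
n = 30
shocks = rng.normal(size=n)
dx = np.empty(n)
dx[0] = shocks[0]
for t in range(1, n):
    dx[t] = phi_true * dx[t - 1] + shocks[t]
x = np.concatenate([[0.0], np.cumsum(dx)])  # n + 1 observations in levels

# Synthetic output: responds to x contemporaneously with a gain of 2.0.
true_gain = 2.0
y = true_gain * x + 5.0

alpha, beta = apply_prewhitening_filter(x, y, phi_true)

print("observations in levels:", len(x))
print("observations after filtering:", len(alpha))
print("alpha recovers the shocks:", np.allclose(alpha, shocks[1:]))
print("beta equals gain * alpha:", np.allclose(beta, true_gain * alpha))
```

Two observations are lost per series: one to differencing and one to the AR(1) filter. With only 12 to 15 observations per block, that reduction is worth keeping in mind when reading the downstream OLS results.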

ARIMA diagnostics

Note on residual diagnostics: In some cases it is reasonable to exclude the first residual when summarizing ARIMA diagnostic tests.

Rationale
- Initialization effects: When we difference the series and/or apply an AR filter, the very first residual can reflect filter initialization rather than steady-state behavior. This may inflate the magnitude of the first residual and distort normality (e.g., Jarque–Bera) and serial correlation (e.g., Ljung–Box) summaries.
- Boundary effects at block starts: Because we run diagnostics within analytic blocks, the boundary at the start of each block can introduce edge effects that disproportionately impact the very first residual in that block.

Best practice
- Report diagnostics both ways: To ensure conclusions aren’t driven by initialization artifacts, consider reporting diagnostic results including all residuals and again after excluding the first residual. If results agree qualitatively, it increases confidence that findings are not sensitive to boundary initialization.
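The "report both ways" suggestion can be sketched as follows. This is an illustration only, not the notebook's diagnostic routine: the Ljung–Box statistic is computed by hand with numpy, its degrees of freedom are not adjusted for fitted ARMA parameters, and the residual series (including the large first value) is made up.

```python
import numpy as np
from scipy import stats

def summarize_residuals(resid, max_lag):
    """Ljung-Box Q (unadjusted df) and Jarque-Bera for one residual series."""
    e = np.asarray(resid, dtype=float)
    n = len(e)
    centered = e - e.mean()
    denom = np.sum(centered ** 2)
    q = 0.0
    for k in range(1, max_lag + 1):
        rho_k = np.sum(centered[k:] * centered[:-k]) / denom  # lag-k autocorrelation
        q += rho_k ** 2 / (n - k)
    q *= n * (n + 2)
    lb_p = float(stats.chi2.sf(q, max_lag))
    jb_stat, jb_p = stats.jarque_bera(e)
    return {'n': n, 'lb_stat': float(q), 'lb_pvalue': lb_p,
            'jb_stat': float(jb_stat), 'jb_pvalue': float(jb_p)}

def diagnostics_both_ways(resid, max_lag=3):
    """Run the same summaries on all residuals and again without the first one."""
    resid = np.asarray(resid, dtype=float)
    return {
        'all_residuals': summarize_residuals(resid, max_lag),
        'first_excluded': summarize_residuals(resid[1:], max_lag),
    }

# Hypothetical residuals with an initialization spike in the first position.
rng = np.random.default_rng(0)
resid = rng.normal(size=15)
resid[0] = 25.0

report = diagnostics_both_ways(resid)
for label, row in report.items():
    print(label, {k: round(v, 3) for k, v in row.items()})
```

If the two rows lead to the same qualitative conclusion, the first residual is not driving the diagnostics; if they disagree, the disagreement itself is worth reporting.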

This section provides fit diagnostics for the ARIMA prewhitening step applied to the input series (national fentanyl powder purity). For a chosen region and time block, we fit the configured ARIMA model (with fallback if needed), print model information and information criteria, and evaluate residuals via Ljung–Box serial correlation tests, Jarque–Bera normality, and basic moments. We also generate and display plots: observed vs fitted values, residuals over time, residual ACF/PACF, and a QQ-plot.

# Helper functions for ARIMA diagnostics
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.graphics.gofplots import qqplot
from statsmodels.stats.diagnostic import acorr_ljungbox
from scipy import stats


def fit_arima_with_fallback(series, order=ARIMA_ORDER, fallback=ARIMA_FALLBACK):
    """
    Fit ARIMA on a 1D array-like series using the configured order, with a fallback.
    Returns: fitted model and actual (p,d,q) order used.
    """
    try:
        model = ARIMA(series, order=order).fit()
    except Exception:
        model = ARIMA(series, order=fallback).fit()
    actual_order = model.model.order
    return model, actual_order


def _slugify(text):
    return ''.join(c.lower() if c.isalnum() else '_' for c in str(text)).strip('_').replace('__', '_')


def plot_arima_diagnostics(model, resid, title, save_path, exclude_first=False):
    """
    Create diagnostic plots for ARIMA residuals and fitted vs observed.
    If exclude_first is True, drop the first residual (and corresponding first
    observation in the fitted vs observed overlay) from visualizations only.
    Saves a PNG to save_path and displays inline.
    """
    # Optionally exclude the first residual for plotting
    resid_plot = resid[1:] if (exclude_first and len(resid) > 1) else resid

    # Adjust title if excluding first residual
    plot_title = title + " (first residual excluded)" if exclude_first else title

    # Residual diagnostics figure (2x2)
    fig, axes = plt.subplots(2, 2, figsize=(12, 8))
    t = np.arange(len(resid_plot))
    axes[0, 0].plot(t, resid_plot, color='tab:blue')
    axes[0, 0].axhline(0, color='black', linewidth=0.8)
    axes[0, 0].set_title('Residuals over time')
    axes[0, 0].set_xlabel('Index')
    axes[0, 0].set_ylabel('Residual')

    # ACF lags should be < len(resid_plot)
    lags_acf = max(1, min(24, len(resid_plot) - 1)) if len(resid_plot) > 1 else 1
    try:
        plot_acf(resid_plot, lags=lags_acf, ax=axes[0, 1])
    except Exception:
        # Fallback for very short series
        axes[0, 1].text(0.5, 0.5, 'ACF not available for short series', ha='center', va='center')
    axes[0, 1].set_title('Residual ACF')

    # PACF requires nlags < 50% of sample size; also cap to 6 for very short series
    if len(resid_plot) > 2:
        lags_pacf = max(1, min(6, (len(resid_plot) - 1) // 2))
        try:
            plot_pacf(resid_plot, lags=lags_pacf, ax=axes[1, 0], method='ywm')
            pacf_used = lags_pacf
        except ValueError:
            pacf_used = max(1, min(3, (len(resid_plot) - 1) // 2))
            try:
                plot_pacf(resid_plot, lags=pacf_used, ax=axes[1, 0], method='ywm')
            except Exception:
                axes[1, 0].text(0.5, 0.5, 'PACF not available', ha='center', va='center')
        axes[1, 0].set_title(f'Residual PACF (nlags={pacf_used})')
    else:
        axes[1, 0].text(0.5, 0.5, 'PACF not available for short series', ha='center', va='center')
        axes[1, 0].set_title('Residual PACF')

    try:
        qqplot(resid_plot, line='s', ax=axes[1, 1])
        axes[1, 1].set_title('Residuals QQ-plot')
    except Exception:
        axes[1, 1].text(0.5, 0.5, 'QQ-plot not available', ha='center', va='center')
        axes[1, 1].set_title('Residuals QQ-plot')

    fig.suptitle(plot_title, fontsize=14)
    fig.tight_layout(rect=[0, 0.03, 1, 0.95])

    # Fitted vs observed
    fitted = model.fittedvalues
    y = model.data.endog

    # Align lengths for plotting (statsmodels may drop initial observations due to differencing)
    idx_start = len(y) - len(fitted)

    # If excluding first residual, drop first element of the aligned series for the overlay
    if exclude_first and len(fitted) > 1:
        fitted_plot = fitted[1:]
        y_plot = y[idx_start + 1: idx_start + 1 + len(fitted_plot)]
        overlay_start = idx_start + 1
    else:
        fitted_plot = fitted
        y_plot = y[idx_start: idx_start + len(fitted_plot)]
        overlay_start = idx_start

    fig2, ax2 = plt.subplots(figsize=(12, 4))
    ax2.plot(y, label='Observed', color='tab:blue')
    ax2.plot(range(overlay_start, overlay_start + len(fitted_plot)), fitted_plot,
             label='Fitted (aligned)', color='tab:orange')
    ax2.set_title('Observed vs Fitted' + (" (first point excluded)" if exclude_first else ''))
    ax2.set_xlabel('Index')
    ax2.set_ylabel('Value')
    ax2.legend()
    fig2.tight_layout()

    # Save both figures
    base, ext = os.path.splitext(save_path)
    fig.savefig(save_path, dpi=150, bbox_inches='tight')
    fig2_path = f"{base}_fitted{ext}"
    fig2.savefig(fig2_path, dpi=150, bbox_inches='tight')

    # Display inline for both figures
    plt.show()  # show first figure
    plt.show()  # show second figure


def run_arima_diagnostics_for_subset(x, region_label, block_label):
    """
    Fit ARIMA with fallback on x, print diagnostics, and plot residual diagnostics.
    """
    model, actual_order = fit_arima_with_fallback(x)
    print('ARIMA model used:', actual_order)
    print(model.summary())

    # Information criteria
    try:
        print(f"AIC: {model.aic:.3f} BIC: {model.bic:.3f} HQIC: {model.hqic:.3f}")
    except Exception:
        print(f"AIC: {model.aic:.3f} BIC: {model.bic:.3f}")

    resid = model.resid
    n = len(resid)
    lb_max_lag = max(1, min(12, n // 4))
    lags = list(range(1, lb_max_lag + 1))
    lb = acorr_ljungbox(resid, lags=lags, return_df=True)
    print('\nLjung–Box test p-values by lag:')
    print(lb[['lb_stat', 'lb_pvalue']])

    # Jarque-Bera may return (stat, p) in newer SciPy versions; compute skew/kurtosis separately
    jb = stats.jarque_bera(resid)
    if isinstance(jb, tuple) and len(jb) == 2:
        jb_stat, jb_p = jb
        skew_val = stats.skew(resid, bias=False)
        kurt_val = stats.kurtosis(resid, fisher=False, bias=False)
    else:
        jb_stat, jb_p, skew_val, kurt_val = jb
    print(f"\nJarque–Bera: stat={jb_stat:.3f}, p={jb_p:.3f}, skew={skew_val:.3f}, kurtosis={kurt_val:.3f}")
    print(f"Residual mean: {np.mean(resid):.5f}, variance: {np.var(resid, ddof=1):.5f}\n")

    # Plots
    os.makedirs(OUTPUT_DIR, exist_ok=True)
    block_slug = _slugify(block_label)
    region_slug = _slugify(region_label)
    save_path = os.path.join(OUTPUT_DIR, f"arima_diagnostics_{region_slug}_{block_slug}.png")
    plot_title = f"ARIMA diagnostics – {region_label} – {block_label}"
    plot_arima_diagnostics(model, resid, plot_title, save_path, exclude_first=True)
    print(f"Diagnostic figures saved to: {save_path} and {save_path.replace('.png', '_fitted.png')}")

Note: The diagnostic graphs exclude the first residual for visualization to reduce initialization artifacts at block boundaries. The printed model summary, information criteria, Ljung–Box, and Jarque–Bera statistics still use the full residual series. Estimation remains unchanged; the exclusion applies only to plotting.

ARIMA diagnostics for all regions and blocks

The following cell iterates over every region in REGIONS and both analytic blocks defined in this notebook. For each subset, it fits the configured ARIMA model on the input series x (national fentanyl powder purity), falling back if necessary, and renders residual diagnostics inline while also saving both the residual diagnostics and observed vs fitted figures under the output directory. This provides a quick, comprehensive view of prewhitening adequacy across all region–block combinations.

# Comprehensive ARIMA diagnostics across all regions and blocks
# Uses existing helpers: load_data, fit_arima_with_fallback, plot_arima_diagnostics, _slugify

# Load data
_df_all_diag = load_data(INPUT_FILE)

# Ensure output directory exists
os.makedirs(OUTPUT_DIR, exist_ok=True)

# Define analytic blocks (same as in main())
_blocks_all = [
    ('Block 1 (Jul 2022-Sep 2023)', lambda d: d['date'] <= BLOCK1_END),
    ('Block 2 (Oct 2023-Sep 2024)', lambda d: d['date'] >= BLOCK2_START),
]

print("Running ARIMA diagnostics for all regions and blocks...\n")
for _region in REGIONS:
    _rd = _df_all_diag[_df_all_diag['region'] == _region].copy().reset_index(drop=True)
    for _block_label, _block_filter in _blocks_all:
        _bd = _rd[_block_filter(_rd)].reset_index(drop=True)

        # Extract series
        _x = _bd['powder_purity_pct'].values
        _y = _bd['mortality_rate_per_100k'].values

        # Fit ARIMA with fallback on x
        _model, _actual_order = fit_arima_with_fallback(_x)
        _resid = _model.resid

        # Print concise header and ICs
        print(f"=== {_region} | {_block_label} ===")
        print(f"ARIMA used: {_actual_order}")
        try:
            print(f"AIC: {_model.aic:.3f} BIC: {_model.bic:.3f} HQIC: {_model.hqic:.3f}")
        except Exception:
            print(f"AIC: {_model.aic:.3f} BIC: {_model.bic:.3f}")

        # Plot diagnostics and save (exclude first residual in visuals only)
        _save_path = os.path.join(
            OUTPUT_DIR, f"arima_diagnostics_{_slugify(_region)}_{_slugify(_block_label)}.png"
        )
        _title = f"ARIMA diagnostics – {_region} – {_block_label}"
        plot_arima_diagnostics(_model, _resid, _title, _save_path, exclude_first=True)
        print(f"Saved figures to: {_save_path} and {_save_path.replace('.png', '_fitted.png')}\n")

Note: For all-region/block diagnostics above, plots also exclude the first residual purely for visualization alignment with the prewhitening filter. All printed statistics are computed on the full residuals to keep the estimation and inference unchanged.

Transfer Function

This step estimates the contemporaneous transfer function by regressing the prewhitened output (beta) on a constant and the prewhitened input (alpha) using OLS. The slope is the gain, with standard error and p-value for inference; R-squared summarizes fit in the filtered domain. A Ljung–Box test on residuals at a moderate lag checks for remaining autocorrelation, indicating whether the prewhitening adequately captured serial dependence.

def estimate_tfm(alpha, beta):
    """Estimate contemporaneous transfer function via OLS on prewhitened series."""
    X = add_constant(alpha)
    fit = OLS(beta, X).fit()

    # Ljung-Box test on residuals for noise model adequacy
    max_lb_lag = max(1, len(fit.resid) // 3)
    lb = acorr_ljungbox(fit.resid, lags=[max_lb_lag], return_df=True)
    lb_pval = lb['lb_pvalue'].values[0]

    return {
        'gain': fit.params[1],
        'gain_se': fit.bse[1],
        'gain_pval': fit.pvalues[1],
        'r2': fit.rsquared,
        'adj_r2': fit.rsquared_adj,
        'lb_pval': lb_pval,
        'n_obs': len(fit.resid),
        'model': fit,
    }

Lags

Lag selection compares two simple transfer structures on the prewhitened series using BIC: lag-0 (current alpha predicts current beta) versus lag-0,1 (current and one-period lag of alpha). BIC penalizes model complexity; the lower BIC is preferred. With very short series, the function defaults to lag-0 to avoid overfitting. The selection helps detect short delayed effects without complicating the model.

def select_lag_by_bic(alpha, beta):
    """Compare lag-0 vs lag-0,1 specification using BIC and return selected lag."""

    def _bic(fit, n_params):
        # Gaussian BIC up to a constant: n*log(SSR/n) + k*log(n)
        n = len(fit.resid)
        return n * np.log(fit.ssr / n) + n_params * np.log(n)

    k = min(len(alpha) - 1, len(beta) - 1)
    if k < 5:
        # Too few observations to compare: default to lag-0 on the full sample
        fit0 = OLS(beta, add_constant(alpha)).fit()
        return 0, _bic(fit0, 2), np.nan

    # Fit both specifications on the same k observations; BIC values are only
    # comparable when computed on an identical sample.
    alpha_lag0 = alpha[1:k + 1]
    alpha_lag1 = alpha[0:k]
    beta_use = beta[1:k + 1]

    # Lag-0
    fit0 = OLS(beta_use, add_constant(alpha_lag0)).fit()
    bic0 = _bic(fit0, 2)

    # Lag-0,1
    X1 = add_constant(np.column_stack([alpha_lag0, alpha_lag1]))
    fit1 = OLS(beta_use, X1).fit()
    bic1 = _bic(fit1, 3)

    selected = 0 if bic0 <= bic1 else 1
    return selected, bic0, bic1

Consolidate Results

The main function orchestrates the workflow: it loads the input data, defines two time blocks, and iterates over each region and block. For each subset it prewhitens the series, evaluates simple lag structures by BIC, estimates the transfer gain via OLS, and records diagnostics and raw correlations. It then computes GCI relative to the reference region and runs heterogeneity tests within each block. Results are printed and saved to output/tfm_results.csv (per-region estimates) and output/tfm_heterogeneity.csv (Q, I², pooled gain).

# Main execution logic
def main():
    # Load data
    df = load_data(INPUT_FILE)
    os.makedirs(OUTPUT_DIR, exist_ok=True)

    # Define analytic blocks
    blocks = [
        ('Block 1 (Jul 2022-Sep 2023)', lambda d: d['date'] <= BLOCK1_END),
        ('Block 2 (Oct 2023-Sep 2024)', lambda d: d['date'] >= BLOCK2_START),
    ]

    results = []
    for region in REGIONS:
        rd = df[df['region'] == region].copy().reset_index(drop=True)
        for block_label, block_filter in blocks:
            bd = rd[block_filter(rd)].reset_index(drop=True)
            x = bd['powder_purity_pct'].values
            y = bd['mortality_rate_per_100k'].values
            n = len(x)

            # Step 1: Prewhiten both series
            alpha, beta, phi, actual_order = prewhiten(x, y)

            # Step 2: Select lag structure by BIC
            selected_lag, bic0, bic1 = select_lag_by_bic(alpha, beta)

            # Step 3: Estimate TFM (lag-0)
            tfm = estimate_tfm(alpha, beta)

            # Raw correlation (unprewhitened) for comparison
            raw_r = np.corrcoef(x, y)[0, 1]

            results.append({
                'Region': region,
                'Block': block_label,
                'n': n,
                'ARIMA_order': f"({actual_order[0]},{actual_order[1]},{actual_order[2]})",
                'phi': round(phi, 4),
                'BIC_lag0': round(bic0, 2),
                'BIC_lag1': round(bic1, 2) if not np.isnan(bic1) else np.nan,
                'Selected_lag': selected_lag,
                'Gain': tfm['gain'],
                'Gain_SE': tfm['gain_se'],
                'Gain_pval': tfm['gain_pval'],
                'R2_TFM': tfm['r2'],
                'Adj_R2_TFM': tfm['adj_r2'],
                'LB_pval': tfm['lb_pval'],
                'Raw_r': raw_r,
                'n_prewhitened': tfm['n_obs'],
            })

    res = pd.DataFrame(results)

    # Output: print summary
    print("=" * 80)
    print("TRANSFER FUNCTION MODEL RESULTS")
    print("Fentanyl Powder Purity (National/DEA) -> Regional Overdose Mortality")
    print("=" * 80)
    print()

    for block_label in res['Block'].unique():
        bdf = res[res['Block'] == block_label]
        print(f"\n {block_label}")
        print("-" * 80)
        print(f" {'Region':<12} {'n':>3} {'Gain':>9} {'SE':>8} {'p':>8} "
              f"{'R2':>7} {'Adj R2':>8} {'Raw r':>7} {'LB p':>7}")
        print(f" {'-'*12} {'-'*3} {'-'*9} {'-'*8} {'-'*8} "
              f"{'-'*7} {'-'*8} {'-'*7} {'-'*7}")
        for _, r in bdf.iterrows():
            sig = '**' if r['Gain_pval'] < 0.05 else '* ' if r['Gain_pval'] < 0.10 else ' '
            print(f" {r['Region']:<12} {r['n']:>3} {r['Gain']:>+9.4f} "
                  f"{r['Gain_SE']:>8.4f} {r['Gain_pval']:>6.3f}{sig} "
                  f"{r['R2_TFM']:>7.3f} {r['Adj_R2_TFM']:>8.3f} {r['Raw_r']:>+7.3f} "
                  f"{r['LB_pval']:>7.3f}")

    # Save results
    res_out = res.drop(columns=['n_prewhitened'], errors='ignore')
    res_out.to_csv(os.path.join(OUTPUT_DIR, 'tfm_results.csv'), index=False)

    print(f"\n\nResults saved to {OUTPUT_DIR}/")
    print(" tfm_results.csv - Full model estimates by region-block (includes Adj_R2_TFM, penalized)")
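The main() listing ends after writing tfm_results.csv, so the GCI and heterogeneity step described in the text is not shown. The sketch below is one plausible way to compute those quantities for a single block. It assumes that GCI is each region's gain divided by the reference region's gain, and that heterogeneity is assessed with an inverse-variance (fixed-effect) pooled gain, Cochran's Q, and I². The function names and the example gains and standard errors are hypothetical.

```python
import numpy as np
from scipy import stats

def geographic_concordance_index(gains_by_region, reference_region):
    """Assumed definition: each region's gain as a ratio to the reference region's gain."""
    ref_gain = gains_by_region[reference_region]
    return {region: gain / ref_gain for region, gain in gains_by_region.items()}

def gain_heterogeneity(gains, gain_ses):
    """Inverse-variance pooled gain, Cochran's Q, its p-value, and I-squared (percent)."""
    g = np.asarray(gains, dtype=float)
    se = np.asarray(gain_ses, dtype=float)
    w = 1.0 / se ** 2                          # inverse-variance weights
    pooled = np.sum(w * g) / np.sum(w)
    pooled_se = np.sqrt(1.0 / np.sum(w))
    q = np.sum(w * (g - pooled) ** 2)          # Cochran's Q
    dof = len(g) - 1
    q_pval = stats.chi2.sf(q, dof)
    i2 = max(0.0, (q - dof) / q) * 100.0 if q > 0 else 0.0
    return {'pooled_gain': float(pooled), 'pooled_se': float(pooled_se),
            'Q': float(q), 'df': dof, 'Q_pval': float(q_pval), 'I2': float(i2)}

# Hypothetical gains and standard errors for one block (not results from the data).
example_gains = {'Northeast': 1.0, 'Midwest': 2.0, 'South': 3.0, 'West': 2.0}
example_ses = [0.5, 0.5, 0.5, 0.5]

gci = geographic_concordance_index(example_gains, reference_region='West')
het = gain_heterogeneity(list(example_gains.values()), example_ses)
print(gci)
print(het)
```

A ratio-based GCI becomes unstable when the reference gain is close to zero, so it is worth checking the reference region's gain and standard error before interpreting the ratios. In the notebook, the inputs would come from the Gain and Gain_SE columns of the results table, grouped by Block.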

Final Model Runs

Run the cell below to execute the full pipeline with the current configuration. It will load the data, process both blocks across all regions, print a formatted summary to the console, and write two CSVs to the output/ directory: tfm_results.csv and tfm_heterogeneity.csv.

if 'main' in globals():
    main()
else:
    raise NameError("main() is not defined. Ensure previous cells were executed.")
