5  Analysis

This template provides a unified framework for fMRI data collection, preprocessing, and statistical analysis. Every step — from the experimenter’s session protocol to final archival — is designed around BIDS compliance, version-controlled model definitions, and reproducible environments. The sections below consolidate the template’s analysis standards, data collection procedures, simulation workflows, and reproducibility requirements into a single reference.

5.1 Pipeline Overview

Every subject passes through a 6-stage DAG orchestrated by SLURM job dependencies:

flowchart LR
    fmriprep["fmriprep<br/>(subject-level)"]
    validate["validate_fmriprep<br/>(output gate)"]
    mriqc["mriqc"]
    xcpd["xcpd<br/>(rest/FC only)"]
    glmsingle["glmsingle"]
    fitlins["fitlins"]
    fmriprep --> validate
    validate --> mriqc
    validate --> xcpd
    validate --> glmsingle
    validate --> fitlins

Nodes are SLURM jobs submitted with sbatch --parsable. Edges are --dependency=afterok:<parent> constraints — SLURM starts a child only when its parent exits 0. A non-zero exit cascades: every descendant is cancelled.

validate_fmriprep is a fast gate that checks that the HTML report, dataset_description.json, and expected output files all exist and are non-empty. If it fails, the 4-way fan-out never starts.
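As a rough sketch of what such a gate checks (file names and the helper are illustrative; the real gate lives in the pipeline scripts):

```python
# Hypothetical mirror of the validate_fmriprep gate; the actual implementation
# may check more outputs than these two.
from pathlib import Path

def fmriprep_outputs_ok(deriv_dir: Path, subject: str) -> bool:
    """Return True only if the key fMRIPrep outputs exist and are non-empty."""
    expected = [
        deriv_dir / "dataset_description.json",
        deriv_dir / f"{subject}.html",  # subject-level HTML report
    ]
    return all(p.is_file() and p.stat().st_size > 0 for p in expected)
```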

# Submit the full DAG for one subject:
make pipeline SUBJECT=sub-01 BATCH_LABEL=my-study MODEL=models/task.smdl.json

# All subjects:
make pipeline-all BATCH_LABEL=my-study MODEL=models/task.smdl.json

# Monitor:
make pipeline-status SUBJECT=sub-01
make pipeline-dag SUBJECT=sub-01          # renders SVG + Mermaid

Resumability: Each stage has a sentinel file check. Completed stages are skipped on re-run; only incomplete stages are resubmitted. See scripts/orchestration/submit_subject_pipeline.sh for the full flag list (--skip-xcpd, --skip-mriqc, --dry-run, etc.).
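The sentinel pattern amounts to something like the following; the `.done` naming and the helper are illustrative, not the exact API in libs/pipeline_dag.py:

```python
# Illustrative sentinel-file check: a stage with an existing sentinel is
# considered complete and is skipped on re-run.
from pathlib import Path

def stages_to_run(stages, sentinel_dir: Path):
    """Skip any stage whose sentinel file already exists; keep the rest."""
    return [s for s in stages if not (sentinel_dir / f"{s}.done").exists()]
```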

For the full design rationale (why not snakemake, pydra comparison, sentinel details, adding new stages), see the source code in libs/pipeline_dag.py and scripts/orchestration/submit_subject_pipeline.sh.

5.2 Analysis Standards

This template captures community-standard fMRI analysis expectations, curated by the Cognitive & Neural Computation Lab and intended for any research group adopting a BIDS-aligned workflow. When adding new analyses under analyses/fmri/, follow these principles:

  1. Modular notebooks and scripts
    • Keep data loading isolated in helper modules (e.g., analyses/helpers/dataloaders.py).
    • Parameterise notebooks via papermill, Jupytext, or CLI wrappers so they can run headlessly.
  2. Pre-registered contrasts (spec-vs-runner decoupling)
    • Store statistical models and contrast definitions in .smdl.json files (per BIDS Stats Models spec) inside analyses/fmri/models/.
    • Include "$schema" for editor validation; the repo’s libs.bids_statsmodels.validate_model() runs the schema check and logs every validation to logs/guardrail_events.jsonl.
    • The .smdl.json spec is canonical. The runner that executes it is swappable: libs.bids_statsmodels.fit(model_path, runner=...) dispatches to nilearn (default, in-process, no container) or fitlins (alternate, container-based reference implementation). make glm RUNNER=nilearn|fitlins wraps both paths.
    • Refer to these model files from pipelines instead of hard-coding contrast weights. One file, either runner.
  3. Cache discipline
    • Use libs.paths.analysis_cache("fmri", "<artifact>") for intermediate products (e.g., design matrices, GLM outputs).
    • Never commit NIfTI images or large tables; stage them in the data repository.
  4. Quality assurance
    • Mirror QC plots under analysis_cache("fmri", "qc") with predictable filenames.
    • Provide CLI entrypoints in pipelines/ that generate QC reports (e.g., PDF or HTML) for each subject/session.
  5. Reproducible environments
    • Document required container images or module loads (e.g., FSL, AFNI) in the reproducibility checklist below.
    • If using containerised pipelines (fMRIPrep, MRIQC), wrap invocations inside pipelines/tasks/ for reuse.
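For orientation, a hypothetical stand-in for the cache helper mentioned under cache discipline (the real libs.paths.analysis_cache signature and root may differ):

```python
# Hypothetical stand-in for libs.paths.analysis_cache: intermediate products
# land under a per-domain, per-artifact directory outside version control.
from pathlib import Path

CACHE_ROOT = Path("cache/analyses")  # illustrative root, not tracked by git

def analysis_cache(domain: str, artifact: str) -> Path:
    """Return (and create) a per-domain cache directory for intermediate products."""
    path = CACHE_ROOT / domain / artifact
    path.mkdir(parents=True, exist_ok=True)
    return path
```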

5.2.1 BIDS Stats Models

The template provides machine-readable statistical model definitions following the BIDS Statistical Models specification.
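A minimal model skeleton, expressed here as a Python dict for illustration; the task and condition names are placeholders, and this is not one of the shipped template models:

```python
import json

# Minimal illustrative BIDS Stats Models skeleton: one run-level node with a
# simple design (two conditions plus intercept) and a single t contrast.
model = {
    "Name": "taskGLM_example",
    "BIDSModelVersion": "1.0.0",
    "Input": {"task": ["myTask"]},
    "Nodes": [
        {
            "Level": "Run",
            "Name": "run_level",
            "GroupBy": ["run", "subject"],
            "Model": {"X": ["trial_type.condA", "trial_type.condB", 1], "Type": "glm"},
            "Contrasts": [
                {
                    "Name": "condA_gt_condB",
                    "ConditionList": ["trial_type.condA", "trial_type.condB"],
                    "Weights": [1, -1],
                    "Test": "t",
                }
            ],
        }
    ],
}
smdl_text = json.dumps(model, indent=2)  # what would be written to *.smdl.json
```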

5.2.1.1 Template Models

| File | Analysis Type | Use Case |
| --- | --- | --- |
| model-taskGLM_desc-threeLevel_smdl.json | Task GLM (3-level) | Standard event-related or block design |
| model-singleTrial_desc-betaSeries_smdl.json | Beta series (2-level) | MVPA, RSA (prefer GLMsingle) |
| model-twoGroup_desc-betweenSubjects_smdl.json | Group comparison (3-level) | Between-group contrasts |
| model-restingState_desc-denoiseOnly_smdl.json | Resting-state (2-level) | Nuisance regression (prefer XCP-D) |

5.2.1.2 Validation

# Validate all models
uv run python -c "from libs.bids_statsmodels import validate_model, list_models; [print(f'{m.name}: {validate_model(m)}') for m in list_models()]"

# Generate a model for your task
uv run python -c "from libs.bids_statsmodels import generate_task_model; generate_task_model('myTask', ['condA', 'condB'])"

5.2.1.3 Execution

Models can be executed via:

  • FitLins (container): analyses/fmri/run_fitlins_batch.sh --model model-taskGLM_desc-threeLevel_smdl.json --batch-label study-20260101
  • nilearn (Python): use the model JSON as a reference for building design matrices
  • Manual: use the model JSON as documentation for hand-coded GLM pipelines

5.2.2 Confound Strategy

See config/glm_defaults.example.toml for documented confound presets and libs/confounds.py for the Python interface. Key presets:

| Preset | Parameters | Use For |
| --- | --- | --- |
| minimal | 6 motion | Task GLM, MVPA, ROI analyses |
| moderate | 6 motion + CSF/WM + FD scrub | Whole-brain task GLM |
| aggressive | 24 motion + CSF/WM + FD scrub | High-motion data |
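The presets map to fMRIPrep confound columns roughly as follows. This mapping is an illustrative sketch; the authoritative definitions live in config/glm_defaults.example.toml and libs/confounds.py:

```python
# Illustrative preset -> confound-column mapping (fMRIPrep naming).
# FD-based scrubbing for moderate/aggressive is handled separately.
MOTION6 = ["trans_x", "trans_y", "trans_z", "rot_x", "rot_y", "rot_z"]

PRESETS = {
    "minimal": MOTION6,
    "moderate": MOTION6 + ["csf", "white_matter"],
    "aggressive": (MOTION6                                      # 24-parameter motion:
                   + [f"{c}_derivative1" for c in MOTION6]      # temporal derivatives
                   + [f"{c}_power2" for c in MOTION6]           # quadratics
                   + [f"{c}_derivative1_power2" for c in MOTION6]
                   + ["csf", "white_matter"]),
}
```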

5.2.3 Filter symmetry (the silent-double-removal trap)

Rule: any filter or denoising applied to BOLD must also be applied to the confound regressors before they enter the GLM. If you bandpass-filter BOLD but not the confounds, the regression silently re-introduces the filtered-out variance through the unfiltered confound projection.

This is the same correctness footgun that made HALFpipe’s authors bake filter-symmetry into their pipeline (Waller et al. 2022). Our libs/confounds.py enforces this when you use the documented presets; if you write a custom confound list, apply the same Gaussian / bandpass filter to the confound design matrix that you applied to BOLD.
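A minimal sketch of the symmetry rule, using a crude boxcar high-pass as a stand-in for the pipeline's actual Gaussian/bandpass filter:

```python
import numpy as np

def highpass(ts: np.ndarray, width: int = 21) -> np.ndarray:
    """Crude high-pass: subtract a centered moving average from each column."""
    kernel = np.ones(width) / width
    trend = np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, ts)
    return ts - trend

bold = np.random.default_rng(0).normal(size=(200, 3))       # time x voxels
confounds = np.random.default_rng(1).normal(size=(200, 6))  # time x regressors

# The symmetry rule: identical filter, identical parameters, applied to both.
bold_f = highpass(bold)
confounds_f = highpass(confounds)
```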

5.2.5 Task GLMs and XCP-D — IMPORTANT

Task GLMs must run on fMRIPrep _desc-preproc_bold with confounds in the design matrix, NOT on XCP-D _desc-denoised_bold. The XCP-D paper (Mehta et al. 2024, Imaging Neuroscience, doi:10.1162/imag_a_00257) is explicit:

“XCP-D derivatives are not particularly useful for task-dependent functional connectivity analyses, such as psychophysiological interactions (PPIs) or beta series analyses, and it is not suitable for general task-based analyses, such as standard task GLMs, as nuisance regressors should be included in the GLM step rather than denoising data prior to the GLM.”

Pre-regressing confounds before the GLM removes variance that the task regressors might share with confounds (e.g. motion spikes during a demanding condition), biasing task estimates toward zero.
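A toy simulation makes the attenuation concrete; the data are synthetic and nothing here depends on the repo's code:

```python
import numpy as np

# Task regressor x is correlated with confound c. Pre-regressing c out of y
# (but not out of x) shrinks the task estimate toward zero; the joint GLM,
# with c in the design matrix, recovers the true effect.
rng = np.random.default_rng(42)
n = 500
c = rng.normal(size=n)                        # confound (e.g. motion)
x = 0.7 * c + 0.3 * rng.normal(size=n)        # task regressor, correlated with c
y = 2.0 * x + 1.5 * c + 0.1 * rng.normal(size=n)

# Joint GLM: task regressor and confound fitted together.
X = np.column_stack([x, c])
beta_joint = np.linalg.lstsq(X, y, rcond=None)[0][0]

# Pre-regression: remove c from y first, then fit the task regressor alone.
y_clean = y - c * (c @ y) / (c @ c)
beta_pre = (x @ y_clean) / (x @ x)
# beta_joint recovers ~2.0; beta_pre is strongly attenuated toward zero.
```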

What to do instead:

| Use case | Input | Pipeline |
| --- | --- | --- |
| Task GLM (event-related, block) | fMRIPrep _desc-preproc_bold | make preprocess → make glm MODEL=... (FitLins/nilearn with confounds in design) |
| GLMsingle (single-trial betas) | fMRIPrep _desc-preproc_bold | make preprocess → make glmsingle |
| gPPI (psychophysiological interactions) | fMRIPrep _desc-preproc_bold | Custom nilearn pipeline with confounds + interactions in the design matrix |
| Resting-state functional connectivity | XCP-D _desc-denoised_bold | make preprocess → make denoise → custom FC analysis |
| Static parcel-based FC | XCP-D _desc-denoised_bold | Same as resting-state |

The make all target runs both make denoise and make glm, but the two are independent downstream tracks of make preprocess, not a sequential chain. make glm reads fMRIPrep output directly. For a task-only pipeline, set SKIP_DENOISE=1 to skip the XCP-D step entirely:

make all BATCH_LABEL=my-study MODEL=models/task.smdl.json SKIP_DENOISE=1

Belt-and-suspenders backstop. libs/confounds.load_task_confounds() raises ValueError if you accidentally pass a _desc-denoised_bold file. The error message tells you to switch to _desc-preproc_bold. The check is logged via libs.guardrail_log.log_double_denoising for prospective tracking.
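A simplified re-creation of the backstop's filename check; the real load_task_confounds may inspect more than the filename, and the helper name here is illustrative:

```python
# Illustrative guard mirroring the backstop's behavior: reject XCP-D
# denoised inputs before a task GLM is built on them.

def assert_not_denoised(bold_path: str) -> str:
    if "_desc-denoised_bold" in bold_path:
        raise ValueError(
            "Task GLMs must use fMRIPrep _desc-preproc_bold, not XCP-D "
            "_desc-denoised_bold (confounds belong in the design matrix)."
        )
    return bold_path
```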

5.2.6 Suggested Directory Layout

analyses/fmri/
├── models/                # BIDS Stats Model JSON files (.smdl.json)
├── glm/                   # GLM implementations (GLMsingle, custom)
├── design_matrices/       # Reusable matrix builders
├── stats/                 # GLM, RSA, connectivity modules
├── qc/                    # QC plotting utilities
├── run_fitlins_hpc.sh     # FitLins HPC launcher
├── run_fitlins_batch.sh   # FitLins batch launcher
└── notebooks/             # Jupytext notebooks linked from docs

5.3 Data Collection Protocol

This section documents operational steps for running in-lab and in-scanner sessions.

5.3.1 Before the Session

5.3.2 During the Session

5.3.3 After the Session

5.3.4 Maintenance

  • Version experiment builds (PsychoPy, jsPsych) by tagging releases or storing zipped exports in the data repository.
  • Keep the experiments/ directory clean: archived versions live in the data repository, while this template tracks only the active code branch.
  • Document hardware changes and calibration routines in docs/DOCUMENTATION_INDEX.md so analysts know which configuration applies to each dataset.

5.4 Simulation Studies

Simulations help validate analysis pipelines before running them on real participant data.

5.4.1 Goals

  • Stress-test preprocessing and analysis pipelines under controlled conditions.
  • Estimate statistical power and detection thresholds.
  • Benchmark computational requirements for large-scale runs.
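As one example of the power-estimation goal, a minimal Monte Carlo sketch; the effect size, sample size, and hard-coded critical value are placeholders for your actual design:

```python
import numpy as np

def power_onesample(n=20, effect=0.5, alpha=0.05, n_sims=2000, seed=0):
    """Estimate two-sided one-sample t-test power by simulation."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sims):
        sample = rng.normal(loc=effect, scale=1.0, size=n)
        t = sample.mean() / (sample.std(ddof=1) / np.sqrt(n))
        # 2.09 ~ two-sided critical t for df = 19; a crude constant for the sketch
        if abs(t) > 2.09:
            hits += 1
    return hits / n_sims
```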

5.4.3 Documentation

  • Log each simulation run in the reproducibility checklist below with a short rationale.
  • When a simulation informs study design, reference it in pre-registration materials.

5.4.4 Automation Tips

  • Integrate simulations into CI on a reduced scale (few subjects) to catch regressions.
  • Use pipelines/tasks/ to share common setup/teardown steps between synthetic and real-data workflows.
  • Store large simulation artefacts in the external data repository; keep only code and configs in git.

5.5 Reproducibility Checklist

Use this checklist when porting analysis pipelines into this template.

5.5.1 1. Reproducibility

5.5.2 2. Validation

5.5.3 3. Provenance

5.5.4 4. Collaboration

5.5.5 5. Archival