5 Analysis
This template provides a unified framework for fMRI data collection, preprocessing, and statistical analysis. Every step — from the experimenter’s session protocol to final archival — is designed around BIDS compliance, version-controlled model definitions, and reproducible environments. The sections below consolidate the template’s analysis standards, data collection procedures, simulation workflows, and reproducibility requirements into a single reference.
5.1 Pipeline Overview
Every subject passes through a 6-stage DAG orchestrated by SLURM job dependencies:
```mermaid
flowchart LR
  fmriprep["fmriprep<br/>(subject-level)"]
  validate["validate_fmriprep<br/>(output gate)"]
  mriqc["mriqc"]
  xcpd["xcpd<br/>(rest/FC only)"]
  glmsingle["glmsingle"]
  fitlins["fitlins"]

  fmriprep --> validate
  validate --> mriqc
  validate --> xcpd
  validate --> glmsingle
  validate --> fitlins
```
Nodes are SLURM jobs submitted with sbatch --parsable. Edges are --dependency=afterok:<parent> constraints — SLURM starts a child only when its parent exits 0. A non-zero exit cascades: every descendant is cancelled.
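The chaining itself is plain `sbatch`; a minimal Python sketch of the pattern (job script paths are placeholders, and the real logic lives in `libs/pipeline_dag.py`):

```python
import subprocess

def submit(script: str, *deps: str) -> str:
    """Submit one SLURM job and return its job ID, gated on all parents exiting 0."""
    cmd = ["sbatch", "--parsable"]
    if deps:
        cmd.append("--dependency=afterok:" + ":".join(deps))
    cmd.append(script)
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return out.strip().split(";")[0]  # --parsable may append ";<cluster>"

# fmriprep -> validate gate -> four-way fan-out
fmriprep_id = submit("jobs/fmriprep.sbatch")
validate_id = submit("jobs/validate_fmriprep.sbatch", fmriprep_id)
for child in ("mriqc", "xcpd", "glmsingle", "fitlins"):
    submit(f"jobs/{child}.sbatch", validate_id)
```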
validate_fmriprep is a fast gate that checks that the HTML report, dataset_description.json, and expected output files all exist and are non-empty. If it fails, the four-way fan-out never starts.
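A sketch of what such a gate boils down to, with illustrative paths rather than the template's actual file list:

```python
from pathlib import Path
import sys

def fmriprep_outputs_ok(fmriprep_dir: Path, subject: str) -> bool:
    """True when the key fMRIPrep outputs exist and are non-empty."""
    expected = [
        fmriprep_dir / f"{subject}.html",             # subject HTML report
        fmriprep_dir / "dataset_description.json",
        # ...plus the preproc BOLD / anat derivatives the later stages need
    ]
    return all(p.is_file() and p.stat().st_size > 0 for p in expected)

if __name__ == "__main__":
    ok = fmriprep_outputs_ok(Path(sys.argv[1]), sys.argv[2])
    sys.exit(0 if ok else 1)  # a non-zero exit cancels every downstream job
```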
```bash
# Submit the full DAG for one subject:
make pipeline SUBJECT=sub-01 BATCH_LABEL=my-study MODEL=models/task.smdl.json

# All subjects:
make pipeline-all BATCH_LABEL=my-study MODEL=models/task.smdl.json

# Monitor:
make pipeline-status SUBJECT=sub-01
make pipeline-dag SUBJECT=sub-01   # renders SVG + Mermaid
```
Resumability: Each stage has a sentinel file check. Completed stages are skipped on re-run; only incomplete stages are resubmitted. See `scripts/orchestration/submit_subject_pipeline.sh` for the full flag list (`--skip-xcpd`, `--skip-mriqc`, `--dry-run`, etc.).
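A sketch of the sentinel idea, with a hypothetical sentinel name (the real checks live in `submit_subject_pipeline.sh`):

```python
from pathlib import Path

SENTINEL = ".stage_complete"  # hypothetical file name

def stage_done(stage_dir: Path) -> bool:
    """Skip resubmission when a previous run already finished this stage."""
    return (stage_dir / SENTINEL).exists()

def mark_done(stage_dir: Path) -> None:
    """Write the sentinel only after the stage's outputs have been validated."""
    (stage_dir / SENTINEL).touch()
```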
For the full design rationale (why not snakemake, pydra comparison, sentinel details, adding new stages), see the source code in libs/pipeline_dag.py and scripts/orchestration/submit_subject_pipeline.sh.
5.2 Analysis Standards
This template captures community-friendly fMRI analysis expectations, curated by the Cognitive & Neural Computation Lab and intended for any research group adopting a BIDS-aligned workflow. When adding new analyses under analyses/fmri/, follow these principles:
- Modular notebooks and scripts
  - Keep data loading isolated in helper modules (e.g., `analyses/helpers/dataloaders.py`).
  - Parameterise notebooks via papermill, Jupytext, or CLI wrappers so they can run headlessly.
- Pre-registered contrasts (spec-vs-runner decoupling)
  - Store statistical models and contrast definitions in `.smdl.json` files (per the BIDS Stats Models spec) inside `analyses/fmri/models/`.
  - Include `"$schema"` for editor validation; the repo's `libs.bids_statsmodels.validate_model()` runs the schema check and logs every validation to `logs/guardrail_events.jsonl`.
  - The `.smdl.json` spec is canonical. The runner that executes it is swappable: `libs.bids_statsmodels.fit(model_path, runner=...)` dispatches to nilearn (default, in-process, no container) or fitlins (alternate, container-based reference implementation). `make glm RUNNER=nilearn|fitlins` wraps both paths.
  - Refer to these model files from pipelines instead of hard-coding contrast weights. One file, either runner (see the sketch after this list).
- Cache discipline
  - Use `libs.paths.analysis_cache("fmri", "<artifact>")` for intermediate products (e.g., design matrices, GLM outputs).
  - Never commit NIfTI images or large tables; stage them in the data repository.
- Quality assurance
  - Mirror QC plots under `analysis_cache("fmri", "qc")` with predictable filenames.
  - Provide CLI entrypoints in `pipelines/` that generate QC reports (e.g., PDF or HTML) for each subject/session.
- Reproducible environments
  - Document required container images or module loads (e.g., FSL, AFNI) in the reproducibility checklist below.
  - If using containerised pipelines (fMRIPrep, MRIQC), wrap invocations inside `pipelines/tasks/` for reuse.
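Putting the spec-vs-runner and cache rules together, a pipeline script might look roughly like this; the exact `fit()` arguments beyond the model path and runner are assumptions:

```python
from pathlib import Path

from libs import paths
from libs.bids_statsmodels import fit, validate_model

model = Path("analyses/fmri/models/model-taskGLM_desc-threeLevel_smdl.json")

# Schema check; every validation is logged to logs/guardrail_events.jsonl.
validate_model(model)

# Same spec, swappable runner: nilearn (in-process) or fitlins (container).
results = fit(model, runner="nilearn")  # other arguments omitted here

# Intermediate products go to the analysis cache, never into git.
out_dir = paths.analysis_cache("fmri", "glm")
out_dir.mkdir(parents=True, exist_ok=True)
```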
5.2.1 BIDS Stats Models
The template provides machine-readable statistical model definitions following the BIDS Statistical Models specification.
5.2.1.1 Template Models
| File | Analysis Type | Use Case |
|---|---|---|
| `model-taskGLM_desc-threeLevel_smdl.json` | Task GLM (3-level) | Standard event-related or block design |
| `model-singleTrial_desc-betaSeries_smdl.json` | Beta series (2-level) | MVPA, RSA (prefer GLMsingle) |
| `model-twoGroup_desc-betweenSubjects_smdl.json` | Group comparison (3-level) | Between-group contrasts |
| `model-restingState_desc-denoiseOnly_smdl.json` | Resting-state (2-level) | Nuisance regression (prefer XCP-D) |
5.2.1.2 Validation
```bash
# Validate all models
uv run python -c "from libs.bids_statsmodels import validate_model, list_models; [print(f'{m.name}: {validate_model(m)}') for m in list_models()]"

# Generate a model for your task
uv run python -c "from libs.bids_statsmodels import generate_task_model; generate_task_model('myTask', ['condA', 'condB'])"
```
5.2.1.3 Execution
Models can be executed via:
- FitLins (container): `analyses/fmri/run_fitlins_batch.sh --model model-taskGLM_desc-threeLevel_smdl.json --batch-label study-20260101`
- nilearn (Python): use the model JSON as a reference for building design matrices (see the sketch below)
- Manual: use the model JSON as documentation for hand-coded GLM pipelines
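For the nilearn route, a minimal sketch of a first-level fit (paths, TR, and the contrast string are illustrative; confounds stay in the design matrix, per Section 5.2.5):

```python
import pandas as pd
from nilearn.glm.first_level import FirstLevelModel

# fMRIPrep derivatives (illustrative filenames)
bold = "sub-01_task-myTask_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz"
events = pd.read_csv("sub-01_task-myTask_events.tsv", sep="\t")
confounds = pd.read_csv("sub-01_task-myTask_desc-confounds_timeseries.tsv", sep="\t")
motion = confounds[["trans_x", "trans_y", "trans_z", "rot_x", "rot_y", "rot_z"]]

glm = FirstLevelModel(t_r=2.0, smoothing_fwhm=6, high_pass=1 / 128, hrf_model="spm")
glm = glm.fit(bold, events=events, confounds=motion)

# Conditions and contrast weights come from the .smdl.json spec
zmap = glm.compute_contrast("condA - condB", output_type="z_score")
```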
5.2.2 Confound Strategy
See config/glm_defaults.example.toml for documented confound presets and libs/confounds.py for the Python interface. Key presets:
| Preset | Parameters | Use For |
|---|---|---|
| `minimal` | 6 motion | Task GLM, MVPA, ROI analyses |
| `moderate` | 6 motion + CSF/WM + FD scrub | Whole-brain task GLM |
| `aggressive` | 24 motion + CSF/WM + FD scrub | High-motion data |
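The presets are consumed through `libs/confounds.py`; purely as an illustration, the `moderate` preset amounts to roughly the following selection from the fMRIPrep confounds TSV (column names follow fMRIPrep conventions; the 0.5 mm FD threshold comes from the defaults table below):

```python
import numpy as np
import pandas as pd

confounds = pd.read_csv("sub-01_desc-confounds_timeseries.tsv", sep="\t")

# "moderate": 6 motion parameters + mean CSF/WM signals + FD-based scrubbing
cols = ["trans_x", "trans_y", "trans_z", "rot_x", "rot_y", "rot_z",
        "csf", "white_matter"]
design = confounds[cols].copy()

# One spike regressor per TR whose framewise displacement exceeds 0.5 mm
fd = confounds["framewise_displacement"].fillna(0)
for i, t in enumerate(np.flatnonzero(fd.to_numpy() > 0.5)):
    spike = np.zeros(len(confounds))
    spike[t] = 1.0
    design[f"scrub_{i:03d}"] = spike
```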
5.2.3 Filter symmetry (the silent-double-removal trap)
Rule: any filter or denoising applied to BOLD must also be applied to the confound regressors before they enter the GLM. If you bandpass-filter BOLD but not the confounds, the regression silently re-introduces the filtered-out variance through the unfiltered confound projection.
This is the same correctness footgun that made HALFpipe’s authors bake filter-symmetry into their pipeline (Waller et al. 2022). Our libs/confounds.py enforces this when you use the documented presets; if you write a custom confound list, apply the same Gaussian / bandpass filter to the confound design matrix that you applied to BOLD.
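A sketch of the symmetric treatment using nilearn's `butterworth` helper, assuming a resting-state band-pass and in-memory arrays (the documented presets handle this for you):

```python
import numpy as np
import pandas as pd
from nilearn.signal import butterworth

TR = 2.0  # seconds

bold = np.load("bold_timeseries.npy")  # (n_timepoints, n_voxels), illustrative
confounds = pd.read_csv("confounds.tsv", sep="\t").fillna(0).to_numpy()

# Apply the SAME band-pass to BOLD and to the confound regressors; otherwise
# the unfiltered confounds re-introduce the variance the filter removed.
band = dict(sampling_rate=1.0 / TR, low_pass=0.10, high_pass=0.01)
bold_filt = butterworth(bold, **band)
confounds_filt = butterworth(confounds, **band)

# confounds_filt, not the raw confounds, enter the regression / GLM step
```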
5.2.4 Recommended defaults
These align with HALFpipe + ENIGMA defaults and are the values our config/glm_defaults.example.toml ships with. Override per project, but do so deliberately.
| Parameter | Default | Notes |
|---|---|---|
| Smoothing FWHM | 6 mm | Task GLM standard. Use 4 mm for high-resolution / cortical mapping work, 8 mm for group-level meta-analysis |
| Grand mean scaling | 10000 | Stabilizes scale across runs / subjects |
| Temporal filter (task) | 128 s Gaussian high-pass | Removes slow drifts, preserves task frequencies |
| Temporal filter (resting-state) | 0.01-0.10 Hz bandpass | Standard rs-fMRI band; see also XCP-D defaults |
| Standard space | MNI152NLin2009cAsym | fMRIPrep default; matches FSL atlases |
| Surface space (when used) | fsLR / fsaverage | fMRIPrep --cifti-output 91k for grayordinates |
| ICA-AROMA | OFF by default | Off in template; on in HALFpipe. Turn on with care — interacts with task design |
| Frame censoring (FD threshold) | 0.5 mm (moderate), 0.3 mm (aggressive) | Censor TRs above threshold. Use scrub regressors, not bandpass |
5.2.5 Task GLMs and XCP-D — IMPORTANT
Task GLMs must run on fMRIPrep _desc-preproc_bold with confounds in the design matrix, NOT on XCP-D _desc-denoised_bold. The XCP-D paper (Mehta et al. 2024, Imaging Neuroscience, doi:10.1162/imag_a_00257) is explicit:
“XCP-D derivatives are not particularly useful for task-dependent functional connectivity analyses, such as psychophysiological interactions (PPIs) or beta series analyses, and it is not suitable for general task-based analyses, such as standard task GLMs, as nuisance regressors should be included in the GLM step rather than denoising data prior to the GLM.”
Pre-regressing confounds before the GLM removes variance that the task regressors might share with confounds (e.g. motion spikes during a demanding condition), biasing task estimates toward zero.
What to do instead:
| Use case | Input | Pipeline |
|---|---|---|
| Task GLM (event-related, block) | fMRIPrep `_desc-preproc_bold` | `make preprocess` → `make glm MODEL=...` (FitLins/nilearn with confounds in design) |
| GLMsingle (single-trial betas) | fMRIPrep `_desc-preproc_bold` | `make preprocess` → `make glmsingle` |
| gPPI (psychophysiological interactions) | fMRIPrep `_desc-preproc_bold` | Custom nilearn pipeline with confounds + interactions in the design matrix |
| Resting-state functional connectivity | XCP-D `_desc-denoised_bold` | `make preprocess` → `make denoise` → custom FC analysis |
| Static parcel-based FC | XCP-D `_desc-denoised_bold` | Same as resting-state |
The make all target runs both make denoise and make glm, but the two are independent downstream tracks of make preprocess, not a sequential chain. make glm reads fMRIPrep output directly. For a task-only pipeline, set SKIP_DENOISE=1 to skip the XCP-D step entirely:
```bash
make all BATCH_LABEL=my-study MODEL=models/task.smdl.json SKIP_DENOISE=1
```
Belt-and-suspenders backstop: `libs/confounds.load_task_confounds()` raises `ValueError` if you accidentally pass a `_desc-denoised_bold` file. The error message tells you to switch to `_desc-preproc_bold`. The check is logged via `libs.guardrail_log.log_double_denoising` for prospective tracking.
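Conceptually the backstop is a filename guard; a rough, hypothetical sketch of that logic (the real helper also logs the event, as noted above):

```python
from pathlib import Path

def assert_task_glm_input(bold_path: str) -> None:
    """Refuse XCP-D denoised inputs for task GLMs (sketch of the guard's logic)."""
    if "_desc-denoised_bold" in Path(bold_path).name:
        raise ValueError(
            "Task GLMs need fMRIPrep _desc-preproc_bold with confounds in the "
            "design matrix; got an XCP-D _desc-denoised_bold file (double denoising)."
        )
```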
5.2.6 Suggested Directory Layout
```text
analyses/fmri/
├── models/               # BIDS Stats Model JSON files (.smdl.json)
├── glm/                  # GLM implementations (GLMsingle, custom)
├── design_matrices/      # Reusable matrix builders
├── stats/                # GLM, RSA, connectivity modules
├── qc/                   # QC plotting utilities
├── run_fitlins_hpc.sh    # FitLins HPC launcher
├── run_fitlins_batch.sh  # FitLins batch launcher
└── notebooks/            # Jupytext notebooks linked from docs
```
5.3 Data Collection Protocol
This section documents operational steps for running in-lab and in-scanner sessions.
5.3.1 Before the Session
5.3.2 During the Session
5.3.3 After the Session
5.3.4 Maintenance
- Version experiment builds (PsychoPy, jsPsych) by tagging releases or storing zipped exports in the data repository.
- Keep the `experiments/` directory clean: archived versions live in the data repository, while this template tracks only the active code branch.
- Document hardware changes and calibration routines in `docs/DOCUMENTATION_INDEX.md` so analysts know which configuration applies to each dataset.
5.4 Simulation Studies
Simulations help validate analysis pipelines before running them on real participant data.
5.4.1 Goals
- Stress-test preprocessing and analysis pipelines under controlled conditions.
- Estimate statistical power and detection thresholds.
- Benchmark computational requirements for large-scale runs.
5.4.2 Recommended Workflow
1. Design simulation modules under `analyses/helpers/simulations.py` or a dedicated subpackage.
2. Generate synthetic datasets in the analysis cache:

   ```python
   from libs import paths

   sim_root = paths.analysis_cache("simulations", "2025-11-validation")
   sim_root.mkdir(parents=True, exist_ok=True)
   ```
3. Reuse preprocessing pipelines by pointing their inputs to the synthetic data directory.
4. Record parameters (noise levels, effect sizes, design matrices) in YAML/JSON sidecars stored alongside the synthetic outputs (see the sketch after this list).
5. Summarise outcomes (true positives, false positives, ROC curves) and export plots to `analysis_cache("simulations", "figures")`.
5.4.3 Documentation
- Log each simulation run in the reproducibility checklist below with a short rationale.
- When a simulation informs study design, reference it in pre-registration materials.
5.4.4 Automation Tips
- Integrate simulations into CI on a reduced scale (few subjects) to catch regressions.
- Use `pipelines/tasks/` to share common setup/teardown steps between synthetic and real-data workflows.
- Store large simulation artefacts in the external data repository; keep only code and configs in git.
5.5 Reproducibility Checklist
Use this checklist when porting analysis pipelines into this template.