5  Analysis

This template provides a unified framework for fMRI data collection, preprocessing, and statistical analysis. Every step — from the experimenter’s session protocol to final archival — is designed around BIDS compliance, version-controlled model definitions, and reproducible environments. The sections below consolidate the template’s analysis standards, data collection procedures, simulation workflows, and reproducibility requirements into a single reference.

5.1 Pipeline Overview

Every subject passes through a 6-stage DAG orchestrated by SLURM job dependencies:

flowchart LR
    fmriprep["fmriprep<br/>(subject-level)"]
    validate["validate_fmriprep<br/>(output gate)"]
    mriqc["mriqc"]
    xcpd["xcpd<br/>(rest/FC only)"]
    glmsingle["glmsingle"]
    fitlins["fitlins"]
    fmriprep --> validate
    validate --> mriqc
    validate --> xcpd
    validate --> glmsingle
    validate --> fitlins

Nodes are SLURM jobs submitted with sbatch --parsable. Edges are --dependency=afterok:<parent> constraints — SLURM starts a child only when its parent exits 0. A non-zero exit cascades: every descendant is cancelled.

validate_fmriprep is a fast gate that checks that the HTML report, dataset_description.json, and expected output files all exist and are non-empty. If it fails, the 4-way fan-out never starts.
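As a rough sketch of what such a gate checks (file names and the helper are illustrative; the real gate lives in the pipeline scripts):

```python
# Hypothetical mirror of the validate_fmriprep gate; the actual implementation
# may check more outputs than these two.
from pathlib import Path

def fmriprep_outputs_ok(deriv_dir: Path, subject: str) -> bool:
    """Return True only if the key fMRIPrep outputs exist and are non-empty."""
    expected = [
        deriv_dir / "dataset_description.json",
        deriv_dir / f"{subject}.html",  # subject-level HTML report
    ]
    return all(p.is_file() and p.stat().st_size > 0 for p in expected)
```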

# Submit the full DAG for one subject:
make pipeline SUBJECT=sub-01 BATCH_LABEL=my-study MODEL=models/task.smdl.json

# All subjects:
make pipeline-all BATCH_LABEL=my-study MODEL=models/task.smdl.json

# Monitor:
make pipeline-status SUBJECT=sub-01
make pipeline-dag SUBJECT=sub-01          # renders SVG + Mermaid

Resumability: Each stage has a sentinel file check. Completed stages are skipped on re-run; only incomplete stages are resubmitted. See scripts/orchestration/submit_subject_pipeline.sh for the full flag list (--skip-xcpd, --skip-mriqc, --dry-run, etc.).
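The sentinel pattern amounts to something like the following; the `.done` naming and the helper are illustrative, not the exact API in libs/pipeline_dag.py:

```python
# Illustrative sentinel-file check: a stage with an existing sentinel is
# considered complete and is skipped on re-run.
from pathlib import Path

def stages_to_run(stages, sentinel_dir: Path):
    """Skip any stage whose sentinel file already exists; keep the rest."""
    return [s for s in stages if not (sentinel_dir / f"{s}.done").exists()]
```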

For the full design rationale (why not snakemake, pydra comparison, sentinel details, adding new stages), see the source code in libs/pipeline_dag.py and scripts/orchestration/submit_subject_pipeline.sh.

5.2 Analysis Standards

This template captures community-standard fMRI analysis expectations, curated by the Cognitive & Neural Computation Lab and intended for any research group adopting a BIDS-aligned workflow. When adding new analyses under analyses/fmri/, follow these principles:

  1. Modular notebooks and scripts
    • Keep data loading isolated in helper modules (e.g., analyses/helpers/dataloaders.py).
    • Parameterise notebooks via papermill, Jupytext, or CLI wrappers so they can run headlessly.
  2. Pre-registered contrasts (spec-vs-runner decoupling)
    • Store statistical models and contrast definitions in .smdl.json files (per BIDS Stats Models spec) inside analyses/fmri/models/.
    • Include "$schema" for editor validation; the repo’s libs.bids_statsmodels.validate_model() runs the schema check and logs every validation to logs/guardrail_events.jsonl.
    • The .smdl.json spec is canonical. The runner that executes it is swappable: libs.bids_statsmodels.fit(model_path, runner=...) dispatches to nilearn (default, in-process, no container) or fitlins (alternate, container-based reference implementation). make glm RUNNER=nilearn|fitlins wraps both paths.
    • Refer to these model files from pipelines instead of hard-coding contrast weights. One file, either runner.
  3. Cache discipline
    • Use libs.paths.analysis_cache("fmri", "<artifact>") for intermediate products (e.g., design matrices, GLM outputs).
    • Never commit NIfTI images or large tables; stage them in the data repository.
  4. Quality assurance
    • Mirror QC plots under analysis_cache("fmri", "qc") with predictable filenames.
    • Provide CLI entrypoints in pipelines/ that generate QC reports (e.g., PDF or HTML) for each subject/session.
  5. Reproducible environments
    • Document required container images or module loads (e.g., FSL, AFNI) in the reproducibility checklist below.
    • If using containerised pipelines (fMRIPrep, MRIQC), wrap invocations inside pipelines/tasks/ for reuse.
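For orientation, a hypothetical stand-in for the cache helper mentioned under cache discipline (the real libs.paths.analysis_cache signature and root may differ):

```python
# Hypothetical stand-in for libs.paths.analysis_cache: intermediate products
# land under a per-domain, per-artifact directory outside version control.
from pathlib import Path

CACHE_ROOT = Path("cache/analyses")  # illustrative root, not tracked by git

def analysis_cache(domain: str, artifact: str) -> Path:
    """Return (and create) a per-domain cache directory for intermediate products."""
    path = CACHE_ROOT / domain / artifact
    path.mkdir(parents=True, exist_ok=True)
    return path
```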

5.2.1 BIDS Stats Models

The template provides machine-readable statistical model definitions following the BIDS Statistical Models specification.
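A minimal model skeleton, expressed here as a Python dict for illustration; the task and condition names are placeholders, and this is not one of the shipped template models:

```python
import json

# Minimal illustrative BIDS Stats Models skeleton: one run-level node with a
# simple design (two conditions plus intercept) and a single t contrast.
model = {
    "Name": "taskGLM_example",
    "BIDSModelVersion": "1.0.0",
    "Input": {"task": ["myTask"]},
    "Nodes": [
        {
            "Level": "Run",
            "Name": "run_level",
            "GroupBy": ["run", "subject"],
            "Model": {"X": ["trial_type.condA", "trial_type.condB", 1], "Type": "glm"},
            "Contrasts": [
                {
                    "Name": "condA_gt_condB",
                    "ConditionList": ["trial_type.condA", "trial_type.condB"],
                    "Weights": [1, -1],
                    "Test": "t",
                }
            ],
        }
    ],
}
smdl_text = json.dumps(model, indent=2)  # what would be written to *.smdl.json
```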

5.2.1.1 Template Models

| File | Analysis Type | Use Case |
| --- | --- | --- |
| model-taskGLM_desc-threeLevel_smdl.json | Task GLM (3-level) | Standard event-related or block design |
| model-singleTrial_desc-betaSeries_smdl.json | Beta series (2-level) | MVPA, RSA (prefer GLMsingle) |
| model-twoGroup_desc-betweenSubjects_smdl.json | Group comparison (3-level) | Between-group contrasts |
| model-restingState_desc-denoiseOnly_smdl.json | Resting-state (2-level) | Nuisance regression (prefer XCP-D) |

5.2.1.2 Validation

# Validate all models
uv run python -c "from libs.bids_statsmodels import validate_model, list_models; [print(f'{m.name}: {validate_model(m)}') for m in list_models()]"

# Generate a model for your task
uv run python -c "from libs.bids_statsmodels import generate_task_model; generate_task_model('myTask', ['condA', 'condB'])"

5.2.1.3 Execution

Models can be executed via:

  • FitLins (container): analyses/fmri/run_fitlins_batch.sh --model model-taskGLM_desc-threeLevel_smdl.json --batch-label study-20260101
  • nilearn (Python): use the model JSON as a reference for building design matrices
  • Manual: use the model JSON as documentation for hand-coded GLM pipelines

5.2.2 Confound Strategy

See config/glm_defaults.example.toml for documented confound presets and libs/confounds.py for the Python interface. Key presets:

| Preset | Parameters | Use For |
| --- | --- | --- |
| minimal | 6 motion | Task GLM, MVPA, ROI analyses |
| moderate | 6 motion + CSF/WM + FD scrub | Whole-brain task GLM |
| aggressive | 24 motion + CSF/WM + FD scrub | High-motion data |
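The presets map to fMRIPrep confound columns roughly as follows. This mapping is an illustrative sketch; the authoritative definitions live in config/glm_defaults.example.toml and libs/confounds.py:

```python
# Illustrative preset -> confound-column mapping (fMRIPrep naming).
# FD-based scrubbing for moderate/aggressive is handled separately.
MOTION6 = ["trans_x", "trans_y", "trans_z", "rot_x", "rot_y", "rot_z"]

PRESETS = {
    "minimal": MOTION6,
    "moderate": MOTION6 + ["csf", "white_matter"],
    "aggressive": (MOTION6                                      # 24-parameter motion:
                   + [f"{c}_derivative1" for c in MOTION6]      # temporal derivatives
                   + [f"{c}_power2" for c in MOTION6]           # quadratics
                   + [f"{c}_derivative1_power2" for c in MOTION6]
                   + ["csf", "white_matter"]),
}
```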

5.2.3 Filter symmetry (the silent-double-removal trap)

Rule: any filter or denoising applied to BOLD must also be applied to the confound regressors before they enter the GLM. If you bandpass-filter BOLD but not the confounds, the regression silently re-introduces the filtered-out variance through the unfiltered confound projection.

This is the same correctness footgun that made HALFpipe’s authors bake filter-symmetry into their pipeline (Waller et al. 2022). Our libs/confounds.py enforces this when you use the documented presets; if you write a custom confound list, apply the same Gaussian / bandpass filter to the confound design matrix that you applied to BOLD.
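A minimal sketch of the symmetry rule, using a crude boxcar high-pass as a stand-in for the pipeline's actual Gaussian/bandpass filter:

```python
import numpy as np

def highpass(ts: np.ndarray, width: int = 21) -> np.ndarray:
    """Crude high-pass: subtract a centered moving average from each column."""
    kernel = np.ones(width) / width
    trend = np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, ts)
    return ts - trend

bold = np.random.default_rng(0).normal(size=(200, 3))       # time x voxels
confounds = np.random.default_rng(1).normal(size=(200, 6))  # time x regressors

# The symmetry rule: identical filter, identical parameters, applied to both.
bold_f = highpass(bold)
confounds_f = highpass(confounds)
```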

5.2.5 Task GLMs and XCP-D — IMPORTANT

Task GLMs must run on fMRIPrep _desc-preproc_bold with confounds in the design matrix, NOT on XCP-D _desc-denoised_bold. The XCP-D paper (Mehta et al. 2024, Imaging Neuroscience, doi:10.1162/imag_a_00257) is explicit:

“XCP-D derivatives are not particularly useful for task-dependent functional connectivity analyses, such as psychophysiological interactions (PPIs) or beta series analyses, and it is not suitable for general task-based analyses, such as standard task GLMs, as nuisance regressors should be included in the GLM step rather than denoising data prior to the GLM.”

Pre-regressing confounds before the GLM removes variance that the task regressors might share with confounds (e.g. motion spikes during a demanding condition), biasing task estimates toward zero.
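A toy simulation makes the attenuation concrete; the data are synthetic and nothing here depends on the repo's code:

```python
import numpy as np

# Task regressor x is correlated with confound c. Pre-regressing c out of y
# (but not out of x) shrinks the task estimate toward zero; the joint GLM,
# with c in the design matrix, recovers the true effect.
rng = np.random.default_rng(42)
n = 500
c = rng.normal(size=n)                        # confound (e.g. motion)
x = 0.7 * c + 0.3 * rng.normal(size=n)        # task regressor, correlated with c
y = 2.0 * x + 1.5 * c + 0.1 * rng.normal(size=n)

# Joint GLM: task regressor and confound fitted together.
X = np.column_stack([x, c])
beta_joint = np.linalg.lstsq(X, y, rcond=None)[0][0]

# Pre-regression: remove c from y first, then fit the task regressor alone.
y_clean = y - c * (c @ y) / (c @ c)
beta_pre = (x @ y_clean) / (x @ x)
# beta_joint recovers ~2.0; beta_pre is strongly attenuated toward zero.
```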

What to do instead:

| Use case | Input | Pipeline |
| --- | --- | --- |
| Task GLM (event-related, block) | fMRIPrep _desc-preproc_bold | make preprocess → make glm MODEL=... (FitLins/nilearn with confounds in design) |
| GLMsingle (single-trial betas) | fMRIPrep _desc-preproc_bold | make preprocess → make glmsingle |
| gPPI (psychophysiological interactions) | fMRIPrep _desc-preproc_bold | Custom nilearn pipeline with confounds + interactions in the design matrix |
| Resting-state functional connectivity | XCP-D _desc-denoised_bold | make preprocess → make denoise → custom FC analysis |
| Static parcel-based FC | XCP-D _desc-denoised_bold | Same as resting-state |

The make all target runs both make denoise and make glm, but the two are independent downstream tracks of make preprocess, not a sequential chain. make glm reads fMRIPrep output directly. For a task-only pipeline, set SKIP_DENOISE=1 to skip the XCP-D step entirely:

make all BATCH_LABEL=my-study MODEL=models/task.smdl.json SKIP_DENOISE=1

Belt-and-suspenders backstop. libs/confounds.load_task_confounds() raises ValueError if you accidentally pass a _desc-denoised_bold file. The error message tells you to switch to _desc-preproc_bold. The check is logged via libs.guardrail_log.log_double_denoising for prospective tracking.
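A simplified re-creation of the backstop's filename check; the real load_task_confounds may inspect more than the filename, and the helper name here is illustrative:

```python
# Illustrative guard mirroring the backstop's behavior: reject XCP-D
# denoised inputs before a task GLM is built on them.

def assert_not_denoised(bold_path: str) -> str:
    if "_desc-denoised_bold" in bold_path:
        raise ValueError(
            "Task GLMs must use fMRIPrep _desc-preproc_bold, not XCP-D "
            "_desc-denoised_bold (confounds belong in the design matrix)."
        )
    return bold_path
```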

5.2.6 Suggested Directory Layout

analyses/fmri/
├── models/                # BIDS Stats Model JSON files (.smdl.json)
├── glm/                   # GLM implementations (GLMsingle, custom)
├── design_matrices/       # Reusable matrix builders
├── stats/                 # GLM, RSA, connectivity modules
├── qc/                    # QC plotting utilities
├── run_fitlins_hpc.sh     # FitLins HPC launcher
├── run_fitlins_batch.sh   # FitLins batch launcher
└── notebooks/             # Jupytext notebooks linked from docs

5.3 Data Collection Protocol

This section documents operational steps for running in-lab and in-scanner sessions.

5.3.1 Before the Session

5.3.2 During the Session

5.3.3 After the Session

5.3.4 Maintenance

  • Version experiment builds (PsychoPy, jsPsych) by tagging releases or storing zipped exports in the data repository.
  • Keep the experiments/ directory clean: archived versions live in the data repository, while this template tracks only the active code branch.
  • Document hardware changes and calibration routines in docs/DOCUMENTATION_INDEX.md so analysts know which configuration applies to each dataset.

5.4 Simulation Studies

Simulations help validate analysis pipelines before running them on real participant data.

5.4.1 Goals

  • Stress-test preprocessing and analysis pipelines under controlled conditions.
  • Estimate statistical power and detection thresholds.
  • Benchmark computational requirements for large-scale runs.
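As one example of the power-estimation goal, a minimal Monte Carlo sketch; the effect size, sample size, and hard-coded critical value are placeholders for your actual design:

```python
import numpy as np

def power_onesample(n=20, effect=0.5, alpha=0.05, n_sims=2000, seed=0):
    """Estimate two-sided one-sample t-test power by simulation."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sims):
        sample = rng.normal(loc=effect, scale=1.0, size=n)
        t = sample.mean() / (sample.std(ddof=1) / np.sqrt(n))
        # 2.09 ~ two-sided critical t for df = 19; a crude constant for the sketch
        if abs(t) > 2.09:
            hits += 1
    return hits / n_sims
```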

5.4.3 Documentation

  • Log each simulation run in the reproducibility checklist below with a short rationale.
  • When a simulation informs study design, reference it in pre-registration materials.

5.4.4 Automation Tips

  • Integrate simulations into CI on a reduced scale (few subjects) to catch regressions.
  • Use pipelines/tasks/ to share common setup/teardown steps between synthetic and real-data workflows.
  • Store large simulation artefacts in the external data repository; keep only code and configs in git.

5.5 Reproducibility Checklist

Use this checklist when porting analysis pipelines into this template.

5.5.1 1. Reproducibility

5.5.2 2. Validation

5.5.3 3. Provenance

5.5.4 4. Collaboration

5.5.5 5. Archival