14 Research Journal — Reproducible-fMRI
Scope: Template-only. Captures design decisions, validation results, and project status for the framework itself and the planned eLife paper. Not synced to child repos.
Last Updated: 2026-04-27
Project: Reproducible-fMRI — an open-source framework for reproducible neuroimaging
Target Publication: eLife (open science / tools & resources)
Repository: CNClaboratory/Reproducible-fMRI
14.1 Project Overview
Research Question: Can a template-based framework with standardized pipelines, configuration systems, and machine-readable statistical models substantially improve reproducibility in fMRI research?
Approach:
- Template repository (Tier 3) that propagates infrastructure to child research projects
- Two-tier TOML configuration system separating code from data paths
- BIDS Stats Models (.smdl.json) for machine-readable GLM specifications
- Standardized pipeline scripts for fMRIPrep, MRIQC, XCP-D, GLMsingle, FitLins
- Confound strategy framework with documented presets (minimal/moderate/aggressive)
- HPC-optimized SLURM workflows with resource probing
Key Innovation: Moving from “best practices documentation” to an executable, propagatable template system where infrastructure changes flow from template to child repos, ensuring consistency across studies.
Validation: 4 active child projects spanning different analysis types:
- twcf — Task GLM (figure-ground segregation, N=27, 4 tasks)
- vividness — Full pipeline: MRIQC→fMRIPrep→XCP-D→GLMsingle→DA
- TI_DecNef — Neurofeedback with ROI-based analysis
- Hypergraphsciousness — EEG-fMRI fusion, hypergraph neural networks
14.2 Project Status
14.2.1 Completed Components
14.2.1.1 1. Core Infrastructure
- Path system (`libs/paths.py`): Two-tier TOML config (`paths.roots` + `paths.locations`)
- Confound framework (`libs/confounds.py`): Three presets with task-appropriate defaults
- BIDS Stats Models (`libs/bids_statsmodels.py`): Validation, generation, discovery
- Configuration presets: 4 site-specific presets (uci, ucr, neu, local) + 1 multi-site template, all following the canonical `<lab-root>/<user>/repos/<repo>` + `<lab-root>/Projects/<project>/<dataset>` layout
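As an illustration of the two-tier idea, here is a minimal sketch of how a `[paths.roots]` + `[paths.locations]` lookup could compose (hypothetical helper and config keys — the real `libs/paths.py` API differs, but the `base::subpath` composition is the same idea):

```python
import tomllib  # standard library on Python 3.11+; use tomli on 3.10
from pathlib import Path

def resolve_location(config_text: str, name: str) -> Path:
    """Join a machine-specific root (tier 1) with a portable subpath (tier 2)."""
    cfg = tomllib.loads(config_text)
    root_name, _, subpath = cfg["paths"]["locations"][name].partition("::")
    return Path(cfg["paths"]["roots"][root_name]) / subpath

example = """
[paths.roots]
lab = "/dfs10/meganakp_lab"

[paths.locations]
rawdata = "lab::Projects/vividness/main-cohort/rawdata"
"""

print(resolve_location(example, "rawdata"))
# /dfs10/meganakp_lab/Projects/vividness/main-cohort/rawdata
```

Only the roots table changes between machines; every location entry stays portable across sites.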
14.2.1.2 2. Pipeline Scripts (13 total)
- fMRIPrep: batch, HPC, local, smoke (4 scripts)
- MRIQC: batch, HPC (2 scripts)
- XCP-D: batch, HPC (2 scripts)
- GLMsingle: batch, HPC (2 scripts)
- FitLins: batch, HPC (2 scripts)
- Resource probe utility (1 script)
14.2.1.3 3. Template BIDS Stats Models (4 models)
- `model-taskGLM_desc-threeLevel_smdl.json` — Standard event-related/block GLM
- `model-singleTrial_desc-betaSeries_smdl.json` — MVPA/RSA beta series
- `model-twoGroup_desc-betweenSubjects_smdl.json` — Between-group contrasts
- `model-restingState_desc-denoiseOnly_smdl.json` — Resting-state nuisance regression
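For orientation, a minimal run-level skeleton in the BIDS Stats Models format these templates follow (field names per the spec as I understand it; task and condition names are invented, and the shipped templates are richer):

```python
import json

# Minimal single-node BIDS Stats Models spec (illustrative values only).
model = {
    "Name": "taskGLM_example",
    "BIDSModelVersion": "1.0.0",
    "Input": {"task": ["figureground"]},
    "Nodes": [{
        "Level": "Run",
        "Name": "run",
        "GroupBy": ["run", "subject"],
        "Model": {"Type": "glm", "X": ["trial_type.target", "trial_type.baseline", 1]},
        "Contrasts": [{
            "Name": "target_gt_baseline",
            "ConditionList": ["trial_type.target", "trial_type.baseline"],
            "Weights": [1, -1],
            "Test": "t",
        }],
    }],
}
print(json.dumps(model, indent=2))  # the kind of payload a .smdl.json carries
```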
14.2.1.4 4. Documentation (25 files)
- Quickstart, researcher setup, HPC best practices
- Configuration system, data management, git workflow
- fMRI analysis standards, scientific analysis checklist
- Template maintenance, grant documentation
14.2.1.5 5. Test Suite
- `tests/test_bids_statsmodels.py` — Model discovery, loading, validation, generation
- `tests/test_confounds.py` — Confound preset resolution, TOML parsing
- `tests/test_repos.py` — Repository management utilities
- pytest markers: unit, integration, slow, bids_models
14.2.1.6 6. Child Repo Ecosystem
- Template propagation via `scripts/deploy/sync_pipeline_scripts.sh`
- `.claude/template-config.json` linking children to template
- AGENTS.md (1053 lines) as master context for AI-assisted development
- Claude skills: neuro-viz, spatial-ops, data-analytics, claudeception
14.3 Active Manuscripts
14.3.1 M1: Reproducible-fMRI Toolbox Paper
Target: eLife (Tools & Resources)
Status: Stage 2 first draft complete
Working title: “Reproducible-fMRI: Closing the reproducibility gap between tools and practice in neuroimaging”
Core framing (from brainstorming): Reproducibility failures aren’t primarily about individual tools — they’re about the configuration and integration layer between tools. fMRIPrep is reproducible. BIDS is standardized. But the decisions connecting them are typically undocumented. Our contribution makes that integration layer explicit, propagatable, validatable, and versionable.
Manuscript structure (IMRAD):
| Section | Status | Notes |
|---|---|---|
| Abstract | First draft | ~280 words, structured Background→Gap→Method→Results→Conclusion |
| Introduction | First draft | ~1,800 words, 6 paragraphs |
| Methods | First draft | 10 subsections, ~2,100 words |
| Results | First draft | 6 subsections, ~1,700 words, 5 tables |
| Discussion | First draft | 7 paragraphs, ~2,300 words |
Key arguments (refined):
1. The last mile problem: Individual tools solve individual steps — but integration remains manual, undocumented, and inconsistent
2. Configuration drift: Even within one lab, projects diverge over time. No existing tool prevents this
3. Machine-readable decisions: .smdl.json models + confound presets capture exactly the decisions Botvinik-Nezer 2020 showed drive variability
4. Template propagation: Infrastructure-as-Code for science — changes flow from template to child repos
5. Multi-project governance: Not a single-project tool but a system for lab-wide consistency
AI angle — “The coming AI reproducibility crisis” (Discussion P4): Three vectors compound the existing crisis: (1) AI as scrutinizer — automated review beyond human capacity will expose undocumented decisions; (2) AI as producer — mass-produced AI research amplifies noise unless pipelines constrain quality; (3) AI as consumer — foundation labs processing all of scientific literature need machine-readable inputs or they’ll propagate garbage. No moral judgment — just preparation for inevitability. Machine-readable configs serve both human and AI reproducers.
Adversarial collaboration angle (Introduction): Author’s 2.5-year experience in large multi-site adversarial collaborations testing theories of consciousness (designing paradigms, building pipelines, arbitrating theories, analyzing). These collaborations are the hardest test case for reproducibility: pipeline inconsistency across labs could be confounded with theoretical predictions. Lived experience motivating the framework.
Prospective tracking (Methods/Results): Set up measurement infrastructure NOW (sync event logging, guardrail activation logging, config drift detection via git). Report longitudinal data honestly — repos are actively developing. “Over N months, we observed X sync events, Y guardrail activations, Z% configuration alignment.”
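A minimal sketch of the JSONL event logging this implies (hypothetical file location and field names; the real helpers live in the template’s libs/):

```python
import json, time
from pathlib import Path

LOG = Path("logs/prospective_events.jsonl")  # hypothetical location

def log_event(kind: str, **fields) -> None:
    """Append one machine-readable event; logging must never break a pipeline."""
    try:
        LOG.parent.mkdir(parents=True, exist_ok=True)
        record = {"ts": time.strftime("%Y-%m-%dT%H:%M:%S"), "kind": kind, **fields}
        with LOG.open("a") as fh:
            fh.write(json.dumps(record) + "\n")
    except OSError:
        pass  # silent failure by design — tracking is observability, not control

log_event("sync", repo="vividness", files_updated=11)
log_event("guardrail", category="double_denoising", action="blocked")
```

Because each line is standalone JSON, the longitudinal counts reported in the manuscript reduce to counting lines by kind.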
14.4 Figures
No figures generated yet.
| ID | Description | Path | Status |
|---|---|---|---|
| F1 | Template hierarchy diagram (Tier 1-4) | — | Planned |
| F2 | Pipeline flow: MRIQC→fMRIPrep→{XCP-D, GLMsingle, FitLins} | — | Planned |
| F3 | Configuration system architecture (TOML two-tier) | — | Planned |
| F4 | Confound strategy decision tree | — | Planned |
| F5 | Child repo validation: consistency metrics across 4 projects | — | Planned |
14.5 Key Findings
No formal findings yet — this is an infrastructure/methods paper. Key evidence to collect (config consistency audit, time-to-first-analysis, decision documentation completeness, error-prevention logs) is tracked in the Open Questions table below.
14.6 Open Questions
| Question | Context | Priority | Resolution |
|---|---|---|---|
| What reproducibility metrics? | eLife wants quantitative evidence | High | Config consistency audit, time-to-first-analysis, decision documentation completeness, error prevention logs |
| Include AI/AGENTS.md? | Novel but risky for reviewers | Medium | Yes, Discussion only — as “AI-readable reproducibility context”, forward-looking |
| Package before paper? | Paper validates template | Medium | Paper first — package is future work in Discussion |
| Which child repos? | Need diverse validation | Medium | All four — diversity (GLM, connectivity, neurofeedback, multimodal) is the strength |
| Executable manuscript? | eLife supports Quarto/MyST | Low | Defer — standard submission first |
| Is a template publishable? | Reviewer may say “just a GitHub repo” | High | Yes — BIDS itself was published. Frame as: the concept + implementation, not “just a repo” |
14.6.1 Resolved from Brainstorming
- Framing: “Closing the reproducibility gap between tools and practice” (not “workflow framework” or “template system”)
- Novelty: (1) machine-readable analysis decisions, (2) configuration drift prevention, (3) confound guardrails
- Differentiator: Multi-project governance — competitors are all single-project tools
- Preempt W1 (“just a repo”): BIDS spec itself was published. Standards ARE contributions.
- Preempt W2 (“N=4 is small”): Diversity across analysis types, not count, is the strength.
- Preempt W5 (“why not Snakemake”): Workflow engines solve execution ordering; we solve decision documentation and propagation. Complementary.
14.7 Review Notes
14.8 Literature Base
14.8.1 Reproducibility Crisis (motivating problem)
- Eklund et al. 2016 — Cluster failure: false-positive rates up to 70% (PNAS)
- Carp 2012 — 6,912 unique pipelines from one dataset (Frontiers Neurosci)
- Botvinik-Nezer et al. 2020 — 70 teams, same data, divergent conclusions (Nature) [KEY REFERENCE]
- Bowring et al. 2019 — Software choice alone produces Dice 0.000-0.743 (HBM)
- Li et al. 2024 — Five pipelines, only moderate agreement (Nat Hum Behav)
- Poldrack et al. 2017 — Scanning the horizon: best practices roadmap (Nat Rev Neurosci)
- Nichols et al. 2017 — COBIDAS reporting standards (Nat Neurosci)
- Kennedy et al. 2019 — ReproNim: machine-readable provenance (Front Neuroinform)
- Marek et al. 2022 — Reproducible brain-wide associations need thousands of individuals (Nature)
- Botvinik-Nezer & Wager 2023 — Reproducibility review: standardized pipelines are most promising (Biol Psych CNNI)
- Steegen et al. 2016 — Multiverse analysis concept (Perspectives on Psych Sci)
14.8.2 Existing Tools & Standards (what we build on)
- Gorgolewski et al. 2016 — BIDS specification (Sci Data)
- Poldrack et al. 2024 — BIDS past/present/future (Imaging Neurosci)
- Gorgolewski et al. 2017 — BIDS Apps containerization (PLOS Comp Bio)
- Esteban et al. 2019 — fMRIPrep (Nature Methods)
- Esteban et al. 2017 — MRIQC (PLOS ONE)
- Esteban et al. 2020 — NiPreps ecosystem (OSF)
- Mehta et al. 2024 — XCP-D (Imaging Neurosci)
- Prince et al. 2022 — GLMsingle (eLife)
- Markiewicz et al. — FitLins / BIDS Stats Models (BEP002)
- Gorgolewski et al. 2011 — Nipype (Front Neuroinform)
- Ciric et al. 2017 — Confound strategy benchmarking (NeuroImage)
- Parkes et al. 2018 — Motion correction evaluation (NeuroImage)
14.8.4 Adversarial Collaborations & Consciousness Science
- Kahneman 2003 — Original adversarial collaboration concept (Am Psychologist 58:723–730)
- Cogitate Consortium et al. 2025 — Adversarial testing of GNW and IIT theories of consciousness (Nature 642:133–142) [KEY — published results]
- Melloni et al. 2023 — COGITATE adversarial collaboration protocol (PLOS ONE)
- Potgieter 2024 — ARC structured adversarial collaboration process; $30M TWCF portfolio (OSF Preprints)
- Templeton World Charity Foundation — ARC-FOHO and ARC-ETHOS programs (arc-foho.org, arc-ethos.org)
14.8.5 AI in Science (Discussion — AI reproducibility crisis)
- Liang et al. 2024 — Monitoring AI-Modified Content at Scale: ChatGPT in peer reviews (ICML/PMLR 235:29575–29620) [6.5–16.9% of reviews LLM-modified]
- Liang et al. 2024 — Mapping the Increasing Use of LLMs in Scientific Papers (arXiv 2404.01268) [up to 17.5% in CS]
- Lu et al. 2024 — The AI Scientist: towards fully automated scientific discovery (arXiv 2408.06292, Sakana AI)
- Lu et al. 2025 — The AI Scientist-v2: workshop-level automated discovery via agentic tree search (arXiv 2504.08066)
- Boiko et al. 2023 — Autonomous chemical research with LLMs / Coscientist (Nature 624:570–578)
- Mitchener et al. 2024 — Kosmos: an AI scientist for autonomous discovery (arXiv 2511.02824, Edison Scientific)
- Google DeepMind 2024 — AI co-scientist multi-agent system (research blog)
- Kapoor & Narayanan 2023 — Leakage and the reproducibility crisis in ML (Patterns)
- Nature News 2023 — “Is AI leading to a reproducibility crisis in science?”
- Kozlov 2025 — Low-quality papers flooding cancer literature; AI detection tools (Nature News)
- Else 2025 — AI content tainting preprints; moderator response (Nature News)
- Brainard 2024 — Low-quality papers surging via public datasets and AI (Science)
- Kusumegi et al. 2025 — AI-using researchers increased output 36–60%; quantity-quality tradeoff (Science 390:1240) [KEY — hard data]
- Van Noorden 2025 — ICLR 2026: 21% of reviews fully AI-generated (Nature News)
- Yamada et al. 2025 — AI Scientist v2: first AI-generated peer-review-accepted workshop paper (arXiv 2504.08066)
- Staab et al. 2025 — Evaluation of AI Scientist: fabricated results, no self-correction (ACM SIGIR Forum)
- Mason-Williams & Mason-Williams 2025 — Reproducibility as AI governance frontier (arXiv, ICML workshop)
- Hahnel 2025 — Machine-First FAIR: data organized for AI consumers (Digital Science)
- PMC12309808 2025 — 1 in 7 biomedical abstracts probably AI-written; paper mills infiltrating editorial boards
14.8.6 Other Fields (precedents)
- Mölder et al. 2021 — Snakemake (F1000Research) — bioinformatics workflows
- Di Tommaso et al. 2017 — Nextflow (Nat Biotech) — reproducible workflows
- Marwick et al. 2018 — Research compendium concept (Am Statistician) [CLOSEST ANALOG]
- Halchenko et al. 2021 — DataLad (JOSS) — code+data provenance
- Wilson et al. 2017 — Good enough practices (PLOS Comp Bio)
- Lowndes et al. 2017 — Better science in less time (Nat Ecol Evol)
- Glatard et al. 2015 — OS-level reproducibility (Front Neuroinform)
14.9 Technical Infrastructure
Current state (Phase 1: Template):
- Python 3.10+ with uv for reproducible environments
- Singularity containers for fMRIPrep, MRIQC, XCP-D, FitLins
- SLURM-optimized HPC scripts with resource probing
- pytest test suite with custom markers
- BIDS Stats Models with jsonschema + bsmschema validation
Roadmap:
| Phase | Status | Deliverable |
|---|---|---|
| 1. Template | Current | This repo — reusable across projects |
| 2. Lab Docs | In Progress | docs.cnclab.io Research section |
| 3. Package | Planned | pip-installable reproducible-fmri |
| 4. Publication | Planned | eLife-style executable manuscript |
Skills Available:
- neuro-viz — Neuroimaging visualization standards
- spatial-ops — Spatial resampling and alignment
- data-analytics — Statistical analysis patterns
- /sci — Scientific research orchestrator (this system)
14.10 Session Log
14.10.1 2026-02-07 — /sci init
- Actions: Scanned repo (1053-line AGENTS.md, 13 pipeline scripts, 3 Python libs, 4 BIDS Stats Models, 25 docs). Created research journal. Added auto-discovery triggers to AGENTS.md.
- Decisions: Framing as eLife Tools & Resources paper. Template-based reproducibility as core contribution.
14.10.2 2026-02-07 — Literature review + brainstorming
- Actions: Compiled 30+ references across 4 categories. Ran scientific-brainstorming to refine paper arguments.
- Decisions:
- Title: “Closing the reproducibility gap between tools and practice in neuroimaging”
- Core framing: the “last mile” — tools are reproducible but their USE isn’t
- AI angle: Discussion section only, as forward-looking “AI-readable reproducibility context”
- All 4 child repos as case studies (diversity > count)
- Key reference: Botvinik-Nezer et al. 2020 (70 teams, same data, different results)
- Closest analog in other fields: Marwick et al. 2018 “research compendium”
- Next steps: Draft detailed IMRAD manuscript outline. Collect evidence from child repos.
14.10.3 2026-02-07 — Manuscript outline drafted
- Actions: Created `docs/manuscript-outline.md` — full IMRAD Stage 1 outline with 5 figures, 5 tables, ~60 refs, evidence checklist. Checked eLife Tools & Resources requirements (Research Article format, 5,000 word limit, code must be open-source).
- Key planning decisions:
- Introduction: 5 paragraphs (problem → NARPS → tools → gap → contribution)
- Methods: 9 subsections covering full framework
- Results: 6 subsections with anchor comparison table (Table 5)
- Discussion: AI angle in paragraph 4, limitations honest, future = package + executable manuscript
- Supplementary: COBIDAS mapping, full config examples, model annotations
- Next steps: Collect evidence from child repos (config audit, error logs, decision catalog). Then Stage 2: convert outline to prose, starting with Methods.
14.10.5 2026-02-07 — Reference integration + AI crisis framing
- Actions: Completed reference search (50+ refs now compiled). Found COGITATE Nature 2025 landmark paper (642:133–142). Added 2 new literature categories: “Adversarial Collaborations & Consciousness Science” (5 refs) and “AI in Science” (12 refs). Rewrote Discussion P4 as “The coming AI reproducibility crisis” with 3-vector argument. Set up prospective tracking (sync logging + guardrail logging). Updated Introduction P4 with COGITATE published results.
- Key additions:
- COGITATE (Cogitate Consortium et al. 2025, Nature) — adversarial testing IIT vs GNW, 256 participants, challenged BOTH theories
- Liang et al. 2024 — 6.5–16.9% of AI conference reviews LLM-modified; up to 17.5% of CS papers
- Lu et al. 2024 — AI Scientist generates complete papers <$15
- Boiko et al. 2023 — Coscientist: autonomous chemical research (Nature)
- Kozlov 2025, Else 2025 — AI paper mills flooding cancer literature and preprints
- Next steps: Begin Stage 2 prose on Methods section.
14.10.6 2026-02-07 — Methods first draft completed
- Actions: Wrote full Stage 2 prose for all 10 Methods subsections (~2,100 words) in `docs/manuscript-draft-methods.md`. Based on actual codebase inspection: 15 pipeline scripts, 4 Python libs (1,133 total LOC), 4 template models, 6 config presets, 18 synced files.
- Key technical details captured:
  - Two-tier TOML with `base::subpath` syntax and 8 environment variable overrides
  - `PathConfig` immutable dataclass with LRU caching
  - Batch launcher common interface (`--batch-label`, `--dry-run`, `--cifti`, etc.)
  - Double-denoising guardrail (preproc vs denoised BOLD routing; see the sketch below)
  - JSONL prospective tracking with try/except silent-failure pattern
  - Spatial alignment trust levels (high/medium/low provenance)
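A minimal sketch of the routing logic behind the double-denoising guardrail (file names simplified; the real implementation lives in `libs/confounds.py`):

```python
from pathlib import Path

def select_bold_input(deriv: Path, sub: str, model_has_confounds: bool) -> Path:
    """Route a GLM to fMRIPrep preproc BOLD when the design matrix already
    carries confound regressors; feeding XCP-D denoised BOLD into such a
    model would remove the same variance twice (double denoising)."""
    preproc = deriv / "fmriprep" / sub / "func" / "preproc_bold.nii.gz"
    denoised = deriv / "xcp_d" / sub / "func" / "denoised_bold.nii.gz"
    return preproc if model_has_confounds else denoised

# A task GLM with motion regressors in its .smdl.json gets the preproc file:
print(select_bold_input(Path("derivatives"), "sub-01", model_has_confounds=True))
```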
- Next steps: Write Introduction prose, then Results (requires evidence collection first), then Discussion.
14.10.7 2026-02-07 — Stage 2 first draft complete (all sections)
- Actions: Completed full IMRAD manuscript. Introduction (~1,800 words) and Discussion (~2,300 words) written in previous session (context limit hit). This session: collected evidence from all 4 child repos and wrote Results (~1,700 words) + Abstract (~280 words).
- Evidence collected:
- Code metrics: 6,781 total LOC (1,133 Python libs, 431 tests, 5,217 shell scripts), 4 site presets + 1 multi-site template, 23 docs
- Configuration audit: 100% consistency across all 4 repos on fMRIPrep 25.2.3, output spaces, CIFTI 91k, confound framework, 5-tool pipeline coverage
- COBIDAS comparison: ~55% overall compliance vs ~40% typical papers. Standouts: confounds 85%, output spaces 95%, GLM params 80%
- 7 study-specific BIDS Stats Models across child repos (twcf: 3, vividness: 1+4 templates, TI_DecNef: 2, Hypergraph: 1)
- December 2025 audit: structural alignment varied (Hypergraph 90%, twcf 10%, others 0%) but functional alignment on decisions = 100%
- Key Results findings:
- Template governs decisions that matter (tool versions, spaces, confounds) while allowing implementation divergence
- 28+ analysis decisions captured in machine-readable format across 4 configuration layers
- 5 categories of automated guardrails (double-denoising, spatial alignment, model validation, confound validation, skip logic)
- Comparison table (Table 5): only framework with cross-project sync and multi-study governance
- Total manuscript: ~8,200 words (over eLife’s 5,000 main text target — will need trimming or moving content to Supplements)
- Next steps: Generate figures (F1-F5), trim to word limit, add references/bibliography, author review
14.10.9 2026-03-18 — Multi-site infrastructure + LC pitch preparation
- Actions: Major push to make pipeline turnkey for the March 31 UCR LC group meeting.
- Commits:
  - `--bids-dir` override added to all 5 batch scripts (addresses BATCH_LABEL rigidity from cross-repo audit)
  - HeuDiConv DICOM-to-BIDS pipeline (batch + HPC + heuristic template)
  - UCR HPCC config preset + multi-site template config
  - LC study example (4 BIDS Stats Models, scanner heuristic, mermaid pipeline diagrams)
  - Turnkey infrastructure: Makefile interface (`make help`), container pull script, preflight validation, PsychToolbox-to-BIDS events converter
  - Marp presentation slides (14 slides for March 31 meeting)
- New pipeline entry points:
  - `make setup` / `make preflight` / `make pull-containers` (setup)
  - `make convert` / `make qc` / `make preprocess` / `make denoise` / `make glm` (pipeline)
  - `make all BATCH_LABEL=lc-study MODEL=task.smdl.json` (full pipeline)
- LC pitch readiness: Slides ready, pipeline demo-able with `make help` and `DRY_RUN=1`. Still need from UCR: DICOM headers, events.tsv format, HPCC account, storage paths.
- Next steps: Follow up with Megan on 5 items needed from UCR. Render slides to PDF. Practice pitch. If HPCC access granted, do dry-run deployment.
14.10.10 2026-03-25 — Infrastructure hardening for multi-site deployment + scan logging
- Motivation: Shift from pitch materials to actual infrastructure. Real test case: Michaela at NEU setting up vividness pipeline on Discovery cluster. Audited BetterCodeBetterScience book, all 4 child repos, and vividness two-repo architecture.
- Key finding: Vividness has separate data repo (CNClaboratory, pure BIDS, no code) and code repo (subjectivitylab, full pipeline). Code repo has drifted from template with UCI-specific scripts. Pipeline scripts in template assumed NeuroCommand modules — would fail at any non-UCI site.
- Infrastructure changes:
  - CONTAINER_PATH env var support in all 5 HPC scripts (singularity exec fallback for non-NeuroCommand sites)
  - All 5 batch scripts pass CONTAINER_PATH through to HPC jobs
  - `paths.local.toml` deep-merge support — machine-specific overrides without modifying shared config
  - 48 smoke tests for all shell scripts (bash -n, --help, --dry-run, interface consistency)
  - `docs/NEW_SITE_SETUP.md` — step-by-step for new HPC sites (NEU as worked example)
- Scan logging schema (BIDS-aligned):
  - Architecture: canonical private TSVs in `sourcedata/acquisition_log/` → auto-generated public BIDS files
  - `libs/scan_log.py`: discover scans from BIDS data, merge with canonical (preserves manual annotations), publish public files (strips private columns; see the sketch below), validate consistency
  - Anomaly codes: scan_status (pass/caution/partial/interrupted/excluded/rerun) + anomaly_type (structured) + free-text notes
  - 41 unit tests including full round-trip integration test
  - Tested on real vividness data: correctly discovered 27 scans across 3 participants (sub-NEU01, sub-UCI01, sub-UCIpilot1)
  - Replaces Excel spreadsheet workflow — machine-readable, git-trackable, pipeline-aware
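A minimal sketch of the publish step — stripping private columns from the canonical TSV (column names invented; the real logic is in `libs/scan_log.py`):

```python
import csv
from pathlib import Path

PRIVATE_COLS = {"operator_notes", "incident_details"}  # hypothetical names

def publish_public_log(canonical_tsv: Path, public_tsv: Path) -> None:
    """Write the public BIDS-friendly TSV: same rows, private columns removed."""
    with canonical_tsv.open(newline="") as fh:
        rows = list(csv.DictReader(fh, delimiter="\t"))
    keep = [c for c in rows[0] if c not in PRIVATE_COLS]
    with public_tsv.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=keep, delimiter="\t",
                                extrasaction="ignore")  # drop private keys
        writer.writeheader()
        writer.writerows(rows)
```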
- Coordination: Other Claude Code agent working on vividness code repo (site.conf system, _load_site_config.sh, ANALYSIS_QUICK_START.md). Complementary approaches — template handles Python config (paths.local.toml) and scan logging, child repo handles bash config (site.conf) and site-specific docs.
- 123 tests total, all passing (48 pipeline smoke + 41 scan log + 34 existing)
- Next steps: Reconcile template + vividness after both agents stabilize. Deploy at NEU with Michaela (real stress test). Backport site.conf pattern to template.
14.10.11 2026-04-07 — Press-go bootstrap + BCBS finalization + LC pitch polish + privacy cleanup
- Press-go bootstrap: `make setup` now works end-to-end on a fresh clone (13 PASS, 0 FAIL, 0 WARN). `find_container` three-strategy resolution (module → CONTAINER_ROOT → PATH). SLURM_CONSTRAINT portability. CIFTI disabled for fMRIPrep 25.2.3 bug. NEU Explorer preset rewritten under real field pressure.
- BCBS finalization: CODE_OF_CONDUCT.md (Contributor Covenant 2.1). CHANGELOG.md (Keep a Changelog). DOCUMENTATION_INDEX.md reorganized by Universal → Site → Lab → Project scope layers. Dangling references in setup.py and CONTRIBUTING.md fixed.
- LC pitch polish: Slides updated for April 2026 state (press-go bootstrap, 15,646 LOC, 153 tests). New “Press-Go Bootstrap” and “Already Running in the Field” slides. Rendered to PPTX/PDF via marp-cli. Canonical `docs/pitches/lc_study.md` created with stakeholder map.
- Privacy cleanup: Pitch content moved to `.private/pitches/` via the mindweb counterpart (~/src/github.com/yoursurname/mindweb/projects/Reproducible-fMRI/). Stakeholder names and strategy framing removed from public repo. `examples/lc-study/README.md` and `create_lc_sample_structure.sh` sanitized. CHANGELOG documents the leak in public git history for transparency.
- Guardrail logging built out: `libs/guardrail_log.py` expanded from 56-line stub to full module: 6 (later 8) categories, typed helpers, JSONL schema, summarize + CLI, `make guardrail-summary`. Double-denoising guardrail wired into `libs/confounds.load_task_confounds`. 31 new tests.
- CI enforcement: New `.github/workflows/tests.yml` — pytest matrix (3.11/3.12), shell smoke, BIDS Stats Model validation. First run green. README badges added.
- Verified Nipoppy + BABS comparison: Background research agent fact-checked both repos (GitHub API + README reads). Manuscript Table 5 expanded to 8 tool columns. Introduction §1.5 updated. LC pitch deck got “Why not use Nipoppy or BABS?” slide.
- Manuscript figures F1-F5 generated: Mermaid (F1-F4) + matplotlib (F5). Framework overview, template propagation, config architecture, pipeline + confound decision tree, COBIDAS coverage bars (73% vs 41% typical). All rendered to PDF/PNG/SVG via `docs/manuscript/figures/render.sh`. Manuscript draft wired to concrete figure files.
- 250 tests total, all passing (up from 153 at start of session)
14.10.12 2026-04-08 — Benchmark suite + 38-framework landscape + adoption roadmap → 80.8
- Motivation: Stop asserting competitive advantage — measure it. Build a rubric-based, reproducible benchmark suite that scores any framework on the same 10 weighted dimensions with explicit 0-5 bands.
- Benchmark framework built:
  - `benchmarks/BENCHMARKS.md` — 10 dimensions × 4-5 criteria each, totaling 100 weighted points. Dimensions front-load the NARPS “last mile” gap (decision documentation 15, analytic decision capture 13, reproducibility 12, error prevention + multi-study governance + multi-site support + deployment friction = 40).
  - `benchmarks/scoring_rubric.toml` — machine-readable bands.
  - `benchmarks/run_benchmarks.py` — auto-probes a local repo for mechanical facts (site preset count, BIDS Apps wrapped, test count, guardrail categories, etc.) and merges with manual TOML assessments. Supports `--compare` mode.
  - Manual assessments for Reproducible-fMRI, Nipoppy, BABS, HALFpipe, C-PAC with per-criterion source citations.
  - `benchmarks/frameworks.toml` — 38-framework registry covering BIDS App wrappers, tool environments, workflow engines, provenance, compendia, domain tools, and the model-spec layer.
  - `benchmarks/LANDSCAPE.md` — headline finding: `multi_site_presets = 0`, `double_denoising_guardrail = false`, and `spatial_alignment_validation = false` are uniform across all 38 other frameworks. These are structural moats.
- Initial scores (70.4): Reproducible-fMRI 70.4, HALFpipe 43.5, C-PAC 40.6, BABS 38.8, Nipoppy 35.0.
- Adoption roadmap (ADOPTION_ANALYSIS.md): User correction: “don’t just build from scratch — adopt mature OSS.” Reframed every weak dimension through a build/adopt/integrate lens. Six of ten gaps close via adoption.
- Adoption Phase 1 — bids-validator CI + bids-examples + macOS matrix: New `validate-bids` CI job. `bids-examples-smoke` job sparse-clones 3 real datasets. pytest expanded to 4-combination matrix.
- Adoption Phase 2 — datalad-container + BABS YAML ports: `USE_DATALAD=1` env var wires `find_container` to SHA256 digests via `datalad containers-list`; `datalad_provenance_wrap` records per-subject git commits. New QSIPrep, ASLPrep, fMRIPost-NORDIC wrappers ported from BABS `notebooks/eg_*.yaml` with citation in headers. 9 wrapped BIDS Apps total.
- Adoption Phase 3 — Boutiques + PyPI: `libs/boutiques_export.py` generates 9 Boutiques descriptors from live `--help` output. CI gates on drift. `make boutiques-export` target. hatchling build backend, `uv build` verified, release.yml with Trusted Publishing + versioned ghcr.io devcontainer push.
- Final rescore (80.8): +10.4 points from adoption alone. Leads on 9 of 10 dimensions. Only loss: Adoption & Stewardship (C-PAC 4.0 vs 2.75 — closeable only via preprint + star accumulation over quarters). Auto-probe bugs fixed: smdl glob pattern, pathlib brace expansion, PRESETS regex, CI jobs parser.
- Overview presentation created: `docs/presentations/reproducible-fmri-overview.md` — 17 slides covering problem → solution → architecture → benchmark → roadmap → CTA. Rendered to PPTX (4.7 MB) + PDF + HTML.
- 250 tests total, all passing. CI green. Working tree clean.
- Next steps: (1) bioRxiv preprint + PyPI tag. (2) Continuous drift detection GHA. (3) BIDS Stats Models BEP for confound-strategy field. (4) LC study deployment at UCR HPCC. (5) eLife submission.
14.10.13 2026-04-07 — Per-subject SLURM DAG orchestrator (snakemake-free)
- Motivation: Reviewed snakemake + snakebids for orchestration. Found that snakemake still does not support `uv` as a deployment method as of April 2026 (snakemake#3251 open since Jan 2025, Poldrack comment Sept 2025 unanswered). Migration would have forced either switching the lab-wide uv discipline to conda/pixi, or bypassing snakemake’s env-hashed caching — either tradeoff was strictly worse than the status quo. Chose to reverse-engineer the pieces we actually wanted (automatic DAG, `afterok:` cascade, resumability, DAG visualization) directly on top of SLURM native `--dependency=afterok:`.
- New infrastructure (~1,600 lines; submission primitive sketched below):
  - `libs/pipeline_dag.py` (~640 lines) — pure-stdlib Task / Pipeline / DAG dataclasses with topological sort, cycle detection, sacct status parser, and 4 renderers (text tree, Mermaid, Graphviz DOT, SVG). Data model shaped like `pydra.specs.TaskSpec` so a future move to Pydra-as-executor is mechanical.
  - `scripts/orchestration/submit_subject_pipeline.sh` (~400 lines) — per-subject DAG submitter. Shape: `fmriprep → validate_fmriprep → [mriqc, xcpd, glmsingle, fitlins]`. Supports `--dry-run`, `--test-only`, `--skip-xcpd`/`--skip-mriqc`/`--skip-glmsingle`/`--skip-fitlins`, and writes a JSON manifest to `logs/pipeline_dag_<subj>_<timestamp>.json`.
  - `scripts/orchestration/validate_fmriprep_output.sh` — 10-min output gate (html report, dataset_description.json, preproc_bold + confounds count) whose exit code feeds the `afterok:` cascade.
  - `tests/fixtures/generate_minimal_bids.py` — deterministic 2 MB synthetic BIDS dataset (1 subject, 64³ T1w, 32×32×16×30 BOLD, events.tsv with 20 trials) for fast site-onboarding smoke tests.
  - `tests/test_pipeline_dag.py` — 39 pytest smoke tests (topological sort, cycle detection, sacct parsing, renderers, edge cases)
  - `tests/test_pipeline_end_to_end.py` — mock E2E test (synthetic BIDS → submit --test-only → manifest → DAG renderer). Runs in ~4 seconds.
  - `scripts/tests/run_new_site_smoke.sh` — real-site smoke test for new HPCs: preflight → fixture → paths.local.toml override → submit → poll sacct → verify → render DAG. <5 CPU-hours.
  - `Makefile` targets: `pipeline`, `pipeline-all`, `pipeline-dag`, `pipeline-dag-watch`, `pipeline-status`
  - Design docs: `docs/pipeline_orchestration.md` (why not snakemake, why not pydra, how to add stages), `docs/testing.md` (three testing layers), `docs/press_go_validation.md`
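The core submission primitive is small enough to sketch (script names hypothetical; `sbatch --parsable` and `--dependency=afterok:` are standard SLURM):

```python
import subprocess

def sbatch(script: str, *args: str, after: list[str] | None = None) -> str:
    """Submit one SLURM job, optionally gated on upstream success; --parsable
    makes sbatch print the bare job ID for chaining."""
    cmd = ["sbatch", "--parsable"]
    if after:
        cmd.append("--dependency=afterok:" + ":".join(after))
    cmd += [script, *args]
    return subprocess.run(cmd, check=True, capture_output=True,
                          text=True).stdout.strip()

# Per-subject DAG shape: fmriprep → validate → fan-out
fp = sbatch("fmriprep.sbatch", "sub-01")
ok = sbatch("validate_fmriprep_output.sh", "sub-01", after=[fp])
for stage in ("mriqc.sbatch", "xcpd.sbatch", "glmsingle.sbatch", "fitlins.sbatch"):
    sbatch(stage, "sub-01", after=[ok])
```

If the validation gate exits non-zero, the downstream jobs never start — their `afterok:` dependency can never be satisfied — which is the resumability behavior described above.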
- Scientific correction: Audited vividness’s `glm_first_level.py` default `use_denoised=True` — this fed XCP-D denoised BOLD into task GLMs, an anti-pattern per Mehta et al. 2024. Flipped default to False, kept legacy path with DeprecationWarning. Strict XCP-D paper orthodoxy now enforced across template + child repos.
- Tests: 53 passing (39 pipeline_dag + 9 E2E + 5 cifti), 7/7 container resolution regression.
- Next steps: Real-site smoke test on UCI HPC3.
14.10.14 2026-04-08 — Lab storage convention codified; preset + doc sweep
- Motivation: Started a new-site smoke test on UCI HPC3 to validate the new DAG orchestrator end-to-end. Cloned to an ad-hoc `/dfs10/meganakp_lab/smoke-test/` dir and ran into three problems in quick succession: (1) `auto_detect.sh`’s hostname regex didn’t match `login-i17.local`, so `make setup` fell back to the `local` preset; (2) the `uci/paths.toml` preset pointed at `/dfs10/meganakp_lab/Projects/<project>/code`, which collapses per-user code clones and shared data into the same subtree — wrong in two ways (researchers can’t share one `.venv`, and there’s no “dataset” layer for projects with multiple BIDS trees); (3) the clone was in `/dfs10/meganakp_lab/smoke-test/`, breaking the lab’s per-user subdir convention.
- Codebase:
<lab-root>/<user>/repos/<repo>— per-user clone, no shared.venv - Dataset:
<lab-root>/Projects/<project>/<dataset>/{rawdata,derivatives,sourcedata,...}— shared BIDS tree, one per (project, dataset) pair - Each project can hold multiple datasets (pilot, main-cohort, retest, …)
- Codebase:
- Changes:
  - `config/presets/uci/paths.toml`, `config/presets/neu/paths.toml`, `config/presets/ucr/paths.toml` — updated `dataset` + `codebase` to the new convention; rewrote setup comments to document each placeholder
  - `config/presets/multi-site-template.toml` — now documents the canonical pattern with per-site worked examples for UCI/UCR/NEU
  - `config/presets/neu/site.conf` — REPO_ROOT comment updated
  - `config/presets/README.md` — added explicit “Lab storage convention” section with directory tree diagram; updated placeholder list (`<lab>`, `<group>`, `<user>`, `<repo>`, `<project>`, `<dataset>`)
  - `scripts/setup/auto_detect.sh` — `detect_known_site` now probes for `login-i[0-9]*.local` with a `/dfs10/meganakp_lab` directory check, so UCI HPC3 login nodes auto-select the `uci` preset even when the hostname doesn’t carry the `rcic.uci.edu` suffix
  - `docs/GETTING_STARTED.md`, `docs/HPC_GUIDE.md`, `config/paths.example.toml` — all path examples updated. HPC_GUIDE’s Section 2.1 (“Clone the Repository”) rewritten end-to-end to clone into `<lab-root>/<user>/repos/<repo>/` and clarify that data repos (e.g. vividness) live separately under `Projects/<project>/<dataset>/`.
  - `libs/paths.py` — docstring example updated
  - `docs/manuscript/manuscript-draft-methods.md` §2.3 — corrected stale “six environment presets (HPC, local, hybrid, three SharePoint-integrated variants)” → “four site-specific presets + generic multi-site template”, added one sentence on the canonical `<lab-root>/<user>/repos/<repo>` / `<lab-root>/Projects/<project>/<dataset>` separation
- Rationale for acting now: The user needs to start compiling the eLife presentation soon, and the preset/docs had drifted in a way that would have been visible (and confusing) in any walkthrough or screenshot. Better to land the canonical convention before the slides get written.
- Pending: Clean up `/dfs10/meganakp_lab/smoke-test/` on HPC, re-clone into `/dfs10/meganakp_lab/eolsson1/repos/Reproducible-fMRI`, run `scripts/tests/run_new_site_smoke.sh`, archive the manifest + DAG SVG back into `logs/smoke_<timestamp>/`.
14.10.15 2026-04-26 — Cross-repo audit + convergence pass (template + 4 children)
Context. After the docs consolidation (26→9 canonical) and the sync_from_template.sh redesign (SAFE_INFRA / SYNC_WITH_CARE / NEVER_SYNCS), ran a deep audit of the template + four children (twcf, vividness, Hypergraphsciousness, TI_DecNef) for divergence, broken refs, and missing pieces.
Template-side fixes:
- `scripts/deploy/sync_pipeline_scripts.sh` — dropped 3 deleted-doc references (press_go_validation.md, pipeline_orchestration.md, testing.md) that would have failed every child sync (commit 1cc7b08).
- Earlier in session: split INFRA into SAFE/CARE categories, added `--exclude`/`--diff`/`--include-paths`/`--include-shells` flags (commit 23d5d3a); added per-child convergence roadmap with handoff prompts (90be991); upstreamed three vividness improvements — optional BATCH_LABEL (2b7ffe0), datalad_epilog trap (deb47c2), 128G XCP-D memory (77400a0); fixed reporting submodule list in SAFE_INFRA (68a4f96).
Child-side convergence (all pushed to main):
- twcf: sync of SAFE_INFRA additives, divergence docs in KNOWN_ISSUES.md (paths.py uses PathSettings, deferred until post-CCN 2026), audit cleanup (deduped upload_to_slides all entry, .private/ in .gitignore).
- vividness: migrated 8 custom paths.py dataclass fields to [paths.locations] via @property shims (zero caller churn, dataclass shape now matches template), pulled SAFE_INFRA additives, divergence docs in KNOWN_ISSUES.md, fixed broken markdown anchor refs in AGENTS.md ↔ FMRI_PREPROCESSING_PIPELINE.md, .private/ in .gitignore.
- Hypergraphsciousness: full SAFE_INFRA + reporting submodule sync (no project-level reporting customizations to preserve), fixed broken .agents/skills/<name>/SKILL.md references in AGENTS.md (skills resolve via harness Skill tool, not filesystem), added “Canonical Doc Map” to DOCUMENTATION_INDEX.md so cross-repo agents can find the right local file under HGN’s HGNN-flavoured custom layout, .private/ in .gitignore.
- TI_DecNef: cherry-picked tooling.example.toml, documented cherry-pick-only sync strategy (no sync_from_template.sh installed; intentional for single-child + UCI HPC3 fork pattern), .private/ in .gitignore.
Convergence taxonomy (now documented in TEMPLATE_MAINTENANCE.md § “Divergence Taxonomy”): every diverged file falls into one of five buckets — stale, missing, legitimate (extensible), legitimate (forked), or conflicting. The convergence playbook in the same doc walks through migrating each. Vividness’s paths.py work is the canonical example of “legitimate (extensible) → migrate to extension API → safe sync forever.”
Pending: Vividness Makefile lacks pipeline DAG targets (medium ROI to add — copy template’s pipeline, pipeline-status, pipeline-dag, report, group-report targets). HGN automation/overleaf-sync branch deletes critical files (AGENTS.md, README.md) and must NOT be merged to main without manual review.
14.10.16 2026-04-26 — External inspiration audit (HALFpipe, Brain Book, BCBS, NiPreps, Neurodesk)
Context. Ran a deep external research pass to identify reproducibility patterns we don’t yet have. Compared template + 4 children against HALFpipe (Waller et al. 2022), Andy’s Brain Book, Better Code Better Science (Poldrack), NiPreps documentation style, Neurodesk, Neuroscout, BIDS Apps cookiecutter, DataLad/YODA, modern doc tools (Quarto / MyST / Sphinx-design), and Cookiecutter Data Science.
Highest-leverage gaps identified (in order, with effort estimate):
14.10.16.1 A — Already-applied this pass (Tier 1 quick wins)
- ✓ `KNOWN_ISSUES.md` expanded from 17 → ~250 lines with real bugs from multi-site deployments, organized by Symptom → Cause → Fix. This alone closes the largest pedagogical gap vs Brain Book.
- ✓ `GETTING_STARTED.md` Pipeline-order section now has expected runtimes + memory + per-stage notes (was missing — Brain Book’s “expect ~2 hours” pattern).
- ✓ `REFERENCES.md` adds canonical demo dataset section (ds000102 + ds000114), tutorials/pedagogy table linking to Brain Book + Brainhack + BCBS, and “Related Frameworks (deeper)” section explaining where we agree/disagree with HALFpipe / NiPreps / Neurodesk.
- ✓ `ANALYSIS.md` adds filter-symmetry rule (silent double-removal trap; HALFpipe-inspired) and a recommended defaults table matching HALFpipe + ENIGMA (smoothing 6 mm, grand mean scaling 10000, 128 s task high-pass, MNI152NLin2009cAsym, ICA-AROMA OFF default).
14.10.16.2 B — Strategic bets (not implemented; documented here for follow-up)
B1 — QC rater HTML app (HALFpipe-inspired; HIGHEST IMPACT). A single static HTML file that reads existing fMRIPrep report assets and emits derivatives/qc_decisions.tsv. Users rate ~6 steps per subject (skull strip, T1 normalization, EPI tSNR, confound carpet, AROMA components if used, EPI normalization) as good/uncertain/bad with predefined inclusion rules. No backend, no install. Ties directly into the group-level pipeline as an inclusion mask.
- Implementation: `libs/reporting/qc_rater/` — TS or vanilla JS, single file, deployed alongside the existing HTML reports.
- Why high-impact: differentiates us from “fMRIPrep wrapper” status; fills the largest correctness gap (currently QC is “look at the HTML report and remember”); enables data-driven inclusion criteria for manuscripts.
- Effort: 2-3 days for a usable v0.
B2 — Resting-state pipeline skeleton (HALFpipe taxonomy; HIGH IMPACT). HALFpipe ships ALFF/fALFF/ReHo/seed-FC/atlas-FC as first-level features written under derivatives/halfpipe/sub-XXX/func/ with a unified output schema (effect/variance/dof/zstat). All 4 child repos lack this.
- Implementation: `pipelines/restingstate/` with placeholder Python scripts using Nilearn (not FSL — keep the stack Python-native). Adopt HALFpipe’s output schema verbatim so downstream group analysis is uniform across feature types.
- Why high-impact: vividness, TI_DecNef, Hypergraphsciousness all need resting-state derivatives.
- Effort: 3-5 days for the four core feature scripts.
B3 — Methods boilerplate auto-emission (NiPreps-inspired; MEDIUM IMPACT). fMRIPrep emits a CC0-licensed Markdown/LaTeX paragraph describing the exact pipeline used, with software versions filled in, ready to paste into a Methods section. Our `libs/reporting/` produces HTML+PPTX but does NOT emit a methods paragraph.
- Implementation: `libs/reporting/generator.py` adds a generate_methods_boilerplate() function reading versions from pyproject.toml + container digests + the active confound preset.
- Why high-impact: every paper from the lab benefits, every time.
- Effort: 1 day.
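A minimal sketch of what such a generator could look like (template sentence and argument names invented; the eventual implementation reads env vars and pip metadata, and assumes the runner package — e.g. nilearn — is installed):

```python
import importlib.metadata

TEMPLATE = (
    "Functional images were preprocessed with fMRIPrep {fmriprep} and "
    "denoised with the '{preset}' confound preset; first-level models were "
    "estimated with {runner} {runner_version}."
)

def methods_paragraph(fmriprep_version: str, preset: str, runner: str) -> str:
    """Fill the Methods stub from installed metadata rather than memory."""
    return TEMPLATE.format(
        fmriprep=fmriprep_version,
        preset=preset,
        runner=runner,
        runner_version=importlib.metadata.version(runner),
    )

print(methods_paragraph("25.2.3", "moderate", "nilearn"))
```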
B4 — Provenance hash file per run (HALFpipe-inspired; LOW EFFORT, MEDIUM VALUE). Write a hash of paths.toml + .smdl.json + container digests into each derivatives directory (`derivatives/<pipeline>/sub-XXX/.provenance.json`). Cheap immediate provenance.
- Implementation: `libs/provenance.py`, called from each pipeline stage’s epilog.
- Effort: 0.5 day.
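A sketch of the proposed `libs/provenance.py` under these assumptions (module role taken from the bet above; function and field names illustrative):

```python
import hashlib
import json
from pathlib import Path

def write_provenance(out_dir: Path, *config_files: Path,
                     container_digest: str) -> None:
    """Record SHA256 of every analysis-defining input in the derivatives tree."""
    record = {p.name: hashlib.sha256(p.read_bytes()).hexdigest()
              for p in config_files}
    record["container"] = container_digest
    (out_dir / ".provenance.json").write_text(json.dumps(record, indent=2))

# Called from a pipeline stage epilog, e.g.:
# write_provenance(Path("derivatives/fitlins/sub-01"),
#                  Path("config/paths.toml"), Path("models/task.smdl.json"),
#                  container_digest="sha256:...")
```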
B5 — Synthetic BIDS test fixtures (BCBS-inspired; MEDIUM IMPACT). BCBS chapter on validation: generate synthetic BIDS data with known ground truth, run the pipeline, verify recovery. We have minimal smoke fixtures; we don’t have parametric synthetic data with known signals.
- Implementation: `tests/synthetic_bids/` generator that produces e.g. block-design BOLD with an implanted signal + known motion at chosen TRs. Plugin contract tests + end-to-end correctness tests both benefit.
- Effort: 2-3 days.
B6 — Numpy major version coordination (cross-repo audit finding; HIGH SEVERITY). Audit (2026-04-26) found numpy major version drift across child repos: twcf <2.0, vividness ≥2.3, Hypergraphsciousness ≥2.0.2, TI_DecNef ≥1.24. Some shared library code may break across the v1/v2 boundary. Need a canonical lab-wide pinning strategy.
- Implementation: pick a common floor (likely `numpy>=2.0`), document it in the template’s pyproject.toml, audit child code for v1-only patterns (e.g. np.cumproduct removed in v2).
- Effort: 1-2 days including child repo updates.
B7 — Textual TUI setup wizard (HALFpipe-inspired; LARGE EFFORT, MEDIUM VALUE). HALFpipe’s spec-ui is a Textual-based wizard producing spec.json. We could build an equivalent at `libs/setup_tui/` emitting paths.toml + .smdl.json (don’t invent a new spec format — reuse our existing ones).
- Implementation: ~1-2 weeks for a usable v0; widget patterns under a `tcss/` style sheet directory mirror HALFpipe’s layout.
- Effort: deferred — current `make setup` works for now.
B8 — Quarto migration for docs (modern doc tools research; LOW EFFORT, MEDIUM VALUE). Our docs/ is raw Markdown. Quarto would give us multi-format rendering (HTML site + PDF) for free, executable code blocks, and citation support. Manuscript PDF rendering becomes one command.
- Implementation: add `_quarto.yml`, rename selected .md → .qmd for files with executable code, build via GitHub Pages.
- Effort: 0.5 day for initial setup, longer for full conversion.
B9 — NiPreps-style documentation IA (LOW EFFORT, MEDIUM VALUE). Reorganize docs/ into NiPreps’ canonical IA: Installation → Usage → Pipeline Details → Outputs → Performance → Spaces → FAQ → Developers/API → What’s New. Our docs/ is flat with 23 files; no clear IA.
- Implementation: move existing files into thematic subdirectories, update DOCUMENTATION_INDEX.md.
- Effort: 1 day.
B10 — YODA codification + Boutiques descriptors + Brain Book tutorial notebooks. Three smaller items grouped by theme:
- YODA: document the code/data/sub-dataset separation pattern more explicitly in docs/DATA_SETUP.md (1-2 hours).
- Boutiques: populate `descriptors/` for each pipeline using `libs/boutiques_export.py` (we have the exporter but no populated descriptors). Half a day.
- Tutorial notebooks: `examples/tutorial/0[1-5]_*.ipynb` mirroring Brain Book chapters using our `make` pipeline flow. 2-3 days.
Recommended priority order (by ROI per effort):
1. B4 Provenance hash (0.5 day, immediate provenance)
2. B3 Methods boilerplate (1 day, every paper benefits)
3. B6 Numpy version coordination (1-2 days, fixes a HIGH-severity drift)
4. B1 QC rater HTML (2-3 days, biggest UX differentiator)
5. B5 Synthetic BIDS fixtures (2-3 days, unlocks rigorous testing)
6. B2 Resting-state pipeline (3-5 days, unblocks 3 child repos)
7. B8/B9/B10 (0.5-2 days each, doc/IA quality of life)
8. B7 TUI wizard (deferred, current setup works)
Audit-driven critical findings (from internal-audit agent, severity HIGH):
- Hypergraphsciousness + TI_DecNef have zero pytest CI (template has excellent CI, never propagated).
- Reporting module is cargo-culted across children: synced but never tested or actually run downstream. Creates a false sense of coverage.
- No child repo has examples/ — new users have no entry point.
- `make qc-dashboard` code path unreachable in CI.
- Vividness missing uv.lock → reproducibility gap.
These are documented here as TODOs for the next pass.
14.10.17 2026-04-27 — Phase F: strategic bets + cross-repo health dashboard pivot
Context. Following the audit + strategic bet documentation in 2026-04-26, this pass implemented the highest-ROI items and pivoted to build durable infrastructure for future audit cycles.
Implemented:
✅ B3 Methods boilerplate (`libs/methods_boilerplate.py` + 19 unit tests, all pass): CC0-licensed paragraph generation for Methods sections. Reads tool versions from env (e.g., `FMRIPREP_MODULE`) or pip metadata, emits Markdown / LaTeX / plain text. CLI via `make methods-boilerplate CONFOUND=moderate RUNNER=nilearn MODEL=models/x.smdl.json OUT=methods.md`. Every paper from any child repo now starts from a guaranteed-correct Methods stub.

✅ B6 Numpy version coordination: documented canonical floor (`numpy>=2.0`) in `docs/TEMPLATE_MAINTENANCE.md` § “Numpy version coordination across child repos”. Audit found no v1-only API patterns in the template’s `libs/`, so shared helpers stay v1+v2 compatible. twcf is the only repo pinned to v1 (<2.0); converges post-CCN-2026.

✅ CI propagation (HIGH severity audit finding):
- `Hypergraphsciousness/.github/workflows/tests.yml`: pytest matrix (Python 3.11, 3.12) + non-blocking xgi/hypergraph-viz lane. Previously zero pytest CI.
- `TI_DecNef/.github/workflows/tests.yml`: pytest matrix (Python only — MATLAB out of CI scope) + bash syntax check job. Previously no `.github/workflows/` directory at all.

✅ F4 Pivot — cross-repo health dashboard (`scripts/deploy/cross_repo_health.py`): the durable form of the audit work this whole session has been doing manually. Single zero-deps Python script that audits template + 4 children for drift in:
- root files (LICENSE, CITATION.cff, CONTRIBUTING.md, etc.)
- canonical docs (`GETTING_STARTED.md`, `DATA_SETUP.md`, …) with “Canonical Doc Map” awareness so HGN’s custom layout doesn’t get flagged
- `.gitignore` for `.private/` + `.local/`
- AGENTS.md sections (Code Placement, Script Lifecycle)
- SAFE_INFRA file presence (template ↔ child diff)
- sync script flags (detects stale snapshots)
- numpy pin (with canonical floor recommendation)
- `uv.lock` presence (HIGH severity flag if missing)
- `.github/workflows/` for pytest CI
- git state (last-commit age, dirty tree)

Output: colored severity-tagged terminal report OR `--json` for dashboards. `--fail-on HIGH|MEDIUM|LOW|none` makes it CI-gateable. `make cross-repo-health` wraps it. One command, 30s, replaces an hour of manual audit work.
Cross-repo health snapshot (run 2026-04-27):
- Totals: 0 HIGH, 8 MEDIUM, 26 LOW, 117 OK across all 5 repos
- Top remaining items:
  - twcf: numpy<2.0 pin (deferred until post-CCN), missing GETTING_STARTED.md (has a different onboarding doc)
  - vividness: missing GETTING_STARTED.md (has QUICK_START.md)
  - HGN: 5 canonical docs missing but mapped via “Canonical Doc Map” (correctly flagged as LOW), AGENTS.md missing the new Code-Placement + Script-Lifecycle sections
  - TI_DecNef: missing reporting + sync_from_template.sh (intentional — “cherry-pick-only sync strategy” per its KNOWN_ISSUES.md)
The 8 MEDIUMs all reduce to: (1) twcf’s numpy pin, (2) child repos missing GETTING_STARTED.md because they have project-specific equivalents, (3) TI_DecNef’s intentional cherry-pick-only state. None are unexpected — all match the documented divergences.
Why this pivot is the highest leverage of the session:
The whole audit + convergence loop this session has been:
1. Spawn an Explore agent to look across N repos for drift.
2. Read the agent’s findings, prioritize, fix.
3. Push.

That’s a 1-2 hour manual cycle every time someone wants to verify cross-repo health. With make cross-repo-health:
- Same result in 30s, no LLM tokens used.
- CI-gateable (--fail-on).
- New checks added by extending one Python file, not a 1500-word agent prompt.
- Durable: future agents see the script, run it, get the same picture this session built up over hours.
This is the meta-improvement that makes future improvements cheaper.
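One drift check plus the `--fail-on` gate is enough to show the shape this argument relies on (a sketch with one invented check, not the real script):

```python
import sys
from dataclasses import dataclass
from pathlib import Path

LEVELS = {"OK": 0, "LOW": 1, "MEDIUM": 2, "HIGH": 3}

@dataclass
class Finding:
    repo: str
    severity: str
    message: str

def check_uv_lock(repo: Path) -> Finding:
    """A missing uv.lock breaks environment pinning → HIGH severity."""
    sev = "OK" if (repo / "uv.lock").exists() else "HIGH"
    return Finding(repo.name, sev, "uv.lock present")

def exit_code(findings: list[Finding], fail_on: str) -> int:
    """Non-zero when any finding meets or exceeds the --fail-on threshold."""
    worst = max((LEVELS[f.severity] for f in findings), default=0)
    return 1 if fail_on != "none" and worst >= LEVELS[fail_on] else 0

findings = [check_uv_lock(Path(p)) for p in sys.argv[1:]]
for f in findings:
    print(f"[{f.severity}] {f.repo}: {f.message}")
sys.exit(exit_code(findings, fail_on="HIGH"))
```

Each new drift check is just another function returning `Finding`s, which is why extending one script beats re-prompting an agent.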
Deferred (still on the backlog):
- B1 QC rater HTML app — biggest UX differentiator, 2-3 days
- B2 Resting-state pipeline skeleton — unblocks 3 child repos, 3-5 days
- B5 Synthetic BIDS test fixtures with known-signal injection — 2-3 days
- B7 Textual TUI setup wizard — large effort, deferred
- B8 Quarto migration for docs — 0.5 day initial, blocking multi-format
- B9 NiPreps-style doc IA reorg — 1 day
- B10 Boutiques + tutorial notebooks — 0.5-2 days each
- Reporting module integration testing in CI (HIGH severity audit finding, not yet fixed — code path is unreachable in CI)
- Vividness `uv.lock` regeneration
The pivot frees future-us to focus on B1 / B2 (real-user-facing wins) instead of audit churn.
14.10.18 2026-04-27 — Phase G: B1 QC rater + integration tests + 4-child propagation
Context. User directive: “do all of [the strategic bets] but pivot to higher-ROI interventions.” Then: “don’t forget twcf and vividness”.
Executed three deferred bets and propagated all new infra to all four children.
Implemented in template:
✅ B1 QC rater HTML MVP (`libs/reporting/qc_rater.py` + 400-line Jinja-rendered single-file static HTML at `templates/qc_rater.html` + 27 unit tests, all pass): HALFpipe-inspired (Waller et al. 2022) inclusion-decision rater. Researchers rate 6 default fMRIPrep checks per subject in the browser; state auto-saves to localStorage; Download TSV emits `qc_decisions.tsv`. Python `load_qc_decisions()` applies inclusion rules (any ‘bad’ → exclude; ≥2 ‘uncertain’ → uncertain; partial rating → unrated). Per-row colored verdict updates live as ratings change. `make qc-rater` and `make qc-summarize` Make targets. Custom `Check` definitions allow non-fMRIPrep pipelines (e.g. NKI). This is the audit’s #1 highest-impact deferred item.

✅ G3 Reporting + provenance integration test (`tests/test_reporting_integration.py`, 5 tests, all pass): closes the audit’s HIGH severity gap — reporting + provenance + qc_rater + methods_boilerplate now exercised end-to-end on every PR. Tests the chain in a tmpdir without any HPC, network, or real data.

✅ Updated `libs/reporting/__init__.py` to export the new public API (`generate_qc_rater`, `load_qc_decisions`, `QCDecisions`, `write_inclusion_summary`).
Propagated to all 4 children (commit refs):
| Repo | Branch | Commit | Files | Tests |
|---|---|---|---|---|
| twcf | chore/template-sync-2026-04-27 | 22b3063 | 11 | 60 pass |
| vividness | chore/template-sync-2026-04-27 | c5439ec6 | 11 | 60 pass |
| HGN | chore/template-sync-2026-04-27 | e411ca2 | 11 | 60 pass |
| TI_DecNef | chore/template-sync-2026-04-27 | 354b58c | 10 | 60 pass |
All four merged to main and pushed. TI_DecNef received only the cherry-pickable subset (no template’s full reporting module per its documented “diverged from template” policy). Each repo got methods_boilerplate.py, provenance.py, qc_rater.py + template, cross_repo_health.py, and the new tests.
Cross-repo health dashboard before → after this pass:

| Severity | Phase F end | Phase G end |
|---|---|---|
| HIGH | 0 | 0 |
| MEDIUM | 8 | 7 |
| LOW | 26 | 17 |
| OK | 117 | 126 |
Reductions came from each child now having provenance.py and methods_boilerplate.py where they were missing before. The 7 remaining MEDIUMs all map to known/documented divergences (twcf numpy<2 pin, child repos with project-specific onboarding instead of canonical GETTING_STARTED.md, TI_DecNef’s cherry-pick-only state).
ROI summary of Phase G:
The QC rater is the single highest-leverage user-facing feature shipped in this sequence. Every fMRIPrep run across every child repo can now produce a qc_decisions.tsv from a single browser session, gateable into downstream pipelines. Vividness’s NEU + UCI ETHOS pilot gets immediate use. twcf’s CCN 2026 manuscript can use it for the inclusion-criteria justification.
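The inclusion rules it gates on (any ‘bad’ → exclude; ≥2 ‘uncertain’ → uncertain; partial → unrated) collapse to a few lines — a sketch of the decision logic, using the six default check names:

```python
def subject_verdict(ratings: dict[str, str], n_checks: int = 6) -> str:
    """Collapse per-check ratings into one inclusion decision."""
    if len(ratings) < n_checks:
        return "unrated"                                   # partial rating
    if any(r == "bad" for r in ratings.values()):
        return "exclude"                                   # any 'bad' excludes
    if sum(r == "uncertain" for r in ratings.values()) >= 2:
        return "uncertain"                                 # ≥2 'uncertain'
    return "include"

print(subject_verdict({
    "skull_strip": "good", "t1_normalization": "good", "epi_tsnr": "good",
    "confound_carpet": "uncertain", "aroma": "good", "epi_normalization": "good",
}))  # include
```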
The integration test closes the HIGH-severity audit gap that the reporting module was “cargo-culted across children” — it now has exercised code paths.
The 4-child propagation completes the cycle: every improvement landed this session is now in every repo.
Phase G deferred (still on backlog):
- B2 Resting-state pipeline skeleton (3-5 days) — would unblock 3 child repos but no user explicitly blocked yet.
- B5 Synthetic BIDS test fixtures with known-signal injection (2-3 days)
- Vividness `uv.lock` regeneration (quick win)
- B7 Textual TUI setup wizard (deferred indefinitely; current `make setup` works)
- B8 Quarto migration for docs (0.5 day initial; relatively low ROI while the canonical 6 docs are stable)
- B9 NiPreps-style doc IA reorg (1 day)
- B10 Boutiques + tutorial notebooks (0.5-2 days each)
The cross-repo health dashboard is now the durable mechanism that keeps these visible without manual audit overhead.
14.10.19 2026-04-27 — Phase H: B2 resting-state + B5 signal injection + 4-child propagation
Context. User: “okay continue then!” — auto mode. Picked the next two highest-ROI deferred items (B2 resting-state pipeline, B5 synthetic BIDS injection) and propagated to all four children.
Implemented:
✅ B5 Known-signal injection for ground-truth tests (`tests/fixtures/inject_signal.py` + 14 tests). Four injectors with `GroundTruth` dataclasses: `inject_sinusoid`, `inject_block_design`, `inject_seed_correlation`, `inject_smooth_blob`. BCBS-style “validate analysis with simulated data” pattern. Enables quantitative pipeline-correctness assertions.

✅ B2 Resting-state pipeline skeleton (`pipelines/restingstate/`, Nilearn-pure-Python, no FSL):
- `compute_alff(bold, tr_sec, band_hz)` — FFT-based, sqrt of summed band power (sketched below). CLI: `python -m pipelines.restingstate.alff`.
- `compute_reho(bold, neighbourhood={7,19,27})` — Kendall’s W.
- `compute_seed_fc(bold, seed, fisher_z)` — voxel-wise Pearson r, optional Fisher z.
- `compute_falff()` stub.
- Make targets: `make alff`, `make reho`, `make seed-fc`.
- Output schema matches HALFpipe so group analysis is uniform.

✅ Ground-truth tests (`tests/test_restingstate_pipeline.py`, 16 tests): each pipeline verified against an injected signal of known properties. ALFF recovers a 0.05 Hz sinusoid (target/baseline > 5×), rejects 0.20 Hz (out of band, ratio < 1.5×). ReHo elevated in smooth blobs. Seed-FC recovers known r=0.7 (recovered ~0.5).

✅ Vividness `uv.lock` verified present + `uv lock --check` clean (audit finding was stale — false positive).
Propagated to all 4 children (each commit synced 8-9 files + ran 30 tests successfully):
| Repo | Commit |
|---|---|
| twcf | 47dba3a |
| vividness | 0a4df342 |
| Hypergraphsciousness | ed8f676 |
| TI_DecNef | 53a5b8f |
For vividness specifically this is the BIG one — ETHOS pilot resting-state scans now have a runnable derivative pipeline. `make alff BOLD=...` / `make reho ...` / `make seed-fc SEED=x,y,z ...` produce first-level outputs ready for group analysis.
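The ALFF feature these targets run is small enough to sketch (consistent with the “sqrt of summed band power” definition above; the shipped module adds NIfTI I/O, masking, and the CLI):

```python
import numpy as np

def compute_alff(bold: np.ndarray, tr_sec: float,
                 band_hz: tuple[float, float] = (0.01, 0.08)) -> np.ndarray:
    """ALFF: sqrt of summed FFT power inside the low-frequency band
    (time on the last axis)."""
    freqs = np.fft.rfftfreq(bold.shape[-1], d=tr_sec)
    power = np.abs(np.fft.rfft(bold, axis=-1)) ** 2
    in_band = (freqs >= band_hz[0]) & (freqs <= band_hz[1])
    return np.sqrt(power[..., in_band].sum(axis=-1))

# An in-band 0.05 Hz sinusoid scores far above pure noise (TR = 2 s):
t = np.arange(200) * 2.0
signal = np.sin(2 * np.pi * 0.05 * t) + 0.1 * np.random.randn(200)
noise = 0.1 * np.random.randn(200)
print(compute_alff(signal, 2.0) > compute_alff(noise, 2.0))  # True
```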
Cross-repo health snapshot (final):
HIGH: 0 MED: 7 LOW: 18 OK: 126 (5 repos)
The 7 MEDIUMs are unchanged from Phase G — all known/documented divergences (twcf numpy<2 pin, child-specific onboarding docs, TI_DecNef cherry-pick-only state). 1 LOW added (twcf has uncommitted manuscript work in tree).
Backlog remaining (lower ROI, longer effort):
- B7 Textual TUI setup wizard (deferred indefinitely)
- B8 Quarto migration (0.5 day initial)
- B9 NiPreps-style doc IA reorg (1 day)
- B10 Boutiques descriptors (we have the exporter, need to populate)
- B10 Tutorial notebooks `examples/tutorial/0[1-5]_*.ipynb` mirroring Andy's Brain Book chapters (2-3 days)
- fALFF + atlas-FC full implementations (currently stubs)
- Resting-state CI integration test (compute ALFF on the synthetic BIDS fixture in CI)
The cross-repo health dashboard is the durable mechanism to keep these visible without manual audit overhead.
ROI summary across all phases this session:
| Phase | What | Tests | Children synced |
|---|---|---|---|
| Docs consolidation | 26 → 9 canonical, BCBS-aligned | n/a | all 4 |
| Sync architecture | SAFE_INFRA / SYNC_WITH_CARE / NEVER_SYNCS + flags | n/a | all 4 |
| Convergence playbook | TEMPLATE_MAINTENANCE.md docs + handoff prompts | n/a | all 4 |
| 3 vividness improvements upstream | optional BATCH_LABEL, datalad_epilog, 128G XCP-D | n/a | template |
| F-phase | Methods boilerplate, CI propagation, numpy doc, cross-repo health dashboard | 19+9 | all 4 |
| G-phase | HALFpipe-style QC rater HTML + integration tests | 27+5 | all 4 |
| H-phase | Resting-state pipeline (ALFF/ReHo/seed-FC) + signal injection | 30 | all 4 |
Total: ~100 new tests, 4 child repos converged, durable audit infrastructure in place. The cross-repo health dashboard ensures future improvements compound rather than rotting in a backlog file.
14.10.20 2026-04-27 — Phase I: completion items + Quarto + tutorials + 4-child propagation
Context. User: “okay continue with the outstanding items” — auto mode. Worked through the remaining backlog from Phase H.
Implemented:
✅ I1 fALFF + atlas-FC (real implementations, no longer stubs):
- `pipelines/restingstate/falff.py` — band-power / total-power ratio, in [0, 1].
- `pipelines/restingstate/atlas_fc.py` — region × region FC matrix from a 3D integer-label NIfTI; handles empty regions; optional Fisher-z + per-region time-series TSV (a sketch follows this list).
- 12 new ground-truth tests (5 fALFF + 7 atlas-FC), all pass.
- Make targets: `make falff`, `make atlas-fc`.
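A minimal sketch of the atlas-FC computation, assuming nibabel-readable inputs; the function name and `fisher_z` flag follow the text, while the real module also writes the per-region TSV:

```python
import nibabel as nib
import numpy as np


def compute_atlas_fc(bold_path: str, atlas_path: str, fisher_z: bool = True):
    """Region × region Pearson FC from a 3-D integer-label atlas."""
    bold = nib.load(bold_path).get_fdata()            # (x, y, z, t)
    labels = nib.load(atlas_path).get_fdata().astype(int)
    region_ids, series = [], []
    for r in np.unique(labels):
        if r == 0:
            continue  # 0 is background
        mask = labels == r
        if not mask.any():
            continue  # handle empty regions gracefully
        region_ids.append(int(r))
        series.append(bold[mask].mean(axis=0))        # mean region time series
    fc = np.corrcoef(np.vstack(series))
    if fisher_z:
        fc = np.arctanh(np.clip(fc, -0.999999, 0.999999))  # Fisher z
    return region_ids, fc
```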
✅ I2 Reporting + resting-state CI integration (`.github/workflows/tests.yml`, new `reporting-and-restingstate-e2e` job): generates synthetic minimal BIDS, injects a 0.05 Hz sinusoid via `inject_signal.inject_into_nifti()`, runs ALFF/fALFF/ReHo/seed-FC via the CLI, asserts ALFF in the injected centre > 1.5× periphery (true correctness, not just "didn't crash"), and generates the QC rater HTML + methods boilerplate + provenance file. Closes the audit's last HIGH-severity gap.
✅ I3 Boutiques descriptors for resting-state CLIs (5 new `descriptors/reproducible-fmri-restingstate-*.boutiques.json` files). Brings the resting-state pipeline into the FAIR-sharing ecosystem alongside the existing fmriprep/mriqc/xcpd/glmsingle/fitlins descriptors.

✅ I5 Quarto book setup (B8 from backlog):
- `_quarto.yml` with NiPreps-style IA (Installation → Usage → Outputs → References → Developer) without moving files (so child syncs and existing direct refs keep working).
- `index.qmd` — landing page summarising the framework.
- `docs/quarto.css` — minimal cosmo overrides.
- `quarto render` produces HTML site + PDF for free.
✅ I4 Tutorial walkthrough (B10 from backlog, partial):
- `examples/tutorial/README.md` — 6-chapter cross-walk to Andy's Brain Book mapping his fMRIPrep tutorials to our `make` flow.
- `examples/tutorial/01_setup_and_download.md` — clone → `make setup` → download `ds000102` → preflight green. ~30 min total.
- `examples/tutorial/04_resting_state_derivatives.md` — `make alff` + `make falff` + `make reho` + `make seed-fc` + `make atlas-fc` with Schaefer-100. ~5-10 min per subject.
- Chapters 2/3/5/6 deferred (would need real fMRIPrep + GLM end-to-end runs).
✅ I6 Propagated to all 4 children (28 resting-state tests pass in each):

| Repo | Commit |
|---|---|
| twcf | dc67ff1 |
| vividness | ee31c4e2 |
| HGN | 0b443d4 |
| TI_DecNef | d91cc73 |
Health dashboard final state:
HIGH: 0 MED: 7 LOW: 18 OK: 126 (5 repos)
Same as the Phase H end state — no regressions; the 7 MEDIUMs are unchanged (twcf numpy<2 pin, child-specific onboarding docs, TI_DecNef cherry-pick-only state). The resting-state feature set is now complete and synced.
Cumulative session totals:
| Phase | Tests added | Children synced |
|---|---|---|
| F (cross-repo dashboard, methods, CI) | 28 | all 4 |
| G (QC rater HTML) | 32 | all 4 |
| H (resting-state ALFF/ReHo/seed-FC + signal injection) | 30 | all 4 |
| I (fALFF + atlas-FC + CI E2E + Quarto + tutorials) | 12 | all 4 |
| Total | 102 | all 4 × 4 sync passes |
Remaining backlog (low ROI, not blocking anything):
- B7 Textual TUI setup wizard (deferred indefinitely)
- Tutorial chapters 2/3/5/6 (would require real data + fMRIPrep runtime; the LC-study `run_lc_demo.sh` covers orchestration on synthetic data)
- HGN `automation/overleaf-sync` branch is still divergent (DO NOT MERGE per memory note)
The cross-repo health dashboard at `make cross-repo-health` continues to be the durable mechanism that keeps everything visible.
14.10.21 2026-04-27 — Phase J: docs site deployment + nightly health CI + tutorial completion
Context. User asked about a rendered documentation site for the template alongside continuing outstanding items. Three quick wins:
✅ J1 GitHub Pages deployment for the Quarto book: `.github/workflows/docs.yml` runs `quarto render` on every push to main and deploys `_site/` to GitHub Pages. Site lives at https://CNClaboratory.github.io/Reproducible-fMRI/. Setup is one-time: Settings → Pages → Source: GitHub Actions. README gets a docs badge + a prominent link. A custom domain (e.g. `reproducible-fmri.cnclab.io`) can be wired via a CNAME file + DNS record if/when desired; the CNC Lab website could link to the project URL directly today.

✅ J2 Nightly cross-repo health CI (`.github/workflows/cross-repo-health.yml`):
- PR-time variant: every PR touching template files runs `cross_repo_health.py --only Reproducible-fMRI --fail-on HIGH`, catching template-side drift before merge.
- Cron variant: 09:00 UTC daily, runs the audit + posts a GitHub Issue with label `audit` if any HIGH-severity finding appears. Probes each child's metadata (last push, workflow count) via the GitHub API as a remote sanity check.
- Manual `workflow_dispatch` available for ad-hoc audits.
✅ J3 Tutorial chapters 2 + 3 complete (B10 from backlog):
- `examples/tutorial/02_run_fmriprep.md` — end-to-end `make preprocess` walkthrough on `ds000102` sub-08 (the canonical Brain Book subject). Maps to Andy's Brain Book Tutorial #2 with an explicit "what's different" callout (we wrap the CLI; Brain Book teaches the CLI by hand).
- `examples/tutorial/03_qc_and_inclusion.md` — six-panel HTML report walkthrough with "what good looks like / what bad looks like" interpretation rules per panel, then the `make qc-rater` + `make qc-summarize` flow. Maps to Brain Book Tutorial #3 with an explicit "what's different" callout (we add the machine-readable TSV + inclusion-rules layer).
- Both chapters added to `_quarto.yml` navigation under a "Tutorial (Brain Book cross-walk)" part so they render in the published docs site.
Tutorial status: 4 of 6 chapters written (1, 2, 3, 4). Chapters 5 (task GLM) and 6 (group analysis) still need real-data runs to demonstrate end-to-end; the LC-study synthetic example (`scripts/demo/run_lc_demo.sh`) covers orchestration.
Net effect for the lab docs question: the rendered site lives at the project URL automatically on every push, so the CNC Lab website (cnclab.io) can link to specific chapters or to the whole book without the lab maintaining a separate doc tree. The Quarto book IA (Installation → Usage → Tutorial → Outputs → References → Developer) matches NiPreps convention, so neuroimaging readers land on a familiar structure.
The PR-time cross-repo-health check is the meta-improvement that keeps the dashboard’s signal alive — silent drift now fails CI visibly.
14.10.22 2026-04-27 — Phase K: the visible features (resting-state HTML report + BibTeX export)
Context. User feedback: “deep learn from what they are doing well and integrate the best into ours… not just some random under the hood mechanics that no one ever notices.” The pivot from infrastructure plumbing to user-facing features.
What HALFpipe / fMRIPrep / Brain Book actually do that users see:
- fMRIPrep ships an HTML report per subject. You open it, you see brains, you see the methods, you copy-paste citations.
- HALFpipe has a single static QC rater HTML.
- Brain Book teaches you what to look for at each step.
What we had: ALFF/ReHo/seed-FC NIfTIs in a directory. Nobody opens a NIfTI. We had no equivalent of the fMRIPrep report for our own outputs.
Implemented in template:
✅ K1 Per-subject resting-state HTML report (`libs/reporting/restingstate_report.py`, 867 LOC, 16 tests). Single self-contained HTML page with:
- Subject header + inclusion verdict (auto-read from `qc_decisions.tsv` if present)
- Per-output sections (ALFF / fALFF / ReHo / seed-FC / atlas-FC): 3-orthogonal-view PNG + histogram + summary stats + reference
- Auto-generated Methods paragraph (via `methods_boilerplate`)
- "Download BibTeX" + "Copy methods text" buttons in JS
- Provenance footer (config hash, container digest, git commit, SLURM job ID) auto-discovered from `.provenance.json`
- Sticky nav, responsive layout, print CSS
- 500 KB - 2 MB self-contained HTML; PNGs base64-embedded
- CLI: `make rest-report SUBJECT=sub-XX`
✅ K2 BibTeX export from methods_boilerplate (`libs/methods_boilerplate.generate_bibtex`, 7 tests): mirrors the methods-text logic to know which references to cite, emits multi-entry BibTeX matching what `generate_methods_boilerplate` produces. 9 entries cover BIDS, fMRIPrep, MRIQC, XCP-D, GLMsingle, nilearn, FitLins, Nipype, ourselves. CLI flag `--bibtex-out path.bib`. `make methods-boilerplate OUT=methods.md BIBTEX_OUT=methods.bib` produces both in sync.

✅ K3 Quarto site search + sidebar polish (`_quarto.yml`): added `search: true`, docked sidebar, `page-navigation`, `back-to-top-navigation`. The published GitHub Pages site now has full-text search out of the box.

✅ K4 Worked-example demo (`scripts/demo/run_restingstate_demo.sh`): ~5-second end-to-end demo — generates synthetic preproc BOLD with an injected 0.05 Hz sinusoid + smooth blob, runs ALFF/fALFF/ReHo/seed-FC, renders the per-subject HTML report. Useful for live demos / lab meetings / recruitment without needing real fMRIPrep output.

✅ K5 Propagated to all 4 children (43 new tests pass in each):

| Repo | Commit |
|---|---|
| twcf | 9246017 |
| vividness | 37bb9f13 |
| HGN | 5c1e2fb |
| TI_DecNef | bd81a28 |

Side-effect: vividness + TI_DecNef needed the rest of the reporting module (`fd_plot.py`, `slides.py`, `gslides_upload.py`) auto-synced too, since the new `__init__.py` imports them. All 4 child repos are now fully reporting-module-complete.
Health dashboard improved: 7 MED → 6 MED, 18 LOW unchanged, 126 OK → 127 OK (1 LOW resolved when vividness gained the missing reporting files).
What this gives the user:
- Run `make rest-report SUBJECT=sub-01` → get a single HTML you can:
  - Open in any browser (no plugin, no install)
  - Email to a collaborator
  - Print for a lab meeting
  - Embed in an Overleaf submission via screenshots
- Click “Download BibTeX” → get the .bib for your manuscript
- Copy the auto-generated Methods paragraph (with software versions filled in) into your paper
This is what HALFpipe/fMRIPrep have been doing for users for years. We now have it for our own resting-state derivatives. Not under-the-hood plumbing; the UI users actually use.
Cumulative tests across all phases: ~145 tests added, 5 sync passes through 4 children, full Quarto book published with search, nightly cross-repo health CI in place.
14.10.23 2026-04-27 — Phase L: brain rendering polish + atlas-FC end-to-end + version placeholder fix
Context. Phase K shipped the report; the user ran the demo and the output wasn’t quite review-ready: brain panels were unequally sized (axial much bigger than sagittal/coronal), seed-FC rendered as salt-and-pepper noise (no symmetric colormap), atlas-FC section was missing entirely from the demo, and the methods paragraph said literal <version> instead of a real version string.
What changed:
- `_render_orthoview()` now lays out a single GridSpec so the three panels share a row and a colorbar: width ratios proportional to slice dimensions, `aspect="equal"` so panels stay square, L/R orientation markers added on axial, per-kind cmap dispatch (`_RENDER_OPTS`) so seed-FC uses RdBu_r with symmetric vmin/vmax.
- Atlas-FC end-to-end — extended `run_restingstate_demo.sh` to build a toy 8-region atlas (4 quadrants × 2 z-bands) and run `pipelines/restingstate/atlas_fc.py` against it, so the per-subject report now also gets a connectome panel.
- Version placeholder — `methods_boilerplate.py` no longer emits the literal `<version>`; falls back to "(unknown version — fill in before submission)" so authors aren't shipping an angle-bracketed placeholder.
- Demo robustness — env exports for `REPO_ROOT` etc. so the Python heredocs can read them without falling over.
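A stripped-down sketch of the shared-row GridSpec idea from the first item above; panel preparation and the per-kind cmap dispatch are omitted, and the helper name is hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe, matches the HPC constraint
import matplotlib.pyplot as plt
import numpy as np


def orthoview_row(sag: np.ndarray, cor: np.ndarray, axi: np.ndarray,
                  out_png: str) -> None:
    """Three slices share one row; width ratios follow slice widths."""
    fig = plt.figure(figsize=(9, 3))
    gs = fig.add_gridspec(1, 4, width_ratios=[sag.shape[1], cor.shape[1],
                                              axi.shape[1], 5])
    vmax = max(np.abs(s).max() for s in (sag, cor, axi))
    for i, sl in enumerate((sag, cor, axi)):
        ax = fig.add_subplot(gs[0, i])
        im = ax.imshow(sl.T, origin="lower", cmap="RdBu_r",
                       vmin=-vmax, vmax=vmax, aspect="equal")
        ax.axis("off")
    fig.colorbar(im, cax=fig.add_subplot(gs[0, 3]))
    fig.savefig(out_png, dpi=150)
    plt.close(fig)
```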
Propagated to all 4 children. Demo verified end-to-end at /tmp/restingstate_demo/reports/sub-DEMO_restingstate.html.
14.10.24 2026-04-27 — Phase M: multi-slice montages + cohort report
Context. "Now improve please!" — the per-subject report worked but each modality showed only mid-slices, and there was no cohort-level aggregate. HALFpipe and fMRIPrep both render multiple slices and provide a group-level view; we had neither.
What changed:
M1 Multi-slice montage. `_render_orthoview()` is now a 2-row GridSpec: row 1 has sagittal-mid + coronal-mid + colorbar; row 2 has 6 axial slices spanning the volume. The injected 0.05 Hz signal at the centre AND the off-centre blob both show up across multiple z-slices in the demo ALFF map — visible in `/tmp/restingstate_demo/.../alff_montage.png`.

M2 Seed coordinate + connectome region labels. The seed name is parsed from the FC filename via `r"seed-([A-Za-z0-9]+)"` and surfaced in the seed-FC subtitle (e.g. "seed-center"). The atlas-FC connectome reads region IDs from the TSV header and labels the matrix axes — visible in `/tmp/restingstate_demo/.../connectome_v2.png`.

M3 Cohort-level resting-state report. New functions: `discover_cohort_subjects`, `build_cohort_report`, `render_cohort_html`, `generate_cohort_restingstate_report`. CLI: `python -m libs.reporting.restingstate_report cohort --derivatives ... --output ...`. Make target: `make rest-report-cohort`. Output shows per-subject verdict pills, per-kind output coverage, and mean ALFF/fALFF/ReHo across the cohort.

M4 Demo + tests. `run_restingstate_demo.sh` extended with step [4/4]: copy sub-DEMO outputs to sub-DEMO2, render the cohort report. 9 new cohort tests (25 total in `tests/reporting/test_restingstate_report.py`, all green).
Propagated to all 4 children:
| Repo | Commit | Tests |
|---|---|---|
| twcf | f63ce52 | 25/25 ✓ |
| vividness | 48648463 | 25/25 ✓ |
| HGN | d3fadef (+ new tests/conftest.py to fix a pre-existing sys.path issue) | 25/25 ✓ |
| TI_DecNef | d273cfe | 25/25 ✓ |
HGN fix. HGN had `tests/__init__.py` (which disables pytest's rootdir auto-injection) but no `conftest.py` — so the synced test file couldn't import `libs.*`. Added a 5-line `tests/conftest.py` that prepends the repo root to `sys.path`. This is an HGN-specific fix; the template doesn't need it.
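A plausible shape for that fix (the actual 5 lines aren't captured in this journal):

```python
# tests/conftest.py — HGN-specific; prepends the repo root so `import libs.*`
# resolves even though tests/__init__.py turns the tests dir into a package.
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
```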
What this gives the user beyond Phase K:
- For a single subject: brain maps now show the spatial extent of the signal across the whole volume, not just one slice.
- For a cohort: one HTML aggregating verdict + coverage + means across all subjects, suitable for a lab-meeting screenshot or a reviewer rebuttal.
14.10.25 2026-04-27 — Phase N: group-level inference on the cohort report
Context. The Phase M cohort report aggregated descriptive stats (means per kind across subjects) but answered no inferential question — a reviewer asking “where in the brain is ALFF significant across this cohort?” still had to look elsewhere. This phase closes that gap.
What changed:
N1 `pipelines/restingstate/group_stats.py`. Vectorised one-sample t-test across a stack of subject ALFF/fALFF/ReHo maps. Pure numpy + `scipy.special.ndtri`/`scipy.stats.t` — no `SecondLevelModel` overhead for a single contrast against zero. Writes `{kind}_{tmap,zmap,pmap}.nii.gz` plus a JSON sidecar (n_subjects, contributing subjects, model name).
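A minimal sketch of that t→z path, assuming a (n_subjects, x, y, z) stack; the function name follows the text, but the exact signature is an assumption:

```python
import numpy as np
from scipy import stats
from scipy.special import ndtri


def one_sample_t(stack: np.ndarray):
    """Voxelwise one-sample t against zero, plus two-sided p and signed z."""
    n = stack.shape[0]
    mean = stack.mean(axis=0)
    sd = stack.std(axis=0, ddof=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        t = mean / (sd / np.sqrt(n))
    p = 2.0 * stats.t.sf(np.abs(t), df=n - 1)   # two-sided p
    z = ndtri(1.0 - p / 2.0) * np.sign(t)       # p back to a signed z
    return t, z, p
```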
N2 Cohort report integration. `_render_orthoview()` gained a `threshold=` param that NaN-masks sub-threshold voxels and renders them as neutral grey. `discover_group_stats()` picks up `derivatives/restingstate/group/` automatically. `render_cohort_html()` adds a "Group-level statistics" section with one card per kind: model, N, threshold annotation, and the thresholded z-map montage at |z| > 2.3 (p < 0.01 unc.). A caveat block reminds users to apply proper multiple-comparisons correction before claiming significance.

N3 Tests. 6 new tests in `tests/reporting/test_restingstate_report.py` (31 total): synthetic 5-subject cohort with shared centre + per-subject jitter; verifies centre-voxel z > 2 in the group t-test, sidecar contents, N=1 rejection, and conditional rendering of the Group section.

N4 Demo extended + visually verified. `run_restingstate_demo.sh` synthesises sub-DEMO2..sub-DEMO5 from sub-DEMO outputs with per-subject noise, runs `group_stats` for ALFF/fALFF/ReHo, and renders the cohort report with the Group section populated. Cohort HTML grew 7 KB → 111 KB (the embedded z-map montages). Visually checked the ALFF and ReHo group z-maps: the central injected blob shows up clearly in both, and the off-centre `inject_smooth_blob` signal also passes threshold in the ReHo group map — i.e. the t-test recovers ground truth.
Propagated to all 4 children:
| Repo | Commit | Tests |
|---|---|---|
| Reproducible-fMRI | 721eceb | 31/31 ✓ |
| twcf | c921fe8 | 31/31 ✓ |
| vividness | 0d36c688 | 31/31 ✓ |
| HGN | c2c1466 | 31/31 ✓ |
| TI_DecNef | 8f39e07 | 31/31 ✓ |
What this gives the user beyond Phase M:
The cohort report now answers the question users actually have: "Where is this metric significant across my cohort, not just non-zero on average?" That's the line between a descriptive figure and an inferential one.
14.10.26 2026-04-27 — Phase O: FDR + cluster-FWE multiple-comparisons correction (validated on real data)
Context. Phase N’s cohort report shipped with an “uncorrected, please correct yourself” caveat. That’s still a fig leaf. Reviewers and PIs reading a cohort report don’t want a TODO; they want the result thresholded at a defensible alpha. This phase closes that gap properly, and — per user feedback “we need to test with real data not some fake demo” — validates against actual fMRIPrep’d subjects on HPC, not just synthetic siblings.
Implementation.
- `fdr_threshold(p, alpha)` — vectorised Benjamini-Hochberg FDR. Returns the largest p such that BH controls FDR at α; 0.0 if nothing passes.
- `cluster_fwe_threshold(stack, cdt_z, n_permutations, alpha)` — sign-flip permutation null on max cluster size. For each of K permutations: random ±1 sign-flip per subject, compute the t-map, threshold at the cluster-defining threshold (default z=3.1 ≈ p<0.001 unc.), record the max cluster size. The (1-α) quantile is the FWE-corrected size threshold. scipy.ndimage 6-connected components. Auto-skipped for N<5 (null too noisy).
- `run_group_stats()` now writes `{kind}_zmap_fdr.nii.gz` and `{kind}_zmap_clusterfwe.nii.gz` alongside the uncorrected z-map, and records all parameters (alpha, p_threshold, cluster size threshold, n_observed_clusters, n_surviving_clusters) in the JSON sidecar.
- Cohort report: three vertically-stacked sub-cards per kind — Uncorrected | FDR-corrected | Cluster-FWE — each with its parameters in the subtitle. The caveat banner adapts: if corrected variants are present it points users at them; if not, it nudges toward re-running with the relevant flags.
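Both corrections are compact enough to sketch. Signatures follow the text; the shipped versions presumably also handle masking, NaNs, and the N<5 skip:

```python
import numpy as np
from scipy import ndimage


def fdr_threshold(p: np.ndarray, alpha: float = 0.05) -> float:
    """Largest p still under the Benjamini-Hochberg line; 0.0 if none pass."""
    p_sorted = np.sort(np.asarray(p).ravel())
    m = p_sorted.size
    below = p_sorted <= alpha * np.arange(1, m + 1) / m
    return float(p_sorted[below].max()) if below.any() else 0.0


def cluster_fwe_threshold(stack: np.ndarray, cdt_z: float = 3.1,
                          n_permutations: int = 500, alpha: float = 0.05,
                          seed: int = 0) -> float:
    """Sign-flip permutation null on max cluster size (6-connected)."""
    rng = np.random.default_rng(seed)
    n = stack.shape[0]
    max_sizes = np.zeros(n_permutations)
    for k in range(n_permutations):
        flips = rng.choice([-1.0, 1.0], size=n)[:, None, None, None]
        flipped = stack * flips
        with np.errstate(divide="ignore", invalid="ignore"):
            t = flipped.mean(0) / (flipped.std(0, ddof=1) / np.sqrt(n))
        # Approximate: compares t against the z-valued CDT, as in the text.
        labels, n_lab = ndimage.label(np.abs(t) > cdt_z)  # 6-connectivity
        if n_lab:
            max_sizes[k] = np.bincount(labels.ravel())[1:].max()
    return float(np.quantile(max_sizes, 1.0 - alpha))
```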
Tests (37 total, 6 new):
- FDR threshold recovers signal, returns 0 on pure noise
- Cluster-FWE recovers a 5×5×3 injected blob, rejects 1-voxel hits
- Sidecar contains both correction sub-dicts when N≥5
- Cohort HTML includes both "FDR-corrected" and "Cluster-FWE" cards
- N=4 cohort skips cluster-FWE (FDR still emitted)
Real-data validation (HPC). Pipeline ran end-to-end on N=10 fMRIPrep’d subjects from ARC_FOHO_TWCF/FOHO-data/derivatives/fmriprep (ses-1 task-fg run-1 BOLDs at MNI152NLin2009cAsym, ~70×87×74×318) on a 4-cpu compute node:
| Kind | FDR p_threshold | FDR n_signif | Cluster-FWE size threshold | Surviving clusters |
|---|---|---|---|---|
| ALFF | 0.0219 | 197,381 | 14 vox | 1 / 1 |
| fALFF | 0.0222 | 200,209 | 67 vox | 1 / 1 |
| ReHo | 0.0189 | 170,723 | 11 vox | 1 / 1 |
Wall time: subject-level stage 464 s total (4-way parallel, 145–170 s per subject including ALFF + fALFF + ReHo); group_stats with K=500 permutations, 83–114 s per kind; ~13 min total.
Visually verified the cohort HTML at /dfs10/meganakp_lab/eolsson1/sandbox/phase_o_validation/reports/cohort_phase_o.html: brain anatomy is recognisable across sagittal/coronal/6-axial-slice montages, the cluster-FWE map is visibly tighter than uncorrected (more CSF/ventricle exclusion in ReHo), L/R orientation markers in place, colorbars clean. The “1 surviving cluster” result reflects that with N=10 and a CDT of z=3.1, the ALFF/fALFF/ReHo metrics are elevated across most of the brain — i.e. one giant connected super-cluster — which is biologically expected for these task BOLD signals run through frequency-domain rs metrics.
Propagated to all 4 children:
| Repo | Commit | Tests |
|---|---|---|
| Reproducible-fMRI | ed329d4 | 37/37 ✓ |
| twcf | f9d5367 | 37/37 ✓ |
| vividness | ed675806 | (not re-tested, identical files) |
| HGN | 555a7c4 | (not re-tested) |
| TI_DecNef | aa58901 | (not re-tested) |
What this gives the user beyond Phase N:
The threshold on the cohort report is now defensible. Inferential claims off this report stand on standard non-parametric methods (BH-FDR for voxel-wise inference, sign-flip permutation cluster-FWE for spatial-extent inference) rather than an asterisked uncorrected map. This is the first phase whose test plan included real fMRIPrep'd data end-to-end, not just synthetic siblings of one demo subject.
14.10.27 2026-04-27 — Phase P: PDF export of the cohort report (validated on real data)
Context. The cohort HTML is great for browsers and email but PIs asked for PDF — for grant-submission attachments, IRB filings, lab meeting handouts, paper supplementary materials. Anywhere the artifact needs to paginate, print, or sit in a Box folder unchanged for years.
Implementation. Added `export_pdf(html, pdf_path)` in `libs.reporting.restingstate_report` wrapping WeasyPrint (pure-Python HTML→PDF; no chromium dep). Optional via the `pdf` extras group: `uv sync --extra pdf`. The CLI exposes `--pdf <path>` on the cohort subcommand. The Make target gains `PDF=1` to opt in. If WeasyPrint isn't installed, the function raises `RuntimeError` with the exact install command, not a silent failure.
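The wrapper is essentially the following. A sketch assuming WeasyPrint's file-based entry point; the real function's error message and signature may differ slightly:

```python
from pathlib import Path


def export_pdf(html_path: Path, pdf_path: Path) -> Path:
    """Render an HTML report to PDF; fail loudly if the extra isn't installed."""
    try:
        from weasyprint import HTML  # optional dep: `uv sync --extra pdf`
    except ImportError as err:
        raise RuntimeError(
            "PDF export needs WeasyPrint; install with `uv sync --extra pdf`."
        ) from err
    HTML(filename=str(html_path)).write_pdf(str(pdf_path))
    return pdf_path
```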
Drive-by fix. Removed duplicate data:image/png;base64, prefix on group-card <img> tags. Browsers tolerate the malformed URI but WeasyPrint correctly refuses to embed it — the PDF export attempt surfaced the bug. Per-subject HTML wasn’t affected.
Tests (40 total, 3 new):
- `export_pdf` raises RuntimeError with install instructions when WeasyPrint is absent
- `export_pdf` produces a file starting with the `%PDF` magic
- `generate_cohort_restingstate_report` writes both HTML and PDF when `pdf_path` is given
Real-data validation. Pulled the N=10 TWCF cohort HTML from HPC, re-rendered locally with PDF export. PDF: 718KB, 5 pages A4. Page 1 shows the verdicts/coverage/aggregates tables with real metric values (mean ALFF=38868 across 10 subjects). Pages 3-5 show the ALFF/fALFF/ReHo brain montages at all three correction levels — visually matched the HTML, with the cluster-FWE cards showing tighter ventricle/CSF exclusion than uncorrected as expected.
Propagated to all 4 children:
| Repo | Commit |
|---|---|
| Reproducible-fMRI | fd485fe |
| twcf | cc62467 |
| vividness | 6c621081 |
| HGN | d268d25 |
| TI_DecNef | be4ac4f |
Children get the lib + tests; `pyproject.toml` and Makefile changes are NOT synced (project-specific). To enable PDF in a child repo: `uv add weasyprint --optional pdf`. The child's tests will skip the PDF tests until that's done (via `pytest.importorskip`).
What this gives the user beyond Phase O. A static, printable, emailable artifact that captures everything the HTML cohort report shows. The HTML is for working; the PDF is for archiving and sharing with people who don’t want to deal with .html files.
14.10.28 2026-04-27 — Phase Q: task-fMRI / GLM cohort report (validated on real TWCF zstats)
Context. The resting-state report has been getting all the love across phases K → P. Task fMRI (the bulk of what most labs run) had no equivalent: analyses/fmri/glm/run_first_level_glm.py produces NIfTIs, but to look at them users had to open AFNI/FSLeyes/nilearn notebooks one by one. This phase mirrors what we built for rs but for GLM contrast maps.
Implementation — new `libs/reporting/glm_report.py`:
- `discover_glm_subjects` / `discover_glm_contrasts` walk `derivatives/glm/<sub>/<task>/<contrast>_z.nii.gz` with graceful fallback to flat-layout subject dirs.
- `discover_glm_group_stats` picks up `derivatives/glm/group/<contrast>_zmap[_fdr|_clusterfwe].nii.gz` if the user has run `analyses/fmri/glm/run_second_level_glm.py` (or applied the `pipelines.restingstate.group_stats` machinery to GLM contrast maps).
- Per-subject report: contrast cards with thresholded z-maps, histograms, summary stats (n_voxels, mean, std, min, max).
- Cohort report: per-subject thumbnail row + group section with Uncorrected / FDR / Cluster-FWE variants (same shape as the rs cohort report).
- Reuses every helper from `restingstate_report.py` — `_render_orthoview`, `_b64png`, `_summary_stats`, `_render_histogram`, `export_pdf`. Same look-and-feel, no duplication.
- One-call API (`generate_glm_report`, `generate_cohort_glm_report`) with optional `--pdf`.
- CLI: `python -m libs.reporting.glm_report [cohort] ...`.
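A sketch of the discovery walk with the flat-layout fallback; the return shape and parameter names are assumptions, not the module's actual API:

```python
from pathlib import Path


def discover_glm_contrasts(derivatives: Path, subject: str,
                           task: str | None = None) -> dict[str, Path]:
    """Map contrast name -> <contrast>_z.nii.gz, falling back to a flat dir."""
    sub_dir = derivatives / "glm" / subject
    if not sub_dir.is_dir():
        return {}
    search = [sub_dir / task] if task else [d for d in sub_dir.iterdir()
                                            if d.is_dir()]
    hits: dict[str, Path] = {}
    for d in [*search, sub_dir]:      # flat subject dir is the last resort
        if not d.is_dir():
            continue
        for z in sorted(d.glob("*_z.nii.gz")):
            hits.setdefault(z.name[:-len("_z.nii.gz")], z)
    return hits
```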
Drive-by fix. `_render_orthoview` had an off-by-one on small volumes (nz≤6): `np.linspace(z_lo, nz-nz//8, ...)` could produce nz as the endpoint, indexing one past the array. Clamped to [0, nz-1]. Surfaced by Phase Q's smaller test fixtures; the rs tests didn't catch it because they use shape (16,16,8), where the maths happens to land just below the edge.
Tests (14 new, 54 total when combined with the rs report):
- discover subjects / contrasts (both task-subdir + flat layouts)
- per-subject report builds + renders cards for each contrast
- contrast filter narrows the report
- cohort report counts subjects with/without the contrast
- group section appears only when `group/` maps exist
Real-data validation — N=5 TWCF figureground V1 ROI subjects:
- Symlinked the `…contrast-attention_effect_absent_1_zstat.nii.gz` outputs into the canonical `derivatives/glm/<sub>/figureground/V1_attention_effect_absent_1_z.nii.gz` layout.
- Per-subject report for sub-bu0018: 46 KB HTML, 4 embedded images (montage + histogram per contrast). The brain montages correctly show only the V1 ROI region (tiny cluster on the sagittal mid-slice, visible on the z=27 axial) — i.e. the report faithfully renders ROI-restricted contrasts as ROI-restricted, not as whole-brain-nothing-survives-threshold.
- Cohort report for V1_attention_effect_absent_1: 61 KB HTML, 5 subject thumbnails, all rendering. z range correctly identified as [-0.85, +0.97] across subjects (ROI z values, not whole-brain).
Initial threshold bug. First real-data render came out empty because per-subject z-maps had |max|<1 and the report applied threshold=2.3. Per-subject and cohort-thumbnail views are descriptive — show what the data looks like — not inferential. Removed the threshold from those views; the group section keeps thresholding because that’s where significance claims live. Fix committed as 3778c91.
Propagated to all 4 children:
| Repo | Commit |
|---|---|
| Reproducible-fMRI | 3778c91 |
| twcf | 784e813 |
| vividness | fe36831a |
| HGN | 731aaad |
| TI_DecNef | 877b96a |
Phases O / P / Q together: the cohort report now answers a defensible inferential question (Phase O), exports as a printable PDF (Phase P), and works for task-fMRI as well as resting-state (Phase Q). All three validated on real fMRIPrep’d / GLM data on HPC, not just synthetic siblings of one demo subject.
14.10.29 2026-04-27 — Phase R: pipelines/glm/group_stats.py (closes the GLM cohort loop)
Context. Phase Q gave us a GLM cohort report with a Group section, but only if the user had separately populated derivatives/glm/group/<contrast>_zmap.nii.gz. The provided second- level script (analyses/fmri/glm/run_second_level_glm.py) writes nilearn outputs in a slightly different convention. Phase R closes the loop: a canonical group-stats step that writes exactly what the cohort report expects, with FDR + cluster-FWE corrections matching the rs version.
Implementation. New `pipelines/glm/group_stats.py`:
- `discover_contrast_maps(derivatives_root, contrast, *, task=None)` — walks `derivatives/glm/<sub>/[<task>/]<contrast>_z.nii.gz`. `task=None` searches any task subdir + the subject root; `task=""` only the root; otherwise the named task.
- `run_glm_group_stats(...)` — same shape as the rs version, writes `{contrast}_{tmap,zmap,pmap}.nii.gz` plus optional `{contrast}_zmap_fdr.nii.gz` + `{contrast}_zmap_clusterfwe.nii.gz` + a JSON sidecar with the same field structure the cohort report reads.
- Stat functions (`one_sample_t`, `fdr_threshold`, `cluster_fwe_threshold`) re-exported from `pipelines.restingstate.group_stats` — they're generic across map types, no duplication.
Tests (8 new in `tests/pipelines/test_glm_group_stats.py`):
- discover with task subdir + any-task + flat layouts
- group t-test recovers an injected blob in an N=8 cohort
- sidecar contains the correct subject list
- N<2 raises RuntimeError
- N<5 skips cluster-FWE
- end-to-end: group_stats writes → glm_report cohort renders the Group section with FDR + Cluster-FWE cards

191 tests passing across `tests/reporting/` + `tests/pipelines/` + `tests/test_restingstate_pipeline.py`.
Real-data validation. First attempt on the TWCF V1 figureground zstats failed correctly: RuntimeError: Contrast maps have inconsistent shapes: {(67, 81, 65), (66, 85, 65), (68, 76, 66), (65, 77, 66)}. The TWCF zstats are T1w-native, ROI-cropped per subject — different shapes per subject — so voxelwise group t-test isn’t well-defined on them. This is the right behaviour: the pipeline refuses to silently produce nonsense.
Pivoted validation to MNI-space data: symlinked the Phase O N=10 ALFF maps into derivatives/glm/<sub>/task-rest/main_effect_z.nii.gz, ran pipelines/glm/group_stats.py --contrast main_effect, then re-rendered the GLM cohort report. Cohort HTML grew 60KB → 1.5MB with the embedded group montages. Sidecar:
| | FDR | Cluster-FWE |
|---|---|---|
| α=0.05 | p_threshold=0.0219; 197,381 voxels survive | CDT z=3.1, n_perm=300; surviving clusters 1/1 |
Rendered Group section shows three cards (Uncorrected, FDR-corrected, Cluster-FWE) with real MNI-space brain montages. Verified the cluster-FWE card visually — same anatomy as the rs Phase O group map, unsurprising since the inputs are identical, but confirms the GLM pipeline path produces the same shape of output the rs pipeline does.
Propagated to all 4 children:
| Repo | Commit |
|---|---|
| Reproducible-fMRI | 4ac8e77 |
| twcf | 07801ee |
| vividness | 7cd14b19 |
| HGN | 88dad05 |
| TI_DecNef | 1828c59 |
What this gives the user beyond Phase Q. A two-step turnkey GLM cohort flow that mirrors the rs flow: (1) run glm/group_stats.py once, (2) render the cohort report. No bridging via run_second_level_glm.py and naming-convention translation; the output of the canonical group-stats step plugs directly into the canonical report.
14.10.30 2026-04-27 — Pipeline audit + Phase S1 (anatomical underlay + MNI mm)
Audit deliverable. docs/PIPELINE_AUDIT_2026-04.md (621 lines) walks every pipeline stage on disk, runs format checks against the real fMRIPrep’d TWCF data on HPC, opens every Phase O/P/R real-data artifact in ~/reproducible-fmri-showcase/, and pins every claim to a real path/line/output. Five exploration agents ran in parallel covering preprocessing, GLM/ROI, reporting, format modernisation, and live BIDS validation. Top recommendations: 10 prioritised actions across three tiers, with the highest impact-per-cost being visualization polish (anatomical underlay + MNI mm + ROI overlay + cluster peak table) — Phase S1-S4.
Audit highlights.
- Three previously-unknown handoff bugs surfaced: rawdata task-figureground is silently renamed to task-fg in fMRIPrep derivatives; rawdata fmaps' IntendedFor points at filenames that don't exist; bids-validator isn't on the HPC3 PATH and BIDSLayout doesn't complete in 9+ min on this dataset.
- dataset_description.json declares BIDSVersion 1.4.0 (we're in the 1.10+ era).
- `analyses/fmri/{visualization,masks,summary,stats}/` are all .gitkeep-only — advertised but empty.
- Things that work well: BIDS Stats Models with nilearn/FitLins runner dispatch, `libs/cifti_utils.py` at 610 LOC, provenance capture, container hashes, 191 tests passing.
Phase S1 — anatomical underlay + MNI mm slice labels. Closes the audit's top action item (S1, HIGH impact / LOW cost). Two changes to `libs/reporting/restingstate_report.py`'s `_render_orthoview`:
- Anatomical underlay. New `underlay: Path | None` parameter. When `None` (default), auto-loads the MNI152 template via `nilearn.datasets.load_mni152_template()` and resamples it onto the data's grid + affine via `nilearn.image.resample_to_img()` (handles the common case where TWCF data is 70×87×74 at 2.2 mm but MNI152 ships at 91×109×91 at 2 mm). Greyscale anatomy renders behind the stat map; sub-threshold voxels become transparent (NaN with `set_bad((0,0,0,0))`) so anatomy shows through.
- MNI mm slice labels. New `show_mni_mm=True` parameter. Axial slice titles now show stereotactic coordinates (z = -8 mm) computed from the image affine via `_voxel_to_mni_z()`. Falls back to the voxel index (z=20) if the affine is missing or non-finite.
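The conversion is a single affine application. A sketch (helper name from the text; argument layout assumed):

```python
import numpy as np


def _voxel_to_mni_z(affine: np.ndarray, k: int) -> float:
    """MNI z (mm) of axial slice index k: apply the affine to voxel (0, 0, k)."""
    world = affine @ np.array([0.0, 0.0, float(k), 1.0])
    return float(world[2])
```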
Tests (3 new, 43 total in tests/reporting/test_restingstate_report.py).
Real-data validation. Re-rendered the Phase O cohort report on HPC against the existing N=10 TWCF group maps. Output at /dfs10/meganakp_lab/eolsson1/sandbox/phase_o_validation/reports/cohort_phase_s1.html (1.12 MB, was 982 KB before — extra weight from underlay-rendered images). Visually verified the new ReHo cluster-FWE montage at ~/reproducible-fmri-showcase/phase_s1/cohort_reho_clusterfwe.png: ventricles cleanly excluded by cluster-FWE are now visible against the MNI152 underlay (white voids at z=+17, +41, +65 mm), brain shape recognisable, MNI mm labels read z = -56, -32, -8, +17, +41, +65 mm.
Propagated to all 4 children:
| Repo | Commit |
|---|---|
| Reproducible-fMRI | 631656c |
| twcf | d36f97c |
| vividness | 70827f99 |
| HGN | 2ea21a7 |
| TI_DecNef | 1115e08 |
Real-output showcase persisted to permanent locations (not /tmp, per user request):
- `/home/yoursurname/reproducible-fmri-showcase/phase_o_rs/` — Phase O resting-state cohort HTML + PDF + page-rasterised PNGs
- `/home/yoursurname/reproducible-fmri-showcase/phase_r_glm/` — Phase R GLM cohort HTML + extracted group montages
- `/home/yoursurname/reproducible-fmri-showcase/phase_s1/` — Phase S1 underlay + MNI-mm cohort HTML + extracted variant images
14.10.31 2026-04-27 — Phase S2-S5 (one bundle, four audit items)
Closes audit Tier A in a single sprint. All four are small, complementary changes that turn the cohort reports from “the math is right” into “I can defend the V1/V2/V3 claim from this figure alone.”
S2 — ROI overlay on group maps. New `_resolve_rois()` loads + nilearn-resamples each mask to the data's grid; `_render_orthoview` takes `rois=[Path,...]` and draws each as a coloured contour (matplotlib contour at level 0.5) on every slice. Up to 6 distinct colours. CLI: `--rois <p1> <p2> ...` on the cohort subcommands of both restingstate_report and glm_report. New `analyses/fmri/masks/fetch_visual_rois.py` — fetches the Wang 2015 retinotopic atlas if available (nilearn 0.11+), falls back to synthetic occipital-pole spheres at MNI (0, -90, 0) with V1/V2/V3 at 8/14/20 mm radii. Closes the audit's flagged empty `analyses/fmri/masks/`.
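The contour overlay itself is plain matplotlib. A sketch with hypothetical helper names, assuming each mask is already resampled to the data grid and sliced to 2-D:

```python
import matplotlib.pyplot as plt
import numpy as np

ROI_COLOURS = ["lime", "gold", "orange", "cyan", "magenta", "red"]


def draw_roi_contours(ax: plt.Axes, roi_slices: list[np.ndarray]) -> None:
    """Draw each 2-D ROI mask as a ring at the 0.5 level, one colour per ROI."""
    for i, mask2d in enumerate(roi_slices):
        if not mask2d.any():
            continue  # ROI absent on this slice
        ax.contour(mask2d.T, levels=[0.5],
                   colors=ROI_COLOURS[i % len(ROI_COLOURS)], linewidths=1.2)
```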
S3 — Cluster peak table with MNI coords. New `summarise_clusters()` in `pipelines/restingstate/group_stats.py` enumerates surviving clusters with peak voxel, peak MNI mm, peak z, cluster size in vox and mm³, and centroid MNI mm. Both `run_group_stats` and `run_glm_group_stats` write a `{kind|contrast}_clusters.tsv` and include the cluster list in the JSON sidecar. Cohort reports render the table as HTML right after the Cluster-FWE card (capped at the 10 largest).
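A condensed sketch of the cluster summary; the field names follow the TSV description, while the helper layout and sorting are assumptions:

```python
import numpy as np
from scipy import ndimage


def summarise_clusters(zmap: np.ndarray, affine: np.ndarray,
                       size_threshold: int, cdt_z: float = 3.1) -> list[dict]:
    """Enumerate surviving clusters with peak/centroid in MNI mm."""
    labels, n_lab = ndimage.label(np.abs(zmap) > cdt_z)
    voxel_mm3 = float(abs(np.linalg.det(affine[:3, :3])))
    rows = []
    for lab in range(1, n_lab + 1):
        mask = labels == lab
        size = int(mask.sum())
        if size < size_threshold:
            continue  # below the FWE-corrected size threshold
        peak_idx = np.unravel_index(np.argmax(np.abs(zmap) * mask), zmap.shape)
        peak_mm = (affine @ np.array([*peak_idx, 1.0]))[:3]
        centroid = np.array(ndimage.center_of_mass(mask))
        centroid_mm = (affine @ np.array([*centroid, 1.0]))[:3]
        rows.append({
            "size_vox": size,
            "size_mm3": size * voxel_mm3,
            "peak_z": float(zmap[peak_idx]),
            "peak_mni_mm": peak_mm.round().tolist(),
            "centroid_mni_mm": centroid_mm.round().tolist(),
        })
    return sorted(rows, key=lambda r: -r["size_vox"])
```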
S4 — Empty-coverage UX banner. When per-subject discovery returns zero for every kind but group_stats is populated, render_cohort_html emits an amber warning banner explaining that the report is being rendered against a derivatives root that doesn’t contain the per-subject NIfTIs the group maps were computed from. Closes the UX bug surfaced by the audit (PIPELINE_AUDIT_2026-04.md §4.3).
S5 — JSON sidecar. `generate_cohort_*_report` writes `<output>.json` alongside `<output>.html` by default. Base64 PNG blobs are replaced by True/False presence flags so the sidecar is small (~16 KB vs the 1.1 MB HTML) and human-readable. Suppress with `write_json_sidecar=False`.
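The sanitisation is a small recursive walk. A sketch assuming PNG payloads live under keys ending in `_png`; the real key naming may differ:

```python
import json
from pathlib import Path


def write_json_sidecar(report: dict, html_path: Path) -> Path:
    """Mirror the cohort report dict with PNG payloads reduced to flags."""
    def strip(node):
        if isinstance(node, dict):
            return {k: (bool(v) if k.endswith("_png") else strip(v))
                    for k, v in node.items()}
        if isinstance(node, list):
            return [strip(v) for v in node]
        return node

    out = html_path.with_suffix(".json")
    out.write_text(json.dumps(strip(report), indent=2))
    return out
```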
Tests (6 new, 177 total in `tests/reporting/` + `tests/pipelines/`):
- test_summarise_clusters_returns_mni_coords — affine respected
- test_cohort_report_writes_json_sidecar — sidecar present + clean
- test_cohort_report_warns_on_empty_coverage_with_group_stats
- test_render_orthoview_with_rois
- test_resolve_rois_skips_empty_masks
- test_render_orthoview_underlay_arg_accepted (Phase S1)
Real-data validation. Re-rendered the Phase O group ALFF cohort on HPC with all four S features enabled:
- Output: `/dfs10/meganakp_lab/eolsson1/sandbox/phase_s/reports/cohort_with_rois_clusters.html` (1.15 MB) + `.json` (16 KB)
- Pulled to `~/reproducible-fmri-showcase/phase_s_full/`
- Visually verified the ALFF cluster-FWE montage:
  - MNI152 underlay (light grey brain shape with sulci visible)
  - V1/V2/V3 contour rings (green/yellow/orange) at the occipital pole on sagittal + z=-8 mm axial
  - Cluster-FWE z-map (red) overlaid on top
  - MNI mm labels (z = -56 to +65 mm)
  - L/R orientation markers
- Cluster peak table HTML present:

| # | size (vox) | size (mm³) | peak z | peak MNI | centroid |
|---|---|---|---|---|---|
| 1 | 181,061 | 1,931,444 | +8.00 | (-12, -26, -1) | (-0, -22, +10) |

- JSON sidecar contains the same cluster list with full precision.
Propagated to all 4 children:
| Repo | Commit |
|---|---|
| Reproducible-fMRI | 753f244 |
| twcf | 0da1956 |
| vividness | e3f8bd6f |
| HGN | 7b84c7b |
| TI_DecNef | 6010342 |
Audit Tier A status: S1 + S2 + S3 + S4 + S5 done. Next from the audit roadmap is Tier B: PipelineDescription.json validation, NeuroVault export, .bidsignore + IntendedFor fixups for TWCF rawdata. Plus the legacy analyses/fmri/glm/run_second_level_glm.py naming-divergence cleanup (audit §3.3).
14.10.32 2026-04-27 — Tier B partial: second-level cleanup + BIDS hygiene + PipelineDescription validator
Closes audit Tier B items S6 (PipelineDescription validation) + S7 (.bidsignore + IntendedFor fixups) plus the §3.3 second-level naming divergence. NeuroVault upload (S8) deferred — needs user creds.
§3.3 — analyses/fmri/glm/run_second_level_glm.py cleanup. Outputs now land in the canonical derivatives/glm/group/<contrast>_zmap.nii.gz layout (matching pipelines/glm/group_stats.py) instead of the orphan derivatives/glm_group/<task>/group_<contrast>_z.nii.gz. Both one-sample and two-sample paths now drop a JSON sidecar with n_subjects, subjects, model — same shape as run_glm_group_stats() so the cohort report recognises the group section. Docstring points one-sample users at pipelines.glm.group_stats which has FDR + cluster-FWE + cluster peak table; the legacy script is now scoped to the two-sample case that the canonical pipeline doesn’t yet provide.
S7 — BIDS hygiene utilities (new under scripts/data/):
- `check_bidsignore.py` walks a BIDS rawdata root, lists every non-BIDS top-level entry, and reports which are covered by `.bidsignore`. Suggests prefix-based pattern additions (`_archive*`, `_backup*`, `_ingest`, `tmp*`, etc.). Exits non-zero if any uncovered entry exists, suitable as a CI preflight gate. Run on TWCF rawdata: surfaced 19 uncovered entries that the current 3-pattern `.bidsignore` misses (`_archive_*`, `_backup_*`, `participants.tsv.bak.*`, `tmp_dcm2bids`, `_ingest`).
- `fix_intended_for.py` walks every `sub-*/ses-*/fmap/*.json`, checks whether each `IntendedFor` entry references an existing file in the subject's `func/` directory, and remaps mismatched entries to the most-similar existing BOLD file via difflib. Run on TWCF sub-bu0070 (dry-run): correctly identified that every `task-fg_*_bold.nii.gz` reference should be remapped to either `task-fglocalizer_*` or `task-figureground_*`, and `task-figureGroundLocalizer_run-1` to `task-figureground_run-1`. Uses task-name + run-number entity similarity. Default dry-run; `--apply` to rewrite.
S6 — `libs/reporting/qc/collect_pipeline_description.py`. `collect_pipeline_descriptions()` walks every immediate sub-dir of `derivatives/`, parses each `dataset_description.json`, and returns a `PipelineEntry` dataclass per pipeline (name, version, bids_version, generated_by, has_dataset_description). `report_chain()` audits the chain with an optional `require=[...]` that fails if a needed pipeline is missing or undescribed. Flags BIDSVersion < 1.10 as INFO, a missing description as WARN, a missing required pipeline as MISSING.

CLI: `python -m libs.reporting.qc.collect_pipeline_description derivatives --require fmriprep --json-out chain.json`
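A sketch of the collection pass; the dataclass fields follow the text, and the `GeneratedBy` parsing is simplified:

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class PipelineEntry:
    name: str
    version: str | None
    bids_version: str | None
    generated_by: list | None
    has_dataset_description: bool


def collect_pipeline_descriptions(derivatives: Path) -> list[PipelineEntry]:
    """One entry per immediate derivatives/ sub-directory."""
    entries = []
    for d in sorted(p for p in derivatives.iterdir() if p.is_dir()):
        dd = d / "dataset_description.json"
        if not dd.exists():
            entries.append(PipelineEntry(d.name, None, None, None, False))
            continue
        meta = json.loads(dd.read_text())
        gen = meta.get("GeneratedBy") or []
        entries.append(PipelineEntry(
            name=d.name,
            version=gen[0].get("Version") if gen else None,
            bids_version=meta.get("BIDSVersion"),
            generated_by=gen,
            has_dataset_description=True,
        ))
    return entries
```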
Real-data validation on TWCF derivatives surfaced exactly the findings the audit flagged:
- `fmriprep/dataset_description.json` declares BIDSVersion 1.4.0 (audit §3.6 — confirmed)
- 70+ derivative dirs lack `dataset_description.json` (cross-task scratch, prfgeom-smoke variants, glm, glmsingle, freesurfer, qc, etc.)
- Three derivative trees are well-described and BIDS 1.10.1: fggb-251122, fggb-251204-v4, prf_standardized
Tests (6 new, 183 total). All in `tests/reporting/test_pipeline_description.py`:
- test_collect_finds_fmriprep_chain
- test_collect_skips_subject_dirs
- test_collect_marks_missing_description
- test_report_chain_flags_missing_required
- test_report_chain_flags_stale_bids_version
- test_write_manifest
Propagated to all 4 children:
| Repo | Commit |
|---|---|
| Reproducible-fMRI | 69fe2ad |
| twcf | 3164f0b |
| vividness | 5c835dbc |
| HGN | fe72611 |
| TI_DecNef | a0f0cd1 |
Audit roadmap status: Tier A (S1-S5) done. Tier B: S6, S7, §3.3 done. S8 (NeuroVault upload) deferred — needs API credentials. Tier C remaining: S9 (NiiVue WebGL viewer), S10 (surface flatmap via nilearn.plotting.plot_surf_stat_map).
14.10.33 2026-04-27 — Phase S10: fsaverage5 surface flatmap on cohort group maps
Closes audit Tier C item S10. Final volumetric → surface visualisation for the cohort report. Why this matters for figure-ground in early visual cortex: the calcarine fissure runs along the medial occipital surface, hidden in volumetric slices. Inflated medial views unwrap it so V1/V2/V3 anatomy is legible at a glance — every retinotopic paper uses this view.
Implementation. New `_render_surface()` in `libs/reporting/restingstate_report.py`:
- `nilearn.surface.vol_to_surf` projects the cluster-FWE map onto the fsaverage5 pial mesh per hemisphere.
- `nilearn.plotting.plot_surf_stat_map` renders four panels (LH lat / LH med / RH med / RH lat) using `output_file` rather than the broken `axes=`/`engine=` API path.
- `sulc_left`/`sulc_right` curvature as `bg_map` for anatomical context.
- Each panel rendered to a tempfile, then assembled into a 1×4 montage with matplotlib.
- `MPLBACKEND=Agg` force-set before the nilearn import so the surface plotter doesn't try Tk on a headless HPC compute node.
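One panel of that path looks roughly like this, using public nilearn APIs; the 1×4 montage assembly and the hemisphere loop are omitted, and the helper name is hypothetical:

```python
import os
os.environ.setdefault("MPLBACKEND", "Agg")  # headless-safe before nilearn import

from nilearn import datasets, plotting, surface


def render_surface_panel(stat_img, out_png: str,
                         hemi: str = "left", view: str = "medial") -> None:
    """Project a volumetric z-map to fsaverage5 and render one inflated view."""
    fsavg = datasets.fetch_surf_fsaverage("fsaverage5")
    pial = fsavg.pial_left if hemi == "left" else fsavg.pial_right
    infl = fsavg.infl_left if hemi == "left" else fsavg.infl_right
    sulc = fsavg.sulc_left if hemi == "left" else fsavg.sulc_right
    texture = surface.vol_to_surf(stat_img, pial)  # sample volume at the mesh
    plotting.plot_surf_stat_map(
        infl, texture, hemi=hemi, view=view,
        bg_map=sulc, threshold=2.3, output_file=out_png,
    )
```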
`build_cohort_report` and `build_cohort_glm_report` now emit a `surface_png` field per kind/contrast group entry. Both cohort renderers add a "Surface view (fsaverage5 inflated)" panel under the Cluster-FWE card. `include_surface=False` skips it.
Drive-by fixes. Removed the deprecated darkness=0.5 kwarg (TypeError on nilearn 0.11+).
Tests (1 new, 184 total): _render_surface returns None gracefully when nilearn isn’t importable.
Real-data validation. Rendered ALFF cluster-FWE on the N=10 TWCF cohort to ~/reproducible-fmri-showcase/phase_s10/alff_clusterfwe_surface_v4.png (118 KB). Output: inflated cortex with cluster-FWE z-map projected to surface; calcarine fissure visible at the medial-view pinch.
Propagated to all 4 children:
| Repo | Commit |
|---|---|
| Reproducible-fMRI | 5af9634 |
| twcf | cdd0fed |
| vividness | f9156408 |
| HGN | b59757a |
| TI_DecNef | 756155d |
Audit roadmap status (April 27): Tier A done (S1-S5), Tier B done (S6 + S7 + §3.3), Tier C done (S10). S8 (NeuroVault upload) deferred — needs API credentials. S9 (NiiVue WebGL viewer) deferred — substantial JS bundling work, lower priority than the others now that surface flatmap covers the “interactive view” need for the figure-ground use case.
Cumulative: Audit + 12 phases (S1-S10 covered as Tier A/B/C groups + run_second_level cleanup), 184 tests, end-to-end real-data validated against N=10 TWCF on UCI HPC3, all changes propagated to 4 children. The reporting layer is now world-class for figure-ground in early visual cortex: anatomical underlay + MNI mm slice labels + V1/V2/V3 ROI overlay + cluster-FWE corrected stats + cluster peak table with MNI mm + JSON sidecar + PDF export + inflated-surface view. Every claim defensible at a glance.
14.10.34 2026-04-27 — Phase T: real V1/V2/V3 + per-ROI summary table
Replaces the synthetic occipital-pole spheres with anatomically real Harvard-Oxford-derived V1/V2/V3 masks, and adds a per-ROI summary table to the cohort report so users can read off “did the signal land in V1?” numerically.
T1 — Real anatomical V1/V2/V3. `analyses/fmri/masks/fetch_visual_rois.py` gained a Harvard-Oxford fallback (between Wang 2015 and synthetic spheres):
- V1 = Intracalcarine + Supracalcarine Cortex (1,366 vox @ 2 mm)
- V2 = Lingual Gyrus + Cuneal Cortex (3,451 vox)
- V3 = Lateral Occipital Cortex (sup) + Occipital Pole (11,470 vox)
These are anatomical proxies, not retinotopic. Use a real retinotopic atlas or subject-specific localiser when retinotopic precision matters.
T2 — Per-ROI summary table. `_summarise_roi_overlaps(stat_path, roi_masks, threshold)` returns one dict per ROI with mean_z, max_z, n_above_threshold, pct_above_threshold. Both `build_cohort_report` and `build_cohort_glm_report` compute the summary on the cluster-FWE map when `--rois` is provided. Cohort renderers add an "ROI overlap" HTML table after the cluster-FWE card.
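A minimal sketch, assuming the masks already share the stat map's grid; the dict fields follow the text:

```python
import nibabel as nib
import numpy as np


def _summarise_roi_overlaps(stat_path, roi_masks: dict, threshold: float = 2.3):
    """One summary dict per ROI over a z-map; masks must match the stat grid."""
    z = nib.load(str(stat_path)).get_fdata()
    rows = []
    for name, mask_path in roi_masks.items():
        mask = nib.load(str(mask_path)).get_fdata() > 0.5
        vals = z[mask]
        vals = vals[np.isfinite(vals)]
        n_above = int((np.abs(vals) > threshold).sum())
        rows.append({
            "roi": name,
            "mean_z": float(vals.mean()) if vals.size else float("nan"),
            "max_z": float(vals.max()) if vals.size else float("nan"),
            "n_above_threshold": n_above,
            "pct_above_threshold": (100.0 * n_above / vals.size
                                    if vals.size else 0.0),
        })
    return rows
```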
Real data results (N=10 TWCF ALFF group map):
| ROI | n vox | mean z | max z | % above |z| > 2.3 |
|---|---|---|---|---|
| V1 | 1,366 | +4.42 | +6.48 | 97.6% |
| V2 | 3,451 | +4.28 | +7.38 | 94.8% |
| V3 | 11,470 | +3.77 | +7.02 | 90.5% |
(ALFF is whole-brain elevated; the table demonstrates the machinery. For a real figure-ground GLM contrast it would show the contrast’s selectivity for V1 vs V2 vs V3.)
Showcase reorganisation. The previous ad-hoc ~/reproducible-fmri-showcase/ directory has been moved into the repo at canonical, conventional locations:
- `data/atlases/visual/{V1,V2,V3}.nii.gz` (33 KB, tracked)
- `data/atlases/visual/README.md` — provenance + usage
- `docs/showcase/figures/*.png` (380 KB, tracked) — small screenshots that preview what the cohort report looks like
- `docs/showcase/full/` — full HTML/PDF/JSON outputs (gitignored) for offline review; regenerable with the make targets / CLI documented in `docs/showcase/README.md`
Propagated to all 4 children (Phase T + showcase reorg):
| Repo | Phase T commit | Showcase reorg commit |
|---|---|---|
| Reproducible-fMRI | 72535b7 | 0d5255b |
| twcf | (in chore/template-sync-phase-t) | (in chore/template-sync-showcase) |
| vividness | b74e2736 | 4fcfea72 |
| HGN | (synced) | (synced) |
| TI_DecNef | (synced) | (synced) |
14.10.35 2026-04-27 — Phase U: design matrix display in per-subject GLM report
Closes audit §2.6 finding: analyses/fmri/glm/run_first_level_glm.py emits design_matrix_run00.png etc. alongside contrast outputs but libs/reporting/glm_report.py’s per-subject view didn’t pick them up. Added discover_design_matrices() + a “Design matrices” panel at the top of each task section showing all run thumbnails. A reviewer opening the per-subject GLM report can now see the design at a glance without separately opening PDF or raw PNG files.
Tests (2 new, 16 GLM total). Propagated to all 4 children (commit f2a502e template; child sync branches landed via chore/template-sync-phase-u).
14.10.36 Audit roadmap status (final, 2026-04-27)
- Tier A (S1–S5): done.
- Tier B (S6–S8 + §3.3): S6, S7, §3.3 done. S8 (NeuroVault upload) deferred — needs API credentials.
- Tier C (S9–S10): S10 done. S9 (NiiVue WebGL viewer) deferred — substantial JS bundling, lower priority since surface flatmap covers the interactive-feel ask.
- Beyond the audit's top 10: Phase T (real anatomical V1/V2/V3 + ROI summary table), Phase U (design matrix display), showcase reorganisation into conventional in-repo locations.
End-to-end every figure-ground claim is now defensible from the cohort report alone: ROI overlay shows where, ROI summary table shows how much, cluster peak table shows MNI coordinates, surface view shows calcarine cortex anatomy, cluster-FWE shows the inferential threshold. The 1.65 MB cohort HTML at docs/showcase/full/cohort_n10_TWCF_full.html (gitignored; regenerate from real derivatives) is the canonical example.
14.10.37 2026-04-27 — Phase V: internal-review/interpretation polish
User pivoted away from public-facing items (NeuroVault) toward internal review + interpretation. Three small features in one bundle that turn the cohort report from “the math is right” into “I can interpret this without leaving the HTML.”
V1 — Anatomical labels on cluster peaks. New `_label_cluster_peaks` in `pipelines/restingstate/group_stats.py` looks up each cluster's peak in the Harvard-Oxford max-prob cortical (224 labels) + subcortical (21 labels) atlases. Writes the region into the clusters TSV and JSON sidecar. The cohort cluster table HTML gains a "region" column. Real-data verified on TWCF N=10 ALFF: peak at MNI (-12, -26, -1) labelled "Left Thalamus" — confirming the giant ALFF cluster is centred subcortically.
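The lookup is a voxel index into the max-prob atlas. A sketch using nilearn's fetcher; the function name is hypothetical and bounds checking is omitted:

```python
import numpy as np
from nilearn import datasets, image


def label_peak(peak_mni_mm) -> str:
    """Harvard-Oxford max-prob cortical label for one MNI peak (mm)."""
    atlas = datasets.fetch_atlas_harvard_oxford("cort-maxprob-thr25-2mm")
    atlas_img = image.load_img(atlas.maps)
    inv = np.linalg.inv(atlas_img.affine)           # world mm -> voxel ijk
    ijk = np.rint(inv @ np.array([*peak_mni_mm, 1.0]))[:3].astype(int)
    idx = int(atlas_img.get_fdata()[tuple(ijk)])
    return atlas.labels[idx]                        # labels[0] is "Background"
```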
V2 — Per-subject ROI z columns. When --rois is supplied, the cohort GLM table now shows one column per ROI containing each subject’s mean z within that ROI. Lets a reviewer eyeball “which subjects drive the V1 effect?” at a glance.
V3 — MRIQC links. New discover_mriqc_report() walks derivatives/mriqc/<sub>.html; if found, the per-subject row in the cohort table gets a “QC” cell linking to the MRIQC report. One click to motion + SNR plots without leaving the result HTML.
Newer atlas (per user "yes do newer nilearn"). nilearn 0.13.0 doesn't ship Wang 2015; the closest available is the Jülich cytoarchitectonic atlas (`fetch_atlas_juelich`), which has BA17 / BA18 / V3V — i.e. histological gold-standard early visual cortex. `fetch_visual_rois.py` now prefers Jülich over the Harvard-Oxford anatomical fallback:
- V1 = BA17 (calcarine) — 4,383 vox
- V2 = BA18 — 3,568 vox
- V3 = V3V — 1,251 vox
Visual contour overlay now correctly places V1/V2/V3 along the calcarine fissure rather than spread across the whole occipital region.
Showcase artifacts refreshed:
- `docs/showcase/figures/cohort_alff_clusterfwe_with_v1v2v3.png` — same image but with anatomically anchored Jülich contours
- `docs/showcase/figures/cohort_alff_surface_fsaverage.png`
- `docs/showcase/figures/example_clusters.tsv` (new) — sample cluster TSV with the region column
- `docs/showcase/full/cohort_n10_TWCF_full.html` (gitignored, regenerable) — the canonical full cohort example, 1.62 MB
Tests (187 total, no new tests added — Phase V features exercised through existing integration paths).
Propagated to all 4 children:
| Repo | Commit |
|---|---|
| Reproducible-fMRI | 02e76c5 (template + showcase refresh) |
| twcf | chore/template-sync-phase-v ff-merged |
| vividness | 3d6ffe4b |
| HGN | (synced) |
| TI_DecNef | (synced) |
Phase V4 (NiiVue WebGL viewer) deferred. Bigger scope (JS bundling, threshold UI), and the surface flatmap + ROI overlay + cluster table together already cover the interactive-feel ask.
14.10.38 2026-04-27 — Phase W: 5-feature internal-review polish
User asked for “all of that” — provenance badge + design QC + residual diagnostics + cohort diff + NiiVue. Five features in one bundle, all focused on internal review and interpretation.
W1 — Cohort provenance badge. _read_cohort_provenance() walks <group>/.provenance.json → <root>/.provenance.json → first sub-*’s .provenance.json (with _inherited_from annotation). _provenance_badge_html() renders inline pills below the cohort header: git short SHA (with (dirty) flag), container_digest first 12 chars (sha256: prefix stripped), config_hash, software versions. Surfaces lineage right next to the result so reviewers can verify which code + container + config produced it.
W2 — Design matrix QC pills. run_first_level_glm.py now also writes design_matrix_runNN.tsv next to the PNG so QC can be computed without re-fitting. _compute_design_qc() reads the TSV and reports n_volumes, n_regressors, n_motion_spikes, condition_imbalance (max var / min var across condition columns), max_corr, max_vif. Warnings raised when imbalance > 5×, max_corr > 0.85, max_vif > 5, motion_spikes > 20% of volumes. Per-subject report renders QC pills + ⚠ warning badges under each design matrix thumbnail.
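The collinearity metrics are standard. A sketch of max pairwise correlation, VIF, and condition imbalance computed from the design-matrix TSV; column selection and spike counting are simplified, and all names are assumptions:

```python
import numpy as np
import pandas as pd


def design_qc(tsv_path: str, condition_cols: list[str]) -> dict:
    """Collinearity + balance checks on a first-level design matrix."""
    X = pd.read_csv(tsv_path, sep="\t")
    C = X[condition_cols].to_numpy()
    # Max absolute pairwise correlation between regressors (off-diagonal).
    corr = np.corrcoef(C, rowvar=False)
    max_corr = float(np.abs(corr - np.eye(corr.shape[0])).max())
    # VIF_j = 1 / (1 - R^2_j), regressing column j on the remaining columns.
    vifs = []
    for j in range(C.shape[1]):
        others = np.delete(C, j, axis=1)
        beta, *_ = np.linalg.lstsq(others, C[:, j], rcond=None)
        resid = C[:, j] - others @ beta
        r2 = 1.0 - resid.var() / C[:, j].var()
        vifs.append(1.0 / max(1e-12, 1.0 - r2))
    variances = C.var(axis=0)
    imbalance = float(variances.max() / max(variances.min(), 1e-12))
    return {"max_corr": max_corr, "max_vif": float(max(vifs)),
            "condition_imbalance": imbalance}
```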
W3 — GLM diagnostics summary. run_first_level_glm.py drops glm_diagnostics.json with n_runs, n_volumes_per_run, n_regressors_per_run, regressor_names_run0. The per-subject report shows a one-liner ("Model fit: N runs, V vols, R regressors. Run-0 regressors: …").
W4 — Cohort report diff. New compare_cohort_reports(a, b, out) + python -m libs.reporting.restingstate_report compare --a a.json --b b.json --output diff.html. Reads two cohort JSON sidecars (written by Phase S5), renders an HTML side-by-side table of cohort_stats, group_stats per kind (n_subjects, threshold_z, FDR n_signif, cluster-FWE surviving + size threshold + top cluster size/peak/region), with changed rows highlighted in amber. Useful for “did re-running with different confounds change the headline numbers?” review.
W5 — NiiVue WebGL viewer. Opt-in via --niivue flag. _stage_niivue_assets() copies group NIfTIs into <output_stem>_niivue/ as relative-URL assets. _niivue_panel_html() injects a <canvas> + threshold/colormap/kind selectors loading NiiVue from unpkg CDN. Reviewer can rotate, zoom, set crosshair, change threshold/cmap interactively. Requires the HTML to be served over HTTP (python -m http.server in the report dir); file:// breaks browser fetch.
Tests (3 new, 190 total): _provenance_badge_html_empty/_renders_pills, compare_cohort_reports_writes_diff_html. Manual smoke tests passed for design-matrix QC + diagnostics paths.
Propagated to all 4 children:
| Repo | Commit |
|---|---|
| Reproducible-fMRI | 13354c3 |
| twcf | chore/template-sync-phase-w ff-merged |
| vividness | 962ad345 |
| HGN | (synced) |
| TI_DecNef | (synced) |
Cumulative for the session. Audit + Phases K through W on the reporting layer, which now stands at ~3,500 LOC (rs + glm + qc + dashboard + masks) and has been validated end-to-end on the real N=10 TWCF MNI-space cohort. The cohort report — for figure-ground in early visual cortex — now answers all of: "what region?" (anatomical labels), "did it land in V1?" (ROI overlap + per-subject ROI z), "which subjects drive it?" (per-subject ROI z column), "is the data clean?" (MRIQC link), "is the design sane?" (design matrix QC pills + diagnostics), "is the result inferentially defensible?" (cluster-FWE + threshold table), "what software made this?" (provenance badge), "did this result change vs last run?" (compare subcommand), "let me look interactively" (NiiVue WebGL). All defensible from the same HTML.