# Rclone Remote-to-Remote Transfers

> **Scope:** Multi-site. Uses UCI HPC3 as the worked example; see "Adapting for Other Universities" for your site.

Use this guide whenever you need to move data between two remote storage systems (for example SharePoint and an institutional HPC cluster) without staging files on your laptop first. The walkthrough below uses UCI’s HPC3 cluster as the concrete example, but every step is clearly labeled so teams at other universities can substitute their own hostnames, usernames, and onboarding rules.

## TL;DR (Non-Technical Quick Start)

1. **Install rclone** on your machine (Linux, macOS, or Windows).
2. **Create a key pair per device** with `ssh-keygen -t rsa -b 3072 -C "UCInetID (device)" -f ~/.ssh/hpc3-<device>` and choose a non-empty passphrase.
3. **Upload the public key** to your HPC account (for UCI see “Upload the key to HPC3” below). Other institutions usually provide a “copy your `.pub` file to `authorized_keys`” guide—follow theirs.
4. **Add/edit the HPC remote** in `rclone config`:
   - Name: `hpc`
   - Storage: `sftp`
   - Host: `hpc3.rcic.uci.edu` (replace for other clusters)
   - User: your HPC username
   - Key file: `/home/<you>/.ssh/hpc3-<device>`
5. **Copy data directly**:

   ```bash
   rclone copy sharepoint_cnclab:Projects/... hpc:/pub/<user>/projects/... \
     --transfers=16 --progress --create-empty-src-dirs
   ```

Keeping the key pair on each device skips Duo during transfers—the passphrase on the key unlocks the session instead.

---

## 1. Prerequisites

- rclone ≥ 1.65 installed (`rclone version`).
- A source remote already defined (e.g., `sharepoint_cnclab` for lab SharePoint storage).
- Access to your institution’s HPC login node (ssh + Duo or equivalent).
- For UCI-specific key requirements, see the official RCIC instructions on [generating SSH keys and uploading them to HPC3](https://rcic.uci.edu/account/generate-ssh-keys.html).
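The prerequisites above can be checked with a small shell sketch (the remote name `sharepoint_cnclab` is this guide’s example; substitute your own):

```bash
# Preflight check (sketch): confirm rclone is installed and the source
# remote is defined. 'sharepoint_cnclab' is the example remote name.
check_prereqs() {
  if ! command -v rclone >/dev/null 2>&1; then
    echo "rclone not found - install it first"
    return 1
  fi
  rclone version | head -n 1
  if rclone listremotes | grep -qx 'sharepoint_cnclab:'; then
    echo "source remote OK"
  else
    echo "source remote missing - run 'rclone config'"
    return 1
  fi
}
```

Run `check_prereqs` in a terminal; it prints the installed rclone version and whether the source remote is defined.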

## 2. Generate a Key Pair Per Device

Create a unique key pair on each machine so lost hardware can be revoked without touching other devices.

```bash
ssh-keygen -t rsa -b 3072 \
  -C "username@university.edu (device-name)" \
  -f ~/.ssh/hpc-device-name
```

**Example:**

```bash
ssh-keygen -t rsa -b 3072 \
  -C "jsmith@uci.edu (laptop)" \
  -f ~/.ssh/hpc3-laptop
```

Replace `jsmith` and `laptop` with your own username and device label.

After generating the pair, convert the private key to PEM format so rclone can decrypt the passphrase-protected key. Substitute your own key filename for `hpc3-laptop-name` (it would be `hpc3-laptop` in the example above):

```bash
ssh-keygen -p -m PEM -f ~/.ssh/hpc3-laptop-name
```

You will be prompted for the current passphrase, then asked to enter and confirm a new one (re-using the same passphrase is fine).

Notes:

- RCIC requires a non-empty passphrase; never press enter on an empty prompt.
- On Windows, run the same command in PowerShell; `ssh-keygen` ships with the built-in OpenSSH client.
- macOS/Linux store keys under `~/.ssh`; Windows stores them in `C:\Users\<you>\.ssh`.
- This creates `hpc3-laptop-name` (private) and `hpc3-laptop-name.pub` (public). The `.pub` file is safe to share with the cluster admins.
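Two quick checks after generation are cheap insurance (a sketch; the commented-out path follows the example above):

```bash
# Key hygiene (sketch): lock down permissions on the private key and print
# the fingerprint of the public key for your records.
check_key() {
  key="$1"
  chmod 600 "$key"            # private key must be readable only by you
  ssh-keygen -lf "${key}.pub" # print the public key's fingerprint
}
# check_key ~/.ssh/hpc3-laptop
```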

## 3. Upload the Key to HPC3 (Example)

UCI-specific steps (adapt if you are on a different cluster):

```bash
cd ~/.ssh
sftp username@hpc3.rcic.uci.edu:.ssh
```

During the `sftp` session:

```
sftp> put hpc3-laptop-name.pub add
sftp> ls   # verify `add` is present
sftp> quit
```

The RCIC automation copies `add` into `authorized_keys` within ~5 minutes. Afterwards:

```bash
ssh -i ~/.ssh/hpc3-laptop-name username@hpc3.rcic.uci.edu
```

You should be prompted only for the key’s passphrase; Duo is skipped because key-based authentication satisfies the MFA requirement. For other universities, follow their published “upload your public key” directions; the only requirement is that the `.pub` content ends up inside your HPC account’s `~/.ssh/authorized_keys`.
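For clusters without upload automation, `ssh-copy-id` performs the append for you. A hedged sketch (the hostname in the commented-out call is a placeholder; your site’s login node goes there):

```bash
# Generic public-key upload (sketch). ssh-copy-id appends the key to
# ~/.ssh/authorized_keys on the remote side and fixes its permissions.
upload_pubkey() {
  ssh-copy-id -i "$1" "$2"
}
# upload_pubkey ~/.ssh/hpc3-laptop-name.pub username@login.cluster.edu
```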

## 4. Define the HPC Remote in rclone

Run `rclone config` or edit `~/.config/rclone/rclone.conf` manually. Minimal stanza:

```
[hpc]
type = sftp
host = hpc3.rcic.uci.edu        # replace with your institution’s login node
user = username                 # replace with your username
key_file = /home/<you>/.ssh/hpc3-laptop-name
```

Guidelines:

- Keep one remote name (e.g., `hpc`) that every project references. Only the `key_file` path differs per device.
- If your cluster mandates Duo even with keys, remove `key_file` temporarily to fall back to password authentication.
- Never commit live tokens or private keys into git. Store secrets in a password manager (see Section 6).
- Finish by running:

  ```bash
  rclone config password hpc key_file_pass
  ```

  (Replace `hpc` with your remote name.) This stores the key’s passphrase in obscured form so rclone can decrypt the PEM key. If you prefer `ssh-agent`, load the key there instead.
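Once the stanza and passphrase are in place, a quick listing confirms everything end to end (a sketch; `hpc` is the remote name defined above):

```bash
# Smoke test (sketch): list directories in your home on the cluster. If
# this succeeds without a Duo prompt, key-based auth is working.
test_hpc_remote() {
  rclone lsd hpc:
}
# test_hpc_remote
```
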

## 5. Run Remote-to-Remote Transfers

Example command streaming SharePoint → HPC3:

```bash
rclone copy sharepoint_cnclab:Projects/ARC_myproject_TWCF/myproject/batch/... \
  hpc:/path/to/user/projects/myproject/batch/... \
  --transfers=16 \
  --progress \
  --create-empty-src-dirs
```

Replace paths with your project layout. `copy` preserves the source; use `sync` if you want deletions to propagate (use carefully). `--transfers` controls parallelism; HPC3 typically tolerates 8–16 parallel streams.
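Because `sync` deletes destination files that are missing from the source, previewing first is cheap insurance. A sketch (arguments are the source and destination remotes; the commented-out call uses this guide’s example names):

```bash
# Preview a sync (sketch): --dry-run (-n) reports what would be copied or
# deleted without changing anything.
preview_sync() {
  rclone sync -n -P "$1" "$2"
}
# preview_sync sharepoint_cnclab:Projects/myproject hpc:/pub/<user>/projects/myproject
```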

> ✅ Tip for Windows users: if your SharePoint mount shows up as `S:\Projects\...`, the equivalent rclone path is `sharepoint_cnclab:Projects/...`—rclone talks to SharePoint’s API directly and does not rely on the drive letter.

### Example: Export fMRIPrep/FreeSurfer outputs to SharePoint

`fmriprep` and `freesurfer` outputs contain many symlinks (surface caches, per-session derivatives), so include `--copy-links` to copy the file targets rather than the lightweight symlink stubs. Always keep `-P` (or `--progress`) enabled so you can monitor long-running exports.

**Option A – Launch from your laptop/desktop (remote-to-remote):**

```bash
for folder in fmriprep freesurfer; do
  rclone copy -P \
    hpc:/path/to/user/projects/myproject/derivatives/${folder} \
    sharepoint_cnclab:Projects/DecNef/derivatives/${folder} \
    --create-empty-src-dirs \
    --copy-links \
    --transfers=16 --checkers=16 \
    --modify-window=2s
done
```

- Requires both `hpc` (SFTP) and `sharepoint_cnclab` remotes in your local `rclone.conf`.
- Your workstation orchestrates the transfer; data streams through it in memory without being written to local disk (server-side copies only happen within a single remote).

**Option B – Run directly on the HPC login node:**

```bash
for folder in fmriprep freesurfer; do
  rclone copy -P \
    /path/to/user/projects/myproject/derivatives/${folder} \
    sharepoint_cnclab:Projects/DecNef/derivatives/${folder} \
    --create-empty-src-dirs \
    --copy-links \
    --transfers=16 --checkers=16 \
    --modify-window=2s
done
```

- Works even if your local machine lacks the `hpc` remote—only the SharePoint remote must be configured on the login node.
- Use `module load rclone` (or the site-equivalent) first when rclone isn’t already on the `PATH`.

Optional quality-of-life flags for either flow:

- `-n` / `--dry-run` to preview the transfer.
- `--bwlimit 20M` (adjust to taste) if the cluster enforces bandwidth caps.
- `--exclude "sourcedata/**"` etc. when you need to skip subtrees.
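These flags combine naturally into a preview step before a long export (a sketch; `rclone size` totals up the source so counts can be compared afterwards):

```bash
# Preview an export (sketch): total the source, then dry-run the copy with
# the flags you intend to use for the real transfer.
preview_export() {
  src="$1"; dst="$2"
  rclone size "$src"
  rclone copy -n -P "$src" "$dst" --copy-links --exclude "sourcedata/**"
}
# preview_export hpc:/path/to/user/projects/myproject/derivatives/fmriprep \
#   sharepoint_cnclab:Projects/DecNef/derivatives/fmriprep
```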

### Optional: `rclone move` + HTML archiving

Some destinations (SharePoint/OneDrive) rewrite HTML files on upload, so rclone’s post-transfer size check fails and repeats forever. Bundle the HTML reportlets before running `rclone move`, then upload the tarball separately. If your destination leaves HTML untouched, you can skip this section.

> Shortcut: run `bash scripts/export_derivatives_to_sharepoint.sh` (defaults assume the myproject BIDS derivatives layout and the `hpc`/`sharepoint_cnclab` remotes). The script performs the exact steps below—archive HTML, move the remaining data, copy the archives, and optionally clean up. Override paths/remotes via flags or environment variables documented in the script header.

The helper script supports both `--mode move` (delete from HPC once uploaded) and `--mode copy` (leave HPC data intact for follow-up analyses). By default HTML reportlets are wrapped into tarballs; set `--html-content-type application/octet-stream` if your destination mangles inline HTML and you prefer direct uploads.

#### Copy only specific subjects (manual command)

When you need to stage a single BIDS subject (for example `sub-uci04`) without touching the rest of the derivatives tree, use rclone’s filter syntax so only the requested folders/files are transferred:

```bash
rclone copy -P \
  hpc:/path/to/user/projects/myproject/derivatives/batch-YYYYMMDD \
  sharepoint_cnclab:Projects/ARC_myproject_TWCF/myproject/batch/fg-20251108/derivatives \
  --filter '+ freesurfer/sub-uci04/**' \
  --filter '+ fmriprep/sub-uci04/**' \
  --filter '+ fmriprep/sub-uci04*.html' \
  --filter '- **' \
  --copy-links \
  --header-upload Content-Type:application/octet-stream \
  --transfers=8 --checkers=8 \
  --modify-window=2s
```

- Replace the subject ID (`sub-uci04`) and base paths for your project. Add more `--filter '+ fmriprep/sub-<ID>*.html'` lines when multiple subjects are needed.
- The `--filter '- **'` line rejects everything that wasn’t explicitly whitelisted; omit it if you still want dataset-level metadata copied.
- For short HTML-only transfers, you can drop `--copy-links` and the symlink-heavy directories to shave time. If SharePoint continues to rewrite HTML despite the `Content-Type` override, archive those reportlets first (`tar czf html_sub-uci04.tar.gz sub-uci04*.html`) and upload the archive instead.
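When several subjects are needed, generating a filter file and passing it with `--filter-from` keeps the command readable. A minimal sketch assuming the same fmriprep/freesurfer layout (paths in the commented-out rclone call are placeholders):

```bash
# Build a filter file for a list of subjects (sketch), then hand it to
# rclone via --filter-from instead of repeating --filter flags.
make_filter_file() {
  out="$1"; shift
  : > "$out"
  for sub in "$@"; do
    printf '+ freesurfer/%s/**\n' "$sub" >> "$out"
    printf '+ fmriprep/%s/**\n' "$sub" >> "$out"
    printf '+ fmriprep/%s*.html\n' "$sub" >> "$out"
  done
  printf -- '- **\n' >> "$out"   # reject everything not whitelisted above
}
make_filter_file subjects.filter sub-uci04 sub-uci05
# rclone copy -P hpc:/...source... sharepoint_cnclab:...dest... \
#   --filter-from subjects.filter --copy-links
```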

**1. Archive HTML reportlets on the HPC side**

Create a timestamped archive per derivatives folder so the original HTML can be deleted after upload:

```bash
cd /path/to/user/projects/myproject/derivatives
timestamp=$(date +%Y%m%d_%H%M%S)
mkdir -p html_archives
for folder in fmriprep freesurfer; do
  find "${folder}" -type f -name '*.html' -print0 |
    tar --null -czf "html_archives/${folder}_html_${timestamp}.tar.gz" \
    --directory="${folder}" --files-from=-
done
```

> Swap `/path/to/user/...` for your own derivatives root. The tarball only contains HTML—it does not modify the rest of the outputs.

**2. Move the remaining data (excluding HTML)**

Use `rclone move` so files disappear from HPC as soon as they land on the destination. Excluding `*.html` keeps the originals in place until the tarball is uploaded.

```bash
for folder in fmriprep freesurfer; do
  rclone move -P \
    hpc:/path/to/user/projects/myproject/derivatives/${folder} \
    sharepoint_cnclab:Projects/DecNef/derivatives/${folder} \
    --create-empty-src-dirs \
    --delete-empty-src-dirs \
    --copy-links \
    --exclude '**/*.html' \
    --transfers=12 --checkers=12 \
    --modify-window=2s
done
```

- Stop any previous `rclone copy` run (`Ctrl+C`)—the `move` command above skips files that already exist with matching size/timestamp, so reruns are safe.
- To run directly on the login node instead of remote-to-remote, replace the `hpc:/...` source with the absolute `/pub/...` path.

**3. Upload the tarballs and clean up**

```bash
rclone copy -P \
  hpc:/path/to/user/projects/myproject/derivatives/html_archives \
  sharepoint_cnclab:Projects/DecNef/derivatives/html_archives \
  --transfers=4 --checkers=4 \
  --modify-window=2s

# After verifying the archives on the destination:
rm html_archives/*.tar.gz
find fmriprep freesurfer -type f -name '*.html' -delete
```

- Keep the tarballs only if collaborators need a local copy of the reportlets; otherwise delete them after SharePoint upload to reclaim space.
- If your destination storage never rewrites files, you can skip the archive step and run `rclone move` without `--exclude '**/*.html'`.
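“Verifying” can be made concrete: `rclone check` compares the two sides before anything is deleted (a sketch; paths are the example layout from above):

```bash
# Verify uploaded archives before deleting the originals (sketch).
# --one-way only requires that every source file exists at the destination.
verify_archives() {
  rclone check \
    hpc:/path/to/user/projects/myproject/derivatives/html_archives \
    sharepoint_cnclab:Projects/DecNef/derivatives/html_archives \
    --one-way
}
# verify_archives && echo "archives verified"
```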

## 6. Sharing Configuration Across Devices

1. **Template in version control:** Keep a sanitized `rclone.conf.template` (no tokens, no secrets) in your dotfiles so new machines know the expected remote names.
2. **Secrets in 1Password:** Store the real `rclone.conf` as a secure document in 1Password. Provisioning flow:
   - `op document get rclone.conf --vault "<vault>" > ~/.config/rclone/rclone.conf`
   - `chmod 600 ~/.config/rclone/rclone.conf`
   - Update `key_file` to point at the local device’s private key (`hpc3-laptop-name`).
3. **Automation hook:** Optionally add a bootstrap script (see `scripts/` in this repo) that checks whether the key file exists and prints the exact `sftp put` command if it does not. Non-technical lab members can run one script and follow the on-screen prompts.

### Per-Device Checklist

1. Generate a new key pair (`ssh-keygen ... -f ~/.ssh/hpc3-<device>`).
2. Upload `<device>.pub` to the cluster (`sftp ... put ... add`).
3. Append `add` into `authorized_keys` (manually or wait for automation).
4. Update that device’s `rclone.conf` `key_file` path and supply the key passphrase via `rclone config`.
5. Test with `rclone lsd hpc:/path` before running `copy` or `sync`.

### Optional Bootstrap Script

Run `scripts/setup-rclone-hpc.sh` to automate most of the manual work:

```bash
cd Reproducible-fMRI
bash scripts/setup-rclone-hpc.sh
```

The script generates (or reuses) the key pair, converts it to PEM, prints the `sftp` upload instructions, updates the `hpc` remote (`host`, `user`, `key_file`), prompts for the key-file passphrase via `rclone config password`, and runs `rclone lsd` as a smoke test. Forks can copy this script and adjust the defaults to match their cluster.

## 7. Common Workflows Enabled

- **Upload raw data** from SharePoint to `/pub/<user>/...` before launching HPC jobs.
- **Download HPC outputs** back to SharePoint (reverse the source/destination) when fMRIPrep or other pipelines finish.
- **Bidirectional sync** (with `rclone sync`) to keep project scaffolding consistent between local repos, SharePoint, and HPC—useful when rapidly switching between local prototyping and cluster-scale runs.
- **Stage configs or toolboxes** by copying smaller `code/` or `toolboxes/` directories between environments without re-cloning.
- **Daisy-chain additional remotes**, e.g., push HPC results directly into Google Drive or object storage by swapping out the destination remote.

## 8. Adapting for Other Universities

- **Cluster hostname/usernames:** Substitute your site’s SSH gateway for `hpc3.rcic.uci.edu` and use your local username.
- **Key policies:** Some sites require Ed25519 keys, others enforce hardware tokens. Swap the `ssh-keygen` arguments to match (`-t ed25519`, etc.) but keep the “per device, passphrase-protected” rule.
- **Firewall rules:** If outbound SFTP must traverse VPN, run `rclone copy ... --sftp-port <port>` or add `ProxyCommand` in `~/.ssh/config`. Document these deviations in your fork so collaborators know the institution-specific requirements.
- **Storage roots:** Replace `/path/to/user/...` with your HPC’s project directory (e.g., `/scratch/<group>/datasets`). The remote-to-remote concept stays identical—only paths change.
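If your site requires a jump host, one option (hedged: it assumes the sftp backend’s `ssh` option, which delegates connections to the system OpenSSH client so that `~/.ssh/config` applies) is to keep the proxy settings in `~/.ssh/config`:

```
# ~/.ssh/config (sketch; gateway.university.edu is a placeholder)
Host hpc3.rcic.uci.edu
    User username
    IdentityFile ~/.ssh/hpc3-laptop-name
    ProxyJump gateway.university.edu
```

Then add `ssh = ssh` to the `[hpc]` stanza in `rclone.conf` so rclone shells out to the system client instead of its built-in SSH library.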

> 📌 **Reminder:** Keep this document in sync with your institution’s official HPC guidance. If an onboarding email contradicts this example (different hostnames, extra MFA), update the “Adapting for Other Universities” section and hyperlink their doc so downstream labs can adjust safely.


