# Backup & Restore

Weavestream stores persistent data in four locations. All of them live under `$DATA_DIR` (default: `./data` next to `compose.yml`).

## What to Back Up

| Location | Contents | Back up? |
|---|---|---|
| `$DATA_DIR/postgres` | Postgres data directory — all your records | **Yes — critical** |
| `$DATA_DIR/files`    | Uploaded files (attachments, thumbnails, logos, export PDFs) | **Yes — important** |
| `$DATA_DIR/backup`   | Scheduled Postgres dumps + manifests written by the worker | **Yes — if scheduled exports are enabled** |
| `$DATA_DIR/redis`    | Session data, BullMQ queues, cache | Optional (replayable) |

Redis data is largely replayable — queued jobs will retry, sessions will expire, and users will log in again. The Postgres directory and the files directory are irreplaceable without a backup. The `$DATA_DIR/backup` directory holds dump files produced by the in-app schedule below, and your external backup routine should treat it the same as `$DATA_DIR/postgres` and `$DATA_DIR/files` — copy it off-host on a regular cadence.

## In-App Scheduled Postgres Export

Operators with the `BACKUP_MANAGE` capability (or any `SUPER_ADMIN`) can configure scheduled Postgres exports under **Admin → Backups**. Each schedule produces a `pg_dump --format=custom` file plus a `manifest.json` sidecar in `$DATA_DIR/backup` on the Docker host.
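
Each run therefore leaves a dump plus its sidecar on disk. An illustrative listing (the exact file names, in particular the sidecar's, are an assumption based on the restore example later on this page):

```bash
$ ls "$DATA_DIR/backup"
weavestream-postgres-2026-04-23T03-00-00Z.dump
weavestream-postgres-2026-04-23T03-00-00Z.manifest.json
```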

### Configure a schedule

1. Go to **Admin → Backups**.
2. Click **New schedule**.
3. Pick a cron preset (Daily 03:00, Weekly Sun 03:00, Monthly 1st 03:00) or enter a raw 5-field cron pattern (equivalent patterns are shown below the steps).
4. Set the timezone (defaults to `Etc/UTC`).
5. Set retention:
   - **Keep last** — bucket-agnostic floor; always retains the N most-recent successful runs (default 3). Useful for testing or multiple manual triggers within the same day so they don't immediately prune each other.
   - **Daily / Weekly / Monthly** — GFS counts; one run per distinct day / ISO week / calendar month, newest first.
   - The most recent successful run is always retained regardless of these numbers.
6. (Optional) Add notification recipients. Failures always email; toggle "Email on success too" if you also want a confirmation per run.
7. Save. The schedule begins firing on the next matching cron tick.
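
For reference, the three presets correspond to these standard 5-field patterns (minute, hour, day of month, month, day of week, with Sunday as 0); a hand-entered pattern behaves the same way:

```
# minute hour day-of-month month day-of-week
0 3 * * *    # Daily 03:00
0 3 * * 0    # Weekly, Sunday 03:00
0 3 1 * *    # Monthly, 1st at 03:00
```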

### Run-now and downloads

Press **Run now** on any enabled schedule to trigger an immediate run. The History tab polls for status and surfaces:

- A **Download** button on every successful run (streams the dump back through the API; the API container has the backup directory mounted read-only).
- A **Manifest** button that opens the JSON sidecar with the Weavestream version, Prisma migration hash, active password-encryption `kid`, generated-at timestamp, dump size, and SHA-256 checksum.
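
A quick way to sanity-check a downloaded dump is to compare it against the manifest's checksum. A minimal sketch, assuming the manifest exposes the checksum under a `sha256` key (the real key name may differ) and that `jq` is installed:

```bash
# Verify a downloaded dump against its manifest sidecar.
# The `sha256` field name is an assumption; check your manifest for the actual key.
expected=$(jq -r '.sha256' manifest.json)
actual=$(sha256sum weavestream-postgres-2026-04-23T03-00-00Z.dump | awk '{print $1}')

if [ "$expected" = "$actual" ]; then
  echo "checksum OK"
else
  echo "checksum mismatch" >&2
  exit 1
fi
```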

### Concurrency safety

Every run takes a Postgres advisory lock around the whole job. A second run that fires while another is in flight is marked `failed` with `error = 'concurrent'` instead of racing — operators see the skip on the History tab and (if configured) receive a notification email.
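
If you need to confirm whether a run is still holding the lock (for example before restarting a stuck worker), you can list advisory locks from the Postgres container. The lock key itself is internal to Weavestream, so this only shows that *an* advisory lock is held:

```bash
# Lists advisory locks; an in-flight scheduled export holds one until it finishes.
docker compose exec -T postgres \
  psql -U "$POSTGRES_USER" -d "$POSTGRES_DB" \
  -c "SELECT pid, granted FROM pg_locks WHERE locktype = 'advisory';"
```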

### What's in the dump

The exported `pg_dump --format=custom` file contains every Weavestream table and is portable across hosts. It does **not** contain:

- The contents of `$DATA_DIR/files`. Uploaded attachments live on the filesystem, not in Postgres — back them up separately (see below).
- Your `.env` secrets. The `PASSWORD_ENCRYPTION_KEY`, `JWT_SIGNING_KEY`, etc. are required to decrypt vault data after a restore. Keep a copy of `.env` in a secrets vault.
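
A low-tech way to cover that last point is to copy `.env` with owner-only permissions into the same off-host secrets location the restore example on this page reads from (`/backup/secrets` is illustrative):

```bash
# Keep a restrictive-permission copy of the secrets next to the backups.
install -m 600 -D .env /backup/secrets/.env
```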

## Postgres Backup (manual / external)

Even with the in-app schedule enabled, you may still want an external `pg_dump` cron — for example to push directly to S3 from a separate host.

### Logical dump (recommended)

```bash
docker compose exec -T postgres \
  pg_dump -U "$POSTGRES_USER" "$POSTGRES_DB" \
  | gzip > "backup-$(date +%F).sql.gz"
```

This works while the stack is running and always produces a transactionally consistent snapshot.
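
If the reason for an external cron is pushing to S3, the dump can also be streamed straight to a bucket without touching local disk. A sketch using the AWS CLI on the Docker host (bucket and key names are illustrative; `--format=custom` matches what the in-app schedule produces):

```bash
docker compose exec -T postgres \
  pg_dump -U "$POSTGRES_USER" --format=custom "$POSTGRES_DB" \
  | aws s3 cp - "s3://my-backup-bucket/weavestream/backup-$(date +%F).dump"
```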

### Physical backup

For very large databases, you can also rsync the data directory while Postgres is stopped:

```bash
docker compose stop postgres
rsync -a "$DATA_DIR/postgres/" /backup/postgres/
docker compose start postgres
```

## File Storage Backup

The api and worker containers write files atomically (write to a temp sibling, then `rename`), so an `rsync` of the host directory while the stack is running never sees partial files. Just copy the directory:

```bash
rsync -a "$DATA_DIR/files/" "/backup/files-$(date +%F)/"
```

If you prefer to stop the stack first (e.g. for a snapshot-consistent NAS share), it works the same way:

```bash
docker compose stop api worker
rsync -a "$DATA_DIR/files/" /backup/files/
docker compose start api worker
```

## Off-Host Routine

The in-app schedule writes to `$DATA_DIR/backup` on the same Docker host. To survive a host loss, you still need to copy that directory off-host. A minimal nightly cron looks like this:

```bash
#!/bin/bash
set -euo pipefail

BACKUP_DIR="/opt/backups/weavestream"
COMPOSE_DIR="/opt/weavestream"
DATA_DIR="$COMPOSE_DIR/data"
DATE=$(date +%F)

mkdir -p "$BACKUP_DIR"

# In-app dumps + manifests already produced by the worker
rsync -a "$DATA_DIR/backup/" "$BACKUP_DIR/postgres-dumps/"

# Uploaded files
rsync -a "$DATA_DIR/files/" "$BACKUP_DIR/files-$DATE/"

# Rotate — keep last 30 days of file snapshots; the dumps are
# pruned by Weavestream's own GFS retention so we don't touch them.
find "$BACKUP_DIR" -maxdepth 1 -name "files-*" -type d -mtime +30 -exec rm -rf {} +

echo "Weavestream off-host sync complete: $DATE"
```

If you prefer a one-shot external `pg_dump` instead of the in-app schedule, the legacy command in [Postgres Backup](#postgres-backup-manual--external) still works on the Docker host; it can also run from any other machine with the Postgres client installed, provided that machine can reach the database (see the sketch below).
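
A remote variant looks roughly like this, assuming the backup host can reach Postgres over the network (you may need to publish or tunnel the port; the host name is illustrative):

```bash
# Runs from a separate backup host; needs the Postgres client tools and network access.
PGPASSWORD="$POSTGRES_PASSWORD" pg_dump \
  -h weavestream-db.example.internal -p 5432 \
  -U "$POSTGRES_USER" "$POSTGRES_DB" \
  | gzip > "backup-$(date +%F).sql.gz"
```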

## Restore from Backup

A clean restore involves three things, in order:

1. The original `.env` (specifically `PASSWORD_ENCRYPTION_KEY`, `PASSWORD_PREVIOUS_KEYS`, `JWT_SIGNING_KEY`, `MFA_ENCRYPTION_KEY`).
2. `$DATA_DIR/files`.
3. The Postgres database.

### Restore on a new host

```bash
# 1. Install Docker, place compose.yml + .env, restore secrets
mkdir -p /opt/weavestream && cd /opt/weavestream
curl -O https://raw.githubusercontent.com/Weavestream/Weavestream/main/compose.yml
cp /backup/secrets/.env .env
mkdir -p data
rsync -a /backup/files-2026-04-23/ data/files/

# 2. Start only Postgres and let it initialise
docker compose up -d postgres

# 3. Restore the dump from the in-app schedule
#    — the file is whatever weavestream-postgres-<timestamp>.dump you copied
#    from the source host's $DATA_DIR/backup directory.
docker compose exec -T postgres sh -lc \
  'pg_restore --clean --if-exists --no-owner --no-acl \
     -U "$POSTGRES_USER" -d "$POSTGRES_DB"' \
  < /backup/postgres-dumps/weavestream-postgres-2026-04-23T03-00-00Z.dump

# 4. Bring the rest of the stack up — `prisma migrate deploy` runs on
#    api boot and is a no-op if the dump already includes every
#    migration recorded in the manifest.
docker compose up -d
```
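
To confirm the restore actually landed, you can check that the Prisma migrations table came across; `_prisma_migrations` is Prisma's default table name:

```bash
docker compose exec -T postgres sh -lc \
  'psql -U "$POSTGRES_USER" -d "$POSTGRES_DB" \
     -c "SELECT count(*) FROM _prisma_migrations;"'
```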

### Restore from a legacy `pg_dump` SQL dump

If you used the older `pg_dump | gzip` cron and have a `.sql.gz` file:

```bash
# Stop the api, web, and worker (but keep postgres running)
docker compose stop api web worker

# Drop and recreate the database. Use separate -c flags: DROP DATABASE cannot
# run inside the implicit transaction that a multi-statement -c creates.
docker compose exec -T postgres sh -lc \
  'psql -U "$POSTGRES_USER" -d postgres \
     -c "DROP DATABASE IF EXISTS $POSTGRES_DB;" \
     -c "CREATE DATABASE $POSTGRES_DB OWNER $POSTGRES_USER;"'

# Restore
gunzip -c backup-2026-04-23.sql.gz | \
  docker compose exec -T postgres sh -lc \
    'psql -U "$POSTGRES_USER" -d "$POSTGRES_DB"'

# Start everything back up — migrations run automatically
docker compose up -d
```

### Restore File Storage

```bash
docker compose stop api worker
rsync -a /backup/files-2026-04-23/ "$DATA_DIR/files/"
docker compose start api worker
```

### Post-restore verification

After restoring, log into the admin and confirm:

- Login works for an existing account (proves session and JWT keys match).
- Clicking any saved password shows the plaintext value (proves `PASSWORD_ENCRYPTION_KEY` matches the dump's `passwordEncryptionKid` from the manifest).
- A recently uploaded image renders (proves `$DATA_DIR/files` is in place and readable).
- A fresh **Run now** export from **Admin → Backups** completes successfully (proves the new host can talk to Postgres and write to `$DATA_DIR/backup`).

## Off-Site Backups

For production deployments, replicate backups off-site:

- **NAS or external drive** — rsync `$DATA_DIR/backup` and `$DATA_DIR/files` to a mounted share.
- **S3-compatible storage** — `aws s3 sync`, `rclone`, or `restic` to Backblaze B2, Wasabi, or AWS S3.

Example with restic:

```bash
# Assumes the repository was initialised once with `restic init` and that
# RESTIC_PASSWORD plus S3 credentials are available in the environment.
restic -r s3:https://s3.amazonaws.com/my-backup-bucket backup \
  "$DATA_DIR/backup" "$DATA_DIR/files"
```

The in-app dumps are bit-for-bit identical to a manual `pg_dump --format=custom`, so they deduplicate well across nightly snapshots in restic and rclone.
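
If restic also handles the off-site retention mentioned in the checklist below, a matching forget policy might look like this (the kept counts are illustrative):

```bash
restic -r s3:https://s3.amazonaws.com/my-backup-bucket forget \
  --keep-daily 7 --keep-weekly 4 --keep-monthly 6 --prune
```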

## Backup Checklist

- [ ] **Admin → Backups** has at least one enabled schedule producing successful runs.
- [ ] `$DATA_DIR/backup` is part of your off-host sync.
- [ ] `$DATA_DIR/files` is part of your off-host sync.
- [ ] (Optional) An external `pg_dump` cron runs from a separate host as a second line of defence.
- [ ] Backups replicated off-site.
- [ ] Retention policy set (in-app GFS plus an off-site retention).
- [ ] Restore tested at least once on a fresh Docker host (a backup you've never restored is not a backup).
- [ ] `.env` file backed up separately (contains secrets not in the database).
