Changelog

All notable changes to ShoreGuard are documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[0.40.0] - 2026-06-20

Compatibility: no new gateway requirement — these features reuse existing OpenShell RPCs and add no new upstream surface, so the same OpenShell v0.0.57+ as 0.39.0 applies. The ShoreGuard database upgrades in place via the embedded migrations 109–113 on startup; no manual step. Every new behavior is off unless its SHOREGUARD_* flag is set (the rate governor and denial replay are off even in --local).

A control-plane response to where OpenShell users feel the most pain (deep-research driven): runaway inference spend with no native visibility, no per-agent throttle, no cross-gateway tenant boundary, trial-and-error policy authoring, and silent gateway-restart blast radius. ShoreGuard already holds the keys and meters every sandbox, so these are squarely the control plane's job.

Added

Estimated-dollar cost overlay for inference spend (Spend Governor, stage 1). OpenShell logs security events but not tokens or dollars, and its L7 proxy strips usage metadata — so despite repeated $300–1,300/mo runaway-spend reports, operators had no spend figure at all. ShoreGuard already meters per-sandbox inference request counts; a new SHOREGUARD_PRICING_* price table (a per-provider-type or flat per-request rate) turns those counts into an estimated dollar amount, surfaced in the dashboard top-spenders table, the sandbox usage card, and the daily digest. A budget can now be set as a dollar ceiling (limit_usd) instead of a request count, taking precedence when both are present. Honestly labelled "estimated": there is no token-accurate cost until an upstream usage RPC lands. On by default in --local mode unless SHOREGUARD_PRICING_ENABLED is set explicitly (feat(budgets)).
Tenants — a gateway-grouped visibility boundary. OpenShell is single-operator with no tenant model and no per-tenant observability (its maintainers' #1 enterprise gap), and ShoreGuard itself showed the whole fleet to every user. A tenant groups gateways and users; a non-admin user assigned to tenants now sees only their tenants' gateways in the gateway list, fleet overview, and on-demand digest, with a per-tenant spend/health rollup. This is a visibility boundary only — never data-plane namespace/quota/GPU isolation (that stays OpenShell's job). Admins, the --no-auth bypass, and users in no tenant always see the full fleet (fail-open); a transient DB error fails closed (503). Every identity-free background loop (metering, health, drift, discovery, cert rotation, metrics, /readyz) and the pushed daily digest stay fleet-wide; the audit hash chain stays global. Admin CRUD at /api/tenants and a new Tenants admin page; gated by SHOREGUARD_TENANT_ENABLED (feat(tenants)).
Gateway restart reconciler — surface the blast radius. A gateway/Docker restart destroys all sandboxes and is the highest-engagement reliability pain in the OpenShell community; ShoreGuard can't prevent it (it's data-plane), but it now surfaces it. The health loop snapshots each gateway's sandboxes + provider attachments on every successful probe, and on an unreachable → recovered transition diffs the last pre-down snapshot against a fresh one and fires a gateway.sandboxes_reaped webhook (with the reaped sandboxes + lost attachments), also summed into the daily digest. An "at risk on restart" dashboard badge flags gateways whose OpenShell version is below SHOREGUARD_RECONCILER_RESTART_SAFE_MIN_VERSION. Read APIs at /api/gateways/{gw}/reconciler/{reaps,inventory} and a fleet-wide /api/reconciler/{recent,at-risk}; snapshots/reaps are pruned by the cleanup task. Surfacing/diagnosing only — never self-healing the host daemon, and honestly time-decaying as upstream ships restart-safe state (feat(reconciler)).
Spend Governor stage 2: per-sandbox rate ceilings with a reversible soft-pause. OpenShell has only a gateway-wide limiter, so one OpenClaw-style retry/parallel storm can exhaust a shared provider key with only the hard kill switch to stop it. A per-sandbox request-rate ceiling (max_requests per tumbling window of window_seconds, evaluated from the metered counts, with no second log poll) now trips a reversible soft-pause: it detaches the sandbox's providers like the kill switch, but records them in its own rate_pause_entries table with an auto-resume cooldown, sitting between budgets and the hard kill switch. It never writes KillSwitchEntry, skips any sandbox already kill-switched, and re-attaches only what it detached. New rate.paused/rate.resumed webhook events, a digest "rate-paused" line, per-sandbox rate-limit CRUD at /api/gateways/{gw}/sandboxes/{name}/rate-{limit,status} and a fleet-wide /api/rate-governor/paused, plus a sandbox-detail UI card. Off by default even in local mode (SHOREGUARD_RATEGOV_ENABLED); "per-agent" here means per-sandbox, and "requests" remain proxy log lines, not tokens (feat(rate-governor)).
Policy Simulator — narrowness gate + best-effort denial replay. Policy authoring is trial-and-error and approval chunks often propose a grant far broader than the denial that prompted it. (1) A narrowness gate annotates each pending approval chunk with a proposed-rule breadth assessment and badges over-broad grants (** host/path) in the approval inbox — a sound, side-effect-free breadth heuristic, on by default (SHOREGUARD_SIMULATOR_NARROWNESS_GATE_ENABLED). It deliberately does not re-implement the in-progress upstream server-side narrowness scorer (#1840) nor auto-route to quorum — it flags only. (2) Denial replay (opt-in, SHOREGUARD_SIMULATOR_REPLAY_ENABLED) persists the inbound denial corpus (denial_samples, migration 113) — the live cache is in-memory and volatile — and a new POST /api/gateways/{gw}/sandboxes/{name}/policy/simulate replays it against a candidate (or the active) policy via the existing Z3 encoders, predicting which previously-blocked requests it would now allow. Labelled best-effort: deterministic evaluation that may diverge from the live gateway matcher (feat(prover)).
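The cost overlay described above (metered request counts multiplied by a price table, with a dollar ceiling taking precedence over a request ceiling) can be sketched as follows. This is a minimal illustration, not ShoreGuard's implementation: `PriceTable`, `estimate_usd`, and `budget_exceeded` are hypothetical names, and treating the limit as breached at `>=` is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class PriceTable:
    """Hypothetical price table: a flat per-request rate plus per-provider-type overrides."""

    flat_rate_usd: float = 0.0
    per_provider_type: dict[str, float] = field(default_factory=dict)

    def rate_for(self, provider_type: str) -> float:
        # A provider-type rate wins; otherwise fall back to the flat rate.
        return self.per_provider_type.get(provider_type, self.flat_rate_usd)


def estimate_usd(requests_by_provider: dict[str, int], prices: PriceTable) -> float:
    """Turn metered request counts into an *estimated* dollar amount."""
    return sum(
        count * prices.rate_for(provider_type)
        for provider_type, count in requests_by_provider.items()
    )


def budget_exceeded(
    requests_by_provider: dict[str, int],
    prices: PriceTable,
    limit_usd: Optional[float] = None,
    limit_requests: Optional[int] = None,
) -> bool:
    """A dollar ceiling takes precedence over a request ceiling when both are set."""
    if limit_usd is not None:
        return estimate_usd(requests_by_provider, prices) >= limit_usd
    if limit_requests is not None:
        return sum(requests_by_provider.values()) >= limit_requests
    return False
```

Because the inputs are request counts rather than tokens, the result is only as accurate as the per-request rate, which is why the changelog labels the figure "estimated".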

Security

Raised security floors for four transitive dependencies flagged by pip-audit: cryptography (→ 49.0.0, GHSA-537c-gmf6-5ccf), starlette (→ 1.3.1, CVE-2026-54282 / CVE-2026-54283), msgpack (→ 1.2.1, GHSA-6v7p-g79w-8964), and pydantic-settings (→ 2.14.2, GHSA-4xgf-cpjx-pc3j). No API impact; the full suite passes against the bumped versions (fix(deps)).

[0.39.0] - 2026-06-13

Compatibility: requires a gateway running OpenShell v0.0.57 or newer (see the installation guide). Existing ShoreGuard v0.37 databases upgrade in place; older databases must pass through v0.37 first.

A major release in two parts. First, a ground-up internal architecture redesign — a Preact/TypeScript island frontend, an async-native gRPC client and data layer, a single composition root, and a squashed migration baseline — with REST paths, responses, and CLI commands unchanged. Second, the Homelab / DGX-Spark program: making the single-box, local-agent deployment a first-class citizen, from phone approvals to overnight digests. The architecture redesign is detailed at the end of this entry.

Changed

Phone approvals work out of the box in --local mode. Approving an agent from your phone is ShoreGuard's headline overnight capability, but it shipped dead by default: one-tap approve/reject links only attach to approval webhooks when both webhooks.one_tap_approvals and server.public_url are set, and neither was. Local mode now enables one-tap links (unless SHOREGUARD_WEBHOOK_ONE_TAP_APPROVALS is set explicitly) and the CLI derives public_url from the resolved LAN/Tailscale bind address — so a notification a solo operator receives actually carries a button to tap.
Solo/local defaults: usage metering and the daily digest are on by default in --local mode. Both ship off so a multi-tenant production install never starts log-polling or sending digests unexpectedly — but on a single-machine local box that caution is backwards: the operator is the only user and wants spend tracking and the overnight report to just work. Local mode now turns both on unless the operator set SHOREGUARD_BUDGET_METERING_ENABLED / SHOREGUARD_DIGEST_ENABLED explicitly (an explicit value always wins). This is what makes the new dashboard's request counts and budget bars real out of the box.
The daily digest now reports today's inference spend. digest.build() never touched usage data; it now adds today's total inference requests and the top spenders (reusing the cross-gateway budget.summary), surfaced both in the digest.daily webhook/message and as a chip on the dashboard digest card — so the overnight report finally answers "what did my agents cost while I slept?".
UI: one consistent brand-green primary, app-wide. The web UI already ships Bootstrap 5.3, but mixed Bootstrap-blue .btn-primary with brand-green .btn-success, and form focus rings / switches stayed Bootstrap blue. The brand colour is now wired into Bootstrap's tokens once (--bs-primary, link colour, focus-ring, 2px radius) with .btn-primary / .btn-outline-primary themed to the brand green — so every primary action, form focus ring and toggle is the same green in light and dark. Drops the per-component border-radius overrides and the duplicate login focus rule.
UI: consistent status colours and friendlier capability notices. Role badges now resolve through one shared map everywhere (admin showed red on Users but grey on the audit log); OCSF log severities, the security check and provider-profile diagnostics share one severity→colour map; and gateway capability gaps ("not supported by the current gateway version") render as an informational warning instead of an alarming red error. Hand-rolled empty states fold into the shared EmptyState component and the duplicate role-badge map is removed.
UI: accessible modals + dead-CSS cleanup. The shared Preact modal now closes on Escape, locks background scroll while open, and exposes role="dialog" / aria-modal. Removes dead CSS — the unused .detail-tabs rules, the legacy #approvalModal override, and the .phase-* badge classes superseded by the shared badge map.
UI: responsive topbar, scrollable sandbox tabs, wrapping toolbars. The topbar no longer overflows on phones — the account email collapses to a profile icon below md and the bar tightens its padding. The 10-tab sandbox sub-navigation now scrolls horizontally instead of running off-screen (Terminal/Forward were unreachable on mobile), page-header action toolbars wrap below the title, the remaining unwrapped tables (passkeys, boot hooks, wizard templates) gain .table-responsive, and auth cards get padding so they're never clipped on short viewports.
UI: de-scrolled the long detail pages. The gateway detail page now lays the Kill-switch / Curfew controls and the Metadata / Connection blocks out in two equal-height columns on wide screens (stacking on mobile), and the sandbox detail page does the same for its Metadata / Attached-providers blocks — roughly halving the desktop scroll while keeping everything visible at once.
SQLite production check — single-replica SQLite is now a WARN (supported homelab shape, WAL mode) instead of a hard startup ERROR; it remains an ERROR when SHOREGUARD_REPLICAS > 1.
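The local-mode defaults above follow one rule: an explicitly set environment variable always wins, and only an unset variable picks up the local-mode default. A minimal sketch of that rule, and of the two conditions one-tap links need, is shown here. `resolve_flag` and `one_tap_links_active` are hypothetical helpers, and the set of accepted truthy strings is an assumption.

```python
import os
from typing import Mapping, Optional

# Assumed spellings of "true"; the real parser may accept a different set.
_TRUTHY = {"1", "true", "yes", "on"}


def resolve_flag(
    name: str,
    *,
    local_mode: bool,
    default: bool = False,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Resolve a feature flag: explicit value first, then the local-mode default."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is not None:
        # The operator set the variable explicitly, so it wins in either direction.
        return raw.strip().lower() in _TRUTHY
    # Unset: local mode turns the feature on, otherwise keep the shipped default.
    return True if local_mode else default


def one_tap_links_active(one_tap_enabled: bool, public_url: Optional[str]) -> bool:
    """One-tap links attach only when the flag is on AND a public URL is known."""
    return one_tap_enabled and bool(public_url)
```

For example, `resolve_flag("SHOREGUARD_DIGEST_ENABLED", local_mode=True, env={})` is true, while the same call with the variable set to `"false"` stays false.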

Fixed

Dashboard cards that never rendered, and a hardcoded approvals count. The home dashboard's Sandboxes, Approvals and Create Sandbox cards were gated on a single "active gateway" that the home page (/) never sets — so they were dead, and the Approvals stat was hardcoded to 0 ("populated by future aggregation"). They are now driven by real cross-gateway aggregation: total sandboxes, summed pending approvals, and gateway-aware navigation.
UI click-path polish (follow-ups from the exhaustive click-path test). Four interaction issues surfaced by clicking every reachable control: (1) The authenticated data-entry forms (register gateway, invite user, new service principal, create/refresh provider, expose service) relied on the browser's native required tooltip, which blocked submission without any on-page feedback — so an empty submit felt like a dead button. They now opt out of native validation (noValidate) and show the same styled inline banner as the groups/webhooks forms. (2) "Remove budget" and webhook Pause/Resume fired instantly; they now route through the shared confirm dialog like every other state change. (3) The command palette (Ctrl/⌘-K) ignored sandboxes — typing "sandbox" returned nothing; on a gateway-scoped page it now lists that gateway's Sandboxes page and indexes its sandboxes by name. (4) The gateways list showed "mtls" for plaintext (http://) endpoints; the Auth column now derives an honest transport label from the endpoint scheme.
"Open on phone" no longer QR-encodes localhost — the dialog blindly encoded the current location, which is unreachable from any other device when browsing via loopback. A new GET /api/system/access-urls endpoint reports the actual bind address and the host's LAN addresses; the dialog now re-hosts the QR URL onto a reachable address (with a picker when there are several, e.g. LAN + tailnet), warns explicitly when the server is bound to loopback only (naming the --host 0.0.0.0 / tailscale serve escape hatches), and notes that push on the phone needs HTTPS.
Island modulepreload requests 404ed on every page — Vite resolved code-split chunk URLs against the default base /, so browsers requested /islands/….js instead of /static/dist/islands/….js. Islands still mounted via the relative fallback import, but every page logged module-load errors and lost the preload optimisation. vite.config.ts now sets base: "/static/dist/".
The update check now runs once at startup — periodic tasks sleep a full interval before their first run, so the 24-hour update check never produced a result within any realistic homelab uptime and the dashboard banner could not appear. PeriodicTask gained a run_at_start flag; only the update check sets it.
Startup crash on restart after running post-baseline migrations — the v2-baseline stamp check rejected every alembic_version other than the v0.37 head and the baseline itself, so a persistent database that had applied migrations 101+ (kill switch, budgets, …) failed the next boot with "predates ShoreGuard v0.37". Revisions known to the embedded migration chain are now handed to the regular upgrade head path; only genuinely unknown (pre-squash) revisions raise.
Review fixes for the 0.39 feature train (multi-agent adversarial review): backup restore now stages files next to their destination before swapping, so it survives /tmp-on-tmpfs (EXDEV) and a failed copy never displaces the live database; the curfew acts exactly once per window transition (no more event repeats on zero-sandbox gateways or partially failed releases, which now retry silently); a breached node metric whose sample disappears no longer fires a false node.recovered; MQTT publishes use a unique client id (fixed id caused broker-side session takeover) and resolve DNS off the event loop; SMTP delivery honours --local for LAN relays and never sends the server-wide relay credentials to a per-webhook overridden host; fleet policy-sync enforces per-gateway role overrides on the source and every target; a failed passkey login shows the error instead of hard-redirecting to /login.
Telegram webhooks were rejected by the API: the telegram channel type had a formatter, delivery logic, and frontend UI but was missing from the REST validation allowlist, so the API refused to create one. Creation and update now accept it and validate that the bot URL carries the chat_id query parameter that delivery extracts.
Packaging: the wheel reliably bundles the compiled island bundle. The frontend rewrite force-included frontend/dist (a git-ignored Vite artifact) in the wheel, which broke uv sync (the editable build) anywhere the bundle was not pre-built — including every CI job. A Hatch build hook (hatch_build.py) now creates the directory for editable installs (which serve the source tree and do not need it), while the Docker image and the PyPI wheel compile the real bundle first via a Node build stage/step.
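The backup-restore fix above relies on a general technique: stage the incoming file in the destination's own directory, then swap it in with a rename. A rename within one directory cannot hit EXDEV, and a failed copy only ever touches the staged file. The sketch below illustrates the idea with the standard library; `restore_file` is a hypothetical helper, not ShoreGuard's restore code, and the `.pre-restore` suffix mirrors the naming mentioned for `shoreguard backup restore`.

```python
import os
import shutil
import tempfile
from pathlib import Path


def restore_file(source: Path, destination: Path) -> None:
    """Copy `source` over `destination` by staging next to it, then swapping."""
    destination.parent.mkdir(parents=True, exist_ok=True)
    # Stage in the destination directory so the final rename stays on one filesystem.
    fd, staged_name = tempfile.mkstemp(
        dir=destination.parent, prefix=destination.name + ".", suffix=".staged"
    )
    os.close(fd)
    staged = Path(staged_name)
    try:
        # If this copy fails, the live file has not been touched yet.
        shutil.copy2(source, staged)
        if destination.exists():
            # Keep the replaced file around instead of discarding it.
            os.replace(destination, destination.with_name(destination.name + ".pre-restore"))
        os.replace(staged, destination)
    except BaseException:
        staged.unlink(missing_ok=True)
        raise
```

Copying straight from a temporary directory on another filesystem and then renaming is what fails with EXDEV; staging beside the destination sidesteps that case entirely.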

Added

Web approval inbox (/approvals). A single, cross-gateway list of every pending policy approval across all sandboxes — security-flagged first, with the rule, proposed endpoints, confidence and hit-count — that a reviewer approves or rejects in place (quorum-aware), without tabbing through each sandbox. It is the click-through target for the dashboard's pending-approval badges and the reviewer-side companion to phone approvals (mirrors upstream OpenShell #1612).
"Set up phone approvals" wizard (/setup/phone-approvals). One button subscribes this device to web push, wires a webpush webhook to the approval events (idempotently), and fires a sample approval notification to tap — so a solo operator goes from zero to "I can approve from my phone" in one click, reachable from the profile page. Degrades clearly when the browser/URL can't do web push (needs localhost/HTTPS, e.g. Tailscale).
Mission control — a single-pane home dashboard. The home dashboard now renders a cross-gateway Sandbox activity table: every sandbox across all reachable gateways with its phase, 24h inference requests (busiest first) and a live pending-approval badge — plus real Sandboxes and Approvals totals. For a solo box this is the at-a-glance "what are my agents doing / spending / waiting on?" view, without picking a gateway first.
First-gateway bootstrap from the UI. A fresh local box can create its first gateway from the empty state: a Create local gateway action (local mode only) opens an inline name/GPU form and calls the create endpoint, instead of dropping to the CLI. Hidden entirely on remote/production deployments.
Active sessions: see and revoke your signed-in devices. Sessions are stateless HMAC cookies, so until now they could not be listed or individually killed (the only levers were deactivating the user or rotating the secret). A new session ledger records each sign-in (device, IP, kind, last seen), the per-request auth check rejects a revoked session, and the profile page gains an Active sessions section with a per-device Revoke and a Sign out other devices button. Logout now revokes its own session too. The ledger is a denylist, not an allowlist: a valid cookie with no ledger row still authenticates, so enabling tracking does not force everyone to log in again. New endpoints GET/DELETE /api/auth/sessions and POST /api/auth/sessions/revoke-others, migration 108_user_sessions, and SHOREGUARD_SESSION_TRACKING (on by default). Especially relevant now that device-link can mint sessions onto phones.
QR device-link sign-in handoff — the "Open on phone" dialog can mint a one-time code so a phone gets its own session without typing a password. The flow is deliberately conservative: the code is 256-bit, stored only as a SHA-256 hash, travels in the URL fragment (never reaches server/proxy logs), is single-use via atomic conditional UPDATEs, and — crucially — only becomes a session after the operator approves the request on the original, already-trusted device, so a scanned-from-a-screen-share QR is caught. The phone is shown whose account it is joining (defends QR-swap phishing), the redeem endpoint is same-origin-guarded (Sec-Fetch-Site/Origin, blocks login-CSRF) with a SameSite=Strict cookie, and the handoff session is short-lived (device_link_session_max_age, default 24h) since these sessions cannot be individually revoked. Off by default (SHOREGUARD_DEVICE_LINK_ENABLED); failed/expired/replayed redeems are audit-logged. New endpoints under /api/auth/device-link/*, a /login/device confirmation page, and migration 107_device_link_codes.
Grafana dashboard expansion + metrics docs — the shipped dashboard (deploy/grafana/shoreguard.json) gains panels for gRPC latency/errors/retries against the gateways, sandbox phase transitions, client-cert expiry, and webhook delivery results; the monitoring guide now documents the /metrics scrape setup (public-or-bearer) and the alert-worthy metric set.
Fleet view — the second-box step — new /fleet page and /api/fleet/* endpoints for the moment the second Spark arrives: per-gateway status/OpenShell-version/sandbox table with a skew warning, policy drift between same-named sandboxes across gateways (compared by policy hash), and one-click policy sync ("use as source") that pushes a sandbox's policy to its namesakes — every push is a normal revision, so it shows up in history and can be reverted. Unreachable gateways degrade gracefully instead of blocking the view.
Per-sandbox activity timeline — "what did this agent do last night?" answered for one sandbox: GET /api/gateways/{gw}/sandboxes/{name}/timeline merges audit entries, approval decisions, kill-switch engagements, and metered usage into one chronology (24h/3d/7d card on the sandbox page). Pure DB reads — it answers even while the gateway is down, which is exactly when you need it.
Policy preset preview + agent-workflow presets — applying a preset is no longer blind: Preview (GET …/policy/presets/{p}/preview) shows which network rules a preset would add or overwrite before you commit. Two presets every coding agent needs join the pack: github (clone/fetch via smart-HTTP, raw files, releases, API reads) and apt (Debian/Ubuntu archives over HTTPS, including ports.ubuntu.com — where arm64/Spark actually fetches from).
LAN inference endpoint probe — the provider form gains a Test endpoint button (POST /api/system/probe-inference, operator+) that probes an OpenAI-compatible base_url for served models before the provider is created — the multi-box complement to the existing loopback auto-detection (the Spark often serves models for the whole LAN). Restricted to private/LAN addresses; read-only single GET with an Ollama /api/tags fallback.
Update awareness & gateway version skew — health probes now record each gateway's OpenShell version; GET /api/system/updates reports them plus a skew flag, and the dashboard shows a banner when gateways diverge. Opt-in (SHOREGUARD_UPDATES_ENABLED, off by default — no phone-home) daily PyPI check fires a one-shot shoreguard.update_available webhook event per new release.
Built-in backup & restore — shoreguard backup create bundles a live, consistent SQLite snapshot (online backup API), the .secret_key, and the VAPID key into one tar.gz; shoreguard backup restore puts it back (server stopped, replaced files kept as *.pre-restore). Admins get a Download backup button on the Security Check page (GET /api/system/backup), and SHOREGUARD_BACKUP_* enables periodic snapshots with rotation. PostgreSQL deployments are pointed at pg_dump.
Agent curfew (quiet hours) — per-gateway schedule (PUT /api/gateway/{name}/curfew, card on the gateway page, new gateway_curfews table, migration 106_gateway_curfews): inside the window the reversible kill switch engages automatically (actor curfew), outside it the curfew releases its own engagement and agents resume. Manually or budget-engaged switches are never touched; windows may wrap midnight and are evaluated in a configurable IANA timezone. Ends the "agent burned tokens while I slept" failure mode for good.
Tamper-evident audit log (hash chain) — every new audit row hashes its fields together with the previous row's hash (new prev_hash/entry_hash columns, migration 105_audit_hash_chain). Edits, mid-chain deletions, and out-of-band inserts are detectable via shoreguard audit verify, GET /api/audit/verify, or the Verify chain button on the audit page. Pre-upgrade rows are reported as legacy; retention cleanup keeps the surviving chain verifiable.
Passkey login (WebAuthn) — register a passkey on the new /profile page and sign in with the phone's screen lock instead of a password. Discoverable-credential login from the login page, per-user passkey management (/api/auth/passkeys/*), new webauthn_credentials table (migration 104_webauthn_credentials), enabled by default via SHOREGUARD_PASSKEYS_ENABLED (SHOREGUARD_PASSKEY_RP_ID pins the relying-party ID). Requires HTTPS or localhost in the browser. New dependency webauthn.
Web Push — phone notifications without a third party — the installed PWA now receives push notifications directly: a service worker (/sw.js), VAPID keys generated on first use (stored next to the secret key, contact via SHOREGUARD_PUSH_CONTACT), device enrolment from the phone dialog in the top bar (/api/push/public-key|subscriptions|test, new push_subscriptions table, migration 103_push_subscriptions), and a webpush webhook channel that fans events out to every registered device with end-to-end-encrypted payloads. Expired devices are pruned automatically. Requires HTTPS (or localhost) — tailscale serve provides exactly that. New dependency pywebpush.
Host threshold alerts + GPU power draw — the node-stats sample now includes power_w (nvidia-smi power.draw), and a new background task (SHOREGUARD_NODE_ALERT_*, on by default) evaluates GPU temperature, memory, and disk thresholds, firing node.threshold_breached / node.recovered webhook events on state transitions — the hot-Spark alert reaches the phone exactly once. GET /api/system/node-alerts and a dashboard badge expose the state.
MQTT webhook channel (Home Assistant bridge). Setting channel_type: "mqtt" publishes the generic event envelope to <topic>/<event-type> on an mqtt:// or mqtts:// broker (one-shot, write-only, optional auth/qos/retain via extra_config). Private broker addresses are allowed in --local mode; the docs ship Home Assistant automations for actionable approval notifications and a kill-switch light. New dependency paho-mqtt.
Server-wide SMTP defaults: SHOREGUARD_SMTP_HOST/_PORT/_USERNAME/_PASSWORD/_FROM_ADDR configure the mail relay once; email webhooks then only need to_addrs in their extra_config (per-webhook values still override). Pairs with the daily digest for a morning report by mail.
Host resource visibility — GET /api/system/node-stats samples CPU load, memory, disk, and GPU utilisation/memory/temperature (via nvidia-smi when present) for the machine running ShoreGuard, cached a few seconds; the dashboard gains a "This machine" card with usage bars. On a single-box deployment the host is the gateway node — and on GB10's unified memory, host RAM is GPU memory. Scoped honestly (scope: shoreguard-host) until upstream grows a node-stats RPC.
Usage metering & budgets (phase 1) — opt-in SHOREGUARD_BUDGET_METERING_ENABLED meters per-sandbox inference requests by polling gateway logs with persisted cursors (new sandbox_budgets / sandbox_usage / usage_cursors tables, migration 102_budgets); per-sandbox budgets (PUT /api/gateways/{gw}/sandboxes/{name}/budget) with daily/weekly/monthly/total windows fire a budget.exceeded webhook or reversibly detach the sandbox's providers at the limit (resume via the kill-switch path); usage card on the sandbox detail page and a global top-consumers endpoint (GET /api/usage/summary). The upstream metering RPC replaces the log-derived counting when it lands.
mDNS gateway discovery + adopt flow — discovery scans can now browse the local network via mDNS/zeroconf (SHOREGUARD_DISCOVERY_MDNS_ENABLED, service type _openshell._tcp, avahi snippet in the docs); new "Scan this machine" button (POST /api/gateway/import-filesystem) re-runs the filesystem import on demand and shows the per-entry log including skip reasons — adopts NemoClaw-provisioned gateways in one click. New DGX Spark quickstart guide (NemoClaw OOBE → adopt → Tailscale → phone pushes).
Daily digest ("what did my agents do while I slept?") — GET /api/digest aggregates the trailing window (audit activity by action, sandbox churn, approvals, gateway health, webhook failures, engaged kill switches) and renders as a dashboard card; with SHOREGUARD_DIGEST_ENABLED=true a digest.daily webhook event is pushed once a day after SHOREGUARD_DIGEST_HOUR (local time) — the morning report on your phone, sized for always-on overnight agent boxes.
Gateway kill switch — reversible "big red button" (POST/DELETE /api/gateway/{name}/kill-switch, card on the gateway detail page): detaches every sandbox's providers so agents instantly lose inference and tool credentials while keeping their state; the detached set is persisted (new kill_switch_entries table, migration 101_kill_switch) and re-attached on resume, with per-provider retry on partial failures. Fires kill_switch.engaged / kill_switch.released webhook events.
Gateway watchdog events — the health monitor now fires gateway.unreachable (ntfy priority: urgent) and gateway.recovered webhook events on health transitions, replacing hand-rolled cron watchdogs for "did my gateway die overnight?".
Phone-first approvals — new telegram webhook channel (Bot API sendMessage with inline buttons); opt-in one-tap approve/reject links (SHOREGUARD_WEBHOOK_ONE_TAP_APPROVALS + SHOREGUARD_PUBLIC_URL): approval notifications carry signed, expiring action links that cast a single vote from a mobile confirmation page (ntfy action buttons / Telegram inline keyboard); PWA manifest so the dashboard installs to a phone home screen.
Tailscale-first remote access — new operations guide for tailscale serve in front of a loopback bind; opt-in authentication via Tailscale Serve identity headers (SHOREGUARD_TAILSCALE_IDENTITY, loopback-only, login must match a user email); "Open on phone" QR button in the topbar.
Security Check page (Admin → Security Check, GET /api/security/posture) — a deployment posture self-audit: auth mode vs. bind address, unsafe-LAN overrides, secret-key hygiene, open registration, HSTS/CSP, per-gateway mTLS status, and Tailscale detection, each with severity and an actionable fix hint.
Single-user mode (--single-user / SHOREGUARD_SINGLE_USER) — the homelab middle ground between --no-auth and full RBAC: authentication with exactly one admin account (admin@localhost) whose password comes from --admin-password / SHOREGUARD_ADMIN_PASSWORD (interactive prompt on a TTY) and is re-synced on every start, so rotating it is just changing the env var. Refuses to combine with --no-auth.
arm64 / DGX Spark packaging — CI now runs the unit suite natively on an arm64 runner (every GB10 device is aarch64); new deploy/docker-compose.homelab.yml (one container, SQLite, loopback-only bind) and deploy/systemd/shoreguard.service (hardened bare-metal unit, state in /var/lib/shoreguard); new "Homelab / Single Box" operations guide covering install, backup/restore, and health monitoring.
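The tamper-evident audit log listed above chains each row to its predecessor by hashing the row's fields together with the previous row's hash. The sketch below shows why edits and mid-chain deletions become detectable. It is an illustration under stated assumptions: the canonical JSON serialisation, the SHA-256 choice, and the all-zero genesis value are not taken from ShoreGuard, and only the `prev_hash` / `entry_hash` column names come from the changelog.

```python
import hashlib
import json
from typing import Optional

# Assumed placeholder for the first row's predecessor.
GENESIS = "0" * 64


def entry_hash(fields: dict, prev_hash: str) -> str:
    """Hash a row's fields together with the previous row's hash."""
    # Canonical serialisation so the same fields always hash the same way.
    payload = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(chain: list[dict], fields: dict) -> None:
    """Append a new row linked to the current tail of the chain."""
    prev = chain[-1]["entry_hash"] if chain else GENESIS
    chain.append({"fields": fields, "prev_hash": prev, "entry_hash": entry_hash(fields, prev)})


def verify(chain: list[dict]) -> Optional[int]:
    """Return the index of the first row that breaks the chain, or None if intact."""
    prev = GENESIS
    for index, row in enumerate(chain):
        if row["prev_hash"] != prev or row["entry_hash"] != entry_hash(row["fields"], prev):
            return index
        prev = row["entry_hash"]
    return None
```

Editing a row changes its recomputed hash, and deleting a row leaves the next row pointing at a hash that no longer precedes it, so both cases surface at the first affected index.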

Architecture redesign (internal)

Ground-up internal redesign, applied in place over eight staged refactors. REST API paths/responses and CLI commands are unchanged; internals are not.

Changed

Composition root — the 16 module-global service singletons are replaced by a single ServiceContainer built in shoreguard/container.py. The FastAPI lifespan and the test suite construct services through the same build_container() code path.
App factory — shoreguard.app:create_app() assembles the application; shoreguard.api.main is a compatibility shim for uvicorn shoreguard.api.main:app. CORS is no longer configured at import time.
Async-native gRPC (breaking for embedders) — ShoreGuardClient and all submanagers run on grpc.aio; every RPC is a coroutine, streams are async iterators, and ~150 asyncio.to_thread call sites collapsed to direct awaits. The WebSocket thread bridge (api/ws_bridge.py) is gone.
Async data layer (breaking for embedders) — every DB-backed service and the auth subsystem use AsyncSession on the async engine; legacy session.query call sites were rewritten to select(). A sync engine exists solely to run Alembic migrations at startup.
Module layout (breaking for importers) — the CLI moved to shoreguard/cli/ (console script target is now shoreguard.cli:cli); api/pages.py split into auth/user routes + HTML pages; api/auth.py and api/schemas.py became packages; db.py became shoreguard/db/ with per-domain model modules (re-exported via shoreguard.models).
Background tasks — the five hand-rolled lifespan polling loops are declarative PeriodicTask specs run by a generic TaskSupervisor (failure backoff, /readyz health snapshot). Disabled features no longer register — and no longer report as dead — background tasks.
Gateway request scope — a typed, frozen GatewayContext on request.state replaces the private string attribute + ContextVar pair.

Added¶

Preact + TypeScript frontend (complete rewrite) — Vite 6 build in frontend/ (strict tsc, vitest) emitting code-split island bundles into frontend/dist. Every page is a typed Preact island; the app shell (gateway switcher, command palette, theme, health polling, keyboard shortcuts, toasts/confirm) is TypeScript. Alpine.js and all 33 legacy JS files are deleted, and 'unsafe-eval' is dropped from both CSP policies. scripts/generate_api_types.py generates TypeScript API types from the OpenAPI schema. New just frontend-* recipes, pre-commit hooks, and a CI job.
Migration squash — the 17-step Alembic chain is replaced by a single v2_baseline revision built from the models. v0.37 databases are stamped in place; older databases must upgrade through v0.37 first.

Fixed¶

Approval escalation no longer crashes comparing tz-naive and tz-aware decision timestamps when fresh and round-tripped rows mix.
Sandbox event WebSockets now send an explicit close frame when the watch stream ends.
The SBOM, bypass-detection, prover, and boot-hooks pages called an sgFetch() helper that was never defined — all four tabs threw a ReferenceError at runtime. The quorum-approval "voted" badge likewise read an unset window.sgCurrentUser and never displayed. Both work in the rewritten islands.
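The timestamp mismatch behind the approval-escalation fix above can be reproduced and avoided with a small normalisation step. This is a minimal sketch, assuming naive timestamps are meant as UTC; `as_utc` and `is_overdue` are hypothetical names, not ShoreGuard's actual helpers.

```python
from datetime import datetime, timedelta, timezone


def as_utc(ts: datetime) -> datetime:
    """Return an aware UTC datetime; naive values are assumed to already be UTC."""
    if ts.tzinfo is None:
        return ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(timezone.utc)


def is_overdue(created_at: datetime, now: datetime, timeout: timedelta) -> bool:
    """Compare two timestamps safely, whichever one lost its tzinfo."""
    return as_utc(now) - as_utc(created_at) > timeout


# A row round-tripped through storage may come back naive ...
naive = datetime(2026, 6, 1, 12, 0, 0)
# ... while a freshly created one is tz-aware.
aware = datetime(2026, 6, 1, 12, 30, 0, tzinfo=timezone.utc)

try:
    aware - naive  # mixing the two raises TypeError in plain Python
except TypeError as exc:
    print(f"unnormalised arithmetic fails: {exc}")

print(is_overdue(naive, aware, timedelta(minutes=15)))
```

Normalising at the comparison site keeps the fix local; converting on load from the database would be an equally valid design.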

[0.37.0] — 2026-06-10¶

Solo-dev quality of life¶

Builds out the single-box path (homelab, DGX Spark): safe-by-default no-auth binding, one-click local inference providers, and phone push notifications for approvals.

Added¶

--unsafe-lan / SHOREGUARD_UNSAFE_LAN — explicit opt-in required to combine --no-auth with a non-loopback bind address. Without it, the CLI refuses the combination and enforce_production_safety() blocks startup (override still possible via SHOREGUARD_ALLOW_UNSAFE_CONFIG).
Local inference auto-detect — in local mode, the Providers page probes the default loopback ports of Ollama (11434), vLLM/NIM (8000), llama.cpp (8080), and LM Studio (1234), and offers one-click provider creation with the right OpenAI-compatible base_url prefilled (GET /api/gateway/local-inference). Agents reach local models through inference.local/v1 — no cloud API key ever exists.
ntfy webhook channel. The new channel_type: ntfy publishes JSON messages to an ntfy topic URL (ntfy.sh or self-hosted; optional access token via extra_config.token). Approval events arrive as high-priority pushes (approval.pending = high, approval.escalated = urgent), so overnight agent runs can ping your phone for a decision. For a self-hosted ntfy on a LAN address, combine with SHOREGUARD_SSRF_ALLOWED_IPS.
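As an illustration of the ntfy channel, the sketch below builds a JSON publish request from a topic URL. It relies on ntfy's documented JSON publishing (POST to the server root, topic in the body, numeric priorities where 4 is high and 5 is urgent); `build_ntfy_request` and the payload fields chosen here are assumptions for illustration, not ShoreGuard's actual delivery code.

```python
import json
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# ntfy numeric priorities: 3 = default, 4 = high, 5 = max/urgent.
PRIORITY_BY_EVENT = {"approval.pending": 4, "approval.escalated": 5}


def build_ntfy_request(topic_url: str, event: str, message: str,
                       token: Optional[str] = None):
    """Build (url, headers, body) for an ntfy JSON publish."""
    parts = urlsplit(topic_url)
    topic = parts.path.strip("/")
    if not topic or "/" in topic:
        raise ValueError(f"expected https://host/topic, got {topic_url!r}")
    # JSON publishing targets the server root; the topic travels in the body.
    root_url = urlunsplit((parts.scheme, parts.netloc, "/", "", ""))
    payload = {
        "topic": topic,
        "title": event,
        "message": message,
        "priority": PRIORITY_BY_EVENT.get(event, 3),
    }
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return root_url, headers, json.dumps(payload).encode("utf-8")


url, headers, body = build_ntfy_request(
    "https://ntfy.sh/shoreguard-approvals",
    "approval.escalated",
    "Sandbox demo needs a decision",
)
print(url, json.loads(body)["priority"])
```

The function only assembles the request, so it can be handed to any HTTP client and unit-tested without network access.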

Changed¶

--no-auth now binds to 127.0.0.1 instead of 0.0.0.0 when no explicit --host/SHOREGUARD_HOST is given. Previously the unauthenticated admin UI was reachable from the entire network by default. Containerised no-auth deployments must now pass --host 0.0.0.0 --unsafe-lan explicitly.
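The bind-address rule can be sketched as a small decision function. This is illustrative only: `resolve_bind_host` is a hypothetical name, and the assumption that authenticated runs keep 0.0.0.0 as their default is not stated in this changelog.

```python
import ipaddress
from typing import Optional


def is_loopback(host: str) -> bool:
    """True for 'localhost' and loopback IP literals such as 127.0.0.1 or ::1."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # any other hostname is treated as non-loopback


def resolve_bind_host(no_auth: bool, host: Optional[str], unsafe_lan: bool) -> str:
    """Pick the bind address, refusing unauthenticated non-loopback binds."""
    if host is None:
        # Assumption: the authenticated default remains 0.0.0.0.
        host = "127.0.0.1" if no_auth else "0.0.0.0"
    if no_auth and not is_loopback(host) and not unsafe_lan:
        raise SystemExit("--no-auth on a non-loopback address requires --unsafe-lan")
    return host


print(resolve_bind_host(no_auth=True, host=None, unsafe_lan=False))
print(resolve_bind_host(no_auth=True, host="0.0.0.0", unsafe_lan=True))
```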

Fixed¶

CLI flags now reach the uvicorn reload worker. Under --reload (the default), uvicorn spawns the server as a fresh process that re-reads settings from the environment — flags like --local and --no-auth silently vanished there, and since the worker then looked prod-like, shoreguard --local --no-auth crashed at boot with prod-readiness errors. The CLI now exports its resolved flags to the environment.
The drift_detection background task was missing from the task-health supervision map; its done-callback raised a KeyError whenever the loop exited (e.g. drift detection disabled).

[0.36.3] — 2026-06-10¶

Added¶

SSRF allowlist SHOREGUARD_SSRF_ALLOWED_IPS (#13) — comma-separated IPs/CIDR ranges exempted from the private/loopback SSRF rejection, so a homelab OIDC provider (Authelia, Keycloak, authentik) or an internal webhook/SMTP target on a LAN address can be used without --local. Applies consistently to OIDC issuer/JWKS/token endpoints, webhook URLs (registration and delivery-time DNS-rebinding re-checks — private-address webhook delivery was previously impossible even in local mode), SMTP hosts, and gateway endpoints. Matching happens against the resolved address; SHOREGUARD_ALWAYS_BLOCKED_IPS always takes precedence. Invalid entries hard-fail boot; a /0 entry and the redundant local-mode combination surface prod-readiness warnings. SSRF rejection messages now name the setting as the remedy.
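The precedence between the allowlist and the always-blocked list can be sketched with the standard `ipaddress` module. The function names are hypothetical and the default rejection rule is simplified; only the ordering (blocked first, then allowed, then the private/loopback check, all against the resolved address) is taken from the entry above.

```python
import ipaddress


def parse_networks(raw: str):
    """Parse comma-separated IPs/CIDRs; an invalid entry raises ValueError."""
    return [
        ipaddress.ip_network(item.strip(), strict=False)
        for item in raw.split(",")
        if item.strip()
    ]


def is_rejected(resolved_ip: str, allowed, always_blocked) -> bool:
    """Decide whether an already-resolved address must be refused."""
    addr = ipaddress.ip_address(resolved_ip)
    if any(addr in net for net in always_blocked):
        return True  # the always-blocked list takes precedence
    if any(addr in net for net in allowed):
        return False  # explicitly exempted by the allowlist
    # Default SSRF stance: refuse private, loopback, and link-local targets.
    return addr.is_private or addr.is_loopback or addr.is_link_local


allowed = parse_networks("192.168.1.0/24")
blocked = parse_networks("169.254.169.254, 192.168.1.1")

for ip in ("192.168.1.20", "192.168.1.1", "10.0.0.5", "8.8.8.8"):
    print(ip, "rejected" if is_rejected(ip, allowed, blocked) else "allowed")
```

Parsing eagerly at startup is what lets an invalid entry fail boot instead of surfacing on the first outbound request.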

Changed¶

Combining local mode with an allowlisted private gateway endpoint now requires an mTLS bundle for that gateway (the certificate-free plaintext connection no longer applies to allowlisted hosts). This is the fail-closed edge of the new allowlist and is flagged by a prod-readiness warning.
docs/reference/settings.md regenerated; it was missing several previously-added settings (tracing, discovery, cert rotation, prover, audit export, …).

[0.36.2] — 2026-06-07¶

Solo-dev on-ramp¶

Sharpens the single-box / solo-developer path so it competes with "just run the OpenShell TUI". The frictionless path already existed (shoreguard --local --no-auth → SQLite auto-init → auto-import filesystem gateways → no-credential sandbox) but was mis-signposted and had two sharp default edges.

Added¶

Solo Dev guide — new docs/getting-started/solo-dev.md as the headline single-box on-ramp (in the nav before Quick Start, linked from the README). Quick Start is reframed as the team / remote-gateway flow.
In-app orientation hints (no new API surface) — the dashboard "No gateways" empty state and the gateway-register form now point single-box users at shoreguard --local; the sandbox wizard's image field documents that blank uses the gateway default.
Headless first-admin docs — SHOREGUARD_ADMIN_PASSWORD and shoreguard create-user documented in the installation + solo-dev guides for SSH-only boxes.
Boot-time Docker check — in local mode, a clear startup warning when Docker is unusable, instead of a later opaque gRPC timeout on first sandbox create.

Changed¶

Local mode connects to a loopback/private gateway without mTLS when it is registered with no certificate bundle, mirroring the existing private-IP SSRF bypass. Strictly gated on --local / SHOREGUARD_LOCAL_MODE and a private/loopback host — production behaviour is unchanged (mTLS still required by default). Emits a warning so the plaintext connection is visible.
Actionable sandbox ready-timeout — the warning now names SHOREGUARD_SANDBOX_READY_TIMEOUT and points to /api/gateways/diagnostics instead of a bare "did not become ready in time".
--local help text notes it requires a running Docker daemon.

Security¶

Bumped pymdown-extensions 10.21.2 → 10.21.3 (CVE-2026-46338, docs toolchain only).

[0.36.1] — 2026-06-07¶

Security¶

Dependency-only patch that clears 8 CVEs flagged by pip-audit in four locked dependencies. These are time-based advisories unrelated to the M38 sync; the failing ci/audit job had blocked the v0.36.0 publish pipeline (GHCR / PyPI / sigstore never ran). No source changes — fastapi 0.135.1 already allows starlette>=0.46.0, so the starlette bump needs no framework change, and the full suite (3225 tests) passes unchanged under starlette 1.2.x.

pyjwt 2.12.1 → 2.13.0 — PYSEC-2026-175 / -177 / -178 / -179. Direct dependency; the PyJWT[crypto] floor is raised to >=2.13.
starlette 0.52.1 → 1.2.1 — PYSEC-2026-161 (transitive via fastapi).
idna 3.11 → 3.18 — CVE-2026-45409 (transitive via httpx/requests).
pip 26.1.1 → 26.1.2 — PYSEC-2026-196 (transitive, dev/CI tooling only).

[0.36.0] — 2026-06-07¶

M38 Upstream-Sync¶

Synced ShoreGuard against the NVIDIA/OpenShell v0.0.57 release tag (stub regen via scripts/generate_proto.py --ref v0.0.57). Twelve new RPCs wired end-to-end and one wire-breaking field move handled. Every non-supervisor RPC again has client + REST + UI coverage — the coverage allowlist is back to the original four supervisor-path RPCs.

Fixed — wire-breaking field move (SandboxStatus). Upstream PR #1565 moved Sandbox.phase and Sandbox.current_policy_version into the SandboxStatus sub-message. Against a v0.0.57 gateway the previous code read empty fields, so every sandbox showed phase unknown and wait_for_ready hung until timeout. _sandbox_to_dict now reads status.phase / status.current_policy_version.

Added¶

WS38.1 — Provider credential refresh / rotation. Four RPCs (PR #1349) — ConfigureProviderRefresh, GetProviderRefreshStatus, RotateProviderCredential, DeleteProviderRefresh — wired through ProviderManager, REST under /providers/{name}/refresh, and a Credential-Refresh modal on the providers page. Secret material is passed through to the gateway and never written to the audit log (key names only).
WS38.2 — Local-domain service routing. Four RPCs (PR #1101) — ExposeService, GetService, ListServices, DeleteService — via a new ServiceManager, /services REST surface, and a Service Routing page.
WS38.3 — Interactive terminal. ExecSandboxInteractive (PR #1331) — a true bidirectional TTY over a new /ws/{gw}/{sandbox}/exec WebSocket bridge (api/ws_bridge.py) driving a vendored xterm.js terminal (frontend/vendor/xterm/), replacing the one-shot command runner.
WS38.4 — TCP / SSH forward. ForwardTcp (PR #1029) — a raw bidi tunnel over /ws/{gw}/{sandbox}/forward reusing the WebSocket bridge, with a Forward sub-tab (TCP target or SSH, the latter minting a relay session via the existing CreateSshSession). A full in-browser SSH client remains a follow-up; the SSH view streams the raw relay bytes.
WS38.5 — Gateway token (diagnostic). IssueSandboxToken / RefreshSandboxToken (PR #1404) under /tokens. These RPCs bind the minted JWT to the caller's identity, so from ShoreGuard they mint a token for ShoreGuard's own gateway identity — an admin-only diagnostic, not a sandbox-scoped token (documented in the UI).
New WebSocket RBAC dependency require_role_ws gates the mutating exec/forward channels at operator level.
Additive v0.0.57 fields surfaced in the client projections: ObjectMeta.resource_version (on sandbox + provider — foundation for compare-and-swap on UpdateConfig) and Provider.credential_expires_at_ms.

[0.35.0] — 2026-05-09¶

M37 Upstream-Sync¶

Synced ShoreGuard against NVIDIA/OpenShell origin/main @ 57a80ed2 — 157 commits past the M36 sync point (PR #943) including the v0.0.37 release tag. Stub regen, eight new RPC clusters wired end-to-end (client → REST → UI), GraphQL L7 inspection surfaced in the endpoint-rule editor, and a hard-cut migration of the Provider and Sandbox wire schemas to the new upstream ObjectMeta convention.

Gateway minimum lifts to v0.0.37+ (effectively origin/main until the next upstream tag): ShoreGuard no longer speaks the pre-ObjectMeta Provider/Sandbox shape and will fail at the first list/get against any older gateway. There is no compat shim.

Added¶

WS37.3 — Sandbox-provider attach lifecycle. Three new RPCs from upstream PR #1242 — ListSandboxProviders, AttachSandboxProvider, DetachSandboxProvider — fully wired through shoreguard.client.sandboxes.SandboxManager.{list_providers, attach_provider, detach_provider}, the REST surface (GET, POST, DELETE under /sandboxes/{name}/providers) and a new "Attached Providers" card on the sandbox-detail page with attach picker and detach badges.
WS37.4 — Provider-profile registry. Five new RPCs from upstream PR #1170 wrapped in a new shoreguard.client.provider_profiles.ProviderProfileManager: ListProviderProfiles, GetProviderProfile, ImportProviderProfiles, LintProviderProfiles, DeleteProviderProfile. New REST router under /api/gateways/{gw}/provider-profiles (list, get, lint, import, delete). New UI page at /gateways/{gw}/provider-profiles with a table view and a lint-then-apply import dialog modeled on the GitOps apply flow. Gateway-detail page gained a "Profiles" quick-action button.
WS37.5 — GraphQL L7 inspection. Endpoint-rule editor's protocol dropdown now offers graphql (PR #1083). New endpoint-level fields (persisted_queries, graphql_max_body_bytes, path glob) and per-rule fields (operation_type, operation_name, fields) are rendered conditionally when the protocol is GraphQL. Wire mapping added in shoreguard.client._converters and reverse projection in shoreguard.client.policies._network_rule_to_dict.
WS37.6 — Two new gateway settings keys registered upstream and smoke-tested in ShoreGuard:
providers_v2_enabled (gateway-level opt-in for the provider profile policy composition surface).
agent_policy_proposals_enabled (sandbox-level opt-in for the agent-driven policy proposal surface).
Both keys flow through the existing generic Settings whitelist (the UI editor is data-driven and discovers new keys automatically), so only test coverage was added, as drift protection in the spirit of the v0.34.2 fix. See docs/reference/gateway-settings.md for the full registered-keys table.

Changed¶

WS37.2 — Provider wire-schema hard cut. Upstream relocated Provider.id and Provider.name into a nested ObjectMeta message with shifted field numbers (id was 1, became metadata.id = 1; type was 3, is now 2; etc.). ShoreGuard's _provider_to_dict and Provider-construction sites in ProviderManager.{create, update} were rewritten to read/write via provider.metadata.{id, name, created_at_ms, labels}. The ProviderResponse Pydantic schema gained explicit id, created_at_ms and labels fields. Tests across tests/test_client_providers.py were migrated.
Sandbox wire-schema hard cut. Same ObjectMeta migration also applied to Sandbox — sandbox.id, sandbox.name, sandbox.created_at_ms now flow through sandbox.metadata. The legacy namespace field was removed upstream and dropped from the ShoreGuard projection. Bulk-rewrote the construction call sites in tests/test_client_sandboxes.py, tests/test_client_resilience.py, and tests/test_m28_metrics.py.
Surface-coverage doc updated. docs/reference/surface-coverage.md reflects the new totals: 42 upstream RPCs, 38 client-consumed, 137 REST routes, 77 UI apiFetch calls.

Notes¶

Agent-driven policy MVP RPCs from upstream PR #1151 (SubmitPolicyAnalysis, GetDraftPolicy, *DraftChunk*, GetDraftHistory) were already wired through shoreguard.client.policies and shoreguard.api.routes.policies during the M36 sync; this release verifies the surface remains consistent against the new proto.

[0.34.2] — 2026-04-28¶

Fixed¶

Gateway OCSF logging toggle was silently broken. The gateway-detail Observability switch and its REST/CLI counterparts wrote setting key ocsf_logging_enabled, but OpenShell crates/openshell-core/src/settings.rs registers the key as ocsf_json_enabled — gateways rejected the unknown key with InvalidArgument, so the toggle never took effect. Renamed all read/write call sites to use ocsf_json_enabled (frontend/js/gateway.js, tests/test_rbac.py, tests/test_api_gateway_routes.py). Existing gateway state is unaffected; the next save through the toggle now writes the correct registered key.

Changed¶

The CI audit job and the pre-push pip-audit hook now both ignore CVE-2026-3219 (pip 26.0.1, no fix version published yet). The CVE affects pip itself (bundled into the dev environment by uv), not ShoreGuard's runtime dependencies. Whitelisted with an inline comment in both call sites so it's removed automatically once pip ships a patched release. All other CVEs continue to fail the build.

[0.34.1] — 2026-04-23¶

Changed¶

Upstream-sync confirmation (post-M35). Re-verified parity against NVIDIA/OpenShell v0.0.36 and origin/main (through #943). proto/*.proto is byte-identical v0.0.35 → v0.0.36 → origin/main, so ShoreGuard's generated stubs remain wire-parity without regeneration. Documentation pin bumped from v0.0.35 to v0.0.36 across installation.md, production-k8s.md, and sbom.md — any gateway ≥ v0.0.30 remains wire-compatible for existing flows.

Upstream absorbed (server-side, no ShoreGuard code change)¶

Gateway-owned VM readiness + VM compute driver E2E (#901) — driver-vm rework; supervisor-gateway readiness handshake moves server-side.
Optional gateway-native Prometheus endpoint (#920) — new --metrics-port flag exposes openshell_server_grpc_requests_total, openshell_server_grpc_request_duration_seconds, openshell_server_http_requests_total, and openshell_server_http_request_duration_seconds. Helm chart adds service.metricsPort (default 9090; set to 0 to disable). Complementary to ShoreGuard's control-plane metrics — docs/integrations/prometheus.md now points operators at scraping both jobs for end-to-end request-path visibility.
CI/Helm hygiene: #943 (helm ClusterRole cleanup), #938 (E2E Gate posting), #942, #928, #929, #926 (CI/toolchain bumps), #931 (driver-vm cross-compile preflight).

Upstream watchlist (unmerged at time of release)¶

drew/creating-a-docker-driver-like-the-vm-driver — bundled Docker compute driver; same compute-driver-axis question as v0.33 (#904 Podman).
drew/containers-in-virtual-machines — libkrun OCI containers; ShoreGuard-irrelevant wire-wise.
vcauxbrisebo/vm-gpu-support, feat/wsl-cdi-spec-watcher — server-only, no ShoreGuard delta expected.

[0.34.0] — 2026-04-23¶

Added¶

M33 REST/UI Coverage Sweep. Every OpenShell capability ShoreGuard consumes is now reachable end-to-end — Client, REST, and UI:
REST routes for sandbox inspection. New GET /api/gateways/{gw}/sandboxes/{name}/config returns the full stored sandbox configuration via GetSandboxConfig, and GET …/provider-env surfaces the gateway-injected provider environment with values redacted to [REDACTED]. Both were implemented on the client since v0.30 but previously had no REST caller.
Endpoint allow_encoded_slash UI toggle. Rule editor now exposes the M32 GitLab-style-path switch per endpoint; no more YAML-only configuration.
GitOps merge-mode apply dialog. New YAML apply section on the sandbox policy page with a replace | merge radio group, optional expected_version guard, dry-run preview, and guided toast when the gateway rejects non-network edits with merge_unsupported.
M18 drift indicator. Policy pin panel now renders Pinned vX and Active vY side-by-side and flags drift when the supervisor has not yet reloaded the pinned revision — using the current_policy_version field exposed in v0.33.
Advanced gateway settings editor. Collapsible key/value panel under each gateway's detail view with add/edit/delete operations against PUT/DELETE /api/gateways/{name}/settings/{key} and a warning banner that validation is gateway-side only.
Coverage Matrix CI gate. New docs/reference/surface-coverage.md cross-tabulates every OpenShell RPC against Client / REST / UI reachability. scripts/check_coverage.py enforces the matrix; the coverage-matrix CI job is required and will fail the build when a new RPC or route lands without being plumbed through all three layers (with an explicit allowlist for the supervisor-only RPCs PushSandboxLogs, ReportPolicyStatus, ConnectSupervisor, RelayStream).
M34 — SandboxSpec.log_level. The last unmapped upstream SandboxSpec field. Client sandboxes.create() accepts log_level=""|"debug"|"info"|"warn"|"error", the REST POST /sandboxes schema mirrors the same enum, and the creation wizard surfaces a Log Level select defaulting to "Gateway default" (empty string, upstream-conformant).
M35 Proactive mTLS Cert Rotation. New shoreguard.services.cert_rotation.CertRotationService wires up the reload_credentials() hook that has existed since v0.31 but had no scheduler. A background task polls every registered gateway each hour (SHOREGUARD_CERT_ROTATION_POLL_INTERVAL_S); when a client cert's remaining validity drops below the threshold (SHOREGUARD_CERT_ROTATION_THRESHOLD_DAYS, default 7 days), the registry bytes are re-read and the gRPC channel is rebuilt. Retries use exponential backoff (SHOREGUARD_CERT_ROTATION_MAX_RETRIES, default 3); when retries are exhausted, the gateway.cert_rotation_failed webhook fires. Rotation is idempotent relative to the registry bytes, so multi-replica deployments need no advisory lock. Successful rotations emit a gateway.cert_rotated audit entry with before/after validity and the attempt count. New metric sg_gateway_cert_rotations_total{gateway,outcome} with outcome values success, failure, skipped_not_due, skipped_no_cert. New runbook at docs/operations/cert-rotation.md.
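The rotation decision and its four metric outcomes can be sketched as a single function. This is a hedged illustration: `rotate_if_due` is a hypothetical name, the one-second backoff base is an assumption, and the retry limit is treated here as a total attempt count, which the entry above does not specify.

```python
import time
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


def rotate_if_due(not_after: Optional[datetime], now: datetime,
                  reload_credentials: Callable[[], None],
                  threshold_days: int = 7, max_retries: int = 3,
                  sleep: Callable[[float], None] = time.sleep) -> str:
    """Return one of: success, failure, skipped_not_due, skipped_no_cert."""
    if not_after is None:
        return "skipped_no_cert"
    if not_after - now >= timedelta(days=threshold_days):
        return "skipped_not_due"
    for attempt in range(max_retries):
        try:
            reload_credentials()
            return "success"
        except Exception:
            if attempt < max_retries - 1:
                sleep(2 ** attempt)  # exponential backoff: 1s, then 2s
    return "failure"


now = datetime(2026, 4, 23, tzinfo=timezone.utc)
print(rotate_if_due(None, now, lambda: None))
print(rotate_if_due(now + timedelta(days=30), now, lambda: None))
print(rotate_if_due(now + timedelta(days=3), now, lambda: None))
```

Injecting `sleep` keeps the backoff testable without real delays.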

Changed¶

Configuration reference now documents the four SHOREGUARD_CERT_ROTATION_* settings in a new "Cert Rotation" section.
Surface Coverage linked from the docs nav under Reference.

Not in scope¶

Supervisor session relay RPCs (ConnectSupervisor, RelayStream) remain generated-only; no Client / REST / UI surface.
A full gateway-settings table discovered from upstream remains a follow-up: the Advanced Settings editor is a generic key/value component behind a warning banner, not a typed form.
The current_policy_version drift indicator covers only the pin panel; cross-sandbox fleet-wide drift dashboards are deferred.

[0.33.0] — 2026-04-23¶

Added¶

M32 Upstream-Sync + GitOps Incremental Merges. Pin bumped to NVIDIA/OpenShell v0.0.35 (from v0.0.32). Three upstream tags and six main commits of delta, absorbed in six commits:
Stub regen (chore(proto)). Protobuf stubs regenerated against v0.0.35; sandbox.proto restored to byte-parity with upstream. The Sandbox / SandboxSpec / SandboxTemplate message family migrated upstream from datamodel.proto to openshell.proto; call sites updated. compute_driver.proto is now skipped by the regen script because ShoreGuard, as a control plane, does not consume the supervisor↔gateway surface.
NetworkEndpoint.allow_encoded_slash (upstream #826). New field 11 flows through _dict_to_network_rule, the listing converter, and the Z3 prover. Endpoints fronting GitLab-style upstreams can now preserve %2F in paths instead of rejecting them. Default stays False, upstream-conformant.
L7 path canonicalization (upstream #878). Closes a soundness gap in the Z3 prover's L7 reasoning. A new canonicalize_request_path() mirrors the upstream Rust canonicalizer (percent-decode, dot-segment resolution, slash collapse, ;params strip) so prover counterexamples live in the same path universe as the gateway's enforcement path.
SSH session response charset contract (upstream #876). Every field of CreateSshSessionResponse is now validated against the documented charset before ShoreGuard surfaces the response. Defence-in-depth: a compromised or misconfigured gateway cannot push ProxyCommand-injection metacharacters into the REST response.
current_policy_version on sandbox endpoints. The field was already extracted client-side but not declared on SandboxResponse; REST consumers and OpenAPI tooling now see it and the M18 policy-pinning UI can render "configured vs. active".
GitOps incremental merge mode (upstream #860, requires gateway ≥ v0.0.33). POST /policy/apply accepts mode: "replace" | "merge". In merge mode, ShoreGuard diffs current against target policy, emits PolicyMergeOperations (remove_rule before add_rule, safety-ordered), and sends them through the gateway's UpdateConfigRequest.merge_operations surface. UnsupportedMergeError surfaces as HTTP 400 for non-network edits (filesystem / process / landlock); callers retry with mode=replace. CLI: shoreguard policy apply --mode merge. New doc page docs/guides/gitops-merge.md.
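The merge-mode diff can be pictured as follows. This is a simplified sketch, assuming network rules are keyed by name and compared as plain dictionaries; `plan_merge` and the operation shape are hypothetical and do not mirror the real PolicyMergeOperations message.

```python
from typing import Any, Dict, List


def plan_merge(current: Dict[str, Any], target: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Diff two rule maps into operations, with every removal before any addition."""
    changed_or_gone = [
        name for name in sorted(current)
        if name not in target or current[name] != target[name]
    ]
    new_or_changed = [
        name for name in sorted(target)
        if name not in current or current[name] != target[name]
    ]
    removes = [{"op": "remove_rule", "name": name} for name in changed_or_gone]
    adds = [{"op": "add_rule", "name": name, "rule": target[name]} for name in new_or_changed]
    return removes + adds


current = {
    "github": {"host": "api.github.com", "port": 443},
    "pypi": {"host": "pypi.org", "port": 443},
}
target = {
    "github": {"host": "api.github.com", "port": 443},
    "pypi": {"host": "files.pythonhosted.org", "port": 443},
    "npm": {"host": "registry.npmjs.org", "port": 443},
}

ops = plan_merge(current, target)
for op in ops:
    print(op["op"], op["name"])
```

Unchanged rules produce no operations, which is what makes the merge incremental; a changed rule shows up as a remove followed by an add.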

Changed¶

Documentation pin references bumped from v0.0.32 to v0.0.35 across installation.md, production-k8s.md, sbom.md, and m8-demo.md. New compatibility matrix in installation.md.

Upstream absorbed (server-side, no ShoreGuard code change)¶

These upstream changes land on the gateway and pass through our existing surfaces without modification:

Supervisor-initiated SSH connect / exec over gRPC-multiplexed relay (#867). ConnectSupervisor / RelayStream RPCs and 12 new messages exist only as generated Python; ShoreGuard does not consume them.
Seccomp / procfs hardening (#844, #869, #891).
Dedicated health-check listener on a separate unauthenticated port (#903, #915).
tower-http TraceLayer request-level logging (#895).
install-vm CLI for the gateway binary (#887).
Dedicated Kubernetes client without read-timeout for watches (#907).
Read-only baseline path preservation (#910).
SSRF host-alias resolution (#912).
Configurable image-transfer timeout (#914).
Sandbox git clone trusts the internal CA via GIT_SSL_CAINFO injection (#918).
E2E pipeline on external forks via copy-pr-bot flow (#922).

Upstream watchlist (unmerged at time of release)¶

pull-request/904 — Podman compute driver. Clarify whether this exposes a new gateway runtime tag (extending the M30 KNOWN_RUNTIMES set) or a sandbox-side compute driver (orthogonal axis) before the next sync.
drew/creating-a-docker-driver-like-the-vm-driver — bundled Docker compute driver. Same clarification needed.
drew/containers-in-virtual-machines — libkrun OCI containers; ShoreGuard-irrelevant.
vcauxbrisebo/vm-gpu-support, feat/wsl-cdi-spec-watcher — server only, tracked until merge, no ShoreGuard delta expected.

[0.32.2] — 2026-04-19¶

Changed¶

Upstream-sync confirmation (post-M29). Re-verified parity against NVIDIA/OpenShell v0.0.32 and origin/main@e39bb380. All four upstream .proto files (sandbox.proto, inference.proto, datamodel.proto, openshell.proto) are byte-identical across v0.0.30 → v0.0.32 → origin/main, so ShoreGuard's generated stubs remain wire-parity without regeneration. Documentation references bumped from v0.0.26 to v0.0.32 as the recommended gateway pin — any gateway ≥ v0.0.30 is wire-compatible.
Routed-inference docs now describe the upstream header sanitization behaviour added in OpenShell PR NVIDIA/OpenShell#826: the gateway's router forwards only a common-header set (content-type, accept, accept-encoding, user-agent), the per-route default_headers, and a per-provider passthrough list (anthropic-version, anthropic-beta, openai-organization, x-model-id). ShoreGuard itself does not inject inference-path HTTP headers, so no code change is required — OpenTelemetry traceparent propagates on gRPC metadata, not on the forwarded HTTP surface.
Installation guide now mentions the standalone openshell-gateway binary upstream began publishing in NVIDIA/OpenShell#853 as an alternative to the full cluster image.

No behavior changes at runtime, no schema changes, no new dependencies.

Upstream watchlist (not pulled)¶

Unmerged upstream branches tracked for a future milestone:

feat/os-81-incremental-policy-merge — adds incremental sandbox policy updates (proto/openshell.proto +45). Relevant to M23 GitOps once tagged.
feat/supervisor-session-grpc-data + feat/supervisor-session-relay — re-platforms sandbox SSH connect + exec onto an HTTP/2 relay (proto/compute_driver.proto -25, proto/openshell.proto +120). Wire-break candidate; will require a stub regen + parity pass on the scale of M29 once released.
fix/l7-path-canonicalization — L7 path canonicalization fix; may shift policy-prover counterexamples for path-prefix rules.
tmutch/include-runtime-policy-revision-sandbox-get-output — surfaces runtime policy revision on sandbox get; pairs with M18 policy pinning UI once merged.

[0.32.1] — 2026-04-16¶

Fixed¶

Release pipeline Trivy scan. The v0.31.0 and v0.32.0 Release workflows both failed at the "Scan image for CRITICAL/HIGH CVEs" step because the computed image reference carried the original repository casing (ghcr.io/FloHofstetter/shoreguard@sha256:…) and Trivy rejects uppercase OCI references with could not parse reference. The docker/metadata and docker/build-push actions lowercase internally so the push itself worked, but the Trivy step read github.repository raw. A new workflow step lowercases the name into a step output and the Trivy image-ref now reads from it. No code change; no wire-format change. This release exists purely to get a passing Release pipeline on tag so the Docker image, PyPI package, and GitHub Release body actually ship for the 0.32.x line.
Auto-tag version parser. The auto-tag workflow's version extractor used (.+)$ which swallowed any trailing commit-subject text after the version, so a subject like release: ShoreGuard v0.32.0 — M29 + M30 produced an EXPECTED=0.32.0 — M29 + M30 and failed the pyproject.toml equality check. Bug was latent on v0.31.0 already. The capture group now stops at the first whitespace.
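The difference between the two capture groups is easy to see in isolation. The sketch below uses Python's `re` module as a stand-in for whatever the workflow actually runs, and the leading `v` anchor is an assumption; only the `(.+)$` group and the commit subject come from the entry above.

```python
import re

# The commit subject from the entry above (\u2014 is the dash it contains).
SUBJECT = "release: ShoreGuard v0.32.0 \u2014 M29 + M30"

GREEDY = re.compile(r"v(.+)$")   # old behaviour: captures to end of line
BOUNDED = re.compile(r"v(\S+)")  # fixed: stops at the first whitespace

old = GREEDY.search(SUBJECT).group(1)
new = BOUNDED.search(SUBJECT).group(1)

print(repr(old))
print(repr(new))
```

With the bounded group, the extracted value is just the version string, so the equality check against pyproject.toml can pass.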

[0.32.0] — 2026-04-16¶

Added¶

M29 Upstream-Sync Hardening. Parity pass against NVIDIA/OpenShell v0.0.27–v0.0.30 for the parts of the upstream delta that apply to a control plane:
Network-policy deny rules (upstream #822). Regenerated sandbox.proto from upstream origin/main — adds the L7DenyRule message and NetworkEndpoint.deny_rules field. Deny rules flow through the dict ↔ proto converter and the Z3 prover encoding as NOT(any_deny_matches) AND-ed over the allow-clause disjunction, so a deny rule that overlaps an allow rule makes the otherwise matching path UNSAT in counterexample search (deny wins).
TLD-level host wildcard rejection (upstream #791). Registering a network policy with *.com / *.io / *.local now raises PolicyValidationError at policy-write time and surfaces as HTTP 400. Multi-label suffixes like *.example.com stay allowed.
Symlink-aware binary paths (upstream #774). When a network policy declares a binary path that exists locally and is a symlink, the resolved realpath is persisted in the proto instead of the symlink. Remote-gateway paths (the common case) pass through unchanged.
SSE error framing hardening (upstream #842). A new _format_sse_event helper strips stray control characters from the data: line before emission so a pre-escaped payload cannot smuggle a premature SSE-framing terminator.
Always-blocked IP list (upstream #814). New SHOREGUARD_ALWAYS_BLOCKED_IPS setting parses operator-supplied CIDRs at startup (invalid entries hard-fail boot via a Pydantic field_validator) and feeds is_private_ip() so extra ranges (metadata VIPs, internal-management subnets) can be hard-blocked without a code change.
M30 libkrun microVM gateway awareness. Upstream PR #611 added a third gateway runtime alongside Docker and Kubernetes; ShoreGuard now models gateway runtimes via a strict metadata convention:
New shoreguard.gateway_runtime module defines the closed KNOWN_RUNTIMES set (docker, kubernetes, libkrun) plus get_runtime() / validate_runtime() helpers. The validator rejects typos (libKrun, krun) and normalises mixed-case inputs to the canonical lowercase spelling at registration time.
RegisterGatewayRequest has a Pydantic field_validator on metadata that picks up metadata.runtime, validates it, and persists the normalised value. Unknown runtimes surface as HTTP 422 with a precise field pointer.
GatewayResponse gets a top-level runtime: str | None field, populated by GatewayRegistry._to_dict so GET /gateway/list and GET /gateway/{name}/info expose the same surface.
GET /gateway/list?runtime=libkrun filters gateways by runtime tag. Unknown runtime filters hard-fail with HTTP 400 instead of silently returning an empty list.
gateway.register audit-log entries include the resolved runtime alongside endpoint, auth mode, and labels.
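The SSE framing hardening listed above can be illustrated with a small formatter. This is a sketch under assumptions: `format_sse_event` is a hypothetical stand-in for the private helper, and stripping all C0 control characters (including CR and LF) is one reasonable reading of "stray control characters".

```python
import json
import re

# C0 control characters plus DEL; this includes \r and \n, which end SSE lines.
_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")


def format_sse_event(event: str, data: str) -> str:
    """Emit one SSE frame whose only blank line is the terminator we add."""
    safe_event = _CONTROL_CHARS.sub("", event)
    safe_data = _CONTROL_CHARS.sub("", data)
    return f"event: {safe_event}\ndata: {safe_data}\n\n"


# A payload that tries to end the frame early and inject a second event.
hostile = json.dumps({"detail": "boom"}) + "\n\nevent: done"
frame = format_sse_event("error", hostile)
print(repr(frame))
```

Because embedded newlines are removed before framing, the injected text stays inside the single data line instead of starting a new event.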

Changed¶

sandbox.proto is now byte-identical to NVIDIA/OpenShell@origin/main, so ShoreGuard has wire-level parity with any OpenShell gateway ≥ v0.0.30, including for network-policy deny rules over the gRPC channel.
_dict_to_network_rule is now the single chokepoint for network policy validation — it runs TLD-wildcard rejection and symlink resolution before the proto is built, so every caller (YAML import, REST CRUD, policy-apply proposal) picks up the same guarantees.
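The TLD-wildcard check that runs at this chokepoint can be approximated by counting labels after the wildcard. The sketch is illustrative: the exception class is a local stand-in for ShoreGuard's PolicyValidationError, `validate_host_pattern` is a hypothetical name, and a plain label count does not account for multi-part public suffixes such as co.uk.

```python
class PolicyValidationError(ValueError):
    """Local stand-in for ShoreGuard's exception of the same name."""


def validate_host_pattern(host: str) -> str:
    """Reject wildcards that cover a whole top-level domain, e.g. *.com."""
    if host.startswith("*."):
        labels = [label for label in host[2:].split(".") if label]
        if len(labels) < 2:
            raise PolicyValidationError(
                f"wildcard {host!r} covers an entire TLD; use a multi-label suffix"
            )
    return host


for pattern in ("*.com", "*.local", "*.example.com", "api.github.com"):
    try:
        validate_host_pattern(pattern)
        print(pattern, "accepted")
    except PolicyValidationError as exc:
        print(pattern, "rejected:", exc)
```

Running the check before the proto is built is what lets every caller share the same guarantee.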

[0.31.0] — 2026-04-14¶

Added¶

M28 Observability — complete. Three independent pillars land in this release; all three are off by default and opt in via env vars:
Prometheus /metrics — M28 gRPC call counters, retry counters, sandbox phase-transition counters, boot-hook run counters and duration histograms, and a gateway client-cert expiry gauge. The /metrics endpoint honours the normal auth gate and can be flipped public via SHOREGUARD_METRICS_PUBLIC.
OpenTelemetry trace context through the routed-inference path. FastAPI and gRPC auto-instrumentation propagates a W3C traceparent header end-to-end, from an incoming HTTP request through every outgoing gRPC call to an OpenShell gateway. Console exporter by default; set SHOREGUARD_TRACING_OTLP_ENDPOINT=... for OTLP/HTTP. New SHOREGUARD_TRACING_* settings. Disabled unless SHOREGUARD_TRACING_ENABLED=true.
Structured audit-log export lanes. A new AuditExporter service fans every successfully-written audit entry across three lanes, each independently togglable: stdout-JSON (one JSON line per entry for Loki/Vector), syslog via SysLogHandler (RFC 5424 framing, JSON body, for SIEM receivers), and webhook via the existing fire_webhook() pipeline as audit.entry events. Lane errors are isolated — a broken receiver in one lane never prevents siblings from firing, and a lane exception never propagates into the audit write path. New SHOREGUARD_AUDIT_EXPORT_* settings.
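The lane-isolation guarantee described for the audit exporter can be sketched as follows. This is an illustration only: `fan_out` and the `Lane` alias are hypothetical, and a lane is modelled as any callable that takes the audit entry.

```python
import logging
from typing import Callable, Dict, List

logger = logging.getLogger("audit_export_sketch")

# A lane is any callable that receives one audit entry (stdout, syslog, webhook, ...).
Lane = Callable[[dict], None]


def fan_out(entry: dict, lanes: Dict[str, Lane]) -> List[str]:
    """Deliver `entry` to every lane and return the names of the lanes that failed.

    Each lane runs inside its own try/except, so one broken receiver neither
    stops its siblings nor raises into the caller (the audit write path).
    """
    failed: List[str] = []
    for name, lane in lanes.items():
        try:
            lane(entry)
        except Exception:
            logger.exception("audit export lane %s failed", name)
            failed.append(name)
    return failed
```

Returning the failed lane names instead of raising is one way to keep failures observable without coupling them to the write path.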
mTLS hardening. Client-cert validation is now eager at gateway registration time (not lazy at first use), and the registered cert is rotated proactively before expiry. Expiry is also surfaced on /metrics via sg_gateway_cert_expiry_seconds (see above).
gRPC retry + deadline resilience on the sandbox path. Sandbox create/exec/list now retry through a shared resilience layer with per-op deadlines, exponential backoff, and jitter. Retry and final status are surfaced on /metrics.
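A minimal sketch of retry with exponential backoff, jitter, and a deadline, under stated assumptions: `call_with_retry` and all of its parameters are hypothetical, the deadline is an overall time budget rather than a gRPC per-call deadline, and `ConnectionError` stands in for whichever status codes the real resilience layer treats as retryable.

```python
import random
import time


def call_with_retry(
    op,
    *,
    attempts=4,
    base_delay=0.1,
    max_delay=2.0,
    deadline=5.0,
    retry_on=(ConnectionError,),
    sleep=time.sleep,
    clock=time.monotonic,
    rng=random.random,
):
    """Call `op()` until it succeeds, the attempts run out, or the budget is spent."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    start = clock()
    for attempt in range(attempts):
        try:
            return op()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Full jitter: a random fraction of the capped exponential step.
            delay = min(max_delay, base_delay * 2 ** attempt) * rng()
            if clock() - start + delay > deadline:
                raise  # sleeping would overrun the time budget
            sleep(delay)
```

The `sleep`, `clock`, and `rng` hooks exist only so the behaviour can be exercised without real waiting.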

Changed¶

New dependencies: opentelemetry-api, opentelemetry-sdk, opentelemetry-instrumentation-fastapi, opentelemetry-instrumentation-grpc, opentelemetry-exporter-otlp-proto-http.

[0.30.3] — 2026-04-13¶

Changed¶

Docstring cleanup across the code surface. Every module, class, and method docstring in shoreguard/ that was either a trivial one-liner or dominated by sprint-tracking noise (milestone identifiers, release-version timestamps used to justify a design choice, CHANGELOG-style prose) has been rewritten into a proper heading-plus-explanation shape that describes what the code does, why it looks the way it does, and where the non-obvious delegation boundaries are. That kind of context belongs next to the code; sprint history belongs in this file and in the roadmap.
Docstring cleanup — tests, scripts, frontend. Same pass applied to tests/ module and class docstrings, scripts/ demo-walker module/function docstrings plus banner and API description strings, and frontend/js/, frontend/css/, and frontend/templates/ JSDoc headers and HTML/Jinja section comments. Sprint identifiers no longer appear in any source tree. Demo script filenames (scripts/m*_demo.py) stay as-is — they remain historical markers referenced from runbooks. No behavior, no test assertions, no Alpine bindings or selectors touched.
README. Bumped the cosign verify example from the stale 0.27.0 image tag to 0.30.2.
CI. Cleaned a stale milestone reference out of the m12-fixture-lint job comment. The job name itself is preserved so branch-protection status checks don't break.

Fixed¶

Test rate-limiter leak under serial pytest. The _disable_auth autouse fixture in tests/conftest.py reset only the login rate limiter, not the global or write limiters. Under pytest-xdist each worker has its own Python process so every worker starts with fresh limiters and the leak was invisible — the local -n auto runs stayed green. CI runs the test job serially, all tests share one process, the global limiter accumulates request counts, and tests late in collection order started seeing HTTP 429 on reads that should have been 200/404. The cascade failure looked like two distinct problems in the CI log (14 tests failing with 429 plus one with KeyError: 'total'), but the KeyError is the same bug: the assertion read data["total"] without checking the status code, so a 429 body without that field triggered it. Fix: call reset_limiters() instead of reset_login_limiter(). Verified with a full serial pytest run — 2933 passed, 1 skipped.

No behavior changes at runtime, no schema changes, no new dependencies.

[0.30.2] — 2026-04-13¶

Added (M24)¶

Terraform Provider v0.30.0. Released separately at FloHofstetter/terraform-provider-shoreguard. Provider versioning now mirrors the ShoreGuard server (jump from v0.1.0 straight to v0.30.0) so operators can pin provider + server together. New resources:
shoreguard_group, shoreguard_group_membership, shoreguard_group_gateway_role — RBAC as code.
shoreguard_approval_workflow — M19 quorum + escalation config.
shoreguard_policy_pin — M18 pin (locks active policy version, server returns HTTP 423 on any subsequent edit or approval while the pin is active).
shoreguard_sandbox_boot_hook — M22 pre/post-create hooks.
Breaking change. shoreguard_sandbox_policy has been removed. Policy content belongs in the M23 GitOps flow (shoreguard policy export/apply), not Terraform state which would drift on every denial flow. Migration snippet in the provider CHANGELOG: terraform state rm shoreguard_sandbox_policy.<name> followed by shoreguard policy export ….
Build. Thin REST wrapper built on terraform-plugin-framework v1.19 (Go 1.25). Acceptance tests are skeletons that skip without SHOREGUARD_BASE_URL + GATEWAY_NAME.

Added (M23)¶

GitOps Policy Sync. Declarative YAML policy management for sandboxes, driven from CI/CD. Two new endpoints under /api/gateways/{gw}/sandboxes/{name}/policy/:
GET /export returns a deterministic YAML document with a metadata block (gateway, sandbox, version, policy_hash, exported_at) plus the full policy. Round-trip stable: re-export of a parsed export yields the same policy block.
POST /apply accepts {yaml, dry_run, expected_version}. Status codes: 200 up_to_date, 200 dry_run, 200 applied, 202 vote_recorded, 409 version mismatch, 423 pinned, 400 malformed YAML.
Optimistic locking. expected_version falls back to metadata.policy_hash from the YAML document. Mismatch → HTTP 409 with the live current_hash in the body so CI can refetch + retry.
M18 pin guard reuse. Apply (and dry-run) on a pinned sandbox returns HTTP 423 — CI sees the pin instead of silently planning a change that cannot apply. Export remains allowed (read-only).
M19 workflow gating. When an active multi-stage approval workflow exists for the sandbox, the first apply records one approve-vote on a synthetic chunk id policy.apply:<sha16> and returns 202. Subsequent votes (same YAML body → same chunk id) accumulate until quorum, at which point the upstream UpdateConfig fires once. New table policy_apply_proposals (Alembic 017) caches the pending YAML body so the second voter does not need to resubmit bytes.
shoreguard policy CLI. Three Typer subcommands wrapping the new endpoints: export (stdout/file), diff (dry-run, exits 1 on drift), apply (writes, exits 1 if a vote was recorded but quorum not yet met, exits 2 on errors). Reads SHOREGUARD_URL + SHOREGUARD_TOKEN from env or --url / --token flags.
Drift detection (optional). New DriftDetectionService background loop, off by default behind SHOREGUARD_DRIFT_DETECTION_ENABLED. Polls every registered sandbox every interval and fires policy.drift_detected webhook on any hash change between scans (someone edited the policy outside the GitOps pipeline). The first scan after restart bootstraps the snapshot silently. Failures per sandbox are logged + swallowed — one broken sandbox does not kill the loop.
New webhook events. policy.applied (now also fires from apply), policy.drift_detected. Existing approval.vote_cast / approval.quorum_met are reused when apply hits the quorum path with a scope: policy.apply field added to the payload.
New audit events. policy.exported, policy.apply.dry_run, policy.apply.noop, policy.apply.voted, policy.applied (apply variant), policy.drift_detected.
Demo + runbook. scripts/m23_demo.py runs an 8-phase walk against a live local stack (export → no-op → drift → write → vote → quorum → pin → drift hint). Detailed runbook at scripts/m23-gitops.md.
Tests. 49 new tests across test_policy_diff_service.py, test_policy_yaml_service.py, test_policy_apply_proposal_service.py, test_drift_detection_service.py, test_policy_gitops_api.py, and test_cli_policy.py.
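The status codes listed for POST /apply can be read as a small decision table. The sketch below is one possible ordering of the checks, not the server's implementation: `SandboxState` and `plan_apply` are hypothetical names, the order in which pin, version, and no-op checks run is an assumption, and the 400 path for malformed YAML is left out because it happens before any of this logic.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SandboxState:
    current_hash: str             # hash of the live policy
    pinned: bool = False          # an active pin blocks apply and dry-run
    workflow_active: bool = False
    quorum_met: bool = False


def plan_apply(
    state: SandboxState,
    new_hash: str,
    expected_version: Optional[str],
    dry_run: bool,
) -> Tuple[int, str]:
    """Map one apply request to an (HTTP status, outcome) pair."""
    if state.pinned:
        return 423, "pinned"
    if expected_version is not None and expected_version != state.current_hash:
        return 409, "version_mismatch"  # caller refetches and retries
    if new_hash == state.current_hash:
        return 200, "up_to_date"
    if dry_run:
        return 200, "dry_run"
    if state.workflow_active and not state.quorum_met:
        return 202, "vote_recorded"
    return 200, "applied"
```

A CI job would typically send the hash from its last export as `expected_version`, so a 409 signals that someone changed the policy in between.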

Added (M22)¶

Sandbox Boot Hooks. Operators can attach pre/post-create hooks to a sandbox. Pre-create hooks act as ShoreGuard-side validation gates that run before CreateSandbox reaches the gateway: their command executes via subprocess.run in the ShoreGuard process with a whitelisted environment (SG_SANDBOX_NAME, SG_SANDBOX_IMAGE, SG_SANDBOX_POLICY_ID, plus user-defined env). A failure raises BootHookError and aborts sandbox creation. Post-create hooks run after CreateSandbox succeeds, executing inside the new sandbox via the existing ExecSandbox RPC — intended for warm-up tasks (apt update, telemetry init). The execution surface is intentionally ShoreGuard-side because OpenShell v0.0.26 has no native hook RPC; once upstream ships one, BootHookService will detect it and delegate.
Storage. New table sandbox_boot_hooks (Alembic 016) with phase, command, workdir, env_json, timeout_seconds, order, enabled, continue_on_failure, plus run state (last_run_at, last_status, last_output truncated to 4 KiB).
REST API under /api/gateways/{gw}/sandboxes/{name}/hooks: GET (list, viewer), GET /{id} (single), POST (admin), PUT /{id} (admin), DELETE /{id} (admin), POST /reorder (admin), POST /{id}/run (operator, manual trigger). Audit events: boot_hook.created, boot_hook.updated, boot_hook.deleted, boot_hook.reordered, boot_hook.manual_run.
SandboxService.create() integration. When the boot hook service is wired in, create() runs the pre-create gate before CreateSandbox and the post-create chain after. The new admin-only skip_hooks flag on POST .../sandboxes bypasses both phases for recovery scenarios. Failures from continue-on-failure hooks are surfaced in the response under boot_hooks.post_create rather than rolled back.
Frontend. New "Hooks" tab on the sandbox detail page (frontend/templates/pages/sandbox_hooks.html + frontend/js/boot_hooks.js) with separate Pre-create / Post-create sections, in-place toggle, reorder buttons, an editor modal (command, workdir, env KEY=VALUE, timeout, continue_on_failure), and a one-click "Run" button that surfaces the captured output inline.
MicroVM Gateway Discovery. ShoreGuard can now auto-register OpenShell gateways announced via DNS SRV records (_openshell._tcp.<domain>). Discovery runs both as a manual trigger (POST /api/gateway/discover — operator+) and as a configurable background loop in the application lifespan (analogous to the existing _health_monitor). Discovered endpoints flow through the same _validate_endpoint_format guard as manual registration, so the *.svc.cluster.local whitelist still applies and other private IPs are still rejected unless local_mode is enabled.
New dependency. dnspython >= 2.6 (MIT licensed).
Settings. New DiscoverySettings block (SHOREGUARD_DISCOVERY_*): enabled, domains, interval_seconds, default_scheme, auto_register, resolver_timeout_seconds. Off by default.
Service. shoreguard/services/discovery.py exposes discover_domain, discover_all, auto_register, run_once, and status. Names are derived from the SRV target host (sanitised, max 253 chars), with the port appended when not 443/30051.
REST API. POST /api/gateway/discover (operator+, optional {domains: [...]} override; audit-logged as gateway.discovered) and GET /api/gateway/discovery/status (viewer).
Frontend. "Discover" button on the gateways list page that triggers POST /api/gateway/discover, surfaces the result counts in a dismissable banner + toast, and refreshes the table.
Demo + tests. scripts/m22_demo.py walks the boot-hook + discovery flow end-to-end against a live ShoreGuard. New test files (tests/test_boot_hooks_service.py, tests/test_api_boot_hooks_routes.py, tests/test_discovery_service.py, tests/test_api_discovery_routes.py) add ~80 unit + integration tests; total suite remains green.
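The pre-create gate from the Sandbox Boot Hooks entry can be sketched roughly as below. Assumptions are labelled: `run_pre_create_hook` and its signature are hypothetical, the command is taken as an argument list rather than a shell string, and stdout and stderr are simply concatenated before the 4 KiB truncation.

```python
import subprocess
from typing import Dict, List, Optional

MAX_OUTPUT_BYTES = 4096  # captured output is truncated to 4 KiB


class BootHookError(RuntimeError):
    """Raised when a pre-create hook fails; the caller aborts sandbox creation."""


def run_pre_create_hook(
    command: List[str],
    *,
    sandbox_name: str,
    image: str,
    policy_id: str,
    extra_env: Optional[Dict[str, str]] = None,
    timeout: float = 30.0,
) -> str:
    """Run one hook with an allowlisted environment and return its output."""
    # Start from user-defined variables only; the parent environment is not inherited.
    env = dict(extra_env or {})
    # Set the SG_* variables last so user-defined values cannot override them.
    env.update(
        SG_SANDBOX_NAME=sandbox_name,
        SG_SANDBOX_IMAGE=image,
        SG_SANDBOX_POLICY_ID=policy_id,
    )
    try:
        proc = subprocess.run(
            command, env=env, capture_output=True, timeout=timeout, check=False
        )
    except subprocess.TimeoutExpired as exc:
        raise BootHookError(f"hook timed out after {timeout}s") from exc
    output = (proc.stdout + proc.stderr)[:MAX_OUTPUT_BYTES].decode("utf-8", "replace")
    if proc.returncode != 0:
        raise BootHookError(f"hook exited {proc.returncode}: {output}")
    return output
```

Because the parent environment is not inherited, a hook that needs `PATH` or similar has to receive it through `extra_env`.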

Added (M21)¶

SBOM / Supply-Chain Viewer. Operators can upload a CycloneDX JSON SBOM per sandbox (typically from CI) and browse components, licenses, and known vulnerabilities directly in the ShoreGuard UI. Vulnerabilities are read offline from the CycloneDX vulnerabilities array — no online NVD/OSV lookup.
Storage. Two new tables (Alembic 015): sbom_snapshots (one per (gateway, sandbox), holds metadata + the original CycloneDX payload) and sbom_components (denormalised rows for fast paginated search). A new upload replaces the prior snapshot — historical snapshots are deliberately out of scope.
Service. shoreguard/services/sbom.py parses CycloneDX 1.5, aggregates per-component vuln_count + max_severity via bom-ref join, and exposes ingest, get_snapshot, get_raw_json, delete_snapshot, search_components, get_vulnerabilities. No new Python dependency — the parser is self-contained (~280 LoC).
REST API under /api/gateways/{gw}/sandboxes/{name}/sbom: POST (upload, admin, max 10 MiB, audit-logged as sbom.uploaded), GET (snapshot metadata), GET /components (paginated search by ?search= + ?severity=, including severity=CLEAN for vuln-free components), GET /vulnerabilities (sorted highest-severity first), GET /raw (original payload as application/vnd.cyclonedx+json), DELETE (admin, audit-logged as sbom.deleted).
Frontend. New SBOM tab in the sandbox sub-nav. Empty-state upload flow with cURL example for CI; component table with debounced search + severity-filter chips + pagination; vulnerabilities tab with severity-coloured CVE cards and reference links; admin-only Replace + Delete actions; raw download.
Tests. 46 new service tests covering parser happy + failure paths, ingest replace + cascade, search/filter combinations, pagination clamping; 20 new API route tests covering upload/get/list/delete + edge cases. Full suite: 2805 passed, 1 skipped.
Demo script. scripts/m21_demo.py walks 8 phases against a running ShoreGuard, using the bundled CycloneDX fixture scripts/fixtures/sample_cyclonedx.json (10 components, 2 CVEs).
Ingestion model. OpenShell v0.0.26 exposes no SBOM RPC, so M21 is upload-only. CI is the right source anyway, since it knows which build is deploying. A future milestone can add a gateway-pull path once the upstream feat/237-sbom-tooling branch ships.
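The bom-ref join that produces per-component vuln_count and max_severity can be sketched against the CycloneDX JSON layout (`components[].bom-ref`, `vulnerabilities[].affects[].ref`, `vulnerabilities[].ratings[].severity`). The function names are hypothetical, and representing a vulnerability-free component as `max_severity=None` (matched by the CLEAN filter) is an assumption about storage, not a description of the real tables.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# CycloneDX severity values, weakest to strongest.
_RANK = {
    s: i
    for i, s in enumerate(
        ["unknown", "none", "info", "low", "medium", "high", "critical"]
    )
}


def _worst(severities: List[str]) -> str:
    """Return the strongest severity, or 'unknown' for an empty list."""
    return max(severities, key=lambda s: _RANK.get(s, 0), default="unknown")


def summarise_components(bom: dict) -> List[dict]:
    """Join vulnerabilities onto components through their bom-ref."""
    by_ref: Dict[Optional[str], List[str]] = defaultdict(list)
    for vuln in bom.get("vulnerabilities", []):
        worst = _worst([r.get("severity", "unknown") for r in vuln.get("ratings", [])])
        for target in vuln.get("affects", []):
            by_ref[target.get("ref")].append(worst)
    rows = []
    for comp in bom.get("components", []):
        hits = by_ref.get(comp.get("bom-ref"), [])
        rows.append(
            {
                "name": comp.get("name"),
                "version": comp.get("version"),
                "vuln_count": len(hits),
                "max_severity": _worst(hits) if hits else None,
            }
        )
    return rows


def matches_severity(row: dict, severity: str) -> bool:
    """Filter helper: 'CLEAN' selects components without any vulnerability."""
    if severity.upper() == "CLEAN":
        return row["vuln_count"] == 0
    return row["max_severity"] == severity.lower()
```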

Added (M20)¶

RPC Parity. ShoreGuardClient grew two new thin wrappers around OpenShell RPCs that were previously unreachable from the UI: a get_inference_bundle() that returns the fully resolved inference config (cluster default + route list + per-route credential state) and SandboxManager.get_config() / get_provider_environment() for inspecting live sandbox config + provider env projection. API keys are redacted to has_api_key: bool at the wrapper boundary so the UI can render a shield badge without handling secrets.
New endpoint. GET /api/gateways/{gw}/inference/bundle (viewer, audit-logged). Surfaces the resolved bundle as a table on the gateway detail page with a per-route credential shield badge.
Push-based policy-status wait. approve_chunk / approve_all no longer busy-poll via _poll_policy_loaded. New PolicyStatusBroker (shoreguard/services/policy_status.py) opens a short-lived WatchSandbox stream in a worker thread, sets an asyncio.Event on every draft_policy_update, confirms the new version via GetSandboxPolicyStatus, and falls back to a 2 s slow poll if the stream fails. The stream is always cancelled cleanly on success, timeout, or cancellation. Browser receives the same updates over the existing /ws channel via a new sg:policy-status-update DOM event, so any open page reacts without a hard refresh. The persistent-first sort toggle state is now persisted per browser via localStorage.
Tests. 8 new tests covering broker happy-path, wake on draft update, timeout fallback, cancel cleanup, upstream watch failure, and api_key → has_api_key redaction.
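The api_key to has_api_key redaction at the wrapper boundary amounts to a small transformation. A minimal sketch, assuming each route is a flat dict (`redact_route` is a hypothetical helper name):

```python
def redact_route(route: dict) -> dict:
    """Return a copy of `route` with the secret replaced by a boolean flag."""
    safe = {key: value for key, value in route.items() if key != "api_key"}
    # The UI only needs to know whether a credential exists, never its value.
    safe["has_api_key"] = bool(route.get("api_key"))
    return safe
```

Building a new dict instead of mutating the input keeps the secret out of anything serialised later, while the original object stays usable inside the wrapper.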

Added (M19)¶

Multi-Stage Approval Workflows (Quorum). Per-sandbox approval workflows let teams require multiple sign-offs before a policy change takes effect. New models ApprovalWorkflow + ApprovalDecision (Alembic 014) and ApprovalWorkflowService with upsert, delete, record_decision, check_quorum, and reactive escalation on each vote (no background scheduler — escalation fires on the next vote after the deadline).
Endpoints. GET|PUT|DELETE /api/gateways/{gw}/sandboxes/{name}/approval-workflow (admin for writes, viewer for reads). GET /api/gateways/{gw}/approvals/{chunk_id}/decisions returns the running tally + voter list. POST .../approve under an active workflow returns HTTP 202 vote_recorded until quorum is reached, at which point the upstream ApproveChunk fires exactly once. A single reject vote is decisive and kills the proposal immediately. POST .../approve-all is admin-only when a workflow is active and returns HTTP 409 to non-admins (emergency override path).
Webhook events. approval.vote_cast, approval.quorum_met, approval.escalated — all carry the workflow ID, sandbox, voter, and current tally.
Frontend. Workflow banner + vote-count badge + voter list on the approval detail modal, "Vote to Approve" button (disabled after the current user has voted), and an admin-only workflow config modal on the sandbox detail page.
Tests. 37 new service + API tests covering upsert, quorum, rejection, escalation, admin override, and webhook firing paths.
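The quorum rules above (votes accumulate, one reject ends the proposal, the upstream call fires exactly once) can be sketched as a tiny state machine. Everything here is illustrative: `Proposal`, `record_decision`, and the `required_approvals` field are hypothetical, escalation is not modelled, and counting only a voter's first vote is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Proposal:
    required_approvals: int
    on_quorum: Callable[[], None]  # stands in for the upstream approve call
    votes: Dict[str, str] = field(default_factory=dict)
    state: str = "pending"         # pending | approved | rejected


def record_decision(proposal: Proposal, voter: str, decision: str) -> str:
    """Record one vote and return the resulting state or 'vote_recorded'."""
    if decision not in ("approve", "reject"):
        raise ValueError(f"unknown decision: {decision!r}")
    if proposal.state != "pending":
        return proposal.state  # terminal states ignore later votes
    recorded = proposal.votes.setdefault(voter, decision)  # first vote wins
    if recorded == "reject":
        proposal.state = "rejected"
        return proposal.state
    approvals = sum(1 for v in proposal.votes.values() if v == "approve")
    if approvals >= proposal.required_approvals:
        proposal.state = "approved"
        proposal.on_quorum()  # reached once, because the state is now terminal
        return proposal.state
    return "vote_recorded"
```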

Added (M18)¶

Policy Pinning. Operators can pin the active policy version of a sandbox to prevent accidental edits during an incident or change freeze. New PolicyPin model (Alembic 013) + PolicyPinService with pin, unpin, get, check, and auto-expiry. All seven policy-write endpoints (PUT /policy, network/filesystem/process CRUD, preset apply) plus POST .../approve and .../approve-all now raise PolicyLockedError → HTTP 423 when a pin is active. Export (M23) remains allowed; discovery + read paths are unaffected.
Endpoints. GET|POST|DELETE /api/gateways/{gw}/sandboxes/{name}/policy/pin. POST accepts {reason, expires_at} (operator+, audit-logged as policy_pin.created / policy_pin.deleted).
Security-Flagged Rules UI. Rule chunks that OpenShell marks as security-flagged now render a red shield badge per chunk, a dedicated filter chip, and a warning banner on the approval page. The "Approve All" confirmation dialog carries an explicit "include flagged" checkbox so flagged rules cannot be bulk-approved by accident.
Frontend. Pin banner + lock/unlock button + pin modal (reason + expiry picker) on the sandbox detail page. All policy sub-pages (network, filesystem, process, presets, approvals) disable their edit buttons when the sandbox is pinned.
Tests. 43 new tests covering service CRUD, auto-expiry, guard-by-endpoint coverage, and UI state.
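A pin guard with auto-expiry can be sketched in a few lines. The names mirror the entry above but the code is an assumption-laden illustration: `PolicyPin` is reduced to two fields, `check` takes the pin as an argument instead of loading it from the database, and an expired pin is simply treated as absent.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


class PolicyLockedError(Exception):
    """Raised by write paths while a pin is active (surfaced as HTTP 423)."""


@dataclass
class PolicyPin:
    reason: str
    expires_at: Optional[datetime] = None  # None means no automatic expiry


def check(pin: Optional[PolicyPin], now: Optional[datetime] = None) -> None:
    """Raise PolicyLockedError if `pin` is still active; otherwise return."""
    if pin is None:
        return
    now = now or datetime.now(timezone.utc)
    if pin.expires_at is not None and pin.expires_at <= now:
        return  # auto-expiry: the pin no longer blocks writes
    raise PolicyLockedError(pin.reason)
```

Calling one guard function from every policy-write endpoint is what keeps the 423 behaviour uniform across them.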

Added (M17)¶

Policy Prover (Z3 Formal Verification). New optional dependency on z3-solver. ProverService (shoreguard/services/prover.py) ships four query templates encoded as Z3 constraints over the sandbox policy:
can_exfiltrate — is there a writable egress path to a non-whitelisted destination?
unrestricted_egress — does any network rule allow 0.0.0.0/0 on an unbounded port range?
binary_bypass — can a binary hash outside the allowlist be executed?
write_despite_readonly — can any filesystem write succeed despite a readonly root?
Each template returns SAT / UNSAT plus a witness model on SAT so operators can see why a policy fails.
Endpoints. POST /api/gateways/{gw}/sandboxes/{name}/policy/verify (operator+) runs one or more templates. GET /api/gateways/{gw}/policies/presets/verify lists the available templates + default parameters.
Frontend. New "Verify" tab on the sandbox detail page with a preset picker, run-button, and a result panel that renders the witness model as a table on SAT or a green "property holds" banner on UNSAT.
Tests. 30 new unit tests covering each template happy path, malformed policy input, and Z3 timeout handling.
Demo. scripts/m17_demo.py walks the four templates against a purposefully misconfigured sandbox.

Added (M16)¶

Binary-Context Approvals. Approval chunks now carry binary + process context so reviewers can decide with full evidence. DenialContextService caches the denial context at submit_analysis time (in-memory TTL cache) and enriches it at get_draft:
Process ancestry breadcrumb (parent → grandparent → …)
Binary SHA-256 badge
Persistent-context badge (flagged when the same binary has requested approval before)
L7 request samples table (up to 10 recent requests with method, path, status, source)
Frontend. Approval detail modal renders the new context block with collapsible sections and a "Persistent first" sort toggle on the pending-approvals list.
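The in-memory TTL cache that holds denial context between submit_analysis and get_draft can be approximated as below. This is a generic sketch, not DenialContextService itself: `TTLCache` is a hypothetical class, and expiry is checked lazily on read rather than by a background sweeper.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire after `ttl_seconds`."""

    def __init__(
        self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: Dict[Hashable, Tuple[float, Any]] = {}

    def put(self, key: Hashable, value: Any) -> None:
        """Store `value` and stamp it with its expiry time."""
        self._items[key] = (self._clock() + self._ttl, value)

    def get(self, key: Hashable) -> Optional[Any]:
        """Return the cached value, or None if it is missing or expired."""
        item = self._items.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self._clock() >= expires_at:
            del self._items[key]  # drop stale entries lazily on read
            return None
        return value
```

Since the cache is in memory, a restart loses the cached context; any code enriching a draft has to tolerate a `None` result.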

Added (M15)¶

Bypass Detection Dashboard. OCSF events classified as potential policy bypasses (denials followed by success, egress via unusual ports, DNS exfiltration signatures) are streamed into a new BypassService ring buffer (last 1 000 events, in-memory) and exposed both as an API and a UI tab.
Endpoints. GET /api/gateways/{gw}/bypass (paginated event list with severity filter) and GET /api/gateways/{gw}/bypass/summary (per-severity counts + top offending sandboxes).
Frontend. New "Bypass" tab on the gateway detail page with a severity filter, event timeline, and a MITRE ATT&CK technique mapping per event.
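A bounded ring buffer with a per-severity summary, as described for the bypass events, maps naturally onto `collections.deque`. A minimal sketch, assuming events are dicts with `severity` and `sandbox` keys (`BypassBuffer` and its methods are hypothetical names):

```python
from collections import Counter, deque


class BypassBuffer:
    """Keep only the most recent `capacity` events in memory."""

    def __init__(self, capacity: int = 1000) -> None:
        # deque(maxlen=...) silently discards the oldest item when full.
        self._events = deque(maxlen=capacity)

    def add(self, event: dict) -> None:
        self._events.append(event)

    def __len__(self) -> int:
        return len(self._events)

    def summary(self, top: int = 3) -> dict:
        """Per-severity counts plus the sandboxes with the most events."""
        return {
            "by_severity": dict(Counter(e["severity"] for e in self._events)),
            "top_sandboxes": Counter(e["sandbox"] for e in self._events).most_common(top),
        }
```

The summary always reflects the current window only, so counts shrink again once older events fall out of the buffer.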

Fixed (M14)¶

Approve → reload race. The POST /approve and POST /approve-all endpoints now accept a ?wait_loaded=true query parameter. When set, the server polls the gateway's policy status internally (up to 30 s) and only returns once the new policy version is reported as loaded — or 504 on timeout. This eliminates the client-side polling loop that was previously required to avoid spurious 403s from the proxy still running the old policy. All three demo scripts (m7_demo.py, m8_demo.py, m12_demo.py) have been updated to use the server-side wait.
Local-mode plaintext gateway auto-register. When SHOREGUARD_LOCAL_MODE=true, the filesystem gateway importer now skips mTLS certificate material for http:// (plaintext) endpoints. Previously, if the OpenShell data directory contained cert files alongside a plaintext gateway, they were imported and the connection attempt used TLS against a plaintext endpoint, resulting in a permanent unreachable status.

[0.30.1] — 2026-04-12¶

Changed¶

Moved charts/openshell-cluster → tests/fixtures/charts/openshell-cluster. The chart was never a supported production install path — it wraps NVIDIA's ghcr.io/nvidia/openshell/cluster all-in-one image (privileged k3s-in-container, ~10-15% network overhead from double iptables NAT) so that scripts/m12_demo.py can exercise the M12 federation code path in local/kind/CI without requiring NVIDIA's upstream OpenShell Helm chart at test time. Keeping it under charts/ misled readers into thinking it was a production option. The fixture is now clearly scoped as internal test infrastructure: its README leads with a "not a supported install path" banner and points at the real production pattern (install NVIDIA's upstream OpenShell chart separately, then charts/shoreguard alongside it). CI renames the lint/render block into a dedicated m12-fixture-lint job so fixture status is tracked distinctly from the supported helm-lint job. scripts/m12_demo.py and scripts/m12-federation.md reference the new path and carry the same positioning notice.

Added (M12)¶

Internal M12 federation test fixture at tests/fixtures/charts/openshell-cluster/. Runs the upstream ghcr.io/nvidia/openshell/cluster:0.0.26 k3s-in-container image as a privileged StatefulSet so the helm-deployed ShoreGuard can federate multiple gateways entirely in k8s. A post-install bootstrap Job (weight 5, bitnami/kubectl) kubectl exec's into the cluster pod, generates a CA + server + client mTLS set inside /certs, creates the k3s-internal secrets openshell-server-tls, openshell-server-client-ca, openshell-client-tls, and openshell-ssh-handshake (idempotent via kubectl apply --dry-run=client), then exports the client material as an outer-ns Secret <release>-openshell-cluster-client-tls. helm test ships a busybox nc -zv TCP probe. Chart-time validation fails rendering when label.env is empty.
scripts/m12_demo.py — in-k8s federation end-to-end demo. k8s analog of scripts/m8_demo.py: reads each gateway's client mTLS Secret via kubectl, registers both clusters via POST /api/gateway/register with auth_mode=mtls / scheme=https, then drives the same Phase A–J federation assertions (label filter, per-gateway audit attribution, unfiltered audit coalescence, /api/gateway/list with labels + status=connected). Sandbox exec steps (Phases F + G) route through ShoreGuard's /api/gateways/{gw}/sandboxes/{sb}/exec LRO instead of shelling to openshell CLI, so the host running the demo only needs kubectl, helm, and uv.
scripts/m12-federation.md — runbook for the M12 demo: kind cluster, privileged namespace, two helm install cluster-{dev,staging}, one helm install sg, kubectl port-forward, the Phase-A-J walk-through, and Phase K (kubectl rollout restart statefulset/cluster-dev-openshell-cluster while driving cluster-staging traffic, proving gateway-independence of the control plane).
CI m12-fixture-lint job. .github/workflows/ci.yml now runs helm lint tests/fixtures/charts/openshell-cluster plus a positive render matrix (label.env=dev + label.env=staging) and a negative test asserting empty label.env must fail rendering. Job is named and scoped separately from the supported helm-lint job so fixture status never gets mistaken for production chart status.

Added (docs)¶

Production Kubernetes deployment runbook at docs/deploy/production-k8s.md. End-to-end walkable guide for ops teams deploying ShoreGuard alongside NVIDIA's upstream OpenShell Helm chart on a real k8s cluster. Covers prerequisites (CNI with NetworkPolicy enforcement, cert-manager, ingress-nginx), BYO Secret pattern, helm install with the production preset and all required overrides, gateway registration with mTLS material extracted from NVIDIA's chart-created Secrets, a post-deploy verification checklist, and day-2 operations (multi-replica scaling, secret rotation). Cross-linked from docs/admin/deployment.md, charts/shoreguard/README.md, and the MkDocs nav.

Added (chart)¶

networkPolicy.egress.inClusterGateways chart value. First-class egress rule for in-cluster OpenShell gateways (TCP 30051 to private Pod IPs). The existing LLM-providers block only allows 443/tcp to non-RFC1918 CIDRs, so federated gateways running inside the cluster were unreachable unless patched via the egress.extra escape hatch. New value: enabled: false (default off), port: 30051, podSelector: {}, namespaceSelector: {}. Point the selectors at NVIDIA's upstream OpenShell Helm chart pod labels and flip enabled: true for in-k8s federation deploys. CI render test added.

Added (M10 + M11)¶

Helm chart MVP at charts/shoreguard/ (M10). Single-replica, SQLite-in-emptyDir, no Ingress by default — gets ShoreGuard running on a fresh kind/k3d cluster with helm install sg ./charts/shoreguard --set admin.password=.... Secret key is generated once per release and preserved across upgrades via a lookup. SHOREGUARD_ALLOW_UNSAFE_CONFIG is injected automatically when database.url is empty so the pod boots past the prod-readiness gate. New helm-lint CI job covers helm lint plus a helm template render smoke check.
charts/shoreguard — M11 production hardening. Turns the M10 MVP chart into something an ops team would actually roll. New values: replicaCount, persistence.{enabled,storageClassName,size,accessMode,existingClaim}, existingSecret (BYO Secret path), networkPolicy.* (ingress-namespace selector, DNS/LLM-provider/Postgres/extra egress blocks), podDisruptionBudget.{enabled,minAvailable}, tests.{enabled,image}, forwardedAllowIps. New templates: pvc.yaml, networkpolicy.yaml, pdb.yaml, tests/test-connection.yaml. The Deployment switches strategy between Recreate (single-replica) and RollingUpdate (maxSurge=1, maxUnavailable=0 for multi-replica), passes SHOREGUARD_REPLICAS and SHOREGUARD_FORWARDED_ALLOW_IPS to the pod, and swaps the data volume between emptyDir and a PVC based on persistence.enabled. Chart version bumped 0.1.0 → 0.2.0, appVersion → 0.30.1.
charts/shoreguard/values.production.yaml — opinionated preset that enables PVC + cert-manager + nginx-ingress + NetworkPolicy + structured JSON logs + forwarded-headers trust. Single-replica by default (the preset is RWO-PVC-shaped); scale out only after setting database.url to an external Postgres.
Chart-time footgun guards (templates/_helpers.tpl:shoreguard.validate). helm template now fails with a clear message when existingSecret collides with admin.password/secretKey, when replicaCount > 1 is combined with persistence.enabled=true and no database.url (RWO-PVC deadlock), or when replicaCount > 1 is combined with no secretKey/existingSecret (session HMAC drift).
helm test hook. helm test <release> now runs a tiny curlimages/curl pod that curls /healthz and /version against the in-cluster Service (not the Ingress — keeps the test independent of cluster DNS and TLS trust). Gated on tests.enabled.
shoreguard.server.forwarded_allow_ips setting (env SHOREGUARD_FORWARDED_ALLOW_IPS, default "127.0.0.1"). Passed to uvicorn as forwarded_allow_ips together with proxy_headers=True, so X-Forwarded-Proto/Host from a trusted TLS-terminating proxy is honored. Without this, sessions behind nginx-ingress would see http:// internally and issue non-Secure cookies. The production chart preset sets it to "*".
Backend hard-fail for multi-replica without a stable secret key. check_production_readiness() now emits an ERROR (escalated from a WARN) when SHOREGUARD_REPLICAS > 1 and auth.secret_key is unset, which causes enforce_production_safety() to raise a RuntimeError at startup. The original rate-limiter WARN stays because the in-process limit problem is orthogonal to the secret key one.
CI helm-lint job extended to render the production preset and assert that the multi-replica-without-secretKey footgun guard actually fires.
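The backend hard-fail for multi-replica deployments can be pictured as a two-step check: collect findings, then refuse to start if any is an error. The sketch reuses the function names from the entry above but is not the real code; the signatures, the finding format, and the message texts are assumptions, and the SHOREGUARD_ALLOW_UNSAFE_CONFIG escape hatch is not modelled.

```python
from typing import List, Optional, Tuple

Finding = Tuple[str, str]  # (level, message)


def check_production_readiness(replicas: int, secret_key: Optional[str]) -> List[Finding]:
    """Collect findings for a deployment with `replicas` instances."""
    findings: List[Finding] = []
    if replicas > 1:
        # Independent of the secret key: limits are counted per process.
        findings.append(("WARN", "rate limiter is in-process; limits apply per replica"))
        if not secret_key:
            findings.append(
                ("ERROR", "multiple replicas need a shared auth.secret_key")
            )
    return findings


def enforce_production_safety(findings: List[Finding]) -> None:
    """Raise at startup if any finding is an ERROR; WARN findings pass."""
    errors = [message for level, message in findings if level == "ERROR"]
    if errors:
        raise RuntimeError("; ".join(errors))
```

Without a shared key each replica would sign sessions differently, which is the session HMAC drift that the chart-time guard also catches.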

Fixed¶

Release workflow: aquasecurity/trivy-action pin bumped from the non-existent @0.28.0 tag to @v0.35.0. The old pin failed the GitHub Actions resolver before any step ran, so the docker job in the release workflow never reached build-push-action and the v0.30.0 image never landed on GHCR (only :latest was available). Verified against the failed run of v0.30.0 (gh run view 24282878746 --log-failed → Unable to resolve action aquasecurity/trivy-action@0.28.0). The action's maintainers migrated all tags to the v-prefix convention — @v0.35.0 is the current stable tag and keeps the same image-ref / exit-code / severity / vuln-type input surface we rely on.

[0.30.0] — 2026-04-11¶

The headline of this release is federation in production shape: ShoreGuard now ships with a topbar switcher, label-based gateway filtering, per-gateway audit attribution, and a single-file Python script that drives the complete agent → routed inference → L7 denial → approve → audit → retry flow against two live OpenShell clusters in parallel. The same release also closes the long-standing "webhook backend exists, no UI" gap with a new /webhooks admin page, and shores up audit attribution across the gateway routes so the audit log can be sliced by gateway with no cross-attribution leaks.

Two end-to-end automation scripts (scripts/m7_demo.py and scripts/m8_demo.py) now exercise the full vision flow on every run; both exit 0, in ~30 seconds and ~3-4 minutes respectively, against real OpenShell gateways and a real Anthropic API key.

Added¶

Webhook management UI at /webhooks (admin only). Lists every registered webhook with channel badge, event-type chips, active / paused state, and per-row actions for test, view delivery log, pause/resume, edit, and delete. Inline create form with a one-time HMAC signing-secret reveal callout. Edit and delivery-log modals. The webhook backend has shipped for several releases — this is the first operator-facing surface for it.
Topbar gateway switcher. The read-only gateway status badge has been replaced with a dropdown that lists every registered gateway with status dot and labels, and navigates to the picked gateway's detail page on click. Pure URL navigation, no client-side "active gateway" state. Available on every page.
Label filter on the gateways list page. New text input next to the existing free-text filter accepts key:value (or comma-separated k:v,k2:v2 for AND semantics) and reduces the table to gateways carrying those labels. The backend ?label= query parameter on /api/gateway/list was already supported.
Audit log filterable by gateway. New ?gateway=<name> query parameter on /api/audit and /api/audit/export, plus a matching text input on the audit page. Lets an operator reconstruct the full register → configure → run → deny → approve sequence for one gateway in chronological order, even when other gateways are active concurrently.
Webhook CRUD now lands in the audit log. New webhook.create, webhook.update, webhook.delete, and webhook.test audit entries carry the URL, event types, and channel type in the detail blob. This was the last unaudited route family in the API.
WebhookService.fire_to() direct delivery. The webhook service now exposes a method to deliver an event to one specific active webhook, bypassing the subscription filter. The /test endpoint uses this so clicking the "Test" button on a webhook always reaches its target — even if the webhook doesn't subscribe to webhook.test. Paused webhooks now return HTTP 409 instead of silently dropping the request.
End-to-end demo scripts and runbooks. scripts/m7_demo.py drives the single-gateway vision flow (login → register → inference provider → launch sandbox → claude agent → L7 denial → approve → audit → retry) in ~30 seconds. scripts/m8_demo.py does the federated version against two clusters in ~3-4 minutes, with per-gateway audit-attribution assertions. Each script ships alongside a markdown runbook (scripts/m7-demo.md, scripts/m8-demo.md) for the manual recipe. Both scripts are idempotent — re-running deletes any leftover state before recreating.
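
The label filter syntax described above (key:value, or comma-separated pairs with AND semantics) can be illustrated with a minimal sketch. This is not ShoreGuard's actual parser: `parse_label_filter` and `matches_labels` are hypothetical names, the gateway data is made up, and ignoring surrounding whitespace is an assumption.

```python
def parse_label_filter(text: str) -> dict[str, str]:
    """Parse 'k:v,k2:v2' into a dict; every pair must match (AND)."""
    wanted: dict[str, str] = {}
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty input and trailing commas
        key, sep, value = part.partition(":")
        if not sep or not key.strip():
            raise ValueError(f"expected key:value, got {part!r}")
        wanted[key.strip()] = value.strip()
    return wanted


def matches_labels(labels: dict[str, str], wanted: dict[str, str]) -> bool:
    """True when the gateway carries every requested label."""
    return all(labels.get(k) == v for k, v in wanted.items())


# Made-up gateways, for illustration only.
gateways = {
    "edge-1": {"env": "prod", "region": "eu"},
    "edge-2": {"env": "dev", "region": "eu"},
}
wanted = parse_label_filter("env:prod,region:eu")
selected = [name for name, labels in gateways.items() if matches_labels(labels, wanted)]
```

An empty filter parses to an empty dict, which matches every gateway, so clearing the input restores the full table.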

Fixed¶

GET /api/gateway/{name}/info returned 500 on a connected gateway. GatewayService.get_info() injects configured and version into the response dict, but GatewayResponse was extra="forbid", so the live endpoint crashed inside FastAPI's response validator. Schema now accepts both fields.
Gateway-route audit entries were landing with gateway_name=NULL. gateway.register, unregister, setting_update/delete, update_metadata, start / stop / restart / destroy all pass gateway=name to audit_log() now, so the new ?gateway=<name> filter actually finds them. Without this, every gateway-scoped audit row was invisible to per-gateway queries.
Webhook /test endpoint silently produced zero deliveries when the target webhook didn't subscribe to webhook.test (or *). The global fire() path filters by subscription, so the test button was a lie unless the webhook happened to subscribe to the test event type. The new fire_to() direct-delivery path fixes it; paused webhooks now return 409 instead of dropping.
CSP-strict header tests asserted 'unsafe-inline' was not a substring of the CSP header, which broke after the v0.29 fix that added style-src-attr 'unsafe-inline' for Alpine.js's inline style attributes (x-show / x-cloak / x-transition). Replaced with a per-directive check that allows the narrower style-src-attr while keeping default-src, script-src, and style-src strict.
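
The per-directive check mentioned in the CSP test fix can be sketched roughly as follows. The helper names and the sample header are assumptions made for illustration; the real test suite may structure this differently.

```python
def parse_csp(header: str) -> dict[str, list[str]]:
    """Split a CSP header into {directive: [source, ...]}."""
    directives: dict[str, list[str]] = {}
    for chunk in header.split(";"):
        tokens = chunk.split()
        if tokens:
            directives[tokens[0].lower()] = tokens[1:]
    return directives


def strict_directives_are_clean(header: str) -> bool:
    """'unsafe-inline' is tolerated only outside the strict directives."""
    csp = parse_csp(header)
    strict = ("default-src", "script-src", "style-src")
    return all("'unsafe-inline'" not in csp.get(name, []) for name in strict)


# Illustrative header, not the exact policy ShoreGuard emits.
header = (
    "default-src 'self'; script-src 'self' 'nonce-abc123'; "
    "style-src 'self'; style-src-attr 'unsafe-inline'"
)
```

A plain substring test on this header would fail because of style-src-attr, while the per-directive check passes it and still rejects 'unsafe-inline' in style-src itself.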

[0.29.0] — 2026-04-11¶

This release closes M1 OpenShell v0.0.26 Alignment, M2 OCSF Observability, M3 L7 Denial Intelligence (in the reduced form documented in S3.1), and M5 Production Readiness. Highlights: OpenShell v0.0.26 stub regeneration with TTY exec and named inference routes, the full gateway settings API, effective-policy and provider-env projection views, a policy-analysis submission endpoint, OCSF parsing plus server-side filters in the sandbox logs viewer, denial context UX on the approvals page, /version and hard-fail production checks, backup/restore scripts, a rollback runbook, and Trivy + Bandit in CI.

Added¶

OpenShell v0.0.26 alignment (M1 / S1.1). Protobuf stubs regenerated against upstream OpenShell v0.0.26 (was v0.0.22). Three stub files actually changed (inference_pb2.py, openshell_pb2.py, openshell_pb2.pyi); the rest compiled byte-identically. This unblocks the two user-visible features below.
TTY exec for interactive commands. POST /api/gateways/{gw}/sandboxes/{name}/exec now accepts a boolean tty field in the request body. When true, the gateway allocates a pseudo-terminal so interactive programs that check isatty() (e.g. python REPL, vim, htop) behave correctly. Defaults to false, so existing callers are unaffected. Requires a gateway running OpenShell v0.0.23 or newer.
Named inference routes on GET /inference. GET /api/gateways/{gw}/inference now accepts an optional ?route_name= query parameter. Empty (the default) returns the cluster's default inference route; passing a name like sandbox-system returns the route that OpenShell v0.0.25+ uses for sandbox system-level model calls. PUT /inference already accepted route_name in the request body; this release closes the GET-side gap.
Gateway Settings API (M1 / S1.2). New admin-gated REST endpoints expose OpenShell's global gateway configuration: GET /api/gateway/{name}/settings, PUT /api/gateway/{name}/settings/{key} (body {"value": …} accepting string, bool, or int), and DELETE /api/gateway/{name}/settings/{key}. OpenShell has no separate UpdateGatewayConfig RPC; updates are sent per-key via the existing UpdateConfig RPC with the global flag set. The new API is value-agnostic — any settings key the gateway recognises (including the new ocsf_logging_enabled toggle) can be read and written without further code changes.
Effective policy view — GET /sandboxes/{name}/policy/effective (M1 / S1.3). Stable contract endpoint for "what the gateway actually enforces", as opposed to "what was last PUT". Presets are merged eagerly into the declared policy today, so the endpoint returns the stored envelope with an added source: "gateway_runtime" marker, giving the UI a stable route even if OpenShell ever separates declared from effective server-side.
Provider env-var projection view — GET /providers/{name}/env (M1 / S1.3). Read-only endpoint that returns the environment variables a provider injects into sandboxes — keys only, values always redacted. Each entry is tagged with source: credential, config, or type_default (from the provider type's cred_key in openshell.yaml). Useful for debugging agent misconfiguration without exposing secrets.
POST /sandboxes/{name}/policy/analysis (M1 / S1.3, closes M1). Pass-through REST endpoint for the OpenShell SubmitPolicyAnalysis RPC. External denial analyzers (LLM-backed or rule-based) can submit observed denial summaries + proposed policy chunks through ShoreGuard's HTTP API; the gateway decides accept/reject per chunk and returns counters plus rejection reasons. Admin-only, rate-limited, audit-logged as sandbox.policy.analyze. This closes M1 OpenShell v0.0.26 Alignment — S1.1, S1.2, and S1.3 are now all merged.
OCSF parsing & rendering in sandbox logs (M2 / S2.1). OpenShell v0.0.26 emits structured security events in an OCSF shorthand format over the existing SandboxLogLine stream (level "OCSF", target "ocsf"). ShoreGuard now parses these lines via a new shoreguard.services.ocsf module and exposes class_prefix, activity, severity, disposition, summary, and both bracket + gRPC structured fields on every log entry that looks like OCSF. The sandbox logs viewer renders class badges, disposition colours (green = ALLOWED, red = DENIED/BLOCKED), dynamic class-prefix chips, and a per-row expand for structured field details. Live websocket stream and the REST /sandboxes/{name}/logs endpoint both include the parsed ocsf dict when present.
OCSF server-side filters on GET /sandboxes/{name}/logs (M2 / S2.2). Four new query parameters — ocsf_only, ocsf_class, ocsf_disposition, ocsf_severity — let advanced consumers pull forensic-sized windows without client-side post-processing. The sandbox logs viewer exposes ocsf_only as a "Server OCSF" toggle next to the existing level filters.
Gateway observability toggle UI (M2 / S2.2). The gateway detail page now includes an "Observability" fieldset with a form-switch bound to the upstream ocsf_logging_enabled gateway setting, wired via the existing PUT /gateway/{name}/settings/{key} endpoint.
Denial context UX on the approvals page (M3 / S3.1, closes M3 reduced). Re-verification of OpenShell v0.0.26/v0.0.27 protos (byte-identical) showed M3 was only "blocked" in the strictest ListDenials sense: three read paths already bring rich denial context into the control plane (GetDraftPolicy, GetDraftHistory, and the OCSF parser from S2.1). This sprint closes the remaining gaps: _chunk_to_dict() now forwards denial_summary_ids; the approvals table gains a "Seen" column formatting first_seen / last_seen / hit_count; the expand row surfaces stage, denial summary IDs as monospace chips, and a "View in logs" button. The logs viewer extracts a best-effort triggering binary and shows a "Find in approvals" button on DENIED/BLOCKED OCSF events that navigates via #binary=X&host=Y hash fragment; the approvals page listens to sg:approvals-update for live-refresh on draft_policy_update events. The history modal gains per-event-type filter chips with count badges, and the sandbox overview Approvals card now shows security_flagged_count and last_analyzed_at_ms. Closes M3 with the documented caveat that the full DenialSummary struct (l7_request_samples, sample_cmdlines, ancestors, binary_sha256) remains push-only in upstream v0.0.27 and stays a future feature request to NVIDIA/OpenShell.
/version endpoint (M5 / S5.1). New unauthenticated endpoint that returns {version, git_sha, build_time} for the running binary, so operators can verify which artifact is serving traffic after a deploy. Build identity is propagated through new SHOREGUARD_GIT_SHA and SHOREGUARD_BUILD_TIME Dockerfile ARGs, release.yml build-args, and shoreguard_info Prometheus labels. A short-SHA image tag (ghcr.io/.../shoreguard:a1b2c3d) is now published alongside semver.
Hard-fail on production-readiness ERRORs (M5 / S5.1). New Settings.enforce_production_safety() runs at startup and refuses to boot when check_production_readiness() reports any ERROR:-severity config issue (weak secret, CORS wildcard + credentials, SQLite in prod, strict CSP disabled, unrestricted self-registration in prod). Set SHOREGUARD_ALLOW_UNSAFE_CONFIG=true to downgrade the error to a CRITICAL log line — documented as an emergency override in reference/configuration.md.
Backup and restore scripts (M5 / S5.1). New scripts/backup.py and scripts/restore.py auto-detect SQLite vs Postgres from the database URL. SQLite uses the built-in online backup API; Postgres shells out to pg_dump --format=custom / pg_restore --clean --if-exists. The Database Migrations guide now recommends these scripts as the primary backup path.
Rollback runbook (M5 / S5.2). New admin/rollback.md consolidates the incident-response flow (symptom detection → image rollback → optional DB rollback or restore → verification → post-mortem) into one page, with links into existing troubleshooting, migration, and deployment docs.
Supply-chain hardening (M5 / S5.3, closes M5). CI gains a new security job that runs Bandit at medium-and-above severity over the shoreguard package (pip-audit already covers dependency CVEs; Bandit adds source-level SAST for Python-specific patterns — eval, shell injection, insecure hashing). The release pipeline runs Trivy against the freshly-built image by digest between build-push and cosign, with ignore-unfixed=true and failure on CRITICAL/HIGH only. Grafana starter dashboard at deploy/grafana/shoreguard.json covers six panels — HTTP request rate by status, p95/p99 latency by path, gateways by status, operations in flight, webhook success rate — with a shoreguard_info build annotation track so deploys show up as vertical lines across every panel.
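
The hard-fail behaviour described for production-readiness ERRORs can be pictured with a small sketch. It assumes issues are plain strings carrying the ERROR: / WARN: prefixes; `refuse_unsafe_boot` and `UnsafeConfigError` are hypothetical names, not the signature of Settings.enforce_production_safety().

```python
import logging

logger = logging.getLogger("shoreguard.sketch.safety")


class UnsafeConfigError(RuntimeError):
    """Hypothetical exception type used only in this sketch."""


def refuse_unsafe_boot(issues: list[str], allow_unsafe: bool) -> None:
    """Raise when any issue has ERROR: severity, unless overridden."""
    errors = [issue for issue in issues if issue.startswith("ERROR:")]
    if not errors:
        return  # WARN: entries never block startup
    if allow_unsafe:
        # Emergency override: log at CRITICAL and keep booting.
        logger.critical("starting with unsafe config: %s", "; ".join(errors))
        return
    raise UnsafeConfigError("; ".join(errors))


issues = ["WARN: text log format in prod", "ERROR: SQLite in prod"]
```

Here `allow_unsafe` stands in for SHOREGUARD_ALLOW_UNSAFE_CONFIG=true; how the real setting is read is not shown.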

Changed¶

Error responses now follow RFC 9457 Problem Details. Error bodies are served with Content-Type: application/problem+json and carry the standard type, title, status, and detail fields alongside the existing ShoreGuard code (and any extension members such as request_id, errors, feature, or upgrade_required). The detail field is unchanged, so existing clients that read only body.detail (including the ShoreGuard web UI) keep working without modification.
PolicyManager.watch() stream flattener forwards target and fields. The live WatchSandbox consumer was dropping both on the live pathway, even though get_logs() surfaced them correctly via the unary RPC. Additive change — no existing consumer breaks.
Sync OperationService removed; tests run against AsyncOperationService directly. Production has used AsyncOperationService exclusively since v0.27; the sync class remained only because tests reached it through an _AsyncOperationAdapter shim in conftest.py. This release deletes the sync class (~480 LOC), the adapter, and the sync-class test file entirely. The new harness runs AsyncOperationService on an in-memory aiosqlite engine with in-flight LRO task drainage on teardown to avoid closed-DB races. Net: -2077 LOC, full suite 2477 passed, 35 skipped.
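
As an illustration of the RFC 9457 error shape, the sketch below builds a Problem Details body and shows that a client reading only `detail` is unaffected. `make_problem` is a hypothetical helper, and using "about:blank" for `type` is the RFC's default rather than something this changelog confirms for ShoreGuard.

```python
import json

PROBLEM_JSON = "application/problem+json"  # Content-Type for these bodies


def make_problem(status: int, title: str, detail: str, code: str, **extensions: object) -> dict:
    """Build a Problem Details body; 'code' and extras are extension members."""
    body: dict = {
        "type": "about:blank",  # RFC 9457 default when no specific type URI applies
        "title": title,
        "status": status,
        "detail": detail,
        "code": code,
    }
    body.update(extensions)
    return body


body = make_problem(
    404,
    "Not Found",
    "Gateway 'edge-1' not found",
    code="GATEWAY_NOT_FOUND",
    request_id="req-123",
)
wire = json.dumps(body)

# A legacy client that only reads body.detail keeps working unchanged.
legacy_detail = json.loads(wire)["detail"]
```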

Fixed¶

verify_password no longer raises on corrupt hashes. A malformed or truncated password hash row in the database used to surface as an unhandled PwdlibError; it now returns False so the login attempt fails cleanly and the account lockout counter advances as intended.
min_level parameter on GET /sandboxes/{name}/logs now preserves OCSF events. OpenShell's level_matches() helper assigns unknown levels (including "OCSF") numeric rank 5, which any non-empty min_level silently dropped. ShoreGuard now always fetches upstream with min_level="" and applies the level filter locally, letting OCSF entries through unconditionally.
check_production_readiness() now actually returns the warnings list that its type signature promised — the method previously collected the list and fell through without a return statement, so callers always got None.
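
The local level filter from the min_level fix might look like the sketch below. The rank table is an assumption (only the "unknown levels get rank 5" detail comes from the entry above), `filter_logs` is a hypothetical name, and the log lines are placeholders.

```python
# Assumed ranks: lower is more severe; anything unknown gets rank 5.
LEVEL_RANK = {"ERROR": 0, "WARN": 1, "INFO": 2, "DEBUG": 3, "TRACE": 4}
UNKNOWN_RANK = 5


def filter_logs(entries: list[dict], min_level: str) -> list[dict]:
    """Apply min_level locally while always keeping OCSF entries."""
    if not min_level:
        return list(entries)
    threshold = LEVEL_RANK.get(min_level.upper(), UNKNOWN_RANK)
    kept = []
    for entry in entries:
        level = entry["level"].upper()
        # OCSF lines skip the rank comparison entirely.
        if level == "OCSF" or LEVEL_RANK.get(level, UNKNOWN_RANK) <= threshold:
            kept.append(entry)
    return kept


entries = [
    {"level": "DEBUG", "message": "placeholder debug line"},
    {"level": "WARN", "message": "placeholder warning line"},
    {"level": "OCSF", "message": "placeholder OCSF line"},
]
kept_levels = [e["level"] for e in filter_logs(entries, "WARN")]
```

Without the explicit OCSF branch, the OCSF line would compare as rank 5 against a threshold of 1 and be dropped, which is the behaviour the fix removes.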

Tests¶

Auth edge-case coverage raised from 75% to 96% — targeted tests for token expiry, account lockout transitions, and OIDC error paths.
WebSocket auth-error coverage raised from 67% to 94% — coverage for authentication failure branches in the sandbox log stream endpoint.
shoreguard/services/operations.py coverage raised from 61% to 100%. The previous baseline reflected an inverted gap: the test suite exercised the sync OperationService via an adapter in conftest.py while the prod AsyncOperationService (used by the API routes) was untested. The follow-up refactor in this release consolidates the two classes into one, so this coverage win is now permanent.

[0.27.0] — 2026-04-10¶

Security¶

Strict CSP is now the default — SHOREGUARD_CSP_STRICT defaults to true, closing the loop on the M1–M4 hardening work that shipped in v0.26.0 plus the M2.1 inline-event-handler extraction completed in this release. Fresh installs now receive a Content-Security-Policy with a per-request cryptographic nonce on every <script> tag, no 'unsafe-inline', frame-ancestors 'none' (clickjacking protection), base-uri 'self' (base-tag injection protection), and form-action 'self' (form hijacking protection). 'unsafe-eval' is retained in script-src because Alpine.js uses the Function() constructor internally — the @alpinejs/csp build was evaluated during M2.1 but its expression parser is limited to plain property chains (no operators, no literals, no method-call arguments), which proved too restrictive for this UI. Unlike 'unsafe-inline', 'unsafe-eval' does not permit DOM-injected script execution, so the XSS surface remains dramatically smaller than the legacy policy.
CSP hardening M2.1 — inline event handler extraction. M2 in v0.26.0 extracted inline <script> blocks but missed 28 inline event handler attributes (onclick="", onkeydown="") which script-src-attr blocks regardless of nonce. All 28 are now converted: static template handlers become Alpine @click/@keydown directives on registered Alpine.data() components, and dynamically-rendered innerHTML handlers use data-action/data-arg markers dispatched by a single delegated click listener per component. Inline style="" attributes produced by JS renderers (policy editor, wizard) were also replaced with Bootstrap utility classes so style-src-attr stays clean.

For operators running stock ShoreGuard: no action needed. The pages you already use (dashboard, sandboxes, wizard, policy editor, audit log, approvals, gateways, providers, users, groups, settings, invite flow) have all been refactored to work under strict CSP.

For operators with custom templates, inline scripts, or third-party embeds that cannot yet be nonce-gated: set SHOREGUARD_CSP_STRICT=false to fall back to the legacy 'unsafe-inline' policy. The legacy field SHOREGUARD_CSP_POLICY continues to work as an escape hatch when strict mode is off.
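
To make the strict policy concrete, here is a minimal sketch of building a nonce-based header with the directives listed above. `new_nonce` and `build_strict_csp` are hypothetical names, and `default-src 'self'` is an assumption; the exact policy ShoreGuard emits may contain more directives.

```python
import secrets


def new_nonce() -> str:
    """Fresh, unpredictable value generated once per request."""
    return secrets.token_urlsafe(16)


def build_strict_csp(nonce: str) -> str:
    """Assemble a header with a script nonce and no 'unsafe-inline'."""
    directives = {
        "default-src": ["'self'"],  # assumed baseline
        "script-src": ["'self'", f"'nonce-{nonce}'", "'unsafe-eval'"],
        "frame-ancestors": ["'none'"],
        "base-uri": ["'self'"],
        "form-action": ["'self'"],
    }
    return "; ".join(f"{name} {' '.join(sources)}" for name, sources in directives.items())


nonce = new_nonce()
header = build_strict_csp(nonce)
script_tag = f'<script nonce="{nonce}" src="/static/app.js"></script>'
```

The same nonce has to appear in both the header and every script tag of that response, which is why it is generated per request rather than once at startup.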

Changed¶

Production-readiness check is now strict-mode aware. The warning about 'unsafe-*' directives in auth.csp_policy is now gated on csp_strict=False — when strict mode is enabled (default), that field is unused and no warning fires.

[0.26.1] — 2026-04-10¶

Changed¶

Docstring coverage — pydoclint clean across shoreguard/ — Every public function, method, and Pydantic model in shoreguard/ now has a Google-style docstring with Args/Returns/Raises/Attributes sections as appropriate. 410 pre-existing pydoclint violations across 21 files were fixed (96 in api/schemas.py, 64 in services/operations.py, 51 in api/pages.py, …). Zero runtime behaviour changes; this unblocks pydoclint in CI so future docstring drift gets caught at review time.
Removed stale linter suppressions — Systematic audit of all # noqa / # type: ignore / # pyright: ignore comments. ~150 justified suppressions kept (stdlib API signatures, SQLAlchemy column semantics, protobuf stub typing, fake gRPC test doubles, singleton PLW0603, __init__ D107, …) — each now carries a comment explaining why. 12 non-justified suppressions removed by adding proper types or narrowing: SQLAlchemy event-handler params in db.py, operation_service / gateway_service narrowing in api/main.py + api/metrics.py, _get_auth_settings / _webhook_settings / _cli_init_db return types, _UNSET sentinel cast() in services/registry.py + sandbox_meta.py. The cleanup surfaced two real type bugs: _cli_init_db was annotated -> None despite callers invoking .dispose() on the returned Engine, and the module-level operation_service carried a stale AsyncOperationService | OperationService | None union that never matched runtime reality. Both fixed; no runtime behaviour change.

[0.26.0] — 2026-04-09¶

Added¶

CSP strict-mode foundation — SHOREGUARD_CSP_STRICT=true opt-in enables a per-request nonce on request.state.csp_nonce and an unsafe-*-free Content-Security-Policy built from auth.csp_policy_strict (default remains off until the frontend refactor lands). Templates can reference {{ csp_nonce(request) }} on inline <script> tags and switch between the standard and CSP-safe Alpine.js builds via {% if csp_strict_enabled() %}. This is Milestone 1 of the multi-session CSP hardening plan — see csp-hardening-followup.md for the full roadmap.

Changed¶

CSP hardening M2 — All inline <script> blocks extracted from Jinja templates into frontend/js/ (theme-init.js, dashboard.js, audit.js; providers.js and wizard.js bind their own DOMContentLoaded handlers). GW is now read from document.documentElement.dataset.gateway in constants.js, eliminating the last Jinja-templated inline script. With SHOREGUARD_CSP_STRICT=true, strict CSP no longer reports inline-script violations — only inline-style (M3) and Alpine x-data (M4) violations remain.
CSP hardening M3 — All inline style="..." attributes and <style> blocks removed from Jinja templates. Shared patterns moved to the new frontend/css/utilities.css (sg-prefixed width/max-width/font-size/cursor utilities) and auth pages share frontend/css/auth.css. Wizard step toggling now uses classList.toggle('d-none', ...) instead of element.style.display. With SHOREGUARD_CSP_STRICT=true, strict CSP no longer reports style-src violations — only Alpine x-data (M4) remains.
CSP hardening M4 — Every Alpine.js component is now registered via Alpine.data(name, factory) (per-file inside each frontend/js/*.js factory file, plus a new frontend/js/auth.js for the login/register/setup/invite forms). Templates reference them by name (x-data="loginForm") instead of inline object or spread-merge literals — the four { ...pageFn(), ...sortableTable(...) } patterns on the gateways/policies/users/groups pages are now gatewaysList, presetsListPage, usersListPage, groupsListPage. Directive expressions containing arrow functions, if statements, or multi-statement sequences (logout click, toast auto-remove, ws-state listener, clipboard-copy buttons, inference-config x-effect, filesystem-policy add-form focus) were extracted to store/component methods ($store.auth.logout, $store.toasts.scheduleRemove, onWsState, copyInvite, copyKey, maybeLoad, openAddForm). Auth pages now share the new components/alpine_loader.html partial with the main base template so the CSP build is loaded consistently. With SHOREGUARD_CSP_STRICT=true, the application loads with zero CSP-related Alpine violations — clearing the last blocker to making strict CSP the default in a future minor bump.
Pyright on tests/ + parallel test execution — Pyright's include list now covers tests/ alongside shoreguard/, and pytest-xdist is a dev dependency so the suite runs with pytest -n auto. Enabling pyright on tests surfaced 303 pre-existing errors across 19 files (Optional narrowing, fake gRPC stub assignments typed as OpenShellStub, protobuf enum kwargs passed as raw ints, and a handful of test-setup bugs such as _FakeRpcError missing cancel()). All fixed test-side — zero changes to shoreguard/ — via assert x is not None narrowing and narrow # type: ignore[assignment|arg-type|override] comments where the fake object pattern made narrowing impossible. On a 16-core box the suite now runs in ~43s parallel instead of ~4:46 serial (6.6× speedup).

[0.25.0] — 2026-04-09¶

Added¶

shoreguard config show [section] — dump the effective configuration as a table, JSON, or .env-style output. Secret values (secret_key, admin_password, client_secret, password) are redacted by default; --show-sensitive reveals them.
shoreguard config schema [section] — dump pristine defaults plus descriptions in table/json/env/markdown format. Used to regenerate docs/reference/settings.md.
Self-documenting settings — every Settings field now carries Field(default=..., description=...). All ~100 environment variables have a one-line description surfaced via config show.
shoreguard audit export — offline audit log export (JSON or CSV) with a sha256sum-compatible digest file and a manifest.json carrying entry count, filters, timestamp, and tool version. All three files are written with 0600 permissions.
Structured logging improvements — text mode now renders [request_id] via the RequestIdFilter (was silently dropped); JSONFormatter adds module/func/line, merges caller extras, and emits stack_info. uvicorn access logs carry the same request-id as application logs in both modes.
Global per-IP rate limiter (SHOREGUARD_GLOBAL_RATE_LIMIT_*) as a coarse DDoS guardrail applied by global_rate_limit_middleware to every HTTP request except health/metrics endpoints.
Request body size limit middleware (SHOREGUARD_LIMIT_MAX_REQUEST_BODY_BYTES, default 10 MiB) returning HTTP 413 before Starlette reads the body.
DB migration retry loop on startup with exponential backoff against OperationalError (SHOREGUARD_DB_STARTUP_RETRY_*). Compose-friendly.
Background task supervision surfaced in /readyz with asyncio.wait_for on dependency probes (SHOREGUARD_READYZ_TIMEOUT).
Production-readiness check expansion — six new warnings: HSTS off, CSP contains unsafe-*, allow_registration in prod, multi-replica with in-process rate limiter, SQLite in prod, text log format in prod. Warnings now carry ERROR: / WARN: severity prefixes.
docs/reference/settings.md — auto-generated reference of every SHOREGUARD_* environment variable grouped by sub-model.
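
The sha256sum-compatible digest and 0600 permissions from the audit export entry can be sketched as follows. File names and the `.sha256` suffix are assumptions, the helpers are hypothetical, and the manifest.json step is left out.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def write_private(path: Path, data: bytes) -> None:
    """Create the file with 0600 requested at creation time."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as fh:
        fh.write(data)


def write_export_with_digest(out_dir: Path, name: str, data: bytes) -> Path:
    """Write the export and a digest file that `sha256sum -c` can read."""
    write_private(out_dir / name, data)
    digest = hashlib.sha256(data).hexdigest()
    digest_path = out_dir / f"{name}.sha256"
    # sha256sum line format: "<hex digest>  <filename>" (two spaces).
    write_private(digest_path, f"{digest}  {name}\n".encode())
    return digest_path


with tempfile.TemporaryDirectory() as tmp:
    digest_path = write_export_with_digest(Path(tmp), "audit.json", b"[]")
    digest_line = digest_path.read_text()
```

Passing the mode to `os.open` avoids a window in which the file exists with broader permissions, which a write followed by `chmod` would leave open.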

Changed¶

Audit log is now ORM-level append-only. AuditEntry rows cannot be updated via the ORM, and deletion is only permitted from AuditService.cleanup() via a ContextVar-gated bypass. Enforcement raises AuditIntegrityError on commit. cleanup() switched to row-by-row deletion so the before_delete listener fires. Direct SQL still bypasses enforcement — DB-level triggers are a post-v1.0 item.
CLI callback respects ctx.invoked_subcommand — the main Typer callback no longer tries to bind a socket when shoreguard config ... or shoreguard audit ... subcommands are invoked.
Graceful shutdown timeout honoured by uvicorn startup path.
CORS settings tightened and exposed via SHOREGUARD_CORS_*.

Security¶

OIDC SSRF protection — discover(), get_jwks(), and exchange_code() run all URLs (including those returned by a provider's discovery document) through the existing private-IP check. A compromised identity provider can no longer pivot requests to internal services like cloud metadata endpoints.
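
A private-IP check of the kind referenced above could look roughly like this. It is a simplified sketch: it only inspects literal IP hosts, whereas a production check would also resolve hostnames before connecting. `is_internal_address` and `check_url` are hypothetical names.

```python
import ipaddress
from urllib.parse import urlsplit


def is_internal_address(host: str) -> bool:
    """True for literal IPs that are private, loopback, or link-local."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # not a literal IP; DNS resolution is out of scope here
    return ip.is_private or ip.is_loopback or ip.is_link_local


def check_url(url: str) -> None:
    """Raise ValueError for URLs that point at internal addresses."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"unsupported URL: {url!r}")
    if is_internal_address(parts.hostname):
        raise ValueError(f"refusing internal address: {parts.hostname}")


# A public identity-provider URL (example.com is a placeholder) passes.
check_url("https://idp.example.com/.well-known/openid-configuration")
```

Running every URL from the discovery document through the same check matters because endpoints such as the JWKS URI are chosen by the provider, not by the operator.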

Fixed¶

Version drift — pyproject.toml was still reporting 0.23.0 after the v0.24.0 tag was cut. This release bumps directly to 0.25.0 to resync the package metadata with the release stream.

[0.24.0] — 2026-04-08¶

Added¶

1,193 mutation-killing tests — targeted tests designed to eliminate survived mutants identified by mutmut v3.5. Test count: 1,175 → 2,368.
New test_openshell_meta.py — first-ever coverage for OpenShell metadata loader (27 mutants, previously 100% survival).
New test_auth_mutations.py (194 tests) — exhaustive auth CRUD, RBAC role resolution, service principal lifecycle, group management, session tokens, gateway-scoped roles.
Extended 20 existing test files across all major modules: formatters, sandbox templates, routes, OIDC, local gateway, webhooks, gateway service, operations, registry, policy, all client modules, DB, presets, CLI import, and audit service.

Fixed¶

Pyright strict mode — resolved all 30 type-check errors (0 remaining):
operation_service union type corrected for async/sync variants.
_get_svc() return type narrowed to AsyncOperationService in route handlers (routes/operations.py, lro.py).
db_cfg possibly-unbound variable in db.py PostgreSQL branch.
discover() return type in api/oidc.py.
update_group sentinel parameter type in api/auth.py.
Async/sync union narrowing in main.py, metrics.py, routes/gateway.py, routes/sandboxes.py.

[0.23.0] — 2026-04-08¶

Added¶

OIDC/SSO authentication — multi-provider support with callback flow, role mapping, and state validation (api/oidc.py, alembic/versions/012_oidc_fields.py).
SSRF validation — URL allowlist/blocklist for webhook targets prevents server-side request forgery via internal addresses.
Input sanitization — centralized validators for names, URLs, certs, env vars, and command strings with configurable limits via SHOREGUARD_LIMIT_* env vars.
pip-audit in CI — automated dependency vulnerability scanning in the GitHub Actions workflow.
Deep health checks — /readyz now measures DB latency, reports gateway health summary (total/connected/degraded), supports ?verbose=true for per-gateway details.
PostgreSQL connection pooling — DatabaseSettings with pool_size, max_overflow, pool_recycle, statement_timeout_ms via SHOREGUARD_DB_* env vars.
Graceful shutdown — LRO task cancellation (shutdown_lros()), webhook delivery task tracking with shutdown(), ordered resource disposal.
Async engine disposal — dispose_async_engine() for clean DB shutdown.
Docs — OIDC guide, security concepts, troubleshooting, audit guide, webhooks guide, Prometheus integration, gateway roles admin.
108+ new tests — OIDC, input validation, SSRF, webhook secret leak. Total: ~1194.

Changed¶

Typed API response models — extra="forbid" on Category-A models prevents uncontrolled field leakage through extra="allow".
Webhook HMAC secret no longer exposed on GET/LIST endpoints — only returned on create (WebhookCreateResponse).
Docs restructured — guide/ → guides/, new concepts/ and integrations/ directories.
graceful_shutdown_timeout default raised from 5 → 15 seconds.

Security¶

Fixed webhook HMAC signing secret leak on all GET/PUT responses.
SSRF protection for webhook target URLs.
Input length/format validation on all mutation endpoints.

[0.22.0] — 2026-04-08¶

Added¶

User groups / teams — named collections of users for group-based RBAC. Groups have a global role and optional per-gateway role overrides, mirroring the existing individual user role system.
Group membership management — add/remove users to groups via API and frontend UI (/groups page with member modal).
Group gateway-scoped roles — per-gateway role overrides for groups, reusing the gateway roles modal from user/SP management.
4-tier role resolution — individual gateway > group gateway > individual global > group global. When a user belongs to multiple groups the highest rank wins.
Group audit trail — group.create, group.update, group.delete, group.member.add, group.member.remove, group.gateway_role.set, group.gateway_role.remove actions logged.
65 new tests — CRUD, membership, cascade deletes, role resolution priority chain, and HTTP-level endpoint tests (test_group_rbac.py). Total: 1086.
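
The 4-tier resolution order can be expressed as a short sketch. The role names and their ranking are assumptions made for illustration, and `resolve_role` is a hypothetical function rather than the project's actual resolver.

```python
from typing import Iterable, Optional

# Assumed role names and ordering; higher number means more privilege.
ROLE_RANK = {"viewer": 0, "operator": 1, "admin": 2}


def highest(roles: Iterable[Optional[str]]) -> Optional[str]:
    """Pick the highest-ranked role, ignoring missing entries."""
    present = [role for role in roles if role is not None]
    return max(present, key=ROLE_RANK.__getitem__) if present else None


def resolve_role(
    user_gateway: Optional[str],
    group_gateway: Iterable[Optional[str]],
    user_global: Optional[str],
    group_global: Iterable[Optional[str]],
) -> Optional[str]:
    """individual gateway > group gateway > individual global > group global."""
    tiers = (user_gateway, highest(group_gateway), user_global, highest(group_global))
    for role in tiers:
        if role is not None:
            return role  # the first tier with a value wins, even if lower-ranked
    return None


# A group's per-gateway override beats the user's own global role.
effective = resolve_role(None, ["viewer", "operator"], "admin", ["admin"])
```

Note that tiers are compared by position, not by rank: a "viewer" override on a specific gateway takes precedence over an "admin" global role, and rank only breaks ties between multiple groups within the same tier.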

Changed¶

Gateway roles modal — now supports user, sp, and group entity types.

[0.21.0] — 2026-04-07¶

Added¶

Rate limiting — per-IP sliding-window rate limiter (api/ratelimit.py) with configurable limits via SHOREGUARD_RATELIMIT_* env vars.
Account lockout — progressive lockout after failed login attempts (api/auth.py) with configurable thresholds.
Security headers — X-Content-Type-Options, X-Frame-Options, Strict-Transport-Security, etc. via middleware (api/security_headers.py).
Password strength validation — api/password.py with length, complexity, and common-password checks.
Structured error codes — machine-readable code field (e.g. GATEWAY_NOT_FOUND, RATE_LIMITED) in all error responses (api/error_codes.py, api/errors.py).
WebSocket server heartbeat — periodic {"type": "heartbeat"} messages during idle with dropped_events counter for backpressure visibility.
WebSocket backpressure disconnect — slow consumers disconnected after configurable consecutive drop limit (SHOREGUARD_WS_BACKPRESSURE_DROP_LIMIT).
WebSocket client reconnect hardening — heartbeat watchdog (45 s timeout), max retry limit (20), exponential backoff, and sg:ws-state events for connection state UI indicator.
Prometheus metrics — /metrics endpoint with login and rate-limit counters.
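
A per-IP sliding-window limiter of the kind listed above can be sketched in a few lines. This is a generic illustration, not the code in api/ratelimit.py; the class name, parameters, and injected clock are assumptions.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `limit` requests per IP within `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._hits: dict[str, deque] = defaultdict(deque)

    def allow(self, ip: str) -> bool:
        now = self.clock()
        hits = self._hits[ip]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(limit=2, window_seconds=10.0)
first, second, third = (limiter.allow("203.0.113.7") for _ in range(3))
```

State held in process memory like this is not shared between replicas, which is the concern behind the later "multi-replica with in-process rate limiter" readiness warning.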

Changed¶

Dynamic __version__ — shoreguard/__init__.py now reads version from package metadata (importlib.metadata) instead of hardcoded string; single source of truth in pyproject.toml.
Deploy configs — consolidated Caddyfile and standalone compose into deploy/ directory.
.gitignore — trimmed from ~200 to ~30 lines, removed stale entries.

[0.20.0] — 2026-04-07¶

Added¶

Pydantic Settings — centralized shoreguard/settings.py with 11 nested sub-models replacing 11 os.environ.get() reads and 60+ hardcoded constants. All tuneable via SHOREGUARD_* env vars (e.g. SHOREGUARD_GATEWAY_BACKOFF_MIN, SHOREGUARD_OPS_RUNNING_TTL).
Pydantic response models — typed response schemas (schemas.py) on all API endpoints with OpenAPI tag metadata.
Request-ID tracking — X-Request-ID header propagated through middleware, available in all log records via %(request_id)s.
Prometheus latency metrics — shoreguard_request_duration_seconds histogram with method/path/status labels, plus /metrics endpoint.
Structured JSON logging — SHOREGUARD_LOG_FORMAT=json for machine-readable log output.
GZip compression — responses ≥ 1 KB automatically compressed via Starlette GZip middleware.
Audit pagination — GET /api/audit supports offset/limit with items/total response format.
Input validation module — api/validation.py with reusable description, label, and gateway-name validators.
DB-backed operations — AsyncOperationService with SQLAlchemy async, orphan recovery, and configurable retention.
SSE streaming for LROs — GET /api/operations/{id}/stream streams real-time status/progress updates via Server-Sent Events.
run_lro helper — api/lro.py with idempotency-key support, automatic 202 response, and background task lifecycle.
Async DB layer — init_async_db() / get_async_session_factory() in db.py for aiosqlite-backed async sessions.
Performance indexes — migrations 008–010 adding indexes on audit timestamp, webhook delivery, and operation status.
Gateway register page — /gateways/new with breadcrumb navigation, description and labels fields (replaces modal).
Provider create/edit pages — /gateways/{gw}/providers/new and .../providers/{name}/edit with Alpine.js providerForm() component (replaces modal).
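
Making a request ID available as %(request_id)s in every log record is commonly done with a context variable plus a logging filter. The following is a sketch of that general technique using only the standard library; `request_id_var`, `RequestIdFilter`, and `build_handler` are hypothetical names, and the actual middleware may work differently.

```python
import contextvars
import logging

# Hypothetical context variable; a middleware would set it from X-Request-ID.
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Attach the current request ID to every record as `request_id`."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True  # never suppress the record


def build_handler(stream) -> logging.Handler:
    """Create a stream handler whose format string can use %(request_id)s."""
    handler = logging.StreamHandler(stream)
    handler.addFilter(RequestIdFilter())
    handler.setFormatter(
        logging.Formatter("%(levelname)s [%(request_id)s] %(message)s")
    )
    return handler
```

Attaching the filter to the handler rather than to individual loggers means records from any logger that reaches the handler carry the field.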

Changed¶

Consistent pagination — all list endpoints return {"items": [...], "total": N} format.
CLI env-var hack removed — cli.py no longer writes os.environ["SHOREGUARD_*"]; uses override_settings() instead.
Frontend modals→pages — gateway registration and provider create/edit modals replaced with dedicated page routes and breadcrumb navigation.
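
The {"items": [...], "total": N} envelope with offset/limit can be illustrated with a small helper. This is a sketch over an in-memory sequence; the `paginate` function and its default limit of 50 are assumptions, and the real endpoints presumably apply offset and limit in the database query.

```python
from typing import Any, Sequence


def paginate(rows: Sequence[Any], offset: int = 0, limit: int = 50) -> dict[str, Any]:
    """Return one page in the {"items": [...], "total": N} envelope."""
    if offset < 0 or limit < 1:
        raise ValueError("offset must be >= 0 and limit must be >= 1")
    # "total" counts all rows, not just the ones on this page.
    return {"items": list(rows[offset : offset + limit]), "total": len(rows)}
```

Because total is independent of the page, a client can compute the number of remaining pages without a second request.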

Removed¶

In-memory LRO store — replaced by DB-backed AsyncOperationService.
Hardcoded constants — _BACKOFF_MIN, _MAX_RESULT_BYTES, DELIVERY_TIMEOUT, MAX_DESCRIPTION_LEN, etc. now read from Settings.
Gateway/provider modals — #registerGatewayModal and #createProviderModal removed from frontend templates.

Dependencies¶

Added pydantic-settings>=2.0.

[0.19.0] — 2026-04-07¶

Added¶

Async sandbox exec — POST /sandboxes/{name}/exec now returns a long-running operation (LRO) with polling pattern instead of blocking.
Exec audit fields — command, exit_code, and status added to sandbox.exec audit detail for full traceability.
mTLS auto-generation — openshell-client-tls secret with CA cert is automatically created for OpenShell gateway connections.
Docker Compose profiles — optional paperclip profile for Paperclip integration alongside ShoreGuard.
Caddy reverse proxy — new Caddy service and OpenClaw profile in the deploy stack for production-ready TLS termination.
Hardened OpenClaw sandbox — dedicated sandbox image with security documentation and deployment via generic ShoreGuard APIs.
Deploy stack README — ecosystem section and deploy stack overview added to the project README.

Fixed¶

gRPC exec timeout — default timeout raised to 600 s for long-running agent sessions.
SetClusterInference — no_verify flag now correctly set in the gRPC request.
LOCAL_MODE endpoints — private IP addresses are now accepted when registering gateways in local mode.
Gateway context — switched from ContextVar to request.state to avoid cross-request leaks.
openshell-client-tls — secret now includes the CA certificate for proper chain verification.
sandbox_meta_store import — resolved binding issue that caused startup failures.
Exec tests — aligned with async LRO pattern and added shlex validation before returning 202.
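
Validating a command with shlex before accepting it lets malformed input be rejected synchronously instead of failing later inside the long-running operation. A minimal sketch of that check, assuming a hypothetical `validate_command` helper (how the API maps the error to an HTTP status is not shown):

```python
import shlex


def validate_command(command: str) -> list[str]:
    """Split a command line, rejecting input that shlex cannot parse."""
    try:
        argv = shlex.split(command)
    except ValueError as exc:  # e.g. unbalanced quotes
        raise ValueError(f"invalid command: {exc}") from exc
    if not argv:
        raise ValueError("invalid command: empty")
    return argv
```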

Changed¶

README — redesigned with updated architecture diagram and sandbox vision narrative.
Architecture diagram — added multi-gateway topology, observability components, unified operators, agent platform UIs, and plugins.
Mermaid diagrams — improved contrast for dark-mode rendering.

Docs¶

Deploy guide expanded with profiles and Paperclip integration steps.
Plugin install command updated to @shoreguard/paperclip-plugin from npm.
Discord reference removed from OpenClaw README.

[0.18.1] — 2026-04-06¶

Added¶

Sandbox metadata UI — labels and description are now visible and editable across the entire frontend:
Detail page: Metadata fieldset with description input, label badges (add/remove), and Save button (PATCH, operator role).
Wizard: Description and labels fields in Step 2 (Configuration), shown in Step 4 summary, included in create payload.
List page: Description column (truncated) and label badges inline under sandbox name.

[0.18.0] — 2026-04-05¶

Added¶

Sandbox labels & description — sandboxes now support labels (key-value pairs) and description metadata, stored in ShoreGuard's DB (OpenShell is unaware). New sandbox_meta table with per-gateway scoping.
PATCH /sandboxes/{name} — update labels and/or description on existing sandboxes (requires operator role).
Label filtering — GET /sandboxes?label=key:value filters sandboxes by labels (AND-combined, same semantics as gateway list).
Alembic migration 007 — creates sandbox_meta table with (gateway_name, sandbox_name) unique constraint.

[0.17.0] — 2026-04-05¶

Fixed¶

Exception handling — narrowed overly broad except Exception blocks in health check logging, webhook delivery, reconnection loop, and operation lifecycle. All handlers now log with full traceback and re-raise or return safe error responses.
SP expiry timezone — expires_at comparison in _lookup_sp_identity now correctly handles naive datetimes by normalising to UTC before comparison.
Bootstrap admin — bootstrap_admin_user() no longer raises on duplicate email when called during startup with an existing database.

Changed¶

Logging consistency — webhook delivery success/failure, gateway reconnection attempts, and operation lifecycle transitions now log at appropriate levels (INFO for business events, WARNING for recoverable errors, DEBUG for technical details).
Docstrings — all public functions and classes pass pydoclint with strict Google-style checking (raises, return types, class attributes).
Type hints — require_role return type corrected. Zero pyright errors on standard mode.
CI — Python 3.14 target for CI matrix, ruff, and pyright. Bumped docker/setup-buildx-action to v4, docker/build-push-action to v7, astral-sh/setup-uv to v7.

Added (tests only)¶

Webhook route tests — 24 integration tests covering CRUD, validation, role enforcement (admin/viewer/unauthenticated), and service-not-initialised.
Error-case tests — 13 tests across approvals (4), policies (3), providers (4), and sandboxes (2) for 404/409 error paths.
Template tests — 9 tests for sandbox_templates.py (list, get, path traversal protection) and template route handlers.
Webhook delivery tests — 13 tests for delivery records, cleanup, email channel dispatch, and the fire_webhook convenience function.
Auth endpoint tests — 31 tests for pages.py covering setup wizard, login validation, user CRUD, gateway role management, self-registration, and service principal management error paths.
Total: 915 tests (+86 from 0.16.2), coverage 82% → 84%.

[0.16.0] — 2026-04-04¶

Added¶

Webhook delivery log — new webhook_deliveries table tracks every delivery attempt with status, response code, error message, and timestamps. Query via GET /api/webhooks/{id}/deliveries.
Webhook retry with exponential backoff — HTTP 5xx and network errors trigger up to 3 retries (5s → 30s → 120s). Client 4xx errors fail immediately.
New webhook events — gateway.registered, gateway.unregistered, inference.updated, policy.updated fire automatically after the corresponding API actions.
Enriched sandbox.created payload — now includes image, gpu, and providers fields from the creation request.
API-key rotation — POST /api/auth/service-principals/{id}/rotate generates a new key and immediately invalidates the old one (admin only).
API-key expiry — optional expires_at timestamp on service principals. Expired keys are rejected at auth time.
API-key prefix — new keys are prefixed with sg_ and the first 12 characters are stored as key_prefix for identification without exposing the full key. Legacy keys remain functional.
Sandbox templates — YAML-based full-stack templates (data-science, web-dev, secure-coding) that pre-configure image, GPU, providers, environment variables, and policy presets. Available via GET /api/sandbox-templates and integrated into the wizard.
Alembic migration 005 — adds webhook_deliveries table.
Alembic migration 006 — adds key_prefix and expires_at columns to service_principals table.
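
The webhook retry rule in this release (retry on 5xx and network errors with delays of 5, 30, and 120 seconds; fail immediately on 4xx) can be sketched as follows. This is not the code of _deliver_http_with_retry: `deliver_with_retry`, the `send` callable, and the use of OSError for network failures are assumptions made for illustration.

```python
import time
from typing import Callable

RETRY_DELAYS = (5, 30, 120)  # seconds, taken from the changelog entry


def deliver_with_retry(
    send: Callable[[], int],
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[bool, int]:
    """Call send() until success, a 4xx response, or retries are exhausted.

    send() returns an HTTP status code or raises OSError on network failure
    (an assumed convention). Returns (delivered, attempts).
    """
    attempts = 0
    for delay in (0, *RETRY_DELAYS):  # first attempt runs without waiting
        if delay:
            sleep(delay)
        attempts += 1
        try:
            status = send()
        except OSError:
            continue  # network error: retry
        if status < 400:
            return True, attempts
        if status < 500:
            return False, attempts  # client error: fail immediately
        # 5xx: fall through to the next delay
    return False, attempts
```

Passing `sleep` as a parameter keeps the schedule testable without real waiting.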

Changed¶

Webhook service — fire() now creates delivery records per target before dispatching. _deliver_http replaced by _deliver_http_with_retry with retry logic.
Service principal creation — keys now use sg_ prefix format. list_service_principals() returns key_prefix and expires_at fields.
Users UI — SP table shows key prefix, expiry badge (green/yellow/red), and rotate button. SP creation form includes optional expiry date.
Wizard UI — step 1 shows sandbox template cards above community sandboxes. Selecting a template pre-fills all fields and jumps to summary. "Customize" button navigates back to configuration step.
Formatters — _EVENT_LABELS, _SLACK_COLORS, _DISCORD_COLORS extended for 4 new events. _payload_fields() extracts provider, model, image, and endpoint fields.
Cleanup loop — webhook delivery records older than 7 days are purged alongside operations and audit entries.
Documentation — API reference updated with sandbox templates, delivery log, rotate endpoint, and new event types. Service principals guide expanded with key rotation, expiry, and prefix sections. Sandbox guide includes templates section with wizard integration.
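
The sg_ key format and the stored key_prefix described for service principals in this release can be sketched with the standard library. Several details here are assumptions: the token length, whether the 12-character prefix includes the sg_ marker, and the function name `generate_api_key`. The SHA-256 hashing follows the 0.4.0 entry, which states that keys are never stored in plaintext.

```python
import hashlib
import secrets

KEY_PREFIX_LEN = 12  # first 12 characters kept for identification


def generate_api_key() -> tuple[str, str, str]:
    """Return (plaintext_key, key_prefix, sha256_hex).

    Only the prefix and the hash would be persisted; the plaintext key is
    returned once to the caller.
    """
    key = "sg_" + secrets.token_urlsafe(32)  # token length is an assumption
    key_prefix = key[:KEY_PREFIX_LEN]        # assumed to include "sg_"
    digest = hashlib.sha256(key.encode()).hexdigest()
    return key, key_prefix, digest
```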

[0.15.0] — 2026-04-04¶

Added¶

Gateway description — free-text description field on gateways for documenting purpose and context (e.g. "Production EU-West for ML team").
Gateway labels — key-value labels (env=prod, team=ml, region=eu-west) stored as labels_json column. Kubernetes-style key validation, max 20 labels per gateway, values up to 253 chars.
PATCH /api/gateway/{name} — new endpoint to update gateway description and/or labels after registration (admin only). Supports partial updates via Pydantic model_fields_set.
Label filtering — GET /api/gateway/list?label=env:prod&label=team:ml filters gateways by labels (AND semantics).
Alembic migration 004 — adds description (Text) and labels_json (Text) columns to the gateways table.
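
The AND semantics of repeated label parameters (for example ?label=env:prod&label=team:ml) can be shown with two small helpers. This is an illustrative sketch; `parse_label_filters` and `matches_all` are hypothetical names, and the real filter may run inside the registry query rather than in Python.

```python
def parse_label_filters(params: list[str]) -> dict[str, str]:
    """Turn ["env:prod", "team:ml"] into {"env": "prod", "team": "ml"}."""
    filters: dict[str, str] = {}
    for param in params:
        key, sep, value = param.partition(":")
        if not sep or not key:
            raise ValueError(f"label filter must be key:value, got {param!r}")
        filters[key] = value
    return filters


def matches_all(labels: dict[str, str], filters: dict[str, str]) -> bool:
    """AND semantics: every requested label must be present with that value."""
    return all(labels.get(k) == v for k, v in filters.items())
```

Extra labels on a gateway do not affect the match; only the requested keys are compared.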

Changed¶

Gateway list UI — new description column (hidden on small screens) and label badges displayed below gateway names.
Gateway detail UI — description and labels shown in details card with inline edit form (admin only).
Gateway registration modal — new description textarea and labels textarea (one key=value per line).
GatewayRegistry — register(), _to_dict(), and list_all() extended for description, labels, and label filtering. New update_gateway_metadata() method with sentinel-based partial updates.

[0.14.0] — 2026-04-04¶

Added¶

Notification channels — webhooks now support channel_type field with values generic (default, HMAC-signed), slack (Block Kit formatting), discord (embed formatting), and email (SMTP delivery). Alembic migration 003 adds channel_type and extra_config columns to the webhooks table.
Message formatters — shoreguard/services/formatters.py with channel-specific formatting: Slack Block Kit with mrkdwn and color coding, Discord embeds with color-coded fields, plain-text email bodies.
Prometheus /metrics endpoint — unauthenticated, exposes shoreguard_info, shoreguard_gateways_total (by status), shoreguard_operations_total (by status), shoreguard_webhook_deliveries_total (success/failed), and shoreguard_http_requests_total (by method and status code).
HTTP request counting middleware — counts all API requests by method and status code for Prometheus.
OperationStore.status_counts() — thread-safe method returning operation counts grouped by status.

Changed¶

WebhookService refactored for channel-type-aware delivery: _deliver dispatches to _deliver_http (generic/slack/discord) or _deliver_email. HMAC signature only applied for generic channel type.
Webhook API routes accept channel_type and extra_config in create and update requests. Email channel requires smtp_host and to_addrs in extra_config.
Webhook API docs expanded — channel types table, email extra_config example, corrected event types, Prometheus metrics table with scrape config.
Deployment docs — new monitoring section with Prometheus scrape config.
README — notifications and Prometheus metrics in features list and roadmap.
Version bumped to 0.14.0.
791 tests (up from 770).

Fixed¶

deps.py type safety — get_client(), set_client(), and reset_backoff() now raise HTTPException(500) when called without a gateway context instead of passing None to the gateway service. Fixes 3 pre-existing pyright reportArgumentType errors.

Dependencies¶

Added prometheus_client>=0.21.
Added aiosmtplib>=3.0.

[0.13.0] — 2026-04-04¶

Added¶

Docker deployment polish — OCI image labels in Dockerfile, restart policies, dedicated shoreguard-net bridge network, configurable port and log level, and resource limits in docker-compose.yml.
.env.example — documented all environment variables with required/optional separation for quick Docker Compose setup.
docker-compose.dev.yml — standalone development compose with SQLite, hot-reload, no-auth, and local gateway mode. No PostgreSQL required.
Justfile — task runner with dev, test, lint, format, check, docker-build, docker-up, docker-down, docs, and sync targets.
Webhooks — event subscriptions with HMAC-SHA256 signing, Alembic migration 002, WebhookService with async delivery, and admin API (POST/GET/DELETE /api/webhooks).
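
HMAC-SHA256 signing of a webhook body, and the matching check on the receiver side, can be sketched as below. The JSON serialization settings and the function names are assumptions; the changelog does not state how the body is canonicalized or which header carries the signature.

```python
import hashlib
import hmac
import json


def sign_payload(secret: str, payload: dict) -> tuple[bytes, str]:
    """Serialize the payload and return (body, hex HMAC-SHA256 of body)."""
    # Compact, key-sorted JSON is an assumed canonical form.
    body = json.dumps(payload, separators=(",", ":"), sort_keys=True).encode()
    signature = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return body, signature


def verify_signature(secret: str, body: bytes, signature: str) -> bool:
    """Receiver side: recompute the HMAC and compare in constant time."""
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

The receiver should verify against the raw bytes it received, not a re-serialized copy, since any difference in serialization changes the digest.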

Changed¶

README overhaul — new "Why ShoreGuard?" section, dual quick-start paths (pip + Docker Compose), collapsible screenshot gallery, expanded development section with Justfile references, updated roadmap.
Deployment docs expanded — step-by-step Docker setup, full environment variable reference table, backup/restore procedures, network isolation explanation, upgrade process, and troubleshooting section.
Contributing docs expanded — "Clone to first sandbox" walkthrough, Justfile task runner section, corrected clone URL and port references.
Local mode docs expanded — developer workflow section with --no-auth combination, SQLite defaults, and state reset instructions.
mkdocs nav — added migration runbook to admin guide navigation.
Version bumped to 0.13.0.

Fixed¶

Duplicate auth log — removed redundant "Authentication DISABLED" warning from init_auth() that appeared unformatted when running with --reload.
Logger name formatting — replaced one-shot name rewriting with a custom Formatter that strips the shoreguard. prefix at render time, so late-created loggers (e.g. shoreguard.db) are also shortened correctly.
Contributing docs — corrected clone URL (your-org → FloHofstetter) and port reference (8000 → 8888).
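
Stripping the shoreguard. prefix at render time, as the logger name fix describes, can be done in a Formatter subclass so that loggers created after startup are covered too. A minimal sketch; the class name `ShortNameFormatter` is hypothetical and the project's formatter may differ in detail.

```python
import logging


class ShortNameFormatter(logging.Formatter):
    """Strip the "shoreguard." prefix from logger names when rendering."""

    PREFIX = "shoreguard."

    def format(self, record: logging.LogRecord) -> str:
        original = record.name
        if original.startswith(self.PREFIX):
            record.name = original[len(self.PREFIX):]
        try:
            return super().format(record)
        finally:
            # Restore the name so other handlers see the unmodified record.
            record.name = original
```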

[0.12.0] — 2026-04-03¶

Added¶

Inference timeout — timeout_secs field on PUT /api/gateways/{gw}/inference allows configuring per-route request timeouts (0 = default 60s). Displayed in the gateway detail inference card.
L7 query parameter matchers — network policy rules can now match on URL query parameters using glob (single pattern) or any (list of patterns) matchers.

Changed¶

Protobuf stubs regenerated from OpenShell v0.0.22 (was ~v0.0.16).

[0.11.0] — 2026-04-03¶

Added¶

Docker containerisation — multi-stage Dockerfile and docker-compose.yml (ShoreGuard + PostgreSQL) for production deployments.
Health probes — unauthenticated GET /healthz (liveness) and GET /readyz (readiness — checks database and gateway service).
protobuf runtime dependency — added to pyproject.toml (was previously only available transitively via grpcio-tools in dev).
.dockerignore for minimal build context.

Fixed¶

PostgreSQL migration — users.is_active column used server_default=sa.text("1") which fails on PostgreSQL. Changed to sa.true() for cross-database compatibility.
Gateway health endpoint — GET /api/gateways/{gw}/health called get_client() directly instead of via dependency injection, causing GatewayNotConnectedError to return 200 instead of 503.

Changed¶

FastAPI version field now matches the package version (was stale at 0.8.0).

[0.10.0] — 2026-04-03¶

Removed¶

"Active gateway" concept — the server-side active_gateway file (~/.config/openshell/active_gateway) is no longer read or written by the web service. Every gateway operation now requires an explicit gateway name from the URL. Removed endpoints: POST /{name}/select, GET /info, POST /start, POST /stop, POST /restart (non-named variants). The named variants (/{name}/start etc.) remain unchanged.
active field removed from all gateway API responses (list, info, register).
Service methods removed: get_active_name(), write_active_gateway(), select(), health().
Auto-select of first registered gateway removed from register().

Changed¶

Stateless gateway routing — the name parameter is now required on get_client(), set_client(), reset_backoff(), get_info(), and get_config(). No method falls back to the active gateway file anymore.
GET /info → GET /{name}/info — gateway info endpoint is now name-scoped.
GET /config → GET /{name}/config — gateway config endpoint is now name-scoped.
LocalGatewayManager — start(), stop(), restart() now require a gateway name. Connection and client management simplified: always operates on the explicitly named gateway.
Frontend inference config — now shows when gateway is connected (gw.connected) instead of when it was the "active" gateway (gw.active). Gateway list highlights connected gateways.
Health store — uses GW directly for gateway name instead of fetching from /api/gateway/info.
Version bumped to 0.10.0.
756 tests (down from 774 — 18 tests for removed active-gateway functionality deleted).

[0.9.0] — 2026-04-03¶

Added¶

Sidebar navigation — collapsible sidebar with grouped navigation (Gateways, Policies, gateway-scoped Sandboxes/Providers, admin-only Audit/Users). Replaces the icon buttons in the topbar. Responsive: collapses to hamburger menu on mobile (<768px).
Light/dark theme toggle — switchable via sidebar button, persisted in localStorage. All custom CSS variables scoped to [data-bs-theme]; Bootstrap 5.3 handles the rest automatically.

Fixed¶

Audit page breadcrumbs — audit.html now has breadcrumbs and uses the standard layout instead of container-fluid.
Dashboard breadcrumbs — dashboard.html now has breadcrumbs.
Theme-aware tables — removed hardcoded table-dark class from all templates and JS files; tables now adapt to the active theme.

[0.8.0] — 2026-04-03¶

Fixed¶

RBAC response_model crash — added response_model=None to 17 route decorators (16 in pages.py, 1 in main.py) returning TemplateResponse, HTMLResponse, or RedirectResponse. Prevents FastAPI Pydantic serialization errors on non-JSON responses.
IntegrityError/ValueError split — gateway-role SET endpoints now return 409 on constraint conflicts and 404 on missing user/SP/gateway, instead of a blanket 404 for both.

Added¶

Migration verification tests — 5 tests (tests/test_migrations.py) covering SQLite and PostgreSQL: fresh-DB, head revision, schema-matches-models, downgrade, and PostgreSQL fresh-DB.
RBAC regression & validation tests — 10 new tests (tests/test_rbac.py) for DELETE gateway-role 404s, invalid gateway name 400s, and invalid role 400s (user and SP symmetry).
Migration check script — scripts/verify_migrations.sh runs all Alembic migrations against a fresh database and verifies the final revision.
Migration CI workflow — .github/workflows/test-migrations.yml runs migration tests on SQLite and PostgreSQL for PRs touching migrations or models.
PR template — .github/PULL_REQUEST_TEMPLATE.md with migration checklist.
Migration runbook — docs/admin/migration-runbook.md with backup, upgrade, and rollback procedures.
Warning logs on error paths — all gateway-role endpoints now log logger.warning() for invalid names, invalid roles, not-found, and conflict responses.
Backoff for background tasks — _cleanup_operations() and _health_monitor() double their interval (up to a cap) after 10 consecutive failures and reset on success.
postgres pytest marker in pyproject.toml.
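
The background-task backoff (double the interval, up to a cap, after 10 consecutive failures; reset on success) can be modelled with a small state tracker. The entry leaves open whether the interval doubles once or on every failure past the threshold; this sketch assumes the latter, and the `BackoffInterval` class is hypothetical.

```python
class BackoffInterval:
    """Hypothetical interval tracker for a periodic background task."""

    def __init__(self, base: float, cap: float, threshold: int = 10) -> None:
        self.base = base
        self.cap = cap
        self.threshold = threshold  # consecutive failures before backing off
        self.failures = 0
        self.current = base

    def record_failure(self) -> float:
        """Count a failure; double the interval once the threshold is reached."""
        self.failures += 1
        if self.failures >= self.threshold:
            self.current = min(self.current * 2, self.cap)
        return self.current

    def record_success(self) -> float:
        """Reset the failure streak and restore the base interval."""
        self.failures = 0
        self.current = self.base
        return self.current
```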

Security¶

Shell injection fix — verify_migrations.sh passes database URL via os.environ instead of bash interpolation in a Python heredoc.

Changed¶

Migrations squashed — all 7 incremental migrations replaced by a single 001_initial_schema.py that creates the final schema directly. Existing databases must be reset (rm ~/.config/shoreguard/shoreguard.db).
Migration CI caches uv dependencies via enable-cache: true.

[0.7.1] — 2026-04-01¶

Added¶

API reference docs — mkdocstrings[python] generates reference pages from existing Google-style docstrings. New pages under docs/reference/: Client, Services, API Internals, Models, and Config & Exceptions.

[0.7.0] — 2026-04-01¶

Added¶

pydoclint integration — new [tool.pydoclint] section in pyproject.toml with maximum strictness (Google-style, skip-checking-short-docstrings = false, all checks enabled). Added pydoclint >= 0.8 as dev dependency.
Comprehensive Google-style docstrings — all 1 193 pydoclint violations resolved across the entire codebase. Every function, method, and class now has Args:, Returns:, Raises:, and Yields: sections as appropriate. Compatible with mkdocstrings for future API reference generation.
Page templates — dedicated HTML templates for approval edit, approval history, gateway register, gateway roles, policy revisions, and provider form pages, replacing Bootstrap modal dialogs.

Changed¶

Database schema cleanup (migration 007):
Timestamp columns (registered_at, last_seen, created_at, last_used, timestamp) converted from String to DateTime(timezone=True) across gateways, users, service_principals, and audit_log tables.
gateways table rebuilt with auto-incrementing integer primary key (id) replacing the old name-based primary key.
user_gateway_roles and sp_gateway_roles migrated from gateway_name (String FK) to gateway_id (Integer FK) with ON DELETE CASCADE.
audit_log column gateway renamed to gateway_name; new gateway_id FK added with ON DELETE SET NULL.
Audit service refactored — uses with session_factory() context manager instead of manual session.close() in finally blocks. Gateway ID resolution via FK lookup on write.
Version bumped to 0.7.0.

Fixed¶

GatewayNotConnectedError in _try_connect_from_config — exception is now caught instead of propagating as an unhandled error.
request.state.role not set from _require_page_auth — page auth guard now correctly stores the resolved role in request state.

[0.6.0] — 2026-03-31¶

Added¶

Gateway-scoped RBAC — per-gateway role overrides for users and service principals. Alembic migration 006 adds user_gateway_roles and sp_gateway_roles tables.
Policy diff viewer — compare two policy revisions side-by-side.
Hardened RBAC — async correctness improvements and additional test coverage.

[0.5.0] — 2026-03-30¶

Added¶

Persistent audit log — all state-changing operations (sandbox/policy/gateway CRUD, user management, approvals, provider changes) are recorded in a database table with actor, role, action, resource, gateway context, and client IP.
Audit API — GET /api/audit lists entries with filters (actor, action, resource type, date range). GET /api/audit/export?format=csv|json exports the full log. Both endpoints are admin-only.
Audit page — /audit admin page with filter inputs, pagination, and CSV/JSON export buttons. Built with Alpine.js.
Alembic migration 005 — audit_log table with indexes on timestamp, actor, action, and resource type.
Audit cleanup — entries older than 90 days are automatically purged by the existing background cleanup task.

Fixed¶

Fail-closed auth — when the database is unavailable, requests are now denied with 503 instead of silently granting admin access.
Async audit logging — audit_log() is now async and runs DB writes in a thread pool via asyncio.to_thread, preventing event-loop blocking on every state-changing request.
UnboundLocalError in AuditService — log(), list(), and cleanup() no longer crash if the session factory itself raises; session is now guarded with None checks in except/finally blocks.
Audit actor for auth events — login, setup, register, and invite-accept now set request.state.user_id before calling audit_log(), so the audit trail records the actual user instead of "unknown".
Failed login auditing — failed login attempts now produce a user.login_failed audit entry, enabling brute-force detection.
Authorization failure auditing — require_role() now writes an auth.forbidden audit entry when a user is denied access.
Audit ordering in approvals — all six approval endpoints now log the audit entry after the operation succeeds, preventing false entries on failure.
Conditional delete audit — sandbox.delete and provider.delete only write audit/log entries when the resource was actually deleted.
Async background cleanup — the periodic cleanup task now uses asyncio.to_thread for DB calls instead of blocking the event loop.
Gateway retry button — the "Retry" button in the gateway error banner now correctly calls Alpine.store('health').check() instead of the removed checkGatewayHealth() function.

Changed¶

Frontend migrated to Alpine.js — all 20+ pages rewritten from vanilla JS template-literal rendering (innerHTML = renderX(data)) to Alpine.js reactive directives (x-data, x-for, x-text, x-show, @click). No build step required — Alpine.js is loaded via CDN.
Three Alpine stores replace scattered global state:
auth — role, email, authenticated status (replaces inline script + window.SG_ROLE)
toasts — notification queue (replaces showToast() DOM manipulation)
health — gateway connectivity monitoring (replaces checkGatewayHealth() globals)
XSS surface reduced — Alpine's x-text auto-escapes all dynamic content, eliminating the need for manual escapeHtml() calls in templates.
Render functions removed — renderGatewayTable(), renderSandboxList(), renderDashboard(), and ~50 other renderX() functions replaced by declarative Alpine templates in HTML.
app.js slimmed — reduced from ~340 lines to ~95 lines. Only retains apiFetch(), showConfirm(), escapeHtml(), formatTimestamp(), navigateTo(), and URL helpers.
WebSocket integration — sandbox detail, logs, and approvals pages receive live updates via CustomEvent dispatching from websocket.js to Alpine components.
Version bumped to 0.5.0.
717 tests (up from 710), including audit service, API route, and DB schema tests.

[0.4.0] — 2026-03-30¶

Added¶

User-based RBAC — three-tier role hierarchy (admin → operator → viewer) replaces the single shared API key. Users authenticate with email + password via session cookies; service principals use Bearer tokens for API/CI access.
Invite flow — admins invite users by email. The invite generates a single-use, time-limited token (7 days). The invitee sets their password on the /invite page and receives a session cookie.
Self-registration — opt-in via SHOREGUARD_ALLOW_REGISTRATION=1. New users register as viewers. Disabled by default.
Setup wizard — first-run /setup page creates the initial admin account. All API access is blocked until setup is complete.
Service principals — named API keys with roles, created by admins. Keys are SHA-256 hashed (never stored in plaintext). last_used timestamp tracked on each request.
User management UI — /users page for admins with invite form, role badges, and delete actions. Dedicated /users/new and /users/new-service-principal pages replace the old modal dialogs.
Error pages — styled error pages for 403, 404, and other HTTP errors instead of raw JSON responses in the browser.
User email in navbar — logged-in user email and role badge shown in the navigation bar.
Alembic migrations 002–004 — api_keys table, users + service_principals tables with FK constraints, invite token hashing.
CLI commands — create-user, delete-user, list-users, create-service-principal, delete-service-principal, list-service-principals.
710 tests (up from 635), including comprehensive RBAC, auth flow, invite expiry, self-deletion guard, and last-admin protection tests.

Changed¶

Auth module rewritten — shoreguard/api/auth.py expanded from ~100 to ~700 lines. Session tokens are HMAC-signed with a 5-part format (nonce.expiry.user_id.role.signature). Roles are always verified against the database, not the session token, so demotions take effect immediately.
All state-changing endpoints now enforce minimum role via require_role() FastAPI dependency (admin for user/SP management and gateway registration; operator for sandbox/policy/provider operations).
Frontend role-based UI — buttons and nav items hidden based on role via data-sg-min-role attributes. escapeHtml() used consistently across all JavaScript files.
Policies router split — preset routes (/api/policies/presets) are mounted globally; sandbox policy routes remain gateway-scoped only. Fixes a bug where /api/sandboxes/{name}/policy was reachable without gateway context.
Audit logging standardised — all log messages use actor= consistently. Role denials now include method, path, and actor. IntegrityError on duplicate user/SP creation is logged. Logout resolves email instead of numeric user ID.
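
The five-part session token format (nonce.expiry.user_id.role.signature) from the auth module rewrite can be sketched as follows. The choice of HMAC-SHA256, the nonce length, the default lifetime, and both function names are assumptions; the changelog only states that tokens are HMAC-signed and that the role is re-checked against the database.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional


def make_session_token(secret: bytes, user_id: int, role: str, ttl: int = 3600) -> str:
    """Build nonce.expiry.user_id.role.signature (HMAC-SHA256 is assumed)."""
    nonce = secrets.token_hex(8)
    expiry = int(time.time()) + ttl
    payload = f"{nonce}.{expiry}.{user_id}.{role}"
    signature = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{signature}"


def parse_session_token(secret: bytes, token: str) -> Optional[tuple[int, str]]:
    """Return (user_id, role) for a valid, unexpired token, else None."""
    parts = token.split(".")
    if len(parts) != 5:
        return None
    nonce, expiry, user_id, role, signature = parts
    payload = f"{nonce}.{expiry}.{user_id}.{role}"
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature):
        return None  # tampered or signed with a different secret
    if int(expiry) < time.time():
        return None  # expired
    # The role here is only a hint; the caller should re-check it in the database.
    return int(user_id), role
```

Checking the signature before parsing the numeric fields ensures that only values the server itself produced are ever converted.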

Fixed¶

Timing attack in authenticate_user() — bcrypt verification now runs against a dummy hash when the user does not exist, preventing email enumeration via response time analysis.
Policies router double-inclusion — the full policies router was mounted both globally and under the gateway prefix, exposing sandbox policy routes without gateway context. Now only preset routes are global.
Missing exception handling — is_setup_complete(), list_users(), and list_service_principals() now catch SQLAlchemyError instead of letting database errors propagate as 500s.
verify_password() bare Exception catch — narrowed to (ValueError, TypeError) to avoid masking unexpected errors.
WebSocket XSS — sandboxName in toast messages is now escaped with escapeHtml(). Log level CSS class validated against a whitelist.
delete_filesystem_path missing Query annotation — path parameter now uses explicit Query(...) instead of relying on FastAPI inference.
Migration 004 downgrade documented as non-reversible (SHA-256 hashes cannot be reversed; pending invites are invalidated on downgrade).
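
The timing-attack fix in authenticate_user() relies on doing the same hashing work whether or not the email exists. The sketch below shows the idea with PBKDF2 from the standard library as a stand-in for bcrypt, which is a deliberate substitution to keep the example dependency-free. The in-memory `users` mapping, the iteration count, and the function names are hypothetical.

```python
import hashlib
import hmac
import os

ITERATIONS = 10_000  # kept low for the sketch; not a recommendation


def hash_password(password: str, salt: bytes) -> bytes:
    """PBKDF2-HMAC-SHA256, standing in for bcrypt in this sketch."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)


# Computed once so unknown emails cost the same hashing work as known ones.
_DUMMY_SALT = os.urandom(16)
_DUMMY_HASH = hash_password("dummy-password", _DUMMY_SALT)


def authenticate(users: dict[str, tuple[bytes, bytes]], email: str, password: str) -> bool:
    """users maps normalised email -> (salt, password_hash); a hypothetical store."""
    record = users.get(email.strip().lower())
    # Fall back to the dummy hash so the verification step always runs.
    salt, stored = record if record else (_DUMMY_SALT, _DUMMY_HASH)
    candidate = hash_password(password, salt)
    matches = hmac.compare_digest(candidate, stored)
    # A match against the dummy hash must never authenticate anyone.
    return matches and record is not None
```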

Security¶

Constant-time authentication prevents timing-based email enumeration.
Invite tokens are SHA-256 hashed in the database (migration 004).
Session invalidation on user deletion and deactivation — existing sessions are rejected on the next request.
Last-admin guard with database-level FOR UPDATE lock prevents TOCTOU race.
Self-deletion guard prevents admins from deleting their own account.
Email normalisation (.strip().lower()) prevents duplicate accounts.
Password length enforced (8–128 characters) on all auth endpoints.
XSS escaping hardened across all frontend JavaScript files.

Dependencies¶

Added pwdlib[bcrypt] — password hashing with bcrypt.

[0.3.0] — 2026-03-28¶

Added¶

Central gateway management — ShoreGuard transforms from a local sidecar into a central management plane for multiple remote OpenShell gateways (like Rancher for Kubernetes clusters). Gateways are deployed independently and registered with ShoreGuard via API.
SQLAlchemy ORM + Alembic — persistent gateway registry backed by SQLAlchemy with automatic embedded migrations on startup. SQLite by default, PostgreSQL via SHOREGUARD_DATABASE_URL for container deployments.
Gateway registration API — POST /api/gateway/register to register remote gateways with endpoint, auth mode, and mTLS certificates. DELETE /api/gateway/{name} to unregister. POST /{name}/test-connection to explicitly test connectivity.
ShoreGuardClient.from_credentials() — new factory method that accepts raw certificate bytes from the database instead of filesystem paths.
Background health monitor — probes all registered gateways every 30 seconds and updates health status (last_seen, last_status) in the registry.
import-gateways CLI command — imports gateways from the OpenShell filesystem config (~/.config/openshell/gateways/) into the database, including mTLS certificates. Replaces the old migrate-v2 command.
SHOREGUARD_DATABASE_URL — environment variable to configure an external database (PostgreSQL) for container/multi-instance deployments.
--local / SHOREGUARD_LOCAL_MODE — opt-in flag to enable local Docker container lifecycle management (start/stop/restart/create/destroy). In local mode, filesystem gateways are auto-imported into the database on startup.
--database-url / SHOREGUARD_DATABASE_URL — all env vars now also available as CLI flags.

Changed¶

GatewayService refactored — reduced from ~800 to ~250 lines. Gateway discovery now queries the SQLAlchemy registry instead of scanning the filesystem. Connection management (backoff, health checks) preserved.
Docker/CLI methods extracted to LocalGatewayManager (shoreguard/services/local_gateway.py), only active in local mode.
Frontend updated — "Create Gateway" replaced with "Register Gateway" modal (endpoint, auth mode, PEM certificate upload). Start/Stop/Restart buttons replaced with "Test Connection". "Destroy" renamed to "Unregister". New "Last Seen" column, Port column removed.
API route changes — POST /create (202 LRO) → POST /register (201 sync). POST /{name}/destroy → DELETE /{name}. Local lifecycle routes (start/stop/restart/diagnostics) return 404 unless SHOREGUARD_LOCAL_MODE=1.
Request-level logging — gateway register, unregister, test-connection, and select routes now log at INFO/WARNING level. LocalGatewayManager logs Docker daemon errors, port conflicts, missing openshell CLI, and openshell command failures.
api/main.py split into modules — extracted cli.py (Typer CLI + import logic), pages.py (HTML routes + auth endpoints), websocket.py (WebSocket handler), and errors.py (exception handlers). main.py reduced from 1 084 to ~190 lines (pure wiring).
Version bumped to 0.3.0.
Test suite rewritten for registry-backed architecture (635 tests).
Logger names standardised — all modules now use getLogger(__name__) instead of hardcoded "shoreguard". Removes duplicate log lines caused by parent-logger propagation.
Unified log format — single format (HH:MM:SS LEVEL module message) shared by shoreguard and uvicorn loggers with fixed-width aligned columns.
Duplicate "API-key authentication enabled" log line removed.
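
The logger-name and log-format changes above follow a common standard-library pattern, sketched below. The column widths, the `configure_logging` helper, and the choice to attach a single handler to the root logger are assumptions for illustration; only `getLogger(__name__)` and the `HH:MM:SS LEVEL module message` layout come from this changelog.

```python
import logging

# HH:MM:SS LEVEL module message, with fixed-width level and module columns.
# The widths (8 and 28) are illustrative assumptions.
LOG_FORMAT = "%(asctime)s %(levelname)-8s %(name)-28s %(message)s"
DATE_FORMAT = "%H:%M:%S"


def configure_logging(level: int = logging.INFO) -> None:
    """Install one handler on the root logger and let everything propagate.

    Duplicate lines typically appear when a child logger and one of its
    parents both own a handler; keeping a single root handler avoids that.
    """
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter(LOG_FORMAT, datefmt=DATE_FORMAT))

    root = logging.getLogger()
    root.handlers[:] = [handler]
    root.setLevel(level)

    # uvicorn installs its own handlers on these loggers; clear them so the
    # shared root handler formats their records too.
    for name in ("uvicorn", "uvicorn.error", "uvicorn.access"):
        uvicorn_logger = logging.getLogger(name)
        uvicorn_logger.handlers.clear()
        uvicorn_logger.propagate = True


# In each module: name the logger after the module, not a hardcoded string.
logger = logging.getLogger(__name__)
```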

Fixed¶

SSRF protection — _is_private_ip() now performs real DNS resolution instead of AI_NUMERICHOST. Hostnames that resolve to private/loopback/link-local addresses are correctly blocked. Includes a 2 s DNS timeout.
import-gateways crash on single gateway — registry.register() failures no longer abort the entire import; individual errors are logged and skipped.
from_active_cluster error handling — missing metadata files, corrupt JSON, and missing gateway_endpoint keys now raise GatewayNotConnectedError with a clear message instead of raw FileNotFoundError / KeyError.
init_db() failure logging — database initialisation errors in the FastAPI lifespan are now logged before re-raising.
_get_gateway_service() guard — raises RuntimeError if called before the app lifespan has initialised the service (instead of AttributeError on None).
WebSocket RuntimeError swallowed — RuntimeError during websocket.send_json() is now debug-logged instead of silently passed.
SQLite pragma errors — failures setting WAL/busy_timeout/synchronous pragmas are now logged as warnings.
_import_filesystem_gateways SSRF gap — filesystem-imported gateways were not checked against is_private_ip(). Now blocked in non-local mode, consistent with the API registration endpoint.
_import_filesystem_gateways skipped count — corrupt metadata JSON was logged but not counted in the skipped total, making the summary misleading.
_import_filesystem_gateways mTLS read error — read_bytes() on cert files had no error handling (TOCTOU race). Now wrapped in try/except with a 64 KB size limit matching the API route.
check_all_health DB error isolation — a database error updating health for one gateway no longer prevents health updates for all remaining gateways.
select() implicit name resolution — get_client() was called without name=, relying on a filesystem round-trip via active_gateway file. Now passes the name explicitly.
CLI import-gateways NameError — if init_db() failed, engine was undefined and engine.dispose() in the finally block raised NameError.
DB engine not disposed on shutdown — the SQLAlchemy engine was not disposed during FastAPI lifespan shutdown, skipping the SQLite WAL checkpoint.
Docker start/stop errors silently swallowed — SubprocessError/OSError in _docker_start_container/_docker_stop_container was caught but never logged.
Gateway start retry without summary — when all 10 connection retries failed after a gateway start, no warning was logged.
Frontend 404 on gateway list page — inference-providers was fetched without a gateway context, hitting a non-existent global route.

Security¶

SSRF DNS resolution bypass fixed (hostnames resolving to RFC 1918 / loopback addresses were not blocked).
SSRF validation includes DNS timeout protection (2 s) to prevent slow-DNS attacks.
remote_host input validation — CreateGatewayRequest.remote_host is now validated with a hostname regex (max 253 chars) before being passed to subprocess.
SSRF check skipped in local mode — is_private_ip() checks at connect-time and import-time now allow private/loopback addresses when SHOREGUARD_LOCAL_MODE is set, since locally managed gateways always run on 127.0.0.1.
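
The SSRF behaviour described in the Fixed and Security entries above (resolve the hostname, block private, loopback, and link-local results, time out after 2 s, skip the check in local mode) might look roughly like the sketch below. It is not the project's `_is_private_ip()`: the name `is_private_host`, the injectable `resolver` parameter, the thread-based timeout, and the fail-closed handling of resolution errors are all assumptions.

```python
import ipaddress
import socket
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def _resolve(host: str) -> list:
    """Resolve a hostname (or IP literal) to its address strings via DNS."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    return [info[4][0] for info in infos]


def _is_blocked(address: str) -> bool:
    # Drop any IPv6 zone suffix such as "%eth0" before parsing.
    ip = ipaddress.ip_address(address.split("%")[0])
    return ip.is_private or ip.is_loopback or ip.is_link_local


def is_private_host(host, *, local_mode=False, timeout=2.0, resolver=_resolve):
    """Return True when the host must be rejected as an SSRF risk."""
    if local_mode:
        # Locally managed gateways run on 127.0.0.1, so the check is skipped.
        return False
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        addresses = pool.submit(resolver, host).result(timeout=timeout)
    except (FutureTimeout, OSError):
        # Slow or failing DNS: fail closed (an assumption of this sketch).
        return True
    finally:
        pool.shutdown(wait=False)
    return not addresses or any(_is_blocked(a) for a in addresses)
```

Checking every resolved address, rather than only the first, matters because a hostname can return a public and a private record together.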

Dependencies¶

Added sqlalchemy >= 2.0 (runtime) — ORM and database abstraction.
Added alembic >= 1.15 (runtime) — embedded schema migrations on startup.

[0.2.0] — 2026-03-27¶

Added¶

API-key authentication — optional shared API key via --api-key flag or SHOREGUARD_API_KEY env var. Supports Bearer tokens, HMAC-signed session cookies, and WebSocket query-param auth. Zero-config local development remains unchanged (auth is a no-op when no key is set).
Login page for the web UI with session cookie management and automatic redirect for unauthenticated users.
Long-Running Operations (LRO) — gateway and sandbox creation now return 202 Accepted with an operation ID. Clients can poll /api/operations/{id} for progress. Includes automatic cleanup of expired operations.
force flag for gateway destroy with dependency checking — prevents accidental deletion of gateways that still have running sandboxes unless --force is passed.
UNIMPLEMENTED error handling — gRPC UNIMPLEMENTED errors now return a human-readable 501 response with feature context instead of a generic 500.
OpenAPI documentation is automatically hidden when authentication is enabled.
Session cookies set secure flag automatically when served over HTTPS.
DEADLINE_EXCEEDED mapping — gRPC DEADLINE_EXCEEDED is now mapped to HTTP 504 (Gateway Timeout).
ValidationError exception — new error type for input validation (invalid names, shlex errors) that returns an HTTP 400 response.
Gateway/Sandbox name validation — regex-based validation of resource names to prevent argument injection.
Client-IP tracking — the client IP is now logged on auth failures and failed login attempts.
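
The status mapping and name validation in the entries above can be illustrated with a small sketch. The regex, the `validate_resource_name` and `http_status_for` helpers, and the fallback to 500 for unmapped codes are assumptions; the changelog only states that UNIMPLEMENTED yields 501, DEADLINE_EXCEEDED yields 504, and validation failures yield 400. gRPC codes are represented by name here so that the example runs without the `grpc` package.

```python
import re


class ValidationError(ValueError):
    """Input validation failure, reported to clients as HTTP 400."""

    http_status = 400


# Hypothetical pattern: lowercase alphanumerics and inner hyphens, up to 63
# characters. A name can never start with "-", so it cannot be mistaken for a
# command-line option when passed to a subprocess.
_NAME_RE = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?")


def validate_resource_name(name: str) -> str:
    if not _NAME_RE.fullmatch(name):
        raise ValidationError(f"invalid resource name: {name!r}")
    return name


# gRPC status code names mapped to the HTTP statuses named in the changelog.
GRPC_TO_HTTP = {
    "UNIMPLEMENTED": 501,
    "DEADLINE_EXCEEDED": 504,
}


def http_status_for(grpc_code_name: str) -> int:
    """Translate a gRPC status name; unmapped codes fall back to 500."""
    return GRPC_TO_HTTP.get(grpc_code_name, 500)
```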

Changed¶

Sandbox creation returns 202 Accepted (was 201 Created) to reflect the asynchronous LRO pattern.
Destroyed gateways are now filtered from the gateway list by default.
Version bumped to 0.2.0.
Exception handlers across the codebase narrowed from a broad except Exception to specific types (grpc.RpcError, OSError, ssl.SSLError, ConnectionError, TimeoutError).
Logging significantly expanded: debug logging for previously silent pass blocks, error level for status ≥ 500, warning level for status < 500.
WebSocket auth logging normalised from INFO/WARNING to DEBUG.
friendly_grpc_error() now checks friendly messages before raw details.

Fixed¶

Auth credential check logic deduplicated into a single check_request_auth() helper shared by API dependencies, the /api/auth/check endpoint, and page auth guards.
Fire-and-forget task GC — background tasks are now held in a set so they cannot be garbage-collected while still running under asyncio.
Cross-thread WebSocket signalling — asyncio.Event replaced with threading.Event for correct signalling across threads.
WebSocket queue overflow — the QueueFull exception is now caught, with a fallback to cancel_event.
Event-loop blocking — get_client() in the WebSocket handler is now wrapped in asyncio.to_thread().
gRPC client leak — fixed a client leak in _try_connect() when the health check fails.
Login redirect validation — open-redirect protection: URLs that do not start with / or that start with // are rejected.
Error message sanitisation — friendly_grpc_error() prevents raw gRPC error messages from being passed to API clients.
Thread safety — threading.Lock for GatewayService._clients and thread-safe reads in OperationStore.to_dict().
YAML parsing robustness — YAMLError, None, and scalar values are now handled in presets.py.
Metadata file robustness — JSONDecodeError and OSError during gateway metadata reads are handled with a fallback.

Security¶

Open-redirect protection on the login page.
API error messages are sanitised to avoid exposing internal details.
Thread-safe client management and operation-store access.
Argument-injection prevention via regex name validation.
Client-IP logging on auth events for security monitoring.
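
The open-redirect rule for the login page (a redirect target is rejected unless it starts with a single `/`) can be sketched as below. The helper name `safe_redirect_target` and the fallback to `/` are assumptions, and the backslash check is extra hardening that this changelog does not mention.

```python
from typing import Optional


def safe_redirect_target(target: Optional[str], default: str = "/") -> str:
    """Return target only if it is a same-site path, else the default."""
    if not target:
        return default
    # Must be a local path: "https://evil.example" and "evil.example" fail here.
    if not target.startswith("/"):
        return default
    # "//host" is a protocol-relative URL that leaves the site.
    if target.startswith("//"):
        return default
    # Not from the changelog: some browsers treat "/\host" like "//host".
    if "\\" in target:
        return default
    return target
```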

[0.1.0] — 2026-03-25¶

Initial release.

Added¶

Sandbox management — create, list, get, delete sandboxes with custom images, environment variables, GPU support, and provider integrations.
Real-time monitoring — WebSocket streaming of sandbox logs, events, and status changes.
Command execution — run commands inside sandboxes with stdout/stderr capture.
SSH sessions — create and revoke interactive SSH terminal sessions.
Security policy editor — visual network rule, filesystem access, and process/Landlock policy management without raw YAML editing.
Policy approval workflow — review, approve, reject, or edit agent-requested endpoint rules with real-time WebSocket notifications.
Policy presets — 9 bundled templates (PyPI, npm, Docker Hub, NVIDIA NGC, HuggingFace, Slack, Discord, Telegram, Jira, Microsoft Outlook).
Multi-gateway support — manage multiple OpenShell gateways with status monitoring, diagnostics, and automatic reconnection.
Provider management — CRUD for inference/API providers with credential templates and community sandbox browser.
Sandbox wizard — guided step-by-step sandbox creation with agent type selection and one-click preset application.
Web dashboard — responsive Bootstrap 5 UI with gateway, sandbox, policy, approval, log, and terminal views.
REST API — full async FastAPI backend with Swagger UI documentation.
CLI — shoreguard command with configurable host, port, log level, and auto-reload.