Changelog
All notable changes to Shoreguard are documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[0.37.0] - 2026-06-10
Solo-dev quality of life
Builds out the single-box path (homelab, DGX Spark): safe-by-default no-auth binding, one-click local inference providers, and phone push notifications for approvals.
Added
- --unsafe-lan / SHOREGUARD_UNSAFE_LAN: explicit opt-in required to combine --no-auth with a non-loopback bind address. Without it, the CLI refuses the combination and enforce_production_safety() blocks startup (override still possible via SHOREGUARD_ALLOW_UNSAFE_CONFIG).
- Local inference auto-detect: in local mode, the Providers page probes the default loopback ports of Ollama (11434), vLLM/NIM (8000), llama.cpp (8080), and LM Studio (1234), and offers one-click provider creation with the right OpenAI-compatible base_url prefilled (GET /api/gateway/local-inference). Agents reach local models through inference.local/v1, so no cloud API key ever exists.
- ntfy webhook channel: new channel_type: ntfy publishes JSON messages to an ntfy topic URL (ntfy.sh or self-hosted; optional access token via extra_config.token). Approval events arrive as high-priority pushes (approval.pending = high, approval.escalated = urgent), so overnight agent runs can ping your phone for a decision. For a self-hosted ntfy on a LAN address, combine with SHOREGUARD_SSRF_ALLOWED_IPS.
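The approval-to-priority mapping for the ntfy channel could be sketched as follows. This is a minimal illustration, not Shoreguard's code: the helper name, the fallback priority for other events, and the way the topic URL is split are assumptions. It relies on ntfy's JSON publishing format (topic, title, message, priority), where 4 is high and 5 is urgent.

```python
import json
from typing import Optional
from urllib.parse import urlsplit

# ntfy numeric priorities: 3 = default, 4 = high, 5 = urgent (max).
EVENT_PRIORITY = {"approval.pending": 4, "approval.escalated": 5}


def build_ntfy_request(topic_url: str, event: str, message: str,
                       token: Optional[str] = None):
    """Return (post_url, headers, body) for an ntfy JSON publish.

    Hypothetical helper: assumes the topic is the URL path and that
    JSON publishes go to the server root.
    """
    parts = urlsplit(topic_url)
    post_url = f"{parts.scheme}://{parts.netloc}"
    payload = {
        "topic": parts.path.strip("/"),
        "title": event,
        "message": message,
        # Events outside the table fall back to the default priority (assumption).
        "priority": EVENT_PRIORITY.get(event, 3),
    }
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return post_url, headers, json.dumps(payload).encode()
```

The returned triple can be handed to any HTTP client; no request is sent by the helper itself.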
Changed
- --no-auth now binds to 127.0.0.1 instead of 0.0.0.0 when no explicit --host / SHOREGUARD_HOST is given. Previously the unauthenticated admin UI was reachable from the entire network by default. Containerised no-auth deployments must now pass --host 0.0.0.0 --unsafe-lan explicitly.
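The bind rules in this release can be summarised in a short sketch. The function names are hypothetical, and the 0.0.0.0 default for authenticated mode is an assumption inferred from the previous behaviour; the real checks live in the CLI and enforce_production_safety().

```python
import ipaddress
from typing import Optional


def _is_loopback(host: str) -> bool:
    """True for loopback IP literals and the name 'localhost'."""
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return host == "localhost"


def resolve_bind(no_auth: bool, host: Optional[str], unsafe_lan: bool) -> str:
    """Pick the bind address, refusing no-auth on a non-loopback host."""
    if host is None:
        # No explicit --host: no-auth stays on loopback.
        return "127.0.0.1" if no_auth else "0.0.0.0"  # second default is assumed
    if no_auth and not unsafe_lan and not _is_loopback(host):
        raise ValueError("--no-auth on a non-loopback host requires --unsafe-lan")
    return host
```

With these rules, a containerised no-auth deployment only starts when both the explicit host and the opt-in are present.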
Fixed
- CLI flags now reach the uvicorn reload worker. Under --reload (the default), uvicorn spawns the server as a fresh process that re-reads settings from the environment. Flags like --local and --no-auth silently vanished there, and since the worker then looked prod-like, shoreguard --local --no-auth crashed at boot with prod-readiness errors. The CLI now exports its resolved flags to the environment.
- The drift_detection background task was missing from the task-health supervision map; its done-callback raised a KeyError whenever the loop exited (e.g. drift detection disabled).
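The reload-worker fix comes down to copying resolved flags into the environment before the child process starts. A minimal sketch, assuming a hypothetical helper name and a hypothetical SHOREGUARD_NO_AUTH variable (only SHOREGUARD_LOCAL_MODE is named in this changelog):

```python
import os
from typing import MutableMapping


def export_flags(local: bool, no_auth: bool,
                 environ: MutableMapping[str, str] = os.environ) -> None:
    """Mirror resolved CLI flags into the environment.

    A freshly spawned worker re-reads settings from the environment,
    so flags that only exist in the parent's memory would be lost.
    """
    if local:
        environ["SHOREGUARD_LOCAL_MODE"] = "true"
    if no_auth:
        environ["SHOREGUARD_NO_AUTH"] = "true"  # hypothetical variable name
```

Passing a plain dict as environ keeps the helper testable without touching the real process environment.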
[0.36.3] - 2026-06-10
Added
- SSRF allowlist SHOREGUARD_SSRF_ALLOWED_IPS (#13): comma-separated IPs/CIDR ranges exempted from the private/loopback SSRF rejection, so a homelab OIDC provider (Authelia, Keycloak, authentik) or an internal webhook/SMTP target on a LAN address can be used without --local. Applies consistently to OIDC issuer/JWKS/token endpoints, webhook URLs (registration and delivery-time DNS-rebinding re-checks; private-address webhook delivery was previously impossible even in local mode), SMTP hosts, and gateway endpoints. Matching happens against the resolved address; SHOREGUARD_ALWAYS_BLOCKED_IPS always takes precedence. Invalid entries hard-fail boot; a /0 entry and the redundant local-mode combination surface prod-readiness warnings. SSRF rejection messages now name the setting as the remedy.
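The precedence between the allowlist and the always-blocked list can be sketched with the standard ipaddress module. Function names are hypothetical, and the exact set of address classes Shoreguard treats as private is an assumption.

```python
import ipaddress


def parse_networks(raw: str):
    """Parse a comma-separated list of IPs/CIDRs; invalid entries raise ValueError."""
    return [
        ipaddress.ip_network(item.strip(), strict=False)
        for item in raw.split(",")
        if item.strip()
    ]


def is_rejected(resolved_ip: str, allowed, always_blocked) -> bool:
    """Decide on the resolved address; the always-blocked list wins over the allowlist."""
    addr = ipaddress.ip_address(resolved_ip)
    if any(addr in net for net in always_blocked):
        return True
    if any(addr in net for net in allowed):
        return False
    # Default SSRF stance: reject private, loopback, and link-local targets.
    return addr.is_private or addr.is_loopback or addr.is_link_local
```

Checking the resolved address rather than the hostname is what makes the delivery-time DNS-rebinding re-check meaningful.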
Changed
- Combining local mode with an allowlisted private gateway endpoint now requires an mTLS bundle for that gateway (the certificate-free plaintext connection no longer applies to allowlisted hosts). This is the fail-closed edge of the new allowlist and is flagged by a prod-readiness warning.
- docs/reference/settings.md regenerated; it was missing several previously-added settings (tracing, discovery, cert rotation, prover, audit export, …).
[0.36.2] - 2026-06-07
Solo-dev on-ramp
Sharpens the single-box / solo-developer path so it competes with "just run the
OpenShell TUI". The frictionless path already existed (shoreguard --local
--no-auth → SQLite auto-init → auto-import filesystem gateways → no-credential
sandbox) but was mis-signposted and had two sharp default edges.
Added
- Solo Dev guide: new docs/getting-started/solo-dev.md as the headline single-box on-ramp (in the nav before Quick Start, linked from the README). Quick Start is reframed as the team / remote-gateway flow.
- In-app orientation hints (no new API surface): the dashboard "No gateways" empty state and the gateway-register form now point single-box users at shoreguard --local; the sandbox wizard's image field documents that blank uses the gateway default.
- Headless first-admin docs: SHOREGUARD_ADMIN_PASSWORD and shoreguard create-user documented in the installation + solo-dev guides for SSH-only boxes.
- Boot-time Docker check: in local mode, a clear startup warning when Docker is unusable, instead of a later opaque gRPC timeout on first sandbox create.
Changed
- Local mode connects to a loopback/private gateway without mTLS when it is registered with no certificate bundle, mirroring the existing private-IP SSRF bypass. Strictly gated on --local / SHOREGUARD_LOCAL_MODE and a private/loopback host; production behaviour is unchanged (mTLS still required by default). Emits a warning so the plaintext connection is visible.
- Actionable sandbox ready-timeout: the warning now names SHOREGUARD_SANDBOX_READY_TIMEOUT and points to /api/gateways/diagnostics instead of a bare "did not become ready in time".
- --local help text notes it requires a running Docker daemon.
Security
- Bumped pymdown-extensions 10.21.2 → 10.21.3 (CVE-2026-46338, docs toolchain only).
[0.36.1] - 2026-06-07
Security
Dependency-only patch that clears 8 CVEs flagged by pip-audit in four
locked dependencies. These are time-based advisories unrelated to the M38 sync;
the failing ci/audit job had blocked the v0.36.0 publish pipeline (GHCR / PyPI
/ sigstore never ran). No source changes — fastapi 0.135.1 already allows
starlette>=0.46.0, so the starlette bump needs no framework change, and the
full suite (3225 tests) passes unchanged under starlette 1.2.x.
- pyjwt 2.12.1 → 2.13.0: PYSEC-2026-175 / -177 / -178 / -179. Direct dependency; the PyJWT[crypto] floor is raised to >=2.13.
- starlette 0.52.1 → 1.2.1: PYSEC-2026-161 (transitive via fastapi).
- idna 3.11 → 3.18: CVE-2026-45409 (transitive via httpx/requests).
- pip 26.1.1 → 26.1.2: PYSEC-2026-196 (transitive, dev/CI tooling only).
[0.36.0] - 2026-06-07
M38 Upstream-Sync
Synced ShoreGuard against the NVIDIA/OpenShell v0.0.57 release tag
(stub regen via scripts/generate_proto.py --ref v0.0.57). Twelve new RPCs
wired end-to-end and one wire-breaking field move handled. Every non-supervisor
RPC again has client + REST + UI coverage — the coverage allowlist is back to
the original four supervisor-path RPCs.
Fixed — wire-breaking field move (SandboxStatus). Upstream
PR #1565 moved Sandbox.phase
and Sandbox.current_policy_version into the SandboxStatus sub-message.
Against a v0.0.57 gateway the previous code read empty fields, so every
sandbox showed phase unknown and wait_for_ready hung until timeout.
_sandbox_to_dict now reads status.phase / status.current_policy_version.
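The effect of the field move can be illustrated with plain stand-in objects. This is only a sketch of the reading pattern: the function below is a simplified stand-in for _sandbox_to_dict, the "unknown" fallback mirrors the symptom described above, and treating phase as a string is an assumption (the real field may be a proto enum).

```python
from types import SimpleNamespace


def sandbox_to_dict(sandbox) -> dict:
    """Project a sandbox message, reading phase/version from the status sub-message."""
    status = getattr(sandbox, "status", None)
    return {
        "name": sandbox.metadata.name,
        # Fields that used to sit on Sandbox itself now live under Sandbox.status.
        "phase": getattr(status, "phase", "") or "unknown",
        "current_policy_version": getattr(status, "current_policy_version", 0),
    }


# Stand-in for a message returned by a v0.0.57 gateway.
sandbox = SimpleNamespace(
    metadata=SimpleNamespace(name="dev-box"),
    status=SimpleNamespace(phase="ready", current_policy_version=3),
)
result = sandbox_to_dict(sandbox)
```

Reading the old top-level attributes against such a message would yield empty values, which is how every sandbox ended up showing phase unknown.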
Added
- WS38.1: Provider credential refresh / rotation. Four RPCs (PR #1349), namely ConfigureProviderRefresh, GetProviderRefreshStatus, RotateProviderCredential, and DeleteProviderRefresh, are wired through ProviderManager, REST under /providers/{name}/refresh, and a Credential-Refresh modal on the providers page. Secret material is passed through to the gateway and never written to the audit log (key names only).
- WS38.2: Local-domain service routing. Four RPCs (PR #1101), namely ExposeService, GetService, ListServices, and DeleteService, are exposed via a new ServiceManager, a /services REST surface, and a Service Routing page.
- WS38.3: Interactive terminal. ExecSandboxInteractive (PR #1331) provides a true bidirectional TTY over a new /ws/{gw}/{sandbox}/exec WebSocket bridge (api/ws_bridge.py) driving a vendored xterm.js terminal (frontend/vendor/xterm/), replacing the one-shot command runner.
- WS38.4: TCP / SSH forward. ForwardTcp (PR #1029) provides a raw bidi tunnel over /ws/{gw}/{sandbox}/forward reusing the WebSocket bridge, with a Forward sub-tab (TCP target or SSH, the latter minting a relay session via the existing CreateSshSession). A full in-browser SSH client remains a follow-up; the SSH view streams the raw relay bytes.
- WS38.5: Gateway token (diagnostic). IssueSandboxToken / RefreshSandboxToken (PR #1404) under /tokens. These RPCs bind the minted JWT to the caller's identity, so from ShoreGuard they mint a token for ShoreGuard's own gateway identity: an admin-only diagnostic, not a sandbox-scoped token (documented in the UI).
- New WebSocket RBAC dependency require_role_ws gates the mutating exec/forward channels at operator level.
- Additive v0.0.57 fields surfaced in the client projections: ObjectMeta.resource_version (on sandbox + provider; foundation for compare-and-swap on UpdateConfig) and Provider.credential_expires_at_ms.
[0.35.0] - 2026-05-09
M37 Upstream-Sync
Synced ShoreGuard against
NVIDIA/OpenShell origin/main @ 57a80ed2
— 157 commits past the M36 sync point (PR #943) including the
v0.0.37 release tag. Stub regen, eight new RPC clusters wired
end-to-end (client → REST → UI), GraphQL L7 inspection surfaced in the
endpoint-rule editor, and a hard-cut migration of the Provider and
Sandbox wire schemas to the new upstream ObjectMeta convention.
Gateway minimum lifts to v0.0.37+ (effectively
origin/main until the next upstream tag): ShoreGuard no longer
speaks the pre-ObjectMeta Provider/Sandbox shape and will fail at
the first list/get against any older gateway. There is no compat shim.
Added
- WS37.3: Sandbox-provider attach lifecycle. Three new RPCs from upstream PR #1242 (ListSandboxProviders, AttachSandboxProvider, DetachSandboxProvider), fully wired through shoreguard.client.sandboxes.SandboxManager.{list_providers, attach_provider, detach_provider}, the REST surface (GET, POST, DELETE under /sandboxes/{name}/providers) and a new "Attached Providers" card on the sandbox-detail page with attach picker and detach badges.
- WS37.4: Provider-profile registry. Five new RPCs from upstream PR #1170 wrapped in a new shoreguard.client.provider_profiles.ProviderProfileManager: ListProviderProfiles, GetProviderProfile, ImportProviderProfiles, LintProviderProfiles, DeleteProviderProfile. New REST router under /api/gateways/{gw}/provider-profiles (list, get, lint, import, delete). New UI page at /gateways/{gw}/provider-profiles with a table view and a lint-then-apply import dialog modeled on the GitOps apply flow. Gateway-detail page gained a "Profiles" quick-action button.
- WS37.5: GraphQL L7 inspection. Endpoint-rule editor's protocol dropdown now offers graphql (PR #1083). New endpoint-level fields (persisted_queries, graphql_max_body_bytes, path glob) and per-rule fields (operation_type, operation_name, fields) are rendered conditionally when the protocol is GraphQL. Wire mapping added in shoreguard.client._converters and reverse projection in shoreguard.client.policies._network_rule_to_dict.
- WS37.6: Two new gateway settings keys registered upstream and smoke-tested in ShoreGuard:
  - providers_v2_enabled (gateway-level opt-in for the provider profile policy composition surface).
  - agent_policy_proposals_enabled (sandbox-level opt-in for the agent-driven policy proposal surface).
  Both flow through the existing generic Settings whitelist (UI editor is data-driven and discovers new keys automatically); only test coverage was added for drift protection, as in v0.34.2. See docs/reference/gateway-settings.md for the full registered-keys table.
Changed
- WS37.2: Provider wire-schema hard cut. Upstream relocated Provider.id and Provider.name into a nested ObjectMeta message with shifted field numbers (id was 1, became metadata.id = 1; type was 3, is now 2; etc.). ShoreGuard's _provider_to_dict and Provider-construction sites in ProviderManager.{create, update} were rewritten to read/write via provider.metadata.{id, name, created_at_ms, labels}. The ProviderResponse Pydantic schema gained explicit id, created_at_ms and labels fields. Tests across tests/test_client_providers.py were migrated.
- Sandbox wire-schema hard cut. Same ObjectMeta migration also applied to Sandbox: sandbox.id, sandbox.name, sandbox.created_at_ms now flow through sandbox.metadata. The legacy namespace field was removed upstream and dropped from the ShoreGuard projection. Bulk-rewrote the construction call sites in tests/test_client_sandboxes.py, tests/test_client_resilience.py, and tests/test_m28_metrics.py.
- Surface-coverage doc updated. docs/reference/surface-coverage.md reflects the new totals: 42 upstream RPCs, 38 client-consumed, 137 REST routes, 77 UI apiFetch calls.
Notes
- Agent-driven policy MVP RPCs from upstream PR #1151 (SubmitPolicyAnalysis, GetDraftPolicy, *DraftChunk*, GetDraftHistory) were already wired through shoreguard.client.policies and shoreguard.api.routes.policies during the M36 sync; this release verifies the surface remains consistent against the new proto.
[0.34.2] - 2026-04-28
Fixed
- Gateway OCSF logging toggle was silently broken. The gateway-detail Observability switch and its REST/CLI counterparts wrote setting key ocsf_logging_enabled, but OpenShell crates/openshell-core/src/settings.rs registers the key as ocsf_json_enabled. Gateways rejected the unknown key with InvalidArgument, so the toggle never took effect. Renamed all read/write call sites to use ocsf_json_enabled (frontend/js/gateway.js, tests/test_rbac.py, tests/test_api_gateway_routes.py). Existing gateway state is unaffected; the next save through the toggle now writes the correct registered key.
Changed
- The CI audit job and the pre-push pip-audit hook now both ignore CVE-2026-3219 (pip 26.0.1, no fix version published yet). The CVE affects pip itself (bundled into the dev environment by uv), not Shoreguard's runtime dependencies. Whitelisted with an inline comment in both call sites so it's removed automatically once pip ships a patched release. All other CVEs continue to fail the build.
[0.34.1] - 2026-04-23
Changed
- Upstream-sync confirmation (post-M35). Re-verified parity against NVIDIA/OpenShell v0.0.36 and origin/main (through #943). proto/*.proto is byte-identical v0.0.35 → v0.0.36 → origin/main, so ShoreGuard's generated stubs remain wire-parity without regeneration. Documentation pin bumped from v0.0.35 to v0.0.36 across installation.md, production-k8s.md, and sbom.md; any gateway ≥ v0.0.30 remains wire-compatible for existing flows.
Upstream absorbed (server-side, no ShoreGuard code change)
- Gateway-owned VM readiness + VM compute driver E2E (#901): driver-vm rework; supervisor-gateway readiness handshake moves server-side.
- Optional gateway-native Prometheus endpoint (#920): new --metrics-port flag exposes openshell_server_grpc_requests_total, openshell_server_grpc_request_duration_seconds, openshell_server_http_requests_total, and openshell_server_http_request_duration_seconds. Helm chart adds service.metricsPort (default 9090; set to 0 to disable). Complementary to ShoreGuard's control-plane metrics; docs/integrations/prometheus.md now points operators at scraping both jobs for end-to-end request-path visibility.
- CI/Helm hygiene: #943 (helm ClusterRole cleanup), #938 (E2E Gate posting), #942, #928, #929, #926 (CI/toolchain bumps), #931 (driver-vm cross-compile preflight).
Upstream watchlist (unmerged at time of release)
- drew/creating-a-docker-driver-like-the-vm-driver: bundled Docker compute driver; same compute-driver-axis question as v0.33 (#904 Podman).
- drew/containers-in-virtual-machines: libkrun OCI containers; ShoreGuard-irrelevant wire-wise.
- vcauxbrisebo/vm-gpu-support, feat/wsl-cdi-spec-watcher: server-only, no ShoreGuard delta expected.
[0.34.0] - 2026-04-23
Added
- M33 REST/UI Coverage Sweep. Every OpenShell capability ShoreGuard consumes is now reachable end-to-end (Client, REST, and UI):
  - REST routes for sandbox inspection. New GET /api/gateways/{gw}/sandboxes/{name}/config returns the full stored sandbox configuration via GetSandboxConfig, and GET …/provider-env surfaces the gateway-injected provider environment with values redacted to [REDACTED]. Both were implemented on the client since v0.30 but previously had no REST caller.
  - Endpoint allow_encoded_slash UI toggle. Rule editor now exposes the M32 GitLab-style-path switch per endpoint; no more YAML-only configuration.
  - GitOps merge-mode apply dialog. New YAML apply section on the sandbox policy page with a replace | merge radio group, optional expected_version guard, dry-run preview, and guided toast when the gateway rejects non-network edits with merge_unsupported.
  - M18 drift indicator. Policy pin panel now renders Pinned vX and Active vY side-by-side and flags drift when the supervisor has not yet reloaded the pinned revision, using the current_policy_version field exposed in v0.33.
  - Advanced gateway settings editor. Collapsible key/value panel under each gateway's detail view with add/edit/delete operations against PUT/DELETE /api/gateways/{name}/settings/{key} and a warning banner that validation is gateway-side only.
  - Coverage Matrix CI gate. New docs/reference/surface-coverage.md cross-tables every OpenShell RPC against Client / REST / UI reachability. scripts/check_coverage.py enforces the matrix; the coverage-matrix CI job is required and will fail the build when a new RPC or route lands without being plumbed through all three layers (with an explicit allowlist for the supervisor-only RPCs PushSandboxLogs, ReportPolicyStatus, ConnectSupervisor, RelayStream).
- M34: SandboxSpec.log_level. The last unmapped upstream SandboxSpec field. Client sandboxes.create() accepts log_level=""|"debug"|"info"|"warn"|"error", the REST POST /sandboxes schema mirrors the same enum, and the creation wizard surfaces a Log Level select defaulting to "Gateway default" (empty string, upstream-conformant).
- M35: Proactive mTLS Cert Rotation. New shoreguard.services.cert_rotation.CertRotationService wires up the reload_credentials() hook that has existed since v0.31 but had no scheduler. A background task polls every registered gateway each hour (SHOREGUARD_CERT_ROTATION_POLL_INTERVAL_S); when a client cert's remaining validity drops below the threshold (SHOREGUARD_CERT_ROTATION_THRESHOLD_DAYS, default 7 days), the registry bytes are re-read and the gRPC channel is rebuilt. Retries use exponential backoff (SHOREGUARD_CERT_ROTATION_MAX_RETRIES, default 3); give-ups fire the gateway.cert_rotation_failed webhook. Rotation is idempotent relative to the registry bytes, so multi-replica deployments need no advisory lock. Successful rotations emit a gateway.cert_rotated audit entry with before/after validity and the attempt count. New metric sg_gateway_cert_rotations_total{gateway,outcome} with outcome values success, failure, skipped_not_due, skipped_no_cert. New runbook at docs/operations/cert-rotation.md.
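The per-gateway decision made on each poll can be sketched as a small pure function. The function name and the "rotate" return value are hypothetical; the two skip outcomes reuse the metric label values listed above, and the 7-day default comes from SHOREGUARD_CERT_ROTATION_THRESHOLD_DAYS.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def rotation_decision(not_after: Optional[datetime], now: datetime,
                      threshold_days: int = 7) -> str:
    """Classify one gateway on a poll tick."""
    if not_after is None:
        return "skipped_no_cert"       # gateway registered without a client cert
    if not_after - now >= timedelta(days=threshold_days):
        return "skipped_not_due"       # still enough validity left
    return "rotate"                    # below the threshold: re-read bytes, rebuild channel


now = datetime(2026, 4, 23, tzinfo=timezone.utc)
soon = rotation_decision(now + timedelta(days=3), now)
later = rotation_decision(now + timedelta(days=30), now)
```

Because the decision depends only on the certificate's expiry and the clock, several replicas reaching the same verdict at once is harmless, which fits the idempotency note above.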
Changed
- Configuration reference now documents the four SHOREGUARD_CERT_ROTATION_* settings in a new "Cert Rotation" section.
- Surface Coverage linked from the docs nav under Reference.
Not in scope
- Supervisor session relay RPCs (ConnectSupervisor, RelayStream) remain generated-only; no Client / REST / UI surface.
- A full gateway-settings table discovered from upstream remains a follow-up: the Advanced Settings editor is a generic key/value component hidden behind a warn banner, not a typed form.
- current_policy_version drift indicator only covers the pin panel; cross-sandbox fleet-wide drift dashboards are deferred.
[0.33.0] - 2026-04-23
Added
- M32 Upstream-Sync + GitOps Incremental Merges. Pin bumped to NVIDIA/OpenShell v0.0.35 (from v0.0.32). Three upstream tags and six main commits of delta, absorbed in six commits:
  - Stub regen (chore(proto)). Protobuf stubs regenerated against v0.0.35; sandbox.proto restored to byte-parity with upstream. The Sandbox / SandboxSpec / SandboxTemplate message family migrated upstream from datamodel.proto to openshell.proto; call sites updated. compute_driver.proto is now skipped by the regen script because the supervisor↔gateway surface is not consumed by ShoreGuard as a control-plane.
  - NetworkEndpoint.allow_encoded_slash (upstream #826). New field 11 flows through _dict_to_network_rule, the listing converter, and the Z3 prover. Endpoints fronting GitLab-style upstreams can now preserve %2F in paths instead of rejecting. Default stays False, upstream-conformant.
  - L7 path canonicalization (upstream #878). Closes a soundness gap in the Z3 prover's L7 reasoning. A new canonicalize_request_path() mirrors the upstream Rust canonicalizer (percent-decode, dot-segment resolution, slash collapse, ;params strip) so prover counterexamples live in the same path universe as the gateway's enforcement path.
  - SSH session response charset contract (upstream #876). Every field of CreateSshSessionResponse is now validated against the documented charset before ShoreGuard surfaces the response. Defence-in-depth: a compromised or misconfigured gateway cannot push ProxyCommand-injection metacharacters into the REST response.
  - current_policy_version on sandbox endpoints. The field was already extracted client-side but not declared on SandboxResponse; REST consumers and OpenAPI tooling now see it and the M18 policy-pinning UI can render "configured vs. active".
  - GitOps incremental merge mode (upstream #860, requires gateway ≥ v0.0.33). POST /policy/apply accepts mode: "replace" | "merge". In merge mode, ShoreGuard diffs current against target policy, emits PolicyMergeOperations (remove_rule before add_rule, safety-ordered), and sends them through the gateway's UpdateConfigRequest.merge_operations surface. UnsupportedMergeError surfaces as HTTP 400 for non-network edits (filesystem / process / landlock); callers retry with mode=replace. CLI: shoreguard policy apply --mode merge. New doc page docs/guides/gitops-merge.md.
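The safety ordering of merge operations (every remove_rule before any add_rule) can be sketched over plain dictionaries. This is an illustration only: the function name is hypothetical, rules are modelled as name-to-definition mappings, and representing a changed rule as a remove followed by an add is an assumption.

```python
def merge_operations(current: dict, target: dict) -> list:
    """Diff two rule sets into (op, rule_name) pairs, removals first."""
    removes = [
        ("remove_rule", name)
        for name in sorted(current)
        if name not in target or current[name] != target[name]
    ]
    adds = [
        ("add_rule", name)
        for name in sorted(target)
        if name not in current or current[name] != target[name]
    ]
    # Removals go first so a stale rule never coexists with its replacement.
    return removes + adds


ops = merge_operations(
    {"pypi": {"host": "pypi.org"}, "npm": {"host": "registry.npmjs.org"}},
    {"npm": {"host": "registry.npmjs.org", "port": 443}, "crates": {"host": "crates.io"}},
)
```

Unchanged rules produce no operations at all, which is what makes the merge incremental rather than a full replace.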
Changed
- Documentation pin references bumped from v0.0.32 to v0.0.35 across installation.md, production-k8s.md, sbom.md, and m8-demo.md. New compatibility matrix in installation.md.
Upstream absorbed (server-side, no ShoreGuard code change)
These upstream changes land on the gateway and pass through our existing surfaces without modification:
- Supervisor-initiated SSH connect / exec over gRPC-multiplexed relay (#867). ConnectSupervisor / RelayStream RPCs and 12 new messages exist only as generated Python; ShoreGuard does not consume them.
- Seccomp / procfs hardening (#844, #869, #891).
- Dedicated health-check listener on a separate unauthenticated port (#903, #915).
- tower-http TraceLayer request-level logging (#895).
- install-vm CLI for the gateway binary (#887).
- Dedicated Kubernetes client without read-timeout for watches (#907).
- Read-only baseline path preservation (#910).
- SSRF host-alias resolution (#912).
- Configurable image-transfer timeout (#914).
- Sandbox git clone trusts the internal CA via GIT_SSL_CAINFO injection (#918).
- E2E pipeline on external forks via copy-pr-bot flow (#922).
Upstream watchlist (unmerged at time of release)
- pull-request/904: Podman compute driver. Clarify whether this exposes a new gateway runtime tag (extending the M30 KNOWN_RUNTIMES set) or a sandbox-side compute driver (orthogonal axis) before the next sync.
- drew/creating-a-docker-driver-like-the-vm-driver: bundled Docker compute driver. Same clarification needed.
- drew/containers-in-virtual-machines: libkrun OCI containers; ShoreGuard-irrelevant.
- vcauxbrisebo/vm-gpu-support, feat/wsl-cdi-spec-watcher: server only, tracked until merge, no ShoreGuard delta expected.
[0.32.2] - 2026-04-19
Changed
- Upstream-sync confirmation (post-M29). Re-verified parity against NVIDIA/OpenShell v0.0.32 and origin/main@e39bb380. All four upstream .proto files (sandbox.proto, inference.proto, datamodel.proto, openshell.proto) are byte-identical across v0.0.30 → v0.0.32 → origin/main, so ShoreGuard's generated stubs remain wire-parity without regeneration. Documentation references bumped from v0.0.26 to v0.0.32 as the recommended gateway pin; any gateway ≥ v0.0.30 is wire-compatible.
- Routed-inference docs now describe the upstream header sanitization behaviour added in OpenShell PR NVIDIA/OpenShell#826: the gateway's router forwards only a common-header set (content-type, accept, accept-encoding, user-agent), the per-route default_headers, and a per-provider passthrough list (anthropic-version, anthropic-beta, openai-organization, x-model-id). ShoreGuard itself does not inject inference-path HTTP headers, so no code change is required; OpenTelemetry traceparent propagates on gRPC metadata, not on the forwarded HTTP surface.
- Installation guide now mentions the standalone openshell-gateway binary upstream began publishing in NVIDIA/OpenShell#853 as an alternative to the full cluster image.
No behavior changes at runtime, no schema changes, no new dependencies.
Upstream watchlist (not pulled)
Unmerged upstream branches tracked for a future milestone:
- feat/os-81-incremental-policy-merge: adds incremental sandbox policy updates (proto/openshell.proto +45). Relevant to M23 GitOps once tagged.
- feat/supervisor-session-grpc-data + feat/supervisor-session-relay: re-platforms sandbox SSH connect + exec onto an HTTP/2 relay (proto/compute_driver.proto -25, proto/openshell.proto +120). Wire-break candidate; will require a stub regen + parity pass on the scale of M29 once released.
- fix/l7-path-canonicalization: L7 path canonicalization fix; may shift policy-prover counterexamples for path-prefix rules.
- tmutch/include-runtime-policy-revision-sandbox-get-output: surfaces runtime policy revision on sandbox get; pairs with M18 policy pinning UI once merged.
[0.32.1] - 2026-04-16
Fixed
- Release pipeline Trivy scan. The v0.31.0 and v0.32.0 Release workflows both failed at the "Scan image for CRITICAL/HIGH CVEs" step because the computed image reference carried the original repository casing (ghcr.io/FloHofstetter/shoreguard@sha256:…) and Trivy rejects uppercase OCI references with "could not parse reference". The docker/metadata and docker/build-push actions lowercase internally so the push itself worked, but the Trivy step read github.repository raw. A new workflow step lowercases the name into a step output and the Trivy image-ref now reads from it. No code change; no wire-format change. This release exists purely to get a passing Release pipeline on tag so the Docker image, PyPI package, and GitHub Release body actually ship for the 0.32.x line.
- Auto-tag version parser. The auto-tag workflow's version extractor used (.+)$ which swallowed any trailing commit-subject text after the version, so a subject like "release: ShoreGuard v0.32.0 — M29 + M30" produced an EXPECTED="0.32.0 — M29 + M30" and failed the pyproject.toml equality check. Bug was latent on v0.31.0 already. The capture group now stops at the first whitespace.
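The difference between the two capture groups is easy to reproduce. The sketch below uses Python's re module as a stand-in for whatever the workflow actually runs, and the leading "v" anchor is an assumption; only the (.+)$ group and the stop-at-whitespace behaviour come from the entry above.

```python
import re

# Commit subject from the failing run (\u2014 is the dash in the original subject).
subject = "release: ShoreGuard v0.32.0 \u2014 M29 + M30"

# Old behaviour: a greedy group anchored at end-of-line swallows the trailing text.
greedy = re.search(r"v(.+)$", subject).group(1)

# Fixed behaviour: the group stops at the first whitespace.
fixed = re.search(r"v(\S+)", subject).group(1)
```

Only the second result can ever equal the version string in pyproject.toml.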
[0.32.0] - 2026-04-16
Added
- M29 Upstream-Sync Hardening. Parity pass against NVIDIA/OpenShell v0.0.27–v0.0.30 for the parts of the upstream delta that apply to a control-plane:
  - Network-policy deny rules (upstream #822). Regenerated sandbox.proto from upstream origin/main, which adds the L7DenyRule message and NetworkEndpoint.deny_rules field. Deny rules flow through the dict ↔ proto converter and the Z3 prover encoding as NOT(any_deny_matches) AND-ed over the allow-clause disjunction, so a deny rule that overlaps an allow rule makes the otherwise matching path UNSAT in counterexample search (deny wins).
  - TLD-level host wildcard rejection (upstream #791). Registering a network policy with *.com / *.io / *.local now raises PolicyValidationError at policy-write time and surfaces as HTTP 400. Multi-label suffixes like *.example.com stay allowed.
  - Symlink-aware binary paths (upstream #774). When a network policy declares a binary path that exists locally and is a symlink, the resolved realpath is persisted in the proto instead of the symlink. Remote-gateway paths (the common case) pass through unchanged.
  - SSE error framing hardening (upstream #842). A new _format_sse_event helper strips stray control characters from the data: line before emission so a pre-escaped payload cannot smuggle a premature SSE-framing terminator.
  - Always-blocked IP list (upstream #814). New SHOREGUARD_ALWAYS_BLOCKED_IPS setting parses operator-supplied CIDRs at startup (invalid entries hard-fail boot via a Pydantic field_validator) and feeds is_private_ip() so extra ranges (metadata VIPs, internal-management subnets) can be hard-blocked without a code change.
- M30 libkrun microVM gateway awareness. Upstream PR #611 added a third gateway runtime alongside Docker and Kubernetes; ShoreGuard now models gateway runtimes via a strict metadata convention:
  - New shoreguard.gateway_runtime module defines the closed KNOWN_RUNTIMES set (docker, kubernetes, libkrun) plus get_runtime() / validate_runtime() helpers. The validator rejects typos (libKrun, krun) and normalises mixed-case inputs to the canonical lowercase spelling at registration time.
  - RegisterGatewayRequest has a Pydantic field_validator on metadata that picks up metadata.runtime, validates it, and persists the normalised value. Unknown runtimes surface as HTTP 422 with a precise field pointer.
  - GatewayResponse gets a top-level runtime: str | None field, populated by GatewayRegistry._to_dict so GET /gateway/list and GET /gateway/{name}/info expose the same surface.
  - GET /gateway/list?runtime=libkrun filters gateways by runtime tag. Unknown runtime filters hard-fail with HTTP 400 instead of silently returning an empty list.
  - gateway.register audit-log entries include the resolved runtime alongside endpoint, auth mode, and labels.
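The TLD-wildcard rule from the M29 list can be sketched in a few lines. The exception class below is a local stand-in that borrows the name PolicyValidationError, and the function name is hypothetical; the real check runs inside _dict_to_network_rule.

```python
class PolicyValidationError(ValueError):
    """Local stand-in for the project's validation error."""


def reject_tld_wildcard(host: str) -> None:
    """Raise when a wildcard covers a whole top-level domain, e.g. '*.com'."""
    if not host.startswith("*."):
        return  # exact hostnames are not affected
    suffix = host[2:].strip(".")
    # A single remaining label means the wildcard spans an entire TLD.
    if "." not in suffix:
        raise PolicyValidationError(f"wildcard {host!r} is too broad")


reject_tld_wildcard("*.example.com")   # multi-label suffix: accepted
reject_tld_wildcard("api.example.com") # no wildcard: accepted
```

Running the check at policy-write time means an overly broad rule is rejected before it ever reaches a gateway.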
Changed
- sandbox.proto is now byte-identical to NVIDIA/OpenShell@origin/main, so ShoreGuard is at wire-level parity with any OpenShell gateway ≥ v0.0.30, including for network-policy deny rules over the gRPC channel.
- _dict_to_network_rule is now the single chokepoint for network policy validation: it runs TLD-wildcard rejection and symlink resolution before the proto is built, so every caller (YAML import, REST CRUD, policy-apply proposal) picks up the same guarantees.
[0.31.0] - 2026-04-14
Added
- M28 Observability (complete). Three independent pillars land in this release; all three are off by default and opt in via env vars:
  - Prometheus /metrics: M28 gRPC call counters, retry counters, sandbox phase-transition counters, boot-hook run counters and duration histograms, and a gateway client-cert expiry gauge. The /metrics endpoint honours the normal auth gate and can be flipped public via SHOREGUARD_METRICS_PUBLIC.
  - OpenTelemetry trace context through the routed-inference path. FastAPI + gRPC auto-instrumentation so a W3C traceparent header propagates end-to-end from an incoming HTTP request through every outgoing gRPC call to an OpenShell gateway. Console exporter by default; set SHOREGUARD_TRACING_OTLP_ENDPOINT=... for OTLP/HTTP. New SHOREGUARD_TRACING_* settings. Disabled unless SHOREGUARD_TRACING_ENABLED=true.
  - Structured audit-log export lanes. A new AuditExporter service fans every successfully-written audit entry across three lanes, each independently togglable: stdout-JSON (one JSON line per entry for Loki/Vector), syslog via SysLogHandler (RFC 5424 framing, JSON body, for SIEM receivers), and webhook via the existing fire_webhook() pipeline as audit.entry events. Lane errors are isolated: a broken receiver in one lane never prevents siblings from firing, and a lane exception never propagates into the audit write path. New SHOREGUARD_AUDIT_EXPORT_* settings.
- mTLS hardening. Client-cert validation is now eager at gateway registration time (not lazy at first use), and the registered cert is rotated proactively before expiry. Expiry is also surfaced on /metrics via sg_gateway_cert_expiry_seconds (see above).
- gRPC retry + deadline resilience on the sandbox path. Sandbox create/exec/list now retry through a shared resilience layer with per-op deadlines, exponential backoff, and jitter. Retry and final status are surfaced on /metrics.
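Exponential backoff with jitter, as used by the sandbox resilience layer, follows a well-known pattern that can be sketched generically. Everything named here is an assumption for illustration: the helper name, the delay constants, the jitter range, and the use of ConnectionError in place of retryable gRPC status codes. Per-op deadlines are left out for brevity.

```python
import random
import time


def call_with_retry(fn, attempts=3, base_s=0.1, max_s=2.0,
                    sleep=time.sleep, rng=random.random):
    """Call fn, retrying transient failures with jittered exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # retries exhausted: surface the final failure
            delay = min(max_s, base_s * 2 ** attempt)
            # Jitter spreads retries over [50%, 100%) of the computed delay.
            sleep(delay * (0.5 + rng() / 2))
```

Injecting sleep and rng keeps the helper deterministic under test; callers in production would leave the defaults in place.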
Changed
- New dependencies: opentelemetry-api, opentelemetry-sdk, opentelemetry-instrumentation-fastapi, opentelemetry-instrumentation-grpc, opentelemetry-exporter-otlp-proto-http.
[0.30.3] - 2026-04-13
Changed
- Docstring cleanup across the code surface. Every module, class, and method docstring in shoreguard/ that was either a trivial one-liner or dominated by sprint-tracking noise (milestone identifiers, release-version timestamps used to justify a design choice, CHANGELOG-style prose) has been rewritten into a proper heading-plus-explanation shape that describes what the code does, why it looks the way it does, and where the non-obvious delegation boundaries are. That kind of context belongs next to the code; sprint history belongs in this file and in the roadmap.
- Docstring cleanup for tests, scripts, and frontend. Same pass applied to tests/ module and class docstrings, scripts/ demo-walker module/function docstrings plus banner and API description strings, and frontend/js/, frontend/css/, and frontend/templates/ JSDoc headers and HTML/Jinja section comments. Sprint identifiers no longer appear in any source tree. Demo script filenames (scripts/m*_demo.py) stay as-is; they remain historical markers referenced from runbooks. No behavior, no test assertions, no Alpine bindings or selectors touched.
- README. Bumped the cosign verify example from the stale 0.27.0 image tag to 0.30.2.
- CI. Cleaned a stale milestone reference out of the m12-fixture-lint job comment. The job name itself is preserved so branch-protection status checks don't break.
Fixed¶
- Test rate-limiter leak under serial pytest. The `_disable_auth` autouse fixture in `tests/conftest.py` reset only the login rate limiter, not the global or write limiters. Under `pytest-xdist` each worker has its own Python process, so every worker starts with fresh limiters and the leak was invisible: the local `-n auto` runs stayed green. CI runs the test job serially, so all tests share one process, the global limiter accumulates request counts, and tests late in collection order started seeing HTTP 429 on reads that should have been 200/404. The cascade failure looked like two distinct problems in the CI log (14 tests failing with 429 plus one with `KeyError: 'total'`), but the `KeyError` is the same bug: the assertion read `data["total"]` without checking the status code, so a 429 body without that field triggered it. Fix: call `reset_limiters()` instead of `reset_login_limiter()`. Verified with a full serial pytest run: 2933 passed, 1 skipped.
No behavior changes at runtime, no schema changes, no new dependencies.
[0.30.2] — 2026-04-13¶
Added (M24)¶
- Terraform Provider v0.30.0. Released separately at FloHofstetter/terraform-provider-shoreguard. Provider versioning now mirrors the ShoreGuard server (jump from v0.1.0 straight to v0.30.0) so operators can pin provider + server together. New resources:
  - `shoreguard_group`, `shoreguard_group_membership`, `shoreguard_group_gateway_role`: RBAC as code.
  - `shoreguard_approval_workflow`: M19 quorum + escalation config.
  - `shoreguard_policy_pin`: M18 pin (locks active policy version, server returns HTTP 423 on any subsequent edit or approval while the pin is active).
  - `shoreguard_sandbox_boot_hook`: M22 pre/post-create hooks.
- Breaking change. `shoreguard_sandbox_policy` has been removed. Policy content belongs in the M23 GitOps flow (`shoreguard policy export/apply`), not in Terraform state, which would drift on every denial flow. Migration snippet in the provider CHANGELOG: `terraform state rm shoreguard_sandbox_policy.<name>` followed by `shoreguard policy export …`.
- Build. Thin REST wrapper built on `terraform-plugin-framework` v1.19 (Go 1.25). Acceptance tests are skeletons that skip without `SHOREGUARD_BASE_URL` + `GATEWAY_NAME`.
Added (M23)¶
- GitOps Policy Sync. Declarative YAML policy management for sandboxes, driven from CI/CD. Two new endpoints under `/api/gateways/{gw}/sandboxes/{name}/policy/`:
  - `GET /export` returns a deterministic YAML document with a metadata block (`gateway`, `sandbox`, `version`, `policy_hash`, `exported_at`) plus the full policy. Round-trip stable: re-export of a parsed export yields the same `policy` block.
  - `POST /apply` accepts `{yaml, dry_run, expected_version}`. Status codes: `200 up_to_date`, `200 dry_run`, `200 applied`, `202 vote_recorded`, `409` version mismatch, `423` pinned, `400` malformed YAML.
- Optimistic locking. `expected_version` falls back to `metadata.policy_hash` from the YAML document. Mismatch → HTTP 409 with the live `current_hash` in the body so CI can refetch + retry.
- M18 pin guard reuse. Apply (and dry-run) on a pinned sandbox returns HTTP 423, so CI sees the pin instead of silently planning a change that cannot apply. Export remains allowed (read-only).
- M19 workflow gating. When an active multi-stage approval workflow exists for the sandbox, the first apply records one approve-vote on a synthetic chunk id `policy.apply:<sha16>` and returns 202. Subsequent votes (same YAML body → same chunk id) accumulate until quorum, at which point the upstream `UpdateConfig` fires once. New table `policy_apply_proposals` (Alembic 017) caches the pending YAML body so the second voter does not need to resubmit bytes.
- `shoreguard policy` CLI. Three Typer subcommands wrapping the new endpoints: `export` (stdout/file), `diff` (dry-run, exits 1 on drift), `apply` (writes, exits 1 if a vote was recorded but quorum not yet met, exits 2 on errors). Reads `SHOREGUARD_URL` + `SHOREGUARD_TOKEN` from env or `--url`/`--token` flags.
- Drift detection (optional). New `DriftDetectionService` background loop, off by default behind `SHOREGUARD_DRIFT_DETECTION_ENABLED`. Polls every registered sandbox every interval and fires a `policy.drift_detected` webhook on any hash change between scans (someone edited the policy outside the GitOps pipeline). The first scan after restart bootstraps the snapshot silently. Failures per sandbox are logged + swallowed, so one broken sandbox does not kill the loop.
- New webhook events. `policy.applied` (now also fires from apply), `policy.drift_detected`. Existing `approval.vote_cast` / `approval.quorum_met` are reused when apply hits the quorum path, with a `scope: policy.apply` field added to the payload.
- New audit events. `policy.exported`, `policy.apply.dry_run`, `policy.apply.noop`, `policy.apply.voted`, `policy.applied` (apply variant), `policy.drift_detected`.
- Demo + runbook. `scripts/m23_demo.py` runs an 8-phase walk against a live local stack (export → no-op → drift → write → vote → quorum → pin → drift hint). Detailed runbook at `scripts/m23-gitops.md`.
- Tests. 49 new tests across `test_policy_diff_service.py`, `test_policy_yaml_service.py`, `test_policy_apply_proposal_service.py`, `test_drift_detection_service.py`, `test_policy_gitops_api.py`, and `test_cli_policy.py`.
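The workflow-gating entry above relies on one property: the same YAML body always maps to the same synthetic chunk id, so repeated applies count as votes on one proposal. The sketch below shows one way to get that property. It assumes `<sha16>` means the first 16 hex characters of a SHA-256 digest over the UTF-8 body, which the changelog does not confirm; `apply_chunk_id` is a hypothetical helper name.

```python
import hashlib


def apply_chunk_id(yaml_body: str) -> str:
    """Derive a deterministic chunk id from the submitted YAML body.

    Assumption: "<sha16>" is the first 16 hex chars of SHA-256 over the
    UTF-8 bytes. The real derivation may differ.
    """
    digest = hashlib.sha256(yaml_body.encode("utf-8")).hexdigest()
    return f"policy.apply:{digest[:16]}"


if __name__ == "__main__":
    first = apply_chunk_id("policy:\n  network: []\n")
    second = apply_chunk_id("policy:\n  network: []\n")
    print(first == second)  # identical bodies share one proposal
```

Any byte-level difference in the body (even whitespace) yields a different id under this scheme, which is why a deterministic export format matters for the voting path.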
Added (M22)¶
- Sandbox Boot Hooks. Operators can attach pre/post-create hooks to a sandbox. Pre-create hooks act as ShoreGuard-side validation gates that run before `CreateSandbox` reaches the gateway: their command executes via `subprocess.run` in the ShoreGuard process with a whitelisted environment (`SG_SANDBOX_NAME`, `SG_SANDBOX_IMAGE`, `SG_SANDBOX_POLICY_ID`, plus user-defined env). A failure raises `BootHookError` and aborts sandbox creation. Post-create hooks run after `CreateSandbox` succeeds, executing inside the new sandbox via the existing `ExecSandbox` RPC; they are intended for warm-up tasks (apt update, telemetry init). The execution surface is intentionally ShoreGuard-side because OpenShell v0.0.26 has no native hook RPC; once upstream ships one, `BootHookService` will detect it and delegate.
- Storage. New table `sandbox_boot_hooks` (Alembic 016) with `phase`, `command`, `workdir`, `env_json`, `timeout_seconds`, `order`, `enabled`, `continue_on_failure`, plus run state (`last_run_at`, `last_status`, `last_output` truncated to 4 KiB).
- REST API under `/api/gateways/{gw}/sandboxes/{name}/hooks`: `GET` (list, viewer), `GET /{id}` (single), `POST` (admin), `PUT /{id}` (admin), `DELETE /{id}` (admin), `POST /reorder` (admin), `POST /{id}/run` (operator, manual trigger). Audit events: `boot_hook.created`, `boot_hook.updated`, `boot_hook.deleted`, `boot_hook.reordered`, `boot_hook.manual_run`.
- `SandboxService.create()` integration. When the boot hook service is wired in, `create()` runs the pre-create gate before `CreateSandbox` and the post-create chain after. The new admin-only `skip_hooks` flag on `POST .../sandboxes` bypasses both phases for recovery scenarios. Failures from continue-on-failure hooks are surfaced in the response under `boot_hooks.post_create` rather than rolled back.
- Frontend. New "Hooks" tab on the sandbox detail page (`frontend/templates/pages/sandbox_hooks.html` + `frontend/js/boot_hooks.js`) with separate Pre-create / Post-create sections, in-place toggle, reorder buttons, an editor modal (command, workdir, env KEY=VALUE, timeout, continue_on_failure), and a one-click "Run" button that surfaces the captured output inline.
- MicroVM Gateway Discovery. ShoreGuard can now auto-register OpenShell gateways announced via DNS SRV records (`_openshell._tcp.<domain>`). Discovery runs both as a manual trigger (`POST /api/gateway/discover`, operator+) and as a configurable background loop in the application lifespan (analogous to the existing `_health_monitor`). Discovered endpoints flow through the same `_validate_endpoint_format` guard as manual registration, so the `*.svc.cluster.local` whitelist still applies and other private IPs are still rejected unless `local_mode` is enabled.
- New dependency. `dnspython >= 2.6` (MIT licensed).
- Settings. New `DiscoverySettings` block (`SHOREGUARD_DISCOVERY_*`): `enabled`, `domains`, `interval_seconds`, `default_scheme`, `auto_register`, `resolver_timeout_seconds`. Off by default.
- Service. `shoreguard/services/discovery.py` exposes `discover_domain`, `discover_all`, `auto_register`, `run_once`, and `status`. Names are derived from the SRV target host (sanitised, max 253 chars), with the port appended when not 443/30051.
- REST API. `POST /api/gateway/discover` (operator+, optional `{domains: [...]}` override; audit-logged as `gateway.discovered`) and `GET /api/gateway/discovery/status` (viewer).
- Frontend. "Discover" button on the gateways list page that triggers `POST /api/gateway/discover`, surfaces the result counts in a dismissable banner + toast, and refreshes the table.
- Demo + tests. `scripts/m22_demo.py` walks the boot-hook + discovery flow end-to-end against a live ShoreGuard. New test files (`tests/test_boot_hooks_service.py`, `tests/test_api_boot_hooks_routes.py`, `tests/test_discovery_service.py`, `tests/test_api_discovery_routes.py`) add ~80 unit + integration tests; total suite remains green.
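The pre-create gate described in the boot-hooks entry above can be sketched with `subprocess.run` and an explicit environment. This is an illustration under stated assumptions, not the `BootHookService` implementation: `run_pre_create_hook`, the `sandbox` dict shape, and the character-based 4096 cut-off (standing in for the 4 KiB output limit) are hypothetical.

```python
import subprocess


class BootHookError(RuntimeError):
    """Raised when a pre-create hook fails (name taken from the entry above)."""


def run_pre_create_hook(command, sandbox, user_env=None, timeout_seconds=30):
    """Run one hook command with a whitelisted environment and return its output.

    `command` is an argv list; `sandbox` is a hypothetical dict with
    "name", "image", and "policy_id" keys.
    """
    # Passing env= replaces the parent environment entirely, so only these
    # variables (plus user-defined ones) are visible to the hook.
    env = {
        "SG_SANDBOX_NAME": sandbox["name"],
        "SG_SANDBOX_IMAGE": sandbox["image"],
        "SG_SANDBOX_POLICY_ID": sandbox["policy_id"],
        **(user_env or {}),
    }
    try:
        result = subprocess.run(
            command,
            env=env,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired as exc:
        raise BootHookError(f"hook timed out after {timeout_seconds}s") from exc

    # Approximates the 4 KiB cap on stored output by truncating characters.
    output = (result.stdout + result.stderr)[:4096]
    if result.returncode != 0:
        raise BootHookError(f"hook exited {result.returncode}: {output}")
    return output
```

Because the environment is replaced rather than extended, a hook that needs `PATH` lookups would have to receive `PATH` through the user-defined env in this sketch.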
Added (M21)¶
- SBOM / Supply-Chain Viewer. Operators can upload a CycloneDX JSON SBOM per sandbox (typically from CI) and browse components, licenses, and known vulnerabilities directly in the ShoreGuard UI. Vulnerabilities are read offline from the CycloneDX `vulnerabilities` array; there is no online NVD/OSV lookup.
- Storage. Two new tables (Alembic 015): `sbom_snapshots` (one per `(gateway, sandbox)`, holds metadata + the original CycloneDX payload) and `sbom_components` (denormalised rows for fast paginated search). A new upload replaces the prior snapshot; historical snapshots are deliberately out of scope.
- Service. `shoreguard/services/sbom.py` parses CycloneDX 1.5, aggregates per-component `vuln_count` + `max_severity` via `bom-ref` join, and exposes `ingest`, `get_snapshot`, `get_raw_json`, `delete_snapshot`, `search_components`, `get_vulnerabilities`. No new Python dependency: the parser is self-contained (~280 LoC).
- REST API under `/api/gateways/{gw}/sandboxes/{name}/sbom`: `POST` (upload, admin, max 10 MiB, audit-logged as `sbom.uploaded`), `GET` (snapshot metadata), `GET /components` (paginated search by `?search=` + `?severity=`, including `severity=CLEAN` for vuln-free components), `GET /vulnerabilities` (sorted highest-severity first), `GET /raw` (original payload as `application/vnd.cyclonedx+json`), `DELETE` (admin, audit-logged as `sbom.deleted`).
- Frontend. New `SBOM` tab in the sandbox sub-nav. Empty-state upload flow with cURL example for CI; component table with debounced search + severity-filter chips + pagination; vulnerabilities tab with severity-coloured CVE cards and reference links; admin-only Replace + Delete actions; raw download.
- Tests. 46 new service tests covering parser happy + failure paths, ingest replace + cascade, search/filter combinations, pagination clamping; 20 new API route tests covering upload/get/list/delete + edge cases. Full suite: 2805 passed, 1 skipped.
- Demo script. `scripts/m21_demo.py` walks 8 phases against a running ShoreGuard, using the bundled CycloneDX fixture `scripts/fixtures/sample_cyclonedx.json` (10 components, 2 CVEs).
- Ingestion model. v0.0.26 OpenShell exposes no SBOM RPC, so M21 is upload-only. CI is the right source anyway, since it knows which build is deploying. A future milestone can add a gateway-pull path once the upstream `feat/237-sbom-tooling` branch ships.
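The "`vuln_count` + `max_severity` via `bom-ref` join" step in the M21 service entry can be illustrated as follows. This is a sketch over the public CycloneDX JSON layout (`components[].bom-ref`, `vulnerabilities[].affects[].ref`, `vulnerabilities[].ratings[].severity`), not the code in `shoreguard/services/sbom.py`; `aggregate_vulns` and the severity ranking table are assumptions.

```python
# Rank for CycloneDX severity strings; unknown values sort below "none".
SEVERITY_RANK = {"none": 0, "info": 1, "low": 2, "medium": 3, "high": 4, "critical": 5}


def _rank(severity):
    return SEVERITY_RANK.get(severity, -1)


def aggregate_vulns(bom: dict) -> dict:
    """Return {bom-ref: {"name", "vuln_count", "max_severity"}} per component."""
    rows = {
        comp["bom-ref"]: {"name": comp.get("name", ""), "vuln_count": 0, "max_severity": None}
        for comp in bom.get("components", [])
        if "bom-ref" in comp
    }
    for vuln in bom.get("vulnerabilities", []):
        severities = [r.get("severity", "unknown") for r in vuln.get("ratings", [])]
        worst = max(severities, key=_rank, default=None)
        # A set avoids double-counting a vulnerability that lists a ref twice.
        for ref in {a.get("ref") for a in vuln.get("affects", [])}:
            row = rows.get(ref)
            if row is None:
                continue  # vulnerability points at a component not in this SBOM
            row["vuln_count"] += 1
            current = row["max_severity"]
            if worst is not None and (current is None or _rank(worst) > _rank(current)):
                row["max_severity"] = worst
    return rows
```

Components that end with `vuln_count == 0` correspond to what the REST API entry calls `severity=CLEAN`; this sketch leaves their `max_severity` as `None`.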
Added (M20)¶
- RPC Parity. `ShoreGuardClient` grew two new thin wrappers around OpenShell RPCs that were previously unreachable from the UI: a `get_inference_bundle()` that returns the fully resolved inference config (cluster default + route list + per-route credential state) and `SandboxManager.get_config()` / `get_provider_environment()` for inspecting live sandbox config + provider env projection. API keys are redacted to `has_api_key: bool` at the wrapper boundary so the UI can render a shield badge without handling secrets.
- New endpoint. `GET /api/gateways/{gw}/inference/bundle` (viewer, audit-logged). Surfaces the resolved bundle as a table on the gateway detail page with a per-route credential shield badge.
- Push-based policy-status wait. `approve_chunk` / `approve_all` no longer busy-poll via `_poll_policy_loaded`. New `PolicyStatusBroker` (`shoreguard/services/policy_status.py`) opens a short-lived `WatchSandbox` stream in a worker thread, sets an `asyncio.Event` on every `draft_policy_update`, confirms the new version via `GetSandboxPolicyStatus`, and falls back to a 2 s slow poll if the stream fails. The stream is always cancelled cleanly on success, timeout, or cancellation. The browser receives the same updates over the existing `/ws` channel via a new `sg:policy-status-update` DOM event, so any open page reacts without a hard refresh. The persistent-first sort toggle state is now persisted per browser via `localStorage`.
- Tests. 8 new tests covering broker happy-path, wake on draft update, timeout fallback, cancel cleanup, upstream watch failure, and `api_key → has_api_key` redaction.
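The `api_key → has_api_key` redaction mentioned above can be sketched as a recursive transform over the response payload. This is an illustration only; `redact_api_keys` is a hypothetical name and the real wrapper may operate on protobuf messages rather than plain dicts.

```python
def redact_api_keys(value):
    """Return a copy of value where every "api_key" becomes a boolean "has_api_key"."""
    if isinstance(value, dict):
        redacted = {}
        for key, item in value.items():
            if key == "api_key":
                # Keep only the fact that a key exists, never the secret itself.
                redacted["has_api_key"] = bool(item)
            else:
                redacted[key] = redact_api_keys(item)
        return redacted
    if isinstance(value, list):
        return [redact_api_keys(item) for item in value]
    return value


if __name__ == "__main__":
    bundle = {"routes": [{"name": "default", "api_key": "sk-secret"}, {"name": "local", "api_key": ""}]}
    print(redact_api_keys(bundle))
```

Doing the redaction at the wrapper boundary means no later layer (API route, template, WebSocket push) ever holds the secret value.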
Added (M19)¶
- Multi-Stage Approval Workflows (Quorum). Per-sandbox approval workflows let teams require multiple sign-offs before a policy change takes effect. New models `ApprovalWorkflow` + `ApprovalDecision` (Alembic 014) and `ApprovalWorkflowService` with `upsert`, `delete`, `record_decision`, `check_quorum`, and reactive `escalation` on each vote (no background scheduler; escalation fires on the next vote after the deadline).
- Endpoints. `GET|PUT|DELETE /api/gateways/{gw}/sandboxes/{name}/approval-workflow` (admin for writes, viewer for reads). `GET /api/gateways/{gw}/approvals/{chunk_id}/decisions` returns the running tally + voter list. `POST .../approve` under an active workflow returns HTTP 202 `vote_recorded` until quorum is reached, at which point the upstream `ApproveChunk` fires exactly once. A single reject is decisive and kills the proposal immediately. `POST .../approve-all` is admin-only when a workflow is active and returns HTTP 409 to non-admins (emergency override path).
- Webhook events. `approval.vote_cast`, `approval.quorum_met`, `approval.escalated`: all carry the workflow ID, sandbox, voter, and current tally.
- Frontend. Workflow banner + vote-count badge + voter list on the approval detail modal, "Vote to Approve" button (disabled after the current user has voted), and an admin-only workflow config modal on the sandbox detail page.
- Tests. 37 new service + API tests covering upsert, quorum, rejection, escalation, admin override, and webhook firing paths.
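The voting rules in the M19 entries (approvals accumulate until quorum, one reject ends the proposal, each user votes once) can be sketched as a small state machine. `Proposal` and its fields are hypothetical and only mirror the behaviour described above; they are not the `ApprovalWorkflowService` API.

```python
from dataclasses import dataclass, field


@dataclass
class Proposal:
    required_approvals: int
    votes: dict = field(default_factory=dict)  # voter -> "approve" | "reject"

    def state(self) -> str:
        """Return "rejected", "quorum_met", or "pending"."""
        if "reject" in self.votes.values():
            return "rejected"  # a single reject ends the proposal
        approvals = sum(1 for d in self.votes.values() if d == "approve")
        return "quorum_met" if approvals >= self.required_approvals else "pending"

    def record_decision(self, voter: str, decision: str) -> str:
        """Store one vote and return the resulting state."""
        if decision not in ("approve", "reject"):
            raise ValueError(f"unknown decision: {decision!r}")
        if self.state() != "pending":
            raise ValueError("proposal is already closed")
        if voter in self.votes:
            raise ValueError(f"{voter} has already voted")
        self.votes[voter] = decision
        return self.state()
```

In this sketch a caller would map `"pending"` to the HTTP 202 `vote_recorded` response and trigger the upstream call only on the transition to `"quorum_met"`, which can happen at most once because closed proposals refuse further votes.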
Added (M18)¶
- Policy Pinning. Operators can pin the active policy version of a sandbox to prevent accidental edits during an incident or change freeze. New `PolicyPin` model (Alembic 013) + `PolicyPinService` with `pin`, `unpin`, `get`, `check`, and auto-expiry. All seven policy-write endpoints (`PUT /policy`, network/filesystem/process CRUD, preset apply) plus `POST .../approve` and `.../approve-all` now raise `PolicyLockedError` → HTTP 423 when a pin is active. Export (M23) remains allowed; discovery + read paths are unaffected.
- Endpoints. `GET|POST|DELETE /api/gateways/{gw}/sandboxes/{name}/policy/pin`. `POST` accepts `{reason, expires_at}` (operator+, audit-logged as `policy_pin.created` / `policy_pin.deleted`).
- Security-Flagged Rules UI. Rule chunks that OpenShell marks as security-flagged now render a red shield badge per chunk, a dedicated filter chip, and a warning banner on the approval page. The "Approve All" confirmation dialog carries an explicit "include flagged" checkbox so flagged rules cannot be bulk-approved by accident.
- Frontend. Pin banner + lock/unlock button + pin modal (reason + expiry picker) on the sandbox detail page. All policy sub-pages (network, filesystem, process, presets, approvals) disable their edit buttons when the sandbox is pinned.
- Tests. 43 new tests covering service CRUD, auto-expiry, guard-by-endpoint coverage, and UI state.
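The pin guard with auto-expiry described in the M18 entries can be sketched as a single check that write paths call first. `check_pin` and the dict-shaped pin are hypothetical; only the `PolicyLockedError` name, the `reason`/`expires_at` fields, and the 423 status come from the entries above.

```python
from datetime import datetime, timezone


class PolicyLockedError(Exception):
    """Signals that a write was refused because the policy is pinned."""

    status_code = 423


def check_pin(pin, now=None):
    """Raise PolicyLockedError if `pin` is active; return None otherwise.

    `pin` is None (no pin) or a dict with "reason" and an optional
    timezone-aware "expires_at". Expired pins are treated as absent.
    """
    if pin is None:
        return
    now = now or datetime.now(timezone.utc)
    expires_at = pin.get("expires_at")
    if expires_at is not None and expires_at <= now:
        return  # auto-expiry: the pin no longer blocks writes
    raise PolicyLockedError(pin.get("reason", "policy is pinned"))
```

Evaluating expiry lazily at check time, as here, needs no background job; whether ShoreGuard also deletes expired pins eagerly is not stated in the changelog.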
Added (M17)¶
- Policy Prover (Z3 Formal Verification). New optional dependency on `z3-solver`. `ProverService` (`shoreguard/services/prover.py`) ships four query templates encoded as Z3 constraints over the sandbox policy:
  - `can_exfiltrate`: is there a writable egress path to a non-whitelisted destination?
  - `unrestricted_egress`: does any network rule allow `0.0.0.0/0` on an unbounded port range?
  - `binary_bypass`: can a binary hash outside the allowlist be executed?
  - `write_despite_readonly`: can any filesystem write succeed despite a readonly root?

  Each template returns SAT / UNSAT plus a witness model on SAT so operators can see why a policy fails.
- Endpoints. `POST /api/gateways/{gw}/sandboxes/{name}/policy/verify` (operator+) runs one or more templates. `GET /api/gateways/{gw}/policies/presets/verify` lists the available templates + default parameters.
- Frontend. New "Verify" tab on the sandbox detail page with a preset picker, run-button, and a result panel that renders the witness model as a table on SAT or a green "property holds" banner on UNSAT.
- Tests. 30 new unit tests covering each template happy path, malformed policy input, and Z3 timeout handling.
- Demo. `scripts/m17_demo.py` walks the four templates against a purposefully misconfigured sandbox.
Added (M16)¶
- Binary-Context Approvals. Approval chunks now carry binary + process context so reviewers can decide with full evidence. `DenialContextService` caches the denial context at `submit_analysis` time (in-memory TTL cache) and enriches it at `get_draft`:
- Process ancestry breadcrumb (parent → grandparent → …)
- Binary SHA-256 badge
- Persistent-context badge (flagged when the same binary has requested approval before)
- L7 request samples table (up to 10 recent requests with method, path, status, source)
- Frontend. Approval detail modal renders the new context block with collapsible sections and a "Persistent first" sort toggle on the pending-approvals list.
Added (M15)¶
- Bypass Detection Dashboard. OCSF events classified as potential policy bypasses (denials followed by success, egress via unusual ports, DNS exfiltration signatures) are streamed into a new `BypassService` ring buffer (last 1 000 events, in-memory) and exposed both as an API and a UI tab.
- Endpoints. `GET /api/gateways/{gw}/bypass` (paginated event list with severity filter) and `GET /api/gateways/{gw}/bypass/summary` (per-severity counts + top offending sandboxes).
- Frontend. New "Bypass" tab on the gateway detail page with a severity filter, event timeline, and a MITRE ATT&CK technique mapping per event.
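The in-memory ring buffer behind the bypass dashboard can be approximated with `collections.deque` and a `maxlen`. This is a sketch of the data structure only; `BypassBuffer`, its methods, and the event dict shape are assumptions rather than the `BypassService` interface.

```python
from collections import Counter, deque


class BypassBuffer:
    """Keep only the most recent `maxlen` events; older ones are dropped."""

    def __init__(self, maxlen: int = 1000):
        self._events = deque(maxlen=maxlen)

    def add(self, event: dict) -> None:
        # deque discards the oldest entry automatically once maxlen is reached.
        self._events.append(event)

    def recent(self, severity=None) -> list:
        """Newest-first list, optionally filtered by severity."""
        events = reversed(self._events)
        return [e for e in events if severity is None or e["severity"] == severity]

    def summary(self) -> dict:
        """Per-severity counts over the events still in the buffer."""
        return dict(Counter(e["severity"] for e in self._events))

    def __len__(self) -> int:
        return len(self._events)
```

A bounded buffer like this caps memory use regardless of event volume, at the cost of losing history on restart and once the limit is exceeded.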
Fixed (M14)¶
- Approve → reload race. The `POST /approve` and `POST /approve-all` endpoints now accept a `?wait_loaded=true` query parameter. When set, the server polls the gateway's policy status internally (up to 30 s) and only returns once the new policy version is reported as `loaded`, or returns 504 on timeout. This eliminates the client-side polling loop that was previously required to avoid spurious 403s from the proxy still running the old policy. All three demo scripts (`m7_demo.py`, `m8_demo.py`, `m12_demo.py`) have been updated to use the server-side wait.
- Local-mode plaintext gateway auto-register. When `SHOREGUARD_LOCAL_MODE=true`, the filesystem gateway importer now skips mTLS certificate material for `http://` (plaintext) endpoints. Previously, if the OpenShell data directory contained cert files alongside a plaintext gateway, they were imported and the connection attempt used TLS against a plaintext endpoint, resulting in a permanent `unreachable` status.
[0.30.1] — 2026-04-12¶
Changed¶
- Moved `charts/openshell-cluster` → `tests/fixtures/charts/openshell-cluster`. The chart was never a supported production install path: it wraps NVIDIA's `ghcr.io/nvidia/openshell/cluster` all-in-one image (privileged k3s-in-container, ~10-15% network overhead from double iptables NAT) so that `scripts/m12_demo.py` can exercise the M12 federation code path in local/kind/CI without requiring NVIDIA's upstream OpenShell Helm chart at test time. Keeping it under `charts/` misled readers into thinking it was a production option. The fixture is now clearly scoped as internal test infrastructure: its README leads with a "not a supported install path" banner and points at the real production pattern (install NVIDIA's upstream OpenShell chart separately, then `charts/shoreguard` alongside it). CI renames the lint/render block into a dedicated `m12-fixture-lint` job so fixture status is tracked distinctly from the supported `helm-lint` job. `scripts/m12_demo.py` and `scripts/m12-federation.md` reference the new path and carry the same positioning notice.
Added (M12)¶
- Internal M12 federation test fixture at `tests/fixtures/charts/openshell-cluster/`. Runs the upstream `ghcr.io/nvidia/openshell/cluster:0.0.26` k3s-in-container image as a privileged StatefulSet so the helm-deployed ShoreGuard can federate multiple gateways entirely in k8s. A post-install bootstrap Job (weight 5, `bitnami/kubectl`) `kubectl exec`'s into the cluster pod, generates a CA + server + client mTLS set inside `/certs`, creates the k3s-internal secrets `openshell-server-tls`, `openshell-server-client-ca`, `openshell-client-tls`, and `openshell-ssh-handshake` (idempotent via `kubectl apply --dry-run=client`), then exports the client material as an outer-ns Secret `<release>-openshell-cluster-client-tls`. `helm test` ships a busybox `nc -zv` TCP probe. Chart-time validation fails rendering when `label.env` is empty.
- `scripts/m12_demo.py`. In-k8s federation end-to-end demo. k8s analog of `scripts/m8_demo.py`: reads each gateway's client mTLS Secret via `kubectl`, registers both clusters via `POST /api/gateway/register` with `auth_mode=mtls` / `scheme=https`, then drives the same Phase A–J federation assertions (label filter, per-gateway audit attribution, unfiltered audit coalescence, `/api/gateway/list` with labels + `status=connected`). Sandbox exec steps (Phases F + G) route through ShoreGuard's `/api/gateways/{gw}/sandboxes/{sb}/exec` LRO instead of shelling to the `openshell` CLI, so the host running the demo only needs `kubectl`, `helm`, and `uv`.
- `scripts/m12-federation.md`. Runbook for the M12 demo: kind cluster, privileged namespace, two `helm install cluster-{dev,staging}`, one `helm install sg`, `kubectl port-forward`, the Phase-A-J walk-through, and Phase K (`kubectl rollout restart statefulset/cluster-dev-openshell-cluster` while driving `cluster-staging` traffic, proving gateway-independence of the control plane).
- CI `m12-fixture-lint` job. `.github/workflows/ci.yml` now runs `helm lint tests/fixtures/charts/openshell-cluster` plus a positive render matrix (`label.env=dev` + `label.env=staging`) and a negative test asserting empty `label.env` must fail rendering. Job is named and scoped separately from the supported `helm-lint` job so fixture status never gets mistaken for production chart status.
Added (docs)¶
- Production Kubernetes deployment runbook at `docs/deploy/production-k8s.md`. End-to-end walkable guide for ops teams deploying ShoreGuard alongside NVIDIA's upstream OpenShell Helm chart on a real k8s cluster. Covers prerequisites (CNI with NetworkPolicy enforcement, cert-manager, ingress-nginx), BYO Secret pattern, `helm install` with the production preset and all required overrides, gateway registration with mTLS material extracted from NVIDIA's chart-created Secrets, a post-deploy verification checklist, and day-2 operations (multi-replica scaling, secret rotation). Cross-linked from `docs/admin/deployment.md`, `charts/shoreguard/README.md`, and the MkDocs nav.
Added (chart)¶
- `networkPolicy.egress.inClusterGateways` chart value. First-class egress rule for in-cluster OpenShell gateways (TCP 30051 to private Pod IPs). The existing LLM-providers block only allows 443/tcp to non-RFC1918 CIDRs, so federated gateways running inside the cluster were unreachable unless patched via the `egress.extra` escape hatch. New value: `enabled: false` (default off), `port: 30051`, `podSelector: {}`, `namespaceSelector: {}`. Point the selectors at NVIDIA's upstream OpenShell Helm chart pod labels and flip `enabled: true` for in-k8s federation deploys. CI render test added.
Added (M10 + M11)¶
- Helm chart MVP at `charts/shoreguard/` (M10). Single-replica, SQLite-in-emptyDir, no Ingress by default; gets ShoreGuard running on a fresh `kind`/`k3d` cluster with `helm install sg ./charts/shoreguard --set admin.password=...`. Secret key is generated once per release and preserved across upgrades via a `lookup`. `SHOREGUARD_ALLOW_UNSAFE_CONFIG` is injected automatically when `database.url` is empty so the pod boots past the prod-readiness gate. New `helm-lint` CI job covers `helm lint` plus a `helm template` render smoke check.
- `charts/shoreguard`. M11 production hardening. Turns the M10 MVP chart into something an ops team would actually roll. New values: `replicaCount`, `persistence.{enabled,storageClassName,size,accessMode,existingClaim}`, `existingSecret` (BYO Secret path), `networkPolicy.*` (ingress-namespace selector, DNS/LLM-provider/Postgres/extra egress blocks), `podDisruptionBudget.{enabled,minAvailable}`, `tests.{enabled,image}`, `forwardedAllowIps`. New templates: `pvc.yaml`, `networkpolicy.yaml`, `pdb.yaml`, `tests/test-connection.yaml`. The Deployment switches strategy between `Recreate` (single-replica) and `RollingUpdate` (`maxSurge=1, maxUnavailable=0` for multi-replica), passes `SHOREGUARD_REPLICAS` and `SHOREGUARD_FORWARDED_ALLOW_IPS` to the pod, and swaps the `data` volume between `emptyDir` and a PVC based on `persistence.enabled`. Chart version bumped `0.1.0 → 0.2.0`, `appVersion → 0.30.1`.
- `charts/shoreguard/values.production.yaml`. Opinionated preset that enables PVC + cert-manager + nginx-ingress + NetworkPolicy + structured JSON logs + forwarded-headers trust. Single-replica by default (the preset is RWO-PVC-shaped); scale out only after setting `database.url` to an external Postgres.
- Chart-time footgun guards (`templates/_helpers.tpl:shoreguard.validate`). `helm template` now fails with a clear message when `existingSecret` collides with `admin.password`/`secretKey`, when `replicaCount > 1` is combined with `persistence.enabled=true` and no `database.url` (RWO-PVC deadlock), or when `replicaCount > 1` is combined with no `secretKey`/`existingSecret` (session HMAC drift).
- `helm test` hook. `helm test <release>` now runs a tiny `curlimages/curl` pod that `curl`s `/healthz` and `/version` against the in-cluster Service (not the Ingress, which keeps the test independent of cluster DNS and TLS trust). Gated on `tests.enabled`.
- `shoreguard.server.forwarded_allow_ips` setting (env `SHOREGUARD_FORWARDED_ALLOW_IPS`, default `"127.0.0.1"`). Passed to uvicorn as `forwarded_allow_ips` together with `proxy_headers=True`, so X-Forwarded-Proto/Host from a trusted TLS-terminating proxy is honored. Without this, sessions behind nginx-ingress would see `http://` internally and issue non-Secure cookies. The production chart preset sets it to `"*"`.
- Backend hard-fail for multi-replica without a stable secret key. `check_production_readiness()` now emits an `ERROR` (escalated from a `WARN`) when `SHOREGUARD_REPLICAS > 1` and `auth.secret_key` is unset, which causes `enforce_production_safety()` to raise a `RuntimeError` at startup. The original rate-limiter `WARN` stays because the in-process limit problem is orthogonal to the secret key one.
- CI `helm-lint` job extended to render the production preset and assert that the multi-replica-without-secretKey footgun guard actually fires.
Fixed¶
- Release workflow: `aquasecurity/trivy-action` pin bumped from the non-existent `@0.28.0` tag to `@v0.35.0`. The old pin failed the GitHub Actions resolver before any step ran, so the `docker` job in the release workflow never reached `build-push-action` and the `v0.30.0` image never landed on GHCR (only `:latest` was available). Verified against the failed run of `v0.30.0` (`gh run view 24282878746 --log-failed` → `Unable to resolve action aquasecurity/trivy-action@0.28.0`). The action's maintainers migrated all tags to the `v`-prefix convention; `@v0.35.0` is the current stable tag and keeps the same `image-ref` / `exit-code` / `severity` / `vuln-type` input surface we rely on.
[0.30.0] — 2026-04-11¶
The headline of this release is federation in production shape:
ShoreGuard now ships with a topbar switcher, label-based gateway
filtering, per-gateway audit attribution, and a single-file Python
script that drives the complete agent → routed inference → L7 denial
→ approve → audit → retry flow against two live OpenShell
clusters in parallel. The same release also closes the long-standing
"webhook backend exists, no UI" gap with a new /webhooks admin
page, and shores up audit attribution across the gateway routes so the
audit log can be sliced by gateway with no cross-attribution leaks.
Two end-to-end automation scripts (scripts/m7_demo.py and
scripts/m8_demo.py) now exercise the full vision flow on every run;
both pass exit 0 in ~30 seconds and ~3-4 minutes respectively,
against real OpenShell gateways and a real Anthropic API key.
Added¶
- Webhook management UI at `/webhooks` (admin only). Lists every registered webhook with channel badge, event-type chips, active / paused state, and per-row actions for test, view delivery log, pause/resume, edit, and delete. Inline create form with a one-time HMAC signing-secret reveal callout. Edit and delivery-log modals. The webhook backend has shipped for several releases; this is the first operator-facing surface for it.
- Topbar gateway switcher. The read-only gateway status badge has been replaced with a dropdown that lists every registered gateway with status dot and labels, and navigates to the picked gateway's detail page on click. Pure URL navigation, no client-side "active gateway" state. Available on every page.
- Label filter on the gateways list page. New text input next to the existing free-text filter accepts `key:value` (or comma-separated `k:v,k2:v2` for AND semantics) and reduces the table to gateways carrying those labels. The backend `?label=` query parameter on `/api/gateway/list` was already supported.
- Audit log filterable by gateway. New `?gateway=<name>` query parameter on `/api/audit` and `/api/audit/export`, plus a matching text input on the audit page. Lets an operator reconstruct the full register → configure → run → deny → approve sequence for one gateway in chronological order, even when other gateways are active concurrently.
- Webhook CRUD now lands in the audit log. New `webhook.create`, `webhook.update`, `webhook.delete`, and `webhook.test` audit entries carry the URL, event types, and channel type in the detail blob. This was the last unaudited route family in the API.
- `WebhookService.fire_to()` direct delivery. The webhook service now exposes a method to deliver an event to one specific active webhook, bypassing the subscription filter. The `/test` endpoint uses this so clicking the "Test" button on a webhook always reaches its target, even if the webhook doesn't subscribe to `webhook.test`. Paused webhooks now return HTTP 409 instead of silently dropping the request.
- End-to-end demo scripts and runbooks. `scripts/m7_demo.py` drives the single-gateway vision flow (login → register → inference provider → launch sandbox → claude agent → L7 denial → approve → audit → retry) in ~30 seconds. `scripts/m8_demo.py` does the federated version against two clusters in ~3-4 minutes, with per-gateway audit-attribution assertions. Each script ships alongside a markdown runbook (`scripts/m7-demo.md`, `scripts/m8-demo.md`) for the manual recipe. Both scripts are idempotent: re-running deletes any leftover state before recreating.
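The label filter entry in the list above describes `key:value` pairs joined by commas with AND semantics. A minimal sketch of that parsing and matching follows; `parse_label_filter` and `labels_match` are hypothetical helpers, and the real frontend and `?label=` backend handling may treat edge cases (empty values, repeated keys) differently.

```python
def parse_label_filter(text: str) -> dict:
    """Parse "k:v,k2:v2" into {"k": "v", "k2": "v2"}; blank segments are skipped."""
    wanted = {}
    for pair in text.split(","):
        pair = pair.strip()
        if not pair:
            continue
        key, sep, value = pair.partition(":")
        if not sep or not key.strip():
            raise ValueError(f"expected key:value, got {pair!r}")
        wanted[key.strip()] = value.strip()
    return wanted


def labels_match(labels: dict, wanted: dict) -> bool:
    """AND semantics: every wanted pair must be present on the gateway."""
    return all(labels.get(key) == value for key, value in wanted.items())


if __name__ == "__main__":
    gateways = {
        "gw-dev": {"env": "dev", "region": "eu"},
        "gw-staging": {"env": "staging", "region": "eu"},
    }
    wanted = parse_label_filter("env:dev,region:eu")
    print([name for name, labels in gateways.items() if labels_match(labels, wanted)])
```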
Fixed¶
- GET /api/gateway/{name}/info returned 500 on a connected gateway. GatewayService.get_info() injects configured and version into the response dict, but GatewayResponse was extra="forbid", so the live endpoint crashed inside FastAPI's response validator. Schema now accepts both fields.
- Gateway-route audit entries were landing with gateway_name=NULL. gateway.register, unregister, setting_update/delete, update_metadata, start/stop/restart/destroy all pass gateway=name to audit_log() now, so the new ?gateway=<name> filter actually finds them. Without this, every gateway-scoped audit row was invisible to per-gateway queries.
- Webhook /test endpoint silently produced zero deliveries when the target webhook didn't subscribe to webhook.test (or *). The global fire() path filters by subscription, so the test button was a lie unless the webhook happened to subscribe to the test event type. The new fire_to() direct-delivery path fixes it; paused webhooks now return 409 instead of dropping.
- CSP-strict header tests asserted 'unsafe-inline' was not a substring of the CSP header, which broke after the v0.29 fix that added style-src-attr 'unsafe-inline' for Alpine.js's inline style attributes (x-show / x-cloak / x-transition). Replaced with a per-directive check that allows the narrower style-src-attr while keeping default-src, script-src, and style-src strict.
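The per-directive check described in the last item can be sketched roughly as follows. This is a minimal illustration, not the project's actual test: `parse_csp`, `assert_strict`, and the sample header are hypothetical names and values chosen for the example.

```python
def parse_csp(header: str) -> dict[str, list[str]]:
    """Split a CSP header into {directive-name: [source-expression, ...]}."""
    directives: dict[str, list[str]] = {}
    for part in header.split(";"):
        tokens = part.split()
        if tokens:
            directives[tokens[0].lower()] = tokens[1:]
    return directives


# Directives that must stay free of 'unsafe-inline' (taken from the entry above).
STRICT_DIRECTIVES = ("default-src", "script-src", "style-src")


def assert_strict(header: str) -> None:
    """Fail if any strict directive allows 'unsafe-inline'.

    style-src-attr is deliberately not checked, so the narrower
    inline-style allowance does not trip the test.
    """
    csp = parse_csp(header)
    for name in STRICT_DIRECTIVES:
        assert "'unsafe-inline'" not in csp.get(name, []), name


# Hypothetical header resembling a strict policy with the style-src-attr carve-out.
header = (
    "default-src 'self'; script-src 'self' 'nonce-abc123'; "
    "style-src 'self'; style-src-attr 'unsafe-inline'"
)
assert_strict(header)  # passes: only style-src-attr carries 'unsafe-inline'
```

A plain substring test would reject this header, which is the failure mode the entry describes.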
[0.29.0] — 2026-04-11¶
This release closes M1 OpenShell v0.0.26 Alignment, M2 OCSF
Observability, M3 L7 Denial Intelligence (in the reduced form
documented in S3.1), and M5 Production Readiness. Highlights:
OpenShell v0.0.26 stub regeneration with TTY exec and named inference
routes, the full gateway settings API, effective-policy and provider-env
projection views, a policy-analysis submission endpoint, OCSF parsing
plus server-side filters in the sandbox logs viewer, denial context UX
on the approvals page, /version and hard-fail production checks,
backup/restore scripts, a rollback runbook, and Trivy + Bandit in CI.
Added¶
- OpenShell v0.0.26 alignment (M1 / S1.1). Protobuf stubs regenerated against upstream OpenShell v0.0.26 (was v0.0.22). Three stub files actually changed (inference_pb2.py, openshell_pb2.py, openshell_pb2.pyi); the rest compiled byte-identically. This unblocks the two user-visible features below.
- TTY exec for interactive commands. POST /api/gateways/{gw}/sandboxes/{name}/exec now accepts a boolean tty field in the request body. When true, the gateway allocates a pseudo-terminal so interactive programs that check isatty() (e.g. python REPL, vim, htop) behave correctly. Defaults to false, so existing callers are unaffected. Requires a gateway running OpenShell v0.0.23 or newer.
- Named inference routes on GET /inference. GET /api/gateways/{gw}/inference now accepts an optional ?route_name= query parameter. Empty (the default) returns the cluster's default inference route; passing a name like sandbox-system returns the route that OpenShell v0.0.25+ uses for sandbox system-level model calls. PUT /inference already accepted route_name in the request body; this release closes the GET-side gap.
- Gateway Settings API (M1 / S1.2). New admin-gated REST endpoints expose OpenShell's global gateway configuration: GET /api/gateway/{name}/settings, PUT /api/gateway/{name}/settings/{key} (body {"value": …} accepting string, bool, or int), and DELETE /api/gateway/{name}/settings/{key}. OpenShell has no separate UpdateGatewayConfig RPC; updates are sent per-key via the existing UpdateConfig RPC with the global flag set. The new API is value-agnostic: any settings key the gateway recognises (including the new ocsf_logging_enabled toggle) can be read and written without further code changes.
- Effective policy view: GET /sandboxes/{name}/policy/effective (M1 / S1.3). Stable contract endpoint for "what the gateway actually enforces", as opposed to "what was last PUT". Presets are merged eagerly into the declared policy today, so the endpoint returns the stored envelope with an added source: "gateway_runtime" marker, giving the UI a stable route even if OpenShell ever separates declared from effective server-side.
- Provider env-var projection view: GET /providers/{name}/env (M1 / S1.3). Read-only endpoint that returns the environment variables a provider injects into sandboxes: keys only, values always redacted. Each entry is tagged with source: credential, config, or type_default (from the provider type's cred_key in openshell.yaml). Useful for debugging agent misconfiguration without exposing secrets.
- POST /sandboxes/{name}/policy/analysis (M1 / S1.3, closes M1). Pass-through REST endpoint for the OpenShell SubmitPolicyAnalysis RPC. External denial analyzers (LLM-backed or rule-based) can submit observed denial summaries + proposed policy chunks through ShoreGuard's HTTP API; the gateway decides accept/reject per chunk and returns counters plus rejection reasons. Admin-only, rate-limited, audit-logged as sandbox.policy.analyze. This closes M1 OpenShell v0.0.26 Alignment: S1.1, S1.2, and S1.3 are now all merged.
- OCSF parsing & rendering in sandbox logs (M2 / S2.1). OpenShell v0.0.26 emits structured security events in an OCSF shorthand format over the existing SandboxLogLine stream (level "OCSF", target "ocsf"). ShoreGuard now parses these lines via a new shoreguard.services.ocsf module and exposes class_prefix, activity, severity, disposition, summary, and both bracket + gRPC structured fields on every log entry that looks like OCSF. The sandbox logs viewer renders class badges, disposition colours (green = ALLOWED, red = DENIED/BLOCKED), dynamic class-prefix chips, and a per-row expand for structured field details. Live websocket stream and the REST /sandboxes/{name}/logs endpoint both include the parsed ocsf dict when present.
- OCSF server-side filters on GET /sandboxes/{name}/logs (M2 / S2.2). Four new query parameters (ocsf_only, ocsf_class, ocsf_disposition, ocsf_severity) let advanced consumers pull forensic-sized windows without client-side post-processing. The sandbox logs viewer exposes ocsf_only as a "Server OCSF" toggle next to the existing level filters.
- Gateway observability toggle UI (M2 / S2.2). The gateway detail page now includes an "Observability" fieldset with a form-switch bound to the upstream ocsf_logging_enabled gateway setting, wired via the existing PUT /gateway/{name}/settings/{key} endpoint.
- Denial context UX on the approvals page (M3 / S3.1, closes M3 reduced). Re-verification of OpenShell v0.0.26/v0.0.27 protos (byte-identical) showed M3 was only "blocked" in the strictest ListDenials sense: three read paths already bring rich denial context into the control plane (GetDraftPolicy, GetDraftHistory, and the OCSF parser from S2.1). This sprint closes the remaining gaps: _chunk_to_dict() now forwards denial_summary_ids; the approvals table gains a "Seen" column formatting first_seen / last_seen / hit_count; the expand row surfaces stage, denial summary IDs as monospace chips, and a "View in logs" button. The logs viewer extracts a best-effort triggering binary and shows a "Find in approvals" button on DENIED/BLOCKED OCSF events that navigates via #binary=X&host=Y hash fragment; the approvals page listens to sg:approvals-update for live-refresh on draft_policy_update events. The history modal gains per-event-type filter chips with count badges, and the sandbox overview Approvals card now shows security_flagged_count and last_analyzed_at_ms. Closes M3 with the documented caveat that the full DenialSummary struct (l7_request_samples, sample_cmdlines, ancestors, binary_sha256) remains push-only in upstream v0.0.27 and stays a future feature request to NVIDIA/OpenShell.
- /version endpoint (M5 / S5.1). New unauthenticated endpoint that returns {version, git_sha, build_time} for the running binary, so operators can verify which artifact is serving traffic after a deploy. Build identity is propagated through new SHOREGUARD_GIT_SHA and SHOREGUARD_BUILD_TIME Dockerfile ARGs, release.yml build-args, and shoreguard_info Prometheus labels. A short-SHA image tag (ghcr.io/.../shoreguard:a1b2c3d) is now published alongside semver.
- Hard-fail on production-readiness ERRORs (M5 / S5.1). New Settings.enforce_production_safety() runs at startup and refuses to boot when check_production_readiness() reports any ERROR:-severity config issue (weak secret, CORS wildcard + credentials, SQLite in prod, strict CSP disabled, unrestricted self-registration in prod). Set SHOREGUARD_ALLOW_UNSAFE_CONFIG=true to downgrade the error to a CRITICAL log line; this is documented as an emergency override in reference/configuration.md.
- Backup and restore scripts (M5 / S5.1). New scripts/backup.py and scripts/restore.py auto-detect SQLite vs Postgres from the database URL. SQLite uses the built-in online backup API; Postgres shells out to pg_dump --format=custom / pg_restore --clean --if-exists. The Database Migrations guide now recommends these scripts as the primary backup path.
- Rollback runbook (M5 / S5.2). New admin/rollback.md consolidates the incident-response flow (symptom detection → image rollback → optional DB rollback or restore → verification → post-mortem) into one page, with links into existing troubleshooting, migration, and deployment docs.
- Supply-chain hardening (M5 / S5.3, closes M5). CI gains a new security job that runs Bandit at medium-and-above severity over the shoreguard package (pip-audit already covers dependency CVEs; Bandit adds source-level SAST for Python-specific patterns such as eval, shell injection, and insecure hashing). The release pipeline runs Trivy against the freshly-built image by digest between build-push and cosign, with ignore-unfixed=true and failure on CRITICAL/HIGH only. Grafana starter dashboard at deploy/grafana/shoreguard.json covers six panels (HTTP request rate by status, p95/p99 latency by path, gateways by status, operations in flight, webhook success rate) with a shoreguard_info build annotation track so deploys show up as vertical lines across every panel.
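As an illustration of the per-key Gateway Settings API listed above, the following sketch builds the request a client might send. Only the path shape and the {"value": …} body come from this changelog; `build_setting_update` and the gateway name `edge-1` are hypothetical, and no HTTP call is made.

```python
import json
from typing import Union
from urllib.parse import quote

# The changelog states the body accepts string, bool, or int values.
SettingValue = Union[str, bool, int]


def build_setting_update(gateway: str, key: str, value: SettingValue) -> tuple[str, str, str]:
    """Return (method, path, json_body) for a per-key settings update."""
    if not isinstance(value, (str, bool, int)):
        raise TypeError(f"unsupported setting value type: {type(value).__name__}")
    # Percent-encode both path segments so unusual names cannot break the URL.
    path = f"/api/gateway/{quote(gateway, safe='')}/settings/{quote(key, safe='')}"
    return "PUT", path, json.dumps({"value": value})


# Hypothetical gateway name; ocsf_logging_enabled is the toggle named above.
method, path, body = build_setting_update("edge-1", "ocsf_logging_enabled", True)
print(method, path, body)
```

Because the API is described as value-agnostic, the helper does not validate keys; it only rejects value types outside the three the entry names.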
Changed¶
- Error responses now follow RFC 9457 Problem Details. Error bodies are served with Content-Type: application/problem+json and carry the standard type, title, status, and detail fields alongside the existing ShoreGuard code (and any extension members such as request_id, errors, feature, or upgrade_required). The detail field is unchanged, so existing clients that read only body.detail (including the ShoreGuard web UI) keep working without modification.
- PolicyManager.watch() stream flattener forwards target and fields. The live WatchSandbox consumer was dropping both on the live pathway, even though get_logs() surfaced them correctly via the unary RPC. Additive change; no existing consumer breaks.
- Sync OperationService removed; tests run against AsyncOperationService directly. Production has used AsyncOperationService exclusively since v0.27; the sync class remained only because tests reached it through an _AsyncOperationAdapter shim in conftest.py. This release deletes the sync class (~480 LOC), the adapter, and the sync-class test file entirely. The new harness runs AsyncOperationService on an in-memory aiosqlite engine with in-flight LRO task drainage on teardown to avoid closed-DB races. Net: -2077 LOC, full suite 2477 passed, 35 skipped.
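The Problem Details body shape described in the first item can be sketched like this. It is an assumption-laden illustration rather than ShoreGuard's error handler: the `problem` helper and the sample values are hypothetical, while the member names and media type follow RFC 9457 and the entry above.

```python
import json
from typing import Any


def problem(status: int, title: str, detail: str, code: str,
            type_uri: str = "about:blank", **extensions: Any) -> tuple[int, dict[str, str], str]:
    """Build (status, headers, body) for an RFC 9457 style error response.

    Standard members are written last so an extension cannot overwrite them.
    """
    body: dict[str, Any] = {
        **extensions,
        "type": type_uri,
        "title": title,
        "status": status,
        "detail": detail,
        "code": code,  # ShoreGuard-specific member mentioned in the changelog
    }
    headers = {"Content-Type": "application/problem+json"}
    return status, headers, json.dumps(body)


status, headers, raw = problem(
    404, "Not Found", "gateway 'edge-1' is not registered",
    "GATEWAY_NOT_FOUND", request_id="req-123",
)
# A legacy client that only reads body.detail keeps working.
print(json.loads(raw)["detail"])
```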
Fixed¶
- verify_password no longer raises on corrupt hashes. A malformed or truncated password hash row in the database used to surface as an unhandled PwdlibError; it now returns False so the login attempt fails cleanly and the account lockout counter advances as intended.
- min_level parameter on GET /sandboxes/{name}/logs now preserves OCSF events. OpenShell's level_matches() helper assigns unknown levels (including "OCSF") numeric rank 5, which any non-empty min_level silently dropped. ShoreGuard now always fetches upstream with min_level="" and applies the level filter locally, letting OCSF entries through unconditionally.
- check_production_readiness() now actually returns the warnings list that its type signature promised. The method previously collected the list and fell through without a return statement, so callers always got None.
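A local level filter that always lets OCSF entries through, as described in the min_level fix, might look roughly like the sketch below. The rank table and the `filter_logs` name are assumptions made for the example; they are not OpenShell's or ShoreGuard's actual values.

```python
# Hypothetical severity ordering for this sketch (higher = more severe).
LEVEL_RANK = {"TRACE": 0, "DEBUG": 1, "INFO": 2, "WARN": 3, "ERROR": 4}


def filter_logs(entries: list[dict], min_level: str = "") -> list[dict]:
    """Apply min_level locally; entries with level "OCSF" always pass."""
    if not min_level:
        return list(entries)
    threshold = LEVEL_RANK[min_level.upper()]
    kept = []
    for entry in entries:
        level = str(entry.get("level", "")).upper()
        # OCSF is not a severity, so it bypasses the comparison entirely.
        # Other unknown levels get rank -1 and are dropped (a choice made here).
        if level == "OCSF" or LEVEL_RANK.get(level, -1) >= threshold:
            kept.append(entry)
    return kept


entries = [
    {"level": "DEBUG", "message": "probe ok"},
    {"level": "ERROR", "message": "exec failed"},
    {"level": "OCSF", "message": "NET:OPEN DENIED"},
]
print([e["level"] for e in filter_logs(entries, "WARN")])
```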
Tests¶
- Auth edge-case coverage raised from 75% to 96% — targeted tests for token expiry, account lockout transitions, and OIDC error paths.
- WebSocket auth-error coverage raised from 67% to 94% — coverage for authentication failure branches in the sandbox log stream endpoint.
- shoreguard/services/operations.py coverage raised from 61% to 100%. The previous baseline reflected an inverted gap: the test suite exercised the sync OperationService via an adapter in conftest.py while the prod AsyncOperationService (used by the API routes) was untested. The follow-up refactor in this release consolidates the two classes into one, so this coverage win is now permanent.
[0.27.0] — 2026-04-10¶
Security¶
- Strict CSP is now the default: SHOREGUARD_CSP_STRICT defaults to true, closing the loop on the M1–M4 hardening work that shipped in v0.26.0 plus the M2.1 inline-event-handler extraction completed in this release. Fresh installs now receive a Content-Security-Policy with a per-request cryptographic nonce on every <script> tag, no 'unsafe-inline', frame-ancestors 'none' (clickjacking protection), base-uri 'self' (base-tag injection protection), and form-action 'self' (form hijacking protection). 'unsafe-eval' is retained in script-src because Alpine.js uses the Function() constructor internally. The @alpinejs/csp build was evaluated during M2.1, but its expression parser is limited to plain property chains (no operators, no literals, no method-call arguments), which proved too restrictive for this UI. Unlike 'unsafe-inline', 'unsafe-eval' does not permit DOM-injected script execution, so the XSS surface remains dramatically smaller than the legacy policy.

- CSP hardening M2.1: inline event handler extraction. M2 in v0.26.0 extracted inline <script> blocks but missed 28 inline event handler attributes (onclick="", onkeydown="") which script-src-attr blocks regardless of nonce. All 28 are now converted: static template handlers become Alpine @click / @keydown directives on registered Alpine.data() components, and dynamically-rendered innerHTML handlers use data-action / data-arg markers dispatched by a single delegated click listener per component. Inline style="" attributes produced by JS renderers (policy editor, wizard) were also replaced with Bootstrap utility classes so style-src-attr stays clean.
For operators running stock ShoreGuard: no action needed. The pages you already use (dashboard, sandboxes, wizard, policy editor, audit log, approvals, gateways, providers, users, groups, settings, invite flow) have all been refactored to work under strict CSP.
For operators with custom templates, inline scripts, or third-party
embeds that cannot yet be nonce-gated: set
SHOREGUARD_CSP_STRICT=false to fall back to the legacy
'unsafe-inline' policy. The legacy field SHOREGUARD_CSP_POLICY
continues to work as an escape hatch when strict mode is off.
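To make the per-request nonce mechanism concrete, here is a minimal sketch of how a nonce and a matching strict policy could be generated. The function names are hypothetical and the directive list is only the subset named in this entry, not ShoreGuard's full shipped policy.

```python
import secrets


def new_nonce() -> str:
    """Return a fresh, unpredictable nonce; call once per HTTP response."""
    return secrets.token_urlsafe(16)


def strict_csp(nonce: str) -> str:
    """Assemble a policy using only the directives this entry mentions."""
    return "; ".join([
        "default-src 'self'",
        # 'unsafe-eval' stays because Alpine.js relies on Function().
        f"script-src 'self' 'nonce-{nonce}' 'unsafe-eval'",
        "frame-ancestors 'none'",
        "base-uri 'self'",
        "form-action 'self'",
    ])


def script_tag(src: str, nonce: str) -> str:
    """Render a script tag carrying the same nonce as the response header."""
    return f'<script nonce="{nonce}" src="{src}"></script>'


nonce = new_nonce()
print(strict_csp(nonce))
print(script_tag("/static/js/dashboard.js", nonce))
```

The nonce only helps if it is regenerated for every response; reusing one across requests would let an attacker who observed it inject matching script tags.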
Changed¶
- Production-readiness check is now strict-mode aware. The warning about 'unsafe-*' directives in auth.csp_policy is now gated on csp_strict=False: when strict mode is enabled (default), that field is unused and no warning fires.
[0.26.1] — 2026-04-10¶
Changed¶
- Docstring coverage (pydoclint clean across shoreguard/). Every public function, method, and Pydantic model in shoreguard/ now has a Google-style docstring with Args / Returns / Raises / Attributes sections as appropriate. 410 pre-existing pydoclint violations across 21 files were fixed (96 in api/schemas.py, 64 in services/operations.py, 51 in api/pages.py, …). Zero runtime behaviour changes; this unblocks pydoclint in CI so future docstring drift gets caught at review time.
- Removed stale linter suppressions. Systematic audit of all # noqa / # type: ignore / # pyright: ignore comments. ~150 justified suppressions kept (stdlib API signatures, SQLAlchemy column semantics, protobuf stub typing, fake gRPC test doubles, singleton PLW0603, __init__ D107, …), and each now carries a comment explaining why. 12 non-justified suppressions removed by adding proper types or narrowing: SQLAlchemy event-handler params in db.py, operation_service / gateway_service narrowing in api/main.py + api/metrics.py, _get_auth_settings / _webhook_settings / _cli_init_db return types, _UNSET sentinel cast() in services/registry.py + sandbox_meta.py. The cleanup surfaced two real type bugs: _cli_init_db was annotated -> None despite callers invoking .dispose() on the returned Engine, and the module-level operation_service carried a stale AsyncOperationService | OperationService | None union that never matched runtime reality. Both fixed; no runtime behaviour change.
[0.26.0] — 2026-04-09¶
Added¶
- CSP strict-mode foundation: SHOREGUARD_CSP_STRICT=true opt-in enables a per-request nonce on request.state.csp_nonce and an unsafe-*-free Content-Security-Policy built from auth.csp_policy_strict (default remains off until the frontend refactor lands). Templates can reference {{ csp_nonce(request) }} on inline <script> tags and switch between the standard and CSP-safe Alpine.js builds via {% if csp_strict_enabled() %}. This is Milestone 1 of the multi-session CSP hardening plan; see csp-hardening-followup.md for the full roadmap.
Changed¶
- CSP hardening M2: all inline <script> blocks extracted from Jinja templates into frontend/js/ (theme-init.js, dashboard.js, audit.js; providers.js and wizard.js bind their own DOMContentLoaded handlers). GW is now read from document.documentElement.dataset.gateway in constants.js, eliminating the last Jinja-templated inline script. With SHOREGUARD_CSP_STRICT=true, strict CSP no longer reports inline-script violations; only inline-style (M3) and Alpine x-data (M4) violations remain.
- CSP hardening M3: all inline style="..." attributes and <style> blocks removed from Jinja templates. Shared patterns moved to the new frontend/css/utilities.css (sg-prefixed width/max-width/font-size/cursor utilities) and auth pages share frontend/css/auth.css. Wizard step toggling now uses classList.toggle('d-none', ...) instead of element.style.display. With SHOREGUARD_CSP_STRICT=true, strict CSP no longer reports style-src violations; only Alpine x-data (M4) remains.
- CSP hardening M4: every Alpine.js component is now registered via Alpine.data(name, factory) (per-file inside each frontend/js/*.js factory file, plus a new frontend/js/auth.js for the login/register/setup/invite forms). Templates reference them by name (x-data="loginForm") instead of inline object or spread-merge literals; the four { ...pageFn(), ...sortableTable(...) } patterns on the gateways/policies/users/groups pages are now gatewaysList, presetsListPage, usersListPage, groupsListPage. Directive expressions containing arrow functions, if statements, or multi-statement sequences (logout click, toast auto-remove, ws-state listener, clipboard-copy buttons, inference-config x-effect, filesystem-policy add-form focus) were extracted to store/component methods ($store.auth.logout, $store.toasts.scheduleRemove, onWsState, copyInvite, copyKey, maybeLoad, openAddForm). Auth pages now share the new components/alpine_loader.html partial with the main base template so the CSP build is loaded consistently. With SHOREGUARD_CSP_STRICT=true, the application loads with zero CSP-related Alpine violations, clearing the last blocker to making strict CSP the default in a future minor bump.
- Pyright on tests/ + parallel test execution: Pyright's include list now covers tests/ alongside shoreguard/, and pytest-xdist is a dev dependency so the suite runs with pytest -n auto. Enabling pyright on tests surfaced 303 pre-existing errors across 19 files (Optional narrowing, fake gRPC stub assignments typed as OpenShellStub, protobuf enum kwargs passed as raw ints, and a handful of test-setup bugs such as _FakeRpcError missing cancel()). All were fixed test-side, with zero changes to shoreguard/, via assert x is not None narrowing and narrow # type: ignore[assignment|arg-type|override] comments where the fake object pattern made narrowing impossible. On a 16-core box the suite now runs in ~43s parallel instead of ~4:46 serial (6.6× speedup).
[0.25.0] — 2026-04-09¶
Added¶
- shoreguard config show [section]: dump the effective configuration as a table, JSON, or .env-style output. Secret values (secret_key, admin_password, client_secret, password) are redacted by default; --show-sensitive reveals them.
- shoreguard config schema [section]: dump pristine defaults plus descriptions in table/json/env/markdown format. Used to regenerate docs/reference/settings.md.
- Self-documenting settings: every Settings field now carries Field(default=..., description=...). All ~100 environment variables have a one-line description surfaced via config show.
- shoreguard audit export: offline audit log export (JSON or CSV) with a sha256sum-compatible digest file and a manifest.json carrying entry count, filters, timestamp, and tool version. All three files are written with 0600 permissions.
- Structured logging improvements: text mode now renders [request_id] via the RequestIdFilter (was silently dropped); JSONFormatter adds module / func / line, merges caller extras, and emits stack_info. uvicorn access logs carry the same request-id as application logs in both modes.
- Global per-IP rate limiter (SHOREGUARD_GLOBAL_RATE_LIMIT_*) as a coarse DDoS guardrail applied by global_rate_limit_middleware to every HTTP request except health/metrics endpoints.
- Request body size limit middleware (SHOREGUARD_LIMIT_MAX_REQUEST_BODY_BYTES, default 10 MiB) returning HTTP 413 before Starlette reads the body.
- DB migration retry loop on startup with exponential backoff against OperationalError (SHOREGUARD_DB_STARTUP_RETRY_*). Compose-friendly.
- Background task supervision surfaced in /readyz with asyncio.wait_for on dependency probes (SHOREGUARD_READYZ_TIMEOUT).
- Production-readiness check expansion: six new warnings (HSTS off, CSP contains unsafe-*, allow_registration in prod, multi-replica with in-process rate limiter, SQLite in prod, text log format in prod). Warnings now carry ERROR: / WARN: severity prefixes.
- docs/reference/settings.md: auto-generated reference of every SHOREGUARD_* environment variable grouped by sub-model.
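The audit export layout described above (export file, sha256sum-compatible digest, manifest, all created with 0600 permissions) could be sketched as follows. File names other than manifest.json, the manifest key names, and the `export_audit` helper are assumptions for illustration; the real command's layout may differ.

```python
import hashlib
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def write_private(path: Path, data: bytes) -> None:
    """Create the file with mode 0600 (the mode applies when the file is new)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)


def export_audit(entries: list[dict], out_dir: Path, filters: dict) -> str:
    """Write export, digest, and manifest files; return the hex digest."""
    payload = json.dumps(entries, indent=2, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest()
    write_private(out_dir / "audit.json", payload)
    # sha256sum text-mode line: "<hex digest>  <file name>" (two spaces).
    write_private(out_dir / "audit.json.sha256", f"{digest}  audit.json\n".encode("utf-8"))
    manifest = {
        "entry_count": len(entries),
        "filters": filters,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool_version": "0.0.0-example",  # placeholder, not a real version
    }
    write_private(out_dir / "manifest.json", json.dumps(manifest, indent=2).encode("utf-8"))
    return digest


with tempfile.TemporaryDirectory() as tmp:
    demo_digest = export_audit([{"action": "gateway.register"}], Path(tmp), {"gateway": "edge-1"})
    demo_line = (Path(tmp) / "audit.json.sha256").read_text()
    print(demo_line.strip())
```

With this layout, running `sha256sum -c audit.json.sha256` inside the export directory would verify the export file.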
Changed¶
- Audit log is now ORM-level append-only. AuditEntry rows cannot be updated via the ORM, and deletion is only permitted from AuditService.cleanup() via a ContextVar-gated bypass. Enforcement raises AuditIntegrityError on commit. cleanup() switched to row-by-row deletion so the before_delete listener fires. Direct SQL still bypasses enforcement; DB-level triggers are a post-v1.0 item.
- CLI callback respects ctx.invoked_subcommand: the main Typer callback no longer tries to bind a socket when shoreguard config ... or shoreguard audit ... subcommands are invoked.
- Graceful shutdown timeout honoured by uvicorn startup path.
- CORS settings tightened and exposed via SHOREGUARD_CORS_*.
Security¶
- OIDC SSRF protection: discover(), get_jwks(), and exchange_code() run all URLs (including those returned by a provider's discovery document) through the existing private-IP check. A compromised identity provider can no longer pivot requests to internal services like cloud metadata endpoints.
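A private-IP check of the kind this entry relies on might be sketched with the standard library as below. This is a simplified illustration: `check_url` is a hypothetical name, and it only inspects IP literals, whereas a real check would also resolve hostnames and test every resolved address.

```python
import ipaddress
from urllib.parse import urlsplit


def is_blocked_ip(host: str) -> bool:
    """True for addresses that should never be fetched on a user's behalf."""
    ip = ipaddress.ip_address(host)  # raises ValueError for non-literals
    return (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_reserved or ip.is_multicast or ip.is_unspecified
    )


def check_url(url: str) -> str:
    """Return the URL if it looks safe to fetch, else raise ValueError."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"unsupported URL: {url!r}")
    try:
        blocked = is_blocked_ip(parts.hostname)
    except ValueError:
        # Not an IP literal. This sketch stops here; production code must
        # resolve the name and apply is_blocked_ip to each address.
        blocked = False
    if blocked:
        raise ValueError(f"URL targets a non-public address: {url!r}")
    return url


print(check_url("https://8.8.8.8/.well-known/openid-configuration"))
```

The cloud metadata address 169.254.169.254 is link-local, so it is rejected by this check.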
Fixed¶
- Version drift: pyproject.toml was still reporting 0.23.0 after the v0.24.0 tag was cut. This release bumps directly to 0.25.0 to resync the package metadata with the release stream.
[0.24.0] — 2026-04-08¶
Added¶
- 1,193 mutation-killing tests — targeted tests designed to eliminate survived mutants identified by mutmut v3.5. Test count: 1,175 → 2,368.
- New test_openshell_meta.py: first-ever coverage for OpenShell metadata loader (27 mutants, previously 100% survival).
- New test_auth_mutations.py (194 tests): exhaustive auth CRUD, RBAC role resolution, service principal lifecycle, group management, session tokens, gateway-scoped roles.
- Extended 20 existing test files across all major modules: formatters, sandbox templates, routes, OIDC, local gateway, webhooks, gateway service, operations, registry, policy, all client modules, DB, presets, CLI import, and audit service.
Fixed¶
- Pyright strict mode. Resolved all 30 type-check errors (0 remaining):
  - operation_service union type corrected for async/sync variants.
  - _get_svc() return type narrowed to AsyncOperationService in route handlers (routes/operations.py, lro.py).
  - db_cfg possibly-unbound variable in db.py PostgreSQL branch.
  - discover() return type in api/oidc.py.
  - update_group sentinel parameter type in api/auth.py.
  - Async/sync union narrowing in main.py, metrics.py, routes/gateway.py, routes/sandboxes.py.
[0.23.0] — 2026-04-08¶
Added¶
- OIDC/SSO authentication: multi-provider support with callback flow, role mapping, and state validation (api/oidc.py, alembic/versions/012_oidc_fields.py).
- SSRF validation: URL allowlist/blocklist for webhook targets prevents server-side request forgery via internal addresses.
- Input sanitization: centralized validators for names, URLs, certs, env vars, and command strings with configurable limits via SHOREGUARD_LIMIT_* env vars.
- pip-audit in CI: automated dependency vulnerability scanning in the GitHub Actions workflow.
- Deep health checks: /readyz now measures DB latency, reports gateway health summary (total/connected/degraded), supports ?verbose=true for per-gateway details.
- PostgreSQL connection pooling: DatabaseSettings with pool_size, max_overflow, pool_recycle, statement_timeout_ms via SHOREGUARD_DB_* env vars.
- Graceful shutdown: LRO task cancellation (shutdown_lros()), webhook delivery task tracking with shutdown(), ordered resource disposal.
- Async engine disposal: dispose_async_engine() for clean DB shutdown.
- Docs: OIDC guide, security concepts, troubleshooting, audit guide, webhooks guide, Prometheus integration, gateway roles admin.
- 108+ new tests: OIDC, input validation, SSRF, webhook secret leak. Total: ~1194.
Changed¶
- Typed API response models: extra="forbid" on Category-A models prevents uncontrolled field leakage through extra="allow".
- Webhook HMAC secret no longer exposed on GET/LIST endpoints; only returned on create (WebhookCreateResponse).
- Docs restructured: guide/ → guides/, new concepts/ and integrations/ directories.
- graceful_shutdown_timeout default raised from 5 → 15 seconds.
Security¶
- Fixed webhook HMAC signing secret leak on all GET/PUT responses.
- SSRF protection for webhook target URLs.
- Input length/format validation on all mutation endpoints.
[0.22.0] — 2026-04-08¶
Added¶
- User groups / teams — named collections of users for group-based RBAC. Groups have a global role and optional per-gateway role overrides, mirroring the existing individual user role system.
- Group membership management: add/remove users to groups via API and frontend UI (/groups page with member modal).
- Group gateway-scoped roles: per-gateway role overrides for groups, reusing the gateway roles modal from user/SP management.
- 4-tier role resolution — individual gateway > group gateway > individual global > group global. When a user belongs to multiple groups the highest rank wins.
- Group audit trail: group.create, group.update, group.delete, group.member.add, group.member.remove, group.gateway_role.set, group.gateway_role.remove actions logged.
- 65 new tests: CRUD, membership, cascade deletes, role resolution priority chain, and HTTP-level endpoint tests (test_group_rbac.py). Total: 1086.
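The 4-tier resolution order listed above can be sketched as a small pure function. The role names and their ranking, the data shapes, and `resolve_role` itself are assumptions for this example; only the tier order and the "highest rank wins across groups" rule come from the entry.

```python
from typing import Iterable, Optional

# Hypothetical role ranking (higher number = more privilege).
ROLE_RANK = {"viewer": 0, "operator": 1, "admin": 2}


def highest(roles: Iterable[Optional[str]]) -> Optional[str]:
    """Return the highest-ranked role, ignoring missing ones."""
    present = [role for role in roles if role is not None]
    return max(present, key=ROLE_RANK.__getitem__) if present else None


def resolve_role(gateway: str, user_global: Optional[str],
                 user_gateway: dict[str, str], groups: list[dict]) -> Optional[str]:
    """Apply: individual gateway > group gateway > individual global > group global."""
    tiers = [
        user_gateway.get(gateway),
        highest(g["gateways"].get(gateway) for g in groups),
        user_global,
        highest(g["global"] for g in groups),
    ]
    for role in tiers:
        if role is not None:
            return role
    return None


groups = [
    {"global": "admin", "gateways": {"edge": "operator"}},
    {"global": None, "gateways": {"edge": "admin"}},
]
# Group gateway role (tier 2) outranks the user's global role (tier 3).
print(resolve_role("edge", "viewer", {}, groups))
# No gateway-scoped role on "core", so the user's global role applies.
print(resolve_role("core", "viewer", {}, groups))
```

Note that a lower tier never upgrades a higher one: on "core" the user stays a viewer even though one of their groups is a global admin.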
Changed¶
- Gateway roles modal: now supports user, sp, and group entity types.
[0.21.0] — 2026-04-07¶
Added¶
- Rate limiting: per-IP sliding-window rate limiter (api/ratelimit.py) with configurable limits via SHOREGUARD_RATELIMIT_* env vars.
- Account lockout: progressive lockout after failed login attempts (api/auth.py) with configurable thresholds.
- Security headers: X-Content-Type-Options, X-Frame-Options, Strict-Transport-Security, etc. via middleware (api/security_headers.py).
- Password strength validation: api/password.py with length, complexity, and common-password checks.
- Structured error codes: machine-readable code field (e.g. GATEWAY_NOT_FOUND, RATE_LIMITED) in all error responses (api/error_codes.py, api/errors.py).
- WebSocket server heartbeat: periodic {"type": "heartbeat"} messages during idle with dropped_events counter for backpressure visibility.
- WebSocket backpressure disconnect: slow consumers disconnected after configurable consecutive drop limit (SHOREGUARD_WS_BACKPRESSURE_DROP_LIMIT).
- WebSocket client reconnect hardening: heartbeat watchdog (45 s timeout), max retry limit (20), exponential backoff, and sg:ws-state events for connection state UI indicator.
- Prometheus metrics: /metrics endpoint with login and rate-limit counters.
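For readers unfamiliar with the technique, a per-IP sliding-window limiter can be sketched in a few lines. This is a generic in-process illustration, not the code in api/ratelimit.py; the class name, parameters, and injected clock are all hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within the last `window_s` seconds."""

    def __init__(self, limit: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self._clock = clock
        self._hits: dict[str, deque] = defaultdict(deque)

    def allow(self, client_ip: str) -> bool:
        now = self._clock()
        hits = self._hits[client_ip]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


# A fake clock makes the behaviour easy to follow without sleeping.
fake_now = [0.0]
limiter = SlidingWindowLimiter(limit=2, window_s=10.0, clock=lambda: fake_now[0])
first, second, third = (limiter.allow("203.0.113.7") for _ in range(3))
fake_now[0] = 10.0
after_window = limiter.allow("203.0.113.7")
print(first, second, third, after_window)
```

State like this lives in one process, which is why later releases warn about multi-replica deployments with an in-process limiter.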
Changed¶
- Dynamic __version__: shoreguard/__init__.py now reads version from package metadata (importlib.metadata) instead of hardcoded string; single source of truth in pyproject.toml.
- Deploy configs: consolidated Caddyfile and standalone compose into deploy/ directory.
- .gitignore: trimmed from ~200 to ~30 lines, removed stale entries.
[0.20.0] — 2026-04-07¶
Added¶
- Pydantic Settings: centralized shoreguard/settings.py with 11 nested sub-models replacing 11 os.environ.get() reads and 60+ hardcoded constants. All tuneable via SHOREGUARD_* env vars (e.g. SHOREGUARD_GATEWAY_BACKOFF_MIN, SHOREGUARD_OPS_RUNNING_TTL).
- Pydantic response models: typed response schemas (schemas.py) on all API endpoints with OpenAPI tag metadata.
- Request-ID tracking: X-Request-ID header propagated through middleware, available in all log records via %(request_id)s.
- Prometheus latency metrics: shoreguard_request_duration_seconds histogram with method/path/status labels, plus /metrics endpoint.
- Structured JSON logging: SHOREGUARD_LOG_FORMAT=json for machine-readable log output.
- GZip compression: responses ≥ 1 KB automatically compressed via Starlette GZip middleware.
- Audit pagination: GET /api/audit supports offset / limit with items / total response format.
- Input validation module: api/validation.py with reusable description, label, and gateway-name validators.
- DB-backed operations: AsyncOperationService with SQLAlchemy async, orphan recovery, and configurable retention.
- SSE streaming for LROs: GET /api/operations/{id}/stream streams real-time status/progress updates via Server-Sent Events.
- run_lro helper: api/lro.py with idempotency-key support, automatic 202 response, and background task lifecycle.
- Async DB layer: init_async_db() / get_async_session_factory() in db.py for aiosqlite-backed async sessions.
- Performance indexes: migrations 008–010 adding indexes on audit timestamp, webhook delivery, and operation status.
- Gateway register page: /gateways/new with breadcrumb navigation, description and labels fields (replaces modal).
- Provider create/edit pages: /gateways/{gw}/providers/new and .../providers/{name}/edit with Alpine.js providerForm() component (replaces modal).
Changed¶
- Consistent pagination: all list endpoints return {"items": [...], "total": N} format.
- CLI env-var hack removed: cli.py no longer writes os.environ["SHOREGUARD_*"]; uses override_settings() instead.
- Frontend modals→pages: gateway registration and provider create/edit modals replaced with dedicated page routes and breadcrumb navigation.
Removed¶
- In-memory LRO store: replaced by DB-backed AsyncOperationService.
- Hardcoded constants: _BACKOFF_MIN, _MAX_RESULT_BYTES, DELIVERY_TIMEOUT, MAX_DESCRIPTION_LEN, etc. now read from Settings.
- Gateway/provider modals: #registerGatewayModal and #createProviderModal removed from frontend templates.
Dependencies¶
- Added pydantic-settings>=2.0.
[0.19.0] — 2026-04-07¶
Added¶
- Async sandbox exec: POST /sandboxes/{name}/exec now returns a long-running operation (LRO) with polling pattern instead of blocking.
- Exec audit fields: command, exit_code, and status added to sandbox.exec audit detail for full traceability.
- mTLS auto-generation: openshell-client-tls secret with CA cert is automatically created for OpenShell gateway connections.
- Docker Compose profiles: optional paperclip profile for Paperclip integration alongside ShoreGuard.
- Caddy reverse proxy: new Caddy service and OpenClaw profile in the deploy stack for production-ready TLS termination.
- Hardened OpenClaw sandbox — dedicated sandbox image with security documentation and deployment via generic ShoreGuard APIs.
- Deploy stack README — ecosystem section and deploy stack overview added to the project README.
Fixed¶
- gRPC exec timeout — default timeout raised to 600 s for long-running agent sessions.
- SetClusterInference: no_verify flag now correctly set in the gRPC request.
- LOCAL_MODE endpoints: private IP addresses are now accepted when registering gateways in local mode.
- Gateway context: switched from ContextVar to request.state to avoid cross-request leaks.
- openshell-client-tls: secret now includes the CA certificate for proper chain verification.
- sandbox_meta_store import — resolved binding issue that caused startup failures.
- Exec tests — aligned with async LRO pattern and added shlex validation before returning 202.
Changed¶
- README — redesigned with updated architecture diagram and sandbox vision narrative.
- Architecture diagram — added multi-gateway topology, observability components, unified operators, agent platform UIs, and plugins.
- Mermaid diagrams — improved contrast for dark-mode rendering.
Docs¶
- Deploy guide expanded with profiles and Paperclip integration steps.
- Plugin install command updated to @shoreguard/paperclip-plugin from npm.
- Discord reference removed from OpenClaw README.
[0.18.1] — 2026-04-06¶
Added¶
- Sandbox metadata UI — labels and description are now visible and editable across the entire frontend:
  - Detail page: Metadata fieldset with description input, label badges (add/remove), and Save button (PATCH, operator role).
  - Wizard: Description and labels fields in Step 2 (Configuration), shown in Step 4 summary, included in create payload.
  - List page: Description column (truncated) and label badges inline under sandbox name.
[0.18.0] — 2026-04-05¶
Added¶
- Sandbox labels & description — sandboxes now support labels (key-value pairs) and description metadata, stored in ShoreGuard's DB (OpenShell is unaware). New sandbox_meta table with per-gateway scoping.
- PATCH /sandboxes/{name} — update labels and/or description on existing sandboxes (requires operator role).
- Label filtering — GET /sandboxes?label=key:value filters sandboxes by labels (AND-combined, same semantics as gateway list).
- Alembic migration 007 — creates sandbox_meta table with (gateway_name, sandbox_name) unique constraint.
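
The AND-combined label filter can be illustrated with a small sketch. The function names and the in-memory dictionaries are hypothetical; the real implementation queries the sandbox_meta table.

```python
def parse_label_filters(params: list[str]) -> dict[str, str]:
    """Turn repeated ?label=key:value parameters into a dict."""
    filters: dict[str, str] = {}
    for raw in params:
        key, sep, value = raw.partition(":")
        if not sep or not key:
            raise ValueError(f"expected key:value, got {raw!r}")
        filters[key] = value
    return filters


def matches(labels: dict[str, str], filters: dict[str, str]) -> bool:
    """AND semantics: every requested pair must be present on the sandbox."""
    return all(labels.get(key) == value for key, value in filters.items())
```

With this shape, an empty filter list matches every sandbox, which is the usual expectation for an unfiltered list call.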
[0.17.0] — 2026-04-05¶
Fixed¶
- Exception handling — narrowed overly broad except Exception blocks in health check logging, webhook delivery, reconnection loop, and operation lifecycle. All handlers now log with full traceback and re-raise or return safe error responses.
- SP expiry timezone — expires_at comparison in _lookup_sp_identity now correctly handles naive datetimes by normalising to UTC before comparison.
- Bootstrap admin — bootstrap_admin_user() no longer raises on duplicate email when called during startup with an existing database.
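
The SP expiry fix addresses the fact that Python refuses to compare naive and aware datetimes. A minimal sketch of the normalisation step follows; it assumes naive values were stored as UTC, and the helper names are illustrative rather than the ones used in _lookup_sp_identity.

```python
from datetime import datetime, timezone
from typing import Optional


def as_utc(ts: datetime) -> datetime:
    """Attach UTC to naive values (assumed to be stored as UTC); convert aware ones."""
    if ts.tzinfo is None:
        return ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(timezone.utc)


def is_expired(expires_at: Optional[datetime], now: Optional[datetime] = None) -> bool:
    """A key without expires_at never expires in this sketch."""
    if expires_at is None:
        return False
    current = now if now is not None else datetime.now(timezone.utc)
    return as_utc(expires_at) <= as_utc(current)
```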
Changed¶
- Logging consistency — webhook delivery success/failure, gateway reconnection attempts, and operation lifecycle transitions now log at appropriate levels (INFO for business events, WARNING for recoverable errors, DEBUG for technical details).
- Docstrings — all public functions and classes pass pydoclint with strict Google-style checking (raises, return types, class attributes).
- Type hints — require_role return type corrected. Zero pyright errors on standard mode.
- CI — Python 3.14 target for CI matrix, ruff, and pyright. Bumped docker/setup-buildx-action to v4, docker/build-push-action to v7, astral-sh/setup-uv to v7.
Added (tests only)¶
- Webhook route tests — 24 integration tests covering CRUD, validation, role enforcement (admin/viewer/unauthenticated), and service-not-initialised.
- Error-case tests — 13 tests across approvals (4), policies (3), providers (4), and sandboxes (2) for 404/409 error paths.
- Template tests — 9 tests for sandbox_templates.py (list, get, path traversal protection) and template route handlers.
- Webhook delivery tests — 13 tests for delivery records, cleanup, email channel dispatch, and the fire_webhook convenience function.
- Auth endpoint tests — 31 tests for pages.py covering setup wizard, login validation, user CRUD, gateway role management, self-registration, and service principal management error paths.
- Total: 915 tests (+86 from 0.16.2), coverage 82% → 84%.
[0.16.0] — 2026-04-04¶
Added¶
- Webhook delivery log — new webhook_deliveries table tracks every delivery attempt with status, response code, error message, and timestamps. Query via GET /api/webhooks/{id}/deliveries.
- Webhook retry with exponential backoff — HTTP 5xx and network errors trigger up to 3 retries (5s → 30s → 120s). Client 4xx errors fail immediately.
- New webhook events — gateway.registered, gateway.unregistered, inference.updated, policy.updated fire automatically after the corresponding API actions.
- Enriched sandbox.created payload — now includes image, gpu, and providers fields from the creation request.
- API-key rotation — POST /api/auth/service-principals/{id}/rotate generates a new key and immediately invalidates the old one (admin only).
- API-key expiry — optional expires_at timestamp on service principals. Expired keys are rejected at auth time.
- API-key prefix — new keys are prefixed with sg_ and the first 12 characters are stored as key_prefix for identification without exposing the full key. Legacy keys remain functional.
- Sandbox templates — YAML-based full-stack templates (data-science, web-dev, secure-coding) that pre-configure image, GPU, providers, environment variables, and policy presets. Available via GET /api/sandbox-templates and integrated into the wizard.
- Alembic migration 005 — adds webhook_deliveries table.
- Alembic migration 006 — adds key_prefix and expires_at columns to service_principals table.
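
The retry schedule for webhook delivery can be sketched as follows. Only the delays (5 s, 30 s, 120 s) and the 4xx/5xx split come from the entry above; the `send` callable, the use of OSError to model network failures, and the return shape are assumptions, and this is not the actual _deliver_http_with_retry.

```python
import time
from typing import Callable, Tuple

RETRY_DELAYS = (5, 30, 120)  # seconds to wait before each of the 3 retries


def deliver_with_retry(
    send: Callable[[], int],
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[bool, int]:
    """send() returns an HTTP status or raises OSError on a network failure.

    Returns (delivered, attempts_made).
    """
    attempts = 0
    for delay in (*RETRY_DELAYS, None):  # None marks the final attempt
        attempts += 1
        try:
            status = send()
        except OSError:
            status = None  # treat as a retryable network error
        if status is not None and 200 <= status < 300:
            return True, attempts
        retryable = status is None or status >= 500
        if not retryable or delay is None:
            return False, attempts
        sleep(delay)
    return False, attempts
```

Injecting `sleep` keeps the schedule testable without waiting for real time to pass.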
Changed¶
- Webhook service — fire() now creates delivery records per target before dispatching. _deliver_http replaced by _deliver_http_with_retry with retry logic.
- Service principal creation — keys now use sg_ prefix format. list_service_principals() returns key_prefix and expires_at fields.
- Users UI — SP table shows key prefix, expiry badge (green/yellow/red), and rotate button. SP creation form includes optional expiry date.
- Wizard UI — step 1 shows sandbox template cards above community sandboxes. Selecting a template pre-fills all fields and jumps to summary. "Customize" button navigates back to configuration step.
- Formatters — _EVENT_LABELS, _SLACK_COLORS, _DISCORD_COLORS extended for 4 new events. _payload_fields() extracts provider, model, image, and endpoint fields.
- Cleanup loop — webhook delivery records older than 7 days are purged alongside operations and audit entries.
- Documentation — API reference updated with sandbox templates, delivery log, rotate endpoint, and new event types. Service principals guide expanded with key rotation, expiry, and prefix sections. Sandbox guide includes templates section with wizard integration.
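
As an illustration of the sg_ key format described in this release, the sketch below generates a key, derives the 12-character key_prefix, and hashes the full key with SHA-256 (the hashing scheme named in the 0.4.0 notes). The random part's length and the use of `secrets.token_urlsafe` are assumptions.

```python
import hashlib
import secrets
from typing import Tuple


def generate_api_key() -> Tuple[str, str, str]:
    """Return (plaintext_key, key_prefix, sha256_hex).

    Only the prefix and the hash would be persisted; the plaintext key
    is shown to the caller once.
    """
    key = "sg_" + secrets.token_urlsafe(32)  # random part length is an assumption
    key_prefix = key[:12]
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return key, key_prefix, digest
```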
[0.15.0] — 2026-04-04¶
Added¶
- Gateway description — free-text description field on gateways for documenting purpose and context (e.g. "Production EU-West for ML team").
- Gateway labels — key-value labels (env=prod, team=ml, region=eu-west) stored as labels_json column. Kubernetes-style key validation, max 20 labels per gateway, values up to 253 chars.
- PATCH /api/gateway/{name} — new endpoint to update gateway description and/or labels after registration (admin only). Supports partial updates via Pydantic model_fields_set.
- Label filtering — GET /api/gateway/list?label=env:prod&label=team:ml filters gateways by labels (AND semantics).
- Alembic migration 004 — adds description (Text) and labels_json (Text) columns to the gateways table.
Changed¶
- Gateway list UI — new description column (hidden on small screens) and label badges displayed below gateway names.
- Gateway detail UI — description and labels shown in details card with inline edit form (admin only).
- Gateway registration modal — new description textarea and labels textarea (one key=value per line).
- GatewayRegistry — register(), _to_dict(), and list_all() extended for description, labels, and label filtering. New update_gateway_metadata() method with sentinel-based partial updates.
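
Sentinel-based partial updates distinguish "field not supplied" from "field explicitly set to None". The sketch below shows the idea on a plain dict; the `_UNSET` name and the standalone function are illustrative and do not reproduce the actual GatewayRegistry method.

```python
from typing import Any

_UNSET: Any = object()  # unique marker meaning "caller did not pass this field"


def update_metadata(
    record: dict,
    description: Any = _UNSET,
    labels: Any = _UNSET,
) -> dict:
    """Return a copy of record with only the supplied fields changed."""
    updated = dict(record)
    if description is not _UNSET:
        updated["description"] = description  # None clears the description
    if labels is not _UNSET:
        updated["labels"] = labels
    return updated
```

A plain `None` default could not express "clear the description", which is why a dedicated sentinel object is used.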
[0.14.0] — 2026-04-04¶
Added¶
- Notification channels — webhooks now support channel_type field with values generic (default, HMAC-signed), slack (Block Kit formatting), discord (embed formatting), and email (SMTP delivery). Alembic migration 003 adds channel_type and extra_config columns to the webhooks table.
- Message formatters — shoreguard/services/formatters.py with channel-specific formatting: Slack Block Kit with mrkdwn and color coding, Discord embeds with color-coded fields, plain-text email bodies.
- Prometheus /metrics endpoint — unauthenticated, exposes shoreguard_info, shoreguard_gateways_total (by status), shoreguard_operations_total (by status), shoreguard_webhook_deliveries_total (success/failed), and shoreguard_http_requests_total (by method and status code).
- HTTP request counting middleware — counts all API requests by method and status code for Prometheus.
- OperationStore.status_counts() — thread-safe method returning operation counts grouped by status.
Changed¶
- WebhookService refactored for channel-type-aware delivery: _deliver dispatches to _deliver_http (generic/slack/discord) or _deliver_email. HMAC signature only applied for generic channel type.
- Webhook API routes accept channel_type and extra_config in create and update requests. Email channel requires smtp_host and to_addrs in extra_config.
- Webhook API docs expanded — channel types table, email extra_config example, corrected event types, Prometheus metrics table with scrape config.
- Deployment docs — new monitoring section with Prometheus scrape config.
- README — notifications and Prometheus metrics in features list and roadmap.
- Version bumped to 0.14.0.
- 791 tests (up from 770).
Fixed¶
- deps.py type safety — get_client(), set_client(), and reset_backoff() now raise HTTPException(500) when called without a gateway context instead of passing None to the gateway service. Fixes 3 pre-existing pyright reportArgumentType errors.
Dependencies¶
- Added prometheus_client>=0.21.
- Added aiosmtplib>=3.0.
[0.13.0] — 2026-04-04¶
Added¶
- Docker deployment polish — OCI image labels in Dockerfile, restart policies, dedicated shoreguard-net bridge network, configurable port and log level, and resource limits in docker-compose.yml.
- .env.example — documented all environment variables with required/optional separation for quick Docker Compose setup.
- docker-compose.dev.yml — standalone development compose with SQLite, hot-reload, no-auth, and local gateway mode. No PostgreSQL required.
- Justfile — task runner with dev, test, lint, format, check, docker-build, docker-up, docker-down, docs, and sync targets.
- Webhooks — event subscriptions with HMAC-SHA256 signing, Alembic migration 002, WebhookService with async delivery, and admin API (POST/GET/DELETE /api/webhooks).
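
For receivers of HMAC-SHA256 signed webhooks, verification generally looks like the sketch below. The changelog does not state how the signature is transported or encoded, so the hex encoding and the function names here are assumptions.

```python
import hashlib
import hmac


def sign_payload(secret: str, body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 over the raw request body."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def verify_signature(secret: str, body: bytes, signature: str) -> bool:
    """Constant-time comparison to avoid leaking how many characters matched."""
    return hmac.compare_digest(sign_payload(secret, body), signature)
```

The signature must be computed over the exact bytes received; re-serialising parsed JSON before verifying tends to break the check.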
Changed¶
- README overhaul — new "Why ShoreGuard?" section, dual quick-start paths (pip + Docker Compose), collapsible screenshot gallery, expanded development section with Justfile references, updated roadmap.
- Deployment docs expanded — step-by-step Docker setup, full environment variable reference table, backup/restore procedures, network isolation explanation, upgrade process, and troubleshooting section.
- Contributing docs expanded — "Clone to first sandbox" walkthrough, Justfile task runner section, corrected clone URL and port references.
- Local mode docs expanded — developer workflow section with --no-auth combination, SQLite defaults, and state reset instructions.
- mkdocs nav — added migration runbook to admin guide navigation.
- Version bumped to 0.13.0.
Fixed¶
- Duplicate auth log — removed redundant "Authentication DISABLED" warning from init_auth() that appeared unformatted when running with --reload.
- Logger name formatting — replaced one-shot name rewriting with a custom Formatter that strips the shoreguard. prefix at render time, so late-created loggers (e.g. shoreguard.db) are also shortened correctly.
- Contributing docs — corrected clone URL (your-org → FloHofstetter) and port reference (8000 → 8888).
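
Stripping a logger-name prefix at render time, rather than renaming loggers once at startup, can be done with a small `logging.Formatter` subclass. The class name below is hypothetical and the sketch is not the project's actual formatter.

```python
import logging


class ShortNameFormatter(logging.Formatter):
    """Render 'shoreguard.db' as 'db' without mutating the logger itself."""

    PREFIX = "shoreguard."

    def format(self, record: logging.LogRecord) -> str:
        original = record.name
        if original.startswith(self.PREFIX):
            record.name = original[len(self.PREFIX):]
        try:
            return super().format(record)
        finally:
            record.name = original  # other handlers still see the full name
```

Because the rewrite happens per record, loggers created after logging is configured are shortened as well.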
[0.12.0] — 2026-04-03¶
Added¶
- Inference timeout — timeout_secs field on PUT /api/gateways/{gw}/inference allows configuring per-route request timeouts (0 = default 60s). Displayed in the gateway detail inference card.
- L7 query parameter matchers — network policy rules can now match on URL query parameters using glob (single pattern) or any (list of patterns) matchers.
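
To make the glob and any matchers concrete, here is a sketch of how such a rule could be evaluated. The rule shape (a dict per parameter) is hypothetical, and `fnmatchcase` is used as a stand-in for whatever glob semantics the policy engine actually applies.

```python
from fnmatch import fnmatchcase
from urllib.parse import parse_qs


def matcher_matches(matcher: dict, value: str) -> bool:
    """Evaluate one matcher: {'glob': pattern} or {'any': [patterns]}."""
    if "glob" in matcher:
        return fnmatchcase(value, matcher["glob"])
    if "any" in matcher:
        return any(fnmatchcase(value, pattern) for pattern in matcher["any"])
    raise ValueError("matcher needs a 'glob' or 'any' key")


def query_matches(rules: dict, query: str) -> bool:
    """Every named parameter must be present with at least one matching value."""
    params = parse_qs(query, keep_blank_values=True)
    return all(
        any(matcher_matches(matcher, value) for value in params.get(name, []))
        for name, matcher in rules.items()
    )
```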
Changed¶
- Protobuf stubs regenerated from OpenShell v0.0.22 (was ~v0.0.16).
[0.11.0] — 2026-04-03¶
Added¶
- Docker containerisation — multi-stage Dockerfile and docker-compose.yml (ShoreGuard + PostgreSQL) for production deployments.
- Health probes — unauthenticated GET /healthz (liveness) and GET /readyz (readiness — checks database and gateway service).
- protobuf runtime dependency — added to pyproject.toml (was previously only available transitively via grpcio-tools in dev).
- .dockerignore for minimal build context.
Fixed¶
- PostgreSQL migration — users.is_active column used server_default=sa.text("1") which fails on PostgreSQL. Changed to sa.true() for cross-database compatibility.
- Gateway health endpoint — GET /api/gateways/{gw}/health called get_client() directly instead of via dependency injection, causing GatewayNotConnectedError to return 200 instead of 503.
Changed¶
- FastAPI version field now matches the package version (was stale at 0.8.0).
[0.10.0] — 2026-04-03¶
Removed¶
- "Active gateway" concept — the server-side
active_gateway file (~/.config/openshell/active_gateway) is no longer read or written by the web service. Every gateway operation now requires an explicit gateway name from the URL. Removed endpoints: POST /{name}/select, GET /info, POST /start, POST /stop, POST /restart (non-named variants). The named variants (/{name}/start etc.) remain unchanged.
- active field removed from all gateway API responses (list, info, register).
- Service methods removed: get_active_name(), write_active_gateway(), select(), health().
- Auto-select of first registered gateway removed from register().
Changed¶
- Stateless gateway routing — the name parameter is now required on get_client(), set_client(), reset_backoff(), get_info(), and get_config(). No method falls back to the active gateway file anymore.
- GET /info → GET /{name}/info — gateway info endpoint is now name-scoped.
- GET /config → GET /{name}/config — gateway config endpoint is now name-scoped.
- LocalGatewayManager — start(), stop(), restart() now require a gateway name. Connection and client management simplified: always operates on the explicitly named gateway.
- Frontend inference config — now shows when gateway is connected (gw.connected) instead of when it was the "active" gateway (gw.active). Gateway list highlights connected gateways.
- Health store — uses GW directly for gateway name instead of fetching from /api/gateway/info.
- Version bumped to 0.10.0.
- 756 tests (down from 774 — 18 tests for removed active-gateway functionality deleted).
[0.9.0] — 2026-04-03¶
Added¶
- Sidebar navigation — collapsible sidebar with grouped navigation (Gateways, Policies, gateway-scoped Sandboxes/Providers, admin-only Audit/Users). Replaces the icon buttons in the topbar. Responsive: collapses to hamburger menu on mobile (<768px).
- Light/dark theme toggle — switchable via sidebar button, persisted in localStorage. All custom CSS variables scoped to [data-bs-theme]; Bootstrap 5.3 handles the rest automatically.
Fixed¶
- Audit page breadcrumbs — audit.html now has breadcrumbs and uses the standard layout instead of container-fluid.
- Dashboard breadcrumbs — dashboard.html now has breadcrumbs.
- Theme-aware tables — removed hardcoded table-dark class from all templates and JS files; tables now adapt to the active theme.
[0.8.0] — 2026-04-03¶
Fixed¶
- RBAC response_model crash — added response_model=None to 17 route decorators (16 in pages.py, 1 in main.py) returning TemplateResponse, HTMLResponse, or RedirectResponse. Prevents FastAPI Pydantic serialization errors on non-JSON responses.
- IntegrityError/ValueError split — gateway-role SET endpoints now return 409 on constraint conflicts and 404 on missing user/SP/gateway, instead of a blanket 404 for both.
Added¶
- Migration verification tests — 5 tests (tests/test_migrations.py) covering SQLite and PostgreSQL: fresh-DB, head revision, schema-matches-models, downgrade, and PostgreSQL fresh-DB.
- RBAC regression & validation tests — 10 new tests (tests/test_rbac.py) for DELETE gateway-role 404s, invalid gateway name 400s, and invalid role 400s (user and SP symmetry).
- Migration check script — scripts/verify_migrations.sh runs all Alembic migrations against a fresh database and verifies the final revision.
- Migration CI workflow — .github/workflows/test-migrations.yml runs migration tests on SQLite and PostgreSQL for PRs touching migrations or models.
- PR template — .github/PULL_REQUEST_TEMPLATE.md with migration checklist.
- Migration runbook — docs/admin/migration-runbook.md with backup, upgrade, and rollback procedures.
- Warning logs on error paths — all gateway-role endpoints now log logger.warning() for invalid names, invalid roles, not-found, and conflict responses.
- Backoff for background tasks — _cleanup_operations() and _health_monitor() double their interval (up to a cap) after 10 consecutive failures and reset on success.
- postgres pytest marker in pyproject.toml.
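
The background-task backoff can be modelled as a small state holder. The entry does not say whether the interval doubles once or on every further failure past the threshold; the sketch below assumes the latter, and the class name and parameters are illustrative.

```python
class BackoffInterval:
    """Tracks the sleep interval for a periodic background task."""

    def __init__(self, base: float, cap: float, threshold: int = 10) -> None:
        self.base = base
        self.cap = cap
        self.threshold = threshold
        self.failures = 0
        self.current = base

    def record_success(self) -> None:
        """Any success resets both the failure streak and the interval."""
        self.failures = 0
        self.current = self.base

    def record_failure(self) -> None:
        """From the threshold-th consecutive failure on, double up to the cap."""
        self.failures += 1
        if self.failures >= self.threshold:
            self.current = min(self.current * 2, self.cap)
```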
Security¶
- Shell injection fix — verify_migrations.sh passes database URL via os.environ instead of bash interpolation in a Python heredoc.
Changed¶
- Migrations squashed — all 7 incremental migrations replaced by a single 001_initial_schema.py that creates the final schema directly. Existing databases must be reset (rm ~/.config/shoreguard/shoreguard.db).
- Migration CI caches uv dependencies via enable-cache: true.
[0.7.1] — 2026-04-01¶
Added¶
- API reference docs — mkdocstrings[python] generates reference pages from existing Google-style docstrings. New pages under docs/reference/: Client, Services, API Internals, Models, and Config & Exceptions.
[0.7.0] — 2026-04-01¶
Added¶
- pydoclint integration — new [tool.pydoclint] section in pyproject.toml with maximum strictness (Google-style, skip-checking-short-docstrings = false, all checks enabled). Added pydoclint >= 0.8 as dev dependency.
- Comprehensive Google-style docstrings — all 1 193 pydoclint violations resolved across the entire codebase. Every function, method, and class now has Args:, Returns:, Raises:, and Yields: sections as appropriate. Compatible with mkdocstrings for future API reference generation.
- Page templates — dedicated HTML templates for approval edit, approval history, gateway register, gateway roles, policy revisions, and provider form pages, replacing Bootstrap modal dialogs.
Changed¶
- Database schema cleanup (migration 007):
  - Timestamp columns (registered_at, last_seen, created_at, last_used, timestamp) converted from String to DateTime(timezone=True) across gateways, users, service_principals, and audit_log tables.
  - gateways table rebuilt with auto-incrementing integer primary key (id) replacing the old name-based primary key.
  - user_gateway_roles and sp_gateway_roles migrated from gateway_name (String FK) to gateway_id (Integer FK) with ON DELETE CASCADE.
  - audit_log column gateway renamed to gateway_name; new gateway_id FK added with ON DELETE SET NULL.
- Audit service refactored — uses with session_factory() context manager instead of manual session.close() in finally blocks. Gateway ID resolution via FK lookup on write.
- Version bumped to 0.7.0.
Fixed¶
- GatewayNotConnectedError in _try_connect_from_config — exception is now caught instead of propagating as an unhandled error.
- request.state.role not set from _require_page_auth — page auth guard now correctly stores the resolved role in request state.
[0.6.0] — 2026-03-31¶
Added¶
- Gateway-scoped RBAC — per-gateway role overrides for users and service principals. Alembic migration 006 adds user_gateway_roles and sp_gateway_roles tables.
- Policy diff viewer — compare two policy revisions side-by-side.
- Hardened RBAC — async correctness improvements and additional test coverage.
[0.5.0] — 2026-03-30¶
Added¶
- Persistent audit log — all state-changing operations (sandbox/policy/gateway CRUD, user management, approvals, provider changes) are recorded in a database table with actor, role, action, resource, gateway context, and client IP.
- Audit API — GET /api/audit lists entries with filters (actor, action, resource type, date range). GET /api/audit/export?format=csv|json exports the full log. Both endpoints are admin-only.
- Audit page — /audit admin page with filter inputs, pagination, and CSV/JSON export buttons. Built with Alpine.js.
- Alembic migration 005 — audit_log table with indexes on timestamp, actor, action, and resource type.
- Audit cleanup — entries older than 90 days are automatically purged by the existing background cleanup task.
Fixed¶
- Fail-closed auth — when the database is unavailable, requests are now denied with 503 instead of silently granting admin access.
- Async audit logging — audit_log() is now async and runs DB writes in a thread pool via asyncio.to_thread, preventing event-loop blocking on every state-changing request.
- UnboundLocalError in AuditService — log(), list(), and cleanup() no longer crash if the session factory itself raises; session is now guarded with None checks in except/finally blocks.
- Audit actor for auth events — login, setup, register, and invite-accept now set request.state.user_id before calling audit_log(), so the audit trail records the actual user instead of "unknown".
- Failed login auditing — failed login attempts now produce a user.login_failed audit entry, enabling brute-force detection.
- Authorization failure auditing — require_role() now writes an auth.forbidden audit entry when a user is denied access.
- Audit ordering in approvals — all six approval endpoints now log the audit entry after the operation succeeds, preventing false entries on failure.
- Conditional delete audit — sandbox.delete and provider.delete only write audit/log entries when the resource was actually deleted.
- Async background cleanup — the periodic cleanup task now uses asyncio.to_thread for DB calls instead of blocking the event loop.
- Gateway retry button — the "Retry" button in the gateway error banner now correctly calls Alpine.store('health').check() instead of the removed checkGatewayHealth() function.
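
The asyncio.to_thread pattern referenced in the audit logging and cleanup fixes can be shown in a few lines. The in-memory list stands in for a blocking database write, and both function names are illustrative.

```python
import asyncio
import threading


def write_audit_entry(entries: list, action: str) -> None:
    """Stand-in for a blocking DB write; records which thread ran it."""
    entries.append((action, threading.current_thread().name))


async def audit_log(entries: list, action: str) -> None:
    """Run the blocking write in a worker thread so the event loop stays free."""
    await asyncio.to_thread(write_audit_entry, entries, action)
```

`asyncio.to_thread` requires Python 3.9 or newer and uses the event loop's default thread pool executor.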
Changed¶
- Frontend migrated to Alpine.js — all 20+ pages rewritten from Vanilla JS template-literal rendering (innerHTML = renderX(data)) to Alpine.js reactive directives (x-data, x-for, x-text, x-show, @click). No build step required — Alpine.js loaded via CDN.
- Three Alpine stores replace scattered global state:
  - auth — role, email, authenticated status (replaces inline script + window.SG_ROLE)
  - toasts — notification queue (replaces showToast() DOM manipulation)
  - health — gateway connectivity monitoring (replaces checkGatewayHealth() globals)
- XSS surface reduced — Alpine's x-text auto-escapes all dynamic content, eliminating the need for manual escapeHtml() calls in templates.
- Render functions removed — renderGatewayTable(), renderSandboxList(), renderDashboard(), and ~50 other renderX() functions replaced by declarative Alpine templates in HTML.
- app.js slimmed — reduced from ~340 lines to ~95 lines. Only retains apiFetch(), showConfirm(), escapeHtml(), formatTimestamp(), navigateTo(), and URL helpers.
- WebSocket integration — sandbox detail, logs, and approvals pages receive live updates via CustomEvent dispatching from websocket.js to Alpine components.
- Version bumped to 0.5.0.
- 717 tests (up from 710), including audit service, API route, and DB schema tests.
[0.4.0] — 2026-03-30¶
Added¶
- User-based RBAC — three-tier role hierarchy (admin → operator → viewer) replaces the single shared API key. Users authenticate with email + password via session cookies; service principals use Bearer tokens for API/CI access.
- Invite flow — admins invite users by email. The invite generates a single-use, time-limited token (7 days). The invitee sets their password on the /invite page and receives a session cookie.
- Self-registration — opt-in via SHOREGUARD_ALLOW_REGISTRATION=1. New users register as viewers. Disabled by default.
- Setup wizard — first-run /setup page creates the initial admin account. All API access is blocked until setup is complete.
- Service principals — named API keys with roles, created by admins. Keys are SHA-256 hashed (never stored in plaintext). last_used timestamp tracked on each request.
- User management UI — /users page for admins with invite form, role badges, and delete actions. Dedicated /users/new and /users/new-service-principal pages replace the old modal dialogs.
- Error pages — styled error pages for 403, 404, and other HTTP errors instead of raw JSON responses in the browser.
- User email in navbar — logged-in user email and role badge shown in the navigation bar.
- Alembic migrations 002–004 — api_keys table, users + service_principals tables with FK constraints, invite token hashing.
- CLI commands — create-user, delete-user, list-users, create-service-principal, delete-service-principal, list-service-principals.
- 710 tests (up from 635), including comprehensive RBAC, auth flow, invite expiry, self-deletion guard, and last-admin protection tests.
Changed¶
- Auth module rewritten — shoreguard/api/auth.py expanded from ~100 to ~700 lines. Session tokens are HMAC-signed with a 5-part format (nonce.expiry.user_id.role.signature). Roles are always verified against the database, not the session token, so demotions take effect immediately.
- All state-changing endpoints now enforce minimum role via require_role() FastAPI dependency (admin for user/SP management and gateway registration; operator for sandbox/policy/provider operations).
- Frontend role-based UI — buttons and nav items hidden based on role via data-sg-min-role attributes. escapeHtml() used consistently across all JavaScript files.
- Policies router split — preset routes (/api/policies/presets) are mounted globally; sandbox policy routes remain gateway-scoped only. Fixes a bug where /api/sandboxes/{name}/policy was reachable without gateway context.
- Audit logging standardised — all log messages use actor= consistently. Role denials now include method, path, and actor. IntegrityError on duplicate user/SP creation is logged. Logout resolves email instead of numeric user ID.
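
A token in the 5-part nonce.expiry.user_id.role.signature layout could be produced and checked roughly as follows. Only the field order comes from the entry above; the SHA-256 digest, hex encoding, nonce size, and function names are assumptions, and the role returned here would still need to be re-checked against the database.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional, Tuple


def create_session_token(
    secret: bytes, user_id: int, role: str, ttl: int = 3600,
    now: Optional[float] = None,
) -> str:
    """Build nonce.expiry.user_id.role.signature."""
    issued = time.time() if now is None else now
    payload = f"{secrets.token_hex(16)}.{int(issued) + ttl}.{user_id}.{role}"
    signature = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{signature}"


def verify_session_token(
    secret: bytes, token: str, now: Optional[float] = None,
) -> Optional[Tuple[int, str]]:
    """Return (user_id, role) for a valid, unexpired token, else None."""
    parts = token.split(".")
    if len(parts) != 5:
        return None
    _nonce, expiry, user_id, role, signature = parts
    payload = ".".join(parts[:4])
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature):
        return None
    current = time.time() if now is None else now
    if int(expiry) <= current:
        return None
    return int(user_id), role
```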
Fixed¶
- Timing attack in authenticate_user() — bcrypt verification now runs against a dummy hash when the user does not exist, preventing email enumeration via response time analysis.
- Policies router double-inclusion — the full policies router was mounted both globally and under the gateway prefix, exposing sandbox policy routes without gateway context. Now only preset routes are global.
- Missing exception handling — is_setup_complete(), list_users(), and list_service_principals() now catch SQLAlchemyError instead of letting database errors propagate as 500s.
- verify_password() bare Exception catch — narrowed to (ValueError, TypeError) to avoid masking unexpected errors.
- WebSocket XSS — sandboxName in toast messages is now escaped with escapeHtml(). Log level CSS class validated against a whitelist.
- delete_filesystem_path missing Query annotation — path parameter now uses explicit Query(...) instead of relying on FastAPI inference.
- Migration 004 downgrade documented as non-reversible (SHA-256 hashes cannot be reversed; pending invites are invalidated on downgrade).
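
The dummy-hash technique behind the timing-attack fix is sketched below. To stay within the standard library it uses PBKDF2 as a stand-in for bcrypt, and the in-memory `users` mapping and function names are hypothetical. The point is only that the expensive hash runs whether or not the account exists.

```python
import hashlib
import hmac
import os

ITERATIONS = 50_000  # illustrative work factor, not a recommendation


def hash_password(password: str, salt: bytes) -> bytes:
    """PBKDF2-HMAC-SHA256, standing in for bcrypt in this sketch."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)


# Computed once so that unknown users cost the same as known ones.
_DUMMY_SALT = os.urandom(16)
_DUMMY_HASH = hash_password("dummy-password", _DUMMY_SALT)


def authenticate(users: dict, email: str, password: str) -> bool:
    """users maps normalised email -> (salt, stored_hash)."""
    record = users.get(email.strip().lower())
    salt, stored = record if record is not None else (_DUMMY_SALT, _DUMMY_HASH)
    candidate = hash_password(password, salt)  # always executed
    matches = hmac.compare_digest(candidate, stored)
    return matches and record is not None
```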
Security¶
- Constant-time authentication prevents timing-based email enumeration.
- Invite tokens are SHA-256 hashed in the database (migration 004).
- Session invalidation on user deletion and deactivation — existing sessions are rejected on the next request.
- Last-admin guard with database-level FOR UPDATE lock prevents TOCTOU race.
- Self-deletion guard prevents admins from deleting their own account.
- Email normalisation (.strip().lower()) prevents duplicate accounts.
- Password length enforced (8–128 characters) on all auth endpoints.
- XSS escaping hardened across all frontend JavaScript files.
Dependencies¶
- Added pwdlib[bcrypt] — password hashing with bcrypt.
[0.3.0] — 2026-03-28¶
Added¶
- Central gateway management — Shoreguard transforms from a local sidecar into a central management plane for multiple remote OpenShell gateways (like Rancher for Kubernetes clusters). Gateways are deployed independently and registered with Shoreguard via API.
- SQLAlchemy ORM + Alembic — persistent gateway registry backed by SQLAlchemy with automatic embedded migrations on startup. SQLite by default, PostgreSQL via SHOREGUARD_DATABASE_URL for container deployments.
- Gateway registration API — POST /api/gateway/register to register remote gateways with endpoint, auth mode, and mTLS certificates. DELETE /api/gateway/{name} to unregister. POST /{name}/test-connection to explicitly test connectivity.
- ShoreGuardClient.from_credentials() — new factory method that accepts raw certificate bytes from the database instead of filesystem paths.
- Background health monitor — probes all registered gateways every 30 seconds and updates health status (last_seen, last_status) in the registry.
- import-gateways CLI command — imports gateways from openshell filesystem config (~/.config/openshell/gateways/) into the database, including mTLS certificates. Replaces the old migrate-v2 command.
- SHOREGUARD_DATABASE_URL — environment variable to configure an external database (PostgreSQL) for container/multi-instance deployments.
- --local / SHOREGUARD_LOCAL_MODE — opt-in flag to enable local Docker container lifecycle management (start/stop/restart/create/destroy). In local mode, filesystem gateways are auto-imported into the database on startup.
- --database-url / SHOREGUARD_DATABASE_URL — all env vars now also available as CLI flags.
Changed¶
- GatewayService refactored — reduced from ~800 to ~250 lines. Gateway discovery now queries the SQLAlchemy registry instead of scanning the filesystem. Connection management (backoff, health checks) preserved.
- Docker/CLI methods extracted to LocalGatewayManager (shoreguard/services/local_gateway.py), only active in local mode.
- Frontend updated — "Create Gateway" replaced with "Register Gateway" modal (endpoint, auth mode, PEM certificate upload). Start/Stop/Restart buttons replaced with "Test Connection". "Destroy" renamed to "Unregister". New "Last Seen" column, Port column removed.
- API route changes — POST /create (202 LRO) → POST /register (201 sync). POST /{name}/destroy → DELETE /{name}. Local lifecycle routes (start/stop/restart/diagnostics) return 404 unless SHOREGUARD_LOCAL_MODE=1.
- Request-level logging — gateway register, unregister, test-connection, and select routes now log at INFO/WARNING level. LocalGatewayManager logs Docker daemon errors, port conflicts, missing openshell CLI, and openshell command failures.
- api/main.py split into modules — extracted cli.py (Typer CLI + import logic), pages.py (HTML routes + auth endpoints), websocket.py (WebSocket handler), and errors.py (exception handlers). main.py reduced from 1 084 to ~190 lines (pure wiring).
- Version bumped to 0.3.0.
- Test suite rewritten for registry-backed architecture (635 tests).
- Logger names standardised — all modules now use getLogger(__name__) instead of hardcoded "shoreguard". Removes duplicate log lines caused by parent-logger propagation.
- Unified log format — single format (HH:MM:SS LEVEL module message) shared by shoreguard and uvicorn loggers with fixed-width aligned columns.
- Duplicate "API-key authentication enabled" log line removed.
Fixed¶
- SSRF protection —
_is_private_ip()now performs real DNS resolution instead ofAI_NUMERICHOST. Hostnames that resolve to private/loopback/ link-local addresses are correctly blocked. Includes a 2 s DNS timeout. import-gatewayscrash on single gateway —registry.register()failures no longer abort the entire import; individual errors are logged and skipped.from_active_clustererror handling — missing metadata files, corrupt JSON, and missinggateway_endpointkeys now raiseGatewayNotConnectedErrorwith a clear message instead of rawFileNotFoundError/KeyError.init_db()failure logging — database initialisation errors in the FastAPI lifespan are now logged before re-raising._get_gateway_service()guard — raisesRuntimeErrorif called before the app lifespan has initialised the service (instead ofAttributeErroronNone).- WebSocket
RuntimeErrorswallowed —RuntimeErrorduringwebsocket.send_json()is now debug-logged instead of silently passed. - SQLite pragma errors — failures setting WAL/busy_timeout/synchronous pragmas are now logged as warnings.
- _import_filesystem_gateways SSRF gap — filesystem-imported gateways were not checked against is_private_ip(). Now blocked in non-local mode, consistent with the API registration endpoint.
- _import_filesystem_gateways skipped count — corrupt metadata JSON was logged but not counted in the skipped total, making the summary misleading.
- _import_filesystem_gateways mTLS read error — read_bytes() on cert files had no error handling (TOCTOU race). Now wrapped in try/except with a 64 KB size limit matching the API route.
- check_all_health DB error isolation — a database error updating health for one gateway no longer prevents health updates for all remaining gateways.
- select() implicit name resolution — get_client() was called without name=, relying on a filesystem round-trip via active_gateway file. Now passes the name explicitly.
- CLI import-gateways NameError — if init_db() failed, engine was undefined and engine.dispose() in the finally block raised NameError.
- DB engine not disposed on shutdown — the SQLAlchemy engine was not disposed during FastAPI lifespan shutdown, skipping the SQLite WAL checkpoint.
- Docker start/stop errors silently swallowed — SubprocessError/OSError in _docker_start_container/_docker_stop_container was caught but never logged.
- Gateway start retry without summary — when all 10 connection retries failed after a gateway start, no warning was logged.
- Frontend 404 on gateway list page — inference-providers was fetched without a gateway context, hitting a non-existent global route.
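The SSRF fix above (resolve the hostname, then test every resulting address, with a 2 s timeout) can be sketched roughly as follows. This is a minimal illustration, not Shoreguard's actual implementation: the function name `is_private_host`, the injectable `resolver` parameter, and the fail-closed behaviour on timeout are assumptions made for this example.

```python
import ipaddress
import socket
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

DNS_TIMEOUT_S = 2.0  # the changelog mentions a 2 s DNS timeout

# Small pool so a slow lookup cannot block the caller (hypothetical design).
_pool = ThreadPoolExecutor(max_workers=4)


def _resolve(host):
    """Resolve a hostname to a list of IP address strings via real DNS."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    return [info[4][0] for info in infos]


def is_private_host(host, resolver=_resolve, timeout=DNS_TIMEOUT_S):
    """Return True if the host should be blocked as an SSRF target.

    Assumption: lookup failures and timeouts are treated as "blocked".
    """
    future = _pool.submit(resolver, host)
    try:
        addresses = future.result(timeout=timeout)
    except (FutureTimeout, OSError):
        # The worker thread may keep running; we just stop waiting for it.
        return True
    if not addresses:
        return True
    for addr in addresses:
        try:
            # Drop any IPv6 zone suffix such as "%eth0" before parsing.
            ip = ipaddress.ip_address(addr.split("%", 1)[0])
        except ValueError:
            return True
        if ip.is_private or ip.is_loopback or ip.is_link_local:
            return True
    return False
```

Checking every resolved address, rather than only the first, matters because a hostname can return a mix of public and private records.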
Security¶
- SSRF DNS resolution bypass fixed (hostnames resolving to RFC 1918 / loopback addresses were not blocked).
- SSRF validation includes DNS timeout protection (2 s) to prevent slow-DNS attacks.
- remote_host input validation — CreateGatewayRequest.remote_host is now validated with a hostname regex (max 253 chars) before being passed to subprocess.
- SSRF check skipped in local mode — is_private_ip() checks at connect-time and import-time now allow private/loopback addresses when SHOREGUARD_LOCAL_MODE is set, since locally managed gateways always run on 127.0.0.1.
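As an illustration of the remote_host validation described above, a hostname check with a 253-character cap might look like the sketch below. The exact regex Shoreguard uses is not shown in this changelog, so the pattern and the helper name `is_valid_remote_host` are assumptions.

```python
import re

MAX_HOSTNAME_LEN = 253  # limit stated in the changelog

# One DNS label: 1-63 letters, digits, or hyphens, not starting or
# ending with a hyphen (hypothetical pattern for illustration).
_LABEL = r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)"
_HOSTNAME_RE = re.compile(rf"{_LABEL}(?:\.{_LABEL})*")


def is_valid_remote_host(value: str) -> bool:
    """Accept only plain hostnames or dotted names, nothing shell-like."""
    if not value or len(value) > MAX_HOSTNAME_LEN:
        return False
    # fullmatch rejects trailing newlines and any character outside the set.
    return _HOSTNAME_RE.fullmatch(value) is not None
```

Rejecting a leading hyphen is the part that matters for subprocess calls: a value such as `-oProxyCommand=...` would otherwise be parsed as an option rather than a host.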
Dependencies¶
- Added sqlalchemy >= 2.0 (runtime) — ORM and database abstraction.
- Added alembic >= 1.15 (runtime) — embedded schema migrations on startup.
[0.2.0] — 2026-03-27¶
Added¶
- API-key authentication — optional shared API key via --api-key flag or SHOREGUARD_API_KEY env var. Supports Bearer tokens, HMAC-signed session cookies, and WebSocket query-param auth. Zero-config local development remains unchanged (auth is a no-op when no key is set).
- Login page for the web UI with session cookie management and automatic redirect for unauthenticated users.
- Long-Running Operations (LRO) — gateway and sandbox creation now return 202 Accepted with an operation ID. Clients can poll /api/operations/{id} for progress. Includes automatic cleanup of expired operations.
- force flag for gateway destroy with dependency checking — prevents accidental deletion of gateways that still have running sandboxes unless --force is passed.
- UNIMPLEMENTED error handling — gRPC UNIMPLEMENTED errors now return a human-readable 501 response with feature context instead of a generic 500.
- OpenAPI documentation is automatically hidden when authentication is enabled.
- Session cookies set the secure flag automatically when served over HTTPS.
- DEADLINE_EXCEEDED mapping — gRPC DEADLINE_EXCEEDED is now mapped to HTTP 504 (Gateway Timeout).
- ValidationError exception — new error type for input validation (invalid names, shlex errors) with an HTTP 400 response.
- Gateway/Sandbox name validation — regex-based validation of resource names to prevent argument injection.
- Client-IP tracking — the client IP is now logged on auth failures and failed login attempts.
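The LRO pattern above (return an operation ID, let clients poll, expire finished operations) can be sketched with a small in-memory store. This is a hypothetical illustration rather than Shoreguard's code: the status names, the `ttl_seconds` parameter, and the injectable `clock` are assumptions chosen to keep the example testable.

```python
import threading
import time
import uuid


class OperationStore:
    """In-memory registry of long-running operations (illustrative only)."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._ops = {}

    def create(self, kind):
        """Register a new operation; the ID would go in the 202 response."""
        op_id = uuid.uuid4().hex
        with self._lock:
            self._ops[op_id] = {
                "id": op_id,
                "kind": kind,
                "status": "running",
                "result": None,
                "finished_at": None,
            }
        return op_id

    def finish(self, op_id, result, ok=True):
        """Mark an operation as done and record when it finished."""
        with self._lock:
            op = self._ops[op_id]
            op["status"] = "succeeded" if ok else "failed"
            op["result"] = result
            op["finished_at"] = self._clock()

    def get(self, op_id):
        """Return a copy of the operation, as a polling endpoint might."""
        with self._lock:
            op = self._ops.get(op_id)
            return dict(op) if op else None

    def cleanup(self):
        """Drop finished operations older than the TTL; return the count."""
        now = self._clock()
        with self._lock:
            expired = [
                key
                for key, op in self._ops.items()
                if op["finished_at"] is not None
                and now - op["finished_at"] > self._ttl
            ]
            for key in expired:
                del self._ops[key]
        return len(expired)
```

In this sketch only finished operations expire, so a slow gateway creation is never dropped while it is still running.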
Changed¶
- Sandbox creation returns 202 Accepted (was 201 Created) to reflect the asynchronous LRO pattern.
- Destroyed gateways are now filtered from the gateway list by default.
- Version bumped to 0.2.0.
- Exception handlers across the codebase narrowed from broad except Exception to specific types (grpc.RpcError, OSError, ssl.SSLError, ConnectionError, TimeoutError).
- Logging significantly expanded: debug logging for previously silent pass blocks, error level for status ≥ 500, warning level for status < 500.
- WebSocket auth logging normalised from INFO/WARNING to DEBUG.
- friendly_grpc_error() now checks friendly messages before raw details.
Fixed¶
- Auth credential check logic deduplicated into a single check_request_auth() helper shared by API dependencies, the /api/auth/check endpoint, and page auth guards.
- Fire-and-forget task GC — background tasks are now held in a set to prevent them from being garbage-collected by asyncio.
- Cross-thread WebSocket signalling — asyncio.Event replaced with threading.Event for correct cross-thread signalling.
- WebSocket queue overflow — the QueueFull exception is now caught, with a fallback to cancel_event.
- Event-loop blocking — get_client() in the WebSocket handler is now wrapped in asyncio.to_thread().
- gRPC client leak — fixed a client leak in _try_connect() when the health check fails.
- Login redirect validation — open-redirect protection: URLs that do not start with / or that start with // are rejected.
- Error message sanitisation — friendly_grpc_error() prevents raw gRPC error messages from being passed to API clients.
- Thread safety — threading.Lock for GatewayService._clients and thread-safe reads in OperationStore.to_dict().
- YAML parsing robustness — YAMLError, None, and scalar values are now handled in presets.py.
- Metadata file robustness — JSONDecodeError and OSError during gateway metadata reads are handled with a fallback.
Security¶
- Open-redirect protection on the login page.
- API error messages are sanitised to avoid exposing internal details.
- Thread-safe client management and operation-store access.
- Argument-injection prevention through regex name validation.
- Client-IP logging on auth events for security monitoring.
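The open-redirect rule from this release (accept only targets that start with a single slash) is small enough to sketch directly. The function name `is_safe_redirect` is hypothetical, and the backslash check is an extra precaution added for this example that the changelog does not mention.

```python
def is_safe_redirect(target: str) -> bool:
    """Allow only same-site, path-only redirect targets."""
    # Must be a local path: rejects absolute URLs and empty strings.
    if not target.startswith("/"):
        return False
    # "//host" is a protocol-relative URL and would leave the site.
    if target.startswith("//"):
        return False
    # Assumption beyond the changelog: browsers treat "\" like "/" in
    # http(s) URLs, so "/\host" is rejected as well.
    if target.startswith("/\\"):
        return False
    return True
```

A login handler would typically fall back to a fixed default such as the dashboard root when this check fails, rather than returning an error.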
[0.1.0] — 2026-03-25¶
Initial release.
Added¶
- Sandbox management — create, list, get, delete sandboxes with custom images, environment variables, GPU support, and provider integrations.
- Real-time monitoring — WebSocket streaming of sandbox logs, events, and status changes.
- Command execution — run commands inside sandboxes with stdout/stderr capture.
- SSH sessions — create and revoke interactive SSH terminal sessions.
- Security policy editor — visual network rule, filesystem access, and process/Landlock policy management without raw YAML editing.
- Policy approval workflow — review, approve, reject, or edit agent-requested endpoint rules with real-time WebSocket notifications.
- Policy presets — 9 bundled templates (PyPI, npm, Docker Hub, NVIDIA NGC, HuggingFace, Slack, Discord, Telegram, Jira, Microsoft Outlook).
- Multi-gateway support — manage multiple OpenShell gateways with status monitoring, diagnostics, and automatic reconnection.
- Provider management — CRUD for inference/API providers with credential templates and community sandbox browser.
- Sandbox wizard — guided step-by-step sandbox creation with agent type selection and one-click preset application.
- Web dashboard — responsive Bootstrap 5 UI with gateway, sandbox, policy, approval, log, and terminal views.
- REST API — full async FastAPI backend with Swagger UI documentation.
- CLI — shoreguard command with configurable host, port, log level, and auto-reload.