openova

Author	SHA1	Message	Date
e3mrah	6618392407	fix(chroot): GetDeployment falls back to chrootEnsureDeployment (D22) (#1568) * feat(handover): auto-seed owner UserAccess CR on chroot (D21) Closes the D21 gap on Sovereign DoD: /users page returned empty after fresh handover because Keycloak `sovereign-admins` membership was established but no UserAccess CR existed for the operator. After `keycloak.EnsureUser` succeeds in `AuthHandover`, the helper `EnsureOwnerUserAccess` upserts a cluster-scoped UserAccess CR shaped like the canonical user_access.go `CreateUserAccess` write: apiVersion: access.openova.io/v1alpha1 kind: UserAccess metadata: name: useraccess-owner-<sanitized-email> annotations: catalyst.openova.io/user-email: <email> # rbac_matrix:309 hint spec: user: keycloakSubject: <email> sovereignRef: <fqdn-first-label> applications: - app: "*" role: admin # owner -> admin The Composition (issue #322) reconciles the Claim into per-app RoleBindings on the Sovereign so the operator surfaces in /users. Best-effort + idempotent: AlreadyExists on the second handover is folded to nil; any other error is logged at Warn and the handover itself never fails. If the access.openova.io CRD has not rolled yet, the next handover retries automatically. Architect-first: mirrors `userAccessToUnstructured` shape and uses existing `sovereignDynamicClient` + `rbacAssignSlug` seams. Tier mapping follows the documented lossy `owner -> admin` rule in `userAccessTierToRole` (CRD only accepts admin|editor|viewer). Refs: docs/SOVEREIGN-MULTI-REGION-DOD.md D21 * chore(slot-13): pin bp-catalyst-platform to 1.4.147 (D21+D31 baked) PR #1562 (D31 wordpress-tenant activeHotStandby) + PR #1564 (D21 owner UserAccess auto-seed at handover, catalyst-api:8d2a947) both packaged into chart 1.4.147. Pin slot so t133+ gets both gates on first prov.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(chart): regionsJson uses toJson to defeat YAML flow-seq re-parse (D5) PR #1551 single-quoted SOVEREIGN_REGIONS_JSON in the slot file substitute, but Flux Kustomize's postBuild can still re-parse the JSON-shaped string as a YAML flow-sequence depending on quoting context. When that happens .Values.sovereign.regionsJson is a Go []interface{} of map[interface{}]interface{} and `| quote` prints Go's `[map[cloudRegion:hel1 ...]]` syntax — catalyst-api's json.Unmarshal of the env var then fails and Request.Regions is empty. toJson normalises both string and list inputs to valid JSON. Caught live on t132 2026-05-16 chart 1.4.147: env var rendered as `[map[cloudRegion:hel1 ...]]` despite #1551 being in effect. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(chroot): populate deployment Result + Request fields for D22 Settings page on Sovereign Console renders `—` for Region / Sovereign / Created / DeploymentID / Pool subdomain because chroot's GET /api/v1/deployments/<id> returns empty strings for those fields. Populate from existing env vars (best-effort — empty when chart hasn't wired them yet, which is no worse than today's behaviour): - Result.ConsoleURL = "https://console.<fqdn>" (derived from selfFQDN) - Result.GitOpsRepoURL from GITOPS_REPO_URL env - Result.ControlPlaneIP from SOVEREIGN_CONTROL_PLANE_IP env - Request.Region = regions[0].CloudRegion (top-level legacy field) - Request.OrgEmail from OPERATOR_EMAIL env - Request.OrgName from ORG_NAME env Companion chart PR will wire the env vars from .Values.global.* + cloud-init substitute placeholders. This PR is BACKWARD-compatible — unset env vars produce empty strings, same as today. Caught live on t132 2026-05-16 — `curl /api/v1/deployments/sovereign-t132.omani.works` returns empty ownerEmail/region/consoleURL.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(chroot): GetDeployment falls back to chrootEnsureDeployment (D22) GetDeployment was the only handler that returned 404 without calling chrootEnsureDeployment. After a catalyst-api Pod restart on the chroot the in-memory store is empty until some other handler (StreamLogs, jobs list) primes it via its own synth call — meanwhile the Sovereign Console Settings page loads /api/v1/deployments/<id> first and gets 404, rendering the entire page broken. Mirror the StreamLogs pattern (lines 1247-1254): try in-memory load, fall through to chrootEnsureDeployment, return 404 only when both miss. This unblocks PR #1567's deployment-record population — without the fallback, GetDeployment can never serve the populated record on chroot. Caught live on t132 2026-05-16 after #1567 image roll: Settings page 404 because in-memory store was empty. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 00:54:20 +04:00
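The lookup order described in #1568 (in-memory store first, then the chroot synthesizer, 404 only when both miss) can be sketched in Go as below. This is a minimal illustration, not the catalyst-api handler: `Deployment`, `errNotFound`, `getDeployment`, and the injected `load`/`ensure` functions are hypothetical stand-ins, with `ensure` playing the role of `chrootEnsureDeployment`.

```go
package main

import (
	"errors"
	"net/http"
)

// Deployment is a hypothetical, trimmed-down deployment record.
type Deployment struct {
	ID string
}

// errNotFound is a hypothetical sentinel the synthesizer returns on a miss.
var errNotFound = errors.New("deployment not found")

// getDeployment mirrors the fallback pattern: try the in-memory store, then
// the synthesizing fallback, and report 404 only when both miss.
func getDeployment(
	id string,
	load func(id string) (Deployment, bool),
	ensure func(id string) (Deployment, error),
) (Deployment, int) {
	if d, ok := load(id); ok {
		return d, http.StatusOK // store already primed
	}
	if d, err := ensure(id); err == nil {
		return d, http.StatusOK // store was empty, e.g. after a Pod restart
	}
	return Deployment{}, http.StatusNotFound
}
```

Injecting both lookups keeps the ordering testable without a Pod restart; whether the real fallback also writes the synthesized record back into the store is not stated in the commit and is left out here.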
github-actions[bot]	b094a354b7	deploy: update catalyst images to `ed63ecd`	2026-05-16 20:31:39 +00:00
e3mrah	ed63ecd09f	fix(chroot): populate deployment Result + Request fields for D22 settings (#1567) * feat(handover): auto-seed owner UserAccess CR on chroot (D21) Closes the D21 gap on Sovereign DoD: /users page returned empty after fresh handover because Keycloak `sovereign-admins` membership was established but no UserAccess CR existed for the operator. After `keycloak.EnsureUser` succeeds in `AuthHandover`, the helper `EnsureOwnerUserAccess` upserts a cluster-scoped UserAccess CR shaped like the canonical user_access.go `CreateUserAccess` write: apiVersion: access.openova.io/v1alpha1 kind: UserAccess metadata: name: useraccess-owner-<sanitized-email> annotations: catalyst.openova.io/user-email: <email> # rbac_matrix:309 hint spec: user: keycloakSubject: <email> sovereignRef: <fqdn-first-label> applications: - app: "*" role: admin # owner -> admin The Composition (issue #322) reconciles the Claim into per-app RoleBindings on the Sovereign so the operator surfaces in /users. Best-effort + idempotent: AlreadyExists on the second handover is folded to nil; any other error is logged at Warn and the handover itself never fails. If the access.openova.io CRD has not rolled yet, the next handover retries automatically. Architect-first: mirrors `userAccessToUnstructured` shape and uses existing `sovereignDynamicClient` + `rbacAssignSlug` seams. Tier mapping follows the documented lossy `owner -> admin` rule in `userAccessTierToRole` (CRD only accepts admin|editor|viewer). Refs: docs/SOVEREIGN-MULTI-REGION-DOD.md D21 * chore(slot-13): pin bp-catalyst-platform to 1.4.147 (D21+D31 baked) PR #1562 (D31 wordpress-tenant activeHotStandby) + PR #1564 (D21 owner UserAccess auto-seed at handover, catalyst-api:8d2a947) both packaged into chart 1.4.147. Pin slot so t133+ gets both gates on first prov.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(chart): regionsJson uses toJson to defeat YAML flow-seq re-parse (D5) PR #1551 single-quoted SOVEREIGN_REGIONS_JSON in the slot file substitute, but Flux Kustomize's postBuild can still re-parse the JSON-shaped string as a YAML flow-sequence depending on quoting context. When that happens .Values.sovereign.regionsJson is a Go []interface{} of map[interface{}]interface{} and `| quote` prints Go's `[map[cloudRegion:hel1 ...]]` syntax — catalyst-api's json.Unmarshal of the env var then fails and Request.Regions is empty. toJson normalises both string and list inputs to valid JSON. Caught live on t132 2026-05-16 chart 1.4.147: env var rendered as `[map[cloudRegion:hel1 ...]]` despite #1551 being in effect. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(chroot): populate deployment Result + Request fields for D22 Settings page on Sovereign Console renders `—` for Region / Sovereign / Created / DeploymentID / Pool subdomain because chroot's GET /api/v1/deployments/<id> returns empty strings for those fields. Populate from existing env vars (best-effort — empty when chart hasn't wired them yet, which is no worse than today's behaviour): - Result.ConsoleURL = "https://console.<fqdn>" (derived from selfFQDN) - Result.GitOpsRepoURL from GITOPS_REPO_URL env - Result.ControlPlaneIP from SOVEREIGN_CONTROL_PLANE_IP env - Request.Region = regions[0].CloudRegion (top-level legacy field) - Request.OrgEmail from OPERATOR_EMAIL env - Request.OrgName from ORG_NAME env Companion chart PR will wire the env vars from .Values.global.* + cloud-init substitute placeholders. This PR is BACKWARD-compatible — unset env vars produce empty strings, same as today. Caught live on t132 2026-05-16 — `curl /api/v1/deployments/sovereign-t132.omani.works` returns empty ownerEmail/region/consoleURL.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 00:29:44 +04:00
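#1567 fills the Settings-page fields best-effort from environment variables. A minimal Go sketch of that mapping follows; the `Region`, `Request`, and `Result` structs are trimmed, hypothetical shapes, and `populate` takes the lookup function as a parameter (pass `os.Getenv` in production) so an unset variable visibly yields an empty string. The guards for an empty `selfFQDN` and an empty region list are assumptions added for safety.

```go
package main

import "os"

// Region, Request and Result are hypothetical, trimmed-down shapes that
// model only the fields named in the commit message.
type Region struct{ CloudRegion string }

type Request struct {
	Region   string // top-level legacy field
	OrgEmail string
	OrgName  string
	Regions  []Region
}

type Result struct {
	ConsoleURL     string
	GitOpsRepoURL  string
	ControlPlaneIP string
}

// defaultGetenv is the production lookup.
var defaultGetenv = os.Getenv

// populate fills the fields best-effort: any unset variable yields "",
// which matches the pre-change behaviour.
func populate(selfFQDN string, regions []Region, getenv func(string) string) (Request, Result) {
	req := Request{
		Regions:  regions,
		OrgEmail: getenv("OPERATOR_EMAIL"),
		OrgName:  getenv("ORG_NAME"),
	}
	if len(regions) > 0 {
		req.Region = regions[0].CloudRegion
	}
	res := Result{
		GitOpsRepoURL:  getenv("GITOPS_REPO_URL"),
		ControlPlaneIP: getenv("SOVEREIGN_CONTROL_PLANE_IP"),
	}
	if selfFQDN != "" {
		res.ConsoleURL = "https://console." + selfFQDN
	}
	return req, res
}
```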
github-actions[bot]	d82e06bfe9	deploy: update catalyst images to `0a45fb0`	2026-05-16 20:03:41 +00:00
e3mrah	0a45fb0449	fix(chart): regionsJson uses toJson to defeat YAML flow-seq re-parse (D5) (#1566) * feat(handover): auto-seed owner UserAccess CR on chroot (D21) Closes the D21 gap on Sovereign DoD: /users page returned empty after fresh handover because Keycloak `sovereign-admins` membership was established but no UserAccess CR existed for the operator. After `keycloak.EnsureUser` succeeds in `AuthHandover`, the helper `EnsureOwnerUserAccess` upserts a cluster-scoped UserAccess CR shaped like the canonical user_access.go `CreateUserAccess` write: apiVersion: access.openova.io/v1alpha1 kind: UserAccess metadata: name: useraccess-owner-<sanitized-email> annotations: catalyst.openova.io/user-email: <email> # rbac_matrix:309 hint spec: user: keycloakSubject: <email> sovereignRef: <fqdn-first-label> applications: - app: "*" role: admin # owner -> admin The Composition (issue #322) reconciles the Claim into per-app RoleBindings on the Sovereign so the operator surfaces in /users. Best-effort + idempotent: AlreadyExists on the second handover is folded to nil; any other error is logged at Warn and the handover itself never fails. If the access.openova.io CRD has not rolled yet, the next handover retries automatically. Architect-first: mirrors `userAccessToUnstructured` shape and uses existing `sovereignDynamicClient` + `rbacAssignSlug` seams. Tier mapping follows the documented lossy `owner -> admin` rule in `userAccessTierToRole` (CRD only accepts admin|editor|viewer). Refs: docs/SOVEREIGN-MULTI-REGION-DOD.md D21 * chore(slot-13): pin bp-catalyst-platform to 1.4.147 (D21+D31 baked) PR #1562 (D31 wordpress-tenant activeHotStandby) + PR #1564 (D21 owner UserAccess auto-seed at handover, catalyst-api:8d2a947) both packaged into chart 1.4.147. Pin slot so t133+ gets both gates on first prov.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(chart): regionsJson uses toJson to defeat YAML flow-seq re-parse (D5) PR #1551 single-quoted SOVEREIGN_REGIONS_JSON in the slot file substitute, but Flux Kustomize's postBuild can still re-parse the JSON-shaped string as a YAML flow-sequence depending on quoting context. When that happens .Values.sovereign.regionsJson is a Go []interface{} of map[interface{}]interface{} and `| quote` prints Go's `[map[cloudRegion:hel1 ...]]` syntax — catalyst-api's json.Unmarshal of the env var then fails and Request.Regions is empty. toJson normalises both string and list inputs to valid JSON. Caught live on t132 2026-05-16 chart 1.4.147: env var rendered as `[map[cloudRegion:hel1 ...]]` despite #1551 being in effect. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 00:01:43 +04:00
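The failure mode in #1566 comes down to how a parsed list is turned back into a string. Sprig's `quote` stringifies a non-string value with Go's `%v` formatting, while Helm's `toJson` runs `encoding/json`. The Go sketch below reproduces both outcomes for the list case only; values are modeled as `map[string]interface{}` for simplicity, and `renderLikeQuote`, `renderLikeToJson`, and `decodeRegions` are hypothetical helpers standing in for the template functions and for catalyst-api's `json.Unmarshal`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Region mirrors one entry of the regions JSON; only cloudRegion is modeled.
type Region struct {
	CloudRegion string `json:"cloudRegion"`
}

// renderLikeQuote approximates the env-var value `quote` yields for a
// parsed list: Go's %v map syntax, not JSON.
func renderLikeQuote(v interface{}) string {
	return fmt.Sprint(v)
}

// renderLikeToJson encodes the value as JSON, as toJson does.
func renderLikeToJson(v interface{}) (string, error) {
	b, err := json.Marshal(v)
	return string(b), err
}

// decodeRegions is what the consuming service does with the env var.
func decodeRegions(s string) ([]Region, error) {
	var rs []Region
	err := json.Unmarshal([]byte(s), &rs)
	return rs, err
}
```

With `parsed := []interface{}{map[string]interface{}{"cloudRegion": "hel1"}}`, the first helper returns `[map[cloudRegion:hel1]]`, which `decodeRegions` rejects, while the second returns `[{"cloudRegion":"hel1"}]`, which decodes to one region.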
e3mrah	3f8e2b925e	chore(slot-13): pin bp-catalyst-platform to 1.4.147 (D21+D31 baked) (#1565) * feat(handover): auto-seed owner UserAccess CR on chroot (D21) Closes the D21 gap on Sovereign DoD: /users page returned empty after fresh handover because Keycloak `sovereign-admins` membership was established but no UserAccess CR existed for the operator. After `keycloak.EnsureUser` succeeds in `AuthHandover`, the helper `EnsureOwnerUserAccess` upserts a cluster-scoped UserAccess CR shaped like the canonical user_access.go `CreateUserAccess` write: apiVersion: access.openova.io/v1alpha1 kind: UserAccess metadata: name: useraccess-owner-<sanitized-email> annotations: catalyst.openova.io/user-email: <email> # rbac_matrix:309 hint spec: user: keycloakSubject: <email> sovereignRef: <fqdn-first-label> applications: - app: "*" role: admin # owner -> admin The Composition (issue #322) reconciles the Claim into per-app RoleBindings on the Sovereign so the operator surfaces in /users. Best-effort + idempotent: AlreadyExists on the second handover is folded to nil; any other error is logged at Warn and the handover itself never fails. If the access.openova.io CRD has not rolled yet, the next handover retries automatically. Architect-first: mirrors `userAccessToUnstructured` shape and uses existing `sovereignDynamicClient` + `rbacAssignSlug` seams. Tier mapping follows the documented lossy `owner -> admin` rule in `userAccessTierToRole` (CRD only accepts admin|editor|viewer). Refs: docs/SOVEREIGN-MULTI-REGION-DOD.md D21 * chore(slot-13): pin bp-catalyst-platform to 1.4.147 (D21+D31 baked) PR #1562 (D31 wordpress-tenant activeHotStandby) + PR #1564 (D21 owner UserAccess auto-seed at handover, catalyst-api:8d2a947) both packaged into chart 1.4.147. Pin slot so t133+ gets both gates on first prov.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:58:46 +04:00
github-actions[bot]	f8c8a87151	deploy: update catalyst images to `8d2a947`	2026-05-16 19:51:40 +00:00
e3mrah	8d2a947cfb	feat(handover): auto-seed owner UserAccess CR on chroot (D21) (#1564) Closes the D21 gap on Sovereign DoD: /users page returned empty after fresh handover because Keycloak `sovereign-admins` membership was established but no UserAccess CR existed for the operator. After `keycloak.EnsureUser` succeeds in `AuthHandover`, the helper `EnsureOwnerUserAccess` upserts a cluster-scoped UserAccess CR shaped like the canonical user_access.go `CreateUserAccess` write: apiVersion: access.openova.io/v1alpha1 kind: UserAccess metadata: name: useraccess-owner-<sanitized-email> annotations: catalyst.openova.io/user-email: <email> # rbac_matrix:309 hint spec: user: keycloakSubject: <email> sovereignRef: <fqdn-first-label> applications: - app: "*" role: admin # owner -> admin The Composition (issue #322) reconciles the Claim into per-app RoleBindings on the Sovereign so the operator surfaces in /users. Best-effort + idempotent: AlreadyExists on the second handover is folded to nil; any other error is logged at Warn and the handover itself never fails. If the access.openova.io CRD has not rolled yet, the next handover retries automatically. Architect-first: mirrors `userAccessToUnstructured` shape and uses existing `sovereignDynamicClient` + `rbacAssignSlug` seams. Tier mapping follows the documented lossy `owner -> admin` rule in `userAccessTierToRole` (CRD only accepts admin|editor|viewer). Refs: docs/SOVEREIGN-MULTI-REGION-DOD.md D21 Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>	2026-05-16 23:49:32 +04:00
e3mrah	5510ab91f9	chore(slot-13): pin bp-catalyst-platform to 1.4.146 (D29 billing JWT bypass) (#1563) PR #1561 added billing-service JWT exemptions matching gateway public routes (D29 voucher-redeem zero-touch). Pin slot so future provisions inherit the full D29 unblocker chain. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:41:34 +04:00
github-actions[bot]	d6b6aca581	deploy: update sme service images to `c04b2ec` + bump chart to 1.4.147	2026-05-16 19:41:18 +00:00
e3mrah	c04b2ec76d	feat(wordpress-tenant): activeHotStandby option wires bp-cnpg-pair (D31) (#1562) Sovereign DoD D31 — tenants subscribing to an HA-capable marketplace app may opt into a cross-region active-hot-standby Postgres pair for their WordPress instance instead of the default single CNPG Cluster. Mirrors the canonical bp-cnpg-pair pattern (primary + replica Cluster CRs with WAL streaming over Cilium ClusterMesh via a managed Service annotated service.cilium.io/global=true). When the new pg.activeHotStandby.enabled flag is false (default), templates render the existing single Cluster bit-for-bit — no regression for non-HA tenants. Catalog seed flags WordPress with ha + cnpg-pair tags so the marketplace HA filter can surface it. Chart bumped 0.2.1 -> 0.3.0. New render-gate test asserts both default single-cluster shape AND the enabled 2-Cluster shape with the right nodeSelectors, replica.source, externalCluster.host, Cilium global annotation, and bootstrap.pg_basebackup; all 5 cases pass. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:39:29 +04:00
github-actions[bot]	af4d9b1b87	deploy: update sme service images to `f9ed292` + bump chart to 1.4.146	2026-05-16 19:29:50 +00:00
e3mrah	f9ed292198	fix(billing): /redeem-preview + plans + addons bypass JWT (D29) (#1561) * chore(slot-13): pin bp-catalyst-platform to 1.4.145 (D29 gateway public routes) PR #1559 added /api/billing/{vouchers/redeem-preview,plans,addons} as public gateway routes — required for the marketplace /redeem zero-touch flow. Pin the slot so future provisions inherit it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(billing): /redeem-preview + plans + addons bypass JWT (D29) Mirror PR #1559's gateway public routes in the billing service's own middleware chain. The gateway now lets these requests through without an Authorization header (D29 voucher-redeem landing), but billing service's main.go was JWT-gating EVERY /billing/* path except /billing/webhook — so the request still got 401, just one hop later. Caught live on t132 2026-05-16 after PR #1559 rolled. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:28:48 +04:00
e3mrah	936e76f79a	chore(slot-13): pin bp-catalyst-platform to 1.4.145 (D29 gateway public routes) (#1560) PR #1559 added /api/billing/{vouchers/redeem-preview,plans,addons} as public gateway routes — required for the marketplace /redeem zero-touch flow. Pin the slot so future provisions inherit it. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:25:14 +04:00
github-actions[bot]	696aa26f83	deploy: update sme service images to `a11067d` + bump chart to 1.4.145	2026-05-16 19:18:09 +00:00
e3mrah	a11067da1a	fix(gateway): /redeem-preview + plans + addons must be public (D29) (#1559) * feat(billing+notification): wire voucher-issued email (D28) D28 of the Sovereign DoD requires that issuing a voucher emails it to the recipient zero-touch. Today POST /billing/vouchers/issue persists the PromoCode row but never notifies anyone — so a gifted voucher only reaches its recipient if the operator manually sends the code over a side channel. This wires sme-billing -> sme-notification so the email fires automatically on every successful upsert that carries a recipient_email field. Architecture follows the existing notification-service seam: sme-billing POSTs to http://notification.sme.svc.cluster.local:8087/notification/send with template=voucher-issued; sme-notification renders the HTML and dispatches via Stalwart over SMTP. No direct SMTP code is added to billing, no stalwart-mail calls bypass notification. Server-side only — the owner-UI for issuing vouchers (D28b) is a separate PR. Changes: notification/templates/templates.go + VoucherIssuedEmail(code, creditOMR, description, sovereignFQDN, validityHint) — renders code prominently, redeem button to https://marketplace.<sovereignFQDN>/redeem/?code=<CODE>; FQDN always supplied by caller, NEVER hardcoded. notification/handlers/handlers.go + renderTemplate("voucher-issued") case parsing {code, credit_omr, description, sovereign_fqdn, validity_hint}. + Default subject "You've been gifted a voucher for OpenOva SME". billing/handlers/handlers.go + Handler fields: NotificationURL, SovereignFQDN, NotificationClient. billing/handlers/vouchers.go + issueVoucherRequest = store.PromoCode + RecipientEmail (request-only; never persisted). + sendVoucherIssuedEmail() — POSTs to NotificationURL with a 5s timeout. Best-effort: a non-2xx or transport error logs but does NOT fail the IssueVoucher response, because the row is already persisted and re-issuing the same code re-fires the email.
+ Re-issue semantics (#91 resurrects soft-deleted rows) extend to the email path — documented in the handler comment. billing/main.go + Reads NOTIFICATION_SERVICE_URL (default http://notification.sme.svc.cluster.local:8087/notification/send) and SOVEREIGN_FQDN env vars. Wires a 5s default http.Client. products/catalyst/chart/templates/sme-services/billing.yaml + Pipes NOTIFICATION_SERVICE_URL (cluster-DNS constant) and SOVEREIGN_FQDN (from .Values.global.sovereignFQDN, NEVER hardcoded) into the billing Deployment. Tests: notification/handlers/handlers_test.go (new) + TestRenderTemplate_VoucherIssued: rendered HTML contains code + credit + a redeem URL built from the supplied FQDN; never falls back to marketplace.openova.io. + TestRenderTemplate_VoucherIssued_CustomSubject + _NoDescription + TestRenderTemplate_UnknownTemplate as guard rails. billing/handlers/vouchers_test.go + TestIssueVoucher_SendsEmail_WhenRecipientPresent: a fake round-tripper sees the POST to notification with the right URL + template + data (code upper-cased, credit_omr, sovereign_fqdn, description) when recipient_email is set. + TestIssueVoucher_NoEmail_WhenRecipientAbsent: no notification call when recipient is empty. + TestIssueVoucher_NotificationFailure_DoesNotFailUpsert: operator gets 200 even when notification returns 500. + TestIssueVoucher_403WithoutVoucherIssuerRole: role gate preserved. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(chart): admin pod uses dedicated image tag (D27 SME stack) t132 caught admin pod stuck in ImagePullBackOff on `admin:b0ed216` — the SME services CI run for that mono-repo SHA published 10 services but admin's image was missing from GHCR. Decouple admin's tag from smeTag so a missing-build for one service doesn't wedge the SME stack. Default to `3c2f7e4` (matches marketplaceApi + console, known-published). When admin's UI changes, bump in lockstep with those.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(slot-13): pin bp-catalyst-platform to 1.4.144 PR #1556 (D28 voucher email wire) + PR #1557 (D27 admin tag override) landed and Blueprint Release packaged 1.4.144. Pin the slot file so future provisions get the latest chart by default — t132 manually upgraded via kubectl patch but t133+ will inherit it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(gateway): /redeem-preview + plans + addons must be public (D29) The marketplace /redeem?code=XXX landing page calls /api/billing/vouchers/redeem-preview unauthenticated per docs/FRANCHISE-MODEL.md §3, but the gateway's catch-all /api/billing/ entry was returning 401 to it — breaking the entire voucher-redeem zero-touch flow that D29 depends on. Also expose /api/billing/plans and /api/billing/addons so the marketplace landing can render pricing without a session. Caught live on t132 2026-05-16 — every /redeem call returned 401. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:17:04 +04:00
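On the gateway side (#1559), the public entries only help if they are matched before the catch-all `/api/billing/` entry. A minimal first-match-wins route table in Go illustrates the ordering; `route`, `billingRoutes`, and `needsAuth` are hypothetical and do not describe the gateway's actual configuration format.

```go
package main

import "strings"

// route is a hypothetical gateway entry: a path prefix and whether it is public.
type route struct {
	prefix string
	public bool
}

// billingRoutes puts the specific public entries before the catch-all,
// because needsAuth below returns on the first matching prefix.
var billingRoutes = []route{
	{prefix: "/api/billing/vouchers/redeem-preview", public: true},
	{prefix: "/api/billing/plans", public: true},
	{prefix: "/api/billing/addons", public: true},
	{prefix: "/api/billing/", public: false}, // catch-all: session required
}

// needsAuth reports whether a request path must carry a session.
// Paths matching no entry default to requiring auth.
func needsAuth(path string) bool {
	for _, r := range billingRoutes {
		if strings.HasPrefix(path, r.prefix) {
			return !r.public
		}
	}
	return true
}
```

Because matching is by prefix, a path such as `/api/billing/plansX` would also be treated as public; an exact-match or segment-aware rule avoids that if it matters.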
e3mrah	27bd5d486d	chore(slot-13): pin bp-catalyst-platform to 1.4.144 (#1558) * feat(billing+notification): wire voucher-issued email (D28) D28 of the Sovereign DoD requires that issuing a voucher emails it to the recipient zero-touch. Today POST /billing/vouchers/issue persists the PromoCode row but never notifies anyone — so a gifted voucher only reaches its recipient if the operator manually sends the code over a side channel. This wires sme-billing -> sme-notification so the email fires automatically on every successful upsert that carries a recipient_email field. Architecture follows the existing notification-service seam: sme-billing POSTs to http://notification.sme.svc.cluster.local:8087/notification/send with template=voucher-issued; sme-notification renders the HTML and dispatches via Stalwart over SMTP. No direct SMTP code is added to billing, no stalwart-mail calls bypass notification. Server-side only — the owner-UI for issuing vouchers (D28b) is a separate PR. Changes: notification/templates/templates.go + VoucherIssuedEmail(code, creditOMR, description, sovereignFQDN, validityHint) — renders code prominently, redeem button to https://marketplace.<sovereignFQDN>/redeem/?code=<CODE>; FQDN always supplied by caller, NEVER hardcoded. notification/handlers/handlers.go + renderTemplate("voucher-issued") case parsing {code, credit_omr, description, sovereign_fqdn, validity_hint}. + Default subject "You've been gifted a voucher for OpenOva SME". billing/handlers/handlers.go + Handler fields: NotificationURL, SovereignFQDN, NotificationClient. billing/handlers/vouchers.go + issueVoucherRequest = store.PromoCode + RecipientEmail (request-only; never persisted). + sendVoucherIssuedEmail() — POSTs to NotificationURL with a 5s timeout. Best-effort: a non-2xx or transport error logs but does NOT fail the IssueVoucher response, because the row is already persisted and re-issuing the same code re-fires the email.
+ Re-issue semantics (#91 resurrects soft-deleted rows) extend to the email path — documented in the handler comment. billing/main.go + Reads NOTIFICATION_SERVICE_URL (default http://notification.sme.svc.cluster.local:8087/notification/send) and SOVEREIGN_FQDN env vars. Wires a 5s default http.Client. products/catalyst/chart/templates/sme-services/billing.yaml + Pipes NOTIFICATION_SERVICE_URL (cluster-DNS constant) and SOVEREIGN_FQDN (from .Values.global.sovereignFQDN, NEVER hardcoded) into the billing Deployment. Tests: notification/handlers/handlers_test.go (new) + TestRenderTemplate_VoucherIssued: rendered HTML contains code + credit + a redeem URL built from the supplied FQDN; never falls back to marketplace.openova.io. + TestRenderTemplate_VoucherIssued_CustomSubject + _NoDescription + TestRenderTemplate_UnknownTemplate as guard rails. billing/handlers/vouchers_test.go + TestIssueVoucher_SendsEmail_WhenRecipientPresent: a fake round-tripper sees the POST to notification with the right URL + template + data (code upper-cased, credit_omr, sovereign_fqdn, description) when recipient_email is set. + TestIssueVoucher_NoEmail_WhenRecipientAbsent: no notification call when recipient is empty. + TestIssueVoucher_NotificationFailure_DoesNotFailUpsert: operator gets 200 even when notification returns 500. + TestIssueVoucher_403WithoutVoucherIssuerRole: role gate preserved. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(chart): admin pod uses dedicated image tag (D27 SME stack) t132 caught admin pod stuck in ImagePullBackOff on `admin:b0ed216` — the SME services CI run for that mono-repo SHA published 10 services but admin's image was missing from GHCR. Decouple admin's tag from smeTag so a missing-build for one service doesn't wedge the SME stack. Default to `3c2f7e4` (matches marketplaceApi + console, known-published). When admin's UI changes, bump in lockstep with those.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(slot-13): pin bp-catalyst-platform to 1.4.144 PR #1556 (D28 voucher email wire) + PR #1557 (D27 admin tag override) landed and Blueprint Release packaged 1.4.144. Pin the slot file so future provisions get the latest chart by default — t132 manually upgraded via kubectl patch but t133+ will inherit it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:12:34 +04:00
github-actions[bot]	48eb653f79	deploy: update sme service images to `1fe7067` + bump chart to 1.4.144	2026-05-16 19:05:51 +00:00
e3mrah	7c3724591c	fix(chart): admin pod uses dedicated image tag (D27 SME stack) (#1557) * feat(billing+notification): wire voucher-issued email (D28) D28 of the Sovereign DoD requires that issuing a voucher emails it to the recipient zero-touch. Today POST /billing/vouchers/issue persists the PromoCode row but never notifies anyone — so a gifted voucher only reaches its recipient if the operator manually sends the code over a side channel. This wires sme-billing -> sme-notification so the email fires automatically on every successful upsert that carries a recipient_email field. Architecture follows the existing notification-service seam: sme-billing POSTs to http://notification.sme.svc.cluster.local:8087/notification/send with template=voucher-issued; sme-notification renders the HTML and dispatches via Stalwart over SMTP. No direct SMTP code is added to billing, no stalwart-mail calls bypass notification. Server-side only — the owner-UI for issuing vouchers (D28b) is a separate PR. Changes: notification/templates/templates.go + VoucherIssuedEmail(code, creditOMR, description, sovereignFQDN, validityHint) — renders code prominently, redeem button to https://marketplace.<sovereignFQDN>/redeem/?code=<CODE>; FQDN always supplied by caller, NEVER hardcoded. notification/handlers/handlers.go + renderTemplate("voucher-issued") case parsing {code, credit_omr, description, sovereign_fqdn, validity_hint}. + Default subject "You've been gifted a voucher for OpenOva SME". billing/handlers/handlers.go + Handler fields: NotificationURL, SovereignFQDN, NotificationClient. billing/handlers/vouchers.go + issueVoucherRequest = store.PromoCode + RecipientEmail (request-only; never persisted). + sendVoucherIssuedEmail() — POSTs to NotificationURL with a 5s timeout. Best-effort: a non-2xx or transport error logs but does NOT fail the IssueVoucher response, because the row is already persisted and re-issuing the same code re-fires the email.
+ Re-issue semantics (#91 resurrects soft-deleted rows) extend to the email path — documented in the handler comment. billing/main.go + Reads NOTIFICATION_SERVICE_URL (default http://notification.sme.svc.cluster.local:8087/notification/send) and SOVEREIGN_FQDN env vars. Wires a 5s default http.Client. products/catalyst/chart/templates/sme-services/billing.yaml + Pipes NOTIFICATION_SERVICE_URL (cluster-DNS constant) and SOVEREIGN_FQDN (from .Values.global.sovereignFQDN, NEVER hardcoded) into the billing Deployment. Tests: notification/handlers/handlers_test.go (new) + TestRenderTemplate_VoucherIssued: rendered HTML contains code + credit + a redeem URL built from the supplied FQDN; never falls back to marketplace.openova.io. + TestRenderTemplate_VoucherIssued_CustomSubject + _NoDescription + TestRenderTemplate_UnknownTemplate as guard rails. billing/handlers/vouchers_test.go + TestIssueVoucher_SendsEmail_WhenRecipientPresent: a fake round-tripper sees the POST to notification with the right URL + template + data (code upper-cased, credit_omr, sovereign_fqdn, description) when recipient_email is set. + TestIssueVoucher_NoEmail_WhenRecipientAbsent: no notification call when recipient is empty. + TestIssueVoucher_NotificationFailure_DoesNotFailUpsert: operator gets 200 even when notification returns 500. + TestIssueVoucher_403WithoutVoucherIssuerRole: role gate preserved. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(chart): admin pod uses dedicated image tag (D27 SME stack) t132 caught admin pod stuck in ImagePullBackOff on `admin:b0ed216` — the SME services CI run for that mono-repo SHA published 10 services but admin's image was missing from GHCR. Decouple admin's tag from smeTag so a missing-build for one service doesn't wedge the SME stack. Default to `3c2f7e4` (matches marketplaceApi + console, known-published). When admin's UI changes, bump in lockstep with those.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:05:09 +04:00
e3mrah	1fe706769f	feat(billing+notification): wire voucher-issued email (D28) (#1556 ) D28 of the Sovereign DoD requires that issuing a voucher emails it to the recipient zero-touch. Today POST /billing/vouchers/issue persists the PromoCode row but never notifies anyone — so a gifted voucher only reaches its recipient if the operator manually sends the code over a side channel. This wires sme-billing -> sme-notification so the email fires automatically on every successful upsert that carries a recipient_email field. Architecture follows the existing notification-service seam: sme-billing POSTs to http://notification.sme.svc.cluster.local:8087/notification/send with template=voucher-issued; sme-notification renders the HTML and dispatches via Stalwart over SMTP. No direct SMTP code is added to billing, no stalwart-mail calls bypass notification. Server-side only — the owner-UI for issuing vouchers (D28b) is a separate PR. Changes: notification/templates/templates.go + VoucherIssuedEmail(code, creditOMR, description, sovereignFQDN, validityHint) — renders code prominently, redeem button to https://marketplace.<sovereignFQDN>/redeem/?code=<CODE>; FQDN always supplied by caller, NEVER hardcoded. notification/handlers/handlers.go + renderTemplate("voucher-issued") case parsing {code, credit_omr, description, sovereign_fqdn, validity_hint}. + Default subject "You've been gifted a voucher for OpenOva SME". billing/handlers/handlers.go + Handler fields: NotificationURL, SovereignFQDN, NotificationClient. billing/handlers/vouchers.go + issueVoucherRequest = store.PromoCode + RecipientEmail (request-only; never persisted). + sendVoucherIssuedEmail() — POSTs to NotificationURL with a 5s timeout. Best-effort: a non-2xx or transport error logs but does NOT fail the IssueVoucher response, because the row is already persisted and re-issuing the same code re-fires the email.
+ Re-issue semantics (#91 resurrects soft-deleted rows) extend to the email path — documented in the handler comment. billing/main.go + Reads NOTIFICATION_SERVICE_URL (default http://notification.sme.svc.cluster.local:8087/notification/send) and SOVEREIGN_FQDN env vars. Wires a 5s default http.Client. products/catalyst/chart/templates/sme-services/billing.yaml + Pipes NOTIFICATION_SERVICE_URL (cluster-DNS constant) and SOVEREIGN_FQDN (from .Values.global.sovereignFQDN, NEVER hardcoded) into the billing Deployment. Tests: notification/handlers/handlers_test.go (new) + TestRenderTemplate_VoucherIssued: rendered HTML contains code + credit + a redeem URL built from the supplied FQDN; never falls back to marketplace.openova.io. + TestRenderTemplate_VoucherIssued_CustomSubject + _NoDescription + TestRenderTemplate_UnknownTemplate as guard rails. billing/handlers/vouchers_test.go + TestIssueVoucher_SendsEmail_WhenRecipientPresent: a fake round-tripper sees the POST to notification with the right URL + template + data (code upper-cased, credit_omr, sovereign_fqdn, description) when recipient_email is set. + TestIssueVoucher_NoEmail_WhenRecipientAbsent: no notification call when recipient is empty. + TestIssueVoucher_NotificationFailure_DoesNotFailUpsert: operator gets 200 even when notification returns 500. + TestIssueVoucher_403WithoutVoucherIssuerRole: role gate preserved. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:04:46 +04:00
github-actions[bot]	9718ba2924	deploy: update catalyst images to `2fd4e3c`	2026-05-16 18:26:16 +00:00
e3mrah	2fd4e3cbf4	feat(wizard): default marketplaceEnabled=true for D27 zero-touch (#1555 ) Founder ruling 2026-05-16: D27 mandates that a fresh wizard provisions a Sovereign already ready to host tenant orgs (D29). Operator can still flip the toggle off on StepMarketplace if they explicitly want a private Sovereign. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:24:09 +04:00
e3mrah	77c80c9728	docs(DoD): add D27-D31 (marketplace + voucher + tenant org + free subdomain + CNPG active-hot-standby) (#1554 ) Founder ruling 2026-05-16: tenant onboarding flow is part of the Sovereign DoD. D27 — Marketplace enabled on the Sovereign (zero-touch from provision body) D28 — Owner-tier voucher issuance (one-click, voucher mailed via Sovereign SMTP) D29 — Voucher-redeem → org wizard → tenant namespace+RBAC+bootstrap (zero-touch) D30 — Free-subdomain pool selection (omani.homes, omani.rest, omani.trades) D31 — Tenant app with CNPG active-hot-standby cross-region replication Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:18:55 +04:00
github-actions[bot]	564fe4f4e5	deploy: update catalyst images to `9f096b0`	2026-05-16 18:01:02 +00:00
e3mrah	9f096b0b18	fix(chroot): populate Result.LoadBalancerIP so canvas shows LB chip (D15) (#1553 ) chrootEnsureDeployment was synthesizing a Deployment with Result=nil. The topology loader's buildLBs() returned [] on nil-Result → canvas chip showed `LoadBalancer 0/0` on every chroot Sovereign Console even though the Sovereign ingress LB was allocated and serving console.<fqdn>. Populate Result with LoadBalancerIP from `SOVEREIGN_LB_IP` env (set by bp-catalyst-platform's sovereign-fqdn ConfigMap `lbIP` key per issue #900 / PR #145). buildLBs then emits one LoadBalancer entry per region using the canonical primary LB. Caught on t131 2026-05-16 — DoD D15. Same chroot-synth-enrichment pattern as PR #1534 (SOVEREIGN_REGIONS_JSON). Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:58:53 +04:00
github-actions[bot]	dd9b631740	deploy: update catalyst images to `124ac13`	2026-05-16 17:58:31 +00:00
e3mrah	124ac13c1d	fix(router): chroot Sovereign /app/<name> resolves to AppDetail, not mothership AppsPage (D17b) (#1552 ) Two route trees claim `/app`: 1. `appRoute` (line 364) — mothership AppLayout chrome, prefix `/app`, children `/app/$deploymentId/applications/`, `/app/$deploymentId/settings`, `/app/dashboard` (fleet view), etc. ~30 children. 2. `consoleAppDetailRoute` (line 1141, under consoleLayoutRoute) — clean `/app/$componentId` for the chroot Sovereign Console's per-app detail. On a chroot Sovereign Console (DETECTED_MODE.mode === 'sovereign') the operator clicks `/apps/<card>` → AppCard generates HREF `/app/<name>` (AppsPage.tsx line ~720, correct for chroot context). TanStack router resolves to the MOTHERSHIP `appRoute` because it matches first (registered earlier under rootRoute) and its children accept `<name>` as $deploymentId. The page renders AppLayout chrome + AppsPage with mothership sidebar — looks nothing like AppDetail. Founder observation (BUG-002 from /tmp/test-matrix-t129.json + reported on t131 2026-05-16): > Application individual pages are not visible at all in the child > while mothership doesn't have that issue, this is the biggest blunder! Fix: `appRoute.beforeLoad` redirects on chroot: - `/app/<componentId>` → `/<componentId>` (caught by consoleAppDetailRoute) - `/app/dashboard`, `/app/install`, `/app/sre/`, `/app/sec/*`, `/app/blueprints` → `/dashboard` (canonical Sovereign landing; these are mothership-only surfaces — already partially fixed at dashboardRoute level by PR #1547) Mothership behavior unchanged (DETECTED_MODE.mode !== 'sovereign' falls through to the existing AppLayout-rooted tree). Refs DoD D17b. Caught on t131 (623354058b114dd6, 2026-05-16). Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:56:31 +04:00
e3mrah	f88e60726c	fix(slot-13): single-quote SOVEREIGN_REGIONS_JSON to preserve JSON literal (D5) (#1551 ) The substitute `${SOVEREIGN_REGIONS_JSON:-}` produces valid JSON like `[{"cloudRegion":"hel1","controlPlaneSize":"cpx52",...}]`. Unquoted in the slot-13 YAML, the YAML parser interprets it as a flow-sequence of flow-mappings, parsing into Go `[]map[string]interface{}`. Helm chart template `{{ .Values.sovereign.regionsJson }}` then stringifies via `%v` printf, producing Go map syntax: [map[cloudRegion:hel1 controlPlaneSize:cpx52 ...]] The chroot catalyst-api's `chrootRegionsFromEnv` calls json.Unmarshal which fails → Request.Regions stays empty → topology loader falls back to live-Nodes path → /cloud renders "1 region 1 cluster" on every multi-region Sovereign. Caught on t131 (623354058b114dd6, 2026-05-16) — DoD D5. Fix: single-quote the substitute so YAML treats it as a string literal, preserving the JSON byte-for-byte. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:33:23 +04:00
e3mrah	7e87a4d7b9	fix(clustermesh-lb): revert use-private-ip to false (D11) (#1550 ) PR #1537 set `use-private-ip: "true"` on the clustermesh-apiserver Service annotations. CCM rejected with: ReconcileHCLBTargets: use private ip: missing network id The per-region Hetzner LB allocated by CCM has no private-network attachment by default (LB private_net is empty), so it can't route to the backend's private IP. Result: LB never allocated, clustermesh apiserver Service stays `<pending>`, orchestrator waits 5min and bails with empty peerEntries. Caught on t130 (30463cd0a5a931be, 2026-05-16). PR #1538's canonical fix opens TCP 30000-32767 in the Hetzner firewall so the public-IP LB→backend health checks pass. Revert use-private-ip to false so the chain works end-to-end. Refs DoD D11. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:01:20 +04:00
github-actions[bot]	8980b727fb	deploy: update catalyst images to `fbe23da`	2026-05-16 16:34:04 +00:00
e3mrah	fbe23da091	fix(ui-nginx): allow Google Fonts domains in CSP (D26) (#1549 ) Sovereign Console pages reference Inter + JetBrains Mono fonts via fonts.googleapis.com (index.html lines 9, 11). The nginx CSP only allowed font-src 'self' data: — so the browser blocked the font stylesheet AND the woff2 fetches, falling back to system fonts. Add fonts.googleapis.com to style-src (for the @import CSS) and fonts.gstatic.com to font-src (for the woff2 assets). All 3 CSP occurrences in nginx.conf updated identically. Alternative considered: self-host the woff2 + drop the external references. Skipped for now — sticking with Google Fonts CDN is faster + matches every other web app's posture. If the operator wants air-gap-compatible Sovereigns later, switch to self-hosted. Caught on t129 2026-05-16 — DoD D26. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 20:31:51 +04:00
github-actions[bot]	27556577f7	deploy: update catalyst images to `7845a00`	2026-05-16 16:30:19 +00:00
e3mrah	7845a00799	fix(dashboard): add region + vcluster as TreemapDimensions (D16) (#1548 ) Multi-region operators on the Sovereign Console couldn't pivot the /dashboard treemap by region or vCluster. The TreemapDimension union (FE) and dashboardDimension set (BE) only included sovereign/cluster/family/namespace/application. This PR: - Adds 'region' + 'vcluster' to TreemapDimension type (products/catalyst/bootstrap/ui/src/lib/treemap.types.ts) - Adds them to the dimension select options (products/catalyst/bootstrap/ui/src/components/TreemapLayerController.tsx) - Adds them to the validated set in dashboard.go - Adds podRow.region + podRow.vcluster fields populated from openova.io/region and catalyst.openova.io/vcluster-role labels - Extends dimensionKey switch to bucket by these new dimensions (fallback: region→cluster, vcluster→"host") Caught on t129 2026-05-16 — DoD D16. Note that full multi-cluster fan-out (aggregating pods across all 3 region kubeconfigs into one treemap) is a separate refactor not included here; this PR delivers the dimension surface so the layer selector is usable + a fresh prov with the chroot's k8scache extended to multi-region will render 3 cluster bubbles when the operator picks Layer-1=cluster. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 20:24:34 +04:00
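The dimensionKey extension in #1548 can be sketched in Go. The label keys come from the commit; the `podRow` shape and `newPodRow` helper are hypothetical, and only a subset of the upstream dimensions is modelled. Missing labels fall back as described: region to cluster, vcluster to "host".

```go
package main

import "fmt"

// Label keys named in #1548; the podRow shape itself is HYPOTHETICAL.
const (
	regionLabel   = "openova.io/region"
	vclusterLabel = "catalyst.openova.io/vcluster-role"
)

type podRow struct {
	Cluster, Namespace, Application, Region, VCluster string
}

// newPodRow fills Region and VCluster from labels; missing labels stay "".
func newPodRow(cluster, namespace, application string, labels map[string]string) podRow {
	return podRow{
		Cluster:     cluster,
		Namespace:   namespace,
		Application: application,
		Region:      labels[regionLabel],
		VCluster:    labels[vclusterLabel],
	}
}

// dimensionKey buckets a row by one treemap dimension, with the fallbacks
// the commit describes: region -> cluster, vcluster -> "host".
func dimensionKey(row podRow, dim string) (string, error) {
	switch dim {
	case "cluster":
		return row.Cluster, nil
	case "namespace":
		return row.Namespace, nil
	case "application":
		return row.Application, nil
	case "region":
		if row.Region != "" {
			return row.Region, nil
		}
		return row.Cluster, nil
	case "vcluster":
		if row.VCluster != "" {
			return row.VCluster, nil
		}
		return "host", nil
	default:
		// sovereign and family are valid upstream but not modelled here.
		return "", fmt.Errorf("dimension %q not handled in this sketch", dim)
	}
}

func main() {
	row := newPodRow("t129", "cnpg-system", "bp-cnpg", map[string]string{regionLabel: "nbg1"})
	for _, dim := range []string{"region", "vcluster", "cluster"} {
		key, err := dimensionKey(row, dim)
		fmt.Println(dim, key, err)
	}
}
```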
github-actions[bot]	477bd0966f	deploy: update catalyst images to `52015ff`	2026-05-16 16:15:32 +00:00
e3mrah	52015ff468	fix(ui): t129 SPA routing — bp-bp- prefix, PIN /wizard leak, /app/dashboard fleet leak (#1547 ) Three operator-visible SPA routing bugs caught on live t129 Sovereign Console (t129.omani.works, 2026-05-16). Closes #1546. BUG-001 (D19) — doubled /app/bp-bp-* href on 10 of 44 app cards. build-catalog.mjs::listBootstrapKit extracted slug from `NN-(.+)\.yaml` without stripping an optional `bp-` already present in some filenames (e.g. `13-bp-catalyst-platform.yaml`). The captured slug became `bp-catalyst-platform`, then `id: \`bp-${slug}\`` doubled it to `bp-bp-catalyst-platform`, breaking the FE↔BE HR-name join and printing the doubled prefix on the AppsPage card href. Fix: strip a leading `bp-` from the captured slug before forming the canonical id. Regenerated catalog.generated.ts + blueprints.json — 10 entries collapse to their single-prefix canonical form (bp-catalyst-platform, bp-cert-manager-powerdns-webhook, bp-k8s-ws-proxy, bp-guacamole, bp-dmz-vcluster, bp-hcloud-ccm, bp-openova-flow-server, bp-openova-flow-emitter, bp-mgmt-vcluster, bp-rtz-vcluster). BUG-015 (D23, extends D0) — PIN-verify lands /wizard on Sovereign. VerifyPinPage default landing was `/wizard` regardless of operating mode. On a chroot Sovereign Console (DETECTED_MODE.mode === 'sovereign'), the operator has just been auto-redirected from the mothership handover URL; their Sovereign is already converged. Routing them to the new-prov wizard re-prompts for org details and contradicts D0. Fix: branch on DETECTED_MODE.mode — `/dashboard` on sovereign, `/wizard` on catalyst-zero. Mothership flow unchanged. Test: VerifyPinPage.test.tsx asserts the 3 cases (sovereign default, catalyst-zero default, explicit next= override). BUG-016 (D24) — /app/dashboard exposes mothership fleet view. appRoute's `/dashboard` child mounts DashboardPage (multi-Sovereign fleet, "7 Sovereigns" with duplicate rows).
On a Sovereign Console this surface MUST NOT be reachable — the Sovereign owns ONE deployment, fleet is mothership-only. Fix: beforeLoad on dashboardRoute redirects to `/dashboard` (consoleDashboardRoute, the per-Sovereign landing) when DETECTED_MODE.mode === 'sovereign'. Mothership keeps the fleet view as today. Refs: docs/SOVEREIGN-MULTI-REGION-DOD.md D19/D23/D24, /tmp/test-matrix-t129.json discoveries BUG-001/015/016. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 20:13:26 +04:00
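The BUG-001 fix in #1547 lives in build-catalog.mjs; the following is a Go rendering of the same slug rule for illustration only, not the shipped code. It matches the `NN-<slug>.yaml` filename shape, strips an optional leading `bp-` from the captured slug, and only then adds the prefix, so it is never doubled.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// slotFile matches bootstrap-kit filenames shaped like NN-<slug>.yaml.
var slotFile = regexp.MustCompile(`^\d+-(.+)\.yaml$`)

// canonicalID strips an optional leading "bp-" from the captured slug before
// adding the prefix, so the prefix is never doubled.
func canonicalID(filename string) (string, bool) {
	m := slotFile.FindStringSubmatch(filename)
	if m == nil {
		return "", false
	}
	return "bp-" + strings.TrimPrefix(m[1], "bp-"), true
}

func main() {
	for _, f := range []string{"13-bp-catalyst-platform.yaml", "01-cilium.yaml"} {
		id, ok := canonicalID(f)
		fmt.Println(f, id, ok)
	}
}
```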
e3mrah	d7b2c017f1	fix(gitea): override DOMAIN/ROOT_URL with SOVEREIGN_FQDN (D25) (#1545 ) Chart values.yaml ships `gitea.gitea.config.server.DOMAIN = gitea.catalyst.local` + `ROOT_URL = https://gitea.catalyst.local` — the bootstrap dev hostname. Without per-Sovereign override, Gitea's Web UI rendered the dev hostname in pageData.appUrl, internal links, and `git clone` URLs. Operators on every freshly-provisioned Sovereign were shown a gitea.catalyst.local hostname that public DNS can't resolve. Slot 10-gitea Kustomization adds the per-Sovereign override: gitea.gitea.config.server.DOMAIN: gitea.${SOVEREIGN_FQDN} gitea.gitea.config.server.ROOT_URL: https://gitea.${SOVEREIGN_FQDN} Caught on t129 2026-05-16 — DoD D25. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 19:57:43 +04:00
e3mrah	9e47fd122a	docs(DoD): add D19-D26 — counters/jobs/users/settings/wizard/mothership-leak/hostnames/CSP (#1544 ) Test-plan-author agent walked the live t129 Sovereign UI and discovered 22 bugs across 11 categories. 8 propose new DoD gates beyond D0-D18: - D19: Apps + Cloud counter consistency (44 vs 36 mismatch; vCluster=0/0, LoadBalancer=0/0, Bucket=0/0, Volume=0/0; PVC 66 vs 33; doubled /app/bp-bp-* hrefs on 10/44 cards) - D20: Jobs page region-prefix visibility + per-region filter - D21: Operator pre-populated as owner-tier on /users - D22: Settings shows real values (no "—" / "API PENDING") - D23: Post-handover lands /dashboard, not /wizard - D24: Mothership-only views (fleet view, "+ New deployment") absent from Sovereign Console - D25: All operator-facing service hostnames reachable + no `gitea.catalyst.local` dev refs in HTML - D26: CSP allows fonts (or self-host woff2) Matrix: /tmp/test-matrix-t129.json — 224 TCs covering every operator surface, 123 P0 / 52 P1 / 49 P2. Per founder ruling: the DoD list is the convergence contract that grows as test-writer / test-executor find more operator-visible bugs. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 19:52:53 +04:00
github-actions[bot]	1405275af9	deploy: update catalyst images to `2b3888e`	2026-05-16 15:48:21 +00:00
e3mrah	2b3888eed5	fix(ui): suppress chroot-side false-positive notifications (D17, D18) (#1543 ) Two notification spammers on the chroot Sovereign Console that produce noise on every /apps + /app/<name> visit: D17 — "Deployment id in the URL is malformed": AppsPage.tsx fires on isDeploymentID(rawDeploymentId)=false. On the chroot, useResolvedDeploymentId resolves to /api/v1/sovereign/self which returns the synthesized canonical id `sovereign-<fqdn>` (26 chars, not hex). The notification claims that path-segment is invalid even though there is no URL segment — the resolution path is in-process. Suppress on DETECTED_MODE.mode === 'sovereign'. D18 — "Per-component install monitoring is unavailable": Fires on state.phase1WatchSkipped. On the chroot, phase1WatchSkipped is a MOTHERSHIP-only concept (mother's observer pod failed to fetch the new cluster's kubeconfig). The Sovereign-side catalyst-api runs IN the cluster it's reporting on — has the in-cluster ServiceAccount + bundled sovereignDynamicClient + informer cache watching HelmReleases natively. Firing this here tells operator to drop to kubectl when the data is on the page. Suppress on chroot. Caught on t129 (6cddff7ef4432bdc, 2026-05-16) — DoD D17 + D18. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 19:46:25 +04:00
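The two suppressions in #1543 reduce to mode-aware predicates. The real checks are in the TypeScript UI (AppsPage.tsx); this Go sketch is illustrative only, with hypothetical function names. The deployment-id rule (16 lowercase hex characters) is taken from the error text quoted in #1541; the synthesized `sovereign-<fqdn>` id fails it, which is why the warning must be suppressed on the chroot rather than "fixed".

```go
package main

import (
	"fmt"
	"regexp"
)

// deploymentIDPattern: 16 lowercase hex characters.
var deploymentIDPattern = regexp.MustCompile(`^[0-9a-f]{16}$`)

func isDeploymentID(s string) bool { return deploymentIDPattern.MatchString(s) }

// shouldWarnMalformedID suppresses the D17 warning on the chroot Sovereign
// Console, where the id comes from /api/v1/sovereign/self, not a URL segment.
func shouldWarnMalformedID(mode, id string) bool {
	if mode == "sovereign" {
		return false
	}
	return !isDeploymentID(id)
}

// shouldWarnPhase1WatchSkipped suppresses the D18 warning on the chroot,
// since phase1WatchSkipped is a mothership-only concept.
func shouldWarnPhase1WatchSkipped(mode string, phase1WatchSkipped bool) bool {
	return phase1WatchSkipped && mode != "sovereign"
}

func main() {
	fmt.Println(isDeploymentID("6cddff7ef4432bdc"))
	fmt.Println(shouldWarnMalformedID("sovereign", "sovereign-t129.omani.works"))
	fmt.Println(shouldWarnPhase1WatchSkipped("catalyst-zero", true))
}
```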
github-actions[bot]	f8c20137a5	deploy: update catalyst images to `536bfcb`	2026-05-16 15:42:43 +00:00
e3mrah	536bfcb699	fix(infrastructure): vCluster fallback from namespace label (D15) (#1542 ) loadVClusters() queried vcluster.io/v1alpha1 CRs only. Our bootstrap topology ships loft-sh/vcluster as a plain Helm chart (StatefulSet + Service, NO CRD installed) so the CR list is always empty on a converged Sovereign → canvas `vCluster N/N` chip shows `0/0` even though Pods are Running. Add a fallback: enumerate Namespaces carrying `catalyst.openova.io/vcluster-role` label (stamped by bp-{mgmt,dmz,rtz}-vcluster's namespace template at PR #1526). Emits one VCluster row per labeled namespace with role = the label value. Status `healthy` since the namespace exists (operator-visible Pod state is surfaced elsewhere). Caught on t129 (6cddff7ef4432bdc, 2026-05-16) — D15. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 19:40:50 +04:00
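The loadVClusters fallback in #1542 can be sketched in Go with hypothetical minimal types in place of the real Namespace and canvas row structs. When the CR-derived list is empty, each namespace carrying `catalyst.openova.io/vcluster-role` becomes one row whose role is the label value and whose status is `healthy`; the namespace names in `main` are made up for the example.

```go
package main

import "fmt"

const vclusterRoleLabel = "catalyst.openova.io/vcluster-role"

// Namespace and VCluster are HYPOTHETICAL minimal types for this sketch.
type Namespace struct {
	Name   string
	Labels map[string]string
}

type VCluster struct {
	Namespace, Role, Status string
}

// loadVClusters prefers CR-derived rows; when the vcluster.io CR list is
// empty (no CRD installed), it falls back to labeled namespaces.
func loadVClusters(fromCRs []VCluster, namespaces []Namespace) []VCluster {
	if len(fromCRs) > 0 {
		return fromCRs
	}
	var out []VCluster
	for _, ns := range namespaces {
		role, ok := ns.Labels[vclusterRoleLabel]
		if !ok {
			continue
		}
		// The namespace existing is treated as healthy; Pod state is surfaced elsewhere.
		out = append(out, VCluster{Namespace: ns.Name, Role: role, Status: "healthy"})
	}
	return out
}

func main() {
	namespaces := []Namespace{
		{Name: "vcluster-mgmt", Labels: map[string]string{vclusterRoleLabel: "mgmt"}},
		{Name: "kube-system"},
		{Name: "vcluster-dmz", Labels: map[string]string{vclusterRoleLabel: "dmz"}},
	}
	for _, vc := range loadVClusters(nil, namespaces) {
		fmt.Println(vc.Namespace, vc.Role, vc.Status)
	}
}
```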
e3mrah	7c7b6277e4	docs(DoD): add D15-D18 — canvas-accuracy/routing/self-monitoring (t129) (#1541 ) Founder ruling 2026-05-16: 100% DoD was premature. Real operator-visible issues remain on t129 (omani.works 3×cpx52, 6cddff7ef4432bdc) after the D0-D14 + A4 gates passed: - D15: /cloud canvas shows vCluster 0/0 and LoadBalancer 0/0 despite vCluster Pods Running + LBs allocated. Canvas adapter not reading the live cluster state for these kinds. - D16: /dashboard Layer-1=Cluster grouping renders single Sovereign, not 3 cluster-grouped bubbles. Multi-region hierarchy collapse broken at the dashboard level. - D17: /app/<name> routes (e.g. /app/bp-cnpg) emit "Deployment id in the URL is malformed (expected 16 lowercase hex characters; got 7)" — the SPA router treats the app-name segment as a deployment-id. Every application card click produces a notifications-drawer entry. - D18: Sovereign-side catalyst-api can't fetch its own kubeconfig to monitor Phase-1 install state. Operator is told "Use kubectl directly to check Helm releases" — should be invisible. DoD list explicitly grows per iteration as test-writer / test-executor discover more operator-visible issues. The list is the convergence contract. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 19:37:38 +04:00
github-actions[bot]	f010ca16a7	deploy: update catalyst images to `5b69247`	2026-05-16 15:11:00 +00:00
e3mrah	5b69247135	fix(clustermesh): secondary cluster name match tofu scheme (D11) (#1540 ) Tofu's `secondary_region_cluster_mesh_name` local at infra/hetzner/main.tf:389 generates secondary names as `<sovereign-stem>-<region-stem-no-digits>` (e.g. `t129-nbg`, `t129-sin`). The bootstrap-kit slot 01-cilium.yaml renders cilium-config cluster.name from this value via the CLUSTER_MESH_NAME envsubst. The orchestrator's clusterName derivation was wrong: it appended `-<region-key>` to the primary's name (e.g. `t129-mesh-nbg1-1`), which matched NEITHER the tofu scheme NOR the cilium-config value. Caught on t129 (6cddff7ef4432bdc, 2026-05-16): TLS, etcd RBAC, and connection all working after PRs #1530, #1536, #1538, #1539 — but agent reported `failed to retrieve cluster configuration: not found` for every secondary peer because it queried `cilium/cluster-config/v1/t129-mesh-nbg1-1` against an etcd that only had `t129-nbg`. Fix: export `DeriveSecondaryClusterMeshName(req, rs)` that mirrors tofu's local exactly, plus a `stripTrailingDigits` helper. Orchestrator's buildRegionSlots uses this for secondaries; primary keeps the `<stem>-mesh` shape. Closes D11 incident chain: #1525 → #1528 → #1530 → #1536 → #1538 → #1539 → this. With this PR landed t129's secondary→primary connection already works (verified on live cluster — secondary agents show "ready, 2 nodes, 113 endpoints, 326 identities"); primary→secondary will work on a fresh prov once the name match is correct from the start. Refs DoD D11. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 19:08:55 +04:00
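The naming rule #1540 aligns with can be sketched in Go. This is not the exported `DeriveSecondaryClusterMeshName(req, rs)`; the helpers take plain strings, and deriving the stem from the first DNS label of the Sovereign FQDN is an assumption made for the example. Secondaries become `<sovereign-stem>-<region-stem-no-digits>`; the primary keeps `<stem>-mesh`.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stripTrailingDigits drops trailing digits: "nbg1" -> "nbg".
func stripTrailingDigits(s string) string {
	return strings.TrimRightFunc(s, unicode.IsDigit)
}

// sovereignStem is an ASSUMPTION: first DNS label of the Sovereign FQDN.
func sovereignStem(fqdn string) string {
	return strings.SplitN(fqdn, ".", 2)[0]
}

// secondaryClusterMeshName mirrors the <sovereign-stem>-<region-stem-no-digits> scheme.
func secondaryClusterMeshName(stem, cloudRegion string) string {
	return stem + "-" + stripTrailingDigits(cloudRegion)
}

// primaryClusterMeshName keeps the <stem>-mesh shape.
func primaryClusterMeshName(stem string) string {
	return stem + "-mesh"
}

func main() {
	stem := sovereignStem("t129.omani.works")
	fmt.Println(primaryClusterMeshName(stem)) // t129-mesh
	for _, r := range []string{"nbg1", "sin"} {
		fmt.Println(secondaryClusterMeshName(stem, r)) // t129-nbg, then t129-sin
	}
}
```

Whatever the orchestrator computes must equal the cilium-config `cluster.name` rendered from `CLUSTER_MESH_NAME`; otherwise the agent looks up `cilium/cluster-config/v1/<name>` under a key that does not exist.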
github-actions[bot]	6b519b5573	deploy: update catalyst images to `d0fd32d`	2026-05-16 15:01:32 +00:00
e3mrah	d0fd32dc04	fix(clustermesh): use peer's clustermesh-apiserver-remote-cert (D11) (#1539 ) The orchestrator was minting a fresh client cert (CN = local cluster name) for each peer connection. Even with PR #1530's "sign with peer's CA" fix the TLS handshake succeeded but etcd RBAC rejected: error="etcdserver: permission denied" Cilium's clustermesh-apiserver etcd has RBAC with a `remote` user that has read access on the cilium/* prefix. The chart generates `kube-system/clustermesh-apiserver-remote-cert` with CN=`remote`. Canonical `cilium clustermesh connect` CLI copies THIS Secret's tls.crt/tls.key as the client cert the REMOTE cluster presents — matches the etcd RBAC user verbatim. This PR adopts that pattern: snapshotRemoteCert() reads the peer's existing `clustermesh-apiserver-remote-cert` Secret, returns tls.crt + tls.key bytes, and the orchestrator writes them into A's `cilium-clustermesh` Secret instead of minting. Caught on t129 (6cddff7ef4432bdc, 2026-05-16): - TLS handshake succeeded after firewall fix (PR #1538) opened NodePort range so LB→backend health check passed - cilium-dbg status reported `etcd: 1/1 connected, has-quorum=true` (TLS path working) - BUT `remote configuration: expected=true, retrieved=false` and agent logs spammed `etcdserver: permission denied` With this PR's CN=remote cert, etcd authorizes the kvstore List and clustermesh sync completes — agent should flip to `2/2 remote clusters ready`. Completes the D11 chain: #1525 (regionKeyFromSpec) → #1528 (clusterName derivation) → #1530 (cert with peer's CA — no longer needed but kept as defense-in-depth) → #1536 (hostAlias pattern) → #1538 (firewall NodePort range) → this. Refs DoD D11. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 18:58:22 +04:00
github-actions[bot]	1cfe0d758f	deploy: update catalyst images to `1c988b9`	2026-05-16 14:45:56 +00:00
e3mrah	1c988b9a4b	fix(firewall): open NodePort range 30000-32767 for clustermesh LB (D11) (#1538 ) PR #1537's use-private-ip approach was not viable: the per-region Hetzner LB has no private-network attachment by default (LB private_net is empty) and our DoD A2 architecture pins one private /24 per region that does NOT span across regions. The LB->backend hop has to transit the public path. The actual blocker is the Sovereign firewall: it permits 80/443/6443/53 and blocks the NodePort range. Hetzner LB TCP health-check probes `<node-public-ip>:<NodePort>` and gets dropped → all targets marked unhealthy → external clients see "unexpected eof while reading" at TLS handshake → cilium clustermesh agent stays `0/N remote clusters ready, Waiting for initial connection`. Security: clustermesh-apiserver requires mTLS. Peer agents must present a client cert signed by the peer cluster's cilium-ca (PR #1530). Anonymous connections rejected at handshake. mTLS is the security boundary, NOT the firewall — opening NodePorts is safe here. Caught on t129 (6cddff7ef4432bdc, 2026-05-16) — completes the D11 incident chain (#1525 → #1528 → #1530 → #1536 → this). Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 18:44:02 +04:00
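The reasoning in #1538 reduces to a port-membership check. This Go sketch uses a deliberately simplified TCP-only model (no UDP, ICMP or WireGuard rules, no source filtering). The NodePort value is hypothetical: any port in 30000-32767 is dropped by the pre-fix rule set and admitted once the range is opened.

```go
package main

import "fmt"

// portRange is an inclusive TCP port range in a simplified firewall model.
type portRange struct{ lo, hi int }

// allowed reports whether any rule admits the port.
func allowed(port int, rules []portRange) bool {
	for _, r := range rules {
		if port >= r.lo && port <= r.hi {
			return true
		}
	}
	return false
}

// baseRules: TCP ports the commit lists as permitted before the fix.
func baseRules() []portRange {
	return []portRange{{80, 80}, {443, 443}, {6443, 6443}, {53, 53}}
}

// withNodePorts returns a copy of rules plus the Kubernetes NodePort range.
func withNodePorts(rules []portRange) []portRange {
	out := append([]portRange(nil), rules...)
	return append(out, portRange{30000, 32767})
}

func main() {
	probe := 31234 // hypothetical NodePort assigned to clustermesh-apiserver
	fmt.Println(allowed(probe, baseRules()))                // false: health check dropped
	fmt.Println(allowed(probe, withNodePorts(baseRules()))) // true: health check passes
}
```

Opening the range only changes reachability; per the commit, the clustermesh-apiserver's mTLS remains the security boundary.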
e3mrah	e21f7bd0fc	fix(clustermesh-lb): use-private-ip=true for LB→backend transit (D11) (#1537 ) Hetzner firewall on Sovereign nodes permits public ingress on 80, 443, 6443, 53/tcp+udp, icmp, and wg-udp (51871). The NodePort range (30000-32767) is BLOCKED — that's the security posture (privileged Sovereign workloads should not be reachable on arbitrary NodePorts from the internet). Hetzner LB TCP health checks probe `<node-ip>:<destination_port>` where destination_port is the Service NodePort. With public-IP transit the probe goes through the firewall and gets dropped. All 3 clustermesh LB targets in t129 reported `health_status=unhealthy` because of this. With no healthy targets the LB refuses connections — external clients see "unexpected eof while reading" at TLS handshake time. Cilium agent stays `0/N remote clusters ready, Waiting for initial connection`. Fix: `load-balancer.hetzner.cloud/use-private-ip: "true"` so the LB → backend connection transits the per-region private network (10.0.1.0/24). External clients still connect to the LB's PUBLIC IP — this annotation only controls the LB→backend hop, which is in-region anyway (one LB per region, points at that region's CP node). Caught on t129 (6cddff7ef4432bdc, 2026-05-16) — completes the D11 chain that began with PR #1525 (regionKeyFromSpec), continued through PR #1528 (clusterName derivation), PR #1530 (peer cert signed by peer's CA), and PR #1536 (hostAlias pattern). With this PR's traffic-path fix landed, the LB→backend hop should succeed and the chain becomes end-to-end working. Refs DoD D11. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 18:36:54 +04:00
github-actions[bot]	bfc5a6143f	deploy: update catalyst images to `83d771d`	2026-05-16 14:13:28 +00:00


2204 Commits