* feat(handover): auto-seed owner UserAccess CR on chroot (D21)
Closes the D21 gap on Sovereign DoD: /users page returned empty after
fresh handover because Keycloak `sovereign-admins` membership was
established but no UserAccess CR existed for the operator.
After `keycloak.EnsureUser` succeeds in `AuthHandover`, the helper
`EnsureOwnerUserAccess` upserts a cluster-scoped UserAccess CR shaped
like the canonical user_access.go `CreateUserAccess` write:
    apiVersion: access.openova.io/v1alpha1
    kind: UserAccess
    metadata:
      name: useraccess-owner-<sanitized-email>
      annotations:
        catalyst.openova.io/user-email: <email>   # rbac_matrix:309 hint
    spec:
      user:
        keycloakSubject: <email>
      sovereignRef: <fqdn-first-label>
      applications:
        - app: "*"
          role: admin   # owner -> admin
The Composition (issue #322) reconciles the Claim into per-app
RoleBindings on the Sovereign so the operator surfaces in /users.
Best-effort + idempotent: AlreadyExists on the second handover is
folded to nil; any other error is logged at Warn and the handover
itself never fails. If the access.openova.io CRD has not rolled yet,
the next handover retries automatically.
Architect-first: mirrors `userAccessToUnstructured` shape and uses
existing `sovereignDynamicClient` + `rbacAssignSlug` seams. Tier
mapping follows the documented lossy `owner -> admin` rule in
`userAccessTierToRole` (CRD only accepts admin|editor|viewer).
Refs: docs/SOVEREIGN-MULTI-REGION-DOD.md D21
* chore(slot-13): pin bp-catalyst-platform to 1.4.147 (D21+D31 baked)
PR #1562 (D31 wordpress-tenant activeHotStandby) + PR #1564 (D21 owner
UserAccess auto-seed at handover, catalyst-api:8d2a947) both packaged
into chart 1.4.147. Pin the slot so t133+ gets both gates on first provision.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(chart): regionsJson uses toJson to defeat YAML flow-seq re-parse (D5)
PR #1551 single-quoted SOVEREIGN_REGIONS_JSON in the slot file
substitute, but Flux Kustomize's postBuild can still re-parse the
JSON-shaped string as a YAML flow-sequence depending on quoting context.
When that happens .Values.sovereign.regionsJson is a Go []interface{}
of map[interface{}]interface{} and `| quote` prints Go's
`[map[cloudRegion:hel1 ...]]` syntax — catalyst-api's json.Unmarshal of
the env var then fails and Request.Regions is empty.
toJson normalises both string and list inputs to valid JSON.
Caught live on t132 2026-05-16 chart 1.4.147: env var rendered as
`[map[cloudRegion:hel1 ...]]` despite #1551 being in effect.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
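The failure mode in the regionsJson commit can be reproduced in plain Go. This is a sketch only: the `region` struct carries just the one field named above, `parseRegions` is a hypothetical stand-in for catalyst-api's `json.Unmarshal` call, and `map[string]interface{}` is used instead of `map[interface{}]interface{}` so that `encoding/json` can encode it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// region mirrors only the field named in the commit message.
type region struct {
	CloudRegion string `json:"cloudRegion"`
}

// parseRegions is a hypothetical stand-in for the env-var decode in catalyst-api.
func parseRegions(env string) ([]region, error) {
	var out []region
	err := json.Unmarshal([]byte(env), &out)
	return out, err
}

// goSyntax is what default Go formatting (as with `| quote` on a list) produces.
func goSyntax(v interface{}) string { return fmt.Sprint(v) }

// toJSON re-encodes the already-parsed value as JSON.
func toJSON(v interface{}) (string, error) {
	b, err := json.Marshal(v)
	return string(b), err
}

func main() {
	// The value after a YAML re-parse has turned the string into a list.
	reparsed := []interface{}{map[string]interface{}{"cloudRegion": "hel1"}}

	broken := goSyntax(reparsed)
	_, err := parseRegions(broken)
	fmt.Println(broken, "decode failed:", err != nil)

	fixed, _ := toJSON(reparsed)
	regions, err := parseRegions(fixed)
	fmt.Println(fixed, regions, err)
}
```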
* fix(chroot): populate deployment Result + Request fields for D22
Settings page on Sovereign Console renders `—` for Region / Sovereign /
Created / DeploymentID / Pool subdomain because chroot's GET
/api/v1/deployments/<id> returns empty strings for those fields.
Populate from existing env vars (best-effort — empty when chart hasn't
wired them yet, which is no worse than today's behaviour):
- Result.ConsoleURL = "https://console.<fqdn>" (derived from selfFQDN)
- Result.GitOpsRepoURL from GITOPS_REPO_URL env
- Result.ControlPlaneIP from SOVEREIGN_CONTROL_PLANE_IP env
- Request.Region = regions[0].CloudRegion (top-level legacy field)
- Request.OrgEmail from OPERATOR_EMAIL env
- Request.OrgName from ORG_NAME env
Companion chart PR will wire the env vars from .Values.global.* +
cloud-init substitute placeholders. This PR is BACKWARD-compatible —
unset env vars produce empty strings, same as today.
Caught live on t132 2026-05-16 — `curl /api/v1/deployments/sovereign-
t132.omani.works` returns empty ownerEmail/region/consoleURL.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
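A rough sketch of the best-effort population listed in the D22 commit. The `deploymentView` struct and `populate` function are hypothetical and flatten Result and Request into one type for brevity; only the env var names and the `https://console.<fqdn>` derivation come from the commit message.

```go
package main

import (
	"fmt"
	"os"
)

// deploymentView is a hypothetical flattened view of the Result and
// Request fields named in the commit message.
type deploymentView struct {
	ConsoleURL     string
	GitOpsRepoURL  string
	ControlPlaneIP string
	Region         string
	OrgEmail       string
	OrgName        string
}

// populate fills every field it can and leaves the rest empty, so an
// unwired env var behaves exactly as before.
func populate(selfFQDN string, cloudRegions []string, getenv func(string) string) deploymentView {
	v := deploymentView{
		GitOpsRepoURL:  getenv("GITOPS_REPO_URL"),
		ControlPlaneIP: getenv("SOVEREIGN_CONTROL_PLANE_IP"),
		OrgEmail:       getenv("OPERATOR_EMAIL"),
		OrgName:        getenv("ORG_NAME"),
	}
	if selfFQDN != "" {
		v.ConsoleURL = "https://console." + selfFQDN
	}
	if len(cloudRegions) > 0 {
		v.Region = cloudRegions[0] // top-level legacy field takes the first region
	}
	return v
}

func main() {
	v := populate("t132.omani.works", []string{"hel1"}, os.Getenv)
	fmt.Printf("%+v\n", v)
}
```

Passing `getenv` as a parameter keeps the function testable without touching the process environment.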
* fix(chroot): GetDeployment falls back to chrootEnsureDeployment (D22)
GetDeployment was the only handler that returned 404 without calling
chrootEnsureDeployment. After a catalyst-api Pod restart on the chroot
the in-memory store is empty until some other handler (StreamLogs,
jobs list) primes it via its own synth call — meanwhile the Sovereign
Console Settings page loads /api/v1/deployments/<id> first and gets
404, rendering the entire page broken.
Mirror the StreamLogs pattern (lines 1247-1254): try in-memory load,
fall through to chrootEnsureDeployment, return 404 only when both miss.
This unblocks PR #1567's deployment-record population — without the
fallback, GetDeployment can never serve the populated record on chroot.
Caught live on t132 2026-05-16 after #1567 image roll: Settings page
404 because in-memory store was empty.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
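The lookup order in the GetDeployment fix can be sketched as follows. `memStore`, `deployment`, and the injected `ensure` function are hypothetical; `ensure` stands in for `chrootEnsureDeployment`, and saving its result back into the store is an assumption based on the note that other handlers prime the store through their synth call.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

type deployment struct{ ID string }

// memStore is a hypothetical in-memory store that starts empty after a Pod restart.
type memStore struct {
	mu sync.Mutex
	m  map[string]*deployment
}

func (s *memStore) load(id string) (*deployment, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	d, ok := s.m[id]
	return d, ok
}

func (s *memStore) save(d *deployment) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.m == nil {
		s.m = map[string]*deployment{}
	}
	s.m[d.ID] = d
}

// getDeployment tries the store first, then the synth fallback, and
// reports 404 only when both miss.
func getDeployment(s *memStore, ensure func(string) (*deployment, bool), id string) (*deployment, int) {
	if d, ok := s.load(id); ok {
		return d, http.StatusOK
	}
	if d, ok := ensure(id); ok {
		s.save(d) // prime the store for later requests
		return d, http.StatusOK
	}
	return nil, http.StatusNotFound
}

func main() {
	self := "t132.example"
	ensure := func(id string) (*deployment, bool) {
		if id != self {
			return nil, false
		}
		return &deployment{ID: id}, true
	}
	s := &memStore{}
	_, code := getDeployment(s, ensure, self)
	fmt.Println(code)
	_, code = getDeployment(s, ensure, "unknown")
	fmt.Println(code)
}
```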
---------
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sovereign DoD D31 — tenants subscribing to an HA-capable marketplace app
may opt into a cross-region active-hot-standby Postgres pair for their
WordPress instance instead of the default single CNPG Cluster.
Mirrors the canonical bp-cnpg-pair pattern (primary + replica Cluster
CRs with WAL streaming over Cilium ClusterMesh via a managed Service
annotated service.cilium.io/global=true). When the new
pg.activeHotStandby.enabled flag is false (default), templates render
the existing single Cluster bit-for-bit — no regression for non-HA
tenants.
Catalog seed flags WordPress with ha + cnpg-pair tags so the marketplace
HA filter can surface it.
Chart bumped 0.2.1 -> 0.3.0. New render-gate test asserts both default
single-cluster shape AND the enabled 2-Cluster shape with the right
nodeSelectors, replica.source, externalCluster.host, Cilium global
annotation, and bootstrap.pg_basebackup; all 5 cases pass.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(slot-13): pin bp-catalyst-platform to 1.4.145 (D29 gateway public routes)
PR #1559 added /api/billing/{vouchers/redeem-preview,plans,addons} as
public gateway routes — required for the marketplace /redeem zero-touch
flow. Pin the slot so future provisions inherit it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(billing): /redeem-preview + plans + addons bypass JWT (D29)
Mirror PR #1559's gateway public routes in the billing service's own
middleware chain. The gateway now lets these requests through without
an Authorization header (D29 voucher-redeem landing), but billing
service's main.go was JWT-gating EVERY /billing/* path except
/billing/webhook — so the request still got 401, just one hop later.
Caught live on t132 2026-05-16 after PR #1559 rolled.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
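One way to express the billing-side bypass is a path allowlist in front of the auth check. This is a sketch, not the service's actual middleware: the `/billing/...` service paths are assumed from the gateway routes, `/billing/invoices` is a made-up protected path, and the JWT validation is reduced to a header-presence check.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// publicPaths lists routes that skip the auth check. The exact service
// paths are an assumption derived from the gateway routes.
var publicPaths = map[string]bool{
	"/billing/webhook":                 true,
	"/billing/vouchers/redeem-preview": true,
	"/billing/plans":                   true,
	"/billing/addons":                  true,
}

// requireAuth rejects non-public requests that carry no Authorization
// header. Real JWT validation is deliberately left out of this sketch.
func requireAuth(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !publicPaths[r.URL.Path] && r.Header.Get("Authorization") == "" {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor sends an unauthenticated GET through the middleware and
// returns the response code.
func statusFor(path string) int {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	rec := httptest.NewRecorder()
	requireAuth(ok).ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	fmt.Println(statusFor("/billing/plans"), statusFor("/billing/invoices")) // prints "200 401"
}
```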
---------
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(billing+notification): wire voucher-issued email (D28)
D28 of the Sovereign DoD requires that issuing a voucher emails it to
the recipient zero-touch. Today POST /billing/vouchers/issue persists
the PromoCode row but never notifies anyone — so a gifted voucher only
reaches its recipient if the operator manually sends the code over a
side channel. This wires sme-billing -> sme-notification so the email
fires automatically on every successful upsert that carries a
recipient_email field.
Architecture follows the existing notification-service seam:
sme-billing POSTs to
http://notification.sme.svc.cluster.local:8087/notification/send
with template=voucher-issued; sme-notification renders the HTML and
dispatches via Stalwart over SMTP. No direct SMTP code is added to
billing, and no stalwart-mail calls bypass notification.
Server-side only — the owner-UI for issuing vouchers (D28b) is a
separate PR.
Changes:
  notification/templates/templates.go
    + VoucherIssuedEmail(code, creditOMR, description, sovereignFQDN,
      validityHint): renders code prominently, redeem button to
      https://marketplace.<sovereignFQDN>/redeem/?code=<CODE>; FQDN
      always supplied by caller, NEVER hardcoded.
  notification/handlers/handlers.go
    + renderTemplate("voucher-issued") case parsing
      {code, credit_omr, description, sovereign_fqdn, validity_hint}.
    + Default subject "You've been gifted a voucher for OpenOva SME".
  billing/handlers/handlers.go
    + Handler fields: NotificationURL, SovereignFQDN, NotificationClient.
  billing/handlers/vouchers.go
    + issueVoucherRequest = store.PromoCode + RecipientEmail
      (request-only; never persisted).
    + sendVoucherIssuedEmail(): POSTs to NotificationURL with a 5s
      timeout. Best-effort: a non-2xx or transport error logs but does
      NOT fail the IssueVoucher response, because the row is already
      persisted and re-issuing the same code re-fires the email.
    + Re-issue semantics (#91 resurrects soft-deleted rows) extend to
      the email path, as documented in the handler comment.
  billing/main.go
    + Reads NOTIFICATION_SERVICE_URL (default
      http://notification.sme.svc.cluster.local:8087/notification/send)
      and SOVEREIGN_FQDN env vars. Wires a 5s default http.Client.
  products/catalyst/chart/templates/sme-services/billing.yaml
    + Pipes NOTIFICATION_SERVICE_URL (cluster-DNS constant) and
      SOVEREIGN_FQDN (from .Values.global.sovereignFQDN, NEVER
      hardcoded) into the billing Deployment.
Tests:
  notification/handlers/handlers_test.go (new)
    + TestRenderTemplate_VoucherIssued: rendered HTML contains code +
      credit + a redeem URL built from the supplied FQDN; never falls
      back to marketplace.openova.io.
    + TestRenderTemplate_VoucherIssued_CustomSubject + _NoDescription
    + TestRenderTemplate_UnknownTemplate as guard rails.
  billing/handlers/vouchers_test.go
    + TestIssueVoucher_SendsEmail_WhenRecipientPresent: a fake
      round-tripper sees the POST to notification with the right URL +
      template + data (code upper-cased, credit_omr, sovereign_fqdn,
      description) when recipient_email is set.
    + TestIssueVoucher_NoEmail_WhenRecipientAbsent: no notification
      call when recipient is empty.
    + TestIssueVoucher_NotificationFailure_DoesNotFailUpsert:
      operator gets 200 even when notification returns 500.
    + TestIssueVoucher_403WithoutVoucherIssuerRole: role gate preserved.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(chart): admin pod uses dedicated image tag (D27 SME stack)
t132 caught admin pod stuck in ImagePullBackOff on `admin:b0ed216` —
the SME services CI run for that mono-repo SHA published 10 services
but admin's image was missing from GHCR. Decouple admin's tag from
smeTag so a missing-build for one service doesn't wedge the SME stack.
Default to `3c2f7e4` (matches marketplaceApi + console, known-published).
When admin's UI changes, bump in lockstep with those.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(slot-13): pin bp-catalyst-platform to 1.4.144
PR #1556 (D28 voucher email wire) + PR #1557 (D27 admin tag override)
landed and Blueprint Release packaged 1.4.144. Pin the slot file so
future provisions get the latest chart by default — t132 manually
upgraded via kubectl patch but t133+ will inherit it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(gateway): /redeem-preview + plans + addons must be public (D29)
The marketplace /redeem?code=XXX landing page calls
/api/billing/vouchers/redeem-preview unauthenticated per
docs/FRANCHISE-MODEL.md §3, but the gateway's catch-all /api/billing/
entry was returning 401 to it, breaking the entire voucher-redeem
zero-touch flow that D29 depends on.
Also expose /api/billing/plans and /api/billing/addons so the
marketplace landing can render pricing without a session.
Caught live on t132 2026-05-16 — every /redeem call returned 401.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Founder ruling 2026-05-16: D27 mandates that a fresh wizard provisions a
Sovereign already ready to host tenant orgs (D29). Operator can still
flip the toggle off on StepMarketplace if they explicitly want a
private Sovereign.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Founder ruling 2026-05-16: tenant onboarding flow is part of the Sovereign DoD.
D27 — Marketplace enabled on the Sovereign (zero-touch from provision body)
D28 — Owner-tier voucher issuance (one-click, voucher mailed via Sovereign SMTP)
D29 — Voucher-redeem → org wizard → tenant namespace+RBAC+bootstrap (zero-touch)
D30 — Free-subdomain pool selection (omani.homes, omani.rest, omani.trades)
D31 — Tenant app with CNPG active-hot-standby cross-region replication
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
chrootEnsureDeployment was synthesizing a Deployment with Result=nil.
The topology loader's buildLBs() returned [] on nil-Result → canvas
chip showed `LoadBalancer 0/0` on every chroot Sovereign Console
even though the Sovereign ingress LB was allocated and serving
console.<fqdn>.
Populate Result with LoadBalancerIP from `SOVEREIGN_LB_IP` env (set
by bp-catalyst-platform's sovereign-fqdn ConfigMap `lbIP` key per
issue #900 / PR #145). buildLBs then emits one LoadBalancer entry
per region using the canonical primary LB.
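A rough sketch of the nil-Result behaviour and the enrichment, under stated assumptions: the `deployment`, `result`, and `loadBalancer` types and the `synthDeployment` helper are hypothetical stand-ins, and only the `SOVEREIGN_LB_IP` variable name and the "one entry per region" rule come from the text above.

```go
package main

import (
	"fmt"
	"os"
)

// Hypothetical, simplified shapes; the real structs carry more fields.
type result struct{ LoadBalancerIP string }

type deployment struct {
	Regions []string
	Result  *result
}

type loadBalancer struct{ Region, IP string }

// buildLBs emits one entry per region from the primary LB IP and returns an
// empty slice when no Result is present (the behaviour behind the 0/0 chip).
func buildLBs(d deployment) []loadBalancer {
	if d.Result == nil || d.Result.LoadBalancerIP == "" {
		return []loadBalancer{}
	}
	lbs := make([]loadBalancer, 0, len(d.Regions))
	for _, r := range d.Regions {
		lbs = append(lbs, loadBalancer{Region: r, IP: d.Result.LoadBalancerIP})
	}
	return lbs
}

// synthDeployment populates Result only when an LB IP was supplied
// (assumption: an empty value leaves Result nil).
func synthDeployment(regions []string, lbIP string) deployment {
	d := deployment{Regions: regions}
	if lbIP != "" {
		d.Result = &result{LoadBalancerIP: lbIP}
	}
	return d
}

func main() {
	d := synthDeployment([]string{"hel1", "nbg1"}, os.Getenv("SOVEREIGN_LB_IP"))
	fmt.Printf("LoadBalancer entries: %d\n", len(buildLBs(d)))
}
```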
Caught on t131 2026-05-16 — DoD D15. Same chroot-synth-enrichment
pattern as PR #1534 (SOVEREIGN_REGIONS_JSON).
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two route trees claim `/app`:
1. `appRoute` (line 364) — mothership AppLayout chrome, prefix `/app`,
children `/app/$deploymentId/applications/*`, `/app/$deploymentId/
settings`, `/app/dashboard` (fleet view), etc. ~30 children.
2. `consoleAppDetailRoute` (line 1141, under consoleLayoutRoute) —
clean `/app/$componentId` for the chroot Sovereign Console's
per-app detail.
On a chroot Sovereign Console (DETECTED_MODE.mode === 'sovereign')
the operator clicks `/apps/<card>` → AppCard generates HREF
`/app/<name>` (AppsPage.tsx line ~720, correct for chroot context).
TanStack router resolves to the MOTHERSHIP `appRoute` because it
matches first (registered earlier under rootRoute) and its
children accept `<name>` as $deploymentId. The page renders
AppLayout chrome + AppsPage with mothership sidebar — looks
nothing like AppDetail.
Founder observation (BUG-002 from /tmp/test-matrix-t129.json + reported
on t131 2026-05-16):
> Application individual pages are not visible at all in the child
> while mothership doesn't have that issue, this is the biggest blunder!
Fix: `appRoute.beforeLoad` redirects on chroot:
- `/app/<componentId>` → `/<componentId>` (caught by consoleAppDetailRoute)
- `/app/dashboard`, `/app/install`, `/app/sre/*`, `/app/sec/*`, `/app/blueprints`
→ `/dashboard` (canonical Sovereign landing; these are mothership-only
surfaces — already partially fixed at dashboardRoute level by PR #1547)
Mothership behavior unchanged (DETECTED_MODE.mode !== 'sovereign'
falls through to the existing AppLayout-rooted tree).
Refs DoD D17b. Caught on t131 (623354058b114dd6, 2026-05-16).
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The substitute `${SOVEREIGN_REGIONS_JSON:-}` produces valid JSON like
`[{"cloudRegion":"hel1","controlPlaneSize":"cpx52",...}]`. Unquoted in
the slot-13 YAML, the YAML parser interprets it as a flow-sequence
of flow-mappings, parsing into Go `[]map[string]interface{}`. Helm
chart template `{{ .Values.sovereign.regionsJson }}` then stringifies
via `%v` printf, producing Go map syntax:
[map[cloudRegion:hel1 controlPlaneSize:cpx52 ...]]
The chroot catalyst-api's `chrootRegionsFromEnv` calls
json.Unmarshal which fails → Request.Regions stays empty → topology
loader falls back to live-Nodes path → /cloud renders "1 region 1
cluster" on every multi-region Sovereign.
Caught on t131 (623354058b114dd6, 2026-05-16) — DoD D5.
Fix: single-quote the substitute so YAML treats it as a string literal,
preserving the JSON byte-for-byte.
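The failure mode is easy to reproduce in isolation. The sketch below is illustrative only: `parseRegions` and the `region` struct are hypothetical stand-ins for `chrootRegionsFromEnv`, and the two field names are the ones quoted above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// region holds the two fields quoted in the commit message.
type region struct {
	CloudRegion      string `json:"cloudRegion"`
	ControlPlaneSize string `json:"controlPlaneSize"`
}

// parseRegions is a hypothetical stand-in for the chroot's env parsing.
func parseRegions(raw string) ([]region, error) {
	var out []region
	if err := json.Unmarshal([]byte(raw), &out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	// What an unquoted YAML flow-sequence decodes to on the Go side.
	parsed := []interface{}{
		map[string]interface{}{"cloudRegion": "hel1", "controlPlaneSize": "cpx52"},
	}
	// Stringifying with %v yields Go map syntax, not JSON.
	mangled := fmt.Sprintf("%v", parsed)
	fmt.Println(mangled)

	if _, err := parseRegions(mangled); err != nil {
		fmt.Println("mangled value rejected:", err)
	}

	// A quoted substitute survives as a string and parses cleanly.
	quoted := `[{"cloudRegion":"hel1","controlPlaneSize":"cpx52"}]`
	regions, err := parseRegions(quoted)
	fmt.Println(len(regions), err)
}
```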
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #1537 set `use-private-ip: "true"` on the clustermesh-apiserver
Service annotations. CCM rejected with:
ReconcileHCLBTargets: use private ip: missing network id
The per-region Hetzner LB allocated by CCM has no private-network
attachment by default (LB private_net is empty), so it can't route
to the backend's private IP. Result: LB never allocated, clustermesh
apiserver Service stays `<pending>`, orchestrator waits 5min and
bails with empty peerEntries. Caught on t130 (30463cd0a5a931be,
2026-05-16).
PR #1538's canonical fix opens TCP 30000-32767 in the Hetzner
firewall so the public-IP LB→backend health checks pass. Revert
use-private-ip to false so the chain works end-to-end.
Refs DoD D11.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sovereign Console pages reference Inter + JetBrains Mono fonts via
fonts.googleapis.com (index.html lines 9, 11). The nginx CSP only
allowed font-src 'self' data: — so the browser blocked the font
stylesheet AND the woff2 fetches, falling back to system fonts.
Add fonts.googleapis.com to style-src (for the @import CSS) and
fonts.gstatic.com to font-src (for the woff2 assets). All 3 CSP
occurrences in nginx.conf updated identically.
Alternative considered: self-host the woff2 + drop the external
references. Skipped for now — sticking with Google Fonts CDN is
faster + matches every other web app's posture. If the operator
wants air-gap-compatible Sovereigns later, switch to self-hosted.
Caught on t129 2026-05-16 — DoD D26.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Multi-region operators on the Sovereign Console couldn't pivot the
/dashboard treemap by region or vCluster. The TreemapDimension
union (FE) and dashboardDimension set (BE) only included
sovereign/cluster/family/namespace/application.
This PR:
- Adds 'region' + 'vcluster' to TreemapDimension type
(products/catalyst/bootstrap/ui/src/lib/treemap.types.ts)
- Adds them to the dimension select options
(products/catalyst/bootstrap/ui/src/components/TreemapLayerController.tsx)
- Adds them to the validated set in dashboard.go
- Adds podRow.region + podRow.vcluster fields populated from
openova.io/region and catalyst.openova.io/vcluster-role labels
- Extends dimensionKey switch to bucket by these new dimensions
(fallback: region→cluster, vcluster→"host")
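The bucketing with its fallbacks might look like the sketch below. The `podRow` shape is reduced to the fields needed here and the function signature is an assumption; only the dimension names and the two fallback rules (region falls back to cluster, vcluster falls back to "host") come from the list above.

```go
package main

import "fmt"

// podRow is a reduced, hypothetical version of the backend row type.
type podRow struct {
	Cluster   string
	Namespace string
	Region    string // from the openova.io/region label, may be empty
	VCluster  string // from catalyst.openova.io/vcluster-role, may be empty
}

// dimensionKey returns the bucket a pod falls into for the given dimension.
func dimensionKey(p podRow, dim string) string {
	switch dim {
	case "region":
		if p.Region != "" {
			return p.Region
		}
		return p.Cluster // fallback: region -> cluster
	case "vcluster":
		if p.VCluster != "" {
			return p.VCluster
		}
		return "host" // fallback: pods outside any vCluster
	case "cluster":
		return p.Cluster
	case "namespace":
		return p.Namespace
	default:
		return ""
	}
}

func main() {
	labeled := podRow{Cluster: "t129-nbg", Namespace: "kube-system", Region: "nbg1", VCluster: "dmz"}
	bare := podRow{Cluster: "t129-mesh", Namespace: "default"}
	fmt.Println(dimensionKey(labeled, "region"), dimensionKey(labeled, "vcluster"))
	fmt.Println(dimensionKey(bare, "region"), dimensionKey(bare, "vcluster"))
}
```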
Caught on t129 2026-05-16 — DoD D16. Note that full multi-cluster
fan-out (aggregating pods across all 3 region kubeconfigs into one
treemap) is a separate refactor not included here; this PR delivers
the dimension surface so the layer selector is usable + a fresh prov
with the chroot's k8scache extended to multi-region will render
3 cluster bubbles when the operator picks Layer-1=cluster.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three operator-visible SPA routing bugs caught on live t129 Sovereign
Console (t129.omani.works, 2026-05-16). Closes #1546.
BUG-001 (D19) — doubled /app/bp-bp-* href on 10 of 44 app cards.
build-catalog.mjs::listBootstrapKit extracted slug from `NN-(.+)\.yaml`
without stripping an optional `bp-` already present in some filenames
(e.g. `13-bp-catalyst-platform.yaml`). The captured slug became
`bp-catalyst-platform`, then `id: \`bp-${slug}\`` doubled it to
`bp-bp-catalyst-platform`, breaking the FE↔BE HR-name join and
printing the doubled prefix on the AppsPage card href. Fix: strip a
leading `bp-` from the captured slug before forming the canonical id.
Regenerated catalog.generated.ts + blueprints.json — 10 entries
collapse to their single-prefix canonical form (bp-catalyst-platform,
bp-cert-manager-powerdns-webhook, bp-k8s-ws-proxy, bp-guacamole,
bp-dmz-vcluster, bp-hcloud-ccm, bp-openova-flow-server,
bp-openova-flow-emitter, bp-mgmt-vcluster, bp-rtz-vcluster).
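The normalization can be illustrated as follows. Note that build-catalog.mjs is JavaScript; this Go version is only a transliteration of the rule, and the exact regular expression and the `canonicalID` name are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// slotFile approximates the `NN-(.+)\.yaml` pattern from the commit message.
var slotFile = regexp.MustCompile(`^\d+-(.+)\.yaml$`)

// canonicalID extracts the slug from a slot filename, strips one leading
// "bp-" if present, and then adds the canonical prefix exactly once.
func canonicalID(filename string) (string, bool) {
	m := slotFile.FindStringSubmatch(filename)
	if m == nil {
		return "", false
	}
	slug := strings.TrimPrefix(m[1], "bp-")
	return "bp-" + slug, true
}

func main() {
	for _, f := range []string{"13-bp-catalyst-platform.yaml", "01-cilium.yaml"} {
		id, ok := canonicalID(f)
		fmt.Println(f, id, ok)
	}
}
```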
BUG-015 (D23, extends D0) — PIN-verify lands /wizard on Sovereign.
VerifyPinPage default landing was `/wizard` regardless of operating
mode. On a chroot Sovereign Console (DETECTED_MODE.mode === 'sovereign'),
the operator has just been auto-redirected from the mothership
handover URL; their Sovereign is already converged. Routing them to
the new-prov wizard re-prompts for org details and contradicts D0.
Fix: branch on DETECTED_MODE.mode — `/dashboard` on sovereign,
`/wizard` on catalyst-zero. Mothership flow unchanged. Test:
VerifyPinPage.test.tsx asserts the 3 cases (sovereign default,
catalyst-zero default, explicit next= override).
BUG-016 (D24) — /app/dashboard exposes mothership fleet view.
appRoute's `/dashboard` child mounts DashboardPage (multi-Sovereign
fleet, "7 Sovereigns" with duplicate rows). On a Sovereign Console
this surface MUST NOT be reachable — the Sovereign owns ONE deployment,
fleet is mothership-only. Fix: beforeLoad on dashboardRoute redirects
to `/dashboard` (consoleDashboardRoute, the per-Sovereign landing)
when DETECTED_MODE.mode === 'sovereign'. Mothership keeps the fleet
view as today.
Refs: docs/SOVEREIGN-MULTI-REGION-DOD.md D19/D23/D24,
/tmp/test-matrix-t129.json discoveries BUG-001/015/016.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Chart values.yaml ships `gitea.gitea.config.server.DOMAIN = gitea.catalyst.local`
+ `ROOT_URL = https://gitea.catalyst.local` — the bootstrap dev hostname.
Without per-Sovereign override, Gitea's Web UI rendered the dev
hostname in pageData.appUrl, internal links, and `git clone` URLs.
Operators on every freshly-provisioned Sovereign were shown a
gitea.catalyst.local hostname that public DNS can't resolve.
Slot 10-gitea Kustomization adds the per-Sovereign override:
gitea.gitea.config.server.DOMAIN: gitea.${SOVEREIGN_FQDN}
gitea.gitea.config.server.ROOT_URL: https://gitea.${SOVEREIGN_FQDN}
Caught on t129 2026-05-16 — DoD D25.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Test-plan-author agent walked the live t129 Sovereign UI and discovered
22 bugs across 11 categories. 8 propose new DoD gates beyond D0-D18:
- D19: Apps + Cloud counter consistency (44 vs 36 mismatch; vCluster=0/0,
LoadBalancer=0/0, Bucket=0/0, Volume=0/0; PVC 66 vs 33; doubled
/app/bp-bp-* hrefs on 10/44 cards)
- D20: Jobs page region-prefix visibility + per-region filter
- D21: Operator pre-populated as owner-tier on /users
- D22: Settings shows real values (no "—" / "API PENDING")
- D23: Post-handover lands /dashboard, not /wizard
- D24: Mothership-only views (fleet view, "+ New deployment") absent
from Sovereign Console
- D25: All operator-facing service hostnames reachable + no
`gitea.catalyst.local` dev refs in HTML
- D26: CSP allows fonts (or self-host woff2)
Matrix: /tmp/test-matrix-t129.json — 224 TCs covering every operator
surface, 123 P0 / 52 P1 / 49 P2.
Per founder ruling: the DoD list is the convergence contract that
grows as test-writer / test-executor find more operator-visible bugs.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two notification spammers on the chroot Sovereign Console that produce
noise on every /apps + /app/<name> visit:
D17 — "Deployment id in the URL is malformed":
AppsPage.tsx fires on isDeploymentID(rawDeploymentId)=false. On the
chroot, useResolvedDeploymentId resolves to /api/v1/sovereign/self
which returns the synthesized canonical id `sovereign-<fqdn>` (26
chars, not hex). The notification claims that path-segment is
invalid even though there is no URL segment — the resolution path
is in-process. Suppress on DETECTED_MODE.mode === 'sovereign'.
D18 — "Per-component install monitoring is unavailable":
Fires on state.phase1WatchSkipped. On the chroot, phase1WatchSkipped
is a MOTHERSHIP-only concept (mother's observer pod failed to fetch
the new cluster's kubeconfig). The Sovereign-side catalyst-api runs
IN the cluster it's reporting on — has the in-cluster ServiceAccount
+ bundled sovereignDynamicClient + informer cache watching HelmReleases
natively. Firing this here tells the operator to drop to kubectl when
the data is already on the page. Suppress on chroot.
Caught on t129 (6cddff7ef4432bdc, 2026-05-16) — DoD D17 + D18.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
loadVClusters() queried vcluster.io/v1alpha1 CRs only. Our bootstrap
topology ships loft-sh/vcluster as a plain Helm chart (StatefulSet +
Service, NO CRD installed) so the CR list is always empty on a
converged Sovereign → canvas `vCluster N/N` chip shows `0/0` even
though Pods are Running.
Add a fallback: enumerate Namespaces carrying
`catalyst.openova.io/vcluster-role` label (stamped by
bp-{mgmt,dmz,rtz}-vcluster's namespace template at PR #1526).
Emits one VCluster row per labeled namespace with role = the label
value. Status `healthy` since the namespace exists (operator-visible
Pod state is surfaced elsewhere).
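The fallback reduces to a label filter over namespaces. The sketch below uses plain structs instead of the Kubernetes client types so that it stays self-contained; `namespace`, `vclusterRow`, and `vclustersFromNamespaces` are hypothetical names, while the label key and the `healthy` status come from the description above.

```go
package main

import "fmt"

const vclusterRoleLabel = "catalyst.openova.io/vcluster-role"

// namespace is a minimal stand-in for a Kubernetes Namespace object.
type namespace struct {
	Name   string
	Labels map[string]string
}

// vclusterRow is a hypothetical version of the row the canvas consumes.
type vclusterRow struct {
	Name   string
	Role   string
	Status string
}

// vclustersFromNamespaces emits one row per namespace that carries a
// non-empty role label, using the label value as the role.
func vclustersFromNamespaces(nss []namespace) []vclusterRow {
	rows := []vclusterRow{}
	for _, ns := range nss {
		role, ok := ns.Labels[vclusterRoleLabel]
		if !ok || role == "" {
			continue
		}
		// The namespace exists, so report healthy; Pod state lives elsewhere.
		rows = append(rows, vclusterRow{Name: ns.Name, Role: role, Status: "healthy"})
	}
	return rows
}

func main() {
	nss := []namespace{
		{Name: "kube-system"},
		{Name: "dmz", Labels: map[string]string{vclusterRoleLabel: "dmz"}},
		{Name: "mgmt", Labels: map[string]string{vclusterRoleLabel: "mgmt"}},
	}
	for _, row := range vclustersFromNamespaces(nss) {
		fmt.Printf("%s role=%s status=%s\n", row.Name, row.Role, row.Status)
	}
}
```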
Caught on t129 (6cddff7ef4432bdc, 2026-05-16) — D15.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Founder ruling 2026-05-16: 100% DoD was premature. Real operator-visible
issues remain on t129 (omani.works 3×cpx52, 6cddff7ef4432bdc) after
the D0-D14 + A4 gates passed:
- D15: /cloud canvas shows vCluster 0/0 and LoadBalancer 0/0 despite
vCluster Pods Running + LBs allocated. Canvas adapter not reading
the live cluster state for these kinds.
- D16: /dashboard Layer-1=Cluster grouping renders single Sovereign,
not 3 cluster-grouped bubbles. Multi-region hierarchy collapse
broken at the dashboard level.
- D17: /app/<name> routes (e.g. /app/bp-cnpg) emit "Deployment id in
the URL is malformed (expected 16 lowercase hex characters; got 7)"
— the SPA router treats the app-name segment as a deployment-id.
Every application card click produces a notifications-drawer entry.
- D18: Sovereign-side catalyst-api can't fetch its own kubeconfig to
monitor Phase-1 install state. Operator is told "Use kubectl
directly to check Helm releases" — should be invisible.
DoD list explicitly grows per iteration as test-writer / test-executor
discover more operator-visible issues. The list is the convergence
contract.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tofu's `secondary_region_cluster_mesh_name` local at
infra/hetzner/main.tf:389 generates secondary names as
`<sovereign-stem>-<region-stem-no-digits>` (e.g. `t129-nbg`,
`t129-sin`). The bootstrap-kit slot 01-cilium.yaml renders
cilium-config cluster.name from this value via the
CLUSTER_MESH_NAME envsubst.
The orchestrator's clusterName derivation was wrong: it appended
`-<region-key>` to the primary's name (e.g. `t129-mesh-nbg1-1`),
which matched NEITHER the tofu scheme NOR the cilium-config value.
Caught on t129 (6cddff7ef4432bdc, 2026-05-16): TLS, etcd RBAC,
and connection all working after PRs #1530, #1536, #1538, #1539 —
but agent reported `failed to retrieve cluster configuration:
not found` for every secondary peer because it queried
`cilium/cluster-config/v1/t129-mesh-nbg1-1` against an etcd that
only had `t129-nbg`.
Fix: export `DeriveSecondaryClusterMeshName(req, rs)` that
mirrors tofu's local exactly, plus a `stripTrailingDigits` helper.
Orchestrator's buildRegionSlots uses this for secondaries; primary
keeps the `<stem>-mesh` shape.
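A minimal sketch of the naming rule, assuming a simplified signature: the exported helper takes `(req, rs)`, whereas this illustration passes the sovereign stem and region string directly, and the lowercase function name is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// stripTrailingDigits removes any run of ASCII digits at the end of s.
func stripTrailingDigits(s string) string {
	return strings.TrimRight(s, "0123456789")
}

// deriveSecondaryClusterMeshName builds `<sovereign-stem>-<region-stem-no-digits>`,
// the scheme the commit message attributes to the tofu local.
func deriveSecondaryClusterMeshName(sovereignStem, cloudRegion string) string {
	return sovereignStem + "-" + stripTrailingDigits(cloudRegion)
}

func main() {
	fmt.Println(deriveSecondaryClusterMeshName("t129", "nbg1"))
	fmt.Println(deriveSecondaryClusterMeshName("t129", "sin"))
}
```

Keeping the derivation in one exported function means the orchestrator and any test can assert against the same string that lands in cilium-config.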
Closes D11 incident chain: #1525 → #1528 → #1530 → #1536 → #1538
→ #1539 → this. With this PR landed t129's secondary→primary
connection already works (verified on live cluster — secondary
agents show "ready, 2 nodes, 113 endpoints, 326 identities");
primary→secondary will work on a fresh prov once the name match
is correct from the start.
Refs DoD D11.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The orchestrator was minting a fresh client cert (CN = local cluster
name) for each peer connection. Even with PR #1530's "sign with
peer's CA" fix the TLS handshake succeeded but etcd RBAC rejected:
error="etcdserver: permission denied"
Cilium's clustermesh-apiserver etcd has RBAC with a `remote` user
that has read access on the cilium/* prefix. The chart generates
`kube-system/clustermesh-apiserver-remote-cert` with CN=`remote`.
Canonical `cilium clustermesh connect` CLI copies THIS Secret's
tls.crt/tls.key as the client cert the REMOTE cluster presents —
matches the etcd RBAC user verbatim.
This PR adopts that pattern: snapshotRemoteCert() reads the peer's
existing `clustermesh-apiserver-remote-cert` Secret, returns
tls.crt + tls.key bytes, and the orchestrator writes them into
A's `cilium-clustermesh` Secret instead of minting.
Caught on t129 (6cddff7ef4432bdc, 2026-05-16):
- TLS handshake succeeded after firewall fix (PR #1538) opened
NodePort range so LB→backend health check passed
- cilium-dbg status reported `etcd: 1/1 connected, has-quorum=true`
(TLS path working)
- BUT `remote configuration: expected=true, retrieved=false` and
agent logs spammed `etcdserver: permission denied`
With this PR's CN=remote cert, etcd authorizes the kvstore List
and clustermesh sync completes — agent should flip to
`2/2 remote clusters ready`.
Completes the D11 chain: #1525 (regionKeyFromSpec) → #1528
(clusterName derivation) → #1530 (cert with peer's CA — no longer
needed but kept as defense-in-depth) → #1536 (hostAlias pattern)
→ #1538 (firewall NodePort range) → this.
Refs DoD D11.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #1537's use-private-ip approach was not viable: the per-region
Hetzner LB has no private-network attachment by default (LB private_net
is empty) and our DoD A2 architecture pins one private /24 per region
that does NOT span across regions. The LB->backend hop has to transit
the public path.
The actual blocker is the Sovereign firewall: it permits 80/443/6443/53
and blocks the NodePort range. Hetzner LB TCP health-check probes
`<node-public-ip>:<NodePort>` and gets dropped → all targets marked
unhealthy → external clients see "unexpected eof while reading" at
TLS handshake → cilium clustermesh agent stays `0/N remote clusters
ready, Waiting for initial connection`.
Security: clustermesh-apiserver requires mTLS. Peer agents must present
a client cert signed by the peer cluster's cilium-ca (PR #1530).
Anonymous connections rejected at handshake. mTLS is the security
boundary, NOT the firewall — opening NodePorts is safe here.
Caught on t129 (6cddff7ef4432bdc, 2026-05-16) — completes the D11
incident chain (#1525 → #1528 → #1530 → #1536 → this).
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Hetzner firewall on Sovereign nodes permits public ingress on
80, 443, 6443, 53/tcp+udp, icmp, and wg-udp (51871). The NodePort
range (30000-32767) is BLOCKED — that's the security posture
(privileged Sovereign workloads should not be reachable on
arbitrary NodePorts from the internet).
Hetzner LB TCP health checks probe `<node-ip>:<destination_port>`
where destination_port is the Service NodePort. With public-IP
transit the probe goes through the firewall and gets dropped.
All 3 clustermesh LB targets in t129 reported
`health_status=unhealthy` because of this. With no healthy
targets the LB refuses connections — external clients see
"unexpected eof while reading" at TLS handshake time. Cilium
agent stays `0/N remote clusters ready, Waiting for initial
connection`.
Fix: `load-balancer.hetzner.cloud/use-private-ip: "true"` so the
LB → backend connection transits the per-region private network
(10.0.1.0/24). External clients still connect to the LB's PUBLIC
IP — this annotation only controls the LB→backend hop, which is
in-region anyway (one LB per region, points at that region's CP
node).
Caught on t129 (6cddff7ef4432bdc, 2026-05-16) — completes the
D11 chain that began with PR #1525 (regionKeyFromSpec), continued
through PR #1528 (clusterName derivation), PR #1530 (peer cert
signed by peer's CA), and PR #1536 (hostAlias pattern). With
this PR's traffic-path fix landed, the LB→backend hop should
succeed and the chain becomes end-to-end working.
Refs DoD D11.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>