Commit Graph

2204 Commits

Author SHA1 Message Date
e3mrah
6618392407
fix(chroot): GetDeployment falls back to chrootEnsureDeployment (D22) (#1568)
* fix(chroot): GetDeployment falls back to chrootEnsureDeployment (D22)

GetDeployment was the only handler that returned 404 without calling
chrootEnsureDeployment. After a catalyst-api Pod restart on the chroot
the in-memory store is empty until some other handler (StreamLogs,
jobs list) primes it via its own synth call — meanwhile the Sovereign
Console Settings page loads /api/v1/deployments/<id> first and gets
404, rendering the entire page broken.

Mirror the StreamLogs pattern (lines 1247-1254): try in-memory load,
fall through to chrootEnsureDeployment, return 404 only when both miss.

This unblocks PR #1567's deployment-record population — without the
fallback, GetDeployment can never serve the populated record on chroot.

Caught live on t132 2026-05-16 after #1567 image roll: Settings page
404 because in-memory store was empty.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 00:54:20 +04:00
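The fallback order described in the GetDeployment entry above (in-memory load, then chrootEnsureDeployment, 404 only when both miss) can be sketched roughly as follows. This is a minimal illustration and not the catalyst-api code: `Deployment`, `store`, `ensureFunc` and `getDeployment` are hypothetical names, and the assumption that the fallback also primes the store is inferred from the commit text.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

// Deployment is a hypothetical stand-in for the real deployment record.
type Deployment struct {
	ID string
}

// store is a hypothetical in-memory cache; it starts empty after a Pod restart.
type store struct {
	mu    sync.Mutex
	items map[string]*Deployment
}

func (s *store) load(id string) (*Deployment, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	d, ok := s.items[id]
	return d, ok
}

func (s *store) save(d *Deployment) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.items[d.ID] = d
}

// ensureFunc stands in for chrootEnsureDeployment: it either synthesises
// a record for the id or reports a miss.
type ensureFunc func(id string) (*Deployment, bool)

// getDeployment tries the store first, then the fallback, and returns 404
// only when both miss. Saving the synthesised record is an assumption.
func getDeployment(s *store, ensure ensureFunc, id string) (int, *Deployment) {
	if d, ok := s.load(id); ok {
		return http.StatusOK, d
	}
	if d, ok := ensure(id); ok {
		s.save(d)
		return http.StatusOK, d
	}
	return http.StatusNotFound, nil
}

func main() {
	s := &store{items: map[string]*Deployment{}}
	ensure := func(id string) (*Deployment, bool) {
		if id == "known" {
			return &Deployment{ID: id}, true
		}
		return nil, false
	}
	code, _ := getDeployment(s, ensure, "known")
	fmt.Println("known:", code)
	code, _ = getDeployment(s, ensure, "missing")
	fmt.Println("missing:", code)
}
```

With an empty store, the first lookup of a known id succeeds through the fallback, which is the case the Settings page hit after a restart.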
github-actions[bot]
b094a354b7 deploy: update catalyst images to ed63ecd 2026-05-16 20:31:39 +00:00
e3mrah
ed63ecd09f
fix(chroot): populate deployment Result + Request fields for D22 settings (#1567)
* fix(chroot): populate deployment Result + Request fields for D22

Settings page on Sovereign Console renders `—` for Region / Sovereign /
Created / DeploymentID / Pool subdomain because chroot's GET
/api/v1/deployments/<id> returns empty strings for those fields.

Populate from existing env vars (best-effort — empty when chart hasn't
wired them yet, which is no worse than today's behaviour):
- Result.ConsoleURL = "https://console.<fqdn>" (derived from selfFQDN)
- Result.GitOpsRepoURL from GITOPS_REPO_URL env
- Result.ControlPlaneIP from SOVEREIGN_CONTROL_PLANE_IP env
- Request.Region = regions[0].CloudRegion (top-level legacy field)
- Request.OrgEmail from OPERATOR_EMAIL env
- Request.OrgName from ORG_NAME env

Companion chart PR will wire the env vars from .Values.global.* +
cloud-init substitute placeholders. This PR is BACKWARD-compatible —
unset env vars produce empty strings, same as today.

Caught live on t132 2026-05-16 — `curl /api/v1/deployments/sovereign-
t132.omani.works` returns empty ownerEmail/region/consoleURL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 00:29:44 +04:00
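The best-effort population listed in the entry above can be sketched as below. The env var names and the `https://console.<fqdn>` derivation come from the commit message; the `Record` and `Region` types, the `populate` function and the injected `getenv` parameter are hypothetical and chosen only to keep the example testable.

```go
package main

import (
	"fmt"
	"os"
)

// Region is a hypothetical stand-in for one entry of the regions list.
type Region struct {
	CloudRegion string
}

// Record is a hypothetical flattening of the Result and Request fields.
type Record struct {
	ConsoleURL     string
	GitOpsRepoURL  string
	ControlPlaneIP string
	Region         string
	OrgEmail       string
	OrgName        string
}

// populate fills each field from its source. Unset env vars yield empty
// strings, so the result is never worse than an all-empty record.
func populate(getenv func(string) string, selfFQDN string, regions []Region) Record {
	rec := Record{
		GitOpsRepoURL:  getenv("GITOPS_REPO_URL"),
		ControlPlaneIP: getenv("SOVEREIGN_CONTROL_PLANE_IP"),
		OrgEmail:       getenv("OPERATOR_EMAIL"),
		OrgName:        getenv("ORG_NAME"),
	}
	if selfFQDN != "" {
		rec.ConsoleURL = "https://console." + selfFQDN
	}
	if len(regions) > 0 {
		// Top-level legacy field mirrors the first region.
		rec.Region = regions[0].CloudRegion
	}
	return rec
}

func main() {
	rec := populate(os.Getenv, "example.test", []Region{{CloudRegion: "hel1"}})
	fmt.Printf("%+v\n", rec)
}
```

Passing `getenv` as a parameter is a design choice of this sketch only; it lets a test supply a map-backed lookup instead of mutating the process environment.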
github-actions[bot]
d82e06bfe9 deploy: update catalyst images to 0a45fb0 2026-05-16 20:03:41 +00:00
e3mrah
0a45fb0449
fix(chart): regionsJson uses toJson to defeat YAML flow-seq re-parse (D5) (#1566)
* fix(chart): regionsJson uses toJson to defeat YAML flow-seq re-parse (D5)

PR #1551 single-quoted SOVEREIGN_REGIONS_JSON in the slot file
substitute, but Flux Kustomize's postBuild can still re-parse the
JSON-shaped string as a YAML flow-sequence depending on quoting context.
When that happens .Values.sovereign.regionsJson is a Go []interface{}
of map[interface{}]interface{} and `| quote` prints Go's
`[map[cloudRegion:hel1 ...]]` syntax — catalyst-api's json.Unmarshal of
the env var then fails and Request.Regions is empty.

toJson normalises both string and list inputs to valid JSON.

Caught live on t132 2026-05-16 chart 1.4.147: env var rendered as
`[map[cloudRegion:hel1 ...]]` despite #1551 being in effect.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 00:01:43 +04:00
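The failure mode in the regionsJson entry above can be reproduced in plain Go, outside Helm. The sketch below only illustrates why a Go-formatted list is not JSON while a marshalled one is; it is not the chart template. The `Region` type, the helper names and the use of `map[string]interface{}` (instead of the `map[interface{}]interface{}` named in the commit, which `encoding/json` cannot marshal) are assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Region is a hypothetical shape for one parsed region entry.
type Region struct {
	CloudRegion string `json:"cloudRegion"`
}

// renderFormatted mimics printing a re-parsed list with Go's default
// formatting, which yields Go syntax rather than JSON.
func renderFormatted(v interface{}) string {
	return fmt.Sprintf("%v", v)
}

// renderJSON serialises the same value as JSON.
func renderJSON(v interface{}) (string, error) {
	b, err := json.Marshal(v)
	return string(b), err
}

// parseRegions is what a consumer of the env var would do.
func parseRegions(env string) ([]Region, error) {
	var out []Region
	err := json.Unmarshal([]byte(env), &out)
	return out, err
}

func main() {
	// The value as it looks after a YAML flow-sequence re-parse.
	reparsed := []interface{}{map[string]interface{}{"cloudRegion": "hel1"}}

	bad := renderFormatted(reparsed)
	_, err := parseRegions(bad)
	fmt.Println(bad, "parse error:", err != nil)

	good, _ := renderJSON(reparsed)
	regions, err := parseRegions(good)
	fmt.Println(good, regions, err)
}
```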
e3mrah
3f8e2b925e
chore(slot-13): pin bp-catalyst-platform to 1.4.147 (D21+D31 baked) (#1565)
* chore(slot-13): pin bp-catalyst-platform to 1.4.147 (D21+D31 baked)

PR #1562 (D31 wordpress-tenant activeHotStandby) + PR #1564 (D21 owner
UserAccess auto-seed at handover, catalyst-api:8d2a947) both packaged
into chart 1.4.147. Pin slot so t133+ gets both gates on first provision.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:58:46 +04:00
github-actions[bot]
f8c8a87151 deploy: update catalyst images to 8d2a947 2026-05-16 19:51:40 +00:00
e3mrah
8d2a947cfb
feat(handover): auto-seed owner UserAccess CR on chroot (D21) (#1564)
Closes the D21 gap on Sovereign DoD: /users page returned empty after
fresh handover because Keycloak `sovereign-admins` membership was
established but no UserAccess CR existed for the operator.

After `keycloak.EnsureUser` succeeds in `AuthHandover`, the helper
`EnsureOwnerUserAccess` upserts a cluster-scoped UserAccess CR shaped
like the canonical user_access.go `CreateUserAccess` write:

  apiVersion: access.openova.io/v1alpha1
  kind: UserAccess
  metadata:
    name: useraccess-owner-<sanitized-email>
    annotations:
      catalyst.openova.io/user-email: <email>   # rbac_matrix:309 hint
  spec:
    user:
      keycloakSubject: <email>
    sovereignRef: <fqdn-first-label>
    applications:
      - app: "*"
        role: admin                              # owner -> admin

The Composition (issue #322) reconciles the Claim into per-app
RoleBindings on the Sovereign so the operator surfaces in /users.

Best-effort + idempotent: AlreadyExists on the second handover is
folded to nil; any other error is logged at Warn and the handover
itself never fails. If the access.openova.io CRD has not rolled yet,
the next handover retries automatically.

Architect-first: mirrors `userAccessToUnstructured` shape and uses
existing `sovereignDynamicClient` + `rbacAssignSlug` seams. Tier
mapping follows the documented lossy `owner -> admin` rule in
`userAccessTierToRole` (CRD only accepts admin|editor|viewer).

Refs: docs/SOVEREIGN-MULTI-REGION-DOD.md D21

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-05-16 23:49:32 +04:00
e3mrah
5510ab91f9
chore(slot-13): pin bp-catalyst-platform to 1.4.146 (D29 billing JWT bypass) (#1563)
PR #1561 added billing-service JWT exemptions matching gateway public
routes (D29 voucher-redeem zero-touch). Pin slot so future provisions
inherit the full D29 unblocker chain.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:41:34 +04:00
github-actions[bot]
d6b6aca581 deploy: update sme service images to c04b2ec + bump chart to 1.4.147 2026-05-16 19:41:18 +00:00
e3mrah
c04b2ec76d
feat(wordpress-tenant): activeHotStandby option wires bp-cnpg-pair (D31) (#1562)
Sovereign DoD D31 — tenants subscribing to an HA-capable marketplace app
may opt into a cross-region active-hot-standby Postgres pair for their
WordPress instance instead of the default single CNPG Cluster.

Mirrors the canonical bp-cnpg-pair pattern (primary + replica Cluster
CRs with WAL streaming over Cilium ClusterMesh via a managed Service
annotated service.cilium.io/global=true). When the new
pg.activeHotStandby.enabled flag is false (default), templates render
the existing single Cluster bit-for-bit — no regression for non-HA
tenants.

Catalog seed flags WordPress with ha + cnpg-pair tags so the marketplace
HA filter can surface it.

Chart bumped 0.2.1 -> 0.3.0. New render-gate test asserts both default
single-cluster shape AND the enabled 2-Cluster shape with the right
nodeSelectors, replica.source, externalCluster.host, Cilium global
annotation, and bootstrap.pg_basebackup; all 5 cases pass.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:39:29 +04:00
github-actions[bot]
af4d9b1b87 deploy: update sme service images to f9ed292 + bump chart to 1.4.146 2026-05-16 19:29:50 +00:00
e3mrah
f9ed292198
fix(billing): /redeem-preview + plans + addons bypass JWT (D29) (#1561)
* fix(billing): /redeem-preview + plans + addons bypass JWT (D29)

Mirror PR #1559's gateway public routes in the billing service's own
middleware chain. The gateway now lets these requests through without
an Authorization header (D29 voucher-redeem landing), but billing
service's main.go was JWT-gating EVERY /billing/* path except
/billing/webhook — so the request still got 401, just one hop later.

Caught live on t132 2026-05-16 after PR #1559 rolled.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:28:48 +04:00
e3mrah
936e76f79a
chore(slot-13): pin bp-catalyst-platform to 1.4.145 (D29 gateway public routes) (#1560)
PR #1559 added /api/billing/{vouchers/redeem-preview,plans,addons} as
public gateway routes — required for the marketplace /redeem zero-touch
flow. Pin the slot so future provisions inherit it.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:25:14 +04:00
github-actions[bot]
696aa26f83 deploy: update sme service images to a11067d + bump chart to 1.4.145 2026-05-16 19:18:09 +00:00
e3mrah
a11067da1a
fix(gateway): /redeem-preview + plans + addons must be public (D29) (#1559)
* feat(billing+notification): wire voucher-issued email (D28)

D28 of the Sovereign DoD requires that issuing a voucher emails it to
the recipient zero-touch. Today POST /billing/vouchers/issue persists
the PromoCode row but never notifies anyone — so a gifted voucher only
reaches its recipient if the operator manually sends the code over a
side channel. This wires sme-billing -> sme-notification so the email
fires automatically on every successful upsert that carries a
recipient_email field.

Architecture follows the existing notification-service seam:
sme-billing POSTs to http://notification.sme.svc.cluster.local:8087/
notification/send with template=voucher-issued; sme-notification renders
the HTML and dispatches via Stalwart over SMTP. No direct SMTP code is
added to billing, no stalwart-mail calls bypass notification.

Server-side only — the owner-UI for issuing vouchers (D28b) is a
separate PR.

Changes:

  notification/templates/templates.go
    + VoucherIssuedEmail(code, creditOMR, description, sovereignFQDN,
      validityHint) — renders code prominently, redeem button to
      https://marketplace.<sovereignFQDN>/redeem/?code=<CODE>; FQDN
      always supplied by caller, NEVER hardcoded.

  notification/handlers/handlers.go
    + renderTemplate("voucher-issued") case parsing
      {code, credit_omr, description, sovereign_fqdn, validity_hint}.
    + Default subject "You've been gifted a voucher for OpenOva SME".

  billing/handlers/handlers.go
    + Handler fields: NotificationURL, SovereignFQDN, NotificationClient.

  billing/handlers/vouchers.go
    + issueVoucherRequest = store.PromoCode + RecipientEmail (request-
      only; never persisted).
    + sendVoucherIssuedEmail() — POSTs to NotificationURL with a 5s
      timeout. Best-effort: a non-2xx or transport error logs but does
      NOT fail the IssueVoucher response, because the row is already
      persisted and re-issuing the same code re-fires the email.
    + Re-issue semantics (#91 resurrects soft-deleted rows) extend to
      the email path — documented in the handler comment.

  billing/main.go
    + Reads NOTIFICATION_SERVICE_URL (default
      http://notification.sme.svc.cluster.local:8087/notification/send)
      and SOVEREIGN_FQDN env vars. Wires a 5s default http.Client.

  products/catalyst/chart/templates/sme-services/billing.yaml
    + Pipes NOTIFICATION_SERVICE_URL (cluster-DNS constant) and
      SOVEREIGN_FQDN (from .Values.global.sovereignFQDN, NEVER
      hardcoded) into the billing Deployment.

Tests:

  notification/handlers/handlers_test.go (new)
    + TestRenderTemplate_VoucherIssued: rendered HTML contains code +
      credit + a redeem URL built from the supplied FQDN; never falls
      back to marketplace.openova.io.
    + TestRenderTemplate_VoucherIssued_CustomSubject + _NoDescription
      + TestRenderTemplate_UnknownTemplate as guard rails.

  billing/handlers/vouchers_test.go
    + TestIssueVoucher_SendsEmail_WhenRecipientPresent: a fake round-
      tripper sees the POST to notification with the right URL +
      template + data (code upper-cased, credit_omr, sovereign_fqdn,
      description) when recipient_email is set.
    + TestIssueVoucher_NoEmail_WhenRecipientAbsent: no notification
      call when recipient is empty.
    + TestIssueVoucher_NotificationFailure_DoesNotFailUpsert:
      operator gets 200 even when notification returns 500.
    + TestIssueVoucher_403WithoutVoucherIssuerRole: role gate preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(gateway): /redeem-preview + plans + addons must be public (D29)

The marketplace /redeem?code=XXX landing page calls
/api/billing/vouchers/redeem-preview unauthenticated per docs/FRANCHISE-
MODEL.md §3, but the gateway's catch-all /api/billing/ entry was
returning 401 to it — breaking the entire voucher-redeem zero-touch
flow that D29 depends on.

Also expose /api/billing/plans and /api/billing/addons so the
marketplace landing can render pricing without a session.

Caught live on t132 2026-05-16 — every /redeem call returned 401.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:17:04 +04:00
e3mrah
27bd5d486d
chore(slot-13): pin bp-catalyst-platform to 1.4.144 (#1558)
* chore(slot-13): pin bp-catalyst-platform to 1.4.144

PR #1556 (D28 voucher email wire) + PR #1557 (D27 admin tag override)
landed and Blueprint Release packaged 1.4.144. Pin the slot file so
future provisions get the latest chart by default — t132 manually
upgraded via kubectl patch but t133+ will inherit it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:12:34 +04:00
github-actions[bot]
48eb653f79 deploy: update sme service images to 1fe7067 + bump chart to 1.4.144 2026-05-16 19:05:51 +00:00
e3mrah
7c3724591c
fix(chart): admin pod uses dedicated image tag (D27 SME stack) (#1557)
* fix(chart): admin pod uses dedicated image tag (D27 SME stack)

t132 caught admin pod stuck in ImagePullBackOff on `admin:b0ed216` —
the SME services CI run for that mono-repo SHA published 10 services
but admin's image was missing from GHCR. Decouple admin's tag from
smeTag so a missing-build for one service doesn't wedge the SME stack.

Default to `3c2f7e4` (matches marketplaceApi + console, known-published).
When admin's UI changes, bump in lockstep with those.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:05:09 +04:00
e3mrah
1fe706769f
feat(billing+notification): wire voucher-issued email (D28) (#1556)
D28 of the Sovereign DoD requires that issuing a voucher emails it to
the recipient zero-touch. Today POST /billing/vouchers/issue persists
the PromoCode row but never notifies anyone — so a gifted voucher only
reaches its recipient if the operator manually sends the code over a
side channel. This wires sme-billing -> sme-notification so the email
fires automatically on every successful upsert that carries a
recipient_email field.

Architecture follows the existing notification-service seam:
sme-billing POSTs to http://notification.sme.svc.cluster.local:8087/
notification/send with template=voucher-issued; sme-notification renders
the HTML and dispatches via Stalwart over SMTP. No direct SMTP code is
added to billing, no stalwart-mail calls bypass notification.

Server-side only — the owner-UI for issuing vouchers (D28b) is a
separate PR.

Changes:

  notification/templates/templates.go
    + VoucherIssuedEmail(code, creditOMR, description, sovereignFQDN,
      validityHint) — renders code prominently, redeem button to
      https://marketplace.<sovereignFQDN>/redeem/?code=<CODE>; FQDN
      always supplied by caller, NEVER hardcoded.

  notification/handlers/handlers.go
    + renderTemplate("voucher-issued") case parsing
      {code, credit_omr, description, sovereign_fqdn, validity_hint}.
    + Default subject "You've been gifted a voucher for OpenOva SME".

  billing/handlers/handlers.go
    + Handler fields: NotificationURL, SovereignFQDN, NotificationClient.

  billing/handlers/vouchers.go
    + issueVoucherRequest = store.PromoCode + RecipientEmail (request-
      only; never persisted).
    + sendVoucherIssuedEmail() — POSTs to NotificationURL with a 5s
      timeout. Best-effort: a non-2xx or transport error logs but does
      NOT fail the IssueVoucher response, because the row is already
      persisted and re-issuing the same code re-fires the email.
    + Re-issue semantics (#91 resurrects soft-deleted rows) extend to
      the email path — documented in the handler comment.

  billing/main.go
    + Reads NOTIFICATION_SERVICE_URL (default
      http://notification.sme.svc.cluster.local:8087/notification/send)
      and SOVEREIGN_FQDN env vars. Wires a 5s default http.Client.

  products/catalyst/chart/templates/sme-services/billing.yaml
    + Pipes NOTIFICATION_SERVICE_URL (cluster-DNS constant) and
      SOVEREIGN_FQDN (from .Values.global.sovereignFQDN, NEVER
      hardcoded) into the billing Deployment.

Tests:

  notification/handlers/handlers_test.go (new)
    + TestRenderTemplate_VoucherIssued: rendered HTML contains code +
      credit + a redeem URL built from the supplied FQDN; never falls
      back to marketplace.openova.io.
    + TestRenderTemplate_VoucherIssued_CustomSubject + _NoDescription
      + TestRenderTemplate_UnknownTemplate as guard rails.

  billing/handlers/vouchers_test.go
    + TestIssueVoucher_SendsEmail_WhenRecipientPresent: a fake round-
      tripper sees the POST to notification with the right URL +
      template + data (code upper-cased, credit_omr, sovereign_fqdn,
      description) when recipient_email is set.
    + TestIssueVoucher_NoEmail_WhenRecipientAbsent: no notification
      call when recipient is empty.
    + TestIssueVoucher_NotificationFailure_DoesNotFailUpsert:
      operator gets 200 even when notification returns 500.
    + TestIssueVoucher_403WithoutVoucherIssuerRole: role gate preserved.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:04:46 +04:00
github-actions[bot]
9718ba2924 deploy: update catalyst images to 2fd4e3c 2026-05-16 18:26:16 +00:00
e3mrah
2fd4e3cbf4
feat(wizard): default marketplaceEnabled=true for D27 zero-touch (#1555)
Founder ruling 2026-05-16: D27 mandates that a fresh wizard provisions a
Sovereign already ready to host tenant orgs (D29). Operator can still
flip the toggle off on StepMarketplace if they explicitly want a
private Sovereign.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:24:09 +04:00
e3mrah
77c80c9728
docs(DoD): add D27-D31 (marketplace + voucher + tenant org + free subdomain + CNPG active-hot-standby) (#1554)
Founder ruling 2026-05-16: tenant onboarding flow is part of the Sovereign DoD.

D27 — Marketplace enabled on the Sovereign (zero-touch from provision body)
D28 — Owner-tier voucher issuance (one-click, voucher mailed via Sovereign SMTP)
D29 — Voucher-redeem → org wizard → tenant namespace+RBAC+bootstrap (zero-touch)
D30 — Free-subdomain pool selection (omani.homes, omani.rest, omani.trades)
D31 — Tenant app with CNPG active-hot-standby cross-region replication

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:18:55 +04:00
github-actions[bot]
564fe4f4e5 deploy: update catalyst images to 9f096b0 2026-05-16 18:01:02 +00:00
e3mrah
9f096b0b18
fix(chroot): populate Result.LoadBalancerIP so canvas shows LB chip (D15) (#1553)
chrootEnsureDeployment was synthesizing a Deployment with Result=nil.
The topology loader's buildLBs() returned [] on nil-Result → canvas
chip showed `LoadBalancer 0/0` on every chroot Sovereign Console
even though the Sovereign ingress LB was allocated and serving
console.<fqdn>.

Populate Result with LoadBalancerIP from `SOVEREIGN_LB_IP` env (set
by bp-catalyst-platform's sovereign-fqdn ConfigMap `lbIP` key per
issue #900 / PR #145). buildLBs then emits one LoadBalancer entry
per region using the canonical primary LB.
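
A rough sketch of the enrichment described above, using plain structs instead of the real Deployment types. The type names, `resultFromEnv`, the injected `getenv` function and the decision to return nil when the variable is empty are assumptions for illustration; only the `SOVEREIGN_LB_IP` variable, the nil-Result behaviour and the "one entry per region" rule come from the message.

```go
package main

import "fmt"

// result and loadBalancer are simplified stand-ins for the real types.
type result struct{ LoadBalancerIP string }
type loadBalancer struct{ Region, IP string }

// resultFromEnv builds a result from SOVEREIGN_LB_IP, or nil when unset.
func resultFromEnv(getenv func(string) string) *result {
	ip := getenv("SOVEREIGN_LB_IP")
	if ip == "" {
		return nil
	}
	return &result{LoadBalancerIP: ip}
}

// buildLBs emits one entry per region using the primary LB address, and
// nothing at all when no result is available (the old 0/0 case).
func buildLBs(res *result, regions []string) []loadBalancer {
	if res == nil || res.LoadBalancerIP == "" {
		return nil
	}
	lbs := make([]loadBalancer, 0, len(regions))
	for _, r := range regions {
		lbs = append(lbs, loadBalancer{Region: r, IP: res.LoadBalancerIP})
	}
	return lbs
}

func main() {
	env := map[string]string{"SOVEREIGN_LB_IP": "203.0.113.10"}
	getenv := func(k string) string { return env[k] }
	fmt.Println(len(buildLBs(nil, []string{"hel1"})))
	fmt.Println(buildLBs(resultFromEnv(getenv), []string{"hel1", "nbg1"}))
}
```
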

Caught on t131 2026-05-16 — DoD D15. Same chroot-synth-enrichment
pattern as PR #1534 (SOVEREIGN_REGIONS_JSON).

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:58:53 +04:00
github-actions[bot]
dd9b631740 deploy: update catalyst images to 124ac13 2026-05-16 17:58:31 +00:00
e3mrah
124ac13c1d
fix(router): chroot Sovereign /app/<name> resolves to AppDetail, not mothership AppsPage (D17b) (#1552)
Two route trees claim `/app`:

1. `appRoute` (line 364) — mothership AppLayout chrome, prefix `/app`,
   children `/app/$deploymentId/applications/*`, `/app/$deploymentId/
   settings`, `/app/dashboard` (fleet view), etc. ~30 children.
2. `consoleAppDetailRoute` (line 1141, under consoleLayoutRoute) —
   clean `/app/$componentId` for the chroot Sovereign Console's
   per-app detail.

On a chroot Sovereign Console (DETECTED_MODE.mode === 'sovereign')
the operator clicks `/apps/<card>` → AppCard generates HREF
`/app/<name>` (AppsPage.tsx line ~720, correct for chroot context).
TanStack router resolves to the MOTHERSHIP `appRoute` because it
matches first (registered earlier under rootRoute) and its
children accept `<name>` as $deploymentId. The page renders
AppLayout chrome + AppsPage with mothership sidebar — looks
nothing like AppDetail.

Founder observation (BUG-002 from /tmp/test-matrix-t129.json + reported
on t131 2026-05-16):
> Application individual pages are not visible at all in the child
> while mothership doesn't have that issue, this is the biggest blunder!

Fix: `appRoute.beforeLoad` redirects on chroot:
- `/app/<componentId>` → `/<componentId>` (caught by consoleAppDetailRoute)
- `/app/dashboard`, `/app/install`, `/app/sre/*`, `/app/sec/*`, `/app/blueprints`
  → `/dashboard` (canonical Sovereign landing; these are mothership-only
  surfaces — already partially fixed at dashboardRoute level by PR #1547)

Mothership behavior unchanged (DETECTED_MODE.mode !== 'sovereign'
falls through to the existing AppLayout-rooted tree).

Refs DoD D17b. Caught on t131 (623354058b114dd6, 2026-05-16).

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:56:31 +04:00
e3mrah
f88e60726c
fix(slot-13): single-quote SOVEREIGN_REGIONS_JSON to preserve JSON literal (D5) (#1551)
The substitute `${SOVEREIGN_REGIONS_JSON:-}` produces valid JSON like
`[{"cloudRegion":"hel1","controlPlaneSize":"cpx52",...}]`. Unquoted in
the slot-13 YAML, the YAML parser interprets it as a flow-sequence
of flow-mappings, parsing into Go `[]map[string]interface{}`. Helm
chart template `{{ .Values.sovereign.regionsJson }}` then stringifies
via `%v` printf, producing Go map syntax:

  [map[cloudRegion:hel1 controlPlaneSize:cpx52 ...]]

The chroot catalyst-api's `chrootRegionsFromEnv` calls
json.Unmarshal which fails → Request.Regions stays empty → topology
loader falls back to live-Nodes path → /cloud renders "1 region 1
cluster" on every multi-region Sovereign.

Caught on t131 (623354058b114dd6, 2026-05-16) — DoD D5.

Fix: single-quote the substitute so YAML treats it as a string literal,
preserving the JSON byte-for-byte.
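
The failure mode can be reproduced with the standard library alone. The sketch below assumes the parsed value ends up formatted with Go's default `%v` verb, as the message describes; `stringifyLikeTemplate` and `parsesAsRegions` are hypothetical helper names, not functions from the chart or from catalyst-api.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// stringifyLikeTemplate parses raw into generic Go values (as a YAML parser
// would for an unquoted flow sequence) and formats the result with %v.
func stringifyLikeTemplate(raw string) (string, error) {
	var parsed interface{}
	if err := json.Unmarshal([]byte(raw), &parsed); err != nil {
		return "", err
	}
	return fmt.Sprintf("%v", parsed), nil
}

// parsesAsRegions reports whether s can still be decoded as a JSON array
// of objects.
func parsesAsRegions(s string) bool {
	var regions []map[string]interface{}
	return json.Unmarshal([]byte(s), &regions) == nil
}

func main() {
	raw := `[{"cloudRegion":"hel1","controlPlaneSize":"cpx52"}]`
	mangled, err := stringifyLikeTemplate(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(mangled)                  // [map[cloudRegion:hel1 controlPlaneSize:cpx52]]
	fmt.Println(parsesAsRegions(raw))     // true
	fmt.Println(parsesAsRegions(mangled)) // false
}
```

Once the value is a string literal in the YAML, it never passes through the generic-map representation, so the bytes that reach `json.Unmarshal` are the original JSON.
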

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:33:23 +04:00
e3mrah
7e87a4d7b9
fix(clustermesh-lb): revert use-private-ip to false (D11) (#1550)
PR #1537 set `use-private-ip: "true"` on the clustermesh-apiserver
Service annotations. CCM rejected with:

  ReconcileHCLBTargets: use private ip: missing network id

The per-region Hetzner LB allocated by CCM has no private-network
attachment by default (LB private_net is empty), so it can't route
to the backend's private IP. Result: LB never allocated, clustermesh
apiserver Service stays `<pending>`, orchestrator waits 5min and
bails with empty peerEntries. Caught on t130 (30463cd0a5a931be,
2026-05-16).

PR #1538's canonical fix opens TCP 30000-32767 in the Hetzner
firewall so the public-IP LB→backend health checks pass. Revert
use-private-ip to false so the chain works end-to-end.

Refs DoD D11.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:01:20 +04:00
github-actions[bot]
8980b727fb deploy: update catalyst images to fbe23da 2026-05-16 16:34:04 +00:00
e3mrah
fbe23da091
fix(ui-nginx): allow Google Fonts domains in CSP (D26) (#1549)
Sovereign Console pages reference Inter + JetBrains Mono fonts via
fonts.googleapis.com (index.html lines 9, 11). The nginx CSP only
allowed font-src 'self' data: — so the browser blocked the font
stylesheet AND the woff2 fetches, falling back to system fonts.

Add fonts.googleapis.com to style-src (for the @import CSS) and
fonts.gstatic.com to font-src (for the woff2 assets). All 3 CSP
occurrences in nginx.conf updated identically.

Alternative considered: self-host the woff2 + drop the external
references. Skipped for now — sticking with Google Fonts CDN is
faster + matches every other web app's posture. If the operator
wants air-gap-compatible Sovereigns later, switch to self-hosted.

Caught on t129 2026-05-16 — DoD D26.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 20:31:51 +04:00
github-actions[bot]
27556577f7 deploy: update catalyst images to 7845a00 2026-05-16 16:30:19 +00:00
e3mrah
7845a00799
fix(dashboard): add region + vcluster as TreemapDimensions (D16) (#1548)
Multi-region operators on the Sovereign Console couldn't pivot the
/dashboard treemap by region or vCluster. The TreemapDimension
union (FE) and dashboardDimension set (BE) only included
sovereign/cluster/family/namespace/application.

This PR:
- Adds 'region' + 'vcluster' to TreemapDimension type
  (products/catalyst/bootstrap/ui/src/lib/treemap.types.ts)
- Adds them to the dimension select options
  (products/catalyst/bootstrap/ui/src/components/TreemapLayerController.tsx)
- Adds them to the validated set in dashboard.go
- Adds podRow.region + podRow.vcluster fields populated from
  openova.io/region and catalyst.openova.io/vcluster-role labels
- Extends dimensionKey switch to bucket by these new dimensions
  (fallback: region→cluster, vcluster→"host")
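
The fallback rule in the last bullet could look roughly like this. It is a simplified sketch: `podRow` here carries only a few of the fields, and `bucketKey` is a hypothetical stand-in for the real `dimensionKey` switch in dashboard.go.

```go
package main

import "fmt"

// podRow is a trimmed-down stand-in for the backend row type.
type podRow struct {
	Cluster   string
	Namespace string
	Region    string // from the openova.io/region label, may be empty
	VCluster  string // from the catalyst.openova.io/vcluster-role label, may be empty
}

// bucketKey returns the treemap bucket for a pod under the given dimension.
// Missing labels fall back as described: region to cluster, vcluster to "host".
func bucketKey(p podRow, dimension string) string {
	switch dimension {
	case "region":
		if p.Region != "" {
			return p.Region
		}
		return p.Cluster
	case "vcluster":
		if p.VCluster != "" {
			return p.VCluster
		}
		return "host"
	case "cluster":
		return p.Cluster
	case "namespace":
		return p.Namespace
	default:
		return ""
	}
}

func main() {
	labeled := podRow{Cluster: "t129-nbg", Region: "nbg1", VCluster: "dmz"}
	bare := podRow{Cluster: "t129-mesh"}
	fmt.Println(bucketKey(labeled, "region"), bucketKey(labeled, "vcluster"))
	fmt.Println(bucketKey(bare, "region"), bucketKey(bare, "vcluster"))
}
```
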

Caught on t129 2026-05-16 — DoD D16. Note that full multi-cluster
fan-out (aggregating pods across all 3 region kubeconfigs into one
treemap) is a separate refactor not included here; this PR delivers
the dimension surface so the layer selector is usable + a fresh prov
with the chroot's k8scache extended to multi-region will render
3 cluster bubbles when the operator picks Layer-1=cluster.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 20:24:34 +04:00
github-actions[bot]
477bd0966f deploy: update catalyst images to 52015ff 2026-05-16 16:15:32 +00:00
e3mrah
52015ff468
fix(ui): t129 SPA routing — bp-bp- prefix, PIN /wizard leak, /app/dashboard fleet leak (#1547)
Three operator-visible SPA routing bugs caught on live t129 Sovereign
Console (t129.omani.works, 2026-05-16). Closes #1546.

BUG-001 (D19) — doubled /app/bp-bp-* href on 10 of 44 app cards.
  build-catalog.mjs::listBootstrapKit extracted slug from `NN-(.+)\.yaml`
  without stripping an optional `bp-` already present in some filenames
  (e.g. `13-bp-catalyst-platform.yaml`). The captured slug became
  `bp-catalyst-platform`, then `id: \`bp-${slug}\`` doubled it to
  `bp-bp-catalyst-platform`, breaking the FE↔BE HR-name join and
  printing the doubled prefix on the AppsPage card href. Fix: strip a
  leading `bp-` from the captured slug before forming the canonical id.
  Regenerated catalog.generated.ts + blueprints.json — 10 entries
  collapse to their single-prefix canonical form (bp-catalyst-platform,
  bp-cert-manager-powerdns-webhook, bp-k8s-ws-proxy, bp-guacamole,
  bp-dmz-vcluster, bp-hcloud-ccm, bp-openova-flow-server,
  bp-openova-flow-emitter, bp-mgmt-vcluster, bp-rtz-vcluster).

BUG-015 (D23, extends D0) — PIN-verify lands /wizard on Sovereign.
  VerifyPinPage default landing was `/wizard` regardless of operating
  mode. On a chroot Sovereign Console (DETECTED_MODE.mode === 'sovereign')
  the operator has just been auto-redirected from the mothership
  handover URL; their Sovereign is already converged. Routing them to
  the new-prov wizard re-prompts for org details and contradicts D0.
  Fix: branch on DETECTED_MODE.mode — `/dashboard` on sovereign,
  `/wizard` on catalyst-zero. Mothership flow unchanged. Test:
  VerifyPinPage.test.tsx asserts the 3 cases (sovereign default,
  catalyst-zero default, explicit next= override).

BUG-016 (D24) — /app/dashboard exposes mothership fleet view.
  appRoute's `/dashboard` child mounts DashboardPage (multi-Sovereign
  fleet, "7 Sovereigns" with duplicate rows). On a Sovereign Console
  this surface MUST NOT be reachable — the Sovereign owns ONE deployment,
  fleet is mothership-only. Fix: beforeLoad on dashboardRoute redirects
  to `/dashboard` (consoleDashboardRoute, the per-Sovereign landing)
  when DETECTED_MODE.mode === 'sovereign'. Mothership keeps the fleet
  view as today.
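
For BUG-001, the slug normalisation can be sketched as below. The real fix lives in build-catalog.mjs (JavaScript); this is a Go transliteration for illustration only, and `canonicalID` and `slotFile` are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// slotFile captures the slug from bootstrap-kit filenames of the form
// NN-<slug>.yaml.
var slotFile = regexp.MustCompile(`^\d+-(.+)\.yaml$`)

// canonicalID strips an optional leading "bp-" from the captured slug
// before adding the canonical prefix, so the prefix is never doubled.
func canonicalID(filename string) (string, bool) {
	m := slotFile.FindStringSubmatch(filename)
	if m == nil {
		return "", false
	}
	slug := strings.TrimPrefix(m[1], "bp-")
	return "bp-" + slug, true
}

func main() {
	for _, f := range []string{"13-bp-catalyst-platform.yaml", "01-cilium.yaml"} {
		id, ok := canonicalID(f)
		fmt.Println(f, "->", id, ok)
	}
}
```
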

Refs: docs/SOVEREIGN-MULTI-REGION-DOD.md D19/D23/D24,
      /tmp/test-matrix-t129.json discoveries BUG-001/015/016.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 20:13:26 +04:00
e3mrah
d7b2c017f1
fix(gitea): override DOMAIN/ROOT_URL with SOVEREIGN_FQDN (D25) (#1545)
Chart values.yaml ships `gitea.gitea.config.server.DOMAIN = gitea.catalyst.local`
+ `ROOT_URL = https://gitea.catalyst.local` — the bootstrap dev hostname.
Without per-Sovereign override, Gitea's Web UI rendered the dev
hostname in pageData.appUrl, internal links, and `git clone` URLs.
Operators on every freshly-provisioned Sovereign were shown a
gitea.catalyst.local hostname that public DNS can't resolve.

Slot 10-gitea Kustomization adds the per-Sovereign override:
  gitea.gitea.config.server.DOMAIN: gitea.${SOVEREIGN_FQDN}
  gitea.gitea.config.server.ROOT_URL: https://gitea.${SOVEREIGN_FQDN}

Caught on t129 2026-05-16 — DoD D25.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 19:57:43 +04:00
e3mrah
9e47fd122a
docs(DoD): add D19-D26 — counters/jobs/users/settings/wizard/mothership-leak/hostnames/CSP (#1544)
Test-plan-author agent walked the live t129 Sovereign UI and discovered
22 bugs across 11 categories. 8 propose new DoD gates beyond D0-D18:

- D19: Apps + Cloud counter consistency (44 vs 36 mismatch; vCluster=0/0,
  LoadBalancer=0/0, Bucket=0/0, Volume=0/0; PVC 66 vs 33; doubled
  /app/bp-bp-* hrefs on 10/44 cards)
- D20: Jobs page region-prefix visibility + per-region filter
- D21: Operator pre-populated as owner-tier on /users
- D22: Settings shows real values (no "—" / "API PENDING")
- D23: Post-handover lands /dashboard, not /wizard
- D24: Mothership-only views (fleet view, "+ New deployment") absent
  from Sovereign Console
- D25: All operator-facing service hostnames reachable + no
  `gitea.catalyst.local` dev refs in HTML
- D26: CSP allows fonts (or self-host woff2)

Matrix: /tmp/test-matrix-t129.json — 224 TCs covering every operator
surface, 123 P0 / 52 P1 / 49 P2.

Per founder ruling: the DoD list is the convergence contract that
grows as test-writer / test-executor find more operator-visible bugs.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 19:52:53 +04:00
github-actions[bot]
1405275af9 deploy: update catalyst images to 2b3888e 2026-05-16 15:48:21 +00:00
e3mrah
2b3888eed5
fix(ui): suppress chroot-side false-positive notifications (D17, D18) (#1543)
Two notification spammers on the chroot Sovereign Console that produce
noise on every /apps + /app/<name> visit:

D17 — "Deployment id in the URL is malformed":
  AppsPage.tsx fires on isDeploymentID(rawDeploymentId)=false. On the
  chroot, useResolvedDeploymentId resolves to /api/v1/sovereign/self
  which returns the synthesized canonical id `sovereign-<fqdn>` (26
  chars, not hex). The notification claims that path-segment is
  invalid even though there is no URL segment — the resolution path
  is in-process. Suppress on DETECTED_MODE.mode === 'sovereign'.

D18 — "Per-component install monitoring is unavailable":
  Fires on state.phase1WatchSkipped. On the chroot, phase1WatchSkipped
  is a MOTHERSHIP-only concept (mother's observer pod failed to fetch
  the new cluster's kubeconfig). The Sovereign-side catalyst-api runs
  IN the cluster it's reporting on — has the in-cluster ServiceAccount
  + bundled sovereignDynamicClient + informer cache watching HelmReleases
  natively. Firing this here tells operator to drop to kubectl when
  the data is on the page. Suppress on chroot.
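
For D17, the check and its suppression can be illustrated as follows. The actual code is TypeScript in AppsPage.tsx; this Go sketch only mirrors the logic, and `looksLikeDeploymentID` and `shouldWarnMalformed` are hypothetical names. The 16-lowercase-hex rule is taken from the notification text quoted in the DoD entry for D17.

```go
package main

import (
	"fmt"
	"regexp"
)

// deploymentIDPattern matches exactly 16 lowercase hex characters.
var deploymentIDPattern = regexp.MustCompile(`^[0-9a-f]{16}$`)

func looksLikeDeploymentID(s string) bool {
	return deploymentIDPattern.MatchString(s)
}

// shouldWarnMalformed suppresses the warning in sovereign mode, where the id
// is synthesized in-process rather than read from a URL segment.
func shouldWarnMalformed(mode, rawID string) bool {
	if mode == "sovereign" {
		return false
	}
	return !looksLikeDeploymentID(rawID)
}

func main() {
	fmt.Println(looksLikeDeploymentID("6cddff7ef4432bdc"))
	fmt.Println(shouldWarnMalformed("sovereign", "sovereign-t129.omani.works"))
	fmt.Println(shouldWarnMalformed("catalyst-zero", "bp-cnpg"))
}
```
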

Caught on t129 (6cddff7ef4432bdc, 2026-05-16) — DoD D17 + D18.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 19:46:25 +04:00
github-actions[bot]
f8c20137a5 deploy: update catalyst images to 536bfcb 2026-05-16 15:42:43 +00:00
e3mrah
536bfcb699
fix(infrastructure): vCluster fallback from namespace label (D15) (#1542)
loadVClusters() queried vcluster.io/v1alpha1 CRs only. Our bootstrap
topology ships loft-sh/vcluster as a plain Helm chart (StatefulSet +
Service, NO CRD installed) so the CR list is always empty on a
converged Sovereign → canvas `vCluster N/N` chip shows `0/0` even
though Pods are Running.

Add a fallback: enumerate Namespaces carrying
`catalyst.openova.io/vcluster-role` label (stamped by
bp-{mgmt,dmz,rtz}-vcluster's namespace template at PR #1526).
Emits one VCluster row per labeled namespace with role = the label
value. Status `healthy` since the namespace exists (operator-visible
Pod state is surfaced elsewhere).
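
A minimal sketch of the label-based fallback, using plain structs rather than the Kubernetes client types. The `namespace` and `vclusterRow` types and the `vclustersFromNamespaces` name are assumptions; the label key, the role mapping and the fixed `healthy` status follow the description above.

```go
package main

import (
	"fmt"
	"sort"
)

const vclusterRoleLabel = "catalyst.openova.io/vcluster-role"

// namespace and vclusterRow are simplified stand-ins for the real types.
type namespace struct {
	Name   string
	Labels map[string]string
}

type vclusterRow struct {
	Name, Role, Status string
}

// vclustersFromNamespaces emits one row per namespace carrying the role
// label. Status is "healthy" because the namespace exists.
func vclustersFromNamespaces(nss []namespace) []vclusterRow {
	var rows []vclusterRow
	for _, ns := range nss {
		role := ns.Labels[vclusterRoleLabel] // safe on a nil map
		if role == "" {
			continue
		}
		rows = append(rows, vclusterRow{Name: ns.Name, Role: role, Status: "healthy"})
	}
	sort.Slice(rows, func(i, j int) bool { return rows[i].Name < rows[j].Name })
	return rows
}

func main() {
	rows := vclustersFromNamespaces([]namespace{
		{Name: "rtz", Labels: map[string]string{vclusterRoleLabel: "rtz"}},
		{Name: "kube-system"},
		{Name: "dmz", Labels: map[string]string{vclusterRoleLabel: "dmz"}},
	})
	fmt.Println(len(rows), rows)
}
```
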

Caught on t129 (6cddff7ef4432bdc, 2026-05-16) — D15.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 19:40:50 +04:00
e3mrah
7c7b6277e4
docs(DoD): add D15-D18 — canvas-accuracy/routing/self-monitoring (t129) (#1541)
Founder ruling 2026-05-16: 100% DoD was premature. Real operator-visible
issues remain on t129 (omani.works 3×cpx52, 6cddff7ef4432bdc) after
the D0-D14 + A4 gates passed:

- D15: /cloud canvas shows vCluster 0/0 and LoadBalancer 0/0 despite
  vCluster Pods Running + LBs allocated. Canvas adapter not reading
  the live cluster state for these kinds.
- D16: /dashboard Layer-1=Cluster grouping renders single Sovereign,
  not 3 cluster-grouped bubbles. Multi-region hierarchy collapse
  broken at the dashboard level.
- D17: /app/<name> routes (e.g. /app/bp-cnpg) emit "Deployment id in
  the URL is malformed (expected 16 lowercase hex characters; got 7)"
  — the SPA router treats the app-name segment as a deployment-id.
  Every application card click produces a notifications-drawer entry.
- D18: Sovereign-side catalyst-api can't fetch its own kubeconfig to
  monitor Phase-1 install state. Operator is told "Use kubectl
  directly to check Helm releases" — should be invisible.

DoD list explicitly grows per iteration as test-writer / test-executor
discover more operator-visible issues. The list is the convergence
contract.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 19:37:38 +04:00
github-actions[bot]
f010ca16a7 deploy: update catalyst images to 5b69247 2026-05-16 15:11:00 +00:00
e3mrah
5b69247135
fix(clustermesh): secondary cluster name match tofu scheme (D11) (#1540)
Tofu's `secondary_region_cluster_mesh_name` local at
infra/hetzner/main.tf:389 generates secondary names as
`<sovereign-stem>-<region-stem-no-digits>` (e.g. `t129-nbg`,
`t129-sin`). The bootstrap-kit slot 01-cilium.yaml renders
cilium-config cluster.name from this value via the
CLUSTER_MESH_NAME envsubst.

The orchestrator's clusterName derivation was wrong: it appended
`-<region-key>` to the primary's name (e.g. `t129-mesh-nbg1-1`),
which matched NEITHER the tofu scheme NOR the cilium-config value.

Caught on t129 (6cddff7ef4432bdc, 2026-05-16): TLS, etcd RBAC,
and connection all working after PRs #1530, #1536, #1538, #1539 —
but agent reported `failed to retrieve cluster configuration:
not found` for every secondary peer because it queried
`cilium/cluster-config/v1/t129-mesh-nbg1-1` against an etcd that
only had `t129-nbg`.

Fix: export `DeriveSecondaryClusterMeshName(req, rs)` that
mirrors tofu's local exactly, plus a `stripTrailingDigits` helper.
Orchestrator's buildRegionSlots uses this for secondaries; primary
keeps the `<stem>-mesh` shape.
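
The naming rule can be sketched with two small helpers. The exported function takes `(req, rs)` in the real code; the string-based `secondaryMeshName` and `primaryMeshName` below are hypothetical simplifications that only illustrate the `<sovereign-stem>-<region-stem-no-digits>` and `<stem>-mesh` shapes.

```go
package main

import (
	"fmt"
	"strings"
)

// stripTrailingDigits removes any run of ASCII digits at the end of s.
func stripTrailingDigits(s string) string {
	return strings.TrimRight(s, "0123456789")
}

// secondaryMeshName mirrors the tofu scheme for secondary regions.
func secondaryMeshName(sovereignStem, cloudRegion string) string {
	return sovereignStem + "-" + stripTrailingDigits(cloudRegion)
}

// primaryMeshName keeps the primary's existing shape.
func primaryMeshName(sovereignStem string) string {
	return sovereignStem + "-mesh"
}

func main() {
	fmt.Println(primaryMeshName("t129"))
	fmt.Println(secondaryMeshName("t129", "nbg1"))
	fmt.Println(secondaryMeshName("t129", "sin"))
}
```
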

Closes D11 incident chain: #1525 → #1528 → #1530 → #1536 → #1538 → #1539 →
this. With this PR landed t129's secondary→primary
connection already works (verified on live cluster — secondary
agents show "ready, 2 nodes, 113 endpoints, 326 identities");
primary→secondary will work on a fresh prov once the name match
is correct from the start.

Refs DoD D11.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 19:08:55 +04:00
github-actions[bot]
6b519b5573 deploy: update catalyst images to d0fd32d 2026-05-16 15:01:32 +00:00
e3mrah
d0fd32dc04
fix(clustermesh): use peer's clustermesh-apiserver-remote-cert (D11) (#1539)
The orchestrator was minting a fresh client cert (CN = local cluster
name) for each peer connection. Even with PR #1530's "sign with
peer's CA" fix the TLS handshake succeeded but etcd RBAC rejected:

    error="etcdserver: permission denied"

Cilium's clustermesh-apiserver etcd has RBAC with a `remote` user
that has read access on the cilium/* prefix. The chart generates
`kube-system/clustermesh-apiserver-remote-cert` with CN=`remote`.

Canonical `cilium clustermesh connect` CLI copies THIS Secret's
tls.crt/tls.key as the client cert the REMOTE cluster presents —
matches the etcd RBAC user verbatim.

This PR adopts that pattern: snapshotRemoteCert() reads the peer's
existing `clustermesh-apiserver-remote-cert` Secret, returns
tls.crt + tls.key bytes, and the orchestrator writes them into
A's `cilium-clustermesh` Secret instead of minting.

Caught on t129 (6cddff7ef4432bdc, 2026-05-16):
- TLS handshake succeeded after firewall fix (PR #1538) opened
  NodePort range so LB→backend health check passed
- cilium-dbg status reported `etcd: 1/1 connected, has-quorum=true`
  (TLS path working)
- BUT `remote configuration: expected=true, retrieved=false` and
  agent logs spammed `etcdserver: permission denied`

With this PR's CN=remote cert, etcd authorizes the kvstore List
and clustermesh sync completes — agent should flip to
`2/2 remote clusters ready`.

Completes the D11 chain: #1525 (regionKeyFromSpec) → #1528
(clusterName derivation) → #1530 (cert with peer's CA — no longer
needed but kept as defense-in-depth) → #1536 (hostAlias pattern)
→ #1538 (firewall NodePort range) → this.

Refs DoD D11.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 18:58:22 +04:00
github-actions[bot]
1cfe0d758f deploy: update catalyst images to 1c988b9 2026-05-16 14:45:56 +00:00
e3mrah
1c988b9a4b
fix(firewall): open NodePort range 30000-32767 for clustermesh LB (D11) (#1538)
PR #1537's use-private-ip approach was not viable: the per-region
Hetzner LB has no private-network attachment by default (LB private_net
is empty) and our DoD A2 architecture pins one private /24 per region
that does NOT span across regions. The LB->backend hop has to transit
the public path.

The actual blocker is the Sovereign firewall: it permits 80/443/6443/53
and blocks the NodePort range. Hetzner LB TCP health-check probes
`<node-public-ip>:<NodePort>` and gets dropped → all targets marked
unhealthy → external clients see "unexpected eof while reading" at
TLS handshake → cilium clustermesh agent stays `0/N remote clusters
ready, Waiting for initial connection`.

Security: clustermesh-apiserver requires mTLS. Peer agents must present
a client cert signed by the peer cluster's cilium-ca (PR #1530).
Anonymous connections rejected at handshake. mTLS is the security
boundary, NOT the firewall — opening NodePorts is safe here.

Caught on t129 (6cddff7ef4432bdc, 2026-05-16) — completes the D11
incident chain (#1525 → #1528 → #1530 → #1536 → this).

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 18:44:02 +04:00
e3mrah
e21f7bd0fc
fix(clustermesh-lb): use-private-ip=true for LB→backend transit (D11) (#1537)
Hetzner firewall on Sovereign nodes permits public ingress on
80, 443, 6443, 53/tcp+udp, icmp, and wg-udp (51871). The NodePort
range (30000-32767) is BLOCKED — that's the security posture
(privileged Sovereign workloads should not be reachable on
arbitrary NodePorts from the internet).

Hetzner LB TCP health checks probe `<node-ip>:<destination_port>`
where destination_port is the Service NodePort. With public-IP
transit the probe goes through the firewall and gets dropped.
All 3 clustermesh LB targets in t129 reported
`health_status=unhealthy` because of this. With no healthy
targets the LB refuses connections — external clients see
"unexpected eof while reading" at TLS handshake time. Cilium
agent stays `0/N remote clusters ready, Waiting for initial
connection`.

Fix: `load-balancer.hetzner.cloud/use-private-ip: "true"` so the
LB → backend connection transits the per-region private network
(10.0.1.0/24). External clients still connect to the LB's PUBLIC
IP — this annotation only controls the LB→backend hop, which is
in-region anyway (one LB per region, points at that region's CP
node).

Caught on t129 (6cddff7ef4432bdc, 2026-05-16) — completes the
D11 chain that began with PR #1525 (regionKeyFromSpec), continued
through PR #1528 (clusterName derivation), PR #1530 (peer cert
signed by peer's CA), and PR #1536 (hostAlias pattern). With
this PR's traffic-path fix landed, the LB→backend hop should
succeed and the chain becomes end-to-end working.

Refs DoD D11.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 18:36:54 +04:00
github-actions[bot]
bfc5a6143f deploy: update catalyst images to 83d771d 2026-05-16 14:13:28 +00:00