openova/core/controllers
e3mrah e83d08ea4e
feat(sandbox+tenant): CNPG active-hot-standby (ReplicaCluster) default for marketplace tenants when SOVEREIGN_ENABLE_HOT_STANDBY=true (#1661)
Sovereign DoD D31 — CNPG-backed apps must replicate across the
Sovereign's regions when the operator opts in. PR #1562 wired this
into bp-wordpress-tenant at chart level. This change extends the same
toggle across BOTH user-facing paths:

1. Marketplace tenant flow (sme_tenant_gitops.go)
   - smeTenantTemplateData gains EnableHotStandby/PrimaryRegion/
     ReplicaRegion. renderSMETenantOverlay reads them from the
     catalyst-api Pod env (SOVEREIGN_ENABLE_HOT_STANDBY +
     SOVEREIGN_PRIMARY_REGION + SOVEREIGN_REPLICA_REGION).
   - bp-wordpress-tenant HelmRelease emits pg.activeHotStandby.*
     when the trio is valid; bp-wordpress-tenant chart 0.2.0+
     (PR #1562) renders the primary + replica Cluster CR pair.
   - Defence-in-depth: degenerate inputs (empty/identical regions)
     fall back to the single-Cluster shape rather than emitting a
     HelmRelease that the chart's validateActiveHotStandbyRegions
     helper would reject at template time.
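
The fallback rule for the trio can be sketched as below. This is an
illustration, not the code in sme_tenant_gitops.go: the type
hotStandbyConfig, the method valid, and the region names are
hypothetical, and trimming whitespace before comparing is an assumption
the commit does not confirm.

```go
package main

import (
	"fmt"
	"strings"
)

// hotStandbyConfig is a hypothetical stand-in for the three fields the
// commit adds to smeTenantTemplateData.
type hotStandbyConfig struct {
	EnableHotStandby bool
	PrimaryRegion    string
	ReplicaRegion    string
}

// valid reports whether the trio is usable: the toggle is on and both
// regions are non-empty and distinct. Whitespace trimming is an
// assumption made for this sketch.
func (c hotStandbyConfig) valid() bool {
	p := strings.TrimSpace(c.PrimaryRegion)
	r := strings.TrimSpace(c.ReplicaRegion)
	return c.EnableHotStandby && p != "" && r != "" && p != r
}

func main() {
	// "region-a" and "region-b" are placeholder region names.
	cases := []hotStandbyConfig{
		{true, "region-a", "region-b"},  // happy path
		{true, "", "region-b"},          // empty primary
		{true, "region-a", ""},          // empty replica
		{true, "region-a", "region-a"},  // identical regions
		{false, "region-a", "region-b"}, // toggle off with regions set
	}
	for _, c := range cases {
		fmt.Printf("%+v -> hot standby: %v\n", c, c.valid())
	}
}
```

Any case that fails the check would render the single-Cluster shape.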

2. Sandbox plane (sandbox.db.provision)
   - Env struct + NewEnvFromOS read the same Sovereign-level trio.
   - sandbox.db.provision emits a primary + replica Cluster CR pair
     when hotStandbyActive() — same shape bp-cnpg-pair renders for
     marketplace apps + bp-wordpress-tenant cnpg-cluster.yaml: WAL
     streaming via spec.managed.services.additional annotated
     service.cilium.io/global=true, nodeAffinity pinning each side
     to its declared region, replica.enabled=true with externalCluster
     resolving the primary through the ClusterMesh-global Service alias.
   - Best-effort rollback of the primary if the replica Create fails,
     so the operator is not left with an orphan primary.
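
The create-then-rollback ordering can be sketched as below. The
clusterAPI interface, the provisionPair function, the in-memory fakeAPI,
and the names "db" / "db-replica" are all hypothetical; the real
sandbox.db.provision code talks to the Kubernetes API and applies its
own replica naming rules.

```go
package main

import (
	"errors"
	"fmt"
)

// clusterAPI is a hypothetical, minimal view of the calls needed to
// create and delete a Cluster CR.
type clusterAPI interface {
	Create(name string) error
	Delete(name string) error
}

// provisionPair creates the primary, then the replica. If the replica
// create fails it tries to delete the primary (best effort) and always
// returns the original create error.
func provisionPair(api clusterAPI, primary, replica string) error {
	if err := api.Create(primary); err != nil {
		return fmt.Errorf("create primary %q: %w", primary, err)
	}
	if err := api.Create(replica); err != nil {
		if delErr := api.Delete(primary); delErr != nil {
			// The rollback failure is reported alongside the cause.
			return fmt.Errorf("create replica %q: %w (rollback of %q failed: %v)",
				replica, err, primary, delErr)
		}
		return fmt.Errorf("create replica %q: %w", replica, err)
	}
	return nil
}

// fakeAPI is an in-memory implementation used only for this sketch.
type fakeAPI struct {
	clusters map[string]bool
	failOn   string // name whose Create should fail
}

func (f *fakeAPI) Create(name string) error {
	if name == f.failOn {
		return errors.New("simulated create failure")
	}
	f.clusters[name] = true
	return nil
}

func (f *fakeAPI) Delete(name string) error {
	delete(f.clusters, name)
	return nil
}

func main() {
	api := &fakeAPI{clusters: map[string]bool{}, failOn: "db-replica"}
	err := provisionPair(api, "db", "db-replica")
	fmt.Println("error:", err)
	fmt.Println("clusters left behind:", len(api.clusters))
}
```

Returning the create error even when the delete also fails keeps the
root cause visible to the caller.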

3. Plumbing (one knob, both paths)
   - catalyst chart: values.sovereign.{enableHotStandby,primaryRegion,
     replicaRegion} -> sovereign-fqdn ConfigMap keys -> catalyst-api env.
   - sandbox chart: cnpg.activeHotStandby.{enabled,primaryRegion,
     replicaRegion} -> controller env -> per-Sandbox MCP Pod env.
   - Bootstrap-kit slot 13 + slot 19a wire SOVEREIGN_ENABLE_HOT_STANDBY/
     SOVEREIGN_PRIMARY_REGION/SOVEREIGN_REPLICA_REGION envsubst
     placeholders to BOTH chart paths so the operator flips one knob
     on the per-Sovereign overlay and gets HA across the marketplace
     tenant install AND the sandbox.db plane.
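
Reading the three variables on the consuming side might look like the
sketch below. The sovereignEnv type and loadSovereignEnv function are
hypothetical (the commit's own reader is NewEnvFromOS, whose body is not
shown here), and parsing the toggle with strconv.ParseBool is an
assumption about how "true" is recognised.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// sovereignEnv is a hypothetical holder for the Sovereign-level trio.
type sovereignEnv struct {
	EnableHotStandby bool
	PrimaryRegion    string
	ReplicaRegion    string
}

// loadSovereignEnv reads the trio through getenv (os.Getenv in a real
// process). An unset or unparsable toggle is treated as false, which
// matches the default-off behaviour described in the commit.
func loadSovereignEnv(getenv func(string) string) sovereignEnv {
	enabled, err := strconv.ParseBool(getenv("SOVEREIGN_ENABLE_HOT_STANDBY"))
	if err != nil {
		enabled = false
	}
	return sovereignEnv{
		EnableHotStandby: enabled,
		PrimaryRegion:    getenv("SOVEREIGN_PRIMARY_REGION"),
		ReplicaRegion:    getenv("SOVEREIGN_REPLICA_REGION"),
	}
}

func main() {
	fmt.Printf("%+v\n", loadSovereignEnv(os.Getenv))
}
```

Passing the lookup function in, rather than calling os.Getenv directly,
keeps the reader testable without mutating the process environment.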

Default empty/false: every Sovereign that has not opted in keeps
rendering single-Cluster CNPG (zero regression).

gitlab-tenant + nextcloud-tenant charts: NOT shipped in this repo
today, so they are out of scope. When they land they can copy the
same value contract (pg.activeHotStandby.*) and the gitops writer
wiring already handles them — no chart-bump or controller change
required.

Tests
- sme_tenant_active_hot_standby_test.go: 8 cases (off, on-happy-path,
  degenerate matrix incl. empty primary, empty replica, identical
  regions, toggle off with regions).
- sandbox_db_hot_standby_test.go: 11 cases covering hotStandbyActive
  matrix + replicaClusterName/replicationServiceName suffix rules +
  full primary + replica CR shapes (nodeAffinity, switchover, managed
  service, externalClusters).
- platform/wordpress-tenant/chart/tests/active-hot-standby-render.sh
  still passes (5/5 gates green).
- catalyst-api SMETenant suite GREEN.
- sandbox-controller suite GREEN.
- helm template clean for sandbox chart (HA + default-off) and
  catalyst chart (sovereign-fqdn-configmap + api-deployment).

Hard rules respected: READ-ONLY clusters, no Chart.yaml bump on
bp-catalyst-platform (envsubst-only wiring change in slot 13), no
host-cluster touch outside the chart-level seam.

Refs DoD D31.

Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 12:53:57 +04:00
application fix(chart,api,controllers,ui): qa-loop iter-11 Fix #45 — three-cluster closeout (#1265) 2026-05-10 07:26:05 +04:00
blueprint feat(catalog): catalog-svc HTTP REST service + chart wiring (slice L1+L2, #1097) (#1148) 2026-05-09 04:04:52 +04:00
continuum feat(z): cross-EPIC follow-ups — lastLuaRecord + fleet alerts + edit-pr (#1095/#1096/#1099/#1101) (#1170) 2026-05-09 11:54:06 +04:00
environment fix(controllers): create per-Org/App Gitea repos as PUBLIC (Fix #42 follow-up) (#1260) 2026-05-10 04:44:35 +04:00
internal feat(catalyst-ui): live install flow — useCatalog + InstallForm + /applications + preview (slice I, #1097) (#1152) 2026-05-09 05:19:50 +04:00
organization fix(org-controller): render per-tenant HTTPRoute so <slug>.omani.homes serves traffic (#1644) 2026-05-18 11:32:54 +04:00
pkg feat(sandbox-mcp): gitea.pr.create/merge + issue.* + k8s.read.logs (was stubs) (#1656) 2026-05-18 12:12:41 +04:00
sandbox feat(sandbox+tenant): CNPG active-hot-standby (ReplicaCluster) default for marketplace tenants when SOVEREIGN_ENABLE_HOT_STANDBY=true (#1661) 2026-05-18 12:53:57 +04:00
useraccess feat(useraccess-controller): tier-aware RoleBinding emission + developer scope auto-injection (slice T3 + C5-followup, #1098) (#1145) 2026-05-09 03:42:32 +04:00
go.mod feat(continuum): K-Cont-2 — reconciler with lease + CNPG status watch + 7-step switchover sequence + audit emit (#1101) (#1155) 2026-05-09 06:45:34 +04:00
go.sum feat(continuum): K-Cont-2 — reconciler with lease + CNPG status watch + 7-step switchover sequence + audit emit (#1101) (#1155) 2026-05-09 06:45:34 +04:00