openova/clusters/_template/sovereign-tls/kustomization.yaml
e3mrah aa60cfb84e
fix(multi): Family G — 6 singletons (C8-001/C8-005/C9-006/C10-002/C10-003/C7-007) (#1601)
Wave 2 Family G batched ship. C7-004 (sso/wiki/workflows/storybook +
registry/api HTTPRoutes) intentionally skipped — sso/wiki/storybook
have no shipped backend; registry (harbor) + api (catalyst-api) HTTPRoutes
already exist and 404 is a runtime/HR-readiness symptom, not a missing
route. Flagged for architect-led ticket rather than silent route-alias
synthesis.

C9-006 — hcloud-volumes StorageClass missing on fresh prov
  Root cause: platform/hcloud-csi/chart/ existed but was never wired
  into bootstrap-kit, so fresh Sovereigns defaulted PVCs to local-path
  (rancher.io/local-path) — node-pinned, can't survive Pod reschedule.
  Fix: new slot 17a-bp-hcloud-csi.yaml + chart 1.0.0→1.1.0 bump that
  adds templates/hcloud-token-secret.yaml so the controller can
  authenticate to Hetzner. Mirrors bp-hcloud-ccm (slot 55) +
  bp-cluster-autoscaler-hcloud (slot 50) wiring.
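
  A minimal sketch of what the fix enables, not the shipped template:
  the Secret name, key, and namespace below follow the upstream
  hcloud-csi defaults and are assumptions about this chart, and the PVC
  name is hypothetical. Once the token Secret exists and the driver is
  installed, a claim can request hcloud-volumes explicitly instead of
  falling back to local-path.

```yaml
# Assumption: upstream hcloud-csi convention (Secret "hcloud", key "token").
# The real templates/hcloud-token-secret.yaml may differ.
apiVersion: v1
kind: Secret
metadata:
  name: hcloud
  namespace: kube-system
type: Opaque
stringData:
  token: REPLACE_WITH_HCLOUD_API_TOKEN   # placeholder, never commit a real token
---
# Hypothetical PVC that binds to a Hetzner volume rather than a
# node-pinned local-path directory, so it survives a Pod reschedule.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: hcloud-volumes
  resources:
    requests:
      storage: 10Gi
```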

C10-002 — /fleet/applications returns 0 items despite 21 sovereigns
  Root cause: collectFleetSovereigns excluded rows where AdoptedAt is
  non-nil (a filter copied from ListDeployments). On a steady-state
  fleet every Sovereign is adopted, so the dashboard rendered empty
  despite hundreds of succeeded jobs.
  Fix: remove the adopted-filter from collectFleetSovereigns (the
  fleet view's whole purpose is to enumerate every provisioned
  Sovereign). ListDeployments still applies the filter — it backs the
  provisioner's in-flight tab, a different surface. Adopted rows
  surface with Health=green when otherwise unknown.

C10-003 — per-region install-* Jobs stuck "pending" despite ready
  Root cause: lastState dedup in helmwatch_bridge — secondary
  watchers attaching AFTER an HR already settled at Installed never
  observed a state transition, so the seed value (HelmStatePending)
  never converged. Fix: at markPhase1Done(OutcomeReady), backfill
  every secondary watcher's informer snapshot into the shared
  jobs.Bridge via the idempotent SeedJobsFromInformerList path.
  Runs INLINE (not goroutine) — runPhase1Watch defers
  stopSecondaries() which clears dep.secondaryWatchers as soon as
  markPhase1Done returns, so a goroutine would race the cleanup.

C7-007 — legacy sovereign-wildcard-tls Cert+Secret pair orphaned
  Root cause: PR O moved the Cilium Gateway listener's
  certificateRefs to the dashed-suffix per-zone Secret but left the
  legacy bare-name Certificate template behind, so cert-manager
  kept renewing an orphan. Fix: (a) rename the Certificate +
  Secret to the dashed-suffix shape (single-source-of-truth), and
  (b) add a one-shot Job (legacy-cert-cleanup) that deletes the
  pre-PR-O Cert+Secret pair via alpine/k8s. The delete is idempotent,
  so fresh provs run it as a no-op. The Job is removable from
  kustomization.yaml once every live prov has reconciled past it.
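
  The cleanup Job described above could look roughly like the sketch
  below. It is illustrative only: the namespace, ServiceAccount, and
  image tag are assumptions (the tag is left elided), and it assumes
  the legacy Certificate and Secret share the bare name
  sovereign-wildcard-tls. Only the image path and the use of
  --ignore-not-found come from this change.

```yaml
# Hypothetical shape of legacy-cert-cleanup-job.yaml; not the shipped file.
apiVersion: batch/v1
kind: Job
metadata:
  name: legacy-cert-cleanup
  namespace: kube-system                      # assumption
spec:
  backoffLimit: 3
  template:
    spec:
      # Assumption: a ServiceAccount with RBAC to delete
      # cert-manager Certificates and core Secrets in this namespace.
      serviceAccountName: legacy-cert-cleanup
      restartPolicy: OnFailure
      containers:
        - name: cleanup
          # Pulled through the Harbor Docker Hub proxy; tag intentionally elided.
          image: harbor.openova.io/proxy-dockerhub/alpine/k8s:<tag>
          command:
            - /bin/sh
            - -c
            - |
              set -eu
              # --ignore-not-found makes both deletes a no-op on fresh Sovereigns.
              kubectl delete certificates.cert-manager.io sovereign-wildcard-tls --ignore-not-found
              kubectl delete secret sovereign-wildcard-tls --ignore-not-found
```

  Deleting the Certificate before the Secret keeps cert-manager from
  re-issuing the Secret between the two commands.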

C8-001 — D22 Settings em-dash placeholders on chroot Sovereign
  Root cause: SettingsPage read Capacity / CP size / Pool subdomain /
  BYO domain from useWizardStore() (zustand+persist localStorage).
  The chroot Sovereign console runs on a fresh browser session
  post-handover with empty localStorage, so the four fields rendered
  em-dashes. The data IS persisted on the deployment record
  (RedactedRequest) — gap was that Deployment.State() never surfaced
  it. Fix: lift controlPlaneSize / sovereignPoolDomain /
  sovereignSubdomain / sovereignDomainMode / sovereignByoDomain /
  regionControlPlaneSizes / orgName / orgEmail to the State() map,
  extend the DeploymentSnapshot TS type, and have SettingsPage read
  snapshot-first with the wizard store as fallback (the mothership
  wizard-in-flight case).

C8-005 — D20 Jobs page missing region filter dropdown
  Root cause: multi-region Sovereigns expose install-<region>:<chart>
  Jobs but JobsTable offered only status / app / parent filters,
  forcing operators to type the region key into the free-text search.
  Fix: new regionFromJob(job) pure helper parses the canonical
  <region>:<chart> appId (fallback: install-<region>:<chart> jobName).
  The dropdown is visible only when 2+ regions appear in the current
  job set, so single-region Sovereigns are not shown a useless
  one-option control. Options are sorted lexically. Test coverage:
  4 helper cases + 3 dropdown cases in JobsTable.test.tsx.

Architect-first compliance:
  • bp-hcloud-csi wiring mirrors bp-hcloud-ccm (slot 55) pattern
  • legacy-cert-cleanup uses alpine/k8s (NOT bitnami/kubectl — see
    self-sovereign-cutover/values.yaml:252 Bitnami-deprecation note)
  • alpine/k8s image pulled via harbor.openova.io/proxy-dockerhub
    (mirror-everything rule)
  • regionFromJob mirrors helmwatch_bridge.go componentID encoding
    (3 input shapes: bare, region-prefixed, install-region-prefixed)
  • State() snapshot additions stay slim — only the 4 founder-flagged
    fields + a few zero-cost adjacents

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 22:20:29 +04:00

21 lines
1.0 KiB
YAML

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- cilium-gateway-cert.yaml
- cilium-gateway.yaml
# Watch+rollout-restart Job for cilium-envoy. cilium-envoy's xDS SDS
# subscription does NOT recover after the initial-fetch timeout, so a
# fresh Sovereign whose envoy started before the wildcard cert was
# issued serves no listener forever. This Job waits for the Secret
# then bumps the DaemonSet, restoring the listener within 90s of
# the cert appearing. See file header for full root cause + design
# rationale (qa-loop bounded-cycle Provision #7).
- cilium-envoy-tls-restart-job.yaml
# C7-007 (2026-05-17 t143) — one-shot cleanup of the pre-PR-O legacy
# `sovereign-wildcard-tls` Certificate + Secret pair. Idempotent
# (`--ignore-not-found`), runs once per Flux reconciliation
# generation. Fresh Sovereigns succeed as a no-op; pre-PR-O
# Sovereigns delete the orphan resources. Removable from the list
# once every live prov has reconciled past it.
- legacy-cert-cleanup-job.yaml