openova/infra
e3mrah 422da46360
fix(sovereign-tls): cilium-gateway listeners per parentZone (#1640)
Issue #831 follow-on to #827. Previously the Cilium Gateway declared a
single listener pair on `*.${SOVEREIGN_FQDN}` only, so tenant URLs under
non-primary parent zones (e.g. wp-foo.omani.homes when the operator
brings omani.homes as the SME pool) were served cilium-envoy's default
fallback cert and failed the TLS handshake with a certificate mismatch.
The per-zone wildcard Secret rendered by
products/catalyst/chart/templates/sovereign-wildcard-certs.yaml (PR #827)
existed but had no Gateway listener claiming its hostname.

Fix: render one listener pair (HTTPS:30443 + HTTP:30080) per parent
zone. Materialised at Terraform plan time as a JSON-flow array
(infra/hetzner/main.tf locals.parent_domains_listeners_yaml — jsonencode
of the listener objects iterating decoded parent_domains_yaml), threaded
through Flux postBuild.substitute as PARENT_DOMAINS_LISTENERS_YAML, and
consumed as a scalar value at `listeners: ${PARENT_DOMAINS_LISTENERS_YAML}`
in cilium-gateway.yaml. Each pair's certificateRefs target the per-zone
Secret `sovereign-wildcard-tls-<sanitised-zone>` so listener + cert stay
in lockstep.
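
As a rough illustration of the per-zone rendering described above, the following Python sketch builds one HTTPS + HTTP listener pair per parent zone and serialises the result as a JSON-flow array. It is not the Terraform code from infra/hetzner/main.tf. The `sanitise` rule (dots to dashes), the `tls.mode` value, and all function names are assumptions made for illustration; only the ports, the `https-`/`http-` name prefixes, and the Secret name prefix come from the commit message.

```python
import json

HTTPS_PORT = 30443  # from the commit message
HTTP_PORT = 30080   # from the commit message


def sanitise(zone: str) -> str:
    # Assumption: dots become dashes, e.g. "omani.homes" -> "omani-homes".
    return zone.lower().replace(".", "-")


def listener_pair(zone: str) -> list[dict]:
    """Build the HTTPS + HTTP listeners for one parent zone."""
    s = sanitise(zone)
    hostname = f"*.{zone}"
    https = {
        "name": f"https-{s}",
        "hostname": hostname,
        "port": HTTPS_PORT,
        "protocol": "HTTPS",
        "tls": {
            "mode": "Terminate",  # assumed; not stated in the commit message
            "certificateRefs": [
                {"kind": "Secret", "name": f"sovereign-wildcard-tls-{s}"}
            ],
        },
    }
    http = {
        "name": f"http-{s}",
        "hostname": hostname,
        "port": HTTP_PORT,
        "protocol": "HTTP",
    }
    return [https, http]


def render_listeners(zones: list[str]) -> str:
    """Return a compact JSON array, which is also a valid YAML flow sequence."""
    listeners = [l for zone in zones for l in listener_pair(zone)]
    return json.dumps(listeners, separators=(",", ":"))
```

Calling `render_listeners(["omani.works", "omani.homes"])` yields a single-line string with four listener objects, each HTTPS listener pointing at the Secret for its own zone.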

Scalar placeholder (not multi-line block) because kustomize-build parses
the YAML before Flux runs envsubst — a placeholder on its own line at
column 0 fails YAML parse. Scalar `${VAR}` parses cleanly; envsubst then
swaps it for the JSON-flow array string, which the apiserver parses as
the real listener list.
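
The substitution step can be imitated in a few lines of Python. This is only a sketch: `string.Template` stands in for Flux's envsubst-style substitution, the JSON parser stands in for the YAML parser (a JSON array is also a valid YAML flow sequence), and the helper names are hypothetical.

```python
import json
from string import Template

# The scalar placeholder as it appears in the manifest line.
TEMPLATE_LINE = "listeners: ${PARENT_DOMAINS_LISTENERS_YAML}"


def substitute(template: str, variables: dict[str, str]) -> str:
    # Stand-in for Flux postBuild substitution (assumption: same ${VAR} syntax).
    return Template(template).substitute(variables)


def parse_flow_value(line: str):
    """Split 'key: value' and parse the value as a JSON-flow document."""
    key, _, value = line.partition(": ")
    return key, json.loads(value)


listeners_json = json.dumps(
    [{"name": "https-omani-works", "port": 30443},
     {"name": "http-omani-works", "port": 30080}]
)
rendered = substitute(
    TEMPLATE_LINE, {"PARENT_DOMAINS_LISTENERS_YAML": listeners_json}
)
key, value = parse_flow_value(rendered)
```

Before substitution the line is an ordinary `key: scalar` mapping, which is why it survives the earlier YAML parse. After substitution the same line carries a list.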

Single-zone fallback preserved (var.parent_domains_yaml empty →
[{name: <sovereign_fqdn>, role: primary}]) so legacy single-zone
provisions render 2 listeners (1 HTTPS + 1 HTTP). Multi-zone provisions
(e.g. primary omani.works + sme-pool omani.homes) render 4 listeners.
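
The fallback and the resulting listener counts can be sketched as follows. The function names are hypothetical and the `sme-pool` role string is taken from the example above. The real logic lives in the Terraform locals, which this does not reproduce.

```python
def effective_zones(parent_domains: list[dict], sovereign_fqdn: str) -> list[dict]:
    """Fall back to a single primary zone when no parent domains are given."""
    if not parent_domains:
        return [{"name": sovereign_fqdn, "role": "primary"}]
    return parent_domains


def listener_count(parent_domains: list[dict], sovereign_fqdn: str) -> int:
    # One HTTPS + one HTTP listener per zone.
    return 2 * len(effective_zones(parent_domains, sovereign_fqdn))


single = listener_count([], "omani.works")
multi = listener_count(
    [{"name": "omani.works", "role": "primary"},
     {"name": "omani.homes", "role": "sme-pool"}],
    "omani.works",
)
```

Here `single` is 2 and `multi` is 4, matching the counts stated above.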

Verification:
  - kubectl kustomize clusters/_template/sovereign-tls/ → clean
  - End-to-end simulation (single-zone, two-zone) renders correct
    listener counts (2 / 4) with correct certificateRefs per zone.
  - Listener naming `https-<sanitised>` / `http-<sanitised>` is unique
    per listener so Gateway controller programs them all (duplicate
    names produce Conflicting status condition).
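
The uniqueness property in the last bullet is easy to check mechanically on a rendered listener list. A minimal sketch, with a hypothetical helper name:

```python
from collections import Counter


def duplicate_listener_names(listeners: list[dict]) -> list[str]:
    """Return listener names that occur more than once, sorted."""
    counts = Counter(listener["name"] for listener in listeners)
    return sorted(name for name, n in counts.items() if n > 1)
```

An empty result means every listener name is distinct.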

Files:
  - clusters/_template/sovereign-tls/cilium-gateway.yaml (scalar
    listeners placeholder + comment block explaining the why)
  - infra/hetzner/main.tf (locals.parent_domains_decoded +
    locals.parent_domains_listeners_yaml; threaded into primary CP and
    secondary regions' templatefile() calls)
  - infra/hetzner/cloudinit-control-plane.tftpl (PARENT_DOMAINS_LISTENERS_YAML
    substitute var in sovereign-tls Kustomization block)

Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 11:09:26 +04:00
cloudflare-worker-leases feat(continuum): K-Cont-4 — Cloudflare Worker source + tofu wiring for lease witness (#1101) (#1159) 2026-05-09 08:01:44 +04:00
hetzner fix(sovereign-tls): cilium-gateway listeners per parentZone (#1640) 2026-05-18 11:09:26 +04:00