openova/clusters/_template/sovereign-tls/cilium-gateway.yaml
e3mrah 0242be5c49
fix(infra): PR O — cilium-gateway TLS references per-zone wildcard cert (#1595)
t143 hit LE PROD rate limit (50 certs/week on omani.works exhausted)
because TWO cert templates compete for the same parent-domain quota:
1. clusters/_template/sovereign-tls/cilium-gateway-cert.yaml — legacy
   SAN cert named `sovereign-wildcard-tls`
2. products/catalyst/chart/templates/sovereign-wildcard-certs.yaml —
   chart per-zone cert named `sovereign-wildcard-tls-<sanitised-zone>`

The Cilium Gateway listener hardcoded the legacy name, so when LE 429s
the legacy cert (as happened on t143), HTTPS to console.<fqdn> breaks
even though the per-zone cert is Ready.

Fix: gateway listener now references `sovereign-wildcard-tls-${SOVEREIGN_FQDN_DASHED}`.
Cloud-init substitutes SOVEREIGN_FQDN_DASHED = replace(fqdn, ".", "-")
in the sovereign-tls Kustomization postBuild.substitute. The per-zone
cert from the chart provides the Ready Secret with this exact name.
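
For illustration, a minimal sketch of how that substitution could look on
the Flux side. Only the Kustomization name (sovereign-tls), the dependency
on bootstrap-kit, and the two variable names come from this change; the
namespace, interval, path, source name, and the FQDN `sov1.example.com`
are hypothetical placeholders, not the values cloud-init actually writes.

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: sovereign-tls
  namespace: flux-system                    # assumed namespace
spec:
  interval: 10m                             # assumed reconcile interval
  prune: true
  path: ./clusters/example/sovereign-tls    # hypothetical rendered path
  sourceRef:
    kind: GitRepository
    name: flux-system                       # assumed source name
  dependsOn:
    - name: bootstrap-kit                   # Gateway CRDs come from the Cilium HR
  postBuild:
    substitute:
      SOVEREIGN_FQDN: sov1.example.com          # hypothetical FQDN
      SOVEREIGN_FQDN_DASHED: sov1-example-com   # replace(fqdn, ".", "-")
```

With these placeholder values the listener's certificateRefs name would
render as `sovereign-wildcard-tls-sov1-example-com`, which has to match
the Secret name produced by the chart's per-zone cert.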

The legacy cilium-gateway-cert.yaml SAN cert still renders for
backward-compat (some consumers may still reference it), but the
gateway listener no longer depends on it for TLS termination.

Bumps no chart version — the change is at the Flux/Kustomize layer.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 13:19:10 +04:00

# Cilium Gateway (Phase-8a bug #14 follow-up to #484).
# Moved out of bootstrap-kit/01-cilium.yaml because gateway.networking.k8s.io/v1
# CRDs are installed by the Cilium HelmRelease itself; Flux dry-runs the
# whole Kustomization before applying any HR, so Gateway dry-run fails on
# a fresh cluster. The sovereign-tls Kustomization dependsOn bootstrap-kit
# Ready, so by the time Gateway is applied here, Cilium has installed.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: cilium-gateway
  namespace: kube-system
  labels:
    catalyst.openova.io/sovereign: ${SOVEREIGN_FQDN}
    catalyst.openova.io/component: cilium-gateway
spec:
  gatewayClassName: cilium
  # NOTE: ports 30080/30443 (not 80/443) — even with hostNetwork=true,
  # cilium-envoy refuses to bind privileged ports because cilium-agent
  # gates that bind through its `envoy-keep-cap-netbindservice` flag and
  # the resulting bind() syscall is intercepted by the agent's BPF
  # socket-LB program. Setting privileged: true on the cilium-envoy
  # DaemonSet + adding NET_BIND_SERVICE + flipping the configmap flag
  # all failed to lift the bind() rejection (verified live on otech45,
  # otech46, otech47).
  #
  # High-port (>1024) bind succeeds without NET_BIND_SERVICE. The
  # Hetzner LB does the public-facing port translation: HCLB listens on
  # 80→forwards to CP node:30080; HCLB listens on 443→forwards to CP
  # node:30443. Browsers hit the canonical URL (`https://console.<fqdn>/`)
  # so port 30443 is never visible externally.
  #
  # See infra/hetzner/main.tf hcloud_load_balancer_service.{http,https}
  # destination_port settings — they MUST match these listener ports.
  listeners:
    - name: https
      port: 30443
      protocol: HTTPS
      hostname: "*.${SOVEREIGN_FQDN}"
      tls:
        mode: Terminate
        certificateRefs:
          - kind: Secret
            name: sovereign-wildcard-tls-${SOVEREIGN_FQDN_DASHED}
      allowedRoutes:
        namespaces:
          from: All
    - name: http
      port: 30080
      protocol: HTTP
      hostname: "*.${SOVEREIGN_FQDN}"
      allowedRoutes:
        namespaces:
          from: All
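#
# Illustrative only, not part of this manifest: because both listeners
# set allowedRoutes.namespaces.from: All, an HTTPRoute in any namespace
# can attach to this Gateway. A minimal sketch, assuming a hypothetical
# Service named `console` on port 8080 in a hypothetical namespace
# `catalyst` (only the Gateway name/namespace, the `https` listener name,
# and the console.<fqdn> hostname come from this file):
#
#   apiVersion: gateway.networking.k8s.io/v1
#   kind: HTTPRoute
#   metadata:
#     name: console              # hypothetical
#     namespace: catalyst        # hypothetical
#   spec:
#     parentRefs:
#       - name: cilium-gateway
#         namespace: kube-system
#         sectionName: https     # bind to the TLS-terminating listener
#     hostnames:
#       - console.${SOVEREIGN_FQDN}
#     rules:
#       - backendRefs:
#           - name: console      # hypothetical Service
#             port: 8080         # hypothetical port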