t143 hit LE PROD rate limit (50 certs/week on omani.works exhausted)
because TWO cert templates compete for the same parent-domain quota:
  1. clusters/_template/sovereign-tls/cilium-gateway-cert.yaml: legacy
     SAN cert named `sovereign-wildcard-tls`
  2. products/catalyst/chart/templates/sovereign-wildcard-certs.yaml:
     chart per-zone cert named `sovereign-wildcard-tls-<sanitised-zone>`
The Cilium Gateway listener hardcoded the legacy name, so when LE 429s
the legacy cert (as happened on t143), HTTPS to console.<fqdn> breaks
even though the per-zone cert is Ready.
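
For orientation, a per-zone cert of the kind described above might look
roughly like the sketch below. This is not the chart's actual template.
The zone `demo.omani.works`, the issuer name `letsencrypt-prod`, and the
`kube-system` namespace are assumptions for illustration. The namespace
is chosen because a Gateway listener can only reference a Secret in its
own namespace unless a ReferenceGrant allows otherwise.

```yaml
# Hypothetical cert-manager Certificate for one zone (values are examples).
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  # Assumed sanitisation: dots in the zone replaced by dashes.
  name: sovereign-wildcard-tls-demo-omani-works
  namespace: kube-system
spec:
  # The Secret the Gateway listener refers to in certificateRefs.
  secretName: sovereign-wildcard-tls-demo-omani-works
  dnsNames:
    - "*.demo.omani.works"
  issuerRef:
    kind: ClusterIssuer
    name: letsencrypt-prod  # assumed issuer name
```
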
Fix: gateway listener now references `sovereign-wildcard-tls-${SOVEREIGN_FQDN_DASHED}`.
Cloud-init substitutes SOVEREIGN_FQDN_DASHED = replace(fqdn, ".", "-")
in the sovereign-tls Kustomization postBuild.substitute. The per-zone
cert from the chart provides the Ready Secret with this exact name.
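
The substitution can be sketched as a Flux Kustomization like the one
below. Only the Kustomization name, the `bootstrap-kit` dependency, and
the two variable names come from this change. The namespace, source
name, path, interval, and the example FQDN `demo.omani.works` are
assumptions.

```yaml
# Sketch of the sovereign-tls Kustomization with post-build substitution.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: sovereign-tls
  namespace: flux-system            # assumed
spec:
  interval: 10m                     # assumed
  prune: true                       # assumed
  sourceRef:
    kind: GitRepository
    name: flux-system               # assumed
  path: ./clusters/demo/sovereign-tls   # hypothetical per-cluster path
  dependsOn:
    - name: bootstrap-kit
  postBuild:
    substitute:
      SOVEREIGN_FQDN: "demo.omani.works"
      # replace(fqdn, ".", "-"), as written by cloud-init
      SOVEREIGN_FQDN_DASHED: "demo-omani-works"
```

With these values the listener's `certificateRefs` name renders as
`sovereign-wildcard-tls-demo-omani-works`.
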
The legacy cilium-gateway-cert.yaml SAN cert still renders for
backward-compat (some consumers may still reference it), but the
gateway listener no longer depends on it for TLS termination.
Bumps no chart version — the change is at the Flux/Kustomize layer.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
55 lines
2.1 KiB
YAML
# Cilium Gateway (Phase-8a bug #14 follow-up to #484).
# Moved out of bootstrap-kit/01-cilium.yaml because gateway.networking.k8s.io/v1
# CRDs are installed by the Cilium HelmRelease itself; Flux dry-runs the
# whole Kustomization before applying any HR, so Gateway dry-run fails on
# a fresh cluster. The sovereign-tls Kustomization dependsOn bootstrap-kit
# Ready, so by the time Gateway is applied here, Cilium has installed.

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: cilium-gateway
  namespace: kube-system
  labels:
    catalyst.openova.io/sovereign: ${SOVEREIGN_FQDN}
    catalyst.openova.io/component: cilium-gateway
spec:
  gatewayClassName: cilium
  # NOTE: ports 30080/30443 (not 80/443). Even with hostNetwork=true,
  # cilium-envoy refuses to bind privileged ports because cilium-agent
  # gates that bind through its `envoy-keep-cap-netbindservice` flag and
  # the resulting bind() syscall is intercepted by the agent's BPF
  # socket-LB program. Setting privileged: true on the cilium-envoy
  # DaemonSet + adding NET_BIND_SERVICE + flipping the configmap flag
  # all failed to lift the bind() rejection (verified live on otech45,
  # otech46, otech47).
  #
  # High-port (>1024) bind succeeds without NET_BIND_SERVICE. The
  # Hetzner LB does the public-facing port translation: HCLB listens on
  # 80→forwards to CP node:30080; HCLB listens on 443→forwards to CP
  # node:30443. Browsers hit the canonical URL (`https://console.<fqdn>/`)
  # so port 30443 is never visible externally.
  #
  # See infra/hetzner/main.tf hcloud_load_balancer_service.{http,https}
  # destination_port settings; they MUST match these listener ports.
  listeners:
    - name: https
      port: 30443
      protocol: HTTPS
      hostname: "*.${SOVEREIGN_FQDN}"
      tls:
        mode: Terminate
        certificateRefs:
          - kind: Secret
            name: sovereign-wildcard-tls-${SOVEREIGN_FQDN_DASHED}
      allowedRoutes:
        namespaces:
          from: All
    - name: http
      port: 30080
      protocol: HTTP
      hostname: "*.${SOVEREIGN_FQDN}"
      allowedRoutes:
        namespaces:
          from: All