openova/clusters/_template/bootstrap-kit/kustomization.yaml
e3mrah 5e57dfb565
fix(bootstrap-kit): remove bp-hcloud-csi slot 17a — chicken-and-egg with harbor (Wave 7 critical-path hotfix) (#1610)
* fix(bootstrap-kit): remove bp-hcloud-csi slot 17a — chicken-and-egg with harbor

Family G (PR #1601) added bp-hcloud-csi at bootstrap-kit slot 17a to ship
the `hcloud-volumes` default StorageClass for C9-006. Caught live on t11
fresh prov 2026-05-17:

  - Flux source-controller chart pull went through harbor.t11.<sov>
    OCI endpoint BEFORE harbor itself was reachable on the network.
  - Chicken-and-egg: harbor depends on Gateway. Gateway lives in
    `sovereign-tls` Kustomization which dependsOn bootstrap-kit Ready.
    bp-hcloud-csi blocked bootstrap-kit Ready → sovereign-tls never
    applied → no Gateway CR → console.t11.<sov> ERR_CONNECTION_CLOSED.
  - Entire UI test matrix on t11 was BLOCKED on the missing Gateway
    (5 test agents reported the same root cause).

C9-006 (hcloud-volumes default SC) is a cosmetic operator-facing
improvement; Gateway availability is launch-critical. Removing slot 17a
unblocks the chain. Follow-up PR will re-add at a later slot (e.g., 19a
AFTER bp-harbor 19) OR fix the pull path to bypass the registry pivot
during bootstrap.

Also bumps chart 1.4.155 → 1.4.156 + bootstrap-kit pin per the
chart-bump-needs-both rule.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(bootstrap-kit): also drop 17a-bp-hcloud-csi from kustomization.yaml resources list

Companion commit to b96d8c50 — the prior commit only removed the file
itself; this commit removes the resources: list entry that referenced
it (otherwise Kustomize fails the dry-run with 'no such file').

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 00:34:40 +04:00


apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
# Order is documented but not enforced here: Flux respects HelmRelease
# dependsOn declarations for the actual install order. Entries are
# listed in the canonical Phase 0 sequence per SOVEREIGN-PROVISIONING.md §3.
resources:
- 01-cilium.yaml
- 01a-gateway-api.yaml
- 02-cert-manager.yaml
- 03-flux.yaml
- 04-crossplane.yaml
- 05-sealed-secrets.yaml
- 05a-reflector.yaml
- 07-nats-jetstream.yaml
- 08-openbao.yaml
- 09-keycloak.yaml
- 10-gitea.yaml
- 11-powerdns.yaml
- 12-external-dns.yaml
- 13-bp-catalyst-platform.yaml
- 14-crossplane-claims.yaml
- 15-external-secrets.yaml
- 15a-external-secrets-stores.yaml
- 16-cnpg.yaml
- 17-valkey.yaml
# bp-hcloud-csi (formerly slot 17a) was REMOVED 2026-05-17 (Wave 7).
# Reason: a chicken-and-egg on the chart pull. The Flux
# source-controller pulled the chart through the harbor.t11.* OCI
# endpoint BEFORE harbor itself was reachable. Harbor depends on the
# Gateway, the Gateway lives in sovereign-tls, and sovereign-tls
# dependsOn bootstrap-kit Ready. bootstrap-kit never went Ready
# because bp-hcloud-csi was stuck on the harbor pull.
# Caught live on a fresh t11 provision (2026-05-17): bootstrap-kit sat
# in Reconciliation-in-progress for 30+ min, so sovereign-tls reported
# "not ready: dependency bootstrap-kit not ready". As a result, no
# Gateway CR was created, console.t11.<sov> returned
# ERR_CONNECTION_CLOSED, and the entire UI test matrix was BLOCKED.
# C9-006 (hcloud-volumes default StorageClass) is a cosmetic,
# operator-facing nice-to-have. Gateway availability is
# launch-critical, so removing this slot unblocks the chain. A
# follow-up PR will either re-add it at a later slot (e.g., 19a, AFTER
# bp-harbor at slot 19) OR fix the pull path to bypass the registry
# pivot during bootstrap.
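#
# Illustrative sketch (hypothetical values, not copied from this repo)
# of the blocking edge described above, which is a Flux Kustomization
# dependsOn. Assume sovereign-tls is declared roughly as below. Flux
# does not apply it until bootstrap-kit reports Ready. Also assume
# bootstrap-kit sets `spec.wait: true`. In that case it only reports
# Ready once every resource it applied is Ready, including each
# HelmRelease, so one HelmRelease stuck on a chart pull holds the
# whole chain. The path, interval, and GitRepository name are
# placeholders.
#
#   apiVersion: kustomize.toolkit.fluxcd.io/v1
#   kind: Kustomization
#   metadata:
#     name: sovereign-tls
#     namespace: flux-system
#   spec:
#     dependsOn:
#       - name: bootstrap-kit      # not applied until bootstrap-kit is Ready
#     interval: 10m                # placeholder
#     path: ./clusters/<sovereign>/sovereign-tls   # placeholder path
#     prune: true
#     sourceRef:
#       kind: GitRepository
#       name: flux-system          # assumed source name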
- 18-seaweedfs.yaml
- 19-harbor.yaml
# 06a — Post-handover Self-Sovereignty Cutover (issue #791). Filename
# carries the 06a prefix to colocate cohorts visually, but the slot's
# dependsOn pins actual install order to AFTER bp-gitea (slot 10) and
# bp-harbor (slot 19). Chart installs DORMANT — catalyst-api stamps
# Jobs only on operator-driven cutover trigger.
- 06a-bp-self-sovereign-cutover.yaml
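# Hypothetical sketch of how a slot pins its install order regardless
# of its position in this list, using a Flux HelmRelease
# `spec.dependsOn`. The dependency names bp-gitea and bp-harbor come
# from the comment above. The release name, namespace, chart name, and
# HelmRepository source are assumptions.
#
#   apiVersion: helm.toolkit.fluxcd.io/v2
#   kind: HelmRelease
#   metadata:
#     name: bp-self-sovereign-cutover   # assumed, derived from filename
#     namespace: flux-system            # assumed
#   spec:
#     interval: 15m
#     dependsOn:
#       - name: bp-gitea    # slot 10
#       - name: bp-harbor   # slot 19
#     chart:
#       spec:
#         chart: bp-self-sovereign-cutover
#         sourceRef:
#           kind: HelmRepository
#           name: openova             # assumed source name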
- 20-opentelemetry.yaml
- 21-alloy.yaml
- 22-loki.yaml
- 23-mimir.yaml
- 24-tempo.yaml
- 25-grafana.yaml
- 27-kyverno.yaml
- 28-reloader.yaml
- 29-vpa.yaml
- 30-trivy.yaml
- 31-falco.yaml
- 32-sigstore.yaml
- 33-syft-grype.yaml
- 34-velero.yaml
- 35-coraza.yaml
- 49-bp-cert-manager-powerdns-webhook.yaml
- 50-cluster-autoscaler.yaml
# qa-loop iter-7 Fix #39 — exec fan-out (Apache Guacamole + per-node
# k8s-ws-proxy DaemonSet). Slots 51/52. Slots 36-48 reserved for the
# W2.K4 AI-runtime cohort (bp-stunner / bp-knative / bp-kserve / vllm
# / bp-llm-gateway / etc.) — see scripts/expected-bootstrap-deps.yaml.
# The k8s-ws-proxy is the apiserver-side proxy. bp-guacamole is the
# operator-facing browser gateway that reaches it through the egress
# rule in the chart's NetworkPolicy. Both are dependsOn-ordered, so
# Flux installs the proxy before the gateway.
- 51-bp-k8s-ws-proxy.yaml
- 52-bp-guacamole.yaml
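# Sketch only: one way a "gateway may reach the proxy" egress rule can
# look as a plain Kubernetes NetworkPolicy. The namespaces, pod labels,
# and port are hypothetical, not taken from either chart. Once a pod
# is selected for Egress, all other egress is denied. A real policy
# therefore also needs a rule allowing DNS.
#
#   apiVersion: networking.k8s.io/v1
#   kind: NetworkPolicy
#   metadata:
#     name: guacamole-to-k8s-ws-proxy   # hypothetical
#     namespace: guacamole              # hypothetical
#   spec:
#     podSelector:
#       matchLabels:
#         app.kubernetes.io/name: guacamole
#     policyTypes:
#       - Egress
#     egress:
#       - to:
#           # namespaceSelector + podSelector in one entry = both must match
#           - namespaceSelector:
#               matchLabels:
#                 kubernetes.io/metadata.name: k8s-ws-proxy
#             podSelector:
#               matchLabels:
#                 app.kubernetes.io/name: k8s-ws-proxy
#         ports:
#           - protocol: TCP
#             port: 8080                # hypothetical proxy port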
# qa-loop iter-12 Fix #53C — EPIC-5 leftovers (NetBird zero-trust mesh
# + DMZ vCluster isolation). Slots 53/54. Both default-OFF; flip on
# via NETBIRD_ENABLED=true / DMZ_VCLUSTER_ENABLED=true in the
# bootstrap-kit Kustomization's postBuild substitute.
#
# Slot 54 (bp-dmz-vcluster) implements docs/SOVEREIGN-MULTI-REGION-DOD.md
# A4 ("each region runs a DMZ vCluster") and A2 ("inter-region link =
# DMZ WireGuard over PUBLIC IPs"). Default-ON because the DMZ vCluster
# is both the public-fronted vCluster AND the inter-region WireGuard
# hop. Every region needs it for the topology to converge.
- 54-bp-dmz-vcluster.yaml
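# Sketch of the flip-on mechanism described above. It assumes the Flux
# Kustomization that applies this directory is named bootstrap-kit in
# flux-system. Fields other than postBuild are omitted. Substitute
# values are strings and are applied to the rendered resources after
# the kustomize build.
#
#   apiVersion: kustomize.toolkit.fluxcd.io/v1
#   kind: Kustomization
#   metadata:
#     name: bootstrap-kit
#     namespace: flux-system
#   spec:
#     postBuild:
#       substitute:
#         NETBIRD_ENABLED: "true"
#         DMZ_VCLUSTER_ENABLED: "true"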
# qa-loop iter-12 Fix #54 Workstream 1: bp-hcloud-ccm (slot 55), the
# Hetzner Cloud Controller Manager. The CCM owns node providerID flips
# (k3s://… → hcloud://<server-id>) AND the materialisation of Services
# of type LoadBalancer as Hetzner Cloud LBs. Without it, every
# LoadBalancer-typed Service stays Pending. That was the proximate
# root cause of clustermesh-apiserver being unable to migrate from
# NodePort to LB on omantel multi-region (qa-loop iter-12 Fix #53D).
- 55-bp-hcloud-ccm.yaml
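# Minimal sketch of the kind of Service the CCM materialises. Without
# the CCM, the Service below keeps EXTERNAL-IP <pending>. With it, the
# CCM provisions a Hetzner Cloud LB. In practice the Cilium chart
# renders the clustermesh-apiserver Service; it is shown standalone
# here for illustration. The selector, port, and location value are
# assumptions to verify against the installed Cilium and CCM versions.
#
#   apiVersion: v1
#   kind: Service
#   metadata:
#     name: clustermesh-apiserver
#     namespace: kube-system
#     annotations:
#       load-balancer.hetzner.cloud/location: fsn1   # example location
#   spec:
#     type: LoadBalancer
#     selector:
#       k8s-app: clustermesh-apiserver
#     ports:
#       - port: 2379
#         targetPort: 2379
#         protocol: TCP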
# OpenovaFlow observability cohort — slots 56/57. Three-agent split
# (Agent #1: TS @openova/flow-core + @openova/flow-canvas, Agent #2:
# Go server + flux adapter, Agent #3: bootstrap-kit + catalyst-api
# proxy integration). Slot 56 (server) installs on PRIMARY clusters
# only; a per-Sovereign overlay disables it on secondaries. Slot 57
# (emitter) is installed on every cluster (mother + every Sovereign +
# every secondary region) and runs as a DaemonSet, so each region's
# Flux events land in the same per-deployment flow.
- 56-bp-openova-flow-server.yaml
- 57-bp-openova-flow-emitter.yaml
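# Hypothetical shape of a per-Sovereign overlay that keeps slot 56 off
# secondaries. A kustomize strategic-merge patch with `$patch: delete`
# drops the HelmRelease from the rendered output. The overlay layout
# and the HelmRelease name and namespace are assumptions. The real
# overlay may disable the slot differently.
#
#   apiVersion: kustomize.config.k8s.io/v1beta1
#   kind: Kustomization
#   resources:
#     - ../../_template/bootstrap-kit      # assumed overlay layout
#   patches:
#     - patch: |-
#         $patch: delete
#         apiVersion: helm.toolkit.fluxcd.io/v2
#         kind: HelmRelease
#         metadata:
#           name: bp-openova-flow-server   # assumed release name
#           namespace: flux-system         # assumed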
# DoD A4 vCluster topology (2026-05-16) — slots 58 + 59 finish the
# primary-mgmt + secondary-rtz pair that goes alongside the slot 54
# DMZ vCluster (every region). Combined topology per region:
# primary region → MGMT (58) + DMZ (54) vCluster
# secondary region → DMZ (54) + RTZ (59) vCluster
# Slot 58 stays default-OFF until a follow-up PR adds the per-CP
# postBuild substitute MGMT_VCLUSTER_ENABLED on primaries only. Slot 59
# follows the same pattern on secondaries via RTZ_VCLUSTER_ENABLED.
# See each slot's header comment for the migration plan.
- 58-bp-mgmt-vcluster.yaml
- 59-bp-rtz-vcluster.yaml
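# Hypothetical consumer side of the per-CP gate. A slot file can read
# the substitute with a default, using Flux postBuild syntax
# `${VAR:=default}`. The release then stays OFF wherever
# MGMT_VCLUSTER_ENABLED is not set. The `enabled` values key, the
# namespace, and the HelmRepository source are assumptions about the
# chart, not taken from it.
#
#   apiVersion: helm.toolkit.fluxcd.io/v2
#   kind: HelmRelease
#   metadata:
#     name: bp-mgmt-vcluster
#     namespace: flux-system          # assumed
#   spec:
#     interval: 15m
#     chart:
#       spec:
#         chart: bp-mgmt-vcluster
#         sourceRef:
#           kind: HelmRepository
#           name: openova             # assumed source name
#     values:
#       enabled: ${MGMT_VCLUSTER_ENABLED:=false}   # hypothetical key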
# bp-newapi (slot 80) — multi-tenant LLM marketplace gateway. Sequenced
# after the W2.K1 dependency wave (cnpg/keycloak/openbao Ready) so
# NewAPI's ExternalSecret + DSN dependencies resolve on first reconcile.
# See clusters/_template/bootstrap-kit/80-newapi.yaml for full
# dependsOn rationale and per-Sovereign override surface.
- 80-newapi.yaml
# bp-stalwart-sovereign (slot 95) — REMOVED 2026-05-05.
# Phase-2 Sovereign-local mail (per-Sovereign Stalwart for Console
# PIN/magic-link delivery, umbrella #924) is OUT OF SCOPE for the
# current Phase-1 cutover. The Phase-1 design is mothership SMTP
# relay (mail.openova.io:587) — see products/catalyst/chart/values.yaml
# `sovereign.smtp.*` and the catalyst-api `sovereign_smtp_seed.go`
# path. The chart's post-install Job was timing out on otech113 and
# blocking the bootstrap-kit Kustomization. Re-introduce this slot
# only when Phase-2 is explicitly in scope and the chart's readiness
# gate is reliable. See platform/stalwart-sovereign/ for the chart
# itself (kept in-tree for future Phase-2 work).