openova

Author	SHA1	Message	Date
e3mrah	2ff50f0591	fix(bp-newapi+services-build): imagePullSecrets on Pod, sed bumps values.yaml smeTag (#955) Two SME-blocker bugs caught live on otech113 (alice signup gate 5 fails on fresh Sovereign): #952: bp-newapi 1.4.0 Pod has no imagePullSecrets, so kubelet pulls PRIVATE ghcr.io/openova-io/openova/{newapi-mirror,services-metering-sidecar} anonymously and gets 403 Forbidden. Fix: - Templatize spec.imagePullSecrets on Deployment + channel-seed Job. - Default values.yaml `imagePullSecrets: [{name: ghcr-pull}]`. - Add `newapi` to flux-system/ghcr-pull's reflector reflection-{allowed,auto}-namespaces in cloudinit-control-plane.tftpl so bp-reflector mirrors the source Secret into the namespace automatically on every fresh Sovereign. - Bump bp-newapi 1.4.0 -> 1.4.1, update _template overlay. #953: services-build.yaml's image-rewrite loop only matched the hardcoded `image: ghcr.io/.../services-<svc>:<sha>` form. 7 of 8 sme-services templates use `image: "{{ ... }}/services-<svc>:{{ .Values.images.smeTag }}"`. Each services-build run bumped only auth.yaml while reporting "update sme service images to ${SHA}", leaving the live Pod on stale bytes (PR #951's #941 fix never reached services-catalog despite the merge + chart bump chain). Fix: - After the hardcoded loop, also bump `images.smeTag` in products/catalyst/chart/values.yaml with a strict regex match (`^ smeTag: "<sha>"$`); refuse to auto-bump if the line shape changes (defends against silent drift if a contributor renames the field). - Mirror the change into the retry-path `rewrite()` function so a reset-to-origin/main retry does not recreate the original bug. Tests: - platform/newapi/chart/tests/imagepullsecrets-render.sh: 4 cases asserting the Deployment and channel-seed Job carry the default ghcr-pull reference, that an empty override suppresses the block, and that custom secret names propagate (Inviolable Principle #4). - tests/integration/services-build-rewrite.sh: 3 cases reproducing the workflow's rewrite logic on a sandboxed copy of the live chart, asserting both auth.yaml's hardcoded line AND values.yaml's smeTag get bumped, that helm-render of the catalyst chart with the bumped values produces all 8 SME-service Deployments at the new SHA, and that an idempotent re-bump to a second SHA also lands cleanly. Refs: #952 #953 (umbrella #915, alice signup gate 5). Co-authored-by: hatiyildiz <143030955+hatiyildiz@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 15:47:37 +04:00
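A minimal Go sketch of the strict smeTag bump described in #953, assuming a values.yaml line of the form `smeTag: "<sha>"` under some indentation and a 7 to 40 character lowercase hex SHA. The workflow itself does this in shell; `bumpSMETag` is a hypothetical name, and the block only illustrates the refuse-on-shape-change rule, not the workflow's code.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// smeTagLine matches an indented `smeTag: "<sha>"` line and nothing else.
// Assumption: any leading whitespace and a 7-40 char lowercase hex SHA.
var smeTagLine = regexp.MustCompile(`^(\s+)smeTag: "[0-9a-f]{7,40}"$`)

var hexSHA = regexp.MustCompile(`^[0-9a-f]{7,40}$`)

// bumpSMETag rewrites the single smeTag line of a values.yaml body to newSHA.
// It refuses to touch the file unless exactly one line has the expected shape,
// so a renamed or reshaped field fails loudly instead of drifting silently.
func bumpSMETag(values, newSHA string) (string, error) {
	if !hexSHA.MatchString(newSHA) {
		return "", fmt.Errorf("refusing to bump: %q is not a hex SHA", newSHA)
	}
	lines := strings.Split(values, "\n")
	hits := 0
	for i, line := range lines {
		if m := smeTagLine.FindStringSubmatch(line); m != nil {
			lines[i] = m[1] + `smeTag: "` + newSHA + `"`
			hits++
		}
	}
	if hits != 1 {
		return "", fmt.Errorf("refusing to bump: expected exactly 1 smeTag line, found %d", hits)
	}
	return strings.Join(lines, "\n"), nil
}

func main() {
	values := "images:\n  smeTag: \"2ff50f0591\"\n"
	// Bumping twice mirrors the idempotent re-bump case: each run only needs
	// the previous output to still have the expected line shape.
	for _, sha := range []string{"b5c9839da7", "1e7d1e67c9"} {
		out, err := bumpSMETag(values, sha)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		values = out
	}
	fmt.Print(values)
}
```

Because the function refuses unless exactly one line matches, a renamed field (for example `smeTagV2`) produces an error instead of a silent no-op.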
e3mrah	b5c9839da7	feat(phase-8b): sovereign wizard auth-gate + handover JWT minting + Playwright CI fixes (#611) Squash of PR #611 (feat/607) + PR #615 (feat/605) Phase-8b deliverables: UI: - AuthCallbackPage: mode-aware dispatch (catalyst-zero → magic-link server callback; sovereign → client-side OIDC token exchange via oidc.ts) - Router: sovereign console routes (/console/), DETECTED_MODE index redirect, authCallbackRoute dedup fix, authHandoverRoute safety net - StepSuccess: mints RS256 handover JWT via POST /deployments/{id}/mint-handover-token before redirecting operator to Sovereign console (falls back to plain URL on error) API: - main.go: wires handoverjwt.LoadOrGenerate signer from CATALYST_HANDOVER_KEY_PATH env - deployments.go: stamps HandoverJWTPublicKey from signer.PublicJWK() at create time - provisioner.go: injects HandoverJWTPublicKey into Tofu vars JSON - auth.go: /auth/handover endpoint for seamless single-identity flow Infra: - cloudinit-control-plane.tftpl: writes handover JWT public JWK to /var/lib/catalyst/ - variables.tf: handover_jwt_public_key variable (sensitive, default empty) Chart: - api-deployment.yaml / ui-deployment.yaml / values.yaml: expose handover JWT env vars Playwright CI fixes: - playwright-smoke.yaml / cosmetic-guards.yaml: health-check URL /sovereign/wizard → /wizard - playwright.config.ts: BASEPATH default /sovereign → / + baseURL construction fix - cosmetic-guards.spec.ts: provision URL /sovereign/provision/ → /provision/* - sovereign-wizard.spec.ts: WIZARD_URL /sovereign/wizard → /wizard Closes #605, #606, #607. Fixes Playwright CI (#142 sovereign wizard smoke tests). Co-authored-by: e3mrah <e3mrah@openova.io>	2026-05-02 19:17:56 +04:00
e3mrah	1e7d1e67c9	test(e2e): omantel handover Playwright scaffold for Phase 8 (closes #429) (#432) Phase 8 of the omantel handover (#369) needs an automated E2E that proves DoD: omantel.omani.works runs as a fully self-sufficient Sovereign with zero contabo dependency post-handover. Today this is a SCAFFOLD; when Phase 4/6/7 land, dispatching the new workflow against a live omantel is the entire Phase 8. Canonical seam (anti-duplication, per memory/feedback_anti_duplication_seam_first.md): - tests/e2e/playwright/tests/ ← mirror of sovereign-wizard.spec.ts shape (NOT specs/ as the issue body said; the actual repo path is tests/) - tests/e2e/playwright/playwright.config.ts (BASE_URL handling, retries, workers=1, reporter=list): reused as-is - tests/e2e/playwright/tests/_helpers.ts:reachable(): reused for the pre-flight skip-when-unreachable pattern - .github/workflows/playwright-smoke.yaml: workflow shape (checkout v4, setup-node v4, npm install, playwright install --with-deps chromium, upload-artifact on failure), mirrored, NOT duplicated What ships: - tests/e2e/playwright/tests/omantel-handover.spec.ts (NEW, 6 tests): 1. sovereign Ready + 23/23 blueprints 2. all bp-* HelmReleases Ready=True 3. catalyst-platform self-hosts (healthz + dashboard "23 / 23 ready") 4. vendor-agnostic Object Storage (post-#425 canonical secret name flux-system/object-storage, NOT hetzner-object-storage) 5. dig +trace omantel.omani.works ends at omantel NS, not contabo 6. zero contabo dependency (omantel /api/healthz keeps returning 200) Self-skips when OMANTEL_BASE_URL/OMANTEL_API_BASE/OPERATOR_BEARER unset. - .github/workflows/omantel-e2e-handover.yaml (NEW): workflow_dispatch ONLY (no schedule cron, per CLAUDE.md "every workflow MUST be event-driven, NEVER scheduled"). Inputs let the operator override base URLs at dispatch time. - docs/omantel-handover-wbs.md: new §10 "Phase 8 acceptance criteria (executable DoD)", 6 bullets 1:1 with the spec test() blocks; §9 status row added for #429 (🟢 scaffold-shipped). Local verification: cd tests/e2e/playwright && npm install && \ npx playwright test --list tests/omantel-handover.spec.ts → 6 tests listed cleanly npx playwright test tests/omantel-handover.spec.ts → 6 skipped (env vars unset, expected) Out of scope (per #425 / #428 territory split): - internal/hetzner/, infra/hetzner/, platform/velero/chart/, clusters/.../34-velero.yaml: #425's vendor-agnostic sweep - .github/workflows/check-vendor-coupling.yaml: #428's coupling guard Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>	2026-05-01 17:52:18 +04:00
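The spec is TypeScript, but the self-skip gate it relies on is language-neutral. Sketched in Go to match the repo's other e2e harnesses, and assuming that an empty value counts as unset (the commit does not say), the gate reduces to collecting the missing variables and returning an explicit reason. `skipReason` is a hypothetical helper; the variable names are the three from the spec.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// requiredEnv lists the variables the omantel handover spec needs.
var requiredEnv = []string{"OMANTEL_BASE_URL", "OMANTEL_API_BASE", "OPERATOR_BEARER"}

// skipReason returns "" when every required variable is set and non-empty,
// otherwise an explicit reason naming the missing ones. lookup is injected so
// the gate can be exercised without touching the real environment.
func skipReason(lookup func(string) (string, bool), names []string) string {
	var missing []string
	for _, name := range names {
		if v, ok := lookup(name); !ok || strings.TrimSpace(v) == "" {
			missing = append(missing, name)
		}
	}
	if len(missing) == 0 {
		return ""
	}
	return "skipping: missing " + strings.Join(missing, ", ")
}

func main() {
	if reason := skipReason(os.LookupEnv, requiredEnv); reason != "" {
		fmt.Println(reason)
		return
	}
	fmt.Println("all handover env vars set; running live checks")
}
```

Returning a reason string rather than a bool keeps the skip message explicit, which is what lets a run with unset variables report "6 skipped" instead of failing.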
hatiyildiz	4f56ae47da	fix(cloudinit): keep k3s local-path-provisioner; mark StorageClass default before Flux runs Pre-fix the cloud-init template passed --disable=local-storage to the k3s installer with the design intent that Crossplane would install hcloud-csi day-2 and register a StorageClass after bp-crossplane reconciled. That created a circular dependency on a fresh Sovereign: every PVC-using HelmRelease in the bootstrap-kit (bp-spire, bp-keycloak postgres, bp-openbao, bp-nats-jetstream, bp-gitea, bp-catalyst-platform postgres) blocks Pending on a StorageClass that would only exist after bp-crossplane finished installing, but they ARE in the bootstrap-kit Kustomization that needs to converge before the day-2 path runs. Verified live on omantel.omani.works: data-keycloak-postgresql-0 and spire-data-spire-server-0 both stuck Pending for 20+ min with `no persistent volumes available for this claim and no storage class is set`, `kubectl get sc` empty. This change: 1. Drops --disable=local-storage from INSTALL_K3S_EXEC so k3s ships its built-in local-path-provisioner and registers the `local-path` StorageClass on first boot. 2. Adds a runcmd block AFTER /healthz wait and BEFORE the Flux bootstrap apply that: a. waits for the local-path-provisioner pod Ready b. patches the local-path SC with is-default-class=true c. fails loudly if the SC is missing post-wait (safety gate so a broken cluster doesn't fall through to Flux silently) 3. Adds tests/integration/storageclass.sh: phase 1 render-assertion (regression gate against re-introducing --disable=local-storage, plus positive assertions that the wait/patch/verify steps are present, plus ordering check that the patch precedes the Flux apply); phase 2 kind-cluster proof that a fresh cluster has a default StorageClass that binds a test PVC. 4. Adds docs/RUNBOOK-PROVISIONING.md §"StorageClass missing": symptom, root cause, and the live-cluster recovery path (apply local-path-storage.yaml + patch default class) for already-provisioned Sovereigns that hit this without reprovisioning. Trade-off: local-path PVs are node-pinned. For the solo-Sovereign target (single CPX21/CPX31 control-plane node) that is the correct shape: the data lives on the node, capacity is bounded by the disk, and there are no other nodes for volumes to migrate to. Operators upgrading to multi-node migrate to hcloud-csi (Hetzner Cloud Volumes) as a separate, deliberate operation; that is not part of the cloud-init bootstrap. Live verification on omantel.omani.works (reproduces the production symptom + proves the recovery path): Before: NAMESPACE NAME STATUS AGE keycloak data-keycloak-postgresql-0 Pending 10m spire-system spire-data-spire-server-0 Pending 10m No StorageClass. After (kubectl apply local-path-storage.yaml + patch): NAME PROVISIONER ... AGE local-path (default) rancher.io/local-path ... 34s NAMESPACE NAME STATUS STORAGECLASS keycloak data-keycloak-postgresql-0 Bound local-path spire-system spire-data-spire-server-0 Bound local-path Gates: - tofu validate: Success! The configuration is valid. - tests/integration/storageclass.sh: PASS (phase 1 render-assertion + phase 2 fresh kind cluster default StorageClass binds test PVC). - Regression sanity: re-injecting --disable=local-storage causes phase 1 to FAIL with the documented error message (verified). Preserves the cloud-init Cilium-pre-Flux ordering (no changes to that block); the StorageClass setup runs between healthz-wait and the Flux bootstrap apply so the bootstrap-kit Kustomization sees a default class on its first reconciliation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 19:43:09 +02:00
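Phase 1 of tests/integration/storageclass.sh is a render assertion. A Go sketch of the same three checks follows, run against a made-up rendered fixture: the `kubectl` lines in `main` and the `flux-bootstrap.yaml` marker are assumptions about the template's contents, while `storageclass.kubernetes.io/is-default-class` is the standard Kubernetes annotation that marks a default StorageClass. `checkRenderedCloudInit` is a hypothetical name.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// defaultClassPatch is the merge patch that marks a StorageClass as default.
const defaultClassPatch = `{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}`

// checkRenderedCloudInit forbids the old k3s flag, requires the default-class
// patch, and requires that patch to appear before the Flux bootstrap apply.
// fluxMarker is a hypothetical substring identifying that apply step.
func checkRenderedCloudInit(rendered, fluxMarker string) error {
	if strings.Contains(rendered, "--disable=local-storage") {
		return errors.New("regression: k3s installed with --disable=local-storage")
	}
	patchAt := strings.Index(rendered, "storageclass.kubernetes.io/is-default-class")
	if patchAt < 0 {
		return errors.New("missing default StorageClass patch")
	}
	fluxAt := strings.Index(rendered, fluxMarker)
	if fluxAt < 0 {
		return fmt.Errorf("missing Flux bootstrap step %q", fluxMarker)
	}
	if patchAt > fluxAt {
		return errors.New("default StorageClass patch runs after the Flux bootstrap apply")
	}
	return nil
}

func main() {
	// Made-up rendered fragment; the real template's commands may differ.
	rendered := strings.Join([]string{
		"runcmd:",
		"  - kubectl -n kube-system wait --for=condition=Ready pod -l app=local-path-provisioner --timeout=300s",
		"  - kubectl patch storageclass local-path -p '" + defaultClassPatch + "'",
		"  - kubectl apply -f /var/lib/catalyst/flux-bootstrap.yaml",
	}, "\n")
	fmt.Println(checkRenderedCloudInit(rendered, "flux-bootstrap.yaml")) // <nil>
}
```

Checking string positions rather than just presence is what catches the ordering regression, where the patch exists but only runs after Flux has already tried to reconcile PVC-using HelmReleases.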
hatiyildiz	1e1c8f5c39	merge: cloud-init creates ghcr-pull secret durable + GHCR token pipeline	2026-04-29 18:08:32 +02:00
hatiyildiz	dddbab4b80	fix(cloudinit): create flux-system/ghcr-pull secret on Sovereign so private bp-* charts pull cleanly Every bootstrap-kit HelmRepository CR carries `secretRef: name: ghcr-pull` because bp-* OCI artifacts at ghcr.io/openova-io/ are private. Cloud-init never created the Secret, so every fresh Sovereign's source-controller logs `secrets "ghcr-pull" not found` and Phase 1 stalls at bp-cilium. The operator workaround (kubectl apply by hand) is not durable across reprovisioning. Verified live on omantel.omani.works pre-fix. Changes: - provisioner.Request gains GHCRPullToken (json:"-") so it is never serialized into persisted deployment records. provisioner.New() reads CATALYST_GHCR_PULL_TOKEN at startup; Provision() stamps it onto the Request before tofu.auto.tfvars.json. Validate() rejects empty for domain_mode=pool with a pointer to docs/SECRET-ROTATION.md. - handler.CreateDeployment also stamps the env var onto the Request so the synchronous validation path returns 400 early on misconfiguration. - infra/hetzner: variables.tf adds ghcr_pull_token (sensitive=true, default=""). main.tf computes ghcr_pull_username + ghcr_pull_auth_b64 locals and passes both to templatefile(). cloudinit-control-plane.tftpl emits a kubernetes.io/dockerconfigjson Secret manifest into /var/lib/catalyst/ghcr-pull-secret.yaml; runcmd applies it AFTER Flux core install but BEFORE flux-bootstrap.yaml so the GitRepository + Kustomization land into a cluster that already has working GHCR creds. - products/catalyst/chart/templates/api-deployment.yaml mounts CATALYST_GHCR_PULL_TOKEN from the catalyst-ghcr-pull-token Secret in the catalyst namespace (key: token, optional: true so the Pod still starts on misconfigured installs and Validate() owns the gate). - docs/SECRET-ROTATION.md: yearly-rotation runbook for the GHCR token, Hetzner per-Sovereign tokens, and the Dynadot pool-domain creds. Includes the kubectl create secret one-liner with <GHCR_PULL_TOKEN> placeholder; the token never lives in git. - Tests: provisioner unit tests cover New() reading the env var, tolerance of missing env, pool-mode validation rejection with operator-facing error, BYO acceptance, and the json:"-" serialization invariant. tests/e2e/hetzner-provisioning gains a TestCloudInit_RendersGHCRPullSecret render-only integration test that asserts the rendered cloud-init contains the Secret, applies it before flux-bootstrap, and that the dockerconfigjson round-trips the sample token through templatefile() correctly. Existing pool-mode handler tests now t.Setenv the placeholder token; the on-disk redaction test asserts the placeholder never reaches disk. Gates: - go vet ./... and go test -race -count=1 ./... in products/catalyst/bootstrap/api: PASS. - helm lint products/catalyst/chart: PASS (warnings pre-existing). - tofu fmt + tofu validate: deferred to CI (no tofu binary on the development host). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 18:07:27 +02:00
hatiyildiz	015e7ab18b	fix(catalyst-chart): annotate api-deployment for Flux strategy-flip recovery DIVERGES from the literal "$patch: replace" prescription on the issue because that directive cannot survive any apply path that actually runs in production (verified end-to-end in tests/integration/strategy-flip.sh): - Flux's kustomize-controller submits via Server-Side Apply. SSA rejects `.spec.strategy.$patch` with "field not declared in schema"; fluxcd/pkg/ssa Manager.Apply does not preprocess SMP directives. - kubectl strict-decoding rejects `$patch` on every CREATE path (`kubectl create`, `kubectl apply` to an empty namespace, every `--server-side` flavor) with "unknown field spec.strategy.$patch"; adding it to a chart base resource BREAKS fresh installs of every new Sovereign. The durable fix is the documented Flux annotation `kustomize.toolkit.fluxcd.io/force: enabled` on the Deployment. When kustomize-controller's SSA dry-run fails Invalid (the contabo-mkt failure mode: `spec.strategy.rollingUpdate: Forbidden` on the post-merge object that retained `rollingUpdate.maxSurge=25%` / `maxUnavailable=25%` from the prior `kubectl-client-side-apply` field manager), the controller falls back to delete-and-recreate THIS resource. The recreated Deployment carries no residual `rollingUpdate.*` fields, so the regression cannot recur. The annotation is IaC, scoped to the Deployment, applies on every reconcile. Verified gates: - `kubectl apply --dry-run=server -f .../api-deployment.yaml` over a Deployment in the bad pre-state (RollingUpdate + maxSurge=25% / maxUnavailable=25%) → exit 0, "deployment.apps/catalyst-api configured (server dry run)". - Same manifest applied to an empty namespace via SSA + CSA → both succeed (the fresh-install gate that catches `$patch:`-shaped regressions). - SSA path correctly REPRODUCES the regression mode (asserted in step 3 of the integration test) → proves the recovery layer is necessary. - Flux force-recovery equivalent (delete + apply) succeeds → proves the recovery path itself works. Files: - products/catalyst/chart/templates/api-deployment.yaml: add `kustomize.toolkit.fluxcd.io/force: enabled` annotation + inline reference comment explaining failure mode and rejecting inline `$patch: replace` as a future regression vector. - docs/CHART-AUTHORING.md (new): authoritative chart-authoring doc, with §"Strategy flips on existing Deployments" anchoring the failure mode + canonical fix + table of related fields (selector, clusterIP, accessModes, etc.) that share the pattern. References docs/INVIOLABLE-PRINCIPLES.md #3 (Flux is the only GitOps reconciler) and #4 (never hardcode runtime knobs in operator runbooks). - tests/integration/strategy-flip.yaml (new): bad-state fixture + assertion ConfigMap. Reproduces the exact 25%/25% pre-state that triggered contabo-mkt. - tests/integration/strategy-flip.sh (new): 6-step runner covering bad-state stage, CSA gate, SSA failure-mode reproduction, structural annotation check, recovery-path proof, fresh-install gate. Exits non-zero on any regression. - .github/workflows/test-strategy-flip.yaml (new): CI wiring on kind v1.30.6 (matches contabo-mkt k3s decoding behavior), triggered by edits to the chart manifest, the test, the doc, or the workflow itself. Sweep of the rest of the Catalyst chart templates: the only `strategy.type: Recreate` Deployment in the chart is catalyst-api. catalyst-ui, marketplace-api, and all 11 sme-services Deployments declare default RollingUpdate and live as RollingUpdate on contabo-mkt; no latent flips. Services use ClusterIP with default IP allocation; the api-deployments PVC is RWO and never re-shaped by the chart. No additional resources needed hardening. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 18:04:07 +02:00
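The structural annotation check (step 4 of the runner) can be expressed as a small Go function. This sketch assumes the manifest has already been rendered to JSON (the real test works on YAML), and `checkStrategyFlipGuard` is a hypothetical name; the annotation key and value are the ones from the commit.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

const forceAnnotation = "kustomize.toolkit.fluxcd.io/force"

// deployment decodes only the fields this check reads.
type deployment struct {
	Kind     string `json:"kind"`
	Metadata struct {
		Annotations map[string]string `json:"annotations"`
	} `json:"metadata"`
	Spec struct {
		Strategy map[string]json.RawMessage `json:"strategy"`
	} `json:"spec"`
}

// checkStrategyFlipGuard asserts that the Deployment carries the Flux force
// annotation and does not carry an inline `$patch` directive, which strict
// decoding rejects on create paths.
func checkStrategyFlipGuard(manifest []byte) error {
	var d deployment
	if err := json.Unmarshal(manifest, &d); err != nil {
		return err
	}
	if d.Kind != "Deployment" {
		return fmt.Errorf("expected kind Deployment, got %q", d.Kind)
	}
	if d.Metadata.Annotations[forceAnnotation] != "enabled" {
		return fmt.Errorf("missing annotation %s: enabled", forceAnnotation)
	}
	if _, ok := d.Spec.Strategy["$patch"]; ok {
		return errors.New("spec.strategy.$patch must not appear in chart manifests")
	}
	return nil
}

func main() {
	manifest := []byte(`{
  "apiVersion": "apps/v1",
  "kind": "Deployment",
  "metadata": {
    "name": "catalyst-api",
    "annotations": {"kustomize.toolkit.fluxcd.io/force": "enabled"}
  },
  "spec": {"strategy": {"type": "Recreate"}}
}`)
	fmt.Println(checkStrategyFlipGuard(manifest)) // <nil>
}
```

Checking both conditions in one place encodes the commit's two-sided rule: the force annotation is required, and the rejected `$patch: replace` approach must not creep back in.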
hatiyildiz	55b8a18b32	test(e2e): #142, #143, #144 - Playwright UI smoke tests for sovereign wizard, admin vouchers, marketplace bp-<x> grid Group L closes the three UI smoke-test gaps the verify-sweep flagged: #142 sovereign wizard: tests/e2e/playwright/tests/sovereign-wizard.spec.ts #143 admin voucher UI: tests/e2e/playwright/tests/admin-vouchers.spec.ts #144 unified bp-<x> grid: tests/e2e/playwright/tests/marketplace-cards.spec.ts Tests target the actual shipped UI shape (Pass 105+): * Wizard step model is StepOrg → StepTopology → StepProvider → StepCredentials → StepComponents → StepReview, not the original ticket's StepDomain/StepHetzner draft from before the unified-Blueprints refactor. * Admin voucher model uses an `active` toggle, not ISSUED/REVOKED status. * "Marketplace card grid" = the Catalyst wizard's StepComponents (bp-<x> Blueprints), NOT the SME marketplace at core/marketplace (which is for SaaS Apps). Today every Blueprint is `visibility: unlisted`, so the test asserts the data layer (catalog.generated.ts) plus the documented EmptyState; once `visibility: listed` lands, the third assertion auto-extends to the rendered card grid. Per principle #4 ("never hardcode"), all URLs come from env vars with sensible local-dev defaults. Per principle #1 ("never speculate"), tests self-skip with explicit reasons when their target app isn't reachable instead of failing noisily. CI: .github/workflows/playwright-smoke.yaml boots the Catalyst UI in the background and runs the suite on PRs touching UI sources or tests; admin and marketplace specs self-skip in that workflow because spinning up all three Astro apps + catalyst-api + Postgres is the full E2E pipeline's job, not this smoke suite's. Local run (Catalyst UI on :4399, admin on :4398): 5 passed, 2 skipped (skip reasons: marketplace #3 needs StepComponents reachable past required-field gating; admin #2 needs ADMIN_TEST_COOKIE for an authenticated session). Refs: #142, #143, #144	2026-04-28 19:54:04 +02:00
hatiyildiz	919514ca78	merge: /sovereign nginx routing — values-driven /sovereign + /api/v1 (`a35da92`)	2026-04-28 19:50:39 +02:00
hatiyildiz	a35da929f1	feat(sovereign-route): values-driven /sovereign + /api/v1 routing Per docs/INVIOLABLE-PRINCIPLES.md #4 (never hardcode), the catalyst-ui nginx config now flows from values.yaml at chart-render time: - routing.basePath (/sovereign) — also drives ingress strip-prefix - routing.catalystApi.serviceDNS — in-cluster reverse-proxy target - routing.catalystApi.port — upstream port - dns.resolverIP — CoreDNS for proxy-time resolution (avoids stale ClusterIP after catalyst-api restarts) - ingress.host / ingress.priority / ingress.className Files: - products/catalyst/chart/values.yaml — new, documents every default - products/catalyst/chart/templates/ui-configmap.yaml — new, nginx reverse-proxies /api/* to catalyst-api Service DNS - products/catalyst/chart/templates/ui-deployment.yaml — mounts the ConfigMap at /etc/nginx/conf.d/default.conf - products/catalyst/chart/templates/ingress.yaml — values-driven host + path + priority + class - tests/e2e/sovereign-routing/* — Playwright smoke for the routing Captured from stalled agent /tmp/agent-sovereign-route-finish — agent stream watchdog timed out after the work was authored but before commit.	2026-04-28 19:48:40 +02:00
hatiyildiz	4554bd6d5d	feat(dod): #149-#157 — Group M DoD scaffolding (DEMO-RUNBOOK + dod_test.go + dod.yaml) Manual-dispatch-only DoD scaffolding for the omantel.omani.works end-to-end test. Operator-gated; the test t.Skip()s when HETZNER_TEST_TOKEN env var is missing so CI stays green. - docs/DEMO-RUNBOOK.md: 9-step operator runbook covering Group C cutover, wizard provision, voucher issuance, tenant redemption. - tests/dod/dod_test.go: HTTP-driven E2E that streams SSE through all 11 phases, asserts cert + DNS + voucher + redemption flow. - .github/workflows/dod.yaml: workflow_dispatch only — never on-push (Hetzner cost gating). Cherry-picked additive files from /tmp/agent-group-m-dod (`a40b495`); the agent's branch had stale-base deletions of #108/#109/Pass-107 that we drop.	2026-04-28 19:34:46 +02:00
hatiyildiz	7c7c46bc62	test: Hetzner Sovereign end-to-end provisioning test (#141) Closes the Group L "end-to-end provisioning test on Hetzner test project" ticket. Per the ticket's exact wording: scaffolding + harness + CI workflow, gated on HETZNER_TEST_TOKEN, NEVER mocked. Lifecycle when HETZNER_TEST_TOKEN is set: 1. Generate unique sovereign FQDN (e2e-<run-id>.openova.io) 2. Stage canonical infra/hetzner/ OpenTofu module into temp dir 3. Render tofu.auto.tfvars.json with test inputs (BYO domain mode so Dynadot isn't touched; region runtime-configurable; SSH key minted by CI per-run) 4. tofu init && tofu apply -auto-approve (30m timeout) 5. Assert outputs: control_plane_ip + load_balancer_ip are valid IPv4 6. Assert TCP/22 reachable on control plane (5m await) 7. Assert TCP/443 reachable on LB after Cilium + Flux land (15m await, soft-failure since the Catalyst control plane install is the long tail and partial-bootstrap is acceptable proof of OpenTofu + Flux) 8. tofu destroy -auto-approve (always: t.Cleanup, runs even on fail) 9. Verify state list is empty after destroy (no leaked resources) When HETZNER_TEST_TOKEN is absent, the test SKIPS; it does not mock and does not fall through to a stub. Per docs/INVIOLABLE-PRINCIPLES.md #2, mocking the cloud would tell us nothing about whether the OpenTofu module, hcloud provider, cloud-init scripts, or k3s actually work. A second test (TestHarness_NoHetznerCredsSkips) explicitly verifies the skip semantics so future refactors don't accidentally land mocking. CI workflow (.github/workflows/test-hetzner-e2e.yaml): - Triggers on workflow_dispatch (operator initiates real run) or PR labeled `test/hetzner-e2e`, NOT on every push (each run costs real Hetzner minutes ~EUR 0.005/run). - Generates a per-run throwaway SSH ed25519 keypair so no long-term secret key lands in any logs. - Installs OpenTofu via opentofu/setup-opentofu@v1. - Reads HETZNER_TEST_TOKEN + HETZNER_TEST_PROJECT_ID from repo secrets; operator populates them out-of-band (per the ticket: "operator will populate later"). - 55m job timeout, plus the test itself uses contexts of 30m apply + 20m destroy. Files: - tests/e2e/hetzner-provisioning/main_test.go (the harness) - tests/e2e/hetzner-provisioning/go.mod (separate module, stdlib-only) - .github/workflows/test-hetzner-e2e.yaml (gated CI) Refs #141 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 14:00:29 +02:00
hatiyildiz	3dced3fdda	test: bootstrap-kit Flux Kustomization integration test (#145) Closes the Group L "integration test - provisioner backend bootstrap-kit installer - all 11 phases install in sequence on a kind cluster" ticket. Per the ticket note, the bootstrap installer is now Flux-driven from clusters/<sovereign-fqdn>/, NOT the bespoke Go-based installer that was reverted in commit `e668637`. The test verifies that Flux reconciles the right Kustomizations rather than that Go code helm-installs anything. Two layers of validation: 1. Static manifest layer (runs on every push, cheap) - All 11 platform/<x>/blueprint.yaml + chart/Chart.yaml exist - Each blueprint.yaml satisfies catalyst.openova.io/v1alpha1 schema (apiVersion/kind/metadata.name/spec.version/card.title/card.summary) - Chart.yaml name matches "bp-<x>" and version matches blueprint.yaml spec.version - clusters/_template/ YAMLs parse after SOVEREIGN_FQDN_PLACEHOLDER substitution (when the template tree is on the branch; the Group J/M ticket lands the per-Sovereign template) - The dependency order matches the canonical 11-phase sequence from SOVEREIGN-PROVISIONING.md §3 (cilium → cert-manager → flux → crossplane → sealed-secrets → spire → nats-jetstream → openbao → keycloak → gitea → bp-catalyst-platform) 2. Kind-cluster layer (runs on main pushes, gated on BOOTSTRAP_KIT_KIND_TEST=1) - Brings up kubernetes-in-docker - Installs Flux CRDs + source/kustomize controllers - Registers a GitRepository pointing at this monorepo - Synthesizes the 11 bootstrap-kit Kustomizations and applies them - Asserts the API server accepts all 11 (manifests are valid, schema satisfied); this is the test's narrow scope per the ticket The test deliberately does NOT wait for the kit to fully install upstream charts or reach steady-state reconciliation. That belongs to #141 (real Hetzner E2E with cloud credentials and outbound network), not a kind cluster test in CI. Files: - tests/e2e/bootstrap-kit/main_test.go (Go test, 11 subtests + 4 main) - tests/e2e/bootstrap-kit/go.mod (separate module; keeps test deps isolated from the production Go modules) - .github/workflows/test-bootstrap-kit.yaml (kind-action + flux2/action) Refs #145 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 13:58:18 +02:00
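The static-layer ordering and Chart.yaml rules can be sketched as plain Go checks. This assumes each phase's Kustomization lists its predecessor in `dependsOn` and that a phase name may or may not already carry the `bp-` prefix; both `checkDependsOn` and `checkChart` are hypothetical helpers, and only the phase list is taken from the commit.

```go
package main

import (
	"fmt"
	"strings"
)

// canonicalPhases is the 11-phase sequence quoted from SOVEREIGN-PROVISIONING.md §3.
var canonicalPhases = []string{
	"cilium", "cert-manager", "flux", "crossplane", "sealed-secrets", "spire",
	"nats-jetstream", "openbao", "keycloak", "gitea", "bp-catalyst-platform",
}

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// checkDependsOn requires every phase after the first to list the phase
// immediately before it in its dependsOn.
func checkDependsOn(deps map[string][]string) error {
	for i := 1; i < len(canonicalPhases); i++ {
		cur, prev := canonicalPhases[i], canonicalPhases[i-1]
		if !contains(deps[cur], prev) {
			return fmt.Errorf("%s must list %s in dependsOn", cur, prev)
		}
	}
	return nil
}

// checkChart enforces name == "bp-<x>" and version == blueprint spec.version.
func checkChart(x, chartName, chartVersion, blueprintVersion string) error {
	want := "bp-" + strings.TrimPrefix(x, "bp-")
	if chartName != want {
		return fmt.Errorf("chart name %q, want %q", chartName, want)
	}
	if chartVersion != blueprintVersion {
		return fmt.Errorf("chart version %s != blueprint spec.version %s", chartVersion, blueprintVersion)
	}
	return nil
}

func main() {
	deps := map[string][]string{}
	for i := 1; i < len(canonicalPhases); i++ {
		deps[canonicalPhases[i]] = []string{canonicalPhases[i-1]}
	}
	fmt.Println(checkDependsOn(deps))                              // <nil>
	fmt.Println(checkChart("cilium", "bp-cilium", "1.0.0", "1.1.0")) // version mismatch error
}
```

Checking adjacent pairs is enough to catch a phase that was reordered or lost its predecessor edge, without having to model the full dependency graph.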

13 Commits