openova

Author	SHA1	Message	Date
e3mrah	c96d7b5089	fix(bp-keycloak): retune install retries to fit HR envelope (#146) (#1352) Diagnostic finding: prov #21 + #22 hung 30+ min on bp-keycloak install. Three coupled root causes from Fix #140 retune: 1. availabilityCheck.timeout=900s meant a single failed availability attempt busted the 30m HR window before Job-level backoff retried. 2. HR install/upgrade.remediation.retries=3 triggered up to 3 full Helm uninstall+reinstall cycles, losing Liquibase state each time. Worst case: 90m+ wall-clock before Flux gave up. 3. Liquibase + JVM cold-start legitimately took 5-10 min before Keycloak's Service had Endpoints, but bitnami's livenessProbe (initialDelaySeconds 300) killed Pods mid-migration. Three target-state changes (chart 1.4.4 -> 1.4.5): - platform/keycloak/chart/values.yaml: availabilityCheck.timeout 900s -> 300s. ~5 attempts fit in 30m HR envelope vs. ~1.5 at 900s. - platform/keycloak/chart/values.yaml: keycloak.startupProbe.enabled with failureThreshold 360 x periodSeconds 5 = 30m budget. Suspends livenessProbe until first /realms/master 200. livenessProbe.initialDelaySeconds 300 -> 60. - clusters/_template/bootstrap-kit/09-keycloak.yaml: install/upgrade.remediation.retries 3 -> 1 + chart pin 1.4.5. Job's own backoffLimit=5 handles retries without losing state. All knobs remain operator-overridable via per-Sovereign overlay valuesFrom (Inviolable Principle #4: no hardcoding). TODO follow-up (out of scope per diagnostic "ship knob bumps first to validate hypothesis"): move realm-import out of the bitnami post-install Helm hook into a Catalyst-owned Job that runs after Keycloak Service has Endpoints. Decouples HR-Ready from realm-imported and lets the orchestrator wait on the Job CR directly. Refs #146. Co-authored-by: e3mrah <1234567+e3mrah@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 07:37:14 +04:00
e3mrah	6662d672d3	fix(bp-keycloak): post-upgrade hook regression on prov #21 (#140) (#1349) Fix #129 (1.4.3) set availabilityCheck.timeout=600s + backoffLimit=5, which works for fresh-install but races on UPGRADE. On prov #21 (f84f6c3ff2b60296, 2026-05-11) chart-roll on contabo (PR #1346/#1347 stack) triggered an HR upgrade; the keycloak StatefulSet rolled, Helm fired the post-upgrade hook before the new Pod's admin endpoint recovered (Liquibase re-validation + JVM cold start on freshly-provisioned node), and the inner 600s window expired before the first attempt found Keycloak Ready. With backoffLimit=5 + 10s..6m exponential backoff the worst-case wall clock exceeds the parent HR's 15m upgrade.timeout -> Helm aborts the hook -> "post-upgrade hooks failed". Target-state fix (Principle #3: no half-fix; both chart and HR move together): - platform/keycloak/chart/values.yaml: availabilityCheck.timeout 600s -> 900s (15m inner wait covers a single rolling-restart + Liquibase cycle without retry, eliminating most backoff time); cleanupAfterFinished.enabled true with 1h TTL so stale hook Pods don't race the before-hook-creation delete on subsequent upgrades. - platform/keycloak/chart/Chart.yaml: 1.4.3 -> 1.4.4 + 1.4.4 changelog block. - clusters/_template/bootstrap-kit/09-keycloak.yaml: HR install + upgrade timeout 15m -> 30m so Helm's outer hook-wait gracefully accommodates the inner 15m availability window plus normal backoff. Pin chart version 1.4.3 -> 1.4.4. All knobs remain operator-overridable via per-Sovereign valuesFrom (Principle #4: no hardcoding). Hook semantics stay intact (Principle #3: workaround would be disabling annotations, which breaks downstream bp-gitea + catalyst-api contract that the realm exists before HR Ready). Co-authored-by: e3mrah <1234567+e3mrah@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 06:33:57 +04:00
e3mrah	a3005ead67	fix(bp-keycloak): bump keycloak-config-cli hook timeouts (#129 ) (#1341 ) Fresh-Sovereign provision #15 (otech 0ad3687ddd72deb7) wedged at phase1-watching for 30+ min: bp-keycloak HelmRelease failed with `post-upgrade hooks failed: timed out waiting for the condition` → bp-gitea (dependsOn keycloak OIDC) blocked → bp-self-sovereign-cutover never converged. Root cause ────────── The bitnami keycloak subchart's `keycloak-config-cli-job.yaml` is rendered as a Helm post-install/post-upgrade/post-rollback hook (default annotations on the Job, weight 5). On a fresh k3s the realm-import Job fires before Postgres+Liquibase finish bootstrapping Keycloak (legitimately 3-10 min), and the bitnami subchart defaults are too tight to absorb that race: - keycloakConfigCli.availabilityCheck.timeout="" → keycloak-config-cli falls back to its internal ~120s wait for Keycloak's /admin endpoint - keycloakConfigCli.backoffLimit: 1 → only 2 Pod attempts total before the Job is marked Failed Both attempts hit the 120s window, Job goes Failed, Helm reports the post-upgrade hook timed out, HR install/upgrade retries (×3) all hit the same race, HR remains Failed → downstream blueprints never install. Fix ─── Tune the hook's internal timing to fit comfortably inside the parent HR's 15m install/upgrade timeout while leaving headroom for cold image pull + Pod scheduling: keycloak.keycloakConfigCli.availabilityCheck.timeout: "600s" (was "") keycloak.keycloakConfigCli.backoffLimit: 5 (was 1) Both knobs remain operator-overridable via per-Sovereign `valuesFrom` (Inviolable Principle #4: no hardcoding). Per Inviolable Principle #3 (no workarounds), this does NOT disable the hook semantics — disabling the hook would break the documented contract that the realm exists before the HR reaches Ready (downstream bp-gitea + catalyst-api consume the realm). 
Files ───── platform/keycloak/chart/values.yaml (+59 inline rationale) platform/keycloak/chart/Chart.yaml (1.4.2 → 1.4.3 + changelog) clusters/_template/bootstrap-kit/09-keycloak.yaml (HR pin → 1.4.3) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 02:51:21 +04:00
e3mrah	2ef01849bf	fix(bp-keycloak): truncate catalyst-api-server desc <255 chars (1.4.2 backport) (#1285 ) * fix(bp-keycloak): truncate catalyst-api-server description <255 chars (Postgres limit) Keycloak DB column CLIENT.DESCRIPTION = varchar(255). Previous value was 458 chars, causing realm-config-cli post-install hook to fail with PSQLException value too long. Caught on omantel provision #6 iter-13 chart roll — keycloak-config-cli Job CrashLoop, bp-keycloak HR False, upstream HRs blocked. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(bp-keycloak): truncate catalyst-api-server desc <255 chars (Postgres limit) Keycloak DB column CLIENT.DESCRIPTION = varchar(255). Previous value was 458 chars (since Fix #23 / commit `febd5fef`), causing realm-config-cli post-install hook to fail with PSQLException 'value too long for type character varying(255)' on every fresh Sovereign provision. Caught on omantel provision #6 — keycloak-config-cli Job CrashLoop, bp-keycloak HR False, all upstream HRs blocked from converging. Backport to 1.4.x (1.5.0 had a separate breaking realm-rename change reverted via PR #1282). Bootstrap-kit pin updated to 1.4.2. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: alierenbaysal <alierenbaysal@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 15:48:37 +04:00
e3mrah	0a11107630	fix(keycloak): parameterize realm name (target-state realm-per-Sovereign) — qa-loop iter-12 Fix #53A (#1271 ) * fix(keycloak): parameterize realm name (target-state realm-per-Sovereign) — qa-loop iter-12 Fix #53A Per `feedback_no_mvp_no_workarounds.md` target-state rule + matrix assertion drift on TC-124, TC-125, TC-159, TC-160, TC-161, TC-176, TC-190, TC-285 (8 TCs in iter-12 audit Phase 4 cluster A): each Sovereign owns its KC realm named after the tenant short-name, not a hardcoded literal `sovereign`. bp-keycloak chart 1.4.1 → 1.5.0: - New value `sovereignRealm.name` (default `sovereign` for backward compat with overlays not yet migrated) - New value `sovereignRealm.displayName` (default `Sovereign`) - Realm import JSON `"realm"` field + catalyst-kc-sa-credentials Secret `realm` key both flow from `$realmName` so Keycloak realm name and catalyst-api `CATALYST_KC_REALM` env stay in sync (no auth-mismatch risk) omantel chroot overlay: - bp-keycloak HelmRelease pinned to chart 1.5.0 - `sovereignRealm.name: omantel` + `displayName: "Omantel Sovereign"` per matrix tenant convention bp-catalyst-platform 1.4.120 → 1.4.121: chart bump triggers catalyst-api StatefulSet restart so it picks up the new mirrored Secret with realm=omantel. The cutover step-06 patches HR.spec.chart.spec.version dynamically per `incidents.md`. Backward compat: charts not setting sovereignRealm.name (otech, _template) keep realm `sovereign` (no behaviour change). The contabo Catalyst-Zero realm `openova` is a separate KC instance untouched by this change. * fix(blueprint): bump bp-keycloak blueprint.yaml to 1.5.0 to match Chart.yaml — qa-loop iter-12 Fix #53A follow-up	2026-05-10 10:48:09 +04:00
e3mrah	142d42e725	fix(cilium): clustermesh-apiserver NodePort → LoadBalancer (path-1) — qa-loop iter-12 Fix #53D (#1274 ) * fix(cilium): clustermesh-apiserver Service NodePort → LoadBalancer (path-1) — qa-loop iter-12 Fix #53D Per qa-loop-state/incidents.md remediation table path-1 + feedback_no_mvp_no_workarounds.md "no operational hacks": the existing NodePort 32379 was the workaround that triggered Hetzner's stateful firewall to silently drop cross-region SYN packets to BPF-only NodePorts (no LISTEN socket on the host). The canonical multi-region transport is a per-peer Hetzner LoadBalancer via the cloud-controller-manager. Affects: omantel-fsn chroot Sovereign (this PR). Other Sovereigns (otech, _template) keep their existing setting. PRECONDITION (separate bootstrap-kit slot, follow-up): Hetzner cloud-controller-manager (hcloud-ccm) must be installed AND each k3s node's spec.providerID rewritten from `k3s://...` to `hcloud://<server-id>` so the LB Service materializes. Without CCM the LB sits in `<pending>` but does not break in-cluster operation (ClusterIP still works for the local cilium-agent). Test matrix coverage when CCM is also live: TC-260, TC-261, TC-241, TC-050, TC-308, TC-310, TC-311, TC-314, TC-298, TC-297, TC-340, TC-349 (multi-region tests blocked by NodePort filtering). * fix(blueprint): bump bp-gitea blueprint.yaml to 1.2.5 to match Chart.yaml — pre-existing main drift * fix(blueprint): bump bp-keycloak blueprint.yaml to 1.4.1 to match Chart.yaml — pre-existing main drift	2026-05-10 10:45:11 +04:00
e3mrah	febd5fef22	fix(bp-keycloak): grant catalyst-api SA manage-realm + view-realm + view-clients (qa-loop iter-4 Fix #23 ) (#1213 ) Root cause of TC-248: the catalyst-api-server service-account in the sovereign realm was created (PR #604, Phase-8b) with only impersonation+manage-users+view-users+query-users on realm-management. Those four roles let the SA mint tokens and provision users, but they do NOT include manage-realm or view-realm, which are required to read or write realm-roles via the Keycloak Admin REST API. When EPIC-3 T2 added the tier-role bootstrap goroutine (KEYCLOAK_BOOTSTRAP_TIER_ROLES=true, products/catalyst/bootstrap/api/internal/keycloak/realm_bootstrap.go) its very first call — GetRealmRole(catalyst-viewer) — returned 403 Forbidden, EnsureRealmRole gave up after 5 retries and the catalog-tier realm-roles were never materialized. The access-matrix UI (TC-248) then showed an empty role list. Fix: extend clientScopeMappings.realm-management AND users[serviceAccountClientId=catalyst-api-server].clientRoles.realm-management in the sovereign realm import to include manage-realm + view-realm + view-clients. After this change a clean Sovereign install converges the tier-role bootstrap on the FIRST attempt at catalyst-api startup. 
Verification on omantel (chart 1.4.0 → 1.4.1, runtime fix applied manually first then catalyst-api restarted): kc-bootstrap: tier-role bootstrap converged (attempt 1, realm=sovereign) $ curl /admin/realms/sovereign/roles | jq '.[].name' catalyst-admin (composite=true, tier-level=40) catalyst-developer (composite=true, tier-level=20) catalyst-operator (composite=true, tier-level=30) catalyst-owner (composite=true, tier-level=50) catalyst-viewer (composite=false, tier-level=10) $ catalyst-owner.composites → catalyst-admin $ catalyst-admin.composites → catalyst-operator $ catalyst-operator.composites → catalyst-developer $ catalyst-developer.composites → catalyst-viewer Adds TestEnsureTierRealmRoles_GetRole403_SurfacesPermissionError to realm_bootstrap_test.go so future regressions of the SA permission contract surface a debuggable error chain ("ensure realm role \"catalyst-viewer\": ... GET role 403: ...") rather than a generic "create failed". Refs: TC-248, EPIC-3 T2 (#1098), bp-keycloak Phase-8b (#604) Co-authored-by: hatiyildiz <hati.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 19:14:30 +04:00
e3mrah	7f859dbb4b	feat(bp-keycloak): tenant-mode realm with wordpress/openclaw/stalwart OIDC clients (1.4.0, #915 ) (#918 ) PR #911 wired the SME tenant orchestrator to emit realmConfig.tenant.enabled=true on the per-tenant bp-keycloak HelmRelease — but the chart had no template that consumed those values, so the WordPress / OpenClaw / Stalwart OIDC integrations had no client registered in the tenant realm and SSO failed end-to-end. This change adds the chart-side template the orchestrator was already emitting for. When realmConfig.tenant.enabled=true: * configmap-sovereign-realm.yaml SKIPS (mutual-exclusion guard added on the existing template) so only one realm CM is rendered. * NEW templates/configmap-tenant-realm.yaml renders a realm import ConfigMap (same name `<release>-sovereign-realm-config` so the upstream keycloak-config-cli existingConfigmap reference still resolves) carrying the tenant realm + 3 OIDC clients: - wordpress (confidential, auth-code; redirect URIs cover the openid-connect-generic plugin's admin-ajax.php callback + /wp-login.php fallback) - openclaw (confidential, auth-code; redirect URI /oauth/callback per #915 spec) - stalwart (confidential, serviceAccountsEnabled=true so the directory.keycloak type=oidc backend can use client_credentials to introspect IMAP/SMTP tokens; standardFlowEnabled=true for webmail UI auth-code) * NEW per-app Secrets emitted in the same template scope as the realm ConfigMap so the realm JSON's `secret` field and the K8s Secret bytes never drift: - wordpress-oidc-client-secret - openclaw-oidc-client-secret - stalwart-oidc-client-secret (carries BOTH client-secret AND OIDC_CLIENT_SECRET keys for the two consumer paths) * Each per-app secret persists across helm upgrade via lookup-or-generate (mirrors marketplace-api/secret.yaml pattern from issue #887 and the existing catalyst-api-server secret in configmap-sovereign-realm.yaml). helm.sh/resource-policy: keep so bytes outlive uninstall. 
* Fail-closed validation when realmConfig.tenant.enabled=true and any of realmName / parentDomain / subdomain is unset (Inviolable Principle #4). NEW tests/tenant-realm-oidc-clients.sh covers 6 cases: 1. Sovereign-mode default render unchanged (kubectl + catalyst-ui + catalyst-api-server clients present, no tenant artefacts leak). 2. Tenant-mode render produces exactly ONE realm CM under the expected name + zero leaked Sovereign-only resources. 3. Tenant realm JSON parses + 3 OIDC clients present with the redirect-URI / publicClient / serviceAccountsEnabled shape per #915 spec; Secret bytes match realm JSON's `secret` fields. 4. Fail-closed validation when tenant fields missing. 5. keycloak-config-cli post-install Job projects the realm CM by SAME name in BOTH modes. 6. Operator-supplied per-app clientSecret overrides the lookup-or-generate path. Existing tests/observability-toggle.sh + tests/oidc-kubectl-client.sh still pass. Sovereign-mode unchanged. The chart now consumes the values the orchestrator (PR #911) was already emitting; no orchestrator change needed. Closes #915 (C1 sub-task) and unblocks #899 (per-tenant Keycloak realm-config materialisation). Co-authored-by: hatiyildiz <hatiyildiz@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 13:29:40 +04:00
e3mrah	93c4b700de	fix(bp-keycloak): templatize existingConfigmap reference for per-tenant installs (#899) (#902) bp-keycloak 1.3.2 hardcoded `keycloak.keycloakConfigCli.existingConfigmap` to the literal "keycloak-sovereign-realm-config". This worked for the Sovereign-mothership bootstrap-kit (releaseName=keycloak emits matching ConfigMap) but broke for every per-tenant install where releaseName=bp-keycloak emits "bp-keycloak-sovereign-realm-config" — the post-install keycloak-config-cli Job stuck in ContainerCreating with `MountVolume.SetUp failed for volume "config-volume" : configmap "keycloak-sovereign-realm-config" not found`, HelmRelease InstallFailed after 15m timeout, cascading to bp-openclaw and bp-wordpress-tenant which dependsOn it. The bitnami/keycloak subchart's `keycloak.keycloakConfigCli.configmapName` helper (charts/keycloak/templates/_helpers.tpl) applies `tpl` to the existingConfigmap value, so embedding `{{ .Release.Name }}` inside the string resolves at chart-render time. With this single-line change: - Sovereign-mothership (releaseName=keycloak) → keycloak-sovereign-realm-config (unchanged) - Per-tenant (releaseName=bp-keycloak) → bp-keycloak-sovereign-realm-config (matches actual emitted ConfigMap) Verified via helm template both modes — backendRef and config-volume configMap.name match the actual ConfigMap emitted by templates/configmap-sovereign-realm.yaml. Chart bumped 1.3.2 → 1.3.3 + bootstrap-kit slot 09 + blueprint.yaml. Co-authored-by: hatiyildiz <hatiyildiz@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 10:49:39 +04:00
e3mrah	ab67a48fe7	fix(blueprints): align blueprint.yaml spec.version with Chart.yaml version (#817 ) (#819 ) TestBootstrapKit_BlueprintCardsHaveRequiredFields was failing on main for 9 blueprints because their platform/<name>/chart/Chart.yaml version had been bumped without a matching update to platform/<name>/blueprint.yaml spec.version. The pre-existing failure forced 7 recent PRs to self-merge with --admin, masking real CI failures. Aligned spec.version to match Chart.yaml version on: cert-manager 1.1.1 -> 1.1.2 flux 1.1.3 -> 1.1.4 crossplane 1.1.3 -> 1.1.4 sealed-secrets 1.1.1 -> 1.1.2 spire 1.1.4 -> 1.1.7 nats-jetstream 1.1.1 -> 1.1.2 openbao 1.2.0 -> 1.2.14 keycloak 1.3.1 -> 1.3.2 gitea 1.2.1 -> 1.2.3 Verified locally: $ go test ./... -run TestBootstrapKit_BlueprintCardsHaveRequiredFields -count=1 --- PASS: TestBootstrapKit_BlueprintCardsHaveRequiredFields (0.01s) ... all 10 sub-tests pass (cilium + the 9 above) The existing test (tests/e2e/bootstrap-kit/main_test.go:145) is itself the drift guardrail: it fails CI whenever Chart.yaml is bumped without a matching blueprint.yaml bump. No additional script needed. Closes #817 once verified on main. Co-authored-by: Hatice Yildiz <hatice.yildiz@openova.io>	2026-05-04 22:32:49 +04:00
e3mrah	2e981f36a5	fix(bp-keycloak): catalyst-kc-sa-credentials addr → in-cluster Service URL (closes #781 ) (#788 ) Sovereign-side catalyst-api Pod's intra-cluster Keycloak calls (token mint, EnsureUser) were failing with `dial tcp: lookup auth.<sov-fqdn> on 10.43.0.10:53: no such host`. The Sovereign's CoreDNS resolves *.<sov-fqdn> via upstream resolvers — it does NOT forward to the in-cluster PowerDNS that holds those records. Public DNS works (PowerDNS authoritative), but Pod-side lookups of auth.<sov-fqdn> return NXDOMAIN. Live evidence — otech94 2026-05-04: handover URL returned `{"error":"keycloak error: ensure user"}` from a DNS lookup failure inside the catalyst-api Pod. Fix: bp-keycloak chart now writes the in-cluster Service URL (http://<release>.<namespace>.svc.cluster.local) into the catalyst-kc-sa-credentials Secret's `addr` key instead of the public gateway host (https://auth.<sov-fqdn>). This Secret is consumed EXCLUSIVELY by the in-cluster catalyst-api Pod via reflector mirror into catalyst-system; it is NEVER exposed to browsers. The HTTPRoute hostname (.Values.gateway.host) stays at auth.<sov-fqdn> for operator browsers — only the Pod's intra-cluster OAuth client_credentials calls switch to the Service URL. Catalyst-Zero (contabo) is unaffected: it runs `keycloak-zero` (separate chart in openova-private), not bp-keycloak. Changes: - platform/keycloak/chart/templates/configmap-sovereign-realm.yaml: Secret's $kcAddr unconditionally uses http://<release>.<namespace>.svc.cluster.local - platform/keycloak/chart/Chart.yaml: 1.3.1 → 1.3.2 - clusters/_template/bootstrap-kit/09-keycloak.yaml: chart version 1.3.1 → 1.3.2 - products/catalyst/chart/Chart.yaml: 1.3.0 → 1.3.1 (changelog entry only) - clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml: 1.3.0 → 1.3.1 Co-authored-by: hatiyildiz <hatiyildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 20:34:22 +04:00
e3mrah	e96e31a781	fix(catalyst-api,bp-keycloak): handover 401 root-causes — Reloader annot + realm SA users array (#713 ) (#714 ) Closes #713 Two distinct chart bugs surfaced live on otech62 (2026-05-03), both producing 401 on /auth/handover: 1. SOVEREIGN_FQDN race api-deployment.yaml reads SOVEREIGN_FQDN from ConfigMap "sovereign-fqdn" with optional:true. On Sovereigns, that ConfigMap is rendered by the sovereign-tls Flux Kustomization concurrently with bp-catalyst-platform HelmRelease. When the Pod starts first, valueFrom collapses to "" and stays empty — audience check rejects every valid token as "invalid audience". Fix: add Reloader annotations so the Pod rolls when the ConfigMap (and the handover-jwt-public Secret) appears. 2. catalyst-api-server SA missing user-level realm-management role mappings bp-keycloak realm import granted roles via clientScopeMappings — wrong level. The actual service-account user had no clientRoles entry, so KC rejected GET /users with 403 when catalyst-api tried to ensure the operator user during handover. Fix: add explicit "users" array binding service-account-catalyst-api-server to realm-management.{impersonation, manage-users, view-users, query-users}. Co-authored-by: hatiyildiz <hatiyildiz@openova.io>	2026-05-04 01:37:36 +04:00
e3mrah	7ca9541ef9	fix(handover): provision Keycloak service-account credentials zero-touch (Phase-8b followup) (#691) * fix(handover): provision Keycloak service-account credentials zero-touch (Phase-8b followup) Sovereign-side catalyst-api needs Keycloak service-account credentials to provision the operator's user during /auth/handover. Today the chart references K8s Secret `catalyst-kc-sa-credentials` with keys addr/realm/client-id/client-secret in the catalyst-system namespace — but no zero-touch path materialised it. The dead SealedSecret template at 09a-keycloak-catalyst-api-secret.yaml had a different name AND different keys (CATALYST_KC_), used PLACEHOLDER_SEALED_VALUE markers no provisioner replaced, and wasn't even listed in the bootstrap-kit kustomization. Symptom on otech48: GET /auth/handover?token=<valid-jwt> returns "server misconfiguration: keycloak not configured" (auth_handover.go:169). Fix: bp-keycloak chart's configmap-sovereign-realm.yaml template now emits the realm-import ConfigMap AND the catalyst-kc-sa-credentials Secret in a single template scope so they share the same generated client secret. Pattern mirrors platform/powerdns/chart/templates/api-credentials-secret.yaml (canonical seam, ADR-0001 §11.3 anti-duplication). Secret-value resolution order (first match wins): 1. operator-supplied .Values.catalystApiServerClientSecret 2. helm `lookup` of existing Secret in keycloak ns (idempotent) 3. fresh randAlphaNum 32 (zero-touch on first install) The Secret carries the four keys exactly as the catalyst-api Pod's secretKeyRef expects — addr / realm / client-id / client-secret — with addr derived from gateway.host (https://auth.<sovereignFQDN>). Reflector annotations auto-mirror the Secret to catalyst-system as soon as that namespace materialises (bootstrap-kit slot 13). 
The realm import already creates the catalyst-api-server client with serviceAccountsEnabled + impersonation/manage-users/view-users/query-users role mappings — so once Keycloak is Ready and the realm imports, the SA is fully provisioned and the K8s Secret carries a matching client secret. No post-install Job, no Admin-API script, no out-of-band SealedSecret ceremony. Cleanup: removes the dead 09a SealedSecret template (not in kustomization, never produced a working Secret). Bumps: - bp-keycloak chart 1.3.0 -> 1.3.1 - clusters/_template/bootstrap-kit/09-keycloak.yaml HelmRelease pin 1.3.0 -> 1.3.1 Existing per-Sovereign overlays (clusters/otech.omani.works/, clusters/omantel.omani.works/) intentionally remain on 1.3.0 — fresh otechN provisioning consumes _template at provision time. Will be verified live on otech49 — handover end-to-end without ANY manual Secret creation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> fix(keycloak): bump blueprint.yaml spec.version to match chart 1.3.1 TestBootstrapKit_BlueprintCardsHaveRequiredFields/keycloak asserts Chart.yaml.version == blueprint.yaml.spec.version. Forgot to bump blueprint.yaml in the previous commit. Note: 8 other blueprints (cert-manager, flux, crossplane, sealed-secrets, spire, nats-jetstream, openbao, gitea) carry the same pre-existing mismatch and the test fails on main too. Out of scope for this PR; fixing the keycloak case to keep the new chart version internally consistent. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: hatiyildiz <hatice@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 19:50:06 +04:00
e3mrah	737574b19a	feat(bp-keycloak): Phase-8b sovereign realm — token-exchange, catalyst-ui/api-server OIDC clients, SMTP, bump 1.2.2 → 1.3.0 (#604 ) (#609 ) Adds the full Phase-8b identity surface required by the seamless handover flow: - Token exchange enabled on sovereign realm (attributes.token-exchange: true) - catalyst-ui public PKCE client: redirectUris + webOrigins keyed on console.<sovereignFQDN>, groups + requiredActions in ID token - catalyst-api-server confidential service-account client: impersonation + manage-users + view-users + query-users roles on realm-management; client secret injected at provisioning time via .Values.catalystApiServerClientSecret - WebAuthn (webauthn-register + webauthn-register-passwordless) registered as Required Action options on the realm - UPDATE_PASSWORD set as defaultAction: true for new users - smtpServer block: pre-handover default = contabo Stalwart relay; fully operator-configurable via .Values.smtp.* (Phase-8c-acceptable) - required-actions client scope + oidc-usermodel-attribute-mapper for requiredActions claim in ID token (catalyst-ui first-login UX) Architectural change: realm JSON moved from inline values.yaml (keycloak: subchart key — no parent scope access) to a parent-chart template platform/keycloak/chart/templates/configmap-sovereign-realm.yaml, which can read .Values.sovereignFQDN and .Values.smtp.* for per-Sovereign interpolation. The upstream bitnami chart's keycloakConfigCli.existingConfigmap is pointed at this ConfigMap. Anti-duplication seam: configmap-sovereign-realm.yaml. 
New values.yaml keys: sovereignFQDN: "" (REQUIRED — per-Sovereign overlay supplies it) sovereignRealm.enabled: true catalystApiServerClientSecret: "" (REQUIRED — provisioner seals and injects) smtp.host/port/from/user/password/ssl/starttls/auth New bootstrap-kit file: 09a-keycloak-catalyst-api-secret.yaml — SealedSecret template for keycloak-catalyst-api-server-credentials in catalyst-system namespace; provisioner fills encryptedData fields at deploy time Bootstrap-kit refs bumped 1.2.x → 1.3.0 in _template, otech, omantel. helm template clean with sovereignFQDN=otech.omani.works. Co-authored-by: hatiyildiz <hatiyildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 17:05:27 +04:00
e3mrah	b1a25c4235	fix(bp-keycloak,bp-openbao): HTTPRoute backend wrong name + RBAC hook lifecycle bug (#598 ) (#600 ) Bug A — bp-keycloak@1.2.2: HTTPRoute backendService default was `<release>-keycloak` (gave `keycloak-keycloak` with releaseName=keycloak) but bitnami's fullname helper trims the chart-name suffix when Release.Name already contains it, so the Service is just `keycloak`. Changed default to `.Release.Name`. Sovereign realm was already imported (config-cli ran successfully) — only the Gateway routing was broken, returning HTTP 500. Bug B — bp-openbao@1.2.6: auto-unseal-rbac SA/Role/RoleBinding had `helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded`. The `hook-succeeded` clause caused Helm to delete the SA immediately after the weight-0 RBAC hook completed, before the weight-5 init Job pod could mount its SA token and start. Removed all hook annotations from the RBAC resources so they are managed by regular Helm release lifecycle (created before hooks, never deleted mid-install). Bootstrap-kit refs bumped: bp-keycloak 1.2.0→1.2.2, bp-openbao 1.2.4→1.2.6. Verified on otech22 (manual remediation): Keycloak sovereign realm OIDC endpoint returns valid JSON, openbao-0 Initialized=true Sealed=false. Co-authored-by: alierenbaysal <alierenbaysal@openova.io> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 15:43:32 +04:00
e3mrah	83ec889f06	feat(platform): add global.imageRegistry to remaining bp-* charts + bp-catalyst-platform (PR 3/3, #560 ) (#580 ) Charts bumped: - bp-keycloak 1.2.0 -> 1.2.1 (subchart stub; per-component image.registry knobs documented) - bp-crossplane 1.1.3 -> 1.1.4 (subchart stub) - bp-crossplane-claims 1.1.0 -> 1.1.1 (global.kubectlImage added; kubectl Job image templated; Hetzner ubuntu-24.04 server images intentionally untouched) - bp-velero 1.2.0 -> 1.2.1 (subchart stub) - bp-kyverno 1.0.0 -> 1.0.1 (subchart stub; per-controller image.registry knobs documented) - bp-trivy 1.0.0 -> 1.0.1 (subchart stub; both operator + scanner image.registry knobs documented) - bp-grafana 1.0.0 -> 1.0.1 (subchart stub) - bp-flux 1.1.3 -> 1.1.4 (subchart stub; per-controller image.repository knobs documented) - bp-catalyst-platform 1.1.13 -> 1.1.14 (global.imageRegistry + images.{catalystApi,catalystUi,marketplaceApi,console,smeTag} added; all 14 Catalyst-authored image refs templated: catalyst-api, catalyst-ui, marketplace-api, console + 10 SME services) Post-handover per-Sovereign overlays set global.imageRegistry to harbor.<sovereign-fqdn> so every container image pull routes through the Sovereign's own Harbor proxy_cache. Closes (partial): issue #560 — all 23 bp-* charts now carry global.imageRegistry Co-authored-by: alierenbaysal <alierenbaysal@openova.io>	2026-05-02 13:21:53 +04:00
e3mrah	20b896070f	feat(bp-keycloak + infra): Sovereign K8s OIDC config for kubectl via per-Sovereign Keycloak realm (closes #326) (#448) Wires the per-Sovereign K8s api-server's --oidc-* validator to the per-Sovereign Keycloak realm so customer admins can authenticate kubectl directly against their Sovereign — no static admin-kubeconfig handoff, no rotated bearer-token exchange. infra (cloud-init): - Add 6 --kube-apiserver-arg=oidc-* flags to the k3s install line in infra/hetzner/cloudinit-control-plane.tftpl. Issuer URL composed from sovereign_fqdn (https://auth.\${sovereign_fqdn}/realms/sovereign) per INVIOLABLE-PRINCIPLES #4 — never hardcoded. Username/groups prefixes scope OIDC subjects under "oidc:" so RoleBindings reference e.g. subjects[0].name=oidc:alice@org, distinct from local SAs/x509. Canonical seam (anti-duplication rule, ADR-0001 §11.3): - The bp-keycloak chart already bundles bitnami/keycloak's keycloakConfigCli post-install Helm hook Job, which imports realms declared under values.keycloak.keycloakConfigCli.configuration. We enable the existing seam — no bespoke kubectl-exec realm-creation script, no custom Admin-API call from catalyst-api. bp-keycloak chart (1.1.2 → 1.2.0): - Enable keycloakConfigCli + ship inline sovereign-realm.json with: realm "sovereign" (invariant per Sovereign — Keycloak resolves the issuer claim from the request hostname, so no per-FQDN realm rename), default groups sovereign-admins/-ops/-viewers, oidc-group-membership-mapper emitting "groups" claim, public OIDC client "kubectl" with localhost:8000 + OOB redirect URIs (kubectl-oidc-login defaults), publicClient=true (kubectl runs locally and cannot safely hold a secret), PKCE S256 enforced. - Bump version 1.1.2 → 1.2.0 (semver MINOR, additive shape). - Bump bootstrap-kit slot 09 in _template/, omantel.omani.works/, otech.omani.works/ to version: 1.2.0. - New chart test tests/oidc-kubectl-client.sh (4 cases) — all green. 
- Existing tests/observability-toggle.sh — still green. Documentation: - Add §11 "kubectl OIDC for customer admins" runbook to docs/omantel-handover-wbs.md with one-time workstation setup (kubectl krew install oidc-login + config set-credentials), sovereign-admin RBAC binding (oidc:sovereign-admins → cluster-admin), and 401-debugging table mapping common symptoms to root causes. - Carve #326 out of §7 "Out of scope" — it is shipped. - Add §9 status row. Validation: - grep -c 'oidc-issuer-url' infra/hetzner/cloudinit-control-plane.tftpl → 2 (comment + the actual flag in the curl line) - grep -c 'oidc-username-claim' → 2 - helm template platform/keycloak/chart → renders post-install keycloak-config-cli Job + ConfigMap with kubectl client (3 hits on grep "kubectl"; 1 hit on "clientId": "kubectl") - bash scripts/check-vendor-coupling.sh → exit 0 (HARD-FAIL mode) - 4/4 oidc-kubectl-client gates green; 3/3 observability-toggle gates green Out of scope (deferred to follow-up tickets): - Per-Sovereign user provisioning UI (#322, #323) - Refresh-token revocation on RoleBinding deletion (#324) - provider-kubernetes Crossplane ProviderConfig per Sovereign (#321) - omantel migration / Phase 8 live execution NO catalyst-api or UI source files touched (those are #319/#322/#323 agents' territories per agent brief). Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>	2026-05-01 19:07:52 +04:00
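For illustration, the sovereign-admin RBAC binding that the commit above mentions (oidc:sovereign-admins → cluster-admin) might look roughly like the manifest below. This is a minimal sketch and not the text of the runbook in docs/omantel-handover-wbs.md: the `metadata.name` is hypothetical, and the `oidc:` group prefix is assumed to match the groups prefix the commit describes for the api-server flags.

```yaml
# Sketch only: binds the Keycloak group "sovereign-admins" (as seen by the
# api-server with the "oidc:" groups prefix) to the built-in cluster-admin role.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: sovereign-admins-cluster-admin   # hypothetical name
subjects:
  - kind: Group
    name: "oidc:sovereign-admins"        # group claim value plus the assumed "oidc:" prefix
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
```

Binding to a Group rather than to individual users keeps access decisions in the Keycloak realm, so adding or removing an admin would not require touching cluster RBAC.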
e3mrah	a1bd550208	fix(charts): HTTPRoute templates skip-render on missing host (was failing default-values render) (#402 ) Blueprint-release for #401 failed because HTTPRoute templates use {{- fail }} when gateway.host is not set, which trips the chart default-values render gate in CI. Switched 6 templates from 'fail loud' to 'skip render': if .Values.gateway.host → emit HTTPRoute else → emit nothing The Gateway API admission already rejects HTTPRoute with empty hostnames, so the loud-fail wasn't buying anything an operator wouldn't see at apply time. Default-values render now produces zero HTTPRoute resources, which is the correct shape for the upstream chart consumers that don't set the Sovereign-only gateway block. Files: keycloak, gitea, openbao, grafana, harbor, catalyst-platform. Verified: helm template t products/catalyst/chart/ → 0 HTTPRoutes (clean) helm template t products/catalyst/chart/ --set ingress.gateway.enabled=true --set ingress.hosts.console.host=console.test --set ingress.hosts.api.host=api.test → 2 HTTPRoutes Closes the blueprint-release failure on commit `abf01b6f`. Co-authored-by: hatiyildiz <hati@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 16:23:58 +04:00
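The "skip render" switch described in the commit above can be sketched as a Helm template guard. This is an assumed shape rather than the contents of any of the six templates: the route name, backend Service name, and port are hypothetical, while `.Values.gateway.host` comes from the commit message and the `cilium-gateway` Gateway in `kube-system` comes from the Gateway API migration commit in this log.

```yaml
{{- /* Sketch only: emit the HTTPRoute when gateway.host is set, otherwise render nothing. */}}
{{- if .Values.gateway.host }}
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: {{ .Release.Name }}-route      # hypothetical name
spec:
  parentRefs:
    - name: cilium-gateway
      namespace: kube-system
  hostnames:
    - {{ .Values.gateway.host | quote }}
  rules:
    - backendRefs:
        - name: {{ .Release.Name }}    # hypothetical backend Service name
          port: 80                     # hypothetical backend port
{{- end }}
```

With default values the guard evaluates to false and the template produces no documents, which is consistent with the "0 HTTPRoutes" result the commit reports for the default render.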
e3mrah	abf01b6f21	feat(platform): Gateway API migration audit (#387) (#401) Migrates every minimal-Sovereign-set blueprint chart from networking.k8s.io/v1.Ingress to gateway.networking.k8s.io/v1.HTTPRoute, replacing the legacy Traefik-on-Sovereigns assumption with the canonical Cilium + Envoy + Gateway API path per ADR-0001 §9.4 and the WBS §2 correction note (#388). The single per-Sovereign Gateway is added as additional documents in the existing bootstrap-kit slot clusters/_template/bootstrap-kit/01-cilium.yaml (NOT a new top-level slot), since Cilium owns the GatewayClass. It includes: - Certificate `sovereign-wildcard-tls` requesting `*.${SOVEREIGN_FQDN}` from `letsencrypt-dns01-prod` (cert-manager + #373 webhook) - Gateway `cilium-gateway` in `kube-system` with HTTPS (443, TLS terminate) + HTTP (80) listeners, allowedRoutes.namespaces.from=All Per-blueprint HTTPRoute templates (canonical seam: each wrapper chart's existing `templates/` directory):

| Blueprint            | Host pattern               | Backend port |
|----------------------|----------------------------|--------------|
| bp-keycloak          | auth.<sov>                 | 80           |
| bp-gitea             | git.<sov>                  | 3000         |
| bp-openbao           | bao.<sov>                  | 8200         |
| bp-grafana           | grafana.<sov>              | 80           |
| bp-harbor            | registry.<sov>             | 80           |
| bp-powerdns          | pdns.<sov>/api (dual-mode) | 8081         |
| bp-catalyst-platform | console.<sov>, api.<sov>   | 80, 8080     |

bp-powerdns supports both Ingress (contabo legacy) and HTTPRoute (Sovereign) simultaneously — the per-Sovereign overlay sets `api.gateway.enabled=true` while leaving `api.enabled=true`. The Ingress object is harmless on Cilium clusters with no Traefik. This preserves contabo's existing pdns.openova.io flow per ADR-0001 §9.4. bp-harbor flips `expose.type` from `ingress` to `clusterIP` in platform/harbor/chart/values.yaml so the upstream chart no longer emits its own Ingress; the HTTPRoute is the sole HTTP exposure.
TLS terminates at the Gateway (wildcard cert) rather than per-host Certificates inside the chart. bp-catalyst-platform's `templates/httproute.yaml` is NOT excluded by .helmignore (unlike templates/ingress.yaml + templates/ingress-console-tls.yaml, which remain contabo-only legacy demo infra). The contabo path keeps serving console.openova.io/sovereign via Traefik unchanged. Bootstrap-kit slot updates (per-Sovereign hostname interpolation): - 08-openbao.yaml → gateway.host: bao.${SOVEREIGN_FQDN} - 09-keycloak.yaml → gateway.host: auth.${SOVEREIGN_FQDN} - 10-gitea.yaml → gateway.host: gitea.${SOVEREIGN_FQDN} - 11-powerdns.yaml → api.host: pdns.${SOVEREIGN_FQDN}, api.gateway.enabled: true - 19-harbor.yaml → gateway.host: registry.${SOVEREIGN_FQDN} - 25-grafana.yaml → gateway.host: grafana.${SOVEREIGN_FQDN} Server-side dry-run validation against the live Cilium Gateway API CRDs on contabo: every HTTPRoute and the per-Sovereign Gateway + Certificate apply cleanly via `kubectl apply --dry-run=server`. Contabo unaffected: clusters/contabo-mkt/ not modified. The legacy SME ingresses (console-nova, marketplace, admin, axon, talentmesh, stalwart, ...) continue to serve via Traefik as before. powerdns on contabo remains on the Ingress path (api.gateway.enabled defaults to false at the chart level). Closes #387. Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 16:19:30 +04:00
e3mrah	fa0e3a494b	fix(bp-keycloak): pin to current Bitnami tag (closes #191 ) (#198 ) * fix(bp-keycloak): pin to current Bitnami Keycloak tag (closes #191) Bitnami consolidated their tag scheme around 2025-09 (see https://github.com/bitnami/charts/issues/30852). The chart was pinned to upstream bitnami/keycloak Helm chart 24.7.1, whose default image tag `bitnami/keycloak:26.2.4-debian-12-r0` now returns 404 in the Docker Hub registry — installs hit ImagePullBackOff (verified on omantel). Changes: - Upstream Bitnami chart: 24.7.1 -> 25.2.0 (latest, appVersion 26.3.3) - Override image.registry/image.repository for every Bitnami image used by the chart (keycloak app, keycloak-config-cli, postgresql, postgres-exporter, os-shell) to point at `bitnamilegacy/`, where the historic debian-12 tags are preserved - Replace deprecated `proxy: edge` with `proxyHeaders: "xforwarded"` (chart 25.x renamed the field; Catalyst fronts Keycloak with Cilium Gateway which sets X-Forwarded- headers) - bp-keycloak chart version: 1.1.1 -> 1.1.2 Verification (registry HEAD via Bearer token): bitnami/keycloak:26.2.4-debian-12-r0 -> 404 (broken pin) bitnami/keycloak:26.3.3-debian-12-r0 -> 404 (registry move) bitnamilegacy/keycloak:26.3.3-debian-12-r0 -> 200 bitnamilegacy/keycloak-config-cli:6.4.0-... -> 200 bitnamilegacy/postgresql:17.6.0-debian-12-r0 -> 200 bitnamilegacy/postgres-exporter:0.17.1-... -> 200 bitnamilegacy/os-shell:12-debian-12-r50 -> 200 `helm template platform/keycloak/chart` renders cleanly; rendered images all resolve to bitnamilegacy/* tags listed above. Long-term follow-up (not blocking): bitnamilegacy is explicitly marked "no longer updated, may be removed in the future" — Catalyst should either build its own Keycloak image or migrate to the Bitnami Secure Image (BSI/Photon) catalog when chart support catches up. Tracked in the bp-keycloak description block. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(bp-keycloak): bump blueprint.yaml version to match Chart.yaml 1.1.2 --------- Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 20:10:17 +02:00
e3mrah	1f5c76def1	fix(platform): sync blueprint.yaml versions with Chart.yaml (#199) * feat(ui): Playwright cosmetic + step-flow regression guards 15 regression guards in products/catalyst/bootstrap/ui/e2e/cosmetic-guards.spec.ts that fail HARD when each user-flagged defect class returns: 1. card height drift from canonical 108px 2. reserved right padding eating description width 3. logo tile drift from per-brand LOGO_SURFACE 4. invisible glyph (white-on-white) via luminance proxy 5. wizard step order Org/Topology/Provider/Credentials/Components/ Domain/Review 6. legacy "Choose Your Stack" / "Always Included" tab labels 7. Domain step reachable before Components 8. CPX32 not the recommended Hetzner SKU 9. per-region SKU dropdown shows wrong provider catalog 10. provision page is .html (static) not SPA route 11. legacy bubble/edge DAG SVG markup on provision page 12. admin sidebar drift from canonical core/console (w-56 + 7 labels) 13. AppDetail uses tablist instead of sectioned layout 14. job rows navigate to /job/<id> instead of expand-in-place 15. Phase 0 banners (Hetzner infra / Cluster bootstrap) on AdminPage Each test prints a failure message naming the canonical reference, the source-of-truth file, and the data-testid PR needed (if any) so the implementing agent has a precise target. No .skip() — per INVIOLABLE-PRINCIPLES #2, missing components fail loud. CI: .github/workflows/cosmetic-guards.yaml runs the suite on every PR that touches products/catalyst/bootstrap/ui/ or core/console/. Docs: docs/UI-REGRESSION-GUARDS.md maps each test to the user's original complaint, the canonical reference, and the green/red semantics (5 tests intentionally RED on main today — they stay red until the companion-agent's UI work lands).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(platform): sync blueprint.yaml versions with Chart.yaml so manifest-validation passes --------- Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 22:07:55 +04:00
hatiyildiz	1ddd569789	fix(bp-*): observability toggles default false — break circular CRD dependency Extends the v1.1.1 hardening that started with cilium / cert-manager / crossplane to the remaining 8 bootstrap-kit + per-Sovereign Blueprints. Every observability toggle in every Catalyst-curated Blueprint now ships `false`/`null` by default; the operator opts in via a per-cluster values overlay at clusters/<sovereign>/bootstrap-kit/ once bp-kube-prometheus-stack reconciles. Live failure mode that prompted this (omantel.omani.works 2026-04-29): bp-cilium @ 1.1.0 defaulted hubble.relay/ui + prometheus.serviceMonitor to true. The upstream Cilium 1.16.5 chart renders a monitoring.coreos.com/v1 ServiceMonitor whose CRD ships with kube-prometheus-stack — a tier-2 Application Blueprint that depends on the bootstrap-kit (cilium first). Helm install fails on a fresh Sovereign with "no matches for kind ServiceMonitor in version monitoring.coreos.com/v1 — ensure CRDs are installed first" and every downstream HelmRelease reports `dep is not ready`. The earlier trustCRDsExist=true mitigation only suppresses Helm's render-time gate; the apiserver still rejects the resource at install-time. Per-Blueprint changes: - bp-cilium: hubble.relay.enabled, hubble.ui.enabled → false; hubble.metrics.enabled → null (this is the exact value that disables the upstream metrics ServiceMonitor template branch — verified by reading cilium 1.16.5's _hubble.tpl); hubble.metrics.serviceMonitor.enabled → false. tests/observability-toggle.sh extended with Case 4 (default render produces no hubble-relay / hubble-ui Deployments). - bp-flux: flux2.prometheus.podMonitor.create → false. - bp-sealed-secrets: sealed-secrets.metrics.serviceMonitor.enabled → false (explicit lock; upstream already defaults false). - bp-spire: spire.global.spire.recommendations.enabled + recommendations.prometheus → false. - bp-nats-jetstream: nats.promExporter.enabled + promExporter.podMonitor.enabled → false.
- bp-openbao: openbao.injector.metrics.enabled + openbao.serviceMonitor.enabled → false. - bp-keycloak: keycloak.metrics.enabled + metrics.serviceMonitor.enabled + metrics.prometheusRule.enabled → false. - bp-gitea: gitea.gitea.metrics.* and gitea.postgresql.metrics.* serviceMonitor + prometheusRule → false. - bp-powerdns: powerdns.serviceMonitor.enabled + powerdns.metrics.enabled → false (forward-compatibility guard; current upstream pschichtel/powerdns 0.10.0 has no ServiceMonitor template, but a future upstream bump cannot silently regress). Each chart ships a tests/observability-toggle.sh that asserts the rule in three cases (default off / explicit on opt-in / explicit off) — runs under blueprint-release.yaml's chart-test gate (added `bdeb0f54` + the existing wiring) before helm push. A regression that re-introduces a hardcoded enabled: true in any chart fails CI before the OCI artifact is published. Versioning: - All 11 leaf charts bumped 1.1.0 → 1.1.1. - products/catalyst/chart (bp-catalyst-platform umbrella) deps updated to 1.1.1 across the board. - clusters/_template/bootstrap-kit/03-flux through 10-gitea bumped to 1.1.1; clusters/omantel.omani.works/bootstrap-kit/* mirror. docs/BLUEPRINT-AUTHORING.md §11.2 table extended to enumerate every toggle disabled across all 11 Blueprints. References docs/INVIOLABLE-PRINCIPLES.md #4. GATES (all green): - helm dep build resolves cleanly post-change for every chart whose upstream is published (umbrella waits on per-leaf publish). - helm lint clean on all 11 leaves. - helm template . default render produces zero monitoring.coreos.com references on every leaf (verified locally). - tests/observability-toggle.sh PASS on all 11 leaves. Live verification: with v1.1.1 published the omantel.omani.works HelmRelease can roll forward without a manual values patch — Flux picks up the new chart digest automatically (semver: 1.x in OCIRepository). Refs: issue #182.	2026-04-29 19:23:52 +02:00
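As an illustration of the opt-in path this commit describes, a per-cluster values overlay for bp-keycloak might re-enable metrics once bp-kube-prometheus-stack has reconciled and the ServiceMonitor CRD exists. The sketch below assumes the overlay is plain Helm values nested under the `keycloak` subchart key, using the toggle paths listed in the commit; how the overlay file is wired into the HelmRelease is not shown in this log and is left out.

```yaml
# Sketch only: operator opt-in overlay for bp-keycloak observability.
# All three toggles ship as false by default per the commit above.
keycloak:
  metrics:
    enabled: true            # expose the metrics endpoint
    serviceMonitor:
      enabled: true          # requires the monitoring.coreos.com/v1 CRDs to be installed
    prometheusRule:
      enabled: false         # left off here; enable separately if alert rules are wanted
```

Keeping the defaults off and opting in per cluster avoids the circular dependency the commit describes, where a bootstrap chart renders a ServiceMonitor before the CRD that defines it has been installed.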
hatiyildiz	43aff20254	feat(bp-*): convert all 11 bootstrap-kit charts to umbrella charts depending on upstream Each platform/<name>/chart/Chart.yaml now declares the canonical upstream chart as a dependencies: entry. helm dependency build pulls the upstream payload into the OCI artifact at publish time, so Flux helm install of bp-<name>:1.1.0 actually installs the upstream Helm release alongside the Catalyst-curated overlays (NetworkPolicy, ServiceMonitor, ClusterIssuer, ExternalSecret) under templates/. Pinned upstream chart versions per platform/<name>/blueprint.yaml: - cilium 1.16.5 https://helm.cilium.io - cert-manager v1.16.2 https://charts.jetstack.io - flux 2.4.0 https://fluxcd-community.github.io/helm-charts - crossplane 1.17.x https://charts.crossplane.io/stable - sealed-secrets 2.16.x https://bitnami-labs.github.io/sealed-secrets - spire ... https://spiffe.github.io/helm-charts-hardened - nats-jetstream ... https://nats-io.github.io/k8s/helm/charts - openbao ... https://openbao.github.io/openbao-helm - keycloak ... https://charts.bitnami.com/bitnami - gitea ... https://dl.gitea.com/charts - catalyst-platform umbrella over the 10 leaf bp-* charts via helm dependency values.yaml in each chart adopts the umbrella convention: catalystBlueprint metadata block (provenance + version) at top level, upstream subchart values namespaced under the dependency name. cert-manager specifically: clusterissuer-letsencrypt-dns01.yaml gets the helm.sh/hook: post-install,post-upgrade annotation so it applies AFTER cert-manager controllers are running and CRDs registered (the previous hollow-chart shape ran the ClusterIssuer at install time when CRDs didn't exist yet, which was the omantel cluster's exact failure mode). Wrapper chart version bumped 1.0.0 → 1.1.0 across the board (umbrella conversion is a meaningful structural revision). Cluster manifests in clusters/_template/bootstrap-kit/ AND clusters/omantel.omani.works/bootstrap-kit/ updated to reference 1.1.0.
The blueprint-release.yaml workflow's helm package step needs an explicit helm dependency build before push so the upstream subchart bytes ship inside the OCI artifact. That CI change is a follow-up commit on this same branch (separate file scope).	2026-04-29 17:21:36 +02:00
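The umbrella shape this commit describes can be sketched with the cilium wrapper, whose upstream pin and repository URL are listed in the message. This is an assumed minimal Chart.yaml, not a copy of platform/cilium/chart/Chart.yaml; the `description` text is hypothetical, and any fields beyond the dependency entry are omitted.

```yaml
# Sketch only: wrapper chart declaring the upstream chart as a dependency.
apiVersion: v2
name: bp-cilium
description: Catalyst-curated wrapper around upstream Cilium   # hypothetical text
version: 1.1.0                # wrapper version after the umbrella conversion
dependencies:
  - name: cilium              # upstream chart name
    version: 1.16.5           # upstream pin listed in the commit
    repository: https://helm.cilium.io
```

Under this layout, values for the upstream chart would sit under a top-level `cilium:` key in the wrapper's values.yaml, matching the commit's note that subchart values are namespaced under the dependency name.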
hatiyildiz	62d9c7d936	fix(charts): drop dependencies block — wrappers carry values overlay only The first 2 blueprint-release CI runs failed on `helm package` with containerd permission errors because the wrapper Chart.yaml's `dependencies:` block triggered helm to pull the upstream charts via OCI/containerd at package time, which the GitHub Actions runner blocks. Architectural fix: each Catalyst Blueprint wrapper carries the values overlay + metadata only. The bootstrap installer reads the upstream chart reference from the wrapper's values.yaml `catalystBlueprint.upstream.{chart,version,repo}` metadata block, points `helm install` at the upstream chart's repo, and overlays our values. This keeps: - blueprint-release CI lightweight (no upstream pulls during package; helm package now works without containerd) - the "bp-<name> wrapper does NOT drift from upstream" property (we ship the overlay, not a fork) - the single Blueprint contract from BLUEPRINT-AUTHORING §1 (a wrapper is still a Catalyst-curated Helm chart published as bp-<name>:<semver>) Changes: - 11 platform/<name>/chart/Chart.yaml: removed dependencies block. Each is now a plain Helm chart with no remote pulls during package. - 11 platform/<name>/chart/values.yaml: prepended catalystBlueprint.upstream.{chart,version,repo} metadata block at the top. Bootstrap installer parses it to know which upstream chart to install with these values. - products/catalyst/bootstrap/api/internal/bootstrap/bootstrap.go: installCilium now does `helm repo add cilium https://helm.cilium.io --force-update` then `helm install cilium cilium/cilium --version 1.16.5 --values -` (the cilium/cilium upstream chart, with our overlay values piped from values.yaml). Same pattern needs propagating to the other 10 install functions in a follow-up. After this commit, blueprint-release CI should green-build all 11 wrappers (helm package now works without containerd access since there's nothing to pull). 
The bootstrap installer's actual `helm install` calls in production reach upstream chart repos via the runtime k3s cluster's pod network, which has full network access.	2026-04-28 12:57:29 +02:00
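For comparison with the later umbrella approach, the metadata block this commit prepends to each wrapper's values.yaml might look like the sketch below. The key path `catalystBlueprint.upstream.{chart,version,repo}` is taken from the commit message, and the cilium values mirror the `helm install` invocation it quotes; the exact form of the `chart` value is an assumption.

```yaml
# Sketch only: upstream reference read by the bootstrap installer
# instead of a Chart.yaml dependencies block.
catalystBlueprint:
  upstream:
    chart: cilium                    # assumed form; the installer runs `helm install cilium cilium/cilium`
    version: 1.16.5
    repo: https://helm.cilium.io
```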
hatiyildiz	441ebaebb8	fix(charts): pin upstream chart versions/names to ones that exist in their repos The first Blueprint Release CI run (commit `8c0f766`) failed because four chart wrappers referenced upstream chart versions/names that don't exist in their published repositories: - platform/flux/chart: name was "flux", repo was OCI; actual is name "flux2" in plain helm repo at https://fluxcd-community.github.io/helm-charts. Pinned to 2.13.0. - platform/openbao/chart: version 2.1.0 was the binary appVersion, not the chart version. Pinned to 0.16.0 chart (which packages openbao 2.1.0 internally). - platform/keycloak/chart (Bitnami): chart version 25.0.6 was the appVersion of upstream; Bitnami's chart is at 24.7.1 packaging Keycloak 26.0.x. Pinned to 24.7.1. - platform/nats-jetstream/chart: name was "nats-jetstream"; the upstream chart is named "nats" (it always was — JetStream is a feature of NATS, not a separate chart). Renamed. Cilium, cert-manager, crossplane, sealed-secrets, spire wrappers were unaffected; their version pins matched upstream availability. Containerd permission-denied errors from `helm package` on cilium/cert-manager/crossplane/gitea/sealed-secrets are a separate CI plumbing issue (helm tries to pull OCI base images during package build via containerd, but the GitHub Actions runner blocks containerd socket access). Tracked as a follow-up: switch to `helm package --skip-refresh` or use a runner with containerd permissions. After this commit lands, the next blueprint-release CI run should green-build at minimum the 4 fixed charts. Successful builds publish bp-{flux,openbao,keycloak,nats-jetstream}:1.0.0 OCI artifacts to ghcr.io/openova-io/.	2026-04-28 12:55:21 +02:00
hatiyildiz	8c0f76640c	feat(charts): G2 wrapper Helm charts for 11 bootstrap-kit components + blueprint-release CI Per docs/PROVISIONING-PLAN.md and tickets [F] chart. Adds Catalyst-curated wrapper Helm charts at platform/<name>/chart/ for every component the bootstrap-kit installer (introduced in commit `07b4bcf`) needs. Each chart is the canonical bp-<name> source per BLUEPRINT-AUTHORING.md §1's source-location rule. 11 charts created with Chart.yaml + values.yaml + blueprint.yaml each: Network + GitOps: - platform/cilium/chart — wraps cilium 1.16.5; kubeProxyReplacement, WireGuard mTLS, Hubble, Gateway API - platform/flux/chart — wraps flux 2.4.0 - platform/crossplane/chart — wraps crossplane 1.18.0 + provider-hcloud manifest Security: - platform/cert-manager/chart — wraps cert-manager 1.16.2 with CRDs+ServiceMonitor - platform/sealed-secrets/chart — wraps sealed-secrets 2.16.1 (transient bootstrap-only) - platform/spire/chart — wraps spiffe/spire 1.10.4 (5-min SVID rotation) Catalyst control-plane services: - platform/nats-jetstream/chart — wraps nats 2.10.22 (3-node cluster, JetStream + KV) - platform/openbao/chart — wraps openbao 2.1.0 (3-node Raft, region-local per SECURITY §5) - platform/keycloak/chart — wraps keycloak 25.0.6 (Bitnami flavor, edge proxy mode) - platform/gitea/chart — wraps gitea 10.5.0 (CNPG Postgres backend, no chart-bundled valkey/redis since Catalyst control plane uses JetStream) New platform/ folders (added per AUDIT-PROCEDURE component-count anchor — was 53, now 55): - platform/spire/README.md — workload identity Catalyst control plane component - platform/nats-jetstream/README.md — control-plane event spine - platform/sealed-secrets/README.md — transient bootstrap-only Each blueprint.yaml declares: - catalyst.openova.io/v1alpha1 Blueprint kind (canonical CRD per BLUEPRINT-AUTHORING §3) - visibility: unlisted (mandatory infra, auto-installed by bootstrap kit, not a marketplace card) - manifests.chart: ./chart pointer - depends: [] 
(foundational components have no Blueprint dependencies; control-plane services depend on each other implicitly via bootstrap order, not via Blueprint depends) .github/workflows/blueprint-release.yaml: - New CI workflow per BLUEPRINT-AUTHORING §11 (path-matrix per Blueprint folder) - Triggers on push to main touching platform//chart/* or products//chart/* - detect job: emits matrix of changed Blueprint folders via git diff - build job (per chart): helm dependency build → helm package → helm push to GHCR → cosign keyless sign (GitHub OIDC) → Syft SBOM attestation - Output: ghcr.io/openova-io/bp-<name>:<semver> with SLSA-3-style supply-chain provenance Closes [F] tickets: 11 G2 charts (cilium, cert-manager, flux, crossplane, sealed-secrets, spire, nats-jetstream, openbao, keycloak, gitea, plus the umbrella products/catalyst/chart already exists from Pass 105). blueprint.yaml CRDs added across 11 entries. CI fan-out workflow live. After this commit lands, the bootstrap-kit installer in commit `07b4bcf` has real OCI artifacts to install. The first push to main will trigger 10 build matrix jobs (cilium was created in a separate commit earlier in this session) which produce 10 cosigned bp-<name>:<semver> artifacts on GHCR. Component-count anchor update follows: 53 → 55 (added spire + nats-jetstream + sealed-secrets — but sealed-secrets was already conceptually counted under "supporting services"). Per AUDIT-PROCEDURE the count needs updating in CLAUDE.md, BUSINESS-STRATEGY, TECHNOLOGY-FORECAST L11. Tracked as separate ticket [K] docs.	2026-04-28 12:51:06 +02:00
hatiyildiz	70fea3ab8f	docs(pass-34): banned-term TENANT sweep + keycloak hostname drift GLOSSARY's banned term "tenant" survived in Configuration tables and Flux postBuild substitutions across product READMEs as ${TENANT} (uppercase ENV var). Prior banned-term greps searched lowercase `tenant` so the ALL-CAPS form slipped through. Product README fixes: - products/cortex: TENANT/DOMAIN → ORGANIZATION/SOVEREIGN_DOMAIN, plus two DNS placeholder fixes for llm-gateway and chat URLs (same shape Pass 25/31 fixed elsewhere). - products/fingate: 6 instances (Flux substitution, Configuration table, 4 URL templates) renamed. URL shape api.openbanking.<org>.<sov-dom> flagged as 4-segment FQDN that doesn't match NAMING §5.1 or §5.2 — deferred to a deeper architectural pass. - products/fabric: Configuration table row renamed. Component README: - platform/keycloak: shared-sovereign hostname auth.<sovereign-domain> and per-organization auth.<org>.<sovereign-domain> both missing <location-code> per NAMING §5.1. Fixed. platform/librechat ${TENANT_ID} preserved — that's Microsoft Azure AD tenant-ID (external technology, exempted by GLOSSARY). Validation log Pass 34 entry includes meta-note: always run a global grep for the surfaced drift category before closing a pass, to avoid the asymmetric-drift problem Pass 25 warned against.	2026-04-27 22:42:50 +02:00
hatiyildiz	b467dc3f3b	docs(pass-18): NAMING DR-as-env_type misexample + Keycloak deployment topology Pass 18 — drift-detection on NAMING-CONVENTION + platform/keycloak. Two real findings. NAMING-CONVENTION §11.1: - The example list of Catalyst Environments included `bankdhofar-dr` — but `dr` is NOT a valid env_type. Canonical values per §2.4 are prod / stg / uat / dev / poc. DR is a Placement mode (active-active / active-hotstandby across regions inside the -prod Environment), not a separate Environment. - Replaced `bankdhofar-dr` with `bankdhofar-uat` and added an explicit "DR is a Placement, not an Env Type" note. platform/keycloak/README.md: - Keycloak Deployment YAML example used `namespace: open-banking` with 2 replicas — Fingate-specific narrative that contradicted the per-Org / per-Sovereign topology stated in the banner. Rewrote with two side-by-side examples: shared-sovereign (3 HA replicas, catalyst-keycloak namespace, CNPG-backed) and per-organization (1 replica in <org> namespace, optional embedded DB for smallest SME tier). - HA section was a single set of claims (2+ replicas, CNPG, Infinispan) that only matched corporate. Now branches on topology — corporate gets HA + Infinispan, SME gets single replica with restart-on-deploy as acceptable for tier SLAs. Same kind of drift Pass 17 caught in Harbor: banner says one thing, body still describes the older model. Both fixed. VALIDATION-LOG: Pass 18 entry added. Refs #37	2026-04-27 22:00:42 +02:00
hatiyildiz	14ed84de41	docs(pass-8): role-in-Catalyst banners + dead-link fix in component READMEs Pass 8 — line-by-line read of platform/cnpg, platform/strimzi, platform/k8gb, platform/keycloak, platform/cert-manager, platform/cilium. CNPG and Strimzi: read in full and confirmed clean — they correctly position themselves as Application Blueprints and don't drift from the canonical model. CNPG's `<org>-postgres-dr` cluster name (Application-tier database role) is acceptable per NAMING-CONVENTION §1.3 (which only forbids primary/dr in K8s host-cluster names, not in Application-internal CRD names). Four READMEs updated: k8gb: - Header reframed: per-host-cluster infrastructure pointer to PLATFORM-TECH-STACK §3.1 and SRE §2.4 split-brain protection. - Removed dead link to ../failover-controller/docs/ADR-FAILOVER-CONTROLLER.md (the failover-controller folder has no docs/); replaced with link to that component's README + SRE §2.4. keycloak: - Header reframed from "FAPI Authorization Server for Open Banking" (narrow) to "User identity for Catalyst Sovereigns" (broad). Keycloak handles ALL user identity in Catalyst, not just FAPI. - Added per-Org / per-Sovereign topology callout matching SECURITY §6. Clarified that "Multi-tenant TPP" refers to PSD2 Third Party Providers, not Catalyst's Organization-level multi-tenancy. - FAPI features kept since Keycloak still serves Fingate as the FAPI Authorization Server. cert-manager: - Header reframed as per-host-cluster infrastructure with pointer to PLATFORM-TECH-STACK §3.3. cilium: - Header reframed as per-host-cluster infrastructure with pointer to PLATFORM-TECH-STACK §3.1, including the install-first note (CNI must come before any other workload during Phase 0). VALIDATION-LOG: Pass 8 entry added. Refs #37	2026-04-27 21:39:03 +02:00
talent-mesh	c9d04a53b4	refactor: flatten platform/ structure (41 components) Remove hierarchical grouping (networking/, security/, etc.) and use flat structure for all 41 platform components. Changes: - All components now directly under platform/ (no subfolders) - AI Hub components moved from meta-platforms/ai-hub/components/ to platform/ - Open Banking components (lago, openmeter) moved to platform/ - meta-platforms/ now only contains README files that reference platform/ - Open Banking custom services remain in meta-platforms/open-banking/services/ Structure: - platform/ (41 components, flat) - meta-platforms/ai-hub/ (README only, references platform/) - meta-platforms/open-banking/ (README + 6 custom services) All documentation links updated. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-08 15:19:48 +00:00

30 Commits