sandbox-wave1-controller-chart
30 Commits
c96d7b5089
fix(bp-keycloak): retune install retries to fit HR envelope (#146) (#1352)
Diagnostic finding: prov #21 + #22 hung 30+ min on bp-keycloak install. Three coupled root causes from the Fix #140 retune:
1. availabilityCheck.timeout=900s meant a single failed availability attempt busted the 30m HR window before Job-level backoff could retry.
2. HR install/upgrade.remediation.retries=3 triggered up to 3 full Helm uninstall+reinstall cycles, losing Liquibase state each time. Worst case: 90m+ wall-clock before Flux gave up.
3. Liquibase + JVM cold-start legitimately took 5-10 min before Keycloak's Service had Endpoints, but bitnami's livenessProbe (initialDelaySeconds 300) killed Pods mid-migration.
Three target-state changes (chart 1.4.4 -> 1.4.5):
- platform/keycloak/chart/values.yaml: availabilityCheck.timeout 900s -> 300s. ~5 attempts fit in the 30m HR envelope vs. ~1.5 at 900s.
- platform/keycloak/chart/values.yaml: keycloak.startupProbe.enabled with failureThreshold 360 x periodSeconds 5 = 30m budget. Suspends livenessProbe until the first /realms/master 200. livenessProbe.initialDelaySeconds 300 -> 60.
- clusters/_template/bootstrap-kit/09-keycloak.yaml: install/upgrade.remediation.retries 3 -> 1 + chart pin 1.4.5. The Job's own backoffLimit=5 handles retries without losing state.
All knobs remain operator-overridable via per-Sovereign overlay valuesFrom (Inviolable Principle #4: no hardcoding).
TODO follow-up (out of scope per diagnostic "ship knob bumps first to validate hypothesis"): move realm-import out of the bitnami post-install Helm hook into a Catalyst-owned Job that runs after the Keycloak Service has Endpoints. This decouples HR-Ready from realm-imported and lets the orchestrator wait on the Job CR directly.
Refs #146.
Co-authored-by: e3mrah <1234567+e3mrah@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
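A rough sketch of the timeout arithmetic behind the first change. It uses a simplified timing model (an assumption, stated in the comments), and the function names are hypothetical, not part of the chart. The sketch counts only attempts that run to completion, so its numbers are not directly comparable to the commit's "~" estimates. It does show the gap between a 300s and a 900s per-attempt timeout inside a 30m envelope.

```python
# Simplified timing model (assumption, not the controller's exact behaviour):
# every failed keycloak-config-cli Pod burns the full availabilityCheck.timeout,
# and the Job controller waits 10s, 20s, 40s, ... (capped at 6 min) between
# retries. Pod scheduling and image pulls are ignored.

def job_backoff_s(retry: int) -> int:
    """Delay before the given retry (1 = first retry)."""
    return min(10 * 2 ** (retry - 1), 360)


def attempts_within(envelope_s: int, timeout_s: int, backoff_limit: int) -> int:
    """Count availability attempts that finish inside the HR timeout."""
    elapsed = 0
    completed = 0
    for attempt in range(backoff_limit + 1):  # first try + backoffLimit retries
        if attempt > 0:
            elapsed += job_backoff_s(attempt)
        elapsed += timeout_s
        if elapsed > envelope_s:
            break
        completed += 1
    return completed


HR_ENVELOPE_S = 30 * 60  # HelmRelease install/upgrade timeout (30m)

print(attempts_within(HR_ENVELOPE_S, 300, backoff_limit=5))  # 1.4.5 tuning -> 5
print(attempts_within(HR_ENVELOPE_S, 900, backoff_limit=5))  # 1.4.4 tuning -> 1

# startupProbe budget: failureThreshold x periodSeconds
print(360 * 5 == HR_ENVELOPE_S)  # -> True
```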
6662d672d3
fix(bp-keycloak): post-upgrade hook regression on prov #21 (#140) (#1349)
Fix #129 (1.4.3) set availabilityCheck.timeout=600s + backoffLimit=5, which works for fresh installs but races on UPGRADE.
On prov #21 (f84f6c3ff2b60296, 2026-05-11), a chart roll on contabo (PR #1346/#1347 stack) triggered an HR upgrade. The keycloak StatefulSet rolled, and Helm fired the post-upgrade hook before the new Pod's admin endpoint recovered (Liquibase re-validation + JVM cold start on a freshly-provisioned node). The inner 600s window then expired before the first attempt found Keycloak Ready. With backoffLimit=5 + 10s..6m exponential backoff, the worst-case wall clock exceeds the parent HR's 15m upgrade.timeout -> Helm aborts the hook -> "post-upgrade hooks failed".
Target-state fix (Principle #3: no half-fix; both chart and HR move together):
- platform/keycloak/chart/values.yaml: availabilityCheck.timeout 600s -> 900s (a 15m inner wait covers a single rolling-restart + Liquibase cycle without retry, eliminating most backoff time); cleanupAfterFinished.enabled true with 1h TTL so stale hook Pods don't race the before-hook-creation delete on subsequent upgrades.
- platform/keycloak/chart/Chart.yaml: 1.4.3 -> 1.4.4 + 1.4.4 changelog block.
- clusters/_template/bootstrap-kit/09-keycloak.yaml: HR install + upgrade timeout 15m -> 30m so Helm's outer hook-wait accommodates the inner 15m availability window plus normal backoff. Pin chart version 1.4.3 -> 1.4.4.
All knobs remain operator-overridable via per-Sovereign valuesFrom (Principle #4: no hardcoding). Hook semantics stay intact (Principle #3: the workaround would be disabling the hook annotations, which breaks the downstream bp-gitea + catalyst-api contract that the realm exists before HR Ready).
Co-authored-by: e3mrah <1234567+e3mrah@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
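The worst case described above can be totalled with a similar simplified model. The assumptions are that every attempt waits the full timeout, that the Job backoff is 10s doubling and capped at 6m, and that Pod start-up overhead is ignored. The helper name is hypothetical.

```python
def worst_case_wall_clock_s(timeout_s: int, backoff_limit: int) -> int:
    """Total time if every attempt times out: attempts plus Job backoff delays."""
    total = 0
    for attempt in range(backoff_limit + 1):  # first try + backoffLimit retries
        if attempt > 0:
            total += min(10 * 2 ** (attempt - 1), 360)  # 10s, 20s, 40s, ... <= 6m
        total += timeout_s
    return total


UPGRADE_TIMEOUT_S = 15 * 60  # parent HR upgrade.timeout before this fix

worst = worst_case_wall_clock_s(600, backoff_limit=5)
print(worst, worst > UPGRADE_TIMEOUT_S)  # 3910 True (about 65 min vs 15 min)
# Even two failed 600s attempts (600 + 10 + 600 = 1210s) overrun 15m.
print(worst_case_wall_clock_s(600, backoff_limit=1) > UPGRADE_TIMEOUT_S)  # True
```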
a3005ead67
fix(bp-keycloak): bump keycloak-config-cli hook timeouts (#129) (#1341)
Fresh-Sovereign provision #15 (otech 0ad3687ddd72deb7) wedged at phase1-watching for 30+ min: bp-keycloak HelmRelease failed with `post-upgrade hooks failed: timed out waiting for the condition` → bp-gitea (dependsOn keycloak OIDC) blocked → bp-self-sovereign-cutover never converged.
Root cause
──────────
The bitnami keycloak subchart's `keycloak-config-cli-job.yaml` is rendered as a Helm post-install/post-upgrade/post-rollback hook (default annotations on the Job, weight 5). On a fresh k3s the realm-import Job fires before Postgres+Liquibase finish bootstrapping Keycloak (legitimately 3-10 min), and the bitnami subchart defaults are too tight to absorb that race:
- keycloakConfigCli.availabilityCheck.timeout="" → keycloak-config-cli falls back to its internal ~120s wait for Keycloak's /admin endpoint
- keycloakConfigCli.backoffLimit: 1 → only 2 Pod attempts total before the Job is marked Failed
Both attempts hit the 120s window, the Job goes Failed, Helm reports the post-upgrade hook timed out, the HR install/upgrade retries (×3) all hit the same race, and the HR remains Failed → downstream blueprints never install.
Fix
───
Tune the hook's internal timing to fit comfortably inside the parent HR's 15m install/upgrade timeout while leaving headroom for cold image pull + Pod scheduling:
keycloak.keycloakConfigCli.availabilityCheck.timeout: "600s" (was "")
keycloak.keycloakConfigCli.backoffLimit: 5 (was 1)
Both knobs remain operator-overridable via per-Sovereign `valuesFrom` (Inviolable Principle #4: no hardcoding). Per Inviolable Principle #3 (no workarounds), this does NOT disable the hook semantics — disabling the hook would break the documented contract that the realm exists before the HR reaches Ready (downstream bp-gitea + catalyst-api consume the realm).
Files
─────
platform/keycloak/chart/values.yaml (+59 inline rationale)
platform/keycloak/chart/Chart.yaml (1.4.2 → 1.4.3 + changelog)
clusters/_template/bootstrap-kit/09-keycloak.yaml (HR pin → 1.4.3)
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
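A toy model of the race described above. The hook fires at t=0, Keycloak becomes reachable some minutes later, and each Pod attempt waits up to the availability timeout. The model and the helper name are assumptions for illustration only. The model ignores image pulls, the poll interval, and the HR's own 15m timeout.

```python
from typing import Optional


def hook_success_s(ready_at_s: int, timeout_s: int, backoff_limit: int) -> Optional[int]:
    """Time at which the availability check first sees Keycloak, or None.

    Assumes the check notices readiness immediately (poll interval ignored)
    and a Job backoff of 10s, 20s, 40s, ... capped at 6 min between Pods.
    """
    start = 0
    for attempt in range(backoff_limit + 1):  # first try + backoffLimit retries
        if attempt > 0:
            start += min(10 * 2 ** (attempt - 1), 360)
        end = start + timeout_s
        if ready_at_s <= end:
            # Ready during this attempt, or during the backoff before it.
            return max(ready_at_s, start)
        start = end
    return None


for ready in (180, 300, 600):  # Keycloak Ready 3, 5, 10 min after the hook fires
    print(ready,
          hook_success_s(ready, timeout_s=120, backoff_limit=1),  # old defaults
          hook_success_s(ready, timeout_s=600, backoff_limit=5))  # 1.4.3 values
```

With the old defaults, a Keycloak that needs 5 or 10 minutes is never observed. With the 1.4.3 values, both cases are observed within the first attempt.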
2ef01849bf
fix(bp-keycloak): truncate catalyst-api-server desc <255 chars (1.4.2 backport) (#1285)
* fix(bp-keycloak): truncate catalyst-api-server description <255 chars (Postgres limit)
Keycloak DB column CLIENT.DESCRIPTION = varchar(255). Previous value was
458 chars, causing the keycloak-config-cli post-install hook to fail with
PSQLException "value too long". Caught on omantel provision #6 iter-13
chart roll: keycloak-config-cli Job CrashLoop, bp-keycloak HR False,
dependent HRs blocked.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
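The fix above shortens the description by hand. A render-time guard of the following shape is one way to catch a regression before the hook reaches Postgres. It is a hypothetical helper and not something this commit adds. PostgreSQL's varchar(n) limit counts characters, which is also what Python's len() measures for a str.

```python
PG_VARCHAR_LIMIT = 255  # Keycloak CLIENT.DESCRIPTION column width


def clamp_description(text: str, limit: int = PG_VARCHAR_LIMIT) -> str:
    """Return text unchanged if it fits, else cut it to `limit` chars ending in '...'."""
    if len(text) <= limit:
        return text
    return text[: limit - 3].rstrip() + "..."


long_desc = "x" * 458  # same length as the value that broke the hook
print(len(clamp_description(long_desc)))  # 255
```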
* fix(bp-keycloak): truncate catalyst-api-server desc <255 chars (Postgres limit)
Keycloak DB column CLIENT.DESCRIPTION = varchar(255). Previous value was
458 chars (since Fix #23 / commit
0a11107630
fix(keycloak): parameterize realm name (target-state realm-per-Sovereign) — qa-loop iter-12 Fix #53A (#1271)
* fix(keycloak): parameterize realm name (target-state realm-per-Sovereign) — qa-loop iter-12 Fix #53A
Per the `feedback_no_mvp_no_workarounds.md` target-state rule + matrix assertion drift on TC-124, TC-125, TC-159, TC-160, TC-161, TC-176, TC-190, TC-285 (8 TCs in iter-12 audit Phase 4 cluster A): each Sovereign owns its KC realm named after the tenant short-name, not a hardcoded literal `sovereign`.
bp-keycloak chart 1.4.1 → 1.5.0:
- New value `sovereignRealm.name` (default `sovereign` for backward compat with overlays not yet migrated)
- New value `sovereignRealm.displayName` (default `Sovereign`)
- Realm import JSON `"realm"` field + catalyst-kc-sa-credentials Secret `realm` key both flow from `$realmName`, so the Keycloak realm name and the catalyst-api `CATALYST_KC_REALM` env stay in sync (no auth-mismatch risk)
omantel chroot overlay:
- bp-keycloak HelmRelease pinned to chart 1.5.0
- `sovereignRealm.name: omantel` + `displayName: "Omantel Sovereign"` per matrix tenant convention
bp-catalyst-platform 1.4.120 → 1.4.121: the chart bump triggers a catalyst-api StatefulSet restart so it picks up the new mirrored Secret with realm=omantel. The cutover step-06 patches HR.spec.chart.spec.version dynamically per `incidents.md`.
Backward compat: overlays not setting sovereignRealm.name (otech, _template) keep realm `sovereign` (no behaviour change). The contabo Catalyst-Zero realm `openova` is a separate KC instance untouched by this change.
* fix(blueprint): bump bp-keycloak blueprint.yaml to 1.5.0 to match Chart.yaml — qa-loop iter-12 Fix #53A follow-up
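A minimal sketch of the single-source rule described above. Both the realm import JSON and the Secret's `realm` key come from one resolved name, so they cannot disagree. The function name and the dict shape of the values are illustrative assumptions. The real chart does this in Helm templates.

```python
import json


def render_realm_artifacts(values: dict) -> tuple[str, dict]:
    """Return (realm import JSON, catalyst-kc-sa-credentials data) from one realm name."""
    realm_cfg = values.get("sovereignRealm") or {}
    realm_name = realm_cfg.get("name") or "sovereign"          # backward-compatible default
    display_name = realm_cfg.get("displayName") or "Sovereign"
    realm_json = json.dumps({"realm": realm_name, "displayName": display_name})
    secret_data = {"realm": realm_name}                        # surfaces as CATALYST_KC_REALM
    return realm_json, secret_data


# _template / otech: no override, realm stays "sovereign"
print(render_realm_artifacts({}))
# omantel overlay
print(render_realm_artifacts(
    {"sovereignRealm": {"name": "omantel", "displayName": "Omantel Sovereign"}}))
```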
142d42e725
fix(cilium): clustermesh-apiserver NodePort → LoadBalancer (path-1) — qa-loop iter-12 Fix #53D (#1274)
* fix(cilium): clustermesh-apiserver Service NodePort → LoadBalancer (path-1) — qa-loop iter-12 Fix #53D
Per qa-loop-state/incidents.md remediation table path-1 + feedback_no_mvp_no_workarounds.md "no operational hacks": the existing NodePort 32379 was the workaround that triggered Hetzner's stateful firewall to silently drop cross-region SYN packets to BPF-only NodePorts (no LISTEN socket on the host). The canonical multi-region transport is a per-peer Hetzner LoadBalancer via the cloud-controller-manager.
Affects: omantel-fsn chroot Sovereign (this PR). Other Sovereigns (otech, _template) keep their existing setting.
PRECONDITION (separate bootstrap-kit slot, follow-up): the Hetzner cloud-controller-manager (hcloud-ccm) must be installed AND each k3s node's spec.providerID rewritten from `k3s://...` to `hcloud://<server-id>` so the LB Service materializes. Without the CCM the LB sits in `<pending>` but does not break in-cluster operation (ClusterIP still works for the local cilium-agent).
Test matrix coverage when the CCM is also live: TC-260, TC-261, TC-241, TC-050, TC-308, TC-310, TC-311, TC-314, TC-298, TC-297, TC-340, TC-349 (multi-region tests blocked by NodePort filtering).
* fix(blueprint): bump bp-gitea blueprint.yaml to 1.2.5 to match Chart.yaml — pre-existing main drift
* fix(blueprint): bump bp-keycloak blueprint.yaml to 1.4.1 to match Chart.yaml — pre-existing main drift
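A small pre-flight check for the PRECONDITION above, sketched under assumptions: the input is the tab-separated output of a `kubectl get nodes` jsonpath query, and the node names and server IDs are made up. It lists the nodes that still carry a non-`hcloud://` providerID, which are the ones that still need the rewrite before the LB Service can materialize. Both helper names are hypothetical.

```python
# Input format (assumed): the output of
#   kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.providerID}{"\n"}{end}'

def parse_nodes(tsv: str) -> dict[str, str]:
    """Map node name -> spec.providerID from name<TAB>providerID lines."""
    nodes: dict[str, str] = {}
    for line in tsv.splitlines():
        if not line.strip():
            continue
        name, _, provider_id = line.partition("\t")
        nodes[name.strip()] = provider_id.strip()
    return nodes


def nodes_needing_provider_id(nodes: dict[str, str]) -> list[str]:
    """Nodes that still need the k3s:// -> hcloud://<server-id> rewrite."""
    return sorted(name for name, pid in nodes.items() if not pid.startswith("hcloud://"))


sample = "cp-1\tk3s://cp-1\nworker-1\thcloud://4242\n"  # made-up nodes
print(nodes_needing_provider_id(parse_nodes(sample)))  # ['cp-1']
```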
febd5fef22
fix(bp-keycloak): grant catalyst-api SA manage-realm + view-realm + view-clients (qa-loop iter-4 Fix #23) (#1213)
Root cause of TC-248: the catalyst-api-server service-account in the sovereign realm was created (PR #604, Phase-8b) with only impersonation + manage-users + view-users + query-users on realm-management. Those four roles let the SA mint tokens and provision users, but they do NOT include manage-realm or view-realm, which are required to read or write realm-roles via the Keycloak Admin REST API.
When EPIC-3 T2 added the tier-role bootstrap goroutine (KEYCLOAK_BOOTSTRAP_TIER_ROLES=true, products/catalyst/bootstrap/api/internal/keycloak/realm_bootstrap.go), its very first call, GetRealmRole(catalyst-viewer), returned 403 Forbidden. EnsureRealmRole gave up after 5 retries and the catalog-tier realm-roles were never materialized. The access-matrix UI (TC-248) then showed an empty role list.
Fix: extend clientScopeMappings.realm-management AND users[serviceAccountClientId=catalyst-api-server].clientRoles.realm-management in the sovereign realm import to include manage-realm + view-realm + view-clients. After this change a clean Sovereign install converges the tier-role bootstrap on the FIRST attempt at catalyst-api startup.
Verification on omantel (chart 1.4.0 → 1.4.1, runtime fix applied manually first, then catalyst-api restarted):
kc-bootstrap: tier-role bootstrap converged (attempt 1, realm=sovereign)
$ curl /admin/realms/sovereign/roles | jq '.[].name'
catalyst-admin (composite=true, tier-level=40)
catalyst-developer (composite=true, tier-level=20)
catalyst-operator (composite=true, tier-level=30)
catalyst-owner (composite=true, tier-level=50)
catalyst-viewer (composite=false, tier-level=10)
$ catalyst-owner.composites → catalyst-admin
$ catalyst-admin.composites → catalyst-operator
$ catalyst-operator.composites → catalyst-developer
$ catalyst-developer.composites → catalyst-viewer
Adds TestEnsureTierRealmRoles_GetRole403_SurfacesPermissionError to realm_bootstrap_test.go so future regressions of the SA permission contract surface a debuggable error chain ("ensure realm role \"catalyst-viewer\": ... GET role 403: ...") rather than a generic "create failed".
Refs: TC-248, EPIC-3 T2 (#1098), bp-keycloak Phase-8b (#604)
Co-authored-by: hatiyildiz <hati.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
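The verified composite chain can be checked mechanically. Expanding each role through its composites should yield exactly the tiers at or below its tier-level. The sketch below uses the role data from the verification output above. The expansion helper is hypothetical; Keycloak performs this expansion itself when it evaluates composite roles.

```python
COMPOSITES = {
    "catalyst-owner": ["catalyst-admin"],
    "catalyst-admin": ["catalyst-operator"],
    "catalyst-operator": ["catalyst-developer"],
    "catalyst-developer": ["catalyst-viewer"],
    "catalyst-viewer": [],
}
TIER_LEVEL = {
    "catalyst-viewer": 10,
    "catalyst-developer": 20,
    "catalyst-operator": 30,
    "catalyst-admin": 40,
    "catalyst-owner": 50,
}


def effective_roles(role: str) -> set[str]:
    """Expand a realm role through its composites (iterative depth-first walk)."""
    seen: set[str] = set()
    stack = [role]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(COMPOSITES.get(current, []))
    return seen


for role, level in sorted(TIER_LEVEL.items(), key=lambda kv: kv[1]):
    # Each tier should inherit exactly the tiers at or below its own level.
    expected = {r for r, lvl in TIER_LEVEL.items() if lvl <= level}
    print(role, effective_roles(role) == expected)  # True for every role
```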
7f859dbb4b
feat(bp-keycloak): tenant-mode realm with wordpress/openclaw/stalwart OIDC clients (1.4.0, #915) (#918)
PR #911 wired the SME tenant orchestrator to emit realmConfig.tenant.enabled=true on the per-tenant bp-keycloak HelmRelease, but the chart had no template that consumed those values. As a result, the WordPress / OpenClaw / Stalwart OIDC integrations had no client registered in the tenant realm, and SSO failed end-to-end. This change adds the chart-side template for the values the orchestrator was already emitting.
When realmConfig.tenant.enabled=true:
* configmap-sovereign-realm.yaml SKIPS (mutual-exclusion guard added on the existing template) so only one realm CM is rendered.
* NEW templates/configmap-tenant-realm.yaml renders a realm import ConfigMap (same name `<release>-sovereign-realm-config` so the upstream keycloak-config-cli existingConfigmap reference still resolves) carrying the tenant realm + 3 OIDC clients:
- wordpress (confidential, auth-code; redirect URIs cover the openid-connect-generic plugin's admin-ajax.php callback + /wp-login.php fallback)
- openclaw (confidential, auth-code; redirect URI /oauth/callback per #915 spec)
- stalwart (confidential, serviceAccountsEnabled=true so the directory.keycloak type=oidc backend can use client_credentials to introspect IMAP/SMTP tokens; standardFlowEnabled=true for webmail UI auth-code)
* NEW per-app Secrets emitted in the same template scope as the realm ConfigMap so the realm JSON's `secret` field and the K8s Secret bytes never drift:
- wordpress-oidc-client-secret
- openclaw-oidc-client-secret
- stalwart-oidc-client-secret (carries BOTH client-secret AND OIDC_CLIENT_SECRET keys for the two consumer paths)
* Each per-app secret persists across helm upgrade via lookup-or-generate (mirrors the marketplace-api/secret.yaml pattern from issue #887 and the existing catalyst-api-server secret in configmap-sovereign-realm.yaml). helm.sh/resource-policy: keep, so the bytes outlive uninstall.
* Fail-closed validation when realmConfig.tenant.enabled=true and any of realmName / parentDomain / subdomain is unset (Inviolable Principle #4).
NEW tests/tenant-realm-oidc-clients.sh covers 6 cases:
1. Sovereign-mode default render unchanged (kubectl + catalyst-ui + catalyst-api-server clients present, no tenant artefacts leak).
2. Tenant-mode render produces exactly ONE realm CM under the expected name + zero leaked Sovereign-only resources.
3. Tenant realm JSON parses + 3 OIDC clients present with the redirect-URI / publicClient / serviceAccountsEnabled shape per #915 spec; Secret bytes match the realm JSON's `secret` fields.
4. Fail-closed validation when tenant fields are missing.
5. keycloak-config-cli post-install Job projects the realm CM by the SAME name in BOTH modes.
6. Operator-supplied per-app clientSecret overrides the lookup-or-generate path.
Existing tests/observability-toggle.sh + tests/oidc-kubectl-client.sh still pass. Sovereign-mode unchanged.
The chart now consumes the values the orchestrator (PR #911) was already emitting; no orchestrator change needed.
Closes #915 (C1 sub-task) and unblocks #899 (per-tenant Keycloak realm-config materialisation).
Co-authored-by: hatiyildiz <hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
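A Python sketch of the fail-closed rule (the chart itself would use Helm's `fail`). The commit names the three required fields but not their exact values path. Nesting them under `realmConfig.tenant` is an assumption here, and `validate_tenant` is a hypothetical name.

```python
REQUIRED_TENANT_FIELDS = ("realmName", "parentDomain", "subdomain")


def validate_tenant(realm_config: dict) -> None:
    """Fail closed: tenant mode must not render with any required field unset."""
    tenant = realm_config.get("tenant") or {}
    if not tenant.get("enabled"):
        return  # Sovereign mode: nothing to check here
    missing = [field for field in REQUIRED_TENANT_FIELDS if not tenant.get(field)]
    if missing:
        raise ValueError(
            "realmConfig.tenant.enabled=true but unset: " + ", ".join(missing)
        )


validate_tenant({})  # Sovereign mode passes
validate_tenant({"tenant": {"enabled": True, "realmName": "acme",
                            "parentDomain": "example.org", "subdomain": "acme"}})
try:
    validate_tenant({"tenant": {"enabled": True, "realmName": "acme"}})
except ValueError as err:
    print(err)  # realmConfig.tenant.enabled=true but unset: parentDomain, subdomain
```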
93c4b700de
fix(bp-keycloak): templatize existingConfigmap reference for per-tenant installs (#899) (#902)
bp-keycloak 1.3.2 hardcoded `keycloak.keycloakConfigCli.existingConfigmap` to
the literal "keycloak-sovereign-realm-config". This worked for the Sovereign-
mothership bootstrap-kit (releaseName=keycloak emits matching ConfigMap) but
broke for every per-tenant install where releaseName=bp-keycloak emits
"bp-keycloak-sovereign-realm-config" — the post-install keycloak-config-cli
Job stuck in ContainerCreating with `MountVolume.SetUp failed for volume
"config-volume" : configmap "keycloak-sovereign-realm-config" not found`,
HelmRelease InstallFailed after the 15m timeout, cascading to bp-openclaw and
bp-wordpress-tenant, which both list it in dependsOn.
The bitnami/keycloak subchart's `keycloak.keycloakConfigCli.configmapName`
helper (charts/keycloak/templates/_helpers.tpl) applies `tpl` to the
existingConfigmap value, so embedding `{{ .Release.Name }}` inside the
string resolves at chart-render time. With this single-line change:
- Sovereign-mothership (releaseName=keycloak) → keycloak-sovereign-realm-config (unchanged)
- Per-tenant (releaseName=bp-keycloak) → bp-keycloak-sovereign-realm-config (matches actual emitted ConfigMap)
Verified via helm template in both modes — backendRef and config-volume
configMap.name match the actual ConfigMap emitted by
templates/configmap-sovereign-realm.yaml.
Chart bumped 1.3.2 → 1.3.3 + bootstrap-kit slot 09 + blueprint.yaml.
Co-authored-by: hatiyildiz <hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
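The naming mismatch can be reproduced in a few lines. The exact templated string in values.yaml is assumed from the description above. `render_tpl` is only a stand-in that substitutes `{{ .Release.Name }}`; Helm's real `tpl` renders the full template language.

```python
HARDCODED = "keycloak-sovereign-realm-config"              # bp-keycloak 1.3.2
TEMPLATED = "{{ .Release.Name }}-sovereign-realm-config"   # bp-keycloak 1.3.3 (assumed form)


def render_tpl(value: str, release_name: str) -> str:
    """Stand-in for Helm's `tpl`: only substitutes {{ .Release.Name }}."""
    return value.replace("{{ .Release.Name }}", release_name)


def emitted_configmap(release_name: str) -> str:
    """Name produced by templates/configmap-sovereign-realm.yaml."""
    return f"{release_name}-sovereign-realm-config"


for release in ("keycloak", "bp-keycloak"):  # Sovereign mothership, per-tenant
    expected = emitted_configmap(release)
    print(release,
          render_tpl(HARDCODED, release) == expected,   # True, then False
          render_tpl(TEMPLATED, release) == expected)   # True, True
```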
ab67a48fe7
fix(blueprints): align blueprint.yaml spec.version with Chart.yaml version (#817) (#819)
TestBootstrapKit_BlueprintCardsHaveRequiredFields was failing on main for
9 blueprints because their platform/<name>/chart/Chart.yaml version had
been bumped without a matching update to platform/<name>/blueprint.yaml
spec.version. The pre-existing failure forced 7 recent PRs to self-merge
with --admin, masking real CI failures.
Aligned spec.version to match Chart.yaml version on:
cert-manager 1.1.1 -> 1.1.2
flux 1.1.3 -> 1.1.4
crossplane 1.1.3 -> 1.1.4
sealed-secrets 1.1.1 -> 1.1.2
spire 1.1.4 -> 1.1.7
nats-jetstream 1.1.1 -> 1.1.2
openbao 1.2.0 -> 1.2.14
keycloak 1.3.1 -> 1.3.2
gitea 1.2.1 -> 1.2.3
Verified locally:
$ go test ./... -run TestBootstrapKit_BlueprintCardsHaveRequiredFields -count=1
--- PASS: TestBootstrapKit_BlueprintCardsHaveRequiredFields (0.01s)
... all 10 sub-tests pass (cilium + the 9 above)
The existing test (tests/e2e/bootstrap-kit/main_test.go:145) is itself
the drift guardrail: it fails CI whenever Chart.yaml is bumped without a
matching blueprint.yaml bump. No additional script needed.
Closes #817 once verified on main.
Co-authored-by: Hatice Yildiz <hatice.yildiz@openova.io>
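The guardrail's rule reduces to a per-blueprint equality check. Below is a Python sketch of that comparison on already-parsed versions. The real check is the Go test in tests/e2e/bootstrap-kit/main_test.go, and YAML loading is omitted to keep the sketch stdlib-only. The sample values come from the list above; `version_drift` is a hypothetical name.

```python
def version_drift(chart: dict[str, str], blueprint: dict[str, str]) -> dict[str, tuple]:
    """Map blueprint name -> (Chart.yaml version, blueprint.yaml spec.version) on mismatch."""
    return {
        name: (chart[name], blueprint.get(name))
        for name in sorted(chart)
        if blueprint.get(name) != chart[name]
    }


chart_yaml = {"keycloak": "1.3.2", "gitea": "1.2.3", "spire": "1.1.7"}
blueprint_before = {"keycloak": "1.3.1", "gitea": "1.2.1", "spire": "1.1.4"}
blueprint_after = dict(chart_yaml)  # after this PR's alignment

print(version_drift(chart_yaml, blueprint_before))  # three mismatches
print(version_drift(chart_yaml, blueprint_after))   # {}
```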
2e981f36a5
fix(bp-keycloak): catalyst-kc-sa-credentials addr → in-cluster Service URL (closes #781) (#788)
Sovereign-side catalyst-api Pod's intra-cluster Keycloak calls (token
mint, EnsureUser) were failing with `dial tcp: lookup
auth.<sov-fqdn> on 10.43.0.10:53: no such host`. The Sovereign's
CoreDNS resolves *.<sov-fqdn> via upstream resolvers — it does NOT
forward to the in-cluster PowerDNS that holds those records. Public
DNS works (PowerDNS authoritative), but Pod-side lookups of
auth.<sov-fqdn> return NXDOMAIN.
Live evidence — otech94 2026-05-04: handover URL returned
`{"error":"keycloak error: ensure user"}` from a DNS lookup failure
inside the catalyst-api Pod.
Fix: bp-keycloak chart now writes the in-cluster Service URL
(http://<release>.<namespace>.svc.cluster.local) into the
catalyst-kc-sa-credentials Secret's `addr` key instead of the public
gateway host (https://auth.<sov-fqdn>). This Secret is consumed
EXCLUSIVELY by the in-cluster catalyst-api Pod via reflector mirror
into catalyst-system; it is NEVER exposed to browsers.
The HTTPRoute hostname (.Values.gateway.host) stays at auth.<sov-fqdn>
for operator browsers — only the Pod's intra-cluster OAuth
client_credentials calls switch to the Service URL.
Catalyst-Zero (contabo) is unaffected: it runs `keycloak-zero`
(separate chart in openova-private), not bp-keycloak.
Changes:
- platform/keycloak/chart/templates/configmap-sovereign-realm.yaml:
Secret's $kcAddr unconditionally uses
http://<release>.<namespace>.svc.cluster.local
- platform/keycloak/chart/Chart.yaml: 1.3.1 → 1.3.2
- clusters/_template/bootstrap-kit/09-keycloak.yaml: chart version 1.3.1 → 1.3.2
- products/catalyst/chart/Chart.yaml: 1.3.0 → 1.3.1 (changelog entry only)
- clusters/_template/bootstrap-kit/13-bp-catalyst-platform.yaml: 1.3.0 → 1.3.1
Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
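A sketch of the two addresses this commit separates. It assumes the Service name equals the release name and that the cluster uses the default `cluster.local` domain; the release and namespace values below are examples. Only the first address goes into the Secret's `addr` key. Both helper names are hypothetical.

```python
def in_cluster_addr(release: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Service DNS name written to catalyst-kc-sa-credentials `addr` (in-cluster only)."""
    return f"http://{release}.{namespace}.svc.{cluster_domain}"


def public_addr(sovereign_fqdn: str) -> str:
    """Gateway hostname kept on the HTTPRoute for operator browsers."""
    return f"https://auth.{sovereign_fqdn}"


print(in_cluster_addr("keycloak", "keycloak"))  # http://keycloak.keycloak.svc.cluster.local
print(public_addr("otech.omani.works"))         # https://auth.otech.omani.works
```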
e96e31a781
fix(catalyst-api,bp-keycloak): handover 401 root-causes — Reloader annot + realm SA users array (#713) (#714)
Closes #713
Two distinct chart bugs surfaced live on otech62 (2026-05-03), both producing 401 on /auth/handover:
1. SOVEREIGN_FQDN race
api-deployment.yaml reads SOVEREIGN_FQDN from ConfigMap "sovereign-fqdn" with optional:true. On Sovereigns, that ConfigMap is rendered by the sovereign-tls Flux Kustomization concurrently with the bp-catalyst-platform HelmRelease. When the Pod starts first, valueFrom collapses to "" and stays empty, so the audience check rejects every valid token as "invalid audience".
Fix: add Reloader annotations so the Pod rolls when the ConfigMap (and the handover-jwt-public Secret) appears.
2. catalyst-api-server SA missing user-level realm-management role mappings
The bp-keycloak realm import granted roles via clientScopeMappings, which is the wrong level. The actual service-account user had no clientRoles entry, so KC rejected GET /users with 403 when catalyst-api tried to ensure the operator user during handover.
Fix: add an explicit "users" array binding service-account-catalyst-api-server to realm-management.{impersonation, manage-users, view-users, query-users}.
Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
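A sketch of the realm-import fragment the second fix describes, built as a Python dict. The field names (`serviceAccountClientId`, `clientRoles`) follow Keycloak's user representation. The `enabled` flag and the helper name are assumptions, and this is not the chart's verbatim JSON.

```python
import json

REALM_MGMT_ROLES = ["impersonation", "manage-users", "view-users", "query-users"]


def service_account_user(client_id: str, roles: list[str]) -> dict:
    """User-level binding for a client's service account (Keycloak naming convention)."""
    return {
        "username": f"service-account-{client_id}",
        "enabled": True,
        "serviceAccountClientId": client_id,
        "clientRoles": {"realm-management": list(roles)},
    }


realm_fragment = {"users": [service_account_user("catalyst-api-server", REALM_MGMT_ROLES)]}
print(json.dumps(realm_fragment, indent=2))
```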
7ca9541ef9
fix(handover): provision Keycloak service-account credentials zero-touch (Phase-8b followup) (#691)
* fix(handover): provision Keycloak service-account credentials zero-touch (Phase-8b followup)
Sovereign-side catalyst-api needs Keycloak service-account credentials to provision the operator's user during /auth/handover. Today the chart references K8s Secret `catalyst-kc-sa-credentials` with keys addr/realm/client-id/client-secret in the catalyst-system namespace, but no zero-touch path materialised it. The dead SealedSecret template at 09a-keycloak-catalyst-api-secret.yaml had a different name AND different keys (CATALYST_KC_*), used PLACEHOLDER_SEALED_VALUE markers no provisioner replaced, and wasn't even listed in the bootstrap-kit kustomization.
Symptom on otech48: GET /auth/handover?token=<valid-jwt> returns "server misconfiguration: keycloak not configured" (auth_handover.go:169).
Fix: the bp-keycloak chart's configmap-sovereign-realm.yaml template now emits the realm-import ConfigMap AND the catalyst-kc-sa-credentials Secret in a single template scope, so they share the same generated client secret. The pattern mirrors platform/powerdns/chart/templates/api-credentials-secret.yaml (canonical seam, ADR-0001 §11.3 anti-duplication).
Secret-value resolution order (first match wins):
1. operator-supplied .Values.catalystApiServerClientSecret
2. helm `lookup` of existing Secret in keycloak ns (idempotent)
3. fresh randAlphaNum 32 (zero-touch on first install)
The Secret carries the four keys exactly as the catalyst-api Pod's secretKeyRef expects (addr / realm / client-id / client-secret), with addr derived from gateway.host (https://auth.<sovereignFQDN>). Reflector annotations auto-mirror the Secret to catalyst-system as soon as that namespace materialises (bootstrap-kit slot 13). The realm import already creates the catalyst-api-server client with serviceAccountsEnabled + impersonation/manage-users/view-users/query-users role mappings, so once Keycloak is Ready and the realm imports, the SA is fully provisioned and the K8s Secret carries a matching client secret.
No post-install Job, no Admin-API script, no out-of-band SealedSecret ceremony.
Cleanup: removes the dead 09a SealedSecret template (not in kustomization, never produced a working Secret).
Bumps:
- bp-keycloak chart 1.3.0 -> 1.3.1
- clusters/_template/bootstrap-kit/09-keycloak.yaml HelmRelease pin 1.3.0 -> 1.3.1
Existing per-Sovereign overlays (clusters/otech.omani.works/, clusters/omantel.omani.works/) intentionally remain on 1.3.0; fresh otechN provisioning consumes _template at provision time.
Will be verified live on otech49: handover end-to-end without ANY manual Secret creation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(keycloak): bump blueprint.yaml spec.version to match chart 1.3.1
TestBootstrapKit_BlueprintCardsHaveRequiredFields/keycloak asserts Chart.yaml.version == blueprint.yaml.spec.version. Forgot to bump blueprint.yaml in the previous commit.
Note: 8 other blueprints (cert-manager, flux, crossplane, sealed-secrets, spire, nats-jetstream, openbao, gitea) carry the same pre-existing mismatch and the test fails on main too. Out of scope for this PR; fixing the keycloak case to keep the new chart version internally consistent.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
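The three-step resolution order above, sketched in Python. Here `existing` stands for the already-decoded value a Helm `lookup` would find; in Helm the Secret data would still be base64-encoded. The function names are hypothetical. `secrets.choice` over ASCII letters and digits plays the role of Sprig's `randAlphaNum 32`.

```python
import secrets
import string
from typing import Optional

ALPHANUM = string.ascii_letters + string.digits


def rand_alpha_num(n: int) -> str:
    """Cryptographically random alphanumeric string (analogue of randAlphaNum)."""
    return "".join(secrets.choice(ALPHANUM) for _ in range(n))


def resolve_client_secret(operator_value: Optional[str], existing: Optional[str]) -> str:
    """First match wins: operator override, then existing Secret, then a fresh value."""
    if operator_value:
        return operator_value
    if existing:
        return existing  # keeps the secret stable across helm upgrade
    return rand_alpha_num(32)  # zero-touch first install


print(resolve_client_secret("from-overlay", "from-cluster"))  # from-overlay
print(resolve_client_secret(None, "from-cluster"))            # from-cluster
print(len(resolve_client_secret(None, None)))                 # 32
```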
737574b19a
feat(bp-keycloak): Phase-8b sovereign realm — token-exchange, catalyst-ui/api-server OIDC clients, SMTP, bump 1.2.2 → 1.3.0 (#604) (#609)
Adds the full Phase-8b identity surface required by the seamless handover flow:
- Token exchange enabled on sovereign realm (attributes.token-exchange: true)
- catalyst-ui public PKCE client: redirectUris + webOrigins keyed on console.<sovereignFQDN>, groups + requiredActions in ID token
- catalyst-api-server confidential service-account client: impersonation + manage-users + view-users + query-users roles on realm-management; client secret injected at provisioning time via .Values.catalystApiServerClientSecret
- WebAuthn (webauthn-register + webauthn-register-passwordless) registered as Required Action options on the realm
- UPDATE_PASSWORD set as defaultAction: true for new users
- smtpServer block: pre-handover default = contabo Stalwart relay; fully operator-configurable via .Values.smtp.* (Phase-8c-acceptable)
- required-actions client scope + oidc-usermodel-attribute-mapper for requiredActions claim in ID token (catalyst-ui first-login UX)
Architectural change: realm JSON moved from inline values.yaml (keycloak: subchart key, no parent scope access) to a parent-chart template, platform/keycloak/chart/templates/configmap-sovereign-realm.yaml, which can read .Values.sovereignFQDN and .Values.smtp.* for per-Sovereign interpolation. The upstream bitnami chart's keycloakConfigCli.existingConfigmap is pointed at this ConfigMap. Anti-duplication seam: configmap-sovereign-realm.yaml.
New values.yaml keys:
sovereignFQDN: "" (REQUIRED — per-Sovereign overlay supplies it)
sovereignRealm.enabled: true
catalystApiServerClientSecret: "" (REQUIRED — provisioner seals and injects)
smtp.host/port/from/user/password/ssl/starttls/auth
New bootstrap-kit file: 09a-keycloak-catalyst-api-secret.yaml — SealedSecret template for keycloak-catalyst-api-server-credentials in catalyst-system namespace; provisioner fills encryptedData fields at deploy time.
Bootstrap-kit refs bumped 1.2.x → 1.3.0 in _template, otech, omantel.
helm template clean with sovereignFQDN=otech.omani.works.
Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
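A sketch of how the catalyst-ui client entry can be derived from `sovereignFQDN`. It fails when the value is empty, matching its REQUIRED status. `pkce.code.challenge.method` is a Keycloak client attribute. The exact redirect path (`/*`) and the helper name are assumptions, not taken from the chart.

```python
def catalyst_ui_client(sovereign_fqdn: str) -> dict:
    """Public PKCE client keyed on console.<sovereignFQDN>."""
    if not sovereign_fqdn:
        raise ValueError("sovereignFQDN is REQUIRED")
    console = f"https://console.{sovereign_fqdn}"
    return {
        "clientId": "catalyst-ui",
        "publicClient": True,                 # browser app, holds no secret
        "standardFlowEnabled": True,          # authorization-code flow
        "redirectUris": [f"{console}/*"],     # path pattern is an assumption
        "webOrigins": [console],
        "attributes": {"pkce.code.challenge.method": "S256"},
    }


print(catalyst_ui_client("otech.omani.works")["webOrigins"])  # ['https://console.otech.omani.works']
```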
b1a25c4235
fix(bp-keycloak,bp-openbao): HTTPRoute backend wrong name + RBAC hook lifecycle bug (#598) (#600)
Bug A — bp-keycloak@1.2.2: the HTTPRoute backendService default was `<release>-keycloak` (giving `keycloak-keycloak` with releaseName=keycloak), but bitnami's fullname helper uses Release.Name alone when it already contains the chart name, so the Service is just `keycloak`. Changed the default to `.Release.Name`. The sovereign realm was already imported (config-cli ran successfully); only the Gateway routing was broken, returning HTTP 500.
Bug B — bp-openbao@1.2.6: the auto-unseal-rbac SA/Role/RoleBinding had `helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded`. The `hook-succeeded` clause caused Helm to delete the SA immediately after the weight-0 RBAC hook completed, before the weight-5 init Job Pod could mount its SA token and start. Removed all hook annotations from the RBAC resources so they are managed by the regular Helm release lifecycle (created before hooks, never deleted mid-install).
Bootstrap-kit refs bumped: bp-keycloak 1.2.0→1.2.2, bp-openbao 1.2.4→1.2.6.
Verified on otech22 (manual remediation): Keycloak sovereign realm OIDC endpoint returns valid JSON, openbao-0 Initialized=true Sealed=false.
Co-authored-by: alierenbaysal <alierenbaysal@openova.io>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
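Below is a Python transcription of the naming rule as bitnami's `common.names.fullname` helper implements it, to the best of my understanding: a fullnameOverride wins; otherwise the release name is used alone if it contains the chart name, else `<release>-<chart>`; the result is truncated to 63 characters. It shows why the old `<release>-keycloak` default pointed at a Service that does not exist.

```python
def bitnami_fullname(release_name: str, chart_name: str = "keycloak",
                     name_override: str = "", fullname_override: str = "") -> str:
    """Approximates `common.names.fullname` (trunc 63 | trimSuffix "-")."""
    if fullname_override:
        return fullname_override[:63].removesuffix("-")
    name = name_override or chart_name
    if name in release_name:  # sprig: contains $name .Release.Name
        return release_name[:63].removesuffix("-")
    return f"{release_name}-{name}"[:63].removesuffix("-")


for release in ("keycloak", "bp-keycloak", "auth"):
    old_default = f"{release}-keycloak"  # HTTPRoute backend in 1.2.0
    print(release, bitnami_fullname(release), old_default == bitnami_fullname(release))
# keycloak -> keycloak (old default wrong), bp-keycloak -> bp-keycloak (wrong),
# auth -> auth-keycloak (the only case the old default matched)
```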
83ec889f06
feat(platform): add global.imageRegistry to remaining bp-* charts + bp-catalyst-platform (PR 3/3, #560) (#580)
Charts bumped:
- bp-keycloak 1.2.0 -> 1.2.1 (subchart stub; per-component image.registry knobs documented)
- bp-crossplane 1.1.3 -> 1.1.4 (subchart stub)
- bp-crossplane-claims 1.1.0 -> 1.1.1 (global.kubectlImage added; kubectl Job image templated; Hetzner ubuntu-24.04 server images intentionally untouched)
- bp-velero 1.2.0 -> 1.2.1 (subchart stub)
- bp-kyverno 1.0.0 -> 1.0.1 (subchart stub; per-controller image.registry knobs documented)
- bp-trivy 1.0.0 -> 1.0.1 (subchart stub; both operator + scanner image.registry knobs documented)
- bp-grafana 1.0.0 -> 1.0.1 (subchart stub)
- bp-flux 1.1.3 -> 1.1.4 (subchart stub; per-controller image.repository knobs documented)
- bp-catalyst-platform 1.1.13 -> 1.1.14 (global.imageRegistry + images.{catalystApi,catalystUi,marketplaceApi,console,smeTag} added; all 14 Catalyst-authored image refs templated: catalyst-api, catalyst-ui, marketplace-api, console + 10 SME services)
Post-handover per-Sovereign overlays set global.imageRegistry to harbor.<sovereign-fqdn> so every container image pull routes through the Sovereign's own Harbor proxy_cache.
Closes (partial): issue #560 — all 23 bp-* charts now carry global.imageRegistry
Co-authored-by: alierenbaysal <alierenbaysal@openova.io>
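The effect of setting global.imageRegistry to harbor.<sovereign-fqdn> can be sketched as a registry-host swap on each image reference. This is a simplified illustration. It treats the first path segment as a registry when it contains "." or ":" or is "localhost", and it ignores Harbor proxy_cache project prefixes and Docker Hub's implicit `library/` namespace, neither of which the commit describes. The helper name and image names are hypothetical.

```python
def with_registry(image: str, registry: str) -> str:
    """Point an image reference at `registry`, dropping any existing registry host."""
    first, sep, rest = image.partition("/")
    has_registry = bool(sep) and ("." in first or ":" in first or first == "localhost")
    path = rest if has_registry else image
    return f"{registry}/{path}"


harbor = "harbor.example.org"  # stands in for harbor.<sovereign-fqdn>
print(with_registry("docker.io/bitnami/keycloak:26.0.0", harbor))  # harbor.example.org/bitnami/keycloak:26.0.0
print(with_registry("bitnami/keycloak:26.0.0", harbor))            # same result
print(with_registry("ghcr.io/example/app:1.0", harbor))            # harbor.example.org/example/app:1.0
```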
20b896070f
feat(bp-keycloak + infra): Sovereign K8s OIDC config for kubectl via per-Sovereign Keycloak realm (closes #326) (#448)
Wires the per-Sovereign K8s api-server's --oidc-* validator to the
per-Sovereign Keycloak realm so customer admins can authenticate
kubectl directly against their Sovereign — no static admin-kubeconfig
handoff, no rotated bearer-token exchange.
infra (cloud-init):
- Add 6 --kube-apiserver-arg=oidc-* flags to the k3s install line in
infra/hetzner/cloudinit-control-plane.tftpl. Issuer URL composed
from sovereign_fqdn (https://auth.\${sovereign_fqdn}/realms/sovereign)
per INVIOLABLE-PRINCIPLES #4 — never hardcoded. Username/groups
prefixes scope OIDC subjects under "oidc:" so RoleBindings reference
e.g. subjects[0].name=oidc:alice@org, distinct from local SAs/x509.
Canonical seam (anti-duplication rule, ADR-0001 §11.3):
- The bp-keycloak chart already bundles bitnami/keycloak's
keycloakConfigCli post-install Helm hook Job, which imports realms
declared under values.keycloak.keycloakConfigCli.configuration. We
enable the existing seam — no bespoke kubectl-exec realm-creation
script, no custom Admin-API call from catalyst-api.
bp-keycloak chart (1.1.2 → 1.2.0):
- Enable keycloakConfigCli + ship inline sovereign-realm.json with:
realm "sovereign" (invariant per Sovereign — Keycloak resolves the
issuer claim from the request hostname, so no per-FQDN realm
rename), default groups sovereign-admins/-ops/-viewers,
oidc-group-membership-mapper emitting "groups" claim, public OIDC
client "kubectl" with localhost:8000 + OOB redirect URIs
(kubectl-oidc-login defaults), publicClient=true (kubectl runs locally and
cannot safely hold a secret), PKCE S256 enforced.
- Bump version 1.1.2 → 1.2.0 (semver MINOR, additive shape).
- Bump bootstrap-kit slot 09 in _template/, omantel.omani.works/,
otech.omani.works/ to version: 1.2.0.
- New chart test tests/oidc-kubectl-client.sh (4 cases) — all green.
- Existing tests/observability-toggle.sh — still green.
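The realm described above can be pictured as a Keycloak realm representation. This is a minimal sketch of what an inline sovereign-realm.json might contain, not the shipped file. The exact redirect URIs, attaching the mapper directly to the client, and `full.path=false` are assumptions; the function name is hypothetical.

```python
import json


def sovereign_realm() -> dict:
    """Hypothetical sketch of a sovereign-realm.json payload."""
    groups_mapper = {
        "name": "groups",
        "protocol": "openid-connect",
        "protocolMapper": "oidc-group-membership-mapper",
        "config": {
            "claim.name": "groups",
            # Emit "sovereign-admins", not "/sovereign-admins", so the
            # apiserver sees oidc:sovereign-admins after prefixing.
            "full.path": "false",
            "id.token.claim": "true",
            "access.token.claim": "true",
            "userinfo.token.claim": "true",
        },
    }
    kubectl_client = {
        "clientId": "kubectl",
        "publicClient": True,  # kubectl runs locally, so it gets no secret
        "standardFlowEnabled": True,
        "redirectUris": ["http://localhost:8000", "urn:ietf:wg:oauth:2.0:oob"],
        "attributes": {"pkce.code.challenge.method": "S256"},
        "protocolMappers": [groups_mapper],
    }
    return {
        "realm": "sovereign",
        "enabled": True,
        "groups": [{"name": n} for n in ("sovereign-admins", "sovereign-ops", "sovereign-viewers")],
        "clients": [kubectl_client],
    }


if __name__ == "__main__":
    print(json.dumps(sovereign_realm(), indent=2))
```

Keeping `full.path` off matters because the RBAC binding in the runbook references the flat group name under the `oidc:` prefix.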
Documentation:
- Add §11 "kubectl OIDC for customer admins" runbook to
docs/omantel-handover-wbs.md with one-time workstation setup
(kubectl krew install oidc-login + config set-credentials),
sovereign-admin RBAC binding (oidc:sovereign-admins → cluster-admin),
and a 401-debugging table mapping common symptoms to
root causes.
- Carve #326 out of §7 "Out of scope" — it is shipped.
- Add §9 status row.
Validation:
- grep -c 'oidc-issuer-url' infra/hetzner/cloudinit-control-plane.tftpl
→ 2 (comment + the actual flag in the curl line)
- grep -c 'oidc-username-claim' → 2
- helm template platform/keycloak/chart → renders post-install
keycloak-config-cli Job + ConfigMap with kubectl client (3 hits
on grep "kubectl"; 1 hit on "clientId": "kubectl")
- bash scripts/check-vendor-coupling.sh → exit 0 (HARD-FAIL mode)
- 4/4 oidc-kubectl-client gates green; 3/3 observability-toggle
gates green
Out of scope (deferred to follow-up tickets):
- Per-Sovereign user provisioning UI (#322, #323)
- Refresh-token revocation on RoleBinding deletion (#324)
- provider-kubernetes Crossplane ProviderConfig per Sovereign (#321)
- omantel migration / Phase 8 live execution
NO catalyst-api or UI source files touched (those are #319/#322/#323
agents' territories per agent brief).
Co-authored-by: hatiyildiz <hatiyildiz@noreply.github.com>
a1bd550208
fix(charts): HTTPRoute templates skip-render on missing host (was failing default-values render) (#402)
Blueprint-release for #401 failed because HTTPRoute templates use
{{- fail }} when gateway.host is not set, which trips the chart default-values
render gate in CI. Switched 6 templates from 'fail loud' to 'skip render':
if .Values.gateway.host → emit HTTPRoute
else → emit nothing
The Gateway API admission already rejects HTTPRoute with empty hostnames,
so the loud-fail wasn't buying anything an operator wouldn't see at apply
time. Default-values render now produces zero HTTPRoute resources, which
is the correct shape for the upstream chart consumers that don't set
the Sovereign-only gateway block.
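The fail-loud to skip-render switch amounts to the following, sketched in Python rather than Go templates. It assumes the HTTPRoute attaches to the per-Sovereign `cilium-gateway` in `kube-system`. The `service`/`port` parameters and the function name are hypothetical stand-ins for each chart's own backend.

```python
from typing import Optional


def render_httproute(values: dict, service: str, port: int) -> Optional[dict]:
    """Mirror of `{{- if .Values.gateway.host }}`: no host, no HTTPRoute."""
    host = (values.get("gateway") or {}).get("host")
    if not host:
        # Default-values render: emit nothing instead of {{- fail }}.
        return None
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": service},
        "spec": {
            # Attach to the single per-Sovereign Gateway.
            "parentRefs": [{"name": "cilium-gateway", "namespace": "kube-system"}],
            "hostnames": [host],
            "rules": [{"backendRefs": [{"name": service, "port": port}]}],
        },
    }
```

An empty string behaves like an unset key, matching Go template truthiness.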
Files: keycloak, gitea, openbao, grafana, harbor, catalyst-platform.
Verified:
helm template t products/catalyst/chart/ → 0 HTTPRoutes (clean)
helm template t products/catalyst/chart/ --set ingress.gateway.enabled=true --set ingress.hosts.console.host=console.test --set ingress.hosts.api.host=api.test → 2 HTTPRoutes
Closes the blueprint-release failure on commit
abf01b6f21
feat(platform): Gateway API migration audit (#387) (#401)
Migrates every minimal-Sovereign-set blueprint chart from networking.k8s.io/v1.Ingress to gateway.networking.k8s.io/v1.HTTPRoute, replacing the legacy Traefik-on-Sovereigns assumption with the canonical Cilium + Envoy + Gateway API path per ADR-0001 §9.4 and the WBS §2 correction note (#388).
The single per-Sovereign Gateway is added as additional documents in the existing bootstrap-kit slot clusters/_template/bootstrap-kit/01-cilium.yaml (NOT a new top-level slot), since Cilium owns the GatewayClass. It includes:
- Certificate `sovereign-wildcard-tls` requesting `*.${SOVEREIGN_FQDN}` from `letsencrypt-dns01-prod` (cert-manager + #373 webhook)
- Gateway `cilium-gateway` in `kube-system` with HTTPS (443, TLS terminate) + HTTP (80) listeners, allowedRoutes.namespaces.from=All
Per-blueprint HTTPRoute templates (canonical seam: each wrapper chart's existing `templates/` directory):

| Blueprint | Host pattern | Backend port |
|---|---|---|
| bp-keycloak | auth.<sov> | 80 |
| bp-gitea | git.<sov> | 3000 |
| bp-openbao | bao.<sov> | 8200 |
| bp-grafana | grafana.<sov> | 80 |
| bp-harbor | registry.<sov> | 80 |
| bp-powerdns | pdns.<sov>/api (dual-mode) | 8081 |
| bp-catalyst-platform | console.<sov>, api.<sov> | 80, 8080 |

bp-powerdns supports both Ingress (contabo legacy) and HTTPRoute (Sovereign) simultaneously: the per-Sovereign overlay sets `api.gateway.enabled=true` while leaving `api.enabled=true`. The Ingress object is harmless on Cilium clusters with no Traefik. This preserves contabo's existing pdns.openova.io flow per ADR-0001 §9.4.
bp-harbor flips `expose.type` from `ingress` to `clusterIP` in platform/harbor/chart/values.yaml so the upstream chart no longer emits its own Ingress; the HTTPRoute is the sole HTTP exposure. TLS terminates at the Gateway (wildcard cert) rather than at per-host Certificates inside the chart.
bp-catalyst-platform's `templates/httproute.yaml` is NOT excluded by .helmignore (unlike templates/ingress.yaml + templates/ingress-console-tls.yaml, which remain contabo-only legacy demo infra). The contabo path keeps serving console.openova.io/sovereign via Traefik unchanged.
Bootstrap-kit slot updates (per-Sovereign hostname interpolation):
- 08-openbao.yaml → gateway.host: bao.${SOVEREIGN_FQDN}
- 09-keycloak.yaml → gateway.host: auth.${SOVEREIGN_FQDN}
- 10-gitea.yaml → gateway.host: gitea.${SOVEREIGN_FQDN}
- 11-powerdns.yaml → api.host: pdns.${SOVEREIGN_FQDN}, api.gateway.enabled: true
- 19-harbor.yaml → gateway.host: registry.${SOVEREIGN_FQDN}
- 25-grafana.yaml → gateway.host: grafana.${SOVEREIGN_FQDN}
Server-side dry-run validation against the live Cilium Gateway API CRDs on contabo: every HTTPRoute and the per-Sovereign Gateway + Certificate apply cleanly via `kubectl apply --dry-run=server`.
Contabo unaffected: clusters/contabo-mkt/* not modified. The legacy SME ingresses (console-nova, marketplace, admin, axon, talentmesh, stalwart, ...) continue to serve via Traefik as before. powerdns on contabo remains on the Ingress path (api.gateway.enabled defaults to false at the chart level).
Closes #387.
Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
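The per-slot hostname interpolation, and why a single `*.${SOVEREIGN_FQDN}` certificate covers every route, can be sketched as follows. Python's `string.Template` stands in for Flux postBuild substitution, which uses the same `${VAR}` syntax. The helper names are hypothetical. A wildcard only matches exactly one leftmost label, so every host must sit directly under the Sovereign FQDN.

```python
from string import Template

# Slot -> host template, as listed in the commit (before substitution).
SLOT_HOSTS = {
    "08-openbao.yaml": "bao.${SOVEREIGN_FQDN}",
    "09-keycloak.yaml": "auth.${SOVEREIGN_FQDN}",
    "10-gitea.yaml": "gitea.${SOVEREIGN_FQDN}",
    "11-powerdns.yaml": "pdns.${SOVEREIGN_FQDN}",
    "19-harbor.yaml": "registry.${SOVEREIGN_FQDN}",
    "25-grafana.yaml": "grafana.${SOVEREIGN_FQDN}",
}


def resolve_hosts(sovereign_fqdn: str) -> dict:
    """Substitute ${SOVEREIGN_FQDN} in every slot (stand-in for Flux postBuild)."""
    return {
        slot: Template(host).substitute(SOVEREIGN_FQDN=sovereign_fqdn)
        for slot, host in SLOT_HOSTS.items()
    }


def covered_by_wildcard(host: str, sovereign_fqdn: str) -> bool:
    """True if `*.<sovereign_fqdn>` matches host (exactly one extra label)."""
    suffix = "." + sovereign_fqdn
    if not host.endswith(suffix):
        return False
    label = host[: -len(suffix)]
    return bool(label) and "." not in label
```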
fa0e3a494b
fix(bp-keycloak): pin to current Bitnami tag (closes #191) (#198)
* fix(bp-keycloak): pin to current Bitnami Keycloak tag (closes #191)
Bitnami consolidated their tag scheme around 2025-09 (see https://github.com/bitnami/charts/issues/30852). The chart was pinned to upstream bitnami/keycloak Helm chart 24.7.1, whose default image tag `bitnami/keycloak:26.2.4-debian-12-r0` now returns 404 in the Docker Hub registry; installs hit ImagePullBackOff (verified on omantel).
Changes:
- Upstream Bitnami chart: 24.7.1 -> 25.2.0 (latest, appVersion 26.3.3)
- Override image.registry/image.repository for every Bitnami image used by the chart (keycloak app, keycloak-config-cli, postgresql, postgres-exporter, os-shell) to point at `bitnamilegacy/*`, where the historic debian-12 tags are preserved
- Replace deprecated `proxy: edge` with `proxyHeaders: "xforwarded"` (chart 25.x renamed the field; Catalyst fronts Keycloak with Cilium Gateway, which sets X-Forwarded-* headers)
- bp-keycloak chart version: 1.1.1 -> 1.1.2
Verification (registry HEAD via Bearer token):
  bitnami/keycloak:26.2.4-debian-12-r0 -> 404 (broken pin)
  bitnami/keycloak:26.3.3-debian-12-r0 -> 404 (registry move)
  bitnamilegacy/keycloak:26.3.3-debian-12-r0 -> 200
  bitnamilegacy/keycloak-config-cli:6.4.0-... -> 200
  bitnamilegacy/postgresql:17.6.0-debian-12-r0 -> 200
  bitnamilegacy/postgres-exporter:0.17.1-... -> 200
  bitnamilegacy/os-shell:12-debian-12-r50 -> 200
`helm template platform/keycloak/chart` renders cleanly; rendered images all resolve to the bitnamilegacy/* tags listed above.
Long-term follow-up (not blocking): bitnamilegacy is explicitly marked "no longer updated, may be removed in the future". Catalyst should either build its own Keycloak image or migrate to the Bitnami Secure Image (BSI/Photon) catalog when chart support catches up. Tracked in the bp-keycloak description block.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(bp-keycloak): bump blueprint.yaml version to match Chart.yaml 1.1.2
---------
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
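The commit does not say which tool produced the "registry HEAD via Bearer token" results. This is a minimal stdlib sketch of one way to script such a check against Docker Hub's anonymous token service and the Registry HTTP API v2. The helper names are hypothetical, and the multi-format `Accept` header is a choice made here so that multi-arch tags are not reported missing.

```python
import json
import urllib.error
import urllib.request

TOKEN_URL = ("https://auth.docker.io/token"
             "?service=registry.docker.io&scope=repository:{repo}:pull")
MANIFEST_URL = "https://registry-1.docker.io/v2/{repo}/manifests/{tag}"
# Accept single-arch and multi-arch manifests in Docker and OCI formats.
ACCEPT = ", ".join([
    "application/vnd.docker.distribution.manifest.list.v2+json",
    "application/vnd.docker.distribution.manifest.v2+json",
    "application/vnd.oci.image.index.v1+json",
    "application/vnd.oci.image.manifest.v1+json",
])


def split_image(image: str) -> tuple:
    """Split 'repo/name:tag' into ('repo/name', 'tag')."""
    repo, _, tag = image.rpartition(":")
    return repo, tag


def manifest_url(image: str) -> str:
    repo, tag = split_image(image)
    return MANIFEST_URL.format(repo=repo, tag=tag)


def head_status(image: str) -> int:
    """HTTP status of a manifest HEAD: 200 if the tag exists, 404 if not."""
    repo, _ = split_image(image)
    with urllib.request.urlopen(TOKEN_URL.format(repo=repo), timeout=30) as resp:
        token = json.load(resp)["token"]
    req = urllib.request.Request(
        manifest_url(image),
        method="HEAD",
        headers={"Authorization": f"Bearer {token}", "Accept": ACCEPT},
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
```

Running `head_status` over the image list above should reproduce the commit's table, which reports 404 for the `bitnami/*` pins and 200 for the `bitnamilegacy/*` ones.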
1f5c76def1
fix(platform): sync blueprint.yaml versions with Chart.yaml (#199)
* feat(ui): Playwright cosmetic + step-flow regression guards
15 regression guards in products/catalyst/bootstrap/ui/e2e/cosmetic-
guards.spec.ts that fail HARD when each user-flagged defect class
returns:
1. card height drift from canonical 108px
2. reserved right padding eating description width
3. logo tile drift from per-brand LOGO_SURFACE
4. invisible glyph (white-on-white) via luminance proxy
5. wizard step order Org/Topology/Provider/Credentials/Components/
Domain/Review
6. legacy "Choose Your Stack" / "Always Included" tab labels
7. Domain step reachable before Components
8. CPX32 not the recommended Hetzner SKU
9. per-region SKU dropdown shows wrong provider catalog
10. provision page is .html (static) not SPA route
11. legacy bubble/edge DAG SVG markup on provision page
12. admin sidebar drift from canonical core/console (w-56 + 7 labels)
13. AppDetail uses tablist instead of sectioned layout
14. job rows navigate to /job/<id> instead of expand-in-place
15. Phase 0 banners (Hetzner infra / Cluster bootstrap) on AdminPage
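The commit does not define its "luminance proxy" for guard 4. This is a hedged Python sketch (not the Playwright spec) of one common choice: the WCAG 2.x relative luminance and contrast ratio between a glyph colour and its tile. The `min_ratio` threshold and the function names are hypothetical.

```python
def _linear(channel: int) -> float:
    """sRGB channel (0-255) to linear light, per WCAG 2.x."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple) -> float:
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: tuple, bg: tuple) -> float:
    """(L_lighter + 0.05) / (L_darker + 0.05), ranging from 1 to 21."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def looks_invisible(fg: tuple, bg: tuple, min_ratio: float = 1.5) -> bool:
    """Hypothetical guard: flag near white-on-white glyphs."""
    return contrast_ratio(fg, bg) < min_ratio
```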
Each test prints a failure message naming the canonical reference,
the source-of-truth file, and the data-testid PR needed (if any) so
the implementing agent has a precise target. No .skip() — per
INVIOLABLE-PRINCIPLES #2, missing components fail loud.
CI: .github/workflows/cosmetic-guards.yaml runs the suite on every
PR that touches products/catalyst/bootstrap/ui/** or core/console/**.
Docs: docs/UI-REGRESSION-GUARDS.md maps each test to the user's
original complaint, the canonical reference, and the green/red
semantics (5 tests intentionally RED on main today — they stay red
until the companion-agent's UI work lands).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(platform): sync blueprint.yaml versions with Chart.yaml so manifest-validation passes
---------
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1ddd569789
fix(bp-*): observability toggles default false — break circular CRD dependency
Extends the v1.1.1 hardening that started with cilium / cert-manager /
crossplane to the remaining 8 bootstrap-kit + per-Sovereign Blueprints.
Every observability toggle in every Catalyst-curated Blueprint now ships
`false`/`null` by default; the operator opts in via a per-cluster values
overlay at clusters/<sovereign>/bootstrap-kit/* once
bp-kube-prometheus-stack reconciles.
Live failure mode that prompted this (omantel.omani.works 2026-04-29):
bp-cilium @ 1.1.0 defaulted hubble.relay/ui + prometheus.serviceMonitor
to true. The upstream Cilium 1.16.5 chart renders a
monitoring.coreos.com/v1 ServiceMonitor whose CRD ships with
kube-prometheus-stack — a tier-2 Application Blueprint that depends on
the bootstrap-kit (cilium first). Helm install fails on a fresh
Sovereign with "no matches for kind ServiceMonitor in version
monitoring.coreos.com/v1 — ensure CRDs are installed first" and every
downstream HelmRelease reports `dep is not ready`. The earlier
trustCRDsExist=true mitigation only suppresses Helm's render-time gate;
the apiserver still rejects the resource at install-time.
Per-Blueprint changes:
- bp-cilium: hubble.relay.enabled, hubble.ui.enabled → false;
hubble.metrics.enabled → null (this is the exact value that disables
the upstream metrics ServiceMonitor template branch — verified by
reading cilium 1.16.5's _hubble.tpl); hubble.metrics.serviceMonitor
.enabled → false. tests/observability-toggle.sh extended with Case 4
(default render produces no hubble-relay / hubble-ui Deployments).
- bp-flux: flux2.prometheus.podMonitor.create → false.
- bp-sealed-secrets: sealed-secrets.metrics.serviceMonitor.enabled
→ false (explicit lock; upstream already defaults false).
- bp-spire: spire.global.spire.recommendations.enabled +
recommendations.prometheus → false.
- bp-nats-jetstream: nats.promExporter.enabled +
promExporter.podMonitor.enabled → false.
- bp-openbao: openbao.injector.metrics.enabled +
openbao.serviceMonitor.enabled → false.
- bp-keycloak: keycloak.metrics.enabled + metrics.serviceMonitor.enabled
+ metrics.prometheusRule.enabled → false.
- bp-gitea: gitea.gitea.metrics.* and gitea.postgresql.metrics.*
serviceMonitor + prometheusRule → false.
- bp-powerdns: powerdns.serviceMonitor.enabled + powerdns.metrics.enabled
→ false (forward-compatibility guard; current upstream
pschichtel/powerdns 0.10.0 has no ServiceMonitor template, but a future
upstream bump cannot silently regress).
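The "default off" rule boils down to: a default render must contain no top-level `monitoring.coreos.com` objects, because those CRDs only arrive with bp-kube-prometheus-stack. This is a minimal Python sketch of that assertion over `helm template` output. It is a hypothetical stand-in for the shell tests, not their actual contents. Matching `apiVersion:`/`kind:` only at column 0 keeps nested fields from being counted.

```python
import re

# `helm template` separates YAML documents with `---` lines.
DOC_SEPARATOR = re.compile(r"^---[ \t]*$", re.MULTILINE)
API_VERSION = re.compile(r"^apiVersion:[ \t]*(\S+)", re.MULTILINE)
KIND = re.compile(r"^kind:[ \t]*(\S+)", re.MULTILINE)


def monitoring_objects(rendered: str) -> list:
    """Kinds of top-level monitoring.coreos.com objects in a render."""
    found = []
    for doc in DOC_SEPARATOR.split(rendered):
        api, kind = API_VERSION.search(doc), KIND.search(doc)
        if api and kind and api.group(1).startswith("monitoring.coreos.com/"):
            found.append(kind.group(1))
    return found
```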
Each chart ships a tests/observability-toggle.sh that asserts the rule
in three cases (default off / explicit on opt-in / explicit off) — runs
under blueprint-release.yaml's chart-test gate (added
43aff20254
feat(bp-*): convert all 11 bootstrap-kit charts to umbrella charts depending on upstream
Each platform/<name>/chart/Chart.yaml now declares the canonical upstream chart as a dependencies: entry. helm dependency build pulls the upstream payload into the OCI artifact at publish time, so a Flux Helm install of bp-<name>:1.1.0 actually installs the upstream Helm release alongside the Catalyst-curated overlays (NetworkPolicy, ServiceMonitor, ClusterIssuer, ExternalSecret) under templates/.
Pinned upstream chart versions per platform/<name>/blueprint.yaml:
- cilium 1.16.5 https://helm.cilium.io
- cert-manager v1.16.2 https://charts.jetstack.io
- flux 2.4.0 https://fluxcd-community.github.io/helm-charts
- crossplane 1.17.x https://charts.crossplane.io/stable
- sealed-secrets 2.16.x https://bitnami-labs.github.io/sealed-secrets
- spire ... https://spiffe.github.io/helm-charts-hardened
- nats-jetstream ... https://nats-io.github.io/k8s/helm/charts
- openbao ... https://openbao.github.io/openbao-helm
- keycloak ... https://charts.bitnami.com/bitnami
- gitea ... https://dl.gitea.com/charts
- catalyst-platform: umbrella over the 10 leaf bp-* charts via helm dependency
values.yaml in each chart adopts the umbrella convention: a catalystBlueprint metadata block (provenance + version) at top level, with upstream subchart values namespaced under the dependency name.
cert-manager specifically: clusterissuer-letsencrypt-dns01.yaml gets the helm.sh/hook: post-install,post-upgrade annotation so it applies AFTER the cert-manager controllers are running and the CRDs are registered (the previous hollow-chart shape ran the ClusterIssuer at install time, when the CRDs didn't exist yet, which was the omantel cluster's exact failure mode).
Wrapper chart version bumped 1.0.0 → 1.1.0 across the board (umbrella conversion is a meaningful structural revision). Cluster manifests in clusters/_template/bootstrap-kit/ AND clusters/omantel.omani.works/bootstrap-kit/ updated to reference 1.1.0.
The blueprint-release.yaml workflow's helm package step needs an explicit helm dependency build before push so the upstream subchart bytes ship inside the OCI artifact. That CI change is a follow-up commit on this same branch (separate file scope).
62d9c7d936
fix(charts): drop dependencies block — wrappers carry values overlay only
The first 2 blueprint-release CI runs failed on `helm package` with containerd permission errors because the wrapper Chart.yaml's `dependencies:` block triggered helm to pull the upstream charts via OCI/containerd at package time, which the GitHub Actions runner blocks.
Architectural fix: each Catalyst Blueprint wrapper carries the values overlay + metadata only. The bootstrap installer reads the upstream chart reference from the wrapper's values.yaml `catalystBlueprint.upstream.{chart,version,repo}` metadata block, points `helm install` at the upstream chart's repo, and overlays our values.
This keeps:
- blueprint-release CI lightweight (no upstream pulls during package; helm package now works without containerd)
- the "bp-<name> wrapper does NOT drift from upstream" property (we ship the overlay, not a fork)
- the single Blueprint contract from BLUEPRINT-AUTHORING §1 (a wrapper is still a Catalyst-curated Helm chart published as bp-<name>:<semver>)
Changes:
- 11 platform/<name>/chart/Chart.yaml: removed dependencies block. Each is now a plain Helm chart with no remote pulls during package.
- 11 platform/<name>/chart/values.yaml: prepended catalystBlueprint.upstream.{chart,version,repo} metadata block at the top. Bootstrap installer parses it to know which upstream chart to install with these values.
- products/catalyst/bootstrap/api/internal/bootstrap/bootstrap.go: installCilium now does `helm repo add cilium https://helm.cilium.io --force-update` then `helm install cilium cilium/cilium --version 1.16.5 --values -` (the cilium/cilium upstream chart, with our overlay values piped from values.yaml). Same pattern needs propagating to the other 10 install functions in a follow-up.
After this commit, blueprint-release CI should green-build all 11 wrappers (helm package now works without containerd access since there's nothing to pull). The bootstrap installer's actual `helm install` calls in production reach upstream chart repos via the runtime k3s cluster's pod network, which has full network access.
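The installer pattern described for installCilium generalises to all 11 wrappers. This is a minimal Python sketch (the real installer is Go) that turns a wrapper's `catalystBlueprint.upstream.{chart,version,repo}` block into the two Helm invocations. Reusing the release name as the repo alias is a hypothetical convention, and so is the function name.

```python
def helm_commands(release: str, upstream: dict) -> list:
    """Build the repo-add + install argv pair for one wrapper.

    `upstream` mirrors catalystBlueprint.upstream.{chart,version,repo}.
    """
    alias = release  # hypothetical: repo alias reuses the release name
    return [
        ["helm", "repo", "add", alias, upstream["repo"], "--force-update"],
        [
            "helm", "install", release, f"{alias}/{upstream['chart']}",
            "--version", upstream["version"],
            "--values", "-",  # wrapper's overlay values.yaml piped on stdin
        ],
    ]
```

The second command would be run with the wrapper's values.yaml on stdin, e.g. `subprocess.run(cmd, input=values_yaml, text=True, check=True)`.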
441ebaebb8
fix(charts): pin upstream chart versions/names to ones that exist in their repos
The first Blueprint Release CI run (commit
8c0f76640c
feat(charts): G2 wrapper Helm charts for 11 bootstrap-kit components + blueprint-release CI
Per docs/PROVISIONING-PLAN.md and tickets [F] chart. Adds Catalyst-curated wrapper Helm charts at platform/<name>/chart/ for every component the bootstrap-kit installer (introduced in commit
70fea3ab8f
docs(pass-34): banned-term TENANT sweep + keycloak hostname drift
GLOSSARY's banned term "tenant" survived in Configuration tables and Flux
postBuild substitutions across product READMEs as ${TENANT} (uppercase
ENV var). Prior banned-term greps searched lowercase `tenant` so the
ALL-CAPS form slipped through.
Product README fixes:
- products/cortex: TENANT/DOMAIN → ORGANIZATION/SOVEREIGN_DOMAIN, plus
two DNS placeholder fixes for llm-gateway and chat URLs (same shape
Pass 25/31 fixed elsewhere).
- products/fingate: 6 instances (Flux substitution, Configuration table,
4 URL templates) renamed. URL shape api.openbanking.<org>.<sov-dom>
flagged as 4-segment FQDN that doesn't match NAMING §5.1 or §5.2 —
deferred to a deeper architectural pass.
- products/fabric: Configuration table row renamed.
Component README:
- platform/keycloak: shared-sovereign hostname auth.<sovereign-domain>
and per-organization auth.<org>.<sovereign-domain> both missing
<location-code> per NAMING §5.1. Fixed.
platform/librechat ${TENANT_ID} preserved — that's Microsoft Azure AD
tenant-ID (external technology, exempted by GLOSSARY).
Validation log Pass 34 entry includes meta-note: always run a global
grep for the surfaced drift category before closing a pass, to avoid
the asymmetric-drift problem Pass 25 warned against.
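The meta-note's lesson (sweep for the banned term in every case, not just lowercase) can be sketched as a case-insensitive scan. The exemption below is simplified: it strips the `TENANT_ID` token everywhere. GLOSSARY exempts it only as the Azure AD tenant-ID used by platform/librechat. The function name is hypothetical.

```python
import re

# Banned stem in any case: tenant, Tenant, TENANT, ${TENANT}.
BANNED = re.compile(r"\btenant", re.IGNORECASE)
# Simplified exemption for the external Azure AD identifier.
EXEMPT = re.compile(r"\bTENANT_ID\b")


def banned_hits(text: str) -> list:
    """(line number, matched text) for every banned-term occurrence."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in BANNED.finditer(EXEMPT.sub("", line)):
            hits.append((lineno, match.group(0)))
    return hits
```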
b467dc3f3b
docs(pass-18): NAMING DR-as-env_type misexample + Keycloak deployment topology
Pass 18 — drift-detection on NAMING-CONVENTION + platform/keycloak.
Two real findings.
NAMING-CONVENTION §11.1:
- The example list of Catalyst Environments included `bankdhofar-dr`
— but `dr` is NOT a valid env_type. Canonical values per §2.4 are
prod / stg / uat / dev / poc. DR is a Placement mode
(active-active / active-hotstandby across regions inside the
*-prod Environment), not a separate Environment.
- Replaced `bankdhofar-dr` with `bankdhofar-uat` and added an
explicit "DR is a Placement, not an Env Type" note.
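The env_type rule can be sketched as a name check. It assumes Environment names follow an `<org>-<env_type>` shape, as the `bankdhofar-uat` example suggests; that shape and the function name are assumptions, not the NAMING-CONVENTION text.

```python
# Canonical env_type values per NAMING-CONVENTION §2.4, as quoted above.
ENV_TYPES = frozenset({"prod", "stg", "uat", "dev", "poc"})


def env_type(environment: str) -> str:
    """Return the env_type suffix of an `<org>-<env_type>` Environment name."""
    org, sep, suffix = environment.rpartition("-")
    if not sep or not org:
        raise ValueError(f"{environment!r} is not of the form <org>-<env_type>")
    if suffix not in ENV_TYPES:
        # e.g. "dr": DR is a Placement mode inside *-prod, not an Env Type.
        raise ValueError(f"{suffix!r} is not a valid env_type: {sorted(ENV_TYPES)}")
    return suffix
```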
platform/keycloak/README.md:
- Keycloak Deployment YAML example used `namespace: open-banking`
with 2 replicas — Fingate-specific narrative that contradicted
the per-Org / per-Sovereign topology stated in the banner.
Rewrote with two side-by-side examples:
* shared-sovereign (3 HA replicas, catalyst-keycloak namespace,
CNPG-backed)
* per-organization (1 replica in <org> namespace, optional
embedded DB for smallest SME tier)
- HA section was a single set of claims (2+ replicas, CNPG, Infinispan)
that only matched corporate. Now branches on topology — corporate
gets HA + Infinispan, SME gets single replica with restart-on-
deploy as acceptable for tier SLAs.
Same kind of drift Pass 17 caught in Harbor: banner says one thing,
body still describes the older model. Both fixed.
VALIDATION-LOG: Pass 18 entry added.
Refs #37
14ed84de41
docs(pass-8): role-in-Catalyst banners + dead-link fix in component READMEs
Pass 8: line-by-line read of platform/cnpg, platform/strimzi, platform/k8gb, platform/keycloak, platform/cert-manager, platform/cilium.
CNPG and Strimzi: read in full and confirmed clean. They correctly position themselves as Application Blueprints and don't drift from the canonical model. CNPG's `<org>-postgres-dr` cluster name (Application-tier database role) is acceptable per NAMING-CONVENTION §1.3 (which only forbids primary/dr in K8s host-cluster names, not in Application-internal CRD names).
Four READMEs updated:
k8gb:
- Header reframed: per-host-cluster infrastructure pointer to PLATFORM-TECH-STACK §3.1 and SRE §2.4 split-brain protection.
- Removed dead link to ../failover-controller/docs/ADR-FAILOVER-CONTROLLER.md (the failover-controller folder has no docs/); replaced with a link to that component's README + SRE §2.4.
keycloak:
- Header reframed from "FAPI Authorization Server for Open Banking" (narrow) to "User identity for Catalyst Sovereigns" (broad). Keycloak handles ALL user identity in Catalyst, not just FAPI.
- Added per-Org / per-Sovereign topology callout matching SECURITY §6. Clarified that "Multi-tenant TPP" refers to PSD2 Third Party Providers, not Catalyst's Organization-level multi-tenancy.
- FAPI features kept since Keycloak still serves Fingate as the FAPI Authorization Server.
cert-manager:
- Header reframed as per-host-cluster infrastructure with pointer to PLATFORM-TECH-STACK §3.3.
cilium:
- Header reframed as per-host-cluster infrastructure with pointer to PLATFORM-TECH-STACK §3.1, including the install-first note (CNI must come before any other workload during Phase 0).
VALIDATION-LOG: Pass 8 entry added.
Refs #37
c9d04a53b4
refactor: flatten platform/ structure (41 components)
Remove hierarchical grouping (networking/, security/, etc.) and use a flat structure for all 41 platform components.
Changes:
- All components now directly under platform/ (no subfolders)
- AI Hub components moved from meta-platforms/ai-hub/components/ to platform/
- Open Banking components (lago, openmeter) moved to platform/
- meta-platforms/ now only contains README files that reference platform/
- Open Banking custom services remain in meta-platforms/open-banking/services/
Structure:
- platform/ (41 components, flat)
- meta-platforms/ai-hub/ (README only, references platform/)
- meta-platforms/open-banking/ (README + 6 custom services)
All documentation links updated.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>