Per the CLAUDE.md MIRROR-EVERYTHING inviolable rule: every chart-hook
image reference (pre/post-install Jobs, helper Pods) must use the
explicit Harbor proxy-cache form. Fix#158's bitnami → bitnamilegacy
swap was a band-aid; the architecturally correct fix is to defeat
upstream-deletion blast radius entirely by routing through Harbor.
The node-level containerd mirror in
infra/hetzner/cloudinit-control-plane.tftpl (line 706) already
redirects docker.io/* → harbor.openova.io/proxy-dockerhub/*
implicitly, but implicit routing:
- Hides the routing from SBOM scans
- Bypasses the Kyverno harbor-proxy-pull ClusterPolicy
- Means a chart audit (`grep docker.io`) misses a real dependency
- Was the proximate cause of prov #27 wedging when Bitnami deleted
  docker.io/bitnami/kubectl:1.30.4 (Fix#158 had to chase the
  deletion mid-flight instead of being insulated by Harbor cache)
19 chart-hook `image:` refs + 5 chart values.yaml `repository:`
defaults now carry the explicit harbor.openova.io/proxy-dockerhub
prefix.
Application/subchart images (keycloak, postgresql, mongodb in
keycloak+litmus subcharts) are intentionally out of scope for this
PR — those go through the node-level containerd mirror still.
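For illustration, the explicit form could be derived and audited
roughly as follows. This is a minimal Python sketch, not the tooling
used in this PR: the helper names (`to_harbor_ref`,
`unproxied_images`) are hypothetical, and only refs that already spell
out `docker.io/` are rewritten.

```python
import re

HARBOR_PREFIX = "harbor.openova.io/proxy-dockerhub/"
DOCKERHUB_PREFIX = "docker.io/"

# Matches `image: <ref>` and `- image: <ref>` lines, quoted or not.
IMAGE_LINE = re.compile(r"^\s*(?:-\s*)?image:\s*[\"']?([^\"'\s]+)")


def to_harbor_ref(image):
    """Rewrite an explicit docker.io/ ref to the Harbor proxy-cache form."""
    if image.startswith(HARBOR_PREFIX):
        return image  # already explicit, leave untouched
    if image.startswith(DOCKERHUB_PREFIX):
        return HARBOR_PREFIX + image[len(DOCKERHUB_PREFIX):]
    return image  # other registries are out of scope for this sketch


def unproxied_images(rendered_yaml):
    """List image refs in rendered YAML that lack the Harbor prefix."""
    found = []
    for line in rendered_yaml.splitlines():
        match = IMAGE_LINE.match(line)
        if match and not match.group(1).startswith(HARBOR_PREFIX):
            found.append(match.group(1))
    return found
```

An audit along these lines would flag any hook image that still
depends on the implicit containerd mirror.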
Affected blueprints + chart version bumps:
  bp-cert-manager              1.2.1   -> 1.2.2
  bp-external-secrets-stores   1.0.4   -> 1.0.5
  bp-crossplane-claims         1.1.4   -> 1.1.5
  bp-flux                      1.2.1   -> 1.2.2
  bp-guacamole                 0.1.16  -> 0.1.17
  bp-self-sovereign-cutover    0.1.28  -> 0.1.29
  bp-k8s-ws-proxy              0.1.9   -> 0.1.10
  bp-harbor                    1.2.15  -> 1.2.16
  bp-gitea                     1.2.5   -> 1.2.6
  bp-newapi                    1.4.5   -> 1.4.6
  bp-wordpress-tenant          0.2.0   -> 0.2.1
  catalyst-platform            1.4.138 -> 1.4.139
Co-authored-by: e3mrah <1234567+e3mrah@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
REGRESSION
----------
Fix#78 (PR #1313) added the hmac-bootstrap pre-install Job to provision
the `k8s-ws-proxy-hmac` Secret before the DaemonSet rolls. The Job's
hook-weight was set to "-10" while its ServiceAccount, Role, and
RoleBinding were left at "0".
Per Helm docs (https://helm.sh/docs/topics/charts_hooks/#hook-weights):
> Hook weights can be positive or negative numbers but must be
> represented as strings. When Helm starts the execution cycle of
> hooks of a particular Kind it will sort those hooks in ASCENDING
> order.
So within the same hook phase (`pre-install,pre-upgrade`), LOWER
weights run FIRST. Fix#78's weights inverted the dependency graph:
the Job (-10) was applied BEFORE its SA (0), surfacing on prov #8 as:
  pods "k8s-ws-proxy-hmac-bootstrap-" is forbidden: error looking up
  service account catalyst-system/k8s-ws-proxy-hmac-bootstrap:
  serviceaccount "k8s-ws-proxy-hmac-bootstrap" not found
The Job sat in CrashLoopBackOff, the HelmRelease wedged Stalled=True,
bp-guacamole's `dependsOn: bp-k8s-ws-proxy` blocked its install, and
the bounded-provision-cycle stopped at prov #8.
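The inversion can be reproduced with a small model of the ascending
sort. This is a simplified Python sketch for illustration only: it
orders purely by weight and does not reproduce Helm's tie-breaking for
equal weights.

```python
# Hook weights as shipped by Fix#78 (strings, as Helm requires).
FIX78_WEIGHTS = {
    "ServiceAccount": "0",
    "Role": "0",
    "RoleBinding": "0",
    "Job": "-10",
}


def apply_order(weights):
    """Return resource kinds in ascending hook-weight order.

    Simplified model: ties keep their input order; Helm's real
    tie-breaking is not reproduced here.
    """
    return sorted(weights, key=lambda kind: int(weights[kind]))


print(apply_order(FIX78_WEIGHTS))
# -> ['Job', 'ServiceAccount', 'Role', 'RoleBinding']
```

The Job sorts first, so its pod is requested before the ServiceAccount
it references exists.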
FIX
---
Re-weight the four hmac-bootstrap resources so the dependency graph
materialises in apiserver-apply order regardless of YAML order in the
template:
  ServiceAccount: helm.sh/hook-weight: "-20"  (first)
  Role:           helm.sh/hook-weight: "-15"
  RoleBinding:    helm.sh/hook-weight: "-15"
  Job:            helm.sh/hook-weight: "-10"  (last; Fix#78 invariant)
The Job weight is preserved at -10 to keep Fix#78's render-test gate
3a stable (`grep '"helm.sh/hook-weight": "-10"'`).
VERIFICATION
------------
chart/tests/render.sh adds gate 3c ("gate-9") that walks the rendered
YAML, captures each hmac-bootstrap resource's hook-weight, and asserts
the strict ordering invariant: SA < Role <= RoleBinding < Job AND
Job == -10. Without the fix, gate-9 fails. With the fix:
PASS: gate-9 hook-weight ordering — SA=-20 < Role=-15 / RoleBinding=-15 < Job=-10
All 6 render gates pass (helm 3.20.2):
PASS: default-OFF = 0 resources
PASS: empty image.tag fails fast
PASS: full-ON = 9 resources
PASS: hmac-bootstrap Job + RBAC rendered with correct hook-weight + split verbs
PASS: gate-9 hook-weight ordering — SA=-20 < Role=-15 / RoleBinding=-15 < Job=-10
PASS: canonical workload name 'k8s-ws-proxy' on 6 resources (release-name-independent)
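The gate-9 invariant can be sketched in Python as below. This is not
the render.sh implementation: the function names are hypothetical, the
parsing is a stdlib-only approximation (no YAML library), and it
assumes one hooked resource per kind in the rendered output.

```python
import re


def hook_weights(rendered):
    """Map each resource kind to its integer helm.sh/hook-weight."""
    weights = {}
    for doc in re.split(r"^---\s*$", rendered, flags=re.M):
        kind = re.search(r"^kind:\s*(\S+)", doc, flags=re.M)
        weight = re.search(
            r"helm\.sh/hook-weight[\"']?:\s*[\"']?(-?\d+)", doc
        )
        if kind and weight:
            weights[kind.group(1)] = int(weight.group(1))
    return weights


def ordering_ok(w):
    """SA < Role <= RoleBinding < Job AND Job == -10."""
    return (
        w["ServiceAccount"] < w["Role"] <= w["RoleBinding"] < w["Job"]
        and w["Job"] == -10
    )


def render(sa, role, rb, job):
    """Build a tiny stand-in for rendered chart output (test helper)."""
    pairs = [("ServiceAccount", sa), ("Role", role),
             ("RoleBinding", rb), ("Job", job)]
    return "\n---\n".join(
        'kind: %s\nmetadata:\n  annotations:\n'
        '    "helm.sh/hook-weight": "%d"' % (kind, weight)
        for kind, weight in pairs
    )
```

With the re-weighted values (-20, -15, -15, -10) `ordering_ok` holds;
with Fix#78's values (0, 0, 0, -10) it does not.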
CHART + SLOT BUMPS
------------------
- Chart.yaml: 0.1.7 -> 0.1.8 (CI promote will auto-bump to 0.1.9 with
the new image SHA on merge per build-k8s-ws-proxy.yaml).
- clusters/_template/bootstrap-kit/51-bp-k8s-ws-proxy.yaml: 0.1.6 -> 0.1.9
- clusters/omantel.omani.works/bootstrap-kit/51-bp-k8s-ws-proxy.yaml: 0.1.6 -> 0.1.9
LESSON FOR FUTURE HOOK-ORDERING WORK
------------------------------------
Whenever a hook-phase Job depends on hook-phase RBAC (which it always
does in this seam), the RBAC weights MUST be numerically LESS than the
Job's. Same-phase, same-weight ordering must not be relied on (Helm
sorts by weight, then by resource kind, then by name, never by
reference graph).
Always: SA most-negative, Role/RoleBinding intermediate, Job last.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Audit prov #7 Gap E: all three k8s-ws-proxy DaemonSet pods stuck
ContainerCreating for 25+ min on every fresh Sovereign with
"MountVolume.SetUp failed for volume hmac-secret: secret
k8s-ws-proxy-hmac not found".
Root cause: the chart's daemonset.yaml mounts a Secret that neither
the chart's own templates, nor any sibling chart, nor the
bootstrap-kit ever creates. The fail-fast in _helpers.tpl only checks
for an empty .name value; with the default name set, render proceeds
and the DS rolls before anything provisions the Secret.
Fix: Helm pre-install/pre-upgrade Job (hook-weight: -10) that:
- Idempotency-probes the target Secret (200 = skip, preserving an
  operator-pre-provisioned SealedSecret OR a prior-install key);
- Generates 32 random bytes from /dev/urandom (never echoed: piped
  directly through base64 into a JSON body, then into the apiserver
  POST);
- POSTs the Secret to the catalyst-system namespace; treats 409
  as success (race-with-operator path is benign).
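The probe/generate/POST flow above can be modelled as follows. This is
a Python sketch of the control flow only; the real Job is a shell
script driving curl. The `http_get` / `http_post` callables and the
`hmac-key` data key are assumptions made for illustration, not names
taken from the chart.

```python
import base64
import json
import os

NAMESPACE = "catalyst-system"
NAME = "k8s-ws-proxy-hmac"


def ensure_hmac_secret(http_get, http_post):
    """Create the Secret only if it is absent; never rotate an existing key.

    http_get(path) and http_post(path, body) are injected callables that
    return the apiserver's HTTP status code (hypothetical interface).
    """
    base = "/api/v1/namespaces/%s/secrets" % NAMESPACE
    if http_get("%s/%s" % (base, NAME)) == 200:
        return "skipped"  # existing key is preserved
    # 32 random bytes go straight into the request body, never logged.
    key_b64 = base64.b64encode(os.urandom(32)).decode("ascii")
    body = json.dumps({
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "Opaque",
        "metadata": {"name": NAME, "namespace": NAMESPACE},
        "data": {"hmac-key": key_b64},  # data key name is an assumption
    })
    status = http_post(base, body)
    if status in (200, 201):
        return "created"
    if status == 409:
        return "already-exists"  # lost a benign race with the operator
    raise RuntimeError("unexpected apiserver status %d" % status)
```

Injecting the HTTP calls keeps the sketch testable without a cluster.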
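RBAC split per memory/feedback_rbac_create_no_resourcenames.md:
`create` is in its own rule WITHOUT resourceNames; `get` is a
separate rule WITH resourceNames. Kubernetes RBAC cannot restrict
`create` by resourceNames (the object name is not known at
authorization time), so a combined rule never grants `create`. That
combined-rule pattern was the silent root cause of 6+ bp-openbao
chart iterations.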
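Why the split matters can be shown with a deliberately simplified
model of RBAC rule matching. This Python sketch only considers verbs
and resourceNames (apiGroups and resources are omitted), and the rule
dictionaries are illustrative rather than copied from the chart.

```python
SECRET = "k8s-ws-proxy-hmac"

# Combined rule: create and get share one resourceNames restriction.
COMBINED = [{"verbs": ["get", "create"], "resourceNames": [SECRET]}]

# Split rules: create is unrestricted by name, get is name-scoped.
SPLIT = [
    {"verbs": ["create"]},
    {"verbs": ["get"], "resourceNames": [SECRET]},
]


def rule_allows(rule, verb, name=""):
    """A rule with resourceNames only matches requests naming one of them."""
    if verb not in rule["verbs"]:
        return False
    names = rule.get("resourceNames", [])
    return not names or name in names


def allowed(rules, verb, name=""):
    """True if any rule grants the request."""
    return any(rule_allows(rule, verb, name) for rule in rules)


# A create is a POST to the collection, so the request carries no name.
print(allowed(COMBINED, "create"))  # -> False
print(allowed(SPLIT, "create"))     # -> True
```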
Canonical seam: modelled after platform/gitea/chart/templates/
database-secret-sync-job.yaml — curlimages/curl image (alpine + sh
+ /dev/urandom), in-cluster SA token, hook-weight 0 for SA/Role/
RoleBinding (must precede the -10 Job's API calls).
Idempotency proof: render test 3a verifies the pre-install hook is
weight -10 (so the Secret exists BEFORE the DS rolls); the script's
GET-probe step ensures upgrade re-runs preserve the existing key
(rotating it would invalidate every in-flight catalyst-api
signature). Operator rotation = `kubectl delete secret
k8s-ws-proxy-hmac -n catalyst-system && flux reconcile helmrelease
bp-k8s-ws-proxy -n flux-system`.
Render-test smoke (helm 3.20.2): all 5 cases PASS (default-OFF=0,
empty-tag fail-fast, full-ON=9 resources, hook-weight + RBAC
verbs split, canonical workload name release-independent).
Chart bumped 0.1.5 -> 0.1.6. CI promote will auto-bump to 0.1.7
with the new image SHA on merge; bootstrap-kit slot pins should be
lifted to 0.1.7 once the CI promote runs.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Blueprint Release run 25612688419 caught a stale-tag assertion in
platform/k8s-ws-proxy/chart/tests/render.sh test #2. After the
build-k8s-ws-proxy.yaml promote job auto-bumped values.yaml
`image.tag` to a real SHA, the test's `--set k8sWsProxy.enabled=true`
without explicitly clearing the tag rendered fine and tripped
"FAIL: empty tag did not abort render".
The fail-fast contract (empty tag → render fail per _helpers.tpl) is
unchanged; the test now explicitly passes `--set k8sWsProxy.image.tag=`
to exercise the operator-override path. Mirrors the same pattern already
applied to the bp-guacamole render test in the parent PR.
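The corrected assertion amounts to the check below. This is a Python
sketch rather than the render.sh code: the release name `render-test`
and the injectable `run` parameter are assumptions for illustration,
and the helm binary is assumed to be on PATH when run for real.

```python
import subprocess

CHART_DIR = "platform/k8s-ws-proxy/chart"


def empty_tag_aborts(run=subprocess.run):
    """True when rendering with an explicitly empty image.tag fails."""
    cmd = [
        "helm", "template", "render-test", CHART_DIR,
        "--set", "k8sWsProxy.enabled=true",
        # Clear the tag explicitly so a promoted SHA in values.yaml
        # cannot mask the fail-fast path.
        "--set", "k8sWsProxy.image.tag=",
    ]
    result = run(cmd, capture_output=True, text=True)
    return result.returncode != 0
```

Without the second `--set`, the outcome depends on whatever tag the
promote job last wrote into values.yaml.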
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(ci,charts,api): qa-loop iter-7 Fix#39 — bp-guacamole + bp-k8s-ws-proxy bootstrap-kit slots
Closes the scope-narrow confessed by Fix#36: bp-guacamole +
bp-k8s-ws-proxy chart skeletons existed at platform/* but lacked CI
image-build workflows + bootstrap-kit slots, so TC-228 / TC-230 /
TC-236 / TC-237 / TC-245 / TC-246 stayed FAIL with "deployment
NotFound".
CI workflows
------------
- .github/workflows/build-k8s-ws-proxy.yaml: Buildx + cosign keyless
sign + SBOM attestation flow on core/cmd/k8s-ws-proxy/**, then bumps
platform/k8s-ws-proxy/chart/values.yaml image.tag + Chart.yaml
patch version + dispatches blueprint-release.
- .github/workflows/build-bp-guacamole.yaml: mirrors upstream Apache
Guacamole 1.5.5 to GHCR (so every Sovereign pulls from a registry
we own — no Docker Hub rate limits, no upstream availability risk),
bumps values.yaml.image.{repository,tag} + Chart.yaml + dispatches
blueprint-release.
Charts (target-state)
---------------------
- bp-k8s-ws-proxy v0.1.1: canonical workload name `k8s-ws-proxy`
regardless of release name (DaemonSet + Service + ClusterRole +
ClusterRoleBinding + ServiceAccount all named `k8s-ws-proxy` so
matrix can address them by canonical short name).
- bp-guacamole v0.1.1: canonical short resource names (`guacd`,
`guacamole-server`, `guacamole-recordings`); GHCR-mirrored upstream
images; realm-patch ConfigMap correctly lands in `keycloak`
namespace (was: realm-name, which would have failed silently on
every Sovereign); `realmConfig.namespace` override surface added.
- Both charts: `catalyst.openova.io/smoke-render-mode: default-off`
annotation so blueprint-release smoke-render gate honors the
default-OFF render shape.
Bootstrap-kit slots
-------------------
- clusters/_template/bootstrap-kit/36-bp-k8s-ws-proxy.yaml +
37-bp-guacamole.yaml: dependsOn-ordered (proxy → gateway), pinned
to 0.1.1, default-OFF gate flipped via slot values, install/upgrade
disableWait per session-2026-04-30 architectural decision.
- clusters/omantel.omani.works/bootstrap-kit/* slots mirror the same
shape with omantel.biz hostnames matching the live HTTPRoutes on
console.omantel.biz / auth.omantel.biz.
API: shells/issue handler (matrix-canonical URL surface)
--------------------------------------------------------
- POST /api/v1/sovereigns/{id}/shells/issue?namespace=&pod=&container=
alias for the existing
POST /api/v1/sovereigns/{id}/k8s/exec/{ns}/{pod}/{container}/session
with matrix-canonical response fields (`sessionId`, `guacamoleUrl`,
`recordingPath`). Same business logic, same audit surface
(`guacamole-session-opened`), same RBAC gate (tier-developer or
higher). 6 test cases, all PASS under -race.
TCs that flip PASS in iter-8
-----------------------------
- TC-228: POST /shells/issue → sessionId + guacamoleUrl + recordingPath
- TC-230: kubectl get deploy guacd guacamole-server -n catalyst-system
- TC-236: kubectl get ds k8s-ws-proxy -n catalyst-system
- TC-237: kubectl logs ds/k8s-ws-proxy → "listening"
- TC-245: viewer-cookie POST /shells/issue → 403
- TC-246: operator-cookie POST /shells/issue → 200 sessionId
Per feedback_no_mvp_no_workarounds.md: NO follow-up slices — every
gap Fix#36 confessed is closed in this PR. Per
feedback_machine_saturation_3rd_violation.md: CI-only build path,
no local docker.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(bootstrap-kit): move bp-k8s-ws-proxy + bp-guacamole to slots 51/52 (Fix#39 follow-up)
CI dependency-graph-audit caught a slot-number collision: slots 36-48
are reserved for the W2.K4 AI-runtime cohort (bp-stunner, bp-knative,
bp-kserve, bp-vllm, bp-llm-gateway, bp-anthropic-adapter, bp-bge,
bp-nemo-guardrails, bp-temporal, bp-openmeter, bp-livekit, bp-matrix,
bp-librechat) per scripts/expected-bootstrap-deps.yaml. Move the
exec-fan-out blueprints to slots 51/52 (post-W2.K4, pre-Phase-2 80+
slot range) and add their entries to the expected DAG.
- clusters/_template/bootstrap-kit/{36,37}-* → {51,52}-*
- clusters/omantel.omani.works/bootstrap-kit/{36,37}-* → {51,52}-*
- kustomization.yaml updates (both _template + omantel)
- scripts/expected-bootstrap-deps.yaml: declare slots 51/52 with full
dependsOn lists (bp-k8s-ws-proxy on cilium+sealed-secrets,
bp-guacamole on cilium+cert-manager+keycloak+sealed-secrets+
seaweedfs+k8s-ws-proxy)
scripts/check-bootstrap-deps.sh re-run: 0 drift, 0 cycles, 55
declared HRs, 42 present on disk, 13 deferred (W2.K1-K4).
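The cycle check on the declared dependsOn lists can be approximated
with Python's standard graphlib. This is a sketch, not
check-bootstrap-deps.sh itself; node names use the short forms from
the list above, and `install_order` is a hypothetical helper.

```python
from graphlib import CycleError, TopologicalSorter

# node -> set of nodes it depends on (short names, as listed above)
DEPENDS_ON = {
    "k8s-ws-proxy": {"cilium", "sealed-secrets"},
    "guacamole": {
        "cilium", "cert-manager", "keycloak",
        "sealed-secrets", "seaweedfs", "k8s-ws-proxy",
    },
}


def install_order(depends_on):
    """Return one valid install order; raises CycleError on a cycle."""
    return list(TopologicalSorter(depends_on).static_order())


order = install_order(DEPENDS_ON)
# The proxy must come before the gateway that depends on it.
assert order.index("k8s-ws-proxy") < order.index("guacamole")
```

A back-edge such as k8s-ws-proxy depending on guacamole would make
`install_order` raise `CycleError` instead of returning an order.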
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>