13 Commits

Author SHA1 Message Date
github-actions[bot]
acab54f7aa deploy: bump bp-k8s-ws-proxy to image 74d23ab chart 0.1.11 2026-05-11 07:33:51 +00:00
e3mrah
74d23ab3dc
fix(charts): explicit harbor.openova.io/proxy-dockerhub prefix on all chart-hook images (#163) (#1367)
Per CLAUDE.md MIRROR-EVERYTHING inviolable rule: every chart-hook
image reference (pre/post-install Jobs, helper Pods) must use the
explicit Harbor proxy-cache form. Fix #158's bitnami → bitnamilegacy
swap was a band-aid; the architecturally correct fix is to defeat
upstream-deletion blast radius entirely by routing through Harbor.

The node-level containerd mirror in infra/hetzner/cloudinit-control-
plane.tftpl (line 706) already redirects docker.io/* →
harbor.openova.io/proxy-dockerhub/* implicitly, but implicit routing:
  - Hides the routing from SBOM scans
  - Bypasses the Kyverno harbor-proxy-pull ClusterPolicy
  - Means a chart audit (`grep docker.io`) misses a real dependency
  - Was the proximate cause of prov #27 wedging when Bitnami deleted
    docker.io/bitnami/kubectl:1.30.4 (Fix #158 had to chase the
    deletion mid-flight instead of being insulated by Harbor cache)

19 chart-hook `image:` refs + 5 chart values.yaml `repository:` defaults
now carry the explicit harbor.openova.io/proxy-dockerhub prefix.
Application/subchart images (keycloak, postgresql, mongodb in
keycloak+litmus subcharts) are intentionally out of scope for this
PR — those go through the node-level containerd mirror still.
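
A minimal sketch of the implicit-to-explicit rewrite described above,
in Python for illustration only (the PR edits chart YAML directly). The
function name is hypothetical, and the `library/` handling for official
images is an assumption about how the Harbor proxy project is addressed:

```python
HARBOR_PREFIX = "harbor.openova.io/proxy-dockerhub"  # prefix named in this commit


def to_explicit_harbor(image: str) -> str:
    """Rewrite a Docker Hub image reference to the explicit Harbor proxy form.

    References that already point at another registry are returned unchanged.
    """
    if image.startswith(HARBOR_PREFIX + "/"):
        return image  # already explicit
    first, sep, rest = image.partition("/")
    # A first path component containing "." or ":" (or "localhost") is a registry host.
    has_registry = bool(sep) and ("." in first or ":" in first or first == "localhost")
    if has_registry:
        if first not in ("docker.io", "index.docker.io"):
            return image  # some other registry: out of scope here
        image = rest
    if "/" not in image:
        # Assumption: official images are addressed under "library/".
        image = "library/" + image
    return f"{HARBOR_PREFIX}/{image}"


print(to_explicit_harbor("docker.io/bitnami/kubectl:1.30.4"))
```

With the reference written out this way, a chart audit that greps for the
Harbor prefix sees the dependency instead of relying on the node-level mirror.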

Affected blueprints + chart version bumps:
  bp-cert-manager            1.2.1  -> 1.2.2
  bp-external-secrets-stores 1.0.4  -> 1.0.5
  bp-crossplane-claims       1.1.4  -> 1.1.5
  bp-flux                    1.2.1  -> 1.2.2
  bp-guacamole               0.1.16 -> 0.1.17
  bp-self-sovereign-cutover  0.1.28 -> 0.1.29
  bp-k8s-ws-proxy            0.1.9  -> 0.1.10
  bp-harbor                  1.2.15 -> 1.2.16
  bp-gitea                   1.2.5  -> 1.2.6
  bp-newapi                  1.4.5  -> 1.4.6
  bp-wordpress-tenant        0.2.0  -> 0.2.1
  catalyst-platform          1.4.138 -> 1.4.139

Co-authored-by: e3mrah <1234567+e3mrah@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 11:32:21 +04:00
github-actions[bot]
029865ec8d deploy: bump bp-k8s-ws-proxy to image 5d8fd2e chart 0.1.9 2026-05-10 18:28:18 +00:00
e3mrah
5d8fd2e74b
fix(bp-k8s-ws-proxy): hmac-bootstrap Job/SA hook-weight ordering (Fix #95, regression of Fix #78) (#1319)
REGRESSION
----------
Fix #78 (PR #1313) added the hmac-bootstrap pre-install Job to provision
the `k8s-ws-proxy-hmac` Secret before the DaemonSet rolls. The Job's
hook-weight was set to "-10" while its ServiceAccount, Role, and
RoleBinding were left at "0".

Per Helm docs (https://helm.sh/docs/topics/charts_hooks/#hook-weights):

  > Hook weights can be positive or negative numbers but must be
  > represented as strings. When Helm starts the execution cycle of
  > hooks of a particular Kind it will sort those hooks in ASCENDING
  > order.

So within the same hook phase (`pre-install,pre-upgrade`), LOWER
weights run FIRST. Fix #78's weights inverted the dependency graph:
the Job (-10) was applied BEFORE its SA (0), surfacing on prov #8 as:

  pods "k8s-ws-proxy-hmac-bootstrap-" is forbidden: error looking up
  service account catalyst-system/k8s-ws-proxy-hmac-bootstrap:
  serviceaccount "k8s-ws-proxy-hmac-bootstrap" not found

The Job sat in CrashLoopBackOff, the HelmRelease wedged Stalled=True,
bp-guacamole's `dependsOn: bp-k8s-ws-proxy` blocked its install, and
the bounded-provision-cycle stopped at prov #8.

FIX
---
Re-weight the four hmac-bootstrap resources so the dependency graph
materialises in apiserver-apply order regardless of YAML order in the
template:

  ServiceAccount:        helm.sh/hook-weight: "-20" (first)
  Role:                  helm.sh/hook-weight: "-15"
  RoleBinding:           helm.sh/hook-weight: "-15"
  Job:                   helm.sh/hook-weight: "-10" (last; Fix #78 invariant)

The Job weight is preserved at -10 to keep Fix #78's render-test gate
3a stable (`grep "helm.sh/hook-weight": "-10"`).

VERIFICATION
------------
chart/tests/render.sh adds gate 3c ("gate-9") that walks the rendered
YAML, captures each hmac-bootstrap resource's hook-weight, and asserts
the strict ordering invariant: SA < Role <= RoleBinding < Job AND
Job == -10. Without the fix, gate-9 fails. With the fix:

  PASS: gate-9 hook-weight ordering — SA=-20 < Role=-15 / RoleBinding=-15 < Job=-10
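
The invariant gate-9 asserts can be sketched as follows. This is a Python
illustration, not the contents of chart/tests/render.sh; the function name
and the dict input shape are hypothetical, and extracting the weights from
the rendered YAML is left out:

```python
def check_hook_weights(weights: dict) -> str:
    """Assert SA < Role <= RoleBinding < Job and Job == -10.

    `weights` maps resource kind to its helm.sh/hook-weight annotation,
    which Helm requires to be a string.
    """
    sa = int(weights["ServiceAccount"])
    role = int(weights["Role"])
    binding = int(weights["RoleBinding"])
    job = int(weights["Job"])
    if not (sa < role <= binding < job):
        raise ValueError(
            f"hook-weight ordering violated: SA={sa} Role={role} "
            f"RoleBinding={binding} Job={job}"
        )
    if job != -10:
        raise ValueError(f"Job weight must stay at -10, got {job}")
    return f"SA={sa} < Role={role} / RoleBinding={binding} < Job={job}"


fixed = {"ServiceAccount": "-20", "Role": "-15", "RoleBinding": "-15", "Job": "-10"}
print("PASS:", check_hook_weights(fixed))
```

Feeding it the Fix #78 weights (RBAC at "0", Job at "-10") raises, which
is the failure the gate is meant to catch.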

All 6 render gates pass (helm 3.20.2):
  PASS: default-OFF = 0 resources
  PASS: empty image.tag fails fast
  PASS: full-ON = 9 resources
  PASS: hmac-bootstrap Job + RBAC rendered with correct hook-weight + split verbs
  PASS: gate-9 hook-weight ordering — SA=-20 < Role=-15 / RoleBinding=-15 < Job=-10
  PASS: canonical workload name 'k8s-ws-proxy' on 6 resources (release-name-independent)

CHART + SLOT BUMPS
------------------
- Chart.yaml: 0.1.7 -> 0.1.8 (CI promote will auto-bump to 0.1.9 with
  the new image SHA on merge per build-k8s-ws-proxy.yaml).
- clusters/_template/bootstrap-kit/51-bp-k8s-ws-proxy.yaml: 0.1.6 -> 0.1.9
- clusters/omantel.omani.works/bootstrap-kit/51-bp-k8s-ws-proxy.yaml: 0.1.6 -> 0.1.9

LESSON FOR FUTURE HOOK-ORDERING WORK
------------------------------------
Whenever a hook-phase Job depends on hook-phase RBAC (which it always
does in this seam), the RBAC weights MUST be numerically LESS than the
Job's. Within one weight, ordering does not follow the reference graph
(Helm sorts hooks by weight, then by resource kind, then by name), so
do not rely on it.
Always: SA most-negative, Role/RoleBinding intermediate, Job last.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 22:26:39 +04:00
github-actions[bot]
46cb9dd2f1 deploy: bump bp-k8s-ws-proxy to image ac7ae48 chart 0.1.7 2026-05-10 17:18:56 +00:00
e3mrah
ac7ae48ff7
fix(bp-k8s-ws-proxy): bootstrap Job for k8s-ws-proxy-hmac Secret (qa-loop bounded-cycle Wave 5 Fix #78, Gap E) (#1313)
Audit prov #7 Gap E: all three k8s-ws-proxy DaemonSet pods stuck
ContainerCreating for 25+ min on every fresh Sovereign with
"MountVolume.SetUp failed for volume hmac-secret: secret
k8s-ws-proxy-hmac not found".

Root cause: the chart's daemonset.yaml mounts a Secret that neither the
chart itself, nor any sibling chart, nor the bootstrap-kit ever creates. The
fail-fast in _helpers.tpl only checks for an empty .name value;
with the default name set, render proceeds and the DS rolls before
anything provisions the Secret.

Fix: Helm pre-install/pre-upgrade Job (hook-weight: -10) that:
- Idempotency-probes the target Secret (200 = skip, preserves
  operator-pre-provisioned SealedSecret OR prior-install key);
- Generates 32 random bytes via /dev/urandom (never echoed —
  pipe direct to base64 to a JSON body, then to apiserver POST);
- POSTs the Secret to the catalyst-system namespace; treats 409
  as success (race-with-operator path is benign).
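
The three steps above, sketched in Python for illustration. The real Job
is a shell script using curl against the apiserver; here the two HTTP
calls are injected as hypothetical callables returning status codes, and
the `hmac-key` data key is an assumption (the message does not name it):

```python
import base64
import os
from typing import Callable


def bootstrap_hmac_secret(
    get_status: Callable[[], int],
    post_secret: Callable[[dict], int],
) -> str:
    """Probe, generate, POST; returns which path was taken."""
    # Step 1: idempotency probe. 200 means the Secret already exists.
    if get_status() == 200:
        return "skipped"
    # Step 2: 32 random bytes, base64-encoded straight into the body.
    key_b64 = base64.b64encode(os.urandom(32)).decode("ascii")
    body = {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": "k8s-ws-proxy-hmac", "namespace": "catalyst-system"},
        "type": "Opaque",
        "data": {"hmac-key": key_b64},  # key name is an assumption
    }
    # Step 3: POST; 409 means someone else created it first, which is fine.
    status = post_secret(body)
    if status == 409:
        return "exists"
    if status in (200, 201):
        return "created"
    raise RuntimeError(f"unexpected apiserver status {status}")


print(bootstrap_hmac_secret(lambda: 404, lambda body: 201))
```

Injecting the calls keeps the sketch runnable without a cluster; the key
material is never printed or returned.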

RBAC split per memory/feedback_rbac_create_no_resourcenames.md:
`create` is in its own rule WITHOUT resourceNames; `get` is a
separate rule WITH resourceNames. The combined-rule pattern was
the silent root cause of bp-openbao 6+ chart iterations.
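
A small Python sketch of the split, with a lint helper that flags the
combined-rule pattern. Kubernetes cannot restrict `create` by resource
name because the object name is not known at authorization time. The rule
dicts mirror Role `rules` entries; their exact shape in the chart and the
helper name are assumptions:

```python
SPLIT_RULES = [
    # `create` in its own rule, WITHOUT resourceNames.
    {"apiGroups": [""], "resources": ["secrets"], "verbs": ["create"]},
    # `get` in a separate rule, WITH resourceNames.
    {
        "apiGroups": [""],
        "resources": ["secrets"],
        "verbs": ["get"],
        "resourceNames": ["k8s-ws-proxy-hmac"],
    },
]


def combined_create_rules(rules: list) -> list:
    """Return every rule that pairs `create` with resourceNames."""
    return [
        rule
        for rule in rules
        if "create" in rule.get("verbs", []) and rule.get("resourceNames")
    ]


print(len(combined_create_rules(SPLIT_RULES)))
```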

Canonical seam: modelled after platform/gitea/chart/templates/
database-secret-sync-job.yaml — curlimages/curl image (alpine + sh
+ /dev/urandom), in-cluster SA token, hook-weight 0 for SA/Role/
RoleBinding (must precede the -10 Job's API calls).

Idempotency proof: render test 3a verifies the pre-install hook is
weight -10 (so the Secret exists BEFORE the DS rolls); the script's
GET-probe step ensures upgrade re-runs preserve the existing key
(rotating it would invalidate every in-flight catalyst-api
signature). Operator rotation = `kubectl delete secret
k8s-ws-proxy-hmac -n catalyst-system && flux reconcile helmrelease
bp-k8s-ws-proxy -n flux-system`.

Render-test smoke (helm 3.20.2): all 5 cases PASS (default-OFF=0,
empty-tag fail-fast, full-ON=9 resources, hook-weight + RBAC
verbs split, canonical workload name release-independent).

Chart bumped 0.1.5 -> 0.1.6. CI promote will auto-bump to 0.1.7
with the new image SHA on merge; bootstrap-kit slot pins should be
lifted to 0.1.7 once the CI promote runs.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 21:17:22 +04:00
github-actions[bot]
820dc29ada deploy: bump bp-k8s-ws-proxy to image 8047232 chart 0.1.5 2026-05-09 22:06:14 +00:00
e3mrah
8047232a7b
fix(chart,bootstrap-kit): default imagePullSecrets to ghcr-pull (Fix #39 follow-up) (#1240)
omantel reconciliation surfaced that bp-k8s-ws-proxy DaemonSet pods
(and bp-guacamole Deployments) cannot pull from private
ghcr.io/openova-io/openova/* images without imagePullSecrets:

  Failed to pull image "ghcr.io/openova-io/openova/k8s-ws-proxy:650696d":
  failed to authorize: failed to fetch anonymous token ... 401 Unauthorized

The catalyst-system namespace's `ghcr-pull` secret is the canonical
pull-credential surface across every Sovereign (catalyst-api,
catalyst-ui, marketplace-api etc. all mount it). Defaulting both
charts to `imagePullSecrets: [{name: ghcr-pull}]` removes the
per-Sovereign overlay requirement.

Charts
------
- bp-k8s-ws-proxy 0.1.3 → 0.1.4: values.yaml.k8sWsProxy.imagePullSecrets
- bp-guacamole    0.1.2 → 0.1.3: values.yaml.guacamole.imagePullSecrets

(Both charts will auto-bump again to 0.1.5/0.1.4 when the build/mirror
workflows fire on this PR's chart-touch — slot pins target those
post-CI versions.)

Bootstrap-kit slot pins
-----------------------
- _template + omantel slot 51 (bp-k8s-ws-proxy): 0.1.3 → 0.1.5
- _template + omantel slot 52 (bp-guacamole):    0.1.2 → 0.1.4

After merge: omantel reconciles → DaemonSet pods Running → bp-guacamole
HR Ready → guacd + guacamole-server Deployments Available → TC-228 /
TC-230 / TC-236 / TC-237 / TC-245 / TC-246 flip PASS.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 02:04:45 +04:00
github-actions[bot]
3dea4e2cd8 deploy: bump bp-k8s-ws-proxy to image 650696d chart 0.1.3 2026-05-09 21:55:00 +00:00
e3mrah
650696d185
fix(chart): bp-k8s-ws-proxy render test explicitly clears image.tag (Fix #39 follow-up) (#1237)
Blueprint Release run 25612688419 caught a stale assertion in
platform/k8s-ws-proxy/chart/tests/render.sh test #2. After the
build-k8s-ws-proxy.yaml promote job auto-bumped values.yaml
`image.tag` to a real SHA, the test's `--set k8sWsProxy.enabled=true`
no longer left the tag empty, so the render succeeded and the test
tripped "FAIL: empty tag did not abort render".

The fail-fast contract (empty tag → render fail per _helpers.tpl) is
unchanged; the test now explicitly `--set k8sWsProxy.image.tag=` to
exercise the operator-override path. Mirrors the same pattern already
applied to the bp-guacamole render test in the parent PR.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 01:53:43 +04:00
github-actions[bot]
741d57988b deploy: bump bp-k8s-ws-proxy to image 5ca0a7d chart 0.1.2 2026-05-09 21:50:37 +00:00
e3mrah
5ca0a7d178
fix(ci,charts,api): qa-loop iter-7 Fix #39 — bp-guacamole + bp-k8s-ws-proxy bootstrap-kit slots (#1236)
* fix(ci,charts,api): qa-loop iter-7 Fix #39 — bp-guacamole + bp-k8s-ws-proxy bootstrap-kit slots

Closes the scope gap that Fix #36 confessed: bp-guacamole +
bp-k8s-ws-proxy chart skeletons existed at platform/* but lacked CI
image-build workflows + bootstrap-kit slots, so TC-228 / TC-230 /
TC-236 / TC-237 / TC-245 / TC-246 stayed FAIL with "deployment
NotFound".

CI workflows
------------
- .github/workflows/build-k8s-ws-proxy.yaml: Buildx + cosign keyless
  sign + SBOM attestation flow on core/cmd/k8s-ws-proxy/**, then bumps
  platform/k8s-ws-proxy/chart/values.yaml image.tag + Chart.yaml
  patch version + dispatches blueprint-release.
- .github/workflows/build-bp-guacamole.yaml: mirrors upstream Apache
  Guacamole 1.5.5 to GHCR (so every Sovereign pulls from a registry
  we own — no Docker Hub rate limits, no upstream availability risk),
  bumps values.yaml.image.{repository,tag} + Chart.yaml + dispatches
  blueprint-release.

Charts (target-state)
---------------------
- bp-k8s-ws-proxy v0.1.1: canonical workload name `k8s-ws-proxy`
  regardless of release name (DaemonSet + Service + ClusterRole +
  ClusterRoleBinding + ServiceAccount all named `k8s-ws-proxy` so
  matrix can address them by canonical short name).
- bp-guacamole v0.1.1: canonical short resource names (`guacd`,
  `guacamole-server`, `guacamole-recordings`); GHCR-mirrored upstream
  images; realm-patch ConfigMap correctly lands in `keycloak`
  namespace (was: realm-name, which would have failed silently on
  every Sovereign); `realmConfig.namespace` override surface added.
- Both charts: `catalyst.openova.io/smoke-render-mode: default-off`
  annotation so blueprint-release smoke-render gate honors the
  default-OFF render shape.

Bootstrap-kit slots
-------------------
- clusters/_template/bootstrap-kit/36-bp-k8s-ws-proxy.yaml +
  37-bp-guacamole.yaml: dependsOn-ordered (proxy → gateway), pinned
  to 0.1.1, default-OFF gate flipped via slot values, install/upgrade
  disableWait per session-2026-04-30 architectural decision.
- clusters/omantel.omani.works/bootstrap-kit/* slots mirror the same
  shape with omantel.biz hostnames matching the live HTTPRoutes on
  console.omantel.biz / auth.omantel.biz.

API: shells/issue handler (matrix-canonical URL surface)
--------------------------------------------------------
- POST /api/v1/sovereigns/{id}/shells/issue?namespace=&pod=&container=
  alias for the existing
  POST /api/v1/sovereigns/{id}/k8s/exec/{ns}/{pod}/{container}/session
  with matrix-canonical response fields (`sessionId`, `guacamoleUrl`,
  `recordingPath`). Same business logic, same audit surface
  (`guacamole-session-opened`), same RBAC gate (tier-developer or
  higher). 6 test cases, all PASS under -race.

TCs that flip PASS in iter-8
-----------------------------
- TC-228: POST /shells/issue → sessionId + guacamoleUrl + recordingPath
- TC-230: kubectl get deploy guacd guacamole-server -n catalyst-system
- TC-236: kubectl get ds k8s-ws-proxy -n catalyst-system
- TC-237: kubectl logs ds/k8s-ws-proxy → "listening"
- TC-245: viewer-cookie POST /shells/issue → 403
- TC-246: operator-cookie POST /shells/issue → 200 sessionId

Per feedback_no_mvp_no_workarounds.md: NO follow-up slices — every
gap Fix #36 confessed is closed in this PR. Per
feedback_machine_saturation_3rd_violation.md: CI-only build path,
no local docker.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(bootstrap-kit): move bp-k8s-ws-proxy + bp-guacamole to slots 51/52 (Fix #39 follow-up)

CI dependency-graph-audit caught a slot-number collision: slots 36-48
are reserved for the W2.K4 AI-runtime cohort (bp-stunner, bp-knative,
bp-kserve, bp-vllm, bp-llm-gateway, bp-anthropic-adapter, bp-bge,
bp-nemo-guardrails, bp-temporal, bp-openmeter, bp-livekit, bp-matrix,
bp-librechat) per scripts/expected-bootstrap-deps.yaml. Move the
exec-fan-out blueprints to slots 51/52 (post-W2.K4, pre-Phase-2 80+
slot range) and add their entries to the expected DAG.

- clusters/_template/bootstrap-kit/{36,37}-* → {51,52}-*
- clusters/omantel.omani.works/bootstrap-kit/{36,37}-* → {51,52}-*
- kustomization.yaml updates (both _template + omantel)
- scripts/expected-bootstrap-deps.yaml: declare slots 51/52 with full
  dependsOn lists (bp-k8s-ws-proxy on cilium+sealed-secrets,
  bp-guacamole on cilium+cert-manager+keycloak+sealed-secrets+
  seaweedfs+k8s-ws-proxy)

scripts/check-bootstrap-deps.sh re-run: 0 drift, 0 cycles, 55
declared HRs, 42 present on disk, 13 deferred (W2.K1-K4).
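
For illustration, the cycle check on the two new slots can be sketched in
Python with the standard-library graphlib. This is not the logic of
scripts/check-bootstrap-deps.sh; the dict below only restates the
dependsOn lists from this message, using the short names as written:

```python
from graphlib import CycleError, TopologicalSorter

# node -> the nodes it depends on (its predecessors)
DEPENDS_ON = {
    "k8s-ws-proxy": ["cilium", "sealed-secrets"],
    "guacamole": [
        "cilium",
        "cert-manager",
        "keycloak",
        "sealed-secrets",
        "seaweedfs",
        "k8s-ws-proxy",
    ],
}


def install_order(depends_on: dict) -> list:
    """Return one valid install order; raises CycleError if a cycle exists."""
    return list(TopologicalSorter(depends_on).static_order())


order = install_order(DEPENDS_ON)
print(order.index("k8s-ws-proxy") < order.index("guacamole"))
```

Any order the sorter returns places every dependency before its dependent,
so the proxy always precedes the gateway.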

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 01:48:25 +04:00
e3mrah
639b94fe55
feat(epic-4): K+P+X1+G — k8s-ws-proxy + projector + WebSocket logs + Guacamole chart (#1099) (#1164)
EPIC-4 Slice K+P+X1+G — bundled backend infrastructure for the
"k9s-on-web" Cloud Resources experience:

K1 — core/cmd/k8s-ws-proxy/ — per-node WebSocket exec proxy.
HMAC-signed (X-Catalyst-HMAC: SHA256({timestamp}:{path})) WebSocket
upgrades on /proxy/exec/{ns}/{pod}/{container} bridged to the local
kube-apiserver via in-cluster ServiceAccount. v4.channel.k8s.io
subprotocol echo. Optional TMUX_CASCADE wraps in a shared
catalyst-ops tmux session. Shipped as a DaemonSet + Service with
internalTrafficPolicy=Local in platform/k8s-ws-proxy/chart/.
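
A hedged Python sketch of the signing scheme, reading the header notation
as HMAC-SHA256 over "{timestamp}:{path}". The hex encoding, the 300-second
skew window, and how the timestamp travels with the request are all
assumptions; the proxy itself is Go and its exact wire format is not
stated here:

```python
import hashlib
import hmac
import time
from typing import Optional

MAX_SKEW_SECONDS = 300  # assumption: the real window is not given in this message


def sign(key: bytes, timestamp: int, path: str) -> str:
    """HMAC-SHA256 over "{timestamp}:{path}", hex-encoded (encoding assumed)."""
    message = f"{timestamp}:{path}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(key: bytes, timestamp: int, path: str, signature: str,
           now: Optional[int] = None) -> bool:
    """Reject timestamps too far in the past or future, then compare digests."""
    current = int(time.time()) if now is None else now
    if abs(current - timestamp) > MAX_SKEW_SECONDS:
        return False
    expected = sign(key, timestamp, path)
    # Constant-time comparison on bytes avoids leaking match length via timing.
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))


key = b"example-key-not-a-real-secret"
ts = 1_778_000_000
sig = sign(key, ts, "/proxy/exec/default/web-0/app")
print(verify(key, ts, "/proxy/exec/default/web-0/app", sig, now=ts + 5))
```

Because only the path is signed, the same signature is rejected on any
other exec target and on a stale or future-dated timestamp.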

P1 — core/cmd/projector/ — NATS catalyst.events JetStream → Valkey
KV projector. Canonical key shape:
  cluster:{cluster-id}:kind:{kind}:{namespace}/{name}
Cold-start does a full LIST across DefaultKinds, then catches up on
the 24h replay window. Multi-replica safe (durable consumer queue
group, last-write-wins on namespacedName). Shipped as a default-OFF
Deployment + RBAC under products/catalyst/chart/templates/services/projector/.
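
The key shape and last-write-wins behaviour can be sketched in Python with
a plain dict standing in for Valkey. The event dict layout and function
names are hypothetical, and only the namespace-scoped key is shown because
the cluster-scoped variant is not spelled out in this message:

```python
def projection_key(cluster_id: str, kind: str, namespace: str, name: str) -> str:
    """cluster:{cluster-id}:kind:{kind}:{namespace}/{name}"""
    return f"cluster:{cluster_id}:kind:{kind}:{namespace}/{name}"


def apply_event(store: dict, cluster_id: str, event: dict) -> None:
    """Project one ADD/MOD/DEL event into the store; later writes overwrite."""
    key = projection_key(cluster_id, event["kind"], event["namespace"], event["name"])
    if event["type"] in ("ADD", "MOD"):
        store[key] = event["object"]
    elif event["type"] == "DEL":
        store.pop(key, None)  # deleting a missing key is not an error
    else:
        raise ValueError(f"unknown event type {event['type']!r}")


store = {}
base = {"kind": "Pod", "namespace": "default", "name": "web-0"}
apply_event(store, "c1", {**base, "type": "ADD", "object": {"phase": "Pending"}})
apply_event(store, "c1", {**base, "type": "MOD", "object": {"phase": "Running"}})
print(store)
```

Keying on the namespaced name means two replicas applying the same event
converge on the same entry, which is what makes last-write-wins safe.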

X1 — products/catalyst/bootstrap/api/internal/handler/k8s_logs.go —
WebSocket Pod-log streaming endpoint:
  GET /api/v1/sovereigns/{id}/k8s/logs/{ns}/{pod}/{container}
      ?follow&tailLines&since=<rfc3339>&previous
Reads from kubelet via client-go GetLogs().Stream(); each WS frame =
one log line. Supports `since` resume. Reuses RequireSession middleware
+ chroot cluster-id resolver. New k8scache.Factory.CoreClient(id)
accessor exposes the per-cluster typed client without duplicating
kubeconfig parsing.
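
A rough Python sketch of parsing the query parameters listed above. The
handler is Go and its parseLogOptions defaults are not given here, so the
defaults below (flags off, no tail limit, no `since`) and the function
name are assumptions:

```python
from datetime import datetime
from urllib.parse import parse_qs


def parse_log_options(query: str) -> dict:
    """Parse follow / tailLines / since / previous from a raw query string."""
    params = parse_qs(query, keep_blank_values=True)

    def flag(name: str) -> bool:
        # A bare `?follow` (no value) counts as true.
        if name not in params:
            return False
        return params[name][-1].lower() in ("", "1", "true")

    options = {
        "follow": flag("follow"),
        "previous": flag("previous"),
        "tail_lines": None,
        "since": None,
    }
    if params.get("tailLines", [""])[-1] != "":
        tail = int(params["tailLines"][-1])  # ValueError on non-numeric input
        if tail < 0:
            raise ValueError("tailLines must be non-negative")
        options["tail_lines"] = tail
    if "since" in params:
        raw = params["since"][-1]
        # Accept a trailing "Z" on Python versions whose fromisoformat rejects it.
        options["since"] = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    return options


print(parse_log_options("follow&tailLines=100&since=2026-05-09T09:00:00Z"))
```

A client resuming after a dropped connection would pass the timestamp of
the last line it received as `since`.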

G1 — platform/guacamole/chart/ — full Apache Guacamole chart:
guacd Deployment + Service, Tomcat webapp Deployment + Service,
Cilium Gateway HTTPRoute, SeaweedFS-PVC for recordings (RWO,
hcloud-volumes), SealedSecret placeholder for Keycloak OIDC client
secret, NetworkPolicy (default-deny + selective egress to KC +
k8s-ws-proxy + SeaweedFS + NATS), and ConfigMap consumed by
keycloak-config-cli post-deploy Job (mirrors platform/keycloak
realm-config pattern). Default-OFF gate; full-ON renders 9
resources. Empty image.tag / hostname / oidc.issuer fail-fast at
helm template time per INVIOLABLE-PRINCIPLES #4a/#5. ONE Guacamole
per Sovereign per ADR-0001 §11. Blueprint manifest uses
v1alpha1 + version "0.1.0" + upgrades.from ["0.x"].

Tests:
- k8s-ws-proxy: HMAC happy/expired-old/expired-future/malformed/
  bad-signature, path-only signature, WS upgrade + protocol echo,
  bad path, bad HMAC, denied namespace via httptest.
- projector: Apply ADD/MOD/DEL/validation, key shape (ns-scoped +
  cluster-scoped), handleOne ack/nak/term routing with fakeMsg,
  cold-start LIST + project + error continuation via dynamicfake.
- X1: parseLogOptions defaults + edge cases + bad query params,
  503/404/400 paths + full WS happy-path with kfake clientset.
- G1: chart/tests/render.sh — default-OFF=0, empty-tag fail-fast,
  full-ON=9 resources, every required kind present, realm-config
  wires OIDC client.
- bp-k8s-ws-proxy chart: chart/tests/render.sh — default-OFF=0,
  empty-tag fail-fast, full-ON=5 resources.

Pre-existing test status: TestPinIssue and TestBootstrapKit/gitea
remain flaky on main per canon §7 — verified not introduced by
this slice.

Co-authored-by: hatiyildiz <hati.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 09:27:39 +04:00