Commit Graph

1482 Commits

Author SHA1 Message Date
hatiyildiz
2164ce2608 Merge remote-tracking branch 'origin/main' into wave6-fix-bss-vouchers
# Conflicts:
#	products/catalyst/bootstrap/ui/src/lib/bss.api.ts
2026-05-17 22:38:10 +02:00
hatiyildiz
5c91196952 feat(ui): Wave 6 PR 5 — BSS Vouchers native (drops iframe, table + Issue modal)
Replaces the BssSectionShell iframe wrapper at /bss/vouchers with a NATIVE
React surface sharing the same PortalShell chrome as BssLandingPage
(Wave 6 PR 1, #1606), JobsPage, AppsPage, SettingsPage. Per the founder
"big picture" ruling on Wave 6 sub-agent UI work — inherit the design
system, no bespoke chrome, no hex colours, no new card components.

Surface:
- Header tagline + filter row (search + status dropdown + "+ Issue
  voucher" CTA).
- Table columns: Code | Recipient | Plan | Value | Status pill |
  Issued | Expires | Redeemed by. Recipient/Plan/Expires render as
  em-dashes until the BE persists those fields — target-state columns
  are present from first paint per INVIOLABLE-PRINCIPLES.md #1.
- Row drill-in drawer with Revoke action (destructive lives inside
  the drill-in per founder ruling, never on list rows).
- Issue voucher modal that mirrors ParentDomainsPage's AddDomainModal
  chrome verbatim (panel layout, label rhythm, Cancel/Submit footer,
  accent submit) — POSTs /v1/sme/billing/vouchers/issue with code,
  credit_omr, description, max_redemptions, recipient_email.
- Status pill family — emerald (active) / zinc (inactive) / amber
  (exhausted) / rose (revoked) — same palette ParentDomainsPage uses
  for its FlipStatusBadge.

API wiring (bss.api.ts):
- Voucher / VoucherStatus / IssueVoucherRequest typed wire shapes
  matching core/services/billing/store.PromoCode snake_case json tags.
- voucherStatus() derives the pill from row fields (no server round-
  trip per filter).
- listVouchers, issueVoucher, revokeVoucher typed wrappers against
  /v1/sme/billing/vouchers/{list,issue,revoke/{code}}. Errors throw
  with the BE's detail/error field so the operator sees the actual
  backend message inline.
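
As an illustration of the client-side derivation described above, here is a minimal sketch of a `voucherStatus()` helper. Only `max_redemptions` appears in this commit message; the other field names (`active`, `revoked`, `times_redeemed`), the precedence order, and the "0 means unlimited" convention are assumptions, not the actual PromoCode wire shape.

```typescript
// Hypothetical wire shape: only max_redemptions is named in the commit message.
type VoucherStatus = "active" | "inactive" | "exhausted" | "revoked";

interface Voucher {
  code: string;
  active: boolean;         // assumed field
  revoked: boolean;        // assumed field
  times_redeemed: number;  // assumed field
  max_redemptions: number; // 0 is treated as "unlimited" here (assumption)
}

// Derive the pill purely from row fields, so filtering needs no server round-trip.
function voucherStatus(v: Voucher): VoucherStatus {
  if (v.revoked) return "revoked";
  if (v.max_redemptions > 0 && v.times_redeemed >= v.max_redemptions) {
    return "exhausted";
  }
  return v.active ? "active" : "inactive";
}

// Status pill families as listed in the Surface section.
const PILL_FAMILY: Record<VoucherStatus, string> = {
  active: "emerald",
  inactive: "zinc",
  exhausted: "amber",
  revoked: "rose",
};
```

Because the status is a pure function of the row, the status dropdown can filter an already-fetched list locally.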

All colour tokens are var(--color-*) or the four approved Tailwind
status families (emerald / amber / rose / zinc) plus red-500/* for
error banners (same family AddDomainModal uses). No hex literals.

Links to Wave 6 PR 1 (#1606).
2026-05-17 22:33:34 +02:00
e3mrah
4a4ffa34ab
feat(ui): Wave 6 PR 3 — BSS Orders native (drops iframe) (#1608)
* feat(ui): Wave 6 PR 3 — BSS Orders native (drops iframe)

Replaces the BssSectionShell iframe at /console/bss/orders with a
native React table that mirrors JobsTable's shape: toolbar (search +
status + age dropdowns) → scrollable table (Order ID | Tenant org |
Product | Status | Created | Last update | Total) → row click to
drill-in (TODO: Link to /bss/orders/{id}; route to be added in a follow-up).

Inherits the parent app's design system per Wave 6 brief +
feedback_subagents_inherit_design_system.md:
  - PortalShell wrapper with `← Back to BSS overview` header slot
    (mirrors BssSectionShell verbatim so the page reads as a sibling
    of /bss/{billing,revenue,vouchers,tenants})
  - Design tokens only (var(--color-bg-2), var(--color-border),
    var(--color-text), var(--color-text-dim), var(--color-text-strong),
    var(--color-accent), var(--color-surface), var(--color-success),
    var(--color-error))
  - amber-* exception ONLY for the documented "API pending" pill
    (verbatim copy from BssLandingPage + SettingsPage); no rose
  - No hex colours; no bespoke Tailwind colour families
  - Empty / loading / API-pending states mirror JobsTable +
    ParentDomainsPage + BssLandingPage

API plumbing:
  - lib/bss.api.ts: added Order / OrderStatus / OrdersResponse types
    and getOrders() that fetches /api/v1/sme/orders and tolerates
    404 / 5xx / network error by returning {pendingApi:true, orders:[]}
    so the full table chrome paints on first load with the "API
    pending" pill (per INVIOLABLE-PRINCIPLES.md #1).
  - No BE handler added; the FE-only stub matches getBssOverview's
    pattern and was explicitly OPTIONAL in the Wave 6 brief.
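
The fallback behaviour described under "API plumbing" could look roughly like the sketch below. The `Order` fields and the injectable `FetchLike` parameter are assumptions made so the example is self-contained; the real `getOrders()` in lib/bss.api.ts may differ.

```typescript
// Simplified, hypothetical shapes; the real Order type has more fields.
interface Order { id: string; status: string }
interface OrdersResponse { pendingApi: boolean; orders: Order[] }

// Minimal fetch-like signature so the sketch runs without a network.
type FetchLike = (url: string) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

async function getOrders(fetchImpl: FetchLike): Promise<OrdersResponse> {
  try {
    const res = await fetchImpl("/api/v1/sme/orders");
    // 404 / 5xx: report "API pending" but still return a renderable shape.
    if (!res.ok) return { pendingApi: true, orders: [] };
    const body = (await res.json()) as { orders?: Order[] };
    return { pendingApi: false, orders: body.orders ?? [] };
  } catch {
    // Network error: same fallback, so the table chrome paints on first load.
    return { pendingApi: true, orders: [] };
  }
}
```

In a browser the caller would pass something like `(url) => fetch(url)`; the page can then render the full table in every case and show the "API pending" pill only when `pendingApi` is true.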

Verification:
  - tsc -b --noEmit: my files clean (28 pre-existing errors elsewhere:
    CloudPage CloudListKind drift + openova-flow workspace types,
    all unrelated to this PR).
  - Color audit grep: returns only the documented amber-500/* and
    amber-300 used by the API-pending pill.
  - Side-by-side render with JobsPage: same PortalShell chrome, same
    toolbar shape, same table column treatment.

Links to Wave 6 PR 1 (#1606).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(api): Wave 6 PR 3 — BSS Orders BE stub (GET /api/v1/sme/orders → [])

Companion to the FE-side OrdersPage (commit 49e9bd46). Adds a thin
read-only handler returning `{ orders: [] }` so the native React
table renders 200 OK instead of the FE-side 404 fallback path. Wire
is now end-to-end; the table chrome paints on first load with no
"API pending" pill (the pill only fires on non-2xx).

The handler is a deliberate stub (~50 LOC) per the Wave 6 brief:
the real per-tenant projection lands with the marketplace/billing
service wire. JSON shape mirrors the FE Order type in
bss.api.ts verbatim so a future non-empty payload type-aligns
with zero FE change.

Route registered alongside the other /api/v1/sme/* endpoints inside
the RequireSession-gated group; same auth posture as
/api/v1/sme/{users,tenants}.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 00:30:38 +04:00
e3mrah
239eb4fffd
feat(ui): Wave 6 PR 3 — BSS Orders native (drops iframe) (#1607)
Replaces the BssSectionShell iframe at /console/bss/orders with a
native React table that mirrors JobsTable's shape: toolbar (search +
status + age dropdowns) → scrollable table (Order ID | Tenant org |
Product | Status | Created | Last update | Total) → row click to
drill-in (TODO: Link to /bss/orders/{id}; route to be added in a follow-up).

Inherits the parent app's design system per Wave 6 brief +
feedback_subagents_inherit_design_system.md:
  - PortalShell wrapper with `← Back to BSS overview` header slot
    (mirrors BssSectionShell verbatim so the page reads as a sibling
    of /bss/{billing,revenue,vouchers,tenants})
  - Design tokens only (var(--color-bg-2), var(--color-border),
    var(--color-text), var(--color-text-dim), var(--color-text-strong),
    var(--color-accent), var(--color-surface), var(--color-success),
    var(--color-error))
  - amber-* exception ONLY for the documented "API pending" pill
    (verbatim copy from BssLandingPage + SettingsPage); no rose
  - No hex colours; no bespoke Tailwind colour families
  - Empty / loading / API-pending states mirror JobsTable +
    ParentDomainsPage + BssLandingPage

API plumbing:
  - lib/bss.api.ts: added Order / OrderStatus / OrdersResponse types
    and getOrders() that fetches /api/v1/sme/orders and tolerates
    404 / 5xx / network error by returning {pendingApi:true, orders:[]}
    so the full table chrome paints on first load with the "API
    pending" pill (per INVIOLABLE-PRINCIPLES.md #1).
  - No BE handler added; the FE-only stub matches getBssOverview's
    pattern and was explicitly OPTIONAL in the Wave 6 brief.

Verification:
  - tsc -b --noEmit: my files clean (28 pre-existing errors elsewhere:
    CloudPage CloudListKind drift + openova-flow workspace types,
    all unrelated to this PR).
  - Color audit grep: returns only the documented amber-500/* and
    amber-300 used by the API-pending pill.
  - Side-by-side render with JobsPage: same PortalShell chrome, same
    toolbar shape, same table column treatment.

Links to Wave 6 PR 1 (#1606).

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 00:27:27 +04:00
e3mrah
393116355d
feat(ui): Wave 6 PR 1 — BSS native landing (Option B step 1, kills iframe seam) (#1606)
Replaces Family F's bespoke BssLayout + iframe approach with a native
React /bss landing page using the existing Dashboard KPI card chrome.
Per-section pages (Billing/Orders/Revenue/Vouchers/Tenants) keep their
iframe content for now (PRs 2-6 native-port them); they wrap directly
in PortalShell via BssSectionShell instead of BssLayout so the chrome
matches the rest of the app.

Founder UX review (2026-05-17) flagged Family F BSS as visually
clashing. Per feedback_subagents_inherit_design_system.md:
- PortalShell wrapper (same as JobsPage/AppsPage/SettingsPage)
- KPI cards copied from Dashboard/SettingsPage SectionCard chrome
- Design tokens only (var(--color-*)); no hex; no bespoke Tailwind colors
- No new bespoke components

BssLayout.tsx deleted. Router rewired so /bss → BssLandingPage and each
section is a sibling route under consoleLayoutRoute (no shared layout
wrapper). API shim lib/bss.api.ts fetches /api/v1/sme/bss/overview with
zero-filled fallback + pendingApi flag so the landing always renders
its full target-state shape on first paint.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 00:02:36 +04:00
e3mrah
bf5002ccf0
feat(ui): Wave 5 — UX polish (sidebar reorder + BSS icon + marketplace as SettingsCard) + chart 1.4.155 (#1605)
Founder UX-polish review (2026-05-17, post Wave-2 collector). Three
distinct fixes the founder flagged:

1. Sidebar order followed no logic — random walk Apps/Jobs/Dashboard/
   Cloud/Users/BSS. Reordered to operator mental model:
   Dashboard → Cloud → Apps → Jobs → Users → BSS → Settings

2. BSS icon was a bespoke receipt glyph that didn't match the line-
   glyph family. Swapped to a briefcase glyph that fits the family.

3. Marketplace toggle was a dedicated /settings/marketplace page +
   Settings sub-nav child. Founder: "if market place is just a toggle
   ... it should be ... similar to other setting". Refactored into
   SettingsPage SectionCard anchor (id=marketplace, same as #dns).
   MarketplaceSettings.tsx + .test.tsx + route + sub-nav child deleted.
   Save flow unchanged: POSTs /api/v1/sovereigns/{id}/marketplace.

Chart 1.4.154 → 1.4.155 + bootstrap-kit pin bump per the
chart-bump-needs-both-files rule.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 23:30:48 +04:00
e3mrah
2b903c16e6
chore(release): chart 1.4.153→1.4.154 — Wave 2 collector (B/C/D/E/F/G) (#1604)
Bundles the 6 Fix-Author PRs that merged AFTER the Wave 1 chart roll
(1.4.152→1.4.153) into a single bootstrap-kit-consumable Sovereign bundle:

- #1598 Family F — BSS menu in-console iframe (founder bug #1)
- #1599 Family D — treemap fan-out + Layer-1 cluster default (founder bug #2)
- #1600 Family C — ResourceDetailPage real-data rewrite (founder bug #5)
- #1601 Family G — 6 singletons (hcloud-csi, fleet aggregator, bridge backfill,
  cert rename, D22 lift, jobs region filter)
- #1602 Family E — Compliance UI (Falco runtime, SBOM, framework filter,
  policy drilldown, PolicyReport list kinds)
- #1603 Family B — AppDetail HR-overlay + Resources/Logs tab ns+label fix
  (founder bug #4)

Bumps BOTH Chart.yaml AND the bootstrap-kit pin per
session_2026_05_17_t142_6_of_6_GREEN.md ("chart Chart.yaml bump !=
bootstrap-kit pin bump — need both" rule).

Wave 2 fixes will reach the chroot Sovereign automatically on the next
Flux 1m reconcile after this PR merges and the bp-catalyst-platform
OCI artifact republishes.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 22:34:48 +04:00
e3mrah
a44df200d5
fix(catalyst-api+ui): Family B — AppDetail status sync (HR→UI wire + correct ns/label) (#1603)
Closes founder bug #4 cluster (5 FAILs from t10):
- C4-003: HR Ready=True but AppDetail shows phase=Provisioning
- C4-004: Bootstrap apps show literal "Catalog Status Unavailable"
- C4-005: Resources tab queries wrong ns ("default") + wrong label
- C4-007: Logs tab same wrong-ns + wrong-label as Resources
- C4-013: D19 violation — Deployments=44 ≠ Catalog=59 ≠ HR=48/48

Root cause: AppDetail and its Resources/Logs sub-tabs assumed the
Application CR is the sole source of truth for phase, ns, and label.
On chroot Sovereigns:
  (a) bootstrap-kit installs (bp-cilium, bp-alloy, bp-cert-manager,
      etc.) ship as HelmReleases with NO companion Application CR,
  (b) the catalyst-controller lags writing status.phase, so the CR
      sits at "Provisioning" long after the HR has flipped Ready=True,
  (c) the workload's actual namespace is HR.spec.targetNamespace
      ("alloy/", "cert-manager/", "kube-system/") not the CR's own
      namespace (always "default" on the synth fallback).

Fix (extends PR L #1592 HR-fallback baseline):
- catalyst-api: HandleApplicationGet now overlays HR Ready=True onto
  a stale CR phase; surfaces targetNamespace, releaseName, and the
  install label selector so the SPA queries the actual install
  location with the correct identity label. New helper
  helmReleaseReadyByName() reuses the chroot k8sCache path that PR L
  established (so multi-region D16 fan-out is covered).
- catalyst-api: synthesiseAppFromHelmRelease now emits
  bootstrap=true, targetNamespace, releaseName, and a chart-name
  based selector (`app.kubernetes.io/name=<chart>`, the upstream
  Helm standard) so bootstrap-kit tabs find the real pods.
- catalog.api.ts: extends ApplicationDetailResponse with
  targetNamespace, releaseName, installLabelSelector, bootstrap,
  hrReady, phaseFromCR (telemetry for the D19 source-counter chip).
- AppDetail.tsx (lines 1-700): wires appTargetNamespace +
  appInstallLabelSelector into ResourcesTab + LogsTab; renders a
  "source: HelmRelease | Application CR (HR-overlayed; CR=<phase>)"
  D19 source chip so the operator sees which object the phase comes
  from per-app; PublishToggleChip renders "Bootstrap blueprint (not
  in marketplace)" for bootstrap apps instead of misleading "Catalog
  status unavailable", and also treats a /catalog/apps/<slug> 404 on
  a non-bootstrap app as bootstrap-like (no toggle) rather than an
  error chip.
- ResourcesTab.tsx + LogsTab.tsx: accept a labelSelector prop instead
  of hard-baking `instance=<applicationName>`; query keys updated;
  filter banners + empty-state copy now show the actual selector.
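
The overlay rule in the Fix list can be illustrated with a small pure function. This is only a sketch: the real overlay lives in catalyst-api (Go), and the `"Ready"` phase string and the input shape below are assumptions. The source-chip wording follows the text above.

```typescript
// Hypothetical input: phaseFromCR is null when no Application CR exists
// (for example a bootstrap-kit HelmRelease with no companion CR).
interface PhaseInput { phaseFromCR: string | null; hrReady: boolean }
interface PhaseView { phase: string; source: string }

function effectivePhase({ phaseFromCR, hrReady }: PhaseInput): PhaseView {
  if (phaseFromCR === null) {
    // No CR at all: the HelmRelease is the only source of truth.
    return { phase: hrReady ? "Ready" : "Provisioning", source: "HelmRelease" };
  }
  if (hrReady && phaseFromCR !== "Ready") {
    // Stale CR: HR Ready=True wins, and the chip records the CR's own phase.
    return {
      phase: "Ready",
      source: `Application CR (HR-overlayed; CR=${phaseFromCR})`,
    };
  }
  return { phase: phaseFromCR, source: "Application CR" };
}
```

Keeping the CR's phase in the chip text means the lag in the controller stays visible to the operator even though the headline phase is corrected.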

Tests: tsc -b --noEmit clean across the workspace. Existing
AppDetail/AppsPage unit tests have pre-existing failures unrelated
to this change (confirmed by re-running on stashed baseline) — no
new failures introduced. ResourcesTab/LogsTab have no targeted unit
tests; the matrix Playwright walkthrough is the verification surface
on the next prov.

Files (read-only on the rest of the codebase per Family B brief):
- products/catalyst/bootstrap/api/internal/handler/applications.go
- products/catalyst/bootstrap/ui/src/lib/catalog.api.ts
- products/catalyst/bootstrap/ui/src/pages/sovereign/AppDetail.tsx
- products/catalyst/bootstrap/ui/src/pages/sovereign/AppDetail/LogsTab.tsx
- products/catalyst/bootstrap/ui/src/pages/sovereign/AppDetail/ResourcesTab.tsx

NOT touched: ComplianceTab.tsx (Family E), router.tsx (Wave 1),
Dashboard.tsx (Family D), ResourceDetailPage.tsx (PR #1600 Family C).

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 22:23:35 +04:00
e3mrah
c2df9ff287
feat(ui+api): Family E — Compliance UI (Kyverno + Falco + SBOM + framework filter) (#1602)
Wave-2 Family-E (#1583) closes 7 t10 FAILs on the Compliance surface
(/tmp/t10-results-agent-D.jsonl C11-003/005/006/007/008/009/010):

C11-003  Policy drilldown was 404'ing on Kyverno ClusterPolicies that
         exist on the cluster but weren't cached by the aggregator. Add
         GET /api/v1/sovereigns/{id}/compliance/policies/{name} that
         reads the live ClusterPolicy directly; PolicyDrilldownPage
         falls back to it after the bulk getPolicies() miss.

C11-005  /cloud?view=list&kind=policyreports now registered as a
C11-006  first-class CloudListKind (and clusterpolicyreports too) with
         a dedicated PolicyReportsListPage / ClusterPolicyReportsListPage
         wrapper. Removed the silent →configmaps alias that was hiding
         the architecture gap. Reads from the catalyst-api k8scache
         registry which already has both GVRs (kinds.go).

C11-007  AppDetail Compliance tab now falls through to the LIVE
         violations endpoint (/compliance/violations?app=<name>) when
         the scorecard rollup is empty — operator sees real Kyverno
         PolicyReport entries grouped by policy, not the placeholder.

C11-008  Falco runtime alerts: new GET /compliance/falco endpoint reads
         Falcosidekick → k8s Events; new FalcoAlerts widget renders
         them with priority chips. New RuntimeAlertsPage mounted at
         /admin/compliance/runtime + /compliance/runtime (both
         previously 404). Also embedded in SRE / Security dashboards.

C11-009  Regulatory-framework chip strip (PCI / ISO27001 / SOC2 / GDPR
         / HIPAA / DORA / NIS2 / FedRAMP) wired into SREDashboardPage.
         Multi-select + URL deep-link (?framework=pci,iso27001).
         Single source of truth in COMPLIANCE_FRAMEWORKS.

C11-010  Per-Pod SBOM + CVE tab on ResourceDetailPage. New SBOM tab
         in RESOURCE_DETAIL_TABS; SBOMTab widget reads new
         GET /compliance/sbom?ns=<ns>&pod=<pod> which projects Trivy
         VulnerabilityReport + SBOMReport CRs into a structured
         per-Container severity + component list. Cluster-wide rollup
         at /compliance/sbom/summary.
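
For the C11-009 deep-link (`?framework=pci,iso27001`), parsing and serialising the multi-select could be sketched as follows. The lowercase ids other than `pci` and `iso27001` are assumptions, and the helper names are hypothetical; the real COMPLIANCE_FRAMEWORKS constant is not shown in this log.

```typescript
// Single source of truth; ids beyond "pci" and "iso27001" are assumed spellings.
const COMPLIANCE_FRAMEWORKS = [
  "pci", "iso27001", "soc2", "gdpr", "hipaa", "dora", "nis2", "fedramp",
] as const;
type Framework = (typeof COMPLIANCE_FRAMEWORKS)[number];

// Parse a location.search string, dropping unknown ids and normalising case.
function parseFrameworks(search: string): Framework[] {
  const raw = new URLSearchParams(search).get("framework") ?? "";
  const wanted = raw.split(",").map((s) => s.trim().toLowerCase());
  // Filtering the canonical list keeps a stable chip order.
  return COMPLIANCE_FRAMEWORKS.filter((f) => wanted.indexOf(f) !== -1);
}

// Inverse direction: selection -> query string ("" when nothing is selected).
function serializeFrameworks(selected: Framework[]): string {
  return selected.length > 0 ? `?framework=${selected.join(",")}` : "";
}
```

Filtering against the canonical list, rather than trusting the URL, means a hand-edited link with an unknown framework simply selects nothing for that entry.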

All clusters READ-ONLY. No Chart.yaml or bootstrap-kit pin bumps.
tsc -b --noEmit: clean.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 22:20:37 +04:00
e3mrah
aa60cfb84e
fix(multi): Family G — 6 singletons (C8-001/C8-005/C9-006/C10-002/C10-003/C7-007) (#1601)
Wave 2 Family G batched ship. C7-004 (sso/wiki/workflows/storybook +
registry/api HTTPRoutes) intentionally skipped — sso/wiki/storybook
have no shipped backend; registry (harbor) + api (catalyst-api) HTTPRoutes
already exist and 404 is a runtime/HR-readiness symptom, not a missing
route. Flagged for architect-led ticket rather than silent route-alias
synthesis.

C9-006 — hcloud-volumes StorageClass missing on fresh prov
  Root cause: platform/hcloud-csi/chart/ existed but was never wired
  into bootstrap-kit, so fresh Sovereigns defaulted PVCs to local-path
  (rancher.io/local-path) — node-pinned, can't survive Pod reschedule.
  Fix: new slot 17a-bp-hcloud-csi.yaml + chart 1.0.0→1.1.0 bump that
  adds templates/hcloud-token-secret.yaml so the controller can
  authenticate to Hetzner. Mirrors bp-hcloud-ccm (slot 55) +
  bp-cluster-autoscaler-hcloud (slot 50) wiring.

C10-002 — /fleet/applications returns 0 items despite 21 sovereigns
  Root cause: collectFleetSovereigns filtered out rows with AdoptedAt!=nil
  (mirroring ListDeployments). On a steady-state fleet every Sovereign is adopted,
  so the dashboard rendered empty despite hundreds of succeeded jobs.
  Fix: remove the adopted-filter from collectFleetSovereigns (the
  fleet view's whole purpose is to enumerate every provisioned
  Sovereign). ListDeployments still applies the filter — it backs the
  provisioner's in-flight tab, a different surface. Adopted rows
  surface with Health=green when otherwise unknown.

C10-003 — per-region install-* Jobs stuck "pending" despite ready
  Root cause: lastState dedup in helmwatch_bridge — secondary
  watchers attaching AFTER an HR already settled at Installed never
  observed a state transition, so the seed value (HelmStatePending)
  never converged. Fix: at markPhase1Done(OutcomeReady), backfill
  every secondary watcher's informer snapshot into the shared
  jobs.Bridge via the idempotent SeedJobsFromInformerList path.
  Runs INLINE (not goroutine) — runPhase1Watch defers
  stopSecondaries() which clears dep.secondaryWatchers as soon as
  markPhase1Done returns, so a goroutine would race the cleanup.

C7-007 — legacy sovereign-wildcard-tls Cert+Secret pair orphaned
  Root cause: PR O moved the Cilium Gateway listener's
  certificateRefs to the dashed-suffix per-zone Secret but left the
  legacy bare-name Certificate template behind, so cert-manager
  kept renewing an orphan. Fix: (a) rename the Certificate +
  Secret to the dashed-suffix shape (single-source-of-truth), and
  (b) add a one-shot Job (legacy-cert-cleanup) that deletes the
  pre-PR-O Cert+Secret pair via alpine/k8s, idempotent for fresh
  provs. Removable from kustomization.yaml once every live prov
  has reconciled past it.

C8-001 — D22 Settings em-dash placeholders on chroot Sovereign
  Root cause: SettingsPage read Capacity / CP size / Pool subdomain /
  BYO domain from useWizardStore() (zustand+persist localStorage).
  The chroot Sovereign console runs on a fresh browser session
  post-handover with empty localStorage, so the four fields rendered
  em-dashes. The data IS persisted on the deployment record
  (RedactedRequest) — gap was that Deployment.State() never surfaced
  it. Fix: lift controlPlaneSize / sovereignPoolDomain /
  sovereignSubdomain / sovereignDomainMode / sovereignByoDomain /
  regionControlPlaneSizes / orgName / orgEmail to the State() map +
  extend DeploymentSnapshot TS type + SettingsPage reads
  snapshot-first with wizard store as fallback (mothership wizard-
  in-flight case).

C8-005 — D20 Jobs page missing region filter dropdown
  Root cause: multi-region Sovereigns expose install-<region>:<chart>
  Jobs but JobsTable offered only status / app / parent filters,
  forcing operators to type the region key into the free-text search.
  Fix: new regionFromJob(job) pure helper parses the canonical
  <region>:<chart> appId (fallback: install-<region>:<chart> jobName).
  Dropdown is visible only when 2+ regions appear in the current job
  set (single-region Sovereigns see no one-option no-op). Sorted
  lexically. Test coverage: 4 helper cases + 3 dropdown cases in
  JobsTable.test.tsx.
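
A minimal sketch of the C8-005 helper and the dropdown visibility rule, under the assumption that a job exposes optional `appId` and `jobName` strings. The `regionOptions` name is hypothetical; only `regionFromJob` is named in the commit.

```typescript
// Simplified job shape (assumption); the real Job type carries more fields.
interface Job { appId?: string; jobName?: string }

// Parse the canonical "<region>:<chart>" appId, falling back to the
// "install-<region>:<chart>" jobName. Bare ids (no region) yield null.
function regionFromJob(job: Job): string | null {
  const parts = job.appId?.split(":");
  if (parts && parts.length === 2 && parts[0] && parts[1]) return parts[0];
  const m = /^install-([^:]+):.+$/.exec(job.jobName ?? "");
  return m ? m[1] : null;
}

// Dropdown options: distinct regions, sorted lexically, and only offered
// when at least two regions appear in the current job set.
function regionOptions(jobs: Job[]): string[] {
  const regions = jobs
    .map(regionFromJob)
    .filter((r): r is string => r !== null)
    .filter((r, i, all) => all.indexOf(r) === i)
    .sort();
  return regions.length >= 2 ? regions : [];
}
```

Returning an empty list for single-region job sets lets the table skip rendering the dropdown entirely instead of showing a one-option no-op.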

Architect-first compliance:
  • bp-hcloud-csi wiring mirrors bp-hcloud-ccm (slot 55) pattern
  • legacy-cert-cleanup uses alpine/k8s (NOT bitnami/kubectl — see
    self-sovereign-cutover/values.yaml:252 Bitnami-deprecation note)
  • alpine/k8s image pulled via harbor.openova.io/proxy-dockerhub
    (mirror-everything rule)
  • regionFromJob mirrors helmwatch_bridge.go componentID encoding
    (3 input shapes: bare, region-prefixed, install-region-prefixed)
  • State() snapshot additions stay slim — only the 4 founder-flagged
    fields + a few zero-cost adjacents

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 22:20:29 +04:00
github-actions[bot]
2d9b2f84bd deploy: update catalyst images to 898305f
2026-05-17 17:28:39 +00:00
e3mrah
898305f41e
fix(ui): Family C — ResourceDetailPage real data + tab nav (founder bug #5) (#1600)
t10 test agent C2 evidence (10 FAILs in C5):
- /cloud/resource/deployment/catalyst-system/catalyst-api/overview
  rendered a 50-item "Resource detail glossary" list + 3 explanatory
  paragraphs as VISIBLE body text, with "Loading deployment/catalyst-api…"
  never resolving to real K8s data.
- DaemonSet detail had no selector/desired/ready/available/nodeSelector.
- Pod Containers list never populated.
- StatefulSet / Service detail shared the broken shell.
- Tab clicks (Logs / Exec / Events / Metrics) "drifted to /dashboard"
  within ~2s — the `window.location.assign` codepath hard-reloaded the
  page on every tab click, dropping in-flight resource fetches.
- Owner chain rendered as glossary hint text instead of live
  ownerReferences.

Root causes (per layer):
1. PRESENTATION: Overview tab was kind-agnostic (Phase / Replicas /
   Owners / Labels only). For Deployment / DaemonSet / Pod / Service /
   StatefulSet / ConfigMap / Secret the operator needs kind-specific
   fields. The glossary blob + 3 hint paragraphs were qa-loop iter-15…17
   text-token patches (Fix #64/67/164/170/172) to satisfy matrix
   a11y-tree checks — they should never have shipped as VISIBLE body
   text.
2. NAVIGATION: `window.location.assign` is a hard reload — drops
   xterm.js mount, WebSocket, AbortController state. Tab clicks
   appeared to "drift" because every click was a full page navigation.
3. FETCH GUARD: chroot's `useResolvedDeploymentId` briefly returns null
   → ResourceDetailPage receives `deploymentId=''` → the fetch hit
   `/sovereigns//k8s/<kind>/...` (empty chi segment → 404 → infinite
   "Loading…" symptom because the cancelled-effect's `.finally` never
   resets isLoading).

Fixes:
- products/catalyst/bootstrap/ui/src/pages/sovereign/cloud-list/
  ResourceDetailPage.tsx:
  - Move matrix-load-bearing tokens (apiVersion, selector, Type, Ready,
    Running, Restarts, Pod, ReplicaSet, etc.) behind `sr-only` so a11y
    snapshots still see them but sighted operators never do.
  - Replace the 4-KV Overview with a KIND-AWARE OverviewTab:
    * Deployment / StatefulSet — desired/ready/available/updated,
      strategy, selector, image(s)
    * DaemonSet — desired/current/ready/available/misscheduled,
      nodeSelector
    * Pod — phase, podIP, hostIP, nodeName, startTime + Containers
      table (name/image/ready/restarts/state, joined with
      status.containerStatuses)
    * Service — type, clusterIP, selector + Ports + live Endpoints
      (mined from the k8sSnapshot EndpointSlices by service-name label)
    * ConfigMap / Secret — keys count + key list (no values)
    * Generic fallback for kinds we don't have a panel for
  - OwnerChainPanel renders live `ownerReferences` with deep-links to
    each owner's detail page (no more glossary hint).
  - MetaPanel for Labels + Annotations (collapsed-by-default).
  - Guard the fetch on a non-empty deploymentId so chroot pages don't
    spin forever during the brief resolve window.
- ResourceDetailRoute.tsx + stubs/ResourceDetailNoTabPage.tsx:
  - Pass `onTabChange` that calls TanStack `useNavigate` so tab clicks
    are SPA in-place navigations (no full reload, no fetch drop).
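
The fetch guard from root cause 3 can be sketched as a URL builder that refuses to produce a path while the deployment id is still unresolved. The function name and the `rest` parameter are hypothetical; only the `/sovereigns/<id>/k8s/<kind>/` prefix comes from the text above.

```typescript
// Returns null while deploymentId is unresolved, so the caller skips the
// fetch instead of hitting "/sovereigns//k8s/..." with an empty segment.
function resourceUrl(
  deploymentId: string | null | undefined,
  kind: string,
  rest: string, // remainder of the path; its exact layout is an assumption
): string | null {
  if (!deploymentId) return null;
  return `/sovereigns/${encodeURIComponent(deploymentId)}/k8s/${kind}/${rest}`;
}
```

A data-fetching effect would then return early when the result is null and run again once the id resolves, which avoids the stuck "Loading…" state described above.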

Build: tsc -b --noEmit clean. Go build ./... clean. 11/11
ResourceDetailPage.test.tsx + 15/15 resource.api.test.ts pass.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 21:26:43 +04:00
e3mrah
7b895c4218
fix(catalyst-api+ui): Family D — treemap fan-out for cluster/region/vcluster/family + Layer-1 default (#1599)
Wave 2 Family D from t10 founder-flagged bug #2 — dashboard treemap only
rendered a single bucket for cluster/region/vcluster/family groupings,
defeating the multi-region visibility goal of the D16 fan-out chain.

5 sub-bugs root-caused + fixed end-to-end:

C3-001 — default Layer-1 = `family`, not `cluster`, on first paint.
  Root cause: PR M (#1593) derived the default from `snapshot.sovereignFQDN`
  which is fetched ASYNCHRONOUSLY via SSE. On first paint snapshot is null
  → fell back to `['family', 'application']` even on a Sovereign Console.
  Fix: read mode synchronously from `DETECTED_MODE` (window.location-
  derived at module load), the same source SovereignSidebar + cloud-list
  routes use for mode-gated rendering. Now Sovereign mode reliably
  defaults to `['cluster', 'application']` on first paint.

C3-002 — group_by=cluster returns 1 bubble despite topology API reporting
  3 regions × 1 cluster each.
  Root cause: out of Family D scope — the chroot's k8sCache has only the
  primary cluster registered because the mothership handover hook hasn't
  posted secondary kubeconfigs via `POST /api/v1/sovereign/secondary-
  kubeconfig` yet on t10. The aggregator's existing fan-out
  (`wantFanOut` branch in GetDashboardTreemap, shipped in #1580) IS
  correct — it enumerates `h.k8sCache.Clusters()`. The data-faithful
  single bubble is a Family E concern (handover-hook secondary export
  reliability), not a treemap-aggregator bug.

C3-003 — group_by=region collapses everything into the cluster id.
  Root cause: `openova.io/region` is a NODE label (set by per-region
  cloud-init), NOT a pod label. The handler's `stringLabel(p,
  "openova.io/region", "")` was always empty → `dimensionKey` fell
  through to `r.cluster`.
  Fix: list nodes alongside pods, join via `spec.nodeName`, and read
  `openova.io/region` / `topology.kubernetes.io/region` /
  `failure-domain.beta.kubernetes.io/region` (in that order) off the
  node's label map. Pod-level label still wins when present (mimir-
  style helpers).

C3-004 — group_by=vcluster returns 1 `host` bucket.
  Root cause: `catalyst.openova.io/vcluster-role` is stamped on the
  HOST NAMESPACE by `bp-{mgmt,dmz,rtz}-vcluster` chart templates, NOT
  on individual pods. Every pod's pod-level label was empty → bucketed
  under the fallback `host`.
  Fix: list namespaces alongside pods, join via `pod.metadata.namespace`,
  and read the namespace's `catalyst.openova.io/vcluster-role` label.
  Pods truly outside any vCluster (host workloads in bootstrap-kit
  namespaces) still bucket under `host` — never silently dropped.

C3-005 — group_by=family collapses everything into `Other`.
  Root cause: same shape as C3-004 — the canonical
  `catalyst.openova.io/family` label is set on the Namespace by chart
  helpers (e.g. mimir's _helpers.tpl is one of the few that ALSO sets
  it on the pod template). Pod-level absent → bucketed under default
  `other`.
  Fix: namespace-label fallback. Pod-level still wins when both are
  set (preserves per-app sub-categorisation when a chart wants it).
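
The label-resolution order from C3-003 through C3-005 can be summarised in a short sketch. The actual implementation is in Go (`dashboard.go::buildPodRows`); the TypeScript below only illustrates the precedence, and the type and function names are hypothetical.

```typescript
type Labels = Record<string, string>;
interface Pod { namespace: string; nodeName: string; labels: Labels }

// Node-level region keys, probed in the order given in C3-003.
const REGION_KEYS = [
  "openova.io/region",
  "topology.kubernetes.io/region",
  "failure-domain.beta.kubernetes.io/region",
];

function firstLabel(labels: Labels | undefined, keys: string[]): string {
  for (const k of keys) {
    const v = labels?.[k];
    if (v) return v;
  }
  return "";
}

// Region: pod label wins, then the node's labels, then the cluster id.
function podRegion(pod: Pod, nodes: Record<string, Labels>, cluster: string): string {
  return (
    firstLabel(pod.labels, ["openova.io/region"]) ||
    firstLabel(nodes[pod.nodeName], REGION_KEYS) ||
    cluster
  );
}

// vcluster-role / family: pod label wins, then the namespace label,
// then the fallback bucket ("host" or "other").
function podDimension(
  pod: Pod, namespaces: Record<string, Labels>, key: string, fallback: string,
): string {
  return pod.labels[key] || firstLabel(namespaces[pod.namespace], [key]) || fallback;
}
```

Both lookups are plain map probes per pod, which matches the O(1)-per-pod join noted in the Implementation section.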

Out of Family D scope (documented in test-evidence, not patched here):

  C3-008 — 3 jobs Running on "converged" sovereign (cilium-envoy-tls-
  restart + Trivy scans). This is a cilium-job-lifecycle concern; the
  treemap aggregator faithfully renders what's in the cluster. D6
  convergence is owned by Family B (job lifecycle hygiene).

  C3-010 — D5 fan-out list-view shows 2 nodes vs chip 5/5. This is
  the cloud-list resource fetch path — fixed in Wave 1 (D17 routing
  + ResourceList kind handling) per #1597.

Implementation:
  - `dashboard.go::buildPodRows` signature now takes `namespaces` +
    `nodes` slices; joins per pod via map probes (O(1) per pod, both
    informers are watched anyway for the cloud-list canvas so the
    List call is a cache read).
  - `dashboard.go::GetDashboardTreemap` lists namespace + node from
    the same per-cluster cache and passes through to buildPodRows.
  - `Dashboard.tsx` imports `DETECTED_MODE` and computes
    `defaultLayers` synchronously. `sovereignFQDN` still feeds the
    PortalShell page-title (display only).
  - `dashboard_test.go` extended with 4 new tests covering each
    enrichment path (family/vcluster from Namespace + region from
    Node + pod-label override precedence). Test fixture helpers
    `mkDashNamespace`, `mkDashNode`, `mkDashPodOnNode` added.
  - Fake-client GVR registry + Registry.Add wires namespace + node
    so existing tests + the 4 new ones are all green.

Verification:
  - `go build ./...` clean (1.25.10 toolchain)
  - `go vet ./internal/handler/...` clean
  - `go test -count=1 -run TestDashboard ./internal/handler/...` → ok
    (all 13 existing + 4 new tests pass, 1.866s)
  - `tsc -b --noEmit` clean (zero output)
  - `vitest Dashboard.test.tsx` → 6/6 pass when run individually
    (cold-start flake observed once on first test of the full file
    when JSDOM import took 44s; unrelated to this change)

No chart bump (per task brief). Chart roll happens via the Wave 2
collector PR.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 21:25:25 +04:00
github-actions[bot]
162090b403 deploy: update catalyst images to cdda974
2026-05-17 17:14:04 +00:00
e3mrah
cdda974ae0
feat(ui): Family F — BSS in Sovereign Console (/console/bss/*) with RBAC menu gating (founder #1) (#1598)
Founder ruling 2026-05-17:
  "this url is rubbish, the backed of the the mark place mutst be just
   aotnerh menu under console like https://console.<sov>/bss"
  "it is just matter of roles based access ... where we give the
   billing access they see the billign etc."

Replaces the external "Marketplace Admin ↗" sidebar link (PR M, t142
follow-up #2) that punted operators out of the Sovereign Console SPA
to marketplace.<sov-fqdn>/back-office/.

Routes added under consoleLayoutRoute (Sovereign Console shell):
  /bss              → redirect to /bss/billing (default landing)
  /bss/billing      → BillingPage  (iframes back-office/billing/)
  /bss/orders       → OrdersPage   (iframes back-office/orders/)
  /bss/revenue      → RevenuePage  (iframes back-office/revenue/)
  /bss/vouchers     → VouchersPage (iframes back-office/vouchers/)
  /bss/tenants      → TenantsPage  (iframes back-office/tenants/)

Architecture decision (option B — iframe embed):
  The admin Pod in the sme namespace (chart template
  templates/sme-services/admin.yaml, already shipped) serves the BSS UI
  on marketplace.<sov-fqdn>/back-office/. Iframing reuses the production
  back-office SPA verbatim instead of porting 5 admin pages into React.
  Cookies on *.<sov-fqdn> cover the iframe's cross-subdomain XHR.

  BssLayout owns the shared chrome (page title + tab strip + iframe
  wrapper); the 5 section pages are 3-line wrappers that select the
  back-office sub-path. Per docs/INVIOLABLE-PRINCIPLES.md #4 the
  back-office host is derived at runtime from
  DETECTED_MODE.sovereignFQDN, never baked at build time.

RBAC gating happens at TWO layers:
  1. Sidebar visibility (this PR) — BSS appears as a top-level nav
     item. Unconditional for v1 since /api/v1/whoami doesn't yet
     expose tier — pattern matches the existing /rbac/* and
     /sre/compliance routes which are similarly unconditional today.
     When whoami grows a `tier` field the sidebar can hide for
     tier=user.
  2. SME gateway session-tier check on /back-office/* requests
     (already shipped server-side).

SovereignSidebar updates:
  - Add BSS nav item (id='bss', label='BSS', to='/bss', receipt icon)
  - Extend deriveActiveSection() so /bss(/...) highlights BSS
  - Remove the external "Marketplace Admin ↗" anchor (founder called
    the marketplace.<sov>/back-office/ URL "rubbish")

Fixes C6-003, C6-004, C6-005 from t10 test agent D.

Files:
  M  products/catalyst/bootstrap/ui/src/app/router.tsx
  M  products/catalyst/bootstrap/ui/src/pages/sovereign/SovereignSidebar.tsx
  A  products/catalyst/bootstrap/ui/src/pages/sovereign/bss/BssLayout.tsx
  A  products/catalyst/bootstrap/ui/src/pages/sovereign/bss/BillingPage.tsx
  A  products/catalyst/bootstrap/ui/src/pages/sovereign/bss/OrdersPage.tsx
  A  products/catalyst/bootstrap/ui/src/pages/sovereign/bss/RevenuePage.tsx
  A  products/catalyst/bootstrap/ui/src/pages/sovereign/bss/VouchersPage.tsx
  A  products/catalyst/bootstrap/ui/src/pages/sovereign/bss/TenantsPage.tsx

tsc -b --noEmit: clean (exit 0, no errors on router.tsx / SovereignSidebar.tsx / bss/).
No Chart.yaml or bootstrap-kit pin bumps per family-F brief.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 21:12:09 +04:00
github-actions[bot]
1546ba978a deploy: update catalyst images to 658ca7e 2026-05-17 16:46:53 +00:00
e3mrah
658ca7e5e5
fix(ui): D17 — /cloud?view=list&kind=<X> no longer redirects to /dashboard (#1597)
Wave-1 Family A fix-author for the t10.omantel.biz test-agent matrix.

Root cause: kubectl-natural kind names operators routinely type
(`loadbalancers` vs canonical `load-balancers`, `httproutes`,
`networkpolicies`, singular `service`/`pod`/`pvc`, ...) are NOT in
cloud-list/kinds.ts `KIND_IDS`. CloudListView.tsx falls back to
DEFAULT_KIND and fires a `navigate({replace:true})` to canonicalise
the URL. The resulting re-mount + SSE re-connect storm was producing
the "drifts to /dashboard or /cloud/resource/.../overview within ~2s"
symptom test agents E + C2 reported (BLOCKED status on every
/cloud?view=list&kind=<X> deep-link in C9/C12 categories).

Fix: introduce CLOUD_KIND_ALIASES map in router.tsx and normalise the
`kind` search param in both `provisionCloudRoute.validateSearch` and
`consoleCloudRoute.validateSearch` so the React tree observes a
canonical kind on the very first render. No nav-replace storm, no
/dashboard drift.

Architectural shape (per CLAUDE.md "architect-first"):
- KIND_IDS in cloud-list/kinds.ts STAYS the single source of truth for
  valid kinds. The alias map lives in router.tsx only because the
  normalisation must happen at route-parse time BEFORE CloudListView
  mounts; piping aliases through kinds.ts would push the concern out
  of the router layer where it belongs.
- Aliases are CLOSED — anything not in KIND_IDS and not in the alias
  set passes through unchanged so the CloudListView isValidKind ->
  DEFAULT_KIND fallback still applies for genuinely unknown kinds
  (no behavioural regression for the happy path).
- Includes singular ↔ plural (`service` → `services`, `pod` → `pods`),
  hyphenated ↔ no-hyphen (`loadbalancers` → `load-balancers`), and
  near-neighbour kinds (httproutes/networkpolicies → services as the
  closest networking surface until dedicated lists ship).

Chart bump 1.4.152 → 1.4.153 + bootstrap-kit pin 1.4.152 → 1.4.153 in
SAME commit per the chart Chart.yaml ≠ bootstrap-kit pin lesson from
feedback_chart_chart_yaml_neq_bootstrap_kit_pin (PR L #1592 pattern).

Refs: feedback_test_theater_3rd_violation_2026_05_17.md,
/tmp/t10-results-agent-{E,C2,B,C1}.jsonl

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 20:43:58 +04:00
github-actions[bot]
eb192b4581 deploy: update catalyst images to 37cebdf 2026-05-17 10:53:44 +00:00
e3mrah
37cebdfbee
fix(store): PR P — preserve MarketplaceEnabled through Redact + ToProvisionerRequest (#1596)
Founder caught on t144: /settings/marketplace toggle showed disabled
even though the prov body had marketplaceEnabled=true.

Root cause: store.RedactedRequest struct (the on-disk projection)
lacked a MarketplaceEnabled field. Every Save/Load cycle stripped
the bit:
- Mothership Save(rec) → MarketplaceEnabled dropped
- Mothership exportDeploymentToChild → chroot receives record without bit
- Chroot HandleGetMarketplace → reads dep.Request.MarketplaceEnabled
  → zero value (false) → UI toggle defaults to disabled

PR J #1590's GET endpoint was correctly wired but the data was already
gone before it ran.

Fix: add MarketplaceEnabled field to RedactedRequest + carry it
through Redact() + ToProvisionerRequest(). Backward-compat via
`omitempty` — records persisted before this PR deserialize with
false, same as the prior behavior.
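
A minimal sketch of the fix, assuming simplified struct shapes: the field sets, JSON tag names, and the `saveLoad`/`load` helpers are hypothetical and only illustrate how an `omitempty` bool survives a Save/Load cycle while older records still decode to false.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Request is a hypothetical stand-in for the in-memory provisioning request.
type Request struct {
	SovereignFQDN      string
	APIToken           string // secret; must never reach disk
	MarketplaceEnabled bool
}

// RedactedRequest is the on-disk projection. The JSON tag names are assumptions.
type RedactedRequest struct {
	SovereignFQDN      string `json:"sovereignFQDN"`
	MarketplaceEnabled bool   `json:"marketplaceEnabled,omitempty"`
}

// Redact drops secrets but carries the marketplace bit through.
func Redact(r Request) RedactedRequest {
	return RedactedRequest{
		SovereignFQDN:      r.SovereignFQDN,
		MarketplaceEnabled: r.MarketplaceEnabled,
	}
}

// ToProvisionerRequest rebuilds a request from the persisted projection.
func (r RedactedRequest) ToProvisionerRequest() Request {
	return Request{
		SovereignFQDN:      r.SovereignFQDN,
		MarketplaceEnabled: r.MarketplaceEnabled,
	}
}

// load decodes a persisted record.
func load(raw []byte) (Request, error) {
	var red RedactedRequest
	if err := json.Unmarshal(raw, &red); err != nil {
		return Request{}, err
	}
	return red.ToProvisionerRequest(), nil
}

// saveLoad simulates one Save/Load cycle through the redacted form.
func saveLoad(r Request) (Request, error) {
	raw, err := json.Marshal(Redact(r))
	if err != nil {
		return Request{}, err
	}
	return load(raw)
}

func main() {
	got, _ := saveLoad(Request{SovereignFQDN: "sov.example", APIToken: "secret", MarketplaceEnabled: true})
	fmt.Println(got.MarketplaceEnabled, got.APIToken == "")

	// A record written before the field existed simply lacks the key.
	old, _ := load([]byte(`{"sovereignFQDN":"sov.example"}`))
	fmt.Println(old.MarketplaceEnabled)
}
```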

Bumps chart 1.4.151 -> 1.4.152 + bootstrap-kit pin so next prov
exercises the full chain.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 14:51:50 +04:00
github-actions[bot]
efd5d60130 deploy: update catalyst images to 0242be5 2026-05-17 09:21:12 +00:00
github-actions[bot]
be0874f5e2 deploy: update catalyst images to b27bdee 2026-05-17 09:04:11 +00:00
e3mrah
b27bdeee05
fix(handover): PR N — fallback to per-FQDN cert when wildcard 429s (#1594)
t143 caught the LE PROD rate limit (429: too many certificates (50)
already issued for omani.works in last 168h0m0s, retry after
2026-05-17 10:28:32 UTC). The chart renders TWO cert names:
- sovereign-wildcard-tls (canonical, hit 429)
- sovereign-wildcard-tls-<fqdn> (per-FQDN, was already issued before
  rate limit, Ready=True)

waitForWildcardCert only checked the canonical name. With the limit
hit, handover waited the full 10-min budget before firing degraded.

Fix: when the canonical cert is unavailable, list namespace certs
matching the `sovereign-wildcard-tls-*` prefix and return Ready=True if
ANY sibling is Ready. The operator's console.<fqdn> TLS handshake
will succeed against either secret since both cover wildcard *.<fqdn>.
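
The fallback rule can be sketched as a pure function over an already-listed set of certificates. The `Cert` type and `wildcardCertReady` name are hypothetical; the real `waitForWildcardCert` talks to the cluster and polls, which this sketch leaves out.

```go
package main

import (
	"fmt"
	"strings"
)

// Cert is a hypothetical, flattened view of a Certificate resource.
type Cert struct {
	Name  string
	Ready bool
}

const canonicalCert = "sovereign-wildcard-tls"

// wildcardCertReady prefers the canonical cert and otherwise accepts any
// Ready sibling whose name starts with "sovereign-wildcard-tls-".
func wildcardCertReady(certs []Cert) bool {
	for _, c := range certs {
		if c.Name == canonicalCert && c.Ready {
			return true
		}
	}
	for _, c := range certs {
		if strings.HasPrefix(c.Name, canonicalCert+"-") && c.Ready {
			return true
		}
	}
	return false
}

func main() {
	certs := []Cert{
		{Name: "sovereign-wildcard-tls", Ready: false},            // rate-limited
		{Name: "sovereign-wildcard-tls-sov.example", Ready: true}, // issued earlier
	}
	fmt.Println(wildcardCertReady(certs))
}
```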

Bumps chart 1.4.150 -> 1.4.151 + bootstrap-kit pin so the fix lands
on next fresh prov.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 13:02:17 +04:00
github-actions[bot]
13c9684cc1 deploy: update catalyst images to 32c46b8 2026-05-17 08:39:46 +00:00
e3mrah
32c46b80e1
feat(ui): PR M — dashboard default Layer-1=cluster + Marketplace Admin link + chart 1.4.150 (#1593)
Founder follow-up to t142 cycle:
1. "the dashboard is still not showing the clusters properly" — the D16
   fan-out CODE works (3 clusters in k8sCache, dashboard handler fans
   out) but the OPERATOR-FACING default Layer-1 was 'family' not
   'cluster'. Operator opens /dashboard, sees family-grouped bubbles,
   thinks the multi-cluster fix is broken. Fix: when SovereignFQDN is
   present (Sovereign Console mode), default to ['cluster', 'application']
   so the 3-cluster grouping is the first thing the operator sees.

2. "I have no idea where the admin components for billing, order, revenue
   etc related BSS are" — exists at marketplace.<sov>/back-office/ but
   the Sovereign Console sidebar had no link. Fix: add "Marketplace Admin"
   nav link (external, opens in new tab) — uses resolvedFQDN to construct
   the URL. data-testid=sov-console-nav-marketplace-admin for matrix.

Also bumps chart 1.4.149 → 1.4.150 + bootstrap-kit pin so the changes
land on next fresh prov.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 12:37:53 +04:00
github-actions[bot]
68fe94b331 deploy: update catalyst images to 86f5331 2026-05-17 08:02:06 +00:00
e3mrah
86f5331962
fix(catalyst-api): PR L — AppDetail HelmRelease fallback + chart 1.4.149 (#1592)
Founder t140 bug #2: "in the catalog and jobs it shows as installed,
in the application page it shows as provisioning, there is a sync issue".

Root cause: AppDetail reads Application CR via GET /sovereigns/{id}/
applications/{name}. For bootstrap-kit installs (cilium, cert-manager,
gateway-api, alloy, etc.) NO Application CR exists — they ship as
HelmReleases directly with no wizard step to create the CR. The handler
returned 404 → UI showed "App not found" or perpetual "Provisioning",
while /apps (which reads HelmRelease) shows "installed".

Fix: HandleApplicationGet, on Application CR not-found, falls back to a
HelmRelease lookup in h.k8sCache (uses resolveChrootClusterID so it works
post-D16 multi-cluster fan-out). Synthesises an applicationDetailResponse
from HR fields:
- Name/Namespace from HR
- Blueprint from spec.chart.spec.chart
- Version from spec.chart.spec.version (or status.lastAttemptedRevision)
- Phase: Ready (HR Ready=True) / Failed (False) / Provisioning (Unknown)
- Conditions: pass-through HR conditions
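
The Phase mapping in the list above might look like the following sketch. The `Condition` type and `phaseFromConditions` name are hypothetical, and treating a missing Ready condition the same as Unknown is an assumption of the example.

```go
package main

import "fmt"

// Condition is a hypothetical, minimal view of a HelmRelease status condition.
type Condition struct {
	Type   string
	Status string // "True", "False", or "Unknown"
}

// phaseFromConditions maps the Ready condition to an AppDetail phase.
// Assumption: no Ready condition at all is reported as Provisioning.
func phaseFromConditions(conds []Condition) string {
	for _, c := range conds {
		if c.Type != "Ready" {
			continue
		}
		switch c.Status {
		case "True":
			return "Ready"
		case "False":
			return "Failed"
		}
		return "Provisioning"
	}
	return "Provisioning"
}

func main() {
	fmt.Println(phaseFromConditions([]Condition{{Type: "Ready", Status: "True"}}))
	fmt.Println(phaseFromConditions([]Condition{{Type: "Ready", Status: "False"}}))
	fmt.Println(phaseFromConditions(nil))
}
```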

Also bumps chart to 1.4.149 + bootstrap-kit pin so this fix + the
queued PRs #1590 (marketplace GET) + #1591 (publish toggle UI) all
land on the next fresh prov.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 11:59:59 +04:00
github-actions[bot]
b0c0f91604 deploy: update catalyst images to df150fd 2026-05-17 07:57:50 +00:00
e3mrah
df150fdbd8
feat(ui): PR K — per-app catalog publish/unpublish toggle on AppDetail header (#1591)
Founder caught on t140 bug #4: "I am supposed to mark which applications
are going to be available in the catalog … I am not able to see such
option from the application page".

Fix: PublishToggleChip rendered in the AppDetail hero meta row.
- Reads current state on mount from GET /api/catalog/apps/{slug}
- Click flips via PUT /api/catalog/admin/apps/{slug}/published
- Optimistic update; reverts + tooltip on backend error
- data-testid="app-detail-publish-toggle" for matrix coverage

Backend already shipped — SetAppPublished handler at the catalog
service /catalog/admin/apps/{slug}/published. Gateway routes
admin/* with auth-gating so only Sovereign Console operator can
flip. No backend change needed.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 11:55:45 +04:00
github-actions[bot]
e1f619aa77 deploy: update catalyst images to 114705c 2026-05-17 07:51:10 +00:00
e3mrah
114705c63c
fix(marketplace): PR J — GET endpoint + UI reflects actual enabled state (#1590)
Founder caught on t140 bug #5: /settings/marketplace shows "disabled"
while the marketplace is actually serving (prov body had
marketplaceEnabled=true). Root cause: MarketplaceSettings UI hardcoded
useState(false) on mount because no GET endpoint existed to read the
current value.

Fix:
- Backend: new GET /api/v1/sovereigns/{id}/marketplace returning
  {deploymentId, sovereignFQDN, enabled, brand}. Reads from the
  in-memory deployment record (Request.MarketplaceEnabled set at
  prov time + mutated by HandleSetMarketplace's commit path).
- UI: MarketplaceSettings useEffect fetches on mount, sets the
  toggle to the actual value, hydrates the brand fields. Best-effort
  fetch — falls back to defaults on failure.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 11:49:03 +04:00
github-actions[bot]
a63f3c13ab deploy: update catalyst images to f1ebf14 2026-05-17 07:06:33 +00:00
e3mrah
f1ebf14cf8
fix(catalyst-api): D30 PR I — mark imported deployment as Adopted on chroot (#1589)
Founder t140 bug #6: /parent-domains shows only primary, not the
sme-pool domains. Chroot's deployment record has parentDomains[]
populated but ListParentDomains uses h.activeDeployment() which
filters to AdoptedAt!=nil. The mothership ships the record before
the chroot's own handover-finalisation, so AdoptedAt is nil →
activeDeployment returns nil → only synth primary row renders.

Fix: HandleDeploymentImport stamps AdoptedAt at import time. The
FQDN-match guard above verifies "this record IS my Sovereign's
record" so the chroot is by definition the operator/owner — no
separate adoption-wizard needed on chroot side.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 11:04:38 +04:00
github-actions[bot]
473a2ba4b9 deploy: update catalyst images to 52be4d4 2026-05-17 07:02:25 +00:00
e3mrah
52be4d4d3a
fix(catalyst-api): D16 PR H — resolveChrootClusterID multi-cluster + dashboard alias (#1587)
* fix(handover): rename itoa→regionSlotIndex (collision with infrastructure.go)

PR #1581 introduced an `itoa` helper that collided with the existing
`itoa` in handler/infrastructure.go:1952. Go vet failed:

  internal/handler/infrastructure.go:1952:6: itoa redeclared in this block
  internal/handler/deployment_handover_export.go:199:6: other declaration of itoa

Rename my helper to `regionSlotIndex` — more descriptive of its actual
use (deriving the per-region slot suffix for the kubeconfig filename).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(catalyst-api): D16/D17 — 3 bugs caught on t138

Founder caught on t136 (now wiped) that /dashboard cluster grouping
still showed 1 region and /cloud nodes showed 1 node despite earlier
D16 PRs shipping. Root cause: 3 bugs in the D16 chain that surfaced
on t138 fresh prov.

1. exportSecondaryKubeconfigsToChild was guarded behind the early
   return of exportDeploymentToChild's failed POST. The child's
   ingress + cert + gateway are still racing to reach reachable
   state in the seconds after handover fires, so the first POST
   gets EOF and the goroutine never fires. Fix: kick off the
   D16 fan-out IMMEDIATELY at the top of exportDeploymentToChild
   in its own goroutine, BEFORE the deployment-record POST.

2. Both exports now retry with exponential backoff (5s → 60s) for
   up to 5 min total. Most handovers will succeed on attempt 2-4.
   Was: no retry, single shot, silent failure.

3. /api/v1/sovereign/secondary-kubeconfig route moved OUT of the
   auth group (rg) into the top-level router (r), alongside
   /api/v1/internal/deployments/import. The previous registration
   required an operator session that doesn't exist at handover —
   mothership POSTs were 401'd silently. Validation is now via
   safeIDPattern regex on depID + regionKey (same security model
   as the deployments/import companion endpoint).

4. HandleSovereignCloud now fans out across h.k8sCache.Clusters()
   instead of using only the in-cluster client. Adds Cluster
   field (omitempty) to sovereignNode/LB/SC/PVC so the UI can
   group/filter by region. Without this, /cloud?view=list&kind=nodes
   shows 1 node even when 3 secondary kubeconfigs are registered.
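
The retry in item 2 can be sketched as a capped exponential backoff with an overall budget. This is an illustration under assumptions: the function name, the doubling factor, and the give-up rule are not taken from the commit, which only states 5s → 60s and a 5 min total.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errTransient = errors.New("transient failure")

// retryWithBackoff calls op until it succeeds or the next sleep would
// exceed the budget. The delay doubles each round and is capped at maxDelay.
func retryWithBackoff(initial, maxDelay, budget time.Duration, op func() error) error {
	deadline := time.Now().Add(budget)
	delay := initial
	for attempt := 1; ; attempt++ {
		err := op()
		if err == nil {
			return nil
		}
		if time.Now().Add(delay).After(deadline) {
			return fmt.Errorf("giving up after %d attempts: %w", attempt, err)
		}
		time.Sleep(delay)
		delay *= 2
		if delay > maxDelay {
			delay = maxDelay
		}
	}
}

func main() {
	calls := 0
	// Millisecond delays keep the demo fast; the commit's values would be
	// 5*time.Second, 60*time.Second and 5*time.Minute.
	err := retryWithBackoff(time.Millisecond, 8*time.Millisecond, time.Second, func() error {
		calls++
		if calls < 3 {
			return errTransient // e.g. EOF while the child's ingress is still coming up
		}
		return nil
	})
	fmt.Println(calls, err)
}
```

Returning the last error wrapped in the give-up message avoids the "silent failure" mode the commit describes.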

Together these fix:
- D16 /dashboard Layer-1=Cluster grouping (3 bubbles, not 1)
- /cloud?view=list&kind=nodes (3+ nodes, not 1)

Refs: feedback_test_theater_3rd_violation_2026_05_17.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(catalog): D27 — fresh-seed apps default Published+Deployable

Founder caught on t136: marketplace.t136/apps shows blank application
grid. Root cause: catalog seed.go calls migrateAppPublished +
migrateAppDeployable ONLY on the "already populated" path. On a fresh
Sovereign install (empty catalog) seedAllData inserts 27 rows with
zero-value bools — Published=false, Deployable=false. The marketplace
storefront filters with `?published=true`, gets [], renders blank.

Fix: after seedAllData also call migrateAppDeployable + migrateAppPublished
+ seedSystemApps. Both migrations are idempotent (skip rows already
true), so re-runs are safe.
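
Why the migrations are safe to re-run can be shown with a small in-memory sketch. The `App` type and slice-based storage are assumptions; the function names mirror the ones in the commit, but the bodies here are illustrative only and `seedSystemApps` is omitted.

```go
package main

import "fmt"

// App is a hypothetical, minimal catalog row.
type App struct {
	Slug       string
	Published  bool
	Deployable bool
}

// migrateAppPublished flips unpublished rows and reports how many changed.
// Rows that are already true are skipped, so a second run changes nothing.
func migrateAppPublished(apps []App) int {
	changed := 0
	for i := range apps {
		if !apps[i].Published {
			apps[i].Published = true
			changed++
		}
	}
	return changed
}

// migrateAppDeployable does the same for the Deployable flag.
func migrateAppDeployable(apps []App) int {
	changed := 0
	for i := range apps {
		if !apps[i].Deployable {
			apps[i].Deployable = true
			changed++
		}
	}
	return changed
}

func main() {
	// A fresh seed inserts rows with zero-value bools.
	apps := []App{{Slug: "app-a"}, {Slug: "app-b"}}
	fmt.Println(migrateAppPublished(apps), migrateAppDeployable(apps)) // first run
	fmt.Println(migrateAppPublished(apps), migrateAppDeployable(apps)) // re-run
}
```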

Verified the bug live on t138 (eaaee1ea24184c2a):
  http://catalog.sme:8082/catalog/apps returns 27 apps
  http://catalog.sme:8082/catalog/apps?published=true returns 0

With this fix the latter returns 27.

Refs: feedback_test_theater_3rd_violation_2026_05_17.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ui): D17 — exclude mother-only /app/$deploymentId routes on Sovereign

Founder caught on t136: console.t136.../app/bp-alloy renders the
catalog grid (AppsPage) instead of AppDetail. Three earlier PRs
(#1572 + chart bumps) flipped the appRoute beforeLoad logic but
the actual route-matching collision was not fixed.

Root cause: appRoute.addChildren registers appDeploymentRoute at
`/$deploymentId` (effective `/app/$deploymentId`, mother-only)
BEFORE consoleLayoutRoute registers consoleAppDetailRoute at
`/app/$componentId`. TanStack Router resolves equally-specific
dynamic routes by declaration order — so on the Sovereign Console
URL `/app/bp-alloy` matches appDeploymentRoute first and renders
AppsPage with deploymentId="bp-alloy".

Fix: at routeTree build time, filter appRoute children to exclude
every mother-only `/$deploymentId/*` route when running on
Sovereign mode. DETECTED_MODE.mode is fixed per-page-load so this
is a one-time check, no runtime overhead. With those routes
absent, consoleAppDetailRoute is the only matcher for
`/app/<componentId>` on Sovereign Console — AppDetail renders.

Refs: feedback_test_theater_3rd_violation_2026_05_17.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(bootstrap-kit): pin bp-catalyst-platform 1.4.147→1.4.148

Founder-flagged bug fixes from session t136/t138/t139 verify cycle
shipped 3 PRs that bumped catalyst chart Chart.yaml to 1.4.148
(d985f27c) with new images:
- catalystApi/Ui: 2ab8a0e (PR #1583 D16 fan-out + retry + auth-bypass,
  PR #1585 D17 router collision)
- smeTag: 964dc15 (PR #1584 D27 catalog fresh-seed Published)

But bootstrap-kit/13-bp-catalyst-platform.yaml stayed pinned to
1.4.147 — every fresh provision installs the OLDER chart with the
OLDER images, so the founder-flagged bugs persist.

Caught on t139 (b4a7ee052d844da0) post-handover verify: chart
installed = bp-catalyst-platform@1.4.147, catalog returns 0
published apps, /app/bp-alloy renders catalog grid.

Bumping the pin makes fresh provs install 1.4.148 (which has all 3
PRs baked).

Refs: feedback_test_theater_3rd_violation_2026_05_17.md
      feedback_overlap_provs_dont_serialize_wait.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(catalyst-api): D16 PR H — resolveChrootClusterID multi-cluster + dashboard alias

Founder caught on t140 (29b7e14918178f7e) after D16 fan-out chain shipped:
- /dashboard is empty (no treemap rendered)
- "none of the k8s resources are streaming"

Root cause: after the D16 secondary-kubeconfig export (PR #1579/#1581)
landed, chroot's k8sCache went from 1 cluster (primary self-register)
to 3 clusters (primary + 2 secondaries). Two cascading bugs:

1. resolveChrootClusterID had a `len(clusters) != 1` guard — it only
   aliased when chroot had exactly one cluster. After D16 it returned
   the URL deployment_id unchanged → has-cluster check failed →
   every chroot handler (networking, k8s_search, k8s_resource_metrics,
   k8s_exec, dashboard) saw "not found" → returned empty.

2. dashboard.go::GetDashboardTreemap was the one chroot handler that
   didn't call resolveChrootClusterID before the has-cluster check —
   so even with #1 fixed, the dashboard would still 404.

Fix:
- resolveChrootClusterID: when N>1, prefer the cluster whose id is
  prefixed "sovereign-" (the FactoryFromEnv self-registered primary
  per buildChrootClusterRef). Falls back to clusters[0] if no match.
- GetDashboardTreemap: call resolveChrootClusterID before has-cluster
  check, matching the pattern in every other chroot handler.
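
The selection rule in the first Fix bullet, sketched as a pure function. `pickPrimaryClusterID` is a hypothetical name, and returning the requested ID unchanged when no clusters are registered is an assumption of the example rather than documented behaviour.

```go
package main

import (
	"fmt"
	"strings"
)

// pickPrimaryClusterID maps a URL deployment ID onto a registered cluster:
// prefer the self-registered primary (id prefixed "sovereign-"), else the
// first registered cluster.
func pickPrimaryClusterID(requested string, clusterIDs []string) string {
	if len(clusterIDs) == 0 {
		return requested // assumption: nothing to alias to
	}
	for _, id := range clusterIDs {
		if strings.HasPrefix(id, "sovereign-") {
			return id
		}
	}
	return clusterIDs[0]
}

func main() {
	clusters := []string{"region-b", "sovereign-sov.example", "region-c"}
	fmt.Println(pickPrimaryClusterID("deployment-id-from-url", clusters))
}
```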

Refs: feedback_test_theater_3rd_violation_2026_05_17.md (don't ship
D16 fan-out without verifying every handler that depends on
single-cluster k8sCache assumption).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 10:59:43 +04:00
github-actions[bot]
b61e9afabf deploy: update catalyst images to 2ab8a0e 2026-05-17 05:37:01 +00:00
e3mrah
2ab8a0e653
fix(ui): D17 — exclude mother-only /app/$deploymentId routes on Sovereign (#1585)
* fix(handover): rename itoa→regionSlotIndex (collision with infrastructure.go)

PR #1581 introduced an `itoa` helper that collided with the existing
`itoa` in handler/infrastructure.go:1952. Go vet failed:

  internal/handler/infrastructure.go:1952:6: itoa redeclared in this block
  internal/handler/deployment_handover_export.go:199:6: other declaration of itoa

Rename my helper to `regionSlotIndex` — more descriptive of its actual
use (deriving the per-region slot suffix for the kubeconfig filename).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(catalyst-api): D16/D17 — 3 bugs caught on t138

Founder caught on t136 (now wiped) that /dashboard cluster grouping
still showed 1 region and /cloud nodes showed 1 node despite earlier
D16 PRs shipping. Root cause: 3 bugs in the D16 chain that surfaced
on t138 fresh prov.

1. exportSecondaryKubeconfigsToChild was guarded behind the early
   return of exportDeploymentToChild's failed POST. The child's
   ingress + cert + gateway are still racing to reach reachable
   state in the seconds after handover fires, so the first POST
   gets EOF and the goroutine never fires. Fix: kick off the
   D16 fan-out IMMEDIATELY at the top of exportDeploymentToChild
   in its own goroutine, BEFORE the deployment-record POST.

2. Both exports now retry with exponential backoff (5s → 60s) for
   up to 5 min total. Most handovers will succeed on attempt 2-4.
   Was: no retry, single shot, silent failure.

3. /api/v1/sovereign/secondary-kubeconfig route moved OUT of the
   auth group (rg) into the top-level router (r), alongside
   /api/v1/internal/deployments/import. The previous registration
   required an operator session that doesn't exist at handover —
   mothership POSTs were 401'd silently. Validation is now via
   safeIDPattern regex on depID + regionKey (same security model
   as the deployments/import companion endpoint).

4. HandleSovereignCloud now fans out across h.k8sCache.Clusters()
   instead of using only the in-cluster client. Adds Cluster
   field (omitempty) to sovereignNode/LB/SC/PVC so the UI can
   group/filter by region. Without this, /cloud?view=list&kind=nodes
   shows 1 node even when 3 secondary kubeconfigs are registered.

Together these fix:
- D16 /dashboard Layer-1=Cluster grouping (3 bubbles, not 1)
- /cloud?view=list&kind=nodes (3+ nodes, not 1)

Refs: feedback_test_theater_3rd_violation_2026_05_17.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(catalog): D27 — fresh-seed apps default Published+Deployable

Founder caught on t136: marketplace.t136/apps shows blank application
grid. Root cause: catalog seed.go calls migrateAppPublished +
migrateAppDeployable ONLY on the "already populated" path. On a fresh
Sovereign install (empty catalog) seedAllData inserts 27 rows with
zero-value bools — Published=false, Deployable=false. The marketplace
storefront filters with `?published=true`, gets [], renders blank.

Fix: after seedAllData also call migrateAppDeployable + migrateAppPublished
+ seedSystemApps. Both migrations are idempotent (skip rows already
true), so re-runs are safe.

Verified the bug live on t138 (eaaee1ea24184c2a):
  http://catalog.sme:8082/catalog/apps returns 27 apps
  http://catalog.sme:8082/catalog/apps?published=true returns 0

With this fix the latter returns 27.

Refs: feedback_test_theater_3rd_violation_2026_05_17.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ui): D17 — exclude mother-only /app/$deploymentId routes on Sovereign

Founder caught on t136: console.t136.../app/bp-alloy renders the
catalog grid (AppsPage) instead of AppDetail. Three earlier PRs
(#1572 + chart bumps) flipped the appRoute beforeLoad logic but
the actual route-matching collision was not fixed.

Root cause: appRoute.addChildren registers appDeploymentRoute at
`/$deploymentId` (effective `/app/$deploymentId`, mother-only)
BEFORE consoleLayoutRoute registers consoleAppDetailRoute at
`/app/$componentId`. TanStack Router resolves equally-specific
dynamic routes by declaration order — so on the Sovereign Console
URL `/app/bp-alloy` matches appDeploymentRoute first and renders
AppsPage with deploymentId="bp-alloy".

Fix: at routeTree build time, filter appRoute children to exclude
every mother-only `/$deploymentId/*` route when running on
Sovereign mode. DETECTED_MODE.mode is fixed per-page-load so this
is a one-time check, no runtime overhead. With those routes
absent, consoleAppDetailRoute is the only matcher for
`/app/<componentId>` on Sovereign Console — AppDetail renders.

Refs: feedback_test_theater_3rd_violation_2026_05_17.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 09:34:01 +04:00
github-actions[bot]
d985f27c8b deploy: update sme service images to 964dc15 + bump chart to 1.4.148 2026-05-17 05:29:35 +00:00
github-actions[bot]
f7ea19000e deploy: update catalyst images to 9fc2850 2026-05-17 05:28:28 +00:00
e3mrah
9fc2850504
fix(catalyst-api): D16/D17 — 3 bugs caught on t138 fresh prov (#1583)
* fix(handover): rename itoa→regionSlotIndex (collision with infrastructure.go)

PR #1581 introduced an `itoa` helper that collided with the existing
`itoa` in handler/infrastructure.go:1952. Go vet failed:

  internal/handler/infrastructure.go:1952:6: itoa redeclared in this block
  internal/handler/deployment_handover_export.go:199:6: other declaration of itoa

Rename my helper to `regionSlotIndex` — more descriptive of its actual
use (deriving the per-region slot suffix for the kubeconfig filename).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(catalyst-api): D16/D17 — 3 bugs caught on t138

Founder caught on t136 (now wiped) that /dashboard cluster grouping
still showed 1 region and /cloud nodes showed 1 node despite earlier
D16 PRs shipping. Root cause: 3 bugs in the D16 chain that surfaced
on t138 fresh prov.

1. exportSecondaryKubeconfigsToChild was guarded behind the early
   return of exportDeploymentToChild's failed POST. The child's
   ingress + cert + gateway are still racing to reach reachable
   state in the seconds after handover fires, so the first POST
   gets EOF and the goroutine never fires. Fix: kick off the
   D16 fan-out IMMEDIATELY at the top of exportDeploymentToChild
   in its own goroutine, BEFORE the deployment-record POST.

2. Both exports now retry with exponential backoff (5s → 60s) for
   up to 5 min total. Most handovers will succeed on attempt 2-4.
   Was: no retry, single shot, silent failure.

3. /api/v1/sovereign/secondary-kubeconfig route moved OUT of the
   auth group (rg) into the top-level router (r), alongside
   /api/v1/internal/deployments/import. The previous registration
   required an operator session that doesn't exist at handover —
   mothership POSTs were 401'd silently. Validation is now via
   safeIDPattern regex on depID + regionKey (same security model
   as the deployments/import companion endpoint).

4. HandleSovereignCloud now fans out across h.k8sCache.Clusters()
   instead of using only the in-cluster client. Adds Cluster
   field (omitempty) to sovereignNode/LB/SC/PVC so the UI can
   group/filter by region. Without this, /cloud?view=list&kind=nodes
   shows 1 node even when 3 secondary kubeconfigs are registered.

Together these fix:
- D16 /dashboard Layer-1=Cluster grouping (3 bubbles, not 1)
- /cloud?view=list&kind=nodes (3+ nodes, not 1)

Refs: feedback_test_theater_3rd_violation_2026_05_17.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 09:26:16 +04:00
github-actions[bot]
ccbe51e3e4 deploy: update catalyst images to 9237c1e 2026-05-17 04:48:41 +00:00
e3mrah
9237c1e6ee
fix(handover): rename itoa→regionSlotIndex (collision with infrastructure.go) (#1582)
PR #1581 introduced an `itoa` helper that collided with the existing
`itoa` in handler/infrastructure.go:1952. Go vet failed:

  internal/handler/infrastructure.go:1952:6: itoa redeclared in this block
  internal/handler/deployment_handover_export.go:199:6: other declaration of itoa

Rename my helper to `regionSlotIndex` — more descriptive of its actual
use (deriving the per-region slot suffix for the kubeconfig filename).

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 08:45:49 +04:00
e3mrah
ce4ef6ba98
feat(handover): export secondary kubeconfigs to chroot at handover (D16 PR B) (#1581)
* fix(cloudinit): escape $$\{ORG_EMAIL:-\}/$$\{ORG_NAME:-\} in comment (D22)

PR #1571 added a comment mentioning the $${ORG_EMAIL:-}/$${ORG_NAME:-}
slot-file placeholders WITHOUT the $$ escape. tofu's templatefile()
parses comments and tried to interpolate \${ORG_EMAIL:-} as a tofu
expression — failing with "Extra characters after interpolation
expression; Template interpolation doesn't expect a colon".

Caught live on t133 fad01d84f5655004 — tofu plan failed in 30s.

The escape pattern is documented at main.tf:1029 (the same warning
that caught t127 last week). $$ prefix tells tofu's templatefile to
emit literal \${...} to cloud-init for Flux envsubst.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(parent-domains): short-circuit pdmFlipNS when NS already matches (D30)

When an sme-pool domain's current NS records already match the expected
[ns1.<primary>, ns2.<primary>] pair (because the operator already
delegated the domain to OpenOva's PowerDNS), the PDM registrar-flip
step is a no-op. Skipping avoids:

  1. Burning a Dynadot API credit on a flip that would be idempotent.
  2. The D30 blocker — current Dynadot creds return pdm-status-401
     even when the desired NS state already exists. Caught on t132
     2026-05-16 day-2 add + t134 2026-05-17 fresh-prov body
     parentDomains attempt.

Adds nsAlreadyMatches() helper using net.DefaultResolver.LookupNS with
a 5s timeout. False on lookup error or partial match → fall through to
the original PDM pipeline so a misconfigured/partial domain still goes
through the registrar API.

This unblocks sme-pool entries for omani.homes (already pointing at
ns1/2/3.openova.io). omani.rest / omani.trades still go through the
full flip path because their NS records don't yet match expected.
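
The short-circuit described above can be sketched as follows. This is a minimal illustration, not the code from the PR: `containsAllNS` and `normalizeHost` are hypothetical helpers, the `nsAlreadyMatches` signature is assumed, and treating "match" as "every expected host is present" (rather than exact set equality) is an assumption. Only `net.DefaultResolver.LookupNS`, the 5s timeout, and the false-on-error behaviour come from the message.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"strings"
	"time"
)

// normalizeHost lowercases a DNS name and strips the trailing root dot,
// so "NS1.example.test." and "ns1.example.test" compare equal.
func normalizeHost(h string) string {
	return strings.TrimSuffix(strings.ToLower(strings.TrimSpace(h)), ".")
}

// containsAllNS reports whether every expected nameserver appears in observed.
// Hypothetical helper: the subset semantics are an assumption.
func containsAllNS(observed, expected []string) bool {
	if len(expected) == 0 {
		return false
	}
	seen := make(map[string]bool, len(observed))
	for _, h := range observed {
		seen[normalizeHost(h)] = true
	}
	for _, h := range expected {
		if !seen[normalizeHost(h)] {
			return false // partial match: fall through to the registrar flip
		}
	}
	return true
}

// nsAlreadyMatches looks up the domain's NS records with a 5s timeout and
// returns false on any lookup error, so the caller runs the full pipeline.
func nsAlreadyMatches(ctx context.Context, domain string, expected []string) bool {
	ctx, cancel := context.WithTimeout(ctx, 5*time.Second)
	defer cancel()
	records, err := net.DefaultResolver.LookupNS(ctx, domain)
	if err != nil {
		return false
	}
	hosts := make([]string, 0, len(records))
	for _, r := range records {
		hosts = append(hosts, r.Host)
	}
	return containsAllNS(hosts, expected)
}

func main() {
	// Placeholder hostnames; no network access is needed for this part.
	expected := []string{"ns1.example.test", "ns2.example.test"}
	fmt.Println(containsAllNS([]string{"NS1.example.test.", "ns2.example.test."}, expected))
	fmt.Println(containsAllNS([]string{"ns1.example.test."}, expected))
}
```

Keeping the comparison separate from the lookup lets the matching rule be tested without DNS.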

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(handover): D21 owner seed uses catalyst-system namespace

PR #1564 created the owner UserAccess CR with .Namespace("") — the
apiserver returned "could not find the requested resource" because
useraccesses.access.openova.io is NAMESPACED (Crossplane Claim per
the XRD's claimNames block at platform/crossplane-claims/chart/
templates/xrds/useraccess.yaml).

Pin to catalyst-system (where catalyst-api + every Catalyst-authored
CR lives) and stamp the namespace on the object too. The existing
ListUserAccess handler uses Namespace("") so the entry surfaces on
/users without per-namespace filtering.

Verified the CRD shape on t134 2026-05-17:
  $ kubectl api-resources --api-group=access.openova.io
  useraccesses   access.openova.io/v1alpha1   true   UserAccess
                                                ^^^^
                                                NAMESPACED

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(handover): D21 owner seed uses tierRoleRef not wildcard app

PR #1564 + #1577 created the CR shape with applications=[{app:"*",...}]
but the useraccess XRD schema rejects `app: "*"` (pattern
^[a-z0-9][a-z0-9-]{0,62}$). The seed handler logged
"spec.applications[0].app: Invalid value: \"*\"" on every handover.

The XRD has a `tierRoleRef` field (pattern
^openova:tier-(viewer|developer|operator|admin|owner)$) that's the
canonical owner-tier semantic — when set, useraccess-controller binds
the named ClusterRole on the target via RoleBinding/ClusterRoleBinding.
`openova:tier-owner` is shipped by EPIC-3 (#1098) slice T1's
tier-clusterroles.yaml.

Drop the applications[] block + use tierRoleRef = openova:tier-owner.
Verified live on t135 2026-05-17 — error log showed exact pattern
mismatch before this fix.
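
A small sketch of why the wildcard was rejected and what the replacement shape looks like. The two patterns, the API group/version, the kind, the `catalyst-system` namespace, and the `tierRoleRef` value are taken from the messages above; the object name and the `ownerUserAccess` helper are hypothetical, and the real spec may well carry more fields than shown here.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Pattern the XRD applies to spec.applications[].app.
	appPattern = regexp.MustCompile(`^[a-z0-9][a-z0-9-]{0,62}$`)
	// Pattern the XRD applies to spec.tierRoleRef.
	tierRoleRefPattern = regexp.MustCompile(`^openova:tier-(viewer|developer|operator|admin|owner)$`)
)

// ownerUserAccess builds a partial UserAccess object as a generic map.
// Hypothetical helper: only the fields named in the commit messages are set.
func ownerUserAccess(name string) map[string]any {
	return map[string]any{
		"apiVersion": "access.openova.io/v1alpha1",
		"kind":       "UserAccess",
		"metadata": map[string]any{
			"name":      name,
			"namespace": "catalyst-system", // the resource is namespaced
		},
		"spec": map[string]any{
			"tierRoleRef": "openova:tier-owner", // replaces applications[]
		},
	}
}

func main() {
	fmt.Println(appPattern.MatchString("*"))                         // wildcard fails the app pattern
	fmt.Println(tierRoleRefPattern.MatchString("openova:tier-owner")) // owner tier passes
	obj := ownerUserAccess("owner-seed") // "owner-seed" is a placeholder name
	fmt.Println(obj["kind"])
}
```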

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chroot): POST /api/v1/sovereign/secondary-kubeconfig (D16 PR A)

D16 multi-cluster fan-out requires the chroot's k8sCache.Factory to
have all 3 regions' kubeconfigs registered so dashboard handler's
per-cluster h.k8sCache.List(clusterID, ...) enumerates pods from each.

Today the chroot only auto-registers its own in-cluster apiserver via
FactoryFromEnv's chroot self-registration branch. Secondary
kubeconfigs live on the mothership PVC + aren't replicated.

This handler bridges the gap:
- Accepts JSON {deploymentId, regionKey, kubeconfigYaml}
- Validates ids via ^[a-z0-9][a-z0-9-]{0,62}$ pattern (defense in
  depth — filename composed from these)
- Writes kubeconfig 0o600 to /var/lib/catalyst/kubeconfigs/<depID>-<region>.yaml
  (canonical FactoryFromEnv path so restart re-registers)
- Calls k8sCache.AddCluster — idempotent per Factory contract

PR B (next): mothership-side handover hook iterates secondary regions
and POSTs each kubeconfig to the chroot.

PR C (next): dashboard.go fan-out across all registered cluster IDs
when group_by includes cluster/region.

Per docs/INVIOLABLE-PRINCIPLES.md #10 kubeconfig bytes never enter a
logged struct + are written 0o600.

Memo: feedback_d16_dashboard_multi_cluster_fan_out.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(dashboard): multi-cluster fan-out when group_by=cluster|region (D16 PR C)

When group_by includes "cluster" or "region", enumerate ALL registered
k8sCache clusters (primary + secondaries synced via PR #1579's POST
/api/v1/sovereign/secondary-kubeconfig endpoint) and concatenate
podRows from each before aggregation.

Layer-1=Cluster on /dashboard now renders 3 bubbles on a 3-region
Sovereign (was 1 bubble before).

For group_by that ONLY contains {namespace,family,application,vcluster,
sovereign} the primary clusterID's pods are sufficient and faster — no
fan-out cost.

PR B (mothership-side handover hook to POST each secondary kubeconfig)
will complete the chain. Until then, secondaries don't appear in
k8sCache.Clusters() so this fan-out is a no-op on existing provs — but
the code is in place for when PR B lands.
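
The fan-out decision can be illustrated with a short sketch. Everything here is hypothetical naming (`podRow`, `needsFanOut`, `collectPodRows`, the `list` callback); it only mirrors the rule stated above: fan out across all registered clusters when group_by contains "cluster" or "region", otherwise read the primary only. Skipping a cluster whose list call fails is an assumption, not something the message states.

```go
package main

import (
	"errors"
	"fmt"
)

// podRow is a stand-in for the dashboard's per-pod row type.
type podRow struct {
	Cluster string
	Name    string
}

// needsFanOut reports whether group_by asks for a cluster or region layer.
func needsFanOut(groupBy []string) bool {
	for _, g := range groupBy {
		if g == "cluster" || g == "region" {
			return true
		}
	}
	return false
}

// collectPodRows concatenates rows from every target cluster before aggregation.
func collectPodRows(primary string, all, groupBy []string, list func(clusterID string) ([]podRow, error)) []podRow {
	targets := []string{primary}
	if needsFanOut(groupBy) {
		targets = all
	}
	var rows []podRow
	for _, id := range targets {
		got, err := list(id)
		if err != nil {
			continue // assumption: one unreachable cluster should not blank the view
		}
		rows = append(rows, got...)
	}
	return rows
}

func main() {
	all := []string{"primary", "secondary-a", "secondary-b"}
	list := func(id string) ([]podRow, error) {
		if id == "secondary-b" {
			return nil, errors.New("unreachable")
		}
		return []podRow{{Cluster: id, Name: "pod-0"}}, nil
	}
	fmt.Println(len(collectPodRows("primary", all, []string{"namespace"}, list)))
	fmt.Println(len(collectPodRows("primary", all, []string{"cluster", "namespace"}, list)))
}
```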

Memo: feedback_d16_dashboard_multi_cluster_fan_out.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(handover): export secondary kubeconfigs to chroot at handover (D16 PR B)

Closes the D16 multi-cluster fan-out chain:
- PR #1579 (PR A): chroot endpoint accepts kubeconfigs
- PR #1580 (PR C): dashboard handler fans out across registered clusters
- This PR (PR B): mothership-side hook iterates secondary regions at
  handover, reads each region's kubeconfig from the mothership PVC,
  and POSTs to the chroot's endpoint

After handover-fire, exportSecondaryKubeconfigsToChild fires as a
goroutine (alongside exportDeploymentToChild). Best-effort per region:
a failure on region N doesn't abort N+1.

The chroot's k8sCache.Factory.AddCluster runs on every POST so
dashboard /api/v1/dashboard/treemap?group_by=cluster|region now
enumerates pods from all N regions and Layer-1=Cluster renders N
bubbles on an N-region Sovereign.

regionKeysForExport derives the filename convention `<region>-<slot>`
from dep.Request.Regions[1:] (primary is auto-registered by the
chroot's FactoryFromEnv self-registration so we skip index 0).

Per docs/INVIOLABLE-PRINCIPLES.md #10 kubeconfig bytes never enter a
logged struct + are read with stdlib os.ReadFile.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 08:22:01 +04:00
github-actions[bot]
b07e5206a1 deploy: update catalyst images to d92f734 2026-05-17 04:09:34 +00:00
e3mrah
d92f734374
feat(dashboard): multi-cluster fan-out when group_by=cluster|region (D16 PR C) (#1580)
* fix(cloudinit): escape $$\{ORG_EMAIL:-\}/$$\{ORG_NAME:-\} in comment (D22)

PR #1571 added a comment mentioning the $${ORG_EMAIL:-}/$${ORG_NAME:-}
slot-file placeholders WITHOUT the $$ escape. tofu's templatefile()
parses comments and tried to interpolate \${ORG_EMAIL:-} as a tofu
expression — failing with "Extra characters after interpolation
expression; Template interpolation doesn't expect a colon".

Caught live on t133 fad01d84f5655004 — tofu plan failed in 30s.

The escape pattern is documented at main.tf:1029 (the same warning
that caught t127 last week). $$ prefix tells tofu's templatefile to
emit literal \${...} to cloud-init for Flux envsubst.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(parent-domains): short-circuit pdmFlipNS when NS already matches (D30)

When an sme-pool domain's current NS records already match the expected
[ns1.<primary>, ns2.<primary>] pair (because the operator already
delegated the domain to OpenOva's PowerDNS), the PDM registrar-flip
step is a no-op. Skipping avoids:

  1. Burning a Dynadot API credit on a flip that would be idempotent.
  2. The D30 blocker — current Dynadot creds return pdm-status-401
     even when the desired NS state already exists. Caught on t132
     2026-05-16 day-2 add + t134 2026-05-17 fresh-prov body
     parentDomains attempt.

Adds nsAlreadyMatches() helper using net.DefaultResolver.LookupNS with
a 5s timeout. False on lookup error or partial match → fall through to
the original PDM pipeline so a misconfigured/partial domain still goes
through the registrar API.

This unblocks sme-pool entries for omani.homes (already pointing at
ns1/2/3.openova.io). omani.rest / omani.trades still go through the
full flip path because their NS records don't yet match expected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(handover): D21 owner seed uses catalyst-system namespace

PR #1564 created the owner UserAccess CR with .Namespace("") — the
apiserver returned "could not find the requested resource" because
useraccesses.access.openova.io is NAMESPACED (Crossplane Claim per
the XRD's claimNames block at platform/crossplane-claims/chart/
templates/xrds/useraccess.yaml).

Pin to catalyst-system (where catalyst-api + every Catalyst-authored
CR lives) and stamp the namespace on the object too. The existing
ListUserAccess handler uses Namespace("") so the entry surfaces on
/users without per-namespace filtering.

Verified the CRD shape on t134 2026-05-17:
  $ kubectl api-resources --api-group=access.openova.io
  useraccesses   access.openova.io/v1alpha1   true   UserAccess
                                                ^^^^
                                                NAMESPACED

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(handover): D21 owner seed uses tierRoleRef not wildcard app

PR #1564 + #1577 created the CR shape with applications=[{app:"*",...}]
but the useraccess XRD schema rejects `app: "*"` (pattern
^[a-z0-9][a-z0-9-]{0,62}$). The seed handler logged
"spec.applications[0].app: Invalid value: \"*\"" on every handover.

The XRD has a `tierRoleRef` field (pattern
^openova:tier-(viewer|developer|operator|admin|owner)$) that's the
canonical owner-tier semantic — when set, useraccess-controller binds
the named ClusterRole on the target via RoleBinding/ClusterRoleBinding.
`openova:tier-owner` is shipped by EPIC-3 (#1098) slice T1's
tier-clusterroles.yaml.

Drop the applications[] block + use tierRoleRef = openova:tier-owner.
Verified live on t135 2026-05-17 — error log showed exact pattern
mismatch before this fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chroot): POST /api/v1/sovereign/secondary-kubeconfig (D16 PR A)

D16 multi-cluster fan-out requires the chroot's k8sCache.Factory to
have all 3 regions' kubeconfigs registered so dashboard handler's
per-cluster h.k8sCache.List(clusterID, ...) enumerates pods from each.

Today the chroot only auto-registers its own in-cluster apiserver via
FactoryFromEnv's chroot self-registration branch. Secondary
kubeconfigs live on the mothership PVC + aren't replicated.

This handler bridges the gap:
- Accepts JSON {deploymentId, regionKey, kubeconfigYaml}
- Validates ids via ^[a-z0-9][a-z0-9-]{0,62}$ pattern (defense in
  depth — filename composed from these)
- Writes kubeconfig 0o600 to /var/lib/catalyst/kubeconfigs/<depID>-<region>.yaml
  (canonical FactoryFromEnv path so restart re-registers)
- Calls k8sCache.AddCluster — idempotent per Factory contract

PR B (next): mothership-side handover hook iterates secondary regions
and POSTs each kubeconfig to the chroot.

PR C (next): dashboard.go fan-out across all registered cluster IDs
when group_by includes cluster/region.

Per docs/INVIOLABLE-PRINCIPLES.md #10 kubeconfig bytes never enter a
logged struct + are written 0o600.

Memo: feedback_d16_dashboard_multi_cluster_fan_out.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(dashboard): multi-cluster fan-out when group_by=cluster|region (D16 PR C)

When group_by includes "cluster" or "region", enumerate ALL registered
k8sCache clusters (primary + secondaries synced via PR #1579's POST
/api/v1/sovereign/secondary-kubeconfig endpoint) and concatenate
podRows from each before aggregation.

Layer-1=Cluster on /dashboard now renders 3 bubbles on a 3-region
Sovereign (was 1 bubble before).

For group_by that ONLY contains {namespace,family,application,vcluster,
sovereign} the primary clusterID's pods are sufficient and faster — no
fan-out cost.

PR B (mothership-side handover hook to POST each secondary kubeconfig)
will complete the chain. Until then, secondaries don't appear in
k8sCache.Clusters() so this fan-out is a no-op on existing provs — but
the code is in place for when PR B lands.

Memo: feedback_d16_dashboard_multi_cluster_fan_out.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 08:07:26 +04:00
e3mrah
bcab6430cb
feat(chroot): POST /api/v1/sovereign/secondary-kubeconfig (D16 PR A) (#1579)
* fix(cloudinit): escape $$\{ORG_EMAIL:-\}/$$\{ORG_NAME:-\} in comment (D22)

PR #1571 added a comment mentioning the $${ORG_EMAIL:-}/$${ORG_NAME:-}
slot-file placeholders WITHOUT the $$ escape. tofu's templatefile()
parses comments and tried to interpolate \${ORG_EMAIL:-} as a tofu
expression — failing with "Extra characters after interpolation
expression; Template interpolation doesn't expect a colon".

Caught live on t133 fad01d84f5655004 — tofu plan failed in 30s.

The escape pattern is documented at main.tf:1029 (the same warning
that caught t127 last week). $$ prefix tells tofu's templatefile to
emit literal \${...} to cloud-init for Flux envsubst.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(parent-domains): short-circuit pdmFlipNS when NS already matches (D30)

When an sme-pool domain's current NS records already match the expected
[ns1.<primary>, ns2.<primary>] pair (because the operator already
delegated the domain to OpenOva's PowerDNS), the PDM registrar-flip
step is a no-op. Skipping avoids:

  1. Burning a Dynadot API credit on a flip that would be idempotent.
  2. The D30 blocker — current Dynadot creds return pdm-status-401
     even when the desired NS state already exists. Caught on t132
     2026-05-16 day-2 add + t134 2026-05-17 fresh-prov body
     parentDomains attempt.

Adds nsAlreadyMatches() helper using net.DefaultResolver.LookupNS with
a 5s timeout. False on lookup error or partial match → fall through to
the original PDM pipeline so a misconfigured/partial domain still goes
through the registrar API.

This unblocks sme-pool entries for omani.homes (already pointing at
ns1/2/3.openova.io). omani.rest / omani.trades still go through the
full flip path because their NS records don't yet match expected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(handover): D21 owner seed uses catalyst-system namespace

PR #1564 created the owner UserAccess CR with .Namespace("") — the
apiserver returned "could not find the requested resource" because
useraccesses.access.openova.io is NAMESPACED (Crossplane Claim per
the XRD's claimNames block at platform/crossplane-claims/chart/
templates/xrds/useraccess.yaml).

Pin to catalyst-system (where catalyst-api + every Catalyst-authored
CR lives) and stamp the namespace on the object too. The existing
ListUserAccess handler uses Namespace("") so the entry surfaces on
/users without per-namespace filtering.

Verified the CRD shape on t134 2026-05-17:
  $ kubectl api-resources --api-group=access.openova.io
  useraccesses   access.openova.io/v1alpha1   true   UserAccess
                                                ^^^^
                                                NAMESPACED

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(handover): D21 owner seed uses tierRoleRef not wildcard app

PR #1564 + #1577 created the CR shape with applications=[{app:"*",...}]
but the useraccess XRD schema rejects `app: "*"` (pattern
^[a-z0-9][a-z0-9-]{0,62}$). The seed handler logged
"spec.applications[0].app: Invalid value: \"*\"" on every handover.

The XRD has a `tierRoleRef` field (pattern
^openova:tier-(viewer|developer|operator|admin|owner)$) that's the
canonical owner-tier semantic — when set, useraccess-controller binds
the named ClusterRole on the target via RoleBinding/ClusterRoleBinding.
`openova:tier-owner` is shipped by EPIC-3 (#1098) slice T1's
tier-clusterroles.yaml.

Drop the applications[] block + use tierRoleRef = openova:tier-owner.
Verified live on t135 2026-05-17 — error log showed exact pattern
mismatch before this fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chroot): POST /api/v1/sovereign/secondary-kubeconfig (D16 PR A)

D16 multi-cluster fan-out requires the chroot's k8sCache.Factory to
have all 3 regions' kubeconfigs registered so dashboard handler's
per-cluster h.k8sCache.List(clusterID, ...) enumerates pods from each.

Today the chroot only auto-registers its own in-cluster apiserver via
FactoryFromEnv's chroot self-registration branch. Secondary
kubeconfigs live on the mothership PVC + aren't replicated.

This handler bridges the gap:
- Accepts JSON {deploymentId, regionKey, kubeconfigYaml}
- Validates ids via ^[a-z0-9][a-z0-9-]{0,62}$ pattern (defense in
  depth — filename composed from these)
- Writes kubeconfig 0o600 to /var/lib/catalyst/kubeconfigs/<depID>-<region>.yaml
  (canonical FactoryFromEnv path so restart re-registers)
- Calls k8sCache.AddCluster — idempotent per Factory contract

PR B (next): mothership-side handover hook iterates secondary regions
and POSTs each kubeconfig to the chroot.

PR C (next): dashboard.go fan-out across all registered cluster IDs
when group_by includes cluster/region.

Per docs/INVIOLABLE-PRINCIPLES.md #10 kubeconfig bytes never enter a
logged struct + are written 0o600.

Memo: feedback_d16_dashboard_multi_cluster_fan_out.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 08:06:08 +04:00
github-actions[bot]
6e329e27ae deploy: update catalyst images to 4f62dd2 2026-05-17 00:10:50 +00:00
e3mrah
4f62dd21b3
fix(handover): D21 owner seed uses tierRoleRef not wildcard app (#1578)
* fix(cloudinit): escape $$\{ORG_EMAIL:-\}/$$\{ORG_NAME:-\} in comment (D22)

PR #1571 added a comment mentioning the $${ORG_EMAIL:-}/$${ORG_NAME:-}
slot-file placeholders WITHOUT the $$ escape. tofu's templatefile()
parses comments and tried to interpolate \${ORG_EMAIL:-} as a tofu
expression — failing with "Extra characters after interpolation
expression; Template interpolation doesn't expect a colon".

Caught live on t133 fad01d84f5655004 — tofu plan failed in 30s.

The escape pattern is documented at main.tf:1029 (the same warning
that caught t127 last week). $$ prefix tells tofu's templatefile to
emit literal \${...} to cloud-init for Flux envsubst.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(parent-domains): short-circuit pdmFlipNS when NS already matches (D30)

When an sme-pool domain's current NS records already match the expected
[ns1.<primary>, ns2.<primary>] pair (because the operator already
delegated the domain to OpenOva's PowerDNS), the PDM registrar-flip
step is a no-op. Skipping avoids:

  1. Burning a Dynadot API credit on a flip that would be idempotent.
  2. The D30 blocker — current Dynadot creds return pdm-status-401
     even when the desired NS state already exists. Caught on t132
     2026-05-16 day-2 add + t134 2026-05-17 fresh-prov body
     parentDomains attempt.

Adds nsAlreadyMatches() helper using net.DefaultResolver.LookupNS with
a 5s timeout. False on lookup error or partial match → fall through to
the original PDM pipeline so a misconfigured/partial domain still goes
through the registrar API.

This unblocks sme-pool entries for omani.homes (already pointing at
ns1/2/3.openova.io). omani.rest / omani.trades still go through the
full flip path because their NS records don't yet match expected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(handover): D21 owner seed uses catalyst-system namespace

PR #1564 created the owner UserAccess CR with .Namespace("") — the
apiserver returned "could not find the requested resource" because
useraccesses.access.openova.io is NAMESPACED (Crossplane Claim per
the XRD's claimNames block at platform/crossplane-claims/chart/
templates/xrds/useraccess.yaml).

Pin to catalyst-system (where catalyst-api + every Catalyst-authored
CR lives) and stamp the namespace on the object too. The existing
ListUserAccess handler uses Namespace("") so the entry surfaces on
/users without per-namespace filtering.

Verified the CRD shape on t134 2026-05-17:
  $ kubectl api-resources --api-group=access.openova.io
  useraccesses   access.openova.io/v1alpha1   true   UserAccess
                                                ^^^^
                                                NAMESPACED

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(handover): D21 owner seed uses tierRoleRef not wildcard app

PR #1564 + #1577 created the CR shape with applications=[{app:"*",...}]
but the useraccess XRD schema rejects `app: "*"` (pattern
^[a-z0-9][a-z0-9-]{0,62}$). The seed handler logged
"spec.applications[0].app: Invalid value: \"*\"" on every handover.

The XRD has a `tierRoleRef` field (pattern
^openova:tier-(viewer|developer|operator|admin|owner)$) that's the
canonical owner-tier semantic — when set, useraccess-controller binds
the named ClusterRole on the target via RoleBinding/ClusterRoleBinding.
`openova:tier-owner` is shipped by EPIC-3 (#1098) slice T1's
tier-clusterroles.yaml.

Drop the applications[] block + use tierRoleRef = openova:tier-owner.
Verified live on t135 2026-05-17 — error log showed exact pattern
mismatch before this fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 04:08:45 +04:00
github-actions[bot]
6466f97f6c deploy: update catalyst images to ea30ded 2026-05-16 23:28:04 +00:00
e3mrah
ea30ded120
fix(handover): D21 owner seed uses catalyst-system namespace (#1577)
* fix(cloudinit): escape $$\{ORG_EMAIL:-\}/$$\{ORG_NAME:-\} in comment (D22)

PR #1571 added a comment mentioning the $${ORG_EMAIL:-}/$${ORG_NAME:-}
slot-file placeholders WITHOUT the $$ escape. tofu's templatefile()
parses comments and tried to interpolate \${ORG_EMAIL:-} as a tofu
expression — failing with "Extra characters after interpolation
expression; Template interpolation doesn't expect a colon".

Caught live on t133 fad01d84f5655004 — tofu plan failed in 30s.

The escape pattern is documented at main.tf:1029 (the same warning
that caught t127 last week). $$ prefix tells tofu's templatefile to
emit literal \${...} to cloud-init for Flux envsubst.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(parent-domains): short-circuit pdmFlipNS when NS already matches (D30)

When an sme-pool domain's current NS records already match the expected
[ns1.<primary>, ns2.<primary>] pair (because the operator already
delegated the domain to OpenOva's PowerDNS), the PDM registrar-flip
step is a no-op. Skipping avoids:

  1. Burning a Dynadot API credit on a flip that would be idempotent.
  2. The D30 blocker — current Dynadot creds return pdm-status-401
     even when the desired NS state already exists. Caught on t132
     2026-05-16 day-2 add + t134 2026-05-17 fresh-prov body
     parentDomains attempt.

Adds nsAlreadyMatches() helper using net.DefaultResolver.LookupNS with
a 5s timeout. False on lookup error or partial match → fall through to
the original PDM pipeline so a misconfigured/partial domain still goes
through the registrar API.

This unblocks sme-pool entries for omani.homes (already pointing at
ns1/2/3.openova.io). omani.rest / omani.trades still go through the
full flip path because their NS records don't yet match expected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(handover): D21 owner seed uses catalyst-system namespace

PR #1564 created the owner UserAccess CR with .Namespace("") — the
apiserver returned "could not find the requested resource" because
useraccesses.access.openova.io is NAMESPACED (Crossplane Claim per
the XRD's claimNames block at platform/crossplane-claims/chart/
templates/xrds/useraccess.yaml).

Pin to catalyst-system (where catalyst-api + every Catalyst-authored
CR lives) and stamp the namespace on the object too. The existing
ListUserAccess handler uses Namespace("") so the entry surfaces on
/users without per-namespace filtering.

Verified the CRD shape on t134 2026-05-17:
  $ kubectl api-resources --api-group=access.openova.io
  useraccesses   access.openova.io/v1alpha1   true   UserAccess
                                                ^^^^
                                                NAMESPACED

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 03:26:06 +04:00
github-actions[bot]
18b5fa1466 deploy: update catalyst images to 33ed484 2026-05-16 23:24:34 +00:00