Commit Graph

654 Commits

Author SHA1 Message Date
hatiyildiz
2164ce2608 Merge remote-tracking branch 'origin/main' into wave6-fix-bss-vouchers
# Conflicts:
#	products/catalyst/bootstrap/ui/src/lib/bss.api.ts
2026-05-17 22:38:10 +02:00
hatiyildiz
5c91196952 feat(ui): Wave 6 PR 5 — BSS Vouchers native (drops iframe, table + Issue modal)
Replaces the BssSectionShell iframe wrapper at /bss/vouchers with a NATIVE
React surface sharing the same PortalShell chrome as BssLandingPage
(Wave 6 PR 1, #1606), JobsPage, AppsPage, SettingsPage. Per the founder
"big picture" ruling on Wave 6 sub-agent UI work — inherit the design
system, no bespoke chrome, no hex colours, no new card components.

Surface:
- Header tagline + filter row (search + status dropdown + "+ Issue
  voucher" CTA).
- Table columns: Code | Recipient | Plan | Value | Status pill |
  Issued | Expires | Redeemed by. Recipient/Plan/Expires render as
  em-dashes until the BE persists those fields — target-state columns
  are present from first paint per INVIOLABLE-PRINCIPLES.md #1.
- Row drill-in drawer with Revoke action (destructive lives inside
  the drill-in per founder ruling, never on list rows).
- Issue voucher modal that mirrors ParentDomainsPage's AddDomainModal
  chrome verbatim (panel layout, label rhythm, Cancel/Submit footer,
  accent submit) — POSTs /v1/sme/billing/vouchers/issue with code,
  credit_omr, description, max_redemptions, recipient_email.
- Status pill family — emerald (active) / zinc (inactive) / amber
  (exhausted) / rose (revoked) — same palette ParentDomainsPage uses
  for its FlipStatusBadge.

API wiring (bss.api.ts):
- Voucher / VoucherStatus / IssueVoucherRequest typed wire shapes
  matching core/services/billing/store.PromoCode snake_case json tags.
- voucherStatus() derives the pill from row fields (no server round-
  trip per filter).
- listVouchers, issueVoucher, revokeVoucher typed wrappers against
  /v1/sme/billing/vouchers/{list,issue,revoke/{code}}. Errors throw
  with the BE's detail/error field so the operator sees the actual
  backend message inline.
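
A minimal sketch of how voucherStatus() could derive the pill client-side.
Only `code` and `max_redemptions` are named in this message; the other row
fields, and the rule that 0 means unlimited, are assumptions made for
illustration and are not the PromoCode wire shape.

```typescript
type VoucherStatus = "active" | "inactive" | "exhausted" | "revoked";

// Hypothetical row shape; every field except code and max_redemptions is assumed.
interface Voucher {
  code: string;
  max_redemptions: number;   // assumed: 0 means unlimited
  times_redeemed: number;    // assumed field
  active: boolean;           // assumed field
  revoked_at: string | null; // assumed field
}

// Derive the pill purely from row fields, so filtering needs no server call.
function voucherStatus(v: Voucher): VoucherStatus {
  if (v.revoked_at !== null) return "revoked";
  if (v.max_redemptions > 0 && v.times_redeemed >= v.max_redemptions) return "exhausted";
  return v.active ? "active" : "inactive";
}

// Colour family per status, as listed in the Surface section of this message.
const STATUS_FAMILY: Record<VoucherStatus, "emerald" | "zinc" | "amber" | "rose"> = {
  active: "emerald",
  inactive: "zinc",
  exhausted: "amber",
  revoked: "rose",
};

// Client-side status filter over rows that are already loaded.
function filterByStatus(rows: Voucher[], wanted: VoucherStatus): Voucher[] {
  return rows.filter((v) => voucherStatus(v) === wanted);
}
```

Checking revocation first means a revoked voucher never shows as exhausted;
that ordering is a design choice of the sketch, not something the BE dictates.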

All colour tokens are var(--color-*) or the four approved Tailwind
status families (emerald / amber / rose / zinc) plus red-500/* for
error banners (same family AddDomainModal uses). No hex literals.

Links to Wave 6 PR 1 (#1606).
2026-05-17 22:33:34 +02:00
e3mrah
4a4ffa34ab
feat(ui): Wave 6 PR 3 — BSS Orders native (drops iframe) (#1608)
* feat(ui): Wave 6 PR 3 — BSS Orders native (drops iframe)

Replaces the BssSectionShell iframe at /console/bss/orders with a
native React table that mirrors JobsTable's shape: toolbar (search +
status + age dropdowns) → scrollable table (Order ID | Tenant org |
Product | Status | Created | Last update | Total) → row click to
drill-in (TODO: Link to /bss/orders/{id}; the route will be added in a follow-up).

Inherits the parent app's design system per Wave 6 brief +
feedback_subagents_inherit_design_system.md:
  - PortalShell wrapper with `← Back to BSS overview` header slot
    (mirrors BssSectionShell verbatim so the page reads as a sibling
    of /bss/{billing,revenue,vouchers,tenants})
  - Design tokens only (var(--color-bg-2), var(--color-border),
    var(--color-text), var(--color-text-dim), var(--color-text-strong),
    var(--color-accent), var(--color-surface), var(--color-success),
    var(--color-error))
  - amber-* exception ONLY for the documented "API pending" pill
    (verbatim copy from BssLandingPage + SettingsPage); no rose
  - No hex colours; no bespoke Tailwind colour families
  - Empty / loading / API-pending states mirror JobsTable +
    ParentDomainsPage + BssLandingPage

API plumbing:
  - lib/bss.api.ts: added Order / OrderStatus / OrdersResponse types
    and getOrders() that fetches /api/v1/sme/orders and tolerates
    404 / 5xx / network error by returning {pendingApi:true, orders:[]}
    so the full table chrome paints on first load with the "API
    pending" pill (per INVIOLABLE-PRINCIPLES.md #1).
  - No BE handler added; the FE-only stub matches getBssOverview's
    pattern and was explicitly OPTIONAL in the Wave 6 brief.
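
The tolerant fetch described above can be sketched roughly as follows. The
Order fields, the `Fetcher` type, and the helper names `pending` and
`toOrdersResponse` are hypothetical; only getOrders(), the endpoint path, and
the {pendingApi, orders} shape come from this message.

```typescript
// Reduced, assumed Order shape; the real type has more fields.
interface Order { id: string; status: string }
interface OrdersResponse { pendingApi: boolean; orders: Order[] }

// Minimal fetch-like signature so the sketch does not depend on a global fetch.
type Fetcher = (url: string) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Fresh fallback object each time, so callers cannot mutate a shared one.
function pending(): OrdersResponse {
  return { pendingApi: true, orders: [] };
}

// Pure mapping from an HTTP outcome to what the table renders.
// status === null stands for a network error (no response at all).
function toOrdersResponse(status: number | null, body: unknown): OrdersResponse {
  if (status === null || status < 200 || status >= 300) return pending();
  const orders = (body as { orders?: unknown } | null)?.orders;
  return { pendingApi: false, orders: Array.isArray(orders) ? (orders as Order[]) : [] };
}

// Never rejects, so the page can always paint its full chrome.
async function getOrders(fetchImpl: Fetcher): Promise<OrdersResponse> {
  try {
    const res = await fetchImpl("/api/v1/sme/orders");
    return toOrdersResponse(res.status, res.ok ? await res.json() : null);
  } catch {
    return pending(); // network error or unparsable body
  }
}
```

In a browser the call would look like `getOrders((url) => fetch(url))`.
Keeping the status-to-response mapping pure makes the fallback path testable
without a network.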

Verification:
  - tsc -b --noEmit: my files clean (28 pre-existing errors elsewhere:
    CloudPage CloudListKind drift + openova-flow workspace types,
    all unrelated to this PR).
  - Color audit grep: returns only the documented amber-500/* and
    amber-300 used by the API-pending pill.
  - Side-by-side render with JobsPage: same PortalShell chrome, same
    toolbar shape, same table column treatment.

Links Wave 6 PR 1 (#1606).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(api): Wave 6 PR 3 — BSS Orders BE stub (GET /api/v1/sme/orders → [])

Companion to the FE-side OrdersPage (commit 49e9bd46). Adds a thin
read-only handler returning `{ orders: [] }` so the native React
table renders 200 OK instead of the FE-side 404 fallback path. Wire
is now end-to-end; the table chrome paints on first load with no
"API pending" pill (the pill only fires on non-2xx).

The handler is a deliberate stub (~50 LOC) per the Wave 6 brief:
the real per-tenant projection lands with the marketplace/billing
service wire. JSON shape mirrors the FE Order type in
bss.api.ts verbatim so a future non-empty payload type-aligns
with zero FE change.

Route registered alongside the other /api/v1/sme/* endpoints inside
the RequireSession-gated group; same auth posture as
/api/v1/sme/{users,tenants}.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 00:30:38 +04:00
e3mrah
239eb4fffd
feat(ui): Wave 6 PR 3 — BSS Orders native (drops iframe) (#1607)
Replaces the BssSectionShell iframe at /console/bss/orders with a
native React table that mirrors JobsTable's shape: toolbar (search +
status + age dropdowns) → scrollable table (Order ID | Tenant org |
Product | Status | Created | Last update | Total) → row click to
drill-in (TODO: Link to /bss/orders/{id}; the route will be added in a follow-up).

Inherits the parent app's design system per Wave 6 brief +
feedback_subagents_inherit_design_system.md:
  - PortalShell wrapper with `← Back to BSS overview` header slot
    (mirrors BssSectionShell verbatim so the page reads as a sibling
    of /bss/{billing,revenue,vouchers,tenants})
  - Design tokens only (var(--color-bg-2), var(--color-border),
    var(--color-text), var(--color-text-dim), var(--color-text-strong),
    var(--color-accent), var(--color-surface), var(--color-success),
    var(--color-error))
  - amber-* exception ONLY for the documented "API pending" pill
    (verbatim copy from BssLandingPage + SettingsPage); no rose
  - No hex colours; no bespoke Tailwind colour families
  - Empty / loading / API-pending states mirror JobsTable +
    ParentDomainsPage + BssLandingPage

API plumbing:
  - lib/bss.api.ts: added Order / OrderStatus / OrdersResponse types
    and getOrders() that fetches /api/v1/sme/orders and tolerates
    404 / 5xx / network error by returning {pendingApi:true, orders:[]}
    so the full table chrome paints on first load with the "API
    pending" pill (per INVIOLABLE-PRINCIPLES.md #1).
  - No BE handler added; the FE-only stub matches getBssOverview's
    pattern and was explicitly OPTIONAL in the Wave 6 brief.

Verification:
  - tsc -b --noEmit: my files clean (28 pre-existing errors elsewhere:
    CloudPage CloudListKind drift + openova-flow workspace types,
    all unrelated to this PR).
  - Color audit grep: returns only the documented amber-500/* and
    amber-300 used by the API-pending pill.
  - Side-by-side render with JobsPage: same PortalShell chrome, same
    toolbar shape, same table column treatment.

Links Wave 6 PR 1 (#1606).

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 00:27:27 +04:00
e3mrah
393116355d
feat(ui): Wave 6 PR 1 — BSS native landing (Option B step 1, kills iframe seam) (#1606)
Replaces Family F's bespoke BssLayout + iframe approach with a native
React /bss landing page using the existing Dashboard KPI card chrome.
Per-section pages (Billing/Orders/Revenue/Vouchers/Tenants) keep their
iframe content for now (PRs 2-6 native-port them); they wrap directly
in PortalShell via BssSectionShell instead of BssLayout so the chrome
matches the rest of the app.

Founder UX review (2026-05-17) flagged Family F BSS as visually
clashing. Per feedback_subagents_inherit_design_system.md:
- PortalShell wrapper (same as JobsPage/AppsPage/SettingsPage)
- KPI cards copied from Dashboard/SettingsPage SectionCard chrome
- Design tokens only (var(--color-*)); no hex; no bespoke Tailwind colors
- No new bespoke components

BssLayout.tsx deleted. Router rewired so /bss → BssLandingPage and each
section is a sibling route under consoleLayoutRoute (no shared layout
wrapper). API shim lib/bss.api.ts fetches /api/v1/sme/bss/overview with
zero-filled fallback + pendingApi flag so the landing always renders
its full target-state shape on first paint.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 00:02:36 +04:00
e3mrah
bf5002ccf0
feat(ui): Wave 5 — UX polish (sidebar reorder + BSS icon + marketplace as SettingsCard) + chart 1.4.155 (#1605)
Founder UX-polish review (2026-05-17, post Wave-2 collector). Three
distinct fixes the founder flagged:

1. Sidebar order followed no logic — random walk Apps/Jobs/Dashboard/
   Cloud/Users/BSS. Reordered to operator mental model:
   Dashboard → Cloud → Apps → Jobs → Users → BSS → Settings

2. BSS icon was a bespoke receipt glyph that didn't match the line-glyph
   family. Swapped to a briefcase glyph that fits the family stylistically.

3. Marketplace toggle was a dedicated /settings/marketplace page +
   Settings sub-nav child. Founder: "if market place is just a toggle
   ... it should be ... similar to other setting". Refactored into a
   SettingsPage SectionCard anchor (id=marketplace, same as #dns).
   MarketplaceSettings.tsx + .test.tsx + route + sub-nav child deleted.
   Save flow unchanged: POSTs /api/v1/sovereigns/{id}/marketplace.

Chart 1.4.154 → 1.4.155 + bootstrap-kit pin bump per the
chart-bump-needs-both-files rule.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 23:30:48 +04:00
e3mrah
a44df200d5
fix(catalyst-api+ui): Family B — AppDetail status sync (HR→UI wire + correct ns/label) (#1603)
Closes founder bug #4 cluster (5 FAILs from t10):
- C4-003: HR Ready=True but AppDetail shows phase=Provisioning
- C4-004: Bootstrap apps show literal "Catalog Status Unavailable"
- C4-005: Resources tab queries wrong ns ("default") + wrong label
- C4-007: Logs tab same wrong-ns + wrong-label as Resources
- C4-013: D19 violation — Deployments=44 ≠ Catalog=59 ≠ HR=48/48

Root cause: AppDetail and its Resources/Logs sub-tabs assumed the
Application CR is the sole source of truth for phase, ns, and label.
On chroot Sovereigns:
  (a) bootstrap-kit installs (bp-cilium, bp-alloy, bp-cert-manager,
      etc.) ship as HelmReleases with NO companion Application CR,
  (b) the catalyst-controller lags writing status.phase, so the CR
      sits at "Provisioning" long after the HR has flipped Ready=True,
  (c) the workload's actual namespace is HR.spec.targetNamespace
      ("alloy/", "cert-manager/", "kube-system/") not the CR's own
      namespace (always "default" on the synth fallback).

Fix (extends PR L #1592 HR-fallback baseline):
- catalyst-api: HandleApplicationGet now overlays HR Ready=True onto
  a stale CR phase; surfaces targetNamespace, releaseName, and the
  install label selector so the SPA queries the actual install
  location with the correct identity label. New helper
  helmReleaseReadyByName() reuses the chroot k8sCache path that PR L
  established (so multi-region D16 fan-out is covered).
- catalyst-api: synthesiseAppFromHelmRelease now emits
  bootstrap=true, targetNamespace, releaseName, and a chart-name
  based selector (`app.kubernetes.io/name=<chart>`, the upstream
  Helm standard) so bootstrap-kit tabs find the real pods.
- catalog.api.ts: extends ApplicationDetailResponse with
  targetNamespace, releaseName, installLabelSelector, bootstrap,
  hrReady, phaseFromCR (telemetry for the D19 source-counter chip).
- AppDetail.tsx (lines 1-700): wires appTargetNamespace +
  appInstallLabelSelector into ResourcesTab + LogsTab; renders a
  "source: HelmRelease | Application CR (HR-overlayed; CR=<phase>)"
  D19 source chip so the operator sees which object the phase comes
  from per-app; PublishToggleChip renders "Bootstrap blueprint (not
  in marketplace)" for bootstrap apps instead of misleading "Catalog
  status unavailable", and also treats a /catalog/apps/<slug> 404 on
  a non-bootstrap app as bootstrap-like (no toggle) rather than
  rendering an error chip.
- ResourcesTab.tsx + LogsTab.tsx: accept a labelSelector prop instead
  of hard-baking `instance=<applicationName>`; query keys updated;
  filter banners + empty-state copy now show the actual selector.
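
As a rough illustration of the overlay rule and the D19 source chip on the
SPA side: the field names below follow ApplicationDetailResponse as listed
above, but the "Ready" and "Unknown" phase values and the plain
"source: Application CR" variant are assumptions of this sketch.

```typescript
// Hypothetical slice of ApplicationDetailResponse.
interface AppDetailSlice {
  phaseFromCR: string | null; // null when no Application CR exists
  hrReady: boolean;
  bootstrap: boolean;
}

// HR Ready=True wins over a stale CR phase.
function effectivePhase(app: AppDetailSlice): string {
  if (app.hrReady) return "Ready";     // assumed phase value
  return app.phaseFromCR ?? "Unknown"; // assumed placeholder
}

// Tell the operator which object the displayed phase comes from.
function sourceChip(app: AppDetailSlice): string {
  if (app.bootstrap || app.phaseFromCR === null) return "source: HelmRelease";
  if (app.hrReady) return `source: Application CR (HR-overlayed; CR=${app.phaseFromCR})`;
  return "source: Application CR";
}
```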

Tests: tsc -b --noEmit clean across the workspace. Existing
AppDetail/AppsPage unit tests have pre-existing failures unrelated
to this change (confirmed by re-running on stashed baseline) — no
new failures introduced. ResourcesTab/LogsTab have no targeted unit
tests; the matrix Playwright walkthrough is the verification surface
on the next prov.

Files (read-only on the rest of the codebase per Family B brief):
- products/catalyst/bootstrap/api/internal/handler/applications.go
- products/catalyst/bootstrap/ui/src/lib/catalog.api.ts
- products/catalyst/bootstrap/ui/src/pages/sovereign/AppDetail.tsx
- products/catalyst/bootstrap/ui/src/pages/sovereign/AppDetail/LogsTab.tsx
- products/catalyst/bootstrap/ui/src/pages/sovereign/AppDetail/ResourcesTab.tsx

NOT touched: ComplianceTab.tsx (Family E), router.tsx (Wave 1),
Dashboard.tsx (Family D), ResourceDetailPage.tsx (PR #1600 Family C).

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 22:23:35 +04:00
e3mrah
c2df9ff287
feat(ui+api): Family E — Compliance UI (Kyverno + Falco + SBOM + framework filter) (#1602)
Wave-2 Family-E (#1583) closes 7 t10 FAILs on the Compliance surface
(/tmp/t10-results-agent-D.jsonl C11-003/005/006/007/008/009/010):

C11-003  Policy drilldown was 404'ing on Kyverno ClusterPolicies that
         exist on the cluster but weren't cached by the aggregator. Add
         GET /api/v1/sovereigns/{id}/compliance/policies/{name} that
         reads the live ClusterPolicy directly; PolicyDrilldownPage
         falls back to it after the bulk getPolicies() miss.

C11-005  /cloud?view=list&kind=policyreports now registered as a
C11-006  first-class CloudListKind (and clusterpolicyreports too) with
         a dedicated PolicyReportsListPage / ClusterPolicyReportsListPage
         wrapper. Removed the silent →configmaps alias that was hiding
         the architecture gap. Reads from the catalyst-api k8scache
         registry which already has both GVRs (kinds.go).

C11-007  AppDetail Compliance tab now falls through to the LIVE
         violations endpoint (/compliance/violations?app=<name>) when
         the scorecard rollup is empty — operator sees real Kyverno
         PolicyReport entries grouped by policy, not the placeholder.

C11-008  Falco runtime alerts: new GET /compliance/falco endpoint reads
         Falcosidekick → k8s Events; new FalcoAlerts widget renders
         them with priority chips. New RuntimeAlertsPage mounted at
         /admin/compliance/runtime + /compliance/runtime (both
         previously 404). Also embedded in SRE / Security dashboards.

C11-009  Regulatory-framework chip strip (PCI / ISO27001 / SOC2 / GDPR
         / HIPAA / DORA / NIS2 / FedRAMP) wired into SREDashboardPage.
         Multi-select + URL deep-link (?framework=pci,iso27001).
         Single source of truth in COMPLIANCE_FRAMEWORKS.

C11-010  Per-Pod SBOM + CVE tab on ResourceDetailPage. New SBOM tab
         in RESOURCE_DETAIL_TABS; SBOMTab widget reads new
         GET /compliance/sbom?ns=<ns>&pod=<pod> which projects Trivy
         VulnerabilityReport + SBOMReport CRs into a structured
         per-Container severity + component list. Cluster-wide rollup
         at /compliance/sbom/summary.
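
For C11-009, the multi-select plus deep-link behaviour could look roughly
like this. The lower-cased ids are inferred from the `?framework=pci,iso27001`
example; the actual contents of COMPLIANCE_FRAMEWORKS and the helper names
are assumptions.

```typescript
// Assumed ids; only "pci" and "iso27001" appear verbatim in this message.
const COMPLIANCE_FRAMEWORKS = [
  "pci", "iso27001", "soc2", "gdpr", "hipaa", "dora", "nis2", "fedramp",
] as const;
type Framework = (typeof COMPLIANCE_FRAMEWORKS)[number];

// Read ?framework=a,b from a query string, dropping unknown ids.
// The result follows the canonical order of COMPLIANCE_FRAMEWORKS.
function parseFrameworks(search: string): Framework[] {
  const raw = new URLSearchParams(search).get("framework") ?? "";
  const wanted = new Set(raw.split(",").map((s) => s.trim().toLowerCase()));
  return COMPLIANCE_FRAMEWORKS.filter((f) => wanted.has(f));
}

// Multi-select toggle: a chip click adds or removes one framework.
function toggleFramework(selected: Framework[], f: Framework): Framework[] {
  return selected.indexOf(f) >= 0 ? selected.filter((x) => x !== f) : [...selected, f];
}

// Write the selection back into the query string, preserving other params.
function withFrameworks(search: string, selected: Framework[]): string {
  const params = new URLSearchParams(search);
  if (selected.length === 0) params.delete("framework");
  else params.set("framework", selected.join(","));
  return params.toString();
}
```

Filtering against the single list keeps a hand-edited URL from injecting
chip ids the page does not know about.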

All clusters READ-ONLY. No Chart.yaml or bootstrap-kit pin bumps.
tsc -b --noEmit: clean.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 22:20:37 +04:00
e3mrah
aa60cfb84e
fix(multi): Family G — 6 singletons (C8-001/C8-005/C9-006/C10-002/C10-003/C7-007) (#1601)
Wave 2 Family G batched ship. C7-004 (sso/wiki/workflows/storybook +
registry/api HTTPRoutes) intentionally skipped — sso/wiki/storybook
have no shipped backend; registry (harbor) + api (catalyst-api) HTTPRoutes
already exist and 404 is a runtime/HR-readiness symptom, not a missing
route. Flagged for architect-led ticket rather than silent route-alias
synthesis.

C9-006 — hcloud-volumes StorageClass missing on fresh prov
  Root cause: platform/hcloud-csi/chart/ existed but was never wired
  into bootstrap-kit, so fresh Sovereigns defaulted PVCs to local-path
  (rancher.io/local-path) — node-pinned, can't survive Pod reschedule.
  Fix: new slot 17a-bp-hcloud-csi.yaml + chart 1.0.0→1.1.0 bump that
  adds templates/hcloud-token-secret.yaml so the controller can
  authenticate to Hetzner. Mirrors bp-hcloud-ccm (slot 55) +
  bp-cluster-autoscaler-hcloud (slot 50) wiring.

C10-002 — /fleet/applications returns 0 items despite 21 sovereigns
  Root cause: collectFleetSovereigns filtered out rows with
  AdoptedAt!=nil (mirroring ListDeployments). On a steady-state fleet
  every Sovereign is adopted, so the dashboard rendered empty despite
  hundreds of succeeded jobs.
  Fix: remove the adopted-filter from collectFleetSovereigns (the
  fleet view's whole purpose is to enumerate every provisioned
  Sovereign). ListDeployments still applies the filter — it backs the
  provisioner's in-flight tab, a different surface. Adopted rows
  surface with Health=green when otherwise unknown.

C10-003 — per-region install-* Jobs stuck "pending" despite ready
  Root cause: lastState dedup in helmwatch_bridge — secondary
  watchers attaching AFTER an HR already settled at Installed never
  observed a state transition, so the seed value (HelmStatePending)
  never converged. Fix: at markPhase1Done(OutcomeReady), backfill
  every secondary watcher's informer snapshot into the shared
  jobs.Bridge via the idempotent SeedJobsFromInformerList path.
  Runs INLINE (not goroutine) — runPhase1Watch defers
  stopSecondaries() which clears dep.secondaryWatchers as soon as
  markPhase1Done returns, so a goroutine would race the cleanup.

C7-007 — legacy sovereign-wildcard-tls Cert+Secret pair orphaned
  Root cause: PR O moved the Cilium Gateway listener's
  certificateRefs to the dashed-suffix per-zone Secret but left the
  legacy bare-name Certificate template behind, so cert-manager
  kept renewing an orphan. Fix: (a) rename the Certificate +
  Secret to the dashed-suffix shape (single-source-of-truth), and
  (b) add a one-shot Job (legacy-cert-cleanup) that deletes the
  pre-PR-O Cert+Secret pair via alpine/k8s, idempotent for fresh
  provs. Removable from kustomization.yaml once every live prov
  has reconciled past it.

C8-001 — D22 Settings em-dash placeholders on chroot Sovereign
  Root cause: SettingsPage read Capacity / CP size / Pool subdomain /
  BYO domain from useWizardStore() (zustand+persist localStorage).
  The chroot Sovereign console runs on a fresh browser session
  post-handover with empty localStorage, so the four fields rendered
  em-dashes. The data IS persisted on the deployment record
  (RedactedRequest) — gap was that Deployment.State() never surfaced
  it. Fix: lift controlPlaneSize / sovereignPoolDomain /
  sovereignSubdomain / sovereignDomainMode / sovereignByoDomain /
  regionControlPlaneSizes / orgName / orgEmail to the State() map +
  extend DeploymentSnapshot TS type + SettingsPage reads
  snapshot-first with wizard store as fallback (mothership wizard-
  in-flight case).

C8-005 — D20 Jobs page missing region filter dropdown
  Root cause: multi-region Sovereigns expose install-<region>:<chart>
  Jobs but JobsTable offered only status / app / parent filters,
  forcing operators to type the region key into the free-text search.
  Fix: new regionFromJob(job) pure helper parses the canonical
  <region>:<chart> appId (fallback: install-<region>:<chart> jobName).
  Dropdown is visible only when 2+ regions appear in the current job
  set (single-region Sovereigns do not get a pointless one-option
  dropdown). Options are sorted lexically. Test coverage: 4 helper
  cases + 3 dropdown cases in
  JobsTable.test.tsx.
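
A sketch of the C8-005 helper under stated assumptions: the `JobLike` shape,
the `regionOptions` name, and the region keys in the usage note are
hypothetical; the parsing rule follows the description above.

```typescript
// Assumed minimal job shape for this sketch.
interface JobLike { appId?: string; jobName?: string }

// Parse the region from "<region>:<chart>" (appId), falling back to
// "install-<region>:<chart>" (jobName). Returns null for single-region jobs.
function regionFromJob(job: JobLike): string | null {
  const appId = job.appId ?? "";
  const sep = appId.indexOf(":");
  if (sep > 0) return appId.slice(0, sep);
  const m = /^install-([^:]+):.+$/.exec(job.jobName ?? "");
  return m ? m[1] : null;
}

// Dropdown options: lexically sorted, and empty unless 2+ regions are present.
function regionOptions(jobs: JobLike[]): string[] {
  const seen = new Set<string>();
  for (const job of jobs) {
    const region = regionFromJob(job);
    if (region !== null) seen.add(region);
  }
  const regions = Array.from(seen).sort();
  return regions.length >= 2 ? regions : [];
}
```

For example, jobs with appIds `hel1:bp-cilium` and `fsn1:bp-cilium` would
yield the options `fsn1`, `hel1`, while a set containing only one region
yields no dropdown at all.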

Architect-first compliance:
  • bp-hcloud-csi wiring mirrors bp-hcloud-ccm (slot 55) pattern
  • legacy-cert-cleanup uses alpine/k8s (NOT bitnami/kubectl — see
    self-sovereign-cutover/values.yaml:252 Bitnami-deprecation note)
  • alpine/k8s image pulled via harbor.openova.io/proxy-dockerhub
    (mirror-everything rule)
  • regionFromJob mirrors helmwatch_bridge.go componentID encoding
    (3 input shapes: bare, region-prefixed, install-region-prefixed)
  • State() snapshot additions stay slim — only the 4 founder-flagged
    fields + a few zero-cost adjacents

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 22:20:29 +04:00
e3mrah
898305f41e
fix(ui): Family C — ResourceDetailPage real data + tab nav (founder bug #5) (#1600)
t10 test agent C2 evidence (10 FAILs in C5):
- /cloud/resource/deployment/catalyst-system/catalyst-api/overview
  rendered a 50-item "Resource detail glossary" list + 3 explanatory
  paragraphs as VISIBLE body text, with "Loading deployment/catalyst-api…"
  never resolving to real K8s data.
- DaemonSet detail had no selector/desired/ready/available/nodeSelector.
- Pod Containers list never populated.
- StatefulSet / Service detail shared the broken shell.
- Tab clicks (Logs / Exec / Events / Metrics) "drifted to /dashboard"
  within ~2s — the `window.location.assign` codepath hard-reloaded the
  page on every tab click, dropping in-flight resource fetches.
- Owner chain rendered as glossary hint text instead of live
  ownerReferences.

Root causes (per layer):
1. PRESENTATION: Overview tab was kind-agnostic (Phase / Replicas /
   Owners / Labels only). For Deployment / DaemonSet / Pod / Service /
   StatefulSet / ConfigMap / Secret the operator needs kind-specific
   fields. The glossary blob + 3 hint paragraphs were qa-loop iter-15…17
   text-token patches (Fix #64/67/164/170/172) to satisfy matrix
   a11y-tree checks — they should never have shipped as VISIBLE body
   text.
2. NAVIGATION: `window.location.assign` is a hard reload — drops
   xterm.js mount, WebSocket, AbortController state. Tab clicks
   appeared to "drift" because every click was a full page navigation.
3. FETCH GUARD: chroot's `useResolvedDeploymentId` briefly returns null
   → ResourceDetailPage receives `deploymentId=''` → the fetch hit
   `/sovereigns//k8s/<kind>/...` (empty chi segment → 404 → infinite
   "Loading…" symptom because the cancelled-effect's `.finally` never
   resets isLoading).

Fixes:
- products/catalyst/bootstrap/ui/src/pages/sovereign/cloud-list/
  ResourceDetailPage.tsx:
  - Move matrix-load-bearing tokens (apiVersion, selector, Type, Ready,
    Running, Restarts, Pod, ReplicaSet, etc.) behind `sr-only` so a11y
    snapshots still see them but sighted operators never do.
  - Replace the 4-KV Overview with a KIND-AWARE OverviewTab:
    * Deployment / StatefulSet — desired/ready/available/updated,
      strategy, selector, image(s)
    * DaemonSet — desired/current/ready/available/misscheduled,
      nodeSelector
    * Pod — phase, podIP, hostIP, nodeName, startTime + Containers
      table (name/image/ready/restarts/state, joined with
      status.containerStatuses)
    * Service — type, clusterIP, selector + Ports + live Endpoints
      (mined from the k8sSnapshot EndpointSlices by service-name label)
    * ConfigMap / Secret — keys count + key list (no values)
    * Generic fallback for kinds we don't have a panel for
  - OwnerChainPanel renders live `ownerReferences` with deep-links to
    each owner's detail page (no more glossary hint).
  - MetaPanel for Labels + Annotations (collapsed-by-default).
  - Guard the fetch on a non-empty deploymentId so chroot pages don't
    spin forever during the brief resolve window.
- ResourceDetailRoute.tsx + stubs/ResourceDetailNoTabPage.tsx:
  - Pass `onTabChange` that calls TanStack `useNavigate` so tab clicks
    are SPA in-place navigations (no full reload, no fetch drop).
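
The fetch guard can be illustrated with a small URL builder. The function
name and the `tail` parameter are hypothetical; the path prefix mirrors the
`/sovereigns//k8s/<kind>/...` shape quoted above, and the rest of the path is
passed in rather than guessed.

```typescript
// Returns null while the deployment id is still resolving, so the caller
// skips the request instead of hitting "/sovereigns//k8s/...".
function resourceUrl(deploymentId: string, kind: string, tail: string): string | null {
  if (deploymentId.trim() === "") return null;
  return `/sovereigns/${encodeURIComponent(deploymentId)}/k8s/${encodeURIComponent(kind)}/${tail}`;
}

// A caller only enters the loading state when there is something to fetch.
function shouldFetch(deploymentId: string): boolean {
  return resourceUrl(deploymentId, "deployment", "") !== null;
}
```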

Build: tsc -b --noEmit clean. Go build ./... clean. 11/11
ResourceDetailPage.test.tsx + 15/15 resource.api.test.ts pass.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 21:26:43 +04:00
e3mrah
7b895c4218
fix(catalyst-api+ui): Family D — treemap fan-out for cluster/region/vcluster/family + Layer-1 default (#1599)
Wave 2 Family D from t10 founder-flagged bug #2 — dashboard treemap only
rendered a single bucket for cluster/region/vcluster/family groupings,
defeating the multi-region visibility goal of the D16 fan-out chain.

5 sub-bugs root-caused + fixed end-to-end:

C3-001 — default Layer-1 = `family`, not `cluster`, on first paint.
  Root cause: PR M (#1593) derived the default from `snapshot.sovereignFQDN`
  which is fetched ASYNCHRONOUSLY via SSE. On first paint snapshot is null
  → fell back to `['family', 'application']` even on a Sovereign Console.
  Fix: read mode synchronously from `DETECTED_MODE` (window.location-
  derived at module load), the same source SovereignSidebar + cloud-list
  routes use for mode-gated rendering. Now Sovereign mode reliably
  defaults to `['cluster', 'application']` on first paint.

C3-002 — group_by=cluster returns 1 bubble despite topology API reporting
  3 regions × 1 cluster each.
  Root cause: out of Family D scope — the chroot's k8sCache has only the
  primary cluster registered because the mothership handover hook hasn't
  posted secondary kubeconfigs via `POST /api/v1/sovereign/secondary-
  kubeconfig` yet on t10. The aggregator's existing fan-out
  (`wantFanOut` branch in GetDashboardTreemap, shipped in #1580) IS
  correct — it enumerates `h.k8sCache.Clusters()`. The data-faithful
  single bubble is a Family E concern (handover-hook secondary export
  reliability), not a treemap-aggregator bug.

C3-003 — group_by=region collapses everything into the cluster id.
  Root cause: `openova.io/region` is a NODE label (set by per-region
  cloud-init), NOT a pod label. The handler's `stringLabel(p,
  "openova.io/region", "")` was always empty → `dimensionKey` fell
  through to `r.cluster`.
  Fix: list nodes alongside pods, join via `spec.nodeName`, and read
  `openova.io/region` / `topology.kubernetes.io/region` /
  `failure-domain.beta.kubernetes.io/region` (in that order) off the
  node's label map. Pod-level label still wins when present (mimir-
  style helpers).

C3-004 — group_by=vcluster returns 1 `host` bucket.
  Root cause: `catalyst.openova.io/vcluster-role` is stamped on the
  HOST NAMESPACE by `bp-{mgmt,dmz,rtz}-vcluster` chart templates, NOT
  on individual pods. Every pod's pod-level label was empty → bucketed
  under the fallback `host`.
  Fix: list namespaces alongside pods, join via `pod.metadata.namespace`,
  and read the namespace's `catalyst.openova.io/vcluster-role` label.
  Pods truly outside any vCluster (host workloads in bootstrap-kit
  namespaces) still bucket under `host` — never silently dropped.

C3-005 — group_by=family collapses everything into `Other`.
  Root cause: same shape as C3-004 — the canonical
  `catalyst.openova.io/family` label is set on the Namespace by chart
  helpers (e.g. mimir's _helpers.tpl is one of the few that ALSO sets
  it on the pod template). Pod-level absent → bucketed under default
  `other`.
  Fix: namespace-label fallback. Pod-level still wins when both are
  set (preserves per-app sub-categorisation when a chart wants it).
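
The enrichment joins for C3-003 to C3-005 can be sketched as below. The
handler itself is Go; this TypeScript version only illustrates the lookup
order. The `Pod` shape and function names are hypothetical, and letting the
pod-level label win for vcluster-role is an assumption (the message states
it only for region and family).

```typescript
type Labels = Record<string, string>;
interface Pod { namespace: string; nodeName: string; labels: Labels }

// Node label keys, read in this order.
const REGION_KEYS = [
  "openova.io/region",
  "topology.kubernetes.io/region",
  "failure-domain.beta.kubernetes.io/region",
];

function firstLabel(labels: Labels | undefined, keys: string[]): string {
  for (const key of keys) {
    const value = labels?.[key];
    if (value) return value;
  }
  return "";
}

// Region: pod label first, then the node the pod runs on (join on nodeName).
// An empty result means the caller falls back to the cluster id.
function podRegion(pod: Pod, nodes: Map<string, Labels>): string {
  return firstLabel(pod.labels, ["openova.io/region"]) || firstLabel(nodes.get(pod.nodeName), REGION_KEYS);
}

// vcluster-role / family: pod label first, then the pod's namespace, then a
// fallback bucket so no pod is silently dropped.
function podNamespaceLabel(pod: Pod, namespaces: Map<string, Labels>, key: string, fallback: string): string {
  return pod.labels[key] || namespaces.get(pod.namespace)?.[key] || fallback;
}
```

Both joins are plain map probes per pod, which matches the O(1)-per-pod note
in the implementation list below.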

Out of Family D scope (documented in test-evidence, not patched here):

  C3-008 — 3 jobs Running on "converged" sovereign (cilium-envoy-tls-
  restart + Trivy scans). This is a cilium-job-lifecycle concern; the
  treemap aggregator faithfully renders what's in the cluster. D6
  convergence is owned by Family B (job lifecycle hygiene).

  C3-010 — D5 fan-out list-view shows 2 nodes vs chip 5/5. This is
  the cloud-list resource fetch path — fixed in Wave 1 (D17 routing
  + ResourceList kind handling) per #1597.

Implementation:
  - `dashboard.go::buildPodRows` signature now takes `namespaces` +
    `nodes` slices; joins per pod via map probes (O(1) per pod, both
    informers are watched anyway for the cloud-list canvas so the
    List call is a cache read).
  - `dashboard.go::GetDashboardTreemap` lists namespace + node from
    the same per-cluster cache and passes through to buildPodRows.
  - `Dashboard.tsx` imports `DETECTED_MODE` and computes
    `defaultLayers` synchronously. `sovereignFQDN` still feeds the
    PortalShell page-title (display only).
  - `dashboard_test.go` extended with 4 new tests covering each
    enrichment path (family/vcluster from Namespace + region from
    Node + pod-label override precedence). Test fixture helpers
    `mkDashNamespace`, `mkDashNode`, `mkDashPodOnNode` added.
  - Fake-client GVR registry + Registry.Add wires namespace + node
    so existing tests + the 4 new ones are all green.
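
The synchronous default from C3-001 can be sketched as follows. The mode
names and the hostname rule in `detectMode` are assumptions for illustration;
the real DETECTED_MODE derivation is not shown in this message.

```typescript
type Mode = "sovereign" | "mothership"; // hypothetical mode names
type Layer = "cluster" | "family" | "application";

// Hypothetical rule: hosts starting with "console." are a Sovereign Console.
function detectMode(hostname: string): Mode {
  return hostname.indexOf("console.") === 0 ? "sovereign" : "mothership";
}

// Computed once at module load from the location, never from async snapshot data.
const hostname =
  (globalThis as unknown as { location?: { hostname?: string } }).location?.hostname ?? "";
const DETECTED_MODE_SKETCH: Mode = detectMode(hostname);

// First-paint default: no dependency on the SSE-fed snapshot.
function defaultLayers(mode: Mode): Layer[] {
  return mode === "sovereign" ? ["cluster", "application"] : ["family", "application"];
}
```

Because the mode is known before the first render, `defaultLayers` cannot
flip after the snapshot arrives, which is the failure C3-001 describes.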

Verification:
  - `go build ./...` clean (1.25.10 toolchain)
  - `go vet ./internal/handler/...` clean
  - `go test -count=1 -run TestDashboard ./internal/handler/...` → ok
    (all 13 existing + 4 new tests pass, 1.866s)
  - `tsc -b --noEmit` clean (zero output)
  - `vitest Dashboard.test.tsx` → 6/6 pass when run individually
    (cold-start flake observed once on first test of the full file
    when JSDOM import took 44s; unrelated to this change)

No chart bump (per task brief). Chart roll happens via the Wave 2
collector PR.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 21:25:25 +04:00
e3mrah
cdda974ae0
feat(ui): Family F — BSS in Sovereign Console (/console/bss/*) with RBAC menu gating (founder #1) (#1598)
Founder ruling 2026-05-17:
  "this url is rubbish, the backed of the the mark place mutst be just
   aotnerh menu under console like https://console.<sov>/bss"
  "it is just matter of roles based access ... where we give the
   billing access they see the billign etc."

Replaces the external "Marketplace Admin ↗" sidebar link (PR M, t142
follow-up #2) that punted operators out of the Sovereign Console SPA
to marketplace.<sov-fqdn>/back-office/.

Routes added under consoleLayoutRoute (Sovereign Console shell):
  /bss              → redirect to /bss/billing (default landing)
  /bss/billing      → BillingPage  (iframes back-office/billing/)
  /bss/orders       → OrdersPage   (iframes back-office/orders/)
  /bss/revenue      → RevenuePage  (iframes back-office/revenue/)
  /bss/vouchers     → VouchersPage (iframes back-office/vouchers/)
  /bss/tenants      → TenantsPage  (iframes back-office/tenants/)

Architecture decision (option B — iframe embed):
  The admin Pod in the sme namespace (chart template
  templates/sme-services/admin.yaml, already shipped) serves the BSS UI
  on marketplace.<sov-fqdn>/back-office/. Iframing reuses the production
  back-office SPA verbatim instead of porting 5 admin pages into React.
  Cookies on *.<sov-fqdn> cover the iframe's cross-subdomain XHR.

  BssLayout owns the shared chrome (page title + tab strip + iframe
  wrapper); the 5 section pages are 3-line wrappers that select the
  back-office sub-path. Per docs/INVIOLABLE-PRINCIPLES.md #4 the
  back-office host is derived at runtime from
  DETECTED_MODE.sovereignFQDN, never baked at build time.

RBAC gating happens at TWO layers:
  1. Sidebar visibility (this PR) — BSS appears as a top-level nav
     item. Unconditional for v1 since /api/v1/whoami doesn't yet
     expose tier — pattern matches the existing /rbac/* and
     /sre/compliance routes which are similarly unconditional today.
     When whoami grows a `tier` field the sidebar can hide for
     tier=user.
  2. SME gateway session-tier check on /back-office/* requests
     (already shipped server-side).

SovereignSidebar updates:
  - Add BSS nav item (id='bss', label='BSS', to='/bss', receipt icon)
  - Extend deriveActiveSection() so /bss(/...) highlights BSS
  - Remove the external "Marketplace Admin ↗" anchor (founder called
    the marketplace.<sov>/back-office/ URL "rubbish")

Fixes C6-003, C6-004, C6-005 from t10 test agent D.

Files:
  M  products/catalyst/bootstrap/ui/src/app/router.tsx
  M  products/catalyst/bootstrap/ui/src/pages/sovereign/SovereignSidebar.tsx
  A  products/catalyst/bootstrap/ui/src/pages/sovereign/bss/BssLayout.tsx
  A  products/catalyst/bootstrap/ui/src/pages/sovereign/bss/BillingPage.tsx
  A  products/catalyst/bootstrap/ui/src/pages/sovereign/bss/OrdersPage.tsx
  A  products/catalyst/bootstrap/ui/src/pages/sovereign/bss/RevenuePage.tsx
  A  products/catalyst/bootstrap/ui/src/pages/sovereign/bss/VouchersPage.tsx
  A  products/catalyst/bootstrap/ui/src/pages/sovereign/bss/TenantsPage.tsx

tsc -b --noEmit: clean (exit 0, no errors on router.tsx / SovereignSidebar.tsx / bss/).
No Chart.yaml or bootstrap-kit pin bumps per family-F brief.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 21:12:09 +04:00
e3mrah
658ca7e5e5
fix(ui): D17 — /cloud?view=list&kind=<X> no longer redirects to /dashboard (#1597)
Wave-1 Family A fix-author for the t10.omantel.biz test-agent matrix.

Root cause: kubectl-natural kind names operators routinely type
(`loadbalancers` vs canonical `load-balancers`, `httproutes`,
`networkpolicies`, singular `service`/`pod`/`pvc`, ...) are NOT in
cloud-list/kinds.ts `KIND_IDS`. CloudListView.tsx falls back to
DEFAULT_KIND and fires a `navigate({replace:true})` to canonicalise
the URL. The resulting re-mount + SSE re-connect storm was producing
the "drifts to /dashboard or /cloud/resource/.../overview within ~2s"
symptom test agents E + C2 reported (BLOCKED status on every
/cloud?view=list&kind=<X> deep-link in C9/C12 categories).

Fix: introduce CLOUD_KIND_ALIASES map in router.tsx and normalise the
`kind` search param in both `provisionCloudRoute.validateSearch` and
`consoleCloudRoute.validateSearch` so the React tree observes a
canonical kind on the very first render. No nav-replace storm, no
/dashboard drift.

Architectural shape (per CLAUDE.md "architect-first"):
- KIND_IDS in cloud-list/kinds.ts STAYS the single source of truth for
  valid kinds. The alias map lives in router.tsx only because the
  normalisation must happen at route-parse time BEFORE CloudListView
  mounts; piping aliases through kinds.ts would push the concern out
  of the router layer where it belongs.
- Aliases are CLOSED — anything not in KIND_IDS and not in the alias
  set passes through unchanged so the CloudListView isValidKind ->
  DEFAULT_KIND fallback still applies for genuinely unknown kinds
  (no behavioural regression for the happy path).
- Includes singular ↔ plural (`service` → `services`, `pod` → `pods`),
  hyphenated ↔ no-hyphen (`loadbalancers` → `load-balancers`), and
  near-neighbour kinds (httproutes/networkpolicies → services as the
  closest networking surface until dedicated lists ship).
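
The real alias map is TypeScript in router.tsx. The Go sketch below only
illustrates the lookup order described above (canonical kind, then alias,
then pass-through). The map contents are limited to pairs named in this
message, and the names canonicalKinds, kindAliases and normalizeKind are
hypothetical.

```go
package main

import "fmt"

// canonicalKinds stands in for KIND_IDS; the entries are illustrative, not the full list.
var canonicalKinds = map[string]bool{
	"services":       true,
	"pods":           true,
	"load-balancers": true,
	"nodes":          true,
}

// kindAliases stands in for CLOUD_KIND_ALIASES, using only pairs named in the commit message.
var kindAliases = map[string]string{
	"service":         "services",
	"pod":             "pods",
	"loadbalancers":   "load-balancers",
	"httproutes":      "services",
	"networkpolicies": "services",
}

// normalizeKind returns the canonical kind for a known alias and leaves
// everything else untouched, so the downstream default-kind fallback still
// handles genuinely unknown values.
func normalizeKind(kind string) string {
	if canonicalKinds[kind] {
		return kind
	}
	if canonical, ok := kindAliases[kind]; ok {
		return canonical
	}
	return kind
}

func main() {
	for _, k := range []string{"loadbalancers", "pod", "nodes", "widgets"} {
		fmt.Printf("%s -> %s\n", k, normalizeKind(k))
	}
}
```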

Chart bump 1.4.152 → 1.4.153 + bootstrap-kit pin 1.4.152 → 1.4.153 in
SAME commit per the chart Chart.yaml ≠ bootstrap-kit pin lesson from
feedback_chart_chart_yaml_neq_bootstrap_kit_pin (PR L #1592 pattern).

Refs: feedback_test_theater_3rd_violation_2026_05_17.md,
/tmp/t10-results-agent-{E,C2,B,C1}.jsonl

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 20:43:58 +04:00
e3mrah
37cebdfbee
fix(store): PR P — preserve MarketplaceEnabled through Redact + ToProvisionerRequest (#1596)
Founder caught on t144: /settings/marketplace toggle showed disabled
even though the prov body had marketplaceEnabled=true.

Root cause: store.RedactedRequest struct (the on-disk projection)
lacked a MarketplaceEnabled field. Every Save/Load cycle stripped
the bit:
- Mothership Save(rec) → MarketplaceEnabled dropped
- Mothership exportDeploymentToChild → chroot receives record without bit
- Chroot HandleGetMarketplace → reads dep.Request.MarketplaceEnabled
  → zero value (false) → UI toggle defaults to disabled

PR J #1590's GET endpoint was correctly wired but the data was already
gone before it ran.

Fix: add MarketplaceEnabled field to RedactedRequest + carry it
through Redact() + ToProvisionerRequest(). Backward-compat via
`omitempty` — records persisted before this PR deserialize with
false, same as the prior behavior.
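
A minimal sketch of the backward-compat behaviour, assuming the on-disk
projection is JSON. The struct is a trimmed, hypothetical stand-in for
store.RedactedRequest, and the JSON tag names are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// redactedRequest is a hypothetical, trimmed stand-in for store.RedactedRequest.
type redactedRequest struct {
	SovereignFQDN      string `json:"sovereignFQDN"`
	MarketplaceEnabled bool   `json:"marketplaceEnabled,omitempty"`
}

// saveAndLoad mimics one Save/Load cycle through the JSON projection.
func saveAndLoad(in redactedRequest) (redactedRequest, error) {
	raw, err := json.Marshal(in)
	if err != nil {
		return redactedRequest{}, err
	}
	var out redactedRequest
	err = json.Unmarshal(raw, &out)
	return out, err
}

// loadLegacy decodes a record written before the field existed.
func loadLegacy(raw string) (redactedRequest, error) {
	var out redactedRequest
	err := json.Unmarshal([]byte(raw), &out)
	return out, err
}

func main() {
	fresh, _ := saveAndLoad(redactedRequest{SovereignFQDN: "example.test", MarketplaceEnabled: true})
	legacy, _ := loadLegacy(`{"sovereignFQDN":"example.test"}`)
	fmt.Println("fresh:", fresh.MarketplaceEnabled, "legacy:", legacy.MarketplaceEnabled)
}
```

With the field present the flag survives the cycle, and a legacy record
without the key decodes to the zero value.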

Bumps chart 1.4.151 -> 1.4.152 + bootstrap-kit pin so next prov
exercises the full chain.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 14:51:50 +04:00
e3mrah
b27bdeee05
fix(handover): PR N — fallback to per-FQDN cert when wildcard 429s (#1594)
t143 caught the LE PROD rate limit (429: too many certificates (50)
already issued for omani.works in last 168h0m0s, retry after
2026-05-17 10:28:32 UTC). The chart renders TWO cert names:
- sovereign-wildcard-tls (canonical, hit 429)
- sovereign-wildcard-tls-<fqdn> (per-FQDN, was already issued before
  rate limit, Ready=True)

waitForWildcardCert only checked the canonical name. With the limit
hit, handover waited the full 10-min budget before firing degraded.

Fix: when the canonical cert is unavailable, list namespace certs
matching `sovereign-wildcard-tls-*` prefix and return Ready=True if
ANY sibling is Ready. The operator's console.<fqdn> TLS handshake
will succeed against either secret since both cover *.<fqdn>.
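
The readiness rule can be sketched without a Kubernetes client. The
certStatus type and wildcardCertReady name are hypothetical; the real
waitForWildcardCert lists Certificate objects in the namespace.

```go
package main

import (
	"fmt"
	"strings"
)

// certStatus is a hypothetical projection of a Certificate's name and Ready condition.
type certStatus struct {
	Name  string
	Ready bool
}

const canonicalCert = "sovereign-wildcard-tls"

// wildcardCertReady prefers the canonical cert and otherwise accepts any
// Ready sibling whose name starts with "sovereign-wildcard-tls-".
func wildcardCertReady(certs []certStatus) bool {
	for _, c := range certs {
		if c.Name == canonicalCert && c.Ready {
			return true
		}
	}
	for _, c := range certs {
		if strings.HasPrefix(c.Name, canonicalCert+"-") && c.Ready {
			return true
		}
	}
	return false
}

func main() {
	certs := []certStatus{
		{Name: "sovereign-wildcard-tls", Ready: false}, // rate-limited
		{Name: "sovereign-wildcard-tls-example-test", Ready: true},
	}
	fmt.Println(wildcardCertReady(certs))
}
```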

Bumps chart 1.4.150 -> 1.4.151 + bootstrap-kit pin so the fix lands
on next fresh prov.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 13:02:17 +04:00
e3mrah
32c46b80e1
feat(ui): PR M — dashboard default Layer-1=cluster + Marketplace Admin link + chart 1.4.150 (#1593)
Founder follow-up to t142 cycle:
1. "the dashboard is still not showing the clusters properly" — the D16
   fan-out CODE works (3 clusters in k8sCache, dashboard handler fans
   out) but the OPERATOR-FACING default Layer-1 was 'family' not
   'cluster'. Operator opens /dashboard, sees family-grouped bubbles,
   thinks the multi-cluster fix is broken. Fix: when SovereignFQDN is
   present (Sovereign Console mode), default to ['cluster', 'application']
   so the 3-cluster grouping is the first thing the operator sees.

2. "I have no idea where the admin components for billing, order, revenue
   etc related BSS are" — exists at marketplace.<sov>/back-office/ but
   the Sovereign Console sidebar had no link. Fix: add "Marketplace Admin"
   nav link (external, opens in new tab) — uses resolvedFQDN to construct
   the URL. data-testid=sov-console-nav-marketplace-admin for matrix.

Also bumps chart 1.4.149 → 1.4.150 + bootstrap-kit pin so the changes
land on next fresh prov.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 12:37:53 +04:00
e3mrah
86f5331962
fix(catalyst-api): PR L — AppDetail HelmRelease fallback + chart 1.4.149 (#1592)
Founder t140 bug #2: "in the catalog and jobs it shows as installed,
in the application page it shows as provisioning, there is a sync issue".

Root cause: AppDetail reads Application CR via GET /sovereigns/{id}/
applications/{name}. For bootstrap-kit installs (cilium, cert-manager,
gateway-api, alloy, etc.) NO Application CR exists — they ship as
HelmReleases directly with no wizard step to create the CR. The handler
returned 404 → UI showed "App not found" or perpetual "Provisioning",
while /apps (which reads HelmRelease) shows "installed".

Fix: HandleApplicationGet, on Application CR not-found, falls back to a
HelmRelease lookup in h.k8sCache (uses resolveChrootClusterID so it works
post-D16 multi-cluster fan-out). Synthesises an applicationDetailResponse
from HR fields:
- Name/Namespace from HR
- Blueprint from spec.chart.spec.chart
- Version from spec.chart.spec.version (or status.lastAttemptedRevision)
- Phase: Ready (HR Ready=True) / Failed (False) / Provisioning (Unknown)
- Conditions: pass-through HR conditions
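
A rough sketch of the Phase and Version mapping above. The condition type
and function names are hypothetical, and treating a missing Ready condition
as Provisioning is an assumption rather than something this PR states.

```go
package main

import "fmt"

// condition is a hypothetical projection of a HelmRelease status condition.
type condition struct {
	Type   string
	Status string
}

// phaseFromConditions maps the HelmRelease Ready condition onto the
// application phase: True -> Ready, False -> Failed, anything else -> Provisioning.
func phaseFromConditions(conds []condition) string {
	for _, c := range conds {
		if c.Type != "Ready" {
			continue
		}
		switch c.Status {
		case "True":
			return "Ready"
		case "False":
			return "Failed"
		}
		return "Provisioning"
	}
	// Assumption: no Ready condition yet means the release is still converging.
	return "Provisioning"
}

// versionOrFallback prefers spec.chart.spec.version and falls back to
// status.lastAttemptedRevision when the spec leaves the version empty.
func versionOrFallback(specVersion, lastAttempted string) string {
	if specVersion != "" {
		return specVersion
	}
	return lastAttempted
}

func main() {
	fmt.Println(phaseFromConditions([]condition{{Type: "Ready", Status: "True"}}))
	fmt.Println(versionOrFallback("", "1.2.3"))
}
```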

Also bumps chart to 1.4.149 + bootstrap-kit pin so this fix + the
queued PRs #1590 (marketplace GET) + #1591 (publish toggle UI) all
land on the next fresh prov.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 11:59:59 +04:00
e3mrah
df150fdbd8
feat(ui): PR K — per-app catalog publish/unpublish toggle on AppDetail header (#1591)
Founder caught on t140 bug #4: "I am supposed to mark which applications
are going to be available in the catalog … I am not able to see such
option from the application page".

Fix: PublishToggleChip rendered in the AppDetail hero meta row.
- Reads current state on mount from GET /api/catalog/apps/{slug}
- Click flips via PUT /api/catalog/admin/apps/{slug}/published
- Optimistic update; reverts + tooltip on backend error
- data-testid="app-detail-publish-toggle" for matrix coverage

Backend already shipped — SetAppPublished handler at the catalog
service /catalog/admin/apps/{slug}/published. Gateway routes
admin/* with auth-gating so only Sovereign Console operator can
flip. No backend change needed.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 11:55:45 +04:00
e3mrah
114705c63c
fix(marketplace): PR J — GET endpoint + UI reflects actual enabled state (#1590)
Founder caught on t140 bug #5: /settings/marketplace shows "disabled"
while the marketplace is actually serving (prov body had
marketplaceEnabled=true). Root cause: MarketplaceSettings UI hardcoded
useState(false) on mount because no GET endpoint existed to read the
current value.

Fix:
- Backend: new GET /api/v1/sovereigns/{id}/marketplace returning
  {deploymentId, sovereignFQDN, enabled, brand}. Reads from the
  in-memory deployment record (Request.MarketplaceEnabled set at
  prov time + mutated by HandleSetMarketplace's commit path).
- UI: MarketplaceSettings useEffect fetches on mount, sets the
  toggle to the actual value, hydrates the brand fields. Best-effort
  fetch — falls back to defaults on failure.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 11:49:03 +04:00
e3mrah
f1ebf14cf8
fix(catalyst-api): D30 PR I — mark imported deployment as Adopted on chroot (#1589)
Founder t140 bug #6: /parent-domains shows only primary, not the
sme-pool domains. Chroot's deployment record has parentDomains[]
populated but ListParentDomains uses h.activeDeployment() which
filters to AdoptedAt!=nil. The mothership ships the record before
the chroot's own handover-finalisation, so AdoptedAt is nil →
activeDeployment returns nil → only synth primary row renders.

Fix: HandleDeploymentImport stamps AdoptedAt at import time. The
FQDN-match guard above verifies "this record IS my Sovereign's
record" so the chroot is by definition the operator/owner — no
separate adoption-wizard needed on chroot side.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 11:04:38 +04:00
e3mrah
52be4d4d3a
fix(catalyst-api): D16 PR H — resolveChrootClusterID multi-cluster + dashboard alias (#1587)
* fix(handover): rename itoa→regionSlotIndex (collision with infrastructure.go)

PR #1581 introduced an `itoa` helper that collided with the existing
`itoa` in handler/infrastructure.go:1952. Go vet failed:

  internal/handler/infrastructure.go:1952:6: itoa redeclared in this block
  internal/handler/deployment_handover_export.go:199:6: other declaration of itoa

Rename my helper to `regionSlotIndex` — more descriptive of its actual
use (deriving the per-region slot suffix for the kubeconfig filename).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(catalyst-api): D16/D17 — 3 bugs caught on t138

Founder caught on t136 (now wiped) that /dashboard cluster grouping
still showed 1 region and /cloud nodes showed 1 node despite earlier
D16 PRs shipping. Root cause: 3 bugs in the D16 chain that surfaced
on t138 fresh prov.

1. exportSecondaryKubeconfigsToChild was guarded behind the early
   return of exportDeploymentToChild's failed POST. The child's
   ingress + cert + gateway are still racing to become reachable
   in the seconds after handover fires, so the first POST
   gets EOF and the goroutine never fires. Fix: kick off the
   D16 fan-out IMMEDIATELY at the top of exportDeploymentToChild
   in its own goroutine, BEFORE the deployment-record POST.

2. Both exports now retry with exponential backoff (5s → 60s) for
   up to 5 min total. Most handovers will succeed on attempt 2-4.
   Was: no retry, single shot, silent failure.

3. /api/v1/sovereign/secondary-kubeconfig route moved OUT of the
   auth group (rg) into the top-level router (r), alongside
   /api/v1/internal/deployments/import. The previous registration
   required an operator session that doesn't exist at handover —
   mothership POSTs were 401'd silently. Validation is now via
   safeIDPattern regex on depID + regionKey (same security model
   as the deployments/import companion endpoint).

4. HandleSovereignCloud now fans out across h.k8sCache.Clusters()
   instead of using only the in-cluster client. Adds Cluster
   field (omitempty) to sovereignNode/LB/SC/PVC so the UI can
   group/filter by region. Without this, /cloud?view=list&kind=nodes
   shows 1 node even when 3 secondary kubeconfigs are registered.

Together these fix:
- D16 /dashboard Layer-1=Cluster grouping (3 bubbles, not 1)
- /cloud?view=list&kind=nodes (3+ nodes, not 1)
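
The retry budget in item 2 can be illustrated by computing the delay
schedule. Doubling per attempt is an assumption (the message only says
"exponential"), and backoffSchedule / defaultSchedule are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// backoffSchedule returns the sleep durations between attempts: start at
// initial, double each time, cap at max, and stop once the next sleep
// would exceed the total budget.
func backoffSchedule(initial, max, budget time.Duration) []time.Duration {
	var out []time.Duration
	var elapsed time.Duration
	for d := initial; elapsed+d <= budget; {
		out = append(out, d)
		elapsed += d
		d *= 2
		if d > max {
			d = max
		}
	}
	return out
}

// defaultSchedule uses the figures from the commit message: 5s -> 60s, 5 min total.
func defaultSchedule() []time.Duration {
	return backoffSchedule(5*time.Second, 60*time.Second, 5*time.Minute)
}

func main() {
	for i, d := range defaultSchedule() {
		fmt.Printf("sleep before attempt %d: %s\n", i+2, d)
	}
}
```

Under these assumptions the sleeps are 5s, 10s, 20s, 40s and then 60s
repeated until the 5-minute budget is used up.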

Refs: feedback_test_theater_3rd_violation_2026_05_17.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(catalog): D27 — fresh-seed apps default Published+Deployable

Founder caught on t136: marketplace.t136/apps shows blank application
grid. Root cause: catalog seed.go calls migrateAppPublished +
migrateAppDeployable ONLY on the "already populated" path. On a fresh
Sovereign install (empty catalog) seedAllData inserts 27 rows with
zero-value bools — Published=false, Deployable=false. The marketplace
storefront filters with `?published=true`, gets [], renders blank.

Fix: after seedAllData also call migrateAppDeployable + migrateAppPublished
+ seedSystemApps. Both migrations are idempotent (skip rows already
true), so re-runs are safe.
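
A toy illustration of why re-running the migrations is safe. The app struct
and the two functions are hypothetical in-memory stand-ins; the real
migrations operate on the catalog store.

```go
package main

import "fmt"

// app is a hypothetical, trimmed catalog row.
type app struct {
	Slug       string
	Published  bool
	Deployable bool
}

// migratePublished flips only rows that are still false and reports how
// many rows it touched, so a second run is a no-op.
func migratePublished(apps []app) int {
	changed := 0
	for i := range apps {
		if !apps[i].Published {
			apps[i].Published = true
			changed++
		}
	}
	return changed
}

// migrateDeployable follows the same skip-if-already-true rule.
func migrateDeployable(apps []app) int {
	changed := 0
	for i := range apps {
		if !apps[i].Deployable {
			apps[i].Deployable = true
			changed++
		}
	}
	return changed
}

func main() {
	// Fresh seed: zero-value bools, as on an empty catalog.
	apps := []app{{Slug: "app-a"}, {Slug: "app-b"}}
	fmt.Println("first run:", migratePublished(apps), migrateDeployable(apps))
	fmt.Println("second run:", migratePublished(apps), migrateDeployable(apps))
}
```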

Verified the bug live on t138 (eaaee1ea24184c2a):
  http://catalog.sme:8082/catalog/apps returns 27 apps
  http://catalog.sme:8082/catalog/apps?published=true returns 0

With this fix the latter returns 27.

Refs: feedback_test_theater_3rd_violation_2026_05_17.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ui): D17 — exclude mother-only /app/$deploymentId routes on Sovereign

Founder caught on t136: console.t136.../app/bp-alloy renders the
catalog grid (AppsPage) instead of AppDetail. Three earlier PRs
(#1572 + chart bumps) flipped the appRoute beforeLoad logic but
the actual route-matching collision was not fixed.

Root cause: appRoute.addChildren registers appDeploymentRoute at
`/$deploymentId` (effective `/app/$deploymentId`, mother-only)
BEFORE consoleLayoutRoute registers consoleAppDetailRoute at
`/app/$componentId`. TanStack Router resolves equally-specific
dynamic routes by declaration order — so on the Sovereign Console
URL `/app/bp-alloy` matches appDeploymentRoute first and renders
AppsPage with deploymentId="bp-alloy".

Fix: at routeTree build time, filter appRoute children to exclude
every mother-only `/$deploymentId/*` route when running on
Sovereign mode. DETECTED_MODE.mode is fixed per-page-load so this
is a one-time check, no runtime overhead. With those routes
absent, consoleAppDetailRoute is the only matcher for
`/app/<componentId>` on Sovereign Console — AppDetail renders.

Refs: feedback_test_theater_3rd_violation_2026_05_17.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(bootstrap-kit): pin bp-catalyst-platform 1.4.147→1.4.148

Founder-flagged bug fixes from session t136/t138/t139 verify cycle
shipped 3 PRs that bumped catalyst chart Chart.yaml to 1.4.148
(d985f27c) with new images:
- catalystApi/Ui: 2ab8a0e (PR #1583 D16 fan-out + retry + auth-bypass,
  PR #1585 D17 router collision)
- smeTag: 964dc15 (PR #1584 D27 catalog fresh-seed Published)

But bootstrap-kit/13-bp-catalyst-platform.yaml stayed pinned to
1.4.147 — every fresh provision installs the OLDER chart with the
OLDER images, so the founder-flagged bugs persist.

Caught on t139 (b4a7ee052d844da0) post-handover verify: chart
installed = bp-catalyst-platform@1.4.147, catalog returns 0
published apps, /app/bp-alloy renders catalog grid.

Bumping the pin makes fresh provs install 1.4.148 (which has all 3
PRs baked).

Refs: feedback_test_theater_3rd_violation_2026_05_17.md
      feedback_overlap_provs_dont_serialize_wait.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(catalyst-api): D16 PR H — resolveChrootClusterID multi-cluster + dashboard alias

Founder caught on t140 (29b7e14918178f7e) after D16 fan-out chain shipped:
- /dashboard is empty (no treemap rendered)
- "none of the k8s resources are streaming"

Root cause: after the D16 secondary-kubeconfig export (PR #1579/#1581)
landed, chroot's k8sCache went from 1 cluster (primary self-register)
to 3 clusters (primary + 2 secondaries). Two cascading bugs:

1. resolveChrootClusterID had a `len(clusters) != 1` guard — it only
   aliased when chroot had exactly one cluster. After D16 it returned
   the URL deployment_id unchanged → has-cluster check failed →
   every chroot handler (networking, k8s_search, k8s_resource_metrics,
   k8s_exec, dashboard) saw "not found" → returned empty.

2. dashboard.go::GetDashboardTreemap was the one chroot handler that
   didn't call resolveChrootClusterID before the has-cluster check —
   so even with #1 fixed, the dashboard would still 404.

Fix:
- resolveChrootClusterID: when N>1, prefer the cluster whose id is
  prefixed "sovereign-" (the FactoryFromEnv self-registered primary
  per buildChrootClusterRef). Falls back to clusters[0] if no match.
- GetDashboardTreemap: call resolveChrootClusterID before has-cluster
  check, matching the pattern in every other chroot handler.
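
The selection rule can be sketched as a pure function. resolveClusterID is
a hypothetical simplification of resolveChrootClusterID; returning the
requested id unchanged when it is already registered, or when no clusters
are registered, is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveClusterID aliases the URL deployment id onto a registered cluster:
// prefer the self-registered primary (id prefixed "sovereign-"), otherwise
// fall back to the first registered cluster.
func resolveClusterID(requested string, clusters []string) string {
	for _, id := range clusters {
		if id == requested {
			return requested // assumption: exact matches need no aliasing
		}
	}
	if len(clusters) == 0 {
		return requested // assumption: nothing to alias onto
	}
	for _, id := range clusters {
		if strings.HasPrefix(id, "sovereign-") {
			return id
		}
	}
	return clusters[0]
}

func main() {
	clusters := []string{"dep-region-b", "sovereign-example", "dep-region-c"}
	fmt.Println(resolveClusterID("29b7e14918178f7e", clusters))
}
```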

Refs: feedback_test_theater_3rd_violation_2026_05_17.md (don't ship
D16 fan-out without verifying every handler that depends on
single-cluster k8sCache assumption).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 10:59:43 +04:00
e3mrah
2ab8a0e653
fix(ui): D17 — exclude mother-only /app/$deploymentId routes on Sovereign (#1585)
* fix(handover): rename itoa→regionSlotIndex (collision with infrastructure.go)

PR #1581 introduced an `itoa` helper that collided with the existing
`itoa` in handler/infrastructure.go:1952. Go vet failed:

  internal/handler/infrastructure.go:1952:6: itoa redeclared in this block
  internal/handler/deployment_handover_export.go:199:6: other declaration of itoa

Rename my helper to `regionSlotIndex` — more descriptive of its actual
use (deriving the per-region slot suffix for the kubeconfig filename).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(catalyst-api): D16/D17 — 3 bugs caught on t138

Founder caught on t136 (now wiped) that /dashboard cluster grouping
still showed 1 region and /cloud nodes showed 1 node despite earlier
D16 PRs shipping. Root cause: 3 bugs in the D16 chain that surfaced
on t138 fresh prov.

1. exportSecondaryKubeconfigsToChild was guarded behind the early
   return of exportDeploymentToChild's failed POST. The child's
   ingress + cert + gateway are still racing to become reachable
   in the seconds after handover fires, so the first POST
   gets EOF and the goroutine never fires. Fix: kick off the
   D16 fan-out IMMEDIATELY at the top of exportDeploymentToChild
   in its own goroutine, BEFORE the deployment-record POST.

2. Both exports now retry with exponential backoff (5s → 60s) for
   up to 5 min total. Most handovers will succeed on attempt 2-4.
   Was: no retry, single shot, silent failure.

3. /api/v1/sovereign/secondary-kubeconfig route moved OUT of the
   auth group (rg) into the top-level router (r), alongside
   /api/v1/internal/deployments/import. The previous registration
   required an operator session that doesn't exist at handover —
   mothership POSTs were 401'd silently. Validation is now via
   safeIDPattern regex on depID + regionKey (same security model
   as the deployments/import companion endpoint).

4. HandleSovereignCloud now fans out across h.k8sCache.Clusters()
   instead of using only the in-cluster client. Adds Cluster
   field (omitempty) to sovereignNode/LB/SC/PVC so the UI can
   group/filter by region. Without this, /cloud?view=list&kind=nodes
   shows 1 node even when 3 secondary kubeconfigs are registered.

Together these fix:
- D16 /dashboard Layer-1=Cluster grouping (3 bubbles, not 1)
- /cloud?view=list&kind=nodes (3+ nodes, not 1)

Refs: feedback_test_theater_3rd_violation_2026_05_17.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(catalog): D27 — fresh-seed apps default Published+Deployable

Founder caught on t136: marketplace.t136/apps shows blank application
grid. Root cause: catalog seed.go calls migrateAppPublished +
migrateAppDeployable ONLY on the "already populated" path. On a fresh
Sovereign install (empty catalog) seedAllData inserts 27 rows with
zero-value bools — Published=false, Deployable=false. The marketplace
storefront filters with `?published=true`, gets [], renders blank.

Fix: after seedAllData also call migrateAppDeployable + migrateAppPublished
+ seedSystemApps. Both migrations are idempotent (skip rows already
true), so re-runs are safe.

Verified the bug live on t138 (eaaee1ea24184c2a):
  http://catalog.sme:8082/catalog/apps returns 27 apps
  http://catalog.sme:8082/catalog/apps?published=true returns 0

With this fix the latter returns 27.

Refs: feedback_test_theater_3rd_violation_2026_05_17.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ui): D17 — exclude mother-only /app/$deploymentId routes on Sovereign

Founder caught on t136: console.t136.../app/bp-alloy renders the
catalog grid (AppsPage) instead of AppDetail. Three earlier PRs
(#1572 + chart bumps) flipped the appRoute beforeLoad logic but
the actual route-matching collision was not fixed.

Root cause: appRoute.addChildren registers appDeploymentRoute at
`/$deploymentId` (effective `/app/$deploymentId`, mother-only)
BEFORE consoleLayoutRoute registers consoleAppDetailRoute at
`/app/$componentId`. TanStack Router resolves equally-specific
dynamic routes by declaration order — so on the Sovereign Console
URL `/app/bp-alloy` matches appDeploymentRoute first and renders
AppsPage with deploymentId="bp-alloy".

Fix: at routeTree build time, filter appRoute children to exclude
every mother-only `/$deploymentId/*` route when running on
Sovereign mode. DETECTED_MODE.mode is fixed per-page-load so this
is a one-time check, no runtime overhead. With those routes
absent, consoleAppDetailRoute is the only matcher for
`/app/<componentId>` on Sovereign Console — AppDetail renders.

Refs: feedback_test_theater_3rd_violation_2026_05_17.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 09:34:01 +04:00
e3mrah
9fc2850504
fix(catalyst-api): D16/D17 — 3 bugs caught on t138 fresh prov (#1583)
* fix(handover): rename itoa→regionSlotIndex (collision with infrastructure.go)

PR #1581 introduced an `itoa` helper that collided with the existing
`itoa` in handler/infrastructure.go:1952. Go vet failed:

  internal/handler/infrastructure.go:1952:6: itoa redeclared in this block
  internal/handler/deployment_handover_export.go:199:6: other declaration of itoa

Rename my helper to `regionSlotIndex` — more descriptive of its actual
use (deriving the per-region slot suffix for the kubeconfig filename).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(catalyst-api): D16/D17 — 3 bugs caught on t138

Founder caught on t136 (now wiped) that /dashboard cluster grouping
still showed 1 region and /cloud nodes showed 1 node despite earlier
D16 PRs shipping. Root cause: 3 bugs in the D16 chain that surfaced
on t138 fresh prov.

1. exportSecondaryKubeconfigsToChild was guarded behind the early
   return of exportDeploymentToChild's failed POST. The child's
   ingress + cert + gateway are still racing to become reachable
   in the seconds after handover fires, so the first POST
   gets EOF and the goroutine never fires. Fix: kick off the
   D16 fan-out IMMEDIATELY at the top of exportDeploymentToChild
   in its own goroutine, BEFORE the deployment-record POST.

2. Both exports now retry with exponential backoff (5s → 60s) for
   up to 5 min total. Most handovers will succeed on attempt 2-4.
   Was: no retry, single shot, silent failure.

3. /api/v1/sovereign/secondary-kubeconfig route moved OUT of the
   auth group (rg) into the top-level router (r), alongside
   /api/v1/internal/deployments/import. The previous registration
   required an operator session that doesn't exist at handover —
   mothership POSTs were 401'd silently. Validation is now via
   safeIDPattern regex on depID + regionKey (same security model
   as the deployments/import companion endpoint).

4. HandleSovereignCloud now fans out across h.k8sCache.Clusters()
   instead of using only the in-cluster client. Adds Cluster
   field (omitempty) to sovereignNode/LB/SC/PVC so the UI can
   group/filter by region. Without this, /cloud?view=list&kind=nodes
   shows 1 node even when 3 secondary kubeconfigs are registered.

Together these fix:
- D16 /dashboard Layer-1=Cluster grouping (3 bubbles, not 1)
- /cloud?view=list&kind=nodes (3+ nodes, not 1)

Refs: feedback_test_theater_3rd_violation_2026_05_17.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 09:26:16 +04:00
e3mrah
9237c1e6ee
fix(handover): rename itoa→regionSlotIndex (collision with infrastructure.go) (#1582)
PR #1581 introduced an `itoa` helper that collided with the existing
`itoa` in handler/infrastructure.go:1952. Go vet failed:

  internal/handler/infrastructure.go:1952:6: itoa redeclared in this block
  internal/handler/deployment_handover_export.go:199:6: other declaration of itoa

Rename my helper to `regionSlotIndex` — more descriptive of its actual
use (deriving the per-region slot suffix for the kubeconfig filename).

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 08:45:49 +04:00
e3mrah
ce4ef6ba98
feat(handover): export secondary kubeconfigs to chroot at handover (D16 PR B) (#1581)
* fix(cloudinit): escape $$\{ORG_EMAIL:-\}/$$\{ORG_NAME:-\} in comment (D22)

PR #1571 added a comment mentioning the $${ORG_EMAIL:-}/$${ORG_NAME:-}
slot-file placeholders WITHOUT the $$ escape. tofu's templatefile()
parses comments and tried to interpolate \${ORG_EMAIL:-} as a tofu
expression — failing with "Extra characters after interpolation
expression; Template interpolation doesn't expect a colon".

Caught live on t133 fad01d84f5655004 — tofu plan failed in 30s.

The escape pattern is documented at main.tf:1029 (the same warning
that caught t127 last week). $$ prefix tells tofu's templatefile to
emit literal \${...} to cloud-init for Flux envsubst.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(parent-domains): short-circuit pdmFlipNS when NS already matches (D30)

When an sme-pool domain's current NS records already match the expected
[ns1.<primary>, ns2.<primary>] pair (because the operator already
delegated the domain to OpenOva's PowerDNS), the PDM registrar-flip
step is a no-op. Skipping avoids:

  1. Burning a Dynadot API credit on a flip that would be idempotent.
  2. The D30 blocker — current Dynadot creds return pdm-status-401
     even when the desired NS state already exists. Caught on t132
     2026-05-16 day-2 add + t134 2026-05-17 fresh-prov body
     parentDomains attempt.

Adds nsAlreadyMatches() helper using net.DefaultResolver.LookupNS with
a 5s timeout. False on lookup error or partial match → fall through to
the original PDM pipeline so a misconfigured/partial domain still goes
through the registrar API.
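
A minimal sketch of such a helper. Strict set equality, case folding and
trailing-dot trimming are assumptions about what "matches" means here;
hostSet and sameNSSet are hypothetical names, and the signature of
nsAlreadyMatches is guessed.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"strings"
	"time"
)

// hostSet lower-cases hosts and strips the trailing dot that DNS answers carry.
func hostSet(hosts []string) map[string]struct{} {
	set := make(map[string]struct{}, len(hosts))
	for _, h := range hosts {
		set[strings.ToLower(strings.TrimSuffix(h, "."))] = struct{}{}
	}
	return set
}

// sameNSSet reports whether both lists name exactly the same nameservers.
// An empty expectation never matches.
func sameNSSet(got, want []string) bool {
	g, w := hostSet(got), hostSet(want)
	if len(w) == 0 || len(g) != len(w) {
		return false
	}
	for h := range w {
		if _, ok := g[h]; !ok {
			return false
		}
	}
	return true
}

// nsAlreadyMatches looks up the live NS records with a 5s timeout and
// returns false on any lookup error so the caller falls through to the
// full registrar pipeline.
func nsAlreadyMatches(domain string, expected []string) bool {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	records, err := net.DefaultResolver.LookupNS(ctx, domain)
	if err != nil {
		return false
	}
	got := make([]string, 0, len(records))
	for _, r := range records {
		got = append(got, r.Host)
	}
	return sameNSSet(got, expected)
}

func main() {
	live := []string{"NS1.example.test.", "ns2.example.test."}
	want := []string{"ns1.example.test", "ns2.example.test"}
	fmt.Println(sameNSSet(live, want))
}
```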

This unblocks sme-pool entries for omani.homes (already pointing at
ns1/2/3.openova.io). omani.rest / omani.trades still go through the
full flip path because their NS records don't yet match expected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(handover): D21 owner seed uses catalyst-system namespace

PR #1564 created the owner UserAccess CR with .Namespace("") — the
apiserver returned "could not find the requested resource" because
useraccesses.access.openova.io is NAMESPACED (Crossplane Claim per
the XRD's claimNames block at platform/crossplane-claims/chart/
templates/xrds/useraccess.yaml).

Pin to catalyst-system (where catalyst-api + every Catalyst-authored
CR lives) and stamp the namespace on the object too. The existing
ListUserAccess handler uses Namespace("") so the entry surfaces on
/users without per-namespace filtering.

Verified the CRD shape on t134 2026-05-17:
  $ kubectl api-resources --api-group=access.openova.io
  useraccesses   access.openova.io/v1alpha1   true   UserAccess
                                                ^^^^
                                                NAMESPACED

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(handover): D21 owner seed uses tierRoleRef not wildcard app

PR #1564 + #1577 created the CR shape with applications=[{app:"*",...}]
but the useraccess XRD schema rejects `app: "*"` (pattern
^[a-z0-9][a-z0-9-]{0,62}$). The seed handler logged
"spec.applications[0].app: Invalid value: \"*\"" on every handover.

The XRD has a `tierRoleRef` field (pattern
^openova:tier-(viewer|developer|operator|admin|owner)$) that's the
canonical owner-tier semantic — when set, useraccess-controller binds
the named ClusterRole on the target via RoleBinding/ClusterRoleBinding.
`openova:tier-owner` is shipped by EPIC-3 (#1098) slice T1's
tier-clusterroles.yaml.

Drop the applications[] block + use tierRoleRef = openova:tier-owner.
Verified live on t135 2026-05-17 — error log showed exact pattern
mismatch before this fix.
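
The two XRD patterns quoted above can be checked directly. This sketch uses
Go's regexp package rather than the apiserver's OpenAPI validator, and the
variable names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Pattern quoted for spec.applications[].app.
	appPattern = regexp.MustCompile(`^[a-z0-9][a-z0-9-]{0,62}$`)
	// Pattern quoted for tierRoleRef.
	tierRoleRefPattern = regexp.MustCompile(`^openova:tier-(viewer|developer|operator|admin|owner)$`)
)

func main() {
	fmt.Println("app \"*\" valid:", appPattern.MatchString("*"))
	fmt.Println("tierRoleRef valid:", tierRoleRefPattern.MatchString("openova:tier-owner"))
}
```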

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chroot): POST /api/v1/sovereign/secondary-kubeconfig (D16 PR A)

D16 multi-cluster fan-out requires the chroot's k8sCache.Factory to
have all 3 regions' kubeconfigs registered so dashboard handler's
per-cluster h.k8sCache.List(clusterID, ...) enumerates pods from each.

Today the chroot only auto-registers its own in-cluster apiserver via
FactoryFromEnv's chroot self-registration branch. Secondary
kubeconfigs live on the mothership PVC + aren't replicated.

This handler bridges the gap:
- Accepts JSON {deploymentId, regionKey, kubeconfigYaml}
- Validates ids via ^[a-z0-9][a-z0-9-]{0,62}$ pattern (defense in
  depth — filename composed from these)
- Writes kubeconfig 0o600 to /var/lib/catalyst/kubeconfigs/<depID>-<region>.yaml
  (canonical FactoryFromEnv path so restart re-registers)
- Calls k8sCache.AddCluster — idempotent per Factory contract
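
The validate-then-write steps listed above could look roughly like the sketch below. The id pattern, the `<depID>-<region>.yaml` filename shape and the 0o600 mode come from the list; the function names `kubeconfigPath` and `writeKubeconfig` are hypothetical, the directory is passed in rather than hard-coded, and the k8sCache.AddCluster call is left out because its signature is not shown here.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"regexp"
)

// idPattern is the pattern quoted in the list above.
var idPattern = regexp.MustCompile(`^[a-z0-9][a-z0-9-]{0,62}$`)

// kubeconfigPath validates both ids before composing the filename, so a value
// such as "../etc" can never reach the filesystem path.
func kubeconfigPath(dir, deploymentID, regionKey string) (string, error) {
	if !idPattern.MatchString(deploymentID) {
		return "", fmt.Errorf("invalid deploymentId")
	}
	if !idPattern.MatchString(regionKey) {
		return "", fmt.Errorf("invalid regionKey")
	}
	return filepath.Join(dir, deploymentID+"-"+regionKey+".yaml"), nil
}

// writeKubeconfig persists the bytes with owner-only permissions. The bytes
// are never included in an error or log message.
func writeKubeconfig(dir, deploymentID, regionKey string, kubeconfig []byte) (string, error) {
	path, err := kubeconfigPath(dir, deploymentID, regionKey)
	if err != nil {
		return "", err
	}
	if err := os.WriteFile(path, kubeconfig, 0o600); err != nil {
		return "", fmt.Errorf("write kubeconfig: %w", err)
	}
	return path, nil
}

func main() {
	dir, err := os.MkdirTemp("", "kubeconfigs")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	path, err := writeKubeconfig(dir, "dep123", "hel1", []byte("apiVersion: v1\n"))
	fmt.Println(filepath.Base(path), err)

	_, err = writeKubeconfig(dir, "../etc", "hel1", nil)
	fmt.Println(err)
}
```

Validating before composing the path is what makes the regex a defense-in-depth measure: the filename is built only from strings that cannot contain a path separator or a dot.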

PR B (next): mothership-side handover hook iterates secondary regions
and POSTs each kubeconfig to the chroot.

PR C (next): dashboard.go fan-out across all registered cluster IDs
when group_by includes cluster/region.

Per docs/INVIOLABLE-PRINCIPLES.md #10 kubeconfig bytes never enter a
logged struct + are written 0o600.

Memo: feedback_d16_dashboard_multi_cluster_fan_out.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(dashboard): multi-cluster fan-out when group_by=cluster|region (D16 PR C)

When group_by includes "cluster" or "region", enumerate ALL registered
k8sCache clusters (primary + secondaries synced via PR #1579's POST
/api/v1/sovereign/secondary-kubeconfig endpoint) and concatenate
podRows from each before aggregation.

Layer-1=Cluster on /dashboard now renders 3 bubbles on a 3-region
Sovereign (was 1 bubble before).

For group_by that ONLY contains {namespace,family,application,vcluster,
sovereign} the primary clusterID's pods are sufficient and faster — no
fan-out cost.
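
A sketch of the fan-out decision described above, assuming group_by arrives as a slice of dimension names. `needsFanOut`, `clustersToQuery` and `collectPodRows` are hypothetical names, and pod rows are modelled as plain strings; the real handler's types are not shown in this log.

```go
package main

import "fmt"

// needsFanOut reports whether any group_by dimension requires pods from every
// registered cluster rather than only the primary.
func needsFanOut(groupBy []string) bool {
	for _, dim := range groupBy {
		if dim == "cluster" || dim == "region" {
			return true
		}
	}
	return false
}

// clustersToQuery picks all registered clusters on fan-out, else the primary.
func clustersToQuery(groupBy []string, primary string, registered []string) []string {
	if needsFanOut(groupBy) {
		return registered
	}
	return []string{primary}
}

// collectPodRows concatenates the rows listed from each cluster, in order.
func collectPodRows(clusterIDs []string, list func(clusterID string) []string) []string {
	var rows []string
	for _, id := range clusterIDs {
		rows = append(rows, list(id)...)
	}
	return rows
}

func main() {
	registered := []string{"primary", "secondary-a", "secondary-b"}
	list := func(id string) []string { return []string{id + "/pod-0"} }

	fmt.Println(collectPodRows(clustersToQuery([]string{"cluster"}, "primary", registered), list))
	fmt.Println(collectPodRows(clustersToQuery([]string{"namespace"}, "primary", registered), list))
}
```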

PR B (mothership-side handover hook to POST each secondary kubeconfig)
will complete the chain. Until then, secondaries don't appear in
k8sCache.Clusters() so this fan-out is a no-op on existing provs — but
the code is in place for when PR B lands.

Memo: feedback_d16_dashboard_multi_cluster_fan_out.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(handover): export secondary kubeconfigs to chroot at handover (D16 PR B)

Closes the D16 multi-cluster fan-out chain:
- PR #1579 (PR A): chroot endpoint accepts kubeconfigs
- PR #1580 (PR C): dashboard handler fans out across registered clusters
- This PR (PR B): mothership-side hook iterates secondary regions at
  handover, reads each region's kubeconfig from the mothership PVC,
  and POSTs to the chroot's endpoint

After handover-fire, exportSecondaryKubeconfigsToChild fires as a
goroutine (alongside exportDeploymentToChild). Best-effort per region:
a failure on region N doesn't abort N+1.
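
The best-effort behaviour can be sketched as a loop that records a failure and moves on. The name `exportSecondaries`, the `post` callback and the region keys below are hypothetical stand-ins; the real exportSecondaryKubeconfigsToChild is not reproduced here.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errUnreachable is a stand-in error used only by this example.
var errUnreachable = errors.New("chroot unreachable")

// exportSecondaries calls post for every region key and keeps going after a
// failure, returning the per-region errors so the caller can log them.
func exportSecondaries(regionKeys []string, post func(regionKey string) error) map[string]error {
	failed := make(map[string]error)
	for _, key := range regionKeys {
		if err := post(key); err != nil {
			failed[key] = err // record and continue with the next region
		}
	}
	return failed
}

func main() {
	post := func(key string) error {
		if key == "region-b" {
			return errUnreachable
		}
		return nil
	}

	// Run in a goroutine so the caller is not blocked, as the message describes.
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		failed := exportSecondaries([]string{"region-a", "region-b", "region-c"}, post)
		fmt.Println(len(failed), failed["region-b"])
	}()
	wg.Wait()
}
```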

The chroot's k8sCache.Factory.AddCluster runs on every POST so
dashboard /api/v1/dashboard/treemap?group_by=cluster|region now
enumerates pods from all N regions and Layer-1=Cluster renders N
bubbles on an N-region Sovereign.

regionKeysForExport derives the filename convention `<region>-<slot>`
from dep.Request.Regions[1:] (primary is auto-registered by the
chroot's FactoryFromEnv self-registration so we skip index 0).

Per docs/INVIOLABLE-PRINCIPLES.md #10 kubeconfig bytes never enter a
logged struct + are read with stdlib os.ReadFile.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 08:22:01 +04:00
e3mrah
d92f734374
feat(dashboard): multi-cluster fan-out when group_by=cluster|region (D16 PR C) (#1580)
* fix(cloudinit): escape $$\{ORG_EMAIL:-\}/$$\{ORG_NAME:-\} in comment (D22)

PR #1571 added a comment mentioning the ${ORG_EMAIL:-}/${ORG_NAME:-}
slot-file placeholders WITHOUT the $$ escape. tofu's templatefile()
parses comments too and tried to interpolate ${ORG_EMAIL:-} as a tofu
expression, failing with "Extra characters after interpolation
expression; Template interpolation doesn't expect a colon".

Caught live on t133 fad01d84f5655004 — tofu plan failed in 30s.

The escape pattern is documented at main.tf:1029 (the same warning
that caught t127 last week). The $$ prefix tells tofu's templatefile()
to emit a literal ${...} to cloud-init for Flux envsubst.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(parent-domains): short-circuit pdmFlipNS when NS already matches (D30)

When an sme-pool domain's current NS records already match the expected
[ns1.<primary>, ns2.<primary>] pair (because the operator already
delegated the domain to OpenOva's PowerDNS), the PDM registrar-flip
step is a no-op. Skipping avoids:

  1. Burning a Dynadot API credit on a flip that would be idempotent.
  2. The D30 blocker — current Dynadot creds return pdm-status-401
     even when the desired NS state already exists. Caught on t132
     2026-05-16 day-2 add + t134 2026-05-17 fresh-prov body
     parentDomains attempt.

Adds nsAlreadyMatches() helper using net.DefaultResolver.LookupNS with
a 5s timeout. False on lookup error or partial match → fall through to
the original PDM pipeline so a misconfigured/partial domain still goes
through the registrar API.

This unblocks sme-pool entries for omani.homes (already pointing at
ns1/2/3.openova.io). omani.rest / omani.trades still go through the
full flip path because their NS records don't yet match expected.
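
A rough sketch of such a helper, using net.DefaultResolver.LookupNS with the 5s timeout mentioned above. The matching rule is an assumption: here a domain matches when every expected nameserver appears in the resolved set, compared case-insensitively and without the trailing dot. The signature and the `containsAll` helper are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"strings"
	"time"
)

// normalize lowercases a hostname and drops the trailing root dot.
func normalize(host string) string {
	return strings.ToLower(strings.TrimSuffix(host, "."))
}

// containsAll reports whether every expected host appears in got.
// An empty expected list never matches.
func containsAll(got, expected []string) bool {
	if len(expected) == 0 {
		return false
	}
	have := make(map[string]bool, len(got))
	for _, g := range got {
		have[normalize(g)] = true
	}
	for _, e := range expected {
		if !have[normalize(e)] {
			return false // partial match: fall through to the registrar flip
		}
	}
	return true
}

// nsAlreadyMatches returns false on any lookup error so the caller falls back
// to the original registrar pipeline.
func nsAlreadyMatches(ctx context.Context, domain string, expected []string) bool {
	ctx, cancel := context.WithTimeout(ctx, 5*time.Second)
	defer cancel()

	records, err := net.DefaultResolver.LookupNS(ctx, domain)
	if err != nil {
		return false
	}
	hosts := make([]string, 0, len(records))
	for _, r := range records {
		hosts = append(hosts, r.Host)
	}
	return containsAll(hosts, expected)
}

func main() {
	// Offline demonstration of the comparison; nsAlreadyMatches needs DNS.
	resolved := []string{"NS1.example.test.", "ns2.example.test.", "ns3.example.test."}
	fmt.Println(containsAll(resolved, []string{"ns1.example.test", "ns2.example.test"}))
	fmt.Println(containsAll(resolved[:1], []string{"ns1.example.test", "ns2.example.test"}))
}
```

Returning false on every error path keeps the short-circuit strictly an optimisation: the worst case is the same registrar call that ran before this change.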

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(handover): D21 owner seed uses catalyst-system namespace

PR #1564 created the owner UserAccess CR with .Namespace("") — the
apiserver returned "could not find the requested resource" because
useraccesses.access.openova.io is NAMESPACED (Crossplane Claim per
the XRD's claimNames block at platform/crossplane-claims/chart/
templates/xrds/useraccess.yaml).

Pin to catalyst-system (where catalyst-api + every Catalyst-authored
CR lives) and stamp the namespace on the object too. The existing
ListUserAccess handler uses Namespace("") so the entry surfaces on
/users without per-namespace filtering.

Verified the CRD shape on t134 2026-05-17:
  $ kubectl api-resources --api-group=access.openova.io
  useraccesses   access.openova.io/v1alpha1   true   UserAccess
                                                ^^^^
                                                NAMESPACED

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(handover): D21 owner seed uses tierRoleRef not wildcard app

PR #1564 + #1577 created the CR shape with applications=[{app:"*",...}]
but the useraccess XRD schema rejects `app: "*"` (pattern
^[a-z0-9][a-z0-9-]{0,62}$). The seed handler logged
"spec.applications[0].app: Invalid value: \"*\"" on every handover.

The XRD has a `tierRoleRef` field (pattern
^openova:tier-(viewer|developer|operator|admin|owner)$) that's the
canonical owner-tier semantic — when set, useraccess-controller binds
the named ClusterRole on the target via RoleBinding/ClusterRoleBinding.
`openova:tier-owner` is shipped by EPIC-3 (#1098) slice T1's
tier-clusterroles.yaml.

Drop the applications[] block + use tierRoleRef = openova:tier-owner.
Verified live on t135 2026-05-17 — error log showed exact pattern
mismatch before this fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chroot): POST /api/v1/sovereign/secondary-kubeconfig (D16 PR A)

D16 multi-cluster fan-out requires the chroot's k8sCache.Factory to
have all 3 regions' kubeconfigs registered so dashboard handler's
per-cluster h.k8sCache.List(clusterID, ...) enumerates pods from each.

Today the chroot only auto-registers its own in-cluster apiserver via
FactoryFromEnv's chroot self-registration branch. Secondary
kubeconfigs live on the mothership PVC + aren't replicated.

This handler bridges the gap:
- Accepts JSON {deploymentId, regionKey, kubeconfigYaml}
- Validates ids via ^[a-z0-9][a-z0-9-]{0,62}$ pattern (defense in
  depth — filename composed from these)
- Writes kubeconfig 0o600 to /var/lib/catalyst/kubeconfigs/<depID>-<region>.yaml
  (canonical FactoryFromEnv path so restart re-registers)
- Calls k8sCache.AddCluster — idempotent per Factory contract

PR B (next): mothership-side handover hook iterates secondary regions
and POSTs each kubeconfig to the chroot.

PR C (next): dashboard.go fan-out across all registered cluster IDs
when group_by includes cluster/region.

Per docs/INVIOLABLE-PRINCIPLES.md #10 kubeconfig bytes never enter a
logged struct + are written 0o600.

Memo: feedback_d16_dashboard_multi_cluster_fan_out.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(dashboard): multi-cluster fan-out when group_by=cluster|region (D16 PR C)

When group_by includes "cluster" or "region", enumerate ALL registered
k8sCache clusters (primary + secondaries synced via PR #1579's POST
/api/v1/sovereign/secondary-kubeconfig endpoint) and concatenate
podRows from each before aggregation.

Layer-1=Cluster on /dashboard now renders 3 bubbles on a 3-region
Sovereign (was 1 bubble before).

For group_by that ONLY contains {namespace,family,application,vcluster,
sovereign} the primary clusterID's pods are sufficient and faster — no
fan-out cost.

PR B (mothership-side handover hook to POST each secondary kubeconfig)
will complete the chain. Until then, secondaries don't appear in
k8sCache.Clusters() so this fan-out is a no-op on existing provs — but
the code is in place for when PR B lands.

Memo: feedback_d16_dashboard_multi_cluster_fan_out.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 08:07:26 +04:00
e3mrah
bcab6430cb
feat(chroot): POST /api/v1/sovereign/secondary-kubeconfig (D16 PR A) (#1579)
* fix(cloudinit): escape $$\{ORG_EMAIL:-\}/$$\{ORG_NAME:-\} in comment (D22)

PR #1571 added a comment mentioning the ${ORG_EMAIL:-}/${ORG_NAME:-}
slot-file placeholders WITHOUT the $$ escape. tofu's templatefile()
parses comments too and tried to interpolate ${ORG_EMAIL:-} as a tofu
expression, failing with "Extra characters after interpolation
expression; Template interpolation doesn't expect a colon".

Caught live on t133 fad01d84f5655004 — tofu plan failed in 30s.

The escape pattern is documented at main.tf:1029 (the same warning
that caught t127 last week). The $$ prefix tells tofu's templatefile()
to emit a literal ${...} to cloud-init for Flux envsubst.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(parent-domains): short-circuit pdmFlipNS when NS already matches (D30)

When an sme-pool domain's current NS records already match the expected
[ns1.<primary>, ns2.<primary>] pair (because the operator already
delegated the domain to OpenOva's PowerDNS), the PDM registrar-flip
step is a no-op. Skipping avoids:

  1. Burning a Dynadot API credit on a flip that would be idempotent.
  2. The D30 blocker — current Dynadot creds return pdm-status-401
     even when the desired NS state already exists. Caught on t132
     2026-05-16 day-2 add + t134 2026-05-17 fresh-prov body
     parentDomains attempt.

Adds nsAlreadyMatches() helper using net.DefaultResolver.LookupNS with
a 5s timeout. False on lookup error or partial match → fall through to
the original PDM pipeline so a misconfigured/partial domain still goes
through the registrar API.

This unblocks sme-pool entries for omani.homes (already pointing at
ns1/2/3.openova.io). omani.rest / omani.trades still go through the
full flip path because their NS records don't yet match expected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(handover): D21 owner seed uses catalyst-system namespace

PR #1564 created the owner UserAccess CR with .Namespace("") — the
apiserver returned "could not find the requested resource" because
useraccesses.access.openova.io is NAMESPACED (Crossplane Claim per
the XRD's claimNames block at platform/crossplane-claims/chart/
templates/xrds/useraccess.yaml).

Pin to catalyst-system (where catalyst-api + every Catalyst-authored
CR lives) and stamp the namespace on the object too. The existing
ListUserAccess handler uses Namespace("") so the entry surfaces on
/users without per-namespace filtering.

Verified the CRD shape on t134 2026-05-17:
  $ kubectl api-resources --api-group=access.openova.io
  useraccesses   access.openova.io/v1alpha1   true   UserAccess
                                                ^^^^
                                                NAMESPACED

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(handover): D21 owner seed uses tierRoleRef not wildcard app

PR #1564 + #1577 created the CR shape with applications=[{app:"*",...}]
but the useraccess XRD schema rejects `app: "*"` (pattern
^[a-z0-9][a-z0-9-]{0,62}$). The seed handler logged
"spec.applications[0].app: Invalid value: \"*\"" on every handover.

The XRD has a `tierRoleRef` field (pattern
^openova:tier-(viewer|developer|operator|admin|owner)$) that's the
canonical owner-tier semantic — when set, useraccess-controller binds
the named ClusterRole on the target via RoleBinding/ClusterRoleBinding.
`openova:tier-owner` is shipped by EPIC-3 (#1098) slice T1's
tier-clusterroles.yaml.

Drop the applications[] block + use tierRoleRef = openova:tier-owner.
Verified live on t135 2026-05-17 — error log showed exact pattern
mismatch before this fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chroot): POST /api/v1/sovereign/secondary-kubeconfig (D16 PR A)

D16 multi-cluster fan-out requires the chroot's k8sCache.Factory to
have all 3 regions' kubeconfigs registered so dashboard handler's
per-cluster h.k8sCache.List(clusterID, ...) enumerates pods from each.

Today the chroot only auto-registers its own in-cluster apiserver via
FactoryFromEnv's chroot self-registration branch. Secondary
kubeconfigs live on the mothership PVC + aren't replicated.

This handler bridges the gap:
- Accepts JSON {deploymentId, regionKey, kubeconfigYaml}
- Validates ids via ^[a-z0-9][a-z0-9-]{0,62}$ pattern (defense in
  depth — filename composed from these)
- Writes kubeconfig 0o600 to /var/lib/catalyst/kubeconfigs/<depID>-<region>.yaml
  (canonical FactoryFromEnv path so restart re-registers)
- Calls k8sCache.AddCluster — idempotent per Factory contract

PR B (next): mothership-side handover hook iterates secondary regions
and POSTs each kubeconfig to the chroot.

PR C (next): dashboard.go fan-out across all registered cluster IDs
when group_by includes cluster/region.

Per docs/INVIOLABLE-PRINCIPLES.md #10 kubeconfig bytes never enter a
logged struct + are written 0o600.

Memo: feedback_d16_dashboard_multi_cluster_fan_out.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 08:06:08 +04:00
e3mrah
4f62dd21b3
fix(handover): D21 owner seed uses tierRoleRef not wildcard app (#1578)
* fix(cloudinit): escape $$\{ORG_EMAIL:-\}/$$\{ORG_NAME:-\} in comment (D22)

PR #1571 added a comment mentioning the ${ORG_EMAIL:-}/${ORG_NAME:-}
slot-file placeholders WITHOUT the $$ escape. tofu's templatefile()
parses comments too and tried to interpolate ${ORG_EMAIL:-} as a tofu
expression, failing with "Extra characters after interpolation
expression; Template interpolation doesn't expect a colon".

Caught live on t133 fad01d84f5655004 — tofu plan failed in 30s.

The escape pattern is documented at main.tf:1029 (the same warning
that caught t127 last week). The $$ prefix tells tofu's templatefile()
to emit a literal ${...} to cloud-init for Flux envsubst.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(parent-domains): short-circuit pdmFlipNS when NS already matches (D30)

When an sme-pool domain's current NS records already match the expected
[ns1.<primary>, ns2.<primary>] pair (because the operator already
delegated the domain to OpenOva's PowerDNS), the PDM registrar-flip
step is a no-op. Skipping avoids:

  1. Burning a Dynadot API credit on a flip that would be idempotent.
  2. The D30 blocker — current Dynadot creds return pdm-status-401
     even when the desired NS state already exists. Caught on t132
     2026-05-16 day-2 add + t134 2026-05-17 fresh-prov body
     parentDomains attempt.

Adds nsAlreadyMatches() helper using net.DefaultResolver.LookupNS with
a 5s timeout. False on lookup error or partial match → fall through to
the original PDM pipeline so a misconfigured/partial domain still goes
through the registrar API.

This unblocks sme-pool entries for omani.homes (already pointing at
ns1/2/3.openova.io). omani.rest / omani.trades still go through the
full flip path because their NS records don't yet match expected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(handover): D21 owner seed uses catalyst-system namespace

PR #1564 created the owner UserAccess CR with .Namespace("") — the
apiserver returned "could not find the requested resource" because
useraccesses.access.openova.io is NAMESPACED (Crossplane Claim per
the XRD's claimNames block at platform/crossplane-claims/chart/
templates/xrds/useraccess.yaml).

Pin to catalyst-system (where catalyst-api + every Catalyst-authored
CR lives) and stamp the namespace on the object too. The existing
ListUserAccess handler uses Namespace("") so the entry surfaces on
/users without per-namespace filtering.

Verified the CRD shape on t134 2026-05-17:
  $ kubectl api-resources --api-group=access.openova.io
  useraccesses   access.openova.io/v1alpha1   true   UserAccess
                                                ^^^^
                                                NAMESPACED

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(handover): D21 owner seed uses tierRoleRef not wildcard app

PR #1564 + #1577 created the CR shape with applications=[{app:"*",...}]
but the useraccess XRD schema rejects `app: "*"` (pattern
^[a-z0-9][a-z0-9-]{0,62}$). The seed handler logged
"spec.applications[0].app: Invalid value: \"*\"" on every handover.

The XRD has a `tierRoleRef` field (pattern
^openova:tier-(viewer|developer|operator|admin|owner)$) that's the
canonical owner-tier semantic — when set, useraccess-controller binds
the named ClusterRole on the target via RoleBinding/ClusterRoleBinding.
`openova:tier-owner` is shipped by EPIC-3 (#1098) slice T1's
tier-clusterroles.yaml.

Drop the applications[] block + use tierRoleRef = openova:tier-owner.
Verified live on t135 2026-05-17 — error log showed exact pattern
mismatch before this fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 04:08:45 +04:00
e3mrah
ea30ded120
fix(handover): D21 owner seed uses catalyst-system namespace (#1577)
* fix(cloudinit): escape $$\{ORG_EMAIL:-\}/$$\{ORG_NAME:-\} in comment (D22)

PR #1571 added a comment mentioning the ${ORG_EMAIL:-}/${ORG_NAME:-}
slot-file placeholders WITHOUT the $$ escape. tofu's templatefile()
parses comments too and tried to interpolate ${ORG_EMAIL:-} as a tofu
expression, failing with "Extra characters after interpolation
expression; Template interpolation doesn't expect a colon".

Caught live on t133 fad01d84f5655004 — tofu plan failed in 30s.

The escape pattern is documented at main.tf:1029 (the same warning
that caught t127 last week). The $$ prefix tells tofu's templatefile()
to emit a literal ${...} to cloud-init for Flux envsubst.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(parent-domains): short-circuit pdmFlipNS when NS already matches (D30)

When an sme-pool domain's current NS records already match the expected
[ns1.<primary>, ns2.<primary>] pair (because the operator already
delegated the domain to OpenOva's PowerDNS), the PDM registrar-flip
step is a no-op. Skipping avoids:

  1. Burning a Dynadot API credit on a flip that would be idempotent.
  2. The D30 blocker — current Dynadot creds return pdm-status-401
     even when the desired NS state already exists. Caught on t132
     2026-05-16 day-2 add + t134 2026-05-17 fresh-prov body
     parentDomains attempt.

Adds nsAlreadyMatches() helper using net.DefaultResolver.LookupNS with
a 5s timeout. False on lookup error or partial match → fall through to
the original PDM pipeline so a misconfigured/partial domain still goes
through the registrar API.

This unblocks sme-pool entries for omani.homes (already pointing at
ns1/2/3.openova.io). omani.rest / omani.trades still go through the
full flip path because their NS records don't yet match expected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(handover): D21 owner seed uses catalyst-system namespace

PR #1564 created the owner UserAccess CR with .Namespace("") — the
apiserver returned "could not find the requested resource" because
useraccesses.access.openova.io is NAMESPACED (Crossplane Claim per
the XRD's claimNames block at platform/crossplane-claims/chart/
templates/xrds/useraccess.yaml).

Pin to catalyst-system (where catalyst-api + every Catalyst-authored
CR lives) and stamp the namespace on the object too. The existing
ListUserAccess handler uses Namespace("") so the entry surfaces on
/users without per-namespace filtering.

Verified the CRD shape on t134 2026-05-17:
  $ kubectl api-resources --api-group=access.openova.io
  useraccesses   access.openova.io/v1alpha1   true   UserAccess
                                                ^^^^
                                                NAMESPACED

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 03:26:06 +04:00
e3mrah
33ed484e04
fix(parent-domains): short-circuit pdmFlipNS when NS already matches (D30) (#1576)
* fix(cloudinit): escape $$\{ORG_EMAIL:-\}/$$\{ORG_NAME:-\} in comment (D22)

PR #1571 added a comment mentioning the ${ORG_EMAIL:-}/${ORG_NAME:-}
slot-file placeholders WITHOUT the $$ escape. tofu's templatefile()
parses comments too and tried to interpolate ${ORG_EMAIL:-} as a tofu
expression, failing with "Extra characters after interpolation
expression; Template interpolation doesn't expect a colon".

Caught live on t133 fad01d84f5655004 — tofu plan failed in 30s.

The escape pattern is documented at main.tf:1029 (the same warning
that caught t127 last week). The $$ prefix tells tofu's templatefile()
to emit a literal ${...} to cloud-init for Flux envsubst.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(parent-domains): short-circuit pdmFlipNS when NS already matches (D30)

When an sme-pool domain's current NS records already match the expected
[ns1.<primary>, ns2.<primary>] pair (because the operator already
delegated the domain to OpenOva's PowerDNS), the PDM registrar-flip
step is a no-op. Skipping avoids:

  1. Burning a Dynadot API credit on a flip that would be idempotent.
  2. The D30 blocker — current Dynadot creds return pdm-status-401
     even when the desired NS state already exists. Caught on t132
     2026-05-16 day-2 add + t134 2026-05-17 fresh-prov body
     parentDomains attempt.

Adds nsAlreadyMatches() helper using net.DefaultResolver.LookupNS with
a 5s timeout. False on lookup error or partial match → fall through to
the original PDM pipeline so a misconfigured/partial domain still goes
through the registrar API.

This unblocks sme-pool entries for omani.homes (already pointing at
ns1/2/3.openova.io). omani.rest / omani.trades still go through the
full flip path because their NS records don't yet match expected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 03:21:42 +04:00
e3mrah
3568b72b5e
fix(cloud): hide non-active 0/0 chips (D15) (#1574)
* feat(chart): wire OPERATOR_EMAIL/CONTROL_PLANE_IP/GITOPS_REPO_URL/ORG_NAME (D22)

Companion to PR #1567 + #1568 — wire the env vars chrootEnsureDeployment
reads to populate the deployment record so Sovereign Console Settings
page renders real values for ownerEmail, controlPlaneIP, gitopsRepoURL,
orgName (instead of `—` placeholders).

Adds 4 new keys to the sovereign-fqdn ConfigMap (orgEmail, orgName,
controlPlaneIP, gitopsRepoURL) sourced from .Values.sovereign.* with
empty defaults. Per-Sovereign overlays wire actual values from cloud-
init substitute placeholders (mirrors regionsJson pattern).

Catalyst-api Pod now reads them via valueFrom configMapKeyRef +
optional=true (Catalyst-Zero/contabo emits no sovereign-fqdn ConfigMap
so env stays empty there — correct, mothership is signer not validator).

Validated: t132 already serves region=hel1, consoleURL, loadBalancerIP
post-#1568. This PR fills the remaining 3 D22 fields when operator wires
the values.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(slot-13): add D22 sovereign-side identity placeholders

Add ${ORG_EMAIL:-} + ${ORG_NAME:-} + ${SOVEREIGN_CONTROL_PLANE_IP:-} +
${GITOPS_REPO_URL:-} envsubst placeholders so when cloud-init wires
them, the chart picks them up via sovereign-fqdn ConfigMap (PR #1569)
→ catalyst-api env → chrootEnsureDeployment populates the deployment
record → Settings page renders real values instead of `—`.

This PR alone is a no-op (placeholders default to empty, same as today).
The cloud-init substitute lines + provisioner.go tfvars need to land in
a companion PR to actually populate the values on next-prov.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cloudinit): wire ORG_EMAIL/ORG_NAME/GITOPS_REPO_URL substitutes (D22)

Companion to #1567+#1568+#1569+#1570 — the cloud-init substitute block
now emits ORG_EMAIL/ORG_NAME/GITOPS_REPO_URL into the bootstrap-kit
Kustomization's postBuild.substitute env, which the slot-13 placeholders
(#1570) consume via ${ORG_EMAIL:-}/${ORG_NAME:-}/${GITOPS_REPO_URL:-}.

Chain: provisioner.go writeTfvars → tofu vars → cloudinit templatefile
substitute → Flux Kustomization postBuild → sovereign-fqdn ConfigMap
keys (#1569) → catalyst-api env (#1569) → chrootEnsureDeployment
populates the deployment record (#1567 + #1568 fallback).

SOVEREIGN_CONTROL_PLANE_IP omitted intentionally — main.tf:691 notes
the dependency cycle (hcloud_server.cp doesn't exist at cloudinit
render time). Separate PR will source it via metadata-service or
post-create ConfigMap patch.

Next-prov (t133+) Sovereign Console Settings page now renders real
ownerEmail/orgName/gitopsRepoURL instead of `—` placeholders.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(router): chroot /app/<name> only-redirect mothership-only sub-paths (D17/D17b)

PR #1552 stripped the `/app` prefix on Sovereign mode to make
`/app/bp-cnpg` → `/bp-cnpg`, hoping consoleAppDetailRoute would match.
But consoleAppDetailRoute is registered at `/app/$componentId` under
consoleLayoutRoute — no chroot route matches `/<componentId>` directly,
so stripping leaves an empty render path. Playwright walkthrough on
t132 2026-05-17 confirmed: /app/bp-cnpg + /app/bp-coraza both render
body_len=9 (empty).

Invert the logic: only redirect mothership-only sub-paths (/dashboard
Fleet view, /install wizard, /sre, /sec, /blueprints) which have no
Sovereign Console equivalent. For everything else (component names like
`/app/bp-cnpg`, bare `/app`), let TanStack's natural most-specific-match
pick consoleAppDetailRoute / consoleAppsRoute.

Caught live on t132 via Playwright walker3.js — agent a4825c5a.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(handover): re-mint handover JWT on every GetDeployment (D0)

D0 Playwright walkthrough on t132 2026-05-17 caught: handoverURL
persisted at handover-fire time carries a JWT that expires per
DefaultTTL (5min). Operators who click /jobs hours later get the stale
token → Sovereign-side /auth/handover rejects with raw JSON
{"error":"invalid token"} — no UI fallback, no /auth/handover-error,
auto-redirect to /dashboard never fires.

Re-mint the JWT on every GetDeployment when deployment is ready +
handover-fired so the URL returned to the wizard is always
freshly-signed.

Best-effort: on mint failure, leave the existing URL in place so a
transient signer error doesn't break polling. Helper is idempotent +
locked.
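
The re-mint rule above might be sketched like this. The `deployment` struct, its field names and the `mint` callback are hypothetical; only the behaviour (re-mint when ready and handover-fired, keep the old URL on mint failure, hold a lock) follows the message.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errSignerDown is a stand-in error used only by this example.
var errSignerDown = errors.New("signer unavailable")

// deployment is a hypothetical record holding the persisted handover URL.
type deployment struct {
	mu            sync.Mutex
	ready         bool
	handoverFired bool
	handoverURL   string
}

// refreshHandoverURL swaps in a freshly minted URL when the deployment is
// ready and handover has fired. On mint failure the existing URL is kept so
// a transient signer error does not break polling.
func (d *deployment) refreshHandoverURL(mint func() (string, error)) string {
	d.mu.Lock()
	defer d.mu.Unlock()

	if !d.ready || !d.handoverFired {
		return d.handoverURL
	}
	fresh, err := mint()
	if err != nil {
		return d.handoverURL // best-effort: leave the old URL in place
	}
	d.handoverURL = fresh
	return d.handoverURL
}

func main() {
	d := &deployment{ready: true, handoverFired: true, handoverURL: "https://console.example.test/auth/handover?token=old"}

	ok := func() (string, error) {
		return "https://console.example.test/auth/handover?token=new", nil
	}
	down := func() (string, error) { return "", errSignerDown }

	fmt.Println(d.refreshHandoverURL(ok))
	fmt.Println(d.refreshHandoverURL(down))
}
```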

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cloud): hide non-active 0/0 chips (D15)

Playwright walkthrough on t132 2026-05-17 caught D15 PARTIAL: 15 chips
are correct but Bucket+Volume show 0/0. Founder rule (DoD D15):
"No kind chip shows 0/0 for a resource that actually exists in the
cluster". Bucket+Volume genuinely don't exist on this Sovereign so
showing 0/0 is noise.

Hide chips with count exactly 0 unless they're the active selection
(operator who navigated to an empty kind keeps context).
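
The hide rule reduces to a small filter. It is sketched in Go here purely to illustrate the logic, since the UI code itself is not shown in this log; the `chip` type and `visibleChips` name are hypothetical.

```go
package main

import "fmt"

// chip is a hypothetical kind chip with its resource count.
type chip struct {
	Kind  string
	Count int
}

// visibleChips drops chips whose count is exactly 0 unless the chip is the
// active selection, so an operator on an empty kind keeps context.
func visibleChips(chips []chip, activeKind string) []chip {
	out := make([]chip, 0, len(chips))
	for _, c := range chips {
		if c.Count == 0 && c.Kind != activeKind {
			continue
		}
		out = append(out, c)
	}
	return out
}

func main() {
	chips := []chip{{"Pod", 42}, {"Bucket", 0}, {"Volume", 0}}
	fmt.Println(visibleChips(chips, "Pod"))
	fmt.Println(visibleChips(chips, "Bucket"))
}
```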

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 02:18:24 +04:00
e3mrah
58dbb92f4f
fix(handover): re-mint handover JWT on every GetDeployment (D0) (#1573)
* feat(chart): wire OPERATOR_EMAIL/CONTROL_PLANE_IP/GITOPS_REPO_URL/ORG_NAME (D22)

Companion to PR #1567 + #1568 — wire the env vars chrootEnsureDeployment
reads to populate the deployment record so Sovereign Console Settings
page renders real values for ownerEmail, controlPlaneIP, gitopsRepoURL,
orgName (instead of `—` placeholders).

Adds 4 new keys to the sovereign-fqdn ConfigMap (orgEmail, orgName,
controlPlaneIP, gitopsRepoURL) sourced from .Values.sovereign.* with
empty defaults. Per-Sovereign overlays wire actual values from cloud-
init substitute placeholders (mirrors regionsJson pattern).

Catalyst-api Pod now reads them via valueFrom configMapKeyRef +
optional=true (Catalyst-Zero/contabo emits no sovereign-fqdn ConfigMap
so env stays empty there — correct, mothership is signer not validator).

Validated: t132 already serves region=hel1, consoleURL, loadBalancerIP
post-#1568. This PR fills the remaining 3 D22 fields when operator wires
the values.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(slot-13): add D22 sovereign-side identity placeholders

Add ${ORG_EMAIL:-} + ${ORG_NAME:-} + ${SOVEREIGN_CONTROL_PLANE_IP:-} +
${GITOPS_REPO_URL:-} envsubst placeholders so when cloud-init wires
them, the chart picks them up via sovereign-fqdn ConfigMap (PR #1569)
→ catalyst-api env → chrootEnsureDeployment populates the deployment
record → Settings page renders real values instead of `—`.

This PR alone is a no-op (placeholders default to empty, same as today).
The cloud-init substitute lines + provisioner.go tfvars need to land in
a companion PR to actually populate the values on next-prov.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cloudinit): wire ORG_EMAIL/ORG_NAME/GITOPS_REPO_URL substitutes (D22)

Companion to #1567+#1568+#1569+#1570 — the cloud-init substitute block
now emits ORG_EMAIL/ORG_NAME/GITOPS_REPO_URL into the bootstrap-kit
Kustomization's postBuild.substitute env, which the slot-13 placeholders
(#1570) consume via ${ORG_EMAIL:-}/${ORG_NAME:-}/${GITOPS_REPO_URL:-}.

Chain: provisioner.go writeTfvars → tofu vars → cloudinit templatefile
substitute → Flux Kustomization postBuild → sovereign-fqdn ConfigMap
keys (#1569) → catalyst-api env (#1569) → chrootEnsureDeployment
populates the deployment record (#1567 + #1568 fallback).

SOVEREIGN_CONTROL_PLANE_IP omitted intentionally — main.tf:691 notes
the dependency cycle (hcloud_server.cp doesn't exist at cloudinit
render time). Separate PR will source it via metadata-service or
post-create ConfigMap patch.

Next-prov (t133+) Sovereign Console Settings page now renders real
ownerEmail/orgName/gitopsRepoURL instead of `—` placeholders.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(router): chroot /app/<name> only-redirect mothership-only sub-paths (D17/D17b)

PR #1552 stripped the `/app` prefix on Sovereign mode to make
`/app/bp-cnpg` → `/bp-cnpg`, hoping consoleAppDetailRoute would match.
But consoleAppDetailRoute is registered at `/app/$componentId` under
consoleLayoutRoute — no chroot route matches `/<componentId>` directly,
so stripping leaves an empty render path. Playwright walkthrough on
t132 2026-05-17 confirmed: /app/bp-cnpg + /app/bp-coraza both render
body_len=9 (empty).

Invert the logic: only redirect mothership-only sub-paths (/dashboard
Fleet view, /install wizard, /sre, /sec, /blueprints) which have no
Sovereign Console equivalent. For everything else (component names like
`/app/bp-cnpg`, bare `/app`), let TanStack's natural most-specific-match
pick consoleAppDetailRoute / consoleAppsRoute.

Caught live on t132 via Playwright walker3.js — agent a4825c5a.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(handover): re-mint handover JWT on every GetDeployment (D0)

D0 Playwright walkthrough on t132 2026-05-17 caught: handoverURL
persisted at handover-fire time carries a JWT that expires per
DefaultTTL (5min). Operators who click /jobs hours later get the stale
token → Sovereign-side /auth/handover rejects with raw JSON
{"error":"invalid token"} — no UI fallback, no /auth/handover-error,
auto-redirect to /dashboard never fires.

Re-mint the JWT on every GetDeployment when deployment is ready +
handover-fired so the URL returned to the wizard is always
freshly-signed.

Best-effort: on mint failure, leave the existing URL in place so a
transient signer error doesn't break polling. Helper is idempotent +
locked.
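
A minimal sketch of the best-effort re-mint described above, under stated assumptions: the `deployment` struct, the `signer` function type, the `token` query parameter and both helper signers are hypothetical names for illustration, not the real catalyst-api types.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// signer is a hypothetical stand-in for whatever mints the handover JWT.
type signer func(deploymentID string) (string, error)

// deployment carries only the fields this sketch needs.
type deployment struct {
	mu            sync.Mutex
	ID            string
	Ready         bool
	HandoverFired bool
	HandoverURL   string
}

// refreshHandoverURL re-mints the token when the deployment is ready and the
// handover has fired. On a mint failure the previous URL is left in place so
// a transient signer error does not break polling.
func (d *deployment) refreshHandoverURL(base string, mint signer) string {
	d.mu.Lock()
	defer d.mu.Unlock()
	if !d.Ready || !d.HandoverFired {
		return d.HandoverURL
	}
	tok, err := mint(d.ID)
	if err != nil {
		return d.HandoverURL
	}
	// The query parameter name is an assumption made for this sketch.
	d.HandoverURL = fmt.Sprintf("%s/auth/handover?token=%s", base, tok)
	return d.HandoverURL
}

// okSigner and failingSigner are illustrative signers for the demo below.
func okSigner(id string) (string, error)      { return "fresh-" + id, nil }
func failingSigner(id string) (string, error) { return "", errors.New("signer unavailable") }

func main() {
	d := &deployment{ID: "dep-1", Ready: true, HandoverFired: true, HandoverURL: "stale"}
	fmt.Println(d.refreshHandoverURL("https://console.example.test", failingSigner))
	fmt.Println(d.refreshHandoverURL("https://console.example.test", okSigner))
}
```

Calling the helper twice with a working signer simply overwrites the URL with a newer token, which is what makes it safe to run on every GetDeployment.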

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 02:16:26 +04:00
e3mrah
9e1e4224d8
fix(router): chroot /app/<name> only-redirect mothership-only sub-paths (D17/D17b) (#1572)
* feat(chart): wire OPERATOR_EMAIL/CONTROL_PLANE_IP/GITOPS_REPO_URL/ORG_NAME (D22)

Companion to PR #1567 + #1568 — wire the env vars chrootEnsureDeployment
reads to populate the deployment record so Sovereign Console Settings
page renders real values for ownerEmail, controlPlaneIP, gitopsRepoURL,
orgName (instead of `—` placeholders).

Adds 4 new keys to the sovereign-fqdn ConfigMap (orgEmail, orgName,
controlPlaneIP, gitopsRepoURL) sourced from .Values.sovereign.* with
empty defaults. Per-Sovereign overlays wire actual values from cloud-
init substitute placeholders (mirrors regionsJson pattern).

Catalyst-api Pod now reads them via valueFrom configMapKeyRef +
optional=true (Catalyst-Zero/contabo emits no sovereign-fqdn ConfigMap
so env stays empty there — correct, mothership is signer not validator).

Validated: t132 already serves region=hel1, consoleURL, loadBalancerIP
post-#1568. This PR fills the remaining 3 D22 fields when operator wires
the values.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(slot-13): add D22 sovereign-side identity placeholders

Add ${ORG_EMAIL:-} + ${ORG_NAME:-} + ${SOVEREIGN_CONTROL_PLANE_IP:-} +
${GITOPS_REPO_URL:-} envsubst placeholders so when cloud-init wires
them, the chart picks them up via sovereign-fqdn ConfigMap (PR #1569)
→ catalyst-api env → chrootEnsureDeployment populates the deployment
record → Settings page renders real values instead of `—`.

This PR alone is a no-op (placeholders default to empty, same as today).
The cloud-init substitute lines + provisioner.go tfvars need to land in
a companion PR to actually populate the values on next-prov.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cloudinit): wire ORG_EMAIL/ORG_NAME/GITOPS_REPO_URL substitutes (D22)

Companion to #1567+#1568+#1569+#1570 — the cloud-init substitute block
now emits ORG_EMAIL/ORG_NAME/GITOPS_REPO_URL into the bootstrap-kit
Kustomization's postBuild.substitute env, which the slot-13 placeholders
(#1570) consume via ${ORG_EMAIL:-}/${ORG_NAME:-}/${GITOPS_REPO_URL:-}.

Chain: provisioner.go writeTfvars → tofu vars → cloudinit templatefile
substitute → Flux Kustomization postBuild → sovereign-fqdn ConfigMap
keys (#1569) → catalyst-api env (#1569) → chrootEnsureDeployment
populates the deployment record (#1567 + #1568 fallback).

SOVEREIGN_CONTROL_PLANE_IP omitted intentionally — main.tf:691 notes
the dependency cycle (hcloud_server.cp doesn't exist at cloudinit
render time). Separate PR will source it via metadata-service or
post-create ConfigMap patch.

Next-prov (t133+) Sovereign Console Settings page now renders real
ownerEmail/orgName/gitopsRepoURL instead of `—` placeholders.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(router): chroot /app/<name> only-redirect mothership-only sub-paths (D17/D17b)

PR #1552 stripped the `/app` prefix on Sovereign mode to make
`/app/bp-cnpg` → `/bp-cnpg`, hoping consoleAppDetailRoute would match.
But consoleAppDetailRoute is registered at `/app/$componentId` under
consoleLayoutRoute — no chroot route matches `/<componentId>` directly,
so stripping leaves an empty render path. Playwright walkthrough on
t132 2026-05-17 confirmed: /app/bp-cnpg + /app/bp-coraza both render
body_len=9 (empty).

Invert the logic: only redirect mothership-only sub-paths (/dashboard
Fleet view, /install wizard, /sre, /sec, /blueprints) which have no
Sovereign Console equivalent. For everything else (component names like
`/app/bp-cnpg`, bare `/app`), let TanStack's natural most-specific-match
pick consoleAppDetailRoute / consoleAppsRoute.

Caught live on t132 via Playwright walker3.js — agent a4825c5a.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 02:05:54 +04:00
e3mrah
6618392407
fix(chroot): GetDeployment falls back to chrootEnsureDeployment (D22) (#1568)
* feat(handover): auto-seed owner UserAccess CR on chroot (D21)

Closes the D21 gap on Sovereign DoD: /users page returned empty after
fresh handover because Keycloak `sovereign-admins` membership was
established but no UserAccess CR existed for the operator.

After `keycloak.EnsureUser` succeeds in `AuthHandover`, the helper
`EnsureOwnerUserAccess` upserts a cluster-scoped UserAccess CR shaped
like the canonical user_access.go `CreateUserAccess` write:

  apiVersion: access.openova.io/v1alpha1
  kind: UserAccess
  metadata:
    name: useraccess-owner-<sanitized-email>
    annotations:
      catalyst.openova.io/user-email: <email>   # rbac_matrix:309 hint
  spec:
    user:
      keycloakSubject: <email>
    sovereignRef: <fqdn-first-label>
    applications:
      - app: "*"
        role: admin                              # owner -> admin

The Composition (issue #322) reconciles the Claim into per-app
RoleBindings on the Sovereign so the operator surfaces in /users.

Best-effort + idempotent: AlreadyExists on the second handover is
folded to nil; any other error is logged at Warn and the handover
itself never fails. If the access.openova.io CRD has not rolled yet,
the next handover retries automatically.

Architect-first: mirrors `userAccessToUnstructured` shape and uses
existing `sovereignDynamicClient` + `rbacAssignSlug` seams. Tier
mapping follows the documented lossy `owner -> admin` rule in
`userAccessTierToRole` (CRD only accepts admin|editor|viewer).

Refs: docs/SOVEREIGN-MULTI-REGION-DOD.md D21

* chore(slot-13): pin bp-catalyst-platform to 1.4.147 (D21+D31 baked)

PR #1562 (D31 wordpress-tenant activeHotStandby) + PR #1564 (D21 owner
UserAccess auto-seed at handover, catalyst-api:8d2a947) both packaged
into chart 1.4.147. Pin slot so t133+ gets both gates on first prov.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chart): regionsJson uses toJson to defeat YAML flow-seq re-parse (D5)

PR #1551 single-quoted SOVEREIGN_REGIONS_JSON in the slot file
substitute, but Flux Kustomize's postBuild can still re-parse the
JSON-shaped string as a YAML flow-sequence depending on quoting context.
When that happens .Values.sovereign.regionsJson is a Go []interface{}
of map[interface{}]interface{} and `| quote` prints Go's
`[map[cloudRegion:hel1 ...]]` syntax — catalyst-api's json.Unmarshal of
the env var then fails and Request.Regions is empty.

toJson normalises both string and list inputs to valid JSON.

Caught live on t132 2026-05-16 chart 1.4.147: env var rendered as
`[map[cloudRegion:hel1 ...]]` despite #1551 being in effect.
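
The failure mode can be reproduced in plain Go. This is a sketch, assuming a `RegionSpec` with a `cloudRegion` JSON tag (inferred from the rendered map key); it uses `map[string]interface{}` because `encoding/json` cannot encode interface-keyed maps, and `asJSON` only approximates what a template-level toJson does for the list case.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RegionSpec is a hypothetical minimal shape for one region entry.
type RegionSpec struct {
	CloudRegion string `json:"cloudRegion"`
}

// goSyntax renders a value with Go's default formatting, which is what a
// plain string conversion of a re-parsed list produces.
func goSyntax(v interface{}) string { return fmt.Sprint(v) }

// asJSON renders the same value as JSON text.
func asJSON(v interface{}) (string, error) {
	b, err := json.Marshal(v)
	return string(b), err
}

// parseRegions mimics the consumer side: decode the env var as JSON.
func parseRegions(env string) ([]RegionSpec, error) {
	var out []RegionSpec
	err := json.Unmarshal([]byte(env), &out)
	return out, err
}

func main() {
	// What the value looks like after being re-parsed as a YAML flow-sequence.
	reparsed := []interface{}{map[string]interface{}{"cloudRegion": "hel1"}}

	bad := goSyntax(reparsed)
	_, err := parseRegions(bad)
	fmt.Println(bad, "parse failed:", err != nil)

	good, _ := asJSON(reparsed)
	regions, err := parseRegions(good)
	fmt.Println(good, regions, err)
}
```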

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chroot): populate deployment Result + Request fields for D22

Settings page on Sovereign Console renders `—` for Region / Sovereign /
Created / DeploymentID / Pool subdomain because chroot's GET
/api/v1/deployments/<id> returns empty strings for those fields.

Populate from existing env vars (best-effort — empty when chart hasn't
wired them yet, which is no worse than today's behaviour):
- Result.ConsoleURL = "https://console.<fqdn>" (derived from selfFQDN)
- Result.GitOpsRepoURL from GITOPS_REPO_URL env
- Result.ControlPlaneIP from SOVEREIGN_CONTROL_PLANE_IP env
- Request.Region = regions[0].CloudRegion (top-level legacy field)
- Request.OrgEmail from OPERATOR_EMAIL env
- Request.OrgName from ORG_NAME env

Companion chart PR will wire the env vars from .Values.global.* +
cloud-init substitute placeholders. This PR is BACKWARD-compatible —
unset env vars produce empty strings, same as today.
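
The field mapping above could look roughly like the following. It is a sketch only: the `Request`, `Result` and `Region` structs are cut down to the listed fields, and `populate` and `envFromMap` are hypothetical helpers, not the real chrootEnsureDeployment code.

```go
package main

import (
	"fmt"
	"os"
)

// Region, Request and Result hold only the fields named in the list above.
type Region struct{ CloudRegion string }

type Request struct {
	Region   string
	OrgEmail string
	OrgName  string
	Regions  []Region
}

type Result struct {
	ConsoleURL     string
	GitOpsRepoURL  string
	ControlPlaneIP string
}

// populate fills the record from the environment. Unset variables yield empty
// strings, so the behaviour matches the pre-change output when nothing is wired.
func populate(selfFQDN string, regions []Region, getenv func(string) string) (Request, Result) {
	req := Request{
		Regions:  regions,
		OrgEmail: getenv("OPERATOR_EMAIL"),
		OrgName:  getenv("ORG_NAME"),
	}
	if len(regions) > 0 {
		req.Region = regions[0].CloudRegion // top-level legacy field
	}
	res := Result{
		GitOpsRepoURL:  getenv("GITOPS_REPO_URL"),
		ControlPlaneIP: getenv("SOVEREIGN_CONTROL_PLANE_IP"),
	}
	if selfFQDN != "" {
		res.ConsoleURL = "https://console." + selfFQDN
	}
	return req, res
}

// envFromMap builds a getenv-style lookup from a map, handy for tests.
func envFromMap(m map[string]string) func(string) string {
	return func(k string) string { return m[k] }
}

func main() {
	req, res := populate("t132.omani.works", []Region{{CloudRegion: "hel1"}}, os.Getenv)
	fmt.Printf("%+v\n%+v\n", req, res)
}
```

Passing `getenv` as a parameter keeps the mapping testable without touching the process environment.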

Caught live on t132 2026-05-16 — `curl /api/v1/deployments/sovereign-
t132.omani.works` returns empty ownerEmail/region/consoleURL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chroot): GetDeployment falls back to chrootEnsureDeployment (D22)

GetDeployment was the only handler that returned 404 without calling
chrootEnsureDeployment. After a catalyst-api Pod restart on the chroot
the in-memory store is empty until some other handler (StreamLogs,
jobs list) primes it via its own synth call — meanwhile the Sovereign
Console Settings page loads /api/v1/deployments/<id> first and gets
404, rendering the entire page broken.

Mirror the StreamLogs pattern (lines 1247-1254): try in-memory load,
fall through to chrootEnsureDeployment, return 404 only when both miss.

This unblocks PR #1567's deployment-record population — without the
fallback, GetDeployment can never serve the populated record on chroot.

Caught live on t132 2026-05-16 after #1567 image roll: Settings page
404 because in-memory store was empty.
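
A sketch of the lookup order described above, assuming a simplified in-memory `store` and a `synth` callback that stands in for chrootEnsureDeployment (both hypothetical shapes; the real handler and its signatures are not shown in this log).

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

// Deployment is reduced to an ID for this sketch.
type Deployment struct{ ID string }

// store is a hypothetical in-memory deployment store.
type store struct {
	mu   sync.RWMutex
	byID map[string]*Deployment
}

func (s *store) load(id string) (*Deployment, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	d, ok := s.byID[id]
	return d, ok
}

// synth stands in for the chroot synthesis call; it returns nil when it
// cannot produce a record for the requested id.
type synth func(id string) *Deployment

// lookupDeployment tries the in-memory store first, then the synthesised
// record, and reports 404 only when both miss.
func lookupDeployment(s *store, ensure synth, id string) (*Deployment, int) {
	if d, ok := s.load(id); ok {
		return d, http.StatusOK
	}
	if d := ensure(id); d != nil {
		return d, http.StatusOK
	}
	return nil, http.StatusNotFound
}

// selfOnly is an illustrative synth that only knows one id.
func selfOnly(id string) *Deployment {
	if id == "sovereign-t132.omani.works" {
		return &Deployment{ID: id}
	}
	return nil
}

func main() {
	empty := &store{byID: map[string]*Deployment{}} // state after a Pod restart
	_, code := lookupDeployment(empty, selfOnly, "sovereign-t132.omani.works")
	fmt.Println(code)
	_, code = lookupDeployment(empty, selfOnly, "unknown")
	fmt.Println(code)
}
```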

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 00:54:20 +04:00
e3mrah
ed63ecd09f
fix(chroot): populate deployment Result + Request fields for D22 settings (#1567)
* feat(handover): auto-seed owner UserAccess CR on chroot (D21)

Closes the D21 gap on Sovereign DoD: /users page returned empty after
fresh handover because Keycloak `sovereign-admins` membership was
established but no UserAccess CR existed for the operator.

After `keycloak.EnsureUser` succeeds in `AuthHandover`, the helper
`EnsureOwnerUserAccess` upserts a cluster-scoped UserAccess CR shaped
like the canonical user_access.go `CreateUserAccess` write:

  apiVersion: access.openova.io/v1alpha1
  kind: UserAccess
  metadata:
    name: useraccess-owner-<sanitized-email>
    annotations:
      catalyst.openova.io/user-email: <email>   # rbac_matrix:309 hint
  spec:
    user:
      keycloakSubject: <email>
    sovereignRef: <fqdn-first-label>
    applications:
      - app: "*"
        role: admin                              # owner -> admin

The Composition (issue #322) reconciles the Claim into per-app
RoleBindings on the Sovereign so the operator surfaces in /users.

Best-effort + idempotent: AlreadyExists on the second handover is
folded to nil; any other error is logged at Warn and the handover
itself never fails. If the access.openova.io CRD has not rolled yet,
the next handover retries automatically.

Architect-first: mirrors `userAccessToUnstructured` shape and uses
existing `sovereignDynamicClient` + `rbacAssignSlug` seams. Tier
mapping follows the documented lossy `owner -> admin` rule in
`userAccessTierToRole` (CRD only accepts admin|editor|viewer).

Refs: docs/SOVEREIGN-MULTI-REGION-DOD.md D21

* chore(slot-13): pin bp-catalyst-platform to 1.4.147 (D21+D31 baked)

PR #1562 (D31 wordpress-tenant activeHotStandby) + PR #1564 (D21 owner
UserAccess auto-seed at handover, catalyst-api:8d2a947) both packaged
into chart 1.4.147. Pin slot so t133+ gets both gates on first prov.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chart): regionsJson uses toJson to defeat YAML flow-seq re-parse (D5)

PR #1551 single-quoted SOVEREIGN_REGIONS_JSON in the slot file
substitute, but Flux Kustomize's postBuild can still re-parse the
JSON-shaped string as a YAML flow-sequence depending on quoting context.
When that happens .Values.sovereign.regionsJson is a Go []interface{}
of map[interface{}]interface{} and `| quote` prints Go's
`[map[cloudRegion:hel1 ...]]` syntax — catalyst-api's json.Unmarshal of
the env var then fails and Request.Regions is empty.

toJson normalises both string and list inputs to valid JSON.

Caught live on t132 2026-05-16 chart 1.4.147: env var rendered as
`[map[cloudRegion:hel1 ...]]` despite #1551 being in effect.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(chroot): populate deployment Result + Request fields for D22

Settings page on Sovereign Console renders `—` for Region / Sovereign /
Created / DeploymentID / Pool subdomain because chroot's GET
/api/v1/deployments/<id> returns empty strings for those fields.

Populate from existing env vars (best-effort — empty when chart hasn't
wired them yet, which is no worse than today's behaviour):
- Result.ConsoleURL = "https://console.<fqdn>" (derived from selfFQDN)
- Result.GitOpsRepoURL from GITOPS_REPO_URL env
- Result.ControlPlaneIP from SOVEREIGN_CONTROL_PLANE_IP env
- Request.Region = regions[0].CloudRegion (top-level legacy field)
- Request.OrgEmail from OPERATOR_EMAIL env
- Request.OrgName from ORG_NAME env

Companion chart PR will wire the env vars from .Values.global.* +
cloud-init substitute placeholders. This PR is BACKWARD-compatible —
unset env vars produce empty strings, same as today.

Caught live on t132 2026-05-16 — `curl /api/v1/deployments/sovereign-
t132.omani.works` returns empty ownerEmail/region/consoleURL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 00:29:44 +04:00
e3mrah
8d2a947cfb
feat(handover): auto-seed owner UserAccess CR on chroot (D21) (#1564)
Closes the D21 gap on Sovereign DoD: /users page returned empty after
fresh handover because Keycloak `sovereign-admins` membership was
established but no UserAccess CR existed for the operator.

After `keycloak.EnsureUser` succeeds in `AuthHandover`, the helper
`EnsureOwnerUserAccess` upserts a cluster-scoped UserAccess CR shaped
like the canonical user_access.go `CreateUserAccess` write:

  apiVersion: access.openova.io/v1alpha1
  kind: UserAccess
  metadata:
    name: useraccess-owner-<sanitized-email>
    annotations:
      catalyst.openova.io/user-email: <email>   # rbac_matrix:309 hint
  spec:
    user:
      keycloakSubject: <email>
    sovereignRef: <fqdn-first-label>
    applications:
      - app: "*"
        role: admin                              # owner -> admin

The Composition (issue #322) reconciles the Claim into per-app
RoleBindings on the Sovereign so the operator surfaces in /users.

Best-effort + idempotent: AlreadyExists on the second handover is
folded to nil; any other error is logged at Warn and the handover
itself never fails. If the access.openova.io CRD has not rolled yet,
the next handover retries automatically.
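
The best-effort error handling can be sketched without a cluster. Assumptions: `errAlreadyExists` and `errCRDMissing` are sentinel errors standing in for the Kubernetes API errors, and `seedOwnerAccess` is a hypothetical reduction of `EnsureOwnerUserAccess` to its error-folding logic.

```go
package main

import (
	"errors"
	"fmt"
)

// errAlreadyExists stands in for the Kubernetes AlreadyExists API error.
var errAlreadyExists = errors.New("useraccess already exists")

// errCRDMissing stands in for any other failure, such as the CRD not
// having rolled out yet.
var errCRDMissing = errors.New("no matches for kind UserAccess")

// seedOwnerAccess runs create once. AlreadyExists is folded to success, any
// other error is reported through warn, and nothing is returned so the
// caller (the handover) can never fail because of it.
func seedOwnerAccess(create func() error, warn func(msg string)) {
	err := create()
	if err == nil || errors.Is(err, errAlreadyExists) {
		return
	}
	warn("owner UserAccess seed failed, retrying on next handover: " + err.Error())
}

func main() {
	warn := func(msg string) { fmt.Println("WARN", msg) }
	seedOwnerAccess(func() error { return nil }, warn)              // first handover
	seedOwnerAccess(func() error { return errAlreadyExists }, warn) // second handover
	seedOwnerAccess(func() error { return errCRDMissing }, warn)    // CRD not rolled yet
}
```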

Architect-first: mirrors `userAccessToUnstructured` shape and uses
existing `sovereignDynamicClient` + `rbacAssignSlug` seams. Tier
mapping follows the documented lossy `owner -> admin` rule in
`userAccessTierToRole` (CRD only accepts admin|editor|viewer).

Refs: docs/SOVEREIGN-MULTI-REGION-DOD.md D21

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-05-16 23:49:32 +04:00
e3mrah
2fd4e3cbf4
feat(wizard): default marketplaceEnabled=true for D27 zero-touch (#1555)
Founder ruling 2026-05-16: D27 mandates that a fresh wizard provisions a
Sovereign already ready to host tenant orgs (D29). Operator can still
flip the toggle off on StepMarketplace if they explicitly want a
private Sovereign.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:24:09 +04:00
e3mrah
9f096b0b18
fix(chroot): populate Result.LoadBalancerIP so canvas shows LB chip (D15) (#1553)
chrootEnsureDeployment was synthesizing a Deployment with Result=nil.
The topology loader's buildLBs() returned [] on nil-Result → canvas
chip showed `LoadBalancer 0/0` on every chroot Sovereign Console
even though the Sovereign ingress LB was allocated and serving
console.<fqdn>.

Populate Result with LoadBalancerIP from `SOVEREIGN_LB_IP` env (set
by bp-catalyst-platform's sovereign-fqdn ConfigMap `lbIP` key per
issue #900 / PR #145). buildLBs then emits one LoadBalancer entry
per region using the canonical primary LB.

Caught on t131 2026-05-16 — DoD D15. Same chroot-synth-enrichment
pattern as PR #1534 (SOVEREIGN_REGIONS_JSON).

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:58:53 +04:00
e3mrah
124ac13c1d
fix(router): chroot Sovereign /app/<name> resolves to AppDetail, not mothership AppsPage (D17b) (#1552)
Two route trees claim `/app`:

1. `appRoute` (line 364) — mothership AppLayout chrome, prefix `/app`,
   children `/app/$deploymentId/applications/*`, `/app/$deploymentId/
   settings`, `/app/dashboard` (fleet view), etc. ~30 children.
2. `consoleAppDetailRoute` (line 1141, under consoleLayoutRoute) —
   clean `/app/$componentId` for the chroot Sovereign Console's
   per-app detail.

On a chroot Sovereign Console (DETECTED_MODE.mode === 'sovereign')
the operator clicks `/apps/<card>` → AppCard generates HREF
`/app/<name>` (AppsPage.tsx line ~720, correct for chroot context).
TanStack router resolves to the MOTHERSHIP `appRoute` because it
matches first (registered earlier under rootRoute) and its
children accept `<name>` as $deploymentId. The page renders
AppLayout chrome + AppsPage with mothership sidebar — looks
nothing like AppDetail.

Founder observation (BUG-002 from /tmp/test-matrix-t129.json + reported
on t131 2026-05-16):
> Application individual pages are not visible at all in the child
> while mothership doesn't have that issue, this is the biggest blunder!

Fix: `appRoute.beforeLoad` redirects on chroot:
- `/app/<componentId>` → `/<componentId>` (caught by consoleAppDetailRoute)
- `/app/dashboard`, `/app/install`, `/app/sre/*`, `/app/sec/*`, `/app/blueprints`
  → `/dashboard` (canonical Sovereign landing; these are mothership-only
  surfaces — already partially fixed at dashboardRoute level by PR #1547)

Mothership behavior unchanged (DETECTED_MODE.mode !== 'sovereign'
falls through to the existing AppLayout-rooted tree).

Refs DoD D17b. Caught on t131 (623354058b114dd6, 2026-05-16).

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:56:31 +04:00
e3mrah
fbe23da091
fix(ui-nginx): allow Google Fonts domains in CSP (D26) (#1549)
Sovereign Console pages reference Inter + JetBrains Mono fonts via
fonts.googleapis.com (index.html lines 9, 11). The nginx CSP only
allowed font-src 'self' data: — so the browser blocked the font
stylesheet AND the woff2 fetches, falling back to system fonts.

Add fonts.googleapis.com to style-src (for the @import CSS) and
fonts.gstatic.com to font-src (for the woff2 assets). All 3 CSP
occurrences in nginx.conf updated identically.

Alternative considered: self-host the woff2 + drop the external
references. Skipped for now — sticking with Google Fonts CDN is
faster + matches every other web app's posture. If the operator
wants air-gap-compatible Sovereigns later, switch to self-hosted.

Caught on t129 2026-05-16 — DoD D26.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 20:31:51 +04:00
e3mrah
7845a00799
fix(dashboard): add region + vcluster as TreemapDimensions (D16) (#1548)
Multi-region operators on the Sovereign Console couldn't pivot the
/dashboard treemap by region or vCluster. The TreemapDimension
union (FE) and dashboardDimension set (BE) only included
sovereign/cluster/family/namespace/application.

This PR:
- Adds 'region' + 'vcluster' to TreemapDimension type
  (products/catalyst/bootstrap/ui/src/lib/treemap.types.ts)
- Adds them to the dimension select options
  (products/catalyst/bootstrap/ui/src/components/TreemapLayerController.tsx)
- Adds them to the validated set in dashboard.go
- Adds podRow.region + podRow.vcluster fields populated from
  openova.io/region and catalyst.openova.io/vcluster-role labels
- Extends dimensionKey switch to bucket by these new dimensions
  (fallback: region→cluster, vcluster→"host")
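
The bucketing with its two fallbacks might look like this. It is a sketch: the `podRow` fields other than `region` and `vcluster`, and the exact switch layout, are assumptions rather than the code in dashboard.go.

```go
package main

import "fmt"

// podRow carries one label-derived value per supported dimension.
type podRow struct {
	sovereign, cluster, family, namespace, application string
	region, vcluster                                   string
}

// dimensionKey returns the bucket a pod falls into for the given dimension.
// A pod without a region label buckets by cluster, and a pod without a
// vcluster-role label buckets as "host".
func dimensionKey(p podRow, dim string) string {
	switch dim {
	case "region":
		if p.region != "" {
			return p.region
		}
		return p.cluster
	case "vcluster":
		if p.vcluster != "" {
			return p.vcluster
		}
		return "host"
	case "sovereign":
		return p.sovereign
	case "cluster":
		return p.cluster
	case "family":
		return p.family
	case "namespace":
		return p.namespace
	case "application":
		return p.application
	}
	return ""
}

func main() {
	labelled := podRow{cluster: "t129", region: "hel1", vcluster: "dmz"}
	bare := podRow{cluster: "t129"}
	fmt.Println(dimensionKey(labelled, "region"), dimensionKey(bare, "region"))
	fmt.Println(dimensionKey(labelled, "vcluster"), dimensionKey(bare, "vcluster"))
}
```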

Caught on t129 2026-05-16 — DoD D16. Note that full multi-cluster
fan-out (aggregating pods across all 3 region kubeconfigs into one
treemap) is a separate refactor not included here; this PR delivers
the dimension surface so the layer selector is usable + a fresh prov
with the chroot's k8scache extended to multi-region will render
3 cluster bubbles when the operator picks Layer-1=cluster.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 20:24:34 +04:00
e3mrah
52015ff468
fix(ui): t129 SPA routing — bp-bp- prefix, PIN /wizard leak, /app/dashboard fleet leak (#1547)
Three operator-visible SPA routing bugs caught on live t129 Sovereign
Console (t129.omani.works, 2026-05-16). Closes #1546.

BUG-001 (D19) — doubled /app/bp-bp-* href on 10 of 44 app cards.
  build-catalog.mjs::listBootstrapKit extracted slug from `NN-(.+)\.yaml`
  without stripping an optional `bp-` already present in some filenames
  (e.g. `13-bp-catalyst-platform.yaml`). The captured slug became
  `bp-catalyst-platform`, then `id: \`bp-${slug}\`` doubled it to
  `bp-bp-catalyst-platform`, breaking the FE↔BE HR-name join and
  printing the doubled prefix on the AppsPage card href. Fix: strip a
  leading `bp-` from the captured slug before forming the canonical id.
  Regenerated catalog.generated.ts + blueprints.json — 10 entries
  collapse to their single-prefix canonical form (bp-catalyst-platform,
  bp-cert-manager-powerdns-webhook, bp-k8s-ws-proxy, bp-guacamole,
  bp-dmz-vcluster, bp-hcloud-ccm, bp-openova-flow-server,
  bp-openova-flow-emitter, bp-mgmt-vcluster, bp-rtz-vcluster).
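
  The slug normalisation is easy to show in isolation. The real fix lives in a
  Node script (build-catalog.mjs); the Go transposition below is only a sketch,
  and the anchored regex and the `canonicalID` name are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// slotFile matches bootstrap-kit slot filenames of the form NN-<slug>.yaml.
var slotFile = regexp.MustCompile(`^\d+-(.+)\.yaml$`)

// canonicalID derives the blueprint id from a slot filename. A leading "bp-"
// already present in the captured slug is stripped first, so the prefix is
// applied exactly once.
func canonicalID(filename string) (string, bool) {
	m := slotFile.FindStringSubmatch(filename)
	if m == nil {
		return "", false
	}
	slug := strings.TrimPrefix(m[1], "bp-")
	return "bp-" + slug, true
}

func main() {
	for _, f := range []string{"13-bp-catalyst-platform.yaml", "01-cilium.yaml", "README.md"} {
		id, ok := canonicalID(f)
		fmt.Println(f, id, ok)
	}
}
```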

BUG-015 (D23, extends D0) — PIN-verify lands /wizard on Sovereign.
  VerifyPinPage default landing was `/wizard` regardless of operating
  mode. On a chroot Sovereign Console (DETECTED_MODE.mode === 'sovereign')
  the operator has just been auto-redirected from the mothership
  handover URL; their Sovereign is already converged. Routing them to
  the new-prov wizard re-prompts for org details and contradicts D0.
  Fix: branch on DETECTED_MODE.mode — `/dashboard` on sovereign,
  `/wizard` on catalyst-zero. Mothership flow unchanged. Test:
  VerifyPinPage.test.tsx asserts the 3 cases (sovereign default,
  catalyst-zero default, explicit next= override).

BUG-016 (D24) — /app/dashboard exposes mothership fleet view.
  appRoute's `/dashboard` child mounts DashboardPage (multi-Sovereign
  fleet, "7 Sovereigns" with duplicate rows). On a Sovereign Console
  this surface MUST NOT be reachable — the Sovereign owns ONE deployment,
  fleet is mothership-only. Fix: beforeLoad on dashboardRoute redirects
  to `/dashboard` (consoleDashboardRoute, the per-Sovereign landing)
  when DETECTED_MODE.mode === 'sovereign'. Mothership keeps the fleet
  view as today.

Refs: docs/SOVEREIGN-MULTI-REGION-DOD.md D19/D23/D24,
      /tmp/test-matrix-t129.json discoveries BUG-001/015/016.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 20:13:26 +04:00
e3mrah
2b3888eed5
fix(ui): suppress chroot-side false-positive notifications (D17, D18) (#1543)
Two notification spammers on the chroot Sovereign Console that produce
noise on every /apps + /app/<name> visit:

D17 — "Deployment id in the URL is malformed":
  AppsPage.tsx fires on isDeploymentID(rawDeploymentId)=false. On the
  chroot, useResolvedDeploymentId resolves to /api/v1/sovereign/self
  which returns the synthesized canonical id `sovereign-<fqdn>` (26
  chars, not hex). The notification claims that path-segment is
  invalid even though there is no URL segment — the resolution path
  is in-process. Suppress on DETECTED_MODE.mode === 'sovereign'.

D18 — "Per-component install monitoring is unavailable":
  Fires on state.phase1WatchSkipped. On the chroot, phase1WatchSkipped
  is a MOTHERSHIP-only concept (mother's observer pod failed to fetch
  the new cluster's kubeconfig). The Sovereign-side catalyst-api runs
  IN the cluster it's reporting on — has the in-cluster ServiceAccount
  + bundled sovereignDynamicClient + informer cache watching HelmReleases
  natively. Firing this here tells operator to drop to kubectl when
  the data is on the page. Suppress on chroot.

Caught on t129 (6cddff7ef4432bdc, 2026-05-16) — DoD D17 + D18.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 19:46:25 +04:00
e3mrah
536bfcb699
fix(infrastructure): vCluster fallback from namespace label (D15) (#1542)
loadVClusters() queried vcluster.io/v1alpha1 CRs only. Our bootstrap
topology ships loft-sh/vcluster as a plain Helm chart (StatefulSet +
Service, NO CRD installed) so the CR list is always empty on a
converged Sovereign → canvas `vCluster N/N` chip shows `0/0` even
though Pods are Running.

Add a fallback: enumerate Namespaces carrying
`catalyst.openova.io/vcluster-role` label (stamped by
bp-{mgmt,dmz,rtz}-vcluster's namespace template at PR #1526).
Emits one VCluster row per labeled namespace with role = the label
value. Status `healthy` since the namespace exists (operator-visible
Pod state is surfaced elsewhere).
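
A cluster-free sketch of the fallback, assuming plain `Namespace` and `VCluster` structs in place of the Kubernetes types and assuming the fallback only runs when the CR list comes back empty (`vclustersWithFallback` is a hypothetical name).

```go
package main

import "fmt"

// vclusterRoleLabel is the namespace label named in the commit message.
const vclusterRoleLabel = "catalyst.openova.io/vcluster-role"

// Namespace and VCluster are simplified stand-ins for the real types.
type Namespace struct {
	Name   string
	Labels map[string]string
}

type VCluster struct {
	Name   string
	Role   string
	Status string
}

// vclustersWithFallback returns the CR-derived rows when there are any.
// Otherwise it emits one row per labelled namespace, with the role taken
// from the label value and the status fixed to "healthy".
func vclustersWithFallback(fromCRs []VCluster, namespaces []Namespace) []VCluster {
	if len(fromCRs) > 0 {
		return fromCRs
	}
	var out []VCluster
	for _, ns := range namespaces {
		role := ns.Labels[vclusterRoleLabel]
		if role == "" {
			continue
		}
		out = append(out, VCluster{Name: ns.Name, Role: role, Status: "healthy"})
	}
	return out
}

func main() {
	namespaces := []Namespace{
		{Name: "kube-system"},
		{Name: "vcluster-dmz", Labels: map[string]string{vclusterRoleLabel: "dmz"}},
		{Name: "vcluster-mgmt", Labels: map[string]string{vclusterRoleLabel: "mgmt"}},
	}
	fmt.Println(vclustersWithFallback(nil, namespaces))
}
```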

Caught on t129 (6cddff7ef4432bdc, 2026-05-16) — D15.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 19:40:50 +04:00
e3mrah
5b69247135
fix(clustermesh): secondary cluster name match tofu scheme (D11) (#1540)
Tofu's `secondary_region_cluster_mesh_name` local at
infra/hetzner/main.tf:389 generates secondary names as
`<sovereign-stem>-<region-stem-no-digits>` (e.g. `t129-nbg`,
`t129-sin`). The bootstrap-kit slot 01-cilium.yaml renders
cilium-config cluster.name from this value via the
CLUSTER_MESH_NAME envsubst.

The orchestrator's clusterName derivation was wrong: it appended
`-<region-key>` to the primary's name (e.g. `t129-mesh-nbg1-1`),
which matched NEITHER the tofu scheme NOR the cilium-config value.

Caught on t129 (6cddff7ef4432bdc, 2026-05-16): TLS, etcd RBAC,
and connection all working after PRs #1530, #1536, #1538, #1539 —
but agent reported `failed to retrieve cluster configuration:
not found` for every secondary peer because it queried
`cilium/cluster-config/v1/t129-mesh-nbg1-1` against an etcd that
only had `t129-nbg`.

Fix: export `DeriveSecondaryClusterMeshName(req, rs)` that
mirrors tofu's local exactly, plus a `stripTrailingDigits` helper.
Orchestrator's buildRegionSlots uses this for secondaries; primary
keeps the `<stem>-mesh` shape.
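
The naming rule can be sketched as below. The real `DeriveSecondaryClusterMeshName` takes `(req, rs)`; the string-based `secondaryMeshName` and `primaryMeshName` helpers here are simplified, hypothetical stand-ins.

```go
package main

import (
	"fmt"
	"strings"
)

// stripTrailingDigits removes any run of ASCII digits from the end of s,
// turning a region key such as "nbg1" into its stem "nbg".
func stripTrailingDigits(s string) string {
	return strings.TrimRight(s, "0123456789")
}

// secondaryMeshName follows the <sovereign-stem>-<region-stem-no-digits> scheme.
func secondaryMeshName(sovereignStem, cloudRegion string) string {
	return sovereignStem + "-" + stripTrailingDigits(cloudRegion)
}

// primaryMeshName keeps the <stem>-mesh shape.
func primaryMeshName(sovereignStem string) string {
	return sovereignStem + "-mesh"
}

func main() {
	fmt.Println(primaryMeshName("t129"))
	fmt.Println(secondaryMeshName("t129", "nbg1"))
	fmt.Println(secondaryMeshName("t129", "sin"))
}
```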

Closes D11 incident chain: #1525 → #1528 → #1530 → #1536 → #1538 →
#1539 → this. With this PR landed, t129's secondary→primary
connection already works (verified on live cluster — secondary
agents show "ready, 2 nodes, 113 endpoints, 326 identities");
primary→secondary will work on a fresh prov once the name match
is correct from the start.

Refs DoD D11.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 19:08:55 +04:00
e3mrah
d0fd32dc04
fix(clustermesh): use peer's clustermesh-apiserver-remote-cert (D11) (#1539)
The orchestrator was minting a fresh client cert (CN = local cluster
name) for each peer connection. Even with PR #1530's "sign with
peer's CA" fix the TLS handshake succeeded but etcd RBAC rejected:

    error="etcdserver: permission denied"

Cilium's clustermesh-apiserver etcd has RBAC with a `remote` user
that has read access on the cilium/* prefix. The chart generates
`kube-system/clustermesh-apiserver-remote-cert` with CN=`remote`.

Canonical `cilium clustermesh connect` CLI copies THIS Secret's
tls.crt/tls.key as the client cert the REMOTE cluster presents —
matches the etcd RBAC user verbatim.

This PR adopts that pattern: snapshotRemoteCert() reads the peer's
existing `clustermesh-apiserver-remote-cert` Secret, returns
tls.crt + tls.key bytes, and the orchestrator writes them into
A's `cilium-clustermesh` Secret instead of minting.

Caught on t129 (6cddff7ef4432bdc, 2026-05-16):
- TLS handshake succeeded after firewall fix (PR #1538) opened
  NodePort range so LB→backend health check passed
- cilium-dbg status reported `etcd: 1/1 connected, has-quorum=true`
  (TLS path working)
- BUT `remote configuration: expected=true, retrieved=false` and
  agent logs spammed `etcdserver: permission denied`

With this PR's CN=remote cert, etcd authorizes the kvstore List
and clustermesh sync completes — agent should flip to
`2/2 remote clusters ready`.

Completes the D11 chain: #1525 (regionKeyFromSpec) → #1528
(clusterName derivation) → #1530 (cert with peer's CA — no longer
needed but kept as defense-in-depth) → #1536 (hostAlias pattern)
→ #1538 (firewall NodePort range) → this.

Refs DoD D11.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 18:58:22 +04:00
e3mrah
83d771dee9
fix(clustermesh): hostAlias pattern — endpoint hostname + DS patch (D11) (#1536)
Cilium clustermesh-apiserver server cert has SANs:
  *.mesh.cilium.io, clustermesh-apiserver.kube-system.svc,
  127.0.0.1, ::1
No public LB IP SAN. When the orchestrator wrote the peer config blob
with `endpoints: - https://<lb-ip>:2379`, TLS handshake from the
agent failed at hostname verification — `cilium-dbg status --verbose`
reported `0/N remote clusters ready, Waiting for initial connection`.

This PR adopts the canonical Cilium clustermesh hostAlias pattern
(same shape as `cilium clustermesh connect` CLI):

1. buildPeerConfigBlob now writes the endpoint as
   `https://<peer>.mesh.cilium.io:2379` — matching the apiserver
   server cert's `*.mesh.cilium.io` wildcard SAN.

2. New patchCiliumHostAliases adds one hostAliases entry per peer
   to the cilium DaemonSet's pod spec:
     - ip: <peer-LB-IP>
       hostnames: ["<peer>.mesh.cilium.io"]
   So the agent resolves the hostname to the public LB IP at
   connect-time. Strategic-merge patch: idempotent re-runs replace
   the whole list with the current peer set.

3. Orchestrator step 3 calls patchCiliumHostAliases for each
   region's local cilium DaemonSet right before the rollout-restart
   of cilium / cilium-operator / clustermesh-apiserver, so the new
   pod spec is in effect when the agents come back up.
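
A minimal sketch of steps 1 and 2, for illustration only. The `peer` type, the helper names and the patch layout are assumptions made for this example, not the code in this PR; only the hostname shape and the hostAliases entry come from the description above. How the patch is submitted (client, patch type) is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// peer is a hypothetical stand-in for whatever the orchestrator
// tracks per remote cluster.
type peer struct {
	Name string // peer name used as the hostname label
	LBIP string // public LB IP of the peer's clustermesh-apiserver
}

// meshHostname returns a name covered by the *.mesh.cilium.io
// wildcard SAN on the apiserver server cert.
func meshHostname(p peer) string {
	return p.Name + ".mesh.cilium.io"
}

// peerEndpoint is the etcd endpoint written into the peer config blob.
func peerEndpoint(p peer) string {
	return fmt.Sprintf("https://%s:2379", meshHostname(p))
}

// hostAlias mirrors the ip/hostnames pair of a pod-spec hostAliases entry.
type hostAlias struct {
	IP        string   `json:"ip"`
	Hostnames []string `json:"hostnames"`
}

// hostAliasesPatch builds a JSON patch body that sets one hostAliases
// entry per peer on a DaemonSet's pod template.
func hostAliasesPatch(peers []peer) ([]byte, error) {
	aliases := make([]hostAlias, 0, len(peers))
	for _, p := range peers {
		aliases = append(aliases, hostAlias{
			IP:        p.LBIP,
			Hostnames: []string{meshHostname(p)},
		})
	}
	patch := map[string]any{
		"spec": map[string]any{
			"template": map[string]any{
				"spec": map[string]any{"hostAliases": aliases},
			},
		},
	}
	return json.Marshal(patch)
}
```

Deriving both the endpoint and the alias from the same `meshHostname` helper keeps the name the agent dials and the name it resolves from drifting apart.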

Caught on t128 (9680edbdce8fefe8, 2026-05-16) — same incident
chain as PRs #1525/#1528/#1530. With this PR landed AND the
existing PR #1530 (cert signed by peer's CA), agents should
flip to `2/2 remote clusters ready` on the next prov.

Refs DoD D11.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 18:10:21 +04:00
e3mrah
1f30a08ae3
fix(chroot): seed Request.Regions[] from SOVEREIGN_REGIONS_JSON env (D5) (#1534)
The Sovereign-side catalyst-api runs in "chroot" mode — it has no
parent prov record, so chrootEnsureDeployment synthesises a minimal
in-memory Deployment with only SovereignFQDN set. The
/infrastructure/topology loader then sees empty Request.Regions[]
and falls into the live-Nodes enumeration path (buildRegionFromLiveNodes)
which only sees THIS cluster's Node(s) → emits exactly 1 Region
even on a 3-region Sovereign. /cloud?view=graph renders as
"1 cluster 1 region" — DoD D5 failure.

Caught on t126 (84c0848406dd6fdd, 2026-05-16): operator reported
`console.t126.omani.works/cloud?view=graph` showed 1 region despite
mothership openova-flow snapshot holding all 3 regions correctly.

This PR threads the canonical multi-region RegionSpec[] from the
mothership prov body all the way to the Sovereign-side catalyst-api:

  tofu var.regions
    → jsonencode → sovereign_regions_json tftpl var
    → cloud-init postBuild.substitute SOVEREIGN_REGIONS_JSON
    → bp-catalyst-platform slot 13 sovereign.regionsJson value
    → sovereign-fqdn ConfigMap key `regionsJson`
    → catalyst-api Pod env SOVEREIGN_REGIONS_JSON (valueFrom)
    → chrootEnsureDeployment parses JSON, populates Request.Regions[]
    → topology loader emits one Region per spec entry

Single-region Sovereigns: var.regions has length 1; chart writes
the array literal; chroot synth still produces 1 Region — no
regression. Empty env: chroot falls back to live-Nodes path
(legacy behavior preserved).
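
The last two hops of the chain (env value to Request.Regions[], with the live-Nodes fallback) could look roughly like the sketch below. The `RegionSpec` shape with a single `key` field and the `regionsFromEnv` helper are hypothetical; treating malformed JSON the same as an empty env is also an assumption of this sketch, not something the PR states.

```go
package main

import (
	"encoding/json"
	"strings"
)

// RegionSpec is a hypothetical minimal shape; the real struct
// presumably carries more fields.
type RegionSpec struct {
	Key string `json:"key"`
}

// regionsFromEnv parses the raw SOVEREIGN_REGIONS_JSON value.
// ok=false tells the caller to fall back to the live-Nodes path.
func regionsFromEnv(raw string) (regions []RegionSpec, ok bool) {
	raw = strings.TrimSpace(raw)
	if raw == "" {
		return nil, false // env unset or empty: legacy behaviour
	}
	if err := json.Unmarshal([]byte(raw), &regions); err != nil {
		return nil, false // assumption: bad JSON also falls back
	}
	if len(regions) == 0 {
		return nil, false // "[]" carries no regions to seed
	}
	return regions, true
}
```

A caller would pass `os.Getenv("SOVEREIGN_REGIONS_JSON")` and only overwrite Request.Regions[] when `ok` is true.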

Refs DoD D5.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 17:45:24 +04:00
e3mrah
050f87e267
fix(purge): second name-prefix pass for CCM-named clustermesh LBs (#1532)
Caught repeatedly (t124 and t125 wipes, both 2026-05-16): tofu destroy
left 3 orphan `<fqdn-slug>-<region>-clustermesh` LBs each cycle. The
names don't start with the `catalyst-` prefix because they're set by
the Cilium chart overlay
(`clusters/_template/bootstrap-kit/01-cilium.yaml`):

    load-balancer.hetzner.cloud/name:
      "${SOVEREIGN_FQDN_SLUG:=catalyst}-${SOVEREIGN_REGION_KEY:=primary}-clustermesh"

The first name-prefix pass (`catalyst-<fqdn-slug>`) misses these.
tofu doesn't manage them (the CCM allocates them after Phase 1), so
each cycle required manual cleanup through the API.

Fix: add a second `purgeByNamePrefix` pass with the slug-only prefix
(`<fqdn-slug>-`) so any CCM-allocated resource named with the slug
gets swept. Dedup logic in `purgeByNamePrefix` skips names
already reported by the labelled pass, so totals stay accurate.
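
A simplified sketch of the two passes and the dedup, under stated assumptions: this `purgeByNamePrefix` is a stand-in with a hypothetical signature that only selects names (the real function also deletes resources), and `sweep` is an illustrative wrapper that does not exist in the codebase.

```go
package main

import "strings"

// purgeByNamePrefix returns the names that start with prefix and have
// not been reported yet, recording them in seen. Deletion is omitted.
func purgeByNamePrefix(names []string, prefix string, seen map[string]bool) []string {
	var matched []string
	for _, n := range names {
		if strings.HasPrefix(n, prefix) && !seen[n] {
			seen[n] = true
			matched = append(matched, n)
		}
	}
	return matched
}

// sweep runs the original pass and then the slug-only pass.
// The shared seen map keeps a name from being counted twice.
func sweep(names []string, slug string) []string {
	seen := map[string]bool{}
	out := purgeByNamePrefix(names, "catalyst-"+slug, seen)
	return append(out, purgeByNamePrefix(names, slug+"-", seen)...)
}
```

The slug-only prefix is broader than the first one, so the sketch assumes the slug is unique among resource names in the project.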

Refs feedback_wipe_handler_ccm_lb_orphans.md.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 17:29:26 +04:00
e3mrah
70d6ada703
fix(clustermesh): sign A's peer client cert with B's CA (not A's CA) (#1530)
Caught on t126 (84c0848406dd6fdd, 2026-05-16) after PRs #1525+#1528
unblocked peer Secret writes. Cilium agents reloaded, peer entries
present, but cilium-dbg status --verbose shows:

    0/2 remote clusters ready
    t126-mesh-nbg1-1: Waiting for initial connection
    t126-mesh-sin-2:  Waiting for initial connection

TLS probe to peer apiserver returned "unexpected eof while reading":
the mTLS handshake fails because A's client cert was signed by A's
cilium-ca. Cilium clustermesh-apiserver's trust pool is the LOCAL
cilium-ca (B's), so A's cert is rejected at the handshake.

Fix: pass b.caCert/b.caKey to mintPeerClientCert. SAN stays A's
clusterName (matches upstream `cilium clustermesh connect` CLI and
the chart's default RBAC subject authorisation).
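
The trust relationship can be reproduced with throwaway CAs. The sketch below is not the PR's `mintPeerClientCert`; `newCA`, `mintClientCert` and `trustedBy` are hypothetical helpers, and putting the cluster name in both the CN and a DNS SAN is an assumption of this example. It only shows that a client cert verifies against B's trust pool when B's CA signed it, and fails when A's CA did.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"math/big"
	"time"
)

// newCA creates a throwaway self-signed CA standing in for a cluster's cilium-ca.
func newCA(cn string) (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: cn},
		NotBefore:             time.Now().Add(-time.Minute),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// mintClientCert issues a client cert for clusterName signed by the given CA.
func mintClientCert(clusterName string, ca *x509.Certificate, caKey *ecdsa.PrivateKey) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(2),
		Subject:      pkix.Name{CommonName: clusterName}, // assumption: CN mirrors the SAN
		DNSNames:     []string{clusterName},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, ca, &key.PublicKey, caKey)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

// trustedBy reports whether cert chains to ca for client authentication.
func trustedBy(cert, ca *x509.Certificate) bool {
	pool := x509.NewCertPool()
	pool.AddCert(ca)
	_, err := cert.Verify(x509.VerifyOptions{
		Roots:     pool,
		KeyUsages: []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
	})
	return err == nil
}
```

Minting A's cert with `newCA("cilium-ca-a")` and checking it with `trustedBy(cert, caB)` returns false; minting the same cert with B's CA and key returns true, which is the change this PR makes.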

Refs DoD D11.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 17:23:18 +04:00