openova

Author	SHA1	Message	Date
hatiyildiz	2164ce2608	Merge remote-tracking branch 'origin/main' into wave6-fix-bss-vouchers # Conflicts: # products/catalyst/bootstrap/ui/src/lib/bss.api.ts	2026-05-17 22:38:10 +02:00
hatiyildiz	5c91196952	feat(ui): Wave 6 PR 5 — BSS Vouchers native (drops iframe, table + Issue modal) Replaces the BssSectionShell iframe wrapper at /bss/vouchers with a NATIVE React surface sharing the same PortalShell chrome as BssLandingPage (Wave 6 PR 1, #1606), JobsPage, AppsPage, SettingsPage. Per the founder "big picture" ruling on Wave 6 sub-agent UI work — inherit the design system, no bespoke chrome, no hex colours, no new card components. Surface: - Header tagline + filter row (search + status dropdown + "+ Issue voucher" CTA). - Table columns: Code \| Recipient \| Plan \| Value \| Status pill \| Issued \| Expires \| Redeemed by. Recipient/Plan/Expires render as em-dashes until the BE persists those fields — target-state columns are present from first paint per INVIOLABLE-PRINCIPLES.md #1. - Row drill-in drawer with Revoke action (destructive lives inside the drill-in per founder ruling, never on list rows). - Issue voucher modal that mirrors ParentDomainsPage's AddDomainModal chrome verbatim (panel layout, label rhythm, Cancel/Submit footer, accent submit) — POSTs /v1/sme/billing/vouchers/issue with code, credit_omr, description, max_redemptions, recipient_email. - Status pill family — emerald (active) / zinc (inactive) / amber (exhausted) / rose (revoked) — same palette ParentDomainsPage uses for its FlipStatusBadge. API wiring (bss.api.ts): - Voucher / VoucherStatus / IssueVoucherRequest typed wire shapes matching core/services/billing/store.PromoCode snake_case json tags. - voucherStatus() derives the pill from row fields (no server round- trip per filter). - listVouchers, issueVoucher, revokeVoucher typed wrappers against /v1/sme/billing/vouchers/{list,issue,revoke/{code}}. Errors throw with the BE's detail/error field so the operator sees the actual registrar message inline. 
All colour tokens are var(--color-*) or the four approved Tailwind status families (emerald / amber / rose / zinc) plus red-500/* for error banners (same family AddDomainModal uses). No hex literals. Links to Wave 6 PR 1 (#1606).	2026-05-17 22:33:34 +02:00
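The commit above says `voucherStatus()` derives the status pill from row fields without a server round-trip. A minimal sketch of that idea follows. Only `code` and `max_redemptions` are named in the commit message; `times_redeemed`, `active`, `revoked_at`, the "0 means unlimited" convention, and the precedence order between the four states are assumptions made for illustration, not the shape in `bss.api.ts`.

```typescript
// Status values and pill colours are taken from the commit message.
type VoucherStatus = "active" | "inactive" | "exhausted" | "revoked";

// Hypothetical wire shape. Only `code` and `max_redemptions` appear in the
// commit message; every other field here is an assumption.
interface Voucher {
  code: string;
  max_redemptions: number; // assumed: 0 means unlimited
  times_redeemed: number; // assumed field
  active: boolean; // assumed field
  revoked_at: string | null; // assumed field
}

// Derive the pill purely from the row, so list filters need no extra request.
// Assumed precedence: revoked, then exhausted, then inactive, then active.
function voucherStatus(v: Voucher): VoucherStatus {
  if (v.revoked_at !== null) return "revoked";
  if (v.max_redemptions > 0 && v.times_redeemed >= v.max_redemptions) {
    return "exhausted";
  }
  if (!v.active) return "inactive";
  return "active";
}

// Tailwind colour family per status, as listed in the commit message.
const PILL_FAMILY: Record<VoucherStatus, string> = {
  active: "emerald",
  inactive: "zinc",
  exhausted: "amber",
  revoked: "rose",
};
```

Because the function is pure, a status dropdown can filter the already-loaded rows with `rows.filter((v) => voucherStatus(v) === selected)`.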
e3mrah	4a4ffa34ab	feat(ui): Wave 6 PR 3 — BSS Orders native (drops iframe) (#1608 ) * feat(ui): Wave 6 PR 3 — BSS Orders native (drops iframe) Replaces the BssSectionShell iframe at /console/bss/orders with a native React table that mirrors JobsTable's shape: toolbar (search + status + age dropdowns) → scrollable table (Order ID \| Tenant org \| Product \| Status \| Created \| Last update \| Total) → row click to drill-in (TODO Link to /bss/orders/{id}, route added in a follow-up). Inherits the parent app's design system per Wave 6 brief + feedback_subagents_inherit_design_system.md: - PortalShell wrapper with `← Back to BSS overview` header slot (mirrors BssSectionShell verbatim so the page reads as a sibling of /bss/{billing,revenue,vouchers,tenants}) - Design tokens only (var(--color-bg-2), var(--color-border), var(--color-text), var(--color-text-dim), var(--color-text-strong), var(--color-accent), var(--color-surface), var(--color-success), var(--color-error)) - amber-* exception ONLY for the documented "API pending" pill (verbatim copy from BssLandingPage + SettingsPage); no rose - No hex colours; no bespoke Tailwind colour families - Empty / loading / API-pending states mirror JobsTable + ParentDomainsPage + BssLandingPage API plumbing: - lib/bss.api.ts: added Order / OrderStatus / OrdersResponse types and getOrders() that fetches /api/v1/sme/orders and tolerates 404 / 5xx / network error by returning {pendingApi:true, orders:[]} so the full table chrome paints on first load with the "API pending" pill (per INVIOLABLE-PRINCIPLES.md #1). - No BE handler added; the FE-only stub matches getBssOverview's pattern and was explicitly OPTIONAL in the Wave 6 brief. Verification: - tsc -b --noEmit: my files clean (28 pre-existing errors elsewhere: CloudPage CloudListKind drift + openova-flow workspace types, all unrelated to this PR). - Color audit grep: returns only the documented amber-500/* and amber-300 used by the API-pending pill. 
- Side-by-side render with JobsPage: same PortalShell chrome, same toolbar shape, same table column treatment. Links Wave 6 PR 1 (#1606). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(api): Wave 6 PR 3 — BSS Orders BE stub (GET /api/v1/sme/orders → []) Companion to the FE-side OrdersPage (commit `49e9bd46`). Adds a thin read-only handler returning `{ orders: [] }` so the native React table renders 200 OK instead of the FE-side 404 fallback path. Wire is now end-to-end; the table chrome paints on first load with no "API pending" pill (the pill only fires on non-2xx). The handler is a deliberate stub (~50 LOC) per the Wave 6 brief: the real per-tenant projection lands with the marketplace/billing service wire. JSON shape mirrors the FE Order type in bss.api.ts verbatim so a future non-empty payload type-aligns with zero FE change. Route registered alongside the other /api/v1/sme/* endpoints inside the RequireSession-gated group; same auth posture as /api/v1/sme/{users,tenants}. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 00:30:38 +04:00
e3mrah	239eb4fffd	feat(ui): Wave 6 PR 3 — BSS Orders native (drops iframe) (#1607 ) Replaces the BssSectionShell iframe at /console/bss/orders with a native React table that mirrors JobsTable's shape: toolbar (search + status + age dropdowns) → scrollable table (Order ID \| Tenant org \| Product \| Status \| Created \| Last update \| Total) → row click to drill-in (TODO Link to /bss/orders/{id}, route added in a follow-up). Inherits the parent app's design system per Wave 6 brief + feedback_subagents_inherit_design_system.md: - PortalShell wrapper with `← Back to BSS overview` header slot (mirrors BssSectionShell verbatim so the page reads as a sibling of /bss/{billing,revenue,vouchers,tenants}) - Design tokens only (var(--color-bg-2), var(--color-border), var(--color-text), var(--color-text-dim), var(--color-text-strong), var(--color-accent), var(--color-surface), var(--color-success), var(--color-error)) - amber-* exception ONLY for the documented "API pending" pill (verbatim copy from BssLandingPage + SettingsPage); no rose - No hex colours; no bespoke Tailwind colour families - Empty / loading / API-pending states mirror JobsTable + ParentDomainsPage + BssLandingPage API plumbing: - lib/bss.api.ts: added Order / OrderStatus / OrdersResponse types and getOrders() that fetches /api/v1/sme/orders and tolerates 404 / 5xx / network error by returning {pendingApi:true, orders:[]} so the full table chrome paints on first load with the "API pending" pill (per INVIOLABLE-PRINCIPLES.md #1). - No BE handler added; the FE-only stub matches getBssOverview's pattern and was explicitly OPTIONAL in the Wave 6 brief. Verification: - tsc -b --noEmit: my files clean (28 pre-existing errors elsewhere: CloudPage CloudListKind drift + openova-flow workspace types, all unrelated to this PR). - Color audit grep: returns only the documented amber-500/* and amber-300 used by the API-pending pill. 
- Side-by-side render with JobsPage: same PortalShell chrome, same toolbar shape, same table column treatment. Links Wave 6 PR 1 (#1606). Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 00:27:27 +04:00
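The `getOrders()` behaviour described in the two Orders commits (tolerate 404 / 5xx / network error and return `{pendingApi:true, orders:[]}`) can be sketched roughly as below. The `Order` fields and the injectable `Fetcher` parameter are assumptions added so the sketch runs without a network; the real types live in `lib/bss.api.ts` and are not shown in this log.

```typescript
// Assumed minimal Order shape; the real type in bss.api.ts is not shown here.
interface Order {
  id: string;
  status: string;
}

interface OrdersResponse {
  orders: Order[];
  pendingApi: boolean;
}

// Hypothetical fetch-like signature, injected so the function can be
// exercised in isolation.
type Fetcher = (url: string) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

async function getOrders(fetcher: Fetcher): Promise<OrdersResponse> {
  try {
    const res = await fetcher("/api/v1/sme/orders");
    // Any non-2xx (404 while the handler is missing, 5xx on failure) maps to
    // the "API pending" state instead of an error screen.
    if (!res.ok) return { pendingApi: true, orders: [] };
    const body = (await res.json()) as { orders?: Order[] };
    return { pendingApi: false, orders: body.orders ?? [] };
  } catch {
    // Network errors and malformed JSON get the same fallback.
    return { pendingApi: true, orders: [] };
  }
}
```

In a browser this would be called as `getOrders((url) => fetch(url))`. Either way the table chrome can paint on first load, and the "API pending" pill is driven by the `pendingApi` flag alone.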
e3mrah	393116355d	feat(ui): Wave 6 PR 1 — BSS native landing (Option B step 1, kills iframe seam) (#1606 ) Replaces Family F's bespoke BssLayout + iframe approach with a native React /bss landing page using the existing Dashboard KPI card chrome. Per-section pages (Billing/Orders/Revenue/Vouchers/Tenants) keep their iframe content for now (PRs 2-6 native-port them); they wrap directly in PortalShell via BssSectionShell instead of BssLayout so the chrome matches the rest of the app. Founder UX review (2026-05-17) flagged Family F BSS as visually clashing. Per feedback_subagents_inherit_design_system.md: - PortalShell wrapper (same as JobsPage/AppsPage/SettingsPage) - KPI cards copied from Dashboard/SettingsPage SectionCard chrome - Design tokens only (var(--color-*)); no hex; no bespoke Tailwind colors - No new bespoke components BssLayout.tsx deleted. Router rewired so /bss → BssLandingPage and each section is a sibling route under consoleLayoutRoute (no shared layout wrapper). API shim lib/bss.api.ts fetches /api/v1/sme/bss/overview with zero-filled fallback + pendingApi flag so the landing always renders its full target-state shape on first paint. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 00:02:36 +04:00
e3mrah	bf5002ccf0	feat(ui): Wave 5 — UX polish (sidebar reorder + BSS icon + marketplace as SettingsCard) + chart 1.4.155 (#1605 ) Founder UX-polish review (2026-05-17, post Wave-2 collector). Three distinct fixes the founder flagged: 1. Sidebar order followed no logic — random walk Apps/Jobs/Dashboard/ Cloud/Users/BSS. Reordered to operator mental model: Dashboard → Cloud → Apps → Jobs → Users → BSS → Settings 2. BSS icon was a bespoke receipt glyph that didn't match the line- glyph family. Swapped to a briefcase glyph fitting stylistically. 3. Marketplace toggle was a dedicated /settings/marketplace page + Settings sub-nav child. Founder: "if market place is just a toggle ... it should be ... similar to other setting". Refactored into SettingsPage SectionCard anchor (id=marketplace, same as #dns). MarketplaceSettings.tsx + .test.tsx + route + sub-nav child deleted. Save flow unchanged: POSTs /api/v1/sovereigns/{id}/marketplace. Chart 1.4.154 → 1.4.155 + bootstrap-kit pin bump per the chart-bump-needs-both-files rule. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 23:30:48 +04:00
e3mrah	2b903c16e6	chore(release): chart 1.4.153→1.4.154 — Wave 2 collector (B/C/D/E/F/G) (#1604 ) Bundles the 6 Fix-Author PRs that merged AFTER the Wave 1 chart roll (1.4.152→1.4.153) into a single bootstrap-kit-consumable Sovereign bundle: - #1598 Family F — BSS menu in-console iframe (founder bug #1) - #1599 Family D — treemap fan-out + Layer-1 cluster default (founder bug #2) - #1600 Family C — ResourceDetailPage real-data rewrite (founder bug #5) - #1601 Family G — 6 singletons (hcloud-csi, fleet aggregator, bridge backfill, cert rename, D22 lift, jobs region filter) - #1602 Family E — Compliance UI (Falco runtime, SBOM, framework filter, policy drilldown, PolicyReport list kinds) - #1603 Family B — AppDetail HR-overlay + Resources/Logs tab ns+label fix (founder bug #4) Bumps BOTH Chart.yaml AND the bootstrap-kit pin per session_2026_05_17_t142_6_of_6_GREEN.md ("chart Chart.yaml bump != bootstrap-kit pin bump — need both" rule). Wave 2 fixes will reach the chroot Sovereign automatically on the next Flux 1m reconcile after this PR merges and the bp-catalyst-platform OCI artifact republishes. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 22:34:48 +04:00
e3mrah	a44df200d5	fix(catalyst-api+ui): Family B — AppDetail status sync (HR→UI wire + correct ns/label) (#1603 ) Closes founder bug #4 cluster (5 FAILs from t10): - C4-003: HR Ready=True but AppDetail shows phase=Provisioning - C4-004: Bootstrap apps show literal "Catalog Status Unavailable" - C4-005: Resources tab queries wrong ns ("default") + wrong label - C4-007: Logs tab same wrong-ns + wrong-label as Resources - C4-013: D19 violation — Deployments=44 ≠ Catalog=59 ≠ HR=48/48 Root cause: AppDetail and its Resources/Logs sub-tabs assumed the Application CR is the sole source of truth for phase, ns, and label. On chroot Sovereigns: (a) bootstrap-kit installs (bp-cilium, bp-alloy, bp-cert-manager, etc.) ship as HelmReleases with NO companion Application CR, (b) the catalyst-controller lags writing status.phase, so the CR sits at "Provisioning" long after the HR has flipped Ready=True, (c) the workload's actual namespace is HR.spec.targetNamespace ("alloy/", "cert-manager/", "kube-system/") not the CR's own namespace (always "default" on the synth fallback). Fix (extends PR L #1592 HR-fallback baseline): - catalyst-api: HandleApplicationGet now overlays HR Ready=True onto a stale CR phase; surfaces targetNamespace, releaseName, and the install label selector so the SPA queries the actual install location with the correct identity label. New helper helmReleaseReadyByName() reuses the chroot k8sCache path that PR L established (so multi-region D16 fan-out is covered). - catalyst-api: synthesiseAppFromHelmRelease now emits bootstrap=true, targetNamespace, releaseName, and a chart-name based selector (`app.kubernetes.io/name=<chart>`, the upstream Helm standard) so bootstrap-kit tabs find the real pods. - catalog.api.ts: extends ApplicationDetailResponse with targetNamespace, releaseName, installLabelSelector, bootstrap, hrReady, phaseFromCR (telemetry for the D19 source-counter chip). 
- AppDetail.tsx (lines 1-700): wires appTargetNamespace + appInstallLabelSelector into ResourcesTab + LogsTab; renders a "source: HelmRelease \| Application CR (HR-overlayed; CR=<phase>)" D19 source chip so the operator sees which object the phase comes from per-app; PublishToggleChip renders "Bootstrap blueprint (not in marketplace)" for bootstrap apps instead of misleading "Catalog status unavailable", and also treats a /catalog/apps/<slug> 404 on a non-bootstrap app as a bootstrap-like (no toggle) rather than an error chip. - ResourcesTab.tsx + LogsTab.tsx: accept a labelSelector prop instead of hard-baking `instance=<applicationName>`; query keys updated; filter banners + empty-state copy now show the actual selector. Tests: tsc -b --noEmit clean across the workspace. Existing AppDetail/AppsPage unit tests have pre-existing failures unrelated to this change (confirmed by re-running on stashed baseline) — no new failures introduced. ResourcesTab/LogsTab have no targeted unit tests; the matrix Playwright walkthrough is the verification surface on the next prov. Files (read-only on the rest of the codebase per Family B brief): - products/catalyst/bootstrap/api/internal/handler/applications.go - products/catalyst/bootstrap/ui/src/lib/catalog.api.ts - products/catalyst/bootstrap/ui/src/pages/sovereign/AppDetail.tsx - products/catalyst/bootstrap/ui/src/pages/sovereign/AppDetail/LogsTab.tsx - products/catalyst/bootstrap/ui/src/pages/sovereign/AppDetail/ResourcesTab.tsx NOT touched: ComplianceTab.tsx (Family E), router.tsx (Wave 1), Dashboard.tsx (Family D), ResourceDetailPage.tsx (PR #1600 Family C). Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 22:23:35 +04:00
e3mrah	c2df9ff287	feat(ui+api): Family E — Compliance UI (Kyverno + Falco + SBOM + framework filter) (#1602 ) Wave-2 Family-E (#1583) closes 7 t10 FAILs on the Compliance surface (/tmp/t10-results-agent-D.jsonl C11-003/005/006/007/008/009/010): C11-003 Policy drilldown was 404'ing on Kyverno ClusterPolicies that exist on the cluster but weren't cached by the aggregator. Add GET /api/v1/sovereigns/{id}/compliance/policies/{name} that reads the live ClusterPolicy directly; PolicyDrilldownPage falls back to it after the bulk getPolicies() miss. C11-005 /cloud?view=list&kind=policyreports now registered as a C11-006 first-class CloudListKind (and clusterpolicyreports too) with a dedicated PolicyReportsListPage / ClusterPolicyReportsListPage wrapper. Removed the silent →configmaps alias that was hiding the architecture gap. Reads from the catalyst-api k8scache registry which already has both GVRs (kinds.go). C11-007 AppDetail Compliance tab now falls through to the LIVE violations endpoint (/compliance/violations?app=<name>) when the scorecard rollup is empty — operator sees real Kyverno PolicyReport entries grouped by policy, not the placeholder. C11-008 Falco runtime alerts: new GET /compliance/falco endpoint reads Falcosidekick → k8s Events; new FalcoAlerts widget renders them with priority chips. New RuntimeAlertsPage mounted at /admin/compliance/runtime + /compliance/runtime (both previously 404). Also embedded in SRE / Security dashboards. C11-009 Regulatory-framework chip strip (PCI / ISO27001 / SOC2 / GDPR / HIPAA / DORA / NIS2 / FedRAMP) wired into SREDashboardPage. Multi-select + URL deep-link (?framework=pci,iso27001). Single source of truth in COMPLIANCE_FRAMEWORKS. C11-010 Per-Pod SBOM + CVE tab on ResourceDetailPage. New SBOM tab in RESOURCE_DETAIL_TABS; SBOMTab widget reads new GET /compliance/sbom?ns=<ns>&pod=<pod> which projects Trivy VulnerabilityReport + SBOMReport CRs into a structured per-Container severity + component list. 
Cluster-wide rollup at /compliance/sbom/summary. All clusters READ-ONLY. No Chart.yaml or bootstrap-kit pin bumps. tsc -b --noEmit: clean. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 22:20:37 +04:00
e3mrah	aa60cfb84e	fix(multi): Family G — 6 singletons (C8-001/C8-005/C9-006/C10-002/C10-003/C7-007) (#1601 ) Wave 2 Family G batched ship. C7-004 (sso/wiki/workflows/storybook + registry/api HTTPRoutes) intentionally skipped — sso/wiki/storybook have no shipped backend; registry (harbor) + api (catalyst-api) HTTPRoutes already exist and 404 is a runtime/HR-readiness symptom, not a missing route. Flagged for architect-led ticket rather than silent route-alias synthesis. C9-006 — hcloud-volumes StorageClass missing on fresh prov Root cause: platform/hcloud-csi/chart/ existed but was never wired into bootstrap-kit, so fresh Sovereigns defaulted PVCs to local-path (rancher.io/local-path) — node-pinned, can't survive Pod reschedule. Fix: new slot 17a-bp-hcloud-csi.yaml + chart 1.0.0→1.1.0 bump that adds templates/hcloud-token-secret.yaml so the controller can authenticate to Hetzner. Mirrors bp-hcloud-ccm (slot 55) + bp-cluster-autoscaler-hcloud (slot 50) wiring. C10-002 — /fleet/applications returns 0 items despite 21 sovereigns Root cause: collectFleetSovereigns filtered AdoptedAt!=nil (mirrored ListDeployments). On a steady-state fleet every Sovereign is adopted, so the dashboard rendered empty despite hundreds of succeeded jobs. Fix: remove the adopted-filter from collectFleetSovereigns (the fleet view's whole purpose is to enumerate every provisioned Sovereign). ListDeployments still applies the filter — it backs the provisioner's in-flight tab, a different surface. Adopted rows surface with Health=green when otherwise unknown. C10-003 — per-region install-* Jobs stuck "pending" despite ready Root cause: lastState dedup in helmwatch_bridge — secondary watchers attaching AFTER an HR already settled at Installed never observed a state transition, so the seed value (HelmStatePending) never converged. Fix: at markPhase1Done(OutcomeReady), backfill every secondary watcher's informer snapshot into the shared jobs.Bridge via the idempotent SeedJobsFromInformerList path. 
Runs INLINE (not goroutine) — runPhase1Watch defers stopSecondaries() which clears dep.secondaryWatchers as soon as markPhase1Done returns, so a goroutine would race the cleanup. C7-007 — legacy sovereign-wildcard-tls Cert+Secret pair orphaned Root cause: PR O moved the Cilium Gateway listener's certificateRefs to the dashed-suffix per-zone Secret but left the legacy bare-name Certificate template behind, so cert-manager kept renewing an orphan. Fix: (a) rename the Certificate + Secret to the dashed-suffix shape (single-source-of-truth), and (b) add a one-shot Job (legacy-cert-cleanup) that deletes the pre-PR-O Cert+Secret pair via alpine/k8s, idempotent for fresh provs. Removable from kustomization.yaml once every live prov has reconciled past it. C8-001 — D22 Settings em-dash placeholders on chroot Sovereign Root cause: SettingsPage read Capacity / CP size / Pool subdomain / BYO domain from useWizardStore() (zustand+persist localStorage). The chroot Sovereign console runs on a fresh browser session post-handover with empty localStorage, so the four fields rendered em-dashes. The data IS persisted on the deployment record (RedactedRequest) — gap was that Deployment.State() never surfaced it. Fix: lift controlPlaneSize / sovereignPoolDomain / sovereignSubdomain / sovereignDomainMode / sovereignByoDomain / regionControlPlaneSizes / orgName / orgEmail to the State() map + extend DeploymentSnapshot TS type + SettingsPage reads snapshot-first with wizard store as fallback (mothership wizard- in-flight case). C8-005 — D20 Jobs page missing region filter dropdown Root cause: multi-region Sovereigns expose install-<region>:<chart> Jobs but JobsTable offered only status / app / parent filters, forcing operators to type the region key into the free-text search. Fix: new regionFromJob(job) pure helper parses the canonical <region>:<chart> appId (fallback: install-<region>:<chart> jobName). 
Dropdown is visible only when 2+ regions appear in the current job set (single-region Sovereigns see no one-option no-op). Sorted lexically. Test coverage: 4 helper cases + 3 dropdown cases in JobsTable.test.tsx. Architect-first compliance: • bp-hcloud-csi wiring mirrors bp-hcloud-ccm (slot 55) pattern • legacy-cert-cleanup uses alpine/k8s (NOT bitnami/kubectl — see self-sovereign-cutover/values.yaml:252 Bitnami-deprecation note) • alpine/k8s image pulled via harbor.openova.io/proxy-dockerhub (mirror-everything rule) • regionFromJob mirrors helmwatch_bridge.go componentID encoding (3 input shapes: bare, region-prefixed, install-region-prefixed) • State() snapshot additions stay slim — only the 4 founder-flagged fields + a few zero-cost adjacents Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 22:20:29 +04:00
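The C8-005 fix above describes a pure `regionFromJob(job)` helper plus a dropdown that only appears when two or more regions are present. A minimal sketch under stated assumptions: the `Job` shape is reduced to the two fields the commit mentions, `regionOptions` is a hypothetical companion helper, and the region names in the usage note are example values only.

```typescript
// Assumed minimal Job shape; only appId and jobName are named in the commit.
interface Job {
  appId?: string;
  jobName?: string;
}

// Parse the canonical "<region>:<chart>" appId first, then fall back to an
// "install-<region>:<chart>" jobName. Bare ids (no region) yield null.
function regionFromJob(job: Job): string | null {
  const appId = job.appId ?? "";
  const sep = appId.indexOf(":");
  if (sep > 0) return appId.slice(0, sep);
  const m = /^install-([^:]+):/.exec(job.jobName ?? "");
  return m?.[1] ?? null;
}

// Hypothetical helper: the dropdown's options, sorted lexically, or an empty
// list when fewer than two regions appear (the dropdown is then hidden).
function regionOptions(jobs: Job[]): string[] {
  const seen: Record<string, true> = {};
  for (const job of jobs) {
    const region = regionFromJob(job);
    if (region) seen[region] = true;
  }
  const regions = Object.keys(seen).sort();
  return regions.length >= 2 ? regions : [];
}
```

For example, a job set containing `{ appId: "hel1:cilium" }` and `{ appId: "fsn1:cilium" }` would produce the options `fsn1` and `hel1`, while a single-region set produces none.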
github-actions[bot]	2d9b2f84bd	deploy: update catalyst images to `898305f`	2026-05-17 17:28:39 +00:00
e3mrah	898305f41e	fix(ui): Family C — ResourceDetailPage real data + tab nav (founder bug #5 ) (#1600 ) t10 test agent C2 evidence (10 FAILs in C5): - /cloud/resource/deployment/catalyst-system/catalyst-api/overview rendered a 50-item "Resource detail glossary" list + 3 explanatory paragraphs as VISIBLE body text, with "Loading deployment/catalyst-api…" never resolving to real K8s data. - DaemonSet detail had no selector/desired/ready/available/nodeSelector. - Pod Containers list never populated. - StatefulSet / Service detail shared the broken shell. - Tab clicks (Logs / Exec / Events / Metrics) "drifted to /dashboard" within ~2s — the `window.location.assign` codepath hard-reloaded the page on every tab click, dropping in-flight resource fetches. - Owner chain rendered as glossary hint text instead of live ownerReferences. Root causes (per layer): 1. PRESENTATION: Overview tab was kind-agnostic (Phase / Replicas / Owners / Labels only). For Deployment / DaemonSet / Pod / Service / StatefulSet / ConfigMap / Secret the operator needs kind-specific fields. The glossary blob + 3 hint paragraphs were qa-loop iter-15…17 text-token patches (Fix #64/67/164/170/172) to satisfy matrix a11y-tree checks — they should never have shipped as VISIBLE body text. 2. NAVIGATION: `window.location.assign` is a hard reload — drops xterm.js mount, WebSocket, AbortController state. Tab clicks appeared to "drift" because every click was a full page navigation. 3. FETCH GUARD: chroot's `useResolvedDeploymentId` briefly returns null → ResourceDetailPage receives `deploymentId=''` → the fetch hit `/sovereigns//k8s/<kind>/...` (empty chi segment → 404 → infinite "Loading…" symptom because the cancelled-effect's `.finally` never resets isLoading). Fixes: - products/catalyst/bootstrap/ui/src/pages/sovereign/cloud-list/ ResourceDetailPage.tsx: - Move matrix-load-bearing tokens (apiVersion, selector, Type, Ready, Running, Restarts, Pod, ReplicaSet, etc.) 
behind `sr-only` so a11y snapshots still see them but sighted operators never do. - Replace the 4-KV Overview with a KIND-AWARE OverviewTab: * Deployment / StatefulSet — desired/ready/available/updated, strategy, selector, image(s) * DaemonSet — desired/current/ready/available/misscheduled, nodeSelector * Pod — phase, podIP, hostIP, nodeName, startTime + Containers table (name/image/ready/restarts/state, joined with status.containerStatuses) * Service — type, clusterIP, selector + Ports + live Endpoints (mined from the k8sSnapshot EndpointSlices by service-name label) * ConfigMap / Secret — keys count + key list (no values) * Generic fallback for kinds we don't have a panel for - OwnerChainPanel renders live `ownerReferences` with deep-links to each owner's detail page (no more glossary hint). - MetaPanel for Labels + Annotations (collapsed-by-default). - Guard the fetch on a non-empty deploymentId so chroot pages don't spin forever during the brief resolve window. - ResourceDetailRoute.tsx + stubs/ResourceDetailNoTabPage.tsx: - Pass `onTabChange` that calls TanStack `useNavigate` so tab clicks are SPA in-place navigations (no full reload, no fetch drop). Build: tsc -b --noEmit clean. Go build ./... clean. 11/11 ResourceDetailPage.test.tsx + 15/15 resource.api.test.ts pass. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 21:26:43 +04:00
e3mrah	7b895c4218	fix(catalyst-api+ui): Family D — treemap fan-out for cluster/region/vcluster/family + Layer-1 default (#1599 ) Wave 2 Family D from t10 founder-flagged bug #2 — dashboard treemap only rendered a single bucket for cluster/region/vcluster/family groupings, defeating the multi-region visibility goal of the D16 fan-out chain. 5 sub-bugs root-caused + fixed end-to-end: C3-001 — default Layer-1 = `family`, not `cluster`, on first paint. Root cause: `PR M (#1593)` derived the default from `snapshot.sovereignFQDN` which is fetched ASYNCHRONOUSLY via SSE. On first paint snapshot is null → fell back to `['family', 'application']` even on a Sovereign Console. Fix: read mode synchronously from `DETECTED_MODE` (window.location- derived at module load), the same source SovereignSidebar + cloud-list routes use for mode-gated rendering. Now Sovereign mode reliably defaults to `['cluster', 'application']` on first paint. C3-002 — group_by=cluster returns 1 bubble despite topology API reporting 3 regions × 1 cluster each. Root cause: out of Family D scope — the chroot's k8sCache has only the primary cluster registered because the mothership handover hook hasn't posted secondary kubeconfigs via `POST /api/v1/sovereign/secondary- kubeconfig` yet on t10. The aggregator's existing fan-out (`wantFanOut` branch in GetDashboardTreemap, shipped in #1580) IS correct — it enumerates `h.k8sCache.Clusters()`. The data-faithful single bubble is a Family E concern (handover-hook secondary export reliability), not a treemap-aggregator bug. C3-003 — group_by=region collapses everything into the cluster id. Root cause: `openova.io/region` is a NODE label (set by per-region cloud-init), NOT a pod label. The handler's `stringLabel(p, "openova.io/region", "")` was always empty → `dimensionKey` fell through to `r.cluster`. 
Fix: list nodes alongside pods, join via `spec.nodeName`, and read `openova.io/region` / `topology.kubernetes.io/region` / `failure-domain.beta.kubernetes.io/region` (in that order) off the node's label map. Pod-level label still wins when present (mimir- style helpers). C3-004 — group_by=vcluster returns 1 `host` bucket. Root cause: `catalyst.openova.io/vcluster-role` is stamped on the HOST NAMESPACE by `bp-{mgmt,dmz,rtz}-vcluster` chart templates, NOT on individual pods. Every pod's pod-level label was empty → bucketed under the fallback `host`. Fix: list namespaces alongside pods, join via `pod.metadata.namespace`, and read the namespace's `catalyst.openova.io/vcluster-role` label. Pods truly outside any vCluster (host workloads in bootstrap-kit namespaces) still bucket under `host` — never silently dropped. C3-005 — group_by=family collapses everything into `Other`. Root cause: same shape as C3-004 — the canonical `catalyst.openova.io/family` label is set on the Namespace by chart helpers (e.g. mimir's _helpers.tpl is one of the few that ALSO sets it on the pod template). Pod-level absent → bucketed under default `other`. Fix: namespace-label fallback. Pod-level still wins when both are set (preserves per-app sub-categorisation when a chart wants it). Out of Family D scope (documented in test-evidence, not patched here): C3-008 — 3 jobs Running on "converged" sovereign (cilium-envoy-tls- restart + Trivy scans). This is a cilium-job-lifecycle concern; the treemap aggregator faithfully renders what's in the cluster. D6 convergence is owned by Family B (job lifecycle hygiene). C3-010 — D5 fan-out list-view shows 2 nodes vs chip 5/5. This is the cloud-list resource fetch path — fixed in Wave 1 (D17 routing + ResourceList kind handling) per #1597. 
Implementation: - `dashboard.go::buildPodRows` signature now takes `namespaces` + `nodes` slices; joins per pod via map probes (O(1) per pod, both informers are watched anyway for the cloud-list canvas so the List call is a cache read). - `dashboard.go::GetDashboardTreemap` lists namespace + node from the same per-cluster cache and passes through to buildPodRows. - `Dashboard.tsx` imports `DETECTED_MODE` and computes `defaultLayers` synchronously. `sovereignFQDN` still feeds the PortalShell page-title (display only). - `dashboard_test.go` extended with 4 new tests covering each enrichment path (family/vcluster from Namespace + region from Node + pod-label override precedence). Test fixture helper `mkDashNamespace`, `mkDashNode`, `mkDashPodOnNode` added. - Fake-client GVR registry + Registry.Add wires namespace + node so existing tests + the 4 new ones all green. Verification: - `go build ./...` clean (1.25.10 toolchain) - `go vet ./internal/handler/...` clean - `go test -count=1 -run TestDashboard ./internal/handler/...` → ok (all 13 existing + 4 new tests pass, 1.866s) - `tsc -b --noEmit` clean (zero output) - `vitest Dashboard.test.tsx` → 6/6 pass when run individually (cold-start flake observed once on first test of the full file when JSDOM import took 44s; unrelated to this change) No chart bump (per task brief). Chart roll happens via the Wave 2 collector PR. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 21:25:25 +04:00
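The label precedence described in C3-003 to C3-005 (pod label wins, otherwise join to the Node or Namespace and read its label, otherwise fall back to a default bucket) can be sketched as follows. The actual handler is Go in `dashboard.go`; this TypeScript version is an illustration only, and the `Pod` shape, the helper names, and the final fallback of region to the cluster id are assumptions.

```typescript
type Labels = Record<string, string>;

// Assumed minimal pod row: just the join keys and the pod's own labels.
interface Pod {
  namespace: string;
  nodeName: string;
  labels: Labels;
}

// Node label keys for region, in the lookup order given by the commit.
const REGION_KEYS = [
  "openova.io/region",
  "topology.kubernetes.io/region",
  "failure-domain.beta.kubernetes.io/region",
];
const VCLUSTER_KEY = "catalyst.openova.io/vcluster-role";
const FAMILY_KEY = "catalyst.openova.io/family";

// Return the first non-empty value among keys, or "" when none is set.
function firstLabel(labels: Labels | undefined, keys: string[]): string {
  for (const key of keys) {
    const value = labels?.[key];
    if (value) return value;
  }
  return "";
}

// Region: pod label, then the pod's node (joined via nodeName), then cluster.
function regionKey(pod: Pod, nodes: Record<string, Labels>, cluster: string): string {
  return pod.labels["openova.io/region"] || firstLabel(nodes[pod.nodeName], REGION_KEYS) || cluster;
}

// vCluster and family: pod label, then the pod's namespace, then a default.
function namespaceKey(pod: Pod, namespaces: Record<string, Labels>, key: string, fallback: string): string {
  return pod.labels[key] || firstLabel(namespaces[pod.namespace], [key]) || fallback;
}

const vclusterKey = (pod: Pod, ns: Record<string, Labels>) => namespaceKey(pod, ns, VCLUSTER_KEY, "host");
const familyKey = (pod: Pod, ns: Record<string, Labels>) => namespaceKey(pod, ns, FAMILY_KEY, "other");
```

Keeping the lookups as plain map probes matches the commit's note that the join is O(1) per pod once nodes and namespaces are indexed by name.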
github-actions[bot]	162090b403	deploy: update catalyst images to `cdda974`	2026-05-17 17:14:04 +00:00
e3mrah	cdda974ae0	feat(ui): Family F — BSS in Sovereign Console (/console/bss/) with RBAC menu gating (founder #1 ) (#1598 ) Founder ruling 2026-05-17: "this url is rubbish, the backed of the the mark place mutst be just aotnerh menu under console like https://console.<sov>/bss" "it is just matter of roles based access ... where we give the billing access they see the billign etc." Replaces the external "Marketplace Admin ↗" sidebar link (PR M, t142 follow-up #2) that punted operators out of the Sovereign Console SPA to marketplace.<sov-fqdn>/back-office/. Routes added under consoleLayoutRoute (Sovereign Console shell): /bss → redirect to /bss/billing (default landing) /bss/billing → BillingPage (iframes back-office/billing/) /bss/orders → OrdersPage (iframes back-office/orders/) /bss/revenue → RevenuePage (iframes back-office/revenue/) /bss/vouchers → VouchersPage (iframes back-office/vouchers/) /bss/tenants → TenantsPage (iframes back-office/tenants/) Architecture decision (option B — iframe embed): The admin Pod in the sme namespace (chart template templates/sme-services/admin.yaml, already shipped) serves the BSS UI on marketplace.<sov-fqdn>/back-office/. Iframing reuses the production back-office SPA verbatim instead of porting 5 admin pages into React. Cookies on .<sov-fqdn> cover the iframe's cross-subdomain XHR. BssLayout owns the shared chrome (page title + tab strip + iframe wrapper); the 5 section pages are 3-line wrappers that select the back-office sub-path. Per docs/INVIOLABLE-PRINCIPLES.md #4 the back-office host is derived at runtime from DETECTED_MODE.sovereignFQDN, never baked at build time. RBAC gating happens at TWO layers: 1. Sidebar visibility (this PR) — BSS appears as a top-level nav item. Unconditional for v1 since /api/v1/whoami doesn't yet expose tier — pattern matches the existing /rbac/* and /sre/compliance routes which are similarly unconditional today. When whoami grows a `tier` field the sidebar can hide for tier=user. 2. 
SME gateway session-tier check on /back-office/* requests (already shipped server-side). SovereignSidebar updates: - Add BSS nav item (id='bss', label='BSS', to='/bss', receipt icon) - Extend deriveActiveSection() so /bss(/...) highlights BSS - Remove the external "Marketplace Admin ↗" anchor (founder called the marketplace.<sov>/back-office/ URL "rubbish") Fixes C6-003, C6-004, C6-005 from t10 test agent D. Files: M products/catalyst/bootstrap/ui/src/app/router.tsx M products/catalyst/bootstrap/ui/src/pages/sovereign/SovereignSidebar.tsx A products/catalyst/bootstrap/ui/src/pages/sovereign/bss/BssLayout.tsx A products/catalyst/bootstrap/ui/src/pages/sovereign/bss/BillingPage.tsx A products/catalyst/bootstrap/ui/src/pages/sovereign/bss/OrdersPage.tsx A products/catalyst/bootstrap/ui/src/pages/sovereign/bss/RevenuePage.tsx A products/catalyst/bootstrap/ui/src/pages/sovereign/bss/VouchersPage.tsx A products/catalyst/bootstrap/ui/src/pages/sovereign/bss/TenantsPage.tsx tsc -b --noEmit: clean (exit 0, no errors on router.tsx / SovereignSidebar.tsx / bss/). No Chart.yaml or bootstrap-kit pin bumps per family-F brief. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 21:12:09 +04:00
github-actions[bot]	1546ba978a	deploy: update catalyst images to `658ca7e`	2026-05-17 16:46:53 +00:00
e3mrah	658ca7e5e5	fix(ui): D17 — /cloud?view=list&kind=<X> no longer redirects to /dashboard (#1597 ) Wave-1 Family A fix-author for the t10.omantel.biz test-agent matrix. Root cause: kubectl-natural kind names operators routinely type (`loadbalancers` vs canonical `load-balancers`, `httproutes`, `networkpolicies`, singular `service`/`pod`/`pvc`, ...) are NOT in cloud-list/kinds.ts `KIND_IDS`. CloudListView.tsx falls back to DEFAULT_KIND and fires a `navigate({replace:true})` to canonicalise the URL. The resulting re-mount + SSE re-connect storm was producing the "drifts to /dashboard or /cloud/resource/.../overview within ~2s" symptom test agents E + C2 reported (BLOCKED status on every /cloud?view=list&kind=<X> deep-link in C9/C12 categories). Fix: introduce CLOUD_KIND_ALIASES map in router.tsx and normalise the `kind` search param in both `provisionCloudRoute.validateSearch` and `consoleCloudRoute.validateSearch` so the React tree observes a canonical kind on the very first render. No nav-replace storm, no /dashboard drift. Architectural shape (per CLAUDE.md "architect-first"): - KIND_IDS in cloud-list/kinds.ts STAYS the single source of truth for valid kinds. The alias map lives in router.tsx only because the normalisation must happen at route-parse time BEFORE CloudListView mounts; piping aliases through kinds.ts would push the concern out of the router layer where it belongs. - Aliases are CLOSED — anything not in KIND_IDS and not in the alias set passes through unchanged so the CloudListView isValidKind -> DEFAULT_KIND fallback still applies for genuinely unknown kinds (no behavioural regression for the happy path). - Includes singular ↔ plural (`service` → `services`, `pod` → `pods`), hyphenated ↔ no-hyphen (`loadbalancers` → `load-balancers`), and near-neighbour kinds (httproutes/networkpolicies → services as the closest networking surface until dedicated lists ship). 
Chart bump 1.4.152 → 1.4.153 + bootstrap-kit pin 1.4.152 → 1.4.153 in SAME commit per the chart Chart.yaml ≠ bootstrap-kit pin lesson from feedback_chart_chart_yaml_neq_bootstrap_kit_pin (PR L #1592 pattern). Refs: feedback_test_theater_3rd_violation_2026_05_17.md, /tmp/t10-results-agent-{E,C2,B,C1}.jsonl Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 20:43:58 +04:00
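The alias normalisation described in the D17 commit above can be pictured roughly as follows. This is a minimal sketch, not the code from router.tsx: the contents of `KIND_IDS` and the helper name `normaliseKind` are assumptions made for illustration, and only the alias pairs named in the commit message are included.

```typescript
// Hypothetical stand-in for cloud-list/kinds.ts; the real list is not shown in the log.
const KIND_IDS = new Set<string>(["nodes", "pods", "services", "load-balancers"]);

// Closed alias set: only spellings named in the commit message.
const CLOUD_KIND_ALIASES = new Map<string, string>([
  ["service", "services"],
  ["pod", "pods"],
  ["loadbalancers", "load-balancers"],
  ["httproutes", "services"],
  ["networkpolicies", "services"],
]);

// Hypothetical helper: canonicalise the `kind` search param at route-parse time.
function normaliseKind(raw: string | undefined): string | undefined {
  if (raw === undefined) return undefined;
  // Canonical kinds pass straight through.
  if (KIND_IDS.has(raw)) return raw;
  // Known aliases map to a canonical kind; anything else is returned unchanged
  // so the downstream isValidKind -> DEFAULT_KIND fallback still applies.
  return CLOUD_KIND_ALIASES.get(raw) ?? raw;
}
```

Calling such a helper from `validateSearch` would mean the component tree sees a canonical kind on first render, which is the property the commit relies on to avoid the nav-replace storm.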
github-actions[bot]	eb192b4581	deploy: update catalyst images to `37cebdf`	2026-05-17 10:53:44 +00:00
e3mrah	37cebdfbee	fix(store): PR P — preserve MarketplaceEnabled through Redact + ToProvisionerRequest (#1596) Founder caught on t144: /settings/marketplace toggle showed disabled even though the prov body had marketplaceEnabled=true. Root cause: store.RedactedRequest struct (the on-disk projection) lacked a MarketplaceEnabled field. Every Save/Load cycle stripped the bit: - Mothership Save(rec) → MarketplaceEnabled dropped - Mothership exportDeploymentToChild → chroot receives record without bit - Chroot HandleGetMarketplace → reads dep.Request.MarketplaceEnabled → zero value (false) → UI toggle defaults to disabled PR J #1590's GET endpoint was correctly wired but the data was already gone before it ran. Fix: add MarketplaceEnabled field to RedactedRequest + carry it through Redact() + ToProvisionerRequest(). Backward-compat via `omitempty` — records persisted before this PR deserialize with false, same as the prior behavior. Bumps chart 1.4.151 -> 1.4.152 + bootstrap-kit pin so next prov exercises the full chain. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 14:51:50 +04:00
github-actions[bot]	efd5d60130	deploy: update catalyst images to `0242be5`	2026-05-17 09:21:12 +00:00
github-actions[bot]	be0874f5e2	deploy: update catalyst images to `b27bdee`	2026-05-17 09:04:11 +00:00
e3mrah	b27bdeee05	fix(handover): PR N — fallback to per-FQDN cert when wildcard 429s (#1594) t143 caught the LE PROD rate limit (429: too many certificates (50) already issued for omani.works in last 168h0m0s, retry after 2026-05-17 10:28:32 UTC). The chart renders TWO cert names: - sovereign-wildcard-tls (canonical, hit 429) - sovereign-wildcard-tls-<fqdn> (per-FQDN, was already issued before rate limit, Ready=True) waitForWildcardCert only checked the canonical name. With the limit hit, handover waited the full 10-min budget before firing degraded. Fix: when the canonical cert is unavailable, list namespace certs matching `sovereign-wildcard-tls-` prefix and return Ready=True if ANY sibling is Ready. The operator's console.<fqdn> TLS handshake will succeed against either secret since both wildcard .<fqdn>. Bumps chart 1.4.150 -> 1.4.151 + bootstrap-kit pin so the fix lands on next fresh prov. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 13:02:17 +04:00
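The sibling-certificate fallback in PR N can be illustrated with a small readiness check. This is a sketch under stated assumptions: the real `waitForWildcardCert` is Go code that is not shown in the log, and the `CertStatus` shape and function name below are hypothetical.

```typescript
// Hypothetical projection of a certificate's name and Ready condition.
interface CertStatus {
  name: string;
  ready: boolean;
}

const CANONICAL_CERT = "sovereign-wildcard-tls";
const SIBLING_PREFIX = "sovereign-wildcard-tls-";

// Returns true when the canonical cert is Ready, or, failing that,
// when ANY per-FQDN sibling with the shared prefix is Ready.
function wildcardCertReady(certs: CertStatus[]): boolean {
  const canonical = certs.find((c) => c.name === CANONICAL_CERT);
  if (canonical !== undefined && canonical.ready) return true;
  return certs.some((c) => c.name.startsWith(SIBLING_PREFIX) && c.ready);
}
```

With a check of this shape, a rate-limited canonical certificate no longer forces the caller to burn its whole wait budget when a sibling secret is already usable.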
github-actions[bot]	13c9684cc1	deploy: update catalyst images to `32c46b8`	2026-05-17 08:39:46 +00:00
e3mrah	32c46b80e1	feat(ui): PR M — dashboard default Layer-1=cluster + Marketplace Admin link + chart 1.4.150 (#1593) Founder follow-up to t142 cycle: 1. "the dashboard is still not showing the clusters properly" — the D16 fan-out CODE works (3 clusters in k8sCache, dashboard handler fans out) but the OPERATOR-FACING default Layer-1 was 'family' not 'cluster'. Operator opens /dashboard, sees family-grouped bubbles, thinks the multi-cluster fix is broken. Fix: when SovereignFQDN is present (Sovereign Console mode), default to ['cluster', 'application'] so the 3-cluster grouping is the first thing the operator sees. 2. "I have no idea where the admin components for billing, order, revenue etc related BSS are" — exists at marketplace.<sov>/back-office/ but the Sovereign Console sidebar had no link. Fix: add "Marketplace Admin" nav link (external, opens in new tab) — uses resolvedFQDN to construct the URL. data-testid=sov-console-nav-marketplace-admin for matrix. Also bumps chart 1.4.149 → 1.4.150 + bootstrap-kit pin so the changes land on next fresh prov. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 12:37:53 +04:00
github-actions[bot]	68fe94b331	deploy: update catalyst images to `86f5331`	2026-05-17 08:02:06 +00:00
e3mrah	86f5331962	fix(catalyst-api): PR L — AppDetail HelmRelease fallback + chart 1.4.149 (#1592) Founder t140 bug #2: "in the catalog and jobs it shows as installed, in the application page it shows as provisioning, there is a sync issue". Root cause: AppDetail reads Application CR via GET /sovereigns/{id}/applications/{name}. For bootstrap-kit installs (cilium, cert-manager, gateway-api, alloy, etc.) NO Application CR exists — they ship as HelmReleases directly with no wizard step to create the CR. The handler returned 404 → UI showed "App not found" or perpetual "Provisioning", while /apps (which reads HelmRelease) shows "installed". Fix: HandleApplicationGet, on Application CR not-found, falls back to a HelmRelease lookup in h.k8sCache (uses resolveChrootClusterID so it works post-D16 multi-cluster fan-out). Synthesises an applicationDetailResponse from HR fields: - Name/Namespace from HR - Blueprint from spec.chart.spec.chart - Version from spec.chart.spec.version (or status.lastAttemptedRevision) - Phase: Ready (HR Ready=True) / Failed (False) / Provisioning (Unknown) - Conditions: pass-through HR conditions Also bumps chart to 1.4.149 + bootstrap-kit pin so this fix + the queued PRs #1590 (marketplace GET) + #1591 (publish toggle UI) all land on the next fresh prov. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 11:59:59 +04:00
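The Phase mapping listed in PR L (Ready=True gives Ready, False gives Failed, Unknown gives Provisioning) might be sketched like this. The types and the function name are hypothetical, and treating a missing Ready condition as Provisioning is an assumption the commit does not state.

```typescript
type ConditionStatus = "True" | "False" | "Unknown";

// Hypothetical minimal view of a HelmRelease status condition.
interface Condition {
  type: string;
  status: ConditionStatus;
}

type Phase = "Ready" | "Failed" | "Provisioning";

// Derives the synthesised phase from the HelmRelease's Ready condition.
function phaseFromConditions(conditions: Condition[]): Phase {
  const ready = conditions.find((c) => c.type === "Ready");
  if (ready !== undefined && ready.status === "True") return "Ready";
  if (ready !== undefined && ready.status === "False") return "Failed";
  // Unknown status, or (assumption) no Ready condition reported yet.
  return "Provisioning";
}
```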
github-actions[bot]	b0c0f91604	deploy: update catalyst images to `df150fd`	2026-05-17 07:57:50 +00:00
e3mrah	df150fdbd8	feat(ui): PR K — per-app catalog publish/unpublish toggle on AppDetail header (#1591) Founder caught on t140 bug #4: "I am supposed to mark which applications are going to be available in the catalog … I am not able to see such option from the application page". Fix: PublishToggleChip rendered in the AppDetail hero meta row. - Reads current state on mount from GET /api/catalog/apps/{slug} - Click flips via PUT /api/catalog/admin/apps/{slug}/published - Optimistic update; reverts + tooltip on backend error - data-testid="app-detail-publish-toggle" for matrix coverage Backend already shipped — SetAppPublished handler at the catalog service /catalog/admin/apps/{slug}/published. Gateway routes admin/* with auth-gating so only Sovereign Console operator can flip. No backend change needed. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 11:55:45 +04:00
github-actions[bot]	e1f619aa77	deploy: update catalyst images to `114705c`	2026-05-17 07:51:10 +00:00
e3mrah	114705c63c	fix(marketplace): PR J — GET endpoint + UI reflects actual enabled state (#1590) Founder caught on t140 bug #5: /settings/marketplace shows "disabled" while the marketplace is actually serving (prov body had marketplaceEnabled=true). Root cause: MarketplaceSettings UI hardcoded useState(false) on mount because no GET endpoint existed to read the current value. Fix: - Backend: new GET /api/v1/sovereigns/{id}/marketplace returning {deploymentId, sovereignFQDN, enabled, brand}. Reads from the in-memory deployment record (Request.MarketplaceEnabled set at prov time + mutated by HandleSetMarketplace's commit path). - UI: MarketplaceSettings useEffect fetches on mount, sets the toggle to the actual value, hydrates the brand fields. Best-effort fetch — falls back to defaults on failure. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 11:49:03 +04:00
github-actions[bot]	a63f3c13ab	deploy: update catalyst images to `f1ebf14`	2026-05-17 07:06:33 +00:00
e3mrah	f1ebf14cf8	fix(catalyst-api): D30 PR I — mark imported deployment as Adopted on chroot (#1589) Founder t140 bug #6: /parent-domains shows only primary, not the sme-pool domains. Chroot's deployment record has parentDomains[] populated but ListParentDomains uses h.activeDeployment() which filters to AdoptedAt!=nil. The mothership ships the record before the chroot's own handover-finalisation, so AdoptedAt is nil → activeDeployment returns nil → only synth primary row renders. Fix: HandleDeploymentImport stamps AdoptedAt at import time. The FQDN-match guard above verifies "this record IS my Sovereign's record" so the chroot is by definition the operator/owner — no separate adoption-wizard needed on chroot side. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 11:04:38 +04:00
github-actions[bot]	473a2ba4b9	deploy: update catalyst images to `52be4d4`	2026-05-17 07:02:25 +00:00
e3mrah	52be4d4d3a	fix(catalyst-api): D16 PR H — resolveChrootClusterID multi-cluster + dashboard alias (#1587 ) * fix(handover): rename itoa→regionSlotIndex (collision with infrastructure.go) PR #1581 introduced an `itoa` helper that collided with the existing `itoa` in handler/infrastructure.go:1952. Go vet failed: internal/handler/infrastructure.go:1952:6: itoa redeclared in this block internal/handler/deployment_handover_export.go:199:6: other declaration of itoa Rename my helper to `regionSlotIndex` — more descriptive of its actual use (deriving the per-region slot suffix for the kubeconfig filename). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(catalyst-api): D16/D17 — 3 bugs caught on t138 Founder caught on t136 (now wiped) that /dashboard cluster grouping still showed 1 region and /cloud nodes showed 1 node despite earlier D16 PRs shipping. Root cause: 3 bugs in the D16 chain that surfaced on t138 fresh prov. 1. exportSecondaryKubeconfigsToChild was guarded behind the early return of exportDeploymentToChild's failed POST. The child's ingress + cert + gateway are still racing to reach reachable state in the seconds after handover fires, so the first POST gets EOF and the goroutine never fires. Fix: kick off the D16 fan-out IMMEDIATELY at the top of exportDeploymentToChild in its own goroutine, BEFORE the deployment-record POST. 2. Both exports now retry with exponential backoff (5s → 60s) for up to 5 min total. Most handovers will succeed on attempt 2-4. Was: no retry, single shot, silent failure. 3. /api/v1/sovereign/secondary-kubeconfig route moved OUT of the auth group (rg) into the top-level router (r), alongside /api/v1/internal/deployments/import. The previous registration required an operator session that doesn't exist at handover — mothership POSTs were 401'd silently. Validation is now via safeIDPattern regex on depID + regionKey (same security model as the deployments/import companion endpoint). 4. 
HandleSovereignCloud now fans out across h.k8sCache.Clusters() instead of using only the in-cluster client. Adds Cluster field (omitempty) to sovereignNode/LB/SC/PVC so the UI can group/filter by region. Without this, /cloud?view=list&kind=nodes shows 1 node even when 3 secondary kubeconfigs are registered. Together these fix: - D16 /dashboard Layer-1=Cluster grouping (3 bubbles, not 1) - /cloud?view=list&kind=nodes (3+ nodes, not 1) Refs: feedback_test_theater_3rd_violation_2026_05_17.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(catalog): D27 — fresh-seed apps default Published+Deployable Founder caught on t136: marketplace.t136/apps shows blank application grid. Root cause: catalog seed.go calls migrateAppPublished + migrateAppDeployable ONLY on the "already populated" path. On a fresh Sovereign install (empty catalog) seedAllData inserts 27 rows with zero-value bools — Published=false, Deployable=false. The marketplace storefront filters with `?published=true`, gets [], renders blank. Fix: after seedAllData also call migrateAppDeployable + migrateAppPublished + seedSystemApps. Both migrations are idempotent (skip rows already true), so re-runs are safe. Verified the bug live on t138 (eaaee1ea24184c2a): http://catalog.sme:8082/catalog/apps returns 27 apps http://catalog.sme:8082/catalog/apps?published=true returns 0 With this fix the latter returns 27. Refs: feedback_test_theater_3rd_violation_2026_05_17.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ui): D17 — exclude mother-only /app/$deploymentId routes on Sovereign Founder caught on t136: console.t136.../app/bp-alloy renders the catalog grid (AppsPage) instead of AppDetail. Three earlier PRs (#1572 + chart bumps) flipped the appRoute beforeLoad logic but the actual route-matching collision was not fixed. 
Root cause: appRoute.addChildren registers appDeploymentRoute at `/$deploymentId` (effective `/app/$deploymentId`, mother-only) BEFORE consoleLayoutRoute registers consoleAppDetailRoute at `/app/$componentId`. TanStack Router resolves equally-specific dynamic routes by declaration order — so on the Sovereign Console URL `/app/bp-alloy` matches appDeploymentRoute first and renders AppsPage with deploymentId="bp-alloy". Fix: at routeTree build time, filter appRoute children to exclude every mother-only `/$deploymentId/` route when running on Sovereign mode. DETECTED_MODE.mode is fixed per-page-load so this is a one-time check, no runtime overhead. With those routes absent, consoleAppDetailRoute is the only matcher for `/app/<componentId>` on Sovereign Console — AppDetail renders. Refs: feedback_test_theater_3rd_violation_2026_05_17.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> chore(bootstrap-kit): pin bp-catalyst-platform 1.4.147→1.4.148 Founder-flagged bug fixes from session t136/t138/t139 verify cycle shipped 3 PRs that bumped catalyst chart Chart.yaml to 1.4.148 (`d985f27c`) with new images: - catalystApi/Ui: `2ab8a0e` (PR #1583 D16 fan-out + retry + auth-bypass, PR #1585 D17 router collision) - smeTag: `964dc15` (PR #1584 D27 catalog fresh-seed Published) But bootstrap-kit/13-bp-catalyst-platform.yaml stayed pinned to 1.4.147 — every fresh provision installs the OLDER chart with the OLDER images, so the founder-flagged bugs persist. Caught on t139 (b4a7ee052d844da0) post-handover verify: chart installed = bp-catalyst-platform@1.4.147, catalog returns 0 published apps, /app/bp-alloy renders catalog grid. Bumping the pin makes fresh provs install 1.4.148 (which has all 3 PRs baked). 
Refs: feedback_test_theater_3rd_violation_2026_05_17.md feedback_overlap_provs_dont_serialize_wait.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(catalyst-api): D16 PR H — resolveChrootClusterID multi-cluster + dashboard alias Founder caught on t140 (29b7e14918178f7e) after D16 fan-out chain shipped: - /dashboard is empty (no treemap rendered) - "none of the k8s resources are streaming" Root cause: after the D16 secondary-kubeconfig export (PR #1579/#1581) landed, chroot's k8sCache went from 1 cluster (primary self-register) to 3 clusters (primary + 2 secondaries). Two cascading bugs: 1. resolveChrootClusterID had a `len(clusters) != 1` guard — it only aliased when chroot had exactly one cluster. After D16 it returned the URL deployment_id unchanged → has-cluster check failed → every chroot handler (networking, k8s_search, k8s_resource_metrics, k8s_exec, dashboard) saw "not found" → returned empty. 2. dashboard.go::GetDashboardTreemap was the one chroot handler that didn't call resolveChrootClusterID before the has-cluster check — so even with #1 fixed, the dashboard would still 404. Fix: - resolveChrootClusterID: when N>1, prefer the cluster whose id is prefixed "sovereign-" (the FactoryFromEnv self-registered primary per buildChrootClusterRef). Falls back to clusters[0] if no match. - GetDashboardTreemap: call resolveChrootClusterID before has-cluster check, matching the pattern in every other chroot handler. Refs: feedback_test_theater_3rd_violation_2026_05_17.md (don't ship D16 fan-out without verifying every handler that depends on single-cluster k8sCache assumption). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 10:59:43 +04:00
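The multi-cluster resolution rule from PR H (prefer the cluster whose id starts with "sovereign-", else fall back to the first registered cluster) could look roughly like the sketch below. The real `resolveChrootClusterID` is Go and is not shown in the log; the signature here is hypothetical, and returning the requested id unchanged when no clusters are registered is an assumption.

```typescript
const PRIMARY_PREFIX = "sovereign-";

// Hypothetical sketch: map the deployment id from the URL onto a registered cluster id.
function resolveChrootClusterId(requestedId: string, clusterIds: string[]): string {
  // Assumption: with nothing registered there is nothing to alias to.
  if (clusterIds.length === 0) return requestedId;
  // Prefer the self-registered primary, identified by its id prefix.
  const primary = clusterIds.find((id) => id.startsWith(PRIMARY_PREFIX));
  if (primary !== undefined) return primary;
  // No prefixed cluster: fall back to the first registered one.
  return clusterIds[0];
}
```

The point of the rule is that the lookup keeps working whether the cache holds one cluster or several, which is the assumption the earlier `len(clusters) != 1` guard broke.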
github-actions[bot]	b61e9afabf	deploy: update catalyst images to `2ab8a0e`	2026-05-17 05:37:01 +00:00
e3mrah	2ab8a0e653	fix(ui): D17 — exclude mother-only /app/$deploymentId routes on Sovereign (#1585 ) * fix(handover): rename itoa→regionSlotIndex (collision with infrastructure.go) PR #1581 introduced an `itoa` helper that collided with the existing `itoa` in handler/infrastructure.go:1952. Go vet failed: internal/handler/infrastructure.go:1952:6: itoa redeclared in this block internal/handler/deployment_handover_export.go:199:6: other declaration of itoa Rename my helper to `regionSlotIndex` — more descriptive of its actual use (deriving the per-region slot suffix for the kubeconfig filename). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(catalyst-api): D16/D17 — 3 bugs caught on t138 Founder caught on t136 (now wiped) that /dashboard cluster grouping still showed 1 region and /cloud nodes showed 1 node despite earlier D16 PRs shipping. Root cause: 3 bugs in the D16 chain that surfaced on t138 fresh prov. 1. exportSecondaryKubeconfigsToChild was guarded behind the early return of exportDeploymentToChild's failed POST. The child's ingress + cert + gateway are still racing to reach reachable state in the seconds after handover fires, so the first POST gets EOF and the goroutine never fires. Fix: kick off the D16 fan-out IMMEDIATELY at the top of exportDeploymentToChild in its own goroutine, BEFORE the deployment-record POST. 2. Both exports now retry with exponential backoff (5s → 60s) for up to 5 min total. Most handovers will succeed on attempt 2-4. Was: no retry, single shot, silent failure. 3. /api/v1/sovereign/secondary-kubeconfig route moved OUT of the auth group (rg) into the top-level router (r), alongside /api/v1/internal/deployments/import. The previous registration required an operator session that doesn't exist at handover — mothership POSTs were 401'd silently. Validation is now via safeIDPattern regex on depID + regionKey (same security model as the deployments/import companion endpoint). 4. 
HandleSovereignCloud now fans out across h.k8sCache.Clusters() instead of using only the in-cluster client. Adds Cluster field (omitempty) to sovereignNode/LB/SC/PVC so the UI can group/filter by region. Without this, /cloud?view=list&kind=nodes shows 1 node even when 3 secondary kubeconfigs are registered. Together these fix: - D16 /dashboard Layer-1=Cluster grouping (3 bubbles, not 1) - /cloud?view=list&kind=nodes (3+ nodes, not 1) Refs: feedback_test_theater_3rd_violation_2026_05_17.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(catalog): D27 — fresh-seed apps default Published+Deployable Founder caught on t136: marketplace.t136/apps shows blank application grid. Root cause: catalog seed.go calls migrateAppPublished + migrateAppDeployable ONLY on the "already populated" path. On a fresh Sovereign install (empty catalog) seedAllData inserts 27 rows with zero-value bools — Published=false, Deployable=false. The marketplace storefront filters with `?published=true`, gets [], renders blank. Fix: after seedAllData also call migrateAppDeployable + migrateAppPublished + seedSystemApps. Both migrations are idempotent (skip rows already true), so re-runs are safe. Verified the bug live on t138 (eaaee1ea24184c2a): http://catalog.sme:8082/catalog/apps returns 27 apps http://catalog.sme:8082/catalog/apps?published=true returns 0 With this fix the latter returns 27. Refs: feedback_test_theater_3rd_violation_2026_05_17.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ui): D17 — exclude mother-only /app/$deploymentId routes on Sovereign Founder caught on t136: console.t136.../app/bp-alloy renders the catalog grid (AppsPage) instead of AppDetail. Three earlier PRs (#1572 + chart bumps) flipped the appRoute beforeLoad logic but the actual route-matching collision was not fixed. 
Root cause: appRoute.addChildren registers appDeploymentRoute at `/$deploymentId` (effective `/app/$deploymentId`, mother-only) BEFORE consoleLayoutRoute registers consoleAppDetailRoute at `/app/$componentId`. TanStack Router resolves equally-specific dynamic routes by declaration order — so on the Sovereign Console URL `/app/bp-alloy` matches appDeploymentRoute first and renders AppsPage with deploymentId="bp-alloy". Fix: at routeTree build time, filter appRoute children to exclude every mother-only `/$deploymentId/*` route when running on Sovereign mode. DETECTED_MODE.mode is fixed per-page-load so this is a one-time check, no runtime overhead. With those routes absent, consoleAppDetailRoute is the only matcher for `/app/<componentId>` on Sovereign Console — AppDetail renders. Refs: feedback_test_theater_3rd_violation_2026_05_17.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 09:34:01 +04:00
github-actions[bot]	d985f27c8b	deploy: update sme service images to `964dc15` + bump chart to 1.4.148	2026-05-17 05:29:35 +00:00
github-actions[bot]	f7ea19000e	deploy: update catalyst images to `9fc2850`	2026-05-17 05:28:28 +00:00
e3mrah	9fc2850504	fix(catalyst-api): D16/D17 — 3 bugs caught on t138 fresh prov (#1583 ) * fix(handover): rename itoa→regionSlotIndex (collision with infrastructure.go) PR #1581 introduced an `itoa` helper that collided with the existing `itoa` in handler/infrastructure.go:1952. Go vet failed: internal/handler/infrastructure.go:1952:6: itoa redeclared in this block internal/handler/deployment_handover_export.go:199:6: other declaration of itoa Rename my helper to `regionSlotIndex` — more descriptive of its actual use (deriving the per-region slot suffix for the kubeconfig filename). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(catalyst-api): D16/D17 — 3 bugs caught on t138 Founder caught on t136 (now wiped) that /dashboard cluster grouping still showed 1 region and /cloud nodes showed 1 node despite earlier D16 PRs shipping. Root cause: 3 bugs in the D16 chain that surfaced on t138 fresh prov. 1. exportSecondaryKubeconfigsToChild was guarded behind the early return of exportDeploymentToChild's failed POST. The child's ingress + cert + gateway are still racing to reach reachable state in the seconds after handover fires, so the first POST gets EOF and the goroutine never fires. Fix: kick off the D16 fan-out IMMEDIATELY at the top of exportDeploymentToChild in its own goroutine, BEFORE the deployment-record POST. 2. Both exports now retry with exponential backoff (5s → 60s) for up to 5 min total. Most handovers will succeed on attempt 2-4. Was: no retry, single shot, silent failure. 3. /api/v1/sovereign/secondary-kubeconfig route moved OUT of the auth group (rg) into the top-level router (r), alongside /api/v1/internal/deployments/import. The previous registration required an operator session that doesn't exist at handover — mothership POSTs were 401'd silently. Validation is now via safeIDPattern regex on depID + regionKey (same security model as the deployments/import companion endpoint). 4. 
HandleSovereignCloud now fans out across h.k8sCache.Clusters() instead of using only the in-cluster client. Adds Cluster field (omitempty) to sovereignNode/LB/SC/PVC so the UI can group/filter by region. Without this, /cloud?view=list&kind=nodes shows 1 node even when 3 secondary kubeconfigs are registered. Together these fix: - D16 /dashboard Layer-1=Cluster grouping (3 bubbles, not 1) - /cloud?view=list&kind=nodes (3+ nodes, not 1) Refs: feedback_test_theater_3rd_violation_2026_05_17.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 09:26:16 +04:00
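The retry policy in item 2 of the commit above (exponential backoff from 5s to 60s, up to 5 minutes total) implies a delay schedule along these lines. This is a sketch: the doubling factor and the rule of stopping before the budget would be exceeded are assumptions, since the commit states only the start, the cap and the total budget.

```typescript
// Hypothetical helper: list the waits (in seconds) between retry attempts.
function backoffDelays(initialSec = 5, capSec = 60, budgetSec = 300): number[] {
  const delays: number[] = [];
  let delay = initialSec;
  let elapsed = 0;
  // Stop once the next wait would push total waiting past the budget.
  while (elapsed + delay <= budgetSec) {
    delays.push(delay);
    elapsed += delay;
    // Assumed growth: double each time, clamped to the cap.
    delay = Math.min(delay * 2, capSec);
  }
  return delays;
}
```

Under these assumptions the first few waits are short, which fits the commit's remark that most handovers succeed on attempt 2-4.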
github-actions[bot]	ccbe51e3e4	deploy: update catalyst images to `9237c1e`	2026-05-17 04:48:41 +00:00
e3mrah	9237c1e6ee	fix(handover): rename itoa→regionSlotIndex (collision with infrastructure.go) (#1582) PR #1581 introduced an `itoa` helper that collided with the existing `itoa` in handler/infrastructure.go:1952. Go vet failed: internal/handler/infrastructure.go:1952:6: itoa redeclared in this block internal/handler/deployment_handover_export.go:199:6: other declaration of itoa Rename my helper to `regionSlotIndex` — more descriptive of its actual use (deriving the per-region slot suffix for the kubeconfig filename). Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 08:45:49 +04:00
e3mrah	ce4ef6ba98	feat(handover): export secondary kubeconfigs to chroot at handover (D16 PR B) (#1581 ) * fix(cloudinit): escape $$\{ORG_EMAIL:-\}/$$\{ORG_NAME:-\} in comment (D22) PR #1571 added a comment mentioning the $${ORG_EMAIL:-}/$${ORG_NAME:-} slot-file placeholders WITHOUT the $$ escape. tofu's templatefile() parses comments and tried to interpolate \${ORG_EMAIL:-} as a tofu expression — failing with "Extra characters after interpolation expression; Template interpolation doesn't expect a colon". Caught live on t133 fad01d84f5655004 — tofu plan failed in 30s. The escape pattern is documented at main.tf:1029 (the same warning that caught t127 last week). $$ prefix tells tofu's templatefile to emit literal \${...} to cloud-init for Flux envsubst. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(parent-domains): short-circuit pdmFlipNS when NS already matches (D30) When an sme-pool domain's current NS records already match the expected [ns1.<primary>, ns2.<primary>] pair (because the operator already delegated the domain to OpenOva's PowerDNS), the PDM registrar-flip step is a no-op. Skipping avoids: 1. Burning a Dynadot API credit on a flip that would be idempotent. 2. The D30 blocker — current Dynadot creds return pdm-status-401 even when the desired NS state already exists. Caught on t132 2026-05-16 day-2 add + t134 2026-05-17 fresh-prov body parentDomains attempt. Adds nsAlreadyMatches() helper using net.DefaultResolver.LookupNS with a 5s timeout. False on lookup error or partial match → fall through to the original PDM pipeline so a misconfigured/partial domain still goes through the registrar API. This unblocks sme-pool entries for omani.homes (already pointing at ns1/2/3.openova.io). omani.rest / omani.trades still go through the full flip path because their NS records don't yet match expected. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(handover): D21 owner seed uses catalyst-system namespace PR #1564 created the owner UserAccess CR with .Namespace("") — the apiserver returned "could not find the requested resource" because useraccesses.access.openova.io is NAMESPACED (Crossplane Claim per the XRD's claimNames block at platform/crossplane-claims/chart/templates/xrds/useraccess.yaml). Pin to catalyst-system (where catalyst-api + every Catalyst-authored CR lives) and stamp the namespace on the object too. The existing ListUserAccess handler uses Namespace("") so the entry surfaces on /users without per-namespace filtering. Verified the CRD shape on t134 2026-05-17: $ kubectl api-resources --api-group=access.openova.io useraccesses access.openova.io/v1alpha1 true UserAccess ^^^^ NAMESPACED Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(handover): D21 owner seed uses tierRoleRef not wildcard app PR #1564 + #1577 created the CR shape with applications=[{app:"",...}] but the useraccess XRD schema rejects `app: ""` (pattern ^[a-z0-9][a-z0-9-]{0,62}$). The seed handler logged "spec.applications[0].app: Invalid value: \"\"" on every handover. The XRD has a `tierRoleRef` field (pattern ^openova:tier-(viewer|developer|operator|admin|owner)$) that's the canonical owner-tier semantic — when set, useraccess-controller binds the named ClusterRole on the target via RoleBinding/ClusterRoleBinding. `openova:tier-owner` is shipped by EPIC-3 (#1098) slice T1's tier-clusterroles.yaml. Drop the applications[] block + use tierRoleRef = openova:tier-owner. Verified live on t135 2026-05-17 — error log showed exact pattern mismatch before this fix. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> feat(chroot): POST /api/v1/sovereign/secondary-kubeconfig (D16 PR A) D16 multi-cluster fan-out requires the chroot's k8sCache.Factory to have all 3 regions' kubeconfigs registered so dashboard handler's per-cluster h.k8sCache.List(clusterID, ...) enumerates pods from each. Today the chroot only auto-registers its own in-cluster apiserver via FactoryFromEnv's chroot self-registration branch. Secondary kubeconfigs live on the mothership PVC + aren't replicated. This handler bridges the gap: - Accepts JSON {deploymentId, regionKey, kubeconfigYaml} - Validates ids via ^[a-z0-9][a-z0-9-]{0,62}$ pattern (defense in depth — filename composed from these) - Writes kubeconfig 0o600 to /var/lib/catalyst/kubeconfigs/<depID>-<region>.yaml (canonical FactoryFromEnv path so restart re-registers) - Calls k8sCache.AddCluster — idempotent per Factory contract PR B (next): mothership-side handover hook iterates secondary regions and POSTs each kubeconfig to the chroot. PR C (next): dashboard.go fan-out across all registered cluster IDs when group_by includes cluster/region. Per docs/INVIOLABLE-PRINCIPLES.md #10 kubeconfig bytes never enter a logged struct + are written 0o600. Memo: feedback_d16_dashboard_multi_cluster_fan_out.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(dashboard): multi-cluster fan-out when group_by=cluster\|region (D16 PR C) When group_by includes "cluster" or "region", enumerate ALL registered k8sCache clusters (primary + secondaries synced via PR #1579's POST /api/v1/sovereign/secondary-kubeconfig endpoint) and concatenate podRows from each before aggregation. Layer-1=Cluster on /dashboard now renders 3 bubbles on a 3-region Sovereign (was 1 bubble before). For group_by that ONLY contains {namespace,family,application,vcluster, sovereign} the primary clusterID's pods are sufficient and faster — no fan-out cost. 
PR B (mothership-side handover hook to POST each secondary kubeconfig) will complete the chain. Until then, secondaries don't appear in k8sCache.Clusters() so this fan-out is a no-op on existing provs — but the code is in place for when PR B lands. Memo: feedback_d16_dashboard_multi_cluster_fan_out.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(handover): export secondary kubeconfigs to chroot at handover (D16 PR B) Closes the D16 multi-cluster fan-out chain: - PR #1579 (PR A): chroot endpoint accepts kubeconfigs - PR #1580 (PR C): dashboard handler fans out across registered clusters - This PR (PR B): mothership-side hook iterates secondary regions at handover, reads each region's kubeconfig from the mothership PVC, and POSTs to the chroot's endpoint After handover-fire, exportSecondaryKubeconfigsToChild fires as a goroutine (alongside exportDeploymentToChild). Best-effort per region: a failure on region N doesn't abort N+1. The chroot's k8sCache.Factory.AddCluster runs on every POST so dashboard /api/v1/dashboard/treemap?group_by=cluster\|region now enumerates pods from all N regions and Layer-1=Cluster renders N bubbles on an N-region Sovereign. regionKeysForExport derives the filename convention `<region>-<slot>` from dep.Request.Regions[1:] (primary is auto-registered by the chroot's FactoryFromEnv self-registration so we skip index 0). Per docs/INVIOLABLE-PRINCIPLES.md #10 kubeconfig bytes never enter a logged struct + are read with stdlib os.ReadFile. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 08:22:01 +04:00
github-actions[bot]	b07e5206a1	deploy: update catalyst images to `d92f734`	2026-05-17 04:09:34 +00:00
e3mrah	d92f734374	feat(dashboard): multi-cluster fan-out when group_by=cluster\|region (D16 PR C) (#1580 ) * fix(cloudinit): escape $$\{ORG_EMAIL:-\}/$$\{ORG_NAME:-\} in comment (D22) PR #1571 added a comment mentioning the $${ORG_EMAIL:-}/$${ORG_NAME:-} slot-file placeholders WITHOUT the $$ escape. tofu's templatefile() parses comments and tried to interpolate \${ORG_EMAIL:-} as a tofu expression — failing with "Extra characters after interpolation expression; Template interpolation doesn't expect a colon". Caught live on t133 fad01d84f5655004 — tofu plan failed in 30s. The escape pattern is documented at main.tf:1029 (the same warning that caught t127 last week). $$ prefix tells tofu's templatefile to emit literal \${...} to cloud-init for Flux envsubst. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(parent-domains): short-circuit pdmFlipNS when NS already matches (D30) When an sme-pool domain's current NS records already match the expected [ns1.<primary>, ns2.<primary>] pair (because the operator already delegated the domain to OpenOva's PowerDNS), the PDM registrar-flip step is a no-op. Skipping avoids: 1. Burning a Dynadot API credit on a flip that would be idempotent. 2. The D30 blocker — current Dynadot creds return pdm-status-401 even when the desired NS state already exists. Caught on t132 2026-05-16 day-2 add + t134 2026-05-17 fresh-prov body parentDomains attempt. Adds nsAlreadyMatches() helper using net.DefaultResolver.LookupNS with a 5s timeout. False on lookup error or partial match → fall through to the original PDM pipeline so a misconfigured/partial domain still goes through the registrar API. This unblocks sme-pool entries for omani.homes (already pointing at ns1/2/3.openova.io). omani.rest / omani.trades still go through the full flip path because their NS records don't yet match expected. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(handover): D21 owner seed uses catalyst-system namespace PR #1564 created the owner UserAccess CR with .Namespace("") — the apiserver returned "could not find the requested resource" because useraccesses.access.openova.io is NAMESPACED (Crossplane Claim per the XRD's claimNames block at platform/crossplane-claims/chart/ templates/xrds/useraccess.yaml). Pin to catalyst-system (where catalyst-api + every Catalyst-authored CR lives) and stamp the namespace on the object too. The existing ListUserAccess handler uses Namespace("") so the entry surfaces on /users without per-namespace filtering. Verified the CRD shape on t134 2026-05-17: $ kubectl api-resources --api-group=access.openova.io useraccesses access.openova.io/v1alpha1 true UserAccess ^^^^ NAMESPACED Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(handover): D21 owner seed uses tierRoleRef not wildcard app PR #1564 + #1577 created the CR shape with applications=[{app:"*",...}] but the useraccess XRD schema rejects `app: "*"` (pattern ^[a-z0-9][a-z0-9-]{0,62}$). The seed handler logged "spec.applications[0].app: Invalid value: \"*\"" on every handover. The XRD has a `tierRoleRef` field (pattern ^openova:tier-(viewer\|developer\|operator\|admin\|owner)$) that's the canonical owner-tier semantic — when set, useraccess-controller binds the named ClusterRole on the target via RoleBinding/ClusterRoleBinding. `openova:tier-owner` is shipped by EPIC-3 (#1098) slice T1's tier-clusterroles.yaml. Drop the applications[] block + use tierRoleRef = openova:tier-owner. Verified live on t135 2026-05-17 — error log showed exact pattern mismatch before this fix. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(chroot): POST /api/v1/sovereign/secondary-kubeconfig (D16 PR A) D16 multi-cluster fan-out requires the chroot's k8sCache.Factory to have all 3 regions' kubeconfigs registered so dashboard handler's per-cluster h.k8sCache.List(clusterID, ...) enumerates pods from each. Today the chroot only auto-registers its own in-cluster apiserver via FactoryFromEnv's chroot self-registration branch. Secondary kubeconfigs live on the mothership PVC + aren't replicated. This handler bridges the gap: - Accepts JSON {deploymentId, regionKey, kubeconfigYaml} - Validates ids via ^[a-z0-9][a-z0-9-]{0,62}$ pattern (defense in depth — filename composed from these) - Writes kubeconfig 0o600 to /var/lib/catalyst/kubeconfigs/<depID>-<region>.yaml (canonical FactoryFromEnv path so restart re-registers) - Calls k8sCache.AddCluster — idempotent per Factory contract PR B (next): mothership-side handover hook iterates secondary regions and POSTs each kubeconfig to the chroot. PR C (next): dashboard.go fan-out across all registered cluster IDs when group_by includes cluster/region. Per docs/INVIOLABLE-PRINCIPLES.md #10 kubeconfig bytes never enter a logged struct + are written 0o600. Memo: feedback_d16_dashboard_multi_cluster_fan_out.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(dashboard): multi-cluster fan-out when group_by=cluster\|region (D16 PR C) When group_by includes "cluster" or "region", enumerate ALL registered k8sCache clusters (primary + secondaries synced via PR #1579's POST /api/v1/sovereign/secondary-kubeconfig endpoint) and concatenate podRows from each before aggregation. Layer-1=Cluster on /dashboard now renders 3 bubbles on a 3-region Sovereign (was 1 bubble before). For group_by that ONLY contains {namespace,family,application,vcluster, sovereign} the primary clusterID's pods are sufficient and faster — no fan-out cost. 
PR B (mothership-side handover hook to POST each secondary kubeconfig) will complete the chain. Until then, secondaries don't appear in k8sCache.Clusters() so this fan-out is a no-op on existing provs — but the code is in place for when PR B lands. Memo: feedback_d16_dashboard_multi_cluster_fan_out.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 08:07:26 +04:00
e3mrah	bcab6430cb	feat(chroot): POST /api/v1/sovereign/secondary-kubeconfig (D16 PR A) (#1579 ) * fix(cloudinit): escape $$\{ORG_EMAIL:-\}/$$\{ORG_NAME:-\} in comment (D22) PR #1571 added a comment mentioning the $${ORG_EMAIL:-}/$${ORG_NAME:-} slot-file placeholders WITHOUT the $$ escape. tofu's templatefile() parses comments and tried to interpolate \${ORG_EMAIL:-} as a tofu expression — failing with "Extra characters after interpolation expression; Template interpolation doesn't expect a colon". Caught live on t133 fad01d84f5655004 — tofu plan failed in 30s. The escape pattern is documented at main.tf:1029 (the same warning that caught t127 last week). $$ prefix tells tofu's templatefile to emit literal \${...} to cloud-init for Flux envsubst. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(parent-domains): short-circuit pdmFlipNS when NS already matches (D30) When an sme-pool domain's current NS records already match the expected [ns1.<primary>, ns2.<primary>] pair (because the operator already delegated the domain to OpenOva's PowerDNS), the PDM registrar-flip step is a no-op. Skipping avoids: 1. Burning a Dynadot API credit on a flip that would be idempotent. 2. The D30 blocker — current Dynadot creds return pdm-status-401 even when the desired NS state already exists. Caught on t132 2026-05-16 day-2 add + t134 2026-05-17 fresh-prov body parentDomains attempt. Adds nsAlreadyMatches() helper using net.DefaultResolver.LookupNS with a 5s timeout. False on lookup error or partial match → fall through to the original PDM pipeline so a misconfigured/partial domain still goes through the registrar API. This unblocks sme-pool entries for omani.homes (already pointing at ns1/2/3.openova.io). omani.rest / omani.trades still go through the full flip path because their NS records don't yet match expected. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(handover): D21 owner seed uses catalyst-system namespace PR #1564 created the owner UserAccess CR with .Namespace("") — the apiserver returned "could not find the requested resource" because useraccesses.access.openova.io is NAMESPACED (Crossplane Claim per the XRD's claimNames block at platform/crossplane-claims/chart/ templates/xrds/useraccess.yaml). Pin to catalyst-system (where catalyst-api + every Catalyst-authored CR lives) and stamp the namespace on the object too. The existing ListUserAccess handler uses Namespace("") so the entry surfaces on /users without per-namespace filtering. Verified the CRD shape on t134 2026-05-17: $ kubectl api-resources --api-group=access.openova.io useraccesses access.openova.io/v1alpha1 true UserAccess ^^^^ NAMESPACED Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(handover): D21 owner seed uses tierRoleRef not wildcard app PR #1564 + #1577 created the CR shape with applications=[{app:"*",...}] but the useraccess XRD schema rejects `app: "*"` (pattern ^[a-z0-9][a-z0-9-]{0,62}$). The seed handler logged "spec.applications[0].app: Invalid value: \"*\"" on every handover. The XRD has a `tierRoleRef` field (pattern ^openova:tier-(viewer\|developer\|operator\|admin\|owner)$) that's the canonical owner-tier semantic — when set, useraccess-controller binds the named ClusterRole on the target via RoleBinding/ClusterRoleBinding. `openova:tier-owner` is shipped by EPIC-3 (#1098) slice T1's tier-clusterroles.yaml. Drop the applications[] block + use tierRoleRef = openova:tier-owner. Verified live on t135 2026-05-17 — error log showed exact pattern mismatch before this fix. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(chroot): POST /api/v1/sovereign/secondary-kubeconfig (D16 PR A) D16 multi-cluster fan-out requires the chroot's k8sCache.Factory to have all 3 regions' kubeconfigs registered so dashboard handler's per-cluster h.k8sCache.List(clusterID, ...) enumerates pods from each. Today the chroot only auto-registers its own in-cluster apiserver via FactoryFromEnv's chroot self-registration branch. Secondary kubeconfigs live on the mothership PVC + aren't replicated. This handler bridges the gap: - Accepts JSON {deploymentId, regionKey, kubeconfigYaml} - Validates ids via ^[a-z0-9][a-z0-9-]{0,62}$ pattern (defense in depth — filename composed from these) - Writes kubeconfig 0o600 to /var/lib/catalyst/kubeconfigs/<depID>-<region>.yaml (canonical FactoryFromEnv path so restart re-registers) - Calls k8sCache.AddCluster — idempotent per Factory contract PR B (next): mothership-side handover hook iterates secondary regions and POSTs each kubeconfig to the chroot. PR C (next): dashboard.go fan-out across all registered cluster IDs when group_by includes cluster/region. Per docs/INVIOLABLE-PRINCIPLES.md #10 kubeconfig bytes never enter a logged struct + are written 0o600. Memo: feedback_d16_dashboard_multi_cluster_fan_out.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 08:06:08 +04:00
github-actions[bot]	6e329e27ae	deploy: update catalyst images to `4f62dd2`	2026-05-17 00:10:50 +00:00
e3mrah	4f62dd21b3	fix(handover): D21 owner seed uses tierRoleRef not wildcard app (#1578 ) * fix(cloudinit): escape $$\{ORG_EMAIL:-\}/$$\{ORG_NAME:-\} in comment (D22) PR #1571 added a comment mentioning the $${ORG_EMAIL:-}/$${ORG_NAME:-} slot-file placeholders WITHOUT the $$ escape. tofu's templatefile() parses comments and tried to interpolate \${ORG_EMAIL:-} as a tofu expression — failing with "Extra characters after interpolation expression; Template interpolation doesn't expect a colon". Caught live on t133 fad01d84f5655004 — tofu plan failed in 30s. The escape pattern is documented at main.tf:1029 (the same warning that caught t127 last week). $$ prefix tells tofu's templatefile to emit literal \${...} to cloud-init for Flux envsubst. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(parent-domains): short-circuit pdmFlipNS when NS already matches (D30) When an sme-pool domain's current NS records already match the expected [ns1.<primary>, ns2.<primary>] pair (because the operator already delegated the domain to OpenOva's PowerDNS), the PDM registrar-flip step is a no-op. Skipping avoids: 1. Burning a Dynadot API credit on a flip that would be idempotent. 2. The D30 blocker — current Dynadot creds return pdm-status-401 even when the desired NS state already exists. Caught on t132 2026-05-16 day-2 add + t134 2026-05-17 fresh-prov body parentDomains attempt. Adds nsAlreadyMatches() helper using net.DefaultResolver.LookupNS with a 5s timeout. False on lookup error or partial match → fall through to the original PDM pipeline so a misconfigured/partial domain still goes through the registrar API. This unblocks sme-pool entries for omani.homes (already pointing at ns1/2/3.openova.io). omani.rest / omani.trades still go through the full flip path because their NS records don't yet match expected. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(handover): D21 owner seed uses catalyst-system namespace PR #1564 created the owner UserAccess CR with .Namespace("") — the apiserver returned "could not find the requested resource" because useraccesses.access.openova.io is NAMESPACED (Crossplane Claim per the XRD's claimNames block at platform/crossplane-claims/chart/ templates/xrds/useraccess.yaml). Pin to catalyst-system (where catalyst-api + every Catalyst-authored CR lives) and stamp the namespace on the object too. The existing ListUserAccess handler uses Namespace("") so the entry surfaces on /users without per-namespace filtering. Verified the CRD shape on t134 2026-05-17: $ kubectl api-resources --api-group=access.openova.io useraccesses access.openova.io/v1alpha1 true UserAccess ^^^^ NAMESPACED Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(handover): D21 owner seed uses tierRoleRef not wildcard app PR #1564 + #1577 created the CR shape with applications=[{app:"*",...}] but the useraccess XRD schema rejects `app: "*"` (pattern ^[a-z0-9][a-z0-9-]{0,62}$). The seed handler logged "spec.applications[0].app: Invalid value: \"*\"" on every handover. The XRD has a `tierRoleRef` field (pattern ^openova:tier-(viewer\|developer\|operator\|admin\|owner)$) that's the canonical owner-tier semantic — when set, useraccess-controller binds the named ClusterRole on the target via RoleBinding/ClusterRoleBinding. `openova:tier-owner` is shipped by EPIC-3 (#1098) slice T1's tier-clusterroles.yaml. Drop the applications[] block + use tierRoleRef = openova:tier-owner. Verified live on t135 2026-05-17 — error log showed exact pattern mismatch before this fix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 04:08:45 +04:00
github-actions[bot]	6466f97f6c	deploy: update catalyst images to `ea30ded`	2026-05-16 23:28:04 +00:00
e3mrah	ea30ded120	fix(handover): D21 owner seed uses catalyst-system namespace (#1577 ) * fix(cloudinit): escape $$\{ORG_EMAIL:-\}/$$\{ORG_NAME:-\} in comment (D22) PR #1571 added a comment mentioning the $${ORG_EMAIL:-}/$${ORG_NAME:-} slot-file placeholders WITHOUT the $$ escape. tofu's templatefile() parses comments and tried to interpolate \${ORG_EMAIL:-} as a tofu expression — failing with "Extra characters after interpolation expression; Template interpolation doesn't expect a colon". Caught live on t133 fad01d84f5655004 — tofu plan failed in 30s. The escape pattern is documented at main.tf:1029 (the same warning that caught t127 last week). $$ prefix tells tofu's templatefile to emit literal \${...} to cloud-init for Flux envsubst. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(parent-domains): short-circuit pdmFlipNS when NS already matches (D30) When an sme-pool domain's current NS records already match the expected [ns1.<primary>, ns2.<primary>] pair (because the operator already delegated the domain to OpenOva's PowerDNS), the PDM registrar-flip step is a no-op. Skipping avoids: 1. Burning a Dynadot API credit on a flip that would be idempotent. 2. The D30 blocker — current Dynadot creds return pdm-status-401 even when the desired NS state already exists. Caught on t132 2026-05-16 day-2 add + t134 2026-05-17 fresh-prov body parentDomains attempt. Adds nsAlreadyMatches() helper using net.DefaultResolver.LookupNS with a 5s timeout. False on lookup error or partial match → fall through to the original PDM pipeline so a misconfigured/partial domain still goes through the registrar API. This unblocks sme-pool entries for omani.homes (already pointing at ns1/2/3.openova.io). omani.rest / omani.trades still go through the full flip path because their NS records don't yet match expected. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(handover): D21 owner seed uses catalyst-system namespace PR #1564 created the owner UserAccess CR with .Namespace("") — the apiserver returned "could not find the requested resource" because useraccesses.access.openova.io is NAMESPACED (Crossplane Claim per the XRD's claimNames block at platform/crossplane-claims/chart/ templates/xrds/useraccess.yaml). Pin to catalyst-system (where catalyst-api + every Catalyst-authored CR lives) and stamp the namespace on the object too. The existing ListUserAccess handler uses Namespace("") so the entry surfaces on /users without per-namespace filtering. Verified the CRD shape on t134 2026-05-17: $ kubectl api-resources --api-group=access.openova.io useraccesses access.openova.io/v1alpha1 true UserAccess ^^^^ NAMESPACED Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 03:26:06 +04:00
github-actions[bot]	18b5fa1466	deploy: update catalyst images to `33ed484`	2026-05-16 23:24:34 +00:00

1482 Commits