wave6-fix-bss-vouchers
1482 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
2164ce2608 |
Merge remote-tracking branch 'origin/main' into wave6-fix-bss-vouchers
# Conflicts:
#   products/catalyst/bootstrap/ui/src/lib/bss.api.ts
|
||
|
|
5c91196952 |
feat(ui): Wave 6 PR 5 — BSS Vouchers native (drops iframe, table + Issue modal)
Replaces the BssSectionShell iframe wrapper at /bss/vouchers with a NATIVE React surface sharing the same PortalShell chrome as BssLandingPage (Wave 6 PR 1, #1606), JobsPage, AppsPage, SettingsPage. Per the founder "big picture" ruling on Wave 6 sub-agent UI work: inherit the design system, no bespoke chrome, no hex colours, no new card components.
Surface:
- Header tagline + filter row (search + status dropdown + "+ Issue voucher" CTA).
- Table columns: Code | Recipient | Plan | Value | Status pill | Issued | Expires | Redeemed by. Recipient/Plan/Expires render as em-dashes until the BE persists those fields; target-state columns are present from first paint per INVIOLABLE-PRINCIPLES.md #1.
- Row drill-in drawer with Revoke action (destructive lives inside the drill-in per founder ruling, never on list rows).
- Issue voucher modal that mirrors ParentDomainsPage's AddDomainModal chrome verbatim (panel layout, label rhythm, Cancel/Submit footer, accent submit). POSTs /v1/sme/billing/vouchers/issue with code, credit_omr, description, max_redemptions, recipient_email.
- Status pill family: emerald (active) / zinc (inactive) / amber (exhausted) / rose (revoked), the same palette ParentDomainsPage uses for its FlipStatusBadge.
API wiring (bss.api.ts):
- Voucher / VoucherStatus / IssueVoucherRequest typed wire shapes matching core/services/billing/store.PromoCode snake_case json tags.
- voucherStatus() derives the pill from row fields (no server round-trip per filter).
- listVouchers, issueVoucher, revokeVoucher typed wrappers against /v1/sme/billing/vouchers/{list,issue,revoke/{code}}. Errors throw with the BE's detail/error field so the operator sees the actual registrar message inline.
All colour tokens are var(--color-*) or the four approved Tailwind status families (emerald / amber / rose / zinc) plus red-500/* for error banners (same family AddDomainModal uses). No hex literals.
Links to Wave 6 PR 1 (#1606).
|
||
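The client-side status derivation described in the Vouchers commit above could look roughly like the sketch below. Only `code`, `credit_omr` and `max_redemptions` are named in the commit message; `times_redeemed`, `active` and `revoked_at` are hypothetical field names, and the precedence order (revoked, then exhausted, then active/inactive) is an assumption, not the shipped `voucherStatus()`.

```typescript
type VoucherStatus = "active" | "inactive" | "exhausted" | "revoked";

// Hypothetical row shape; fields marked "assumed" do not appear in the commit.
interface Voucher {
  code: string;
  credit_omr: number;
  max_redemptions: number;
  times_redeemed: number;     // assumed redemption counter
  active: boolean;            // assumed enable flag
  revoked_at?: string | null; // assumed revocation timestamp
}

// Derive the pill purely from row fields, so filtering needs no server call.
function voucherStatus(v: Voucher): VoucherStatus {
  if (v.revoked_at) return "revoked";
  // Assumption: max_redemptions <= 0 means "unlimited".
  if (v.max_redemptions > 0 && v.times_redeemed >= v.max_redemptions) {
    return "exhausted";
  }
  return v.active ? "active" : "inactive";
}

// Status-to-colour-family mapping as listed in the commit message.
const PILL_FAMILY: Record<VoucherStatus, string> = {
  active: "emerald",
  inactive: "zinc",
  exhausted: "amber",
  revoked: "rose",
};

// Client-side filter used by a status dropdown; "all" disables filtering.
function filterByStatus(rows: Voucher[], status: VoucherStatus | "all"): Voucher[] {
  return status === "all" ? rows : rows.filter((v) => voucherStatus(v) === status);
}
```

Because the status is a pure function of the row, the dropdown can re-filter the already loaded list without another request.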
|
|
4a4ffa34ab
|
feat(ui): Wave 6 PR 3 — BSS Orders native (drops iframe) (#1608)
* feat(ui): Wave 6 PR 3 — BSS Orders native (drops iframe)
Replaces the BssSectionShell iframe at /console/bss/orders with a
native React table that mirrors JobsTable's shape: toolbar (search +
status + age dropdowns) → scrollable table (Order ID | Tenant org |
Product | Status | Created | Last update | Total) → row click to
drill-in (TODO Link to /bss/orders/{id}, route added in a follow-up).
Inherits the parent app's design system per Wave 6 brief +
feedback_subagents_inherit_design_system.md:
- PortalShell wrapper with `← Back to BSS overview` header slot
(mirrors BssSectionShell verbatim so the page reads as a sibling
of /bss/{billing,revenue,vouchers,tenants})
- Design tokens only (var(--color-bg-2), var(--color-border),
var(--color-text), var(--color-text-dim), var(--color-text-strong),
var(--color-accent), var(--color-surface), var(--color-success),
var(--color-error))
- amber-* exception ONLY for the documented "API pending" pill
(verbatim copy from BssLandingPage + SettingsPage); no rose
- No hex colours; no bespoke Tailwind colour families
- Empty / loading / API-pending states mirror JobsTable +
ParentDomainsPage + BssLandingPage
API plumbing:
- lib/bss.api.ts: added Order / OrderStatus / OrdersResponse types
and getOrders() that fetches /api/v1/sme/orders and tolerates
404 / 5xx / network error by returning {pendingApi:true, orders:[]}
so the full table chrome paints on first load with the "API
pending" pill (per INVIOLABLE-PRINCIPLES.md #1).
- No BE handler added; the FE-only stub matches getBssOverview's
pattern and was explicitly OPTIONAL in the Wave 6 brief.
Verification:
- tsc -b --noEmit: my files clean (28 pre-existing errors elsewhere:
CloudPage CloudListKind drift + openova-flow workspace types,
all unrelated to this PR).
- Color audit grep: returns only the documented amber-500/* and
amber-300 used by the API-pending pill.
- Side-by-side render with JobsPage: same PortalShell chrome, same
toolbar shape, same table column treatment.
Links Wave 6 PR 1 (#1606).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(api): Wave 6 PR 3 — BSS Orders BE stub (GET /api/v1/sme/orders → [])
Companion to the FE-side OrdersPage (commit
|
||
|
|
239eb4fffd
|
feat(ui): Wave 6 PR 3 — BSS Orders native (drops iframe) (#1607)
Replaces the BssSectionShell iframe at /console/bss/orders with a
native React table that mirrors JobsTable's shape: toolbar (search +
status + age dropdowns) → scrollable table (Order ID | Tenant org |
Product | Status | Created | Last update | Total) → row click to
drill-in (TODO Link to /bss/orders/{id}, route added in a follow-up).
Inherits the parent app's design system per Wave 6 brief +
feedback_subagents_inherit_design_system.md:
- PortalShell wrapper with `← Back to BSS overview` header slot
(mirrors BssSectionShell verbatim so the page reads as a sibling
of /bss/{billing,revenue,vouchers,tenants})
- Design tokens only (var(--color-bg-2), var(--color-border),
var(--color-text), var(--color-text-dim), var(--color-text-strong),
var(--color-accent), var(--color-surface), var(--color-success),
var(--color-error))
- amber-* exception ONLY for the documented "API pending" pill
(verbatim copy from BssLandingPage + SettingsPage); no rose
- No hex colours; no bespoke Tailwind colour families
- Empty / loading / API-pending states mirror JobsTable +
ParentDomainsPage + BssLandingPage
API plumbing:
- lib/bss.api.ts: added Order / OrderStatus / OrdersResponse types
and getOrders() that fetches /api/v1/sme/orders and tolerates
404 / 5xx / network error by returning {pendingApi:true, orders:[]}
so the full table chrome paints on first load with the "API
pending" pill (per INVIOLABLE-PRINCIPLES.md #1).
- No BE handler added; the FE-only stub matches getBssOverview's
pattern and was explicitly OPTIONAL in the Wave 6 brief.
Verification:
- tsc -b --noEmit: my files clean (28 pre-existing errors elsewhere:
CloudPage CloudListKind drift + openova-flow workspace types,
all unrelated to this PR).
- Color audit grep: returns only the documented amber-500/* and
amber-300 used by the API-pending pill.
- Side-by-side render with JobsPage: same PortalShell chrome, same
toolbar shape, same table column treatment.
Links Wave 6 PR 1 (#1606).
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
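The tolerant `getOrders()` behaviour described in the Orders commits above (404 / 5xx / network error all collapse to `{pendingApi:true, orders:[]}`) can be sketched as a pure mapping plus a thin async wrapper. The `Order` fields, the `FetchLike` type and the treatment of other non-2xx statuses and non-array bodies are assumptions for illustration.

```typescript
// Hypothetical minimal shapes; the real Order type is not listed in the commit.
interface Order { id: string; status: string; }
interface OrdersResponse { pendingApi: boolean; orders: Order[]; }

// Minimal fetch-like signature so the sketch does not depend on DOM typings.
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}>;

const pending = (): OrdersResponse => ({ pendingApi: true, orders: [] });

// status === null models a network error (no HTTP response at all).
function toOrdersResponse(status: number | null, body: unknown): OrdersResponse {
  if (status === null) return pending();
  if (status === 404 || status >= 500) return pending();
  // Assumption: any other non-2xx status or a non-array body is also "pending".
  if (status < 200 || status >= 300 || !Array.isArray(body)) return pending();
  return { pendingApi: false, orders: body as Order[] };
}

async function getOrders(fetchImpl: FetchLike): Promise<OrdersResponse> {
  try {
    const res = await fetchImpl("/api/v1/sme/orders");
    const body = res.ok ? await res.json() : null;
    return toOrdersResponse(res.status, body);
  } catch {
    // Network failure or invalid JSON: still paint the full table chrome.
    return toOrdersResponse(null, null);
  }
}
```

In a browser the wrapper could be called as `getOrders((url) => fetch(url))`; the table then renders the "API pending" pill whenever `pendingApi` is true.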
|
|
393116355d
|
feat(ui): Wave 6 PR 1 — BSS native landing (Option B step 1, kills iframe seam) (#1606)
Replaces Family F's bespoke BssLayout + iframe approach with a native React /bss landing page using the existing Dashboard KPI card chrome. Per-section pages (Billing/Orders/Revenue/Vouchers/Tenants) keep their iframe content for now (PRs 2-6 native-port them); they wrap directly in PortalShell via BssSectionShell instead of BssLayout so the chrome matches the rest of the app.
Founder UX review (2026-05-17) flagged Family F BSS as visually clashing. Per feedback_subagents_inherit_design_system.md:
- PortalShell wrapper (same as JobsPage/AppsPage/SettingsPage)
- KPI cards copied from Dashboard/SettingsPage SectionCard chrome
- Design tokens only (var(--color-*)); no hex; no bespoke Tailwind colors
- No new bespoke components
BssLayout.tsx deleted. Router rewired so /bss → BssLandingPage and each section is a sibling route under consoleLayoutRoute (no shared layout wrapper). API shim lib/bss.api.ts fetches /api/v1/sme/bss/overview with zero-filled fallback + pendingApi flag so the landing always renders its full target-state shape on first paint.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
bf5002ccf0
|
feat(ui): Wave 5 — UX polish (sidebar reorder + BSS icon + marketplace as SettingsCard) + chart 1.4.155 (#1605)
Founder UX-polish review (2026-05-17, post Wave-2 collector). Three
distinct fixes the founder flagged:
1. Sidebar order followed no logic — random walk Apps/Jobs/Dashboard/
Cloud/Users/BSS. Reordered to operator mental model:
Dashboard → Cloud → Apps → Jobs → Users → BSS → Settings
2. BSS icon was a bespoke receipt glyph that didn't match the line-
glyph family. Swapped to a briefcase glyph fitting stylistically.
3. Marketplace toggle was a dedicated /settings/marketplace page +
Settings sub-nav child. Founder: "if market place is just a toggle
... it should be ... similar to other setting". Refactored into
SettingsPage SectionCard anchor (id=marketplace, same as #dns).
MarketplaceSettings.tsx + .test.tsx + route + sub-nav child deleted.
Save flow unchanged: POSTs /api/v1/sovereigns/{id}/marketplace.
Chart 1.4.154 → 1.4.155 + bootstrap-kit pin bump per the
chart-bump-needs-both-files rule.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
2b903c16e6
|
chore(release): chart 1.4.153→1.4.154 — Wave 2 collector (B/C/D/E/F/G) (#1604)
Bundles the 6 Fix-Author PRs that merged AFTER the Wave 1 chart roll (1.4.152→1.4.153) into a single bootstrap-kit-consumable Sovereign bundle:
- #1598 Family F: BSS menu in-console iframe (founder bug #1)
- #1599 Family D: treemap fan-out + Layer-1 cluster default (founder bug #2)
- #1600 Family C: ResourceDetailPage real-data rewrite (founder bug #5)
- #1601 Family G: 6 singletons (hcloud-csi, fleet aggregator, bridge backfill, cert rename, D22 lift, jobs region filter)
- #1602 Family E: Compliance UI (Falco runtime, SBOM, framework filter, policy drilldown, PolicyReport list kinds)
- #1603 Family B: AppDetail HR-overlay + Resources/Logs tab ns+label fix (founder bug #4)
Bumps BOTH Chart.yaml AND the bootstrap-kit pin per session_2026_05_17_t142_6_of_6_GREEN.md ("chart Chart.yaml bump != bootstrap-kit pin bump - need both" rule).
Wave 2 fixes will reach the chroot Sovereign automatically on the next Flux 1m reconcile after this PR merges and the bp-catalyst-platform OCI artifact republishes.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
a44df200d5
|
fix(catalyst-api+ui): Family B — AppDetail status sync (HR→UI wire + correct ns/label) (#1603)
Closes founder bug #4 cluster (5 FAILs from t10):
- C4-003: HR Ready=True but AppDetail shows phase=Provisioning
- C4-004: Bootstrap apps show literal "Catalog Status Unavailable"
- C4-005: Resources tab queries wrong ns ("default") + wrong label
- C4-007: Logs tab same wrong-ns + wrong-label as Resources
- C4-013: D19 violation, Deployments=44 ≠ Catalog=59 ≠ HR=48/48
Root cause: AppDetail and its Resources/Logs sub-tabs assumed the Application CR is the sole source of truth for phase, ns, and label. On chroot Sovereigns:
(a) bootstrap-kit installs (bp-cilium, bp-alloy, bp-cert-manager, etc.) ship as HelmReleases with NO companion Application CR,
(b) the catalyst-controller lags writing status.phase, so the CR sits at "Provisioning" long after the HR has flipped Ready=True,
(c) the workload's actual namespace is HR.spec.targetNamespace ("alloy/", "cert-manager/", "kube-system/") not the CR's own namespace (always "default" on the synth fallback).
Fix (extends PR L #1592 HR-fallback baseline):
- catalyst-api: HandleApplicationGet now overlays HR Ready=True onto a stale CR phase; surfaces targetNamespace, releaseName, and the install label selector so the SPA queries the actual install location with the correct identity label. New helper helmReleaseReadyByName() reuses the chroot k8sCache path that PR L established (so multi-region D16 fan-out is covered).
- catalyst-api: synthesiseAppFromHelmRelease now emits bootstrap=true, targetNamespace, releaseName, and a chart-name based selector (`app.kubernetes.io/name=<chart>`, the upstream Helm standard) so bootstrap-kit tabs find the real pods.
- catalog.api.ts: extends ApplicationDetailResponse with targetNamespace, releaseName, installLabelSelector, bootstrap, hrReady, phaseFromCR (telemetry for the D19 source-counter chip).
- AppDetail.tsx (lines 1-700): wires appTargetNamespace + appInstallLabelSelector into ResourcesTab + LogsTab; renders a "source: HelmRelease | Application CR (HR-overlayed; CR=<phase>)" D19 source chip so the operator sees which object the phase comes from per-app; PublishToggleChip renders "Bootstrap blueprint (not in marketplace)" for bootstrap apps instead of misleading "Catalog status unavailable", and also treats a /catalog/apps/<slug> 404 on a non-bootstrap app as a bootstrap-like (no toggle) rather than an error chip.
- ResourcesTab.tsx + LogsTab.tsx: accept a labelSelector prop instead of hard-baking `instance=<applicationName>`; query keys updated; filter banners + empty-state copy now show the actual selector.
Tests: tsc -b --noEmit clean across the workspace. Existing AppDetail/AppsPage unit tests have pre-existing failures unrelated to this change (confirmed by re-running on stashed baseline); no new failures introduced. ResourcesTab/LogsTab have no targeted unit tests; the matrix Playwright walkthrough is the verification surface on the next prov.
Files (read-only on the rest of the codebase per Family B brief):
- products/catalyst/bootstrap/api/internal/handler/applications.go
- products/catalyst/bootstrap/ui/src/lib/catalog.api.ts
- products/catalyst/bootstrap/ui/src/pages/sovereign/AppDetail.tsx
- products/catalyst/bootstrap/ui/src/pages/sovereign/AppDetail/LogsTab.tsx
- products/catalyst/bootstrap/ui/src/pages/sovereign/AppDetail/ResourcesTab.tsx
NOT touched: ComplianceTab.tsx (Family E), router.tsx (Wave 1), Dashboard.tsx (Family D), ResourceDetailPage.tsx (PR #1600 Family C).
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
c2df9ff287
|
feat(ui+api): Family E — Compliance UI (Kyverno + Falco + SBOM + framework filter) (#1602)
Wave-2 Family-E (#1583) closes 7 t10 FAILs on the Compliance surface (/tmp/t10-results-agent-D.jsonl C11-003/005/006/007/008/009/010):
C11-003 Policy drilldown was 404'ing on Kyverno ClusterPolicies that exist on the cluster but weren't cached by the aggregator. Add GET /api/v1/sovereigns/{id}/compliance/policies/{name} that reads the live ClusterPolicy directly; PolicyDrilldownPage falls back to it after the bulk getPolicies() miss.
C11-005 / C11-006 /cloud?view=list&kind=policyreports now registered as a first-class CloudListKind (and clusterpolicyreports too) with a dedicated PolicyReportsListPage / ClusterPolicyReportsListPage wrapper. Removed the silent →configmaps alias that was hiding the architecture gap. Reads from the catalyst-api k8scache registry which already has both GVRs (kinds.go).
C11-007 AppDetail Compliance tab now falls through to the LIVE violations endpoint (/compliance/violations?app=<name>) when the scorecard rollup is empty; operator sees real Kyverno PolicyReport entries grouped by policy, not the placeholder.
C11-008 Falco runtime alerts: new GET /compliance/falco endpoint reads Falcosidekick → k8s Events; new FalcoAlerts widget renders them with priority chips. New RuntimeAlertsPage mounted at /admin/compliance/runtime + /compliance/runtime (both previously 404). Also embedded in SRE / Security dashboards.
C11-009 Regulatory-framework chip strip (PCI / ISO27001 / SOC2 / GDPR / HIPAA / DORA / NIS2 / FedRAMP) wired into SREDashboardPage. Multi-select + URL deep-link (?framework=pci,iso27001). Single source of truth in COMPLIANCE_FRAMEWORKS.
C11-010 Per-Pod SBOM + CVE tab on ResourceDetailPage. New SBOM tab in RESOURCE_DETAIL_TABS; SBOMTab widget reads new GET /compliance/sbom?ns=<ns>&pod=<pod> which projects Trivy VulnerabilityReport + SBOMReport CRs into a structured per-Container severity + component list. Cluster-wide rollup at /compliance/sbom/summary.
All clusters READ-ONLY. No Chart.yaml or bootstrap-kit pin bumps.
tsc -b --noEmit: clean.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
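The multi-select framework filter with URL deep-link from C11-009 above could be parsed and serialised along these lines. The ids `pci` and `iso27001` come from the commit's `?framework=pci,iso27001` example; the remaining ids and both helper names are assumptions, and the real `COMPLIANCE_FRAMEWORKS` constant may have a different shape.

```typescript
// Assumed ids: lowercase forms of the framework names listed in the commit.
const COMPLIANCE_FRAMEWORKS = [
  "pci", "iso27001", "soc2", "gdpr", "hipaa", "dora", "nis2", "fedramp",
] as const;
type FrameworkId = (typeof COMPLIANCE_FRAMEWORKS)[number];

// Hypothetical helper: read the selection out of a location.search string.
function parseFrameworks(search: string): FrameworkId[] {
  const raw = new URLSearchParams(search).get("framework") ?? "";
  const wanted = new Set(raw.split(",").map((s) => s.trim().toLowerCase()));
  // Walk the canonical list so unknown ids are dropped and the result is
  // de-duplicated and in a stable order.
  return COMPLIANCE_FRAMEWORKS.filter((f) => wanted.has(f));
}

// Hypothetical helper: write the selection back into a query string.
function serializeFrameworks(selected: readonly FrameworkId[]): string {
  return selected.length === 0 ? "" : `?framework=${selected.join(",")}`;
}
```

Filtering against the canonical list keeps one source of truth for valid ids, so a hand-edited URL cannot inject an unknown chip.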
|
|
aa60cfb84e
|
fix(multi): Family G — 6 singletons (C8-001/C8-005/C9-006/C10-002/C10-003/C7-007) (#1601)
Wave 2 Family G batched ship. C7-004 (sso/wiki/workflows/storybook +
registry/api HTTPRoutes) intentionally skipped — sso/wiki/storybook
have no shipped backend; registry (harbor) + api (catalyst-api) HTTPRoutes
already exist and 404 is a runtime/HR-readiness symptom, not a missing
route. Flagged for architect-led ticket rather than silent route-alias
synthesis.
C9-006 — hcloud-volumes StorageClass missing on fresh prov
Root cause: platform/hcloud-csi/chart/ existed but was never wired
into bootstrap-kit, so fresh Sovereigns defaulted PVCs to local-path
(rancher.io/local-path) — node-pinned, can't survive Pod reschedule.
Fix: new slot 17a-bp-hcloud-csi.yaml + chart 1.0.0→1.1.0 bump that
adds templates/hcloud-token-secret.yaml so the controller can
authenticate to Hetzner. Mirrors bp-hcloud-ccm (slot 55) +
bp-cluster-autoscaler-hcloud (slot 50) wiring.
C10-002 — /fleet/applications returns 0 items despite 21 sovereigns
Root cause: collectFleetSovereigns filtered AdoptedAt!=nil (mirrored
ListDeployments). On a steady-state fleet every Sovereign is adopted,
so the dashboard rendered empty despite hundreds of succeeded jobs.
Fix: remove the adopted-filter from collectFleetSovereigns (the
fleet view's whole purpose is to enumerate every provisioned
Sovereign). ListDeployments still applies the filter — it backs the
provisioner's in-flight tab, a different surface. Adopted rows
surface with Health=green when otherwise unknown.
C10-003 — per-region install-* Jobs stuck "pending" despite ready
Root cause: lastState dedup in helmwatch_bridge — secondary
watchers attaching AFTER an HR already settled at Installed never
observed a state transition, so the seed value (HelmStatePending)
never converged. Fix: at markPhase1Done(OutcomeReady), backfill
every secondary watcher's informer snapshot into the shared
jobs.Bridge via the idempotent SeedJobsFromInformerList path.
Runs INLINE (not goroutine) — runPhase1Watch defers
stopSecondaries() which clears dep.secondaryWatchers as soon as
markPhase1Done returns, so a goroutine would race the cleanup.
C7-007 — legacy sovereign-wildcard-tls Cert+Secret pair orphaned
Root cause: PR O moved the Cilium Gateway listener's
certificateRefs to the dashed-suffix per-zone Secret but left the
legacy bare-name Certificate template behind, so cert-manager
kept renewing an orphan. Fix: (a) rename the Certificate +
Secret to the dashed-suffix shape (single-source-of-truth), and
(b) add a one-shot Job (legacy-cert-cleanup) that deletes the
pre-PR-O Cert+Secret pair via alpine/k8s, idempotent for fresh
provs. Removable from kustomization.yaml once every live prov
has reconciled past it.
C8-001 — D22 Settings em-dash placeholders on chroot Sovereign
Root cause: SettingsPage read Capacity / CP size / Pool subdomain /
BYO domain from useWizardStore() (zustand+persist localStorage).
The chroot Sovereign console runs on a fresh browser session
post-handover with empty localStorage, so the four fields rendered
em-dashes. The data IS persisted on the deployment record
(RedactedRequest) — gap was that Deployment.State() never surfaced
it. Fix: lift controlPlaneSize / sovereignPoolDomain /
sovereignSubdomain / sovereignDomainMode / sovereignByoDomain /
regionControlPlaneSizes / orgName / orgEmail to the State() map +
extend DeploymentSnapshot TS type + SettingsPage reads
snapshot-first with wizard store as fallback (mothership wizard-
in-flight case).
C8-005 — D20 Jobs page missing region filter dropdown
Root cause: multi-region Sovereigns expose install-<region>:<chart>
Jobs but JobsTable offered only status / app / parent filters,
forcing operators to type the region key into the free-text search.
Fix: new regionFromJob(job) pure helper parses the canonical
<region>:<chart> appId (fallback: install-<region>:<chart> jobName).
Dropdown is visible only when 2+ regions appear in the current job
set (single-region Sovereigns see no one-option no-op). Sorted
lexically. Test coverage: 4 helper cases + 3 dropdown cases in
JobsTable.test.tsx.
Architect-first compliance:
• bp-hcloud-csi wiring mirrors bp-hcloud-ccm (slot 55) pattern
• legacy-cert-cleanup uses alpine/k8s (NOT bitnami/kubectl — see
self-sovereign-cutover/values.yaml:252 Bitnami-deprecation note)
• alpine/k8s image pulled via harbor.openova.io/proxy-dockerhub
(mirror-everything rule)
• regionFromJob mirrors helmwatch_bridge.go componentID encoding
(3 input shapes: bare, region-prefixed, install-region-prefixed)
• State() snapshot additions stay slim — only the 4 founder-flagged
fields + a few zero-cost adjacents
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
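The C8-005 region filter above hinges on a pure `regionFromJob(job)` helper plus a "2+ regions" visibility rule. A minimal sketch follows, assuming a `Job` shape with optional `appId` and `jobName` fields; the region values in the usage note are made up, and `regionOptions` is a hypothetical name for the dropdown logic.

```typescript
// Assumed minimal job shape; the real Job type has more fields.
interface Job {
  appId?: string;
  jobName?: string;
}

// Returns the region key, or null for bare (single-region) ids.
function regionFromJob(job: Job): string | null {
  // Canonical shape: "<region>:<chart>".
  const appId = job.appId ?? "";
  const colon = appId.indexOf(":");
  if (colon > 0) return appId.slice(0, colon);

  // Fallback shape: "install-<region>:<chart>" on the job name.
  const prefix = "install-";
  const name = job.jobName ?? "";
  if (name.startsWith(prefix)) {
    const rest = name.slice(prefix.length);
    const idx = rest.indexOf(":");
    if (idx > 0) return rest.slice(0, idx);
  }
  return null;
}

// Dropdown options: lexically sorted, and empty unless 2+ regions appear,
// so single-region Sovereigns never see a one-option no-op.
function regionOptions(jobs: Job[]): string[] {
  const regions = new Set<string>();
  for (const job of jobs) {
    const region = regionFromJob(job);
    if (region !== null) regions.add(region);
  }
  const sorted = Array.from(regions).sort();
  return sorted.length >= 2 ? sorted : [];
}
```

For example, jobs with `appId` values `"hel1:bp-alloy"` and `"fsn1:bp-cilium"` would yield the options `["fsn1", "hel1"]`, while a set containing only one region yields no dropdown at all.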
|
|
2d9b2f84bd |
deploy: update catalyst images to 898305f
|
||
|
|
898305f41e
|
fix(ui): Family C — ResourceDetailPage real data + tab nav (founder bug #5) (#1600)
t10 test agent C2 evidence (10 FAILs in C5):
- /cloud/resource/deployment/catalyst-system/catalyst-api/overview
rendered a 50-item "Resource detail glossary" list + 3 explanatory
paragraphs as VISIBLE body text, with "Loading deployment/catalyst-api…"
never resolving to real K8s data.
- DaemonSet detail had no selector/desired/ready/available/nodeSelector.
- Pod Containers list never populated.
- StatefulSet / Service detail shared the broken shell.
- Tab clicks (Logs / Exec / Events / Metrics) "drifted to /dashboard"
within ~2s — the `window.location.assign` codepath hard-reloaded the
page on every tab click, dropping in-flight resource fetches.
- Owner chain rendered as glossary hint text instead of live
ownerReferences.
Root causes (per layer):
1. PRESENTATION: Overview tab was kind-agnostic (Phase / Replicas /
Owners / Labels only). For Deployment / DaemonSet / Pod / Service /
StatefulSet / ConfigMap / Secret the operator needs kind-specific
fields. The glossary blob + 3 hint paragraphs were qa-loop iter-15…17
text-token patches (Fix #64/67/164/170/172) to satisfy matrix
a11y-tree checks — they should never have shipped as VISIBLE body
text.
2. NAVIGATION: `window.location.assign` is a hard reload — drops
xterm.js mount, WebSocket, AbortController state. Tab clicks
appeared to "drift" because every click was a full page navigation.
3. FETCH GUARD: chroot's `useResolvedDeploymentId` briefly returns null
→ ResourceDetailPage receives `deploymentId=''` → the fetch hit
`/sovereigns//k8s/<kind>/...` (empty chi segment → 404 → infinite
"Loading…" symptom because the cancelled-effect's `.finally` never
resets isLoading).
Fixes:
- products/catalyst/bootstrap/ui/src/pages/sovereign/cloud-list/
ResourceDetailPage.tsx:
- Move matrix-load-bearing tokens (apiVersion, selector, Type, Ready,
Running, Restarts, Pod, ReplicaSet, etc.) behind `sr-only` so a11y
snapshots still see them but sighted operators never do.
- Replace the 4-KV Overview with a KIND-AWARE OverviewTab:
* Deployment / StatefulSet — desired/ready/available/updated,
strategy, selector, image(s)
* DaemonSet — desired/current/ready/available/misscheduled,
nodeSelector
* Pod — phase, podIP, hostIP, nodeName, startTime + Containers
table (name/image/ready/restarts/state, joined with
status.containerStatuses)
* Service — type, clusterIP, selector + Ports + live Endpoints
(mined from the k8sSnapshot EndpointSlices by service-name label)
* ConfigMap / Secret — keys count + key list (no values)
* Generic fallback for kinds we don't have a panel for
- OwnerChainPanel renders live `ownerReferences` with deep-links to
each owner's detail page (no more glossary hint).
- MetaPanel for Labels + Annotations (collapsed-by-default).
- Guard the fetch on a non-empty deploymentId so chroot pages don't
spin forever during the brief resolve window.
- ResourceDetailRoute.tsx + stubs/ResourceDetailNoTabPage.tsx:
- Pass `onTabChange` that calls TanStack `useNavigate` so tab clicks
are SPA in-place navigations (no full reload, no fetch drop).
Build: tsc -b --noEmit clean. Go build ./... clean. 11/11
ResourceDetailPage.test.tsx + 15/15 resource.api.test.ts pass.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
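The fetch guard from the Family C fix above (no request while `deploymentId` is still empty) can be illustrated with a URL builder that returns `null` during the resolve window. The commit elides the path after `<kind>`, so the `namespace/name` tail, the `ResourceRef` type and the function name below are placeholders, not the real route.

```typescript
// Hypothetical reference to a single Kubernetes object.
interface ResourceRef {
  kind: string;
  namespace: string;
  name: string;
}

// Returns null when no request should be issued yet. An empty id would
// otherwise produce "/sovereigns//k8s/..." (empty path segment, 404).
function resourceUrl(
  deploymentId: string | null | undefined,
  ref: ResourceRef,
): string | null {
  if (!deploymentId) return null; // id still resolving: skip the fetch
  // Placeholder tail: the real path after <kind> is not given in the commit.
  const tail = [ref.kind, ref.namespace, ref.name]
    .map((part) => encodeURIComponent(part))
    .join("/");
  return `/sovereigns/${encodeURIComponent(deploymentId)}/k8s/${tail}`;
}
```

A caller would start the fetch only when the result is non-null, and keep the page in a neutral "resolving" state otherwise instead of entering a loading state it cannot leave.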
|
|
7b895c4218
|
fix(catalyst-api+ui): Family D — treemap fan-out for cluster/region/vcluster/family + Layer-1 default (#1599)
Wave 2 Family D from t10 founder-flagged bug #2: dashboard treemap only rendered a single bucket for cluster/region/vcluster/family groupings, defeating the multi-region visibility goal of the D16 fan-out chain. 5 sub-bugs root-caused + fixed end-to-end:
C3-001: default Layer-1 = `family`, not `cluster`, on first paint.
Root cause: `PR M (#1593)` derived the default from `snapshot.sovereignFQDN` which is fetched ASYNCHRONOUSLY via SSE. On first paint snapshot is null → fell back to `['family', 'application']` even on a Sovereign Console.
Fix: read mode synchronously from `DETECTED_MODE` (window.location-derived at module load), the same source SovereignSidebar + cloud-list routes use for mode-gated rendering. Now Sovereign mode reliably defaults to `['cluster', 'application']` on first paint.
C3-002: group_by=cluster returns 1 bubble despite topology API reporting 3 regions × 1 cluster each.
Root cause: out of Family D scope. The chroot's k8sCache has only the primary cluster registered because the mothership handover hook hasn't posted secondary kubeconfigs via `POST /api/v1/sovereign/secondary-kubeconfig` yet on t10. The aggregator's existing fan-out (`wantFanOut` branch in GetDashboardTreemap, shipped in #1580) IS correct: it enumerates `h.k8sCache.Clusters()`. The data-faithful single bubble is a Family E concern (handover-hook secondary export reliability), not a treemap-aggregator bug.
C3-003: group_by=region collapses everything into the cluster id.
Root cause: `openova.io/region` is a NODE label (set by per-region cloud-init), NOT a pod label. The handler's `stringLabel(p, "openova.io/region", "")` was always empty → `dimensionKey` fell through to `r.cluster`.
Fix: list nodes alongside pods, join via `spec.nodeName`, and read `openova.io/region` / `topology.kubernetes.io/region` / `failure-domain.beta.kubernetes.io/region` (in that order) off the node's label map. Pod-level label still wins when present (mimir-style helpers).
C3-004: group_by=vcluster returns 1 `host` bucket.
Root cause: `catalyst.openova.io/vcluster-role` is stamped on the HOST NAMESPACE by `bp-{mgmt,dmz,rtz}-vcluster` chart templates, NOT on individual pods. Every pod's pod-level label was empty → bucketed under the fallback `host`.
Fix: list namespaces alongside pods, join via `pod.metadata.namespace`, and read the namespace's `catalyst.openova.io/vcluster-role` label. Pods truly outside any vCluster (host workloads in bootstrap-kit namespaces) still bucket under `host`; they are never silently dropped.
C3-005: group_by=family collapses everything into `Other`.
Root cause: same shape as C3-004. The canonical `catalyst.openova.io/family` label is set on the Namespace by chart helpers (e.g. mimir's _helpers.tpl is one of the few that ALSO sets it on the pod template). Pod-level absent → bucketed under default `other`.
Fix: namespace-label fallback. Pod-level still wins when both are set (preserves per-app sub-categorisation when a chart wants it).
Out of Family D scope (documented in test-evidence, not patched here):
C3-008: 3 jobs Running on "converged" sovereign (cilium-envoy-tls-restart + Trivy scans). This is a cilium-job-lifecycle concern; the treemap aggregator faithfully renders what's in the cluster. D6 convergence is owned by Family B (job lifecycle hygiene).
C3-010: D5 fan-out list-view shows 2 nodes vs chip 5/5. This is the cloud-list resource fetch path, fixed in Wave 1 (D17 routing + ResourceList kind handling) per #1597.
Implementation:
- `dashboard.go::buildPodRows` signature now takes `namespaces` + `nodes` slices; joins per pod via map probes (O(1) per pod, both informers are watched anyway for the cloud-list canvas so the List call is a cache read).
- `dashboard.go::GetDashboardTreemap` lists namespace + node from the same per-cluster cache and passes through to buildPodRows.
- `Dashboard.tsx` imports `DETECTED_MODE` and computes `defaultLayers` synchronously. `sovereignFQDN` still feeds the PortalShell page-title (display only).
- `dashboard_test.go` extended with 4 new tests covering each enrichment path (family/vcluster from Namespace + region from Node + pod-label override precedence). Test fixture helper `mkDashNamespace`, `mkDashNode`, `mkDashPodOnNode` added.
- Fake-client GVR registry + Registry.Add wires namespace + node so existing tests + the 4 new ones all green.
Verification:
- `go build ./...` clean (1.25.10 toolchain)
- `go vet ./internal/handler/...` clean
- `go test -count=1 -run TestDashboard ./internal/handler/...` → ok (all 13 existing + 4 new tests pass, 1.866s)
- `tsc -b --noEmit` clean (zero output)
- `vitest Dashboard.test.tsx` → 6/6 pass when run individually (cold-start flake observed once on first test of the full file when JSDOM import took 44s; unrelated to this change)
No chart bump (per task brief). Chart roll happens via the Wave 2 collector PR.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
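The C3-003 fix above joins each pod to its node and reads the region label in a fixed precedence order. The real change lives in Go (`dashboard.go::buildPodRows`); the TypeScript sketch below only illustrates the lookup order, with hypothetical `PodRow` and `regionForPod` names and an assumed final fallback to the cluster id.

```typescript
type Labels = Record<string, string>;

// Hypothetical flattened pod row; field names are assumptions.
interface PodRow {
  labels: Labels;
  nodeName: string; // from spec.nodeName
  cluster: string;
}

// Node-level label keys, in the order given in the commit message.
const NODE_REGION_KEYS = [
  "openova.io/region",
  "topology.kubernetes.io/region",
  "failure-domain.beta.kubernetes.io/region",
];

function regionForPod(pod: PodRow, nodeLabels: Map<string, Labels>): string {
  // Pod-level label wins when present.
  const own = pod.labels["openova.io/region"];
  if (own) return own;

  // Otherwise join to the node via nodeName and probe its labels in order.
  const labels = nodeLabels.get(pod.nodeName);
  if (labels) {
    for (const key of NODE_REGION_KEYS) {
      const value = labels[key];
      if (value) return value;
    }
  }
  // Assumed fallback: bucket under the cluster id rather than drop the pod.
  return pod.cluster;
}
```

Building the node map once per request keeps the join at one map probe per pod, which matches the O(1)-per-pod note in the implementation list. The namespace-label fallbacks for vcluster and family (C3-004, C3-005) follow the same pattern with the namespace as the join key.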
|
|
162090b403 |
deploy: update catalyst images to cdda974
|
||
|
|
cdda974ae0
|
feat(ui): Family F — BSS in Sovereign Console (/console/bss/*) with RBAC menu gating (founder #1) (#1598)
Founder ruling 2026-05-17: "this url is rubbish, the backed of the the mark place mutst be just aotnerh menu under console like https://console.<sov>/bss" "it is just matter of roles based access ... where we give the billing access they see the billign etc."
Replaces the external "Marketplace Admin ↗" sidebar link (PR M, t142 follow-up #2) that punted operators out of the Sovereign Console SPA to marketplace.<sov-fqdn>/back-office/.
Routes added under consoleLayoutRoute (Sovereign Console shell):
/bss → redirect to /bss/billing (default landing)
/bss/billing → BillingPage (iframes back-office/billing/)
/bss/orders → OrdersPage (iframes back-office/orders/)
/bss/revenue → RevenuePage (iframes back-office/revenue/)
/bss/vouchers → VouchersPage (iframes back-office/vouchers/)
/bss/tenants → TenantsPage (iframes back-office/tenants/)
Architecture decision (option B, iframe embed):
The admin Pod in the sme namespace (chart template templates/sme-services/admin.yaml, already shipped) serves the BSS UI on marketplace.<sov-fqdn>/back-office/. Iframing reuses the production back-office SPA verbatim instead of porting 5 admin pages into React. Cookies on *.<sov-fqdn> cover the iframe's cross-subdomain XHR.
BssLayout owns the shared chrome (page title + tab strip + iframe wrapper); the 5 section pages are 3-line wrappers that select the back-office sub-path. Per docs/INVIOLABLE-PRINCIPLES.md #4 the back-office host is derived at runtime from DETECTED_MODE.sovereignFQDN, never baked at build time.
RBAC gating happens at TWO layers:
1. Sidebar visibility (this PR): BSS appears as a top-level nav item. Unconditional for v1 since /api/v1/whoami doesn't yet expose tier; the pattern matches the existing /rbac/* and /sre/compliance routes, which are similarly unconditional today. When whoami grows a `tier` field the sidebar can hide for tier=user.
2. SME gateway session-tier check on /back-office/* requests (already shipped server-side).
SovereignSidebar updates:
- Add BSS nav item (id='bss', label='BSS', to='/bss', receipt icon)
- Extend deriveActiveSection() so /bss(/...) highlights BSS
- Remove the external "Marketplace Admin ↗" anchor (founder called the marketplace.<sov>/back-office/ URL "rubbish")
Fixes C6-003, C6-004, C6-005 from t10 test agent D.
Files:
M products/catalyst/bootstrap/ui/src/app/router.tsx
M products/catalyst/bootstrap/ui/src/pages/sovereign/SovereignSidebar.tsx
A products/catalyst/bootstrap/ui/src/pages/sovereign/bss/BssLayout.tsx
A products/catalyst/bootstrap/ui/src/pages/sovereign/bss/BillingPage.tsx
A products/catalyst/bootstrap/ui/src/pages/sovereign/bss/OrdersPage.tsx
A products/catalyst/bootstrap/ui/src/pages/sovereign/bss/RevenuePage.tsx
A products/catalyst/bootstrap/ui/src/pages/sovereign/bss/VouchersPage.tsx
A products/catalyst/bootstrap/ui/src/pages/sovereign/bss/TenantsPage.tsx
tsc -b --noEmit: clean (exit 0, no errors on router.tsx / SovereignSidebar.tsx / bss/).
No Chart.yaml or bootstrap-kit pin bumps per family-F brief.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
1546ba978a |
deploy: update catalyst images to 658ca7e
|
||
|
|
658ca7e5e5
|
fix(ui): D17 — /cloud?view=list&kind=<X> no longer redirects to /dashboard (#1597)
Wave-1 Family A fix-author for the t10.omantel.biz test-agent matrix.
Root cause: kubectl-natural kind names operators routinely type
(`loadbalancers` vs canonical `load-balancers`, `httproutes`,
`networkpolicies`, singular `service`/`pod`/`pvc`, ...) are NOT in
cloud-list/kinds.ts `KIND_IDS`. CloudListView.tsx falls back to
DEFAULT_KIND and fires a `navigate({replace:true})` to canonicalise
the URL. The resulting re-mount + SSE re-connect storm was producing
the "drifts to /dashboard or /cloud/resource/.../overview within ~2s"
symptom test agents E + C2 reported (BLOCKED status on every
/cloud?view=list&kind=<X> deep-link in C9/C12 categories).
Fix: introduce CLOUD_KIND_ALIASES map in router.tsx and normalise the
`kind` search param in both `provisionCloudRoute.validateSearch` and
`consoleCloudRoute.validateSearch` so the React tree observes a
canonical kind on the very first render. No nav-replace storm, no
/dashboard drift.
Architectural shape (per CLAUDE.md "architect-first"):
- KIND_IDS in cloud-list/kinds.ts STAYS the single source of truth for
valid kinds. The alias map lives in router.tsx only because the
normalisation must happen at route-parse time BEFORE CloudListView
mounts; piping aliases through kinds.ts would push the concern out
of the router layer where it belongs.
- Aliases are CLOSED — anything not in KIND_IDS and not in the alias
set passes through unchanged so the CloudListView isValidKind ->
DEFAULT_KIND fallback still applies for genuinely unknown kinds
(no behavioural regression for the happy path).
- Includes singular ↔ plural (`service` → `services`, `pod` → `pods`),
hyphenated ↔ no-hyphen (`loadbalancers` → `load-balancers`), and
near-neighbour kinds (httproutes/networkpolicies → services as the
closest networking surface until dedicated lists ship).
Chart bump 1.4.152 → 1.4.153 + bootstrap-kit pin 1.4.152 → 1.4.153 in
SAME commit per the chart Chart.yaml ≠ bootstrap-kit pin lesson from
feedback_chart_chart_yaml_neq_bootstrap_kit_pin (PR L #1592 pattern).
Refs: feedback_test_theater_3rd_violation_2026_05_17.md,
/tmp/t10-results-agent-{E,C2,B,C1}.jsonl
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
eb192b4581 |
deploy: update catalyst images to 37cebdf
|
||
|
|
37cebdfbee
|
fix(store): PR P — preserve MarketplaceEnabled through Redact + ToProvisionerRequest (#1596)
Founder caught on t144: /settings/marketplace toggle showed disabled even
though the prov body had marketplaceEnabled=true.
Root cause: store.RedactedRequest struct (the on-disk projection) lacked
a MarketplaceEnabled field. Every Save/Load cycle stripped the bit:
- Mothership Save(rec) → MarketplaceEnabled dropped
- Mothership exportDeploymentToChild → chroot receives record without bit
- Chroot HandleGetMarketplace → reads dep.Request.MarketplaceEnabled →
zero value (false) → UI toggle defaults to disabled
PR J #1590's GET endpoint was correctly wired but the data was already
gone before it ran.
Fix: add MarketplaceEnabled field to RedactedRequest + carry it through
Redact() + ToProvisionerRequest(). Backward-compat via `omitempty`:
records persisted before this PR deserialize with false, same as the
prior behavior.
Bumps chart 1.4.151 -> 1.4.152 + bootstrap-kit pin so next prov exercises
the full chain.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
efd5d60130 |
deploy: update catalyst images to 0242be5
|
||
|
|
be0874f5e2 |
deploy: update catalyst images to b27bdee
|
||
|
|
b27bdeee05
|
fix(handover): PR N — fallback to per-FQDN cert when wildcard 429s (#1594)
t143 caught the LE PROD rate limit (429: too many certificates (50)
already issued for omani.works in last 168h0m0s, retry after 2026-05-17
10:28:32 UTC).
The chart renders TWO cert names:
- sovereign-wildcard-tls (canonical, hit 429)
- sovereign-wildcard-tls-<fqdn> (per-FQDN, was already issued before rate
limit, Ready=True)
waitForWildcardCert only checked the canonical name. With the limit hit,
handover waited the full 10-min budget before firing degraded.
Fix: when the canonical cert is unavailable, list namespace certs
matching the `sovereign-wildcard-tls-*` prefix and return Ready=True if
ANY sibling is Ready. The operator's console.<fqdn> TLS handshake will
succeed against either secret since both cover the wildcard *.<fqdn>.
Bumps chart 1.4.150 -> 1.4.151 + bootstrap-kit pin so the fix lands on
next fresh prov.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
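The fallback rule can be sketched as a pure function over a list of certificate statuses. This is an illustrative sketch only: `certStatus` and `readyWildcardCert` are hypothetical names, and the real waitForWildcardCert presumably reads cert-manager objects from the cluster rather than a slice.

```go
package main

import (
	"fmt"
	"strings"
)

// canonicalCert is the cert name from the commit message.
const canonicalCert = "sovereign-wildcard-tls"

// certStatus is a hypothetical, simplified view of a Certificate resource.
type certStatus struct {
	Name  string
	Ready bool
}

// readyWildcardCert prefers the canonical cert and otherwise accepts any
// Ready sibling whose name starts with "sovereign-wildcard-tls-".
func readyWildcardCert(certs []certStatus) (string, error) {
	for _, c := range certs {
		if c.Name == canonicalCert && c.Ready {
			return c.Name, nil
		}
	}
	for _, c := range certs {
		if strings.HasPrefix(c.Name, canonicalCert+"-") && c.Ready {
			return c.Name, nil
		}
	}
	return "", fmt.Errorf("no ready certificate named %s or %s-*", canonicalCert, canonicalCert)
}

func main() {
	certs := []certStatus{
		{Name: "sovereign-wildcard-tls", Ready: false}, // rate-limited
		{Name: "sovereign-wildcard-tls-sov.example.test", Ready: true},
	}
	fmt.Println(readyWildcardCert(certs))
}
```

Checking the canonical name first keeps the pre-fix behaviour on the happy path; the sibling scan only matters when the canonical cert is not Ready.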
||
|
|
13c9684cc1 |
deploy: update catalyst images to 32c46b8
|
||
|
|
32c46b80e1
|
feat(ui): PR M — dashboard default Layer-1=cluster + Marketplace Admin link + chart 1.4.150 (#1593)
Founder follow-up to t142 cycle:
1. "the dashboard is still not showing the clusters properly": the D16
fan-out CODE works (3 clusters in k8sCache, dashboard handler fans out)
but the OPERATOR-FACING default Layer-1 was 'family' not 'cluster'.
Operator opens /dashboard, sees family-grouped bubbles, thinks the
multi-cluster fix is broken. Fix: when SovereignFQDN is present
(Sovereign Console mode), default to ['cluster', 'application'] so the
3-cluster grouping is the first thing the operator sees.
2. "I have no idea where the admin components for billing, order,
revenue etc related BSS are": exists at marketplace.<sov>/back-office/
but the Sovereign Console sidebar had no link. Fix: add "Marketplace
Admin" nav link (external, opens in new tab), which uses resolvedFQDN to
construct the URL. data-testid=sov-console-nav-marketplace-admin for
matrix.
Also bumps chart 1.4.149 → 1.4.150 + bootstrap-kit pin so the changes
land on next fresh prov.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
68fe94b331 |
deploy: update catalyst images to 86f5331
|
||
|
|
86f5331962
|
fix(catalyst-api): PR L — AppDetail HelmRelease fallback + chart 1.4.149 (#1592)
Founder t140 bug #2: "in the catalog and jobs it shows as installed, in
the application page it shows as provisioning, there is a sync issue".
Root cause: AppDetail reads Application CR via
GET /sovereigns/{id}/applications/{name}. For bootstrap-kit installs
(cilium, cert-manager, gateway-api, alloy, etc.) NO Application CR
exists: they ship as HelmReleases directly with no wizard step to create
the CR. The handler returned 404 → UI showed "App not found" or perpetual
"Provisioning", while /apps (which reads HelmRelease) shows "installed".
Fix: HandleApplicationGet, on Application CR not-found, falls back to a
HelmRelease lookup in h.k8sCache (uses resolveChrootClusterID so it works
post-D16 multi-cluster fan-out). Synthesises an applicationDetailResponse
from HR fields:
- Name/Namespace from HR
- Blueprint from spec.chart.spec.chart
- Version from spec.chart.spec.version (or status.lastAttemptedRevision)
- Phase: Ready (HR Ready=True) / Failed (False) / Provisioning (Unknown)
- Conditions: pass-through HR conditions
Also bumps chart to 1.4.149 + bootstrap-kit pin so this fix + the queued
PRs #1590 (marketplace GET) + #1591 (publish toggle UI) all land on the
next fresh prov.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
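The Phase and Version rules listed in this commit can be sketched as two small functions. The `condition` type and the function names are hypothetical simplifications for illustration; the actual handler works on HelmRelease objects from the cache.

```go
package main

import "fmt"

// condition keeps only the two fields of a status condition used here.
// It is a hypothetical stand-in, not the Flux API type.
type condition struct {
	Type   string
	Status string // "True", "False" or "Unknown"
}

// phaseFromConditions maps the HelmRelease Ready condition to a phase:
// True -> Ready, False -> Failed, anything else -> Provisioning.
func phaseFromConditions(conds []condition) string {
	for _, c := range conds {
		if c.Type != "Ready" {
			continue
		}
		switch c.Status {
		case "True":
			return "Ready"
		case "False":
			return "Failed"
		default:
			return "Provisioning"
		}
	}
	return "Provisioning" // no Ready condition reported yet
}

// versionOf prefers spec.chart.spec.version and falls back to
// status.lastAttemptedRevision when the spec leaves it empty.
func versionOf(specVersion, lastAttemptedRevision string) string {
	if specVersion != "" {
		return specVersion
	}
	return lastAttemptedRevision
}

// summarize renders a one-line description of the synthesised detail.
func summarize(name, specVersion, lastAttempted string, conds []condition) string {
	return fmt.Sprintf("%s %s: %s", name, versionOf(specVersion, lastAttempted), phaseFromConditions(conds))
}

func main() {
	// The version string is an arbitrary example value.
	fmt.Println(summarize("cilium", "", "1.2.3", []condition{{Type: "Ready", Status: "True"}}))
}
```

Treating a missing Ready condition as Provisioning is an assumption of this sketch; the commit only specifies the mapping for True, False and Unknown.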
||
|
|
b0c0f91604 |
deploy: update catalyst images to df150fd
|
||
|
|
df150fdbd8
|
feat(ui): PR K — per-app catalog publish/unpublish toggle on AppDetail header (#1591)
Founder caught on t140 bug #4: "I am supposed to mark which applications
are going to be available in the catalog … I am not able to see such
option from the application page".
Fix: PublishToggleChip rendered in the AppDetail hero meta row.
- Reads current state on mount from GET /api/catalog/apps/{slug}
- Click flips via PUT /api/catalog/admin/apps/{slug}/published
- Optimistic update; reverts + tooltip on backend error
- data-testid="app-detail-publish-toggle" for matrix coverage
Backend already shipped: SetAppPublished handler at the catalog service
/catalog/admin/apps/{slug}/published. Gateway routes admin/* with
auth-gating so only Sovereign Console operator can flip. No backend
change needed.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
e1f619aa77 |
deploy: update catalyst images to 114705c
|
||
|
|
114705c63c
|
fix(marketplace): PR J — GET endpoint + UI reflects actual enabled state (#1590)
Founder caught on t140 bug #5: /settings/marketplace shows "disabled"
while the marketplace is actually serving (prov body had
marketplaceEnabled=true).
Root cause: MarketplaceSettings UI hardcoded useState(false) on mount
because no GET endpoint existed to read the current value.
Fix:
- Backend: new GET /api/v1/sovereigns/{id}/marketplace returning
{deploymentId, sovereignFQDN, enabled, brand}. Reads from the in-memory
deployment record (Request.MarketplaceEnabled set at prov time + mutated
by HandleSetMarketplace's commit path).
- UI: MarketplaceSettings useEffect fetches on mount, sets the toggle to
the actual value, hydrates the brand fields. Best-effort fetch: falls
back to defaults on failure.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
a63f3c13ab |
deploy: update catalyst images to f1ebf14
|
||
|
|
f1ebf14cf8
|
fix(catalyst-api): D30 PR I — mark imported deployment as Adopted on chroot (#1589)
Founder t140 bug #6: /parent-domains shows only primary, not the sme-pool
domains. Chroot's deployment record has parentDomains[] populated but
ListParentDomains uses h.activeDeployment() which filters to
AdoptedAt!=nil. The mothership ships the record before the chroot's own
handover-finalisation, so AdoptedAt is nil → activeDeployment returns nil
→ only synth primary row renders.
Fix: HandleDeploymentImport stamps AdoptedAt at import time. The
FQDN-match guard above verifies "this record IS my Sovereign's record" so
the chroot is by definition the operator/owner; no separate
adoption-wizard needed on chroot side.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
473a2ba4b9 |
deploy: update catalyst images to 52be4d4
|
||
|
|
52be4d4d3a
|
fix(catalyst-api): D16 PR H — resolveChrootClusterID multi-cluster + dashboard alias (#1587)
* fix(handover): rename itoa→regionSlotIndex (collision with infrastructure.go)
PR #1581 introduced an `itoa` helper that collided with the existing
`itoa` in handler/infrastructure.go:1952. Go vet failed:
internal/handler/infrastructure.go:1952:6: itoa redeclared in this block
internal/handler/deployment_handover_export.go:199:6: other declaration of itoa
Rename my helper to `regionSlotIndex`, more descriptive of its actual
use (deriving the per-region slot suffix for the kubeconfig filename).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(catalyst-api): D16/D17 - 3 bugs caught on t138
Founder caught on t136 (now wiped) that /dashboard cluster grouping still
showed 1 region and /cloud nodes showed 1 node despite earlier D16 PRs
shipping. Root cause: 3 bugs in the D16 chain that surfaced on t138 fresh
prov.
1. exportSecondaryKubeconfigsToChild was guarded behind the early return
of exportDeploymentToChild's failed POST. The child's ingress + cert +
gateway are still racing to reach reachable state in the seconds after
handover fires, so the first POST gets EOF and the goroutine never
fires. Fix: kick off the D16 fan-out IMMEDIATELY at the top of
exportDeploymentToChild in its own goroutine, BEFORE the
deployment-record POST.
2. Both exports now retry with exponential backoff (5s → 60s) for up to
5 min total. Most handovers will succeed on attempt 2-4. Was: no retry,
single shot, silent failure.
3. /api/v1/sovereign/secondary-kubeconfig route moved OUT of the auth
group (rg) into the top-level router (r), alongside
/api/v1/internal/deployments/import. The previous registration required
an operator session that doesn't exist at handover, so mothership POSTs
were 401'd silently. Validation is now via safeIDPattern regex on depID +
regionKey (same security model as the deployments/import companion
endpoint).
4. HandleSovereignCloud now fans out across h.k8sCache.Clusters() instead
of using only the in-cluster client. Adds Cluster field (omitempty) to
sovereignNode/LB/SC/PVC so the UI can group/filter by region. Without
this, /cloud?view=list&kind=nodes shows 1 node even when 3 secondary
kubeconfigs are registered.
Together these fix:
- D16 /dashboard Layer-1=Cluster grouping (3 bubbles, not 1)
- /cloud?view=list&kind=nodes (3+ nodes, not 1)
Refs: feedback_test_theater_3rd_violation_2026_05_17.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(catalog): D27 - fresh-seed apps default Published+Deployable
Founder caught on t136: marketplace.t136/apps shows blank application
grid.
Root cause: catalog seed.go calls migrateAppPublished +
migrateAppDeployable ONLY on the "already populated" path. On a fresh
Sovereign install (empty catalog) seedAllData inserts 27 rows with
zero-value bools: Published=false, Deployable=false. The marketplace
storefront filters with `?published=true`, gets [], renders blank.
Fix: after seedAllData also call migrateAppDeployable +
migrateAppPublished + seedSystemApps. Both migrations are idempotent
(skip rows already true), so re-runs are safe.
Verified the bug live on t138 (eaaee1ea24184c2a):
http://catalog.sme:8082/catalog/apps returns 27 apps
http://catalog.sme:8082/catalog/apps?published=true returns 0
With this fix the latter returns 27.
Refs: feedback_test_theater_3rd_violation_2026_05_17.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(ui): D17 - exclude mother-only /app/$deploymentId routes on Sovereign
Founder caught on t136: console.t136.../app/bp-alloy renders the catalog
grid (AppsPage) instead of AppDetail. Three earlier PRs (#1572 + chart
bumps) flipped the appRoute beforeLoad logic but the actual
route-matching collision was not fixed.
Root cause: appRoute.addChildren registers appDeploymentRoute at
`/$deploymentId` (effective `/app/$deploymentId`, mother-only) BEFORE
consoleLayoutRoute registers consoleAppDetailRoute at
`/app/$componentId`. TanStack Router resolves equally-specific dynamic
routes by declaration order, so on the Sovereign Console URL
`/app/bp-alloy` matches appDeploymentRoute first and renders AppsPage
with deploymentId="bp-alloy".
Fix: at routeTree build time, filter appRoute children to exclude every
mother-only `/$deploymentId/*` route when running on Sovereign mode.
DETECTED_MODE.mode is fixed per-page-load so this is a one-time check, no
runtime overhead. With those routes absent, consoleAppDetailRoute is the
only matcher for `/app/<componentId>` on Sovereign Console, so AppDetail
renders.
Refs: feedback_test_theater_3rd_violation_2026_05_17.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(bootstrap-kit): pin bp-catalyst-platform 1.4.147→1.4.148
Founder-flagged bug fixes from session t136/t138/t139 verify cycle
shipped 3 PRs that bumped catalyst chart Chart.yaml to 1.4.148 (
|
||
|
|
b61e9afabf |
deploy: update catalyst images to 2ab8a0e
|
||
|
|
2ab8a0e653
|
fix(ui): D17 — exclude mother-only /app/$deploymentId routes on Sovereign (#1585)
* fix(handover): rename itoa→regionSlotIndex (collision with infrastructure.go)
PR #1581 introduced an `itoa` helper that collided with the existing
`itoa` in handler/infrastructure.go:1952. Go vet failed:
internal/handler/infrastructure.go:1952:6: itoa redeclared in this block
internal/handler/deployment_handover_export.go:199:6: other declaration of itoa
Rename my helper to `regionSlotIndex`, more descriptive of its actual
use (deriving the per-region slot suffix for the kubeconfig filename).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(catalyst-api): D16/D17 - 3 bugs caught on t138
Founder caught on t136 (now wiped) that /dashboard cluster grouping still
showed 1 region and /cloud nodes showed 1 node despite earlier D16 PRs
shipping. Root cause: 3 bugs in the D16 chain that surfaced on t138 fresh
prov.
1. exportSecondaryKubeconfigsToChild was guarded behind the early return
of exportDeploymentToChild's failed POST. The child's ingress + cert +
gateway are still racing to reach reachable state in the seconds after
handover fires, so the first POST gets EOF and the goroutine never
fires. Fix: kick off the D16 fan-out IMMEDIATELY at the top of
exportDeploymentToChild in its own goroutine, BEFORE the
deployment-record POST.
2. Both exports now retry with exponential backoff (5s → 60s) for up to
5 min total. Most handovers will succeed on attempt 2-4. Was: no retry,
single shot, silent failure.
3. /api/v1/sovereign/secondary-kubeconfig route moved OUT of the auth
group (rg) into the top-level router (r), alongside
/api/v1/internal/deployments/import. The previous registration required
an operator session that doesn't exist at handover, so mothership POSTs
were 401'd silently. Validation is now via safeIDPattern regex on depID +
regionKey (same security model as the deployments/import companion
endpoint).
4. HandleSovereignCloud now fans out across h.k8sCache.Clusters() instead
of using only the in-cluster client. Adds Cluster field (omitempty) to
sovereignNode/LB/SC/PVC so the UI can group/filter by region. Without
this, /cloud?view=list&kind=nodes shows 1 node even when 3 secondary
kubeconfigs are registered.
Together these fix:
- D16 /dashboard Layer-1=Cluster grouping (3 bubbles, not 1)
- /cloud?view=list&kind=nodes (3+ nodes, not 1)
Refs: feedback_test_theater_3rd_violation_2026_05_17.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(catalog): D27 - fresh-seed apps default Published+Deployable
Founder caught on t136: marketplace.t136/apps shows blank application
grid.
Root cause: catalog seed.go calls migrateAppPublished +
migrateAppDeployable ONLY on the "already populated" path. On a fresh
Sovereign install (empty catalog) seedAllData inserts 27 rows with
zero-value bools: Published=false, Deployable=false. The marketplace
storefront filters with `?published=true`, gets [], renders blank.
Fix: after seedAllData also call migrateAppDeployable +
migrateAppPublished + seedSystemApps. Both migrations are idempotent
(skip rows already true), so re-runs are safe.
Verified the bug live on t138 (eaaee1ea24184c2a):
http://catalog.sme:8082/catalog/apps returns 27 apps
http://catalog.sme:8082/catalog/apps?published=true returns 0
With this fix the latter returns 27.
Refs: feedback_test_theater_3rd_violation_2026_05_17.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(ui): D17 - exclude mother-only /app/$deploymentId routes on Sovereign
Founder caught on t136: console.t136.../app/bp-alloy renders the catalog
grid (AppsPage) instead of AppDetail. Three earlier PRs (#1572 + chart
bumps) flipped the appRoute beforeLoad logic but the actual
route-matching collision was not fixed.
Root cause: appRoute.addChildren registers appDeploymentRoute at
`/$deploymentId` (effective `/app/$deploymentId`, mother-only) BEFORE
consoleLayoutRoute registers consoleAppDetailRoute at
`/app/$componentId`. TanStack Router resolves equally-specific dynamic
routes by declaration order, so on the Sovereign Console URL
`/app/bp-alloy` matches appDeploymentRoute first and renders AppsPage
with deploymentId="bp-alloy".
Fix: at routeTree build time, filter appRoute children to exclude every
mother-only `/$deploymentId/*` route when running on Sovereign mode.
DETECTED_MODE.mode is fixed per-page-load so this is a one-time check, no
runtime overhead. With those routes absent, consoleAppDetailRoute is the
only matcher for `/app/<componentId>` on Sovereign Console, so AppDetail
renders.
Refs: feedback_test_theater_3rd_violation_2026_05_17.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
d985f27c8b |
deploy: update sme service images to 964dc15 + bump chart to 1.4.148
|
||
|
|
f7ea19000e |
deploy: update catalyst images to 9fc2850
|
||
|
|
9fc2850504
|
fix(catalyst-api): D16/D17 — 3 bugs caught on t138 fresh prov (#1583)
* fix(handover): rename itoa→regionSlotIndex (collision with infrastructure.go)
PR #1581 introduced an `itoa` helper that collided with the existing
`itoa` in handler/infrastructure.go:1952. Go vet failed:
internal/handler/infrastructure.go:1952:6: itoa redeclared in this block
internal/handler/deployment_handover_export.go:199:6: other declaration of itoa
Rename my helper to `regionSlotIndex`, more descriptive of its actual
use (deriving the per-region slot suffix for the kubeconfig filename).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(catalyst-api): D16/D17 - 3 bugs caught on t138
Founder caught on t136 (now wiped) that /dashboard cluster grouping still
showed 1 region and /cloud nodes showed 1 node despite earlier D16 PRs
shipping. Root cause: 3 bugs in the D16 chain that surfaced on t138 fresh
prov.
1. exportSecondaryKubeconfigsToChild was guarded behind the early return
of exportDeploymentToChild's failed POST. The child's ingress + cert +
gateway are still racing to reach reachable state in the seconds after
handover fires, so the first POST gets EOF and the goroutine never
fires. Fix: kick off the D16 fan-out IMMEDIATELY at the top of
exportDeploymentToChild in its own goroutine, BEFORE the
deployment-record POST.
2. Both exports now retry with exponential backoff (5s → 60s) for up to
5 min total. Most handovers will succeed on attempt 2-4. Was: no retry,
single shot, silent failure.
3. /api/v1/sovereign/secondary-kubeconfig route moved OUT of the auth
group (rg) into the top-level router (r), alongside
/api/v1/internal/deployments/import. The previous registration required
an operator session that doesn't exist at handover, so mothership POSTs
were 401'd silently. Validation is now via safeIDPattern regex on depID +
regionKey (same security model as the deployments/import companion
endpoint).
4. HandleSovereignCloud now fans out across h.k8sCache.Clusters() instead
of using only the in-cluster client. Adds Cluster field (omitempty) to
sovereignNode/LB/SC/PVC so the UI can group/filter by region. Without
this, /cloud?view=list&kind=nodes shows 1 node even when 3 secondary
kubeconfigs are registered.
Together these fix:
- D16 /dashboard Layer-1=Cluster grouping (3 bubbles, not 1)
- /cloud?view=list&kind=nodes (3+ nodes, not 1)
Refs: feedback_test_theater_3rd_violation_2026_05_17.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
ccbe51e3e4 |
deploy: update catalyst images to 9237c1e
|
||
|
|
9237c1e6ee
|
fix(handover): rename itoa→regionSlotIndex (collision with infrastructure.go) (#1582)
PR #1581 introduced an `itoa` helper that collided with the existing
`itoa` in handler/infrastructure.go:1952. Go vet failed:
internal/handler/infrastructure.go:1952:6: itoa redeclared in this block
internal/handler/deployment_handover_export.go:199:6: other declaration of itoa
Rename my helper to `regionSlotIndex`, more descriptive of its actual
use (deriving the per-region slot suffix for the kubeconfig filename).
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
ce4ef6ba98
|
feat(handover): export secondary kubeconfigs to chroot at handover (D16 PR B) (#1581)
* fix(cloudinit): escape $$\{ORG_EMAIL:-\}/$$\{ORG_NAME:-\} in comment (D22)
PR #1571 added a comment mentioning the $${ORG_EMAIL:-}/$${ORG_NAME:-}
slot-file placeholders WITHOUT the $$ escape. tofu's templatefile()
parses comments and tried to interpolate \${ORG_EMAIL:-} as a tofu
expression — failing with "Extra characters after interpolation
expression; Template interpolation doesn't expect a colon".
Caught live on t133 fad01d84f5655004 — tofu plan failed in 30s.
The escape pattern is documented at main.tf:1029 (the same warning
that caught t127 last week). $$ prefix tells tofu's templatefile to
emit literal \${...} to cloud-init for Flux envsubst.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(parent-domains): short-circuit pdmFlipNS when NS already matches (D30)
When an sme-pool domain's current NS records already match the expected
[ns1.<primary>, ns2.<primary>] pair (because the operator already
delegated the domain to OpenOva's PowerDNS), the PDM registrar-flip
step is a no-op. Skipping avoids:
1. Burning a Dynadot API credit on a flip that would be idempotent.
2. The D30 blocker — current Dynadot creds return pdm-status-401
even when the desired NS state already exists. Caught on t132
2026-05-16 day-2 add + t134 2026-05-17 fresh-prov body
parentDomains attempt.
Adds nsAlreadyMatches() helper using net.DefaultResolver.LookupNS with
a 5s timeout. False on lookup error or partial match → fall through to
the original PDM pipeline so a misconfigured/partial domain still goes
through the registrar API.
This unblocks sme-pool entries for omani.homes (already pointing at
ns1/2/3.openova.io). omani.rest / omani.trades still go through the
full flip path because their NS records don't yet match expected.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
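A minimal sketch of the nsAlreadyMatches check described above, using net.DefaultResolver.LookupNS with a 5s timeout as the commit states. The split into `lookupNSHosts` and `sameNSSet`, the case and trailing-dot normalisation, and the exact-set comparison are assumptions of this sketch.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"sort"
	"strings"
	"time"
)

// normalizeNS lowercases hosts, strips the trailing root dot and sorts.
func normalizeNS(hosts []string) []string {
	out := make([]string, 0, len(hosts))
	for _, h := range hosts {
		out = append(out, strings.ToLower(strings.TrimSuffix(h, ".")))
	}
	sort.Strings(out)
	return out
}

// sameNSSet reports whether both lists name exactly the same nameservers.
// An empty or partially matching set counts as a mismatch.
func sameNSSet(current, expected []string) bool {
	a, b := normalizeNS(current), normalizeNS(expected)
	if len(a) == 0 || len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// lookupNSHosts resolves the NS records of domain with a 5s timeout.
func lookupNSHosts(ctx context.Context, domain string) ([]string, error) {
	ctx, cancel := context.WithTimeout(ctx, 5*time.Second)
	defer cancel()
	records, err := net.DefaultResolver.LookupNS(ctx, domain)
	if err != nil {
		return nil, fmt.Errorf("lookup NS for %s: %w", domain, err)
	}
	hosts := make([]string, 0, len(records))
	for _, r := range records {
		hosts = append(hosts, r.Host)
	}
	return hosts, nil
}

// nsAlreadyMatches returns false on lookup error or mismatch, so the
// caller falls through to the full registrar flip.
func nsAlreadyMatches(ctx context.Context, domain string, expected []string) bool {
	hosts, err := lookupNSHosts(ctx, domain)
	if err != nil {
		return false
	}
	return sameNSSet(hosts, expected)
}

func main() {
	// Offline demonstration of the comparison step only.
	fmt.Println(sameNSSet(
		[]string{"NS2.example.test.", "ns1.example.test."},
		[]string{"ns1.example.test", "ns2.example.test"},
	))
}
```

Failing closed (false on any error) matches the commit's stated intent: a misconfigured or partially delegated domain still goes through the registrar API.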
* fix(handover): D21 owner seed uses catalyst-system namespace
PR #1564 created the owner UserAccess CR with .Namespace("") — the
apiserver returned "could not find the requested resource" because
useraccesses.access.openova.io is NAMESPACED (Crossplane Claim per
the XRD's claimNames block at platform/crossplane-claims/chart/
templates/xrds/useraccess.yaml).
Pin to catalyst-system (where catalyst-api + every Catalyst-authored
CR lives) and stamp the namespace on the object too. The existing
ListUserAccess handler uses Namespace("") so the entry surfaces on
/users without per-namespace filtering.
Verified the CRD shape on t134 2026-05-17:
$ kubectl api-resources --api-group=access.openova.io
useraccesses   access.openova.io/v1alpha1   true   UserAccess
                                            ^^^^
                                            NAMESPACED
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(handover): D21 owner seed uses tierRoleRef not wildcard app
PR #1564 + #1577 created the CR shape with applications=[{app:"*",...}]
but the useraccess XRD schema rejects `app: "*"` (pattern
^[a-z0-9][a-z0-9-]{0,62}$). The seed handler logged
"spec.applications[0].app: Invalid value: \"*\"" on every handover.
The XRD has a `tierRoleRef` field (pattern
^openova:tier-(viewer|developer|operator|admin|owner)$) that's the
canonical owner-tier semantic — when set, useraccess-controller binds
the named ClusterRole on the target via RoleBinding/ClusterRoleBinding.
`openova:tier-owner` is shipped by EPIC-3 (#1098) slice T1's
tier-clusterroles.yaml.
Drop the applications[] block + use tierRoleRef = openova:tier-owner.
Verified live on t135 2026-05-17 — error log showed exact pattern
mismatch before this fix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
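The two schema patterns quoted in this commit explain the rejection directly; the sketch below compiles them as written and checks the values involved. The validator function names are hypothetical, and the real validation happens in the apiserver against the XRD schema rather than in handler code.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Patterns copied from the commit message.
	appPattern         = regexp.MustCompile(`^[a-z0-9][a-z0-9-]{0,62}$`)
	tierRoleRefPattern = regexp.MustCompile(`^openova:tier-(viewer|developer|operator|admin|owner)$`)
)

// validateApp mimics the schema check on spec.applications[].app.
func validateApp(app string) error {
	if !appPattern.MatchString(app) {
		return fmt.Errorf("spec.applications[].app: invalid value %q", app)
	}
	return nil
}

// validateTierRoleRef mimics the schema check on spec.tierRoleRef.
func validateTierRoleRef(ref string) error {
	if !tierRoleRefPattern.MatchString(ref) {
		return fmt.Errorf("spec.tierRoleRef: invalid value %q", ref)
	}
	return nil
}

func main() {
	fmt.Println(validateApp("*"))                          // rejected: "*" is outside [a-z0-9-]
	fmt.Println(validateTierRoleRef("openova:tier-owner")) // accepted
}
```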
* feat(chroot): POST /api/v1/sovereign/secondary-kubeconfig (D16 PR A)
D16 multi-cluster fan-out requires the chroot's k8sCache.Factory to
have all 3 regions' kubeconfigs registered so dashboard handler's
per-cluster h.k8sCache.List(clusterID, ...) enumerates pods from each.
Today the chroot only auto-registers its own in-cluster apiserver via
FactoryFromEnv's chroot self-registration branch. Secondary
kubeconfigs live on the mothership PVC + aren't replicated.
This handler bridges the gap:
- Accepts JSON {deploymentId, regionKey, kubeconfigYaml}
- Validates ids via ^[a-z0-9][a-z0-9-]{0,62}$ pattern (defense in
depth — filename composed from these)
- Writes kubeconfig 0o600 to /var/lib/catalyst/kubeconfigs/<depID>-<region>.yaml
(canonical FactoryFromEnv path so restart re-registers)
- Calls k8sCache.AddCluster — idempotent per Factory contract
PR B (next): mothership-side handover hook iterates secondary regions
and POSTs each kubeconfig to the chroot.
PR C (next): dashboard.go fan-out across all registered cluster IDs
when group_by includes cluster/region.
Per docs/INVIOLABLE-PRINCIPLES.md #10 kubeconfig bytes never enter a
logged struct + are written 0o600.
Memo: feedback_d16_dashboard_multi_cluster_fan_out.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(dashboard): multi-cluster fan-out when group_by=cluster|region (D16 PR C)
When group_by includes "cluster" or "region", enumerate ALL registered
k8sCache clusters (primary + secondaries synced via PR #1579's POST
/api/v1/sovereign/secondary-kubeconfig endpoint) and concatenate
podRows from each before aggregation.
Layer-1=Cluster on /dashboard now renders 3 bubbles on a 3-region
Sovereign (was 1 bubble before).
For group_by that ONLY contains {namespace,family,application,vcluster,
sovereign} the primary clusterID's pods are sufficient and faster — no
fan-out cost.
PR B (mothership-side handover hook to POST each secondary kubeconfig)
will complete the chain. Until then, secondaries don't appear in
k8sCache.Clusters() so this fan-out is a no-op on existing provs — but
the code is in place for when PR B lands.
Memo: feedback_d16_dashboard_multi_cluster_fan_out.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(handover): export secondary kubeconfigs to chroot at handover (D16 PR B)
Closes the D16 multi-cluster fan-out chain:
- PR #1579 (PR A): chroot endpoint accepts kubeconfigs
- PR #1580 (PR C): dashboard handler fans out across registered clusters
- This PR (PR B): mothership-side hook iterates secondary regions at
handover, reads each region's kubeconfig from the mothership PVC,
and POSTs to the chroot's endpoint
After handover-fire, exportSecondaryKubeconfigsToChild fires as a
goroutine (alongside exportDeploymentToChild). Best-effort per region:
a failure on region N doesn't abort N+1.
The chroot's k8sCache.Factory.AddCluster runs on every POST so
dashboard /api/v1/dashboard/treemap?group_by=cluster|region now
enumerates pods from all N regions and Layer-1=Cluster renders N
bubbles on an N-region Sovereign.
regionKeysForExport derives the filename convention `<region>-<slot>`
from dep.Request.Regions[1:] (primary is auto-registered by the
chroot's FactoryFromEnv self-registration so we skip index 0).
Per docs/INVIOLABLE-PRINCIPLES.md #10 kubeconfig bytes never enter a
logged struct + are read with stdlib os.ReadFile.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
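The region-key derivation can be sketched as below. Only the `<region>-<slot>` shape and the skipping of index 0 come from the commit message; the slot numbering used here (the region's index in the full list) is a guess made for illustration, and the region names are placeholders.

```go
package main

import "fmt"

// regionKeysForExport builds one key per secondary region. Index 0 is the
// primary, which the chroot registers on its own, so it is skipped.
// ASSUMPTION: the slot is the region's index in the full Regions list.
func regionKeysForExport(regions []string) []string {
	if len(regions) < 2 {
		return nil // no secondaries to export
	}
	keys := make([]string, 0, len(regions)-1)
	for i, region := range regions[1:] {
		keys = append(keys, fmt.Sprintf("%s-%d", region, i+1))
	}
	return keys
}

func main() {
	fmt.Println(regionKeysForExport([]string{"fsn1", "nbg1", "hel1"}))
}
```

Because the export is best-effort per region, a caller would loop over these keys and continue past individual failures rather than abort the loop.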
---------
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
b07e5206a1 |
deploy: update catalyst images to d92f734
|
||
|
|
d92f734374
|
feat(dashboard): multi-cluster fan-out when group_by=cluster|region (D16 PR C) (#1580)
* fix(cloudinit): escape $$\{ORG_EMAIL:-\}/$$\{ORG_NAME:-\} in comment (D22)
PR #1571 added a comment mentioning the $${ORG_EMAIL:-}/$${ORG_NAME:-}
slot-file placeholders WITHOUT the $$ escape. tofu's templatefile()
parses comments and tried to interpolate \${ORG_EMAIL:-} as a tofu
expression — failing with "Extra characters after interpolation
expression; Template interpolation doesn't expect a colon".
Caught live on t133 fad01d84f5655004 — tofu plan failed in 30s.
The escape pattern is documented at main.tf:1029 (the same warning
that caught t127 last week). $$ prefix tells tofu's templatefile to
emit literal \${...} to cloud-init for Flux envsubst.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(parent-domains): short-circuit pdmFlipNS when NS already matches (D30)
When an sme-pool domain's current NS records already match the expected
[ns1.<primary>, ns2.<primary>] pair (because the operator already
delegated the domain to OpenOva's PowerDNS), the PDM registrar-flip
step is a no-op. Skipping avoids:
1. Burning a Dynadot API credit on a flip that would be idempotent.
2. The D30 blocker — current Dynadot creds return pdm-status-401
even when the desired NS state already exists. Caught on t132
2026-05-16 day-2 add + t134 2026-05-17 fresh-prov body
parentDomains attempt.
Adds nsAlreadyMatches() helper using net.DefaultResolver.LookupNS with
a 5s timeout. False on lookup error or partial match → fall through to
the original PDM pipeline so a misconfigured/partial domain still goes
through the registrar API.
This unblocks sme-pool entries for omani.homes (already pointing at
ns1/2/3.openova.io). omani.rest / omani.trades still go through the
full flip path because their NS records don't yet match expected.
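
A minimal sketch of the short-circuit check, assuming exact set equality between the resolved and expected nameservers. The names `sameNSSet`, `normalize`, and `nsMatches` are hypothetical, and the real nsAlreadyMatches() may treat extra nameservers differently. The 5s timeout and `net.DefaultResolver.LookupNS` come from the description above.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"sort"
	"strings"
	"time"
)

// normalize lowercases hosts, strips the trailing dot, and sorts them.
func normalize(hosts []string) []string {
	out := make([]string, 0, len(hosts))
	for _, h := range hosts {
		out = append(out, strings.ToLower(strings.TrimSuffix(h, ".")))
	}
	sort.Strings(out)
	return out
}

// sameNSSet reports whether got and want hold the same hosts.
// An empty or partial match returns false.
func sameNSSet(got, want []string) bool {
	g, w := normalize(got), normalize(want)
	if len(g) == 0 || len(g) != len(w) {
		return false
	}
	for i := range g {
		if g[i] != w[i] {
			return false
		}
	}
	return true
}

// nsMatches resolves the domain's NS records with a 5s timeout.
// Any lookup error returns false so the caller falls through to the full flip.
func nsMatches(domain string, want []string) bool {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	records, err := net.DefaultResolver.LookupNS(ctx, domain)
	if err != nil {
		return false
	}
	got := make([]string, 0, len(records))
	for _, r := range records {
		got = append(got, r.Host)
	}
	return sameNSSet(got, want)
}

func main() {
	got := []string{"NS2.example.org.", "ns1.example.org."}
	want := []string{"ns1.example.org", "ns2.example.org"}
	fmt.Println(sameNSSet(got, want)) // prints true
}
```

Keeping the comparison separate from the DNS lookup lets the matching rule be tested without network access.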
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(handover): D21 owner seed uses catalyst-system namespace
PR #1564 created the owner UserAccess CR with .Namespace("") — the
apiserver returned "could not find the requested resource" because
useraccesses.access.openova.io is NAMESPACED (Crossplane Claim per
the XRD's claimNames block at platform/crossplane-claims/chart/
templates/xrds/useraccess.yaml).
Pin to catalyst-system (where catalyst-api + every Catalyst-authored
CR lives) and stamp the namespace on the object too. The existing
ListUserAccess handler uses Namespace("") so the entry surfaces on
/users without per-namespace filtering.
Verified the CRD shape on t134 2026-05-17:
$ kubectl api-resources --api-group=access.openova.io
useraccesses   access.openova.io/v1alpha1   true   UserAccess
                                            ^^^^
                                            NAMESPACED
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(handover): D21 owner seed uses tierRoleRef not wildcard app
PR #1564 + #1577 created the CR shape with applications=[{app:"*",...}]
but the useraccess XRD schema rejects `app: "*"` (pattern
^[a-z0-9][a-z0-9-]{0,62}$). The seed handler logged
"spec.applications[0].app: Invalid value: \"*\"" on every handover.
The XRD has a `tierRoleRef` field (pattern
^openova:tier-(viewer|developer|operator|admin|owner)$) that's the
canonical owner-tier semantic — when set, useraccess-controller binds
the named ClusterRole on the target via RoleBinding/ClusterRoleBinding.
`openova:tier-owner` is shipped by EPIC-3 (#1098) slice T1's
tier-clusterroles.yaml.
Drop the applications[] block + use tierRoleRef = openova:tier-owner.
Verified live on t135 2026-05-17 — error log showed exact pattern
mismatch before this fix.
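
The two patterns quoted above can be checked in isolation. This sketch only illustrates why `*` is rejected and `openova:tier-owner` is accepted; it is not the XRD validation itself, which the apiserver performs, and the variable names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// Patterns copied from the commit message above.
var (
	appPattern  = regexp.MustCompile(`^[a-z0-9][a-z0-9-]{0,62}$`)
	tierPattern = regexp.MustCompile(`^openova:tier-(viewer|developer|operator|admin|owner)$`)
)

func main() {
	// The wildcard fails the app pattern because "*" is outside [a-z0-9].
	fmt.Println(appPattern.MatchString("*")) // prints false
	// The owner tier reference satisfies the tierRoleRef pattern.
	fmt.Println(tierPattern.MatchString("openova:tier-owner")) // prints true
}
```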
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(chroot): POST /api/v1/sovereign/secondary-kubeconfig (D16 PR A)
D16 multi-cluster fan-out requires the chroot's k8sCache.Factory to
have all 3 regions' kubeconfigs registered so dashboard handler's
per-cluster h.k8sCache.List(clusterID, ...) enumerates pods from each.
Today the chroot only auto-registers its own in-cluster apiserver via
FactoryFromEnv's chroot self-registration branch. Secondary
kubeconfigs live on the mothership PVC + aren't replicated.
This handler bridges the gap:
- Accepts JSON {deploymentId, regionKey, kubeconfigYaml}
- Validates ids via ^[a-z0-9][a-z0-9-]{0,62}$ pattern (defense in
depth — filename composed from these)
- Writes kubeconfig 0o600 to /var/lib/catalyst/kubeconfigs/<depID>-<region>.yaml
(canonical FactoryFromEnv path so restart re-registers)
- Calls k8sCache.AddCluster — idempotent per Factory contract
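
The validate-then-write steps above might look roughly like this. It is a sketch under assumptions: `kubeconfigPath` and `saveKubeconfig` are hypothetical names, the HTTP decoding and the k8sCache.AddCluster call are omitted, and only the id pattern, the JSON field names, the `<depID>-<region>.yaml` filename, and the 0o600 mode come from the text.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"regexp"
)

// Request body shape as listed in the commit message.
type secondaryKubeconfigRequest struct {
	DeploymentID   string `json:"deploymentId"`
	RegionKey      string `json:"regionKey"`
	KubeconfigYAML string `json:"kubeconfigYaml"`
}

var idPattern = regexp.MustCompile(`^[a-z0-9][a-z0-9-]{0,62}$`)

// kubeconfigPath validates both ids before composing the filename, so a
// value such as "../etc" can never escape dir.
func kubeconfigPath(dir, depID, regionKey string) (string, error) {
	if !idPattern.MatchString(depID) || !idPattern.MatchString(regionKey) {
		return "", fmt.Errorf("invalid deploymentId or regionKey")
	}
	return filepath.Join(dir, depID+"-"+regionKey+".yaml"), nil
}

// saveKubeconfig writes the bytes owner-read/write only. The bytes are never
// put into an error or log message.
func saveKubeconfig(dir, depID, regionKey string, data []byte) (string, error) {
	p, err := kubeconfigPath(dir, depID, regionKey)
	if err != nil {
		return "", err
	}
	if err := os.WriteFile(p, data, 0o600); err != nil {
		return "", fmt.Errorf("write kubeconfig: %w", err)
	}
	return p, nil
}

func main() {
	dir, err := os.MkdirTemp("", "kubeconfigs")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	req := secondaryKubeconfigRequest{DeploymentID: "dep1", RegionKey: "region-b-1", KubeconfigYAML: "apiVersion: v1\n"}
	p, err := saveKubeconfig(dir, req.DeploymentID, req.RegionKey, []byte(req.KubeconfigYAML))
	fmt.Println(filepath.Base(p), err)

	_, err = saveKubeconfig(dir, "../etc", "region-b-1", nil)
	fmt.Println("rejected:", err != nil)
}
```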
PR B (next): mothership-side handover hook iterates secondary regions
and POSTs each kubeconfig to the chroot.
PR C (next): dashboard.go fan-out across all registered cluster IDs
when group_by includes cluster/region.
Per docs/INVIOLABLE-PRINCIPLES.md #10 kubeconfig bytes never enter a
logged struct + are written 0o600.
Memo: feedback_d16_dashboard_multi_cluster_fan_out.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(dashboard): multi-cluster fan-out when group_by=cluster|region (D16 PR C)
When group_by includes "cluster" or "region", enumerate ALL registered
k8sCache clusters (primary + secondaries synced via PR #1579's POST
/api/v1/sovereign/secondary-kubeconfig endpoint) and concatenate
podRows from each before aggregation.
Layer-1=Cluster on /dashboard now renders 3 bubbles on a 3-region
Sovereign (was 1 bubble before).
For group_by that ONLY contains {namespace,family,application,vcluster,
sovereign} the primary clusterID's pods are sufficient and faster — no
fan-out cost.
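
The fan-out rule described above, sketched with hypothetical names (`needsFanOut`, `collectPods`, and a `list` callback standing in for the per-cluster k8sCache.List call). Pod rows are plain strings here for brevity.

```go
package main

import "fmt"

// needsFanOut reports whether any group_by dimension requires pods from
// every registered cluster.
func needsFanOut(groupBy []string) bool {
	for _, g := range groupBy {
		if g == "cluster" || g == "region" {
			return true
		}
	}
	return false
}

// collectPods lists pods from all clusters when fan-out is needed and from
// the primary only otherwise, concatenating rows before aggregation.
func collectPods(groupBy []string, primary string, clusters []string, list func(clusterID string) []string) []string {
	targets := []string{primary}
	if needsFanOut(groupBy) {
		targets = clusters
	}
	var rows []string
	for _, id := range targets {
		rows = append(rows, list(id)...)
	}
	return rows
}

func main() {
	pods := map[string][]string{
		"c1": {"pod-a", "pod-b"},
		"c2": {"pod-c"},
		"c3": {"pod-d"},
	}
	list := func(id string) []string { return pods[id] }
	clusters := []string{"c1", "c2", "c3"}

	fmt.Println(len(collectPods([]string{"cluster"}, "c1", clusters, list)))   // prints 4
	fmt.Println(len(collectPods([]string{"namespace"}, "c1", clusters, list))) // prints 2
}
```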
PR B (mothership-side handover hook to POST each secondary kubeconfig)
will complete the chain. Until then, secondaries don't appear in
k8sCache.Clusters() so this fan-out is a no-op on existing provs — but
the code is in place for when PR B lands.
Memo: feedback_d16_dashboard_multi_cluster_fan_out.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
bcab6430cb
|
feat(chroot): POST /api/v1/sovereign/secondary-kubeconfig (D16 PR A) (#1579)
* fix(cloudinit): escape $$\{ORG_EMAIL:-\}/$$\{ORG_NAME:-\} in comment (D22)
PR #1571 added a comment mentioning the $${ORG_EMAIL:-}/$${ORG_NAME:-}
slot-file placeholders WITHOUT the $$ escape. tofu's templatefile()
parses comments and tried to interpolate \${ORG_EMAIL:-} as a tofu
expression — failing with "Extra characters after interpolation
expression; Template interpolation doesn't expect a colon".
Caught live on t133 fad01d84f5655004 — tofu plan failed in 30s.
The escape pattern is documented at main.tf:1029 (the same warning
that caught t127 last week). $$ prefix tells tofu's templatefile to
emit literal \${...} to cloud-init for Flux envsubst.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(parent-domains): short-circuit pdmFlipNS when NS already matches (D30)
When an sme-pool domain's current NS records already match the expected
[ns1.<primary>, ns2.<primary>] pair (because the operator already
delegated the domain to OpenOva's PowerDNS), the PDM registrar-flip
step is a no-op. Skipping avoids:
1. Burning a Dynadot API credit on a flip that would be idempotent.
2. The D30 blocker — current Dynadot creds return pdm-status-401
even when the desired NS state already exists. Caught on t132
2026-05-16 day-2 add + t134 2026-05-17 fresh-prov body
parentDomains attempt.
Adds nsAlreadyMatches() helper using net.DefaultResolver.LookupNS with
a 5s timeout. False on lookup error or partial match → fall through to
the original PDM pipeline so a misconfigured/partial domain still goes
through the registrar API.
This unblocks sme-pool entries for omani.homes (already pointing at
ns1/2/3.openova.io). omani.rest / omani.trades still go through the
full flip path because their NS records don't yet match expected.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(handover): D21 owner seed uses catalyst-system namespace
PR #1564 created the owner UserAccess CR with .Namespace("") — the
apiserver returned "could not find the requested resource" because
useraccesses.access.openova.io is NAMESPACED (Crossplane Claim per
the XRD's claimNames block at platform/crossplane-claims/chart/
templates/xrds/useraccess.yaml).
Pin to catalyst-system (where catalyst-api + every Catalyst-authored
CR lives) and stamp the namespace on the object too. The existing
ListUserAccess handler uses Namespace("") so the entry surfaces on
/users without per-namespace filtering.
Verified the CRD shape on t134 2026-05-17:
$ kubectl api-resources --api-group=access.openova.io
useraccesses   access.openova.io/v1alpha1   true   UserAccess
                                            ^^^^
                                            NAMESPACED
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(handover): D21 owner seed uses tierRoleRef not wildcard app
PR #1564 + #1577 created the CR shape with applications=[{app:"*",...}]
but the useraccess XRD schema rejects `app: "*"` (pattern
^[a-z0-9][a-z0-9-]{0,62}$). The seed handler logged
"spec.applications[0].app: Invalid value: \"*\"" on every handover.
The XRD has a `tierRoleRef` field (pattern
^openova:tier-(viewer|developer|operator|admin|owner)$) that's the
canonical owner-tier semantic — when set, useraccess-controller binds
the named ClusterRole on the target via RoleBinding/ClusterRoleBinding.
`openova:tier-owner` is shipped by EPIC-3 (#1098) slice T1's
tier-clusterroles.yaml.
Drop the applications[] block + use tierRoleRef = openova:tier-owner.
Verified live on t135 2026-05-17 — error log showed exact pattern
mismatch before this fix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(chroot): POST /api/v1/sovereign/secondary-kubeconfig (D16 PR A)
D16 multi-cluster fan-out requires the chroot's k8sCache.Factory to
have all 3 regions' kubeconfigs registered so dashboard handler's
per-cluster h.k8sCache.List(clusterID, ...) enumerates pods from each.
Today the chroot only auto-registers its own in-cluster apiserver via
FactoryFromEnv's chroot self-registration branch. Secondary
kubeconfigs live on the mothership PVC + aren't replicated.
This handler bridges the gap:
- Accepts JSON {deploymentId, regionKey, kubeconfigYaml}
- Validates ids via ^[a-z0-9][a-z0-9-]{0,62}$ pattern (defense in
depth — filename composed from these)
- Writes kubeconfig 0o600 to /var/lib/catalyst/kubeconfigs/<depID>-<region>.yaml
(canonical FactoryFromEnv path so restart re-registers)
- Calls k8sCache.AddCluster — idempotent per Factory contract
PR B (next): mothership-side handover hook iterates secondary regions
and POSTs each kubeconfig to the chroot.
PR C (next): dashboard.go fan-out across all registered cluster IDs
when group_by includes cluster/region.
Per docs/INVIOLABLE-PRINCIPLES.md #10 kubeconfig bytes never enter a
logged struct + are written 0o600.
Memo: feedback_d16_dashboard_multi_cluster_fan_out.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
6e329e27ae |
deploy: update catalyst images to 4f62dd2
|
||
|
|
4f62dd21b3
|
fix(handover): D21 owner seed uses tierRoleRef not wildcard app (#1578)
* fix(cloudinit): escape $$\{ORG_EMAIL:-\}/$$\{ORG_NAME:-\} in comment (D22)
PR #1571 added a comment mentioning the $${ORG_EMAIL:-}/$${ORG_NAME:-}
slot-file placeholders WITHOUT the $$ escape. tofu's templatefile()
parses comments and tried to interpolate \${ORG_EMAIL:-} as a tofu
expression — failing with "Extra characters after interpolation
expression; Template interpolation doesn't expect a colon".
Caught live on t133 fad01d84f5655004 — tofu plan failed in 30s.
The escape pattern is documented at main.tf:1029 (the same warning
that caught t127 last week). $$ prefix tells tofu's templatefile to
emit literal \${...} to cloud-init for Flux envsubst.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(parent-domains): short-circuit pdmFlipNS when NS already matches (D30)
When an sme-pool domain's current NS records already match the expected
[ns1.<primary>, ns2.<primary>] pair (because the operator already
delegated the domain to OpenOva's PowerDNS), the PDM registrar-flip
step is a no-op. Skipping avoids:
1. Burning a Dynadot API credit on a flip that would be idempotent.
2. The D30 blocker — current Dynadot creds return pdm-status-401
even when the desired NS state already exists. Caught on t132
2026-05-16 day-2 add + t134 2026-05-17 fresh-prov body
parentDomains attempt.
Adds nsAlreadyMatches() helper using net.DefaultResolver.LookupNS with
a 5s timeout. False on lookup error or partial match → fall through to
the original PDM pipeline so a misconfigured/partial domain still goes
through the registrar API.
This unblocks sme-pool entries for omani.homes (already pointing at
ns1/2/3.openova.io). omani.rest / omani.trades still go through the
full flip path because their NS records don't yet match expected.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(handover): D21 owner seed uses catalyst-system namespace
PR #1564 created the owner UserAccess CR with .Namespace("") — the
apiserver returned "could not find the requested resource" because
useraccesses.access.openova.io is NAMESPACED (Crossplane Claim per
the XRD's claimNames block at platform/crossplane-claims/chart/
templates/xrds/useraccess.yaml).
Pin to catalyst-system (where catalyst-api + every Catalyst-authored
CR lives) and stamp the namespace on the object too. The existing
ListUserAccess handler uses Namespace("") so the entry surfaces on
/users without per-namespace filtering.
Verified the CRD shape on t134 2026-05-17:
$ kubectl api-resources --api-group=access.openova.io
useraccesses   access.openova.io/v1alpha1   true   UserAccess
                                            ^^^^
                                            NAMESPACED
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(handover): D21 owner seed uses tierRoleRef not wildcard app
PR #1564 + #1577 created the CR shape with applications=[{app:"*",...}]
but the useraccess XRD schema rejects `app: "*"` (pattern
^[a-z0-9][a-z0-9-]{0,62}$). The seed handler logged
"spec.applications[0].app: Invalid value: \"*\"" on every handover.
The XRD has a `tierRoleRef` field (pattern
^openova:tier-(viewer|developer|operator|admin|owner)$) that's the
canonical owner-tier semantic — when set, useraccess-controller binds
the named ClusterRole on the target via RoleBinding/ClusterRoleBinding.
`openova:tier-owner` is shipped by EPIC-3 (#1098) slice T1's
tier-clusterroles.yaml.
Drop the applications[] block + use tierRoleRef = openova:tier-owner.
Verified live on t135 2026-05-17 — error log showed exact pattern
mismatch before this fix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
6466f97f6c |
deploy: update catalyst images to ea30ded
|
||
|
|
ea30ded120
|
fix(handover): D21 owner seed uses catalyst-system namespace (#1577)
* fix(cloudinit): escape $$\{ORG_EMAIL:-\}/$$\{ORG_NAME:-\} in comment (D22)
PR #1571 added a comment mentioning the $${ORG_EMAIL:-}/$${ORG_NAME:-}
slot-file placeholders WITHOUT the $$ escape. tofu's templatefile()
parses comments and tried to interpolate \${ORG_EMAIL:-} as a tofu
expression — failing with "Extra characters after interpolation
expression; Template interpolation doesn't expect a colon".
Caught live on t133 fad01d84f5655004 — tofu plan failed in 30s.
The escape pattern is documented at main.tf:1029 (the same warning
that caught t127 last week). $$ prefix tells tofu's templatefile to
emit literal \${...} to cloud-init for Flux envsubst.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(parent-domains): short-circuit pdmFlipNS when NS already matches (D30)
When an sme-pool domain's current NS records already match the expected
[ns1.<primary>, ns2.<primary>] pair (because the operator already
delegated the domain to OpenOva's PowerDNS), the PDM registrar-flip
step is a no-op. Skipping avoids:
1. Burning a Dynadot API credit on a flip that would be idempotent.
2. The D30 blocker — current Dynadot creds return pdm-status-401
even when the desired NS state already exists. Caught on t132
2026-05-16 day-2 add + t134 2026-05-17 fresh-prov body
parentDomains attempt.
Adds nsAlreadyMatches() helper using net.DefaultResolver.LookupNS with
a 5s timeout. False on lookup error or partial match → fall through to
the original PDM pipeline so a misconfigured/partial domain still goes
through the registrar API.
This unblocks sme-pool entries for omani.homes (already pointing at
ns1/2/3.openova.io). omani.rest / omani.trades still go through the
full flip path because their NS records don't yet match expected.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(handover): D21 owner seed uses catalyst-system namespace
PR #1564 created the owner UserAccess CR with .Namespace("") — the
apiserver returned "could not find the requested resource" because
useraccesses.access.openova.io is NAMESPACED (Crossplane Claim per
the XRD's claimNames block at platform/crossplane-claims/chart/
templates/xrds/useraccess.yaml).
Pin to catalyst-system (where catalyst-api + every Catalyst-authored
CR lives) and stamp the namespace on the object too. The existing
ListUserAccess handler uses Namespace("") so the entry surfaces on
/users without per-namespace filtering.
Verified the CRD shape on t134 2026-05-17:
$ kubectl api-resources --api-group=access.openova.io
useraccesses   access.openova.io/v1alpha1   true   UserAccess
                                            ^^^^
                                            NAMESPACED
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
18b5fa1466 |
deploy: update catalyst images to 33ed484
|