openova/products
e3mrah ce4ef6ba98
feat(handover): export secondary kubeconfigs to chroot at handover (D16 PR B) (#1581)
* fix(cloudinit): escape $${ORG_EMAIL:-}/$${ORG_NAME:-} in comment (D22)

PR #1571 added a comment mentioning the ${ORG_EMAIL:-}/${ORG_NAME:-}
slot-file placeholders without the $$ escape. tofu's templatefile()
parses comments too, so it tried to interpolate ${ORG_EMAIL:-} as a tofu
expression and failed with "Extra characters after interpolation
expression; Template interpolation doesn't expect a colon".

Caught live on t133 fad01d84f5655004 — tofu plan failed in 30s.

The escape pattern is documented at main.tf:1029 (the same warning
that caught t127 last week). $$ prefix tells tofu's templatefile to
emit literal ${...} to cloud-init for Flux envsubst.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(parent-domains): short-circuit pdmFlipNS when NS already matches (D30)

When an sme-pool domain's current NS records already match the expected
[ns1.<primary>, ns2.<primary>] pair (because the operator already
delegated the domain to OpenOva's PowerDNS), the PDM registrar-flip
step is a no-op. Skipping avoids:

  1. Burning a Dynadot API credit on a flip that would be idempotent.
  2. The D30 blocker — current Dynadot creds return pdm-status-401
     even when the desired NS state already exists. Caught on t132
     2026-05-16 day-2 add + t134 2026-05-17 fresh-prov body
     parentDomains attempt.

Adds an nsAlreadyMatches() helper that uses net.DefaultResolver.LookupNS
with a 5s timeout. It returns false on a lookup error or a partial match,
so the call falls through to the original PDM pipeline and a misconfigured
or partially delegated domain still goes through the registrar API.
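
A minimal sketch of such a helper follows, as an illustration only and not the code from this commit. It assumes "match" means exact set equality after lowercasing and stripping the trailing dot; `sameNSSet` and `normalizeHosts` are hypothetical names, and the real helper may compare differently.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"sort"
	"strings"
	"time"
)

// normalizeHosts lowercases each name, strips the trailing dot and sorts,
// so "NS1.Example.com." and "ns1.example.com" compare equal.
func normalizeHosts(hosts []string) []string {
	out := make([]string, 0, len(hosts))
	for _, h := range hosts {
		out = append(out, strings.ToLower(strings.TrimSuffix(h, ".")))
	}
	sort.Strings(out)
	return out
}

// sameNSSet reports whether both lists hold exactly the same names.
// Assumption: a partial or empty match counts as "no match".
func sameNSSet(got, want []string) bool {
	g, w := normalizeHosts(got), normalizeHosts(want)
	if len(g) == 0 || len(g) != len(w) {
		return false
	}
	for i := range g {
		if g[i] != w[i] {
			return false
		}
	}
	return true
}

// nsAlreadyMatches looks up the domain's NS records with a 5s timeout and
// returns false on any lookup error so the caller falls through to the flip.
func nsAlreadyMatches(domain string, expected []string) bool {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	records, err := net.DefaultResolver.LookupNS(ctx, domain)
	if err != nil {
		return false
	}
	hosts := make([]string, 0, len(records))
	for _, r := range records {
		hosts = append(hosts, r.Host)
	}
	return sameNSSet(hosts, expected)
}

func main() {
	want := []string{"ns1.example.com", "ns2.example.com"}
	fmt.Println(sameNSSet([]string{"NS2.example.com.", "ns1.example.com."}, want)) // true
	fmt.Println(sameNSSet([]string{"ns1.example.com."}, want))                     // false
}
```

Keeping the comparison separate from the DNS lookup lets the matching rule be exercised without network access.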

This unblocks sme-pool entries for omani.homes (already pointing at
ns1/2/3.openova.io). omani.rest / omani.trades still go through the
full flip path because their NS records don't yet match expected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(handover): D21 owner seed uses catalyst-system namespace

PR #1564 created the owner UserAccess CR with .Namespace("") — the
apiserver returned "could not find the requested resource" because
useraccesses.access.openova.io is NAMESPACED (Crossplane Claim per
the XRD's claimNames block at platform/crossplane-claims/chart/
templates/xrds/useraccess.yaml).

Pin to catalyst-system (where catalyst-api + every Catalyst-authored
CR lives) and stamp the namespace on the object too. The existing
ListUserAccess handler uses Namespace("") so the entry surfaces on
/users without per-namespace filtering.

Verified the CRD shape on t134 2026-05-17:
  $ kubectl api-resources --api-group=access.openova.io
  useraccesses   access.openova.io/v1alpha1   true   UserAccess
                                                ^^^^
                                                NAMESPACED

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(handover): D21 owner seed uses tierRoleRef not wildcard app

PR #1564 + #1577 created the CR shape with applications=[{app:"*",...}]
but the useraccess XRD schema rejects `app: "*"` (pattern
^[a-z0-9][a-z0-9-]{0,62}$). The seed handler logged
"spec.applications[0].app: Invalid value: \"*\"" on every handover.

The XRD has a `tierRoleRef` field (pattern
^openova:tier-(viewer|developer|operator|admin|owner)$) that's the
canonical owner-tier semantic — when set, useraccess-controller binds
the named ClusterRole on the target via RoleBinding/ClusterRoleBinding.
`openova:tier-owner` is shipped by EPIC-3 (#1098) slice T1's
tier-clusterroles.yaml.

Drop the applications[] block + use tierRoleRef = openova:tier-owner.
Verified live on t135 2026-05-17 — error log showed exact pattern
mismatch before this fix.
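
The two schema patterns quoted above can be checked directly. The sketch below is illustrative only: `ownerSeedSpec` is a hypothetical helper, and the placement of `tierRoleRef` at the top level of the returned map is an assumption, not the actual CR layout.

```go
package main

import (
	"fmt"
	"regexp"
)

// Patterns copied from the commit message above.
var (
	appPattern      = regexp.MustCompile(`^[a-z0-9][a-z0-9-]{0,62}$`)
	tierRolePattern = regexp.MustCompile(`^openova:tier-(viewer|developer|operator|admin|owner)$`)
)

// ownerSeedSpec is a hypothetical helper: it validates the tier role
// reference and returns a plain-map stand-in for the spec fragment.
func ownerSeedSpec(tierRoleRef string) (map[string]any, error) {
	if !tierRolePattern.MatchString(tierRoleRef) {
		return nil, fmt.Errorf("tierRoleRef %q does not match %s", tierRoleRef, tierRolePattern)
	}
	return map[string]any{"tierRoleRef": tierRoleRef}, nil
}

func main() {
	// The wildcard fails the app pattern, which is why the seed was rejected.
	fmt.Println(appPattern.MatchString("*")) // false

	spec, err := ownerSeedSpec("openova:tier-owner")
	fmt.Println(spec, err) // map[tierRoleRef:openova:tier-owner] <nil>
}
```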

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(chroot): POST /api/v1/sovereign/secondary-kubeconfig (D16 PR A)

D16 multi-cluster fan-out requires the chroot's k8sCache.Factory to
have all 3 regions' kubeconfigs registered so the dashboard handler's
per-cluster h.k8sCache.List(clusterID, ...) enumerates pods from each.

Today the chroot only auto-registers its own in-cluster apiserver via
FactoryFromEnv's chroot self-registration branch. Secondary
kubeconfigs live on the mothership PVC + aren't replicated.

This handler bridges the gap:
- Accepts JSON {deploymentId, regionKey, kubeconfigYaml}
- Validates ids via ^[a-z0-9][a-z0-9-]{0,62}$ pattern (defense in
  depth — filename composed from these)
- Writes kubeconfig 0o600 to /var/lib/catalyst/kubeconfigs/<depID>-<region>.yaml
  (canonical FactoryFromEnv path so restart re-registers)
- Calls k8sCache.AddCluster — idempotent per Factory contract
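
The steps above could be sketched roughly as follows. This is not the handler from this PR: `clusterAdder`, its `AddCluster` signature, the cluster ID format and the response codes are all assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"path/filepath"
	"regexp"
)

// Pattern copied from the commit message; the filename is composed from these IDs.
var idPattern = regexp.MustCompile(`^[a-z0-9][a-z0-9-]{0,62}$`)

type secondaryKubeconfigRequest struct {
	DeploymentID   string `json:"deploymentId"`
	RegionKey      string `json:"regionKey"`
	KubeconfigYAML string `json:"kubeconfigYaml"`
}

// clusterAdder is a hypothetical stand-in for the k8sCache.Factory dependency.
type clusterAdder interface {
	AddCluster(clusterID, kubeconfigPath string) error
}

// kubeconfigPath validates both IDs before composing <dir>/<depID>-<region>.yaml.
func kubeconfigPath(dir, depID, region string) (string, error) {
	if !idPattern.MatchString(depID) || !idPattern.MatchString(region) {
		return "", fmt.Errorf("invalid id: deploymentId=%q regionKey=%q", depID, region)
	}
	return filepath.Join(dir, depID+"-"+region+".yaml"), nil
}

func handleSecondaryKubeconfig(dir string, cache clusterAdder) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		var req secondaryKubeconfigRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, "invalid JSON body", http.StatusBadRequest)
			return
		}
		path, err := kubeconfigPath(dir, req.DeploymentID, req.RegionKey)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		// Kubeconfig bytes go to disk with mode 0o600 and are never logged.
		if err := os.WriteFile(path, []byte(req.KubeconfigYAML), 0o600); err != nil {
			http.Error(w, "could not store kubeconfig", http.StatusInternalServerError)
			return
		}
		if err := cache.AddCluster(req.DeploymentID+"-"+req.RegionKey, path); err != nil {
			http.Error(w, "could not register cluster", http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusNoContent)
	}
}

func main() {
	_, err := kubeconfigPath("/var/lib/catalyst/kubeconfigs", "../etc", "fsn1")
	fmt.Println(err != nil) // true: traversal attempts fail validation
}
```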

PR B (next): mothership-side handover hook iterates secondary regions
and POSTs each kubeconfig to the chroot.

PR C (next): dashboard.go fan-out across all registered cluster IDs
when group_by includes cluster/region.

Per docs/INVIOLABLE-PRINCIPLES.md #10 kubeconfig bytes never enter a
logged struct + are written 0o600.

Memo: feedback_d16_dashboard_multi_cluster_fan_out.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(dashboard): multi-cluster fan-out when group_by=cluster|region (D16 PR C)

When group_by includes "cluster" or "region", enumerate ALL registered
k8sCache clusters (primary + secondaries synced via PR #1579's POST
/api/v1/sovereign/secondary-kubeconfig endpoint) and concatenate
podRows from each before aggregation.

Layer-1=Cluster on /dashboard now renders 3 bubbles on a 3-region
Sovereign (was 1 bubble before).

For group_by that ONLY contains {namespace,family,application,vcluster,
sovereign} the primary clusterID's pods are sufficient and faster — no
fan-out cost.
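
A rough sketch of the fan-out decision, under stated assumptions: `podRow`, `needsFanOut` and `collectPodRows` are hypothetical names, the `list` callback stands in for h.k8sCache.List, and skipping a cluster whose listing fails is an assumption about error handling.

```go
package main

import "fmt"

// podRow is a simplified stand-in for the dashboard's row type.
type podRow struct {
	Cluster   string
	Namespace string
	Name      string
}

// needsFanOut reports whether any group_by dimension is "cluster" or "region".
func needsFanOut(groupBy []string) bool {
	for _, g := range groupBy {
		if g == "cluster" || g == "region" {
			return true
		}
	}
	return false
}

// collectPodRows lists pods from the primary cluster only, unless the
// grouping requires every registered cluster, then concatenates the rows.
func collectPodRows(groupBy []string, primary string, clusters []string,
	list func(clusterID string) ([]podRow, error)) []podRow {
	ids := []string{primary}
	if needsFanOut(groupBy) {
		ids = clusters
	}
	var rows []podRow
	for _, id := range ids {
		got, err := list(id)
		if err != nil {
			continue // assumption: one unreachable cluster does not fail the request
		}
		rows = append(rows, got...)
	}
	return rows
}

func main() {
	list := func(id string) ([]podRow, error) {
		return []podRow{{Cluster: id, Namespace: "default", Name: "pod-0"}}, nil
	}
	all := []string{"a", "b", "c"}
	fmt.Println(len(collectPodRows([]string{"cluster"}, "a", all, list)))   // 3
	fmt.Println(len(collectPodRows([]string{"namespace"}, "a", all, list))) // 1
}
```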

PR B (mothership-side handover hook to POST each secondary kubeconfig)
will complete the chain. Until then, secondaries don't appear in
k8sCache.Clusters() so this fan-out is a no-op on existing provs — but
the code is in place for when PR B lands.

Memo: feedback_d16_dashboard_multi_cluster_fan_out.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(handover): export secondary kubeconfigs to chroot at handover (D16 PR B)

Closes the D16 multi-cluster fan-out chain:
- PR #1579 (PR A): chroot endpoint accepts kubeconfigs
- PR #1580 (PR C): dashboard handler fans out across registered clusters
- This PR (PR B): mothership-side hook iterates secondary regions at
  handover, reads each region's kubeconfig from the mothership PVC,
  and POSTs to the chroot's endpoint

After handover-fire, exportSecondaryKubeconfigsToChild fires as a
goroutine (alongside exportDeploymentToChild). Best-effort per region:
a failure on region N doesn't abort N+1.

The chroot's k8sCache.Factory.AddCluster runs on every POST so
dashboard /api/v1/dashboard/treemap?group_by=cluster|region now
enumerates pods from all N regions and Layer-1=Cluster renders N
bubbles on an N-region Sovereign.

regionKeysForExport derives the filename convention `<region>-<slot>`
from dep.Request.Regions[1:] (primary is auto-registered by the
chroot's FactoryFromEnv self-registration so we skip index 0).
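
The skip-the-primary, best-effort behaviour can be sketched as below. It is illustrative only: `secondaryRegions`, `exportAll` and `errSimulated` are hypothetical names, the `export` callback stands in for reading and POSTing one kubeconfig, and the `<region>-<slot>` key derivation is deliberately not reproduced.

```go
package main

import (
	"errors"
	"fmt"
)

// errSimulated is used only by the demo in main.
var errSimulated = errors.New("simulated POST failure")

// secondaryRegions returns every region after index 0; the primary is
// skipped because the chroot registers its own apiserver.
func secondaryRegions(regions []string) []string {
	if len(regions) <= 1 {
		return nil
	}
	return regions[1:]
}

// exportAll calls export once per secondary region. A failure is recorded
// and the loop continues, so region N failing does not abort region N+1.
func exportAll(regions []string, export func(region string) error) map[string]error {
	failed := make(map[string]error)
	for _, r := range secondaryRegions(regions) {
		if err := export(r); err != nil {
			failed[r] = err
		}
	}
	return failed
}

func main() {
	var attempted []string
	failed := exportAll([]string{"primary", "region-b", "region-c"}, func(r string) error {
		attempted = append(attempted, r)
		if r == "region-b" {
			return errSimulated
		}
		return nil
	})
	fmt.Println(attempted, len(failed)) // [region-b region-c] 1
}
```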

Per docs/INVIOLABLE-PRINCIPLES.md #10 kubeconfig bytes never enter a
logged struct + are read with stdlib os.ReadFile.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 08:22:01 +04:00
axon feat(axon): make qwen3-coder thinking mode toggleable via request parameter 2026-04-26 09:20:33 +02:00
catalyst feat(handover): export secondary kubeconfigs to chroot at handover (D16 PR B) (#1581) 2026-05-17 08:22:01 +04:00
continuum feat(continuum): F — dry-run report + post-switchover health check + audit-emit coverage (slice F-1+F-2+F-3, #1101) (#1161) 2026-05-09 08:33:37 +04:00
cortex docs(pass-52): bundled date-sweep + cross-component namespace clean; knative clean 2026-04-28 00:37:21 +02:00
dmz-vcluster fix: mark bp-dmz-vcluster + bp-netbird default-off for smoke-render gate (#1286) 2026-05-10 15:57:18 +04:00
fabric docs(seaweedfs+guacamole): replace MinIO with SeaweedFS as unified S3 encapsulation; add Guacamole to bp-relay 2026-04-28 10:23:46 +02:00
fingate docs(pass-52): bundled date-sweep + cross-component namespace clean; knative clean 2026-04-28 00:37:21 +02:00
openova-flow fix(openova-flow): COPY go.sum + go mod download in Dockerfile (#1475) 2026-05-14 14:23:57 +04:00
relay docs(seaweedfs+guacamole): replace MinIO with SeaweedFS as unified S3 encapsulation; add Guacamole to bp-relay 2026-04-28 10:23:46 +02:00
sandbox fix(sovereign-tls): tls-restart Job needs list+watch verbs (#1504) 2026-05-15 21:02:37 +04:00