fix-c18b-provisioning-token-secret-ownership
36 Commits
8878938a43
fix(ci): bump sme-services Containerfiles golang 1.22 → 1.26 (unblock 5 stranded fixes) (#1691)
Every services-build run since 2026-05-18 06:32 UTC failed with "go: go.mod requires go >= 1.26.0 (running go 1.22.12; GOTOOLCHAIN=local)" because a recent go.mod bump to `go 1.26.0` was not paired with a Containerfile base-image bump.
5 stranded fixes that never produced new image SHAs:
- PR #1683 fix(billing): consume catalyst.usage.recorded from CATALYST_SME stream (was creating overlapping CATALYST_USAGE)
- PR #1684 fix(provisioning): set Organization.spec.tenantPublic
- PR #1685 fix(catalog+billing): Sandbox Free/Pro/Ent plans + quota
- PR #1686 feat(sandbox): orchestrator listens tenant.sandbox_requested
- test(sandbox): integration tests for orchestrator + sessions API
The stranded billing image is the root cause of every voucher 502 on t22 and blocks the full marketplace customer journey (steps 9, 10, 15 all fail). The t22 billing Pod is in CrashLoopBackOff with the exact NATS subject-overlap signature PR #1683 fixes.
Bumps all 10 service Containerfiles (auth/billing/catalog/catalyst-catalog/domain/gateway/metering-sidecar/notification/provisioning/tenant) to golang:1.26-alpine, matching the toolchain in go.mod.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
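A minimal sketch of the check that failed, for illustration only: with `GOTOOLCHAIN=local` the go command will not switch to a newer toolchain, so the image's Go version must be at least the `go` directive in go.mod. The helpers `goDirective` and `toolchainSatisfies` are hypothetical and not part of the repository; pre-release versions such as `1.26rc1` are deliberately not handled.

```go
package main

import (
	"bufio"
	"strconv"
	"strings"
)

// goDirective returns the version from the first `go` directive in a go.mod
// file, e.g. "1.26.0". Other directives (module, toolchain, require) are skipped.
func goDirective(gomod string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(gomod))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == "go" {
			return fields[1], true
		}
	}
	return "", false
}

// parseVersion turns "1.22.12" or "go1.22.12" into [1 22 12]; missing
// components default to 0. Pre-release suffixes are not handled.
func parseVersion(v string) [3]int {
	var out [3]int
	parts := strings.SplitN(strings.TrimPrefix(v, "go"), ".", 3)
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			break
		}
		out[i] = n
	}
	return out
}

// toolchainSatisfies reports whether the running toolchain `have` is at least
// the go.mod requirement `want`, which is what GOTOOLCHAIN=local demands.
func toolchainSatisfies(have, want string) bool {
	h, w := parseVersion(have), parseVersion(want)
	for i := 0; i < 3; i++ {
		if h[i] != w[i] {
			return h[i] > w[i]
		}
	}
	return true
}
```

Under this sketch, the old `golang:1.22` image (`go1.22.12`) fails against `go 1.26.0`, and the bumped `golang:1.26-alpine` image passes.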
8017700ad4
feat(sandbox): tier-bound MCP capabilities (Free/Pro/Ent plans gate tool access) (#1690)
Stop handing every Sandbox session the full MCP surface. Each per-Sandbox
NewAPI token now carries a plan-derived capability allowlist that the MCP
server enforces against per-tool RequiredCapability via Claims.HasCapability:
- Free: read-only k8s + gitea read + session/rag/skills
- Pro: + sandbox.db.* + sandbox.storage.* + sandbox.preview.* +
sandbox.auth.* + sandbox.secrets.* + marketplace.* + flux.status
- Ent: + sandbox.deploy.{staging,production,...} + sandbox.stripe.* +
flux.{reconcile,suspend,resume} + gitea.pr.{create,merge} +
gitea.issue.*
Wiring:
- Sandbox CRD spec gains planId + capabilities[] (operator overlay).
- Sandbox sandboxapi.{CapabilitiesForPlan,ResolveCapabilities} is the
SoT; tenant orchestrator carries an exact-mirror capabilitiesForPlan
(no controllers-module dep — same isolation pattern quotaForPlan
uses).
- sandbox-controller threads spec.capabilities (falling back to plan)
into newapi.MintRequest.
- catalyst-api bridge handler accepts capabilities[] on the wire and
encodes it as the JWT `capabilities` claim (omitted when empty).
- Claims.HasCapability gains wildcard prefix matching (`sandbox.db.*`
satisfies `sandbox.db.provision`, `sandbox.db`, etc.) so plan grants
stay coarse. Plain stem matches WITHOUT a wildcard are intentionally
rejected — the production second-gate in sandbox_deploy.go stays
honest.
- MCP registry: every gated tool now carries its granular dotted
RequiredCapability (`sandbox.db.provision`, `gitea.pr.list`, …).
Read-only / session tools previously ungated also get granular
grants so Free tokens can browse without inheriting the write
surface.
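The plan ladder and the wildcard rule above can be sketched as follows. This is an illustration, not the `sandboxapi` or `shared/auth` code: the Free-tier names (`k8s.read.*`, `gitea.read.*`, `session.*`, `rag.*`, `skills.*`) are hypothetical stand-ins for the prose, the Ent deploy list is cut at the entries the commit names, and an unknown plan is assumed to resolve to no capabilities.

```go
package main

import "strings"

// Hypothetical Free-tier names; Pro/Ent entries follow the commit text.
var freeCaps = []string{"k8s.read.*", "gitea.read.*", "session.*", "rag.*", "skills.*"}

var proCaps = concat(freeCaps, []string{
	"sandbox.db.*", "sandbox.storage.*", "sandbox.preview.*",
	"sandbox.auth.*", "sandbox.secrets.*", "marketplace.*", "flux.status",
})

var entCaps = concat(proCaps, []string{
	"sandbox.deploy.staging", "sandbox.deploy.production", // the commit elides further targets
	"sandbox.stripe.*", "flux.reconcile", "flux.suspend", "flux.resume",
	"gitea.pr.create", "gitea.pr.merge", "gitea.issue.*",
})

// concat copies into a fresh slice so tiers never share a backing array.
func concat(a, b []string) []string {
	out := make([]string, 0, len(a)+len(b))
	return append(append(out, a...), b...)
}

// capabilitiesForPlan returns the cumulative grant for a plan (callers must
// not mutate the result). Unknown plans get nil, i.e. introspection only.
func capabilitiesForPlan(plan string) []string {
	switch plan {
	case "sandbox-free":
		return freeCaps
	case "sandbox-pro":
		return proCaps
	case "sandbox-ent":
		return entCaps
	}
	return nil
}

// resolveCapabilities prefers an explicit spec.capabilities list and falls
// back to the plan-derived grant.
func resolveCapabilities(spec []string, plan string) []string {
	if len(spec) > 0 {
		return spec
	}
	return capabilitiesForPlan(plan)
}

// hasCapability matches exactly, or via a trailing ".*" wildcard that covers
// the stem itself and everything beneath it. A plain stem grant such as
// "sandbox.deploy" does NOT satisfy "sandbox.deploy.production".
func hasCapability(granted []string, required string) bool {
	for _, g := range granted {
		if g == required {
			return true
		}
		if strings.HasSuffix(g, ".*") {
			stem := strings.TrimSuffix(g, ".*")
			if required == stem || strings.HasPrefix(required, stem+".") {
				return true
			}
		}
	}
	return false
}
```

Keeping plan grants coarse (`sandbox.db.*`) while tools require granular names (`sandbox.db.provision`) is what lets the registry add tools without touching the plan ladder.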
No Chart.yaml bump — CRD additions are additive; existing Sandbox CRs
parse fine. A token with empty capabilities downgrades to introspection only,
matching pre-PR-#1671 callers.
Tests: shared/auth/claims_test.go (wildcard matrix),
sandboxapi/capabilities_test.go (plan ladder + spec override),
sandbox_token_test.go (capabilities round-trip + omit-on-empty),
sandbox_controller_test.go (plan-derived + spec-override mint),
sandbox_consumer_test.go (orchestrator stamps spec.capabilities), plus
updates to every per-namespace registry test asserting new granular
RequiredCapability values.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ffb79aab12
fix(billing): consume catalyst.usage.recorded from CATALYST_SME stream (was creating overlapping CATALYST_USAGE) (#1683)
t20 (2026-05-18) caught the bug: billing crashed at startup with NATS error code 10065 "subjects overlap with an existing stream" because CATALYST_SME (subjects `catalyst.>`, created by the tenant / provisioning MultiSubscribers) had already claimed `catalyst.usage.recorded` by the time billing tried to create CATALYST_USAGE (subject `catalyst.usage.recorded`). JetStream forbids two Streams from owning overlapping subject filters.
Option B per the matrix: have billing share CATALYST_SME and scope its metering reads via a consumer-side FilterSubject instead of owning a separate Stream. This matches the architecture every other SME service (tenant, notification, provisioning) already uses for catalyst.* events.
Changes:
- core/services/shared/events/nats.go: add EnsureCatalystSMEStream (public wrapper around the existing package-private ensureSMEStream helper used by NewMultiSubscriber) + SubscribeUsageRecordedOnSME (durable consumer on CATALYST_SME with FilterSubject scoped to catalyst.usage.recorded). The original EnsureUsageStream and SubscribeUsageRecorded are retained but marked Deprecated for back-compat with any Catalyst-Zero / dev loop wired before t20.
- core/services/billing/main.go: replace the EnsureUsageStream call with EnsureCatalystSMEStream and the SubscribeUsageRecorded call with SubscribeUsageRecordedOnSME. Comment captures the t20 root cause + the bootstrap-order rationale so the next reader doesn't re-introduce the dedicated Stream.
The consumer-side FilterSubject (`catalyst.usage.recorded`) lives in core/services/shared/events/nats.go inside SubscribeUsageRecordedOnSME.
go build + go test clean for core/services/billing and core/services/shared.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
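Why the two Streams collided can be shown with a small sketch of NATS subject-filter overlap, where `*` matches exactly one token and `>` matches one or more trailing tokens. `subjectsOverlap` is a hypothetical helper written for illustration; JetStream performs its own check server-side when a Stream is created.

```go
package main

import "strings"

// subjectsOverlap reports whether some concrete subject could match both
// filters a and b. Tokens are dot-separated; "*" matches one token and ">"
// (only valid as the last token) matches one or more remaining tokens.
func subjectsOverlap(a, b string) bool {
	at, bt := strings.Split(a, "."), strings.Split(b, ".")
	for i := 0; i < len(at) && i < len(bt); i++ {
		x, y := at[i], bt[i]
		if x == ">" || y == ">" {
			// The other filter has at least one token at this position,
			// which is all ">" needs.
			return true
		}
		if x != y && x != "*" && y != "*" {
			return false
		}
	}
	// Without a ">" both filters must have the same number of tokens.
	return len(at) == len(bt)
}
```

`catalyst.>` (CATALYST_SME) and `catalyst.usage.recorded` (CATALYST_USAGE) overlap, which is the 10065 failure. Consumer FilterSubjects are not subject to that rule, which is why scoping the read on the shared Stream works.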
3acb340b36
test(sandbox): integration tests for orchestrator + sessions API status reflection (#1680)
Adds regression coverage so the Sandbox event flow + REST surface can
be exercised without a live Sovereign — the convergence loop the
qa-loop's last 5 iterations relied on.
Tenant orchestrator (5 cases / 8 runs):
* full event flow — tenant.sandbox_requested envelope → in-process
BrokerSubscriber → SandboxOrchestrator.Start → recordingSandboxClient
materialises a CR shaped per architecture.md §7 (labels, annotations,
spec.owner/quota/agentCatalogue/planId)
* NATS-style redelivery is idempotent — second Emit() goes Get(found)
→ no-op, Create count stays at 1
* plan tiers fan out — free/pro/ent each stamp the right quota
(catches the PR #1633 regression)
* non-sandbox event types ignored at the dispatcher seam
* agentCatalogue strips empty / whitespace entries before persist
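The redelivery case above exercises a Get-then-Create guard. A self-contained sketch, assuming a hypothetical in-memory `recordingClient` in place of the dynamic client and a hypothetical `errAlreadyExists` in place of the apiserver's AlreadyExists error:

```go
package main

import (
	"errors"
	"sync"
)

// errAlreadyExists stands in for the apiserver's AlreadyExists status.
var errAlreadyExists = errors.New("already exists")

// sandboxClient is a hypothetical slice of the dynamic-client surface.
type sandboxClient interface {
	Get(name string) (found bool, err error)
	Create(name string) error
}

// recordingClient is an in-memory fake that counts successful Creates.
type recordingClient struct {
	mu      sync.Mutex
	objects map[string]bool
	creates int
}

func newRecordingClient() *recordingClient {
	return &recordingClient{objects: map[string]bool{}}
}

func (c *recordingClient) Get(name string) (bool, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.objects[name], nil
}

func (c *recordingClient) Create(name string) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.objects[name] {
		return errAlreadyExists
	}
	c.objects[name] = true
	c.creates++
	return nil
}

// ensureSandbox is idempotent: an existing CR is a no-op, and a Create that
// loses a race to a concurrent redelivery (AlreadyExists) is swallowed.
func ensureSandbox(c sandboxClient, name string) error {
	found, err := c.Get(name)
	if err != nil {
		return err
	}
	if found {
		return nil
	}
	if err := c.Create(name); err != nil && !errors.Is(err, errAlreadyExists) {
		return err
	}
	return nil
}
```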
Catalyst sessions API (7 cases / 10 runs):
* POST → GET round-trip through a dynamic/fake apiserver via
SetSovereignDepsFactory (mirrors chroot Sovereign "Path 2")
* GET reflects controller status (sessions / storage / spend /
previews / conditions) into the FE wire shape
* Failed condition taxonomy — TokenMintFailed, GitopsWriteFailed,
ManifestRenderFailed each preserved verbatim so the FE renders
actionable error states instead of a generic red pill
* POST invalid-agent returns 400 before any apiserver call
* GET unknown sandbox returns 404 sandbox-not-found
* LIST → DELETE → LIST round-trip
* Org-scope isolation — claims.Org-scoped namespace boundary blocks
cross-Org leak
Hard rules followed: READ-ONLY fake clients (no apiserver write), no
chart bump, no production code changes — only new _test.go files.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
96d2d9bce7
fix(provisioning): set Organization.spec.tenantPublic on product-install (was empty; HTTPRoute reconciler had nothing to render) (#1650)
PR #1644 added Organization.spec.tenantPublic + per-tenant HTTPRoute reconciler, but nothing set the field — every Org CR's TenantPublic stayed zero-value, the reconciler short-circuited at the empty ParentDomain guard, and `<slug>.omani.homes` 404'd at the Cilium Gateway.
Wire the patch at the only point that knows a tenant's product is actually Ready: the provisioning service. Both the initial workflow (`provision.completed`) and the day-2 install path (`provision.app_ready`) now patch the Organization CR's spec.tenantPublic with parentDomain (from TENANT_PARENT_DOMAIN env), subdomain (= slug), backendService (canonical vcluster-synced name), port 80, and the picked product slug. Last-write-wins on subsequent installs.
Per docs/INVIOLABLE-PRINCIPLES.md #4 the parent zone flows through env, never hardcoded — every Sovereign picks its own pool zone. Empty env disables the patch entirely (legacy tenants keep working through the Sovereign-wide tenant-wildcard route).
Best-effort: failures don't fail the provision. 404 on the CR is benign (legacy tenant without an Organization counterpart).
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
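A sketch of the patch body, assuming a JSON merge patch and assuming the product slug lives under a field named `product` (the commit does not name it). `buildTenantPublicPatch` is hypothetical; returning `ok=false` models the documented rule that an empty TENANT_PARENT_DOMAIN disables the patch.

```go
package main

import "encoding/json"

// tenantPublic mirrors the fields the commit lists; "product" is a
// hypothetical field name for the picked product slug.
type tenantPublic struct {
	ParentDomain   string `json:"parentDomain"`
	Subdomain      string `json:"subdomain"`
	BackendService string `json:"backendService"`
	Port           int    `json:"port"`
	Product        string `json:"product"`
}

// buildTenantPublicPatch returns a merge patch for Organization.spec.tenantPublic.
// parentDomain comes from TENANT_PARENT_DOMAIN; empty means "do not patch".
func buildTenantPublicPatch(parentDomain, slug, backendService, product string) ([]byte, bool, error) {
	if parentDomain == "" {
		return nil, false, nil
	}
	patch := map[string]any{
		"spec": map[string]any{
			"tenantPublic": tenantPublic{
				ParentDomain:   parentDomain,
				Subdomain:      slug, // subdomain = org slug
				BackendService: backendService,
				Port:           80,
				Product:        product,
			},
		},
	}
	b, err := json.Marshal(patch)
	return b, true, err
}
```

Applied best-effort, a 404 on the Organization CR would be logged and ignored rather than failing the provision.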
8888d9edd1
feat(catalog+billing): Sandbox Free/Pro/Ent plans + quota wire (was no plans = broken checkout) (#1642)
PR #1633 added the Sandbox app to seedApps but never wired the matching plan rows. The marketplace checkout hit "plan_id not found" the moment a customer picked Sandbox, and PR #1639's sandbox-orchestrator could only mint CRs with the Wave 1 baseline quota regardless of the picked tier. This PR closes both gaps in lockstep:
Catalog:
- Plan struct gets ProductSlug + IncludedQuotas fields (back-compat: omitempty BSON tags so legacy rows decode fine).
- expectedSandboxPlans() helper canonical-defines the three tiers:
  sandbox-free   0 OMR   1 session,   1 agent,   5 GB,  BYOS
  sandbox-pro    9 OMR   3 sessions,  6 agents,  50 GB,  BYOS (Popular)
  sandbox-ent   49 OMR   unlimited,   6 agents, 500 GB,  BYOS
- seedAllData appends them on fresh seed; seedMissingSandboxPlans backfills them on already-populated Sovereigns (idempotent GET-then-create, patches missing ProductSlug/IncludedQuotas on legacy rows).
- UpdatePlan persists the two new fields.
Sandbox orchestrator wiring:
- SandboxRequestedPayload.PlanID added; CreateOrg forwards body.PlanID.
- buildSandbox stamps openova.io/plan-id annotation + spec.planId when PlanID is non-empty.
- quotaForPlan() maps sandbox-{free,pro,ent} → SandboxQuota; empty or unknown plan_id falls through to DefaultQuota (Wave 1 baseline = Sandbox Free shape). Hard-coded map mirrors catalog IncludedQuotas so tenant-service avoids a compile-time dep on the catalog mongo stack.
Tests:
- TestExpectedSandboxPlans_Shape locks slugs, prices, quota keys, the Popular flag (sandbox-pro), and the quota ladder.
- TestSandboxHandle_PlanIDStampsAnnotationAndQuota table-test exercises all three tiers end-to-end (annotation + spec.planId + spec.quota).
- TestSandboxHandle_PlanIDEmptyKeepsDefaultQuota guards back-compat with pre-PR publishers.
- TestSandboxHandle_PlanIDUnknownFallsBackToDefault guards typo'd / retired plan IDs.
go build + go test clean for catalog, tenant, billing, provisioning, shared, marketplace-api. No Chart.yaml bump, no cluster touch.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
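A sketch of the `quotaForPlan` fallback using the tier numbers from the table above. The `sandboxQuota` field names are hypothetical, encoding "unlimited" sessions as `-1` is an assumption of this sketch, and the default follows this commit's statement that the Wave 1 baseline equals the Sandbox Free shape.

```go
package main

// sandboxQuota is an illustrative shape; field names are hypothetical.
type sandboxQuota struct {
	Sessions  int // -1 means unlimited (assumption of this sketch)
	Agents    int
	StorageGB int
}

// defaultQuota is the Wave 1 baseline, i.e. the Sandbox Free shape.
var defaultQuota = sandboxQuota{Sessions: 1, Agents: 1, StorageGB: 5}

// planQuotas mirrors the catalog's IncludedQuotas so the tenant service
// needs no compile-time dependency on the catalog module.
var planQuotas = map[string]sandboxQuota{
	"sandbox-free": defaultQuota,
	"sandbox-pro":  {Sessions: 3, Agents: 6, StorageGB: 50},
	"sandbox-ent":  {Sessions: -1, Agents: 6, StorageGB: 500},
}

// quotaForPlan maps a plan_id to its quota; empty or unknown IDs (typos,
// retired plans, pre-PR publishers) fall through to defaultQuota.
func quotaForPlan(planID string) sandboxQuota {
	if q, ok := planQuotas[planID]; ok {
		return q
	}
	return defaultQuota
}
```

Falling back instead of erroring keeps a typo'd plan from blocking CR creation, at the cost of silently under-provisioning; the table tests listed above are what catch that drift.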
4c83d98765
feat(sandbox): orchestrator listens tenant.sandbox_requested → Sandbox CR materialisation (#1639)
PR #1633 wired CreateOrg to publish `tenant.sandbox_requested` when the marketplace cart includes the sandbox product. Nobody was subscribing — the event landed in NATS `catalyst.tenant.sandbox_requested` and aged out unread, so no Sandbox CR (PR #1622) was ever minted and the customer sat on a "Provisioning…" spinner forever.
This slice closes the loop. A new SandboxOrchestrator in tenant-service:
- Subscribes via events.MultiSubscriber (PR #1636) to the canonical NATS subject + legacy Kafka topic.
- Parses {tenant_id, org_slug, owner_id, owner_email, agents, sovereign, requested_at} and resolves the owner email (event field → store.GetMemberEmail → owner_id fallback).
- Materialises a Sandbox CR in catalyst-system (SANDBOX_NAMESPACE override) via a dynamic client, with spec per architecture §7: owner.email + owner.orgRef.slug, default quota (4 CPU / 8 Gi / 50 Gi / 3 sessions), spec.agentCatalogue from the cart.
- Idempotent: Get-then-Create with AlreadyExists swallowed so NATS redeliveries + duplicate marketplace submits stay no-ops; the sandbox-controller remains SoR for spec mutations.
Wiring in main.go is best-effort — when neither in-cluster config nor KUBECONFIG is available (CI / dev loops) the orchestrator is skipped with a Warn; the rest of the tenant service still boots.
Hard rules: no chart bump, no cluster writes outside of the Sandbox Create call (sandbox-controller reconciles the rest), `go build ./...` clean, `go test ./...` clean.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
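A sketch of the payload parse and the owner-email fallback chain. The JSON keys come from the list above; the Go field types (for example `requested_at` as a string) and the `lookupEmail` callback standing in for `store.GetMemberEmail` are assumptions.

```go
package main

import (
	"encoding/json"
	"strings"
)

// sandboxRequested uses the keys listed in the commit; types are assumed.
type sandboxRequested struct {
	TenantID    string   `json:"tenant_id"`
	OrgSlug     string   `json:"org_slug"`
	OwnerID     string   `json:"owner_id"`
	OwnerEmail  string   `json:"owner_email"`
	Agents      []string `json:"agents"`
	Sovereign   string   `json:"sovereign"`
	RequestedAt string   `json:"requested_at"`
}

func parseSandboxRequested(data []byte) (sandboxRequested, error) {
	var p sandboxRequested
	err := json.Unmarshal(data, &p)
	return p, err
}

// resolveOwnerEmail prefers the event field, then the member store, and
// finally falls back to the raw owner_id so the owner is never left empty
// when owner_id is set.
func resolveOwnerEmail(p sandboxRequested, lookupEmail func(ownerID string) (string, error)) string {
	if e := strings.TrimSpace(p.OwnerEmail); e != "" {
		return e
	}
	if lookupEmail != nil {
		if e, err := lookupEmail(p.OwnerID); err == nil && e != "" {
			return e
		}
	}
	return p.OwnerID
}
```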
72f82ea7f2
fix(sme): wire provisioning/notification/domain consumers to NATS (was Kafka-only, was silent-dropping every tenant.created event) (#1636)
PR #1626 wired the PUBLISH leg of tenant + billing to NATS via events.MultiPublisher (canonical subject `catalyst.<event.Type>` per ADR-0001 §6). The CONSUME leg stayed Kafka-only — provisioning, notification, domain, billing's tenant-events cascade, AND tenant's own provision-events + members-cleanup consumers all called events.NewConsumer(redpandaBrokers, …). On Sovereigns REDPANDA_BROKERS is empty by design (no Redpanda exists; NATS is the canonical bus per the convergence-fix block in configmap.yaml) so those consumers either never started OR dialed `localhost:9092` in a hot crash loop.
Net effect on every Sovereign install pre-this-PR:
1. alice POSTs /sme/tenants → tenant publishes catalyst.tenant.created to NATS (PR #1626).
2. provisioning's only subscriber was Kafka-only → silent drop.
3. No Organization CR ever spawned → no vCluster → CONVERGENCE BROKEN.
This change introduces a symmetric subscribe-side abstraction mirroring bridge.go's MultiPublisher:
- events.BrokerSubscriber: unified Subscribe(ctx, handler) interface, satisfied by *Consumer, *DLQSubscriber, *MultiSubscriber.
- events.MultiSubscriber: fans in from NATS JetStream durable consumers (one per canonical subject) + an optional legacy Kafka Consumer. NewMultiSubscriber refuses to construct with both legs nil (the silent-no-op pattern this PR exists to prevent).
- events.NATSConn.ensureSMEStream: idempotently creates the CATALYST_SME Stream filtering `catalyst.>` so the first consumer on a fresh Sovereign bootstraps lifecycle.
Each service's main.go now constructs a MultiSubscriber and passes it to the consumer dispatch loop. Consumer signatures take events.BrokerSubscriber instead of *events.Consumer (interface upcast, so *events.Consumer call sites keep working on Catalyst-Zero):
- provisioning: tenant.created / tenant.deleted / tenant.app_install_requested / tenant.app_uninstall_requested / order.placed (the 5 subjects PR #1626 publishes to NATS). Also wires MultiPublisher so provision.* publishes hit NATS too — downstream tenant + notification consumers need them.
- notification: full fan-in (user.login, order.placed, payment.received, provision.*, domain.*, member.invited).
- domain: tenant.deleted (subdomain + BYOD reclamation cascade).
- billing: tenant.deleted (Stripe sub-cancel + invoice void + ledger marker cascade). Existing metering NATS subscriber unaffected.
- tenant: provision.* + tenant.deleted (members cleanup). Now reachable on Sovereigns; pre-this-PR they were inside the `if redpandaBrokersRaw != ""` block.
Chart wiring: NATS_URL env added to provisioning, notification, and domain Deployments (tenant + billing already wired via PR #1626). notification.yaml also flips its hardcoded REDPANDA_BROKERS literal to the shared ConfigMap key so the per-topology default (empty on Sovereigns, talentmesh redpanda on Catalyst-Zero) applies.
Verification:
- go build ./core/services/{shared,tenant,billing,provisioning,notification,domain}/... clean.
- go test ./... clean across all 6 modules.
- helm template with global.sovereignFQDN=test.example.com renders NATS_URL="nats://nats-jetstream.nats-system.svc.cluster.local:4222" into all 5 Deployments + ConfigMap.
- helm template without sovereignFQDN renders NATS_URL="" and REDPANDA_BROKERS=talentmesh redpanda, matching Catalyst-Zero.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
b8b80973de
feat(sandbox): Wave 4 — marketplace catalog entry (customer can pick Sandbox alongside WordPress)
Adds the Sandbox product to the marketplace storefront so a customer picks it off marketplace.<sov>/apps the same way they pick WordPress / Nextcloud. Card chrome is the existing .app-card shape verbatim — no new components per the design-system inheritance rule.
The detail page gains a 6-agent picker (aider, claude-code, cursor-agent, little-coder, opencode, qwen-code) using the existing .related-card chrome with a picked state mirroring .app-card.in-cart. Picks land on cart.agents and travel through checkout into the tenant create-org payload.
Tenant-service emits a sibling `tenant.sandbox_requested` event on sme.tenant.events when the cart contains the sandbox product. The event carries org slug + owner + agents list, sufficient for the sandbox-controller (or its upstream orchestrator) to mint a Sandbox CR with matching spec.agentCatalogue. The Organization CR creation path is unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
d681f64505
fix(catalyst-api): mint HS256 token on SME proxy calls (was forwarding incompatible RS256) (#1630)
PR #1625 shipped the /api/v1/sme/billing/vouchers/* proxies but the SME gateway (core/services/gateway/proxy.go) rejects RS256 outright — it only accepts HS256 signed with sme-secrets/JWT_SECRET. Result on every fresh Sovereign: operator clicks on /bss/vouchers returned silent 401 with no upstream audit trail.
This commit ships the bridge:
- core/services/shared/auth/mint_sme.go (new)
  - MintSMEAccessToken(secret, sub, email, role) → 5-min HS256 JWT in the wire shape billing's requireVoucherIssuer expects.
  - SMERoleFor(realmRoles, tier) → maps Keycloak roles + tier claim onto SME vocab (superadmin | sovereign-admin | member).
  - Pure, no IO, fully unit-tested (mint_sme_test.go).
- products/catalyst/bootstrap/api/internal/handler/sme_billing_vouchers.go
  - proxySMEVoucher now mints a fresh HS256 token per upstream hop from the operator's already-validated RS256 session claims and forwards that as Bearer to the SME gateway. RS256 header is no longer leaked upstream.
  - Unwired bridge (CATALYST_SME_JWT_SECRET empty) surfaces 503 `sme-jwt-bridge-unwired` instead of the silent 401.
- products/catalyst/bootstrap/api/internal/handler/handler.go
  - h.smeJWTSecret field + SetSMEJWTSecret(secret) setter.
- products/catalyst/bootstrap/api/cmd/api/main.go
  - Reads CATALYST_SME_JWT_SECRET on startup and wires it.
  - Log line includes byte count only (never the secret value, per INVIOLABLE-PRINCIPLES.md #10).
- products/catalyst/chart/templates/api-deployment.yaml
  - New env CATALYST_SME_JWT_SECRET sourced from sme-secrets/JWT_SECRET in the same namespace (catalyst-system). optional: true so Sovereigns without marketplace surface a 503 rather than CreateContainerConfigError.
- products/catalyst/chart/templates/sme-services/sme-secrets.yaml
  - emberstack/reflector annotation block mirroring sme-secrets from `sme` ns into `catalyst-system` (Kubernetes secretKeyRef is same-namespace-only). Same pattern as cnpg-cluster.yaml and provisioning-github-token.yaml.
Operator-visible behaviour: the bridge is transparent on the happy path (operator with sovereign-admin tier on a Sovereign with marketplace enabled clicks /bss/vouchers → list returns). On the unhappy paths the operator now sees a real status code:
- 503 sme-jwt-bridge-unwired (chart wire missing) — actionable
- 503 sme-gateway-unreachable (DNS NXDOMAIN) — pre-existing
- 403 from billing's requireVoucherIssuer (role insufficient) — was silent 401 before, now propagates the real authz result.
Tests: core/services/shared/auth `go test ./...` PASS. catalyst-api `go build ./...` PASS.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
50a45a9783
fix(billing): skip Stripe when voucher covers 100% of total (unblocks fully-paid voucher checkout) (#1628)
POST /billing/checkout was 503'ing with "payment processor is not
configured" on Sovereigns that have not pasted Stripe keys yet — even
when the customer's credit balance (from a fresh voucher redemption
in the same request, or a prior balance) fully covered the order
total. Make the credit-only short-circuit explicit: compute
`remainingOMR := totalOMR - creditBalance` and settle via
CreditOnlyCheckout when `<= 0`, BEFORE any Stripe settings probe.
This is the path that has to keep working during the voucher-only
weeks of a new Sovereign.
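A sketch of the credit-first ordering, where the Stripe settings probe runs only when a balance remains. `checkout`, `checkoutResult` and `errStripeNotConfigured` are hypothetical names, and amounts are integer baisa (1 OMR = 1000 baisa) rather than the `remainingOMR` float-style arithmetic above, an assumption made to avoid floating-point money.

```go
package main

import "errors"

var errStripeNotConfigured = errors.New("payment processor is not configured")

type checkoutResult struct {
	PaidByCredit   bool
	LeftoverCredit int64 // baisa
}

// checkout settles from credit first and only consults Stripe settings when
// a positive balance remains.
func checkout(totalBaisa, creditBaisa int64, stripeConfigured func() bool) (checkoutResult, error) {
	remaining := totalBaisa - creditBaisa
	if remaining <= 0 {
		// Credit-only path: Stripe settings are never probed.
		return checkoutResult{PaidByCredit: true, LeftoverCredit: -remaining}, nil
	}
	if !stripeConfigured() {
		return checkoutResult{}, errStripeNotConfigured
	}
	// A real implementation would create the Stripe session for `remaining` here.
	return checkoutResult{PaidByCredit: false}, nil
}
```

Passing the probe as a function makes the "settings table never probed" assertion easy to express: a probe that panics must never be called on the credit-only path.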
Adds checkout_test.go covering two regression paths:
- fresh-voucher path: customer with 0 credit redeems WELCOME50
against a 50-OMR plan → 200 + paid_by_credit:true, settings table
never probed (sqlmock asserts no unexpected queries).
- pre-existing-credit path: customer with 200-OMR standing balance
buys a 100-OMR plan, no promo_code in request → 200 +
paid_by_credit:true + 100-OMR leftover credit.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
048cb2c3de
fix(sme): wire tenant + billing event dispatchers to NATS (was Redpanda-only, blocking convergence) (#1626)
The tenant + billing services hardcoded a franz-go Kafka publisher
pointing at REDPANDA_BROKERS. On Sovereigns there is NO Redpanda in
cluster — only NATS JetStream at
nats-jetstream.nats-system.svc.cluster.local:4222 — so every
tenant.created / tenant.deleted / order.placed event was silently
dropped, blocking provisioning + downstream consumers and stalling
the convergence chain end to end.
Per ADR-0001 §6 the canonical event bus is NATS JetStream with
subject convention `catalyst.<domain>.<event>`. This change:
- Adds events.BrokerPublisher + events.MultiPublisher that fan out
to NATS (`catalyst.<event.Type>` derived from Event.Type) and the
legacy Redpanda topic in one call. Either transport may be nil;
the constructor refuses to build a no-op publisher (the exact
silent-failure mode we just hit).
- Adds NATSConn.PublishEvent so the generic Event envelope can flow
over the same JetStream connection used for the metering
subscriber (#798), with Event.ID as the JetStream Msg-Id for
broker-side de-dup.
- Updates tenant + billing main.go to read NATS_URL +
REDPANDA_BROKERS independently, construct the appropriate
transports, and wire MultiPublisher into the Handler. Legacy
Kafka consumers only start when REDPANDA_BROKERS is non-empty
so the pods no longer crashloop dialling localhost:9092 on
Sovereigns.
- Updates chart templates to inject NATS_URL into both tenant and
billing Deployments. ConfigMap default for NATS_URL on Sovereigns
is nats://nats-jetstream.nats-system.svc.cluster.local:4222
(fixes the existing bug where defaults pointed at the wrong
namespace `nats-jetstream` — NATS actually lives in `nats-system`
per clusters/_template/bootstrap-kit/07-nats-jetstream.yaml).
- Sovereign default of REDPANDA_BROKERS is now empty (was the wrong
NATS URL stuffed into a Kafka env, which made franz-go fail every
dial).
Subject mapping per CanonicalSubject:
tenant.created → catalyst.tenant.created
tenant.deleted → catalyst.tenant.deleted
tenant.app_install_requested → catalyst.tenant.app_install_requested
order.placed → catalyst.billing.order.placed
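The table shows that the mapping is not purely `"catalyst." + type`: `order.placed` gains a `billing` segment. A sketch that reproduces the four rows, assuming a hypothetical override map inferred from the table rather than taken from CanonicalSubject itself:

```go
package main

// subjectOverrides holds event types whose subject is not simply
// "catalyst." + type; inferred from the mapping table above.
var subjectOverrides = map[string]string{
	"order.placed": "catalyst.billing.order.placed",
}

// canonicalSubject maps an Event.Type to its NATS subject.
func canonicalSubject(eventType string) string {
	if s, ok := subjectOverrides[eventType]; ok {
		return s
	}
	return "catalyst." + eventType
}
```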
Test:
go build ./... in shared/, tenant/, billing/ (clean)
go test ./events/... ./handlers/... in all three (existing + new
bridge_test.go pass)
helm template with global.sovereignFQDN set renders NATS_URL in
both Deployments + REDPANDA_BROKERS="" in ConfigMap
helm template without global.sovereignFQDN renders the legacy
Redpanda broker (Catalyst-Zero contabo path remains intact)
NATS-side consumers for sme.tenant.events / sme.provision.events ship
in a follow-up PR per the ADR-0001 §6 migration plan; this PR only
unblocks the publish leg which is the immediate convergence blocker.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
255eb3bf17
feat(sandbox+auth+newapi): Wave 1b — newapi proxy + BYOS + org-scoped JWT (#1619)
Three coordinated deliverables for Sandbox Wave 1b — scaffolding +
design + the ONE prerequisite (long-lived org-scoped JWT) the rest of
Sandbox depends on.
Deliverable 1 — newapi proxy contract:
- products/sandbox/docs/newapi-proxy-contract.md: agent-pod env
(LLM_GATEWAY_URL / OPENAI_BASE_URL alias), provider selection
(?provider=qwen; default Qwen via omtd.bankdhofar.com), per-Sandbox
token issuance via /admin/tokens/sandbox bridge, lifecycle +
rotation, auth model.
- platform/newapi/internal/handler/sandbox_token.go: bridge handler
stub. Validates the inbound PAT (typ=pat + aud=newapi + org_id
cross-check vs request body), then echoes a NewAPI-shaped response
so the contract is testable without the upstream NewAPI admin
API. Wave 4 wires the actual upstream calls.
Deliverable 2 — Claude Code BYOS OAuth:
- products/sandbox/docs/claude-code-byos.md: UX (Connect Claude Max →
OAuth → refresh token Secret/catalyst-system/sandbox-byos-claude-code-<user-uid>),
Pod env injection (ANTHROPIC_API_KEY bypassing
newapi), per-session toggle, revocation paths, chart wiring.
- products/catalyst/bootstrap/api/internal/handler/byos_claude_code.go:
POST /start, GET /callback, DELETE, GET /status — four endpoints
behind RequireSession. Honest 503 + 501 surface so the popup
flow exercises end-to-end against the placeholder client_id;
Wave 4 flips it live.
Deliverable 3 — Long-lived org-scoped JWT (THE prerequisite):
- platform/keycloak/chart/templates/configmap-sovereign-realm.yaml +
configmap-tenant-realm.yaml: add `org` protocolMapper emitting
user attribute `org` as claim `org_id`; add `org` to default
client scopes for ALL clients.
- core/services/auth/handlers/handlers.go: include typ=session in
JWTs + document the cross-service claim contract.
- core/services/auth/handlers/pat.go: NEW POST /auth/pat with
admin-configurable TTL (default 7d, max 90d), audience claim,
capabilities pass-through, typ=pat discriminator.
- core/services/auth/handlers/routes.go + main.go: wire /auth/pat
behind JWTAuth middleware.
- core/services/shared/auth/claims.go: single Claims struct +
HasCapability/HasGroup helpers + ContextKey for cross-service
consumers (sandbox-controller, newapi bridge, MCP server).
- products/catalyst/bootstrap/api/internal/auth/session.go: align
Org JSON tag with new `org_id` claim; UnmarshalJSON accepts BOTH
legacy `org` and new `org_id` so a rolling chart upgrade does
not regress org-scoped queries.
Out of scope (Wave 4 wires):
- Sandbox CRD + controller (writes Secret, mounts Pod env).
- Actual outbound HTTP to Anthropic /oauth/token + KMS encrypt.
- Actual outbound HTTP to NewAPI admin API.
- Per-Sandbox capability projection from Keycloak groups.
- PAT revocation lookup (jti store) + /auth/pats list.
- Settings UI card + session-toolbar routing toggle.
Build verification (go vet + go build clean):
- core/services/auth/...
- core/services/shared/...
- platform/newapi/internal/handler/...
- products/catalyst/bootstrap/api/...
Founder TODO (single knob to flip BYOS live, Wave 4):
Register an Anthropic OAuth client at
https://console.anthropic.com/settings/oauth (public PKCE,
redirect=https://console.<sov-fqdn>/api/v1/sandbox/byos/claude-code/callback)
and paste the client_id into clusters/<sovereign>/bootstrap-kit/sandbox.yaml.
Today every BYOS endpoint returns 503 with a clear
message pointing at claude-code-byos.md §8.
Refs: products/sandbox/docs/architecture.md §6 (THE prerequisite).
Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>
964dc15570
fix(catalog): D27 — fresh-seed apps default Published+Deployable (#1584)
* fix(handover): rename itoa→regionSlotIndex (collision with infrastructure.go)
PR #1581 introduced an `itoa` helper that collided with the existing `itoa` in handler/infrastructure.go:1952. Go vet failed:
internal/handler/infrastructure.go:1952:6: itoa redeclared in this block
internal/handler/deployment_handover_export.go:199:6: other declaration of itoa
Rename my helper to `regionSlotIndex` — more descriptive of its actual use (deriving the per-region slot suffix for the kubeconfig filename).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(catalyst-api): D16/D17 — 3 bugs caught on t138
Founder caught on t136 (now wiped) that /dashboard cluster grouping still showed 1 region and /cloud nodes showed 1 node despite earlier D16 PRs shipping. Root cause: 3 bugs in the D16 chain that surfaced on t138 fresh prov.
1. exportSecondaryKubeconfigsToChild was guarded behind the early return of exportDeploymentToChild's failed POST. The child's ingress + cert + gateway are still racing to reach reachable state in the seconds after handover fires, so the first POST gets EOF and the goroutine never fires. Fix: kick off the D16 fan-out IMMEDIATELY at the top of exportDeploymentToChild in its own goroutine, BEFORE the deployment-record POST.
2. Both exports now retry with exponential backoff (5s → 60s) for up to 5 min total. Most handovers will succeed on attempt 2-4. Was: no retry, single shot, silent failure.
3. /api/v1/sovereign/secondary-kubeconfig route moved OUT of the auth group (rg) into the top-level router (r), alongside /api/v1/internal/deployments/import. The previous registration required an operator session that doesn't exist at handover — mothership POSTs were 401'd silently. Validation is now via safeIDPattern regex on depID + regionKey (same security model as the deployments/import companion endpoint).
4. HandleSovereignCloud now fans out across h.k8sCache.Clusters() instead of using only the in-cluster client. Adds Cluster field (omitempty) to sovereignNode/LB/SC/PVC so the UI can group/filter by region. Without this, /cloud?view=list&kind=nodes shows 1 node even when 3 secondary kubeconfigs are registered.
Together these fix:
- D16 /dashboard Layer-1=Cluster grouping (3 bubbles, not 1)
- /cloud?view=list&kind=nodes (3+ nodes, not 1)
Refs: feedback_test_theater_3rd_violation_2026_05_17.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(catalog): D27 — fresh-seed apps default Published+Deployable
Founder caught on t136: marketplace.t136/apps shows blank application grid.
Root cause: catalog seed.go calls migrateAppPublished + migrateAppDeployable ONLY on the "already populated" path. On a fresh Sovereign install (empty catalog) seedAllData inserts 27 rows with zero-value bools — Published=false, Deployable=false. The marketplace storefront filters with `?published=true`, gets [], renders blank.
Fix: after seedAllData also call migrateAppDeployable + migrateAppPublished + seedSystemApps. Both migrations are idempotent (skip rows already true), so re-runs are safe.
Verified the bug live on t138 (eaaee1ea24184c2a):
http://catalog.sme:8082/catalog/apps returns 27 apps
http://catalog.sme:8082/catalog/apps?published=true returns 0
With this fix the latter returns 27.
Refs: feedback_test_theater_3rd_violation_2026_05_17.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
c04b2ec76d
feat(wordpress-tenant): activeHotStandby option wires bp-cnpg-pair (D31) (#1562)
Sovereign DoD D31 — tenants subscribing to an HA-capable marketplace app may opt into a cross-region active-hot-standby Postgres pair for their WordPress instance instead of the default single CNPG Cluster. Mirrors the canonical bp-cnpg-pair pattern (primary + replica Cluster CRs with WAL streaming over Cilium ClusterMesh via a managed Service annotated service.cilium.io/global=true).
When the new pg.activeHotStandby.enabled flag is false (default), templates render the existing single Cluster bit-for-bit — no regression for non-HA tenants. Catalog seed flags WordPress with ha + cnpg-pair tags so the marketplace HA filter can surface it.
Chart bumped 0.2.1 -> 0.3.0. New render-gate test asserts both default single-cluster shape AND the enabled 2-Cluster shape with the right nodeSelectors, replica.source, externalCluster.host, Cilium global annotation, and bootstrap.pg_basebackup; all 5 cases pass.
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
||
|
|
f9ed292198
|
fix(billing): /redeem-preview + plans + addons bypass JWT (D29) (#1561)
* chore(slot-13): pin bp-catalyst-platform to 1.4.145 (D29 gateway public routes) PR #1559 added /api/billing/{vouchers/redeem-preview,plans,addons} as public gateway routes — required for the marketplace /redeem zero-touch flow. Pin the slot so future provisions inherit it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(billing): /redeem-preview + plans + addons bypass JWT (D29) Mirror PR #1559's gateway public routes in the billing service's own middleware chain. The gateway now lets these requests through without an Authorization header (D29 voucher-redeem landing), but billing service's main.go was JWT-gating EVERY /billing/* path except /billing/webhook — so the request still got 401, just one hop later. Caught live on t132 2026-05-16 after PR #1559 rolled. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
a11067da1a
|
fix(gateway): /redeem-preview + plans + addons must be public (D29) (#1559)
* feat(billing+notification): wire voucher-issued email (D28) D28 of the Sovereign DoD requires that issuing a voucher emails it to the recipient zero-touch. Today POST /billing/vouchers/issue persists the PromoCode row but never notifies anyone — so a gifted voucher only reaches its recipient if the operator manually sends the code over a side channel. This wires sme-billing -> sme-notification so the email fires automatically on every successful upsert that carries a recipient_email field. Architecture follows the existing notification-service seam: sme-billing POSTs to http://notification.sme.svc.cluster.local:8087/ notification/send with template=voucher-issued; sme-notification renders the HTML and dispatches via Stalwart over SMTP. No direct SMTP code is added to billing, no stalwart-mail calls bypass notification. Server-side only — the owner-UI for issuing vouchers (D28b) is a separate PR. Changes: notification/templates/templates.go + VoucherIssuedEmail(code, creditOMR, description, sovereignFQDN, validityHint) — renders code prominently, redeem button to https://marketplace.<sovereignFQDN>/redeem/?code=<CODE>; FQDN always supplied by caller, NEVER hardcoded. notification/handlers/handlers.go + renderTemplate("voucher-issued") case parsing {code, credit_omr, description, sovereign_fqdn, validity_hint}. + Default subject "You've been gifted a voucher for OpenOva SME". billing/handlers/handlers.go + Handler fields: NotificationURL, SovereignFQDN, NotificationClient. billing/handlers/vouchers.go + issueVoucherRequest = store.PromoCode + RecipientEmail (request- only; never persisted). + sendVoucherIssuedEmail() — POSTs to NotificationURL with a 5s timeout. Best-effort: a non-2xx or transport error logs but does NOT fail the IssueVoucher response, because the row is already persisted and re-issuing the same code re-fires the email. + Re-issue semantics (#91 resurrects soft-deleted rows) extend to the email path — documented in the handler comment. 
billing/main.go + Reads NOTIFICATION_SERVICE_URL (default http://notification.sme.svc.cluster.local:8087/notification/send) and SOVEREIGN_FQDN env vars. Wires a 5s default http.Client. products/catalyst/chart/templates/sme-services/billing.yaml + Pipes NOTIFICATION_SERVICE_URL (cluster-DNS constant) and SOVEREIGN_FQDN (from .Values.global.sovereignFQDN, NEVER hardcoded) into the billing Deployment. Tests: notification/handlers/handlers_test.go (new) + TestRenderTemplate_VoucherIssued: rendered HTML contains code + credit + a redeem URL built from the supplied FQDN; never falls back to marketplace.openova.io. + TestRenderTemplate_VoucherIssued_CustomSubject + _NoDescription + TestRenderTemplate_UnknownTemplate as guard rails. billing/handlers/vouchers_test.go + TestIssueVoucher_SendsEmail_WhenRecipientPresent: a fake round- tripper sees the POST to notification with the right URL + template + data (code upper-cased, credit_omr, sovereign_fqdn, description) when recipient_email is set. + TestIssueVoucher_NoEmail_WhenRecipientAbsent: no notification call when recipient is empty. + TestIssueVoucher_NotificationFailure_DoesNotFailUpsert: operator gets 200 even when notification returns 500. + TestIssueVoucher_403WithoutVoucherIssuerRole: role gate preserved. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(chart): admin pod uses dedicated image tag (D27 SME stack) t132 caught admin pod stuck in ImagePullBackOff on `admin:b0ed216` — the SME services CI run for that mono-repo SHA published 10 services but admin's image was missing from GHCR. Decouple admin's tag from smeTag so a missing-build for one service doesn't wedge the SME stack. Default to `3c2f7e4` (matches marketplaceApi + console, known-published). When admin's UI changes, bump in lockstep with those. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(slot-13): pin bp-catalyst-platform to 1.4.144 PR #1556 (D28 voucher email wire) + PR #1557 (D27 admin tag override) landed and Blueprint Release packaged 1.4.144. Pin the slot file so future provisions get the latest chart by default — t132 manually upgraded via kubectl patch but t133+ will inherit it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(gateway): /redeem-preview + plans + addons must be public (D29) The marketplace /redeem?code=XXX landing page calls /api/billing/vouchers/redeem-preview unauthenticated per docs/FRANCHISE- MODEL.md §3, but the gateway's catch-all /api/billing/ entry was returning 401 to it — breaking the entire voucher-redeem zero-touch flow that D29 depends on. Also expose /api/billing/plans and /api/billing/addons so the marketplace landing can render pricing without a session. Caught live on t132 2026-05-16 — every /redeem call returned 401. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
1fe706769f
|
feat(billing+notification): wire voucher-issued email (D28) (#1556)
D28 of the Sovereign DoD requires that issuing a voucher emails it to the recipient zero-touch. Today POST /billing/vouchers/issue persists the PromoCode row but never notifies anyone — so a gifted voucher only reaches its recipient if the operator manually sends the code over a side channel. This wires sme-billing -> sme-notification so the email fires automatically on every successful upsert that carries a recipient_email field. Architecture follows the existing notification-service seam: sme-billing POSTs to http://notification.sme.svc.cluster.local:8087/ notification/send with template=voucher-issued; sme-notification renders the HTML and dispatches via Stalwart over SMTP. No direct SMTP code is added to billing, no stalwart-mail calls bypass notification. Server-side only — the owner-UI for issuing vouchers (D28b) is a separate PR. Changes: notification/templates/templates.go + VoucherIssuedEmail(code, creditOMR, description, sovereignFQDN, validityHint) — renders code prominently, redeem button to https://marketplace.<sovereignFQDN>/redeem/?code=<CODE>; FQDN always supplied by caller, NEVER hardcoded. notification/handlers/handlers.go + renderTemplate("voucher-issued") case parsing {code, credit_omr, description, sovereign_fqdn, validity_hint}. + Default subject "You've been gifted a voucher for OpenOva SME". billing/handlers/handlers.go + Handler fields: NotificationURL, SovereignFQDN, NotificationClient. billing/handlers/vouchers.go + issueVoucherRequest = store.PromoCode + RecipientEmail (request- only; never persisted). + sendVoucherIssuedEmail() — POSTs to NotificationURL with a 5s timeout. Best-effort: a non-2xx or transport error logs but does NOT fail the IssueVoucher response, because the row is already persisted and re-issuing the same code re-fires the email. + Re-issue semantics (#91 resurrects soft-deleted rows) extend to the email path — documented in the handler comment. 
billing/main.go + Reads NOTIFICATION_SERVICE_URL (default http://notification.sme.svc.cluster.local:8087/notification/send) and SOVEREIGN_FQDN env vars. Wires a 5s default http.Client. products/catalyst/chart/templates/sme-services/billing.yaml + Pipes NOTIFICATION_SERVICE_URL (cluster-DNS constant) and SOVEREIGN_FQDN (from .Values.global.sovereignFQDN, NEVER hardcoded) into the billing Deployment. Tests: notification/handlers/handlers_test.go (new) + TestRenderTemplate_VoucherIssued: rendered HTML contains code + credit + a redeem URL built from the supplied FQDN; never falls back to marketplace.openova.io. + TestRenderTemplate_VoucherIssued_CustomSubject + _NoDescription + TestRenderTemplate_UnknownTemplate as guard rails. billing/handlers/vouchers_test.go + TestIssueVoucher_SendsEmail_WhenRecipientPresent: a fake round- tripper sees the POST to notification with the right URL + template + data (code upper-cased, credit_omr, sovereign_fqdn, description) when recipient_email is set. + TestIssueVoucher_NoEmail_WhenRecipientAbsent: no notification call when recipient is empty. + TestIssueVoucher_NotificationFailure_DoesNotFailUpsert: operator gets 200 even when notification returns 500. + TestIssueVoucher_403WithoutVoucherIssuerRole: role gate preserved. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
b0ed216e81
|
feat(catalog): catalog-svc HTTP REST service + chart wiring (slice L1+L2, #1097) (#1148)
EPIC-2 Slice L of #1097. Multi-source Blueprint catalog HTTP REST service backed by Gitea (3 sources: public mirror, sovereign-curated, per-Org private). Replaces the per-Org SME catalog per ADR-0001 §4.3 (different scope: SME's was Org-bound; catalyst-catalog is Sovereign- wide multi-source). L1 — core/services/catalyst-catalog/ Go service: - Separate go.mod (services group is for HTTP services, controllers group is for CRD reconcilers — documented in DESIGN.md). - Imports the unified Gitea client via Go module replace directive. - Promoted core/controllers/internal/gitea → pkg/gitea so the catalog (a sibling Go module) can import it (Go internal/ rule). 5 Group C controllers updated atomically. - HTTP REST endpoints: /api/v1/catalog{,/{name},/{name}/versions, /{name}/versions/{version}} + /healthz. - Source resolution priority on collision: private > sovereign > public. - Per-Org access filter: caller's Claims.Groups[] determines visible private blueprints; Org A user does NOT see Org B's private set. - 30s TTL LRU cache on blueprint.yaml reads (capacity 1024 default). - Session-cookie / Bearer / ?access_token= claim extraction matching catalyst-api's seam; expired-token rejection in-process. - Containerfile: distroless-static, non-root UID 65532. L2 — products/catalyst/chart/templates/services/catalog/ wiring: - 5 templates (deployment, service, serviceaccount, rbac, httproute) + _helpers.tpl. Default-OFF gate via .Values.services.catalog.enabled. - helm template: 0 catalog resources when OFF, 6 when ON. - Empty image.tag fail-fasts at render per Inviolable Principle #4a. - HTTPRoute exposes /api/v1/catalog on api.<sovereign> hostname. - Chart bumped 1.4.85 → 1.4.86. Gitea client extension (canonical seam, NOT per-service variant): - +ListOrgRepos(ctx, org) []Repo — paginated repo listing. - +ListContents(ctx, org, repo, branch, path) []ContentEntry — directory listing for per-Org shared-blueprints fan-out. 
GitHub Actions workflow: - .github/workflows/catalyst-catalog-build.yaml — push-on-paths + pull_request + workflow_dispatch (NO cron). go vet + go test (race + count=1) + image build → GHCR :<sha>. repository_dispatch fan-out to chart-bump matches the Group C controllers' pattern. Tests (3-tier gate): unit (config, cache, auth, source, handler) + integration (httptest-backed Gitea fixtures across all 3 sources + priority + per-Org access). All green; race detector on. L3 (SME catalog retirement) is deferred per the EPIC-2 master brief. GraphQL deferred (REST first; gqlgen would pull ~80MB of indirect deps for a feature no UI consumer has asked for yet). Co-authored-by: hatiyildiz <hati.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
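The collision rule (private > sovereign > public) interacts with the per-Org filter. Filtering must happen before resolution, or Org B's private blueprint could shadow the sovereign one that an Org A user should see. The Go sketch below assumes each private blueprint records one owning Org and compares `Claims.Groups[]` entries to it directly. Both are hypothetical simplifications of the real catalyst-catalog logic.

```go
package main

// Source ranks catalog origins; a higher value wins on a name collision.
type Source int

const (
	SourcePublic Source = iota
	SourceSovereign
	SourcePrivate
)

// Blueprint is a hypothetical, minimal catalog entry.
type Blueprint struct {
	Name   string
	Source Source
	Org    string // owning Org for private blueprints; empty otherwise
}

// canSee applies the per-Org access filter: private blueprints are visible
// only to callers whose groups include the owning Org.
func canSee(b Blueprint, groups []string) bool {
	if b.Source != SourcePrivate {
		return true
	}
	for _, g := range groups {
		if g == b.Org {
			return true
		}
	}
	return false
}

// resolve filters first, then keeps the highest-priority source per name.
func resolve(all []Blueprint, groups []string) map[string]Blueprint {
	out := make(map[string]Blueprint)
	for _, b := range all {
		if !canSee(b, groups) {
			continue
		}
		if cur, ok := out[b.Name]; !ok || b.Source > cur.Source {
			out[b.Name] = b
		}
	}
	return out
}
```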
|
|
a57d05d4dd
|
fix(provisioning,catalog): parent-kustomization prefix collision + disable openclaw/stalwart-mail (#1043)
Two bugs surfaced live 2026-05-06 on tenant "test": 1) UpdateParentKustomization used substring match against " - <slug>", which falsely "found" the slug when it was a PREFIX of an existing entry. Adding "test" to a file already listing "test11" or "test13" silently no-op'd. Result: tenant manifests committed but the tenants/kustomization.yaml never registered them, Flux's tenants Kustomization couldn't apply the new tenant, vCluster step timed out at 10m. Fix: exact line match on the resources entry. 2) openclaw + stalwart-mail were flagged Deployable=true in #941 but never had AppSpec entries in core/services/provisioning/gitops/apps.go KnownApps. The SME provisioning generator emits a single-Deployment template that requires Image + Port; for those two slugs it produced invalid manifests: Deployment.apps "openclaw" is invalid: containers[0].image: Required value containers[0].ports[0].containerPort: Required value tenant-test11-apps Kustomization rejected the dry-run, no apps ever landed inside the vcluster. Re-enabling these requires per-app overlay support beyond the single-Deployment template — separate work. For now: comment them out of DeployableAppSlugs so the catalog seed flips them back to Deployable=false on next pod restart and the marketplace UI shows them as COMING SOON. Adds regression tests for both: prefix-collision in UpdateParentKustomization, and a stability test on the deployable map shape. Co-authored-by: hatiyildiz <hatice@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
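A few lines of Go show the difference between the old substring check and the exact line match. This is a sketch, not the actual UpdateParentKustomization code. `hasResourceEntry`, `addResourceEntry`, and the two-space indentation are assumptions.

```go
package main

import "strings"

// buggyHasResourceEntry reproduces the pre-fix check: " - test" is a
// substring of " - test11", so a prefix slug is falsely "found".
func buggyHasResourceEntry(kustomization, slug string) bool {
	return strings.Contains(kustomization, " - "+slug)
}

// hasResourceEntry compares each trimmed line against the exact entry.
func hasResourceEntry(kustomization, slug string) bool {
	for _, line := range strings.Split(kustomization, "\n") {
		if strings.TrimSpace(line) == "- "+slug {
			return true
		}
	}
	return false
}

// addResourceEntry appends the slug only when no exact entry exists.
func addResourceEntry(kustomization, slug string) string {
	if hasResourceEntry(kustomization, slug) {
		return kustomization
	}
	if !strings.HasSuffix(kustomization, "\n") {
		kustomization += "\n"
	}
	return kustomization + "  - " + slug + "\n"
}
```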
|
|
ff0e90156d
|
fix(provisioning): re-read parent kustomization on commit retry — prevent slug-resurrection race (#1034)
Live race seen 2026-05-06: bookcheck teardown committed at T (removed the slug from tenants/kustomization.yaml + pruned its directory). Multitest provision's first commit attempt at T-2s got a ref-race rejection; the github client's retry replayed the SAME files map (which held the pre-teardown parent kustomization with bookcheck still in it), and the retry's commit at T+5s overwrote the teardown's removal. Result: the parent kustomization listed bookcheck but the directory was gone, Flux's tenants Kustomization wedged in a build-failure loop, and EVERY subsequent tenant change was blocked until manually unblocked. Add CommitFilesWithPruneAndRebuild — same as CommitFilesWithPrune but takes a `rebuild(ctx) (files, error)` callback invoked at the start of each attempt. Wire both consumer paths (provision + teardown) through it; each rebuild re-reads parent kustomization.yaml against the current HEAD and re-applies UpdateParentKustomization / RemoveTenantFromParentKustomization fresh. Static tenant-scoped manifests still flow through unchanged. CommitFilesWithPrune is preserved as a thin wrapper for callers that ship truly static files (e.g. day-2 app installs scoped to a tenant subdir, no parent merge involved). Co-authored-by: hatiyildiz <hatice@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
f1744c8973
|
fix(provisioning): BookStack — also emit DB_USERNAME/DB_PASSWORD (Laravel-native) (#1031)
PR #1028 fixed the APP_KEY halt and switched to DB_USER/DB_PASS, but linuxserver/bookstack's init script does NOT substitute DB_USER → DB_USERNAME in the .env file. Laravel reads env vars natively, but only under DB_USERNAME / DB_PASSWORD (the Laravel-canonical names). Without those, Laravel falls back to the .env placeholder values (database_username / database_user_password) and the app fails with: SQLSTATE[HY000] [1045] Access denied for user 'database_username'@... Caught live on tenant 'bookcheck' 2026-05-06 after PR #1028 deployed — pod ran, app started, but every request hit the placeholder credentials. Emit BOTH name pairs so the env works regardless of which the LSIO upstream eventually wires up. Co-authored-by: hatiyildiz <hatice@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
b180d56926
|
fix(provisioning): BookStack overlay — add DB_* envs + APP_KEY + APP_URL (#1028)
linuxserver/bookstack reads DB_HOST/DB_USER/DB_PASS/DB_DATABASE (NOT WORDPRESS_DB_*) and halts init with "The application key is missing, halting init!" when APP_KEY isn't set. The pod stays 1/1 Running because the readiness probe doesn't catch the silent halt, but the application never binds to port 80, so the ingress returns 502. Discovered via live E2E on tenant 'aaa' (BookStack on m plan): all 7 provisioning steps reported done, ingress healthy, cert ready, but https://aaa.omani.rest → 502. Add a "bookstack" DBEnvStyle case in the mysql env-emitter that writes DB_*, APP_URL=https://<slug>.omani.rest, and a Laravel-format APP_KEY (base64:<32-byte>). Also add a randomAppKey() helper alongside randomHex(). Tag the catalog AppSpec with DBEnvStyle: "bookstack". Co-authored-by: hatiyildiz <hatice@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
c9b8c13406
|
fix(tenant): JWT-bypass /tenant/internal/* — paid checkouts never provisioned (#1018) (#1019)
Billing's dispatchOrderPlaced enriches the order.placed NATS event by
calling /tenant/internal/tenants/<id>/subdomain over the in-cluster
ClusterIP. routes.go registers that path with the comment "Internal —
unauthenticated service-to-service", but main.go wraps everything
under /tenant/ in JWTAuth except /tenant/check-slug/. So billing got
401, returned "" for the subdomain, published order.placed with
subdomain="", and provisioning rejected every paid checkout with
"invalid subdomain expected=[a-z][a-z0-9-]{2,30}".
Add /tenant/internal/ to the public-paths bypass. Both gateways
already 401 the path externally, and subdomain values are public DNS
names — the documented threat model.
Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
689276889c
|
fix(bp-catalyst-platform+bp-newapi): unblock alice signup gates 2-6 on Sovereigns (#915) (#951)
Six coupled chart + orchestrator fixes that unblock alice marketplace
signup → tenant ready → SaaS integrations → LLM → ledger on a freshly
franchised Sovereign. C5-final got Gate 1 GREEN on otech113 (2026-05-05)
but every downstream gate failed because the SME bundle hardcoded
contabo-only assumptions.
Bumps:
- bp-catalyst-platform 1.4.21 → 1.4.22
- bp-newapi 1.3.0 → 1.4.0
- bootstrap-kit slot 13 + 80 pins updated in lockstep
Issues addressed (single consolidated PR — smaller PRs would race
against alice signup retries):
- #934 (auth SMTP empty → "failed to send email"): sme-secrets.yaml
now reads SMTP_* from `catalyst-system/sovereign-smtp-credentials`
(the same A5-seeded source #883/#905 the chart 1.4.20 catalyst-
openova-kc-credentials Secret already uses) with source-wins
precedence. Both canonical (smtp-host/port/from/user/pass) AND
legacy (host/port/from/user/password) source-Secret key shapes
accepted. Empty source falls back to chart-level defaults so the
contabo path stays clean.
- #940 (provisioning service GITHUB_TOKEN placeholder + hardcoded
upstream github.com): chart values
.Values.smeServices.provisioning.{githubToken,git.{apiURL,owner,
repo,branch}} make every GitHub-API coordinate operator-overridable
with topology-aware defaults (Sovereign ⇒ in-cluster Gitea REST
API + `openova` org; contabo ⇒ api.github.com + `openova-io` org).
Provisioning binary's startup gate validates the GITHUB_TOKEN does
NOT contain placeholder substrings (<placeholder>, PLACEHOLDER,
REPLACE_ME, ...) and exits at startup if it does (the Pod goes into CrashLoopBackOff), so the
operator sees the misconfig immediately instead of after alice
signups have failed silently in service logs. GitHub client now
accepts a custom API URL via NewClientWithAPIURL so Gitea's GitHub-
compatible /api/v1 surface drops in without re-implementing the
client.
- #941 (catalog "27 apps COMING SOON"): added `openclaw` and
`stalwart-mail` to migrateAppDeployable's deployable map at
core/services/catalog/handlers/seed.go. Both blueprints (bp-openclaw,
bp-stalwart-{sovereign,tenant}) ship with visibility=listed in the
embedded blueprints.json AND have working SME-tenant overlay
templates in sme_tenant_gitops.go, but the catalog handler silently
filtered them out because they were missing here. Map extracted to
DeployableAppSlugs() exported function so unit tests can assert
membership without invoking a Mongo store.
- #942 (REDPANDA_BROKERS hardcoded to talentmesh): configmap.yaml
selects broker default at render time based on global.sovereignFQDN
— Sovereign ⇒ NATS JetStream Service per ADR-0001 (the only local
bus on Sovereigns); contabo ⇒ legacy Redpanda Service in talentmesh.
Operator MAY override either default via
.Values.smeServices.eventBus.brokers without forking the chart.
The ConfigMap key name stays REDPANDA_BROKERS for back-compat with
existing SME service Go env wiring; new EVENT_BUS_PROTOCOL key
surfaces the protocol hint for services that want to switch wire
format independently.
- #943 (bp-newapi silently skips Deployment): NEW
templates/cnpg-cluster.yaml auto-provisions a CNPG-backed Postgres
Cluster + Helm-`lookup`-persistent DSN Secret when
.Values.cnpg.enabled (DEFAULT true). NEW templates/credentials-
secret.yaml auto-generates SESSION_SECRET + CRYPTO_SECRET (each
64-char randAlphaNum, persistent across reconciles via Helm
`lookup`) when .Values.credentials.autoProvision (DEFAULT true).
deployment.yaml gate now resolves Secret names from the chart-
emitted defaults when the operator hasn't supplied an override.
Capabilities-gated on postgresql.cnpg.io/v1 so a cold install
before bp-cnpg is Ready surfaces as "no Cluster yet" rather than
a hard install error.
- #944 (CRITICAL — cross-cluster pollution): provisioning.yaml
templates GIT_BASE_PATH from
.Values.smeServices.provisioning.gitBasePath with a topology-aware
default `clusters/<sovereignFQDN>/sme-tenants` on Sovereigns. NEW
`core/services/provisioning/gitguard` package validates at startup
AND on every commit code path that the path begins with
`clusters/<self-FQDN>/` — refusing to commit to any other cluster's
tree. Defence in depth so a runtime env mutation (kubectl exec,
ConfigMap update without Pod restart, hostile sidecar) cannot
bypass the check. Pre-#944 every alice tenant overlay landed in
upstream openova/openova `clusters/contabo-mkt/tenants/<id>/`
which contabo Flux would then install on the contabo cluster —
C5-final caught + reverted the alice2 incident at commit
|
||
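The two startup guards described under #940 and #944 reduce to string checks. The Go sketch below assumes the marker list contains only the three substrings the message names (the real list continues past `...`) and compares the base path after `path.Clean`. `validateGitHubToken` and `validateGitBasePath` are hypothetical names, not the gitguard package API.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// placeholderMarkers holds only the markers named in the commit message.
var placeholderMarkers = []string{"<placeholder>", "PLACEHOLDER", "REPLACE_ME"}

// validateGitHubToken rejects empty or placeholder tokens so the binary can
// exit at startup instead of failing silently on the first signup.
func validateGitHubToken(token string) error {
	if strings.TrimSpace(token) == "" {
		return fmt.Errorf("GITHUB_TOKEN is empty")
	}
	for _, m := range placeholderMarkers {
		if strings.Contains(token, m) {
			return fmt.Errorf("GITHUB_TOKEN contains placeholder %q", m)
		}
	}
	return nil
}

// validateGitBasePath refuses any commit path outside clusters/<selfFQDN>/.
func validateGitBasePath(basePath, selfFQDN string) error {
	want := "clusters/" + selfFQDN + "/"
	got := path.Clean(basePath) + "/" // collapse ".." before comparing
	if !strings.HasPrefix(got, want) {
		return fmt.Errorf("git base path %q is outside %q", basePath, want)
	}
	return nil
}
```

The trailing `/` in the required prefix keeps `clusters/<self-FQDN>.evil/` from passing. Cleaning the path first rejects `..` segments that would climb into another cluster's tree.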
|
|
95a06f56f8
|
fix(sme-marketplace): unblock PIN signin — route /api/* to sme/gateway + add send-pin alias (#868) (#869)
Two-part fix for the marketplace UI sign-in flow, which 503'd then 404'd on
otech103. Live debugging found two stacked bugs.
Part A — chart (HTTPRoute backend):
- marketplace-routes.yaml: /api/* rule now backendRefs sme/gateway:8080
(cross-namespace) instead of catalyst-system/marketplace-api which had
a Service selector matching zero Pods. The gateway in sme already
fronts services-auth, catalog, tenant, billing, provisioning.
- marketplace-reference-grant.yaml: extend `to:` list with the gateway
Service so the cross-ns hop is authorised by Gateway API.
- Bump bp-catalyst-platform 1.4.7 → 1.4.8 + lockstep slot 13 pin.
Part B — services-auth (route name):
- Add /auth/send-pin alias delegating to existing SendMagicLink handler,
and /auth/verify-pin alias delegating to VerifyMagicLink. The
marketplace UI surfaces a 6-digit PIN ("Send PIN" button), so the
PIN-named routes are the canonical UX-facing names. /auth/magic-link
and /auth/verify remain registered for backward compat.
- services-build workflow auto-rebuilds the auth image on push to
core/services/** — no manual dispatch needed.
Refs: #868
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
|
||
|
|
fa4395fa3a
|
fix(bp-catalyst-platform): wire VALKEY_PASSWORD into SME auth + gateway (#863) (#864)
After PR #862 (1.4.4) made cross-ns Valkey reachable from `sme` ns, the auth Pod started CrashLoopBackOff with "NOAUTH HELLO must be called with the client already authenticated". Root cause: bp-valkey 1.0.0 ships auth.enabled=true (bitnami default) but SME service code + Deployment templates never plumbed a password through. Chart 1.4.4 -> 1.4.5. Slot 13 pin lockstep. Changes: - core/services/shared/db/valkey.go: add ConnectValkeyWithAuth overload taking username + password. ConnectValkey kept backwards-compatible for contabo-mkt's auth-less in-namespace Valkey. - core/services/auth/main.go + gateway/main.go: read VALKEY_USERNAME + VALKEY_PASSWORD env, call ConnectValkeyWithAuth when password set, else fall through to no-auth path. - NEW templates/sme-services/valkey-cross-ns-secret.yaml: Helm `lookup` reads bp-valkey's auto-generated `valkey-password` from the `valkey/valkey` Secret and re-emits it as `sme-valkey-auth` in `sme` ns. Same pattern as sme-secrets.yaml (#859) and gitea-admin-secret (#830 Bug 2). On first install the lookup may return nil; Flux's 15m reconcile picks up the mirror once bp-valkey is Ready. - auth.yaml + gateway.yaml: add VALKEY_PASSWORD env from `sme-valkey- auth` Secret with optional=true so contabo-mkt's auth-less path keeps working when the mirror Secret is absent. - values.yaml: add `smeServices.valkey.{sourceSecretName, sourcePasswordKey, destNamespace, destSecretName}` knobs (Inviolable Principle #4). Live verified the failure mode on otech103: 11/13 SME pods Running 1/1, auth in CrashLoopBackOff with NOAUTH HELLO error. Provisioning Pod's CreateContainerConfigError is unrelated (ghcr-pull, separate ticket). Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
5cdb738ac9
|
fix(services): go mod tidy across sibling services after #798 shared deps bump (#821)
#798 added github.com/nats-io/nats.go to core/services/shared/go.mod and adjusted x/sys, x/crypto, and x/text to Go 1.22-compatible versions. The sibling services (auth, catalog, domain, gateway, notification, provisioning, tenant) reference the same shared module via the local `replace` directive — their go.sum files must include the new transitive hashes, otherwise the CI Containerfile build hits: `go: updates to go.mod needed; to update it: go mod tidy`. This commit is a pure `go mod tidy` across all 7 services; no source changes. CI services-build is now unblocked. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
9645a9044a
|
feat(metering): NewAPI NATS publisher + sme-billing subscriber + POST /metering/record (#798) (#818)
* feat(metering): NewAPI NATS publisher + sme-billing subscriber + POST /metering/record (#798) Per #795 [Q-mine-3] (NATS not RedPanda) + [Q-mine-4] (one ledger), add the SME-2 metering integration end-to-end. NewAPI is consumed as the upstream image `ghcr.io/openova-io/openova/newapi-mirror` (a pinned mirror, not a fork) — the metering envelope is produced by a Go sidecar that observes the OpenAI-style `usage.total_tokens` field on every 2xx /v1/* response. This avoids forking the upstream binary while still producing the canonical envelope shape on `catalyst.usage.recorded`. A) NewAPI metering sidecar — core/services/metering-sidecar/ - Transparent reverse proxy in front of NewAPI on its own port; the bp-newapi Service routes the cluster-fronting port to the sidecar, which forwards to NewAPI on the pod's loopback. - Observes successful /v1/* JSON responses, parses `usage.{prompt_tokens,completion_tokens,total_tokens}`, computes amount_micro_omr = -tokens * priceMicroOMRPerToken, and publishes one envelope on `catalyst.usage.recorded` per completed request. - Failed (non-2xx), non-JSON, and admin-path requests are NOT billed. - Customer-facing latency is NEVER blocked on metering: the response body is restored before publish; on NATS unreachable the envelope is persisted to disk and retried by a background drain loop. - 14 unit tests (proxy + publisher + safeFilename guards). B) sme-billing NATS subscriber — core/services/billing/handlers/ metering_consumer.go - JetStream durable consumer `sme-billing-metering` on stream `CATALYST_USAGE` (provisioned by sme-billing on startup). - Idempotent on metadata.request_id via a UNIQUE partial index on credit_ledger.external_ref; redelivery from the broker collapses to a single ledger row. - Customer auto-create on cold start (the rbac sme.user.created envelope may land AFTER the first metered request; we don't strand usage waiting for it). 
- 11 unit tests covering happy-path, idempotency, malformed-payload poison-pill, missing-request-id, non-negative amount guard, resolver error → Nak, derive-micro-OMR-from-OMR, DB-error → Nak. C) HTTP handler POST /billing/metering/record — handlers/metering.go - Synchronous validate → INSERT credit_ledger → return {ledger_entry_id, balance_after_omr, balance_after_micro_omr, duplicate}. Same payload + idempotency guard as the NATS path. - Auth: superadmin OR sovereign-admin (operator-admin model; end-user LLM traffic flows through the sidecar, never this URL). - 8 unit tests covering happy-path, idempotency, role gating, malformed-JSON, positive-amount rejection, customer-not-found. D) Schema — core/services/billing/store/store.go - ALTER TABLE credit_ledger ADD COLUMN amount_micro_omr BIGINT (1 OMR = 1,000,000 micro-OMR; -0.000234 OMR = -234 micro-OMR exact integer — preserves precision at metering rates). - ADD COLUMN external_ref TEXT + UNIQUE partial index for idempotency dedup. - ADD COLUMN metadata JSONB for the raw envelope. - GetCreditBalance projects both amount_omr (legacy) and amount_micro_omr (new) into the integer-OMR view. - GetCreditBalanceMicroOMR returns canonical precision. - RecordUsage method: ON CONFLICT DO UPDATE … RETURNING (xmax<>0) distinguishes fresh insert from duplicate without a follow-up SELECT. E) Wiring - core/services/shared/events/nats.go — minimal NATS JetStream publisher + subscriber surface; legacy RedPanda producer/consumer in events.go untouched per [Q-mine-3]. - core/services/billing/main.go — NATS_URL env; subscriber wired in parallel with the existing RedPanda tenant-events consumer. - middleware/jwt.go — exported test helper WithClaims so handler tests can construct an authenticated context without minting a real signed token. - .github/workflows/services-build.yaml — metering-sidecar added to the build matrix; deploy job skips it (image consumed by the bp-newapi chart, not products/catalyst sme-services). 
F) bp-newapi chart (1.0.0 → 1.1.0) - meteringSidecar block in values.yaml: image, port, NATS URL, priceMicroOMRPerToken (default 156 = 0.000156 OMR/token), spool dir, header names, resources, securityContext (read-only-rootfs). - deployment.yaml renders the sidecar container + emptyDir spool volume when meteringSidecar.enabled (default true). - service.yaml routes the cluster-fronting :3000 to the sidecar when enabled, exposes a separate :3001 → NewAPI direct port for bp-catalyst-platform admin-API traffic (ADR-0003 §3.2). - networkpolicy.yaml allows the sidecar's port + nats-system egress for JetStream publish. Tests: 33 new (14 sidecar + 11 subscriber + 8 HTTP handler), all green. Helm template renders cleanly with sidecar enabled and disabled. Closes #798 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(billing/store): cast SUM to BIGINT so lib/pq scans into int64 (#798) Postgres returns `SUM(int) + SUM(bigint)/integer` as `numeric`, which lib/pq presents as a `[]uint8` decimal string ("50.000000000000000000000000") that does NOT scan directly into Go int64 — the integration test TestVoucherLifecycle_IssueRedeemAndCreditApplied caught this in CI on the post-redeem balance read. Wrap the SUM expressions in CAST(... AS BIGINT) so the column type is unambiguously bigint and Scan target stays uniform across pre-#798 rows (amount_omr only) and post-#798 rows (amount_micro_omr present). Affects: - GetCreditBalance - GetCreditBalanceMicroOMR - RecordUsage's running-balance read Test mocks updated to match the new SQL prefix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
2a034a0959
feat(catalog): unified catalog with Published flag — operator curates marketplace (#710 wave 2) (#724)
Single source of truth for apps; Sovereign-console operator decides which
apps marketplace customers see; marketplace storefront filters by
Published. Per founder rule 2026-05-04: unpublish is a marketplace-
visibility toggle, not a deployment-lifecycle action — existing tenant
deployments of an unpublished app keep running unaffected.
core/services/catalog/store/store.go
====================================
- App.Published bool — operator-controlled visibility
- ListPublishedApps: marketplace-storefront subset
(Published=true AND System=false AND Deployable=true).
System and Deployable are catalog-team-controlled; Published is the
operator's curation knob.
- SetAppPublished(slug, bool) — hot-path one-bit write the Sovereign
console hits per row toggle. Cheaper than UpdateApp; slug-keyed so
the UI doesn't need the internal Mongo _id.
- UpdateApp: thread published through full-update path too.
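The storefront predicate behind ListPublishedApps is a pure function of three booleans. A minimal Go sketch using a trimmed, hypothetical `App` type rather than the real store struct; the real store presumably expresses the same predicate as a database query instead of filtering in memory:

```go
package main

// App is a trimmed, hypothetical stand-in for the catalog App type; only
// the three visibility flags named in the commit message are modelled.
type App struct {
	Slug       string
	Published  bool // operator-controlled curation knob
	System     bool // catalog-team-controlled
	Deployable bool // catalog-team-controlled
}

// visibleInMarketplace applies the storefront predicate:
// Published=true AND System=false AND Deployable=true.
func visibleInMarketplace(a App) bool {
	return a.Published && !a.System && a.Deployable
}

// listPublished filters an in-memory slice with that predicate.
func listPublished(apps []App) []App {
	out := make([]App, 0, len(apps))
	for _, a := range apps {
		if visibleInMarketplace(a) {
			out = append(out, a)
		}
	}
	return out
}
```

Because Published is only one of three conjuncts, an operator cannot surface a System or non-Deployable app by toggling it.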
core/services/catalog/handlers/handlers.go + routes.go
======================================================
- ListApps now honours ?published=true query param:
GET /catalog/apps → operator view: every app
GET /catalog/apps?published=true → marketplace view: filtered
- New PATCH /catalog/admin/apps/{slug}/publish?value={true|false}
for the Sovereign-console operator's row toggle.
- requireAdmin gating preserved on the admin endpoint.
core/services/catalog/handlers/seed.go
======================================
- migrateAppPublished: defaults Published=true on every existing app
on the day Catalyst 1.3.x ships. Operators opt OUT of marketplace
visibility per app, not IN — matches how a real SaaS storefront is
curated and prevents an empty marketplace on flag-introduction day.
Idempotent on re-run.
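One way to make an opt-out default both idempotent and safe for later operator choices is to touch only documents where the field was never set, so an explicit `false` survives a re-run. This is a sketch under that assumption (in Mongo terms, roughly a filter on the field not existing); the `storedApp` type and this `migrateAppPublished` body are hypothetical, not the seed.go code:

```go
package main

// storedApp is a hypothetical persisted shape in which Published is nil on
// documents written before the flag existed.
type storedApp struct {
	Slug      string
	Published *bool
}

// migrateAppPublished sets Published=true only where it is missing, and
// returns how many documents it changed. A second run changes nothing.
func migrateAppPublished(apps []storedApp) (changed int) {
	for i := range apps {
		if apps[i].Published == nil {
			t := true
			apps[i].Published = &t
			changed++
		}
	}
	return changed
}
```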
core/marketplace/src/lib/api.ts
================================
- getApps() now hits /catalog/apps?published=true so the marketplace
storefront only renders the operator-curated subset.
DoD pending wave 2.5
====================
The Sovereign-console "Catalog & publishing" admin page (per-row
toggle UI) is the next chunk and ships in a follow-up — backend +
storefront filter are the load-bearing change here. Catalog admins
can flip the flag today via the PATCH endpoint; the per-row UI is
quality-of-life on top.
Co-authored-by: hatiyildiz <hatiyildiz@openova.io>
9519c1ef00
merge: Group L testing (Playwright e2e smoke tests, Hetzner provisioning test scaffold gated on HETZNER_TEST_TOKEN secret, integration tests for bootstrap installer + Dynadot + voucher)
7edf63ca7e
docs(franchise),test(billing): voucher CRD propagation invariant
#118 verifies that the voucher shape on a franchised Sovereign is identical to Catalyst-Zero. Two artefacts:
1. New §"Voucher shape propagates automatically" in docs/FRANCHISE-MODEL.md explaining WHY there is no propagation problem to solve: vouchers are not a CRD. They are rows in the per-Sovereign billing service's Postgres database, and every Sovereign runs the same SHA-pinned core/services/billing image. Same image → same migration → same schema → same handlers → same shape. The doc lists which file owns each part of the shape and includes a 4-step curl smoke test to run on any Sovereign at first provisioning to confirm the invariant holds.
2. New core/services/billing/handlers/vouchers_test.go covering the public POST /billing/vouchers/redeem-preview endpoint added in #117. Four cases:
   - 404 on an unknown or soft-deleted code (no tombstone leak)
   - 200 on a valid live code, asserting that the public shape excludes times_redeemed and max_redemptions (defence-in-depth against enumeration)
   - 410 Gone on a code that exists but has hit its cap, with the credit/description still in the response so the landing page can show "campaign ended"
   - 400 on whitespace-only input
The tests run on every CI build of the billing service, on every Sovereign that builds from this repo. If a future change drifts the preview endpoint's shape, the tests fail before the regression can ship.
Also tidies vouchers.go imports (removed two unused stdlib imports that were placeholders).
Closes #118.
12387a4a74
feat(billing): /billing/vouchers/{issue,list,revoke,redeem-preview} surface
#117 adds a franchise-aligned URL surface for the existing PromoCode voucher implementation, plus one new endpoint (redeem-preview) for the public landing flow described in docs/FRANCHISE-MODEL.md §3. The orchestrator's hint was right: the issue/list/revoke handlers already exist (AdminUpsertPromo / AdminListPromos / AdminDeletePromo on the legacy /billing/admin/promos surface). This commit:
1. Adds new endpoint handlers in core/services/billing/handlers/vouchers.go:
   - POST /billing/vouchers/issue (superadmin or sovereign-admin)
   - GET /billing/vouchers/list (superadmin or sovereign-admin)
   - DELETE /billing/vouchers/revoke/{code} (superadmin or sovereign-admin)
   - POST /billing/vouchers/redeem-preview (unauthenticated; public)
   The first three reuse the existing store-layer methods. The last is new: it validates a code without consuming it and returns a safe shape (no times_redeemed, no max_redemptions) so an attacker scraping the public endpoint cannot enumerate cap status.
2. Distinguishes 404 (code never existed or was soft-deleted; the same tombstone-leak protection as #91) from 410 Gone (code exists but is inactive or capped). The 410 body still includes the credit and description so the landing page can show "this campaign has ended".
3. Keeps the legacy /billing/admin/promos endpoints in place, so the existing admin UI continues to work without any breaking change. New code should target /billing/vouchers/...
4. Updates docs/FRANCHISE-MODEL.md to point to the new URL surface.
The actual REDEMPTION still happens transactionally inside POST /billing/checkout via the `promo_code` field: that path locks the promo row, inserts the promo_redemptions edge, increments times_redeemed, and adds the credit_ledger entry in one transaction. Splitting it into a separate /redeem endpoint would break that atomicity, so we deliberately do not add one. The public redeem flow is preview → signup → checkout-with-promo_code.
Closes #117.
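The status mapping for redeem-preview can be summarised as a small decision function. This sketch assumes a lookup that reports `Exists=false` for both never-issued and soft-deleted codes, and treats `MaxRedemptions == 0` as uncapped; `voucherState` and `previewStatus` are hypothetical names:

```go
package main

import (
	"net/http"
	"strings"
)

// voucherState is a hypothetical summary of what the store found for a code.
type voucherState struct {
	Exists         bool // false for never-issued AND soft-deleted codes
	Active         bool
	TimesRedeemed  int
	MaxRedemptions int // 0 means uncapped in this sketch
}

// previewStatus maps a lookup result to the HTTP status described for
// POST /billing/vouchers/redeem-preview.
func previewStatus(code string, lookup func(string) voucherState) int {
	code = strings.TrimSpace(code)
	if code == "" {
		return http.StatusBadRequest
	}
	v := lookup(code)
	switch {
	case !v.Exists:
		// Same answer for unknown and soft-deleted codes: no tombstone leak.
		return http.StatusNotFound
	case !v.Active, v.MaxRedemptions > 0 && v.TimesRedeemed >= v.MaxRedemptions:
		return http.StatusGone
	default:
		return http.StatusOK
	}
}
```

Preview never mutates anything, which is why the actual redemption can stay inside the checkout transaction.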
3e956b7d81
test: voucher issuance integration test — real Postgres (#147)
Closes the Group L "integration test — voucher issuance via API — issue → redeem → Org created path" ticket.
Per docs/INVIOLABLE-PRINCIPLES.md principle #2 (no mocks where the test would otherwise verify real behavior), this test runs against a real PostgreSQL, not sqlmock. The voucher mechanic lives in store.RedeemPromoCode, which runs a transaction with SELECT FOR UPDATE on promo_codes, a COUNT lookup on promo_redemptions, and an insert into credit_ledger. Mocking SQL strings doesn't verify whether the transactional invariants actually hold under concurrent contention; this codebase has been bitten by exactly that gap before (#93: counter incremented before order was committed).
The test is gated on BILLING_TEST_PG_URL: when unset, it skips (it does NOT fall back to mocks). CI populates it via the new postgres service container in .github/workflows/test-billing-integration.yaml.
Each test gets its own Postgres schema (via CREATE SCHEMA + libpq's options=-c search_path) so parallel runs don't cross-contaminate, and so goroutine concurrency tests reliably hit the same schema regardless of which pooled connection they pick up.
Coverage:
- Issue → Redeem → Credit applied (the canonical happy path)
- Per-customer double-redemption blocked
- Redemption cap enforced under concurrency (12 goroutines fighting for a 5-cap voucher → exactly 5 successful redemptions, no more)
- Soft-deleted codes rejected as "not found" (no tombstone leak per #91)
- Inactive codes rejected with a distinct "not active" error
- Two different customers can each redeem the same voucher
- Org-creation prerequisites: customer.tenant_id non-empty, balance > 0 (these are the inputs the downstream tenant.created event consumer feeds into CreateTenant, covered by tenant-service consumer_test.go)
CI workflow added: .github/workflows/test-billing-integration.yaml runs the tests against a postgres:16-alpine service container with -race.
Refs #147
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
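The invariant the concurrency case checks (12 racers, cap 5, exactly 5 winners) holds only because the cap check and the increment happen under one lock. The in-process analogy below uses a `sync.Mutex` in place of the row lock taken by SELECT FOR UPDATE; it illustrates the invariant and is deliberately not a substitute for the real-Postgres test, which is the commit's whole point. `promoRow` and `raceRedeemers` are hypothetical:

```go
package main

import "sync"

// promoRow stands in for a promo_codes row; mu plays the role of the row
// lock that SELECT ... FOR UPDATE takes inside the redemption transaction.
type promoRow struct {
	mu             sync.Mutex
	maxRedemptions int
	timesRedeemed  int
}

// redeem checks the cap and increments while holding the lock, so two
// redeemers can never both pass the check for the last slot.
func (p *promoRow) redeem() bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.timesRedeemed >= p.maxRedemptions {
		return false
	}
	p.timesRedeemed++
	return true
}

// raceRedeemers launches n goroutines against one row and counts the wins.
func raceRedeemers(p *promoRow, n int) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	wins := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if p.redeem() {
				mu.Lock()
				wins++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return wins
}
```

Moving the increment outside the lock (check, unlock, then increment) reproduces the #93-style race, which is exactly what a sqlmock-based test could not detect.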
fabedd42c1
feat(admin,billing): per-Sovereign voucher issuance for sovereign-admin
#115 extends the existing PromoCode (voucher) admin surface so a sovereign-admin role can issue, list, and revoke vouchers on a franchised Sovereign. No new endpoints, no new schema, no new CRD: all the changes are role-gating widenings on the existing surface.
Backend (core/services/billing/handlers/handlers.go):
- New `requireVoucherIssuer` helper accepts both `superadmin` and `sovereign-admin`. It is used by AdminListPromos, AdminUpsertPromo, and AdminDeletePromo only. All other admin endpoints (Stripe settings, revenue, orders) keep the existing `requireAdmin` (superadmin-only).
UI (core/admin/src/components/AdminShell.svelte + BillingPage.svelte):
- AdminShell now accepts both roles. Sidebar nav is filtered by role: superadmin sees Revenue / Catalog / Tenants / Orders / Billing; sovereign-admin sees only Billing. Filtering is via a `superadminOnly` flag on each nav item (defence-in-depth: even if a sovereign-admin guesses a URL, the backend's requireAdmin will return 403).
- BillingPage hides the Stripe Configuration section for sovereign-admin (it would 403 from GET /billing/admin/settings anyway). The Vouchers (Promo Codes) section is shown to both roles with a small label tweak ("Issued vouchers are scoped to this Sovereign" for sovereign-admin).
Per docs/INVIOLABLE-PRINCIPLES.md §1 (target-state shape, no MVP) and §3 (follow documented architecture exactly), this matches the FRANCHISE-MODEL.md design where "every franchised Sovereign runs the same admin app" with role-based gating.
Closes #115.
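The backend change amounts to two role predicates: a widened one for voucher endpoints and the unchanged superadmin-only one for everything else. A minimal sketch; the real helpers presumably read the role from JWT claims on the request, whereas these hypothetical functions take the role string directly:

```go
package main

import "net/http"

// canIssueVouchers mirrors the widening described for requireVoucherIssuer:
// both superadmin and sovereign-admin may manage vouchers.
func canIssueVouchers(role string) bool {
	return role == "superadmin" || role == "sovereign-admin"
}

// isAdmin mirrors requireAdmin, which stays superadmin-only.
func isAdmin(role string) bool {
	return role == "superadmin"
}

// gate returns the status a handler would send before doing any work.
func gate(role string, allowed func(string) bool) int {
	if !allowed(role) {
		return http.StatusForbidden
	}
	return http.StatusOK
}
```

A sovereign-admin passes the voucher gate but still gets 403 from the Stripe-settings gate, which is the backend half of the defence-in-depth the nav filtering relies on.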
7646840ffe
feat(consolidation): move 8 SME backend services + shared module to public repo
Per docs/PROVISIONING-PLAN.md and the [B] sme-backend ticket group. Migrates the 8 Go backend services from openova-private/services/ to openova/core/services/, plus the shared module they all depend on, plus the services-build CI workflow.
What moved:
- services/auth → core/services/auth (Go HTTP service for SME marketplace authentication)
- services/billing → core/services/billing (Go HTTP service for billing + voucher backend)
- services/catalog → core/services/catalog (Go HTTP service for App catalog)
- services/domain → core/services/domain (Go HTTP service for tenant domain mapping)
- services/gateway → core/services/gateway (Go HTTP gateway with rate limiting)
- services/notification → core/services/notification (Go HTTP service with email templates)
- services/provisioning → core/services/provisioning (Go HTTP service that commits tenant Application manifests via Gitea/GitHub API)
- services/tenant → core/services/tenant (Go HTTP service for tenant lifecycle)
- services/shared → core/services/shared (shared Go module: db, events, health, middleware, respond)
- 9 go.mod files updated: module github.com/openova-io/openova-private/services/<X> → github.com/openova-io/openova/core/services/<X>
- 9 go.sum and import paths similarly updated
- replace directives updated: openova-private/services/shared → openova/core/services/shared
- sme-services-build.yaml workflow → services-build.yaml in .github/workflows/, paths/context/image-base/deploy paths all repointed at core/services + ghcr.io/openova-io/openova/services-* + products/catalyst/chart/templates/sme-services
- All 8 manifests in products/catalyst/chart/templates/sme-services/ updated: image refs ghcr.io/openova-io/openova-private/sme-{X} → ghcr.io/openova-io/openova/services-{X}
- provisioning.yaml GITHUB_REPO env var: "openova-private" → "openova"
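The module-path and image-ref changes listed above are mechanical prefix rewrites. The commit does not say what tooling performed them; the sketch below shows how the mapping could be expressed, with hypothetical helper names and prefixes copied from the list:

```go
package main

import "strings"

const (
	oldModulePrefix = "github.com/openova-io/openova-private/services/"
	newModulePrefix = "github.com/openova-io/openova/core/services/"
	oldImagePrefix  = "ghcr.io/openova-io/openova-private/sme-"
	newImagePrefix  = "ghcr.io/openova-io/openova/services-"
)

// rewriteModulePath maps an old module, import, or replace path to its new
// location; paths outside the old tree pass through unchanged.
func rewriteModulePath(p string) string {
	if rest, ok := strings.CutPrefix(p, oldModulePrefix); ok {
		return newModulePrefix + rest
	}
	return p
}

// rewriteImageRef maps ghcr.io/.../openova-private/sme-{X} image refs to
// ghcr.io/.../openova/services-{X}, keeping any tag or digest suffix.
func rewriteImageRef(ref string) string {
	if rest, ok := strings.CutPrefix(ref, oldImagePrefix); ok {
		return newImagePrefix + rest
	}
	return ref
}
```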
Closes [B] sme-backend (10 tickets).
After this commit, all 14 user-facing + backend Catalyst-Zero modules build from this public repo:
- 4 UIs: console, admin, marketplace, catalyst-ui
- 2 backends: marketplace-api, catalyst-api
- 8 SME services: auth, billing, catalog, domain, gateway, notification, provisioning, tenant
- 1 shared Go module
Note: 1 line in core/services/provisioning/main.go retains a literal default of "openova-private" for the GITHUB_REPO fallback when env var is unset; the K8s manifest sets GITHUB_REPO=openova explicitly so this path is never exercised in the deployed runtime, and the in-code default will be cleaned up in a follow-up.