openova

Author	SHA1	Message	Date
e3mrah	8878938a43	fix(ci): bump sme-services Containerfiles golang 1.22 → 1.26 (unblock 5 stranded fixes) (#1691) Every services-build run since 2026-05-18 06:32 UTC failed with "go: go.mod requires go >= 1.26.0 (running go 1.22.12; GOTOOLCHAIN=local)" because a recent go.mod bump to `go 1.26.0` was not paired with a Containerfile base-image bump. 5 stranded fixes that never produced new image SHAs: - PR #1683 fix(billing): consume catalyst.usage.recorded from CATALYST_SME stream (was creating overlapping CATALYST_USAGE) - PR #1684 fix(provisioning): set Organization.spec.tenantPublic - PR #1685 fix(catalog+billing): Sandbox Free/Pro/Ent plans + quota - PR #1686 feat(sandbox): orchestrator listens tenant.sandbox_requested - test(sandbox): integration tests for orchestrator + sessions API The stranded billing image is the root cause of every voucher 502 on t22 and blocks the full marketplace customer journey (steps 9, 10, 15 all fail). t22 billing Pod is in CrashLoopBackOff with the exact NATS subject-overlap signature PR #1683 fixes. Bumps all 10 service Containerfiles (auth/billing/catalog/catalyst-catalog/domain/gateway/metering-sidecar/notification/provisioning/tenant) to golang:1.26-alpine, matching the toolchain in go.mod. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 16:36:39 +04:00
e3mrah	8017700ad4	feat(sandbox): tier-bound MCP capabilities (Free/Pro/Ent plans gate tool access) (#1690 ) Stop handing every Sandbox session the full MCP surface. Each per-Sandbox NewAPI token now carries a plan-derived capability allowlist that the MCP server enforces against per-tool RequiredCapability via Claims.HasCapability: - Free: read-only k8s + gitea read + session/rag/skills - Pro: + sandbox.db.* + sandbox.storage.* + sandbox.preview.* + sandbox.auth.* + sandbox.secrets.* + marketplace.* + flux.status - Ent: + sandbox.deploy.{staging,production,...} + sandbox.stripe.* + flux.{reconcile,suspend,resume} + gitea.pr.{create,merge} + gitea.issue.* Wiring: - Sandbox CRD spec gains planId + capabilities[] (operator overlay). - Sandbox sandboxapi.{CapabilitiesForPlan,ResolveCapabilities} is the SoT; tenant orchestrator carries an exact-mirror capabilitiesForPlan (no controllers-module dep — same isolation pattern quotaForPlan uses). - sandbox-controller threads spec.capabilities (falling back to plan) into newapi.MintRequest. - catalyst-api bridge handler accepts capabilities[] on the wire and encodes it as the JWT `capabilities` claim (omitted when empty). - Claims.HasCapability gains wildcard prefix matching (`sandbox.db.*` satisfies `sandbox.db.provision`, `sandbox.db`, etc.) so plan grants stay coarse. Plain stem matches WITHOUT a wildcard are intentionally rejected — the production second-gate in sandbox_deploy.go stays honest. - MCP registry: every gated tool now carries its granular dotted RequiredCapability (`sandbox.db.provision`, `gitea.pr.list`, …). Read-only / session tools previously ungated also get granular grants so Free tokens can browse without inheriting the write surface. No Chart.yaml bump — CRD additions are additive; existing Sandbox CRs parse fine. Empty token capabilities downgrades to introspection only, matching pre-PR-#1671 callers. 
Tests: shared/auth/claims_test.go (wildcard matrix), sandboxapi/capabilities_test.go (plan ladder + spec override), sandbox_token_test.go (capabilities round-trip + omit-on-empty), sandbox_controller_test.go (plan-derived + spec-override mint), sandbox_consumer_test.go (orchestrator stamps spec.capabilities), plus updates to every per-namespace registry test asserting new granular RequiredCapability values. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 16:30:00 +04:00
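The wildcard rule described in #1690 can be sketched roughly as follows. This is a minimal illustration, not the code in shared/auth/claims.go: the function name `hasCapability` and its signature are hypothetical, and only the three behaviours stated in the commit message (exact match, `stem.*` covering the stem and its children, bare stems not covering children) are modelled.

```go
package main

import (
	"fmt"
	"strings"
)

// hasCapability is a hypothetical stand-in for Claims.HasCapability.
// A grant matches when it equals the required capability, or when it ends
// in ".*" and the required capability is the stem itself or lies under it.
// A bare stem without the wildcard never covers its children.
func hasCapability(grants []string, required string) bool {
	for _, g := range grants {
		if g == required {
			return true
		}
		if strings.HasSuffix(g, ".*") {
			stem := strings.TrimSuffix(g, ".*")
			if required == stem || strings.HasPrefix(required, stem+".") {
				return true
			}
		}
	}
	return false
}

func main() {
	pro := []string{"sandbox.db.*", "flux.status"}
	for _, need := range []string{"sandbox.db.provision", "sandbox.db", "flux.reconcile"} {
		fmt.Printf("%-22s allowed=%v\n", need, hasCapability(pro, need))
	}
}
```

Requiring the `.` separator after the stem keeps a grant such as `sandbox.db.*` from accidentally covering an unrelated name like `sandbox.dbx`.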
e3mrah	ffb79aab12	fix(billing): consume catalyst.usage.recorded from CATALYST_SME stream (was creating overlapping CATALYST_USAGE) (#1683 ) t20 (2026-05-18) caught the bug: billing crashed at startup with NATS error code 10065 "subjects overlap with an existing stream" because CATALYST_SME (subjects `catalyst.>`, created by the tenant / provisioning MultiSubscribers) had already claimed `catalyst.usage.recorded` by the time billing tried to create CATALYST_USAGE (subject `catalyst.usage.recorded`). JetStream forbids two Streams from owning overlapping subject filters. Option B per the matrix: have billing share CATALYST_SME and scope its metering reads via a consumer-side FilterSubject instead of owning a separate Stream. This matches the architecture every other SME service (tenant, notification, provisioning) already uses for catalyst.* events. Changes: - core/services/shared/events/nats.go: add EnsureCatalystSMEStream (public wrapper around the existing package-private ensureSMEStream helper used by NewMultiSubscriber) + SubscribeUsageRecordedOnSME (durable consumer on CATALYST_SME with FilterSubject scoped to catalyst.usage.recorded). The original EnsureUsageStream and SubscribeUsageRecorded are retained but marked Deprecated for back-compat with any Catalyst-Zero / dev loop wired before t20. - core/services/billing/main.go: replace the EnsureUsageStream call with EnsureCatalystSMEStream and the SubscribeUsageRecorded call with SubscribeUsageRecordedOnSME. Comment captures the t20 root cause + the bootstrap-order rationale so the next reader doesn't re-introduce the dedicated Stream. The consumer-side FilterSubject (`catalyst.usage.recorded`) lives in core/services/shared/events/nats.go inside SubscribeUsageRecordedOnSME. go build + go test clean for core/services/billing and core/services/shared. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 14:53:25 +04:00
e3mrah	3acb340b36	test(sandbox): integration tests for orchestrator + sessions API status reflection (#1680 ) Adds regression coverage so the Sandbox event flow + REST surface can be exercised without a live Sovereign — the convergence loop the qa-loop's last 5 iterations relied on. Tenant orchestrator (5 cases / 8 runs): * full event flow — tenant.sandbox_requested envelope → in-process BrokerSubscriber → SandboxOrchestrator.Start → recordingSandboxClient materialises a CR shaped per architecture.md §7 (labels, annotations, spec.owner/quota/agentCatalogue/planId) * NATS-style redelivery is idempotent — second Emit() goes Get(found) → no-op, Create count stays at 1 * plan tiers fan out — free/pro/ent each stamp the right quota (catches the PR #1633 regression) * non-sandbox event types ignored at the dispatcher seam * agentCatalogue strips empty / whitespace entries before persist Catalyst sessions API (7 cases / 10 runs): * POST → GET round-trip through a dynamic/fake apiserver via SetSovereignDepsFactory (mirrors chroot Sovereign "Path 2") * GET reflects controller status (sessions / storage / spend / previews / conditions) into the FE wire shape * Failed condition taxonomy — TokenMintFailed, GitopsWriteFailed, ManifestRenderFailed each preserved verbatim so the FE renders actionable error states instead of a generic red pill * POST invalid-agent returns 400 before any apiserver call * GET unknown sandbox returns 404 sandbox-not-found * LIST → DELETE → LIST round-trip * Org-scope isolation — claims.Org-scoped namespace boundary blocks cross-Org leak Hard rules followed: READ-ONLY fake clients (no apiserver write), no chart bump, no production code changes — only new _test.go files. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 14:41:41 +04:00
e3mrah	96d2d9bce7	fix(provisioning): set Organization.spec.tenantPublic on product-install (was empty; HTTPRoute reconciler had nothing to render) (#1650 ) PR #1644 added Organization.spec.tenantPublic + per-tenant HTTPRoute reconciler, but nothing set the field — every Org CR's TenantPublic stayed zero-value, the reconciler short-circuited at the empty ParentDomain guard, and `<slug>.omani.homes` 404'd at the Cilium Gateway. Wire the patch at the only point that knows a tenant's product is actually Ready: the provisioning service. Both the initial workflow (`provision.completed`) and the day-2 install path (`provision.app_ready`) now patch the Organization CR's spec.tenantPublic with parentDomain (from TENANT_PARENT_DOMAIN env), subdomain (= slug), backendService (canonical vcluster-synced name), port 80, and the picked product slug. Last-write-wins on subsequent installs. Per docs/INVIOLABLE-PRINCIPLES.md #4 the parent zone flows through env, never hardcoded — every Sovereign picks its own pool zone. Empty env disables the patch entirely (legacy tenants keep working through the Sovereign-wide tenant-wildcard route). Best-effort: failures don't fail the provision. 404 on the CR is benign (legacy tenant without an Organization counterpart). Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 11:44:00 +04:00
e3mrah	8888d9edd1	feat(catalog+billing): Sandbox Free/Pro/Ent plans + quota wire (was no plans = broken checkout) (#1642 ) PR #1633 added the Sandbox app to seedApps but never wired the matching plan rows. The marketplace checkout hit "plan_id not found" the moment a customer picked Sandbox, and PR #1639's sandbox-orchestrator could only mint CRs with the Wave 1 baseline quota regardless of the picked tier. This PR closes both gaps in lockstep: Catalog: - Plan struct gets ProductSlug + IncludedQuotas fields (back-compat: omitempty BSON tags so legacy rows decode fine). - expectedSandboxPlans() helper canonical-defines the three tiers: sandbox-free 0 OMR 1 session, 1 agent, 5 GB, BYOS sandbox-pro 9 OMR 3 sessions, 6 agents, 50 GB, BYOS (Popular) sandbox-ent 49 OMR unlimited, 6 agents, 500 GB, BYOS - seedAllData appends them on fresh seed; seedMissingSandboxPlans backfills them on already-populated Sovereigns (idempotent GET-then- create, patches missing ProductSlug/IncludedQuotas on legacy rows). - UpdatePlan persists the two new fields. Sandbox orchestrator wiring: - SandboxRequestedPayload.PlanID added; CreateOrg forwards body.PlanID. - buildSandbox stamps openova.io/plan-id annotation + spec.planId when PlanID is non-empty. - quotaForPlan() maps sandbox-{free,pro,ent} → SandboxQuota; empty or unknown plan_id falls through to DefaultQuota (Wave 1 baseline = Sandbox Free shape). Hard-coded map mirrors catalog IncludedQuotas so tenant-service avoids a compile-time dep on the catalog mongo stack. Tests: - TestExpectedSandboxPlans_Shape locks slugs, prices, quota keys, the Popular flag (sandbox-pro), and the quota ladder. - TestSandboxHandle_PlanIDStampsAnnotationAndQuota table-test exercises all three tiers end-to-end (annotation + spec.planId + spec.quota). - TestSandboxHandle_PlanIDEmptyKeepsDefaultQuota guards back-compat with pre-PR publishers. - TestSandboxHandle_PlanIDUnknownFallsBackToDefault guards typo'd / retired plan IDs. 
go build + go test clean for catalog, tenant, billing, provisioning, shared, marketplace-api. No Chart.yaml bump, no cluster touch. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 11:31:25 +04:00
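The plan-to-quota fallback described in #1642 might look something like the sketch below. The struct fields, the use of -1 for "unlimited", and the choice of the Free tier as the default are assumptions made for illustration (the message only says the default is the "Sandbox Free shape"); the session, agent, and storage figures are the ones listed for the three tiers.

```go
package main

import "fmt"

// sandboxQuota is a hypothetical shape; the real SandboxQuota fields are
// not shown in the commit message.
type sandboxQuota struct {
	Sessions  int // -1 encodes "unlimited" in this sketch only
	Agents    int
	StorageGB int
}

// planQuotas mirrors the tier figures quoted in the commit message.
var planQuotas = map[string]sandboxQuota{
	"sandbox-free": {Sessions: 1, Agents: 1, StorageGB: 5},
	"sandbox-pro":  {Sessions: 3, Agents: 6, StorageGB: 50},
	"sandbox-ent":  {Sessions: -1, Agents: 6, StorageGB: 500},
}

// defaultQuota is assumed here to equal the Free tier.
var defaultQuota = planQuotas["sandbox-free"]

// quotaForPlan returns the tier quota, falling back to the default for an
// empty or unknown plan ID so older publishers and typo'd IDs still work.
func quotaForPlan(planID string) sandboxQuota {
	if q, ok := planQuotas[planID]; ok {
		return q
	}
	return defaultQuota
}

func main() {
	for _, id := range []string{"sandbox-pro", "", "sandbox-typo"} {
		fmt.Printf("%q -> %+v\n", id, quotaForPlan(id))
	}
}
```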
e3mrah	4c83d98765	feat(sandbox): orchestrator listens tenant.sandbox_requested → Sandbox CR materialisation (#1639 ) PR #1633 wired CreateOrg to publish `tenant.sandbox_requested` when the marketplace cart includes the sandbox product. Nobody was subscribing — the event landed in NATS `catalyst.tenant.sandbox_requested` and aged out unread, so no Sandbox CR (PR #1622) was ever minted and the customer sat on a "Provisioning…" spinner forever. This slice closes the loop. A new SandboxOrchestrator in tenant-service: - Subscribes via events.MultiSubscriber (PR #1636) to the canonical NATS subject + legacy Kafka topic. - Parses {tenant_id, org_slug, owner_id, owner_email, agents, sovereign, requested_at} and resolves the owner email (event field → store.GetMemberEmail → owner_id fallback). - Materialises a Sandbox CR in catalyst-system (SANDBOX_NAMESPACE override) via a dynamic client, with spec per architecture §7: owner.email + owner.orgRef.slug, default quota (4 CPU / 8 Gi / 50 Gi / 3 sessions), spec.agentCatalogue from the cart. - Idempotent: Get-then-Create with AlreadyExists swallowed so NATS redeliveries + duplicate marketplace submits stay no-ops; the sandbox-controller remains SoR for spec mutations. Wiring in main.go is best-effort — when no in-cluster config nor KUBECONFIG is available (CI / dev loops) the orchestrator is skipped with a Warn; the rest of the tenant service still boots. Hard rules: no chart bump, no cluster writes outside of the Sandbox Create call (sandbox-controller reconciles the rest), `go build ./...` clean, `go test ./...` clean. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 11:09:22 +04:00
e3mrah	72f82ea7f2	fix(sme): wire provisioning/notification/domain consumers to NATS (was Kafka-only, was silent-dropping every tenant.created event) (#1636 ) PR #1626 wired the PUBLISH leg of tenant + billing to NATS via events.MultiPublisher (canonical subject `catalyst.<event.Type>` per ADR-0001 §6). The CONSUME leg stayed Kafka-only — provisioning, notification, domain, billing's tenant-events cascade, AND tenant's own provision-events + members-cleanup consumers all called events.NewConsumer(redpandaBrokers, …). On Sovereigns REDPANDA_BROKERS is empty by design (no Redpanda exists; NATS is the canonical bus per the convergence-fix block in configmap.yaml) so those consumers either never started OR dialed `localhost:9092` in a hot crash loop. Net effect on every Sovereign install pre-this-PR: 1. alice POSTs /sme/tenants → tenant publishes catalyst.tenant.created to NATS (PR #1626). 2. provisioning's only subscriber was Kafka-only → silent drop. 3. No Organization CR ever spawned → no vCluster → CONVERGENCE BROKEN. This change introduces a symmetric subscribe-side abstraction mirroring bridge.go's MultiPublisher: - events.BrokerSubscriber: unified Subscribe(ctx, handler) interface, satisfied by Consumer, DLQSubscriber, MultiSubscriber. - events.MultiSubscriber: fans in from NATS JetStream durable consumers (one per canonical subject) + an optional legacy Kafka Consumer. NewMultiSubscriber refuses to construct with both legs nil (the silent-no-op pattern this PR exists to prevent). - events.NATSConn.ensureSMEStream: idempotently creates the CATALYST_SME Stream filtering `catalyst.>` so the first consumer on a fresh Sovereign bootstraps lifecycle. Each service's main.go now constructs a MultiSubscriber and passes it to the consumer dispatch loop. 
Consumer signatures take events.BrokerSubscriber instead of events.Consumer (interface upcast, so events.Consumer call sites keep working on Catalyst-Zero): - provisioning: tenant.created / tenant.deleted / tenant.app_install_requested / tenant.app_uninstall_requested / order.placed (the 5 subjects PR #1626 publishes to NATS). Also wires MultiPublisher so provision.* publishes hit NATS too — downstream tenant + notification consumers need them. - notification: full fan-in (user.login, order.placed, payment.received, provision.*, domain.*, member.invited). - domain: tenant.deleted (subdomain + BYOD reclamation cascade). - billing: tenant.deleted (Stripe sub-cancel + invoice void + ledger marker cascade). Existing metering NATS subscriber unaffected. - tenant: provision.* + tenant.deleted (members cleanup). Now reachable on Sovereigns; pre-this-PR they were inside the `if redpandaBrokersRaw != ""` block. Chart wiring: NATS_URL env added to provisioning, notification, and domain Deployments (tenant + billing already wired via PR #1626). notification.yaml also flips its hardcoded REDPANDA_BROKERS literal to the shared ConfigMap key so the per-topology default (empty on Sovereigns, talentmesh redpanda on Catalyst-Zero) applies. Verification: - go build ./core/services/{shared,tenant,billing,provisioning,notification,domain}/... clean. - go test ./... clean across all 6 modules. - helm template with global.sovereignFQDN=test.example.com renders NATS_URL="nats://nats-jetstream.nats-system.svc.cluster.local:4222" into all 5 Deployments + ConfigMap. - helm template without sovereignFQDN renders NATS_URL="" and REDPANDA_BROKERS=talentmesh redpanda, matching Catalyst-Zero. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 10:32:49 +04:00
Emrah Baysal	b8b80973de	feat(sandbox): Wave 4 — marketplace catalog entry (customer can pick Sandbox alongside WordPress) Adds the Sandbox product to the marketplace storefront so a customer picks it off marketplace.<sov>/apps the same way they pick WordPress / Nextcloud. Card chrome is the existing .app-card shape verbatim — no new components per the design-system inheritance rule. The detail page gains a 6-agent picker (aider, claude-code, cursor-agent, little-coder, opencode, qwen-code) using the existing .related-card chrome with a picked state mirroring .app-card.in-cart. Picks land on cart.agents and travel through checkout into the tenant create-org payload. Tenant-service emits a sibling `tenant.sandbox_requested` event on sme.tenant.events when the cart contains the sandbox product. The event carries org slug + owner + agents list, sufficient for the sandbox-controller (or its upstream orchestrator) to mint a Sandbox CR with matching spec.agentCatalogue. The Organization CR creation path is unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 08:22:37 +02:00
e3mrah	d681f64505	fix(catalyst-api): mint HS256 token on SME proxy calls (was forwarding incompatible RS256) (#1630 ) PR #1625 shipped the /api/v1/sme/billing/vouchers/* proxies but the SME gateway (core/services/gateway/proxy.go) rejects RS256 outright — it only accepts HS256 signed with sme-secrets/JWT_SECRET. Result on every fresh Sovereign: operator clicks on /bss/vouchers returned silent 401 with no upstream audit trail. This commit ships the bridge: - core/services/shared/auth/mint_sme.go (new) - MintSMEAccessToken(secret, sub, email, role) → 5-min HS256 JWT in the wire shape billing's requireVoucherIssuer expects. - SMERoleFor(realmRoles, tier) → maps Keycloak roles + tier claim onto SME vocab (superadmin \| sovereign-admin \| member). - Pure, no IO, fully unit-tested (mint_sme_test.go). - products/catalyst/bootstrap/api/internal/handler/sme_billing_vouchers.go - proxySMEVoucher now mints a fresh HS256 token per upstream hop from the operator's already-validated RS256 session claims and forwards that as Bearer to the SME gateway. RS256 header is no longer leaked upstream. - Unwired bridge (CATALYST_SME_JWT_SECRET empty) surfaces 503 `sme-jwt-bridge-unwired` instead of the silent 401. - products/catalyst/bootstrap/api/internal/handler/handler.go - h.smeJWTSecret field + SetSMEJWTSecret(secret) setter. - products/catalyst/bootstrap/api/cmd/api/main.go - Reads CATALYST_SME_JWT_SECRET on startup and wires it. - Log line includes byte count only (never the secret value, per INVIOLABLE-PRINCIPLES.md #10). - products/catalyst/chart/templates/api-deployment.yaml - New env CATALYST_SME_JWT_SECRET sourced from sme-secrets/JWT_SECRET in the same namespace (catalyst-system). optional: true so Sovereigns without marketplace surface a 503 rather than CreateContainerConfigError. 
- products/catalyst/chart/templates/sme-services/sme-secrets.yaml - emberstack/reflector annotation block mirroring sme-secrets from `sme` ns into `catalyst-system` (Kubernetes secretKeyRef is same-namespace-only). Same pattern as cnpg-cluster.yaml and provisioning-github-token.yaml. Operator-visible behaviour: the bridge is transparent on the happy path (operator with sovereign-admin tier on a Sovereign with marketplace enabled clicks /bss/vouchers → list returns). On the unhappy paths the operator now sees a real status code: - 503 sme-jwt-bridge-unwired (chart wire missing) — actionable - 503 sme-gateway-unreachable (DNS NXDOMAIN) — pre-existing - 403 from billing's requireVoucherIssuer (role insufficient) — was silent 401 before, now propagates the real authz result. Tests: core/services/shared/auth `go test ./...` PASS. catalyst-api `go build ./...` PASS. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 10:11:04 +04:00
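To make the bridge in #1630 concrete, here is a standard-library sketch of minting and checking a 5-minute HS256 token. It is not the code in mint_sme.go: the function names, the `nowUnix` parameter, and the claim names (`sub`, `email`, `role`, `iat`, `exp`) are assumptions, and the real wire shape expected by billing's requireVoucherIssuer is not shown in the log.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"strings"
	"time"
)

// sign returns the base64url HMAC-SHA256 of the JWT signing input.
func sign(secret []byte, signingInput string) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(signingInput))
	return base64.RawURLEncoding.EncodeToString(mac.Sum(nil))
}

// mintHS256 is a hypothetical analogue of MintSMEAccessToken. The claim
// names are assumed; only the 5-minute lifetime comes from the commit.
func mintHS256(secret []byte, sub, email, role string, nowUnix int64) (string, error) {
	header, err := json.Marshal(map[string]string{"alg": "HS256", "typ": "JWT"})
	if err != nil {
		return "", err
	}
	claims, err := json.Marshal(map[string]any{
		"sub": sub, "email": email, "role": role,
		"iat": nowUnix, "exp": nowUnix + 5*60,
	})
	if err != nil {
		return "", err
	}
	enc := base64.RawURLEncoding.EncodeToString
	signingInput := enc(header) + "." + enc(claims)
	return signingInput + "." + sign(secret, signingInput), nil
}

// verifyHS256 checks only the signature, in constant time. It does not
// validate expiry or the alg header, which a real verifier must do.
func verifyHS256(secret []byte, token string) bool {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return false
	}
	want := sign(secret, parts[0]+"."+parts[1])
	return hmac.Equal([]byte(want), []byte(parts[2]))
}

func main() {
	secret := []byte("example-only-secret") // placeholder, never a real key
	tok, err := mintHS256(secret, "op-1", "op@example.com", "sovereign-admin", time.Now().Unix())
	if err != nil {
		panic(err)
	}
	fmt.Println("signature valid:", verifyHS256(secret, tok))
}
```

Minting a fresh short-lived token per upstream hop, as the commit describes, keeps the operator's RS256 session token from ever reaching the SME gateway.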
e3mrah	50a45a9783	fix(billing): skip Stripe when voucher covers 100% of total (unblocks fully-paid voucher checkout) (#1628 ) POST /billing/checkout was 503'ing with "payment processor is not configured" on Sovereigns that have not pasted Stripe keys yet — even when the customer's credit balance (from a fresh voucher redemption in the same request, or a prior balance) fully covered the order total. Make the credit-only short-circuit explicit: compute `remainingOMR := totalOMR - creditBalance` and settle via CreditOnlyCheckout when `<= 0`, BEFORE any Stripe settings probe. This is the path that has to keep working during the voucher-only weeks of a new Sovereign. Adds checkout_test.go covering two regression paths: - fresh-voucher path: customer with 0 credit redeems WELCOME50 against a 50-OMR plan → 200 + paid_by_credit:true, settings table never probed (sqlmock asserts no unexpected queries). - pre-existing-credit path: customer with 200-OMR standing balance buys a 100-OMR plan, no promo_code in request → 200 + paid_by_credit:true + 100-OMR leftover credit. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 09:44:22 +04:00
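The ordering fix in #1628 comes down to evaluating the credit balance before touching the payment processor. A rough sketch, assuming whole-OMR integer amounts and a hypothetical `stripeConfigured` probe in place of the real settings lookup:

```go
package main

import (
	"errors"
	"fmt"
)

// checkoutResult is a hypothetical response shape for this sketch.
type checkoutResult struct {
	PaidByCredit   bool
	LeftoverCredit int64
}

var errStripeNotConfigured = errors.New("payment processor is not configured")

// checkout settles from credit when the balance covers the total and only
// probes Stripe configuration when some amount remains to be charged.
func checkout(totalOMR, creditBalance int64, stripeConfigured func() bool) (checkoutResult, error) {
	remainingOMR := totalOMR - creditBalance
	if remainingOMR <= 0 {
		// Credit-only path: the Stripe probe is never reached.
		return checkoutResult{PaidByCredit: true, LeftoverCredit: -remainingOMR}, nil
	}
	if !stripeConfigured() {
		return checkoutResult{}, errStripeNotConfigured
	}
	// A real implementation would create a Stripe session for remainingOMR.
	return checkoutResult{}, nil
}

func main() {
	noStripe := func() bool { return false }
	res, err := checkout(100, 200, noStripe)
	fmt.Printf("result=%+v err=%v\n", res, err)
	_, err = checkout(100, 0, noStripe)
	fmt.Printf("uncovered order err=%v\n", err)
}
```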
e3mrah	048cb2c3de	fix(sme): wire tenant + billing event dispatchers to NATS (was Redpanda-only, blocking convergence) (#1626 ) The tenant + billing services hardcoded a franz-go Kafka publisher pointing at REDPANDA_BROKERS. On Sovereigns there is NO Redpanda in cluster — only NATS JetStream at nats-jetstream.nats-system.svc.cluster.local:4222 — so every tenant.created / tenant.deleted / order.placed event was silently dropped, blocking provisioning + downstream consumers and stalling the convergence chain end to end. Per ADR-0001 §6 the canonical event bus is NATS JetStream with subject convention `catalyst.<domain>.<event>`. This change: - Adds events.BrokerPublisher + events.MultiPublisher that fan out to NATS (`catalyst.<event.Type>` derived from Event.Type) and the legacy Redpanda topic in one call. Either transport may be nil; the constructor refuses to build a no-op publisher (the exact silent-failure mode we just hit). - Adds NATSConn.PublishEvent so the generic Event envelope can flow over the same JetStream connection used for the metering subscriber (#798), with Event.ID as the JetStream Msg-Id for broker-side de-dup. - Updates tenant + billing main.go to read NATS_URL + REDPANDA_BROKERS independently, construct the appropriate transports, and wire MultiPublisher into the Handler. Legacy Kafka consumers only start when REDPANDA_BROKERS is non-empty so the pods no longer crashloop dialling localhost:9092 on Sovereigns. - Updates chart templates to inject NATS_URL into both tenant and billing Deployments. ConfigMap default for NATS_URL on Sovereigns is nats://nats-jetstream.nats-system.svc.cluster.local:4222 (fixes the existing bug where defaults pointed at the wrong namespace `nats-jetstream` — NATS actually lives in `nats-system` per clusters/_template/bootstrap-kit/07-nats-jetstream.yaml). - Sovereign default of REDPANDA_BROKERS is now empty (was the wrong NATS URL stuffed into a Kafka env, which made franz-go fail every dial). 
Subject mapping per CanonicalSubject: tenant.created → catalyst.tenant.created tenant.deleted → catalyst.tenant.deleted tenant.app_install_requested → catalyst.tenant.app_install_requested order.placed → catalyst.billing.order.placed Test: go build ./... in shared/, tenant/, billing/ (clean) go test ./events/... ./handlers/... in all three (existing + new bridge_test.go pass) helm template with global.sovereignFQDN set renders NATS_URL in both Deployments + REDPANDA_BROKERS="" in ConfigMap helm template without global.sovereignFQDN renders the legacy Redpanda broker (Catalyst-Zero contabo path remains intact) NATS-side consumers for sme.tenant.events / sme.provision.events ship in a follow-up PR per the ADR-0001 §6 migration plan; this PR only unblocks the publish leg which is the immediate convergence blocker. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 09:33:36 +04:00
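The subject mapping listed for #1626 could be expressed as a default prefix plus an override table, as in the sketch below. Only the four input/output pairs come from the commit message; the function name `canonicalSubject` and the override-table approach are assumptions about how such a mapping might be written.

```go
package main

import "fmt"

// subjectOverrides holds event types whose subject is not simply
// "catalyst." + type. Only order.placed is known from the commit message.
var subjectOverrides = map[string]string{
	"order.placed": "catalyst.billing.order.placed",
}

// canonicalSubject is a hypothetical analogue of CanonicalSubject.
func canonicalSubject(eventType string) string {
	if s, ok := subjectOverrides[eventType]; ok {
		return s
	}
	return "catalyst." + eventType
}

func main() {
	for _, t := range []string{
		"tenant.created",
		"tenant.deleted",
		"tenant.app_install_requested",
		"order.placed",
	} {
		fmt.Printf("%s -> %s\n", t, canonicalSubject(t))
	}
}
```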
e3mrah	255eb3bf17	feat(sandbox+auth+newapi): Wave 1b — newapi proxy + BYOS + org-scoped JWT (#1619 ) Three coordinated deliverables for Sandbox Wave 1b — scaffolding + design + the ONE prerequisite (long-lived org-scoped JWT) the rest of Sandbox depends on. Deliverable 1 — newapi proxy contract: - products/sandbox/docs/newapi-proxy-contract.md: agent-pod env (LLM_GATEWAY_URL / OPENAI_BASE_URL alias), provider selection (?provider=qwen; default Qwen via omtd.bankdhofar.com), per-Sandbox token issuance via /admin/tokens/sandbox bridge, lifecycle + rotation, auth model. - platform/newapi/internal/handler/sandbox_token.go: bridge handler stub. Validates the inbound PAT (typ=pat + aud=newapi + org_id cross-check vs request body), then echoes a NewAPI-shaped response so the contract is testable without the upstream NewAPI admin API. Wave 4 wires the actual upstream calls. Deliverable 2 — Claude Code BYOS OAuth: - products/sandbox/docs/claude-code-byos.md: UX (Connect Claude Max → OAuth → refresh token Secret/catalyst-system/sandbox-byos-claude- code-<user-uid>), Pod env injection (ANTHROPIC_API_KEY bypassing newapi), per-session toggle, revocation paths, chart wiring. - products/catalyst/bootstrap/api/internal/handler/byos_claude_code.go: POST /start, GET /callback, DELETE, GET /status — four endpoints behind RequireSession. Honest 503 + 501 surface so the popup flow exercises end-to-end against the placeholder client_id; Wave 4 flips it live. Deliverable 3 — Long-lived org-scoped JWT (THE prerequisite): - platform/keycloak/chart/templates/configmap-sovereign-realm.yaml + configmap-tenant-realm.yaml: add `org` protocolMapper emitting user attribute `org` as claim `org_id`; add `org` to default client scopes for ALL clients. - core/services/auth/handlers/handlers.go: include typ=session in JWTs + document the cross-service claim contract. 
- core/services/auth/handlers/pat.go: NEW POST /auth/pat with admin-configurable TTL (default 7d, max 90d), audience claim, capabilities pass-through, typ=pat discriminator. - core/services/auth/handlers/routes.go + main.go: wire /auth/pat behind JWTAuth middleware. - core/services/shared/auth/claims.go: single Claims struct + HasCapability/HasGroup helpers + ContextKey for cross-service consumers (sandbox-controller, newapi bridge, MCP server). - products/catalyst/bootstrap/api/internal/auth/session.go: align Org JSON tag with new `org_id` claim; UnmarshalJSON accepts BOTH legacy `org` and new `org_id` so a rolling chart upgrade does not regress org-scoped queries. Out of scope (Wave 4 wires): - Sandbox CRD + controller (writes Secret, mounts Pod env). - Actual outbound HTTP to Anthropic /oauth/token + KMS encrypt. - Actual outbound HTTP to NewAPI admin API. - Per-Sandbox capability projection from Keycloak groups. - PAT revocation lookup (jti store) + /auth/pats list. - Settings UI card + session-toolbar routing toggle. Build verification (go vet + go build clean): - core/services/auth/... - core/services/shared/... - platform/newapi/internal/handler/... - products/catalyst/bootstrap/api/... Founder TODO (single knob to flip BYOS live, Wave 4): Register an Anthropic OAuth client at https://console.anthropic.com/settings/oauth (public PKCE, redirect=https://console.<sov-fqdn>/api/v1/sandbox/byos/claude-code/callback) and paste the client_id into clusters/<sovereign>/bootstrap-kit/ sandbox.yaml. Today every BYOS endpoint returns 503 with a clear message pointing at claude-code-byos.md §8. Refs: products/sandbox/docs/architecture.md §6 (THE prerequisite). Co-authored-by: hatiyildiz <269457768+hatiyildiz@users.noreply.github.com>	2026-05-18 08:43:11 +04:00
e3mrah	964dc15570	fix(catalog): D27 — fresh-seed apps default Published+Deployable (#1584 ) * fix(handover): rename itoa→regionSlotIndex (collision with infrastructure.go) PR #1581 introduced an `itoa` helper that collided with the existing `itoa` in handler/infrastructure.go:1952. Go vet failed: internal/handler/infrastructure.go:1952:6: itoa redeclared in this block internal/handler/deployment_handover_export.go:199:6: other declaration of itoa Rename my helper to `regionSlotIndex` — more descriptive of its actual use (deriving the per-region slot suffix for the kubeconfig filename). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(catalyst-api): D16/D17 — 3 bugs caught on t138 Founder caught on t136 (now wiped) that /dashboard cluster grouping still showed 1 region and /cloud nodes showed 1 node despite earlier D16 PRs shipping. Root cause: 3 bugs in the D16 chain that surfaced on t138 fresh prov. 1. exportSecondaryKubeconfigsToChild was guarded behind the early return of exportDeploymentToChild's failed POST. The child's ingress + cert + gateway are still racing to reach reachable state in the seconds after handover fires, so the first POST gets EOF and the goroutine never fires. Fix: kick off the D16 fan-out IMMEDIATELY at the top of exportDeploymentToChild in its own goroutine, BEFORE the deployment-record POST. 2. Both exports now retry with exponential backoff (5s → 60s) for up to 5 min total. Most handovers will succeed on attempt 2-4. Was: no retry, single shot, silent failure. 3. /api/v1/sovereign/secondary-kubeconfig route moved OUT of the auth group (rg) into the top-level router (r), alongside /api/v1/internal/deployments/import. The previous registration required an operator session that doesn't exist at handover — mothership POSTs were 401'd silently. Validation is now via safeIDPattern regex on depID + regionKey (same security model as the deployments/import companion endpoint). 4. 
HandleSovereignCloud now fans out across h.k8sCache.Clusters() instead of using only the in-cluster client. Adds Cluster field (omitempty) to sovereignNode/LB/SC/PVC so the UI can group/filter by region. Without this, /cloud?view=list&kind=nodes shows 1 node even when 3 secondary kubeconfigs are registered. Together these fix: - D16 /dashboard Layer-1=Cluster grouping (3 bubbles, not 1) - /cloud?view=list&kind=nodes (3+ nodes, not 1) Refs: feedback_test_theater_3rd_violation_2026_05_17.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(catalog): D27 — fresh-seed apps default Published+Deployable Founder caught on t136: marketplace.t136/apps shows blank application grid. Root cause: catalog seed.go calls migrateAppPublished + migrateAppDeployable ONLY on the "already populated" path. On a fresh Sovereign install (empty catalog) seedAllData inserts 27 rows with zero-value bools — Published=false, Deployable=false. The marketplace storefront filters with `?published=true`, gets [], renders blank. Fix: after seedAllData also call migrateAppDeployable + migrateAppPublished + seedSystemApps. Both migrations are idempotent (skip rows already true), so re-runs are safe. Verified the bug live on t138 (eaaee1ea24184c2a): http://catalog.sme:8082/catalog/apps returns 27 apps http://catalog.sme:8082/catalog/apps?published=true returns 0 With this fix the latter returns 27. Refs: feedback_test_theater_3rd_violation_2026_05_17.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 09:28:35 +04:00
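The retry policy in the D16/D17 fix (exponential backoff from 5s to 60s, up to 5 minutes in total) can be illustrated by computing the delay schedule. The doubling factor and both function names are assumptions; the commit message gives only the start, the cap, and the overall budget.

```go
package main

import (
	"fmt"
	"time"
)

// backoffSchedule returns the waits between attempts: starting at initial,
// doubling each time (an assumed factor), capped at maxDelay, and stopping
// before the cumulative wait would exceed budget.
func backoffSchedule(initial, maxDelay, budget time.Duration) []time.Duration {
	if initial <= 0 {
		return nil
	}
	var delays []time.Duration
	var total time.Duration
	for d := initial; total+d <= budget; {
		delays = append(delays, d)
		total += d
		d *= 2
		if d > maxDelay {
			d = maxDelay
		}
	}
	return delays
}

// handoverSchedule applies the figures quoted in the commit message.
func handoverSchedule() []time.Duration {
	return backoffSchedule(5*time.Second, 60*time.Second, 5*time.Minute)
}

func main() {
	var total time.Duration
	for i, d := range handoverSchedule() {
		total += d
		fmt.Printf("retry %d after %v (cumulative %v)\n", i+1, d, total)
	}
}
```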
e3mrah	c04b2ec76d	feat(wordpress-tenant): activeHotStandby option wires bp-cnpg-pair (D31) (#1562 ) Sovereign DoD D31 — tenants subscribing to an HA-capable marketplace app may opt into a cross-region active-hot-standby Postgres pair for their WordPress instance instead of the default single CNPG Cluster. Mirrors the canonical bp-cnpg-pair pattern (primary + replica Cluster CRs with WAL streaming over Cilium ClusterMesh via a managed Service annotated service.cilium.io/global=true). When the new pg.activeHotStandby.enabled flag is false (default), templates render the existing single Cluster bit-for-bit — no regression for non-HA tenants. Catalog seed flags WordPress with ha + cnpg-pair tags so the marketplace HA filter can surface it. Chart bumped 0.2.1 -> 0.3.0. New render-gate test asserts both default single-cluster shape AND the enabled 2-Cluster shape with the right nodeSelectors, replica.source, externalCluster.host, Cilium global annotation, and bootstrap.pg_basebackup; all 5 cases pass. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:39:29 +04:00
e3mrah	f9ed292198	fix(billing): /redeem-preview + plans + addons bypass JWT (D29) (#1561 ) * chore(slot-13): pin bp-catalyst-platform to 1.4.145 (D29 gateway public routes) PR #1559 added /api/billing/{vouchers/redeem-preview,plans,addons} as public gateway routes — required for the marketplace /redeem zero-touch flow. Pin the slot so future provisions inherit it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(billing): /redeem-preview + plans + addons bypass JWT (D29) Mirror PR #1559's gateway public routes in the billing service's own middleware chain. The gateway now lets these requests through without an Authorization header (D29 voucher-redeem landing), but billing service's main.go was JWT-gating EVERY /billing/* path except /billing/webhook — so the request still got 401, just one hop later. Caught live on t132 2026-05-16 after PR #1559 rolled. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:28:48 +04:00
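The public-path bypass described above can be sketched as a small middleware. The path list mirrors the routes named in the commit message with the gateway's `/api` prefix dropped, which is an assumption; `requireAuth` is a stand-in that only checks for an Authorization header rather than validating a JWT, and `/billing/invoices` is a hypothetical protected path.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// publicPrefixes is illustrative: the service-side paths are assumed to be
// the gateway routes without the /api prefix.
var publicPrefixes = []string{
	"/billing/webhook",
	"/billing/vouchers/redeem-preview",
	"/billing/plans",
	"/billing/addons",
}

// isPublic matches a path exactly or as a parent segment, so
// "/billing/plansX" does not slip through.
func isPublic(path string) bool {
	for _, p := range publicPrefixes {
		if path == p || strings.HasPrefix(path, p+"/") {
			return true
		}
	}
	return false
}

// requireAuth is a stand-in for real JWT middleware: it rejects requests to
// non-public paths that carry no Authorization header.
func requireAuth(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !isPublic(r.URL.Path) && r.Header.Get("Authorization") == "" {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor sends an unauthenticated GET through the middleware and
// returns the resulting status code.
func statusFor(path string) int {
	h := requireAuth(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	fmt.Println("/billing/plans ->", statusFor("/billing/plans"))
	fmt.Println("/billing/invoices ->", statusFor("/billing/invoices"))
}
```

The point of the sketch is that the gateway and the service each keep their own allowlist, so a route made public at one hop still needs the matching entry at the next.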
e3mrah	a11067da1a	fix(gateway): /redeem-preview + plans + addons must be public (D29) (#1559 ) * feat(billing+notification): wire voucher-issued email (D28) D28 of the Sovereign DoD requires that issuing a voucher emails it to the recipient zero-touch. Today POST /billing/vouchers/issue persists the PromoCode row but never notifies anyone — so a gifted voucher only reaches its recipient if the operator manually sends the code over a side channel. This wires sme-billing -> sme-notification so the email fires automatically on every successful upsert that carries a recipient_email field. Architecture follows the existing notification-service seam: sme-billing POSTs to http://notification.sme.svc.cluster.local:8087/ notification/send with template=voucher-issued; sme-notification renders the HTML and dispatches via Stalwart over SMTP. No direct SMTP code is added to billing, no stalwart-mail calls bypass notification. Server-side only — the owner-UI for issuing vouchers (D28b) is a separate PR. Changes: notification/templates/templates.go + VoucherIssuedEmail(code, creditOMR, description, sovereignFQDN, validityHint) — renders code prominently, redeem button to https://marketplace.<sovereignFQDN>/redeem/?code=<CODE>; FQDN always supplied by caller, NEVER hardcoded. notification/handlers/handlers.go + renderTemplate("voucher-issued") case parsing {code, credit_omr, description, sovereign_fqdn, validity_hint}. + Default subject "You've been gifted a voucher for OpenOva SME". billing/handlers/handlers.go + Handler fields: NotificationURL, SovereignFQDN, NotificationClient. billing/handlers/vouchers.go + issueVoucherRequest = store.PromoCode + RecipientEmail (request- only; never persisted). + sendVoucherIssuedEmail() — POSTs to NotificationURL with a 5s timeout. Best-effort: a non-2xx or transport error logs but does NOT fail the IssueVoucher response, because the row is already persisted and re-issuing the same code re-fires the email. 
+ Re-issue semantics (#91 resurrects soft-deleted rows) extend to the email path — documented in the handler comment. billing/main.go + Reads NOTIFICATION_SERVICE_URL (default http://notification.sme.svc.cluster.local:8087/notification/send) and SOVEREIGN_FQDN env vars. Wires a 5s default http.Client. products/catalyst/chart/templates/sme-services/billing.yaml + Pipes NOTIFICATION_SERVICE_URL (cluster-DNS constant) and SOVEREIGN_FQDN (from .Values.global.sovereignFQDN, NEVER hardcoded) into the billing Deployment. Tests: notification/handlers/handlers_test.go (new) + TestRenderTemplate_VoucherIssued: rendered HTML contains code + credit + a redeem URL built from the supplied FQDN; never falls back to marketplace.openova.io. + TestRenderTemplate_VoucherIssued_CustomSubject + _NoDescription + TestRenderTemplate_UnknownTemplate as guard rails. billing/handlers/vouchers_test.go + TestIssueVoucher_SendsEmail_WhenRecipientPresent: a fake round- tripper sees the POST to notification with the right URL + template + data (code upper-cased, credit_omr, sovereign_fqdn, description) when recipient_email is set. + TestIssueVoucher_NoEmail_WhenRecipientAbsent: no notification call when recipient is empty. + TestIssueVoucher_NotificationFailure_DoesNotFailUpsert: operator gets 200 even when notification returns 500. + TestIssueVoucher_403WithoutVoucherIssuerRole: role gate preserved. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(chart): admin pod uses dedicated image tag (D27 SME stack) t132 caught admin pod stuck in ImagePullBackOff on `admin:b0ed216` — the SME services CI run for that mono-repo SHA published 10 services but admin's image was missing from GHCR. Decouple admin's tag from smeTag so a missing-build for one service doesn't wedge the SME stack. Default to `3c2f7e4` (matches marketplaceApi + console, known-published). When admin's UI changes, bump in lockstep with those. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(slot-13): pin bp-catalyst-platform to 1.4.144 PR #1556 (D28 voucher email wire) + PR #1557 (D27 admin tag override) landed and Blueprint Release packaged 1.4.144. Pin the slot file so future provisions get the latest chart by default — t132 manually upgraded via kubectl patch but t133+ will inherit it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(gateway): /redeem-preview + plans + addons must be public (D29) The marketplace /redeem?code=XXX landing page calls /api/billing/vouchers/redeem-preview unauthenticated per docs/FRANCHISE- MODEL.md §3, but the gateway's catch-all /api/billing/ entry was returning 401 to it — breaking the entire voucher-redeem zero-touch flow that D29 depends on. Also expose /api/billing/plans and /api/billing/addons so the marketplace landing can render pricing without a session. Caught live on t132 2026-05-16 — every /redeem call returned 401. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:17:04 +04:00
e3mrah	1fe706769f	feat(billing+notification): wire voucher-issued email (D28) (#1556 ) D28 of the Sovereign DoD requires that issuing a voucher emails it to the recipient zero-touch. Today POST /billing/vouchers/issue persists the PromoCode row but never notifies anyone — so a gifted voucher only reaches its recipient if the operator manually sends the code over a side channel. This wires sme-billing -> sme-notification so the email fires automatically on every successful upsert that carries a recipient_email field. Architecture follows the existing notification-service seam: sme-billing POSTs to http://notification.sme.svc.cluster.local:8087/ notification/send with template=voucher-issued; sme-notification renders the HTML and dispatches via Stalwart over SMTP. No direct SMTP code is added to billing, no stalwart-mail calls bypass notification. Server-side only — the owner-UI for issuing vouchers (D28b) is a separate PR. Changes: notification/templates/templates.go + VoucherIssuedEmail(code, creditOMR, description, sovereignFQDN, validityHint) — renders code prominently, redeem button to https://marketplace.<sovereignFQDN>/redeem/?code=<CODE>; FQDN always supplied by caller, NEVER hardcoded. notification/handlers/handlers.go + renderTemplate("voucher-issued") case parsing {code, credit_omr, description, sovereign_fqdn, validity_hint}. + Default subject "You've been gifted a voucher for OpenOva SME". billing/handlers/handlers.go + Handler fields: NotificationURL, SovereignFQDN, NotificationClient. billing/handlers/vouchers.go + issueVoucherRequest = store.PromoCode + RecipientEmail (request- only; never persisted). + sendVoucherIssuedEmail() — POSTs to NotificationURL with a 5s timeout. Best-effort: a non-2xx or transport error logs but does NOT fail the IssueVoucher response, because the row is already persisted and re-issuing the same code re-fires the email. 
+ Re-issue semantics (#91 resurrects soft-deleted rows) extend to the email path — documented in the handler comment. billing/main.go + Reads NOTIFICATION_SERVICE_URL (default http://notification.sme.svc.cluster.local:8087/notification/send) and SOVEREIGN_FQDN env vars. Wires a 5s default http.Client. products/catalyst/chart/templates/sme-services/billing.yaml + Pipes NOTIFICATION_SERVICE_URL (cluster-DNS constant) and SOVEREIGN_FQDN (from .Values.global.sovereignFQDN, NEVER hardcoded) into the billing Deployment. Tests: notification/handlers/handlers_test.go (new) + TestRenderTemplate_VoucherIssued: rendered HTML contains code + credit + a redeem URL built from the supplied FQDN; never falls back to marketplace.openova.io. + TestRenderTemplate_VoucherIssued_CustomSubject + _NoDescription + TestRenderTemplate_UnknownTemplate as guard rails. billing/handlers/vouchers_test.go + TestIssueVoucher_SendsEmail_WhenRecipientPresent: a fake round- tripper sees the POST to notification with the right URL + template + data (code upper-cased, credit_omr, sovereign_fqdn, description) when recipient_email is set. + TestIssueVoucher_NoEmail_WhenRecipientAbsent: no notification call when recipient is empty. + TestIssueVoucher_NotificationFailure_DoesNotFailUpsert: operator gets 200 even when notification returns 500. + TestIssueVoucher_403WithoutVoucherIssuerRole: role gate preserved. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:04:46 +04:00
e3mrah	b0ed216e81	feat(catalog): catalog-svc HTTP REST service + chart wiring (slice L1+L2, #1097 ) (#1148 ) EPIC-2 Slice L of #1097. Multi-source Blueprint catalog HTTP REST service backed by Gitea (3 sources: public mirror, sovereign-curated, per-Org private). Replaces the per-Org SME catalog per ADR-0001 §4.3 (different scope: SME's was Org-bound; catalyst-catalog is Sovereign- wide multi-source). L1 — core/services/catalyst-catalog/ Go service: - Separate go.mod (services group is for HTTP services, controllers group is for CRD reconcilers — documented in DESIGN.md). - Imports the unified Gitea client via Go module replace directive. - Promoted core/controllers/internal/gitea → pkg/gitea so the catalog (a sibling Go module) can import it (Go internal/ rule). 5 Group C controllers updated atomically. - HTTP REST endpoints: /api/v1/catalog{,/{name},/{name}/versions, /{name}/versions/{version}} + /healthz. - Source resolution priority on collision: private > sovereign > public. - Per-Org access filter: caller's Claims.Groups[] determines visible private blueprints; Org A user does NOT see Org B's private set. - 30s TTL LRU cache on blueprint.yaml reads (capacity 1024 default). - Session-cookie / Bearer / ?access_token= claim extraction matching catalyst-api's seam; expired-token rejection in-process. - Containerfile: distroless-static, non-root UID 65532. L2 — products/catalyst/chart/templates/services/catalog/ wiring: - 5 templates (deployment, service, serviceaccount, rbac, httproute) + _helpers.tpl. Default-OFF gate via .Values.services.catalog.enabled. - helm template: 0 catalog resources when OFF, 6 when ON. - Empty image.tag fail-fasts at render per Inviolable Principle #4a. - HTTPRoute exposes /api/v1/catalog on api.<sovereign> hostname. - Chart bumped 1.4.85 → 1.4.86. Gitea client extension (canonical seam, NOT per-service variant): - +ListOrgRepos(ctx, org) []Repo — paginated repo listing. 
- +ListContents(ctx, org, repo, branch, path) []ContentEntry — directory listing for per-Org shared-blueprints fan-out. GitHub Actions workflow: - .github/workflows/catalyst-catalog-build.yaml — push-on-paths + pull_request + workflow_dispatch (NO cron). go vet + go test (race + count=1) + image build → GHCR :<sha>. repository_dispatch fan-out to chart-bump matches the Group C controllers' pattern. Tests (3-tier gate): unit (config, cache, auth, source, handler) + integration (httptest-backed Gitea fixtures across all 3 sources + priority + per-Org access). All green; race detector on. L3 (SME catalog retirement) is deferred per the EPIC-2 master brief. GraphQL deferred (REST first; gqlgen would pull ~80MB of indirect deps for a feature no UI consumer has asked for yet). Co-authored-by: hatiyildiz <hati.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 04:04:52 +04:00
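The collision rule (private > sovereign > public) and the per-Org access filter can be illustrated together. This is a sketch with hypothetical types and a hypothetical `visible` function; the real service reads blueprints from Gitea and derives groups from the caller's claims.

```go
package main

import "fmt"

// Source is ordered so that a higher value wins on a name collision.
type Source int

const (
	Public Source = iota
	Sovereign
	Private
)

// Blueprint is a hypothetical, trimmed-down catalog entry.
type Blueprint struct {
	Name   string
	Source Source
	Org    string // owning Org; only meaningful for Private entries
}

// visible returns one blueprint per name. Private entries are dropped unless
// the caller belongs to the owning Org; among the rest, the highest-priority
// source wins.
func visible(all []Blueprint, groups []string) map[string]Blueprint {
	member := make(map[string]bool, len(groups))
	for _, g := range groups {
		member[g] = true
	}
	out := make(map[string]Blueprint)
	for _, bp := range all {
		if bp.Source == Private && !member[bp.Org] {
			continue
		}
		if cur, ok := out[bp.Name]; !ok || bp.Source > cur.Source {
			out[bp.Name] = bp
		}
	}
	return out
}

func main() {
	all := []Blueprint{
		{Name: "wordpress", Source: Public},
		{Name: "wordpress", Source: Sovereign},
		{Name: "wordpress", Source: Private, Org: "org-a"},
	}
	fmt.Println("org-a sees private copy:", visible(all, []string{"org-a"})["wordpress"].Source == Private)
	fmt.Println("org-b sees sovereign copy:", visible(all, []string{"org-b"})["wordpress"].Source == Sovereign)
}
```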
e3mrah	a57d05d4dd	fix(provisioning,catalog): parent-kustomization prefix collision + disable openclaw/stalwart-mail (#1043 ) Two bugs surfaced live 2026-05-06 on tenant "test": 1) UpdateParentKustomization used substring match against " - <slug>", which falsely "found" the slug when it was a PREFIX of an existing entry. Adding "test" to a file already listing "test11" or "test13" silently no-op'd. Result: tenant manifests committed but the tenants/kustomization.yaml never registered them, Flux's tenants Kustomization couldn't apply the new tenant, vCluster step timed out at 10m. Fix: exact line match on the resources entry. 2) openclaw + stalwart-mail were flagged Deployable=true in #941 but never had AppSpec entries in core/services/provisioning/gitops/apps.go KnownApps. The SME provisioning generator emits a single-Deployment template that requires Image + Port; for those two slugs it produced invalid manifests: Deployment.apps "openclaw" is invalid: containers[0].image: Required value containers[0].ports[0].containerPort: Required value tenant-test11-apps Kustomization rejected the dry-run, no apps ever landed inside the vcluster. Re-enabling these requires per-app overlay support beyond the single-Deployment template — separate work. For now: comment them out of DeployableAppSlugs so the catalog seed flips them back to Deployable=false on next pod restart and the marketplace UI shows them as COMING SOON. Adds regression tests for both: prefix-collision in UpdateParentKustomization, and a stability test on the deployable map shape. Co-authored-by: hatiyildiz <hatice@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 10:21:39 +04:00
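The prefix collision in the first bug is easy to reproduce. The sketch below contrasts a substring check with an exact line match; the function names and the two-space list indentation are assumptions, not the actual UpdateParentKustomization code.

```go
package main

import (
	"fmt"
	"strings"
)

// containsSubstring reproduces the buggy check: "test" is "found" whenever
// any entry merely starts with it.
func containsSubstring(kustomization, slug string) bool {
	return strings.Contains(kustomization, " - "+slug)
}

// hasResource compares each trimmed line against the full list entry, so
// only an exact match counts.
func hasResource(kustomization, slug string) bool {
	for _, line := range strings.Split(kustomization, "\n") {
		if strings.TrimSpace(line) == "- "+slug {
			return true
		}
	}
	return false
}

func main() {
	k := "resources:\n  - test11\n  - test13\n"
	fmt.Println("substring match for test:", containsSubstring(k, "test"))
	fmt.Println("exact match for test:", hasResource(k, "test"))
	fmt.Println("exact match for test11:", hasResource(k, "test11"))
}
```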
e3mrah	ff0e90156d	fix(provisioning): re-read parent kustomization on commit retry — prevent slug-resurrection race (#1034 ) Live race seen 2026-05-06: bookcheck teardown committed at T (removed the slug from tenants/kustomization.yaml + pruned its directory). Multitest provision's first commit attempt at T-2s got a ref-race rejection, the github client's retry replayed the SAME files map (which held the pre-teardown parent kustomization with bookcheck still in it), and the retry's commit at T+5s overwrote the teardown's removal. Result: the parent kustomization listed bookcheck but the directory was gone, Flux's tenants Kustomization wedged in build-failure loop, and EVERY subsequent tenant change was blocked until manually unblocked. Add CommitFilesWithPruneAndRebuild — same as CommitFilesWithPrune but takes a `rebuild(ctx) (files, error)` callback invoked at the start of each attempt. Wire both consumer paths (provision + teardown) through it; each rebuild re-reads parent kustomization.yaml against the current HEAD and re-applies UpdateParentKustomization / RemoveTenantFromParentKustomization fresh. Static tenant-scoped manifests still flow through unchanged. CommitFilesWithPrune is preserved as a thin wrapper for callers that ship truly static files (e.g. day-2 app installs scoped to a tenant subdir, no parent merge involved). Co-authored-by: hatiyildiz <hatice@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 03:28:35 +04:00
e3mrah	f1744c8973	fix(provisioning): BookStack — also emit DB_USERNAME/DB_PASSWORD (Laravel-native) (#1031) PR #1028 fixed the APP_KEY halt and switched to DB_USER/DB_PASS, but linuxserver/bookstack's init script does NOT substitute DB_USER → DB_USERNAME in the .env file. Laravel reads env vars natively, but only under the names DB_USERNAME / DB_PASSWORD (the Laravel-canonical names). Without those, Laravel falls back to the .env placeholder values (database_username / database_user_password) and the app fails with: SQLSTATE[HY000] [1045] Access denied for user 'database_username'@... Caught live on tenant 'bookcheck' 2026-05-06 after PR #1028 deployed: the pod ran and the app started, but every request hit the placeholder credentials. Emit BOTH name pairs so the env works regardless of which the LSIO upstream eventually wires up. Co-authored-by: hatiyildiz <hatice@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 02:59:14 +04:00
e3mrah	b180d56926	fix(provisioning): BookStack overlay — add DB_* envs + APP_KEY + APP_URL (#1028) linuxserver/bookstack reads DB_HOST/DB_USER/DB_PASS/DB_DATABASE (NOT WORDPRESS_DB_*) and halts init with "The application key is missing, halting init!" when APP_KEY isn't set. The pod stays 1/1 Running because the readiness probe doesn't catch the silent halt, but the application never binds to port 80, so the ingress returns 502. Discovered via live E2E on tenant 'aaa' (BookStack on the m plan): all 7 provisioning steps reported done, ingress healthy, cert ready, but https://aaa.omani.rest → 502. Add a "bookstack" DBEnvStyle case in the mysql env-emitter that writes DB_*, APP_URL=https://<slug>.omani.rest, and a Laravel-format APP_KEY (base64:<32-byte>). Also add a randomAppKey() helper alongside randomHex(). Tag the catalog AppSpec with DBEnvStyle: "bookstack". Co-authored-by: hatiyildiz <hatice@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 02:49:35 +04:00
e3mrah	c9b8c13406	fix(tenant): JWT-bypass /tenant/internal/* — paid checkouts never provisioned (#1018 ) (#1019 ) Billing's dispatchOrderPlaced enriches the order.placed NATS event by calling /tenant/internal/tenants/<id>/subdomain over the in-cluster ClusterIP. routes.go registers that path with the comment "Internal — unauthenticated service-to-service", but main.go wraps everything under /tenant/ in JWTAuth except /tenant/check-slug/. So billing got 401, returned "" for the subdomain, published order.placed with subdomain="", and provisioning rejected every paid checkout with "invalid subdomain expected=[a-z][a-z0-9-]{2,30}". Add /tenant/internal/ to the public-paths bypass. Both gateways already 401 the path externally, and subdomain values are public DNS names — the documented threat model. Co-authored-by: hatiyildiz <hatice@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 02:09:55 +04:00
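The rejection of the empty subdomain follows directly from the pattern quoted in the error message. A small sketch, assuming the pattern is applied with start and end anchors (the anchors and the function name `validSubdomain` are not shown in the source):

```go
package main

import (
	"fmt"
	"regexp"
)

// subdomainRE uses the pattern from the error message, anchored on both
// ends (the anchoring is an assumption).
var subdomainRE = regexp.MustCompile(`^[a-z][a-z0-9-]{2,30}$`)

// validSubdomain reports whether s is a lowercase label of 3 to 31
// characters that starts with a letter.
func validSubdomain(s string) bool {
	return subdomainRE.MatchString(s)
}

func main() {
	for _, s := range []string{"", "aaa", "ab", "Test", "bookcheck"} {
		fmt.Printf("%q -> %v\n", s, validSubdomain(s))
	}
}
```

An empty string fails the check, which is why a 401 on the internal lookup surfaced downstream as an "invalid subdomain" error rather than as an auth failure.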
e3mrah	689276889c	fix(bp-catalyst-platform+bp-newapi): unblock alice signup gates 2-6 on Sovereigns (#915 ) (#951 ) Six coupled chart + orchestrator fixes that unblock alice marketplace signup → tenant ready → SaaS integrations → LLM → ledger on a freshly franchised Sovereign. C5-final got Gate 1 GREEN on otech113 (2026-05-05) but every downstream gate failed because the SME bundle hardcoded contabo-only assumptions. Bumps: - bp-catalyst-platform 1.4.21 → 1.4.22 - bp-newapi 1.3.0 → 1.4.0 - bootstrap-kit slot 13 + 80 pins updated in lockstep Issues addressed (single consolidated PR — smaller PRs would race against alice signup retries): - #934 (auth SMTP empty → "failed to send email"): sme-secrets.yaml now reads SMTP_* from `catalyst-system/sovereign-smtp-credentials` (the same A5-seeded source #883/#905 the chart 1.4.20 catalyst- openova-kc-credentials Secret already uses) with source-wins precedence. Both canonical (smtp-host/port/from/user/pass) AND legacy (host/port/from/user/password) source-Secret key shapes accepted. Empty source falls back to chart-level defaults so the contabo path stays clean. - #940 (provisioning service GITHUB_TOKEN placeholder + hardcoded upstream github.com): chart values .Values.smeServices.provisioning.{githubToken,git.{apiURL,owner, repo,branch}} make every GitHub-API coordinate operator-overridable with topology-aware defaults (Sovereign ⇒ in-cluster Gitea REST API + `openova` org; contabo ⇒ api.github.com + `openova-io` org). Provisioning binary's startup gate validates the GITHUB_TOKEN does NOT contain placeholder substrings (<placeholder>, PLACEHOLDER, REPLACE_ME, ...) and crashes the Pod into Pending if it does — the operator sees the misconfig immediately instead of after alice signups have failed silently in service logs. GitHub client now accepts a custom API URL via NewClientWithAPIURL so Gitea's GitHub- compatible /api/v1 surface drops in without re-implementing the client. 
- #941 (catalog "27 apps COMING SOON"): added `openclaw` and `stalwart-mail` to migrateAppDeployable's deployable map at core/services/catalog/handlers/seed.go. Both blueprints (bp-openclaw, bp-stalwart-{sovereign,tenant}) ship with visibility=listed in the embedded blueprints.json AND have working SME-tenant overlay templates in sme_tenant_gitops.go, but the catalog handler silently filtered them out because they were missing here. Map extracted to DeployableAppSlugs() exported function so unit tests can assert membership without invoking a Mongo store. - #942 (REDPANDA_BROKERS hardcoded to talentmesh): configmap.yaml selects broker default at render time based on global.sovereignFQDN — Sovereign ⇒ NATS JetStream Service per ADR-0001 (the only local bus on Sovereigns); contabo ⇒ legacy Redpanda Service in talentmesh. Operator MAY override either default via .Values.smeServices.eventBus.brokers without forking the chart. The ConfigMap key name stays REDPANDA_BROKERS for back-compat with existing SME service Go env wiring; new EVENT_BUS_PROTOCOL key surfaces the protocol hint for services that want to switch wire format independently. - #943 (bp-newapi silently skips Deployment): NEW templates/cnpg-cluster.yaml auto-provisions a CNPG-backed Postgres Cluster + Helm-`lookup`-persistent DSN Secret when .Values.cnpg.enabled (DEFAULT true). NEW templates/credentials- secret.yaml auto-generates SESSION_SECRET + CRYPTO_SECRET (each 64-char randAlphaNum, persistent across reconciles via Helm `lookup`) when .Values.credentials.autoProvision (DEFAULT true). deployment.yaml gate now resolves Secret names from the chart- emitted defaults when the operator hasn't supplied an override. Capabilities-gated on postgresql.cnpg.io/v1 so a cold install before bp-cnpg is Ready surfaces as "no Cluster yet" rather than a hard install error. 
- #944 (CRITICAL — cross-cluster pollution): provisioning.yaml templates GIT_BASE_PATH from .Values.smeServices.provisioning.gitBasePath with a topology-aware default `clusters/<sovereignFQDN>/sme-tenants` on Sovereigns. NEW `core/services/provisioning/gitguard` package validates at startup AND on every commit code path that the path begins with `clusters/<self-FQDN>/` — refusing to commit to any other cluster's tree. Defence in depth so a runtime env mutation (kubectl exec, ConfigMap update without Pod restart, hostile sidecar) cannot bypass the check. Pre-#944 every alice tenant overlay landed in upstream openova/openova `clusters/contabo-mkt/tenants/<id>/` which contabo Flux would then install on the contabo cluster — C5-final caught + reverted the alice2 incident at commit `5715db04`. Tests: - core/services/provisioning/gitguard: 22 cases covering Sovereign + contabo + traversal + prefix-collision + placeholder token - core/services/catalog/handlers: openclaw/stalwart-mail in deployable map + stable-shape lock against accidental deletes - helm-template smoke pass: bp-newapi (default values renders Deployment + auto-provisioned Secrets); bp-catalyst-platform (Sovereign render shows GIT_BASE_PATH=clusters/otech113.../sme- tenants, REDPANDA_BROKERS=nats-jetstream..., GITHUB_OWNER=openova, GITHUB_API_URL=http://gitea-http...) Closes #934 #940 #941 #942 #943 #944 Refs umbrella #915 Co-authored-by: hatiyildiz <hatice@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 15:27:23 +04:00
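The gitguard check for #944 boils down to a cleaned-path prefix test. The sketch below is an illustration only: `validateBasePath` is a hypothetical name, `sov.example.test` is a made-up FQDN, and the real package covers more cases (22 per the commit message) than the handful shown.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// validateBasePath accepts a path only if, after cleaning, it sits under
// clusters/<selfFQDN>/. Cleaning first defeats ".." traversal, and the
// trailing slash in the prefix defeats FQDN prefix collisions.
func validateBasePath(basePath, selfFQDN string) error {
	if selfFQDN == "" {
		return fmt.Errorf("self FQDN is empty")
	}
	clean := path.Clean(basePath)
	prefix := "clusters/" + selfFQDN + "/"
	if !strings.HasPrefix(clean, prefix) {
		return fmt.Errorf("path %q is outside %q", basePath, prefix)
	}
	return nil
}

func main() {
	self := "sov.example.test" // hypothetical Sovereign FQDN
	for _, p := range []string{
		"clusters/sov.example.test/sme-tenants",
		"clusters/contabo-mkt/tenants",
		"clusters/sov.example.test/../contabo-mkt/tenants",
		"clusters/sov.example.test-evil/sme-tenants",
	} {
		fmt.Printf("%s -> %v\n", p, validateBasePath(p, self))
	}
}
```

Calling the same function at startup and again on each commit path is what gives the defence-in-depth property the commit describes.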
e3mrah	95a06f56f8	fix(sme-marketplace): unblock PIN signin — route /api/* to sme/gateway + add send-pin alias (#868 ) (#869 ) Two-part fix for marketplace UI signin flow which 503'd then 404'd on otech103. Live debugging found two stacked bugs. Part A — chart (HTTPRoute backend): - marketplace-routes.yaml: /api/* rule now backendRefs sme/gateway:8080 (cross-namespace) instead of catalyst-system/marketplace-api which had a Service selector matching zero Pods. The gateway in sme already fronts services-auth, catalog, tenant, billing, provisioning. - marketplace-reference-grant.yaml: extend `to:` list with the gateway Service so the cross-ns hop is authorised by Gateway API. - Bump bp-catalyst-platform 1.4.7 → 1.4.8 + lockstep slot 13 pin. Part B — services-auth (route name): - Add /auth/send-pin alias delegating to existing SendMagicLink handler, and /auth/verify-pin alias delegating to VerifyMagicLink. The marketplace UI surfaces a 6-digit PIN ("Send PIN" button), so the PIN-named routes are the canonical UX-facing names. /auth/magic-link and /auth/verify remain registered for backward compat. - services-build workflow auto-rebuilds the auth image on push to core/services/** — no manual dispatch needed. Refs: #868 Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>	2026-05-05 08:22:17 +04:00
e3mrah	fa4395fa3a	fix(bp-catalyst-platform): wire VALKEY_PASSWORD into SME auth + gateway (#863 ) (#864 ) After PR #862 (1.4.4) made cross-ns Valkey reachable from `sme` ns, the auth Pod started CrashLoopBackOff with "NOAUTH HELLO must be called with the client already authenticated". Root cause: bp-valkey 1.0.0 ships auth.enabled=true (bitnami default) but SME service code + Deployment templates never plumbed a password through. Chart 1.4.4 -> 1.4.5. Slot 13 pin lockstep. Changes: - core/services/shared/db/valkey.go: add ConnectValkeyWithAuth overload taking username + password. ConnectValkey kept backwards-compatible for contabo-mkt's auth-less in-namespace Valkey. - core/services/auth/main.go + gateway/main.go: read VALKEY_USERNAME + VALKEY_PASSWORD env, call ConnectValkeyWithAuth when password set, else fall through to no-auth path. - NEW templates/sme-services/valkey-cross-ns-secret.yaml: Helm `lookup` reads bp-valkey's auto-generated `valkey-password` from the `valkey/valkey` Secret and re-emits it as `sme-valkey-auth` in `sme` ns. Same pattern as sme-secrets.yaml (#859) and gitea-admin-secret (#830 Bug 2). On first install the lookup may return nil; Flux's 15m reconcile picks up the mirror once bp-valkey is Ready. - auth.yaml + gateway.yaml: add VALKEY_PASSWORD env from `sme-valkey- auth` Secret with optional=true so contabo-mkt's auth-less path keeps working when the mirror Secret is absent. - values.yaml: add `smeServices.valkey.{sourceSecretName, sourcePasswordKey, destNamespace, destSecretName}` knobs (Inviolable Principle #4). Live verified the failure mode on otech103: 11/13 SME pods Running 1/1, auth in CrashLoopBackOff with NOAUTH HELLO error. Provisioning Pod's CreateContainerConfigError is unrelated (ghcr-pull, separate ticket). Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 06:09:38 +04:00
e3mrah	5cdb738ac9	fix(services): go mod tidy across sibling services after #798 shared deps bump (#821) #798 added github.com/nats-io/nats.go to core/services/shared/go.mod and adjusted x/sys, x/crypto, and x/text to Go 1.22-compatible versions. The sibling services (auth, catalog, domain, gateway, notification, provisioning, tenant) reference the same shared module via the local `replace` directive, so their go.sum files must include the new transitive hashes; otherwise the CI Containerfile build hits: go: updates to go.mod needed; to update it: go mod tidy This commit is a pure `go mod tidy` across all 7 services; no source changes. CI services-build is now unblocked. Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 22:35:46 +04:00
e3mrah	9645a9044a	feat(metering): NewAPI NATS publisher + sme-billing subscriber + POST /metering/record (#798 ) (#818 ) * feat(metering): NewAPI NATS publisher + sme-billing subscriber + POST /metering/record (#798) Per #795 [Q-mine-3] (NATS not RedPanda) + [Q-mine-4] (one ledger), add the SME-2 metering integration end-to-end. NewAPI is consumed as the upstream image `ghcr.io/openova-io/openova/newapi-mirror` (a pinned mirror, not a fork) — the metering envelope is produced by a Go sidecar that observes the OpenAI-style `usage.total_tokens` field on every 2xx /v1/* response. This avoids forking the upstream binary while still producing the canonical envelope shape on `catalyst.usage.recorded`. A) NewAPI metering sidecar — core/services/metering-sidecar/ - Transparent reverse proxy in front of NewAPI on its own port; the bp-newapi Service routes the cluster-fronting port to the sidecar, which forwards to NewAPI on the pod's loopback. - Observes successful /v1/* JSON responses, parses `usage.{prompt_tokens,completion_tokens,total_tokens}`, computes amount_micro_omr = -tokens * priceMicroOMRPerToken, and publishes one envelope on `catalyst.usage.recorded` per completed request. - Failed (non-2xx), non-JSON, and admin-path requests are NOT billed. - Customer-facing latency is NEVER blocked on metering: the response body is restored before publish; on NATS unreachable the envelope is persisted to disk and retried by a background drain loop. - 14 unit tests (proxy + publisher + safeFilename guards). B) sme-billing NATS subscriber — core/services/billing/handlers/ metering_consumer.go - JetStream durable consumer `sme-billing-metering` on stream `CATALYST_USAGE` (provisioned by sme-billing on startup). - Idempotent on metadata.request_id via a UNIQUE partial index on credit_ledger.external_ref; redelivery from the broker collapses to a single ledger row. 
- Customer auto-create on cold start (the rbac sme.user.created envelope may land AFTER the first metered request; we don't strand usage waiting for it). - 11 unit tests covering happy-path, idempotency, malformed-payload poison-pill, missing-request-id, non-negative amount guard, resolver error → Nak, derive-micro-OMR-from-OMR, DB-error → Nak. C) HTTP handler POST /billing/metering/record — handlers/metering.go - Synchronous validate → INSERT credit_ledger → return {ledger_entry_id, balance_after_omr, balance_after_micro_omr, duplicate}. Same payload + idempotency guard as the NATS path. - Auth: superadmin OR sovereign-admin (operator-admin model; end-user LLM traffic flows through the sidecar, never this URL). - 8 unit tests covering happy-path, idempotency, role gating, malformed-JSON, positive-amount rejection, customer-not-found. D) Schema — core/services/billing/store/store.go - ALTER TABLE credit_ledger ADD COLUMN amount_micro_omr BIGINT (1 OMR = 1,000,000 micro-OMR; -0.000234 OMR = -234 micro-OMR exact integer — preserves precision at metering rates). - ADD COLUMN external_ref TEXT + UNIQUE partial index for idempotency dedup. - ADD COLUMN metadata JSONB for the raw envelope. - GetCreditBalance projects both amount_omr (legacy) and amount_micro_omr (new) into the integer-OMR view. - GetCreditBalanceMicroOMR returns canonical precision. - RecordUsage method: ON CONFLICT DO UPDATE … RETURNING (xmax<>0) distinguishes fresh insert from duplicate without a follow-up SELECT. E) Wiring - core/services/shared/events/nats.go — minimal NATS JetStream publisher + subscriber surface; legacy RedPanda producer/consumer in events.go untouched per [Q-mine-3]. - core/services/billing/main.go — NATS_URL env; subscriber wired in parallel with the existing RedPanda tenant-events consumer. - middleware/jwt.go — exported test helper WithClaims so handler tests can construct an authenticated context without minting a real signed token. 
- .github/workflows/services-build.yaml — metering-sidecar added to the build matrix; deploy job skips it (image consumed by the bp-newapi chart, not products/catalyst sme-services). F) bp-newapi chart (1.0.0 → 1.1.0) - meteringSidecar block in values.yaml: image, port, NATS URL, priceMicroOMRPerToken (default 156 = 0.000156 OMR/token), spool dir, header names, resources, securityContext (read-only-rootfs). - deployment.yaml renders the sidecar container + emptyDir spool volume when meteringSidecar.enabled (default true). - service.yaml routes the cluster-fronting :3000 to the sidecar when enabled, exposes a separate :3001 → NewAPI direct port for bp-catalyst-platform admin-API traffic (ADR-0003 §3.2). - networkpolicy.yaml allows the sidecar's port + nats-system egress for JetStream publish. Tests: 33 new (14 sidecar + 11 subscriber + 8 HTTP handler), all green. Helm template renders cleanly with sidecar enabled and disabled. Closes #798 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(billing/store): cast SUM to BIGINT so lib/pq scans into int64 (#798) Postgres returns `SUM(int) + SUM(bigint)/integer` as `numeric`, which lib/pq presents as a `[]uint8` decimal string ("50.000000000000000000000000") that does NOT scan directly into Go int64 — the integration test TestVoucherLifecycle_IssueRedeemAndCreditApplied caught this in CI on the post-redeem balance read. Wrap the SUM expressions in CAST(... AS BIGINT) so the column type is unambiguously bigint and Scan target stays uniform across pre-#798 rows (amount_omr only) and post-#798 rows (amount_micro_omr present). Affects: - GetCreditBalance - GetCreditBalanceMicroOMR - RecordUsage's running-balance read Test mocks updated to match the new SQL prefix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: hatiyildiz <hatice.yildiz@openova.io> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 22:32:42 +04:00
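The metering arithmetic above stays in integers: the sidecar reads `usage.total_tokens`, multiplies by the per-token price in micro-OMR, and negates the result. A minimal sketch, assuming hypothetical helper names (`amountFromResponse`, `formatOMR`) and using the chart's default price of 156 only as an example input:

```go
package main

import (
	"encoding/json"
	"fmt"
)

const microPerOMR = 1_000_000 // 1 OMR = 1,000,000 micro-OMR

// usageBody picks out the OpenAI-style usage block from a response body.
type usageBody struct {
	Usage struct {
		TotalTokens int64 `json:"total_tokens"`
	} `json:"usage"`
}

// amountFromResponse returns the (negative) ledger amount in micro-OMR.
// The bool is false when the body is not JSON or reports no tokens, in
// which case nothing should be billed.
func amountFromResponse(body []byte, priceMicroOMRPerToken int64) (int64, bool) {
	var u usageBody
	if err := json.Unmarshal(body, &u); err != nil || u.Usage.TotalTokens <= 0 {
		return 0, false
	}
	return -u.Usage.TotalTokens * priceMicroOMRPerToken, true
}

// formatOMR renders micro-OMR as a decimal OMR string without floats.
func formatOMR(micro int64) string {
	sign := ""
	if micro < 0 {
		sign = "-"
		micro = -micro
	}
	return fmt.Sprintf("%s%d.%06d", sign, micro/microPerOMR, micro%microPerOMR)
}

func main() {
	body := []byte(`{"usage":{"prompt_tokens":1,"completion_tokens":2,"total_tokens":3}}`)
	amount, ok := amountFromResponse(body, 156)
	fmt.Println(amount, ok, formatOMR(amount))
}
```

Keeping the ledger column as a BIGINT of micro-OMR avoids the rounding that a float OMR amount would introduce at per-token prices.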
e3mrah	2a034a0959	feat(catalog): unified catalog with Published flag — operator curates marketplace (#710 wave 2) (#724 ) Single source of truth for apps; Sovereign-console operator decides which apps marketplace customers see; marketplace storefront filters by Published. Per founder rule 2026-05-04: unpublish is a marketplace-visibility toggle, not a deployment-lifecycle action — existing tenant deployments of an unpublished app keep running unaffected. core/services/catalog/store/store.go ==================================== - App.Published bool — operator-controlled visibility - ListPublishedApps: marketplace-storefront subset (Published=true AND System=false AND Deployable=true). System and Deployable are catalog-team-controlled; Published is the operator's curation knob. - SetAppPublished(slug, bool) — hot-path one-bit write the Sovereign console hits per row toggle. Cheaper than UpdateApp; slug-keyed so the UI doesn't need the internal Mongo _id. - UpdateApp: thread published through full-update path too. core/services/catalog/handlers/handlers.go + routes.go ====================================================== - ListApps now honours ?published=true query param: GET /catalog/apps → operator view: every app GET /catalog/apps?published=true → marketplace view: filtered - New PATCH /catalog/admin/apps/{slug}/publish?value={true|false} for the Sovereign-console operator's row toggle. - requireAdmin gating preserved on the admin endpoint. core/services/catalog/handlers/seed.go ====================================== - migrateAppPublished: defaults Published=true on every existing app on the day Catalyst 1.3.x ships. Operators opt OUT of marketplace visibility per app, not IN — matches how a real SaaS storefront is curated and prevents an empty marketplace on flag-introduction day. Idempotent on re-run. 
core/marketplace/src/lib/api.ts ================================ - getApps() now hits /catalog/apps?published=true so the marketplace storefront only renders the operator-curated subset. DoD pending wave 2.5 ==================== The Sovereign-console "Catalog & publishing" admin page (per-row toggle UI) is the next chunk and ships in a follow-up — backend + storefront filter are the load-bearing change here. Catalog admins can flip the flag today via the PATCH endpoint; the per-row UI is quality-of-life on top. Co-authored-by: hatiyildiz <hatiyildiz@openova.io>	2026-05-04 11:37:03 +04:00
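The storefront predicate named in the commit above (Published=true AND System=false AND Deployable=true) can be illustrated with an in-memory filter. This is only a sketch under assumptions: the real `ListPublishedApps` queries the catalog store, and the `App` struct here carries just the three flags plus a slug; the slugs used in `main` are made up.

```go
package main

import "fmt"

// App holds only the fields the storefront filter inspects.
// The real catalog type has more fields; this shape is an assumption.
type App struct {
	Slug       string
	Published  bool
	System     bool
	Deployable bool
}

// FilterPublished (hypothetical helper) keeps the apps a marketplace
// customer should see: published, not system, and deployable.
func FilterPublished(apps []App) []App {
	out := make([]App, 0, len(apps))
	for _, a := range apps {
		if a.Published && !a.System && a.Deployable {
			out = append(out, a)
		}
	}
	return out
}

func main() {
	apps := []App{
		{Slug: "app-a", Published: true, Deployable: true},
		{Slug: "app-b", Published: false, Deployable: true},              // unpublished by the operator
		{Slug: "app-c", Published: true, System: true, Deployable: true}, // system app
		{Slug: "app-d", Published: true, Deployable: false},              // not deployable
	}
	for _, a := range FilterPublished(apps) {
		fmt.Println(a.Slug)
	}
}
```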
Emrah Baysal	9519c1ef00	merge: Group L testing (Playwright e2e smoke tests, Hetzner provisioning test scaffold gated on HETZNER_TEST_TOKEN secret, integration tests for bootstrap installer + Dynadot + voucher)	2026-04-28 14:05:59 +02:00
hatiyildiz	7edf63ca7e	docs(franchise),test(billing): voucher CRD propagation invariant #118 verifies that the voucher shape on a franchised Sovereign is identical to Catalyst-Zero. Two artefacts: 1. New §"Voucher shape propagates automatically" in docs/FRANCHISE-MODEL.md explaining WHY there is no propagation problem to solve: vouchers are not a CRD. They are rows in the per-Sovereign billing service's Postgres database, and every Sovereign runs the same SHA-pinned core/services/billing image. Same image → same migration → same schema → same handlers → same shape. The doc lists which file owns each part of the shape and includes a 4-step curl smoke test to run on any Sovereign at first-provisioning to confirm the invariant holds. 2. New core/services/billing/handlers/vouchers_test.go covering the public POST /billing/vouchers/redeem-preview endpoint added in #117. Four cases: - 404 on unknown / soft-deleted code (no tombstone leak) - 200 on a valid live code, asserting the public shape excludes times_redeemed and max_redemptions (defence-in-depth against enumeration) - 410 Gone on a code that exists but has hit its cap, with the credit/description still in the response so the landing page can show "campaign ended" - 400 on whitespace-only input The tests run on every CI build of the billing service, on every Sovereign that builds from this repo. If a future change drifts the preview endpoint's shape, the tests fail before the regression can ship. Also tidies vouchers.go imports (removed two unused stdlib imports that were placeholders). Closes #118.	2026-04-28 13:59:31 +02:00
hatiyildiz	12387a4a74	feat(billing): /billing/vouchers/{issue,list,revoke,redeem-preview} surface #117 adds a franchise-aligned URL surface for the existing PromoCode voucher implementation, plus one new endpoint (redeem-preview) for the public landing flow described in docs/FRANCHISE-MODEL.md §3. The orchestrator's hint was right — the issue/list/revoke handlers already exist (AdminUpsertPromo / AdminListPromos / AdminDeletePromo on the legacy /billing/admin/promos surface). This commit: 1. Adds new endpoint handlers in core/services/billing/handlers/vouchers.go: - POST /billing/vouchers/issue (superadmin or sovereign-admin) - GET /billing/vouchers/list (superadmin or sovereign-admin) - DELETE /billing/vouchers/revoke/{code} (superadmin or sovereign-admin) - POST /billing/vouchers/redeem-preview (unauthenticated; public) The first three reuse the existing store-layer methods. The last is new — it validates a code without consuming it, returning a safe shape (no times_redeemed, no max_redemptions exposure) so an attacker scraping the public endpoint cannot enumerate cap status. 2. Distinguishes 404 (code never existed or soft-deleted — same tombstone-leak protection as #91) from 410 Gone (code exists but is inactive or capped). The 410 body still includes the credit and description so the landing page can show "this campaign has ended". 3. Keeps the legacy /billing/admin/promos endpoints in place — the existing admin UI continues to work without any breaking change. New code should target /billing/vouchers/... 4. Updates docs/FRANCHISE-MODEL.md to point to the new URL surface. The actual REDEMPTION still happens transactionally inside POST /billing/checkout via the `promo_code` field — that path locks the promo row, inserts the promo_redemptions edge, increments times_redeemed, and adds the credit_ledger entry in one transaction. Splitting it into a separate /redeem endpoint would break that atomicity, so we deliberately do not add one. 
The public redeem flow is preview → signup → checkout-with-promo_code. Closes #117.	2026-04-28 13:54:19 +02:00
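The status-code rules for the redeem-preview endpoint (400 on blank input, 404 on unknown or soft-deleted codes, 410 on inactive or capped codes, 200 otherwise) can be sketched as a pure decision function. Everything named here is hypothetical: the `Voucher` struct, the `PreviewStatus` function, the lookup callback, and the assumption that a `MaxRedemptions` of 0 means "no cap". The real handler lives in vouchers.go and may be structured differently.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// Voucher is an assumed, simplified view of a promo-code row.
type Voucher struct {
	Deleted        bool // soft-deleted rows must look like "never existed"
	Active         bool
	TimesRedeemed  int
	MaxRedemptions int // assumption: 0 means uncapped
}

// PreviewStatus (hypothetical) maps a submitted code to the HTTP status
// the preview endpoint is described as returning.
func PreviewStatus(code string, lookup func(string) *Voucher) int {
	code = strings.TrimSpace(code)
	if code == "" {
		return http.StatusBadRequest // whitespace-only input
	}
	v := lookup(code)
	if v == nil || v.Deleted {
		return http.StatusNotFound // no tombstone leak
	}
	capped := v.MaxRedemptions > 0 && v.TimesRedeemed >= v.MaxRedemptions
	if !v.Active || capped {
		return http.StatusGone // exists, but the campaign has ended
	}
	return http.StatusOK
}

func main() {
	rows := map[string]*Voucher{
		"LIVE":    {Active: true, MaxRedemptions: 5, TimesRedeemed: 1},
		"CAPPED":  {Active: true, MaxRedemptions: 5, TimesRedeemed: 5},
		"DELETED": {Active: true, Deleted: true},
	}
	lookup := func(code string) *Voucher { return rows[code] }
	for _, code := range []string{"LIVE", "CAPPED", "DELETED", "UNKNOWN", "   "} {
		fmt.Printf("%q -> %d\n", code, PreviewStatus(code, lookup))
	}
}
```

Returning the same 404 for unknown and soft-deleted codes is what prevents a caller from learning that a code once existed.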
hatiyildiz	3e956b7d81	test: voucher issuance integration test — real Postgres (#147 ) Closes the Group L "integration test — voucher issuance via API — issue → redeem → Org created path" ticket. Per docs/INVIOLABLE-PRINCIPLES.md principle #2 (no mocks where the test would otherwise verify real behavior), this test runs against a real PostgreSQL — not sqlmock. The voucher mechanic lives in store.RedeemPromoCode which runs a transaction with SELECT FOR UPDATE on promo_codes, COUNT lookup on promo_redemptions, and inserts into credit_ledger. Mocking SQL strings doesn't verify whether the transactional invariants actually hold under concurrent contention; this codebase has been bitten by exactly that gap before (#93: counter incremented before order was committed). The test is gated on BILLING_TEST_PG_URL — when unset, it skips (NOT mocks). CI populates it via the new postgres service container in .github/workflows/test-billing-integration.yaml. Each test gets its own Postgres schema (via CREATE SCHEMA + libpq's options=-c search_path) so parallel runs don't cross-contaminate, and so goroutine concurrency tests reliably hit the same schema regardless of which pooled connection they pick up. Coverage: - Issue → Redeem → Credit applied (the canonical happy path) - Per-customer double-redemption blocked - Redemption cap enforced under concurrency (12 goroutines fighting for a 5-cap voucher → exactly 5 successful redemptions, no more) - Soft-deleted codes rejected as "not found" (no tombstone leak per #91) - Inactive codes rejected with distinct "not active" error - Two different customers can each redeem the same voucher - Org-creation prerequisites: customer.tenant_id non-empty, balance > 0 (these are the inputs the downstream tenant.created event consumer feeds into CreateTenant — covered by tenant-service consumer_test.go) CI workflow added: .github/workflows/test-billing-integration.yaml runs the tests against a postgres:16-alpine service container with -race. 
Refs #147 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 13:53:43 +02:00
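The cap-under-contention invariant from the integration test above (12 goroutines competing for a 5-cap voucher, exactly 5 successes) can be illustrated in memory. Note the substitution: the real test deliberately runs against Postgres and relies on SELECT FOR UPDATE, whereas this sketch uses a `sync.Mutex` as a stand-in for the row lock. The `Voucher` type and the `Contend` helper are hypothetical and only demonstrate the invariant, not the store code.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

var (
	ErrCapReached      = errors.New("voucher cap reached")
	ErrAlreadyRedeemed = errors.New("customer already redeemed this voucher")
)

// Voucher is an in-memory stand-in for a promo row plus its redemption edges.
type Voucher struct {
	mu             sync.Mutex // plays the role of the row lock
	maxRedemptions int
	redeemedBy     map[string]bool
}

func NewVoucher(maxRedemptions int) *Voucher {
	return &Voucher{maxRedemptions: maxRedemptions, redeemedBy: map[string]bool{}}
}

// Redeem checks both invariants and records the redemption under one lock,
// so the check and the write cannot interleave with another caller.
func (v *Voucher) Redeem(customerID string) error {
	v.mu.Lock()
	defer v.mu.Unlock()
	if v.redeemedBy[customerID] {
		return ErrAlreadyRedeemed
	}
	if len(v.redeemedBy) >= v.maxRedemptions {
		return ErrCapReached
	}
	v.redeemedBy[customerID] = true
	return nil
}

// Contend starts n goroutines, each redeeming as a distinct customer,
// and returns how many succeeded.
func Contend(v *Voucher, n int) int {
	var wg sync.WaitGroup
	var ok int32
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			if v.Redeem(fmt.Sprintf("customer-%d", id)) == nil {
				atomic.AddInt32(&ok, 1)
			}
		}(i)
	}
	wg.Wait()
	return int(ok)
}

func main() {
	fmt.Println("successful redemptions:", Contend(NewVoucher(5), 12))
}
```

If the cap check and the write were not covered by the same lock, two goroutines could both observe four redemptions and both succeed, which is the class of bug the commit says mocked SQL strings cannot catch.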
hatiyildiz	fabedd42c1	feat(admin,billing): per-Sovereign voucher issuance for sovereign-admin #115 extends the existing PromoCode (voucher) admin surface so a sovereign-admin role can issue, list, and revoke vouchers on a franchised Sovereign. No new endpoints, no new schema, no new CRD — all the changes are role-gating widenings on the existing surface. Backend (core/services/billing/handlers/handlers.go): - New `requireVoucherIssuer` helper accepts both `superadmin` and `sovereign-admin`. Used by AdminListPromos, AdminUpsertPromo, and AdminDeletePromo only. All other admin endpoints (Stripe settings, revenue, orders) keep the existing `requireAdmin` (superadmin-only). UI (core/admin/src/components/AdminShell.svelte + BillingPage.svelte): - AdminShell now accepts both roles. Sidebar nav is filtered by role: superadmin sees Revenue / Catalog / Tenants / Orders / Billing; sovereign-admin sees only Billing. Filtering is via a `superadminOnly` flag on each nav item (defence-in-depth: even if a sovereign-admin guesses a URL, the backend's requireAdmin will return 403). - BillingPage hides the Stripe Configuration section for sovereign-admin (it would 403 from GET /billing/admin/settings anyway). The Vouchers (Promo Codes) section is shown to both roles with a small label tweak ("Issued vouchers are scoped to this Sovereign" for sovereign-admin). Per docs/INVIOLABLE-PRINCIPLES.md §1 (target-state shape, no MVP) and §3 (follow documented architecture exactly) — this matches the FRANCHISE-MODEL.md design where "every franchised Sovereign runs the same admin app" with role-based gating. Closes #115.	2026-04-28 13:52:19 +02:00
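The role-gating split in the commit above (voucher endpoints open to both `superadmin` and `sovereign-admin`, all other admin endpoints superadmin-only) can be sketched as two small checks. The function names and signatures below are assumptions for illustration; the source does not show how `requireVoucherIssuer` and `requireAdmin` are actually declared.

```go
package main

import (
	"fmt"
	"net/http"
)

// Role names as they appear in the commit message.
const (
	RoleSuperadmin     = "superadmin"
	RoleSovereignAdmin = "sovereign-admin"
)

// adminStatus (hypothetical) models the superadmin-only gate used by
// endpoints such as Stripe settings, revenue, and orders.
func adminStatus(role string) int {
	if role == RoleSuperadmin {
		return http.StatusOK
	}
	return http.StatusForbidden
}

// voucherIssuerStatus (hypothetical) models the widened gate used only by
// the voucher list, upsert, and delete handlers.
func voucherIssuerStatus(role string) int {
	switch role {
	case RoleSuperadmin, RoleSovereignAdmin:
		return http.StatusOK
	default:
		return http.StatusForbidden
	}
}

func main() {
	for _, role := range []string{RoleSuperadmin, RoleSovereignAdmin, "member"} {
		fmt.Printf("%-16s admin=%d vouchers=%d\n",
			role, adminStatus(role), voucherIssuerStatus(role))
	}
}
```

The UI-side nav filtering described in the commit is a convenience only; the backend gate is what returns 403 when a sovereign-admin requests a superadmin-only URL directly.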
hatiyildiz	7646840ffe	feat(consolidation): move 8 SME backend services + shared module to public repo Per docs/PROVISIONING-PLAN.md and tickets [B] sme-backend group. Migrates the 8 Go backend services from openova-private/services/ to openova/core/services/, plus the shared module they all depend on, plus the services-build CI workflow. What moved: - services/auth → core/services/auth (Go HTTP service for SME marketplace authentication) - services/billing → core/services/billing (Go HTTP service for billing + voucher backend) - services/catalog → core/services/catalog (Go HTTP service for App catalog) - services/domain → core/services/domain (Go HTTP service for tenant domain mapping) - services/gateway → core/services/gateway (Go HTTP gateway with rate limiting) - services/notification → core/services/notification (Go HTTP service with email templates) - services/provisioning → core/services/provisioning (Go HTTP service that commits tenant Application manifests via Gitea/GitHub API) - services/tenant → core/services/tenant (Go HTTP service for tenant lifecycle) - services/shared → core/services/shared (shared Go module: db, events, health, middleware, respond) - 9 go.mod files updated: module github.com/openova-io/openova-private/services/<X> → github.com/openova-io/openova/core/services/<X> - 9 go.sum and import paths similarly updated - replace directives updated: openova-private/services/shared → openova/core/services/shared - sme-services-build.yaml workflow → services-build.yaml in .github/workflows/, paths/context/image-base/deploy paths all repointed at core/services + ghcr.io/openova-io/openova/services-* + products/catalyst/chart/templates/sme-services - All 8 manifests in products/catalyst/chart/templates/sme-services/ updated: image refs ghcr.io/openova-io/openova-private/sme-{X} → ghcr.io/openova-io/openova/services-{X} - provisioning.yaml GITHUB_REPO env var: "openova-private" → "openova" Closes [B] sme-backend (10 tickets). 
After this commit, all 14 user-facing + backend Catalyst-Zero modules build from this public repo: - 4 UIs: console, admin, marketplace, catalyst-ui - 2 backends: marketplace-api, catalyst-api - 8 SME services: auth, billing, catalog, domain, gateway, notification, provisioning, tenant - 1 shared Go module Note: 1 line in core/services/provisioning/main.go retains a literal default of "openova-private" for the GITHUB_REPO fallback when env var is unset; the K8s manifest sets GITHUB_REPO=openova explicitly so this path is never exercised in the deployed runtime, and the in-code default will be cleaned up in a follow-up.	2026-04-28 12:30:32 +02:00

36 Commits