openova/platform/opentelemetry-operator
e3mrah f18dd8df19
feat(bp-opentelemetry-operator): scaffold operator + default Instrumentation CR (slice H5, #1095) (#1121)
New platform/opentelemetry-operator/ Blueprint scaffold per design doc
§3.9 row 5. Companion to existing bp-opentelemetry (the collector) —
this Blueprint ships the OPERATOR that auto-injects OTel SDK sidecars
into Pods based on annotations:

  instrumentation.opentelemetry.io/inject-{java|nodejs|python|dotnet}: "default"

Two-Blueprint split is intentional: collector and operator are separate
upgrade cycles. Mixing them risks coupling observability cadence to
auto-instrumentation cadence, and the operator's mutating admission
webhook intercepts every Pod creation cluster-wide so misconfiguration
is high-blast-radius.

What ships:
- platform/opentelemetry-operator/README.md — activation contract
- platform/opentelemetry-operator/blueprint.yaml — bp-opentelemetry-operator 1.0.0
- platform/opentelemetry-operator/chart/Chart.yaml — wraps upstream
  opentelemetry-operator:0.61.0 from open-telemetry-helm-charts.
  Subchart `condition: enabled` — default-off skips it entirely.
- platform/opentelemetry-operator/chart/values.yaml — gate, default
  Instrumentation CR config (exporterEndpoint, sampler, per-language
  toggles), upstream subchart values (manager.collectorImage.repository
  required, serviceAccount, cert-manager-backed admission webhook)
- platform/opentelemetry-operator/chart/templates/instrumentation-default.yaml
  — Catalyst overlay Instrumentation CR with parentbased_traceidratio
  sampler @ 0.25 default, propagators (tracecontext + baggage + b3),
  per-language injection toggles. Default OFF; namespace = cilium by
  default (operator overrides per Sovereign).

Default-OFF for both layers:
- .Values.enabled: false → upstream subchart's `condition: enabled`
  also fires, so 0 resources rendered total
- Even after .Values.enabled=true, the Catalyst Instrumentation CR
  is gated again by .Values.defaultInstrumentation.enabled=false so
  installing the chart doesn't auto-inject anywhere

Per docs/INVIOLABLE-PRINCIPLES.md #4 every parameter (sampler ratio,
exporter endpoint, per-language toggles, namespace) is in values.yaml.

Validated:
- helm dependency build pulls upstream cleanly
- helm template with default values: 0 resources rendered
- helm template with enabled=true defaultInstrumentation.enabled=true:
  22 resources rendered (upstream operator manager Deployment, CRDs,
  RBAC, mutating + validating webhooks, cert-manager Issuer +
  Certificate, plus the Catalyst Instrumentation CR)

Out of scope for this slice:
- Add this Blueprint to clusters/_template/bootstrap-kit/ — EPIC-5
  (#1100) sequences both bp-opentelemetry (collector first) and this
  Blueprint as part of the observability roll-out
- Per-Application Instrumentation CRs from Blueprint.spec.observability.
  traces=otlp — application-controller (slice C4 of #1095) renders
  those at install time

Refs: #1094, #1095, #1100, docs/EPICS-1-6-unified-design.md §3.9 row 5
+ §8.4 (EPIC-5 Networking).

Co-authored-by: hatiyildiz <hatiyildiz@noreply.openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 23:06:29 +04:00
chart/
blueprint.yaml
README.md

bp-opentelemetry-operator

Status: Phase-0 scaffold (#1095 slice H5). Activated by EPIC-5 (#1100). Updated: 2026-05-08

The OpenTelemetry Operator. Provides the Instrumentation CRD, through which the operator injects OTel SDK auto-instrumentation into annotated Pods:

metadata:
  annotations:
    instrumentation.opentelemetry.io/inject-java: "true"
    # or inject-dotnet / inject-nodejs / inject-python

When the annotation is set, the operator's mutating admission webhook adds an init container that copies the OTel SDK into a shared volume and edits the main container's env vars (OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_RESOURCE_ATTRIBUTES, OTEL_TRACES_EXPORTER, etc.) to point at the collector deployed by bp-opentelemetry.
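
For context, here is a minimal sketch of where the annotation sits on a workload. The Deployment name, labels, and image are hypothetical. The annotation goes on the Pod template (spec.template.metadata) rather than on the Deployment's own metadata, because the webhook acts on Pod creation.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-api              # hypothetical workload name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: orders-api
  template:
    metadata:
      labels:
        app: orders-api
      annotations:
        # "true" selects the Instrumentation CR found in the Pod's own namespace;
        # a CR name such as "default" selects that CR explicitly.
        instrumentation.opentelemetry.io/inject-java: "true"
    spec:
      containers:
        - name: app
          image: registry.example.com/orders-api:1.0.0   # hypothetical image
```

With injection active, the created Pod should show an extra init container and the OTEL_* environment variables on the app container, which can be checked by inspecting the Pod spec after creation.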

This Blueprint is separate from bp-opentelemetry. The latter is the collector (DaemonSet/Deployment scraping + forwarding to Tempo/Loki/Mimir); this one is the operator that injects per-Pod instrumentation. Two distinct upgrade cycles, two distinct opt-ins.

What it ships

| Template | Effect |
| --- | --- |
| Upstream opentelemetry-operator Helm subchart | The operator Pod + Instrumentation CRD. |
| instrumentation-default.yaml | A default Instrumentation CR named default in each Org namespace. Operator + per-Org overlays opt in to Java/.NET/Node/Python auto-injection. |
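
The chart template itself is not reproduced on this page. As a rough sketch, a rendered default Instrumentation CR could look like the following, assuming the sampler and propagators named in the commit description and the endpoint and Java image from the activation contract. The namespace is hypothetical; field names follow the upstream opentelemetry.io/v1alpha1 Instrumentation API.

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: default
  namespace: org-example          # hypothetical Org namespace
spec:
  exporter:
    # Collector Service created by bp-opentelemetry
    endpoint: http://opentelemetry-collector.monitoring.svc:4317
  propagators:
    - tracecontext
    - baggage
    - b3
  sampler:
    type: parentbased_traceidratio
    argument: "0.25"              # the API expects the ratio as a string
  java:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:latest
```

Only the Java section is shown; the other languages would follow the same shape under nodejs, python, and dotnet.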

Activation contract

# values.yaml override (or per-Sovereign overlay)
enabled: true
defaultInstrumentation:
  enabled: true
  # Where the auto-injected SDK ships traces/logs/metrics. The collector
  # Service is created by bp-opentelemetry; this references it.
  exporter:
    endpoint: http://opentelemetry-collector.monitoring.svc:4317
  java: { enabled: true, image: "ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:latest" }
  nodejs: { enabled: true }
  python: { enabled: true }
  dotnet: { enabled: false }

When enabled: false (the default), no resources render: installing this chart is a no-op until a human operator opts in through values.
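
The default-off behaviour rests on Helm's dependency `condition` field. A minimal sketch of the relevant part of Chart.yaml, assuming the upstream chart version given in the commit description; the chart name, chart version, and repository URL shown here are assumptions, not copied from the repository.

```yaml
apiVersion: v2
name: bp-opentelemetry-operator   # assumed chart name
version: 1.0.0                    # assumed to match the Blueprint version
dependencies:
  - name: opentelemetry-operator
    version: 0.61.0
    # Assumed to be the public open-telemetry-helm-charts repository
    repository: https://open-telemetry.github.io/opentelemetry-helm-charts
    # Helm skips the subchart entirely while .Values.enabled is false
    condition: enabled
```

Per the commit description, the Catalyst Instrumentation CR is gated a second time by defaultInstrumentation.enabled, so setting enabled: true alone installs the operator without injecting anywhere.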

Why default-OFF

  1. The Operator's mutating admission webhook intercepts every Pod creation in the cluster. A misconfigured CR can break workloads cluster-wide.
  2. The Instrumentation CR ties traces to a collector endpoint — bp-opentelemetry (collector) must be reconciled FIRST and reachable on the configured Service URL.
  3. EPIC-5 (#1100) sequences both: collector first, exporters wired (Tempo/Loki/Mimir), then operator + Instrumentation CR.

References