openova/platform/powerdns
e3mrah ce76a7b7ab
fix(bp-powerdns): root-cause Job DeadlineExceeded recurrence (post Fix #144) (#1425)
Fix #144 raised zoneBootstrap.activeDeadlineSeconds 300s → 840s after
prov #22 hit a 5m DeadlineExceeded on the bp-powerdns post-install hook.
That fix was insufficient: prov #37 + #38 (chroot omantel.biz, 2026-05-12)
both wedged on the SAME chart slot with `BackoffLimitExceeded`, NOT
`DeadlineExceeded`. The deadline never got a chance to fire.

Trace from prov #38 chroot (`KUBECONFIG=/tmp/prov38.kubeconfig kubectl
get hr bp-powerdns -o yaml`):

  status:
    Helm install failed for release powerdns/powerdns with chart
    bp-powerdns@1.2.2: failed post-install: 1 error occurred:
      * job powerdns-zone-bootstrap failed: BackoffLimitExceeded

Pod events for powerdns-zone-bootstrap-tq7qq:
  59m Started container zone-bootstrap
  56m Back-off restarting failed container zone-bootstrap
  55m Job has reached the specified backoff limit

Root cause walked end-to-end (per CLAUDE.md TRACE rule):

  TEST: bp-powerdns HR Ready=True
    ↑
  HR: Helm install succeeds (post-install Job exits 0)
    ↑
  Zone-bootstrap Job: curl POST succeeds
    ↑
  powerdns:8081 Service: reachable (has Ready endpoints)
    ↑
  powerdns Deployment: Pods Ready (3 replicas)  ← Pending, blocked here
    ↑
  CNPG cluster: pdns-pg-app Secret exists
    ↑
  pdns-pg-1-initdb Pod: scheduled, Running, Completed  ← Pending too
    ↑
  Worker node has capacity                              ← 99% CPU requested

The zone-bootstrap container curl'd `http://powerdns:8081`, hit
"connection refused" (empty Service endpoints), exited 7, container
restarted under `restartPolicy: OnFailure`. After 6 Kubernetes-level
backoffs (≈10min wall-time with exponential delay), the Job declared
`BackoffLimitExceeded` — well before activeDeadlineSeconds=840s
(14min) could even consider firing.

Fix #144 was directionally right (the upstream IS slow on cold k3s) but
operated on the wrong knob. The container's outer-loop retry budget is
bounded by backoffLimit × backoff-delay, not by activeDeadlineSeconds.
Bumping only the deadline left the BackoffLimit ceiling unchanged.

Architectural fix (this commit):

1. Move the wait-for-API loop INSIDE the container (one Pod, one inner
   poll loop, restartPolicy=Never). The inner loop polls
   GET /api/v1/servers every 10s until HTTP 200, bounded by new
   `apiReadyTimeoutSeconds` (default 600s = 10min). Now ONE container
   run owns the full wait budget instead of N short-lived containers
   racing the backoff timer.

2. restartPolicy: OnFailure → Never. The container script handles its
   own retry; Kubernetes-level backoff is reserved for genuinely
   transient pod failures (image-pull, OS eviction) where the Job-level
   backoffLimit=6 still triggers a fresh Pod.

3. Surface POWERDNS_API_READY_TIMEOUT_S env var so operators on slower
   clusters can raise the inner deadline without forking the chart
   (per docs/INVIOLABLE-PRINCIPLES.md #4).

4. New value `zoneBootstrap.apiReadyTimeoutSeconds` (default 600s).
   Sits below activeDeadlineSeconds (840s) so the zone-creation phase
   keeps ≥240s of headroom AFTER the API comes Ready.

Curl status handling in the wait loop:
  200          → API up, proceed to bootstrap
  401|403      → auth failure, FATAL (no retry — operator misconfig)
  000|5xx|...  → transient, sleep & retry until inner deadline

Files changed:
- platform/powerdns/chart/Chart.yaml         1.2.2 → 1.2.3 + history
- platform/powerdns/chart/values.yaml        + apiReadyTimeoutSeconds knob
- platform/powerdns/chart/templates/
    zone-bootstrap-job.yaml                  inner wait-for-API loop;
                                              restartPolicy: Never
- clusters/_template/bootstrap-kit/
    11-powerdns.yaml                         pin to 1.2.3 + HR comment

Why this is sufficient where Fix #144 was not:

Fix #144 worked the chart-level deadline. This commit works the
inner-loop ownership — the wait budget is now owned by the script
inside the container, not by the Job spec arithmetic
(backoffLimit × backoff-delay). The Job's outer activeDeadlineSeconds
still caps the worst-case runtime (no runaway poll), but the script
now actually GETS to use it.

Verification:
- helm template renders cleanly (deps build OK, empty-zones short-
  circuit preserved, non-empty zones render Job + RBAC + Audit CM)
- kubectl create --dry-run=client --validate=false: 5/5 resources
  created (sa, role, rb, cm, job)
- chart 1.2.3 pinned in clusters/_template/bootstrap-kit/11-powerdns.yaml

Companion infrastructure note (NOT addressed by this commit, flagged
for Coordinator):

The DEEPER bottom of the trace stack is worker capacity. Prov #38's
single cpx32 worker (8 vCPU / 16 GB) is at 99% CPU requested. The
cluster-autoscaler attempted 2→3 scale-up but is in backoff because
two unscheduled pods (gitea/gitea-* PV affinity conflict from a
previous wedged install; trivy-system/node-collector NodeAffinity)
poison the autoscaler's "can the template node fit" check. Even with
this chart fix in place, the powerdns Deployment cannot become Ready
until either:
  (a) the worker autoscales successfully (gitea PV migrated / trivy
      taints relaxed), or
  (b) worker_count is bumped from 2 to 3 in the provisioning body, or
  (c) qa_worker_size is bumped to cpx42.

This chart fix ensures bp-powerdns survives a slow CNPG cold-start.
It does NOT fix a fundamentally undersized cluster. Coordinator next
step: reprov with worker_count=3 OR qa_worker_size=cpx42 + this chart
landed. Either should converge.

Co-authored-by: e3mrah <1234567+e3mrah@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 02:13:34 +04:00
..
chart fix(bp-powerdns): root-cause Job DeadlineExceeded recurrence (post Fix #144) (#1425) 2026-05-12 02:13:34 +04:00
blueprint.yaml feat(catalyst-chart): land Blueprint CRD + fix 5 string-form depends (slice B4, #1095) (#1112) 2026-05-08 22:25:08 +04:00