Fix#144 raised zoneBootstrap.activeDeadlineSeconds 300s → 840s after
prov #22 hit a 5m DeadlineExceeded on the bp-powerdns post-install hook.
That fix was insufficient: prov #37 + #38 (chroot omantel.biz, 2026-05-12)
both wedged on the SAME chart slot with `BackoffLimitExceeded`, NOT
`DeadlineExceeded`. The deadline never got a chance to fire.
Trace from prov #38 chroot (`KUBECONFIG=/tmp/prov38.kubeconfig kubectl
get hr bp-powerdns -o yaml`):
status:
Helm install failed for release powerdns/powerdns with chart
bp-powerdns@1.2.2: failed post-install: 1 error occurred:
* job powerdns-zone-bootstrap failed: BackoffLimitExceeded
Pod events for powerdns-zone-bootstrap-tq7qq:
59m Started container zone-bootstrap
56m Back-off restarting failed container zone-bootstrap
55m Job has reached the specified backoff limit
Root cause walked end-to-end (per CLAUDE.md TRACE rule):
TEST: bp-powerdns HR Ready=True
↑
HR: Helm install succeeds (post-install Job exits 0)
↑
Zone-bootstrap Job: curl POST succeeds
↑
powerdns:8081 Service: reachable (has Ready endpoints)
↑
powerdns Deployment: Pods Ready (3 replicas) ← Pending, blocked here
↑
CNPG cluster: pdns-pg-app Secret exists
↑
pdns-pg-1-initdb Pod: scheduled, Running, Completed ← Pending too
↑
Worker node has capacity ← 99% CPU requested
The zone-bootstrap container curl'd `http://powerdns:8081`, hit
"connection refused" (empty Service endpoints), exited 7, container
restarted under `restartPolicy: OnFailure`. After 6 Kubernetes-level
backoffs (≈10min wall-time with exponential delay), the Job declared
`BackoffLimitExceeded` — well before activeDeadlineSeconds=840s
(14min) could even consider firing.
Fix#144 was directionally right (the upstream IS slow on cold k3s) but
operated on the wrong knob. The container's outer-loop retry budget is
bounded by backoffLimit × backoff-delay, not by activeDeadlineSeconds.
Bumping only the deadline left the BackoffLimit ceiling unchanged.
Architectural fix (this commit):
1. Move the wait-for-API loop INSIDE the container (one Pod, one inner
poll loop, restartPolicy=Never). The inner loop polls
GET /api/v1/servers every 10s until HTTP 200, bounded by new
`apiReadyTimeoutSeconds` (default 600s = 10min). Now ONE container
run owns the full wait budget instead of N short-lived containers
racing the backoff timer.
2. restartPolicy: OnFailure → Never. The container script handles its
own retry; Kubernetes-level backoff is reserved for genuinely
transient pod failures (image-pull, OS eviction) where the Job-level
backoffLimit=6 still triggers a fresh Pod.
3. Surface POWERDNS_API_READY_TIMEOUT_S env var so operators on slower
clusters can raise the inner deadline without forking the chart
(per docs/INVIOLABLE-PRINCIPLES.md #4).
4. New value `zoneBootstrap.apiReadyTimeoutSeconds` (default 600s).
Sits below activeDeadlineSeconds (840s) so the zone-creation phase
keeps ≥240s of headroom AFTER the API comes Ready.
Curl status handling in the wait loop:
200 → API up, proceed to bootstrap
401|403 → auth failure, FATAL (no retry — operator misconfig)
000|5xx|... → transient, sleep & retry until inner deadline
Files changed:
- platform/powerdns/chart/Chart.yaml 1.2.2 → 1.2.3 + history
- platform/powerdns/chart/values.yaml + apiReadyTimeoutSeconds knob
- platform/powerdns/chart/templates/
zone-bootstrap-job.yaml inner wait-for-API loop;
restartPolicy: Never
- clusters/_template/bootstrap-kit/
11-powerdns.yaml pin to 1.2.3 + HR comment
Why this is sufficient where Fix#144 was not:
Fix#144 worked the chart-level deadline. This commit works the
inner-loop ownership — the wait budget is now owned by the script
inside the container, not by the Job spec arithmetic
(backoffLimit × backoff-delay). The Job's outer activeDeadlineSeconds
still caps the worst-case runtime (no runaway poll), but the script
now actually GETS to use it.
Verification:
- helm template renders cleanly (deps build OK, empty-zones short-
circuit preserved, non-empty zones render Job + RBAC + Audit CM)
- kubectl create --dry-run=client --validate=false: 5/5 resources
created (sa, role, rb, cm, job)
- chart 1.2.3 pinned in clusters/_template/bootstrap-kit/11-powerdns.yaml
Companion infrastructure note (NOT addressed by this commit, flagged
for Coordinator):
The DEEPER bottom of the trace stack is worker capacity. Prov #38's
single cpx32 worker (8 vCPU / 16 GB) is at 99% CPU requested. The
cluster-autoscaler attempted 2→3 scale-up but is in backoff because
two unscheduled pods (gitea/gitea-* PV affinity conflict from a
previous wedged install; trivy-system/node-collector NodeAffinity)
poison the autoscaler's "can the template node fit" check. Even with
this chart fix in place, the powerdns Deployment cannot become Ready
until either:
(a) the worker autoscales successfully (gitea PV migrated / trivy
taints relaxed), or
(b) worker_count is bumped from 2 to 3 in the provisioning body, or
(c) qa_worker_size is bumped to cpx42.
This chart fix ensures bp-powerdns survives a slow CNPG cold-start.
It does NOT fix a fundamentally undersized cluster. Coordinator next
step: reprov with worker_count=3 OR qa_worker_size=cpx42 + this chart
landed. Either should converge.
Co-authored-by: e3mrah <1234567+e3mrah@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>