Patching Kubernetes Node Images Without Downtime: A Practical Walkthrough

Patching a Kubernetes node pool sounds like a solved problem until you're staring at a PodDisruptionBudget that allows exactly one pod unavailable and a StatefulSet running three replicas with persistent volumes attached to zone-specific EBS volumes. At that point, "rolling update" becomes a precisely choreographed sequence of drain, replace, and verify operations — and the margin for mistakes is four minutes before the load balancer starts sending 502s to your health check endpoint.

This walkthrough covers the complete process for updating node images on EKS, AKS, and self-managed clusters: pre-patch inventory, PDB validation, controlled drain procedures, the replacement order for stateful vs stateless workloads, and post-patch verification steps that confirm you've actually applied the fix — not just replaced nodes.

Why Node Image Patching Is Different from Package Patching

On a traditional Linux server, patching means running apt-get upgrade or yum update, verifying the service restarts cleanly, and closing the ticket. On a Kubernetes node, the node itself is intended to be immutable — the operating model assumes you replace nodes rather than modify them in place. This means patching a kernel vulnerability on an EKS node requires creating new node group instances from an updated AMI, draining the existing nodes, and deleting them — not SSHing in to run apt-get upgrade.

Package-level patching on live nodes is also supported and sometimes necessary for rapid response to a critical CVE when a new AMI isn't yet available from your cloud provider. But it creates configuration drift: your node's running state no longer matches the base image, which breaks reproducibility and can cause issues during auto-scaling events. The standard practice is AMI-based replacement for routine patching, with in-place package updates reserved for emergency situations where waiting for an updated AMI is operationally infeasible.

Pre-Patch Inventory: Know Your PDBs Before You Drain Anything

The most common mistake in Kubernetes node patching is starting the drain operation before auditing PodDisruptionBudgets. A kubectl drain that hits a PDB conflict will retry the eviction indefinitely until you bypass the budget with --disable-eviction (which deletes pods directly instead of evicting them, ignoring the PDB) or manually resolve the conflict. Either outcome is worse than spending five minutes on pre-flight checks.

Run kubectl get pdb --all-namespaces and identify any PDB where ALLOWED DISRUPTIONS is 0. A zero means you cannot evict any pod covered by that budget right now, either because the number of ready pods already sits at the minAvailable threshold or because some pods are currently unhealthy. For each zero-disruption PDB, you need to understand why before proceeding. Common causes: a deployment has fewer ready replicas than its minAvailable count (someone scaled down for cost savings), or pods are stuck in CrashLoopBackOff and the owning team hasn't investigated yet.
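The audit can be scripted. A minimal sketch (the helper names are ours, not part of any official tooling): parse the JSON form of the same kubectl query and flag every budget with zero allowed disruptions.

```python
import json
import subprocess


def blocking_pdbs(pdb_json: dict) -> list[tuple[str, str]]:
    """Return (namespace, name) for every PDB that currently blocks eviction."""
    blocked = []
    for item in pdb_json.get("items", []):
        allowed = item.get("status", {}).get("disruptionsAllowed", 0)
        if allowed == 0:
            meta = item["metadata"]
            blocked.append((meta["namespace"], meta["name"]))
    return blocked


def fetch_pdbs() -> dict:
    """Requires cluster access; mirrors `kubectl get pdb --all-namespaces -o json`."""
    raw = subprocess.check_output(
        ["kubectl", "get", "pdb", "--all-namespaces", "-o", "json"]
    )
    return json.loads(raw)
```

Run `blocking_pdbs(fetch_pdbs())` as a pre-flight gate and refuse to start draining while the list is non-empty.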

Also enumerate StatefulSets with persistent volumes. Run kubectl get statefulset --all-namespaces -o wide and check the READY column. For each StatefulSet, check which availability zone its PVCs are pinned to — an EBS volume in us-east-1a cannot be attached by a replacement node that schedules into us-east-1b. This matters when your new node group defaults to a different zone than your old group.

The EKS Node Group Replacement Procedure

AWS EKS updates node groups by launching new nodes from the updated launch template, then cordoning and draining the old nodes. The managed node group API handles this when you initiate an update from the console or CLI: aws eks update-nodegroup-version --cluster-name prod-cluster --nodegroup-name workers --release-version 1.30.2-20251001. AWS will respect your configured max unavailable count during the rollout.

What the managed update does not handle automatically: verifying that all daemonsets are healthy on new nodes before draining the next old node, checking that your stateful workloads have reconnected their persistent volumes after rescheduling, or waiting for application-layer health checks to pass (not just Kubernetes readiness probes). For production clusters, supplement the managed update with monitoring of your APM tool — Datadog, New Relic, or Prometheus-based — to confirm that p99 latency and error rates remain stable throughout the rollout.
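That supplemental monitoring can be reduced to a simple gate. A minimal sketch, assuming you can export p99 latency and error-rate samples from your APM tool during the observation window after each drain (the thresholds and helper name here are ours):

```python
def rollout_gate(samples: list[tuple[float, float]],
                 p99_limit_ms: float = 500.0,
                 error_rate_limit: float = 0.01) -> bool:
    """Return True only if every (p99_ms, error_rate) sample stays within limits.

    Call this after each node drain completes; a False result means halt the
    rollout before draining the next old node.
    """
    return all(p99 <= p99_limit_ms and err <= error_rate_limit
               for p99, err in samples)
```

The point is that the decision to continue is explicit and automated, rather than an operator eyeballing a dashboard between drains.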

For self-managed nodes, the procedure is more manual: launch new instances with the updated AMI in your auto-scaling group, wait for them to join the cluster and become Ready, then drain old nodes one at a time with kubectl drain NODE_NAME --ignore-daemonsets --delete-emptydir-data --pod-selector='role!=critical'. The --ignore-daemonsets flag is required because DaemonSet pods cannot be evicted (the DaemonSet controller would recreate them immediately); drain skips them, and the replacement node runs its own copies. The --delete-emptydir-data flag is needed for pods using emptyDir volumes, whose contents are destroyed on eviction: confirm that no critical data lives only in emptyDir before proceeding.

Handling Stateful Workloads During Node Drain

StatefulSets require the most care. When a pod from a StatefulSet is evicted from a draining node, Kubernetes reschedules it on another available node. If the PVC uses a zone-specific storage class (EBS, Azure Disk), the pod can only reschedule onto nodes in the same availability zone as the volume. If no such node is available, the pod sits in Pending indefinitely.

Before draining nodes in a zone, ensure you have replacement nodes ready in that same zone. In EKS, this means checking your node group's multi-zone configuration and confirming the new instances are healthy before you drain the old ones. For workloads using EFS or other zone-agnostic storage, this constraint doesn't apply — pods can reschedule across zones freely.

A practical safeguard for StatefulSet-heavy clusters: drain nodes one at a time and wait for the StatefulSet to return to its target ready count before proceeding to the next node. This is slower than draining multiple nodes in parallel, but it guarantees that you never reduce a StatefulSet's availability below its configured minimum. For a three-replica Kafka cluster with minAvailable=2, draining two nodes simultaneously would drop availability below the threshold and trigger a partition leader election that can cause consumer lag in downstream services.
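The drain-and-wait safeguard can be expressed as a small orchestration helper. A sketch with injected callables (names are ours), so the same logic works against kubectl, a client library wrapper, or a test double: `drain_node(node)` performs the eviction and `ready_replicas()` reports the StatefulSet's current ready count from its status.

```python
import time


def drain_one_at_a_time(nodes: list[str],
                        drain_node,
                        ready_replicas,
                        target: int,
                        timeout_s: float = 600.0,
                        poll_s: float = 5.0) -> None:
    """Drain nodes sequentially, waiting for workload recovery between drains.

    Raises TimeoutError if the StatefulSet does not return to `target` ready
    replicas within the timeout, halting the rollout before the next drain.
    """
    for node in nodes:
        drain_node(node)
        deadline = time.monotonic() + timeout_s
        while ready_replicas() < target:
            if time.monotonic() > deadline:
                raise TimeoutError(
                    f"StatefulSet never recovered after draining {node}")
            time.sleep(poll_s)
```

The timeout doubles as the rollback trigger: a drain that blows the recovery window stops the rollout with most of the old nodes still in service.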

Verifying the Patch Actually Applied

After completing the node replacement, confirm that the new nodes are running the expected AMI version and that the patched packages are actually present at the expected versions. For EKS, check the AMI ID on the new nodes against the release notes for the AMI version you targeted. For custom AMIs, run a post-patch scan with your vulnerability scanner to confirm the CVE is closed at the OS level.
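The AMI check is also scriptable. A sketch over the JSON from kubectl get nodes -o json (the helper is ours; EKS managed node groups record the AMI ID in the eks.amazonaws.com/nodegroup-image label, and the key would differ on other distributions):

```python
AMI_LABEL = "eks.amazonaws.com/nodegroup-image"  # EKS managed node groups


def unpatched_nodes(nodes_json: dict, expected_ami: str) -> list[str]:
    """List node names whose AMI label does not match the targeted release."""
    stale = []
    for node in nodes_json.get("items", []):
        labels = node.get("metadata", {}).get("labels", {})
        if labels.get(AMI_LABEL) != expected_ami:
            stale.append(node["metadata"]["name"])
    return stale
```

An empty list is necessary but not sufficient: it proves the nodes came from the right image, while the vulnerability scan proves the image actually closes the CVE.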

A common oversight: replacing nodes but not updating the default launch template or the ASG launch configuration. If your auto-scaling group still references the old AMI, scale-out events will launch unpatched nodes into your cluster. The fix is to update the ASG launch template as part of the patching process, not as an afterthought.
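Checking for that drift can be part of the same script. A sketch assuming boto3 clients are passed in (the helper name is ours; ASGs configured through a MixedInstancesPolicy nest the launch template reference differently and would need an extra branch):

```python
def current_lt_image(autoscaling, ec2, asg_name: str) -> str:
    """Resolve the AMI ID the ASG's launch template currently points at.

    `autoscaling` and `ec2` are boto3 clients for those services. Compare the
    result against the patched AMI ID: a mismatch means scale-out events will
    still launch unpatched nodes.
    """
    asg = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[asg_name])["AutoScalingGroups"][0]
    lt = asg["LaunchTemplate"]  # MixedInstancesPolicy ASGs nest this elsewhere
    versions = ec2.describe_launch_template_versions(
        LaunchTemplateId=lt["LaunchTemplateId"],
        Versions=[lt.get("Version", "$Default")])
    return versions["LaunchTemplateVersions"][0]["LaunchTemplateData"]["ImageId"]
```

Wiring this into the post-patch verification step turns "did we remember to update the launch template" from a checklist item into an automated check.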

Also verify your container base images. Patching the node OS closes kernel-level and host-level CVEs, but vulnerabilities in your container images are unaffected by node replacement. A patched host running a container with a vulnerable OpenSSL version is still a vulnerable container. The node patch and the image patch are separate remediations with separate patch queues — both matter, and they shouldn't be confused.

Automating the Rollout with PatchGuard

PatchGuard's Kubernetes integration handles node pool patching by wrapping the EKS managed node group update API with pre-flight PDB checks and post-flight health verification. Before initiating any node drain, PatchGuard confirms that ALLOWED DISRUPTIONS is greater than zero for all PDBs across all namespaces. If any PDB is blocking, the operation pauses and alerts the operator with the specific PDB name and namespace — rather than failing silently or forcing through the disruption budget.

Post-drain, PatchGuard runs a configurable health check window: it monitors CPU, memory, and error rate metrics for the rescheduled workloads for five minutes after each node drain completes. If any metric exceeds the configured threshold during that window, the rollout halts and a rollback procedure is triggered — restoring the drained node to service and reverting the scheduling changes. This catches the scenario where a workload behaves differently on the new node version before you've committed to draining the remaining old nodes.

A Note on Kernel Patches and Runtime Restarts

Some kernel CVEs require a reboot to take effect, even after the package is installed. On managed node pools, this happens automatically during the replacement process — new nodes boot with the patched kernel. For in-place package updates on live nodes, you'll need to schedule reboots explicitly. kured (the Kubernetes Reboot Daemon) is the standard approach: it watches for the /var/run/reboot-required sentinel file and triggers a coordinated node drain and reboot, respecting PDBs during the process. If you're doing emergency in-place patching and cannot use node replacement, deploy kured first and let it manage the reboot sequencing.

The operational principle across all these scenarios is the same: treat each node drain as a controlled experiment with a defined rollback condition, not a bulk operation to be completed as fast as possible. The four-minute margin in the opening paragraph isn't a constraint to work around — it's the system telling you exactly how much disruption it can absorb. Stay within it.