Kubernetes Cluster Autoscaler

Scale workers to match real pod demand

Set min and max bounds on a worker pool and the cluster autoscaler grows it when Pending pods need room and shrinks it when nodes go idle. Built on upstream Kubernetes cluster-autoscaler with a Hypervisor.io cloud provider.

The cluster autoscaler watches for Pending pods that can't fit on existing nodes and asks the Hypervisor.io control panel to add more workers. When workers sit idle long enough, it asks the panel to remove them. This page covers how it's installed, how to tune it, and how to diagnose the common cases where it does (or doesn't) move.

Overview

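The autoscaler shipped with Hypervisor.io clusters is the upstream Kubernetes cluster-autoscaler with a Hypervisor.io cloud provider compiled in. It's a single Go binary running as a Deployment on the control plane, talking to two endpoints: the cluster's Kubernetes apiserver, where it watches pods and nodes, and the Hypervisor.io management API, where it requests and removes workers.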

Because it is built from the same upstream codebase that the major managed Kubernetes services use, the upstream FAQ and flag reference apply here too. The Hypervisor.io-specific part is the cloud provider that turns a "scale node group from N to N+1" request into a real worker VM in your region.

Workers only. Control plane nodes are not part of any node group. The autoscaler never touches them, even on an HA cluster. To change control plane sizing, use the upgrade or resize flow on the cluster page.

Install

The autoscaler is auto-installed by the panel the moment a worker pool has autoscaling enabled. There's nothing to helm install and no kubeconfig to wire up. On the cluster's detail page, the Autoscaler tab shows current status, image version, args, recent scale events, and a button to update the args without rolling the rest of the cluster.

When a pool is created or edited with autoscaling enabled:

  1. The panel writes the Deployment, ServiceAccount, ClusterRole and ClusterRoleBinding into kube-system.
  2. The image is selected from the compatibility matrix below based on the cluster's Kubernetes version.
  3. A token tied to the cluster is mounted into the pod so the autoscaler can authenticate to the panel's management API.
  4. Default args are applied (shown in the table below). They can be edited from the Autoscaler tab; the pod restarts in a few seconds.
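
To confirm that the steps above completed, a quick check from any machine with cluster access might look like the sketch below. It assumes the Deployment is named cluster-autoscaler, the name used in the log commands on this page; the function name autoscaler_status is illustrative only.

```bash
# Sketch: confirm the panel-installed autoscaler is up.
# Assumption: the Deployment is called "cluster-autoscaler" in kube-system.
autoscaler_status() {
  # Wait up to 60s for the Deployment to finish rolling out.
  kubectl -n kube-system rollout status deploy/cluster-autoscaler --timeout=60s || return 1
  # Print the image the panel selected for this cluster.
  kubectl -n kube-system get deploy cluster-autoscaler \
    -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'
}
```

Run autoscaler_status after enabling autoscaling on a pool. A non-zero exit means the Deployment did not become available within the timeout.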

To turn it off, disable autoscaling on every worker pool. The Deployment is removed and the cluster reverts to fixed-size pools.

One autoscaler per cluster, many pools. A single Deployment manages every autoscaling worker pool in the cluster. You don't run multiple autoscalers; you add more pools.

How it works

Each worker pool becomes a "node group" inside the autoscaler. Workers are tagged so the autoscaler knows which pool they belong to and which template (CPU, RAM, disk) they were sized from.

flow
Pod stuck Pending
    │
    ▼
cluster-autoscaler  (running in kube-system on the CP)
    │  picks node group whose template fits the pod
    │  applies expander rule if multiple groups qualify
    ▼
Hypervisor.io management API
    │  provisions a new worker VM in your region
    │  installs kubelet, joins the cluster
    ▼
New worker registers with the apiserver
    │  becomes Ready in 1-3 minutes
    ▼
Pending pod is scheduled onto the new worker

Scale-down runs in reverse: a candidate worker is cordoned, its pods are drained with respect for PodDisruptionBudgets, then the panel deletes the VM and the node disappears from kubectl get nodes.
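
To follow this flow on a live cluster, it can help to look at the same inputs the autoscaler reacts to. The sketch below lists Pending pods with their requests and watches nodes come and go. The function names are illustrative, and not every Pending pod triggers a scale-up: only pods the scheduler marks unschedulable do.

```bash
# Sketch: list Pending pods (scale-up candidates) with their CPU and memory requests.
pending_pods() {
  kubectl get pods --all-namespaces --field-selector=status.phase=Pending \
    -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,CPU:.spec.containers[*].resources.requests.cpu,MEMORY:.spec.containers[*].resources.requests.memory'
}

# Sketch: watch workers register (scale-up) or disappear (scale-down).
watch_nodes() {
  kubectl get nodes --watch
}
```

Comparing the requests shown by pending_pods with the worker plan of each pool usually explains which node group the autoscaler picks.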

Common args

Defaults are good for most workloads. Override them from the cluster's Autoscaler tab. Changes take effect within seconds of saving (the pod restarts).

| Flag | Default | What it does |
| --- | --- | --- |
| --scale-down-delay-after-add | 10m | How long to wait after the last scale-up before any scale-down is considered. Stops the autoscaler from oscillating during bursty traffic. |
| --scale-down-unneeded-time | 10m | A node must sit below the utilization threshold for at least this long before it becomes a removal candidate. |
| --scale-down-utilization-threshold | 0.5 | A node is "unneeded" only if both CPU and memory request-utilization are under this fraction. Lower it to keep more nodes around as headroom; raise it to reclaim lightly loaded nodes more aggressively. |
| --max-node-provision-time | 15m | If a newly requested worker isn't Ready within this window, the autoscaler gives up on it and tries a different node group (or surfaces the failure). |
| --scan-interval | 10s | How often the autoscaler re-evaluates the cluster. Lower = faster reaction to Pending pods, higher = less apiserver load on very large clusters. |
| --expander | random | Strategy used when more than one node group could host a Pending pod. See Expander strategies below. |
| --max-empty-bulk-delete | 10 | Maximum number of empty nodes deleted in one scale-down pass. Useful on very large clusters where removing 50 nodes at once is undesirable. |
| --skip-nodes-with-system-pods | true | Don't scale down nodes running kube-system pods, other than DaemonSet and mirror pods, unless a PodDisruptionBudget allows their eviction. Protects system pods at the cost of pinning their node. Most people leave this alone. |
| --skip-nodes-with-local-storage | true | Don't scale down a node if any pod on it uses emptyDir or hostPath. Switch to false only if you've verified those pods can lose their local data. |

Don't set these too low. Cutting --scale-down-delay-after-add to 0s or --scale-down-unneeded-time to 30s looks responsive in testing but causes thrashing in production. Each scale-down forces a VM deletion and a Kubernetes node deregistration, which is not free.
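
For reference, here is how the defaults from the table look as command-line flags, together with a sketch for reading what the running Deployment actually uses. It assumes the Deployment is named cluster-autoscaler and that the flags sit in the first container's command or args; both function names are illustrative.

```bash
# The documented defaults in --flag=value form, as cluster-autoscaler accepts them.
default_autoscaler_args() {
  cat <<'EOF'
--scale-down-delay-after-add=10m
--scale-down-unneeded-time=10m
--scale-down-utilization-threshold=0.5
--max-node-provision-time=15m
--scan-interval=10s
--expander=random
--max-empty-bulk-delete=10
--skip-nodes-with-system-pods=true
--skip-nodes-with-local-storage=true
EOF
}

# Sketch: print the command and args of the running autoscaler.
# Assumption: the autoscaler is the first container in the pod template.
live_autoscaler_args() {
  kubectl -n kube-system get deploy cluster-autoscaler \
    -o jsonpath='{.spec.template.spec.containers[0].command}{"\n"}{.spec.template.spec.containers[0].args}{"\n"}'
}
```

Comparing the two outputs after an edit in the Autoscaler tab is a quick way to confirm that an override was applied.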

Expander strategies

When a Pending pod could fit in more than one of your node groups, the expander breaks the tie. Pick the one that matches how your pools differ.

| Value | Behaviour | Use when |
| --- | --- | --- |
| random | Picks any qualifying group at random. | All your pools are roughly equivalent. |
| most-pods | Picks the group that would schedule the largest number of Pending pods with a single new node. | You have a backlog of similar small pods and want fewer, bigger nodes. |
| least-waste | Picks the group whose template node leaves the least unallocated CPU + memory after placing the pods. | Pools differ in size and you want to minimize wasted resource on each new node. |
| priority | Uses a cluster-autoscaler-priority-expander ConfigMap in kube-system to pick groups in a defined order, with regex-matched fallbacks. | You have a preferred cheap pool and a fallback pool (for example, "use the standard pool first; only burst into the high-memory pool if the standard pool is at max"). |

For mixed-instance clusters, least-waste is the most common pick. For homogeneous clusters with one autoscaling pool, the choice doesn't matter and random is fine.
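
As an illustration of the priority expander, the sketch below emits a ConfigMap in the format the upstream priority expander reads: integer priorities mapped to lists of regexes, where the highest number wins. The patterns .*standard.* and .*high-mem.* are hypothetical. How node groups are named on your cluster is an assumption here, so check the names in the autoscaler logs before relying on them.

```bash
# Sketch: print a priority-expander ConfigMap. Higher numbers are preferred.
# The two regexes are placeholders for your own node group names.
priority_expander_configmap() {
  cat <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |-
    50:
      - .*standard.*
    10:
      - .*high-mem.*
EOF
}
```

Apply it with priority_expander_configmap | kubectl apply -f - and set --expander to priority from the Autoscaler tab. With these values, the high-memory pool is only chosen when no group matching the standard pattern can take the pod.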

Why no scale-down?

The most common autoscaler ticket is "I have an idle node sitting there and the autoscaler won't remove it". The cause is almost always one of the following.

1. A pod uses local storage

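If any pod on the node mounts an emptyDir or hostPath volume, the autoscaler refuses to drain it by default. emptyDir data is lost when the pod moves to a different node, so the autoscaler errs on the side of caution. Either annotate the pods whose local data is disposable with cluster-autoscaler.kubernetes.io/safe-to-evict: "true", or set --skip-nodes-with-local-storage to false once you've verified that every such pod can lose its local data.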

2. A kube-system pod has no PodDisruptionBudget

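With --skip-nodes-with-system-pods set to true, a kube-system pod that isn't a DaemonSet or mirror pod blocks scale-down unless a PodDisruptionBudget permits its eviction. Pods that aren't controlled by a Deployment / DaemonSet / StatefulSet (rare, but it happens with one-off pods or hand-rolled manifests) block it as well. Either give the pod a controller and a PodDisruptionBudget, or set cluster-autoscaler.kubernetes.io/safe-to-evict: "true" on it.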
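
Setting the annotation on a running pod can be done with kubectl annotate. A minimal sketch, with an illustrative function name:

```bash
# Sketch: mark one pod as safe to evict so it stops pinning its node.
# Usage: mark_safe_to_evict <namespace> <pod-name>
mark_safe_to_evict() {
  kubectl -n "$1" annotate pod "$2" \
    cluster-autoscaler.kubernetes.io/safe-to-evict=true --overwrite
}
```

An annotation set this way lives only as long as that pod does. For pods created by a controller, put the annotation in the pod template instead so that replacements carry it too.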

3. A pod is annotated safe-to-evict: false

An explicit cluster-autoscaler.kubernetes.io/safe-to-evict: "false" annotation on any pod pins the node it's on. This is often intentional (singleton workloads, long-running batch jobs) but easy to forget. kubectl get pods -A -o jsonpath='{range .items[?(@.metadata.annotations.cluster-autoscaler\.kubernetes\.io/safe-to-evict=="false")]}{.metadata.namespace}/{.metadata.name}{"\n"}{end}' finds them all.

4. DaemonSet pods

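DaemonSet pods don't block a drain on their own, which is what you want. Note that upstream counts their requests toward node utilization unless --ignore-daemonsets-utilization is set to true. If a DaemonSet pod is non-idempotent and you've added an annotation to block its eviction, such as cluster-autoscaler.kubernetes.io/safe-to-evict: "false", the node won't drain.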

5. The pool is already at min

Worker pools have a configured minimum (set on the cluster's Workers tab). The autoscaler never scales a pool below its min, even if every node is empty. If you want it to go to zero, set min: 0 on the pool and make sure no critical workload has affinity for that pool.

6. Recent scale-up cooldown

Right after the autoscaler adds a worker, no scale-down can run for --scale-down-delay-after-add (default 10 minutes). This is a feature, not a bug; it prevents oscillation. If you're testing scale-down behaviour, wait the cooldown out before drawing conclusions.

7. Utilization is just barely above threshold

Utilization is measured against requests, not actual usage. A pod that requests: cpu=500m but actually uses 5m of CPU still counts as 500m. Pools full of generously-sized requests look "busy" even when CPU graphs are flat. Either right-size the requests, or lower --scale-down-utilization-threshold.
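
The request-based check can be sketched with integer arithmetic. This is an illustration of the rule described above, not the autoscaler's own code. CPU is in millicores, memory in MiB, and the threshold is a percentage (50 for the default 0.5). The node figures in the example are made up.

```bash
# Sketch: a node is "unneeded" only if both CPU and memory requests
# are under the threshold share of what the node can allocate.
# Usage: node_is_unneeded <cpu_req> <cpu_alloc> <mem_req> <mem_alloc> <threshold_pct>
node_is_unneeded() {
  cpu_req=$1; cpu_alloc=$2; mem_req=$3; mem_alloc=$4; threshold_pct=$5
  # req/alloc < threshold/100, rearranged to avoid floating point.
  [ $((cpu_req * 100)) -lt $((cpu_alloc * threshold_pct)) ] &&
    [ $((mem_req * 100)) -lt $((mem_alloc * threshold_pct)) ]
}

# Four pods requesting 500m each on a node with 3700m allocatable CPU:
# 2000/3700 is about 54%, above the 50% threshold, however idle the pods are.
if node_is_unneeded 2000 3700 2048 7400 50; then
  echo "unneeded"
else
  echo "still needed"   # this branch runs
fi
```

Dropping one of the four pods' requests to 250m brings CPU to 1750/3700, about 47%, and the same node becomes a removal candidate.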

The autoscaler explains itself. Run kubectl -n kube-system logs deploy/cluster-autoscaler --tail=200. It prints "scale-down: node X is not eligible because Y" for every blocked candidate. Read that before guessing.

Sizing your pool

The autoscaler is good at filling out a pool to match real demand. It's bad at picking the right shape of node for you. That's a sizing decision.

Pick a worker plan that fits 2-4 pods comfortably

If your average pod requests 500m CPU and 1 GiB RAM, don't pick a 1-vCPU / 2 GiB worker. Reserved overhead (kubelet, container runtime, OS) typically eats 200-400m CPU and 600-900 MiB RAM per node, so a tiny node fits maybe one pod and the autoscaler ends up adding a whole VM per replica. Pick a worker plan where 2-4 typical pods leave the node still useful.
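
The arithmetic behind that advice is easy to run for your own plans. The sketch below takes the overhead as 300m CPU and 750 MiB, mid-range of the figures above. The 4 vCPU / 8 GiB plan is a hypothetical comparison point, not a named Hypervisor.io plan.

```bash
# Sketch: how many typical pods fit on one worker after node overhead.
# CPU in millicores, memory in MiB.
# Usage: pods_per_node <node_cpu> <node_mem> <pod_cpu> <pod_mem> <overhead_cpu> <overhead_mem>
pods_per_node() {
  by_cpu=$(( ($1 - $5) / $3 ))
  by_mem=$(( ($2 - $6) / $4 ))
  # The scarcer resource decides.
  if [ "$by_cpu" -lt "$by_mem" ]; then echo "$by_cpu"; else echo "$by_mem"; fi
}

# Pod requests 500m CPU and 1 GiB RAM.
pods_per_node 1000 2048 500 1024 300 750   # 1 vCPU / 2 GiB worker: prints 1
pods_per_node 4000 8192 500 1024 300 750   # 4 vCPU / 8 GiB worker: prints 7
```

On the small plan every replica costs a whole VM. On the larger one the same overhead is spread over seven pods.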

Steady workloads: fixed min, modest headroom

For a service that's pretty constant - five replicas on five workers, day in, day out - set min: 5 and max: 8. The autoscaler stays out of the way during normal operation and only kicks in for traffic spikes or rolling deploys.

Batch / queue workloads: low min, large max

If you spin up 200 worker pods when a job lands and run zero in between, set min: 0 (the autoscaler will go all the way to empty) and max generously. Pair with a sensible --max-node-provision-time so a job that overshoots quota fails fast rather than hanging.

Latency-sensitive workloads: keep extra capacity warm

New workers take 1-3 minutes to provision and join. If your pods can't tolerate that, oversize the pool's min by one or two workers' worth so there's always a spare node ready for scheduling. Don't try to "make the autoscaler faster" with aggressive flags; instead, keep idle capacity on purpose.

Multiple pools beats one pool

A single autoscaling pool with mixed workloads almost always ends up oversized to satisfy the most-demanding pod. Splitting into two pools (e.g. an app pool of standard workers and a build pool of high-CPU workers) lets each one autoscale independently against its own demand.

Image compatibility matrix

The autoscaler image must be compatible with the cluster's Kubernetes minor version. Upstream cluster-autoscaler is generally tested against its matching minor and the two adjacent ones; an image from a minor outside that window may refuse to start or silently misbehave.

The panel selects the image automatically based on the cluster's Kubernetes version. The table below is what it picks today.

| Kubernetes version | Recommended image | Notes |
| --- | --- | --- |
| 1.30.x | cluster-autoscaler v1.30 | Upstream image. Stock cloud provider list. |
| 1.31.x | cluster-autoscaler v1.31 | Upstream image. Stock cloud provider list. |
| 1.32.x | cluster-autoscaler v1.32 | Upstream image. Stock cloud provider list. |
| 1.33.x | cluster-autoscaler v1.33 | Upstream image. Stock cloud provider list. |
| 1.34.x | cluster-autoscaler-hypervisor v1.34.3 (default) | Hypervisor.io build with the native cloud provider compiled in. Recommended. |
| 1.35.x | cluster-autoscaler-hypervisor v1.35.0 (default) | Hypervisor.io build with the native cloud provider compiled in. Recommended. |

Hypervisor.io-built images live at:

text
ghcr.io/hypervisor-io/cluster-autoscaler-hypervisor:v1.34.3
ghcr.io/hypervisor-io/cluster-autoscaler-hypervisor:v1.35.0

Why two flavours? Upstream images don't bundle the Hypervisor.io cloud provider, so 1.30-1.33 clusters use upstream + an out-of-tree integration. From 1.34 onwards, the panel switched to first-party builds with the provider compiled in for tighter integration and faster scale decisions. Both behave the same from a user's point of view.

When to override the image

Almost never. The default image is the one validated against the matching Kubernetes minor for every release. The Autoscaler tab lets you pin a specific tag if you're chasing a fix in a newer patch release, but anything outside the matrix is unsupported.
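
Before pinning a tag, it's worth checking that its minor still matches the cluster. A small sketch with illustrative helper names:

```bash
# Sketch: check that an image tag and a Kubernetes version share a minor.
minor_of() {
  # Strip a leading "v", then keep "major.minor".
  echo "${1#v}" | cut -d. -f1,2
}

same_minor() {
  [ "$(minor_of "$1")" = "$(minor_of "$2")" ]
}

same_minor v1.34.3 v1.34.1 && echo "ok: minors match"
same_minor v1.35.0 v1.34.1 || echo "mismatch: outside the matrix"
```

The server version is reported by kubectl version. The running image tag can be read from the Deployment's pod template, assuming the Deployment is named cluster-autoscaler.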

Troubleshooting

Most autoscaler problems show up in its own logs first. Always start with:

bash
kubectl -n kube-system logs deploy/cluster-autoscaler --tail=300
kubectl -n kube-system get events --sort-by=.lastTimestamp | tail -50

Then match against the table below.

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| "No NodeGroup for node" in the logs | The node either belongs to a non-autoscaling pool, or is a control plane node. | Control plane nodes are excluded by design and this log line is harmless for them. If you see it for a worker, confirm the worker pool has autoscaling enabled and the node has the expected pool label. |
| Token expired / 401 from management API | Older versions of the autoscaler required a manual token rotation. Current versions rotate the token themselves 30 days before it expires, so this should be rare. | Save the args again from the Autoscaler tab; the panel will re-mount a fresh token. |
| Pods stuck Pending after --max-node-provision-time | A new worker was requested but never became Ready. Common causes: VM provision took too long, the hypervisor is out of capacity for the requested plan, or kubelet couldn't reach the apiserver. | Check the cluster's Tasks tab for the failed worker provision and the corresponding event on the worker pool. |
| Scale-down isn't happening | One of the seven reasons in "Why no scale-down?". | Read the autoscaler logs; they give the exact reason per candidate node. |
| Autoscaler restarts in a loop | An invalid flag was passed via the Autoscaler tab (typo, removed flag in newer version), or the image isn't compatible with the cluster's Kubernetes version. | Roll back the last args change. Confirm the image tag matches the matrix above. kubectl -n kube-system logs deploy/cluster-autoscaler --previous shows the crash reason. |
| Scale-up happens but pods still Pending | The new worker's template wouldn't actually fit the pod (often due to a node-selector or taint the autoscaler didn't account for, or a pod with bigger requests than the template). | Verify with kubectl describe pod <name> that the pod fits the chosen pool's worker plan, and that selectors/tolerations match the pool's labels and taints. |
| Two pools but autoscaler always picks the wrong one | The default random expander broke a tie poorly. | Switch --expander to least-waste, or use priority with a ConfigMap to define an explicit order. See Expander strategies. |

Useful one-liners

bash
# Find every pod blocking scale-down via safe-to-evict=false
kubectl get pods -A -o json | jq -r '
  .items[]
  | select(.metadata.annotations["cluster-autoscaler.kubernetes.io/safe-to-evict"] == "false")
  | "\(.metadata.namespace)/\(.metadata.name)"'

# See the autoscaler's view of node groups + bounds
kubectl -n kube-system get configmap cluster-autoscaler-status -o yaml

# Watch scale decisions live
kubectl -n kube-system logs deploy/cluster-autoscaler -f | grep -E 'scale-(up|down)|Pod.*unschedulable'
