Kubernetes Node Pools

Every cluster on Hypervisor.io has at least one node pool (the default pool created with the cluster) and can have any number of additional pools added later. This page covers what pools are for, how to add and configure them, how to schedule pods onto specific pools, and how the autoscaler handles them.

Overview #

A node pool is a group of worker nodes inside a Kubernetes cluster that all share:

The same instance plan - so CPU, RAM, storage, and hourly cost are uniform within the pool.
The same set of Kubernetes labels - applied to every node at boot.
The same set of taints - so pods need a matching toleration to land here.
The same autoscaling bounds (min / max) and rate limits.
The same drain policy on scale-down or node removal.

One autoscaler, many pools. A single cluster-autoscaler Deployment manages every autoscaling pool in the cluster. Each pool shows up as its own node group with its own bounds. See the Cluster Autoscaler page for tuning.

What a pool is good for

Mixed workload shapes. Cache pods want 64 GiB RAM, batch jobs want 32 vCPUs, model serving wants a GPU. One plan can't fit all three; one pool per shape can.
Taint-based isolation. Keep noisy or expensive workloads off the nodes that run ingress or system pods.
Per-workload scaling profile. Batch nodes tolerate dense packing and slow reclaim; latency-sensitive nodes want fast reclaim. Per-pool autoscaler tuning lets you do both in one cluster.
Different fault domains. A pool can target a different hypervisor group for blast-radius isolation.

When to use multiple pools #

A single pool with autoscaling is enough for most clusters. Reach for multiple pools when one of these applies:

Your workloads need different node sizes (e.g. small for ingress, big for caches, GPU for ML).
You need to guarantee separation between two workload classes via taints + tolerations.
You want one pool to scale aggressively for batch bursts while another stays small and steady.
You want to scale a pool to zero between jobs without impacting the rest of the cluster.

If none of those apply, a single default pool with autoscaling on is the right starting point.

The default pool #

Every cluster has exactly one default pool. It is created automatically when the cluster is created and shows up in the Pools tab marked with a Default badge.

You can:

Rename it. The default flag stays.
Edit its plan, size, labels, taints, and policies like any other pool.
Reassign the default flag by editing another pool and ticking Make default. The previous default becomes a regular pool. There is always exactly one default at a time.

You cannot delete the default pool directly. If you want to remove it, first reassign the default flag to another pool, then delete the old one.

Default pool semantics. The default pool is the one the autoscaler falls back to when a scaling request doesn't name a specific pool, and the one used by legacy API endpoints that predate multi-pool clusters.

Adding a pool #

Open your cluster page.
Switch to the Pools tab.
Click Add Pool.
Fill in the form (fields described below).
Click Create.

With autoscaling on, the new pool scales up to its min size immediately. With autoscaling off, no nodes are created until you scale the pool manually.

Field reference #

| Field | What it means |
| --- | --- |
| Name | Short label for the pool. Lowercase letters, numbers, and dashes. Used as a Kubernetes label and in node names. |
| Plan | The instance plan that defines CPU, RAM, storage, and price for every node in this pool. |
| Min size | The lowest number of nodes the pool will keep, even when idle. Set to `0` to let the pool drain fully when not in use. |
| Max size | The highest number of nodes the pool can grow to. The autoscaler refuses to scale past this. |
| Autoscaling | Toggle. When on, the cluster autoscaler can grow and shrink this pool within the bounds above. When off, the pool stays at whatever size you set manually. |
| Labels | Kubernetes labels applied to every node in the pool. Use these as `nodeSelector` targets on your pods. |
| Taints | Kubernetes taints applied to every node in the pool. Pods need a matching toleration to land here. |
| Drain timeout | How long to wait before force-killing pods during scale-down or node removal. Default: 5 minutes. |
| Drain grace period | How long the kubelet gives each pod to shut down cleanly before killing it. |
| Ignore DaemonSets | Skip DaemonSet pods when deciding if a node is safe to remove. Usually on. |
| Delete emptyDir data | Allow draining pods that have an `emptyDir` volume. Off by default to avoid losing data. |
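
To make the Labels and Taints fields concrete, this is roughly what a node from a pool might look like in `kubectl get node <name> -o yaml`. It is an abridged sketch, not real output: the node name and the label and taint values are assumptions for illustration. Only the field layout (`metadata.labels`, `spec.taints`) is standard Kubernetes.

```yaml
# Abridged Node object with illustrative values (not real output).
apiVersion: v1
kind: Node
metadata:
  name: gpu-pool-node-1        # hypothetical node name
  labels:
    accelerator: gpu           # would come from the pool's Labels field
spec:
  taints:
  - key: nvidia.com/gpu        # would come from the pool's Taints field
    value: present
    effect: NoSchedule
```

A pod reaches this node only if its `nodeSelector` (or affinity) matches the labels and its tolerations match the taint.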

Advanced & rate limits #

These fields appear when you expand the advanced section of the pool form. The defaults are conservative, and most clusters never hit them.

| Field | What it means |
| --- | --- |
| Max surge per period | Cap on how many nodes can be created inside one rolling window. Avoids stampedes. |
| Max unavailable per period | Cap on how many nodes can be removed inside one rolling window. Protects in-flight workloads. |
| Scale period | Length of the rolling window for the two caps above. |
| Cooldown after scale up | Idle gap the autoscaler waits after a scale-up before considering another scale-up. |
| Cooldown after scale down | Idle gap the autoscaler waits after a scale-down before considering another scale-down. |

The limits exist to avoid two failure modes:

Stampedes - a sudden burst of Pending pods triggering the autoscaler to ask for fifty nodes at once and overwhelming the hypervisor.
Capacity flapping - rapid alternating scale-ups and scale-downs that churn billing without doing useful work.

If you have a pool that needs to scale fast (for example, a batch pool that processes a daily queue at 09:00), raise max surge per period and shorten scale period. If you have a pool that needs to be slow and steady (for example, a stateful pool that takes a long time to drain), lower max unavailable per period.

Rate-limited, not dropped. When a rate limit is hit, the autoscaler queues the rest of the request and retries on its next cycle. Nothing is lost; the work is just paced out over later cycles.

Sending pods to a specific pool #

Use Kubernetes scheduling fields on your pod spec. The pool's labels and taints are what you match against.

Example: a label-only pool

Pool config: label `workload=memory`, no taint.

Deployment that requests this pool:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
spec:
  replicas: 3
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      # Matches the pool's workload=memory label.
      nodeSelector:
        workload: memory
      containers:
      - name: redis
        image: redis:7
```

The Redis pods land only on nodes labeled `workload=memory`, that is, on this pool. Other workloads can still land here because there is no taint to keep them out.

Example: a tainted GPU pool

Pool config: label `accelerator=gpu`, taint `nvidia.com/gpu=present:NoSchedule`.

Deployment that requests this pool:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inference
spec:
  replicas: 2
  selector:
    matchLabels:
      app: inference
  template:
    metadata:
      labels:
        app: inference
    spec:
      # Matches the pool's accelerator=gpu label.
      nodeSelector:
        accelerator: gpu
      # Matches the pool's nvidia.com/gpu=present:NoSchedule taint.
      tolerations:
      - key: nvidia.com/gpu
        operator: Equal
        value: present
        effect: NoSchedule
      containers:
      - name: server
        image: my-org/inference:latest
        resources:
          limits:
            nvidia.com/gpu: 1
```

The toleration lets the pod schedule on tainted GPU nodes; the nodeSelector keeps it there. Workloads without the toleration cannot land on GPU nodes, so the GPU pool is reserved for pods that actually want a GPU.

Built-in node labels #

The cluster automatically attaches these labels to every node, in addition to any labels you set on the pool. You can use them as nodeSelector targets without configuring them explicitly.

| Label | Value |
| --- | --- |
| `hypervisor.io/pool-name` | The pool name |
| `hypervisor.io/pool-id` | The pool's UUID |
| `topology.kubernetes.io/region` | The hypervisor group region slug |
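
As a sketch of targeting a pool through a built-in label instead of a custom one, the Pod below selects nodes by `hypervisor.io/pool-name`. The pool name `batch`, the Pod name, and the image and command are assumptions for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: report-builder               # hypothetical name
spec:
  nodeSelector:
    hypervisor.io/pool-name: batch   # assumed pool name
  containers:
  - name: worker
    image: busybox:1.36
    command: ["sh", "-c", "echo running on the batch pool; sleep 3600"]
```

If the pool is also tainted, the Pod still needs a matching toleration; the built-in label only handles selection.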

Scale down behaviour #

When the autoscaler decides a node in a pool is no longer needed, the cluster does the following:

Cordon the node so no new pods land on it.
Drain the pods according to the pool's drain policy (grace period, ignore-DaemonSets flag, emptyDir flag).
If drain succeeds within the drain timeout, destroy the underlying VM.
If drain fails or times out, leave the node marked and retry on the next cycle.

The cluster will never drain so many nodes at once that it leaves zero workers. If a scale-down would remove the last remaining worker, that node is exempted until at least one other worker exists.

Min size is a hard floor. The autoscaler never scales a pool below its configured min, even if every node on it is empty. To let a pool drain to zero, set min: 0 and make sure no critical workload pins itself to that pool.
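
A workload that suits a `min: 0` pool is one that runs to completion, so its nodes end up empty afterward. The Job below is a minimal sketch of that shape. It assumes a pool labeled `workload=batch` and tainted `dedicated=batch:NoSchedule`; neither value comes from this page, and the Job name, image, and command are placeholders.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: nightly-export         # hypothetical name
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      nodeSelector:
        workload: batch        # assumed pool label
      tolerations:
      - key: dedicated         # assumed pool taint
        operator: Equal
        value: batch
        effect: NoSchedule
      containers:
      - name: export
        image: busybox:1.36
        command: ["sh", "-c", "echo exporting; sleep 60"]
        resources:
          requests:
            cpu: "1"
            memory: 1Gi
```

While the Job's pod is Pending, the autoscaler has a reason to grow the pool from zero. Once the Job completes and the nodes are empty, the pool can shrink back toward its min.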

Deleting a pool #

Schedule deletion

The standard path. Click Delete on the pool in the Pools tab. The pool's nodes are cordoned and drained according to the drain policy, then destroyed. Rate limits apply, so a large pool may take a few cycles to fully drain.

While deletion is in progress the pool stays visible in the Pools tab with a Deleting status. New pods that would have scheduled here go to other pools (assuming their selectors and tolerations match).

Delete now (admin only)

Admins can bypass the drain and rate limits using Delete Now on the admin panel. This destroys all the pool's VMs immediately. Use only when the pool is already broken (for example, every node is stuck in NotReady and a graceful drain will never succeed). Pods running on the pool's nodes are killed without a grace period.

Default pool cannot be deleted. Reassign the default flag to another pool first, then delete the old default like any other pool.

Troubleshooting #

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| Pods stuck Pending even though the pool's `max` isn't hit | The pod's `nodeSelector` or tolerations don't match any pool, or the pool's plan is too small for the pod's requests. | Run `kubectl describe pod <name>` to see the scheduler events. Verify that the pool's labels match your `nodeSelector`, that the pod tolerates the pool's taints, and that the worker plan has enough CPU and RAM. |
| Pool stays at `min` even when no pods need it | Working as intended. `min` is a floor. | Lower `min` if you want the pool to drain further. Set `min: 0` to allow full reclaim. |
| Pool scale-up adds nodes but pods still don't schedule | The new node's labels or taints don't match the pod's selectors and tolerations, or the pod requests more than a node of this plan can offer. | Check pool labels and taints in the Pools tab. Increase the pool's plan size, or pick a different pool. |
| Scale-down stalls on one node | A pod with a strict `PodDisruptionBudget` or the `cluster-autoscaler.kubernetes.io/safe-to-evict: "false"` annotation is pinned to the node. | Relax the PDB, scale the blocking workload temporarily, or set Drain timeout higher on the pool. |
| Cannot delete the default pool | Default pools are protected. | Edit another pool, tick Make default, then delete the old default. |
| Two pools, autoscaler always picks the same one | The default `random` expander keeps breaking the tie in favor of one pool. | Change the cluster autoscaler's `--expander` flag; see the Expander strategies section. |
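
For the stalled scale-down row in the table above, this is a sketch of a PodDisruptionBudget that still protects a workload but leaves room for a drain to make progress. The name and the `app: redis` selector are assumptions borrowed from the Redis example earlier on this page; adjust them to the workload that is blocking the drain.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: redis-pdb              # hypothetical name
spec:
  maxUnavailable: 1            # allow one pod at a time to be evicted
  selector:
    matchLabels:
      app: redis
```

By contrast, a budget with `maxUnavailable: 0`, or with `minAvailable` equal to the replica count, permits no voluntary evictions, so evicting those pods during a drain cannot succeed.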

Ready to mix node shapes?

Add a pool from the cluster's Pools tab and let the scheduler do the rest.
