Xen 4.2: cpupools

One of the more distinctive features of Xen 4.2 is cpupools, designed and implemented by Jürgen Groß at Fujitsu. At its core it’s a simple idea, but one that turns out to be a flexible and powerful solution to a number of different problems.
The core idea behind cpupools is to divide the physical cores on the machine into different pools. Each of these pools has an entirely separate cpu scheduler, and can be set with different scheduling parameters. At any time, a given logical cpu can be assigned to only one of these pools (or none). A VM is assigned to one pool at a time, but can be moved from pool to pool.
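To make the rules above concrete, here is a minimal sketch of them as a toy data model. It is not Xen's interface: the class names (Pool, Host) and methods (create_pool, add_cpu, remove_cpu, place_vm) are hypothetical and exist only to illustrate the invariants, namely that a cpu belongs to at most one pool, each pool carries its own scheduler, and a VM lives in one pool at a time but can be moved.

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    """One cpupool: its own scheduler, its own cpus, its own VMs."""
    name: str
    scheduler: str
    cpus: set = field(default_factory=set)
    vms: set = field(default_factory=set)


class Host:
    """Toy model of a host divided into pools (illustrative only, not Xen's API)."""

    def __init__(self, num_cpus):
        self.free_cpus = set(range(num_cpus))  # cpus currently in no pool
        self.pools = {}

    def create_pool(self, name, scheduler="credit"):
        if name in self.pools:
            raise ValueError(f"pool {name!r} already exists")
        pool = Pool(name, scheduler)
        self.pools[name] = pool
        return pool

    def add_cpu(self, pool_name, cpu):
        pool = self.pools[pool_name]
        # A cpu can be in only one pool, so it must be free before it joins.
        if cpu not in self.free_cpus:
            raise ValueError(f"cpu {cpu} is not free")
        self.free_cpus.remove(cpu)
        pool.cpus.add(cpu)

    def remove_cpu(self, pool_name, cpu):
        pool = self.pools[pool_name]
        pool.cpus.remove(cpu)  # raises KeyError if the cpu is not in this pool
        self.free_cpus.add(cpu)

    def pool_of(self, vm):
        for pool in self.pools.values():
            if vm in pool.vms:
                return pool
        return None

    def place_vm(self, vm, pool_name):
        """Put a VM in a pool, moving it out of its current pool if it has one."""
        target = self.pools[pool_name]
        current = self.pool_of(vm)
        if current is not None:
            current.vms.remove(vm)
        target.vms.add(vm)
```

In this model, moving a VM is just a call to place_vm with a different pool name; the VM never appears in two pools at once.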
There are a number of things one can do with this functionality. Suppose you are a hosting or cloud provider, and you have a number of customers who have multiple VMs with you. Instead of selling based on CPU metering, you want to sell access to a fixed number of cpus for all of their VMs: e.g. a customer with 6 single-vcpu VMs might buy 2 cores’ worth of computing capacity, which all of the VMs share.
You could solve this problem by using cpu masks to pin all of the customer’s vcpus to a single set of cores. However, cpu masks do not work well with the scheduler’s weight algorithm: the customer won’t be able to specify that VM A should get twice as much cpu as VM B. Solving the weight issue in a general way is very difficult, since VMs can have any combination of overlapping cpu masks. Furthermore, this extra complication would be there for all users of the credit algorithm, regardless of whether they use this particular mode or not.

With cpupools, you simply create a pool for each customer, assign it the number of cpus that customer is paying for, and then put all of that customer’s VMs in the pool. That pool has its own complete cpu scheduler, and as far as that pool’s scheduler is concerned, the only cpus in existence are the ones inside the pool. This means all of the algorithms regarding weight and so on work exactly the same, just on a restricted set of cpus.
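As a rough worked example of weights operating inside a pool, the sketch below computes an idealized proportional share for the earlier scenario: 6 single-vcpu VMs in a 2-cpu pool, with VM A given twice the weight of the others. This is a simplification and not the credit scheduler's actual algorithm; it assumes every VM is always runnable and ignores the fact that a single vcpu can never use more than one cpu. The function name ideal_shares and the weight values are chosen for illustration.

```python
from fractions import Fraction


def ideal_shares(weights, pool_cpus):
    """Split pool_cpus among VMs in proportion to their weights.

    Idealized model: assumes all VMs are always runnable and does not
    cap a single-vcpu VM at one cpu.
    """
    total = sum(weights.values())
    return {vm: Fraction(w * pool_cpus, total) for vm, w in weights.items()}


# VM A has twice the weight of each of the other five VMs (illustrative values).
weights = {"A": 512, "B": 256, "C": 256, "D": 256, "E": 256, "F": 256}
shares = ideal_shares(weights, pool_cpus=2)

for vm, share in shares.items():
    print(f"{vm}: {float(share):.3f} cpus")
```

Because the pool's scheduler sees only the 2 cpus in the pool, the shares always sum to 2 regardless of how busy the rest of the machine is, and VM A's share is exactly twice that of VM B.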
Additionally, this means that each customer can request different scheduling parameters for their VMs (for example, the timeslice or ratelimit parameters we talked about last week), or even completely different schedulers, including the experimental credit2 scheduler, and the real-time SEDF scheduler.
Cpupools have the potential to increase security as well: they limit the interaction between different customers to physically separate cpus. Sometimes information about cryptographic keys can be pieced together just by knowing cache patterns or the amount of time spent on certain operations; having VMs from different customers run on physically separate cpus removes this vector of attack with very little effort.
Of course, all of the above can be useful even if you’re not a cloud provider: your realtime workloads can run in a pool with the SEDF scheduler, your latency-sensitive workloads can run in a pool with a short timeslice, and your number-crunching workloads can run in a pool with a really long timeslice.
One of the particularly convenient commands that Jürgen implemented is the cpupool-numa-split command. This command will automatically detect the NUMA topology of the box you’re on, create a single pool for each NUMA node, and put all of the cpus in the corresponding pool. Then when you create VMs, you specify the pool you wish them created in, and all of their memory will be allocated on the local NUMA node, so memory accesses stay local.
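The grouping step that such a split performs can be sketched in a few lines. This is only an illustration of the idea, not the command's implementation: the function name numa_split, the hand-written cpu-to-node map, and the pool names ("node0", "node1") are all assumptions made for the example, and the real command detects the topology itself.

```python
from collections import defaultdict


def numa_split(cpu_to_node):
    """Group cpus into one pool per NUMA node.

    cpu_to_node maps a cpu number to its NUMA node number; here it is
    supplied by hand, whereas a real tool would read it from the host.
    """
    pools = defaultdict(list)
    for cpu, node in sorted(cpu_to_node.items()):
        pools[f"node{node}"].append(cpu)  # hypothetical pool naming
    return dict(pools)


# A made-up two-node box with four cpus per node.
topology = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 1}
print(numa_split(topology))
```

Every cpu ends up in exactly one pool, and every pool contains cpus from a single node, which is what lets a VM created in that pool keep its memory local.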
The details of the cpupools interface are still being cleaned up in the last few weeks before the 4.2 release, so I don’t want to go into specifics here. There will be an introduction with examples on the Xen.org wiki page before the release, as well as documentation in the man pages and in the command-line help.
