Xen 4.2 preview: xl and pci pass-through

One of the goals for the 4.2 release is for xl to have feature parity with xm for the most important functions. Along the way, we’ve also been making a number of improvements to the interface. One of those improvements is in how PCI devices are passed through directly to VMs.

A basic device pass-through review

As you may know, Xen has for several years had the ability to “pass through” a pci device to a guest, allowing that guest to control the device directly. This has several applications, including driver domains and increased performance for graphics or networking.
To pass through a device, you need to find out its BDF (Bus, Device, Function). A BDF consists of three or four numbers in the format DDDD:bb:dd.f, where:

  • DDDD is a 4-digit hex for the PCI domain. This is optional (if not included, it will be assumed to be 0000).
  • bb is a 2-digit hex of the PCI bus number
  • dd is a 2-digit hex of the PCI device number
  • f is a 1-digit decimal of the PCI function number
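Because the domain is optional, the same device can be written in two ways, which matters if a script compares BDF strings. Here is a minimal sketch in POSIX shell that normalizes either form to the full DDDD:bb:dd.f form. normalize_bdf is a hypothetical helper, not part of xl, and it only checks the shape of the string, not whether such a device exists:

```shell
# normalize_bdf BDF: print BDF in full DDDD:bb:dd.f form.
# Hypothetical helper for illustration; not an xl command.
normalize_bdf() {
    h='[0-9a-fA-F]'   # glob pattern for one hex digit
    case $1 in
        # Domain already present: print unchanged.
        $h$h$h$h:$h$h:$h$h.[0-7])
            printf '%s\n' "$1" ;;
        # Domain omitted: assume 0000, as described above.
        $h$h:$h$h.[0-7])
            printf '0000:%s\n' "$1" ;;
        *)
            printf 'invalid BDF: %s\n' "$1" >&2
            return 1 ;;
    esac
}
```

With this, normalize_bdf 07:00.0 and normalize_bdf 0000:07:00.0 both print the same string.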

The easiest way to find out the BDF is to use lspci. For example:

# lspci
...
07:00.0 Ethernet controller: Intel Corporation 82575GB Gigabit Network Connection (rev 02)
07:00.1 Ethernet controller: Intel Corporation 82575GB Gigabit Network Connection (rev 02)
...

To pass this device through to a guest, include the following line in your configuration file:

pci = [ '07:00.0' ]

If the guest operating system supports PCI hot-plug, you can also add the device to a running VM:

# xl pci-attach [domid] 07:00.0

However, before you can do that, there’s an extra step: you have to make sure that the device is assigned to pciback. Before the 4.2 release, this involved either assigning the device to pciback on the Linux command-line (which meant editing your bootloader configuration and rebooting), or doing a lot of poking around in sysfs to un-bind the device from its current driver and re-bind it to pciback. Furthermore, the only way to know if a device was assigned to pciback was to look in sysfs.
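For reference, the sysfs route looked roughly like the following. This is a sketch that only prints the commands rather than running them. print_manual_pciback_steps is a made-up name, and the sketch assumes the driver directory under /sys/bus/pci/drivers is called pciback, which may differ on your kernel:

```shell
# print_manual_pciback_steps BDF: print (do not run) the manual sysfs steps.
# BDF must be in full form, e.g. 0000:07:00.0.
# Assumption: the backend driver directory is named "pciback".
print_manual_pciback_steps() {
    bdf=$1
    sysfs=/sys/bus/pci
    # 1. Detach the device from whatever driver currently owns it.
    printf 'echo %s > %s/devices/%s/driver/unbind\n' "$bdf" "$sysfs" "$bdf"
    # 2. Tell pciback that it may claim this slot.
    printf 'echo %s > %s/drivers/pciback/new_slot\n' "$bdf" "$sysfs"
    # 3. Bind the device to pciback.
    printf 'echo %s > %s/drivers/pciback/bind\n' "$bdf" "$sysfs"
}
```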
But with the release of 4.2, this step just got a lot easier.

The pci-assignable-* commands

Starting with Xen 4.2, xl introduces commands that have to do with the “assignability” of devices. A device is considered “assignable” if and only if it is currently bound to pciback (and is thus currently able to be assigned to a domain). The three new commands introduced are as follows:

  • xl pci-assignable-add BDF
  • xl pci-assignable-list
  • xl pci-assignable-remove [-r] BDF

pci-assignable-add will first attempt to un-bind the device found at the given BDF from its current driver (if any), and then attempt to bind it to pciback. If the device was bound to a driver, the identity of that driver will be stored, in case the admin wants to return the device to it later. At this point, the device is ready to be passed through to a guest.
pci-assignable-list will list which devices are available to be assigned to guests. This will list all devices currently assigned to pciback, whether this was done by pci-assignable-add, or by the two methods mentioned in the previous section (Linux command-line or manual sysfs commands).
pci-assignable-remove will un-bind the device from pciback. If the -r option is specified, it will also attempt to re-bind the device to its original driver, allowing it to be used by domain 0 again.
This greatly simplifies the process of passing through a device. Once you know the BDF of a device, passing it to a guest could be as simple as:

# xl pci-assignable-add 07:00.0
# xl pci-attach [domid] 07:00.0

and returning the device to dom0 as simple as:

# xl pci-detach [domid] 07:00.0
# xl pci-assignable-remove -r 07:00.0
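If you script this sequence, it can be worth confirming that the device really is assignable before attaching it. The following is a minimal sketch, assuming pci-assignable-list prints one full-form BDF (DDDD:bb:dd.f) per line; check the output on your own system before relying on it. is_assignable is a hypothetical helper, not an xl command:

```shell
# is_assignable BDF: read a device listing on stdin and succeed only if
# BDF appears as a whole line. Hypothetical helper for illustration.
# Assumption: the listing has one full-form BDF per line.
is_assignable() {
    # -x: match the whole line, -F: fixed string, -q: no output
    grep -qxF -- "$1"
}

# Possible usage (not run here):
#   xl pci-assignable-list | is_assignable 0000:07:00.0 \
#       && xl pci-attach "$domid" 0000:07:00.0
```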

Caution: pci-assignable-add makes a device unusable by domain 0 until it is returned with pci-assignable-remove. Take care not to do this on a device critical to domain 0’s operation, such as an in-use storage controller, network interface, or GPU.

PCIback and the permissive option for PV guests

If you’re passing through a device to a PV guest, and you currently have to put entries in xend-pci-permissive.sxp or xend-pci-quirks.sxp, you should know that the way to do this in xl is different. xl implements only the permissive option, and it is configured in the domain config file rather than in global files dedicated to this purpose.
If you need to set permissive for a domain, you can either set it per-device:

pci = [ '07:00.0,permissive=1' ]

or you can make a global default for that domain:

pci_permissive=1

You can also specify it when adding a device manually:

# xl pci-attach [domid] 07:00.0,permissive=1

An explanation of what this option does is beyond the scope of this blog entry, but you may need it if 1) you try to pass through a device to a PV domain, 2) the device doesn’t work properly, and 3) you get a message like the following in your dom0 dmesg:

pciback 0000:01:02.0: Driver tried to write to a read-only configuration space field at offset 0xe0, size 2. This may be harmless, but if you have problems with your device:
1) see permissive attribute in sysfs
2) report problems to the xen-devel mailing list along with details of your device obtained from lspci.
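To check quickly whether any passed-through device has triggered this message, you can filter the kernel log for it. The following is a rough sketch that matches the message text exactly as quoted above, so it will miss the message if your kernel words it differently. pciback_readonly_writes is a hypothetical helper name:

```shell
# pciback_readonly_writes: read kernel log text on stdin and print the BDF
# of each device for which pciback reported a blocked config-space write.
# Assumption: the message wording matches the example quoted above.
pciback_readonly_writes() {
    sed -n 's/.*pciback \([0-9a-fA-F:.]*\): Driver tried to write to a read-only configuration space field.*/\1/p' |
        sort -u   # report each device once
}

# Possible usage (not run here):
#   dmesg | pciback_readonly_writes
```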

Also note that if you do enable the permissive option, you will see this warning instead:

virtual-host kernel: pciback 0000:01:02.0: enabling permissive mode configuration space accesses!
virtual-host kernel: pciback 0000:01:02.0: permissive mode is potentially unsafe!
