Request for Comment: Virtual disk configuration, PV vs. emulated, backward

Ian Campbell has proposed the following in xen-devel at http://lists.xensource.com/archives/html/xen-devel/2010-07/msg01457.html:
Currently the configuration syntax available in a domain configuration has several ways of specifying devices, some of which have slightly unexpected semantics wrt whether or not an emulated device is created, what the major number in xenstore is, etc. Some also expose details of the guest OS’s choice of major number (or rather expose Linux’s choice to all guests AFAICT).
In an attempt to clean this up, or at least make the strange behaviour more explicit, I’d like to propose some extensions to the dXpY syntax supported by libxl such that the other existing ways of specifying devices become syntactic sugar for specific well defined configurations in the new syntax, whilst preserving backwards compatibility.
I hope that the following will also form the basis for a future document (gasp!) describing the available syntax, which combinations are valid etc (unless someone can point me to an existing document I can update).
Virtual Disk Configuration
--------------------------
A virtual disk is defined in the guest configuration file as d<X>p<Y> where <X> is the disk number and <Y> is the partition number. In addition a number of options can be specified.
p0 indicates the entire disk.
Device number encoding in xenstore
----------------------------------
Given a disk specified as dXpY the device encoding used in xenstore has two potential formats, legacy and extended. Both of these are already defined and implemented in guest frontend drivers. The extended encoding is generally preferred but for backwards compatibility the legacy format must still be supported.
The legacy encoding is (major and minor 8 bits each): (major << 8) | minor
The extended encoding is (disk == 20 bits, partition == 8 bits, i.e. 256 partitions per disk): (1 << 28) | (disk << 8) | partition
Note that the extended encoding for d0p0..d0p255 overlaps in the minor number space with the legacy encodings of d0p0..d15p15 and therefore these must not be used simultaneously.
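The two encodings can be sketched in a few lines of Python. This is an illustration only, not libxl or frontend driver code: the function names and the range checks are assumptions made for the example. The dXpY legacy helper assumes major 202 and a minor of (disk << 4) | partition, which is the layout this proposal gives for extended = false when no explicit major:minor is supplied.

```python
XVD_MAJOR = 202          # Linux xvd major, assumed for the dXpY legacy helper
EXTENDED_FLAG = 1 << 28  # bit that marks a device number as extended


def encode_legacy(major: int, minor: int) -> int:
    """Legacy encoding: 8-bit major and 8-bit minor."""
    if not (0 <= major < 256 and 0 <= minor < 256):
        raise ValueError("major and minor must each fit in 8 bits")
    return (major << 8) | minor


def encode_legacy_xvd(disk: int, partition: int) -> int:
    """Legacy encoding of dXpY with major 202 and minor (disk << 4) | partition."""
    if not (0 <= disk < 16 and 0 <= partition < 16):
        raise ValueError("legacy xvd encoding only covers d0p0..d15p15")
    return encode_legacy(XVD_MAJOR, (disk << 4) | partition)


def encode_extended(disk: int, partition: int) -> int:
    """Extended encoding: flag bit 28, partition in the low 8 bits, disk above it."""
    if not (0 <= partition < 256):
        raise ValueError("partition must fit in 8 bits")
    # The shifted disk number must stay below the flag bit.
    if disk < 0 or (disk << 8) >= EXTENDED_FLAG:
        raise ValueError("disk number out of range")
    return EXTENDED_FLAG | (disk << 8) | partition
```

For example, encode_legacy_xvd(1, 0) gives 51728 (that is, 202:16) and encode_extended(16, 0) gives 268439552.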
Configuration Options
---------------------
Each disk dXpY can optionally be followed by one or more of the following key value pairs (precise syntax TBD, but comma separated is common in similar situations).
Option keys and values with a _ prefix are for internal use only: they exist solely to provide legacy semantics for the syntactic sugar and must not otherwise be used.
pv = true | false
Whether a PV backend/frontend pair should be created in xenstore to correspond to this device.
Default: true for HVM guests, ignored for PV guests (treated as true)
extended = true | false
Request use of extended device encoding in xenstore.
extended = false is only valid for d0..d15 (as d16+ cannot be represented in the legacy encoding)
When extended = false and in the absence of a specific _vdevice configuration option (see below) the encoding will use major==202 and minor=="(disk << 4) | partition".
Default: false for d0p0..d0p255, false if _vdevice option present (see below), otherwise true.
emul = none | ide[01].[01] | _ide[01].[01] | …
none = No emulated device to be created.
ide[01].[01] = Emulate IDE device. First [01] => primary, secondary. Second [01] => master, slave
_ide[01].[01] = As per ide[01].[01] however emulation is enabled iff no other disk is explicitly configured with emulation.
In the future sata<X>.<Y> or similar might be added here.
Default: none for HVM guests, ignored for PV guests (treated as none)
_vdevice = <N>:<M> | <Q>
Enforce use of legacy device encoding in xenstore with the given major:minor or explicit value.
Default: unset, encoding determined by “extended” option (see above)
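The defaulting rules above can be summarised as a small resolution step. The sketch below is one possible reading of them, not libxl code: the DiskOptions class, the apply_defaults function, the use of None for "not specified" and the vdevice field name (standing in for _vdevice) are all assumptions made for illustration. How an explicit extended = true should interact with _vdevice is not stated above, so the sketch does not try to resolve it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DiskOptions:
    disk: int
    partition: int = 0
    pv: Optional[bool] = None        # None means "not specified"
    extended: Optional[bool] = None
    emul: Optional[str] = None
    vdevice: Optional[str] = None    # stands in for the internal _vdevice option


def apply_defaults(opts: DiskOptions, hvm: bool) -> DiskOptions:
    """Fill in unspecified options following the defaults listed above."""
    # pv: default true for HVM guests; ignored (treated as true) for PV guests.
    pv = True if (not hvm or opts.pv is None) else opts.pv
    # emul: default none for HVM guests; ignored (treated as none) for PV guests.
    emul = "none" if (not hvm or opts.emul is None) else opts.emul
    # extended: default false for d0 or when _vdevice is present, otherwise true.
    if opts.extended is None:
        extended = not (opts.disk == 0 or opts.vdevice is not None)
    else:
        extended = opts.extended
    # Assumption: without an explicit _vdevice, legacy encoding covers d0..d15 only.
    if not extended and opts.vdevice is None and opts.disk > 15:
        raise ValueError("extended = false is only valid for d0..d15")
    return DiskOptions(opts.disk, opts.partition, pv, extended, emul, opts.vdevice)
```

With these rules, an HVM disk given only as d3p0 resolves to pv = true, extended = true, emul = none, while d0p0 resolves to extended = false.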
Backward compatible disk configuration
--------------------------------------
Given the above configuration options several shorthands are defined for backwards compatibility with existing configuration files and guests.
These will be implemented by a straight textual substitution before parsing the configuration.
hda => d0p0,pv=true,emul=ide0.0,_vdevice=3:0
hdb => d1p0,pv=true,emul=ide0.1,_vdevice=3:64
hdc => d2p0,pv=true,emul=ide1.0,_vdevice=22:0
hdd => d3p0,pv=true,emul=ide1.1,_vdevice=22:64
xvda => d0p0,pv=true,emul=_ide0.0,_vdevice=202:0
xvdb => d1p0,pv=true,emul=_ide0.1,_vdevice=202:16
xvdc => d2p0,pv=true,emul=_ide1.0,_vdevice=202:32
xvdd => d3p0,pv=true,emul=_ide1.1,_vdevice=202:48
xvde => d4p0,pv=true,emul=none,_vdevice=202:64

xvdo => d14p0,pv=true,emul=none,_vdevice=202:224
xvdp => d15p0,pv=true,emul=none,_vdevice=202:240
xvdq => d16p0,pv=true,emul=none

xvdz => d25p0,pv=true,emul=none
xvda[1..15] => d0p[1..15],pv=true,emul=_ide0.0,_vdevice=202:[1..15]
xvdb[1..15] => etc
Note that all the above are Linux (guest) specific.
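The substitution for a single device name could look roughly like the following. This is a sketch, not the proposed libxl implementation: the function name expand_shorthand is hypothetical, the drive letter is mapped 0-based (a = d0), and the xvd minor is computed as (disk << 4) | partition per the legacy formula given under the extended option rather than read from a table. Only whole-disk hda..hdd names are handled because those are the only hd* forms listed.

```python
import re

IDE_MAJORS = (3, 22)  # Linux majors for the primary and secondary IDE channels
XVD_MAJOR = 202       # Linux major for xvd devices


def expand_shorthand(name: str) -> str:
    """Expand a legacy hdX / xvdX[N] name into the explicit dXpY syntax."""
    m = re.fullmatch(r"(hd|xvd)([a-z])(\d*)", name)
    if m is None:
        raise ValueError(f"not a recognised shorthand: {name!r}")
    prefix, letter, part = m.groups()
    disk = ord(letter) - ord("a")
    partition = int(part) if part else 0

    if prefix == "hd":
        if disk > 3 or partition != 0:
            raise ValueError("only whole-disk hda..hdd are covered here")
        channel, unit = divmod(disk, 2)  # two drives per IDE channel
        return (f"d{disk}p0,pv=true,emul=ide{channel}.{unit},"
                f"_vdevice={IDE_MAJORS[channel]}:{unit * 64}")

    # xvd*: the first four disks get the conditional _ide emulation.
    emul = f"_ide{disk // 2}.{disk % 2}" if disk < 4 else "none"
    spec = f"d{disk}p{partition},pv=true,emul={emul}"
    if disk < 16:
        if partition > 15:
            raise ValueError("partition does not fit in the legacy minor")
        spec += f",_vdevice={XVD_MAJOR}:{(disk << 4) | partition}"
    return spec
```

For instance, expand_shorthand("hdb") returns "d1p0,pv=true,emul=ide0.1,_vdevice=3:64", and expand_shorthand("xvdz") returns "d25p0,pv=true,emul=none", leaving the encoding to the extended default.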
The sd* syntax is not covered. It’s unclear if this is used in the wild or what the existing semantics of emul= are for SCSI devices. If someone cares to investigate the existing behaviour then it can be added.
Otherwise it is expected that additions will not be made to this set of shorthands and that new functionality (e.g. emulation types) will be available only via the explicit syntax.
(is there any non-Linux specific syntax used by other guest OSes which needs to be supported?)
Implementation notes
--------------------
The behaviour specified by the emul=_ide[01].[01] syntax is currently implemented by qemu (effectively as a workaround for users forgetting to specify any emulated disks). I propose that as part of implementing this new syntax we push responsibility for these semantics up into libxl.
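If libxl took over these semantics, the resolution step might look something like the sketch below. It is only an illustration of the "iff no other disk is explicitly configured with emulation" rule: the function name resolve_emulation and the representation of each disk's emul option as a plain string are assumptions, not existing libxl or qemu interfaces.

```python
from typing import List, Optional


def resolve_emulation(emul_options: List[str]) -> List[Optional[str]]:
    """Return the emulated device for each disk, or None where none is emulated."""
    # A disk counts as explicitly emulated if its option is neither "none"
    # nor one of the conditional _-prefixed forms.
    explicit = any(e != "none" and not e.startswith("_") for e in emul_options)
    resolved: List[Optional[str]] = []
    for e in emul_options:
        if e == "none":
            resolved.append(None)
        elif e.startswith("_"):
            # Conditional emulation: only honoured when nothing is explicit.
            resolved.append(None if explicit else e[1:])
        else:
            resolved.append(e)
    return resolved
```

A configuration with only xvda and xvdb (emul=_ide0.0 and emul=_ide0.1) would therefore get both emulated IDE devices, whereas adding one disk with an explicit emul=ide0.0 would suppress the conditional ones.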
libxl currently uses the legacy encoding for devices specified as xvd* or dXpY iff the particular configuration can be represented using the legacy format (e.g. for d0p0..d15p15 or xvda..xvdp). This is done in order to (1) avoid the clash between the extended representation of d0p0 and the legacy representations of d1..d15 and (2) provide compatibility with guests which do not support the extended device encoding.
The proposal above suggests instead that d1+ should be encoded using the extended format unless overridden using the extended=false option or one of the shorthands which uses the _vdevice option. Only d0 would default to legacy encoding.
This (1) avoids the clash in minor numbers since d0 is the only disk which can clash with legacy encodings and (2) provides compatibility with old guests through their use of the xvd* syntax.
