Xen 3.3 Feature: Shadow 3

Shadow 3 is the next step in the evolution of the shadow pagetable code.  By making the shadow pagetables behave more like a TLB, we take advantage of guest operating system TLB behavior to reduce and coalesce the guest pagetable changes that the hypervisor has to translate into the shadow pagetables.  This can dramatically reduce the virtualization overhead for HVM guests.
Shadow paging overhead is one of the largest sources of CPU virtualization overhead for HVM guests.  Because HVM guest operating systems don’t know the physical frame numbers of the pages assigned to them, they use guest frame numbers instead.  This requires the hypervisor to translate each guest frame number into a machine frame number in the shadow pagetables before the pagetables can be used by the guest.
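
To make that translation concrete, here is a minimal sketch in C.  The names and the flat p2m array are simplified stand-ins invented for illustration, not the real Xen code: a guest PTE carrying a guest frame number is rewritten into a shadow PTE carrying the corresponding machine frame number, with the flag bits carried over.

    #include <stdint.h>

    #define PAGE_SHIFT      12
    #define PTE_FRAME_MASK  (~0xFFFULL)   /* bits 12 and up: frame number      */
    #define PTE_FLAGS_MASK  (0xFFFULL)    /* low bits: present, writable, etc. */

    /* Hypothetical physical-to-machine map: one mfn per guest frame number. */
    static uint64_t p2m[1024];

    /*
     * Build a shadow PTE from a guest PTE by swapping the guest frame number
     * for the machine frame number that actually backs it.  (Simplified
     * stand-in for illustration only.)
     */
    static uint64_t shadow_from_guest_pte(uint64_t guest_pte)
    {
        uint64_t gfn   = (guest_pte & PTE_FRAME_MASK) >> PAGE_SHIFT;
        uint64_t flags =  guest_pte & PTE_FLAGS_MASK;
        uint64_t mfn   =  p2m[gfn];       /* the lookup the guest cannot do itself */

        return (mfn << PAGE_SHIFT) | flags;
    }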
Those who have been around a while may remember the Shadow-1 code.  Its method of propagating changes from guest pagetables to the shadow pagetables was as follows (a rough sketch in code follows the list):

  • Remove write access to any guest pagetable.
  • When the guest attempts to write to a guest pagetable, mark the page out-of-sync, add it to the out-of-sync list, and grant write permission.
  • On the next page fault or cr3 write, take each page from the out-of-sync list and:
    • resync the page: look for changes to the guest pagetable and propagate those entries into the shadow pagetable
    • remove write permission, and clear the out-of-sync bit.

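A rough, self-contained sketch of that cycle, with invented structures and helper names standing in for the real Shadow-1 internals (this is an illustration of the idea, not the actual code):

    #include <stdbool.h>
    #include <stdint.h>

    #define PTES_PER_PAGE 512

    /* Hypothetical bookkeeping for one shadowed guest pagetable page. */
    struct shadowed_page {
        uint64_t *guest;                     /* the guest's pagetable page      */
        uint64_t *shadow;                    /* our shadow of it                */
        uint64_t  snapshot[PTES_PER_PAGE];   /* guest contents at the last sync */
        bool      out_of_sync;
        struct shadowed_page *next_oos;      /* linkage on the out-of-sync list */
    };

    static struct shadowed_page *oos_list;

    /* Stand-in for the gfn->mfn translation sketched earlier. */
    static uint64_t shadow_from_guest_pte(uint64_t gpte) { return gpte; }

    /* Guest writes to a write-protected pagetable: let it, but remember it. */
    void mark_out_of_sync(struct shadowed_page *pg)
    {
        pg->out_of_sync = true;
        pg->next_oos = oos_list;
        oos_list = pg;
        /* ...grant write permission, so further writes no longer trap... */
    }

    /* Next page fault or CR3 write: fold every accumulated change back in. */
    void resync_all(void)
    {
        struct shadowed_page *pg;

        for (pg = oos_list; pg != NULL; pg = pg->next_oos) {
            for (int i = 0; i < PTES_PER_PAGE; i++) {
                if (pg->guest[i] != pg->snapshot[i]) {   /* changed since last sync */
                    pg->shadow[i]   = shadow_from_guest_pte(pg->guest[i]);
                    pg->snapshot[i] = pg->guest[i];
                }
            }
            pg->out_of_sync = false;
            /* ...remove write permission again, so the next write traps... */
        }
        oos_list = NULL;
    }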
While this method worked so-so for Linux, it was disastrous for Windows.  Windows heavily uses a technique called demand-paging.  Resyncing a guest page is an expensive operation, and under Shadow-1 every page that was demand-faulted in caused an unsync, a write, and a resync.
The next step, Shadow-2, did away with the out-of-sync mechanism (among many other things) and instead emulated every write to guest pagetables.  Emulation avoids the expensive unsync-resync cycle for demand paging.  However, it removes any “batching” effect: every write is immediately reflected in the shadow pagetables, even though the guest operating system may not have expected the change to become visible until later.
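The contrast with the out-of-sync approach can be sketched like this (again with invented names; Xen's real instruction emulator is far more elaborate): the guest pagetable stays write-protected forever, and every trapped write is performed on the guest's behalf and mirrored into the shadow straight away, so there is nothing left to batch.

    #include <stdint.h>

    /* Hypothetical state for one shadowed, always write-protected pagetable page. */
    struct shadowed_page {
        uint64_t *guest;     /* the guest's pagetable page */
        uint64_t *shadow;    /* its shadow                 */
    };

    /* Stand-in for the gfn->mfn translation sketched earlier. */
    static uint64_t shadow_from_guest_pte(uint64_t gpte) { return gpte; }

    /*
     * Shadow-2 style: each guest write to a pagetable causes a VMEXIT, the
     * instruction is decoded and emulated, and the result is reflected into
     * the shadow right away -- one round trip per PTE write.
     */
    void emulate_pagetable_write(struct shadowed_page *pg, int index, uint64_t new_gpte)
    {
        pg->guest[index]  = new_gpte;                        /* do the write for the guest */
        pg->shadow[index] = shadow_from_guest_pte(new_gpte); /* mirror it immediately      */
    }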
Furthermore, Windows will frequently write “transition values” into pagetable entries when a page is being mapped in or mapped out.  The cycle for demand-faulting zero pages in 32-bit Windows looks like:

  • Guest process page faults
  • Write transition PTE
  • Write real PTE
  • Guest process accesses page

On bare hardware, this looks like “Page fault / memory write / memory write”.  Memory writes are relatively inexpensive.  But in Shadow-2, this looks like:

  • Page fault
  • Emulated write
  • Emulated write

Each emulated write involves a VMEXIT/VMENTER as well as about 8000 cycles of emulation inside the hypervisor, which is much more expensive than a mere memory write.
Shadow-3 brings back the out-of-sync mechanism, but with some key changes.  First, only L1 pagetables are allowed to go out-of-sync; all L2+ pagetables are still emulated.  Second, we don’t necessarily resync on the next page fault.  One of the things this enables is a “lazy pull-through”: if we get a page fault where the shadow entry is not present but the guest entry is, we can simply propagate that entry to the shadows and return to the guest, leaving the rest of the page out-of-sync.  This means that once a page is out-of-sync, demand-faulting looks like this (a sketch of the fault-handler fast path follows the list):

  • Page fault
  • Memory write
  • Memory write
  • Propagate guest entry to shadows

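Here is a rough sketch of that fast path, with simplified, hypothetical structures rather than the real Shadow-3 code: on a fault against an out-of-sync L1, if the shadow entry is not present but the guest entry is, we copy just that one entry and resume the guest.

    #include <stdbool.h>
    #include <stdint.h>

    #define PTE_PRESENT 0x1ULL

    /* Hypothetical bookkeeping for one shadowed guest L1 pagetable page. */
    struct shadowed_page {
        uint64_t *guest;
        uint64_t *shadow;
        bool      out_of_sync;
    };

    /* Stand-in for the gfn->mfn translation sketched earlier. */
    static uint64_t shadow_from_guest_pte(uint64_t gpte) { return gpte; }

    /*
     * Lazy pull-through: shadow entry not present, guest entry present, page
     * out-of-sync -> propagate just that entry and return to the guest,
     * instead of resyncing the whole page.  Returns true if handled here.
     */
    bool try_lazy_pull_through(struct shadowed_page *pg, unsigned int index)
    {
        uint64_t gpte;

        if (!pg->out_of_sync)
            return false;                      /* take the normal fault path */

        gpte = pg->guest[index];

        if (!(pg->shadow[index] & PTE_PRESENT) && (gpte & PTE_PRESENT)) {
            pg->shadow[index] = shadow_from_guest_pte(gpte);
            return true;                       /* fixed up; back to the guest */
        }

        return false;                          /* something else; full handling */
    }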
Pulling through a single guest value is actually cheaper than emulation.  So for demand-paging under Windows, we have 1/3 fewer trips into the hypervisor (two instead of three).  Furthermore, bulk updates, like process destruction or mapping large address spaces, are propagated to the shadows in one batch at the next CR3 switch, rather than going into and out of the hypervisor on each individual write.
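That batching point could be hooked in roughly as follows.  The hook name is hypothetical, and resync_all() refers to the out-of-sync resync loop sketched earlier; the real code is tied into Xen's CR3-handling path.

    #include <stdint.h>

    void resync_all(void);   /* the out-of-sync resync loop sketched earlier */

    /*
     * Hypothetical hook on the guest's CR3 write path.  All the pagetable
     * edits accumulated while pages sat on the out-of-sync list are folded
     * into the shadows here, in one batch, before the new address space is
     * installed -- instead of one trap per individual write.
     */
    void on_guest_cr3_write(uint64_t new_guest_cr3)
    {
        resync_all();

        /* ...then find or build the shadow of the pagetable that
         * new_guest_cr3 points at and load it into the real CR3... */
        (void)new_guest_cr3;
    }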
All of this adds up to greatly improved performance for workloads like compilation, compression, databases, and any workload which does a lot of memory management in an HVM guest.
