coltonlewis.name: Partitioned PMU Part 1

Partitioned PMU Part 1

I've been working on KVM at Google for about three years now. For most of those three years I've added support for new CPU features, written new tests, and made efficiency improvements in different parts of the kernel codebase. But for the past six months I've had the opportunity to own and develop a new feature, meaning guest VMs gain a new capability they didn't have before. It was a lot of work and is going through review on the upstream mailing lists at this moment. I am very proud of it and want to explain my work over a few posts.

Definitions

To make this series of posts more accessible to a broad range of programmers, I will begin by talking about the terms I will be using and why they matter.

Virtual Machine

A virtual machine, often abbreviated VM, may be defined as a computer that only exists in software. It's an abstract and funny thing to think about. What good is a computer that only exists in software? It might seem cool as an intellectual exercise or a toy to emulate one computer with another, but it's not immediately obvious such a thing is of practical use.

But there are a number of uses for virtual machines.

  1. Allow a single large computer to be divided up into many smaller computers.
  2. Allow different computer users to run different software with different hardware requirements.
  3. Allow the computer to run untrusted software and keep it strictly isolated from the rest of the system.

These are all things needed by cloud computing providers. They run millions of virtual machines.

Host

A host is the operating system running on the physical hardware that manages the virtual machines.

Guest

A guest is a virtual machine. Many guests may execute on one host, either taking turns or running in parallel if the host has enough CPUs to support it.

Hypervisor

A hypervisor is a program responsible for creating and managing virtual machines. Programs you might have used like VirtualBox, QEMU, and VMware are hypervisors, but hypervisors can also run directly on hardware as part of an operating system. Microsoft's Hyper-V and Linux's KVM are both examples of this.

KVM

KVM (Kernel-based Virtual Machine) is the Linux module that provides hypervisor capabilities. You might ask why put a hypervisor in the kernel if userspace programs can be capable hypervisors. It's to run guests faster. Hypervisors have a lot of work to do to emulate an entire computer inside a computer. Guests can never be as fast as the host computer, but with kernel support they can be much faster than they could otherwise. Userspace hypervisors need to rely on the host operating system for a lot of services. Kernel-space hypervisors are a part of the host operating system.

vCPU

A vCPU (virtual CPU) is the most fundamental data structure in KVM. It is the primary record of the CPU state of the guest and the foremost example of virtual hardware. Most KVM operations manipulate the vCPU in one way or another.

ARM

ARM, previously an acronym for Advanced RISC Machines, is a family of computer processors that share a similar instruction set and have been gaining market share over x86-based computers due to their simplified designs and better ratio of performance to energy use. Like x86, they also come in 64-bit varieties. KVM only supports ARM64, not ARM32. ARM64 was the architecture I worked on for this feature.

PMU

A PMU (performance monitoring unit) is a module in a CPU that tracks how quickly software is executing on the CPU. It does this by counting CPU events. Events can be things like cycles, cache hits, cache misses, branch predictions, and so on. Modern CPUs have hundreds of trackable events to provide insight into where the CPU is spending its time.

Any application interested in profiling software is probably making use of the PMU. If you have ever run the perf command on Linux, you have used the PMU.

To give an example of how the standard ARM PMU works, let's say you want to know how long it takes the CPU to complete a million cycles, using counter 0. You store -1,000,000 in counter register 0, store the event code for cycles in event type register 0, set interrupt enable bit 0, and set counter enable bit 0. The counter then counts up until it overflows, which happens after a million cycles. The overflow sets overflow bit 0 and raises an interrupt, which executes whatever interrupt handling code you've installed for the counter.
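The sequence above can be sketched as a toy model in Python. This is only an illustration, not how the hardware or KVM is implemented: the ToyPMU class, its field names, and the 32-bit counter width are all assumptions made for the example, and the event code is used purely as an opaque tag.

```python
COUNTER_BITS = 32                      # assumption: toy counters are 32 bits wide
COUNTER_MASK = (1 << COUNTER_BITS) - 1
EVENT_CYCLES = 0x11                    # illustrative event code for "cycles"


class ToyPMU:
    """Hypothetical software model of a PMU with a few event counters."""

    def __init__(self, num_counters=4):
        self.counter = [0] * num_counters      # counter registers
        self.event_type = [0] * num_counters   # event type registers
        self.enabled = 0                       # counter enable bits
        self.irq_enabled = 0                   # interrupt enable bits
        self.overflow = 0                      # overflow status bits
        self.handler = None                    # installed interrupt handler

    def program(self, idx, event, period, handler):
        # Storing -period makes the counter wrap after `period` events.
        self.counter[idx] = (-period) & COUNTER_MASK
        self.event_type[idx] = event
        self.irq_enabled |= 1 << idx
        self.enabled |= 1 << idx
        self.handler = handler

    def tick(self, event, count=1):
        """Record `count` occurrences of `event` on every matching counter."""
        for idx in range(len(self.counter)):
            bit = 1 << idx
            if not (self.enabled & bit) or self.event_type[idx] != event:
                continue
            total = self.counter[idx] + count
            self.counter[idx] = total & COUNTER_MASK
            if total > COUNTER_MASK:           # wrapped past the top: overflow
                self.overflow |= bit
                if (self.irq_enabled & bit) and self.handler:
                    self.handler(idx)


if __name__ == "__main__":
    pmu = ToyPMU()
    pmu.program(0, EVENT_CYCLES, 1_000_000,
                lambda idx: print(f"overflow interrupt on counter {idx}"))
    pmu.tick(EVENT_CYCLES, 999_999)    # one cycle short: nothing happens yet
    pmu.tick(EVENT_CYCLES)             # the millionth cycle triggers the handler
```

Preloading a negative value is what turns a plain up-counter into a "tell me after N events" timer: the overflow interrupt is the only notification the counter can give.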

Problem Statement

Now that I've defined some elementary terms for the project I've been working on, I can describe the problem my feature solves. For isolation reasons, KVM uses an entirely emulated PMU for ARM, and it's slow. My feature allows some of the counters to be reserved for guests. Guests can then bypass the slow emulation and use some of the hardware functionality directly, which is a lot faster.
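To make the idea of reserving counters concrete, here is a rough Python sketch of a partition. It is purely illustrative: the function names, the choice to give the guest the lowest-numbered counters, and the "direct" versus "emulated" labels are assumptions for this example, not the actual KVM mechanism.

```python
def partition_counters(total, guest_count):
    """Split counter indices into (guest, host) lists.

    Assumption for this sketch: the guest receives the lowest-numbered
    counters and the host keeps the rest.
    """
    if not 0 <= guest_count <= total:
        raise ValueError("guest_count must be between 0 and total")
    guest = list(range(guest_count))
    host = list(range(guest_count, total))
    return guest, host


def guest_access_path(idx, guest_counters):
    """Describe how a guest access to counter `idx` would be serviced."""
    # Reserved counters can be touched directly; anything else would have
    # to go through the slower emulated path.
    return "direct" if idx in guest_counters else "emulated"


if __name__ == "__main__":
    guest, host = partition_counters(total=6, guest_count=4)
    print("guest counters:", guest)
    print("host counters:", host)
    print("counter 2 ->", guest_access_path(2, guest))
    print("counter 5 ->", guest_access_path(5, guest))
```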