coltonlewis.name

Partitioned PMU Part 4

After the driver was fully modified to respect the guest partition and not use any of the counters reserved for the guest, KVM could start doing interesting stuff.

FGT

Now KVM can start using FGT (fine-grained traps) to untrap specific registers. By default everything is trapped, and the FGT changes made here won't actually affect guest VMs for several more patches.

The registers essential to the function of the PMU are the following:

  • PMEVCNTRn_EL0: A general purpose counter register, where n is a number 0-30
  • PMEVTYPERn_EL0: A general purpose event type register, used to program which event the corresponding counter counts and which exception levels it counts at.
  • PMOVSSET_EL0 and PMOVSCLR_EL0: Two names, with different write behavior, for the bit mask register tracking counter overflow.
  • PMCNTENSET_EL0 and PMCNTENCLR_EL0: Two names for the bit mask register controlling whether the individual counters are enabled.
  • PMINTENSET_EL1 and PMINTENCLR_EL1: Two names for the bit mask register controlling whether the individual counters trigger an interrupt when they overflow.
  • PMCR_EL0: A global information register for the PMU, containing things like the global enable bit and how many counters are implemented on the system.

I will elaborate on the registers with multiple names for the same bit mask. A 1 bit written to the SET register sets the underlying bit and a 1 bit written to the CLR register clears the underlying bit. 0s are ignored. This interface is designed so it is easier to modify individual bits with a single write instruction rather than needing a read-modify-write cycle every time.
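The SET/CLR behavior can be modeled in a few lines of C. This is only a sketch of the semantics: `struct pmu_mask` and the `mask_*` helpers are hypothetical names standing in for one underlying hardware bit mask and its two register aliases.

```c
#include <stdint.h>

/* Hypothetical model: one underlying bit mask behind a SET/CLR register pair. */
struct pmu_mask {
	uint64_t bits;
};

/* Write to the SET alias: each 1 bit sets the underlying bit, 0s are ignored. */
static void mask_write_set(struct pmu_mask *m, uint64_t val)
{
	m->bits |= val;
}

/* Write to the CLR alias: each 1 bit clears the underlying bit, 0s are ignored. */
static void mask_write_clr(struct pmu_mask *m, uint64_t val)
{
	m->bits &= ~val;
}

/* Both aliases read back the same underlying mask in this model. */
static uint64_t mask_read(const struct pmu_mask *m)
{
	return m->bits;
}
```

With this interface, enabling only counter 3 is a single `mask_write_set(&m, 1ull << 3)`, with no need to read the current value first.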

There are other registers too but they are mostly special cases of or alternate ways to access the above.

At this point, KVM is still setting a register field MDCR_EL2.TPM which is a blanket flag to trap all PMU registers, so the changes to FGT won't affect guests for a few more patches.
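To make the precedence concrete, here is a minimal sketch of how the blanket flag overrides the fine-grained configuration. The register groups and their bit positions are hypothetical and chosen for illustration; they are not the architectural FGT bit assignments, and the helper names are not taken from KVM.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register groups; these are NOT architectural bit positions. */
enum pmu_reg_group {
	GRP_PMEVCNTR,
	GRP_PMEVTYPER,
	GRP_PMOVS,
	GRP_PMCNTEN,
	GRP_PMINTEN,
	GRP_PMCR,
	GRP_COUNT,
};

/* In this model a set bit means "trap accesses to this group". */
static uint32_t fgt_trap_all(void)
{
	return (1u << GRP_COUNT) - 1;
}

/* Clear the trap bit for one group. */
static uint32_t fgt_untrap(uint32_t fgt, enum pmu_reg_group grp)
{
	return fgt & ~(1u << grp);
}

/*
 * The blanket TPM flag wins: while it is set, every PMU access traps
 * no matter what the fine-grained configuration says.
 */
static bool pmu_access_traps(bool tpm, uint32_t fgt, enum pmu_reg_group grp)
{
	return tpm || (fgt & (1u << grp)) != 0;
}
```

In this model, untrapping a group changes nothing observable until `tpm` becomes false.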

In any case, I could untrap all of the above except the PMEVTYPER and PMOVS registers. The PMEVTYPER registers need to remain trapped to enforce the KVM event filter, which allows host software to prevent guests from counting certain events. Trapping them also prevents information leaking from the host to the guest, which would happen if the guest were allowed to request event counting at the KVM level.

The PMOVS registers need to remain trapped so I can present the guest a different view of the overflow flags than the hardware holds. The different view is needed so the host PMU driver can handle the interrupt by clearing the hardware overflow flag while the guest's virtual overflow flag remains set, so the guest still knows which counter overflowed.
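The split between the hardware overflow flags and the guest's view of them might look roughly like the following. This is a sketch under assumptions: the structs, the function names, and the choice to clear both views on a guest clear are hypothetical and not taken from the patches.

```c
#include <stdint.h>

/* Hypothetical per-vCPU state: the guest's virtual overflow mask. */
struct vcpu_pmu {
	uint64_t virt_pmovs;
};

/* Stand-in for the hardware overflow register in this model. */
struct hw_pmu {
	uint64_t pmovs;
};

/*
 * Host interrupt path: latch the guest-owned overflow bits into the
 * virtual view, then clear them in hardware so the interrupt is handled.
 */
static void host_handle_overflow(struct hw_pmu *hw, struct vcpu_pmu *vcpu,
				 uint64_t guest_mask)
{
	uint64_t pending = hw->pmovs & guest_mask;

	vcpu->virt_pmovs |= pending;
	hw->pmovs &= ~pending;
}

/* A trapped guest read sees the virtual view, not the hardware value. */
static uint64_t guest_read_pmovs(const struct vcpu_pmu *vcpu)
{
	return vcpu->virt_pmovs;
}

/* A trapped guest clear drops the bits from both views (an assumption). */
static void guest_write_pmovsclr(struct hw_pmu *hw, struct vcpu_pmu *vcpu,
				 uint64_t val)
{
	vcpu->virt_pmovs &= ~val;
	hw->pmovs &= ~val;
}
```

After `host_handle_overflow()` runs, the hardware flag is clear but `guest_read_pmovs()` still reports the overflow until the guest clears it itself.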

Trapped Register Write-throughs

The next couple of commits add write-through on the registers that remain trapped. With the old emulated PMU implementation, it was OK for PMU register writes to only update the virtual registers in the VCPU structure which would later be used to create host events and hand off managing the hardware to the host driver. Not so any more.

If the intent is to let the guest drive the hardware, then hardware writes have to take effect uniformly and immediately. That means adding write-through semantics to the register trap handlers after they enforce what they need to enforce. For the PMEVTYPER registers that means applying the event filter. For the PMOVS registers we don't need to do anything special.
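A trap handler with "enforce, then write through" semantics could be sketched like this. Everything here is an assumption made for illustration: the deny-list filter, the struct layouts, the 16-bit event field, and the policy of skipping the hardware write for a filtered event are hypothetical, and the real patches may handle a rejected event differently. The one architectural detail used is that bit 27 of PMEVTYPERn_EL0 (NSH) requests counting at EL2.

```c
#include <stdbool.h>
#include <stdint.h>

#define EVTYPER_EVT_MASK 0xffffu      /* event number field, assumed 16 bits */
#define EVTYPER_NSH      (1u << 27)   /* request counting at EL2 */
#define NR_COUNTERS      31

/* Hypothetical filter: a short list of event numbers the guest may not count. */
struct event_filter {
	const uint16_t *denied;
	unsigned int n_denied;
};

struct vcpu_pmu { uint32_t virt_evtyper[NR_COUNTERS]; };
struct hw_pmu   { uint32_t evtyper[NR_COUNTERS]; };

static bool filter_allows(const struct event_filter *f, uint16_t evt)
{
	for (unsigned int i = 0; i < f->n_denied; i++)
		if (f->denied[i] == evt)
			return false;
	return true;
}

/*
 * Trapped write to PMEVTYPERn_EL0. Returns true if the value reached
 * hardware. The virtual register always records what the guest wrote.
 */
static bool trap_write_pmevtyper(struct hw_pmu *hw, struct vcpu_pmu *vcpu,
				 const struct event_filter *f,
				 unsigned int idx, uint32_t val)
{
	vcpu->virt_evtyper[idx] = val;

	/* Assumed policy: a filtered event is never programmed into hardware. */
	if (!filter_allows(f, (uint16_t)(val & EVTYPER_EVT_MASK)))
		return false;

	/* Write through, but never let the guest count at EL2. */
	hw->evtyper[idx] = val & ~EVTYPER_NSH;
	return true;
}
```

A real handler would also have to make sure a counter whose event was rejected stops counting; that detail is left out of the sketch.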

Entering Guest Setup

This is where the magic happens and all the changes described above actually apply to guests. The MDCR_EL2 register is set on every entry into a guest and defines some important aspects of guest execution. I want to draw attention to the fact that, as an EL2 register, only KVM has access to it. Anyway, this commit changes the setup of this register on guest entry. If we have a partitioned PMU, it unsets the TPM flag so the fine-grained traps take effect, and it calculates an appropriate value for the HPMN field, which defines the partition point between the host and guest.
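The guest-entry computation can be sketched as a pure function on the register value. The field positions are architectural (HPMN in bits [4:0] and TPM in bit 6 of MDCR_EL2), but the function name and parameters are hypothetical, and the non-partitioned branch is simplified to just keeping the blanket trap.

```c
#include <stdbool.h>
#include <stdint.h>

#define MDCR_EL2_HPMN_MASK 0x1full        /* bits [4:0]: partition point */
#define MDCR_EL2_TPM       (1ull << 6)    /* trap all PMU register accesses */

/*
 * Compute the MDCR_EL2 value to load on guest entry. Counters below
 * HPMN belong to the guest; counters at or above HPMN stay with the host.
 */
static uint64_t mdcr_for_guest_entry(uint64_t mdcr, bool partitioned,
				     unsigned int nr_guest_counters)
{
	if (partitioned) {
		/* Drop the blanket trap so the fine-grained traps decide. */
		mdcr &= ~MDCR_EL2_TPM;
		/* Set the partition point between guest and host counters. */
		mdcr &= ~MDCR_EL2_HPMN_MASK;
		mdcr |= nr_guest_counters & MDCR_EL2_HPMN_MASK;
	} else {
		/* Emulated PMU: keep trapping every PMU register. */
		mdcr |= MDCR_EL2_TPM;
	}
	return mdcr;
}
```

For example, `mdcr_for_guest_entry(mdcr, true, 4)` hands counters 0 through 3 to the guest and leaves the rest to the host.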

An important nuance to understand is that just because we set a number of guest counters for the host driver to respect, that is not automatically the number of counters the guest will get. The host VMM may decide to restrict the guest to an even smaller number of counters.

Something that was especially annoying is that KVM allows this to happen through writes to the field PMCR_EL0.N, which defines the number of counters available to the system, even though the ARM manual specifies that field is read-only. It was tricky to figure out how to support this given I untrapped PMCR_EL0, but I did it.
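The counter limit described above can be illustrated with a small helper that updates the N field of a virtual PMCR_EL0 value. The field position (bits [15:11]) is architectural, but the helper names are hypothetical, and clamping an oversized request to the partition size is an assumption. This sketch does not show how the actual patches make the smaller count visible to a guest that reads PMCR_EL0 untrapped.

```c
#include <stdint.h>

#define PMCR_EL0_N_SHIFT 11
#define PMCR_EL0_N_MASK  (0x1full << PMCR_EL0_N_SHIFT)  /* bits [15:11] */

/* Extract the number of counters advertised by a PMCR_EL0 value. */
static unsigned int pmcr_get_n(uint64_t pmcr)
{
	return (unsigned int)((pmcr & PMCR_EL0_N_MASK) >> PMCR_EL0_N_SHIFT);
}

/*
 * VMM write to the virtual PMCR_EL0.N: the guest may get fewer counters
 * than the partition holds, but never more (clamping is an assumption).
 */
static uint64_t vmm_set_pmcr_n(uint64_t pmcr, unsigned int requested,
			       unsigned int partition_size)
{
	unsigned int n = requested < partition_size ? requested : partition_size;

	pmcr &= ~PMCR_EL0_N_MASK;
	pmcr |= (uint64_t)n << PMCR_EL0_N_SHIFT;
	return pmcr;
}
```

With a partition of 6 counters, a VMM request for 2 yields N = 2, while a request for 10 is held at 6.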