[RFC PATCH v3 00/21] Cleaning up the KVM clock mess
From: David Woodhouse
Date: Tue May 21 2024 - 20:23:57 EST
Clean up the KVM clock mess somewhat so that it is either based on the guest
TSC ("master clock" mode), or on the host CLOCK_MONOTONIC_RAW in cases where
the TSC isn't usable.
Eliminate the third variant, in which bugs in e.g. __get_kvmclock() left
it based directly on the *host* TSC.
Kill off the last vestiges of the KVM clock being based on CLOCK_MONOTONIC
instead of CLOCK_MONOTONIC_RAW and thus being subject to NTP skew.
Fix up migration support to allow the KVM clock to be saved/restored as an
arithmetic function of the guest TSC, since that's what it actually is in
the *common* case, so it can be migrated precisely. Or at least to within
±1 ns, which is good enough, as discussed in:
https://lore.kernel.org/kvm/c8dca08bf848e663f192de6705bf04aa3966e856.camel@xxxxxxxxxxxxx
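To make "an arithmetic function of the guest TSC" concrete, here is a minimal sketch of the conversion defined by the pvclock ABI. The field names follow struct pvclock_vcpu_time_info, but the struct below is a trimmed-down stand-in, and the helper names are invented for illustration. This is not code from the series.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative subset of the pvclock time info; names mirror the ABI
 * fields, but this struct itself is hypothetical. */
struct pvclock_sketch {
	uint64_t tsc_timestamp;     /* guest TSC at the reference point */
	uint64_t system_time;       /* KVM clock (ns) at the reference point */
	uint32_t tsc_to_system_mul; /* multiplier, a fraction scaled by 2^32 */
	int8_t   tsc_shift;         /* binary shift applied to the TSC delta */
};

/* Compute (a * mul) >> 32 without needing a 128-bit type. */
static uint64_t mul_u64_u32_shr32(uint64_t a, uint32_t mul)
{
	uint64_t hi = a >> 32;
	uint64_t lo = a & 0xffffffffu;

	return hi * mul + ((lo * mul) >> 32);
}

/* KVM clock in ns as a pure function of the guest TSC reading. */
static uint64_t kvmclock_from_guest_tsc(const struct pvclock_sketch *pv,
					uint64_t guest_tsc)
{
	uint64_t delta = guest_tsc - pv->tsc_timestamp;

	assert(pv->tsc_shift > -32 && pv->tsc_shift < 32);
	if (pv->tsc_shift < 0)
		delta >>= -pv->tsc_shift;
	else
		delta <<= pv->tsc_shift;

	return pv->system_time +
	       mul_u64_u32_shr32(delta, pv->tsc_to_system_mul);
}
```

Because the result depends only on the four parameters and the guest TSC, saving and restoring those parameters reproduces the same clock on the destination, up to the rounding in the final shift.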
In v2 of this series, TSC synchronization is improved and simplified a bit
too, and we allow masterclock mode to be used even when the guest TSCs are
out of sync, as long as they're running at the same *rate*. The different
*offset* shouldn't matter.
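A small sketch of why a per-vCPU *offset* drops out when the rates match. The model is deliberately simplified and every name in it is hypothetical: each vCPU sees the host TSC plus its own offset, and the clock is computed from the delta against a reference taken through the same offset.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-vCPU state: guest TSC = host TSC + tsc_offset (mod 2^64). */
struct vcpu_clock_sketch {
	uint64_t tsc_offset;
};

static uint64_t guest_tsc(const struct vcpu_clock_sketch *v, uint64_t host_tsc)
{
	return host_tsc + v->tsc_offset;
}

/* Clock in ns at host_tsc_now, given a shared reference point
 * (ref_host_tsc, ref_ns) and a shared rate in TSC ticks per ns. */
static uint64_t clock_ns(const struct vcpu_clock_sketch *v,
			 uint64_t ref_host_tsc, uint64_t ref_ns,
			 uint64_t ticks_per_ns, uint64_t host_tsc_now)
{
	/* The reference timestamp is expressed in this vCPU's TSC domain. */
	uint64_t tsc_timestamp = guest_tsc(v, ref_host_tsc);
	/* The offset appears in both terms, so it cancels in the delta. */
	uint64_t delta = guest_tsc(v, host_tsc_now) - tsc_timestamp;

	assert(ticks_per_ns != 0);
	return ref_ns + delta / ticks_per_ns;
}
```

Two vCPUs with different tsc_offset values return the same clock_ns() for the same host instant. A difference in *rate* would not cancel this way.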
And the kvm_get_time_scale() function annoyed me by being entirely opaque,
so I studied it until my brain hurt and then added some comments.
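For readers who have not met this kind of helper, the following sketch shows one way to derive a (shift, multiplier) pair that converts ticks at base_hz into units at scaled_hz, in the form the pvclock conversion consumes. It is a simplified illustration written for this note, not the kernel's kvm_get_time_scale(). The function name and the restriction to scaled_hz below 2^31 are assumptions made to keep the arithmetic within 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Find shift and mul such that, for a tick count t at base_hz,
 *   ((shift < 0 ? t >> -shift : t << shift) * mul) >> 32
 * approximates t * scaled_hz / base_hz.
 */
static void time_scale_sketch(uint64_t scaled_hz, uint64_t base_hz,
			      int8_t *pshift, uint32_t *pmul)
{
	uint64_t eff = base_hz; /* effective base rate after shifting */
	int shift = 0;

	assert(base_hz != 0 && scaled_hz != 0);
	assert(scaled_hz < (1ull << 31)); /* keeps scaled_hz << 32 in 64 bits */

	/* Bring eff into (scaled_hz, 2 * scaled_hz] so that the multiplier
	 * lands in [2^31, 2^32) and uses the full 32 bits of precision.
	 * Halving discards low bits of base_hz, costing a little accuracy. */
	while (eff > 2 * scaled_hz) {
		eff >>= 1;
		shift--;
	}
	while (eff <= scaled_hz) {
		eff <<= 1;
		shift++;
	}

	*pshift = (int8_t)shift;
	*pmul = (uint32_t)((scaled_hz << 32) / eff);
}
```

For a 3 GHz TSC converted to nanoseconds, this yields shift = -1 and a multiplier of about two thirds of 2^32: halve the tick count, then scale it by 2/3.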
In v2 I also dropped the commits which were removing the periodic clock
syncs. In v3 those commits are back, but the periodic syncs are now kept
for the non-masterclock mode and removed *only* in masterclock mode,
along with a cleanup of some other gratuitous clock jumps while in
masterclock mode. v3 also adds Jack's patch to move the pvclock structure
to UAPI.
I also fixed the bug pointed out by Chenyi Qiang, that I was failing to
set vcpu->arch.this_tsc_{nsec,write} after removing the cur_tsc_* fields.
I also included patches to fix advertised steal time going backwards, and
to make the guest more resilient to it. Those may end up being split out
and submitted under separate cover (with selftests).
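As an illustration of what "resilient" can mean on the consumer side, here is a sketch of one possible policy: account only forward progress of the reported steal counter and ignore any reading at or below the highest value seen so far. The struct and function are hypothetical and are not taken from the sched/cputime patch in this series, which may choose a different policy.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accounting state for one CPU. */
struct steal_acct_sketch {
	uint64_t last_seen; /* highest steal value observed so far (ns) */
	uint64_t accounted; /* total steal time accounted so far (ns) */
};

/* Account a newly reported steal counter value; returns the delta that
 * was accounted, which is zero if the counter went backwards. */
static uint64_t account_steal(struct steal_acct_sketch *st, uint64_t reported)
{
	uint64_t delta;

	if (reported <= st->last_seen)
		return 0; /* backwards or unchanged: keep the old baseline */

	delta = reported - st->last_seen;
	st->last_seen = reported;
	st->accounted += delta;
	return delta;
}
```

Keeping the high-water mark as the baseline means a temporary dip is never counted twice once the counter recovers. The naive unsigned subtraction would instead turn the dip into a huge bogus delta.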
Still needs more comprehensive selftests.
David Woodhouse (18):
  KVM: x86/xen: Do not corrupt KVM clock in kvm_xen_shared_info_init()
  KVM: x86: Improve accuracy of KVM clock when TSC scaling is in force
  KVM: x86: Explicitly disable TSC scaling without CONSTANT_TSC
  KVM: x86: Add KVM_VCPU_TSC_SCALE and fix the documentation on TSC migration
  KVM: x86: Avoid NTP frequency skew for KVM clock on 32-bit host
  KVM: x86: Fix KVM clock precision in __get_kvmclock()
  KVM: x86: Fix software TSC upscaling in kvm_update_guest_time()
  KVM: x86: Simplify and comment kvm_get_time_scale()
  KVM: x86: Remove implicit rdtsc() from kvm_compute_l1_tsc_offset()
  KVM: x86: Improve synchronization in kvm_synchronize_tsc()
  KVM: x86: Kill cur_tsc_{nsec,offset,write} fields
  KVM: x86: Allow KVM master clock mode when TSCs are offset from each other
  KVM: x86: Factor out kvm_use_master_clock()
  KVM: x86: Avoid global clock update on setting KVM clock MSR
  KVM: x86: Avoid gratuitous global clock reload in kvm_arch_vcpu_load()
  KVM: x86: Avoid periodic KVM clock updates in master clock mode
  KVM: x86/xen: Prevent runstate times from becoming negative
  sched/cputime: Cope with steal time going backwards or negative

Jack Allister (3):
  KVM: x86: Add KVM_[GS]ET_CLOCK_GUEST for accurate KVM clock migration
  UAPI: x86: Move pvclock-abi to UAPI for x86 platforms
  KVM: selftests: Add KVM/PV clock selftest to prove timer correction
 Documentation/virt/kvm/api.rst                    |  37 ++
 Documentation/virt/kvm/devices/vcpu.rst           | 115 +++-
 arch/x86/include/asm/kvm_host.h                   |  15 +-
 arch/x86/include/uapi/asm/kvm.h                   |   6 +
 arch/x86/include/{ => uapi}/asm/pvclock-abi.h     |  24 +-
 arch/x86/kvm/svm/svm.c                            |   3 +-
 arch/x86/kvm/vmx/vmx.c                            |   2 +-
 arch/x86/kvm/x86.c                                | 716 +++++++++++++++-------
 arch/x86/kvm/xen.c                                |  22 +-
 include/uapi/linux/kvm.h                          |   3 +
 kernel/sched/cputime.c                            |  20 +-
 tools/testing/selftests/kvm/Makefile              |   1 +
 tools/testing/selftests/kvm/x86_64/pvclock_test.c | 192 ++++++
 13 files changed, 884 insertions(+), 272 deletions(-)