Re: [PATCH 7/7] KVM: VMX: Introduce test mode related to EPT violation VE
From: Sean Christopherson
Date: Fri May 17 2024 - 12:35:59 EST

On Fri, May 17, 2024, Isaku Yamahata wrote:
> On Thu, May 16, 2024 at 06:40:02PM -0700,
> Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> > On Wed, May 15, 2024, Sean Christopherson wrote:
> > > On Tue, May 07, 2024, Paolo Bonzini wrote:
> > > > @@ -5200,6 +5215,9 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
> > > >  	if (is_invalid_opcode(intr_info))
> > > >  		return handle_ud(vcpu);
> > > >
> > > > +	if (KVM_BUG_ON(is_ve_fault(intr_info), vcpu->kvm))
> > > > +		return -EIO;
> > >
> > > I've hit this three times now when running KVM-Unit-Tests (I'm pretty sure it's
> > > the EPT test, unsurprisingly). And unless I screwed up my testing, I verified it
> > > still fires with Isaku's fix[*], though I'm suddenly having problems repro'ing.
> > >
> > > I'll update tomorrow as to whether I botched my testing of Isaku's fix, or if
> > > there's another bug lurking.
> >
> > *sigh*
> >
> > AFAICT, I'm hitting a hardware issue. The #VE occurs when the CPU does an A/D
> > assist on an entry in the L2's PML4 (L2 GPA 0x109fff8). EPT A/D bits are disabled,
> > and KVM has write-protected the GPA (hooray for shadowing EPT entries). The CPU
> > tries to write the PML4 entry to do the A/D assist and generates what appears to
> > be a spurious #VE.
> >
> > Isaku, please forward this to the necessary folks at Intel. I doubt whatever
> > is broken will block TDX, but it would be nice to get a root cause so we at least
> > know whether or not TDX is a ticking time bomb.
>
> Sure, let me forward it.
> I tested it lightly myself, but I couldn't reproduce it.

This repros on a CLX and SKX, but not my client RPL box. I verified the same
A/D-assist write-protection EPT Violation occurs on RPL, and that PROVE_VE is
enabled, so I don't think RPL is simply getting lucky.

Unless I'm missing something, this really does look like a CPU issue.
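
For anyone following along without the tree handy, here is a rough,
standalone sketch of the kind of check that the is_ve_fault() call in the
quoted hunk performs: it decodes the VM-exit interruption-information field
and tests for a valid hardware exception with vector 20 (#VE). The field
layout (vector in bits 7:0, type in bits 10:8, valid in bit 31) follows the
Intel SDM. The sketch_* names are hypothetical and exist only for this
illustration; this is not the in-tree helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* VM-exit interruption-information field layout, per the Intel SDM. */
#define INTR_INFO_VECTOR_MASK		0x000000ffu	/* bits 7:0  */
#define INTR_INFO_INTR_TYPE_MASK	0x00000700u	/* bits 10:8 */
#define INTR_INFO_VALID_MASK		0x80000000u	/* bit 31    */

#define INTR_TYPE_HARD_EXCEPTION	(3u << 8)	/* hardware exception */
#define VE_VECTOR			20u		/* #VE, virtualization exception */

/*
 * Hypothetical helper for illustration: true if intr_info describes a
 * valid hardware exception with the given vector.
 */
static inline bool sketch_is_exception_n(uint32_t intr_info, uint8_t vector)
{
	const uint32_t mask = INTR_INFO_VECTOR_MASK |
			      INTR_INFO_INTR_TYPE_MASK |
			      INTR_INFO_VALID_MASK;

	return (intr_info & mask) ==
	       (INTR_TYPE_HARD_EXCEPTION | vector | INTR_INFO_VALID_MASK);
}

/* Hypothetical stand-in for the check wrapped by KVM_BUG_ON() above. */
static inline bool sketch_is_ve_fault(uint32_t intr_info)
{
	return sketch_is_exception_n(intr_info, VE_VECTOR);
}
```

With a check like this sitting behind KVM_BUG_ON(), any #VE that reaches
handle_exception_nmi() marks the VM as bugged and returns -EIO, which is why
a spurious #VE from the CPU shows up as a test failure instead of being
silently ignored.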