Re: [PATCH 10/16] KVM: x86/tdp_mmu: Support TDX private mapping for TDP MMU

From: Isaku Yamahata
Date: Fri May 17 2024 - 04:17:57 EST


On Fri, May 17, 2024 at 02:36:43PM +1200,
"Huang, Kai" <kai.huang@xxxxxxxxx> wrote:

> On 17/05/2024 7:42 am, Isaku Yamahata wrote:
> > On Thu, May 16, 2024 at 04:36:48PM +0000,
> > "Edgecombe, Rick P" <rick.p.edgecombe@xxxxxxxxx> wrote:
> >
> > > On Thu, 2024-05-16 at 13:04 +0000, Huang, Kai wrote:
> > > > On Thu, 2024-05-16 at 02:57 +0000, Edgecombe, Rick P wrote:
> > > > > On Thu, 2024-05-16 at 14:07 +1200, Huang, Kai wrote:
> > > > > >
> > > > > > I meant it seems we should just strip shared bit away from the GPA in
> > > > > > handle_ept_violation() and pass it as 'cr2_or_gpa' here, so fault->addr
> > > > > > won't have the shared bit.
> > > > > >
> > > > > > Do you see any problem of doing so?
> > > > >
> > > > > We would need to add it back in "raw_gfn" in kvm_tdp_mmu_map().
> > > >
> > > > I don't see any big difference?
> > > >
> > > > Now in this patch the raw_gfn is directly from fault->addr:
> > > >
> > > >         raw_gfn = gpa_to_gfn(fault->addr);
> > > >
> > > >         tdp_mmu_for_each_pte(iter, mmu, is_private, raw_gfn, raw_gfn+1) {
> > > >                 ...
> > > >         }
> > > >
> > > > But there's nothing wrong to get the raw_gfn from the fault->gfn.  In
> > > > fact, the zapping code just does this:
> > > >
> > > >         /*
> > > >          * start and end doesn't have GFN shared bit.  This function zaps
> > > >          * a region including alias.  Adjust shared bit of [start, end) if
> > > >          * the root is shared.
> > > >          */
> > > >         start = kvm_gfn_for_root(kvm, root, start);
> > > >         end = kvm_gfn_for_root(kvm, root, end);
> > > >
> > > > So there's nothing wrong to just do the same thing in both functions.
> > > >
> > > > The point is fault->gfn has shared bit stripped away at the beginning, and
> > > > AFAICT there's no useful reason to keep shared bit in fault->addr.  The
> > > > entire @fault is a temporary structure on the stack during fault handling
> > > > anyway.
> > >
> > > I would like to avoid code churn at this point if there is not a real clear
> > > benefit. >>
> > > One small benefit of keeping the shared bit in the fault->addr is that it is
> > > sort of consistent with how that field is used in other scenarios in KVM. In
> > > shadow paging it's not even the GPA. So it is simply the "fault address" and has
> > > to be interpreted in different ways in the fault handler. For TDX the fault
> > > address *does* include the shared bit. And the EPT needs to be faulted in at
> > > that address.
>
> It's about how we want to define the semantic of fault->addr (forget about
> shadow MMU because for it fault->addr has different meaning from TDP):
>
> 1) It represents the faulting address which points to the actual guest
> memory (has no shared bit).
>
> 2) It represents the faulting address which is truly used as the hardware
> page table walk.
>
> The fault->gfn always represents the location of actual guest memory (w/o
> shared bit) in both cases.
>
> I originally thought 2) isn't consistent for both SNP and TDX, but thinking
> more, I now think actually both the two semantics are consistent for SNP and
> TDX, which is important in order to avoid confusion.
>
> Anyway it's a trivial because fault->addr is only used for fault handling
> path.
>
> So yes fine to me we choose to use 2) here. But IMHO we should explicitly
> add a comment to 'struct kvm_page_fault' that the @addr represents 2).

Ok. I'm fine with 2).


> And I think more important thing is how we handle this "gfn" and "raw_gfn"
> in tdp_iter and 'struct kvm_mmu_page'. See below.
>
> > >
> > > If we strip the shared bit when setting fault->addr we have to reconstruct it
> > > when we do the actual shared mapping. There is no way around that. Which helper
> > > does it, isn't important I think. Doing the reconstruction inside
> > > tdp_mmu_for_each_pte() could be neat, except that it doesn't know about the
> > > shared bit position.
> > >
> > > The zapping code's use of kvm_gfn_for_root() is different because the gfn comes
> > > without the shared bit. It's not stripped and then added back. Those are
> > > operations that target GFNs really.
> > >
> > > I think the real problem is that we are gleaning whether the fault is to private
> > > or shared memory from different things. Sometimes from fault->is_private,
> > > sometimes the presence of the shared bits, and sometimes the role bit. I think
> > > this is confusing, doubly so because we are using some of these things to infer
> > > unrelated things (mirrored vs private).
> >
> > It's confusing we don't check it in uniform way.
> >
> >
> > > My guess is that you have noticed this and somehow zeroed in on the shared_mask.
> > > I think we should straighten out the mirrored/private semantics and see what the
> > > results look like. How does that sound to you?
> >
> > I had closer look of the related code. I think we can (mostly) uniformly use
> > gpa/gfn without shared mask. Here is the proposal. We need a real patch to see
> > how the outcome looks like anyway. I think this is like what Kai is thinking
> > about.
> >
> >
> > - rename role.is_private => role.is_mirrored_pt
> >
> > - sp->gfn: gfn without shared bit.
>
> I think we can treat 'tdp_iter' and 'struct kvm_mmu_page' in the same way,
> because conceptually they both reflects the page table.

Agreed that iter->gfn and sp->gfn should be in same way.


> So I think both of them can have "gfn" or "raw_gfn", and "shared_gfn_mask".
>
> Or have both "raw_gfn" or "gfn" but w/o "shared_gfn_mask". This may be more
> straightforward because we can just use them when needed w/o needing to play
> with gfn_shared_mask.
>
> >
> > - fault->address: without gfn_shared_mask
> > Actually it doesn't matter much. We can use gpa with gfn_shared_mask.
>
> See above. I think it makes sense to include the shared bit here. It's
> trivial anyway though.

Ok, let's make fault->addr include shared mask.


> > - Update struct tdp_iter
> > struct tdp_iter
> > gfn: gfn without shared bit
>
> Or "raw_gfn"?
>
> Which may be more straightforward because it can be just from:
>
> raw_gfn = gpa_to_gfn(fault->addr);
> gfn = fault->gfn;
>
> tdp_mmu_for_each_pte(..., raw_gfn, raw_gfn + 1, gfn)
>
> Which is the reason to make fault->addr include the shared bit AFAICT.

If we can eliminate raw_gfn and kvm_gfn_for_root(), it's better.


> >
> > /* Add new members */
> >
> > /* Indicates which PT to walk. */
> > bool mirrored_pt;
>
> I don't think you need this? It's only used to select the root for page
> table walk. Once it's done, we already have the @sptep to operate on.
>
> And I think you can just get @mirrored_pt from the sptep:
>
> mirrored_pt = sptep_to_sp(sptep)->role.mirrored_pt;
>
> Instead, I think we should keep the @is_private to indicate whether the GFN
> is private or not, which should be distinguished with 'mirrored_pt', which
> the root page table (and the @sptep) already reflects.
>
> Of course if the @root/@sptep is mirrored_pt, the is_private should be
> always true, like:
>
> WARN_ON_ONCE(sptep_to_sp(sptep)->role.is_mirrored_pt
> && !is_private);
>
> Am I missing anything?

You said it not correct to use role. So I tried to find a way to pass down
is_mirrored and avoid to use role.

Did you change your mind? or you're fine with new name is_mirrored?

https://lore.kernel.org/kvm/4ba18e4e-5971-4683-82eb-63c985e98e6b@xxxxxxxxx/
> I don't think using kvm_mmu_page.role is correct.



> >
> > // This is used tdp_iter_refresh_sptep()
> > // shared gfn_mask if mirrored_pt
> > // 0 if !mirrored_pt
> > gfn_shared_mask >
> > - Pass mirrored_pt and gfn_shared_mask to
> > tdp_iter_start(..., mirrored_pt, gfn_shared_mask)
>
> As mentioned above, I am not sure whether we need @mirrored_pt, because it
> already have the @root. Instead we should pass is_private, which indicates
> the GFN is private.

If we can use role, we don't need iter.mirrored_pt isn't needed.


> > - trace point: update to include mirroredd_pt. Or Leave it as is for now.
> >
> > - pr_err() that log gfn in handle_changed_spte()
> > Update to include mirrored_pt. Or Leave it as is for now.
> >
> > - Update spte handler (handle_changed_spte(), handle_removed_pt()...),
> > use iter->mirror_pt or pass down mirror_pt.
> >
>
> IIUC only sp->role.is_mirrored_pt is needed, tdp_iter->is_mirrored_pt isn't
> necessary. But when the @sp is created, we need to initialize whether it is
> mirrored_pt.
>
> Am I missing anything?

Because you didn't like to use role, I tried to find other way.
--
Isaku Yamahata <isaku.yamahata@xxxxxxxxx>