Re: [PATCH 10/16] KVM: x86/tdp_mmu: Support TDX private mapping for TDP MMU

From: Huang, Kai
Date: Thu May 16 2024 - 22:37:11 EST




On 17/05/2024 7:42 am, Isaku Yamahata wrote:
On Thu, May 16, 2024 at 04:36:48PM +0000,
"Edgecombe, Rick P" <rick.p.edgecombe@xxxxxxxxx> wrote:

On Thu, 2024-05-16 at 13:04 +0000, Huang, Kai wrote:
On Thu, 2024-05-16 at 02:57 +0000, Edgecombe, Rick P wrote:
On Thu, 2024-05-16 at 14:07 +1200, Huang, Kai wrote:

I meant it seems we should just strip shared bit away from the GPA in
handle_ept_violation() and pass it as 'cr2_or_gpa' here, so fault->addr
won't have the shared bit.

Do you see any problem of doing so?

We would need to add it back in "raw_gfn" in kvm_tdp_mmu_map().

I don't see any big difference?

Now in this patch the raw_gfn is directly from fault->addr:

        raw_gfn = gpa_to_gfn(fault->addr);

        tdp_mmu_for_each_pte(iter, mmu, is_private, raw_gfn, raw_gfn+1) {
                ...
        }

But there's nothing wrong to get the raw_gfn from the fault->gfn.  In
fact, the zapping code just does this:

        /*
         * start and end doesn't have GFN shared bit.  This function zaps
         * a region including alias.  Adjust shared bit of [start, end) if
         * the root is shared.
         */
        start = kvm_gfn_for_root(kvm, root, start);
        end = kvm_gfn_for_root(kvm, root, end);

So there's nothing wrong to just do the same thing in both functions.

The point is fault->gfn has shared bit stripped away at the beginning, and
AFAICT there's no useful reason to keep shared bit in fault->addr.  The
entire @fault is a temporary structure on the stack during fault handling
anyway.

I would like to avoid code churn at this point if there is not a real clear
benefit. >>
One small benefit of keeping the shared bit in the fault->addr is that it is
sort of consistent with how that field is used in other scenarios in KVM. In
shadow paging it's not even the GPA. So it is simply the "fault address" and has
to be interpreted in different ways in the fault handler. For TDX the fault
address *does* include the shared bit. And the EPT needs to be faulted in at
that address.

It's about how we want to define the semantic of fault->addr (forget about shadow MMU because for it fault->addr has different meaning from TDP):

1) It represents the faulting address which points to the actual guest memory (has no shared bit).

2) It represents the faulting address which is truly used as the hardware page table walk.

The fault->gfn always represents the location of actual guest memory (w/o shared bit) in both cases.

I originally thought 2) isn't consistent for both SNP and TDX, but thinking more, I now think actually both the two semantics are consistent for SNP and TDX, which is important in order to avoid confusion.

Anyway it's a trivial because fault->addr is only used for fault handling path.

So yes fine to me we choose to use 2) here. But IMHO we should explicitly add a comment to 'struct kvm_page_fault' that the @addr represents 2).

And I think more important thing is how we handle this "gfn" and "raw_gfn" in tdp_iter and 'struct kvm_mmu_page'. See below.


If we strip the shared bit when setting fault->addr we have to reconstruct it
when we do the actual shared mapping. There is no way around that. Which helper
does it, isn't important I think. Doing the reconstruction inside
tdp_mmu_for_each_pte() could be neat, except that it doesn't know about the
shared bit position.

The zapping code's use of kvm_gfn_for_root() is different because the gfn comes
without the shared bit. It's not stripped and then added back. Those are
operations that target GFNs really.

I think the real problem is that we are gleaning whether the fault is to private
or shared memory from different things. Sometimes from fault->is_private,
sometimes the presence of the shared bits, and sometimes the role bit. I think
this is confusing, doubly so because we are using some of these things to infer
unrelated things (mirrored vs private).

It's confusing we don't check it in uniform way.


My guess is that you have noticed this and somehow zeroed in on the shared_mask.
I think we should straighten out the mirrored/private semantics and see what the
results look like. How does that sound to you?

I had closer look of the related code. I think we can (mostly) uniformly use
gpa/gfn without shared mask. Here is the proposal. We need a real patch to see
how the outcome looks like anyway. I think this is like what Kai is thinking
about.


- rename role.is_private => role.is_mirrored_pt

- sp->gfn: gfn without shared bit.

I think we can treat 'tdp_iter' and 'struct kvm_mmu_page' in the same way, because conceptually they both reflects the page table.

So I think both of them can have "gfn" or "raw_gfn", and "shared_gfn_mask".

Or have both "raw_gfn" or "gfn" but w/o "shared_gfn_mask". This may be more straightforward because we can just use them when needed w/o needing to play with gfn_shared_mask.


- fault->address: without gfn_shared_mask
Actually it doesn't matter much. We can use gpa with gfn_shared_mask.

See above. I think it makes sense to include the shared bit here. It's trivial anyway though.


- Update struct tdp_iter
struct tdp_iter
gfn: gfn without shared bit

Or "raw_gfn"?

Which may be more straightforward because it can be just from:

raw_gfn = gpa_to_gfn(fault->addr);
gfn = fault->gfn;

tdp_mmu_for_each_pte(..., raw_gfn, raw_gfn + 1, gfn)

Which is the reason to make fault->addr include the shared bit AFAICT.


/* Add new members */

/* Indicates which PT to walk. */
bool mirrored_pt;

I don't think you need this? It's only used to select the root for page table walk. Once it's done, we already have the @sptep to operate on.

And I think you can just get @mirrored_pt from the sptep:

mirrored_pt = sptep_to_sp(sptep)->role.mirrored_pt;

Instead, I think we should keep the @is_private to indicate whether the GFN is private or not, which should be distinguished with 'mirrored_pt', which the root page table (and the @sptep) already reflects.

Of course if the @root/@sptep is mirrored_pt, the is_private should be always true, like:

WARN_ON_ONCE(sptep_to_sp(sptep)->role.is_mirrored_pt
&& !is_private);

Am I missing anything?


// This is used tdp_iter_refresh_sptep()
// shared gfn_mask if mirrored_pt
// 0 if !mirrored_pt
gfn_shared_mask >
- Pass mirrored_pt and gfn_shared_mask to
tdp_iter_start(..., mirrored_pt, gfn_shared_mask)

As mentioned above, I am not sure whether we need @mirrored_pt, because it already have the @root. Instead we should pass is_private, which indicates the GFN is private.


and update tdp_iter_refresh_sptep()
static void tdp_iter_refresh_sptep(struct tdp_iter *iter)
...
iter->sptep = iter->pt_path[iter->level - 1] +
SPTE_INDEX((iter->gfn << PAGE_SHIFT) | iter->gfn_shared_mask, iter->level);

Change for_each_tdp_mte_min_level() accordingly.
Also the iteretor to call this.
#define for_each_tdp_pte_min_level(kvm, iter, root, min_level, start, end) \
for (tdp_iter_start(&iter, root, min_level, start, \
mirrored_root, mirrored_root ? kvm_gfn_shared_mask(kvm) : 0); \
iter.valid && iter.gfn < kvm_gfn_for_root(kvm, root, end); \
tdp_iter_next(&iter))

See above.


- trace point: update to include mirroredd_pt. Or Leave it as is for now.

- pr_err() that log gfn in handle_changed_spte()
Update to include mirrored_pt. Or Leave it as is for now.

- Update spte handler (handle_changed_spte(), handle_removed_pt()...),
use iter->mirror_pt or pass down mirror_pt.


IIUC only sp->role.is_mirrored_pt is needed, tdp_iter->is_mirrored_pt isn't necessary. But when the @sp is created, we need to initialize whether it is mirrored_pt.

Am I missing anything?