Re: [PATCH 10/16] KVM: x86/tdp_mmu: Support TDX private mapping for TDP MMU
From: Isaku Yamahata
Date: Fri May 17 2024 - 05:04:19 EST
On Fri, May 17, 2024 at 02:35:46AM +0000,
"Edgecombe, Rick P" <rick.p.edgecombe@xxxxxxxxx> wrote:
> Here is a diff of an attempt to merge all the feedback so far. It's on top of
> the the dev branch from this series.
>
> On Thu, 2024-05-16 at 12:42 -0700, Isaku Yamahata wrote:
> > - rename role.is_private => role.is_mirrored_pt
>
> Agreed.
>
> >
> > - sp->gfn: gfn without shared bit.
> >
> > - fault->address: without gfn_shared_mask
> > Actually it doesn't matter much. We can use gpa with gfn_shared_mask.
>
> I left fault->addr with shared bits. It's not used anymore for TDX except in the
> tracepoint which I think makes sense.
As discussed with Kai [1], make fault->addr represent the real fault address.
[1] https://lore.kernel.org/kvm/20240517081440.GM168153@xxxxxxxxxxxxxxxxxxxxx/
>
> >
> > - Update struct tdp_iter
> > struct tdp_iter
> > gfn: gfn without shared bit
> >
> > /* Add new members */
> >
> > /* Indicates which PT to walk. */
> > bool mirrored_pt;
> >
> > // This is used tdp_iter_refresh_sptep()
> > // shared gfn_mask if mirrored_pt
> > // 0 if !mirrored_pt
> > gfn_shared_mask
> >
> > - Pass mirrored_pt and gfn_shared_mask to
> > tdp_iter_start(..., mirrored_pt, gfn_shared_mask)
> >
> > and update tdp_iter_refresh_sptep()
> > static void tdp_iter_refresh_sptep(struct tdp_iter *iter)
> > ...
> > iter->sptep = iter->pt_path[iter->level - 1] +
> > SPTE_INDEX((iter->gfn << PAGE_SHIFT) | iter->gfn_shared_mask,
> > iter->level);
>
> I tried something else. The iterators still have gfn's with shared bits, but the
> addition of the shared bit is wrapped in tdp_mmu_for_each_pte(), so
> kvm_tdp_mmu_map() and similar don't have to handle the shared bits. They just
> pass in a root, and tdp_mmu_for_each_pte() knows how to adjust the GFN. Like:
>
> #define tdp_mmu_for_each_pte(_iter, _kvm, _root, _start, _end) \
> for_each_tdp_pte(_iter, _root, \
> kvm_gfn_for_root(_kvm, _root, _start), \
> kvm_gfn_for_root(_kvm, _root, _end))
I'm wondering to remove kvm_gfn_for_root() at all.
> I also changed the callers to use the new enum to specify roots. This way they
> can pass something with a nice name instead of true/false for bool private.
This is nice.
> Keeping a gfn_shared_mask inside the iterator didn't seem more clear to me, and
> bit more cumbersome. But please compare it.
>
> >
> > Change for_each_tdp_mte_min_level() accordingly.
> > Also the iteretor to call this.
> >
> > #define for_each_tdp_pte_min_level(kvm, iter, root, min_level, start,
> > end) \
> > for (tdp_iter_start(&iter, root, min_level,
> > start, \
> > mirrored_root, mirrored_root ? kvm_gfn_shared_mask(kvm) :
> > 0); \
> > iter.valid && iter.gfn < kvm_gfn_for_root(kvm, root,
> > end); \
> > tdp_iter_next(&iter))
>
> I liked it a lot because the callers don't need to manually call
> kvm_gfn_for_root() anymore. But I tried it and it required a lot of additions of
> kvm to the iterators call sites. I ended up removing it, but I'm not sure.
..
> > - Update spte handler (handle_changed_spte(), handle_removed_pt()...),
> > use iter->mirror_pt or pass down mirror_pt.
>
> You mean just rename it, or something else?
I scratch this. I thought Kai didn't like to use role [2].
But now it seems okay. [3]
[2] https://lore.kernel.org/kvm/4ba18e4e-5971-4683-82eb-63c985e98e6b@xxxxxxxxx/
> I don't think using kvm_mmu_page.role is correct.
[3] https://lore.kernel.org/kvm/20240517081440.GM168153@xxxxxxxxxxxxxxxxxxxxx/
> I think you can just get @mirrored_pt from the sptep:
> mirrored_pt = sptep_to_sp(sptep)->role.mirrored_pt;
> Anyway below is a first cut based on the discussion.
>
> A few other things:
> 1. kvm_is_private_gpa() is moved into Intel code. kvm_gfn_shared_mask() remains
> for only two operations in common code:
> - kvm_gfn_for_root() <- required for zapping/mapping
> - Stripping the bit when setting fault.gfn <- possible to remove if we strip
> cr2_or_gpa
> 2. I also played with changing KVM_PRIVATE_ROOTS to KVM_MIRROR_ROOTS.
> Unfortunately there is still some confusion between private and mirrored. For
> example you walk a mirror root (what is actually happening), but you have to
> allocate private page tables as you do, as well as call out to x86_ops named
> private. So those concepts are effectively linked and used a bit
> interchangeably.
On top of your patch, I created the following patch to remove kvm_gfn_for_root().
Although I haven't tested it yet, I think the following shows my idea.
- Add gfn_shared_mask to struct tdp_iter.
- Use iter.gfn_shared_mask to determine the starting sptep in the root.
- Remove kvm_gfn_for_root()
---
arch/x86/kvm/mmu/mmu_internal.h | 10 -------
arch/x86/kvm/mmu/tdp_iter.c | 5 ++--
arch/x86/kvm/mmu/tdp_iter.h | 16 ++++++-----
arch/x86/kvm/mmu/tdp_mmu.c | 48 ++++++++++-----------------------
4 files changed, 26 insertions(+), 53 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 2b1b2a980b03..9676af0cb133 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -180,16 +180,6 @@ static inline void kvm_mmu_alloc_private_spt(struct kvm_vcpu *vcpu, struct kvm_m
sp->private_spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_private_spt_cache);
}
-static inline gfn_t kvm_gfn_for_root(struct kvm *kvm, struct kvm_mmu_page *root,
- gfn_t gfn)
-{
- gfn_t gfn_for_root = kvm_gfn_to_private(kvm, gfn);
-
- /* Set shared bit if not private */
- gfn_for_root |= -(gfn_t)!is_mirrored_sp(root) & kvm_gfn_shared_mask(kvm);
- return gfn_for_root;
-}
-
static inline bool kvm_mmu_page_ad_need_write_protect(struct kvm_mmu_page *sp)
{
/*
diff --git a/arch/x86/kvm/mmu/tdp_iter.c b/arch/x86/kvm/mmu/tdp_iter.c
index 04c247bfe318..c5f2ca1ceede 100644
--- a/arch/x86/kvm/mmu/tdp_iter.c
+++ b/arch/x86/kvm/mmu/tdp_iter.c
@@ -12,7 +12,7 @@
static void tdp_iter_refresh_sptep(struct tdp_iter *iter)
{
iter->sptep = iter->pt_path[iter->level - 1] +
- SPTE_INDEX(iter->gfn << PAGE_SHIFT, iter->level);
+ SPTE_INDEX((iter->gfn | iter->gfn_shared_mask) << PAGE_SHIFT, iter->level);
iter->old_spte = kvm_tdp_mmu_read_spte(iter->sptep);
}
@@ -37,7 +37,7 @@ void tdp_iter_restart(struct tdp_iter *iter)
* rooted at root_pt, starting with the walk to translate next_last_level_gfn.
*/
void tdp_iter_start(struct tdp_iter *iter, struct kvm_mmu_page *root,
- int min_level, gfn_t next_last_level_gfn)
+ int min_level, gfn_t next_last_level_gfn, gfn_t gfn_shared_mask)
{
if (WARN_ON_ONCE(!root || (root->role.level < 1) ||
(root->role.level > PT64_ROOT_MAX_LEVEL))) {
@@ -46,6 +46,7 @@ void tdp_iter_start(struct tdp_iter *iter, struct kvm_mmu_page *root,
}
iter->next_last_level_gfn = next_last_level_gfn;
+ iter->gfn_shared_mask = gfn_shared_mask;
iter->root_level = root->role.level;
iter->min_level = min_level;
iter->pt_path[iter->root_level - 1] = (tdp_ptep_t)root->spt;
diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
index 8a64bcef9deb..274b42707f0a 100644
--- a/arch/x86/kvm/mmu/tdp_iter.h
+++ b/arch/x86/kvm/mmu/tdp_iter.h
@@ -91,8 +91,9 @@ struct tdp_iter {
tdp_ptep_t pt_path[PT64_ROOT_MAX_LEVEL];
/* A pointer to the current SPTE */
tdp_ptep_t sptep;
- /* The lowest GFN (shared bits included) mapped by the current SPTE */
+ /* The lowest GFN (shared bits excluded) mapped by the current SPTE */
gfn_t gfn;
+ gfn_t gfn_shared_mask;
/* The level of the root page given to the iterator */
int root_level;
/* The lowest level the iterator should traverse to */
@@ -120,18 +121,19 @@ struct tdp_iter {
* Iterates over every SPTE mapping the GFN range [start, end) in a
* preorder traversal.
*/
-#define for_each_tdp_pte_min_level(iter, root, min_level, start, end) \
- for (tdp_iter_start(&iter, root, min_level, start); \
- iter.valid && iter.gfn < end; \
+#define for_each_tdp_pte_min_level(iter, kvm, root, min_level, start, end) \
+ for (tdp_iter_start(&iter, root, min_level, start, \
+ is_mirrored_sp(root) ? 0: kvm_gfn_shared_mask(kvm)); \
+ iter.valid && iter.gfn < end; \
tdp_iter_next(&iter))
-#define for_each_tdp_pte(iter, root, start, end) \
- for_each_tdp_pte_min_level(iter, root, PG_LEVEL_4K, start, end)
+#define for_each_tdp_pte(iter, kvm, root, start, end) \
+ for_each_tdp_pte_min_level(iter, kvm, root, PG_LEVEL_4K, start, end)
tdp_ptep_t spte_to_child_pt(u64 pte, int level);
void tdp_iter_start(struct tdp_iter *iter, struct kvm_mmu_page *root,
- int min_level, gfn_t next_last_level_gfn);
+ int min_level, gfn_t next_last_level_gfn, gfn_t gfn_shared_mask);
void tdp_iter_next(struct tdp_iter *iter);
void tdp_iter_restart(struct tdp_iter *iter);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 7f13016e210b..bf7aa87eb593 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -862,20 +862,18 @@ static inline void tdp_mmu_iter_set_spte(struct kvm *kvm, struct tdp_iter *iter,
iter->gfn, iter->level);
}
-#define tdp_root_for_each_pte(_iter, _root, _start, _end) \
- for_each_tdp_pte(_iter, _root, _start, _end)
+#define tdp_root_for_each_pte(_iter, _kvm, _root, _start, _end) \
+ for_each_tdp_pte(_iter, _kvm, _root, _start, _end)
-#define tdp_root_for_each_leaf_pte(_iter, _root, _start, _end) \
- tdp_root_for_each_pte(_iter, _root, _start, _end) \
+#define tdp_root_for_each_leaf_pte(_iter, _kvm, _root, _start, _end) \
+ tdp_root_for_each_pte(_iter, _kvm, _root, _start, _end) \
if (!is_shadow_present_pte(_iter.old_spte) || \
!is_last_spte(_iter.old_spte, _iter.level)) \
continue; \
else
#define tdp_mmu_for_each_pte(_iter, _kvm, _root, _start, _end) \
- for_each_tdp_pte(_iter, _root, \
- kvm_gfn_for_root(_kvm, _root, _start), \
- kvm_gfn_for_root(_kvm, _root, _end))
+ for_each_tdp_pte(_iter, _kvm, _root, _start, _end)
/*
* Yield if the MMU lock is contended or this thread needs to return control
@@ -941,7 +939,7 @@ static void __tdp_mmu_zap_root(struct kvm *kvm, struct kvm_mmu_page *root,
gfn_t end = tdp_mmu_max_gfn_exclusive();
gfn_t start = 0;
- for_each_tdp_pte_min_level(iter, root, zap_level, start, end) {
+ for_each_tdp_pte_min_level(iter, kvm, root, zap_level, start, end) {
retry:
if (tdp_mmu_iter_cond_resched(kvm, &iter, false, shared))
continue;
@@ -1043,17 +1041,9 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root,
lockdep_assert_held_write(&kvm->mmu_lock);
- /*
- * start and end doesn't have GFN shared bit. This function zaps
- * a region including alias. Adjust shared bit of [start, end) if the
- * root is shared.
- */
- start = kvm_gfn_for_root(kvm, root, start);
- end = kvm_gfn_for_root(kvm, root, end);
-
rcu_read_lock();
- for_each_tdp_pte_min_level(iter, root, PG_LEVEL_4K, start, end) {
+ for_each_tdp_pte_min_level(iter, kvm, root, PG_LEVEL_4K, start, end) {
if (can_yield &&
tdp_mmu_iter_cond_resched(kvm, &iter, flush, false)) {
flush = false;
@@ -1448,19 +1438,9 @@ static __always_inline bool kvm_tdp_mmu_handle_gfn(struct kvm *kvm,
* into this helper allow blocking; it'd be dead, wasteful code.
*/
__for_each_tdp_mmu_root(kvm, root, range->slot->as_id, types) {
- gfn_t start, end;
-
- /*
- * For TDX shared mapping, set GFN shared bit to the range,
- * so the handler() doesn't need to set it, to avoid duplicated
- * code in multiple handler()s.
- */
- start = kvm_gfn_for_root(kvm, root, range->start);
- end = kvm_gfn_for_root(kvm, root, range->end);
-
rcu_read_lock();
- tdp_root_for_each_leaf_pte(iter, root, start, end)
+ tdp_root_for_each_leaf_pte(iter, kvm, root, range->start, range->end)
ret |= handler(kvm, &iter, range);
rcu_read_unlock();
@@ -1543,7 +1523,7 @@ static bool wrprot_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
BUG_ON(min_level > KVM_MAX_HUGEPAGE_LEVEL);
- for_each_tdp_pte_min_level(iter, root, min_level, start, end) {
+ for_each_tdp_pte_min_level(iter, kvm, root, min_level, start, end) {
retry:
if (tdp_mmu_iter_cond_resched(kvm, &iter, false, true))
continue;
@@ -1706,7 +1686,7 @@ static int tdp_mmu_split_huge_pages_root(struct kvm *kvm,
* level above the target level (e.g. splitting a 1GB to 512 2MB pages,
* and then splitting each of those to 512 4KB pages).
*/
- for_each_tdp_pte_min_level(iter, root, target_level + 1, start, end) {
+ for_each_tdp_pte_min_level(iter, kvm, root, target_level + 1, start, end) {
retry:
if (tdp_mmu_iter_cond_resched(kvm, &iter, false, shared))
continue;
@@ -1791,7 +1771,7 @@ static bool clear_dirty_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
rcu_read_lock();
- tdp_root_for_each_pte(iter, root, start, end) {
+ tdp_root_for_each_pte(iter, kvm, root, start, end) {
retry:
if (!is_shadow_present_pte(iter.old_spte) ||
!is_last_spte(iter.old_spte, iter.level))
@@ -1846,7 +1826,7 @@ static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root,
rcu_read_lock();
- tdp_root_for_each_leaf_pte(iter, root, gfn + __ffs(mask),
+ tdp_root_for_each_leaf_pte(iter, kvm, root, gfn + __ffs(mask),
gfn + BITS_PER_LONG) {
if (!mask)
break;
@@ -1903,7 +1883,7 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
rcu_read_lock();
- for_each_tdp_pte_min_level(iter, root, PG_LEVEL_2M, start, end) {
+ for_each_tdp_pte_min_level(iter, kvm, root, PG_LEVEL_2M, start, end) {
retry:
if (tdp_mmu_iter_cond_resched(kvm, &iter, false, true))
continue;
@@ -1973,7 +1953,7 @@ static bool write_protect_gfn(struct kvm *kvm, struct kvm_mmu_page *root,
rcu_read_lock();
- for_each_tdp_pte_min_level(iter, root, min_level, gfn, gfn + 1) {
+ for_each_tdp_pte_min_level(iter, kvm, root, min_level, gfn, gfn + 1) {
if (!is_shadow_present_pte(iter.old_spte) ||
!is_last_spte(iter.old_spte, iter.level))
continue;
--
2.43.2
--
Isaku Yamahata <isaku.yamahata@xxxxxxxxx>