On Tue, May 21, 2024, Kai Huang wrote:
On 21/05/2024 5:35 am, Sean Christopherson wrote:
On Mon, May 20, 2024, Kai Huang wrote:
I am wondering whether this can be done in the KVM page fault handler?
No, because the state of a pfn in the RMP is tied to the guest_memfd inode,
not to the file descriptor, i.e. not to an individual VM.
It's strange that as state of a PFN of SNP doesn't bind to individual VM, at
least for the private pages. The command rpm_make_private() indeed reflects
the mapping between PFN <-> <GFN, SSID>.
s/SSID/ASID
KVM allows a single ASID to be bound to multiple "struct kvm" instances, e.g.
for intra-host migration. If/when trusted I/O is a thing, presumably KVM will
also need to share the ASID with other entities, e.g. IOMMUFD.
rc = rmp_make_private(pfn_aligned, gfn_to_gpa(gfn_aligned),
level, sev->asid, false);
And the NPT page tables are treated as ephemeral for SNP.
Do you mean private mappings for SNP guest can be zapped from the VM (the
private pages are still there unchanged) and re-mapped later w/o needing to
have guest's explicit acceptance?
Correct.
If so, I think "we can zap" doesn't mean "we need to zap"?
Correct.
Because the privates are now pinned anyway.
Pinning is an orthogonal issue. And it's not so much that the pfns are pinned
as it is that guest_memfd simply doesn't support page migration or swap at this
time.
Regardless of whether or not guest_memfd supports page migration, KVM needs to
track the state of the physical page in guest_memfd, e.g. if it's been assigned
to the ASID versus if it's still in a shared state.
If we truly want to zap private mappings for SNP, IIUC it can be done by
distinguishing whether a VM needs to use a separate private table, which is
TDX-only.
I wouldn't say we "want" to zap private mappings for SNP, rather that it's a lot
less work to keep KVM's existing behavior (literally do nothing) than it is to
rework the MMU and whatnot to not zap SPTEs.
And there's no big motivation to
avoid zapping because SNP VMs are unlikely to delete memslots.
If it turns out that it's easy to preserve SNP mappings after TDX lands, then we
can certainly go that route, but AFAIK there's no reason to force the issue.