Re: [PATCH v2 1/1] arch/fault: don't print logs for pte marker poison errors
From: Peter Xu
Date: Wed May 22 2024 - 18:03:45 EST
On Wed, May 15, 2024 at 10:18:31PM +0200, Borislav Petkov wrote:
> So if I were to design this, I'd do it this way:
>
> 0. guest gets hw poison injected
>
> 1. it runs memory_failure() and it kills the processes using the page.
>
> 2. page is marked poisoned on the host so no other guest gets it.
>
> That's it. No second accesses whatsoever. At least this is how it works
> on baremetal.
>
> This hw poisoning emulation is just silly and unnecessary.
We (QEMU) haven't consumed this yet, but I think it makes sense to have
such emulation, as it's slightly different from a real hwpoison.
I think the important bit that's missing in this picture is migration: the
VM can migrate from one host to another, carrying that poisoned PFN.
Let's assume we have two hosts: src and dst. Currently VM runs on src
host.
Before migration, there is a real PFN that is bad, with an MCE injected.
When it is accessed by either a guest vcpu or the host cpu / hypervisor, the
VM gets killed. So far this is the same as for any process that has a bad
page.
However, it's possible that the VM gets migrated _before_ that bad PFN is
accessed. In this case the VM is still legal to run, and the hypervisor will
not migrate that bad PFN's data, knowing that the data is invalid. Instead,
it'll tell dst that "this guest PFN is bad, if the guest accesses it let's
crash it".
Then what dst host needs is a way to describe "this guest PFN is bad": the
easiest way is to describe "this VA of the process is bad", meanwhile
there'll be no real page backing that VA anyway, and also no real poisoned
pages. We want to poison a VA only. That's why an emulation is needed.
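To make the src/dst flow above concrete, here is a toy model in C. It is
only a sketch under stated assumptions, not QEMU or kernel code: the names
host_view, migrate() and guest_read() are hypothetical, the "guest" has 8
tiny pages, and a return value of -1 stands in for the SIGBUS the real
thing would deliver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define GUEST_PAGES     8   /* hypothetical tiny guest */
#define PAGE_SIZE_BYTES 16  /* shrunk "page" size for the toy */

/* One host's view of guest memory (hypothetical structure). */
struct host_view {
    unsigned char data[GUEST_PAGES][PAGE_SIZE_BYTES];
    bool poisoned[GUEST_PAGES];  /* per guest PFN: access must fault */
};

/*
 * Migrate src to dst. For a bad guest PFN only the "this PFN is bad"
 * marker is transferred; the invalid data is never copied, and no page
 * on dst is consumed or marked bad for it.
 */
static void migrate(const struct host_view *src, struct host_view *dst)
{
    for (size_t pfn = 0; pfn < GUEST_PAGES; pfn++) {
        if (src->poisoned[pfn]) {
            dst->poisoned[pfn] = true;  /* marker only, no data */
            continue;
        }
        memcpy(dst->data[pfn], src->data[pfn], PAGE_SIZE_BYTES);
        dst->poisoned[pfn] = false;
    }
}

/* Read the first byte of a guest page; -1 stands in for SIGBUS. */
static int guest_read(const struct host_view *h, size_t pfn,
                      unsigned char *out)
{
    assert(pfn < GUEST_PAGES);
    if (h->poisoned[pfn])
        return -1;
    *out = h->data[pfn][0];
    return 0;
}
```

The point of the model is that after migrate(), a read of a good PFN on dst
succeeds while a read of the poisoned one fails, even though nothing bad
ever happened to any memory on dst.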
Besides that, we want to get exactly what we'd get for a real hwpoison,
e.g. SIGBUS with the address encoded, so that KVM works naturally with it
just like a real MCE.
One other thing we could do is inject poison into the VA together with the
page backing it, but that would turn a PFN on the dst host into a real bad
PFN that the dst OS can no longer use, so it's less optimal.
Thanks,
--
Peter Xu