RE: [PATCH v2 09/14] arm64: Enable memory encrypt for Realms
From: Michael Kelley
Date: Tue May 21 2024 - 11:58:42 EST
From: Catalin Marinas <catalin.marinas@xxxxxxx> Sent: Tuesday, May 21, 2024 3:14 AM
>
> On Mon, May 20, 2024 at 08:32:43PM +0000, Michael Kelley wrote:
> > From: Catalin Marinas <catalin.marinas@xxxxxxx> Sent: Monday, May 20, 2024 9:53 AM
> > > > > On Fri, Apr 12, 2024 at 09:42:08AM +0100, Steven Price wrote:
> > > > > > static int change_page_range(pte_t *ptep, unsigned long addr, void *data)
> > > > > > @@ -41,6 +45,7 @@ static int change_page_range(pte_t *ptep, unsigned long addr, void *data)
> > > > > > pte = clear_pte_bit(pte, cdata->clear_mask);
> > > > > > pte = set_pte_bit(pte, cdata->set_mask);
> > > > > > + /* TODO: Break before make for PROT_NS_SHARED updates */
> > > > > > __set_pte(ptep, pte);
> > > > > > return 0;
> [...]
> > > Thanks for the clarification on RIPAS states and behaviour in one of
> > > your replies. Thinking about this, since the page is marked as
> > > RIPAS_EMPTY prior to changing the PTE, the address is going to fault
> > > anyway as SEA if accessed. So actually breaking the PTE, TLBI, setting
> > > the new PTE would not add any new behaviour. Of course, this assumes
> > > that set_memory_decrypted() is never called on memory being currently
> > > accessed (can we guarantee this?).
> >
> > While I worked on CoCo VM support on Hyper-V for x86 -- both AMD
> > SEV-SNP and Intel TDX, I haven't ramped up on the ARM64 CoCo
> > VM architecture yet. With that caveat in mind, the assumption is that callers
> > of set_memory_decrypted() and set_memory_encrypted() ensure that
> > the target memory isn't currently being accessed. But there's a big
> > exception: load_unaligned_zeropad() can generate accesses that the
> > caller can't control. If load_unaligned_zeropad() touches a page that is
> > in transition between decrypted and encrypted, a SEV-SNP or TDX architectural
> > fault could occur. On x86, those fault handlers detect this case, and
> > fix things up. The Hyper-V case requires a different approach, and marks
> > the PTEs as "not present" before initiating a transition between decrypted
> > and encrypted, and marks the PTEs "present" again after the transition.
>
> Thanks. The load_unaligned_zeropad() case is a good point. I thought
> we'd get away with this on arm64 since accessing such decrypted page
> would trigger a synchronous exception but looking at the code, the
> do_sea() path never calls fixup_exception(), so we just kill the whole
> kernel.
>
> > This approach causes a reference generated by load_unaligned_zeropad()
> > to take the normal page fault route, and use the page-fault-based fixup for
> > load_unaligned_zeropad(). See commit 0f34d11234868 for the Hyper-V case.
>
> I think for arm64 set_memory_decrypted() (and encrypted) would have to
> first make the PTE invalid, TLBI, set the RIPAS_EMPTY state, set the new
> PTE. Any page fault due to invalid PTE would be handled by the exception
> fixup in load_unaligned_zeropad(). This way we wouldn't get any
> synchronous external abort (SEA) in standard uses. Not sure we need to
> do anything hyper-v specific as in the commit above.
Sounds good to me. I tried to do the same for all the x86 cases (instead of
just handling the Hyper-V paravisor), since that would completely decouple
TDX/SEV-SNP from load_unaligned_zeropad(). It worked for TDX. But
SEV-SNP does the PVALIDATE instruction during a decrypted<->encrypted
transition, and PVALIDATE inconveniently requires the virtual address as
input. It only uses the vaddr to translate to the paddr, but with the vaddr
PTE "not present", PVALIDATE fails. Sigh. This problem will probably come
back again when/if Coconut or any other paravisor redirects #VC/#VE to
the paravisor. But I digress ....
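
To make the ordering argument concrete, here is a rough user-space model of
the sequence Catalin describes above (invalidate the PTE, TLBI, set
RIPAS_EMPTY, write the new PTE), next to an in-place update for comparison.
It is only a sketch: every model_* name, the two PTE bits, and the one-entry
"TLB" are invented for illustration and are not kernel or RMM interfaces. The
only behaviour it assumes is what was stated in this thread: a protected
access to a RIPAS_EMPTY page takes an SEA, and an access through an invalid
PTE takes an ordinary page fault that the exception fixup can handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model bits; this is NOT the real arm64 PTE layout. */
#define MODEL_PTE_VALID  (1ULL << 0)
#define MODEL_PTE_SHARED (1ULL << 1)	/* stands in for PROT_NS_SHARED */

enum model_ripas { MODEL_RIPAS_RAM, MODEL_RIPAS_EMPTY };
enum model_access { MODEL_ACCESS_OK, MODEL_ACCESS_PAGE_FAULT, MODEL_ACCESS_SEA };

/* One page: its PTE, a one-entry "TLB" that may be stale, and its RIPAS. */
struct model_page {
	uint64_t pte;
	uint64_t tlb;
	bool tlb_valid;
	enum model_ripas ripas;
};

/* Start as a protected (encrypted) page with a warm TLB entry. */
static struct model_page model_page_init(void)
{
	struct model_page pg = {
		.pte = MODEL_PTE_VALID,
		.tlb = MODEL_PTE_VALID,
		.tlb_valid = true,
		.ripas = MODEL_RIPAS_RAM,
	};
	return pg;
}

/* A stray access, e.g. one generated by load_unaligned_zeropad(). */
static enum model_access model_access(struct model_page *pg)
{
	assert(pg);
	if (!pg->tlb_valid) {
		/* Table walk: an invalid PTE gives a fixable page fault. */
		if (!(pg->pte & MODEL_PTE_VALID))
			return MODEL_ACCESS_PAGE_FAULT;
		pg->tlb = pg->pte;
		pg->tlb_valid = true;
	}
	/* Protected translation hitting RIPAS_EMPTY: modelled as an SEA. */
	if (!(pg->tlb & MODEL_PTE_SHARED) && pg->ripas == MODEL_RIPAS_EMPTY)
		return MODEL_ACCESS_SEA;
	return MODEL_ACCESS_OK;
}

static void model_probe(struct model_page *pg, bool *saw_sea)
{
	if (model_access(pg) == MODEL_ACCESS_SEA)
		*saw_sea = true;
}

/* Break-before-make ordering; returns true if any probe hit an SEA. */
static bool model_decrypt_bbm(struct model_page *pg)
{
	bool saw_sea = false;

	pg->pte &= ~MODEL_PTE_VALID;		/* 1. make the PTE invalid */
	model_probe(pg, &saw_sea);
	pg->tlb_valid = false;			/* 2. TLBI */
	model_probe(pg, &saw_sea);
	pg->ripas = MODEL_RIPAS_EMPTY;		/* 3. set RIPAS_EMPTY */
	model_probe(pg, &saw_sea);
	pg->pte = MODEL_PTE_VALID | MODEL_PTE_SHARED;	/* 4. new PTE */
	model_probe(pg, &saw_sea);
	return saw_sea;
}

/* In-place update for comparison: RIPAS changes while the PTE is live. */
static bool model_decrypt_in_place(struct model_page *pg)
{
	bool saw_sea = false;

	pg->ripas = MODEL_RIPAS_EMPTY;
	model_probe(pg, &saw_sea);		/* old protected mapping still valid */
	pg->pte |= MODEL_PTE_SHARED;
	pg->tlb_valid = false;
	model_probe(pg, &saw_sea);
	return saw_sea;
}
```

In this model the break-before-make ordering never exposes a valid protected
translation to a RIPAS_EMPTY page, so a stray access sees either the old
contents or a page fault. The in-place variant has a window where the same
access would take an SEA.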
>
> > > (I did come across the hv_uio_probe() which, if I read correctly, it
> > > ends up calling set_memory_decrypted() with a vmalloc() address; let's
> > > pretend this code doesn't exist ;))
> >
> > While the Hyper-V UIO driver is perhaps a bit of an outlier, the Hyper-V
> > netvsc driver also does set_memory_decrypted() on 16 Mbyte vmalloc()
> > allocations, and there's not really a viable way to avoid this. The
> > SEV-SNP and TDX code handles this case. Support for this case will
> > probably also be needed for CoCo guests on Hyper-V on ARM64.
>
> Ah, I was hoping we can ignore it. So the arm64 set_memory_*() code will
> have to detect and change both the vmalloc map and the linear map.
Yep.
> Currently this patchset assumes the latter only.
>
> Such buffers may end up in user space as well but I think at the
> set_memory_decrypted() call there aren't any such mappings and
> subsequent remap_pfn_range() etc. would handle the permissions properly
> through the vma->vm_page_prot attributes (assuming that someone set
> those pgprot attributes).
Yes, I'm pretty sure that's what we've seen on the x86 side.
Michael