On Fri, May 17, 2024 at 09:30:23AM -0700, Yang Shi wrote:
> On 5/14/24 3:39 AM, Catalin Marinas wrote:
> > It would be good to understand why openjdk is doing this instead of a
> > plain write. Is it because it may be racing with some other threads
> > already using the heap? That would be a valid pattern.
>
> Yes, you are right. I think I quoted the JVM justification in earlier email,
> anyway they said "permit use of memory concurrently with pretouch".

Ah, sorry, I missed that. This seems like a valid reason.
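
To illustrate the pattern being discussed: a plain store would clobber data that another thread has already written to the heap, whereas an atomic add of zero leaves the value intact while still requiring write access to the page. A minimal C11 sketch follows. pretouch_atomic() and its parameters are hypothetical names for illustration, not OpenJDK's actual code, and a compiler may lower an idempotent add differently on some targets, so the generated instruction would need checking.

```c
#include <stdatomic.h>
#include <stddef.h>

/*
 * Touch one word per page with an atomic add of zero, so the page gets
 * faulted in for write without changing what other threads stored there.
 * Assumption: 'start' is aligned for an int (e.g. page aligned) and
 * 'page_size' is a multiple of sizeof(int).
 */
static void pretouch_atomic(_Atomic int *start, size_t len_bytes,
			    size_t page_size)
{
	size_t step = page_size / sizeof(int);
	size_t nwords = len_bytes / sizeof(int);

	for (size_t i = 0; i < nwords; i += step) {
		/* Read-modify-write that leaves the stored value as it was. */
		atomic_fetch_add_explicit(&start[i], 0, memory_order_relaxed);
	}
}
```

On arm64 with LSE such an add is typically a single atomic instruction, which is the case where the first fault is reported as a read.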

> > A point Will raised was on potential ABI changes introduced by this
> > patch. The ESR_EL1 reported to user remains the same as per the hardware
> > spec (read-only), so from a SIGSEGV we may have some slight behaviour
> > changes:
> >
> > 1. PTE invalid:
> >
> >    a) vma is VM_READ && !VM_WRITE permission - SIGSEGV reported with
> >       ESR_EL1.WnR == 0 in sigcontext with your patch. Without this
> >       patch, the PTE is mapped as PTE_RDONLY first and a subsequent
> >       fault will report SIGSEGV with ESR_EL1.WnR == 1.
>
> I think I can do something like the below conceptually:
>
> if is_el0_atomic_instr && !is_write_abort
>     force_write = true
>
> if VM_READ && !VM_WRITE && force_write == true

Nit: write implies read, so you only need to check !write.

>     vm_flags = VM_READ
>     mm_flags ~= FAULT_FLAG_WRITE
>
> Then we just fallback to read fault. The following write fault will trigger
> SIGSEGV with consistent ABI.

I think this should work. So instead of reporting the write fault
directly in case of a read-only vma, we let the core code handle the
read fault first and then we retry the atomic instruction.

> >    b) vma is !VM_READ && !VM_WRITE permission - SIGSEGV reported with
> >       ESR_EL1.WnR == 0, so no change from current behaviour, unless we
> >       fix the patch for (1.a) to fake the WnR bit which would change the
> >       current expectations.
> >
> > 2. PTE valid with PTE_RDONLY - we get a normal writeable fault in
> >    hardware, no need to fix ESR_EL1 up.
> >
> > The patch would have to address (1) above but faking the ESR_EL1.WnR bit
> > based on the vma flags looks a bit fragile.
>
> I think we don't need to fake the ESR_EL1.WnR bit with the fallback.

I agree, with your approach above we don't need to fake WnR.
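
For reference, the fallback discussed above could be modelled roughly as below. This is a standalone sketch, not the patch: plan_fault(), struct fault_plan and the SK_* constants are made-up stand-ins for the kernel's vm_flags/mm_flags handling, and only the WnR (bit 6) and CM (bit 8) positions are taken from the architectural ESR layout. ESR_EL1 itself is never modified, which is why no WnR faking is needed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural ESR_ELx bits for data aborts: WnR is bit 6, CM is bit 8. */
#define ESR_WNR	(UINT64_C(1) << 6)
#define ESR_CM	(UINT64_C(1) << 8)

/* Hypothetical stand-ins for VM_READ, VM_WRITE and FAULT_FLAG_WRITE. */
#define SK_VM_READ		0x1u
#define SK_VM_WRITE		0x2u
#define SK_FAULT_FLAG_WRITE	0x1u

struct fault_plan {
	unsigned int vm_flags;	/* permission the vma must grant */
	unsigned int mm_flags;	/* flags handed to the core fault code */
};

/* A write abort has WnR set and is not a cache maintenance operation. */
static bool is_write_abort(uint64_t esr)
{
	return (esr & ESR_WNR) && !(esr & ESR_CM);
}

static struct fault_plan plan_fault(uint64_t esr, bool el0_atomic,
				    unsigned int vma_flags)
{
	struct fault_plan plan = { SK_VM_READ, 0 };
	bool write = is_write_abort(esr);
	bool force_write = el0_atomic && !write;

	/* Read-only vma: drop the forced write and take a plain read fault. */
	if (force_write && !(vma_flags & SK_VM_WRITE))
		force_write = false;

	if (write || force_write) {
		plan.vm_flags = SK_VM_WRITE;
		plan.mm_flags |= SK_FAULT_FLAG_WRITE;
	}
	return plan;
}
```

With a read-only vma the first fault is then handled as a read, the PTE ends up PTE_RDONLY, and the retried atomic takes a genuine write fault with WnR == 1, which matches the current SIGSEGV behaviour.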

> > Similarly, we have userfaultfd that reports the fault to user. I think
> > in scenario (1) the kernel will report UFFD_PAGEFAULT_FLAG_WRITE with
> > your patch but no UFFD_PAGEFAULT_FLAG_WP. Without this patch, there are
> > indeed two faults, with the second having both UFFD_PAGEFAULT_FLAG_WP
> > and UFFD_PAGEFAULT_FLAG_WRITE set.
>
> I don't quite get what the problem is. IIUC, uffd just needs a signal from
> kernel to tell this area will be written. It seems not break the semantic.
> Added Peter Xu in this loop, who is the uffd developer. He may shed some
> light.

Not really familiar with uffd but just looking at the code, if a handler
is registered for both MODE_MISSING and MODE_WP, currently the atomic
instruction signals a user fault without UFFD_PAGEFAULT_FLAG_WRITE (the
do_anonymous_page() path). If the page is mapped by the uffd handler as
the zero page, a restart of the instruction would signal
UFFD_PAGEFAULT_FLAG_WRITE and UFFD_PAGEFAULT_FLAG_WP (the do_wp_page()
path).
With your patch, we get the equivalent of UFFD_PAGEFAULT_FLAG_WRITE on
the first attempt, just like having a STR instruction instead of
separate LDR + STR (as the atomics behave from a fault perspective).
However, I don't think that's a problem, the uffd handler should cope
with an STR anyway, so it's not some unexpected combination of flags.
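
To make the flag combinations above concrete, here is a small sketch of how a handler registered for both modes might classify a fault message. The SK_UFFD_* values mirror UFFD_PAGEFAULT_FLAG_WRITE and UFFD_PAGEFAULT_FLAG_WP from <linux/userfaultfd.h> but are defined locally so the snippet builds anywhere. classify() and the enum are hypothetical names, and the ioctls mentioned in the comments are only examples of what a handler could do.

```c
#include <stdint.h>

/* Local mirrors of UFFD_PAGEFAULT_FLAG_WRITE and UFFD_PAGEFAULT_FLAG_WP. */
#define SK_UFFD_FLAG_WRITE	(UINT64_C(1) << 0)
#define SK_UFFD_FLAG_WP		(UINT64_C(1) << 1)

enum uffd_action {
	RESOLVE_MISSING_READ,	/* e.g. map the zero page (UFFDIO_ZEROPAGE) */
	RESOLVE_MISSING_WRITE,	/* e.g. provide a writable page (UFFDIO_COPY) */
	RESOLVE_WP,		/* e.g. lift write protection (UFFDIO_WRITEPROTECT) */
};

/* Decide what to do based on the flags reported in the pagefault message. */
static enum uffd_action classify(uint64_t flags)
{
	if (flags & SK_UFFD_FLAG_WP)
		return RESOLVE_WP;
	if (flags & SK_UFFD_FLAG_WRITE)
		return RESOLVE_MISSING_WRITE;
	return RESOLVE_MISSING_READ;
}
```

Without the patch an atomic on an unmapped page would be seen as RESOLVE_MISSING_READ followed by RESOLVE_WP. With the patch it would be a single RESOLVE_MISSING_WRITE, the same as for an STR.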