Re: Regression in 6.1.81: Missing memory in pmem device
From: Chaney, Ben
Date: Thu May 16 2024 - 12:38:23 EST
The 'nokaslr' flag does work around this issue, but using it has a few downsides.
First, we would like the security benefit provided by ASLR. Second, it restricts which memmaps are possible: they would have to be offset from the beginning of memory.
I also think a few other features may be affected by this that the patch did not address. crashkernel and pstore probably both need physical KASLR disabled as well.
Thanks,
Ben
On 5/15/24, 2:30 PM, "Kees Cook" <kees@xxxxxxxxxx> wrote:
On May 15, 2024 10:42:49 AM PDT, Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
>(cc Kees)
>
>On Wed, 15 May 2024 at 19:32, Chaney, Ben <bchaney@xxxxxxxxxx> wrote:
>>
>> Hello,
>> I encountered an issue when upgrading to 6.1.89 from 6.1.77. This upgrade caused a breakage in emulated persistent memory. Significant amounts of memory are missing from a pmem device:
>>
>> fdisk -l /dev/pmem*
>> Disk /dev/pmem0: 355.9 GiB, 382117871616 bytes, 746323968 sectors
>> Units: sectors of 1 * 512 = 512 bytes
>> Sector size (logical/physical): 512 bytes / 4096 bytes
>> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>>
>> Disk /dev/pmem1: 25.38 GiB, 27246198784 bytes, 53215232 sectors
>> Units: sectors of 1 * 512 = 512 bytes
>> Sector size (logical/physical): 512 bytes / 4096 bytes
>> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
>>
>> The memmap parameter that created these pmem devices is "memmap=364416M!28672M,367488M!419840M", which should cause a much larger amount of memory to be allocated to /dev/pmem1. The amount of missing memory, and the device it is missing from, changes randomly on each reboot. Some memory is missing on almost every boot, but not on 100% of them. Notably, the memory that is missing from these devices is not reclaimed by the system for general use. The system in question has 768GB of memory split evenly across two NUMA nodes.
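
For reference, the sizes requested by that memmap= parameter can be checked against the fdisk output with a short script. This is a minimal sketch, assuming the usual memmap=nn!ss meaning (size nn at physical offset ss) and that every value uses the M suffix, as in the report. parse_memmap is an illustrative helper, not a kernel function.

```python
# Sketch: expected sizes of the regions requested by memmap=nn!ss entries.
# Assumption: all sizes and offsets carry the "M" (MiB) suffix, as in the report.

MIB = 1024 * 1024


def parse_memmap(value):
    """Return a list of (start_bytes, size_bytes), one per nn!ss entry."""
    regions = []
    for entry in value.split(","):
        size, start = entry.split("!")
        regions.append((int(start.rstrip("M")) * MIB, int(size.rstrip("M")) * MIB))
    return regions


regions = parse_memmap("364416M!28672M,367488M!419840M")
reported = [382117871616, 27246198784]  # bytes shown by fdisk for pmem0, pmem1

for (start, size), seen in zip(regions, reported):
    print(f"start={start:#x} expected={size} reported={seen} missing={size - seen}")
```

With the values above, the first region's expected size equals the 382117871616 bytes reported for /dev/pmem0, while /dev/pmem1 comes up 358092898304 bytes (333.5 GiB) short of the 367488M requested.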
>>
>> When the error occurs, there are also the following error messages showing up in dmesg:
>>
>> [ 5.318317] nd_pmem namespace1.0: [mem 0x5c2042c000-0x5ff7ffffff flags 0x200] misaligned, unable to map
>> [ 5.335073] nd_pmem: probe of namespace1.0 failed with error -95
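
The address range in that error can be compared with the requested memmap regions. The following is a rough sketch, not driver code: the helper names are made up for illustration, and the alignment nd_pmem actually requires is not stated in this thread, so the script only reports the largest power of two that divides the start address.

```python
MIB = 1024 * 1024

# Requested regions as (start, size) in bytes,
# taken from memmap=364416M!28672M,367488M!419840M.
requested = [(28672 * MIB, 364416 * MIB), (419840 * MIB, 367488 * MIB)]

# Range printed in the nd_pmem error message (the end address is inclusive).
err_start, err_end = 0x5C2042C000, 0x5FF7FFFFFF


def containing_region(start, end, regions):
    """Index of the requested region that fully contains [start, end], or None."""
    for i, (r_start, r_size) in enumerate(regions):
        if r_start <= start and end < r_start + r_size:
            return i
    return None


def natural_alignment(addr):
    """Largest power of two that divides addr (addr must be non-zero)."""
    return addr & -addr


idx = containing_region(err_start, err_end, requested)
print("inside requested region:", idx)
if idx is not None:
    r_start, r_size = requested[idx]
    print("ends at region end:", err_end + 1 == r_start + r_size)
print("start alignment:", hex(natural_alignment(err_start)))
```

For this message, the range lies inside the first requested region and ends exactly where that region ends, but its start is aligned only to 0x4000 (16 KiB).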
>>
>> Bisection implicates 2dfaeac3f38e4e550d215204eedd97a061fdc118 as the patch that first caused the issue. I believe the cause of the issue is that the EFI stub is randomizing the location of the decompressed kernel without accounting for the memory map, and it is clobbering some of the memory that has been reserved for pmem.
>>
>
>Does using 'nokaslr' on the kernel command line work around this?
>
>I think in this particular case, we could just disable physical KASLR
>(but retain virtual KASLR) if memmap= appears on the kernel command
>line, on the basis that emulated persistent memory is somewhat of a
>niche use case, and physical KASLR is not as important as virtual
>KASLR (which shouldn't be implicated in this).
Yeah, that seems reasonable to me, as long as we print a notice in dmesg that physical ASLR was disabled due to memmap's physical reservation. If this usage becomes more common, we should find a better way, though.
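
The policy being discussed can be sketched in a few lines. This is only an illustration of the decision logic, written in Python for brevity. It is not the EFI stub code, and the function name and the notice text are hypothetical.

```python
def physical_kaslr_decision(cmdline):
    """Return (allow_physical_kaslr, notice) for a kernel command line.

    Hypothetical helper: physical randomization is turned off whenever a
    memmap= argument is present, and a notice is returned so that it can be
    logged. Virtual KASLR is not affected by this decision.
    """
    for arg in cmdline.split():
        if arg.startswith("memmap="):
            notice = "physical KASLR disabled: memmap= reserves physical memory"
            return False, notice
    return True, None


allowed, notice = physical_kaslr_decision(
    "root=/dev/sda1 memmap=364416M!28672M,367488M!419840M"
)
if notice:
    print(notice)
```

Checking for any memmap= argument is the simplest form of the idea. A finer-grained version could keep physical randomization and only avoid the reserved ranges, at the cost of parsing the regions before the kernel is placed.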
This reminds me a bit of the work Steve has been exploring:
https://lore.kernel.org/all/20240509163310.2aa0b2e1@xxxxxxxxxxxxxxxxxxxx/
--
Kees Cook