On 7/24/23 06:55, mawupeng wrote:
On 2023/7/21 18:36, Will Deacon wrote:
On Mon, Jul 17, 2023 at 07:51:50PM +0800, Wupeng Ma wrote:
From: Ma Wupeng <mawupeng1@xxxxxxxxxx>
During our testing, we found that kernel page table entries may be unexpectedly
cleared with rodata off. The root cause is that the linear map for hot-added
memory is created with PUD-size (1G) block mappings, while offlining works at
memory block granularity (MIN_MEMORY_BLOCK_SIZE, 128M). For example, if 2G of
memory is hot-added and a single memory block is then offlined, the call trace
is shown below:
offline_and_remove_memory
  try_remove_memory
    arch_remove_memory
      __remove_pgd_mapping
        unmap_hotplug_range
          unmap_hotplug_p4d_range
            unmap_hotplug_pud_range
              if (pud_sect(pud))
                  pud_clear(pudp);
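For reference, a simplified sketch of the PUD-level walker that does this
(paraphrased from arch/arm64/mm/mmu.c, not the exact upstream code): once it
finds a section-mapped PUD it clears the whole entry, even though the
[addr, end) range passed down for a 128M memory block only covers part of
that 1G mapping.

/* Simplified sketch, not verbatim kernel code. */
static void unmap_hotplug_pud_range_sketch(p4d_t *p4dp, unsigned long addr,
					   unsigned long end)
{
	unsigned long next;
	pud_t *pudp, pud;

	do {
		next = pud_addr_end(addr, end);
		pudp = pud_offset(p4dp, addr);
		pud = READ_ONCE(*pudp);
		if (pud_none(pud))
			continue;

		if (pud_sect(pud)) {
			/* Clears the whole 1G block, not just [addr, next). */
			pud_clear(pudp);
			continue;
		}
		/* ... otherwise descend to the PMD level ... */
	} while (addr = next, addr < end);
}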
Sorry, but I'm struggling to understand the problem here. If we're adding
and removing a 2G memory region, why _wouldn't_ we want to use large 1GiB
mappings?
Or are you saying that only a subset of the memory is removed,
but we then accidentally unmap the whole thing?
Yes, we unmap only a subset, but the whole page table entry is removed.
This could solve this problem:
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 95d360805f8a..44c724ce4f70 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -44,6 +44,7 @@
#define NO_BLOCK_MAPPINGS BIT(0)
#define NO_CONT_MAPPINGS BIT(1)
#define NO_EXEC_MAPPINGS BIT(2) /* assumes FEAT_HPDS is not used */
+#define NO_PUD_MAPPINGS BIT(3)
int idmap_t0sz __ro_after_init;
@@ -344,7 +345,7 @@ static void alloc_init_pud(pgd_t *pgdp, unsigned long addr, unsigned long end,
*/
if (pud_sect_supported() &&
((addr | next | phys) & ~PUD_MASK) == 0 &&
- (flags & NO_BLOCK_MAPPINGS) == 0) {
+ (flags & (NO_BLOCK_MAPPINGS | NO_PUD_MAPPINGS)) == 0) {
pud_set_huge(pudp, phys, prot);
/*
@@ -1305,7 +1306,7 @@ struct range arch_get_mappable_range(void)
int arch_add_memory(int nid, u64 start, u64 size,
struct mhp_params *params)
{
- int ret, flags = NO_EXEC_MAPPINGS;
+ int ret, flags = NO_EXEC_MAPPINGS | NO_PUD_MAPPINGS;
I think we should allow large mappings here and instead prevent partial
removal of the block, if that's what is causing the issue.
Or we can prevent partial removal? Or rebuild the page table entries which are not removed?
+ David Hildenbrand <david@xxxxxxxxxx>
Splitting the block mapping and rebuilding page table entries to reflect the non-removed
areas would require additional information, such as the flags and the pgtable alloc function
used in __create_pgd_mapping(), which would need to be passed along depending on whether we
are tearing down the vmemmap (which would not have a PUD block map) or the linear mapping.
But I am just wondering whether we have to go in that direction at all, or should just
prevent partial memory block removal as suggested by Will.
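As a rough illustration of that direction (purely hypothetical, not a proposed
patch: the helper below and its plumbing are made up for illustration, only
__create_pgd_mapping() and __pgd_pgtable_alloc() exist in mmu.c, and their use
here assumes their current signatures), the hot-remove path would have to do
something along these lines:

/*
 * Hypothetical sketch: after clearing a partially-removed PUD block,
 * re-create the still-present part at a finer granularity.  The flags
 * and the pgtable_alloc callback are exactly the extra state that
 * would have to be passed down into the hot-remove path.
 */
static void remap_remaining_part(unsigned long keep_start,
				 unsigned long keep_end)
{
	__create_pgd_mapping(swapper_pg_dir, __pa(keep_start), keep_start,
			     keep_end - keep_start, PAGE_KERNEL,
			     __pgd_pgtable_alloc,
			     NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS);
}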
- arch_remove_memory() does not return a value, so core MM hotremove would not fail
just because arch_remove_memory() failed or warned
- core MM hotremove does check_hotplug_memory_range(), which ensures the range and
start address are memory_block_size_bytes() aligned (roughly sketched after this list)
- The default memory_block_size_bytes() depends on SECTION_SIZE_BITS, which on arm64
can now be less than PUD_SIZE, triggering this problem:
#define MIN_MEMORY_BLOCK_SIZE     (1UL << SECTION_SIZE_BITS)

unsigned long __weak memory_block_size_bytes(void)
{
	return MIN_MEMORY_BLOCK_SIZE;
}
EXPORT_SYMBOL_GPL(memory_block_size_bytes);
- We would need to override memory_block_size_bytes() on arm64 to accommodate such
scenarios here
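For reference, the core MM alignment check mentioned above looks roughly like
this (from mm/memory_hotplug.c, quoted from memory, so treat it as approximate):

static int check_hotplug_memory_range(u64 start, u64 size)
{
	/* memory range must be block size aligned */
	if (!size || !IS_ALIGNED(start, memory_block_size_bytes()) ||
	    !IS_ALIGNED(size, memory_block_size_bytes())) {
		pr_err("Block size [%#lx] unaligned hotplug range: start %#llx, size %#llx",
		       memory_block_size_bytes(), start, size);
		return -EINVAL;
	}
	return 0;
}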
Something like this might work (built but not tested)
commit 2eb8dc0d08dfe0b2a3bb71df93b12f7bf74a2ca6 (HEAD)
Author: Anshuman Khandual <anshuman.khandual@xxxxxxx>
Date: Mon Jul 24 06:45:34 2023 +0100
arm64/mm: Define memory_block_size_bytes()
Define memory_block_size_bytes() on arm64 platforms to set the minimum hot plug
and remove granularity to PUD_SIZE in cases where MIN_MEMORY_BLOCK_SIZE falls
below PUD_SIZE. Otherwise a complete PUD block mapping will be torn down while
unmapping a MIN_MEMORY_BLOCK_SIZE range.
Signed-off-by: Anshuman Khandual <anshuman.khandual@xxxxxxx>
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 95d360805f8a..1918459b3460 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1157,6 +1157,17 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
}
#ifdef CONFIG_MEMORY_HOTPLUG
+unsigned long memory_block_size_bytes(void)
+{
+ /*
+ * Linear mappings might include PUD based block mappings which
+ * cannot be torn down in part during memory hotremove. Hence
+ * PUD_SIZE needs to be the minimum granularity for memory hot
+ * removal, in case MIN_MEMORY_BLOCK_SIZE falls below it.
+ */
+ return max_t(unsigned long, MIN_MEMORY_BLOCK_SIZE, PUD_SIZE);
+}
+
void vmemmap_free(unsigned long start, unsigned long end,
struct vmem_altmap *altmap)
{