On 7/15/19 4:34 PM, John Hubbard wrote:
On 7/15/19 3:00 PM, Andrew Morton wrote:
On Tue, 9 Jul 2019 18:24:57 -0700 Ralph Campbell <rcampbell@xxxxxxxxxx> wrote:
 mm/rmap.c | 1 +
 1 file changed, 1 insertion(+)
--- a/mm/rmap.c~mm-hmm-fix-bad-subpage-pointer-in-try_to_unmap_one
+++ a/mm/rmap.c
@@ -1476,6 +1476,7 @@ static bool try_to_unmap_one(struct page
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂ * No need to invalidate here it will synchronize on
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂ * against the special swap migration pte.
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂ */
+ÂÂÂÂÂÂÂÂÂÂÂ subpage = page;
ÂÂÂÂÂÂÂÂÂÂÂÂÂ goto discard;
ÂÂÂÂÂÂÂÂÂ }
Hi Ralph and everyone,
While the above prevents a crash, I'm concerned that it is still not
an accurate fix. This fix leads to repeatedly removing the rmap, against the
same struct page, which is odd, and also doesn't directly address the
root cause, which I understand to be: this routine can't handle migrating
the zero page properly--over and back, anyway. (We should also mention more
about how this is triggered, in the commit description.)
I'll take a closer look at possible fixes (I have to step out for a bit) soon,
but any more experienced help is also appreciated here.
thanks,
I'm not surprised at the confusion. It took me quite awhile to understand how migrate_vma() works with ZONE_DEVICE private memory.
The big point to be aware of is that when migrating a page to
device private memory, the source page's page->mapping pointer
is copied to the ZONE_DEVICE struct page and the page_mapcount()
is increased. So, the kernel sees the page as being "mapped"
but the page table entry as being is_swap_pte() so the CPU will fault
if it tries to access the mapped address.
So yes, the source anon page is unmapped, DMA'ed to the device,
and then mapped again. Then on a CPU fault, the zone device page
is unmapped, DMA'ed to system memory, and mapped again.
The rmap_walk() is used to clear the temporary migration pte so
that is another important detail of how migrate_vma() works.
At the moment, only single anon private pages can migrate to
device private memory so there are no subpages and setting it to "page"
should be correct for now. I'm looking at supporting migration of
transparent huge pages but that is a work in progress.
Let me know how much of all that you think should be in the change log.
Getting an Acked-by from Jerome would be nice too.
I see Christoph Hellwig got confused by this too [1].
I have a patch to clear page->mapping when freeing ZONE_DEVICE private
struct pages which I'll send out soon.
I'll probably also add some comments to struct page to include the
above info and maybe remove the _zd_pad_1 field.
[1] 740d6310ed4cd5c78e63 ("mm: don't clear ->mapping in hmm_devmem_free")