On Fri, May 10, 2024 at 12:26:00AM -0600, Jane Chu wrote:Thanks! Will fix, and add comments.
Added two explicit MF_MSG messages describing failure in get_hwpoison_page....
Attemped to document the definition of various action names, and made a few
adjustment to the action_result() calls.
Signed-off-by: Jane Chu <jane.chu@xxxxxxxxxx>
+/*"or if it"
+ * MF_IGNORED - Either the m-f() handler did nothing to recover, or it did
+ * something, then caught in a race condition which renders the effort sort"it was caught"
I would also add to MF_IGNORED that we mark the page hwpoisoned anyway.
Yes, it's a catch-all versus something that suppose to happen, will add comments.@@ -1018,7 +1034,7 @@ static int me_unknown(struct page_state *ps, struct page *p)I was confused because I saw you replaced all MF_MSG_UNKNOWN, so I
{
pr_err("%#lx: Unknown page state\n", page_to_pfn(p));
unlock_page(p);
- return MF_FAILED;
+ return MF_IGNORED;
}
wondered how we can end up here until I saw this is a catch-all in case
we fail to make sense of the page->flags.
While you are improving this, I would suggest to add a little comment
above the function explaining how we can reach it.
/*I do not understand why are you doing this.
@@ -2055,6 +2071,7 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
if (flags & MF_ACTION_REQUIRED) {
folio = page_folio(p);
res = kill_accessing_process(current, folio_pfn(folio), flags);
+ return action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
First of all, why is this considered a failure? We did not fail this
time, did we? We went right ahead and kill the process which was
re-accessing the hwpoisoned page. Is that considered a failure?
Second, you are know supressing -EHWPOISON with whatever action_result()
will gives us, which judging from the actual code would be -EBUSY?
I do not think that that is right, and we should be returning
-EHWPOISON. Or whatever error code kill_accessing_process() gives us.
@@ -2231,6 +2248,7 @@ int memory_failure(unsigned long pfn, int flags)This is not coherent with what you did in try_memory_failure_hugetlb
res = kill_accessing_process(current, pfn, flags);
if (flags & MF_COUNT_INCREASED)
put_page(p);
+ action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
for MF_MSG_ALREADY_POISONED, I __think__ that in there we should be
doing the same as we do here.