Re: [RFC PATCH v8 01/10] ras: scrub: Add scrub subsystem

From: Borislav Petkov
Date: Tue May 21 2024 - 04:07:31 EST


On Fri, May 17, 2024 at 12:44:18PM +0100, Jonathan Cameron wrote:
> Given we are talking about something new, maybe this is an opportunity
> to not perpetuate this?
>
> If we add scrub in here I'd prefer to just use the normal bus registration
> handling rather than creating a nest of additional nodes. So perhaps we
> could consider
> /sys/bus/edac/device/scrub0 (or whatever name makes sense, as per the
> earlier discussion of cxl_scrub0 or similar).

Yes, my main worry is how this RAS functionality is going to be all
organized in the tree. Yes, EDAC legacy methods can die but the
user-visible part can't so we might as well use it to concentrate stuff
there.

> Could consider moving the bus location of mc0 etc in future to there with
> symlinks to /sys/bus/edac/device/mc/* for backwards compatibility either
> via setting their parents or more explicit link creation.

You can ignore the mc - that's the memory controller representation EDAC
does and that's also kind of semi-legacy considering how heterogeneous
devices are becoming. Nowadays, scrubbing functionality can be on
anything that has memory and that's not only a memory controller.

So it would actually be the better thing to abstract that differently
and use .../edac/device/ for the different RAS functionalities. I.e.,
have the "device" organize it all.

> These scrub0 would have their dev->parent set to who ever actually
> registered them providing that reference cleanly and letting all the
> normal device model stuff work more simply.

Ack.

> If we did that with the scrub nodes, the only substantial change from
> a separate subsystem as seen in this patch set would be to register
> them on the edac bus rather than a separate class.
>
> As you pointed out, there is a simple scrub interface in the existing
> edac memory controller code. How would you suggest handling that?
> Have them all register an additional device on the bus (as a child
> of the mcX devices) perhaps? Seems an easy step forwards and should
> be no backwards compatibility concerns.

Well, you guys want to control that scrubbing from userspace and those
old things probably do not fit that model? We could just not convert
them for now and add them later if really needed. I.e., leave sleeping
dogs lie.

> It absolutely doesn't as long as we can do it fairly cleanly within
> existing code. I wasn't sure that was possible, but you know edac
> a lot better than me and so I'll defer to you on that!

Meh, I'm simply maintaining it because no one else wants to. :)

> Several options for that, but fair question - bringing (at least some of)
> the RAS mess together will focus reviewer bandwidth etc better.

Review is more than appreciated, as always.

> I'm definitely keen on unifying things as I agree, this mixture of different
> RAS functionality is a ever worsening mess.

Yap, it needs to be unified and reigned into something more
user-friendly and manageable.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette