Re: [RFC PATCH v8 01/10] ras: scrub: Add scrub subsystem
From: Jonathan Cameron
Date: Wed May 22 2024 - 05:40:38 EST
On Tue, 21 May 2024 10:06:21 +0200
Borislav Petkov <bp@xxxxxxxxx> wrote:
> On Fri, May 17, 2024 at 12:44:18PM +0100, Jonathan Cameron wrote:
> > Given we are talking about something new, maybe this is an opportunity
> > to not perpetuate this?
> >
> > If we add scrub in here I'd prefer to just use the normal bus registration
> > handling rather than creating a nest of additional nodes. So perhaps we
> > could consider
> > /sys/bus/edac/device/scrub0 (or whatever name makes sense, as per the
> > earlier discussion of cxl_scrub0 or similar).
>
> Yes, my main worry is how this RAS functionality is going to be all
> organized in the tree. Yes, EDAC legacy methods can die but the
> user-visible part can't so we might as well use it to concentrate stuff
> there.
Understood.
>
> > Could consider moving the bus location of mc0 etc in future to there with
> > symlinks to /sys/bus/edac/device/mc/* for backwards compatibility either
> > via setting their parents or more explicit link creation.
>
> You can ignore the mc - that's the memory controller representation EDAC
> does and that's also kind of semi-legacy considering how heterogeneous
> devices are becoming. Nowadays, scrubbing functionality can be on
> anything that has memory and that's not only a memory controller.
>
> So it would actually be the better thing to abstract that differently
> and use .../edac/device/ for the different RAS functionalities. I.e.,
> have the "device" organize it all.
I'm not sure I follow this. Definitely worth ensuring we are thinking
the same thing wrt layout before we go further.
Do you mean keep it similar to the existing device/mc device/pci
structure so /sys/bus/edac/devices/scrub/cxl_mem0_scrub etc?
This would rely on symlinks to paper over the dev->parent not being
the normal parent. Hence would be similar to /sys/bus/edac/devices/pci in
edac_pci_create_sysfs() or equivalent in edac_device_create_sysfs().
Or is the ../edac/device bit about putting an extra device under edac/devices/?
e.g.
/sys/bus/edac/devices/cxl_memX/scrub
/sys/bus/edac/devices/cxl_memX/other_ras_thing
which would be fairly standard driver model stuff.
This would sit alongside 'legacy'
/sys/bus/edac/devices/mc/mcX
/sys/bus/edac/devices/pci/pciX etc
I'd prefer this second model as it's very standard, but grouping is per
providing parent device rather than per functionality. However, it is rather
different from the existing edac structure.
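To make the two candidate layouts concrete, here is a rough sketch that only builds the path strings for each option. It is not kernel code; the helper names and the example parent name cxl_mem0 are made up for illustration, and the paths simply follow the examples given above.

```python
# Illustrative only: build the sysfs paths each layout option would expose.
# Function names and the sample device name are hypothetical.
EDAC_BUS = "/sys/bus/edac/devices"


def by_function_layout(parents, features):
    """Option 1: group by functionality, e.g. .../scrub/cxl_mem0_scrub."""
    return [
        f"{EDAC_BUS}/{feature}/{parent}_{feature}"
        for feature in features
        for parent in parents
    ]


def by_parent_layout(parents, features):
    """Option 2: one device per provider, RAS features beneath it."""
    return [
        f"{EDAC_BUS}/{parent}/{feature}"
        for parent in parents
        for feature in features
    ]


if __name__ == "__main__":
    parents = ["cxl_mem0"]
    features = ["scrub", "other_ras_thing"]
    for path in by_function_layout(parents, features):
        print("option 1:", path)
    for path in by_parent_layout(parents, features):
        print("option 2:", path)
```

In the first option the per-function directory is the grouping node, so the link back to the real parent has to be papered over with symlinks. In the second, the grouping falls out of dev->parent directly.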
Where I've used the symlink approach in the past, it has always
been about keeping a legacy interface in place, not where I'd start
with something new. Hence I think this is a question of how far
we 'break away' from the existing edac structure.
>
> > These scrub0 would have their dev->parent set to whoever actually
> > registered them providing that reference cleanly and letting all the
> > normal device model stuff work more simply.
>
> Ack.
This suggests the second option above, but I wanted to confirm as Shiju
and I read this differently.
>
> > If we did that with the scrub nodes, the only substantial change from
> > a separate subsystem as seen in this patch set would be to register
> > them on the edac bus rather than a separate class.
> >
> > As you pointed out, there is a simple scrub interface in the existing
> > edac memory controller code. How would you suggest handling that?
> > Have them all register an additional device on the bus (as a child
> > of the mcX devices) perhaps? Seems an easy step forwards and should
> > be no backwards compatibility concerns.
>
> Well, you guys want to control that scrubbing from userspace and those
> old things probably do not fit that model? We could just not convert
> them for now and add them later if really needed. I.e., let sleeping
> dogs lie.
Ok. There is an existing minimal sysfs interface, but I'm
fine with ignoring it for now.
>
> > It absolutely doesn't as long as we can do it fairly cleanly within
> > existing code. I wasn't sure that was possible, but you know edac
> > a lot better than me and so I'll defer to you on that!
>
> Meh, I'm simply maintaining it because no one else wants to. :)
*much sympathy!* As we ramp up more on this stuff, we'll try and
help out where we can.
>
> > Several options for that, but fair question - bringing (at least some of)
> > the RAS mess together will focus reviewer bandwidth etc better.
>
> Review is more than appreciated, as always.
>
> > I'm definitely keen on unifying things as I agree, this mixture of different
> > RAS functionality is an ever-worsening mess.
>
> Yap, it needs to be unified and reined into something more
> user-friendly and manageable.
Hopefully we all agree on a unified solution being the target.
Feels like we are converging. Now we are down to the details :)
Thanks,
Jonathan
>
> Thx.
>