Re: [RFC PATCH 02/11] rust: add driver abstraction

From: Danilo Krummrich
Date: Tue May 21 2024 - 18:22:16 EST


On Tue, May 21, 2024 at 11:35:43AM +0200, Greg KH wrote:
> On Tue, May 21, 2024 at 12:30:37AM +0200, Danilo Krummrich wrote:
> > On Mon, May 20, 2024 at 08:14:18PM +0200, Greg KH wrote:
> > > On Mon, May 20, 2024 at 07:25:39PM +0200, Danilo Krummrich wrote:
> > > > From: Wedson Almeida Filho <wedsonaf@xxxxxxxxx>
> > > >
> > > > This defines general functionality related to registering drivers with
> > > > their respective subsystems, and registering modules that implement
> > > > drivers.
> > > >
> > > > Co-developed-by: Asahi Lina <lina@xxxxxxxxxxxxx>
> > > > Signed-off-by: Asahi Lina <lina@xxxxxxxxxxxxx>
> > > > Co-developed-by: Andreas Hindborg <a.hindborg@xxxxxxxxxxx>
> > > > Signed-off-by: Andreas Hindborg <a.hindborg@xxxxxxxxxxx>
> > > > Signed-off-by: Wedson Almeida Filho <wedsonaf@xxxxxxxxx>
> > > > Signed-off-by: Danilo Krummrich <dakr@xxxxxxxxxx>
> > > > ---
> > > > rust/kernel/driver.rs | 492 +++++++++++++++++++++++++++++++++++
> > > > rust/kernel/lib.rs | 4 +-
> > > > rust/macros/module.rs | 2 +-
> > > > samples/rust/rust_minimal.rs | 2 +-
> > > > samples/rust/rust_print.rs | 2 +-
> > > > 5 files changed, 498 insertions(+), 4 deletions(-)
> > > > create mode 100644 rust/kernel/driver.rs
> > > >
> > > > diff --git a/rust/kernel/driver.rs b/rust/kernel/driver.rs
> > > > new file mode 100644
> > > > index 000000000000..e0cfc36d47ff
> > > > --- /dev/null
> > > > +++ b/rust/kernel/driver.rs
> > > > @@ -0,0 +1,492 @@
> > > > +// SPDX-License-Identifier: GPL-2.0
> > > > +
> > > > +//! Generic support for drivers of different buses (e.g., PCI, Platform, Amba, etc.).
> > > > +//!
> > > > +//! Each bus/subsystem is expected to implement [`DriverOps`], which allows drivers to register
> > > > +//! using the [`Registration`] class.
> > >
> > > Why are you creating new "names" here? "DriverOps" is part of a 'struct
> > > device_driver' why are you separating it out here? And what is
> >
> > DriverOps is a trait which abstracts a subsystems register() and unregister()
> > functions to (un)register drivers. It exists such that a generic Registration
> > implementation calls the correct one for the subsystem.
> >
> > For instance, PCI would implement DriverOps::register() by calling into
> > bindings::__pci_register_driver().
> >
> > We can discuss whether DriverOps is a good name for the trait, but it's not a
> > (different) name for something that already exists and already has a name.
>
> It's a name we don't have in the C code as the design of the driver core
> does not need or provide it. It's just the section of 'struct
> device_driver' that provides function callbacks, why does it need to be
> separate at all?

I'm confused by the relationship to `struct device_driver` you seem to imply.
How is it related?

Again, this is just a trait for subsystems to provide their corresponding
register and unregister implementation, e.g. pci_register_driver() and
pci_unregister_driver(), such that they can be called from the generic
Registration code below.

See [1] for an example implementation in PCI.

Please also consider that some structures might be a 1:1 representation of C
structures, some C structures are not required at the Rust side at all, and
then there might be additional structures and traits that abstract things C has
no data structure for.

This is due to the fact that 1. we're not replicating the functionality on the C
side, but only make use of it, and 2. we're trying to abstract existing code to
make it work with a conceptually different language. Which means, if you find
something that's not on the C side, it doesn't (necessarily) mean we're making
something up that should either not exist or be on the C side as well.

I'm not saying don't question it, I appreciate that, and it's even the whole
reason for sending this RFC, but I ask you to consider this fact. :)

[1] https://lore.kernel.org/rust-for-linux/20240520172554.182094-10-dakr@xxxxxxxxxx/

>
> > > 'Registration'? That's a bus/class thing, not a driver thing.
> >
> > A Registration is an object representation of the driver's registered state.
>
> And that representation should not ever need to be tracked by the
> driver, that's up to the driver core to track that.

The driver doesn't need it, the Registration abstraction does need it. Please
see my comments below.

>
> > Having an object representation for that is basically the Rust way to manage the
> > lifetime of this state.
>
> This all should be a static chunk of read-only memory that should never
> have a lifetime, why does it need to be managed at all?

What I meant here is that if a driver was registered, we need to make sure it's
going to be unregistered eventually, e.g. when the module is removed or when
something fails after registration and we need to unwind.

When the Registration structure goes out of scope, which would happen in both
the cases above, it will automatically unregister the driver, due to the
automatic call to `drop()`.

>
> > Once the Registration is dropped, the driver is
> > unregistered. For instance, a PCI driver Registration can be bound to a module,
> > such that the PCI driver is unregistered when the module is unloaded (the
> > Registration is dropped in the module_exit() callback).
>
> Ok, that's fine, but note that your module_exit() function will never be
> called if your module_init() callback fails, so why do you need to track
> this state? Again, C drivers never need to track this state, why is
> rust requiring more logic here for something that is NOT a dynamic chunk
> of memory (or should not be a dynamic chunk of memory, let's get it
> correct from the start and not require us to change things later on to
> make it more secure).

That's fine, if module_init() would fail, the Registration would be dropped as
well.

As for why doesn't C need this is a good example of what I wrote above. Because
it is useful for Rust, but not for C.

In Rust we get drop() automatically called when a structure is destroyed. This
means that if we let drivers put the Registration structure (e.g. representing
that a PCI driver was registered) into its `Module` representation structure
(already upstream) then this Registration is automatically destroyed once the
module representation is destroyed (which happens on module_exit()). This leads
to `drop()` of the `Registration` structure being called, which unregisteres the
(e.g. PCI) driver.

This way the driver does not need to take care of unregistering the PCI driver
explicitly. The driver can also place multiple registrations into the `Module`
structure. All of them would be unregistered automatically in module_exit().

>
> > > And be very careful of the use of the word 'class' here, remember, there
> > > is 'struct class' as part of the driver model :)
> >
> > I think in this context "class" might be meant as something like "struct with
> > methods", not sure what would be the best term to describe this.
>
> "struct with methods" is nice :)

Great, I will change that.

>
> Again, 'class' means something different here in the driver model, so be
> careful with terms, language matters, especially when many of our
> developers do not have English as their native language.
>
> > > > +/// The registration of a driver.
> > > > +pub struct Registration<T: DriverOps> {
> > > > + is_registered: bool,
> > >
> > > Why does a driver need to know if it is registered or not? Only the
> > > driver core cares about that, please do not expose that, it's racy and
> > > should not be relied on.
> >
> > We need it to ensure we do not try to register the same thing twice
>
> Your logic in your code is wrong if you attempt to register it twice,
> AND the driver core will return an error if you do so, so a driver
> should not need to care about this.

We want it to be safe, if the driver logic is wrong and registers it twice, we
don't want it to blow up.

The driver core takes care, but I think there are subsystems that do
initializations that could make things blow up when registering the driver
twice.

>
> > , some subsystems might just catch fire otherwise.
>
> Which ones?

Let's take the one we provide abstractons for, PCI.

In __pci_register_driver() we call spin_lock_init() and INIT_LIST_HEAD() before
driver_register() could bail out [1].

What if this driver is already registered and in use and we're randomly altering
the list pointers or call spin_lock_init() on a spin lock that's currently being
held?

[1] https://elixir.bootlin.com/linux/latest/source/drivers/pci/pci-driver.c#L1447

>
> > > > +impl<T: DriverOps> Drop for Registration<T> {
> > > > + fn drop(&mut self) {
> > > > + if self.is_registered {
> > > > + // SAFETY: This path only runs if a previous call to `T::register` completed
> > > > + // successfully.
> > > > + unsafe { T::unregister(self.concrete_reg.get()) };
> > >
> > > Can't the rust code ensure that this isn't run if register didn't
> > > succeed? Having a boolean feels really wrong here (can't that race?)
> >
> > No, we want to automatically unregister once this object is destroyed, but not
> > if it was never registered in the first place.
> >
> > This shouldn't be racy, we only ever (un)register things in places like probe()
> > / remove() or module_init() / module_exit().
>
> probe/remove never calls driver_register/unregister on itself, so that's
> not an issue. module_init/exit() does not race with itself and again,
> module_exit() is not called if module_init() fails.

That's clear, I should mention that we can use the Registration abstraction also
for things that are typically registered in probe(), a DRM device for instance.

As mentioned above I'm aware that module_exit() is not called when module_init()
fails, in this case the Registration structure is dropped in module_init(),
hence it's fine.

>
> Again, you are adding logic here that is not needed. Or if it really is
> needed, please explain why the C code does not need this today and let's
> work to fix that.

Please see above, where I start with "As for why doesn't C need this..".

>
> > > > + }
> > > > + }
> > > > +}
> > > > +
> > > > +/// Conversion from a device id to a raw device id.
> > > > +///
> > > > +/// This is meant to be implemented by buses/subsystems so that they can use [`IdTable`] to
> > > > +/// guarantee (at compile-time) zero-termination of device id tables provided by drivers.
> > > > +///
> > > > +/// Originally, RawDeviceId was implemented as a const trait. However, this unstable feature is
> > > > +/// broken/gone in 1.73. To work around this, turn IdArray::new() into a macro such that it can use
> > > > +/// concrete types (which can still have const associated functions) instead of a trait.
> > > > +///
> > > > +/// # Safety
> > > > +///
> > > > +/// Implementers must ensure that:
> > > > +/// - [`RawDeviceId::ZERO`] is actually a zeroed-out version of the raw device id.
> > > > +/// - [`RawDeviceId::to_rawid`] stores `offset` in the context/data field of the raw device id so
> > > > +/// that buses can recover the pointer to the data.
> > > > +pub unsafe trait RawDeviceId {
> > > > + /// The raw type that holds the device id.
> > > > + ///
> > > > + /// Id tables created from [`Self`] are going to hold this type in its zero-terminated array.
> > > > + type RawType: Copy;
> > > > +
> > > > + /// A zeroed-out representation of the raw device id.
> > > > + ///
> > > > + /// Id tables created from [`Self`] use [`Self::ZERO`] as the sentinel to indicate the end of
> > > > + /// the table.
> > > > + const ZERO: Self::RawType;
> > >
> > > All busses have their own way of creating "ids" and that is limited to
> > > the bus code itself, why is any of this in the rust side? What needs
> > > this? A bus will create the id for the devices it manages, and can use
> > > it as part of the name it gives the device (but not required), so all of
> > > this belongs to the bus, NOT to a driver, or a device.
> >
> > This is just a generalized interface which can be used by subsystems to
> > implement the subsystem / bus specific ID type.
>
> Please move this all to a different file as it has nothing to do with
> the driver core bindings with the include/device/driver.h api.

I don't think driver.rs in an unreasonable place for a generic device ID
representation that is required by every driver structure.

But I'm also not overly resistant to move it out. What do you think would be a
good name?

Please consider that this name will also be the namespace for this trait.
Currently you can reference it with `kernel::driver::RawDeviceId`. If you move
it to foo.rs, it'd be `kernel::foo::RawDeviceId`.

>
> Let's keep it simple and obvious please, and separate out things into
> logical chunks, hopefully in the same way that the C apis are separated
> out into. That way we can properly review, understand, and most
> importantly for all of us, maintain the code over the next 40+ years.
>
> thanks,
>
> greg k-h
>