Re: [PATCH v3 2/3] genirq: introduce update_irq_devid()

From: Thomas Gleixner
Date: Thu Aug 15 2019 - 10:58:34 EST


Ben,

On Thu, 15 Aug 2019, Ben Luo wrote:

> Sometimes, only the dev_id field of irqaction need to be changed.
> E.g. KVM VM with device passthru via VFIO may switch irq injection
> path between KVM irqfd and userspace eventfd. These two paths
> share the same irq num and handler for the same vector of a device,

s/irq num/interrupt number/

Changelogs are text and should not contain cryptic abbreviations.

> only with different dev_id referencing to different fds' contexts.
>
> So, instead of free/request irq, only update dev_id of irqaction.

Please write functions like this: function_name() so they can be easily
identified in the text as such.

> This narrows the gap for setting up new irq (and irte, if has)

What does that mean: "narrows the gap"

What's the gap and why is it only made smaller and not closed?

> and also gains some performance benefit.
>
> Signed-off-by: Ben Luo <luoben@xxxxxxxxxxxxxxxxx>
> Reviewed-by: Liu Jiang <gerry@xxxxxxxxxxxxxxxxx>

I prefer to see the 'reviewed-by' as a reply on LKML rather than coming
from some internal process.

> Reviewed-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>

While I reviewed the previous version, I surely did not give a
'Reviewed-by' tag. That tag means that the person did review the patch and
did not find an issue. I surely found issues, right?

> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> index 6f9b20f..a76ef61 100644
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -2064,6 +2064,84 @@ int request_threaded_irq(unsigned int irq, irq_handler_t handler,
> EXPORT_SYMBOL(request_threaded_irq);
>
> /**
> + * update_irq_devid - update irq dev_id to a new one

Can you please name this: irq_update_devid(). We try to have a consistent
name space for new functions.

> + * @irq: Interrupt line to update
> + * @dev_id: A cookie to find the irqaction to update
> + * @new_dev_id: New cookie passed to the handler function

Can you please arrange these in tabular fashion:

* @irq: Interrupt line to update
* @dev_id: A cookie to find the irqaction to update
* @new_dev_id: New cookie passed to the handler function

> + * Sometimes, only the cookie data need to be changed.
> + * Instead of free/request irq, only update dev_id here
> + * Not only to gain some performance benefit, but also
> + * reduce the risk of losing interrupt.
> + *
> + * E.g. irq affinity setting in a VM with VFIO passthru

Again. Please write it out 'interrupt' and everything else.

> + * device is carried out in a free-then-request-irq way
> + * since lack of this very function. The free_irq()

That does not make sense. The function is there for such a use case. So
immediately when the VFIO change is merged this comment is stale and bogus.

> + * eventually triggers deactivation of IR domain, which
> + * will cleanup IRTE. There is a gap before request_irq()
> + * finally setup the IRTE again. In this gap, a in-flight
> + * irq buffering in hardware layer may trigger DMAR fault
> + * and be lost.

Exactly this information wants to be in the changelog.

> + *
> + * On failure, it returns a negative value. On success,
> + * it returns 0
> + */
> +int update_irq_devid(unsigned int irq, void *dev_id, void *new_dev_id)
> +{
> + struct irq_desc *desc = irq_to_desc(irq);
> + struct irqaction *action, **action_ptr;
> + unsigned long flags;
> +
> + if (in_interrupt()) {
> + WARN(1, "Trying to update IRQ %d (dev_id %p to %p) from IRQ context!\n",
> + irq, dev_id, new_dev_id);
> + return -EPERM;
> + }

if (WARN(....)

> +
> + if (!desc)
> + return -EINVAL;
> +
> + chip_bus_lock(desc);

This does not need to take bus lock. The action pointer is sufficiently
protected by desc->lock.

> + raw_spin_lock_irqsave(&desc->lock, flags);
> +
> + /*
> + * There can be multiple actions per IRQ descriptor, find the right
> + * one based on the dev_id:
> + */
> + action_ptr = &desc->action;
> + for (;;) {
> + action = *action_ptr;
> +
> + if (!action) {
> + raw_spin_unlock_irqrestore(&desc->lock, flags);
> + chip_bus_sync_unlock(desc);
> + WARN(1, "Trying to update already-free IRQ %d (dev_id %p to %p)\n",
> + irq, dev_id, new_dev_id);
> + return -ENXIO;
> + }
> +
> + if (action->dev_id == dev_id) {
> + action->dev_id = new_dev_id;
> + break;
> + }
> + action_ptr = &action->next;
> + }
> +
> + raw_spin_unlock_irqrestore(&desc->lock, flags);
> + chip_bus_sync_unlock(desc);
> +
> + /*
> + * Make sure it's not being used on another CPU:
> + * There is a risk of UAF for old *dev_id, if it is
> + * freed in a short time after this func returns

function please. Also it does not matter whether the time is short or
not. The point is:

Ensure that an interrupt in flight on another CPU which uses the
old 'dev_id' has completed because the caller can free the memory
to which it points after this function returns.

But this has another twist:

CPU0 CPU1

interrupt
primary_handler(old_dev_id)
do_stuff_on(old_dev_id)
return WAKE_THREAD; update_dev_id()
wakeup_thread();
action->dev_id = new_dev_id;
irq_thread()
secondary_handler(new_dev_id)

That's broken and synchronize_irq() does not protect against it.

> + */
> + synchronize_irq(irq);

Thanks,

tglx