Re: [PATCH v2 3/5] locking/qspinlock: Introduce CNA into the slow path of qspinlock

From: Juergen Gross
Date: Thu Apr 04 2019 - 01:05:32 EST


On 03/04/2019 18:01, Peter Zijlstra wrote:
> On Wed, Apr 03, 2019 at 11:39:09AM -0400, Alex Kogan wrote:
>
>>>> The patch that I am looking for is to have a separate
>>>> numa_queued_spinlock_slowpath() that coexists with
>>>> native_queued_spinlock_slowpath() and
>>>> paravirt_queued_spinlock_slowpath(). At boot time, we select the most
>>>> appropriate one for the system at hand.
>> Is this how this selection works today for paravirt?
>> I see a PARAVIRT_SPINLOCKS config option, but IIUC you are talking about a different mechanism here.
>> Can you, please, elaborate or give me a link to a page that explains that?
>
> Oh man, you ask us to explain how paravirt patching works... that's
> magic :-)
>
> Basically, the compiler will emit a bunch of indirect calls to the
> various pv_ops.*.* functions.
>
> Then, at alternative_instructions() <- apply_paravirt() it will rewrite
> all these indirect calls to direct calls to the function pointers that
> are in the pv_ops structure at that time (+- more magic).
>
> So we initialize the pv_ops.lock.* methods to the normal
> native_queued_spin*() stuff; if KVM/Xen/whatever setup detects pv
> spinlock support, it changes the methods to the paravirt_queued_*() stuff.
>
> If you want more details, you'll just have to read
> arch/x86/include/asm/paravirt*.h and arch/x86/kernel/paravirt*.c, I
> don't think there's a coherent writeup of all that.
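
Roughly, the two pieces described above look like this today (paraphrased
from arch/x86/kernel/paravirt.c and arch/x86/kernel/kvm.c with the
surrounding details trimmed, so treat it as a sketch rather than the exact
code):

  /* Build-time default in pv_ops (arch/x86/kernel/paravirt.c, roughly): */
  #ifdef CONFIG_PARAVIRT_SPINLOCKS
          .lock.queued_spin_lock_slowpath = native_queued_spin_lock_slowpath,
  #endif

  /* Guest setup (e.g. kvm_spinlock_init(), roughly): once pv spinlock
   * support is detected, swap in the paravirt slowpath before the
   * alternatives/paravirt patching turns the indirect call into a
   * direct one: */
  pv_ops.lock.queued_spin_lock_slowpath = __pv_queued_spin_lock_slowpath;
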
>
>>> Agreed; and until we have static_call, I think we can abuse the paravirt
>>> stuff for this.
>>>
>>> By the time we patch the paravirt stuff:
>>>
>>> check_bugs()
>>> alternative_instructions()
>>> apply_paravirt()
>>>
>>> we should already have enumerated the NODE topology and so nr_node_ids()
>>> should be set.
>>>
>>> So if we frob pv_ops.lock.queued_spin_lock_slowpath to
>>> numa_queued_spin_lock_slowpath before that, it should all get patched
>>> just right.
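
Concretely, the frobbing could be as small as the following sketch (the
setup function and the NUMA-aware slowpath name are placeholders for
whatever the series ends up using):

  /* Sketch only -- hypothetical names.  Must run before
   * alternative_instructions()/apply_paravirt() so the direct call gets
   * patched to the NUMA-aware variant: */
  void __init numa_spinlock_setup(void)
  {
          /* Only bother on machines with more than one node; a real
           * version would probably also back off when a pv guest has
           * already switched to the paravirt slowpath (see below). */
          if (nr_node_ids > 1)
                  pv_ops.lock.queued_spin_lock_slowpath =
                          numa_queued_spin_lock_slowpath;
  }
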
>>>
>>> That of course means the whole NUMA_AWARE_SPINLOCKS thing depends on
>>> PARAVIRT_SPINLOCK, which is a bit awkward...
>
>> Just to mention here, the patch so far does not address paravirt, but
>> our goal is to add this support once we address all the concerns for
>> the native version. So we will end up with four variants for the
>> queued_spinlock_slowpath(), one for each combination of
>> native/paravirt and NUMA/non-NUMA. Or perhaps we do not need a
>> NUMA/paravirt variant?
>
> I wouldn't bother with a pv version of the numa aware code at all. If
> you have overcommitted guests, topology is likely irrelevant anyway. If
> you have 1:1 pinned guests, they'll not use pv spinlocks anyway.
>
> So keep it to tertiary choice:
>
> - native
> - native/numa
> - paravirt

Just for the record: the paravirt variant could easily choose whether
it wants to include a numa version just by using the existing hooks.
With PARAVIRT_SPINLOCK configured I guess even the native case would
need to use the paravirt hooks to select between native and native/numa.

Without PARAVIRT_SPINLOCK this would be just an alternative() then?
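
One way to picture that, sketched here with a static branch instead of a
literal alternative() and with made-up names, is a boot-time selected
wrapper:

  /* Sketch only -- hypothetical key and wrapper, not from the series. */
  DEFINE_STATIC_KEY_FALSE(use_numa_spinlock);  /* enabled at boot if nr_node_ids > 1 */

  static __always_inline void
  queued_spin_lock_slowpath_choice(struct qspinlock *lock, u32 val)
  {
          if (static_branch_unlikely(&use_numa_spinlock))
                  numa_queued_spin_lock_slowpath(lock, val);
          else
                  native_queued_spin_lock_slowpath(lock, val);
  }

An alternative()-based version would bake the same choice in at patch time
instead of taking a patched runtime branch, but the shape of the selection
is the same.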

Maybe the resulting code would be much more readable if we'd just
make PARAVIRT_SPINLOCK usable without the other PARAVIRT hooks? So
splitting up PARAVIRT into PARAVIRT_GUEST (timer hooks et al) and
the patching infrastructure, with PARAVIRT_GUEST and PARAVIRT_SPINLOCK
selecting PARAVIRT, and PARAVIRT_XXL selecting PARAVIRT_GUEST.


Juergen