Re: [PATCH] x86/paravirt: Guard against invalid cpu # in pv_vcpu_is_preempted()
From: Waiman Long
Date: Mon Apr 01 2019 - 10:01:54 EST
On 04/01/2019 02:38 AM, Juergen Gross wrote:
> On 25/03/2019 19:03, Waiman Long wrote:
>> On 03/25/2019 12:40 PM, Juergen Gross wrote:
>>> On 25/03/2019 16:57, Waiman Long wrote:
>>>> It was found that passing an invalid cpu number to pv_vcpu_is_preempted()
>>>> might panic the kernel in a VM guest. For example,
>>>>
>>>> [ 2.531077] Oops: 0000 [#1] SMP PTI
>>>> :
>>>> [ 2.532545] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
>>>> [ 2.533321] RIP: 0010:__raw_callee_save___kvm_vcpu_is_preempted+0x0/0x20
>>>>
>>>> To guard against this kind of kernel panic, a check is added to
>>>> pv_vcpu_is_preempted() to make sure that no invalid cpu number will
>>>> be used.
>>>>
>>>> Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
>>>> ---
>>>> arch/x86/include/asm/paravirt.h | 6 ++++++
>>>> 1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
>>>> index c25c38a05c1c..4cfb465dcde4 100644
>>>> --- a/arch/x86/include/asm/paravirt.h
>>>> +++ b/arch/x86/include/asm/paravirt.h
>>>> @@ -671,6 +671,12 @@ static __always_inline void pv_kick(int cpu)
>>>>
>>>>  static __always_inline bool pv_vcpu_is_preempted(long cpu)
>>>>  {
>>>> +        /*
>>>> +         * Guard against invalid cpu number or the kernel might panic.
>>>> +         */
>>>> +        if (WARN_ON_ONCE((unsigned long)cpu >= nr_cpu_ids))
>>>> +                return false;
>>>> +
>>>>          return PVOP_CALLEE1(bool, lock.vcpu_is_preempted, cpu);
>>>>  }
>>> Can this really happen without being a programming error?
>> This shouldn't happen without a programming error, I think. In my case,
>> it was caused by a race condition leading to use-after-free of the cpu
>> number. However, my point is that an error like that shouldn't cause the
>> kernel to panic.
>>
>>> Basically you'd need to guard all percpu area accesses to foreign cpus
>>> this way. Why is this one special?
>> It depends. If out-of-bounds accesses can only happen with an obvious
>> programming error, I don't think we need to guard against them. In this
>> case, I am not totally sure whether the race condition that I found can
>> happen with the existing code or not. To be prudent, I decided to send
>> this patch out.
>>
>> The race condition that I am looking at is as follows:
>>
>>   CPU 0                         CPU 1
>>   -----                         -----
>> up_write:
>>   owner = NULL;
>>   <release-barrier>
>>   count = 0;
>>
>> <rcu-free task structure>
>>
>>                          rwsem_can_spin_on_owner:
>>                            rcu_read_lock();
>>                            read owner;
>>                              :
>>                            vcpu_is_preempted(owner->cpu);
>>                              :
>>                            rcu_read_unlock()
>>
>> When I tried to merge the owner into the count (clearing the owner after
>> the barrier), I could reproduce the crash 100% of the time when booting
>> the kernel in a VM guest. However, I am not sure whether the existing
>> ordering above is safe or the crash is just very hard to reproduce.
>>
>> Alternatively, I can also do the cpu check before calling
>> vcpu_is_preempted().
> I think I'd prefer that.
>
>
> Juergen
>
It turns out that it may be caused by a software bug after all. You can
ignore this patch for now.
Thanks,
Longman
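
For reference, the alternative discussed above (checking the cpu number in
the caller before calling vcpu_is_preempted()) could look roughly like the
sketch below. This is a userspace model only, not kernel code: NR_CPU_IDS,
preempted[], model_vcpu_is_preempted() and owner_cpu_is_preempted() are
hypothetical stand-ins for nr_cpu_ids, the per-cpu preemption state, the
paravirt hook and the rwsem spinning path. It reuses the range test from the
patch, where the cast to unsigned long makes negative values fail the same
comparison as values that are too large.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's nr_cpu_ids. */
#define NR_CPU_IDS 4UL

/* Hypothetical per-cpu "is preempted" state; only cpu 1 is preempted. */
static const bool preempted[NR_CPU_IDS] = { false, true, false, false };

/*
 * Models the unguarded hook: it indexes per-cpu data directly, so an
 * invalid cpu number would read out of bounds.
 */
static bool model_vcpu_is_preempted(long cpu)
{
	return preempted[cpu];
}

/*
 * Caller-side check: validate the cpu number before using it.
 * Casting to unsigned long turns a negative cpu into a huge value,
 * so one comparison rejects both negative and too-large numbers.
 */
static bool owner_cpu_is_preempted(long cpu)
{
	if ((unsigned long)cpu >= NR_CPU_IDS)
		return false;	/* treat an invalid cpu as "not preempted" */

	return model_vcpu_is_preempted(cpu);
}
```

With this model, owner_cpu_is_preempted(1) returns true, while
owner_cpu_is_preempted(-1) and owner_cpu_is_preempted(4) both return false
without touching the array. Note that such a check only hides the symptom;
it does not make a stale owner pointer safe to dereference.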