在 2023/7/31 23:16, Waiman Long 写道:
On 7/30/23 22:33, guoren@xxxxxxxxxx wrote:arch_spin_value_unlocked is called with lockref like this:
From: Guo Ren <guoren@xxxxxxxxxxxxxxxxx>I am fine with the current change. However, modern optimizing compiler should be able to avoid the redundant memory read anyway. So this patch may not have an impact from the performance point of view.
The arch_spin_value_unlocked would cause an unnecessary memory
access to the contended value. Although it won't cause a significant
performance gap in most architectures, the arch_spin_value_unlocked
argument contains enough information. Thus, remove unnecessary
atomic_read in arch_spin_value_unlocked().
The caller of arch_spin_value_unlocked() could benefit from this
change. Currently, the only caller is lockref.
Signed-off-by: Guo Ren <guoren@xxxxxxxxxx>
Cc: Waiman Long <longman@xxxxxxxxxx>
Cc: David Laight <David.Laight@xxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Signed-off-by: Guo Ren <guoren@xxxxxxxxxxxxxxxxx>
---
Changelog
V2:
- Fixup commit log with Waiman advice.
- Add Waiman comment in the commit msg.
---
include/asm-generic/spinlock.h | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/include/asm-generic/spinlock.h b/include/asm-generic/spinlock.h
index fdfebcb050f4..90803a826ba0 100644
--- a/include/asm-generic/spinlock.h
+++ b/include/asm-generic/spinlock.h
@@ -68,11 +68,18 @@ static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
smp_store_release(ptr, (u16)val + 1);
}
+static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
+{
+ u32 val = lock.counter;
+
+ return ((val >> 16) == (val & 0xffff));
+}
+
static __always_inline int arch_spin_is_locked(arch_spinlock_t *lock)
{
- u32 val = atomic_read(lock);
+ arch_spinlock_t val = READ_ONCE(*lock);
- return ((val >> 16) != (val & 0xffff));
+ return !arch_spin_value_unlocked(val);
}
static __always_inline int arch_spin_is_contended(arch_spinlock_t *lock)
@@ -82,11 +89,6 @@ static __always_inline int arch_spin_is_contended(arch_spinlock_t *lock)
return (s16)((val >> 16) - (val & 0xffff)) > 1;
}
-static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
-{
- return !arch_spin_is_locked(&lock);
-}
-
#include <asm/qrwlock.h>
#endif /* __ASM_GENERIC_SPINLOCK_H */
#define CMPXCHG_LOOP(CODE, SUCCESS) do { \
int retry = 100; \
struct lockref old; \
BUILD_BUG_ON(sizeof(old) != 8); \
old.lock_count = READ_ONCE(lockref->lock_count); \
while (likely(arch_spin_value_unlocked(old.lock.rlock.raw_lock))) { \
With modern optimizing compiler, Is it possible that old value of
old.lock.rlock.raw_lock is cached in register, despite that try_cmpxchg64_relaxed
modifies the memory of old.lock_count with new value?