[patch -rt 17/17] sched: Fix dynamic power-balancing crash

From: dino
Date: Thu Oct 22 2009 - 08:43:30 EST



This crash:

[ 1774.088275] divide error: 0000 [#1] SMP
[ 1774.100355] CPU 13
[ 1774.102498] Modules linked in:
[ 1774.105631] Pid: 30881, comm: hackbench Not tainted 2.6.31-rc8-tip-01308-g484d664-dirty #1629 X8DTN
[ 1774.114807] RIP: 0010:[<ffffffff81041c38>] [<ffffffff81041c38>]
sched_balance_self+0x19b/0x2d4

Triggers because update_group_power() modifies the sd tree and does
temporary calculations there - not considering that other CPUs
could observe intermediate values, such as the zero initial value.

Calculate it in a temporary variable instead. (we need no memory
barrier as these are all statistical values anyway)

Got the same oops with the backport to -rt
Signed-off-by: Dinakar Guniguntala <dino@xxxxxxxxxx>

Index: linux-2.6.31.4-rt14-lb1/kernel/sched.c
===================================================================
--- linux-2.6.31.4-rt14-lb1.orig/kernel/sched.c 2009-10-21 10:49:03.000000000 -0400
+++ linux-2.6.31.4-rt14-lb1/kernel/sched.c 2009-10-22 01:48:41.000000000 -0400
@@ -3864,19 +3864,22 @@
{
struct sched_domain *child = sd->child;
struct sched_group *group, *sdg = sd->groups;
+ unsigned long power;

if (!child) {
update_cpu_power(sd, cpu);
return;
}

- sdg->cpu_power = 0;
+ power = 0;

group = child->groups;
do {
- sdg->cpu_power += group->cpu_power;
+ power += group->cpu_power;
group = group->next;
} while (group != child->groups);
+
+ sdg->cpu_power = power;
}

/**

--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/