Re: [PATCH v4] time/tick-sched: idle load balancing when nohz_full cpu becomes idle.
From: Peter Zijlstra
Date: Thu May 16 2024 - 03:57:05 EST
On Thu, May 16, 2024 at 12:52:06AM +0200, Frederic Weisbecker wrote:
> On Thu, May 09, 2024 at 10:29:32AM +0100, Levi Yun wrote:
> > When a nohz_full CPU stops its tick in tick_nohz_irq_exit(),
> > it won't be chosen to perform idle load balancing, because it doesn't
> > call nohz_balance_enter_idle() in tick_nohz_idle_stop_tick() when it
> > becomes idle.
> >
> > Formerly, __tick_nohz_idle_enter() was called from both
> > tick_nohz_irq_exit() and do_idle().
> > That's why commit a0db971e4eb6 ("nohz: Move idle balancer registration
> > to the idle path") prevents a nohz_full CPU that isn't idle yet, but
> > whose tick is stopped, from entering idle balancing.
> >
> > However, this also prevents a nohz_full CPU whose tick is already stopped
> > from entering idle balancing when the CPU actually becomes idle.
> >
> > Currently, tick_nohz_idle_stop_tick() is only called in the idle state,
> > and it calls nohz_balance_enter_idle(). That function properly tracks
> > whether the CPU is part of nohz.idle_cpus_mask via rq->nohz_tick_stopped.
> >
> > Therefore, change tick_nohz_idle_stop_tick() to call nohz_balance_enter_idle()
> > without checking !was_stopped, so that a nohz_full CPU can be chosen to
> > perform idle load balancing when it enters the idle state.
> >
> > Fixes: a0db971e4eb6 ("nohz: Move idle balancer registration to the idle path")
> > Signed-off-by: Levi Yun <ppbuk5246@xxxxxxxxx>
> > ---
> > v4:
> > - Add Fixes tag.
> >
> > v3:
> > - Reword commit message.
> >
> > v2:
> > - Fix typos in commit message.
> >
> > kernel/time/tick-sched.c | 6 ++++--
> > 1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > index 71a792cd8936..31a4cd89782f 100644
> > --- a/kernel/time/tick-sched.c
> > +++ b/kernel/time/tick-sched.c
> > @@ -1228,8 +1228,10 @@ void tick_nohz_idle_stop_tick(void)
> >  		ts->idle_sleeps++;
> >  		ts->idle_expires = expires;
> >
> > -		if (!was_stopped && tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
> > -			ts->idle_jiffies = ts->last_jiffies;
> > +		if (tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
> > +			if (!was_stopped)
> > +				ts->idle_jiffies = ts->last_jiffies;
> > +
>
> I've taken some time to respond because your patch raised more questions
> while I was discussing it with Anna-Maria:
>
> 1) Is idle load balancing actually relevant for nohz_full? HK_TYPE_MISC already
>    prevents those CPUs from becoming the idle load balancer. They can still be
>    targets of load balancing, but nohz_full CPUs are supposed to run only one
>    task.
>
> 2) This is related to the previous point: HK_TYPE_SCHED is never activated. It
>    would prevent the CPU from even being part of idle load balancing. Should we
>    remove it or plug it in?
>
>
> 3) nohz_balance_enter_idle() is called when the tick is stopped for the first
>    time, and nohz_balance_exit_idle() is called from the tick. But the latter
>    also applies to idle ticks. So if load balancing triggers while the tick is
>    stopped, nohz_balance_enter_idle() won't be called again in the idle loop,
>    even though the tick is stopped (your patch would fix that).
>
> 4) Why is nohz_balance_exit_idle() called from the tick and not from the idle
> exit path? Is it to avoid overhead?
>
> I'm adding some scheduler people to Cc who might help answer some of those
> questions.

None of that HK nonsense is relevant. The NOHZ_FULL nonsense implies
single-CPU partitions, and *that* should be avoiding any and all
load-balancing.

If there is still any, that's a bug, but that's not related to HK goo.

As such, I don't think the HK_TYPE_SCHED check in
nohz_balance_enter_idle() actually makes sense; the on_null_domain()
check a little below that should already take care of things, no?