Re: [PATCH v4] time/tick-sched: idle load balancing when nohz_full cpu becomes idle.
From: Peter Zijlstra
Date: Thu May 16 2024 - 10:45:19 EST
On Thu, May 16, 2024 at 04:23:31PM +0200, Frederic Weisbecker wrote:
> On Thu, May 16, 2024 at 04:00:03PM +0200, Peter Zijlstra wrote:
> > > If I make you annoyed I'm sorry in advance but let me clarify please.
> > >
> > > 1. In the case of a non-HK-TICK-housekeeping cpu (a.k.a. nohz_full cpu),
> > > it should be on the null_domain, right?
> > >
> > > 2. If (1) is true, when a cpu is non-HK-TICK, should it also be made
> > > non-HK-DOMAIN to keep it off any sched_domain (cpusets filter out
> > > non-HK-DOMAIN cpus)?
> > >
> > > 3. If (1) is true, is HK_SCHED still necessary? There seems to be no use case
> > > and the check for this can be replaced by on_null_domain().
> >
> > I've no idea about all those HK knobs, it's all insane if you ask me.
> >
> > Frederic, afaict all the HK_ goo in kernel/sched/fair.c is total
> > nonsense, can you please explain?
>
> Yes. Lemme unearth this patch:
> https://lore.kernel.org/all/20230203232409.163847-2-frederic@xxxxxxxxxx/
AFAICT we need more cleanups.
> Because all we need now is:
>
> _ HK_TYPE_KERNEL_NOISE: nohz_full= or isolcpus=nohz
> _ HK_TYPE_DOMAIN: isolcpus=domain (or classic isolcpus= alone)
What does this do?
> _ HK_TYPE_MANAGED_IRQ: isolcpus=managed_irq
>
> And that's it. Then let's remove HK_TYPE_SCHED that is unused. And then
> lemme comment the HK_TYPE_* uses within sched/* within the same
> patchset.
Please, I find this MISC and DOMAIN stuff confusing, wth does it do? It
can't possibly be right.
> Just a question, correct me if I'm wrong, we don't want nohz_full= to ever
> take the idle load balancer duty (this is what HK_TYPE_MISC prevents in
> find_new_ilb) because the nohz_full CPU going back to userspace concurrently
> doesn't want to be disturbed by a loose IPI telling it to do idle balancing. But
> we still want nohz_full CPUs to be part of nohz.idle_cpus_mask so that the
> idle balancing can be performed on them by a non isolated CPU. Right?
I'm confused, none of that makes sense. If you're part of a
load-balancer, you're part of a load-balancer, no ifs, buts or other
nonsense.
The idle load balancer is no different from regular load balancing.
Fundamentally, you can't disable the tick if you're part of a
load-balance group, the load-balancer needs the tick.
The only possible way to use nohz_full is to not be part of a
load-balancer, and the only way to achieve that is by having (lots of)
single-CPU partitions.
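
For readers following along, the behaviour Frederic describes above (nohz_full
CPUs stay in the idle mask so they can be balanced on behalf of, but are never
picked as the idle load balancer) can be modelled with a toy userspace sketch.
This is not the kernel's find_new_ilb(); pick_ilb(), the 64-bit masks and the
CPU numbering are all hypothetical and exist only for illustration.

```c
#include <stdint.h>

/*
 * Toy model, NOT kernel code: one bit per CPU in a 64-bit word.
 * pick_ilb() is a hypothetical helper that returns the lowest-numbered
 * CPU that is idle, is a housekeeping CPU, and is not the caller.
 * It returns -1 when no such CPU exists.
 */
static int pick_ilb(uint64_t idle_mask, uint64_t hk_mask, int self)
{
    /* Only idle housekeeping CPUs other than ourselves are eligible. */
    uint64_t candidates = idle_mask & hk_mask & ~(1ULL << self);

    for (int cpu = 0; cpu < 64; cpu++) {
        if (candidates & (1ULL << cpu))
            return cpu;
    }
    return -1;
}
```

With CPUs 0-1 as housekeeping (hk_mask 0x3) and CPUs 1-3 idle (idle_mask 0xE),
a call from CPU 0 selects CPU 1. If only the nohz_full CPUs 2-3 are idle
(idle_mask 0xC), nothing is selected, even though both remain in the idle mask.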