Re: [RFC][PATCH 13/16] sched: Add core wide task selection and scheduling.

From: Aaron Lu
Date: Fri Apr 05 2019 - 10:58:19 EST


On Tue, Apr 02, 2019 at 10:28:12AM +0200, Peter Zijlstra wrote:
> Another approach would be something like the below:
>
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -87,7 +87,7 @@ static inline int __task_prio(struct tas
> */
>
> /* real prio, less is less */
> -static inline bool __prio_less(struct task_struct *a, struct task_struct *b, bool runtime)
> +static inline bool __prio_less(struct task_struct *a, struct task_struct *b, u64 vruntime)
> {
> int pa = __task_prio(a), pb = __task_prio(b);
>
> @@ -104,21 +104,25 @@ static inline bool __prio_less(struct ta
> if (pa == -1) /* dl_prio() doesn't work because of stop_class above */
> return !dl_time_before(a->dl.deadline, b->dl.deadline);
>
> - if (pa == MAX_RT_PRIO + MAX_NICE && runtime) /* fair */
> - return !((s64)(a->se.vruntime - b->se.vruntime) < 0);
> + if (pa == MAX_RT_PRIO + MAX_NICE) /* fair */
> + return !((s64)(a->se.vruntime - vruntime) < 0);
~~~
I think <= should be used here, so that two tasks with the same vruntime
will return false. Or we could bounce two tasks having different tags
with one set to max in the first round and the other set to max in the
next round. CPU would stuck in __schedule() with irq disabled.

>
> return false;
> }
>
> static inline bool cpu_prio_less(struct task_struct *a, struct task_struct *b)
> {
> - return __prio_less(a, b, true);
> + return __prio_less(a, b, b->se.vruntime);
> }
>
> static inline bool core_prio_less(struct task_struct *a, struct task_struct *b)
> {
> - /* cannot compare vruntime across CPUs */
> - return __prio_less(a, b, false);
> + u64 vruntime = b->se.vruntime;
> +
> + vruntime -= task_rq(b)->cfs.min_vruntime;
> + vruntime += task_rq(a)->cfs.min_vruntime

After some testing, I figured task_cfs_rq() should be used instead of
task_rq(:-)

With the two changes(and some other minor ones that still need more time
to sort out), I'm now able to start doing 2 full CPU kbuilds in 2 tagged
cgroups. Previouslly, the system would hang pretty soon after I started
kbuild in any tagged cgroup(presumbly, CPUs stucked in __schedule() with
irqs disabled).

And there is no warning appeared due to two tasks having different tags
get scheduled on the same CPU.

Thanks,
Aaron

> +
> + return __prio_less(a, b, vruntime);
> }
>
> static inline bool __sched_core_less(struct task_struct *a, struct task_struct *b)