Re: [PATCH] sched/cputime: make scale_stime() more precise
From: Oleg Nesterov
Date: Tue Jul 23 2019 - 10:00:48 EST
On 07/22, Peter Zijlstra wrote:
>
> On Fri, Jul 19, 2019 at 04:37:42PM +0200, Oleg Nesterov wrote:
> > On 07/19, Peter Zijlstra wrote:
>
> > > But I'm still confused, since in the long run, it should still end up
> > > with a proportionally divided user/system, irrespective of some short
> > > term wobblies.
> >
> > Why?
> >
> > Yes, statistically the numbers are proportionally divided.
>
> This; due to the loss in precision the distribution is like a step
> function around the actual s:u ratio line, but on average it still is
> s:u.
You know, I am no longer sure... perhaps it is even worse; I need to recheck.
> Even if it were a perfect function, we'd still see increments in stime even
> if the current program state never does syscalls, simply because it
> needs to stay on that s:u line.
>
> > but you will (probably) never see the real stime == 1000 && utime == 10000
> > numbers if you watch incrementally.
>
> See, there are no 'real' stime and utime numbers. What we have are user
> and system samples -- tick based.
Yes, yes, I know.
> Sure, we take a shortcut, it wobbles a bit, but seriously, the samples
> are inaccurate anyway, so who bloody cares :-)
...
> People always complain, just tell em to go pound sand :-)
I tried ;) That was my initial reaction to this bug report.
However,
> You can construct a program that runs 99% in userspace but has all
> system samples.
Yes, but with the current implementation you do not need to construct
such a program; this is what you can easily get "in practice". And this
confuses people.
They can watch /proc/pid/stat incrementally and (when the numbers are big)
find that a program that runs 100% in userspace somehow spends 10 minutes
almost entirely in the kernel, or at least more time in the kernel than in
userspace, even if task->stime doesn't grow at all.
Oleg.