Re: Bisected GPF in bfq_bfqq_expire on v5.1-rc1
From: Dmitrii Tcvetkov
Date: Thu Apr 04 2019 - 15:23:07 EST
On Mon, 1 Apr 2019 12:35:11 +0200
Paolo Valente <paolo.valente@xxxxxxxxxx> wrote:
>
>
> > On 1 Apr 2019, at 11:22, Dmitrii Tcvetkov
> > <demfloro@xxxxxxxxxxx> wrote:
> >
> > On Mon, 1 Apr 2019 11:01:27 +0200
> > Paolo Valente <paolo.valente@xxxxxxxxxx> wrote:
> >> Ok, thank you. Could you please do a
> >>
> >> list *(bfq_bfqq_expire+0x1f3)
> >>
> >> for me?
> >>
> >> Thanks,
> >> Paolo
> >>
> >>>
> >>> <gpf.txt><gpf-w-bfq-group-iosched.txt><config.txt>
> >
> > Reading symbols from vmlinux...done.
> > (gdb) list *(bfq_bfqq_expire+0x1f3)
> > 0xffffffff813d02c3 is in bfq_bfqq_expire (block/bfq-iosched.c:3390).
> > 3385 * even in case bfqq and thus parent entities go on receiving
> > 3386 * service with the same budget.
> > 3387 */
> > 3388 entity = entity->parent;
> > 3389 for_each_entity(entity)
> > 3390 entity->service = 0;
> > 3391 }
> > 3392
> > 3393 /*
> > 3394 * Budget timeout is not implemented through a dedicated timer, but
>
> Thank you very much. Unfortunately this doesn't ring any bell. I'm
> trying to reproduce the failure. It will probably take a little
> time. If I don't make it, I'll ask you to kindly retry after applying
> some instrumentation patch.
>
I looked at what git does just before the panic: it issues a lot of
lstat() syscalls on the working tree.

I've attached a Python script which reproduces the crash about 10
seconds after it has prepared testdir; git checkout
origin/linux-5.0.y reproduces it in about 2 seconds. I had to use a
multiprocessing Pool, as I couldn't reproduce the crash with
ThreadPool, probably because of the Python GIL.

#!/usr/bin/env python3
# Must be run as root: writes to /proc/sys/vm and /sys/block.
from glob import glob
from os import lstat,mkdir
from random import randint
from os.path import isdir,exists
from pathlib import Path
from time import sleep
from subprocess import run
from multiprocessing import Pool

def drop_caches():
    # '3' drops the page cache as well as dentries and inodes
    with open('/proc/sys/vm/drop_caches','w') as f:
        f.write('3')

def enable_bfq():
    # Select bfq as the I/O scheduler for sda
    with open('/sys/block/sda/queue/scheduler','w') as f:
        f.write('bfq')

def sync():
    run(('sync',))

def prepare_tree(name):
    # Build a random tree of empty files and directories, up to 6 levels deep
    def populate(dir, depth=6):
        if not depth:
            return
        for fname in range(1,20):
            if randint(0,100) > 80:
                dirname = "{}{}/".format(dir,fname)
                mkdir(dirname)
                populate(dirname, depth - 1)
                continue
            fname = "{}{}".format(dir, fname)
            Path(fname).touch(exist_ok=True)

    if not isdir(name):
        mkdir(name)
    if not name.endswith('/'):
        name = '{}/'.format(name)
    populate(name)

def traverse(dir):
    # Drop caches in every directory so that the lstat() calls go to disk
    drop_caches()
    for inode in glob("{}/*".format(dir)):
        if isdir(inode):
            traverse(inode)
        else:
            lstat(inode)
        if randint(0,10) > 6:
            sleep(0)

def main():
    nproc = 16
    dirname = 'testdir'
    if not exists(dirname):
        prepare_tree(dirname)
        sync()
        drop_caches()
    enable_bfq()
    drop_caches()
    # All worker processes walk the same tree concurrently
    with Pool(nproc) as pool:
        dirs = (dirname,) * nproc
        pool.map(traverse,dirs)

if __name__ == '__main__':
    main()
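
Since the script above only writes 'bfq' to the scheduler file and never
reads it back, it may be worth confirming that bfq is really the active
scheduler before concluding that a run did not reproduce the crash. The
kernel marks the active scheduler with square brackets when that sysfs
file is read. Below is a minimal sketch of such a check; the helper names
active_scheduler and bfq_is_active are hypothetical and not part of the
attached script, and the 'sda' path is simply taken from it.

```python
def active_scheduler(sysfs_text):
    """Return the scheduler name shown in [brackets], or None if absent."""
    for token in sysfs_text.split():
        if token.startswith('[') and token.endswith(']'):
            return token[1:-1]
    return None


def bfq_is_active(path='/sys/block/sda/queue/scheduler'):
    # 'sda' is an assumption carried over from the reproducer script;
    # adjust the path for the device actually under test.
    with open(path) as f:
        return active_scheduler(f.read()) == 'bfq'
```

Calling bfq_is_active() right after enable_bfq() would be one place to
use it.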