Re: [patch,rfc] cfq: merge cooperating cfq_queues
From: Corrado Zoccolo
Date: Thu Oct 22 2009 - 04:45:40 EST
Hi
On Thu, Oct 22, 2009 at 2:09 AM, Jeff Moyer <jmoyer@xxxxxxxxxx> wrote:
> Corrado Zoccolo <czoccolo@xxxxxxxxx> writes:
>
> Hi, Corrado! Thanks for looking at the patch.
>
>> Hi Jeff,
> [...]
>> I'm not sure that 3 broken userspace programs justify increasing the
>> complexity of a core kernel part as the I/O scheduler.
>
> I think it's wrong to call the userspace programs broken. They worked
> fine when CFQ was quantum based, and they work well with noop and
> deadline.
So they didn't work well with anticipatory, which was the default from
2.6.0 to 2.6.17, nor with CFQ with time slices, which has been the
default from 2.6.18 up to now.
I think enough time has passed to start fixing those programs.
> Further, the patch I posted is fairly trivial, in my opinion.
Yes. We should then see whether the un-merging part is just as simple.
>> The original close cooperator code is not limited to those programs.
>> It can actually result in a better overall scheduling on rotating
>> media, since it can help with transient close relationships (and
>> should probably be disabled on non-rotating ones).
>> Merging queues, instead, can lead to bad results in case of false
>> positives. I'm thinking for examples to two programs that are loading
>> shared libraries (that are close on disk, being in the same dir) on
>> startup, and end up being tied to the same queue.
>
> The idea is not to leave cfqq's merged indefinitely. I'm putting
> together a follow-on patch that will split the queues back up when they
> are no longer working on the same area of the disk.
>
Yes, this would help to mitigate the impact of false positives.
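To make the merge/split idea concrete, here is a minimal sketch of the kind of
decision involved. It is not the code from the patch: the names (queue_pos,
decide_merge, CLOSE_THRESHOLD_SECTORS) and the threshold value are assumptions
made up for illustration. Two queues are merged when their last positions are
close on disk, and split again once they drift apart, which bounds how long a
false positive can last.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical closeness threshold, in sectors (an assumption, not a CFQ value). */
#define CLOSE_THRESHOLD_SECTORS 8192u

/* Hypothetical per-queue state: the sector of the last request it issued. */
struct queue_pos {
    uint64_t last_sector;
};

enum merge_action { KEEP_AS_IS, MERGE, SPLIT };

/* Absolute distance between two sectors, safe for unsigned values. */
static uint64_t sector_distance(uint64_t a, uint64_t b)
{
    return a > b ? a - b : b - a;
}

/* Two queues are "close" if their last requests landed near each other. */
bool queues_are_close(const struct queue_pos *a, const struct queue_pos *b)
{
    return sector_distance(a->last_sector, b->last_sector)
           <= CLOSE_THRESHOLD_SECTORS;
}

/*
 * Decide what to do with a pair of queues:
 *  - not merged and close  -> MERGE
 *  - merged and no longer close -> SPLIT (undoes a false positive)
 *  - otherwise leave them alone
 */
enum merge_action decide_merge(bool currently_merged,
                               const struct queue_pos *a,
                               const struct queue_pos *b)
{
    bool close = queues_are_close(a, b);

    if (!currently_merged && close)
        return MERGE;
    if (currently_merged && !close)
        return SPLIT;
    return KEEP_AS_IS;
}
```

In the shared-library example, the two programs would be merged while loading
from the same directory and split as soon as their I/O moves to different
areas of the disk.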
>> Can't the userspace programs be fixed to use the same I/O context for
>> their threads?
>> qemu already has a bug report for it
>> (https://bugzilla.redhat.com/show_bug.cgi?id=498242).
>
> I submitted a patch to dump to address this. I think the SCSI target
> mode driver folks also patched their code. The qemu folks are working
> on a couple of different fixes to the problem. That leaves nfsd, which
> I could certainly try to whip into shape, but I wonder if there are
> others.
>
Good.
>
>> For the I/O pattern, instead, sorting all requests in a single queue
>> may still be preferable, since they will be at least sorted in disk
>> order, instead of the random order given by which thread in the pool
>> received the request.
>> This is, though, an argument in favor of using CLONE_IO inside nfsd,
>> since having a single queue, with proper priority, will always give a
>> better overall performance.
>
> Well, I started to work on a patch to nfsd that would share and unshare
> I/O contexts based on the client with which the request was associated.
> So, much like there is the shared readahead state, there would now be a
> shared I/O scheduler state. However, believe it or not, it is much
> simpler to do in the I/O scheduler. But maybe that's because cfq is my
> hammer. ;-)
I think fixing nfsd, at least for TCP, should be easy. In the TCP case,
each client has a private thread pool, so you can just share the I/O
context once, when creating those threads, and forget about it.
For the UDP case, would just reducing the idle window fix the problem?
Or is the problem not really the idling, but the bad I/O pattern?
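For reference, this is roughly what "share the I/O context once, when creating
the threads" looks like from userspace, using clone(2) with CLONE_IO. It is
only a sketch of the userspace analogue (the kind of fix that applies to
programs such as qemu or dump); nfsd threads are kernel threads and would need
the equivalent done inside the kernel. The function and macro names below are
made up for the example, and the worker body is a placeholder.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* needed for clone() and CLONE_IO in <sched.h> */
#endif
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define WORKER_STACK_SIZE (256 * 1024)   /* arbitrary size for the sketch */

/* Placeholder worker body; a real program would issue its I/O here. */
static int worker_main(void *arg)
{
    int *done = arg;

    *done = 1;   /* visible to the parent because of CLONE_VM */
    return 0;
}

/*
 * Start one worker that shares the caller's I/O context, wait for it to
 * exit, and return 0 on success or -1 on failure.
 */
int run_worker_with_shared_io_context(void)
{
    int done = 0;
    int status = 0;
    pid_t pid;
    char *stack = malloc(WORKER_STACK_SIZE);

    if (stack == NULL)
        return -1;

    /*
     * CLONE_IO makes the child share the parent's I/O context, so the
     * I/O scheduler sees both as a single entity.  The stack grows down
     * on most architectures, so the top of the buffer is passed.
     */
    pid = clone(worker_main, stack + WORKER_STACK_SIZE,
                CLONE_VM | CLONE_IO | SIGCHLD, &done);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    if (waitpid(pid, &status, 0) == -1) {
        free(stack);
        return -1;
    }
    free(stack);

    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0 || !done)
        return -1;
    return 0;
}
```

The flag is set once at creation time, so no further bookkeeping is needed
for the lifetime of the worker.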
>
> Thanks again for your review Corrado. It is much appreciated.
Thanks.
Corrado
> Cheers,
> Jeff
>