Re: [PATCH] jbd2: do not start commit when t_updates does not backtozero
From: Theodore Ts'o
Date: Mon Apr 01 2019 - 09:20:20 EST
On Mon, Apr 01, 2019 at 10:35:04AM +0800, liu.song11@xxxxxxxxxx wrote:
>
> Our device is CF card(TS8GCF300), mount options are very general(rw,dirsync,
> relatime,data=ordered).
> The hung problem appears under ext4, but the reason is related to the way
> of use. In our system, there are many RT tasks, which make normal priority
> tasks survived in harsh environments, such as syslogd. The syslog record is
> also under the same device, which is really a stumbling block.
> We moved the location of the syslog record to another device and the hungtask
> problem was solved.
So the general advice, which is going to be true for all file systems,
is (a) don't try to do any file I/O from real-time tasks; (b) if
you must do file I/O from a real-time task, be prepared to accept
your real-time tasks blocking behind device I/O, thus destroying
your real-time guarantees; and (c) make sure any kernel threads used
by the file system (e.g., the jbd2 kernel thread for ext4) are also
given real-time priority.
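To make (c) concrete, here is a minimal sketch of checking and changing a thread's scheduling class from userspace, assuming Linux and the standard sched_getscheduler(2)/sched_setscheduler(2) calls. The helper names (policy_is_realtime, pid_is_realtime, make_realtime) are made up for illustration, and the pid of the jbd2 thread has to be looked up separately (e.g., with ps). Changing the policy of another task needs CAP_SYS_NICE, so this is a starting point rather than a tested recipe.

```c
#include <sched.h>
#include <sys/types.h>

/* Returns 1 if the policy is one of the POSIX real-time classes, else 0. */
int policy_is_realtime(int policy)
{
#ifdef SCHED_RESET_ON_FORK
    /* Linux may OR this flag into the value sched_getscheduler() returns. */
    policy &= ~SCHED_RESET_ON_FORK;
#endif
    return policy == SCHED_FIFO || policy == SCHED_RR;
}

/* Returns 1 if pid runs in a real-time class, 0 if not, -1 on error. */
int pid_is_realtime(pid_t pid)
{
    int policy = sched_getscheduler(pid);

    if (policy < 0)
        return -1;
    return policy_is_realtime(policy);
}

/*
 * Hypothetical helper: move pid (for example the jbd2 kernel thread)
 * into SCHED_FIFO at the given priority. Returns 0 on success, -1 on
 * error with errno set (EPERM without CAP_SYS_NICE).
 */
int make_realtime(pid_t pid, int prio)
{
    struct sched_param sp = { .sched_priority = prio };

    return sched_setscheduler(pid, SCHED_FIFO, &sp);
}
```

The same change can usually be made without writing any code via chrt(1) from util-linux, e.g. "chrt -f -p <prio> <pid>". Which priority is appropriate relative to your own real-time tasks is a system design decision that this sketch does not answer.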
Was syslogd being run with real-time priority? If not, you're not
really going to have real-time performance unless you make sure syslog(3)
calls don't block waiting for syslogd to acknowledge the write. See
syslog-async as referenced here[1].
[1] https://stackoverflow.com/questions/208098/can-syslog-performance-be-improved
What I suspect was happening was you were using standard syslog(3),
which was blocking for syslogd to respond; syslogd was by default
trying to fsync every single log entry before returning success (this
can be changed by making the appropriate change to syslog.conf; that's a
different change suggested by [1] above), and so your real-time task
that was calling syslog was blocking. Since it was a real-time task,
and the jbd2 kernel thread was not a real-time thread, this caused a
deadlock.
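One common way to keep the real-time task from ever waiting on syslogd is to have it hand messages to a normal-priority logger thread through a queue that never blocks. The following is a rough sketch of such a queue, assuming exactly one producer (the real-time task), exactly one consumer (the logger thread), and a C11 compiler with <stdatomic.h>. All names here (struct logq, logq_push, logq_pop, the sizes) are invented for illustration; this is not how syslog-async is implemented.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define LOGQ_SLOTS   64   /* must be a power of two */
#define LOGQ_MSG_MAX 128  /* longer messages are truncated */

/* Single-producer, single-consumer ring; zero-initialize before use. */
struct logq {
    _Atomic size_t head;  /* next slot to write, advanced by producer */
    _Atomic size_t tail;  /* next slot to read, advanced by consumer */
    char slots[LOGQ_SLOTS][LOGQ_MSG_MAX];
};

/*
 * Called from the real-time task. Makes no system calls and never
 * waits: if the ring is full the message is dropped and false is
 * returned.
 */
bool logq_push(struct logq *q, const char *msg)
{
    size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    char *slot;

    if (head - tail == LOGQ_SLOTS)
        return false;

    slot = q->slots[head % LOGQ_SLOTS];
    strncpy(slot, msg, LOGQ_MSG_MAX - 1);
    slot[LOGQ_MSG_MAX - 1] = '\0';

    /* Publish the slot only after its contents are written. */
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return true;
}

/*
 * Called from the normal-priority logger thread. Copies the oldest
 * message into out and returns true, or returns false if empty.
 */
bool logq_pop(struct logq *q, char *out, size_t outlen)
{
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&q->head, memory_order_acquire);

    if (head == tail)
        return false;

    if (outlen > 0) {
        strncpy(out, q->slots[tail % LOGQ_SLOTS], outlen - 1);
        out[outlen - 1] = '\0';
    }

    /* Release the slot back to the producer after copying it out. */
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return true;
}
```

The logger thread would loop on logq_pop() and pass each message to syslog(3), e.g. syslog(LOG_INFO, "%s", buf), so only that thread can end up blocked behind syslogd and the disk. The price is that messages are lost when the ring fills up, which for a real-time task is normally the better failure mode.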
There are multiple things you can try to optimize (and with real-time
systems, getting configuration right is really, REALLY, critical), but
it sounds like the real root cause is you have a real-time task using
syslog(3). Don't do that. It will probably cause you problems in
multiple dimensions.
- Ted
P.S. Especially don't try using syslog in real-time tasks if said
real-time system is going to be used in commercial aviation. It might
cause scandals a la the 737 MAX. :-)