Re: [RFC] Mitigating unexpected arithmetic overflow

From: Theodore Ts'o
Date: Fri May 17 2024 - 22:52:21 EST


On Fri, May 17, 2024 at 02:15:01PM -0700, Kees Cook wrote:
> On Thu, May 16, 2024 at 02:51:34PM -0600, Theodore Ts'o wrote:
> > On Thu, May 16, 2024 at 12:48:47PM -0700, Justin Stitt wrote:
> > >
> > > It is incredibly important that the exact opposite approach is taken;
> > > we need to be annotating (or adding type qualifiers to) the _expected_
> > > overflow cases. The omniscience required to go and properly annotate
> > > all the spots that will cause problems would suggest that we should
> > > just fix the bug outright. If only it was that easy.
> >
> > It certainly isn't easy, yes. But the problem is when you dump a huge
> > amount of work and pain onto kernel developers, when they haven't
> > signed up for it, when they don't necessarily have the time to do all
> > of the work themselves, and when their corporate overlords won't given
> > them the headcount to handle unfunded mandates which folks who are
> > looking for a bright new wonderful future --- don't be surprised if
> > kernel developers push back hard.
>
> I never claimed this to be some kind of "everyone has to stop and make
> these changes" event. I even talked about how it would be gradual and
> not break existing code (all "WARN and continue anyway"). This is what
> we've been doing with the array bounds work. Lots of people are helping
> with that, but a lot of those patches have been from Gustavo and me; we
> tried to keep the burden away from other developers as much as we could.

Kees, sorry, I wasn't intending to be criticism of your work; I know
you've been careful. What I was reacting was Justin's comment that
it's important to annotate every single expected overflow case. This
sounded like the very common attitude of "type II errors must be
avoided at all costs", even in that means a huge number of "type I
errors". And false positive errors invariably end up requiring work
on the part of over-worked maintainers.

>From my perspective, it's OK if we don't find all possible security
bugs if we can keep the false negative rate down to near-zero.
Because if it's too high, I will just start ignoring all of the
"security reports", just out of self-defense. This is now the case
with clusterfuzz reports example. The attemp[t to shame/pressure me
with "fix this RIGHT NOW or we will make the report public in 30 days"
no longer works on me, because the vast majority of the reports
against e2fsprogs are bullshit. And there are many syzkaller reports
which are also bullshit, and when they are reported against our data
center kernels, they get down-prioritized to P3 and ignored, because
they can't happen in real life. But they still get reported to
upstream (me).

This is why I am worried when there are proposals to add new
sanitizers; even if they are used responsibly by you, I can easily see
them resulting in more clusterfuzz reports (for example) that I will
have to ignore as bullshit. So please considerr this a plea to
**really** seriously reduce the false positive rate of sanitizers
whenever possible.

Cheers,

- Ted