Re: [RFCv2] string: Use faster alternatives when constant arguments are used
From: Sultan Alsawaf
Date: Mon Apr 01 2019 - 19:55:43 EST
On Mon, Apr 01, 2019 at 10:43:13PM +0200, Rasmus Villemoes wrote:
> Consider your patch replacing !strcmp(buf, "123") by !memcmp(buf, "123",
> 4). buf is known to point to a nul-terminated string. But it may point
> at, say, the second-last byte in a page, with the last byte in that page
> being a nul byte, and the following page being MMIO or unmapped or all
> kinds of bad things. On e.g. x86 where unaligned accesses are cheap, and
> seeing that you're only comparing for equality, gcc is likely to compile
> the memcmp version into
>
> *(u32*)buf == 0x00333231
>
> because you've told the compiler that there's no problem accessing four
> bytes starting at buf. Boom. Even without unaligned access being cheap
> this can happen; suppose the length is 8 instead, and gcc somehow knows
> that buf is four-byte aligned (and in this case it happens to point four
> bytes before a page boundary), so it could compile the memcmp(,,8) into
>
> *(u32*)(buf+4) == secondword && *(u32*)buf == firstword
>
> (or do the comparisons in the "natural" order, but it might still do
> both loads first).
Ok, this is the first solid counter I've seen against my patch. I hadn't
considered unaligned word-sized accesses. Thanks.
> > And if there are concerns for some arches but not others, then couldn't this be
> > a feasible optimization for those which would work well with it?
>
> No. First, these are concerns for all arches. Second, if you can find
> some particular place where string parsing/matching is in any way
> performance relevant and not just done once during driver init or
> whatnot, maybe the maintainers of that file would take a patch
> hand-optimizing some strcmps to memcmps, or, depending on what the code
> does, perhaps replacing the whole *cmp logic with a custom hash table.
FWIW, I didn't have a specific place in the kernel in mind that heavily relied
on str* operations and needed optimization. I just thought it could be a "free"
optimization, but apparently not.
> But a patch implicitly and silently touching thousands of lines of code,
> without an analysis of why none of the above is a problem for any of
> those lines, for any .config, arch, compiler version? No.
Well, I thought it could be proven to be correct regardless of the context,
which would imply that making such a global change touching thousands lof lines
of code would be safe. But sadly it isn't.
This does bring into question a greater problem though: usage of memcmp
throughout the kernel where the size provided is not the size of the smaller
container being compared. This usage of memcmp (which is quite easy to find all
across the kernel) could run into the unaligned memory access across a page
boundary that you gave as an example, no?
Sultan