Re: [PATCH v3] kasan: add memory corruption identification for software tag-based mode

From: Walter Wu
Date: Mon Jul 22 2019 - 05:52:52 EST


On Thu, 2019-07-18 at 19:11 +0300, Andrey Ryabinin wrote:
>
> On 7/15/19 6:06 AM, Walter Wu wrote:
> > On Fri, 2019-07-12 at 13:52 +0300, Andrey Ryabinin wrote:
> >>
> >> On 7/11/19 1:06 PM, Walter Wu wrote:
> >>> On Wed, 2019-07-10 at 21:24 +0300, Andrey Ryabinin wrote:
> >>>>
> >>>> On 7/9/19 5:53 AM, Walter Wu wrote:
> >>>>> On Mon, 2019-07-08 at 19:33 +0300, Andrey Ryabinin wrote:
> >>>>>>
> >>>>>> On 7/5/19 4:34 PM, Dmitry Vyukov wrote:
> >>>>>>> On Mon, Jul 1, 2019 at 11:56 AM Walter Wu <walter-zh.wu@xxxxxxxxxxxx> wrote:
> >>>>
> >>>>>>>
> >>>>>>> Sorry for the delays. I am overwhelmed by some urgent work. I am
> >>>>>>> afraid to promise any dates because next week I am at a conference,
> >>>>>>> then again a backlog and an intern starting...
> >>>>>>>
> >>>>>>> Andrey, do you still have concerns re this patch? This change allows
> >>>>>>> printing the free stack.
> >>>>>>
> >>>>>> I'm not sure that quarantine is the best way to do that. Quarantine is made to delay freeing, but we don't do that here.
> >>>>>> If we want to remember more free stacks, wouldn't it be easier simply to remember more stacks in the object itself?
> >>>>>> The same goes for previously used tags, for better use-after-free identification.
> >>>>>>
> >>>>>
> >>>>> Hi Andrey,
> >>>>>
> >>>>> We once tried using the object itself to identify use-after-free,
> >>>>> but tag-based KASAN releases the pointer immediately after kfree()
> >>>>> is called, so the original object can be reused by another pointer.
> >>>>> If we rely on the object itself to identify a use-after-free, there
> >>>>> are many false negatives. So we created a lite quarantine (ring
> >>>>> buffers) to record recent free stacks in order to avoid those false
> >>>>> negatives.
> >>>>
> >>>> I'm saying that *more* than one free stack, and also tags, can be stored per object.
> >>>> If the object is reused, we would still have information about the last n usages of the object.
> >>>> It seems like a much easier and more efficient solution than the patch you are proposing.
> >>>>
> >>> For the object to be reused, we must ensure that no other pointer
> >>> uses it after kfree() releases the pointer.
> >>> Scenario:
> >>> 1). The reused object's information is valid when no other pointer
> >>> uses it.
> >>> 2). The reused object's information is invalid when another pointer
> >>> uses it.
> >>> Do you mean that the reused object is scenario 1)?
> >>> If yes, maybe we can change where quarantine_put() is called. That
> >>> would make full use of the quarantine, but scenario 2) still looks
> >>> like it needs this patch.
> >>> If no, maybe I misunderstand you. Would you tell me how to use the
> >>> invalid object information?
> >>>
> >>
> >>
> >> KASAN keeps information about an object with the object, right after the payload, in the kasan_alloc_meta struct.
> >> This information is always valid as long as the slab page is allocated. Currently it keeps only the one last free stacktrace.
> >> It could be extended to record more free stacktraces and also record previously used tags, which would allow you
> >> to identify use-after-free and extract the right free stacktrace.
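
For illustration, the per-object history described above could look
roughly like the userspace sketch below. It is not kernel code: the
names free_history, free_record, history_record_free() and
history_lookup(), and the depth of 5, are hypothetical, and struct
track is only a stand-in that assumes two 32-bit fields (pid and a
stack depot handle), as in struct kasan_track.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct kasan_track (assumed: 32-bit pid + 32-bit handle). */
struct track {
	uint32_t pid;
	uint32_t stack;		/* stack depot handle */
};

#define FREE_HISTORY 5		/* hypothetical number of remembered frees */

/* One remembered free: the tag the object carried, plus who freed it. */
struct free_record {
	uint8_t tag;
	struct track track;
};

/* Small ring of the most recent frees, stored next to the object. */
struct free_history {
	struct free_record rec[FREE_HISTORY];
	uint8_t next;		/* slot the next free will overwrite */
	uint8_t count;		/* number of valid slots, at most FREE_HISTORY */
};

/* Remember one more free, overwriting the oldest record when full. */
static void history_record_free(struct free_history *h, uint8_t tag,
				uint32_t pid, uint32_t stack)
{
	assert(h != NULL);
	h->rec[h->next].tag = tag;
	h->rec[h->next].track.pid = pid;
	h->rec[h->next].track.stack = stack;
	h->next = (uint8_t)((h->next + 1) % FREE_HISTORY);
	if (h->count < FREE_HISTORY)
		h->count++;
}

/*
 * Return the most recent free whose tag matches the tag of the faulting
 * pointer, or NULL if that free has already been overwritten.
 */
static const struct track *history_lookup(const struct free_history *h,
					  uint8_t ptr_tag)
{
	assert(h != NULL);
	for (uint8_t i = 0; i < h->count; i++) {
		unsigned int idx = (h->next + FREE_HISTORY - 1u - i) % FREE_HISTORY;

		if (h->rec[idx].tag == ptr_tag)
			return &h->rec[idx].track;
	}
	return NULL;
}
```

A report path would call history_lookup() with the tag taken from the
bad pointer; a NULL result means the matching free is older than the
last FREE_HISTORY frees of that object.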
> >
> > Thanks for your explanation.
> >
> > For extending the slub object: if one record is 9B (sizeof(u8) +
> > sizeof(struct kasan_track)) and we add five records to each slub
> > object, every slub object may use an extra 45B after the system has
> > run for a while.
> > The number of slub objects easily exceeds 1,000,000 (it may be much
> > larger), so the extra per-object memory usage would be 45MB, and
> > unfortunately it has no upper limit. That memory usage is larger than
> > with our patch.
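
The estimate above can be checked with a few lines of C. This is only
a back-of-the-envelope sketch: struct track is a stand-in that assumes
struct kasan_track is two 32-bit fields, and RECORD_BYTES,
record_unpacked and history_overhead_bytes() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct kasan_track (assumed: 32-bit pid + 32-bit handle). */
struct track {
	uint32_t pid;
	uint32_t stack;
};

/* One record as counted in the estimate: tag byte + track, no padding. */
#define RECORD_BYTES (sizeof(uint8_t) + sizeof(struct track))

/* The same record as a plain struct; the compiler may add padding. */
struct record_unpacked {
	uint8_t tag;
	struct track track;
};

/* Total extra bytes if every object carries records_per_object records. */
static uint64_t history_overhead_bytes(uint64_t nr_objects,
				       unsigned int records_per_object,
				       size_t record_bytes)
{
	assert(record_bytes > 0);
	return nr_objects * records_per_object * (uint64_t)record_bytes;
}
```

With 1,000,000 objects and five 9-byte records, the function returns
45,000,000 bytes, which matches the 45MB figure. Note that
sizeof(struct record_unpacked) is usually larger than 9 on common ABIs
because of alignment padding, so the real cost depends on how the
records are laid out.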
>
> No, it's not necessarily more.
> And there are other aspects to consider, such as performance and how simple and reliable the code is.
>
> >
> > We hope the advantage of tag-based KASAN is smaller memory usage. If
> > it's possible, we should spend less memory in order to identify
> > use-after-free. Would you accept our patch after we fine-tune it?
>
> Sure, if you manage to fix issues and demonstrate that performance penalty of your
> patch is close to zero.


I recall that you have already listed your concerns. Maybe we can try
to solve those problems one by one.

1. The deadlock issue, caused by calling kmalloc() after kfree()?
2. Reducing allocation failures, by changing the GFP_NOWAIT flag to
GFP_KERNEL?
3. Checking whether we can slim down the 48 bytes (sizeof(qlist_object) +
sizeof(kasan_alloc_meta)) and the additional unique stacktrace in
stackdepot?
4. The duplicated struct 'kasan_track' information in two different places.

Do you have any other concerns?