Re: [PATCH v1 0/3] Use BPF filters for a "perf top -u" workaround
From: Ian Rogers
Date: Thu May 16 2024 - 13:34:30 EST
On Wed, May 15, 2024 at 10:04 PM Ian Rogers <irogers@xxxxxxxxxx> wrote:
>
> On Wed, May 15, 2024 at 9:20 PM Ian Rogers <irogers@xxxxxxxxxx> wrote:
> >
> > Allow uid and gid to be terms in BPF filters by first breaking the
> > connection between filter terms and PERF_SAMPLE_xx values. Calculate
> > the uid and gid using the bpf_get_current_uid_gid helper, rather than
> > from a value in the sample. Allow filters to be passed to perf top. This allows:
> >
> > $ perf top -e cycles:P --filter "uid == $(id -u)"
> >
> > to work as a "perf top -u" workaround, as "perf top -u" usually fails
> > due to processes/threads terminating between the /proc scan and the
> > perf_event_open.
>
> FWIW, something I noticed playing around with this (my workload was
> `perf test -w noploop 100000` as different users) is that old samples
> appeared to linger, so terminated processes still showed up in the
> top list. My guess is that, because of the filter, no new samples
> arrive to push the old sample events out of the ring buffers. This
> can look quite odd, and I don't know whether we have a way to improve
> on it, e.g. by flushing the ring buffers, histograms, etc. It appears
> to be a latent `perf top` issue that you could also hit with other
> low-frequency events, but I thought I'd mention it anyway.
Some other thoughts:
- It is somewhat annoying that the --filter option (for either top or
record) requires an event to be specified first. It'd be nice if we
could just filter the default event.
- Should "perf top --uid=1234" be removed or turned into an alias
for '--filter "uid == 1234"', given that the --uid option generally
doesn't work?
- What should happen to the perf top --pid and --tid options? Should
they become filters? Should they fall back to /proc scanning if there
aren't sufficient BPF permissions? The plumbing for that is going to
be messy.
- There should probably be a way to filter on cgroups.
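- Does the user care that there are three kinds of filter that work
differently (tracepoint filters parsed by the kernel, address filters
for hardware trace PMUs such as Intel PT or CoreSight, and BPF
filters)? Could we break them apart to make this more explicit? I may
want tracepoint events with a BPF filter. How can we ensure one syntax
for all three kinds of filter?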
- Filtering on register values could be interesting, for example
sampling memcpy calls whose length is over a threshold. We have a
register capture test:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/shell/record.sh#n81
Perhaps the filter could look something like 'perf record -g -e
mem:$ADDRESS_OF_MEMCPY:x --filter "reg:rdx > 1024"'; this makes me
think we need a more convenient way to specify memory addresses as
symbols.
Thanks,
Ian
>
> > Ian Rogers (3):
> > perf bpf filter: Give terms their own enum
> > perf bpf filter: Add uid and gid terms
> > perf top: Allow filters on events
> >
> > tools/perf/Documentation/perf-record.txt | 2 +-
> > tools/perf/Documentation/perf-top.txt | 4 ++
> > tools/perf/builtin-top.c | 9 +++
> > tools/perf/util/bpf-filter.c | 55 ++++++++++++----
> > tools/perf/util/bpf-filter.h | 5 +-
> > tools/perf/util/bpf-filter.l | 66 +++++++++----------
> > tools/perf/util/bpf-filter.y | 7 +-
> > tools/perf/util/bpf_skel/sample-filter.h | 27 +++++++-
> > tools/perf/util/bpf_skel/sample_filter.bpf.c | 67 +++++++++++++++-----
> > 9 files changed, 172 insertions(+), 70 deletions(-)
> >
> > --
> > 2.45.0.rc1.225.g2a3ae87e7f-goog
> >