Re: [PATCH v2 0/5] pid: add pidfd_open()

From: Christian Brauner
Date: Mon Apr 01 2019 - 15:42:26 EST


On Mon, Apr 01, 2019 at 09:01:29AM -0700, Linus Torvalds wrote:
> On Mon, Apr 1, 2019 at 8:55 AM Daniel Colascione <dancol@xxxxxxxxxx> wrote:
> >
> >
> > > I wonder if we really want a full procfs2, or maybe we could just make
> > > the pidfd readable (yes, it's a directory file descriptor, but we
> > > could allow reading).
> >
> > What would read(2) read?
>
> We could make it read anything, but it would have to be something
> people agree is sufficient (and not so expensive to create that rare
> users of that data would find the overhead excessive).
>
> Eg we could make it return the same thing that /proc/<pid>/status
> reads right now.
>
> But it sounds like you need pretty much all of /proc/<pid>/xyz:

From what I gather from this thread, we are still best off using fds
to /proc/<pid> as pidfds. Linus, do you agree or have I misunderstood?
Yes, we can have an internal mount option to restrict access to various
parts of procfs from such pidfds or do the parent-less bind-mount trick
but I think this beats having a stunted dummy dirfd that we implement a
read method on.
We would also need a way to disable access to "/proc/<pid>/net". One
option would be to give the files in "net/" an ->open() handler that
checks whether file->f_path.mnt is one of our special clone() mounts
and, if it is, refuses the open.

To clarify the way forward:
Jann and I were discussing whether pidfd_open() still makes sense and
whether I shouldn't just jump straight to a first version of
CLONE_PIDFD.
Basically, on a system without CONFIG_PROC_FS it makes sense for clone()
to give back an anon inode file descriptor as a pidfd, because you can
still signal threads in a race-free way. But pidfd_open() doesn't make a
lot of sense in this scenario: you can't really do anything with that
pidfd apart from sending signals, and sending a signal that way is still
racy, since the pid can be recycled between learning the pid number and
calling pidfd_open() [1]. So on a system without CONFIG_PROC_FS it only
makes sense to have clone() give back anon inode fds; it doesn't make
sense for pidfd_open(). In other words, I think it makes more sense if I
jump to the implementation of CLONE_PIDFD instead of working on
pidfd_open().

[1]: The only case - that seems rather far-fetched - where it makes
sense is when the parent wants to create that pidfd and hand it to
someone else.
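
To make the case in [1] concrete, here is a rough userspace sketch of a
parent opening a pidfd for its own child and signalling through it. It
is only an illustration under assumptions that are not part of this
series: it uses the pidfd_open(pid, flags) and pidfd_send_signal(pidfd,
sig, info, flags) syscall interfaces as they ended up in mainline, the
fallback syscall numbers 424 and 434 assume the generic syscall table,
and signal_child_via_pidfd() is a made-up helper name. The parent case
is the one without a race because an unreaped child keeps its pid
reserved until the parent waits for it (assuming the parent has not set
SIGCHLD to be ignored).

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumption: generic syscall numbers, used only if the headers lack them. */
#ifndef SYS_pidfd_send_signal
#define SYS_pidfd_send_signal 424
#endif
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif

/*
 * Hypothetical helper: fork a child, open a pidfd for it from the parent
 * and kill it through that pidfd.
 *
 * Returns 0 if the child was killed via the pidfd, 1 if the pidfd
 * syscalls were unavailable and plain kill() was used instead, and -1 on
 * any other failure.
 */
int signal_child_via_pidfd(void)
{
	pid_t pid = fork();
	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Child: sleep until the parent kills us. */
		for (;;)
			pause();
	}

	/*
	 * Parent: the child has not been reaped yet, so its pid cannot have
	 * been recycled between fork() and pidfd_open().
	 */
	int used_pidfd = 0;
	int pidfd = (int)syscall(SYS_pidfd_open, pid, 0);
	if (pidfd >= 0) {
		if (syscall(SYS_pidfd_send_signal, pidfd, SIGKILL, NULL, 0) == 0)
			used_pidfd = 1;
		close(pidfd);
	}

	/* Fall back to kill() so the child never outlives this function. */
	if (!used_pidfd)
		kill(pid, SIGKILL);

	int status;
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	if (!WIFSIGNALED(status) || WTERMSIG(status) != SIGKILL)
		return -1;

	return used_pidfd ? 0 : 1;
}
```

Any other process that only knows the pid number has no such guarantee,
which is why a pidfd handed back directly by clone() is the more useful
primitive.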

Christian