Daniel Lezcano [daniel.lezcano@xxxxxxx] wrote:
I forgot to mention a constraint with the specified pid: P2 has to be a child of P1.
Sukadev Bhattiprolu wrote:
Subject: [RFC][v8][PATCH 0/10] Implement clone3() system call

To support application checkpoint/restart, a task must have the same pid it
had when it was checkpointed. When containers are nested, the tasks within
the containers exist in multiple pid namespaces and hence have multiple pids
to specify during restart.
This patchset implements a new system call, clone3() that lets a process
specify the pids of the child process.
Patches 1 through 7 are helper patches, needed for choosing a pid for the
child process.
PATCH 9 defines a prototype of the new system call. PATCH 10 adds some
documentation on the new system call, some/all of which will eventually
go into a man page.

Sorry for jumping so late in the discussion and for having maybe my
remarks pointless...
If this syscall is only for checkpoint / restart, why shouldn't this be
handled by a future generic sys_restart syscall ?
As I tried to explain in PATCH 0/9, the ability to choose a pid is only
for C/R but we are also trying to extend clone-flags so we won't need yet
another variant of clone() fairly soon.
Otherwise, wouldn't it be more convenient to have something usable for
everyone, let's say:
	cloneat(pid_t pid, pid_t desiredpid, ...);
Where 'desiredpid' is a hint for the kernel for the pid to be
allocated (zero means the kernel will choose one for us) and the newly
allocated task is the child of 'pid'.
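To make the 'desiredpid' convention concrete, here is a small user-space sketch of "zero means the kernel chooses, non-zero asks for a specific pid". It is not kernel code and not taken from the patchset: `toy_pidmap`, `toy_alloc_pid` and `TOY_PID_MAX` are hypothetical names, and failing when the requested pid is already taken is an assumption of this sketch (a pure "hint" could fall back to another pid instead).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PID_MAX 64          /* hypothetical: tiny pid space for the sketch */

struct toy_pidmap {
    bool used[TOY_PID_MAX];     /* used[n] is true when pid n is taken */
};

/*
 * Pick a pid from the map.
 *   desired == 0 : caller has no preference, take the lowest free pid.
 *   desired  > 0 : caller wants exactly this pid; fail if it is taken
 *                  (assumption of this sketch, see the text above).
 * Returns the allocated pid, or -1 on failure.
 */
static int toy_alloc_pid(struct toy_pidmap *map, int desired)
{
    assert(map != NULL);

    if (desired < 0 || desired >= TOY_PID_MAX)
        return -1;

    if (desired != 0) {
        if (map->used[desired])
            return -1;
        map->used[desired] = true;
        return desired;
    }

    /* No preference: scan for the first free pid, skipping pid 0. */
    for (int pid = 1; pid < TOY_PID_MAX; pid++) {
        if (!map->used[pid]) {
            map->used[pid] = true;
            return pid;
        }
    }
    return -1;
}
```

A restart tool would presumably pass the checkpointed pid as `desired`, while an ordinary caller would pass 0 and get whatever is free.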
Hmm, so P1 would call cloneat() to create a child P3 _on behalf_ of process
P2 ? I did not know we had a requirement for that. Can you explain the
use-case more ? IOW, why can't P2 create the child P3 by itself ?
Note also that 'desiredpid' must be a list of pids (one for each pid
namespace that the child will belong to) and hence we need 'nr_pids'
to specify the list. Given that we are limited to 6 parameters to the
syscall, such parameters must be stuffed into 'struct clone_args'.
So we should do something like:
	sys_clone3(u32 flags_low, pid_t pid, struct clone_args *carg,
			pid_t *desired_pids)
or (to match the name and parameters, move 'pid' parameter into clone_args)

Well, hiding multiple clones in one clone call is ... weird. AFAIR, there
was a debate between kernel or userspace proctree creation but it looks
like it's done from the kernel with this call.
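Stepping back to the "list of pids plus 'nr_pids'" point: a rough user-space sketch of how such a list could be expanded into one request per pid-namespace level. Everything here is illustrative. `toy_clone_args`, `toy_expand_pids` and `TOY_MAX_LEVELS` are hypothetical names, the struct is not the 'struct clone_args' from the patches, and the rule that the list covers the innermost `nr_pids` levels (with outer levels left to the kernel) is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_LEVELS 8   /* hypothetical cap on pid-namespace nesting */

/* Hypothetical argument block; field names are illustrative only. */
struct toy_clone_args {
    unsigned int nr_pids;   /* number of entries in pids[] */
    const int *pids;        /* requested pid per namespace, 0 = no preference */
};

/*
 * Expand the caller's list into one request per namespace level.
 * out[0] is the outermost namespace, out[depth - 1] the innermost.
 * Assumption for this sketch: the list covers the innermost nr_pids
 * levels, and any outer level it does not cover gets 0 (kernel chooses).
 * Returns 0 on success, -1 if the request cannot apply.
 */
static int toy_expand_pids(const struct toy_clone_args *args,
                           unsigned int depth, int out[TOY_MAX_LEVELS])
{
    assert(args != NULL && out != NULL);

    if (depth == 0 || depth > TOY_MAX_LEVELS || args->nr_pids > depth)
        return -1;
    if (args->nr_pids > 0 && args->pids == NULL)
        return -1;

    unsigned int uncovered = depth - args->nr_pids;

    for (unsigned int i = 0; i < depth; i++) {
        if (i < uncovered) {
            out[i] = 0;                     /* no request for this level */
        } else {
            int pid = args->pids[i - uncovered];
            if (pid < 0)
                return -1;                  /* negative pids are invalid */
            out[i] = pid;
        }
    }
    return 0;
}
```

For a task nested three namespaces deep, a list of `{ 0, 42 }` would ask for pid 42 in the innermost namespace and leave the other two levels to the allocator. Keeping the count and the pointer together in one struct is what lets the call stay within the six-argument limit.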
That looks more consistent with the "<syscall>at" family, 'openat',
'faccessat', 'readlinkat', etc ... and usable for something else than
the checkpoint / restart.

The subtle difference though is that openat() does not open a file on
behalf of another process and so the 'at' suffix would not apply ?

Yes and no, depending on where you put the cursor. If you consider the
'at' suffix means a process context, then I agree with you, there is a
difference because the cloneat will be out of the current process
context. But if you consider the 'at' suffix as a context in general,
and openat means "relatively to a file descriptor" and cloneat means
"relatively to a pid namespace" the 'at' suffix may apply. But I agree
that we are so used to calling the posix "fork", that cloneat sounds
scary :)