On 2016-10-19 11:16:16 [-0700], Davidlohr Bueso wrote:
> On Mon, 17 Oct 2016, Sebastian Andrzej Siewior wrote:
>> By default the application uses malloc() and all available CPUs. This
>> patch introduces NUMA support which means:
>> - memory is allocated node local via numa_alloc_local()
>> - all CPUs of the specified NUMA node are used. This is also true if the
>>   number of threads set is greater than the number of CPUs available on
>>   this node.
> Can't we just use numactl to bind cpus and memory to be node-local?
> something like
> numactl --cpunodebind=$NODE --membind=$NODE perf ???
> ?
This should work for the memory side. However, since we use
pthread_attr_setaffinity_np(&thread_attr, sizeof(cpu_set_t), &cpu);
we would first need to query the inherited affinity mask and then deploy
the threads based on that mask.
Having NUMA support within the bench tool also has the side effect that
the output shows all the options used.