Karan Singh

Code Never Lies, Comments Sometime Do !!

Working With NUMA/CPU Pinning


NUMA

CPU pinning, process affinity and NUMA awareness generally boil down to the same idea: on a multi-socket system, an application performs best when its threads execute on CPU cores that are as close as possible to the memory bank they use. In most cases the Linux process scheduler is intelligent enough to do this on its own. However, if you set it up manually, you will most likely enjoy the luxury of increased application performance. Here are some of my notes describing the steps required to set up process affinity.

First, check where the application's threads (radosgw in my case) are currently executing. The fifth column, psr, shows the processor core each thread is currently assigned to.

for i in $(pgrep radosgw); do ps -mo pid,tid,fname,user,psr -p $i;done
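
With many threads the raw listing gets long, so it can help to tally the psr column per core. The following is only a minimal sketch: count_by_psr is a hypothetical helper name (not part of ps or any other tool), and it expects one core number per line on stdin.

```shell
#!/bin/sh
# count_by_psr: hypothetical helper that tallies how many threads sit on
# each core. Input: one core number per line on stdin; anything that is
# not a plain number (headers, "-") is ignored.
# Output: "<core> <thread count>" per line, sorted by core number.
count_by_psr() {
    awk 'NF == 1 && $1 ~ /^[0-9]+$/ { n[$1 + 0]++ }
         END { for (c in n) print c, n[c] }' | sort -n
}

# Usage against live processes ("psr=" prints the column without a header,
# -L lists one line per thread):
#   for i in $(pgrep radosgw); do ps -L -o psr= -p "$i"; done | count_by_psr
```

If the threads are spread over cores from both sockets, the process is a candidate for pinning.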

NUMA requires the following:

  • A multi-socket system
  • The NUMA feature enabled in the BIOS or UEFI (most modern motherboards have it enabled by default)

First, verify that the NUMA configuration is visible at the OS level by looking for numa in /var/log/dmesg or /var/log/messages:

# grep -i numa /var/log/dmesg
[    0.000000] NUMA: Initialized distance table, cnt=2
[    0.000000] NUMA: Node 0 [mem 0x00000000-0x7fffffff] + [mem 0x100000000-0x107fffffff] -> [mem 0x00000000-0x107fffffff]
[    0.000000] Enabling automatic NUMA balancing. Configure with numa_balancing= or the kernel.numa_balancing sysctl
[    1.023141] pci_bus 0000:00: on NUMA node 0
[    1.026985] pci_bus 0000:80: on NUMA node 1
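
If the boot log has already been rotated away, the same information can usually be read from sysfs, where the kernel exposes one nodeN directory per NUMA node. A minimal sketch, assuming the hypothetical helper name numa_node_count; the directory argument exists only so the function can be tried against a copy of the tree.

```shell
#!/bin/sh
# numa_node_count: hypothetical helper that counts the nodeN directories
# the kernel exposes. $1 optionally overrides the sysfs location; the
# default is the real one.
numa_node_count() {
    root=${1:-/sys/devices/system/node}
    count=0
    for d in "$root"/node[0-9]*; do
        # An unmatched glob stays literal and fails the -d test.
        [ -d "$d" ] && count=$((count + 1))
    done
    echo "$count"
}

# Usage on a live system:
#   numa_node_count
```

A result of 2 or more means the kernel sees several NUMA nodes. A result of 1 means there is a single node and nothing to gain from node pinning.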

To make use of NUMA node pinning, identify your NUMA nodes using the numactl CLI and lscpu:

# yum install -y numactl
# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 20 21 22 23 24 25 26 27 28 29
node 0 size: 65430 MB
node 0 free: 3096 MB
node 1 cpus: 10 11 12 13 14 15 16 17 18 19 30 31 32 33 34 35 36 37 38 39
node 1 size: 65536 MB
node 1 free: 7359 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10
# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                40
On-line CPU(s) list:   0-39
Thread(s) per core:    2
Core(s) per socket:    10
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz
Stepping:              2
CPU MHz:               2824.656
BogoMIPS:              5193.34
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              25600K
NUMA node0 CPU(s):     0-9,20-29
NUMA node1 CPU(s):     10-19,30-39

These two commands should give you enough information about your server's CPU and NUMA layout.
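
When this needs to be scripted, the CPU list of a single node can be pulled out of the lscpu output. A minimal sketch, assuming the hypothetical helper name node_cpus; it matches the "NUMA nodeN CPU(s)" label shown above and tolerates leading indentation, which some newer lscpu versions add.

```shell
#!/bin/sh
# node_cpus: hypothetical helper that prints the CPU list of one NUMA node.
# $1 is the node number; lscpu output is read from stdin.
node_cpus() {
    awk -F: -v key="NUMA node$1 CPU(s)" '
        {
            name = $1
            sub(/^[ \t]+/, "", name)    # drop indentation before the label
        }
        name == key {
            value = $2
            gsub(/[ \t]/, "", value)    # drop padding around the list
            print value
        }'
}

# Usage on a live system:
#   lscpu | node_cpus 1
```

On the server above, node 1 would come back as 10-19,30-39.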

In my environment, I have a 2-socket server and each socket has 10 cores, so 2 x 10 = 20 physical cores. On top of that, the system has Hyper-Threading enabled (2 threads per core), which turns the 20 cores into 40 logical CPUs. These 40 CPUs are divided into two NUMA nodes:

  • node0 with CPUs 0-9 and 20-29
  • node1 with CPUs 10-19 and 30-39

Verify which cores your service is currently allowed to run on:

# for i in $(pgrep radosgw) ; do taskset -pc $i ; done
pid 3793213's current affinity list: 0-39

Based on the above output, my application (radosgw) can execute on any CPU from 0 to 39. I want to set up CPU affinity so that the radosgw process moves from CPUs 0-39 to CPUs 10-19,30-39, allocating the whole second socket (node1) to it.

Now it's time to tune your application and instruct it to use specific CPUs or a NUMA node. For this we need to modify the application's service configuration and add the CPU pinning parameters there.

Edit your application's systemd unit file (in my case /usr/lib/systemd/system/ceph-radosgw\@.service for radosgw) and add the following line to its [Service] section:

CPUAffinity=10 11 12 13 14 15 16 17 18 19 30 31 32 33 34 35 36 37 38 39
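
Files under /usr/lib/systemd/system belong to the package and can be overwritten by an update, so a drop-in file is a possible alternative to editing the unit in place. The sketch below is illustrative only: expand_cpulist and write_affinity_dropin are hypothetical helper names, and cpuaffinity.conf is an arbitrary file name. The first helper turns a range list such as 10-19,30-39 into the space-separated form used above, the second writes the [Service] snippet into a directory of your choice.

```shell
#!/bin/sh
# expand_cpulist: hypothetical helper that turns a range list such as
# "10-12,30" into the space-separated form "10 11 12 30".
expand_cpulist() {
    awk -v list="$1" 'BEGIN {
        n = split(list, parts, ",")
        out = ""
        for (i = 1; i <= n; i++) {
            m = split(parts[i], r, "-")
            first = r[1] + 0
            last = (m > 1 ? r[2] : r[1]) + 0
            for (c = first; c <= last; c++)
                out = out (out == "" ? "" : " ") c
        }
        print out
    }'
}

# write_affinity_dropin: hypothetical helper that writes a [Service]
# drop-in into directory $1, pinning the unit to the CPU list $2.
write_affinity_dropin() {
    mkdir -p "$1" || return 1
    printf '[Service]\nCPUAffinity=%s\n' "$(expand_cpulist "$2")" \
        > "$1/cpuaffinity.conf"
}

# On a real host (as root) the directory for the radosgw template unit
# would be /etc/systemd/system/ceph-radosgw@.service.d, for example:
#   write_affinity_dropin /etc/systemd/system/ceph-radosgw@.service.d 10-19,30-39
```

A drop-in needs the same daemon-reload and service restart as a direct edit of the unit file.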

Reload systemd and restart the service so that the change takes effect:

$ sudo systemctl daemon-reload
$ sudo systemctl restart ceph-radosgw@rgw.$(hostname -s).service
$ sudo systemctl status ceph-radosgw@rgw.$(hostname -s).service -l
ceph-radosgw@rgw.ceph-osd1.service - Ceph rados gateway
   Loaded: loaded (/usr/lib/systemd/system/ceph-radosgw@.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2016-09-02 12:20:02 EEST; 9s ago
 Main PID: 229662 (radosgw)
   CGroup: /system.slice/system-ceph\x2dradosgw.slice/ceph-radosgw@rgw.ceph-osd1.service
           └─229662 /usr/bin/radosgw -f --cluster ceph --name client.rgw.ceph-osd1 --setuser ceph --setgroup ceph

Sep 02 12:20:02 ceph-osd1 systemd[1]: Started Ceph rados gateway.
Sep 02 12:20:02 ceph-osd1 systemd[1]: Starting Ceph rados gateway...

Verify that your daemon is now pinned to the correct CPU cores:

$ for i in $(pgrep radosgw) ; do sudo taskset -pc $i ; done
pid 229662's current affinity list: 10-19,30-39
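
Note that taskset -pc reports the affinity of the task whose ID it is given, which for a PID is the main thread. To check every thread, the Cpus_allowed_list field in /proc can be read per task. A minimal sketch, assuming the hypothetical helper name thread_affinities; the second argument exists only so the function can be pointed at a copy of the /proc tree.

```shell
#!/bin/sh
# thread_affinities: hypothetical helper that prints "<tid> <cpu list>"
# for every thread of process $1. $2 optionally overrides the /proc root.
thread_affinities() {
    proc=${2:-/proc}
    for status in "$proc/$1"/task/*/status; do
        [ -r "$status" ] || continue
        tid=${status%/status}    # strip the trailing "/status"
        tid=${tid##*/}           # keep only the thread ID directory name
        list=$(awk '$1 == "Cpus_allowed_list:" { print $2 }' "$status")
        echo "$tid $list"
    done
}

# Usage on a live system:
#   for i in $(pgrep radosgw); do thread_affinities "$i"; done
```

Every thread of the restarted daemon should report 10-19,30-39, because threads inherit the affinity the service was started with.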

Another, maybe easier, way of doing this is to use the numactl command, which can bind an application to an entire NUMA node, covering both CPU and memory:

$ sudo numactl --cpunodebind=1 --membind=1 radosgw
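
To check that the memory really ends up on the chosen node, the per-mapping N<node>=<pages> fields in /proc/<pid>/numa_maps can be summed. This is only a rough sketch: pages_per_node is a hypothetical helper name, and it adds up page counts without distinguishing page sizes, so treat the result as an indication rather than an exact figure.

```shell
#!/bin/sh
# pages_per_node: hypothetical helper that sums the N<node>=<pages> fields
# of numa_maps-formatted text read from stdin.
# Output: "node<k> <pages>" per line, sorted by name.
pages_per_node() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^N[0-9]+=[0-9]+$/) {
                split(substr($i, 2), kv, "=")   # "N1=10" -> kv[1]=1, kv[2]=10
                pages[kv[1]] += kv[2]
            }
    }
    END { for (k in pages) print "node" k, pages[k] }' | sort
}

# Usage on a live process (needs permission to read its numa_maps):
#   sudo cat /proc/$(pgrep -o radosgw)/numa_maps | pages_per_node
```

With --membind=1 in effect, new allocations should be counted under node1. Shared file mappings that other processes already loaded may still show pages on node0.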

Image source: stackoverflow.com , Thanks

Hope this is useful for you :)
