This week I attended a webinar by Alex Blewitt about using CPU microarchitecture to increase application performance. A work colleague sent me the link, but you can get the PDF and watch the presentation from the source below:


The presentation is obviously very interesting. You need to know your CPU to get the most out of it.

How can I see my CPU architecture? “lstopo” (graphical) or “lstopo-no-graphics” is your friend. This is the output from my laptop.

# lstopo-no-graphics
Machine (7934MB total)
  Package L#0
    NUMANode L#0 (P#0 7934MB)
    L3 L#0 (4096KB)
      L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
        PU L#0 (P#0)
        PU L#1 (P#2)
      L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
        PU L#2 (P#1)
        PU L#3 (P#3)
  PCI 00:02.0 (VGA)
  PCI 02:00.0 (Network)
    Net "wlp2s0"
  PCI 00:1f.2 (SATA)
    Block(Disk) "sda"

As you can see, my humble laptop has just one NUMA node in a single package (socket), with two cores and two hyperthreads per core, so the OS sees four processing units (PUs).
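If you want those numbers without counting lines by eye, here is a small Python sketch that tallies the objects in the lstopo text output. The function name summarize_topology and the SAMPLE string (copied from my laptop output above, without the PCI lines) are my own for illustration, and it assumes each object is printed as "<type> L#<index>", which is how it looks here but may differ between hwloc versions.

```python
import re

# Text copied from the lstopo-no-graphics output above (PCI lines left out).
SAMPLE = """\
Machine (7934MB total)
Package L#0
NUMANode L#0 (P#0 7934MB)
L3 L#0 (4096KB)
L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
PU L#0 (P#0)
PU L#1 (P#2)
L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1
PU L#2 (P#1)
PU L#3 (P#3)
"""


def summarize_topology(text):
    """Count packages, NUMA nodes, cores and PUs in lstopo text output."""
    counts = {}
    for kind in ("Package", "NUMANode", "Core", "PU"):
        # Assumption: every object shows up as "<kind> L#<index>".
        counts[kind] = len(re.findall(rf"\b{kind} L#\d+", text))
    return counts


print(summarize_topology(SAMPLE))
# {'Package': 1, 'NUMANode': 1, 'Core': 2, 'PU': 4}
```

On a real machine you could feed it the output of lstopo-no-graphics instead of the SAMPLE string.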

But a server will very likely have more NUMA nodes, more packages and more cores, so you want to be sure they are all used properly.

I am not an expert in CPU performance at all, but there are many important points, like memory allocation, huge pages, pinning memory/threads (isolcpus, taskset, etc.), compiler strategies and tools to test performance. Some of them ring a bell, and it is nice to know that they exist. You never know when you will have to dive into these waters.
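To make the pinning idea a bit more concrete, this is a minimal Python sketch of what taskset does for a process: it restricts the CPUs the scheduler may use. It is not from the webinar. It assumes Linux, where Python exposes os.sched_getaffinity and os.sched_setaffinity, and the helper name pin_to_one_cpu is made up for this example.

```python
import os


def pin_to_one_cpu():
    """Pin the current process to one allowed CPU and return the new affinity set."""
    if not hasattr(os, "sched_setaffinity"):
        return None  # these calls only exist on some platforms (e.g. Linux)
    allowed = os.sched_getaffinity(0)  # pid 0 means "the calling process"
    target = min(allowed)              # arbitrary choice: lowest allowed CPU id
    os.sched_setaffinity(0, {target})
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    print("now allowed to run on CPUs:", pin_to_one_cpu())
```

Picking the lowest CPU id is only for the demo. On a real server you would choose the CPU based on the topology that lstopo shows, for example keeping a thread on the same NUMA node as its memory.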