AMD's open source engineers have proposed a new kernel feature called “PAN” (Process Adaptive autoNUMA), and early data from AMD suggests that it can improve the performance of certain workloads on the company's latest server hardware.
PAN is an adaptive algorithm for calculating the AutoNUMA scan period, as Bharata B Rao of AMD explains in the request for comments (RFC) Linux kernel patch series:
In this new approach (process adaptive autoNUMA or PAN), we collect NUMA fault statistics at the per-process level to better capture application behavior.
In addition, the algorithm learns and adjusts the scan rate based on the remote fault rate. By not adhering to static thresholds, the algorithm can better respond to different workload behaviors.
Since threads of a process are already considered as a group, we added a bunch of metrics to the task’s [memory management] to track various types of faults and derive scan rates from them.
The new per-process fault statistics only help to calculate the per-process scan period, while the existing per-thread statistics continue to contribute to the numa_group statistics and ultimately determine the threshold for migrating memory and threads across nodes.
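To make the idea concrete, here is a minimal sketch of how a per-process scan period could follow the remote fault rate. It is not AMD's patch code: the class `ProcessNumaStats`, its fields, the halving and doubling steps, and the two bounds are all assumptions made for illustration, and the real patches work on kernel data structures rather than Python objects.

```python
from dataclasses import dataclass

# Illustrative bounds for the scan period, not values taken from the RFC.
SCAN_PERIOD_MIN_MS = 1_000
SCAN_PERIOD_MAX_MS = 60_000


@dataclass
class ProcessNumaStats:
    """Hypothetical per-process NUMA fault counters for one scan window."""
    local_faults: int = 0
    remote_faults: int = 0
    scan_period_ms: int = SCAN_PERIOD_MIN_MS
    prev_remote_ratio: float = 0.0

    def record_fault(self, is_remote: bool) -> None:
        # All threads of the process report into the same counters.
        if is_remote:
            self.remote_faults += 1
        else:
            self.local_faults += 1

    def remote_ratio(self) -> float:
        total = self.local_faults + self.remote_faults
        return self.remote_faults / total if total else 0.0

    def end_window(self) -> int:
        """Derive the next scan period from the trend in remote faults."""
        ratio = self.remote_ratio()
        if ratio > self.prev_remote_ratio:
            # Remote share is growing: scan more often.
            self.scan_period_ms //= 2
        else:
            # Remote share is steady or shrinking: back off.
            self.scan_period_ms *= 2
        # Keep the period inside the assumed bounds.
        self.scan_period_ms = max(
            SCAN_PERIOD_MIN_MS, min(SCAN_PERIOD_MAX_MS, self.scan_period_ms)
        )
        # Remember the ratio and start a fresh window.
        self.prev_remote_ratio = ratio
        self.local_faults = 0
        self.remote_faults = 0
        return self.scan_period_ms
```

The sketch compares each window's remote fault ratio with the previous one instead of checking it against a fixed cutoff, which is one simple way to avoid a static threshold. How the actual patches weigh the different fault types is described in the RFC itself.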
AMD's early numbers show some performance gains from PAN. Compared with a default Linux kernel build, a kernel built with PAN gained up to 14.93% in the Graph500 HPC benchmark, was 8% faster in the NAS benchmarks, was about 0.37% better in PageRank, and showed improvements of less than 1% in a few other tests.
So far, no other kernel developers have commented on the Process Adaptive autoNUMA proposal, but interested parties can check out the PAN RFC to learn more about the feature or to test it. Currently, PAN amounts to fewer than 400 lines of new code for improving Linux's NUMA behavior.