Test-Label: intel-Testing
Test-Status: SUCCESS

_Testing PASS_

DPDK git repo: dpdk

commit f62f4a375ff496abf66e48d5e1b1c442b86a82c1
Author: Fengnan Chang
Date:   Fri Feb 10 14:30:22 2023 +0800

    malloc: optimize 4K allocations

    Here is a simple test case:
    "
    /* Needs <inttypes.h>, <stdio.h>, <rte_cycles.h> and <rte_malloc.h>,
     * and must run after rte_eal_init(). Allocations are never freed. */
    uint64_t entry_time, time;
    size_t size = 4096;
    unsigned int align = 4096;

    for (int j = 0; j < 10; j++) {
        entry_time = rte_get_timer_cycles();
        for (int i = 0; i < 2000; i++) {
            rte_malloc(NULL, size, align);
        }
        time = (rte_get_timer_cycles() - entry_time) * 1000000 /
            rte_get_timer_hz();
        printf("total open time %" PRIu64 " avg time %" PRIu64 "\n",
            time, time / 2000);
    }
    "

    The cost of a single rte_malloc() call may grow as the number of
    allocations increases. In my environment, the average time is 15us in
    the first round, 44us in the second, 77us in the third, 168us in the
    fourth, and so on.

    The reason is that, during allocation, malloc_elem_alloc() may split
    new_elem if there is too much free space after it, and insert the
    resulting trailer into a free list. When allocating 4K with 4K
    alignment, the trailer is very likely to be inserted into free_head[2]
    again, which makes free_head[2] longer. The next 4K allocation searches
    free_head[2] from the beginning, so as the number of allocations grows,
    searching free_head[2] takes more time and performance degrades. The
    same problem occurs when allocating 64K with 64K alignment, but not
    when allocating 4K with 64-byte alignment.

    Fix this by adjusting the size ranges of the free_head lists, so that
    free_head[3] holds elements of 4K or larger and free_head[4] holds
    elements of 16K or larger. In terms of probability, when allocating 4K
    or 16K, a suitable element is more likely to be found in a list of
    larger elements than in a list of smaller ones.
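The change in free-list size ranges described above can be illustrated with a small index function. This is a minimal sketch, not the code from the commit: the function names old_free_list_index() and new_free_list_index(), and the constants MIN_SIZE_LOG2 = 8, LOG2_STEP = 2 (each list spans a factor of 4 in size) and NUM_FREELISTS = 13, are hypothetical values chosen so that 4K and 16K land in free_head[3] and free_head[4] as the commit message states. The only difference between the two functions is whether a list's range is (2^k, 2^(k+2)] or [2^k, 2^(k+2)), which moves an element of exactly 4096 bytes from list 2 up to list 3. It relies on the GCC/Clang builtin __builtin_clzll and assumes a 64-bit unsigned long long.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constants, assumed for illustration only. */
#define MIN_SIZE_LOG2 8   /* first list boundary: 2^8 = 256 bytes */
#define LOG2_STEP     2   /* each list covers a factor of 2^2 = 4 in size */
#define NUM_FREELISTS 13  /* assumed number of free lists */

/* Clamp an index to the last free list, which holds all larger elements. */
static size_t clamp_index(size_t index)
{
	return index < NUM_FREELISTS ? index : NUM_FREELISTS - 1;
}

/*
 * Old-style ranges: (0, 2^8], (2^8, 2^10], (2^10, 2^12], (2^12, 2^14], ...
 * An element of exactly 4096 bytes lands in list 2.
 */
size_t old_free_list_index(size_t size)
{
	size_t log2;

	if (size <= ((size_t)1 << MIN_SIZE_LOG2))
		return 0;
	/* Smallest log2 with 2^log2 >= size (size - 1 is non-zero here). */
	log2 = 64 - __builtin_clzll((unsigned long long)(size - 1));
	assert(log2 > MIN_SIZE_LOG2); /* guaranteed by the early return */
	return clamp_index((log2 - MIN_SIZE_LOG2 + LOG2_STEP - 1) / LOG2_STEP);
}

/*
 * New-style ranges: (0, 2^8), [2^8, 2^10), [2^10, 2^12), [2^12, 2^14), ...
 * An element of exactly 4096 bytes lands in list 3, and 16384 in list 4.
 */
size_t new_free_list_index(size_t size)
{
	size_t log2;

	if (size < ((size_t)1 << MIN_SIZE_LOG2))
		return 0;
	/* Smallest log2 with 2^log2 > size (size is non-zero here). */
	log2 = 64 - __builtin_clzll((unsigned long long)size);
	assert(log2 > MIN_SIZE_LOG2); /* guaranteed by the early return */
	return clamp_index((log2 - MIN_SIZE_LOG2 + LOG2_STEP - 1) / LOG2_STEP);
}
```

For example, old_free_list_index(4096) is 2 while new_free_list_index(4096) is 3, and 16384 moves from list 3 to list 4. Under these assumed ranges, a 4K request begins its search one list above the sub-4K trailers that the commit message describes accumulating in free_head[2].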
    Signed-off-by: Fengnan Chang
    Acked-by: Morten Brørup

Testing Summary : 18 Case Done, 18 Successful, 0 Failures

Testbed #1: 9 Case Done, 9 Successful, 0 Failures

* Test result details:
+-------------+---------------------------+-------+
| suite       | case                      | status|
+-------------+---------------------------+-------+
| asan_smoke  | test_rxtx_with_ASan_enable| passed|
| pf_smoke    | test_pf_jumbo_frames      | passed|
| pf_smoke    | test_pf_rss               | passed|
| pf_smoke    | test_pf_tx_rx_queue       | passed|
| vf_smoke    | test_vf_jumbo_frames      | passed|
| vf_smoke    | test_vf_rss               | passed|
| vf_smoke    | test_vf_tx_rx_queue       | passed|
| virtio_smoke| test_virtio_loopback      | passed|
| virtio_smoke| test_virtio_pvp           | passed|
+-------------+---------------------------+-------+

* Environment:
OS : Ubuntu 20.04.5 LTS
Kernel : 5.8.0-63-generic
GCC : 9.4.0-1ubuntu1~20.04.1
NIC : Ethernet Controller E810-C for SFP
Target : x86_64-native-linuxapp-gcc

Testbed #2: 9 Case Done, 9 Successful, 0 Failures

* Test result details:
+-------------+---------------------------+-------+
| suite       | case                      | status|
+-------------+---------------------------+-------+
| asan_smoke  | test_rxtx_with_ASan_enable| passed|
| pf_smoke    | test_pf_jumbo_frames      | passed|
| pf_smoke    | test_pf_rss               | passed|
| pf_smoke    | test_pf_tx_rx_queue       | passed|
| vf_smoke    | test_vf_rss               | passed|
| vf_smoke    | test_vf_tx_rx_queue       | passed|
| vf_smoke    | test_vf_jumbo_frames      | n/a   |
| virtio_smoke| test_virtio_loopback      | passed|
| virtio_smoke| test_virtio_pvp           | passed|
+-------------+---------------------------+-------+

* Environment:
OS : Ubuntu 20.04.5 LTS
Kernel : 5.13.0-30-generic
GCC : 9.4.0-1ubuntu1~20.04.1
NIC : Ethernet Controller XL710 for 40GbE QSFP+
Target : x86_64-native-linuxapp-gcc

TestPlan:
  pf_smoke: http://git.dpdk.org/tools/dts/tree/test_plans/pf_smoke_test_plan.rst
  vf_smoke: http://git.dpdk.org/tools/dts/tree/test_plans/vf_smoke_test_plan.rst
  asan_smoke: http://git.dpdk.org/tools/dts/tree/test_plans/asan_smoke_test_plan.rst

TestSuite:
  pf_smoke: http://git.dpdk.org/tools/dts/tree/tests/TestSuite_pf_smoke.py
  vf_smoke: http://git.dpdk.org/tools/dts/tree/tests/TestSuite_vf_smoke.py
  virtio_smoke: http://git.dpdk.org/tools/dts/tree/tests/TestSuite_virtio_smoke.py
  asan_smoke: http://git.dpdk.org/tools/dts/tree/tests/TestSuite_asan_smoke.py

DPDK STV team