Hi experts,
I ran a traffic throughput test on my DPDK application. The software and the test case were identical; the only difference was the number of RX/TX descriptors:
RX/TX descriptors: 512, result: 3.2 Mpps
RX/TX descriptors: 2048, result: 3 Mpps
According to the perf data, the 2048-descriptor case has a higher cache-miss ratio and fewer instructions per cycle.
perf output for 512 RX descriptors:
    114289237792  cpu-cycles
    365408402395  instructions       # 3.20 insn per cycle
     74186289932  branches
        36020793  branch-misses      # 0.05% of all branches
      1298741388  bus-cycles
         3413460  cache-misses       # 0.723 % of all cache refs
       472363654  cache-references
perf output for 2048 RX descriptors:
     57038451185  cpu-cycles
    173805485573  instructions       # 3.05 insn per cycle
     35289607389  branches
        15418885  branch-misses      # 0.04% of all branches
       648164239  bus-cycles
        13170596  cache-misses       # 1.702 % of all cache refs
       773765263  cache-references
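The two runs were sampled for different durations (the cycle counts differ by about a factor of two), so the raw counts are not directly comparable. A minimal sketch that normalizes the counters per instruction, using only the numbers from the two perf outputs above; the metric names are my own and are not perf fields:

```python
# Counter values copied from the two perf outputs above.
runs = {
    512: {"cycles": 114289237792, "instructions": 365408402395,
          "cache_misses": 3413460, "cache_refs": 472363654},
    2048: {"cycles": 57038451185, "instructions": 173805485573,
           "cache_misses": 13170596, "cache_refs": 773765263},
}


def normalize(c):
    """Turn raw counters into duration-independent ratios."""
    return {
        # instructions per cycle, as perf reports it
        "ipc": c["instructions"] / c["cycles"],
        # cache misses as a percentage of cache references
        "miss_ratio_pct": 100.0 * c["cache_misses"] / c["cache_refs"],
        # cache misses and references per 1000 instructions
        "misses_per_kinsn": 1000.0 * c["cache_misses"] / c["instructions"],
        "refs_per_kinsn": 1000.0 * c["cache_refs"] / c["instructions"],
    }


metrics = {n_desc: normalize(c) for n_desc, c in runs.items()}

for n_desc, m in metrics.items():
    print(f"{n_desc:>5} desc: IPC {m['ipc']:.2f}, "
          f"miss ratio {m['miss_ratio_pct']:.3f} %, "
          f"misses/1k insn {m['misses_per_kinsn']:.4f}, "
          f"refs/1k insn {m['refs_per_kinsn']:.2f}")
```

Per 1000 instructions, the 2048-descriptor run shows roughly eight times as many cache misses as the 512-descriptor run, a larger gap than the miss ratios alone suggest.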
My understanding is that the larger number of RX descriptors somehow causes more cache misses, which lowers instructions per cycle and therefore throughput.
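To put rough numbers on that guess, here is a back-of-the-envelope sketch of the memory one RX queue cycles through for each ring size. Every size constant below is an assumption for illustration (loosely based on common DPDK defaults), not something I measured; the real values depend on the NIC, the PMD, and the mempool configuration.

```python
# All sizes are ASSUMPTIONS for illustration, not measured values.
MBUF_STRUCT_BYTES = 128   # assumed size of the mbuf metadata header
HEADROOM_BYTES = 128      # assumed packet headroom
DATA_ROOM_BYTES = 2048    # assumed data room per buffer
DESC_BYTES = 16           # assumed size of one RX descriptor (NIC dependent)


def rx_ring_footprint(n_desc):
    """Return (descriptor ring bytes, packet buffer bytes) for one RX queue."""
    ring_bytes = n_desc * DESC_BYTES
    buf_bytes = n_desc * (MBUF_STRUCT_BYTES + HEADROOM_BYTES + DATA_ROOM_BYTES)
    return ring_bytes, buf_bytes


for n_desc in (512, 2048):
    ring_bytes, buf_bytes = rx_ring_footprint(n_desc)
    print(f"{n_desc:>5} descriptors: ring {ring_bytes / 1024:.0f} KiB, "
          f"buffers {buf_bytes / 2**20:.3f} MiB")
```

Under these assumptions, the buffers behind a 2048-entry ring span about four times as much memory as those behind a 512-entry ring. Whether that exceeds the cache actually available to the polling core is something I have not verified.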
Has anyone observed similar results?
Any ideas on how to mitigate (or further investigate) the impact? We want to use 2048 descriptors to better tolerate jitter and bursts.
Any comments are appreciated.
Thank you.
Br, Xiaoping