Hi experts,

 

I ran a traffic throughput test on my DPDK application. With the same software and the same test case, the only difference between the two runs was the number of RX/TX descriptors:

RX/TX descriptors: 512, test result: 3.2 Mpps

RX/TX descriptors: 2048, test result: 3 Mpps

According to the perf data, the 2048-descriptor case has a higher cache-miss rate (1.702% vs 0.723% of cache references) and fewer instructions per cycle (3.05 vs 3.20).

Perf counters with 512 RX descriptors:

      114289237792      cpu-cycles

      365408402395      instructions              #    3.20  insn per cycle

       74186289932      branches

          36020793      branch-misses             #    0.05% of all branches

        1298741388      bus-cycles

           3413460      cache-misses              #    0.723 % of all cache refs

         472363654      cache-references

Perf counters with 2048 RX descriptors:

       57038451185      cpu-cycles

      173805485573      instructions              #    3.05  insn per cycle

       35289607389      branches

          15418885      branch-misses             #    0.04% of all branches

         648164239      bus-cycles

          13170596      cache-misses              #    1.702 % of all cache refs

         773765263      cache-references

 

My understanding is that the larger number of RX descriptors somehow causes more cache misses, which lowers the instructions per cycle and therefore the throughput.
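
The two runs did not cover the same amount of work (the 512 run has roughly twice the cycles and instructions of the 2048 run), so the raw counts are not directly comparable. Below is a rough sketch that normalizes the counters per million instructions. The numbers are copied from the perf output above; the function and dictionary names are only illustrative.

```python
# Normalize perf counters so that runs of different length can be compared.
# All counter values are copied from the perf output in this post.
RUNS = {
    512: {
        "cpu_cycles": 114289237792,
        "instructions": 365408402395,
        "cache_misses": 3413460,
        "cache_refs": 472363654,
    },
    2048: {
        "cpu_cycles": 57038451185,
        "instructions": 173805485573,
        "cache_misses": 13170596,
        "cache_refs": 773765263,
    },
}


def per_million_instructions(count, instructions):
    """Scale an event count to events per one million instructions."""
    return count / instructions * 1e6


def summarize(run):
    """Return length-independent metrics for one perf run."""
    insns = run["instructions"]
    return {
        "ipc": insns / run["cpu_cycles"],
        "miss_pmi": per_million_instructions(run["cache_misses"], insns),
        "refs_pmi": per_million_instructions(run["cache_refs"], insns),
        "miss_ratio_pct": 100.0 * run["cache_misses"] / run["cache_refs"],
    }


if __name__ == "__main__":
    for nb_desc, run in RUNS.items():
        s = summarize(run)
        print(
            f"{nb_desc:>5} desc: IPC {s['ipc']:.2f}, "
            f"misses/Minsn {s['miss_pmi']:.1f}, "
            f"refs/Minsn {s['refs_pmi']:.0f}, "
            f"miss ratio {s['miss_ratio_pct']:.3f}%"
        )
```

Normalized this way, the 2048-descriptor run shows roughly 8 times as many cache misses per instruction as the 512-descriptor run (about 76 vs about 9 per million instructions), which is a larger gap than the miss ratios alone suggest.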

 

Has anyone observed similar results?

Any ideas on how to mitigate the impact, or how to investigate it further? (We want to use 2048 descriptors to better tolerate jitter and bursts.)

Any comments are welcome.
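
One possible way to investigate further is to compare the memory that the RX ring can cycle through against the cache sizes of the CPU. The sketch below gives only a rough upper bound, under assumptions that were not measured in this setup: one mbuf per descriptor, the DPDK default buffer size (2048 bytes of data room plus 128 bytes of headroom), a 128-byte struct rte_mbuf, and a 16-byte hardware descriptor (this depends on the NIC). Mempool per-object overhead is ignored, and the function name is illustrative.

```python
# Rough upper bound on the memory an RX ring cycles through, per queue.
# All sizes below are assumptions; adjust them to the actual NIC and mempool.
MBUF_STRUCT_BYTES = 128          # sizeof(struct rte_mbuf), two cache lines
MBUF_BUF_BYTES = 2048 + 128      # default data room + default headroom
HW_DESC_BYTES = 16               # assumed descriptor size, NIC dependent


def ring_footprint_bytes(nb_desc,
                         desc_bytes=HW_DESC_BYTES,
                         mbuf_struct_bytes=MBUF_STRUCT_BYTES,
                         buf_bytes=MBUF_BUF_BYTES):
    """Bytes covered by the descriptors plus one full mbuf per descriptor."""
    return nb_desc * (desc_bytes + mbuf_struct_bytes + buf_bytes)


if __name__ == "__main__":
    for nb_desc in (512, 2048):
        mib = ring_footprint_bytes(nb_desc) / (1024 * 1024)
        print(f"{nb_desc:>5} descriptors: up to {mib:.2f} MiB per RX queue")
```

Under these assumptions the bound is about 1.1 MiB for 512 descriptors and about 4.5 MiB for 2048 descriptors per queue. If the larger figure exceeds the L2 cache, or the share of the last-level cache available to the core, buffers would more often be evicted before they are reused, which would be consistent with the higher miss count. This is a hypothesis to check, not a conclusion.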

 

Thank you.

 

Br, Xiaoping