DPDK usage discussions
* [dpdk-users] rte_flow / hw-offloading is degrading performance when testing @ 100G
From: Arvind Narayanan @ 2019-03-01  1:42 UTC
  To: users

Hi,

I am using DPDK 18.11 on Ubuntu 18.04 with a Mellanox ConnectX-5 100G
EN NIC (MLNX_OFED_LINUX-4.5-1.0.1.0-ubuntu18.04-x86_64).
Packet generator: t-rex 2.49 running on another machine.

I am able to achieve 100G line rate with the l3fwd application (frame
size 64B) using the parameters suggested in the DPDK 18.11 Mellanox NIC
performance report
(https://fast.dpdk.org/doc/perf/DPDK_18_11_Mellanox_NIC_performance_report.pdf).

However, as soon as I install rte_flow rules to steer packets to
different queues and/or use rte_flow's MARK action, the throughput
drops to ~41G. I also modified DPDK's flow_filtering example
application and see the same reduced throughput of around 41G out of
100G; without rte_flow it reaches 100G.
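
For reference, the rules I create look roughly like this (a simplified
sketch based on the flow_filtering example; the helper name and the
exact spec/mask values are illustrative, not my actual config):

#include <rte_byteorder.h>
#include <rte_flow.h>

/* Steer packets with a given IPv4 dst to one rx queue and mark them. */
static struct rte_flow *
install_steering_rule(uint16_t port_id, uint32_t dst_ip_host_order,
                      uint16_t queue_id, uint32_t mark_id,
                      struct rte_flow_error *error)
{
        struct rte_flow_attr attr = { .ingress = 1 };

        struct rte_flow_item_ipv4 ip_spec = {
                .hdr.dst_addr = rte_cpu_to_be_32(dst_ip_host_order),
        };
        struct rte_flow_item_ipv4 ip_mask = {
                .hdr.dst_addr = RTE_BE32(0xffffffff),   /* /32 match */
        };
        struct rte_flow_item pattern[] = {
                { .type = RTE_FLOW_ITEM_TYPE_ETH },
                { .type = RTE_FLOW_ITEM_TYPE_IPV4,
                  .spec = &ip_spec, .mask = &ip_mask },
                { .type = RTE_FLOW_ITEM_TYPE_END },
        };

        struct rte_flow_action_mark mark = { .id = mark_id };
        struct rte_flow_action_queue queue = { .index = queue_id };
        struct rte_flow_action actions[] = {
                { .type = RTE_FLOW_ACTION_TYPE_MARK, .conf = &mark },
                { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
                { .type = RTE_FLOW_ACTION_TYPE_END },
        };

        return rte_flow_create(port_id, &attr, pattern, actions, error);
}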

I didn't change any OS/kernel parameters to test l3fwd or the
application that uses rte_flow. I also made sure the application is
NUMA-aware and uses 20 cores to handle the 100G traffic.

Further investigation (using the Mellanox NIC counters) shows that the
drop in throughput is due to mbuf allocation errors.
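
In case it helps, this is roughly how I dump the counters from the
application side (a minimal sketch: rx_nombuf in the basic stats counts
RX mbuf allocation failures, and since the xstat names depend on the
PMD I just print every non-zero extended counter and scan the output):

#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <rte_ethdev.h>

static void
dump_drop_counters(uint16_t port_id)
{
        struct rte_eth_stats stats;
        struct rte_eth_xstat *xstats;
        struct rte_eth_xstat_name *names;
        int n, i;

        /* Basic stats: rx_nombuf counts RX mbuf allocation failures. */
        if (rte_eth_stats_get(port_id, &stats) == 0)
                printf("port %u: imissed=%" PRIu64 " rx_nombuf=%" PRIu64 "\n",
                       port_id, stats.imissed, stats.rx_nombuf);

        /* Extended (per-PMD) stats, e.g. the Mellanox hw counters. */
        n = rte_eth_xstats_get(port_id, NULL, 0);
        if (n <= 0)
                return;
        xstats = calloc(n, sizeof(*xstats));
        names = calloc(n, sizeof(*names));
        if (xstats == NULL || names == NULL)
                goto out;
        if (rte_eth_xstats_get(port_id, xstats, n) != n ||
            rte_eth_xstats_get_names(port_id, names, n) != n)
                goto out;
        for (i = 0; i < n; i++)
                if (xstats[i].value != 0)
                        printf("  %s: %" PRIu64 "\n",
                               names[i].name, xstats[i].value);
out:
        free(xstats);
        free(names);
}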

Is such performance degradation normal when using rte_flow for
hardware acceleration?
Has anyone tested throughput performance using rte_flow @ 100G?

It's surprising to see hardware offloading degrade performance, unless
I am doing something wrong.

Thanks,
Arvind


* Re: [dpdk-users] rte_flow / hw-offloading is degrading performance when testing @ 100G
From: Cliff Burdick @ 2019-03-01  2:23 UTC
  To: Arvind Narayanan; +Cc: users

What size packets are you using? I've only steered to 2 rx queues by IP dst
match, and was able to hit 100 Gbps. That's with a 4 KB jumbo frame.



* Re: [dpdk-users] rte_flow / hw-offloading is degrading performance when testing @ 100G
From: Arvind Narayanan @ 2019-03-01  2:57 UTC
  To: Cliff Burdick; +Cc: users

On Thu, Feb 28, 2019, 8:23 PM Cliff Burdick <shaklee3@gmail.com> wrote:

> What size packets are you using? I've only steered to 2 rx queues by IP
> dst match, and was able to hit 100 Gbps. That's with a 4 KB jumbo frame.
>

64 bytes. Agreed, that is small; what seems interesting is that l3fwd can
handle 64B but rte_flow suffers (a lot), suggesting the offloading itself
is expensive?!

I'm doing something similar, steering to different queues based on dst_ip.
However, my tests use around 80 rules, each rule steering to one of the 20
rx_queues. I have a one-to-one rx_queue-to-core_id mapping.
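
In case the RX side matters, it is the textbook one-queue-per-lcore
loop; a simplified sketch (handle_packet() is just a placeholder for my
actual lookup/forwarding, and I read the MARK id from the mbuf when the
NIC sets it):

#include <stdint.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

static void
handle_packet(struct rte_mbuf *m, uint32_t mark)
{
        /* placeholder for my real processing; just drop the packet here */
        (void)mark;
        rte_pktmbuf_free(m);
}

/* Launched once per worker lcore via rte_eal_remote_launch(), with the
 * queue id passed as the argument, so worker N only ever polls queue N. */
static int
rx_worker(void *arg)
{
        const uint16_t port_id = 0;               /* single port in my test */
        const uint16_t queue_id = (uint16_t)(uintptr_t)arg;
        struct rte_mbuf *bufs[BURST_SIZE];
        uint16_t nb, i;

        for (;;) {
                nb = rte_eth_rx_burst(port_id, queue_id, bufs, BURST_SIZE);
                for (i = 0; i < nb; i++) {
                        uint32_t mark = 0;

                        /* the MARK set by the flow rule is delivered in
                         * hash.fdir.hi when PKT_RX_FDIR_ID is set */
                        if (bufs[i]->ol_flags & PKT_RX_FDIR_ID)
                                mark = bufs[i]->hash.fdir.hi;
                        handle_packet(bufs[i], mark);
                }
        }
        return 0;
}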

Arvind





* Re: [dpdk-users] rte_flow / hw-offloading is degrading performance when testing @ 100G
From: Cliff Burdick @ 2019-03-01  3:07 UTC
  To: Arvind Narayanan; +Cc: users

That's definitely interesting. Hopefully someone from Mellanox can comment
on the performance impact, since I haven't seen it qualified.

On Thu, Feb 28, 2019, 18:57 Arvind Narayanan <webguru2688@gmail.com> wrote:

>
> On Thu, Feb 28, 2019, 8:23 PM Cliff Burdick <shaklee3@gmail.com> wrote:
>
>> What size packets are you using? I've only steered to 2 rx queues by IP
>> dst match, and was able to hit 100 Gbps. That's with a 4 KB jumbo frame.
>>
>
> 64 bytes. Agreed, that is small; what seems interesting is that l3fwd can
> handle 64B but rte_flow suffers (a lot), suggesting the offloading itself
> is expensive?!
>
> I'm doing something similar, steering to different queues based on
> dst_ip. However, my tests use around 80 rules, each rule steering to one
> of the 20 rx_queues. I have a one-to-one rx_queue-to-core_id mapping.
>
> Arvind


