DPDK patches and discussions
* [dpdk-dev] Performance impact with QoS
@ 2014-11-11  0:24 satish
  2014-11-17  6:02 ` satish
  0 siblings, 1 reply; 3+ messages in thread
From: satish @ 2014-11-11  0:24 UTC (permalink / raw)
  To: dev

Hi,
I need comments on the performance impact of DPDK QoS.

We are developing an application based on DPDK.
Our application supports IPv4 forwarding with and without QoS.

Without QoS, we achieve almost full wire rate (bi-directional
traffic) with 128-, 256-, and 512-byte packets.
But when we enabled QoS, performance dropped to half for 128- and
256-byte packets.
For 512-byte packets, we did not observe any drop even after enabling QoS
(still achieving full wire rate).
The traffic used in both cases is the same (one stream whose QoS
classification maps to the first queue of traffic class 0).

In our application, we use memory buffer pools (mempools) to receive the
packet bursts; no ring buffer is used.
The same buffers are used during packet processing and TX (enqueue and
dequeue). All of the above is handled on the same core.

For normal forwarding (without QoS), we use rte_eth_tx_burst() for TX.

For forwarding with QoS, we call rte_sched_port_pkt_write(),
rte_sched_port_enqueue(), and rte_sched_port_dequeue()
before rte_eth_tx_burst().
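For context, a minimal sketch of this per-burst call sequence (illustrative
only: the subport/pipe/traffic-class/queue values, port and queue IDs, and
the burst size of 32 are hypothetical placeholders, and the signatures follow
the rte_sched/ethdev API of DPDK releases of that era, e.g.
rte_sched_port_pkt_write() without a port argument; later releases differ):

/*
 * Minimal sketch of the single-core QoS TX path described above.
 * Hypothetical values: port/queue 0, subport 0, pipe 0, traffic class 0,
 * queue 0, burst of 32.
 */
#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_sched.h>

#define BURST_SIZE 32

static void
qos_tx_path(struct rte_sched_port *sched, uint8_t port_id, uint16_t queue_id)
{
	struct rte_mbuf *pkts[BURST_SIZE];
	uint16_t nb_rx, nb_tx, i;
	int nb_deq;

	/* Receive a burst straight from the NIC (no intermediate ring). */
	nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts, BURST_SIZE);
	if (nb_rx == 0)
		return;

	/* Tag each mbuf with its position in the scheduling hierarchy. */
	for (i = 0; i < nb_rx; i++)
		rte_sched_port_pkt_write(pkts[i], 0 /* subport */, 0 /* pipe */,
					 0 /* traffic class */, 0 /* queue */,
					 e_RTE_METER_GREEN);

	/* Hand the burst to the traffic manager, then pull packets back out. */
	rte_sched_port_enqueue(sched, pkts, nb_rx);
	nb_deq = rte_sched_port_dequeue(sched, pkts, BURST_SIZE);
	if (nb_deq <= 0)
		return;

	/* Transmit whatever the scheduler released; free any leftovers. */
	nb_tx = rte_eth_tx_burst(port_id, queue_id, pkts, (uint16_t)nb_deq);
	for (i = nb_tx; i < (uint16_t)nb_deq; i++)
		rte_pktmbuf_free(pkts[i]);
}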

We understand that part of the performance dip for 128- and 256-byte packets
is because more packets per second have to be processed than for 512-byte
packets.
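For a rough sense of scale (assuming 10 GbE links, which is not stated above,
and the standard 20 bytes of preamble plus inter-frame gap per frame): a
128-byte frame occupies 148 bytes on the wire, so full wire rate is about
10^10 / (148 * 8) ~= 8.4 Mpps per direction, while a 512-byte frame occupies
532 bytes, about 2.3 Mpps; the 128-byte case therefore requires roughly 3.6
times the per-packet work.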

Can someone comment on the performance dip in my case with QoS enabled?
[1] Could this be due to inefficient use of the RTE QoS calls?
[2] Is it poor buffer management?
[3] Any other comments?

To achieve good performance in the QoS case, is it mandatory to use a worker
thread (running on a different core) connected through a ring buffer?

Please provide your comments.

Thanks in advance.

Regards,
Satish Babu

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [dpdk-dev] Performance impact with QoS
  2014-11-11  0:24 [dpdk-dev] Performance impact with QoS satish
@ 2014-11-17  6:02 ` satish
  2014-11-17 21:03   ` Dumitrescu, Cristian
  0 siblings, 1 reply; 3+ messages in thread
From: satish @ 2014-11-17  6:02 UTC (permalink / raw)
  To: dev

Hi All,
Can someone please provide comments on the queries in the mail below?

Regards,
Satish Babu

On Mon, Nov 10, 2014 at 4:24 PM, satish <nsatishbabu@gmail.com> wrote:

> Hi,
> I need comments on the performance impact of DPDK QoS.
>
> We are developing an application based on DPDK.
> Our application supports IPv4 forwarding with and without QoS.
>
> Without QoS, we achieve almost full wire rate (bi-directional
> traffic) with 128-, 256-, and 512-byte packets.
> But when we enabled QoS, performance dropped to half for 128- and
> 256-byte packets.
> For 512-byte packets, we did not observe any drop even after enabling QoS
> (still achieving full wire rate).
> The traffic used in both cases is the same (one stream whose QoS
> classification maps to the first queue of traffic class 0).
>
> In our application, we use memory buffer pools (mempools) to receive the
> packet bursts; no ring buffer is used.
> The same buffers are used during packet processing and TX (enqueue and
> dequeue). All of the above is handled on the same core.
>
> For normal forwarding (without QoS), we use rte_eth_tx_burst() for TX.
>
> For forwarding with QoS, we call rte_sched_port_pkt_write(),
> rte_sched_port_enqueue(), and rte_sched_port_dequeue()
> before rte_eth_tx_burst().
>
> We understand that part of the performance dip for 128- and 256-byte
> packets is because more packets per second have to be processed than for
> 512-byte packets.
>
> Can someone comment on the performance dip in my case with QoS enabled?
> [1] Could this be due to inefficient use of the RTE QoS calls?
> [2] Is it poor buffer management?
> [3] Any other comments?
>
> To achieve good performance in the QoS case, is it mandatory to use a
> worker thread (running on a different core) connected through a ring
> buffer?
>
> Please provide your comments.
>
> Thanks in advance.
>
> Regards,
> Satish Babu
>
>


-- 
Regards,
Satish Babu

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [dpdk-dev] Performance impact with QoS
  2014-11-17  6:02 ` satish
@ 2014-11-17 21:03   ` Dumitrescu, Cristian
  0 siblings, 0 replies; 3+ messages in thread
From: Dumitrescu, Cristian @ 2014-11-17 21:03 UTC (permalink / raw)
  To: satish, dev

Hi Satish,

The QoS traffic manager has a large memory footprint, due to the large number of packet queues (e.g. 64K queues of 64 packets each) and the large tables (e.g. 4K pipes with one cache line of context per pipe), which together far exceed the amount of CPU cache physically available. There are a lot of data structures that need to be brought into the L1 cache of the traffic manager core in order to make each scheduling decision: bitmap, pipe table entry, queue read/write pointers, queue elements, packet metadata (mbuf), etc. To minimize the penalties caused by the CPU pipeline stalling on these memory accesses, all of these data structures are prefetched.
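As a back-of-the-envelope illustration (assuming 8-byte queue slots holding mbuf pointers, which is an assumption rather than something stated above): 4K pipes x 4 traffic classes x 4 queues per traffic class gives the 64K queues mentioned, and 64K queues x 64 slots x 8 bytes is roughly 32 MB for the queue arrays alone, before counting the pipe table, the bitmap, or the mbufs themselves, i.e. already well beyond a typical shared L3 cache.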

So, the point I am trying to make is that there are a lot of critical CPU resources involved: size of the L1/L2 cache (per CPU core), size of the L3 cache (shared by all CPU cores), bandwidth of the L1/L2 cache (per core), bandwidth of the L3 cache (shared by all CPU cores), number of outstanding prefetches (per CPU core), etc.

If you map the QoS traffic manager on the same core as packet I/O (i.e. the Poll Mode Driver RX/TX), my guess is that these two I/O-intensive workloads will compete for the CPU resources listed above and will also impact each other by thrashing each other's data structures in and out of the L1/L2 cache. If you split them onto different CPU cores, their operation is more performant and more predictable, as each one now has its own L1/L2 cache.

Did you try a CPU core chaining setup (through rte_rings) similar to the qos_sched example application, e.g. RX -> (TM enqueue & dequeue) -> TX, or RX -> (TM enqueue & TM dequeue & TX)? I am sure you will find the right setup for your case by running similar experiments. Of course, the result also depends on what other work your application performs per packet.
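For illustration, a minimal sketch of such a two-core split (illustrative only, not taken from the qos_sched code itself; the ring name and size, the lcore IDs, port/queue 0 and the 3-argument ring burst calls are placeholders that reflect DPDK releases of that era, so adjust for your version):

#include <rte_ethdev.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>
#include <rte_ring.h>
#include <rte_sched.h>

#define BURST 32

static struct rte_ring *rx_to_tm; /* single-producer / single-consumer ring */

/* Core A: RX only, push bursts into the ring. */
static int
rx_core(__attribute__((unused)) void *arg)
{
	struct rte_mbuf *pkts[BURST];
	uint16_t nb_rx;
	unsigned int sent;

	for (;;) {
		nb_rx = rte_eth_rx_burst(0, 0, pkts, BURST);
		if (nb_rx == 0)
			continue;
		sent = rte_ring_sp_enqueue_burst(rx_to_tm, (void **)pkts, nb_rx);
		for (; sent < nb_rx; sent++)	/* ring full: drop the excess */
			rte_pktmbuf_free(pkts[sent]);
	}
	return 0;
}

/* Core B: TM enqueue & dequeue plus TX. */
static int
tm_tx_core(void *arg)
{
	struct rte_sched_port *sched = arg;
	struct rte_mbuf *pkts[BURST];
	unsigned int nb;
	int nb_deq;
	uint16_t nb_tx, i;

	for (;;) {
		nb = rte_ring_sc_dequeue_burst(rx_to_tm, (void **)pkts, BURST);
		if (nb > 0)
			rte_sched_port_enqueue(sched, pkts, nb);
		nb_deq = rte_sched_port_dequeue(sched, pkts, BURST);
		if (nb_deq <= 0)
			continue;
		nb_tx = rte_eth_tx_burst(0, 0, pkts, (uint16_t)nb_deq);
		for (i = nb_tx; i < (uint16_t)nb_deq; i++)
			rte_pktmbuf_free(pkts[i]);
	}
	return 0;
}

/* Setup (e.g. in main(), after EAL, mempool and port initialisation):
 *   rx_to_tm = rte_ring_create("rx_to_tm", 1024, rte_socket_id(),
 *                              RING_F_SP_ENQ | RING_F_SC_DEQ);
 *   rte_eal_remote_launch(rx_core, NULL, 1);
 *   rte_eal_remote_launch(tm_tx_core, sched_port, 2);
 */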

Regards,
Cristian

From: satish [mailto:nsatishbabu@gmail.com]
Sent: Monday, November 17, 2014 6:03 AM
To: dev@dpdk.org
Cc: Dumitrescu, Cristian
Subject: Re: Performance impact with QoS

Hi All,
Can someone please provide comments on the queries in the mail below?

Regards,
Satish Babu

On Mon, Nov 10, 2014 at 4:24 PM, satish <nsatishbabu@gmail.com> wrote:
Hi,
I need comments on the performance impact of DPDK QoS.

We are developing an application based on DPDK.
Our application supports IPv4 forwarding with and without QoS.

Without QoS, we achieve almost full wire rate (bi-directional traffic) with 128-, 256-, and 512-byte packets.
But when we enabled QoS, performance dropped to half for 128- and 256-byte packets.
For 512-byte packets, we did not observe any drop even after enabling QoS (still achieving full wire rate).
The traffic used in both cases is the same (one stream whose QoS classification maps to the first queue of traffic class 0).

In our application, we use memory buffer pools (mempools) to receive the packet bursts; no ring buffer is used.
The same buffers are used during packet processing and TX (enqueue and dequeue). All of the above is handled on the same core.

For normal forwarding (without QoS), we use rte_eth_tx_burst() for TX.

For forwarding with QoS, we call rte_sched_port_pkt_write(), rte_sched_port_enqueue(), and rte_sched_port_dequeue()
before rte_eth_tx_burst().

We understand that part of the performance dip for 128- and 256-byte packets is because
more packets per second have to be processed than for 512-byte packets.

Can someone comment on the performance dip in my case with QoS enabled?
[1] Could this be due to inefficient use of the RTE QoS calls?
[2] Is it poor buffer management?
[3] Any other comments?

To achieve good performance in the QoS case, is it mandatory to use a worker thread (running on a different core) connected through a ring buffer?

Please provide your comments.

Thanks in advance.

Regards,
Satish Babu




--
Regards,
Satish Babu

^ permalink raw reply	[flat|nested] 3+ messages in thread
