From: "Dumitrescu, Cristian" <cristian.dumitrescu@intel.com>
To: satish <nsatishbabu@gmail.com>, "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] Performance impact with QoS
Date: Mon, 17 Nov 2014 21:03:37 +0000 [thread overview]
Message-ID: <3EB4FA525960D640B5BDFFD6A3D89126322A94F4@IRSMSX154.ger.corp.intel.com> (raw)
In-Reply-To: <CADVv77qxGt3hzCWqYVS4acxR2dpMyk=xy=CdokX3XMQDVCMaFA@mail.gmail.com>
Hi Satish,
The QoS traffic manager has a large memory footprint, due to the large number of packet queues (e.g. 64K queues of 64 packets each) and the large tables (e.g. 4K pipes with one cache line of context per pipe), which far exceeds the amount of CPU cache physically available. Many data structures need to be brought into the L1 cache of the traffic manager core in order to make a scheduling decision: the bitmap, the pipe table entry, the queue read/write pointers, the queue elements, the packet metadata (mbuf), etc. To minimize the penalty of the CPU pipeline stalling on memory accesses, all of these data structures are prefetched.
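Just to illustrate the idea (a simplified sketch, not the actual librte_sched code; the two structures below are placeholders for the scheduler's real internal state), the inner loop issues the prefetches a scheduling decision depends on, so the loads overlap with work done on data prefetched in a previous iteration:

#include <stdint.h>
#include <rte_prefetch.h>
#include <rte_mbuf.h>

/* Hypothetical stand-ins for the scheduler's internal state; the real
 * rte_sched structures are larger and private to librte_sched. */
struct pipe_ctx   { uint64_t tb_time; uint32_t tb_credits; };
struct queue_ptrs { uint16_t qr; uint16_t qw; };

/* Prefetch everything one scheduling decision touches. */
static inline void
prefetch_for_schedule(struct pipe_ctx *pipe, struct queue_ptrs *q,
                      struct rte_mbuf *next_pkt)
{
    rte_prefetch0(pipe);     /* pipe table entry (one cache line per pipe) */
    rte_prefetch0(q);        /* queue read/write pointers                  */
    rte_prefetch0(next_pkt); /* packet metadata (mbuf)                     */
}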
So, the point I am trying to make is that there are a lot of critical CPU resources involved: size of L1/L2 cache (per CPU core), size of L3 cache (shared by all CPU cores), bandwidth of L1/L2 cache (per core), bandwidth of L3 cache (shared by all CPU cores), number of outstanding prefetches (per CPU core), etc.
If you map the QoS traffic manager on the same core as packet I/O (i.e. the Poll Mode Driver RX/TX), my guess is that these two I/O-intensive workloads will compete for the CPU resources listed above and will also impact each other by thrashing each other's data structures in and out of the L1/L2 cache. If you split them onto different CPU cores, their operation is faster and more predictable, as each one now has its own L1/L2 cache.
Did you try a CPU core chaining setup (through rte_rings) similar to the qos_sched application, e.g. RX -> (TM enqueue & dequeue) -> TX or RX -> (TM enqueue & TM dequeue & TX)? I am sure you will find the right setup by conducting similar experiments. Of course, the result also depends on what other work your application is performing.
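As a rough sketch of the second layout (an RX core feeding a combined TM enqueue/dequeue/TX core through an rte_ring; the port/queue IDs are placeholders, error handling and freeing of unsent mbufs are omitted, and the exact burst function prototypes depend on your DPDK version):

#include <rte_ring.h>
#include <rte_mbuf.h>
#include <rte_ethdev.h>
#include <rte_sched.h>

#define BURST 32

/* RX core: poll the NIC and pass bursts to the TM core through a ring. */
static void
rx_core(uint8_t port_id, struct rte_ring *ring)
{
    struct rte_mbuf *pkts[BURST];

    for (;;) {
        uint16_t n = rte_eth_rx_burst(port_id, 0, pkts, BURST);
        if (n > 0)
            rte_ring_sp_enqueue_burst(ring, (void **)pkts, n);
    }
}

/* TM + TX core: pull bursts from the ring, run them through the
 * scheduler, and transmit whatever the scheduler releases. */
static void
tm_tx_core(struct rte_sched_port *sched, struct rte_ring *ring,
           uint8_t port_id)
{
    struct rte_mbuf *pkts[BURST];
    unsigned n;

    for (;;) {
        n = rte_ring_sc_dequeue_burst(ring, (void **)pkts, BURST);
        if (n > 0)
            rte_sched_port_enqueue(sched, pkts, n);
        n = rte_sched_port_dequeue(sched, pkts, BURST);
        if (n > 0)
            rte_eth_tx_burst(port_id, 0, pkts, n);
    }
}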
Regards,
Cristian
From: satish [mailto:nsatishbabu@gmail.com]
Sent: Monday, November 17, 2014 6:03 AM
To: dev@dpdk.org
Cc: Dumitrescu, Cristian
Subject: Re: Performance impact with QoS
Hi All,
Can someone please provide comments on the queries in the mail below?
Regards,
Satish Babu
On Mon, Nov 10, 2014 at 4:24 PM, satish <nsatishbabu@gmail.com> wrote:
Hi,
I need comments on performance impact with DPDK-QoS.
We are working on developing an application based on DPDK.
Our application supports IPv4 forwarding with and without QoS.
Without QoS, we are achieving almost full wire rate (bi-directional traffic) with 128-, 256- and 512-byte packets.
But when we enable QoS, performance drops to half for 128- and 256-byte packets.
For 512-byte packets, we did not observe any drop even after enabling QoS (still achieving full wire rate).
The traffic used in both cases is the same (one stream with a QoS match to the first queue in traffic class 0).
In our application, we use memory buffer pools to receive the packet bursts (no ring buffer is used).
The same buffers are used during packet processing and TX (enqueue and dequeue). All of the above is handled on the same core.
For normal forwarding (without QoS), we use rte_eth_tx_burst() for TX.
For forwarding with QoS, we call rte_sched_port_pkt_write(), rte_sched_port_enqueue() and rte_sched_port_dequeue()
before rte_eth_tx_burst().
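Roughly, that path looks like the following sketch (classification hard-coded to subport 0 / pipe 0 / traffic class 0 / queue 0; the rte_sched_port_pkt_write() prototype and the color enum differ between DPDK releases, and handling of unsent mbufs is omitted):

#include <rte_sched.h>
#include <rte_meter.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST 32

/* Single-core QoS TX path: tag each mbuf with its scheduler destination,
 * push the burst into the scheduler, then transmit whatever it releases. */
static void
qos_tx(struct rte_sched_port *sched, uint8_t port_id,
       struct rte_mbuf **rx_pkts, uint16_t n_rx)
{
    struct rte_mbuf *tx_pkts[BURST];
    uint16_t i;
    int n_tx;

    for (i = 0; i < n_rx; i++)
        rte_sched_port_pkt_write(rx_pkts[i], 0, 0, 0, 0,
                                 e_RTE_METER_GREEN);

    rte_sched_port_enqueue(sched, rx_pkts, n_rx);   /* drops are freed inside */
    n_tx = rte_sched_port_dequeue(sched, tx_pkts, BURST);
    if (n_tx > 0)
        rte_eth_tx_burst(port_id, 0, tx_pkts, n_tx);
}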
We understand that the performance dip in the case of 128- and 256-byte packets is because
more packets have to be processed per second compared to 512-byte packets.
Can someone comment on the performance dip in my case with QoS enabled?
[1] Can this be because of inefficient use of the RTE calls for QoS?
[2] Is it poor buffer management?
[3] Any other comments?
To achieve good performance in the QoS case, is it necessary to use a worker thread (running on a different core) with a ring buffer?
Please provide your comments.
Thanks in advance.
Regards,
Satish Babu
--
Regards,
Satish Babu
--------------------------------------------------------------
Intel Shannon Limited
Registered in Ireland
Registered Office: Collinstown Industrial Park, Leixlip, County Kildare
Registered Number: 308263
Business address: Dromore House, East Park, Shannon, Co. Clare
This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies.