* [dpdk-users] Significant performance degradation when using tx buffers rather than rte_eth_tx_burst
@ 2020-07-08 17:23 Bev SCHWARTZ
2020-07-08 20:42 ` Suraj R Gupta
0 siblings, 1 reply; 4+ messages in thread
From: Bev SCHWARTZ @ 2020-07-08 17:23 UTC (permalink / raw)
To: users
I am writing a bridge using DPDK, where traffic read from one port is transmitted to the other. Here is the core of the program, based on basicfwd.c.
while (!force_quit) {
    nb_rx = rte_eth_rx_burst(rx_port, rx_queue, bufs, BURST_SIZE);
    for (i = 0; i < nb_rx; i++) {
        /* inspect packet */
    }
    nb_tx = rte_eth_tx_burst(tx_port, tx_queue, bufs, nb_rx);
    for (i = nb_tx; i < nb_rx; i++) {
        rte_pktmbuf_free(bufs[i]);
    }
}
(A bunch of error checking and such left out for brevity.)
This worked great, I got bandwidth equivalent to using a Linux Bridge.
I then tried using tx buffers instead. (Initialization code left out for brevity.) Here is the new loop.
while (!force_quit) {
    nb_rx = rte_eth_rx_burst(rx_port, rx_queue, bufs, BURST_SIZE);
    for (i = 0; i < nb_rx; i++) {
        /* inspect packet */
        rte_eth_tx_buffer(tx_port, tx_queue, tx_buffer, bufs[i]);
    }
    rte_eth_tx_buffer_flush(tx_port, tx_queue, tx_buffer);
}
(Once again, error checking left out for brevity.)
I am running this on 8 cores; each core has its own loop (and its own tx_buffer).
If I have well-balanced traffic across the cores, my performance goes down about 5%. If I have unbalanced traffic, such as all traffic coming from a single flow, my performance goes down 80%, from about 10 Gbps to 2 Gbps.
I want to stress that the ONLY thing that changed in this code is how I transmit packets. Everything else is the same.
Any idea why this would cause such a degradation in bit rate?
-Bev
* Re: [dpdk-users] Significant performance degradation when using tx buffers rather than rte_eth_tx_burst
2020-07-08 17:23 [dpdk-users] Significant performance degradation when using tx buffers rather than rte_eth_tx_burst Bev SCHWARTZ
@ 2020-07-08 20:42 ` Suraj R Gupta
2020-07-13 6:32 ` Manish Kumar
0 siblings, 1 reply; 4+ messages in thread
From: Suraj R Gupta @ 2020-07-08 20:42 UTC (permalink / raw)
To: Bev SCHWARTZ; +Cc: users
Hi Bev,
If my understanding is right, rte_eth_tx_burst transmits the specified
number of packets immediately, while rte_eth_tx_buffer only queues each
packet in a software buffer for the port; the buffered packets are
transmitted only when the buffer fills or rte_eth_tx_buffer_flush is
called. Since you are buffering packets one by one and then calling
flush, this may have contributed to the delay.
Thanks and Regards
Suraj R Gupta
On Wed, Jul 8, 2020 at 10:53 PM Bev SCHWARTZ <bev.schwartz@raytheon.com>
wrote:
> I am writing a bridge using DPDK, where I have traffic read from one port
> transmitted to the other. Here is the core of the program, based on
> basicfwd.c.
>
> while (!force_quit) {
>     nb_rx = rte_eth_rx_burst(rx_port, rx_queue, bufs, BURST_SIZE);
>     for (i = 0; i < nb_rx; i++) {
>         /* inspect packet */
>     }
>     nb_tx = rte_eth_tx_burst(tx_port, tx_queue, bufs, nb_rx);
>     for (i = nb_tx; i < nb_rx; i++) {
>         rte_pktmbuf_free(bufs[i]);
>     }
> }
>
> (A bunch of error checking and such left out for brevity.)
>
> This worked great, I got bandwidth equivalent to using a Linux Bridge.
>
> I then tried using tx buffers instead. (Initialization code left out for
> brevity.) Here is the new loop.
>
> while (!force_quit) {
>     nb_rx = rte_eth_rx_burst(rx_port, rx_queue, bufs, BURST_SIZE);
>     for (i = 0; i < nb_rx; i++) {
>         /* inspect packet */
>         rte_eth_tx_buffer(tx_port, tx_queue, tx_buffer, bufs[i]);
>     }
>     rte_eth_tx_buffer_flush(tx_port, tx_queue, tx_buffer);
> }
>
> (Once again, error checking left out for brevity.)
>
> I am running this on 8 cores, each core has its own loop. (tx_buffer is
> created for each core.)
>
> If I have well-balanced traffic across the cores, my performance goes
> down about 5%. If I have unbalanced traffic, such as all traffic coming
> from a single flow, my performance goes down 80%, from about 10 Gbps to
> 2 Gbps.
>
> I want to stress that the ONLY thing that changed in this code is changing
> how I transmit packets. Everything else is the same.
>
> Any idea why this would cause such a degradation in bit rate?
>
> -Bev
--
Thanks and Regards
Suraj R Gupta
* Re: [dpdk-users] Significant performance degradation when using tx buffers rather than rte_eth_tx_burst
2020-07-08 20:42 ` Suraj R Gupta
@ 2020-07-13 6:32 ` Manish Kumar
2020-07-13 17:18 ` Bev SCHWARTZ
0 siblings, 1 reply; 4+ messages in thread
From: Manish Kumar @ 2020-07-13 6:32 UTC (permalink / raw)
To: Suraj R Gupta; +Cc: Bev SCHWARTZ, users
I agree with Suraj. @Bev: were you trying the rte_eth_tx_buffer
function just as an experiment? As per your email, you already got good
performance with the rte_eth_tx_burst function.
Regards
Manish
On Wed, Jul 8, 2020 at 1:42 PM Suraj R Gupta <surajrgupta@iith.ac.in> wrote:
> Hi Bev,
> If my understanding is right, rte_eth_tx_burst transmits the specified
> number of packets immediately, while rte_eth_tx_buffer only queues each
> packet in a software buffer for the port; the buffered packets are
> transmitted only when the buffer fills or rte_eth_tx_buffer_flush is
> called. Since you are buffering packets one by one and then calling
> flush, this may have contributed to the delay.
> Thanks and Regards
> Suraj R Gupta
>
>
> On Wed, Jul 8, 2020 at 10:53 PM Bev SCHWARTZ <bev.schwartz@raytheon.com>
> wrote:
>
> > I am writing a bridge using DPDK, where I have traffic read from one port
> > transmitted to the other. Here is the core of the program, based on
> > basicfwd.c.
> >
> > while (!force_quit) {
> >     nb_rx = rte_eth_rx_burst(rx_port, rx_queue, bufs, BURST_SIZE);
> >     for (i = 0; i < nb_rx; i++) {
> >         /* inspect packet */
> >     }
> >     nb_tx = rte_eth_tx_burst(tx_port, tx_queue, bufs, nb_rx);
> >     for (i = nb_tx; i < nb_rx; i++) {
> >         rte_pktmbuf_free(bufs[i]);
> >     }
> > }
> >
> > (A bunch of error checking and such left out for brevity.)
> >
> > This worked great, I got bandwidth equivalent to using a Linux Bridge.
> >
> > I then tried using tx buffers instead. (Initialization code left out for
> > brevity.) Here is the new loop.
> >
> > while (!force_quit) {
> >     nb_rx = rte_eth_rx_burst(rx_port, rx_queue, bufs, BURST_SIZE);
> >     for (i = 0; i < nb_rx; i++) {
> >         /* inspect packet */
> >         rte_eth_tx_buffer(tx_port, tx_queue, tx_buffer, bufs[i]);
> >     }
> >     rte_eth_tx_buffer_flush(tx_port, tx_queue, tx_buffer);
> > }
> >
> > (Once again, error checking left out for brevity.)
> >
> > I am running this on 8 cores, each core has its own loop. (tx_buffer is
> > created for each core.)
> >
> > If I have well-balanced traffic across the cores, my performance goes
> > down about 5%. If I have unbalanced traffic, such as all traffic
> > coming from a single flow, my performance goes down 80%, from about
> > 10 Gbps to 2 Gbps.
> >
> > I want to stress that the ONLY thing that changed in this code is
> > changing how I transmit packets. Everything else is the same.
> >
> > Any idea why this would cause such a degradation in bit rate?
> >
> > -Bev
>
>
>
> --
> Thanks and Regards
> Suraj R Gupta
>
--
Thanks
Manish Kumar
* Re: [dpdk-users] Significant performance degradation when using tx buffers rather than rte_eth_tx_burst
2020-07-13 6:32 ` Manish Kumar
@ 2020-07-13 17:18 ` Bev SCHWARTZ
0 siblings, 0 replies; 4+ messages in thread
From: Bev SCHWARTZ @ 2020-07-13 17:18 UTC (permalink / raw)
To: Manish Kumar, Suraj R Gupta; +Cc: users
I am writing a bridge program. Originally, I based my implementation on skeleton/basicfwd.c. I then wanted to support multiple cores, so I used l2fwd.c as a simple model for tying queues to cores. However, l2fwd.c uses rte_eth_tx_buffer. Not understanding enough about DPDK, I switched to rte_eth_tx_buffer because I wrongly thought it had to be used with multiple cores.
I have changed my code back to using rte_eth_tx_burst, and that has solved my problem. However, on very unbalanced traffic, using rte_eth_tx_buffer caused an 80% performance degradation. That seems rather extreme for such a small change, so I was inquiring to see if people understood why. And given this degradation, I'm surprised that l2fwd uses rte_eth_tx_buffer instead of rte_eth_tx_burst.
-Bev
________________________________________
From: Manish Kumar <manish.jangid08@gmail.com>
Sent: Monday, July 13, 2020 2:32 AM
To: Suraj R Gupta
Cc: Bev SCHWARTZ; users@dpdk.org
Subject: [External] Re: [dpdk-users] Significant performance degradation when using tx buffers rather than rte_eth_tx_burst
I agree with Suraj. @Bev: were you trying the rte_eth_tx_buffer function just as an experiment? As per your email, you already got good performance with the rte_eth_tx_burst function.
Regards
Manish
On Wed, Jul 8, 2020 at 1:42 PM Suraj R Gupta <surajrgupta@iith.ac.in> wrote:
Hi Bev,
If my understanding is right, rte_eth_tx_burst transmits the specified
number of packets immediately, while rte_eth_tx_buffer only queues each
packet in a software buffer for the port; the buffered packets are
transmitted only when the buffer fills or rte_eth_tx_buffer_flush is
called. Since you are buffering packets one by one and then calling
flush, this may have contributed to the delay.
Thanks and Regards
Suraj R Gupta
On Wed, Jul 8, 2020 at 10:53 PM Bev SCHWARTZ <bev.schwartz@raytheon.com>
wrote:
> I am writing a bridge using DPDK, where I have traffic read from one port
> transmitted to the other. Here is the core of the program, based on
> basicfwd.c.
>
> while (!force_quit) {
>     nb_rx = rte_eth_rx_burst(rx_port, rx_queue, bufs, BURST_SIZE);
>     for (i = 0; i < nb_rx; i++) {
>         /* inspect packet */
>     }
>     nb_tx = rte_eth_tx_burst(tx_port, tx_queue, bufs, nb_rx);
>     for (i = nb_tx; i < nb_rx; i++) {
>         rte_pktmbuf_free(bufs[i]);
>     }
> }
>
> (A bunch of error checking and such left out for brevity.)
>
> This worked great, I got bandwidth equivalent to using a Linux Bridge.
>
> I then tried using tx buffers instead. (Initialization code left out for
> brevity.) Here is the new loop.
>
> while (!force_quit) {
>     nb_rx = rte_eth_rx_burst(rx_port, rx_queue, bufs, BURST_SIZE);
>     for (i = 0; i < nb_rx; i++) {
>         /* inspect packet */
>         rte_eth_tx_buffer(tx_port, tx_queue, tx_buffer, bufs[i]);
>     }
>     rte_eth_tx_buffer_flush(tx_port, tx_queue, tx_buffer);
> }
>
> (Once again, error checking left out for brevity.)
>
> I am running this on 8 cores, each core has its own loop. (tx_buffer is
> created for each core.)
>
> If I have well-balanced traffic across the cores, my performance goes
> down about 5%. If I have unbalanced traffic, such as all traffic coming
> from a single flow, my performance goes down 80%, from about 10 Gbps to
> 2 Gbps.
>
> I want to stress that the ONLY thing that changed in this code is changing
> how I transmit packets. Everything else is the same.
>
> Any idea why this would cause such a degradation in bit rate?
>
> -Bev
--
Thanks and Regards
Suraj R Gupta
--
Thanks
Manish Kumar