* [dpdk-users] Significant performance degradation when using tx buffers rather than rte_eth_tx_burst
@ 2020-07-08 17:23 Bev SCHWARTZ
  2020-07-08 20:42 ` Suraj R Gupta
  0 siblings, 1 reply; 4+ messages in thread

From: Bev SCHWARTZ @ 2020-07-08 17:23 UTC (permalink / raw)
To: users

I am writing a bridge using DPDK, where traffic read from one port is transmitted to the other. Here is the core of the program, based on basicfwd.c:

    while (!force_quit) {
        nb_rx = rte_eth_rx_burst(rx_port, rx_queue, bufs, BURST_SIZE);
        for (i = 0; i < nb_rx; i++) {
            /* inspect packet */
        }
        nb_tx = rte_eth_tx_burst(tx_port, tx_queue, bufs, nb_rx);
        for (i = nb_tx; i < nb_rx; i++) {
            rte_pktmbuf_free(bufs[i]);
        }
    }

(A bunch of error checking and such left out for brevity.)

This worked great; I got bandwidth equivalent to using a Linux bridge.

I then tried using tx buffers instead. (Initialization code left out for brevity.) Here is the new loop:

    while (!force_quit) {
        nb_rx = rte_eth_rx_burst(rx_port, rx_queue, bufs, BURST_SIZE);
        for (i = 0; i < nb_rx; i++) {
            /* inspect packet */
            rte_eth_tx_buffer(tx_port, tx_queue, tx_buffer, bufs[i]);
        }
        rte_eth_tx_buffer_flush(tx_port, tx_queue, tx_buffer);
    }

(Once again, error checking left out for brevity.)

I am running this on 8 cores; each core has its own loop, and a tx_buffer is created for each core.

If I have well-balanced traffic across the cores, my performance goes down about 5% or so. If I have unbalanced traffic, such as all traffic coming from a single flow, my performance goes down 80%, from about 10 Gb/s to 2 Gb/s.

I want to stress that the ONLY thing that changed in this code is how I transmit packets. Everything else is the same.

Any idea why this would cause such a degradation in bit rate?

-Bev

^ permalink raw reply	[flat|nested] 4+ messages in thread
* Re: [dpdk-users] Significant performance degradation when using tx buffers rather than rte_eth_tx_burst
  2020-07-08 17:23 [dpdk-users] Significant performance degradation when using tx buffers rather than rte_eth_tx_burst Bev SCHWARTZ
@ 2020-07-08 20:42 ` Suraj R Gupta
  2020-07-13  6:32   ` Manish Kumar
  0 siblings, 1 reply; 4+ messages in thread

From: Suraj R Gupta @ 2020-07-08 20:42 UTC (permalink / raw)
To: Bev SCHWARTZ; +Cc: users

Hi Bev,

If my understanding is right, rte_eth_tx_burst transmits the specified number of output packets immediately. rte_eth_tx_buffer, by contrast, buffers the packet in the queue of the port; buffered packets are transmitted only when the buffer fills or rte_eth_tx_buffer_flush is called. Since you are buffering packets one by one and then calling flush, this may have contributed to the delay.

Thanks and Regards
Suraj R Gupta

On Wed, Jul 8, 2020 at 10:53 PM Bev SCHWARTZ <bev.schwartz@raytheon.com> wrote:

> I am writing a bridge using DPDK, where traffic read from one port is
> transmitted to the other. Here is the core of the program, based on
> basicfwd.c.
> [...]
>
> Any idea why this would cause such a degradation in bit rate?
>
> -Bev

--
Thanks and Regards
Suraj R Gupta
* Re: [dpdk-users] Significant performance degradation when using tx buffers rather than rte_eth_tx_burst
  2020-07-08 20:42 ` Suraj R Gupta
@ 2020-07-13  6:32   ` Manish Kumar
  2020-07-13 17:18     ` Bev SCHWARTZ
  0 siblings, 1 reply; 4+ messages in thread

From: Manish Kumar @ 2020-07-13 6:32 UTC (permalink / raw)
To: Suraj R Gupta; +Cc: Bev SCHWARTZ, users

I agree with Suraj on this.

@Bev: Were you trying the rte_eth_tx_buffer function just as an experiment? Per your email, you already got good performance with the rte_eth_tx_burst function.

Regards
Manish

On Wed, Jul 8, 2020 at 1:42 PM Suraj R Gupta <surajrgupta@iith.ac.in> wrote:

> Hi Bev,
> If my understanding is right, rte_eth_tx_burst transmits the specified
> number of output packets immediately. rte_eth_tx_buffer, by contrast,
> buffers the packet in the queue of the port; buffered packets are
> transmitted only when the buffer fills or rte_eth_tx_buffer_flush is
> called.
> [...]

--
Thanks
Manish Kumar
* Re: [dpdk-users] Significant performance degradation when using tx buffers rather than rte_eth_tx_burst
  2020-07-13  6:32 ` Manish Kumar
@ 2020-07-13 17:18   ` Bev SCHWARTZ
  0 siblings, 0 replies; 4+ messages in thread

From: Bev SCHWARTZ @ 2020-07-13 17:18 UTC (permalink / raw)
To: Manish Kumar, Suraj R Gupta; +Cc: users

I am writing a bridge program. Originally, I based my implementation on skeleton/basicfwd.c. I next wanted to support multi-core, so I found l2fwd.c as a simple model for tying queues to cores. However, l2fwd.c uses rte_eth_tx_buffer. Not understanding enough about DPDK, I switched over to rte_eth_tx_buffer because I wrongly thought it had to be used with multi-core.

I have changed my code back to rte_eth_tx_burst, and that has solved my problem. However, on very unbalanced traffic, using rte_eth_tx_buffer caused an 80% performance degradation. That seems rather extreme for such a small change, so I was inquiring to see if people understood why. And given this degradation, I'm surprised that l2fwd uses rte_eth_tx_buffer instead of rte_eth_tx_burst.

-Bev

________________________________________
From: Manish Kumar <manish.jangid08@gmail.com>
Sent: Monday, July 13, 2020 2:32 AM
To: Suraj R Gupta
Cc: Bev SCHWARTZ; users@dpdk.org
Subject: [External] Re: [dpdk-users] Significant performance degradation when using tx buffers rather than rte_eth_tx_burst

I agree with Suraj on this.

@Bev: Were you trying the rte_eth_tx_buffer function just as an experiment? Per your email, you already got good performance with the rte_eth_tx_burst function.

Regards
Manish

[...]
end of thread, other threads: [~2020-07-13 17:18 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-08 17:23 [dpdk-users] Significant performance degradation when using tx buffers rather than rte_eth_tx_burst Bev SCHWARTZ
2020-07-08 20:42 ` Suraj R Gupta
2020-07-13  6:32   ` Manish Kumar
2020-07-13 17:18     ` Bev SCHWARTZ