DPDK usage discussions
* rte_pktmbuf_free_bulk vs rte_pktmbuf_free
From: Filip Janiszewski @ 2022-01-11 12:12 UTC
  To: users

Hi,

Is there any specific reason why using rte_pktmbuf_free_bulk seems to be
much slower than rte_pktmbuf_free in a loop? (DPDK 21.11)

I ran a bunch of tests on a 50GbE link where I'm getting packet drops
(running with too few RX cores on purpose, to do some performance
verification), and when the time comes to release the packets, I made a
quick change like this:

.
            //rte_pktmbuf_free_bulk( data, pkt_cnt );
            for( idx = 0 ; idx < pkt_cnt ; ++idx ) {
                rte_pktmbuf_free( data[ idx ] );
            }
.

And suddenly I'm dropping around 10% fewer packets (the traffic rate is
~95Mpps). In case it's relevant, RX from the NIC is done on a different
core from the one where the packets are processed and released.

I also did the following experiment: I found the Mpps rate at which I
get around 2-5% drops using rte_pktmbuf_free_bulk and took a bunch of
readings, all of which consistently showed drops. Then I switched to the
loop with rte_pktmbuf_free and ran the same tests again, and all of a
sudden I no longer get any drops.

Isn't this strange? I was sure rte_pktmbuf_free_bulk would be kind of
optimized for bulk releases so people don't have to loop themselves.
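
To make the expectation above concrete, here is a toy model in plain C of
why a bulk free is usually expected to be cheaper than a loop: it batches
buffers that belong to the same pool and returns each batch with a single
pool operation. This is only a sketch, not DPDK's implementation; the
names toy_pool, toy_mbuf, toy_free_loop, toy_free_bulk and the batch size
TOY_PENDING_SZ are all hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PENDING_SZ 64 /* assumed batch size, for this sketch only */

struct toy_pool {
    unsigned long put_calls;     /* number of "put" operations on the pool */
    unsigned long objs_returned; /* number of buffers handed back */
};

struct toy_mbuf {
    struct toy_pool *pool; /* pool this buffer was allocated from */
};

/* Loop variant: one pool operation per buffer. */
static void toy_free_loop(struct toy_mbuf **bufs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (bufs[i] == NULL)
            continue;
        bufs[i]->pool->put_calls++;
        bufs[i]->pool->objs_returned++;
    }
}

/* Return `pending` buffers to `pool` with a single pool operation. */
static void toy_flush(struct toy_pool *pool, size_t pending)
{
    if (pending == 0)
        return;
    assert(pool != NULL);
    pool->put_calls++;
    pool->objs_returned += pending;
}

/* Bulk variant: batch consecutive buffers from the same pool and flush
 * when the pool changes or the batch is full. */
static void toy_free_bulk(struct toy_mbuf **bufs, size_t n)
{
    struct toy_pool *cur = NULL;
    size_t pending = 0;

    for (size_t i = 0; i < n; i++) {
        if (bufs[i] == NULL)
            continue;
        if (bufs[i]->pool != cur || pending == TOY_PENDING_SZ) {
            toy_flush(cur, pending);
            cur = bufs[i]->pool;
            pending = 0;
        }
        pending++;
    }
    toy_flush(cur, pending);
}
```

With 256 buffers from one pool, the loop variant performs 256 pool
operations and the bulk variant performs 4 in this model. The model only
counts operations; it says nothing about caching or timing, which is
where the observed behavior may come from.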

Thanks

-- 
BR, Filip
+48 666 369 823


* Re: rte_pktmbuf_free_bulk vs rte_pktmbuf_free
From: Stephen Hemminger @ 2022-01-11 18:02 UTC
  To: Filip Janiszewski; +Cc: users

Is your mbuf pool close to exhausted? How big is your bulk size?
It might be that, with larger bulk sizes, the loop is giving packets
back that instantly get consumed by incoming packets. So either the pool
is almost empty or the non-bulk version is keeping packets in the cache
more.
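
As a rough illustration of the caching idea, the sketch below models a
per-core object cache sitting in front of a shared pool: small puts stay
in the local cache until a flush threshold is reached, while a put larger
than some limit bypasses the cache and goes straight to the shared store.
This is a simplified toy model, not DPDK's mempool code; the struct
toy_cache, the function toy_cache_put and all field names and sizes are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct toy_cache {
    size_t size;            /* objects kept locally after a flush */
    size_t flush_thresh;    /* flush once len reaches this value */
    size_t max_bulk;        /* puts larger than this bypass the cache */
    size_t len;             /* objects currently in the local cache */
    size_t ring_len;        /* objects in the shared backing store */
    unsigned long ring_ops; /* operations on the shared backing store */
};

/* Return n objects, either to the local cache or to the shared store. */
static void toy_cache_put(struct toy_cache *c, size_t n)
{
    assert(c->flush_thresh > c->size);

    if (n > c->max_bulk) {
        /* Too large for the cache: hand everything to the shared store. */
        c->ring_len += n;
        c->ring_ops++;
        return;
    }

    c->len += n;
    if (c->len >= c->flush_thresh) {
        /* Flush only the excess and keep `size` objects locally. */
        c->ring_len += c->len - c->size;
        c->ring_ops++;
        c->len = c->size;
    }
}
```

In this model, 300 single-object puts with size 256 and flush_thresh 384
never touch the shared store, whereas one put of 1024 objects with
max_bulk 512 goes entirely to the shared store. Whether anything like
this explains the measurements depends on the actual cache size and bulk
size in use, which is why those two questions matter.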


* Re: rte_pktmbuf_free_bulk vs rte_pktmbuf_free
From: Filip Janiszewski @ 2022-01-12  6:32 UTC
  To: Stephen Hemminger; +Cc: users

On 1/11/22 7:02 PM, Stephen Hemminger wrote:
> Is your mbuf pool close to exhausted? How big is your bulk size?
> It might be that, with larger bulk sizes, the loop is giving packets
> back that instantly get consumed by incoming packets. So either the pool
> is almost empty or the non-bulk version is keeping packets in the cache
> more.
> 

Well, yes, once it starts dropping the buffer is full, but for quite a
while before that event the mem usage is pretty low.

In fact, I've added a few diagnostics, and here they are for the
rte_pktmbuf_free_bulk test:

.
Mem usage: 0.244141%, captured 0 pkts
Mem usage: 0.244141%, captured 0 pkts
Mem usage: 0.241852%, captured 11,681,034 pkts
Mem usage: 0.243807%, captured 44,327,015 pkts
Mem usage: 0.243855%, captured 78,834,947 pkts
Mem usage: 0.243235%, captured 113,343,787 pkts
Mem usage: 0.246191%, captured 147,867,507 pkts
Mem usage: 0.264502%, captured 182,367,926 pkts
Mem usage: 0.244856%, captured 216,917,982 pkts
Mem usage: 0.248837%, captured 251,445,720 pkts
Mem usage: 0.257087%, captured 285,985,575 pkts
Mem usage: 0.338078%, captured 320,509,279 pkts
Mem usage: 0.362778%, captured 355,016,693 pkts
Mem usage: 0.692415%, captured 389,521,441 pkts
Mem usage: 52.050495%, captured 424,066,179 pkts
Mem usage: 99.960041%, captured 456,936,573 pkts // DROPPING FROM HERE
Mem usage: 99.962330%, captured 485,568,660 pkts
Mem usage: 0.241208%, captured 491,178,294 pkts
Mem usage: 0.241208%, captured 491,178,294 pkts
.

The % value is the pool usage; it's an 8M-item pool. As you can see,
all of a sudden it sharply gets exhausted and it never recovers (the test
stops at 500M packets). Please note the prints have a 500ms interval.
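
For reading the numbers above, the helpers below sketch the arithmetic in
plain C. They assume that "8M" means 2^23 = 8,388,608 items, which the
message does not state, and the function names are hypothetical. How the
percentage was obtained is also not stated; in a DPDK application the
in-use count could come from rte_mempool_in_use_count(), but that is an
assumption about the diagnostics, not something the output shows.

```c
#include <assert.h>
#include <stdint.h>

/* Pool usage as a percentage of the pool size. */
static double pool_usage_percent(uint64_t in_use, uint64_t pool_size)
{
    assert(pool_size > 0);
    return 100.0 * (double)in_use / (double)pool_size;
}

/* Invert the percentage back to an object count, rounded to nearest. */
static uint64_t pool_in_use_from_percent(double percent, uint64_t pool_size)
{
    assert(percent >= 0.0);
    return (uint64_t)(percent / 100.0 * (double)pool_size + 0.5);
}

/* Capture rate in Mpps between two consecutive "captured" samples. */
static double capture_rate_mpps(uint64_t prev_pkts, uint64_t cur_pkts,
                                double interval_s)
{
    assert(interval_s > 0.0);
    assert(cur_pkts >= prev_pkts);
    return (double)(cur_pkts - prev_pkts) / interval_s / 1e6;
}
```

Under the 2^23 assumption, the idle reading of 0.244141% corresponds to
20,480 buffers in use, and two consecutive samples such as 44,327,015 and
78,834,947 packets taken 500ms apart correspond to a capture rate of
roughly 69 Mpps.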

Attempting the same test with 1 billion packets leads to a similar
result: the pool is exhausted after a while and there are plenty of drops:

.
Mem usage: 0.244141%, captured 0 pkts
Mem usage: 0.244141%, captured 0 pkts
Mem usage: 0.242686%, captured 1,994,944 pkts
Mem usage: 0.243521%, captured 23,094,546 pkts
Mem usage: 0.350094%, captured 57,594,139 pkts
Mem usage: 0.245333%, captured 92,103,632 pkts
Mem usage: 0.243330%, captured 126,616,534 pkts
Mem usage: 0.244308%, captured 161,136,760 pkts
Mem usage: 0.244093%, captured 195,633,863 pkts
Mem usage: 0.245523%, captured 230,149,916 pkts
Mem usage: 0.249910%, captured 264,648,839 pkts
Mem usage: 0.258422%, captured 299,165,901 pkts
Mem usage: 0.301266%, captured 333,678,228 pkts
Mem usage: 0.425720%, captured 368,197,372 pkts
Mem usage: 0.542426%, captured 402,699,822 pkts
Mem usage: 21.447337%, captured 437,244,879 pkts
Mem usage: 86.296201%, captured 471,804,014 pkts
Mem usage: 99.958158%, captured 501,730,958 pkts // DROPPING FROM HERE
Mem usage: 99.954629%, captured 529,462,253 pkts
Mem usage: 99.958587%, captured 556,391,644 pkts
Mem usage: 99.932027%, captured 582,999,427 pkts
Mem usage: 99.959493%, captured 609,456,194 pkts
Mem usage: 99.959779%, captured 635,641,696 pkts
Mem usage: 99.958920%, captured 661,792,475 pkts
Mem usage: 99.954844%, captured 687,919,194 pkts
Mem usage: 99.957728%, captured 713,992,293 pkts
Mem usage: 99.960685%, captured 740,042,732 pkts
Mem usage: 99.956965%, captured 766,240,304 pkts
Mem usage: 99.960780%, captured 792,423,477 pkts
Mem usage: 99.960351%, captured 818,629,881 pkts
Mem usage: 99.959016%, captured 844,904,955 pkts
Mem usage: 99.960637%, captured 871,162,327 pkts
Mem usage: 0.241995%, captured 878,826,100 pkts
Mem usage: 0.241995%, captured 878,826,100 pkts
.

I can fix the issue by switching from rte_pktmbuf_free_bulk to
rte_pktmbuf_free in a loop (no drops at all, no matter how many
packets I capture).

I would like to understand this issue better. I'm also a little confused
about why the performance degrades after a while. You're suggesting
there's some packet caching going on; can you elaborate on that?

Thanks

-- 
BR, Filip
+48 666 369 823
