DPDK usage discussions
* Unexpected behavior when using mbuf pool with external buffers
@ 2021-12-22  9:56 Michał Niciejewski
  2021-12-22 10:24 ` Van Haaren, Harry
  2021-12-22 12:28 ` Gábor LENCSE
  0 siblings, 2 replies; 6+ messages in thread
From: Michał Niciejewski @ 2021-12-22  9:56 UTC (permalink / raw)
  To: users

Hi,

recently I stumbled upon a problem with an mbuf pool that uses external
buffers. I allocated some memory with aligned_alloc(), registered it with
DPDK, DMA-mapped it, and created the mbuf pool:

size_t mem_size = RTE_ALIGN_CEIL(MBUFS_NUM * QUEUE_NUM * RTE_MBUF_DEFAULT_BUF_SIZE, 4096);
auto mem = aligned_alloc(4096, mem_size);
mlock(mem, mem_size);
rte_pktmbuf_extmem ext_mem = {
    .buf_ptr = mem,
    .buf_iova = (uintptr_t)mem,  // use the virtual address as the IOVA; the DMA mapping below makes it valid for the device
    .buf_len = mem_size,
    .elt_size = RTE_MBUF_DEFAULT_BUF_SIZE,
};

if (rte_extmem_register(ext_mem.buf_ptr, ext_mem.buf_len, nullptr, 0, 4096) != 0)
    throw runtime_error("Failed to register DPDK external memory");

if (rte_dev_dma_map(dev, ext_mem.buf_ptr, ext_mem.buf_iova, ext_mem.buf_len) != 0)
    throw runtime_error("Failed to DMA map external memory");

mp = rte_pktmbuf_pool_create_extbuf("ext_mbuf_pool", MBUFS_NUM * QUEUE_NUM, 0, 0,
                                    RTE_MBUF_DEFAULT_BUF_SIZE, rte_eth_dev_socket_id(0),
                                    &ext_mem, 1);
if (mp == nullptr)
    throw runtime_error("Failed to create external mbuf pool");

The main loop of the program works like a normal l2fwd: it receives packets
on one port and sends them out on another.

std::vector<rte_mbuf *> mbufs(MAX_PKT_BURST);
while (true) {
    auto rx_num = rte_eth_rx_burst(0, queue, mbufs.data(), MAX_PKT_BURST);
    if (!rx_num)
        continue;
    // ...
    auto tx_num = rte_eth_tx_burst(1, queue, mbufs.data(), rx_num);
    rte_pktmbuf_free_bulk(mbufs.data() + tx_num, rx_num - tx_num);
}
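
The per-second statistics shown below are collected with simple counters
around these calls; a rough sketch with placeholder names (the real code
is in the gist linked at the end):

// bookkeeping around the rx/tx calls (rx_calls is incremented before the
// early continue, the rest only for non-empty bursts)
rx_calls++;                       // "Number of all rx burst calls"
if (rx_num) {
    nonzero_rx_calls++;           // "Number of non-zero rx burst calls"
    received += rx_num;           // "All received pkts"
    sent += tx_num;               // "All sent pkts"
    dropped += rx_num - tx_num;   // "All dropped pkts"
}
// once per second: print the counters and reset them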

Every second, the program prints statistics about the packets received
during that second and about the rte_eth_tx_burst() calls. For example,
here are the logs while receiving and forwarding 10 Mpps:

Number of all rx burst calls: 12238365
Number of non-zero rx burst calls: 966834
Avg pkt nb received per rx burst: 0.816879
All received pkts: 9997264
All sent pkts: 9997264
All dropped pkts: 0

At lower rates everything looks fine, but when I start sending more
packets some unexpected behavior occurs. When I increase the traffic to
15 Mpps, most of the packets are dropped on TX:

Queue: 0
Number of rx burst calls: 4449541
Number of non-zero rx burst calls: 1616833
Avg pkt nb received per rx burst: 3.36962
All received pkts: 14993272
All sent pkts: 5827744
All dropped pkts: 9165528

After that, I checked the 10 Mpps case again. Even though the application
previously had no trouble handling 10 Mpps, now it does:

Queue: 0
Number of all rx burst calls: 8722385
Number of non-zero rx burst calls: 1447741
Avg pkt nb received per rx burst: 1.14617
All received pkts: 9997316
All sent pkts: 8194416
All dropped pkts: 1802900

So basically it looks like sending too much traffic breaks something, and
the problems then persist even when sending fewer packets.

I also tried backing the mbuf pool with huge pages instead of the memory
returned by aligned_alloc():

auto mem = mmap(nullptr, mem_size, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
if (mem == MAP_FAILED)
    throw runtime_error("Failed to mmap huge pages");

And it actually solved the problem: high traffic no longer affects how
lower traffic is handled afterwards. But I would still like to know why
memory allocated with aligned_alloc() causes problems, because in the
place where I want to use mbuf pools with external buffers, huge pages
cannot be used like that.

The full code used for testing:
https://gist.github.com/tropuq/22625e0e5ac420a8ff5ae072a16f4c06

NIC used: Supermicro AOC-S25G-I2S-O Std Low Profile 25G Dual Port SFP28,
based on Intel XXV710

Has anyone had similar issues, or does anyone know what could cause such
behavior? Is this allocation of the mbuf pool correct, or am I missing
something?

Thanks in advance

-- 

Michał Niciejewski

Junior Software Engineer

michal.niciejewski@codilime.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: Unexpected behavior when using mbuf pool with external buffers
  2021-12-22  9:56 Unexpected behavior when using mbuf pool with external buffers Michał Niciejewski
@ 2021-12-22 10:24 ` Van Haaren, Harry
  2021-12-22 16:30   ` Michał Niciejewski
  2021-12-22 12:28 ` Gábor LENCSE
  1 sibling, 1 reply; 6+ messages in thread
From: Van Haaren, Harry @ 2021-12-22 10:24 UTC (permalink / raw)
  To: Michał Niciejewski, users

Hi Michal,

I'll "top post" on this reply as the content is in HTML format below. In future, please try to send plain-text emails to DPDK mailing lists.

Regarding the issue you're having, it's interesting that allocating from hugepage-backed memory "solves" the problem, even when going
back to the lower traffic rate. The main difference for a CPU accessing hugepage-backed versus 4 KiB-page-backed memory is the DTLB[1] pressure.
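To put rough numbers on it: 64 MB of buffer memory spans 16,384 pages at 4 KiB but only 32 pages at 2 MiB,
so only the huge-page case fits comfortably in a DTLB (second-level TLBs typically hold on the order of a thousand or two 4 KiB entries).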

In your scenario, both page sizes work equally well at the start (no drops). This is likely because all buffers are being accessed linearly,
resulting in good re-use of buffers.

Let's discuss the 4 KiB page scenario:
When the rate is turned up, packets are dropped, and the CPU(s) cannot keep up. This results in the NIC RX descriptor rings being completely
full of used packets, and the mempool that contains the buffers becomes more "fragmented", in that not every buffer is on the same
4 KiB page anymore. In the worst case, each mbuf could be on a _different_ 4 KiB page!

I think that when the rate is turned back down, the fragmentation of mbufs in the mempool remains, resulting in the continued loss of packets.
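
One quick way to see this from inside the application (just a sketch, not something taken from your code) is to count how many distinct
4 KiB pages the buffers of a single RX burst touch, and log that next to the per-second stats:

// requires #include <unordered_set>
std::unordered_set<uintptr_t> pages;
for (uint16_t i = 0; i < rx_num; ++i)
    pages.insert((uintptr_t)mbufs[i]->buf_addr >> 12);  // 4 KiB page index
// pages.size() near 1-2 per burst means well-packed buffers;
// near rx_num means the mempool has become fragmented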

Estimating and talking is never conclusive, so let's measure using the Linux "perf" tool. Run this command 3x, at the same points where you collected the drop stats.
I expect to see lower dTLB-load-misses on the first run (no drops, 10 Mpps), and the dTLB misses to be higher for 15 Mpps *and* for the 10 Mpps run afterwards.
perf stat -e cycles,dTLB-load-misses -C <datapath_lcore_here> -- sleep 1

Please try the commands, and report back your findings! Hope that helps, -Harry

[1] TLB & DPDK Resources;
https://en.wikipedia.org/wiki/Translation_lookaside_buffer (DTLB just means Data-TLB, as opposed to instruction-TLB)
https://stackoverflow.com/questions/52077230/huge-number-of-dtlb-load-misses-when-dpdk-forwarding-test
https://www.dpdk.org/wp-content/uploads/sites/35/2018/12/LeiJiayu_Revise-4K-Pages-Performance-Impact-For-DPDK-Applications.pdf

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Unexpected behavior when using mbuf pool with external buffers
  2021-12-22  9:56 Unexpected behavior when using mbuf pool with external buffers Michał Niciejewski
  2021-12-22 10:24 ` Van Haaren, Harry
@ 2021-12-22 12:28 ` Gábor LENCSE
  1 sibling, 0 replies; 6+ messages in thread
From: Gábor LENCSE @ 2021-12-22 12:28 UTC (permalink / raw)
  To: users

Dear Michal,

On 12/22/2021 10:56 AM, Michał Niciejewski wrote:
>     auto tx_num = rte_eth_tx_burst(1, queue, mbufs.data(), rx_num);

I suspect that the frame is sometimes simply not transmitted at high rates.

The rte_eth_tx_burst() function reports the number of frames actually 
accepted for transmission. I usually send a single frame using the following loop:

while ( !rte_eth_tx_burst(eth_id, 0, &pkt_mbuf, 1) )
   ;
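
Applied to your burst-based loop, the same retry idea would look roughly
like this (untested sketch); it trades TX drops for back-pressure on RX:

uint16_t sent = 0;
while (sent < rx_num)
    sent += rte_eth_tx_burst(1, queue, mbufs.data() + sent, rx_num - sent);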

> Did anyone have similar issues or know what could cause such behavior?

I met a somewhat similar issue when I implemented the first version of 
siitperf: https://github.com/lencsegabor/siitperf

At that time I was not yet aware of RFC 4814, so I always transmitted the 
very same frame for throughput measurements, but I used a counter to be 
able to uniquely identify the test frames for the packet delay variation 
measurement. I wanted to use the frame as a template, and I modified its 
counter (and checksum) field after sending. I found that even when the 
rte_eth_tx_burst() function reported that the frame had been sent, it was 
sometimes still in the buffer, and I had already overwritten its counter. 
(The receiver reported no frame with counter value 0, and 2 frames with 
the highest value of the counter.)
To circumvent the problem, I used an array of frames and always modified 
the next element (modulo the array size), roughly as sketched below.
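
In code, the workaround is essentially a small ring of pre-built template 
frames; a simplified sketch of the idea (update_fields() is a hypothetical 
helper, and the real siitperf code handles the details differently):

constexpr size_t N = 64;          // ring size, chosen arbitrarily here
rte_mbuf *templates[N];           // N copies of the same pre-built test frame
size_t next = 0;
// per transmission:
rte_mbuf *pkt = templates[next];
update_fields(pkt);               // rewrite counter and checksum of this copy
rte_mbuf_refcnt_update(pkt, 1);   // keep ownership across the transmit
while ( !rte_eth_tx_burst(eth_id, 0, &pkt, 1) )
   ;
next = (next + 1) % N;            // this copy is left untouched for N more sends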

Best regards,

Gábor

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Unexpected behavior when using mbuf pool with external buffers
  2021-12-22 10:24 ` Van Haaren, Harry
@ 2021-12-22 16:30   ` Michał Niciejewski
  2022-01-18 13:41     ` Michał Niciejewski
  0 siblings, 1 reply; 6+ messages in thread
From: Michał Niciejewski @ 2021-12-22 16:30 UTC (permalink / raw)
  To: Van Haaren, Harry; +Cc: users

Thank you for the reply,

On Wed, Dec 22, 2021 at 11:24 AM Van Haaren, Harry
<harry.van.haaren@intel.com> wrote:
> I'll "top post" on this reply as the content is in HTML format below. In future, please try to send plain-text emails to DPDK mailing lists.

I hope it's better now.

> Estimating and talking is never conclusive – lets measure using Linux "Perf" tool. Run this command 3x, just like you posted the drop stats below.
>
> I expect to see lower dTLB-load-misses on the first run (no drops, 10 mpps), and that the dTLB misses are higher for 15 mpps *and* for 10 mpps again afterwards.
>
> perf stat -e cycles,dTLB-load-misses -C <datapath_lcore_here> -- sleep 1

extbuf, aligned_alloc, 10mpps, first run
 Performance counter stats for 'CPU(s) 0':
       2404553948      cycles
              461      dTLB-load-misses
      1.001938861 seconds time elapsed

extbuf, aligned_alloc, 15mpps
 Performance counter stats for 'CPU(s) 0':
       2404518710      cycles
              466      dTLB-load-misses
      1.001920171 seconds time elapsed

extbuf, aligned_alloc, 10mpps, second run
 Performance counter stats for 'CPU(s) 0':
       2402586106      cycles
              449      dTLB-load-misses
      1.001114692 seconds time elapsed

I also checked what happens when there is no traffic at all and the
results are similar:

 Performance counter stats for 'CPU(s) 0':
       2949935339      cycles
              465      dTLB-load-misses
      1.002236168 seconds time elapsed

Also, I checked how the application behaves when adding the --no-huge
option and using a normal mbuf pool. The results are very different
compared to aligned_alloc + extbuf mbuf pool:

10mpps, --no-huge
 Performance counter stats for 'CPU(s) 0':
       2402616160      cycles
         17980033      dTLB-load-misses
      1.001125954 seconds time elapsed

Application logs:
Queue: 0
Number of all rx burst calls: 5757205
Number of non-zero rx burst calls: 1073081
Avg pkt nb received per rx burst: 1.7364
All received pkts: 9996804
All sent pkts: 8074460
All dropped pkts: 1922344

-- 

Michał Niciejewski

Junior Software Engineer

michal.niciejewski@codilime.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Unexpected behavior when using mbuf pool with external buffers
  2021-12-22 16:30   ` Michał Niciejewski
@ 2022-01-18 13:41     ` Michał Niciejewski
  0 siblings, 0 replies; 6+ messages in thread
From: Michał Niciejewski @ 2022-01-18 13:41 UTC (permalink / raw)
  To: Van Haaren, Harry; +Cc: users

Hi,

based on the materials you provided, I found that the IOTLB is the
bottleneck (not the TLB). I am DMA-mapping the memory, so if I understand
correctly, only the IOMMU is involved here. Below I post some outputs
from the pcm tool:

10mpps, first run
IOTLB Hit - 25 M
IOTLB Miss - 10 M

15mpps
IOTLB Hit - 28 M
IOTLB Miss - 20 M

10mpps, second run
IOTLB Hit - 23 M
IOTLB Miss - 18 M

I also tested the same scenario on another, more powerful server, and
the results differ greatly:

10mpps, first run
IOTLB Hit - 36 M
IOTLB Miss - 644 K

25mpps (here I had to send more packets before drops appeared)
IOTLB Hit - 71 M
IOTLB Miss - 3860 K

10mpps, second run
IOTLB Hit - 36 M
IOTLB Miss - 1047 K

So the problems with mempool fragmentation are visible, but they are not
as painful as on the first server. It looks like the first server is much
worse in terms of IOMMU performance than the second one. I disabled the
IOMMU and used physical addresses to create the external-buffer mempool
(https://gist.github.com/tropuq/55c334bf3a2ab86b89a0b59e42b8af08), and
that solved the performance issues.
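
For reference, the extmem API does allow passing a per-page IOVA table,
which is the building block for the physical-address variant; a simplified
sketch (not the exact code from the gist, and it glosses over how buf_iova
for the pool is then chosen):

size_t page_sz = 4096;
unsigned n_pages = mem_size / page_sz;
std::vector<rte_iova_t> iovas(n_pages);
for (unsigned i = 0; i < n_pages; ++i)
    iovas[i] = rte_mem_virt2phy((char *)mem + i * page_sz); // needs root (/proc/self/pagemap)
if (rte_extmem_register(mem, mem_size, iovas.data(), n_pages, page_sz) != 0)
    throw runtime_error("Failed to register DPDK external memory");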

I have some questions about these results:
1. Is there something wrong with the IOMMU in the first server - could it
be that it's missing some additional configuration?
2. Is it normal to see such big differences?
3. Is there any way to find some info about the IOMMU, like the size of
the IOTLB, etc.?

Thanks,
Michał Niciejewski

^ permalink raw reply	[flat|nested] 6+ messages in thread


end of thread, other threads:[~2022-01-18 13:41 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-22  9:56 Unexpected behavior when using mbuf pool with external buffers Michał Niciejewski
2021-12-22 10:24 ` Van Haaren, Harry
2021-12-22 16:30   ` Michał Niciejewski
2022-01-18 13:41     ` Michał Niciejewski
2021-12-22 12:28 ` Gábor LENCSE

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).