* Re: mbuf refcnt issue (Fabio Fernandes)
@ 2025-04-06 2:24 Fabio Fernandes
From: Fabio Fernandes @ 2025-04-06 2:24 UTC (permalink / raw)
To: users
Hi Ed,
Are you using RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE?
This flag is not compatible with manual refcount manipulation, since PMDs are not required to check the refcount when freeing an mbuf.
Regards,
Fabio Fernandes
On Saturday, April 5th, 2025 at 7:00 AM, users-request@dpdk.org <users-request@dpdk.org> wrote:
>
>
> Today's Topics:
>
> 1. Re: mbuf refcnt issue (Dmitry Kozlyuk)
> 2. Re: hugepages on both sockets (Dmitry Kozlyuk)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sat, 5 Apr 2025 01:29:05 +0300
> From: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
> To: "Lombardo, Ed" <Ed.Lombardo@netscout.com>, "users@dpdk.org" <users@dpdk.org>
> Subject: Re: mbuf refcnt issue
> Message-ID: <af1f01dc-8ab1-4d39-9b29-93448e97057b@gmail.com>
> Content-Type: text/plain; charset=UTF-8; format=flowed
>
> Hi Ed,
>
> On 05.04.2025 01:00, Lombardo, Ed wrote:
>
> > Hi,
> >
> > I have an application where we receive packets and transmit them. The
> > packet data is inspected and later the mbuf is freed to the mempool.
> >
> > The pipeline is such that the Rx packet mbuf is saved to the rx worker
> > ring, then the application threads process the packets and decide
> > whether to transmit the packet; if so, the mbuf refcnt is incremented
> > to a value of 2.
>
> Do I understand the pipeline correctly?
>
> Rx thread:
>
>     receive mbuf
>     put mbuf into the ring
>     inspect mbuf
>     free mbuf
>
> Worker thread:
>
>     take mbuf from the ring
>     if decided to transmit it,
>         increment refcnt
>         transmit mbuf
>
> If so, there's a problem: after the Rx thread puts the mbuf into the
> ring, the mbuf is owned by both the Rx thread and the ring, so its
> refcnt must be 2 when it enters the ring:
>
> Rx thread:
>
>     receive mbuf
>     increment refcnt
>     put mbuf into the ring
>     inspect mbuf
>     free mbuf (just decrements refcnt if > 1)
>
>
> Worker thread:
>
>     take mbuf from the ring
>     if decided to transmit it,
>         transmit (or put into the bulk transmitted later)
>     else
>         free mbuf (just decrements refcnt if > 1)
>
> > The batch of mbufs to transmit is put in a Tx ring queue for the Tx
> > thread to pull from and call rte_eth_tx_burst() with the batch of
> > mbufs (limited to 400 mbufs). In theory the transmit operation will
> > decrement the mbuf refcnt. In our application we could see the tx of
> > the mbuf followed by another application thread that calls to free
> > the mbufs, or vice versa. We have no way to synchronize these
> > threads.
> >
> > Are the mbuf refcnt updates thread-safe, to allow non-deterministic
> > handling of the mbufs among multiple threads? The decision to
> > transmit the mbuf, increment the mbuf refcnt, and load it into the tx
> > ring is completed before the application says it is finished and
> > frees the mbufs.
>
> Have you validated this assumption?
> If my understanding above is correct, there's no synchronization and
> thus no guarantees.
>
> > I am seeing in my error-checking code that the mbuf refcnt contains
> > large values like 65520, 65529, 65530, 65534, 65535 in the early
> > pipeline-stage refcnt checks.
> >
> > I read online and in the DPDK code that the mbuf refcnt update is
> > atomic and thread-safe; so, this is good.
> >
> > Now this part is unclear to me: when rte_eth_tx_burst() is called
> > and returns the number of packets transmitted, does this mean that
> > transmission of the packets is completed and the mbuf refcnt is
> > decremented by 1 on return? Or is the Tx engine queue merely
> > populated, with the mbuf refcnt not decremented until the packet is
> > actually transmitted, or much later in time?
> >
> > Is the DPDK Tx operation intended to be the last stage of any
> > pipeline that will free the mbuf if successfully transmitted?
>
> Return from rte_eth_tx_burst() means that the mbufs are queued for
> transmission. Hardware completes transmission asynchronously.
> The next call to rte_eth_tx_burst() will poll the HW,
> learn the status of the mbufs previously queued,
> and call rte_pktmbuf_free() for those that have been transmitted.
> The latter frees mbufs to the mempool if and only if refcnt == 1.
>
>
> ------------------------------
>
> Message: 2
> Date: Sat, 5 Apr 2025 01:39:47 +0300
> From: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
> To: "Lombardo, Ed" <Ed.Lombardo@netscout.com>, "users@dpdk.org" <users@dpdk.org>
> Subject: Re: hugepages on both sockets
> Message-ID: <e66932f6-5c01-4b62-91fa-41f0b9b2bd1d@gmail.com>
> Content-Type: text/plain; charset=UTF-8; format=flowed
>
> Hi Ed,
>
> On 05.04.2025 01:24, Lombardo, Ed wrote:
>
> > Hi,
> >
> > I tried to pass into dpdk_eal_init() the argument
> > --socket-mem=2048,2048 and I get a segmentation fault when the
> > rte_strsplit() function is called:
> >
> >     arg_num = rte_strsplit(strval, len,
> >                            arg, RTE_MAX_NUMA_NODES, ',');
>
> Please forgive me for the stupid question:
> "strval" points to a mutable buffer, like char strval[] = "2048,2048",
> not char *strval = "2048,2048"?
>
> > If I pass "--socket_mem=2048" or "--socket-mem=2048", rte_eal_init()
> > does not complain.
> >
> > Not sure if this ensures both CPU sockets will host two 1G
> > hugepages? I suspect it doesn't, because I only see rtemap_0 and
> > rtemap_1 in the /mnt/huge directory. I think I should see four total.
> >
> > # /opt/dpdk/dpdk-hugepages.py -s
> >
> > Node  Pages  Size  Total
> > 0     2      1Gb   2Gb
> > 1     2      1Gb   2Gb
> >
> > I don't know if I should believe the above output showing 2Gb on
> > NUMA nodes 0 and 1.
>
> You are correct: --socket-mem=2048 allocates 2048 MB total, spread
> across nodes.
>
>
> End of users Digest, Vol 484, Issue 3
> *************************************