Date: Sun, 06 Apr 2025 02:24:09 +0000
To: users@dpdk.org
From: Fabio Fernandes
Subject: Re: mbuf refcnt issue (Fabio Fernandes)

Hi Ed,

Are you using RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE? This flag is not compatible with manual refcount handling, since PMDs are not required to check the refcount when freeing the mbuf.
Regards,
Fabio Fernandes

Sent with Proton Mail secure email.

On Saturday, April 5th, 2025 at 7:00 AM, users-request@dpdk.org wrote:

> Send users mailing list submissions to
> users@dpdk.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://mails.dpdk.org/listinfo/users
> or, via email, send a message with subject or body 'help' to
> users-request@dpdk.org
>
> You can reach the person managing the list at
> users-owner@dpdk.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of users digest..."
>
> Today's Topics:
>
>    1. Re: mbuf refcnt issue (Dmitry Kozlyuk)
>    2. Re: hugepages on both sockets (Dmitry Kozlyuk)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sat, 5 Apr 2025 01:29:05 +0300
> From: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
> To: "Lombardo, Ed" <Ed.Lombardo@netscout.com>, "users@dpdk.org" <users@dpdk.org>
> Subject: Re: mbuf refcnt issue
>
> Hi Ed,
>
> On 05.04.2025 01:00, Lombardo, Ed wrote:
>
> > Hi,
> >
> > I have an application where we receive packets and transmit them. The
> > packet data is inspected and later the mbuf is freed to the mempool.
> >
> > The pipeline is such that the Rx packet mbuf is saved to the rx worker
> > ring, then the application threads process the packets and decide
> > whether to transmit the packet; if so, the mbuf refcnt is incremented
> > to a value of 2.
>
> Do I understand the pipeline correctly?
>
> Rx thread:
>
>     receive mbuf
>     put mbuf into the ring
>     inspect mbuf
>     free mbuf
>
> Worker thread:
>
>     take mbuf from the ring
>     if decided to transmit it,
>         increment refcnt
>         transmit mbuf
>
> If so, there's a problem: after the Rx thread puts the mbuf into the ring,
> the mbuf is owned by both the Rx thread and the ring, so its refcnt must
> be 2 when it enters the ring:
>
> Rx thread:
>
>     receive mbuf
>     increment refcnt
>     put mbuf into the ring
>     inspect mbuf
>     free mbuf (just decrements refcnt if > 1)
>
> Worker thread:
>
>     take mbuf from the ring
>     if decided to transmit it,
>         transmit (or put into the bulk transmitted later)
>     else
>         free mbuf (just decrements refcnt if > 1)
>
> > The batch of mbufs to transmit is put in a Tx ring queue for the Tx
> > thread to pull from and call rte_eth_tx_burst() with the batch of
> > mbufs (limited to 400 mbufs). In theory the transmit operation will
> > decrement the mbuf refcnt. In our application we could see the tx of
> > the mbuf followed by another application thread that calls to free
> > the mbufs, or vice versa. We have no way to synchronize these threads.
> >
> > Are the mbuf refcnt updates thread safe, allowing non-deterministic
> > handling of the mbufs among multiple threads? The decision to transmit
> > the mbuf, increment the mbuf refcnt, and load it into the tx ring is
> > completed before the application says it is finished and frees the
> > mbufs.
>
> Have you validated this assumption?
> If my understanding above is correct, there's no synchronization and
> thus no guarantees.
>
> > I am seeing in my error-checking code that the mbuf refcnt contains
> > large values like 65520, 65529, 65530, 65534, 65535 in the early
> > pipeline stage refcnt checks.
> >
> > I read online and in the DPDK code that the mbuf refcnt update is
> > atomic and thread safe; so, this is good.
> > Now this part is unclear to me: when rte_eth_tx_burst() is called and
> > returns the number of packets transmitted, does this mean that
> > transmission of the packets is complete and the mbuf refcnt is
> > decremented by 1 on return? Or is the Tx engine queue merely
> > populated, with the mbuf refcnt not decremented until the packet is
> > actually transmitted, or perhaps much later?
> >
> > Is the DPDK Tx operation intended to be the last stage of any
> > pipeline, freeing the mbuf once it is successfully transmitted?
>
> Return from rte_eth_tx_burst() means that mbufs are queued for transmission.
> Hardware completes transmission asynchronously.
> The next call to rte_eth_tx_burst() will poll HW,
> learn the status of mbufs previously queued,
> and call rte_pktmbuf_free() for those that are transmitted.
> The latter will free mbufs to the mempool if and only if refcnt == 1.
>
> ------------------------------
>
> Message: 2
> Date: Sat, 5 Apr 2025 01:39:47 +0300
> From: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
> To: "Lombardo, Ed" <Ed.Lombardo@netscout.com>, "users@dpdk.org" <users@dpdk.org>
> Subject: Re: hugepages on both sockets
>
> Hi Ed,
>
> On 05.04.2025 01:24, Lombardo, Ed wrote:
>
> > Hi,
> >
> > I tried to pass into rte_eal_init() the argument --socket-mem=2048,2048
> > and I get a segmentation fault when the rte_strsplit() function is
> > called:
> >
> >     arg_num = rte_strsplit(strval, len, arg, RTE_MAX_NUMA_NODES, ',');
>
> Please forgive me for the stupid question:
> "strval" points to a mutable buffer, like char strval[] = "2048,2048",
> not char *strval = "2048,2048"?
>
> > If I pass "--socket_mem=2048" or "--socket-mem=2048", rte_eal_init()
> > does not complain.
> >
> > I am not sure this ensures both CPU sockets will host two 1G
> > hugepages. I suspect it doesn't, because I only see rtemap_0 and
> > rtemap_1 in the /mnt/huge directory. I think I should see four in
> > total.
> >
> > # /opt/dpdk/dpdk-hugepages.py -s
> >
> > Node Pages Size Total
> > 0    2     1Gb  2Gb
> > 1    2     1Gb  2Gb
> >
> > I don't know if I should believe the above output showing 2Gb on NUMA
> > nodes 0 and 1.
>
> You are correct, --socket-mem=2048 allocates 2048 MB total, spreading
> between nodes.
>
>
> End of users Digest, Vol 484, Issue 3
> *************************************