* mbufs getting reused despite nonzero refcnt
From: Alan Beadle @ 2024-11-10 16:23 UTC
To: users

Hi everyone,

I am using DPDK to send two-way traffic between a pair of machines. My application has local readers, remote acknowledgments, and automatic retries when a packet is lost. For these reasons I am using rte_mbuf_refcnt_update() to prevent the NIC from freeing the packet and recycling the mbuf before my local readers are done and the remote reader has acknowledged the message. I was advised to do this in an earlier thread on this mailing list.

However, this does not seem to be working. After running my app for a while and exchanging about 1000 messages in this way, my queue of unacknowledged mbufs is getting corrupted. The mbufs attached to my queue seem to contain data for newer messages than what is supposed to be in them, and in some cases contain a totally different type of packet (an acknack, for instance). As a result, retries of those messages fail to send the correct data and my application gets stuck.

I have ensured that the refcount is not reaching 0. Each new mbuf immediately has the refcnt incremented by 1. I was concerned that retries might need the refcnt bumped again, but if I bump the refcount every time I resend a specific mbuf to the NIC, the refcounts just keep getting higher. So it looks like re-bumping it on a resend is not necessary.

I have ruled out other possible explanations. The mbufs are being reused by rte_pktmbuf_alloc. I even tried playing with the EAL settings related to the number of mbuf descriptors and saw my changes directly correlate with how long it takes this problem to occur.

How do I really prevent the driver from reusing packets that I still might need to resend?

Thanks in advance,
-Alan

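The reference accounting described in this message can be modeled in a few lines. The sketch below is a toy model, not DPDK code: `toy_mbuf`, `toy_alloc`, `toy_ref`, `toy_free`, `toy_transmit`, and `survives` are hypothetical names, and it assumes that every handoff to the transmit path ends with exactly one free by the driver once the frame has gone out. Under that assumption, one extra reference covers one transmission, so a retry that reuses the same buffer without a further reference would return it to the pool.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for an mbuf: only a reference count and a pool flag. */
struct toy_mbuf {
    int refcnt;    /* number of outstanding owners */
    bool in_pool;  /* true once the buffer may be handed out again */
};

/* Hypothetical allocator: a fresh buffer has exactly one owner. */
static void toy_alloc(struct toy_mbuf *m)
{
    m->refcnt = 1;
    m->in_pool = false;
}

/* Take one additional reference (models a +1 refcnt update). */
static void toy_ref(struct toy_mbuf *m)
{
    m->refcnt++;
}

/* Drop one reference; the buffer returns to the pool at zero. */
static void toy_free(struct toy_mbuf *m)
{
    assert(m->refcnt > 0);
    if (--m->refcnt == 0)
        m->in_pool = true;
}

/* ASSUMPTION: each transmission ends with one free by the "driver". */
static void toy_transmit(struct toy_mbuf *m)
{
    toy_free(m);
}

/* Returns true if the buffer is still owned by the application after
 * `sends` transmissions while holding `extra_refs` extra references. */
static bool survives(int extra_refs, int sends)
{
    struct toy_mbuf m;

    toy_alloc(&m);
    for (int i = 0; i < extra_refs; i++)
        toy_ref(&m);
    for (int i = 0; i < sends; i++) {
        if (m.in_pool)
            return false;  /* already recycled before this send */
        toy_transmit(&m);
    }
    return !m.in_pool;
}
```

In this model `survives(1, 1)` holds, `survives(1, 2)` does not, and `survives(2, 2)` holds again. If a driver releases transmitted buffers lazily, a count read shortly after a resend could look too high even when it balances out later; whether that applies here is not established in this thread.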
* Re: mbufs getting reused despite nonzero refcnt
From: Stephen Hemminger @ 2024-11-10 17:12 UTC
To: Alan Beadle; +Cc: users

On Sun, 10 Nov 2024 11:23:29 -0500
Alan Beadle <ab.beadle@gmail.com> wrote:

> How do I really prevent the driver from reusing packets that I still might
> need to resend?

Which driver? It could be a driver bug.

Also, you should be able to trace mbuf functions, either with rte_trace or with another facility.

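As an illustration of the "other facility" route, the sketch below records allocate, reference, and free events per buffer together with the call site, then computes a net ownership balance. It is a self-contained toy and does not use rte_trace or any DPDK call; `toy_record`, `TRACE`, `toy_balance`, and the event names are hypothetical. In a real application the `TRACE` calls would sit next to the corresponding mbuf operations.

```c
#include <assert.h>

#define LOG_CAP 64  /* ring size; older events are overwritten */

enum toy_op { OP_ALLOC, OP_REF, OP_FREE };

struct toy_event {
    enum toy_op op;
    int buf_id;  /* application-chosen identifier for the buffer */
    int line;    /* source line of the call site */
};

static struct toy_event toy_log[LOG_CAP];
static unsigned toy_log_len;

/* Append one event to the ring log. */
static void toy_record(enum toy_op op, int buf_id, int line)
{
    assert(buf_id >= 0);
    toy_log[toy_log_len % LOG_CAP] = (struct toy_event){ op, buf_id, line };
    toy_log_len++;
}

/* Record an event tagged with the current source line. */
#define TRACE(op, id) toy_record((op), (id), __LINE__)

/* Net balance for one buffer over the retained events:
 * +1 per alloc or ref, -1 per free. A negative value means more
 * frees than references were logged. Only meaningful while the
 * ring has not wrapped. */
static int toy_balance(int buf_id)
{
    unsigned n = toy_log_len < LOG_CAP ? toy_log_len : LOG_CAP;
    int bal = 0;

    for (unsigned i = 0; i < n; i++) {
        if (toy_log[i].buf_id != buf_id)
            continue;
        bal += (toy_log[i].op == OP_FREE) ? -1 : 1;
    }
    return bal;
}
```

A balance that drops to zero or below while the buffer is still on the unacknowledged queue would point at the call site that released it, since each event keeps its line number.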
* Re: mbufs getting reused despite nonzero refcnt
From: Alan Beadle @ 2024-11-10 17:31 UTC
To: Stephen Hemminger; +Cc: users

I'm using the vfio-pci module with Intel X550-T2 NICs. I believe this means it will use the ixgbe driver? To be honest, I am a bit confused about the use of drivers in DPDK. I am using the first setup that I got to work and send/receive packets. Additional tips would be greatly appreciated.

After loading the vfio-pci module I run

    dpdk-devbind.py --bind vfio-pci 65:00.1

and then I just use the standard DPDK API calls in my app. I was meaning to revisit this once my app was more complete.

* Re: mbufs getting reused despite nonzero refcnt
From: Alan Beadle @ 2024-11-12 13:02 UTC
To: Stephen Hemminger; +Cc: users

Is there anything in the usage I described in my previous email which might explain this problem? Is there anything else wrong with what I'm doing driver-wise?

* Re: mbufs getting reused despite nonzero refcnt
From: Alan Beadle @ 2024-11-14 16:14 UTC
To: Stephen Hemminger; +Cc: users

I think I have a theory about what might be going wrong now, but I have a question. Is there any circumstance under which rte_eth_rx_burst() will return mbufs which were previously sent by the same machine? That is, might my application be "receiving" its own packets?

It looks like this may be the case. Since dropped packets get freed, and packets not addressed to the machine are dropped, I might be accidentally freeing mbufs when they are "received" on the same machine that sent them.

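One way to test this hypothesis is to classify each received frame by its Ethernet addresses before any free happens, and count frames whose source address is the local one. The sketch below is a self-contained illustration under stated assumptions: `toy_eth_hdr`, `rx_verdict`, and `classify_rx` are hypothetical names, the header layout is a simplified stand-in rather than DPDK's own structure, and broadcast or multicast destinations are not handled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_ETH_ALEN 6

/* Simplified Ethernet header: destination first, then source. */
struct toy_eth_hdr {
    uint8_t dst[TOY_ETH_ALEN];
    uint8_t src[TOY_ETH_ALEN];
    uint16_t ether_type;
};

enum rx_verdict {
    RX_ACCEPT,           /* addressed to us, sent by someone else */
    RX_DROP_NOT_FOR_US,  /* unicast frame for another station */
    RX_OWN_FRAME         /* source address is our own */
};

/* Decide what to do with a received frame, given our own address.
 * The own-frame check comes first so that such frames are counted
 * separately instead of falling into the generic drop path. */
static enum rx_verdict classify_rx(const struct toy_eth_hdr *h,
                                   const uint8_t own[TOY_ETH_ALEN])
{
    assert(h != NULL && own != NULL);
    if (memcmp(h->src, own, TOY_ETH_ALEN) == 0)
        return RX_OWN_FRAME;
    if (memcmp(h->dst, own, TOY_ETH_ALEN) != 0)
        return RX_DROP_NOT_FOR_US;
    return RX_ACCEPT;
}
```

Logging every `RX_OWN_FRAME` verdict for a short run would show whether the receive path ever sees locally sent frames. It would not by itself show that the freed buffer is the same one sitting on the retry queue, which would still need to be checked separately.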