I just received an update marking this as `Superseded`. I will send it again with `ACK` as well. Thank you, Morten, for the understanding.

> ------------------------------------------------------------------------
> *From:* Morten Brørup
> *Sent:* 12 December 2023 23:39
> *To:* Varghese, Vipin; Bruce Richardson
> *Cc:* Yigit, Ferruh; dev@dpdk.org; stable@dpdk.org; honest.jiang@foxmail.com; P, Thiyagarajan
> *Subject:* RE: [PATCH] app/dma-perf: replace pktmbuf with mempool objects
>
> *From:* Varghese, Vipin [mailto:Vipin.Varghese@amd.com]
> *Sent:* Tuesday, 12 December 2023 18.14
>
> Sharing a few critical points below, based on my exposure to the dma-perf application.
>
> > On Tue, Dec 12, 2023 at 04:16:20PM +0100, Morten Brørup wrote:
> > +TO: Bruce, please stop me if I'm completely off track here.
> >
> > > From: Ferruh Yigit [mailto:ferruh.yigit@amd.com]
> > > Sent: Tuesday, 12 December 2023 15.38
> > >
> > > On 12/12/2023 11:40 AM, Morten Brørup wrote:
> > > >> From: Vipin Varghese [mailto:vipin.varghese@amd.com]
> > > >> Sent: Tuesday, 12 December 2023 11.38
> > > >>
> > > >> Replace the pktmbuf pool with a mempool; this allows an increase in MOPS, especially at lower buffer sizes. Using a mempool reduces the extra CPU cycles.
> > > >
> > > > I get the point of this change: it tests the performance of copying raw memory objects using respectively rte_memcpy and DMA, without the mbuf indirection overhead.
> > > >
> > > > However, I still consider the existing test relevant: the performance of copying packets using respectively rte_memcpy and DMA.
> > >
> > > This is a DMA performance test application and packets are not used; using pktmbuf just introduces overhead to the main focus of the application.
> > >
> > > I am not sure if pktmbuf was selected intentionally for this test application, but I assume it is there for historical reasons.
> >
> > I think pktmbuf was selected intentionally, to provide more accurate results for application developers trying to determine when to use rte_memcpy and when to use DMA. Much like the "copy breakpoint" in Linux Ethernet drivers is used to determine which code path to take for each received packet.
>
> Yes Ferruh, this is the right understanding. In the DPDK examples we already have the dma-forward application, which copies a pktmbuf payload into a new pktmbuf payload area.
>
> By moving to mempool, we are now actually focusing on the source and destination buffers. This allows us to create mempool objects with 2MB and 1GB src-dst areas, and thus to focus on the src-to-dst copy. With pktmbuf we were not able to achieve the same.
>
> > Most applications will be working with pktmbufs, so these applications will also experience the pktmbuf overhead. Performance testing with the same overhead as the application will be better to help the application developer determine when to use rte_memcpy and when to use DMA when working with pktmbufs.
>
> Morten, thank you for the input, but as shared above, the DPDK example dma-fwd does justice to such a scenario. In line with test-compress-perf and test-crypto-perf, IMHO test-dma-perf should focus on getting the best values for the DMA engine and memcpy comparison.
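For reference, here is a minimal sketch of the mempool-based src/dst buffer setup described above; the pool names, buffer count, and the 2MB object size are illustrative assumptions, not the actual patch code:

```c
/* Hypothetical sketch only: raw mempool objects as DMA src/dst buffers,
 * so the element size matches the copy size under test and no mbuf
 * metadata or mtod() resolution sits on the measured path.
 * Names and sizes are illustrative, not taken from the patch. */
#include <rte_lcore.h>
#include <rte_mempool.h>

#define NB_BUFS   64U                   /* illustrative buffer count */
#define BUF_SIZE  (2U * 1024 * 1024)    /* e.g. 2MB copy size under test */

static int
setup_raw_buffers(void *srcs[], void *dsts[])
{
	struct rte_mempool *src_pool, *dst_pool;

	src_pool = rte_mempool_create("dma_perf_src", NB_BUFS, BUF_SIZE,
				      0, 0, NULL, NULL, NULL, NULL,
				      rte_socket_id(), 0);
	dst_pool = rte_mempool_create("dma_perf_dst", NB_BUFS, BUF_SIZE,
				      0, 0, NULL, NULL, NULL, NULL,
				      rte_socket_id(), 0);
	if (src_pool == NULL || dst_pool == NULL)
		return -1;

	/* Objects are plain buffers; use them directly as copy endpoints. */
	if (rte_mempool_get_bulk(src_pool, srcs, NB_BUFS) != 0 ||
	    rte_mempool_get_bulk(dst_pool, dsts, NB_BUFS) != 0)
		return -1;

	return 0;
}
```

The point is only that the copy endpoints are plain objects sized to the test, with no mbuf headroom or metadata involved in the measured copies.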
> > (Furthermore, for the pktmbuf tests, I wonder if copying performance could also depend on IOVA mode and RTE_IOVA_IN_MBUF.)
> >
> > Nonetheless, there may also be use cases where raw mempool objects are being copied by rte_memcpy or DMA, so adding tests for these use cases is useful.
> >
> > @Bruce, you were also deeply involved in the DMA library, and probably have more up-to-date practical experience with it. Am I right that pktmbuf overhead in these tests provides more "real life use"-like results? Or am I completely off track with my thinking here, i.e. the pktmbuf overhead is only noise?
>
> I'm actually not that familiar with the dma-test application, so I can't comment on the specific overhead involved here. In the general case, if we are just talking about the overhead of dereferencing the mbufs, then I would expect the overhead to be negligible. However, if we are looking to include the cost of allocation and freeing of buffers, I'd try to avoid that, as it is a cost that would have to be paid for both SW copies and HW copies, so it should not count when calculating offload cost.
>
> Bruce, as per test-dma-perf there is no repeated pktmbuf-alloc or pktmbuf-free.
>
> Hence I disagree that the overhead discussed for pktmbuf here relates to alloc and free.
>
> Rather, the cost, as per my investigation, goes into fetching the cacheline and performing mtod on each iteration.
>
> /Bruce
>
> I can rewrite the logic to make use of pktmbuf objects by sending the src and dst with pre-computed mtod to avoid the overhead. But this will not resolve the 2MB and 1GB huge-page copy alloc failures.
>
> IMHO, along similar lines to the other perf applications, the dma-perf application should focus on actual device performance over application performance.
>
> [MB:]
>
> OK, Vipin has multiple good arguments for this patch. I am convinced, let’s proceed with it.
>
> Acked-by: Morten Brørup
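As a side note on the "pre-computed mtod" alternative Vipin mentions above, here is a rough sketch, using hypothetical names (nb_bufs, copy_size, srcs, dsts) rather than actual test-dma-perf code, of resolving the payload addresses once outside the measured loop:

```c
/* Hypothetical sketch only: keep pktmbuf objects but resolve the payload
 * addresses once before the measured loop, so each iteration touches only
 * raw pointers. Names are illustrative, not taken from the application. */
#include <rte_mbuf.h>
#include <rte_memcpy.h>

static void
copy_with_precomputed_mtod(struct rte_mbuf *src_mbufs[],
			   struct rte_mbuf *dst_mbufs[],
			   void *srcs[], void *dsts[],
			   unsigned int nb_bufs, size_t copy_size)
{
	unsigned int i;

	/* One-time address resolution, outside the measured region. */
	for (i = 0; i < nb_bufs; i++) {
		srcs[i] = rte_pktmbuf_mtod(src_mbufs[i], void *);
		dsts[i] = rte_pktmbuf_mtod(dst_mbufs[i], void *);
	}

	/* Measured loop: no per-iteration mbuf dereference or mtod(). */
	for (i = 0; i < nb_bufs; i++)
		rte_memcpy(dsts[i], srcs[i], copy_size);
}
```

This would remove the per-iteration mtod cost discussed above, though, as noted in the thread, it would not address the 2MB and 1GB huge-page buffer allocation cases.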