I just received an update marking this as `Superseded`. I will send it again with `ACK` as well. Thank you, Morten, for the understanding.

> ------------------------------------------------------------------------
> *From:* Morten Brørup
> *Sent:* 12 December 2023 23:39
> *To:* Varghese, Vipin; Bruce Richardson
> *Cc:* Yigit, Ferruh; dev@dpdk.org; stable@dpdk.org; honest.jiang@foxmail.com; P, Thiyagarajan
> *Subject:* RE: [PATCH] app/dma-perf: replace pktmbuf with mempool objects
>
> *From:* Varghese, Vipin [mailto:Vipin.Varghese@amd.com]
> *Sent:* Tuesday, 12 December 2023 18.14
>
> Sharing a few critical points below, based on my exposure to the dma-perf application.
>
> > On Tue, Dec 12, 2023 at 04:16:20PM +0100, Morten Brørup wrote:
> > +TO: Bruce, please stop me if I'm completely off track here.
> >
> > > From: Ferruh Yigit [mailto:ferruh.yigit@amd.com]
> > > Sent: Tuesday, 12 December 2023 15.38
> > >
> > > On 12/12/2023 11:40 AM, Morten Brørup wrote:
> > > >> From: Vipin Varghese [mailto:vipin.varghese@amd.com]
> > > >> Sent: Tuesday, 12 December 2023 11.38
> > > >>
> > > >> Replace the pktmbuf pool with a mempool; this allows an increase in MOPS, especially at lower buffer sizes. Using a mempool reduces the extra CPU cycles.
> > > >
> > > > I get the point of this change: it tests the performance of copying raw memory objects using respectively rte_memcpy and DMA, without the mbuf indirection overhead.
> > > >
> > > > However, I still consider the existing test relevant: the performance of copying packets using respectively rte_memcpy and DMA.
> > >
> > > This is a DMA performance test application and packets are not used; using pktmbuf just introduces overhead to the main focus of the application.
> > >
> > > I am not sure if pktmbuf was selected intentionally for this test application, but I assume it is there for historical reasons.
> >
> > I think pktmbuf was selected intentionally, to provide more accurate results for application developers trying to determine when to use rte_memcpy and when to use DMA. Much like the "copy breakpoint" in Linux Ethernet drivers is used to determine which code path to take for each received packet.
>
> Yes Ferruh, this is the right understanding. In the DPDK examples we already have the dma-forward application, which copies a pktmbuf payload into a new pktmbuf payload area.
>
> By moving to mempool, we are now actually focusing on the source and destination buffers. This allows us to create mempool objects with 2MB and 1GB src-dst areas, and thus to focus on the src-to-dst copy. With pktmbuf we were not able to achieve the same.
>
> > Most applications will be working with pktmbufs, so these applications will also experience the pktmbuf overhead. Performance testing with the same overhead as the application will be better to help the application developer determine when to use rte_memcpy and when to use DMA when working with pktmbufs.
>
> Morten, thank you for the input, but as shared above, the DPDK example dma-fwd does justice to such a scenario. In line with test-compress-perf and test-crypto-perf, IMHO test-dma-perf should focus on getting the best values for the DMA engine and memcpy comparison.
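For reference, here is a minimal sketch of the mempool-based src/dst buffer setup described above; the pool names, buffer count, and the 2MB object size are illustrative assumptions, not the actual patch code:

```c
/* Hypothetical sketch only: raw mempool objects as DMA src/dst buffers,
 * so the element size matches the copy size under test and no mbuf
 * metadata or mtod() resolution sits on the measured path.
 * Names and sizes are illustrative, not taken from the patch. */
#include <rte_lcore.h>
#include <rte_mempool.h>

#define NB_BUFS   64U                   /* illustrative buffer count */
#define BUF_SIZE  (2U * 1024 * 1024)    /* e.g. 2MB copy size under test */

static int
setup_raw_buffers(void *srcs[], void *dsts[])
{
	struct rte_mempool *src_pool, *dst_pool;

	src_pool = rte_mempool_create("dma_perf_src", NB_BUFS, BUF_SIZE,
				      0, 0, NULL, NULL, NULL, NULL,
				      rte_socket_id(), 0);
	dst_pool = rte_mempool_create("dma_perf_dst", NB_BUFS, BUF_SIZE,
				      0, 0, NULL, NULL, NULL, NULL,
				      rte_socket_id(), 0);
	if (src_pool == NULL || dst_pool == NULL)
		return -1;

	/* Objects are plain buffers; use them directly as copy endpoints. */
	if (rte_mempool_get_bulk(src_pool, srcs, NB_BUFS) != 0 ||
	    rte_mempool_get_bulk(dst_pool, dsts, NB_BUFS) != 0)
		return -1;

	return 0;
}
```

The point is only that the copy endpoints are plain objects sized to the test, with no mbuf headroom or metadata involved in the measured copies.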
> > (Furthermore, for the pktmbuf tests, I wonder if copying performance could also depend on IOVA mode and RTE_IOVA_IN_MBUF.)
> >
> > Nonetheless, there may also be use cases where raw mempool objects are being copied by rte_memcpy or DMA, so adding tests for these use cases is useful.
> >
> > @Bruce, you were also deeply involved in the DMA library, and probably have more up-to-date practical experience with it. Am I right that pktmbuf overhead in these tests provides more "real life use"-like results? Or am I completely off track with my thinking here, i.e. the pktmbuf overhead is only noise?
>
> I'm actually not that familiar with the dma-test application, so I can't comment on the specific overhead involved here. In the general case, if we are just talking about the overhead of dereferencing the mbufs, then I would expect the overhead to be negligible. However, if we are looking to include the cost of allocation and freeing of buffers, I'd try to avoid that, as it is a cost that would have to be paid for both SW copies and HW copies, so it should not count when calculating offload cost.
>
> Bruce, as per test-dma-perf there is no repeated pktmbuf-alloc or pktmbuf-free.
>
> Hence I disagree that the overhead discussed for pktmbuf here relates to alloc and free.
>
> Rather, the cost, as per my investigation, goes into fetching the cacheline and performing mtod on each iteration.
>
> /Bruce
>
> I can rewrite the logic to make use of pktmbuf objects by sending the src and dst with pre-computed mtod to avoid the overhead. But this will not resolve the 2MB and 1GB huge-page copy alloc failures.
>
> IMHO, along similar lines to the other perf applications, the dma-perf application should focus on actual device performance over application performance.
>
> [MB:]
>
> OK, Vipin has multiple good arguments for this patch. I am convinced, let’s proceed with it.
>
> Acked-by: Morten Brørup
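As a side note on the "pre-computed mtod" alternative Vipin mentions above, here is a rough sketch, using hypothetical names (nb_bufs, copy_size, srcs, dsts) rather than actual test-dma-perf code, of resolving the payload addresses once outside the measured loop:

```c
/* Hypothetical sketch only: keep pktmbuf objects but resolve the payload
 * addresses once before the measured loop, so each iteration touches only
 * raw pointers. Names are illustrative, not taken from the application. */
#include <rte_mbuf.h>
#include <rte_memcpy.h>

static void
copy_with_precomputed_mtod(struct rte_mbuf *src_mbufs[],
			   struct rte_mbuf *dst_mbufs[],
			   void *srcs[], void *dsts[],
			   unsigned int nb_bufs, size_t copy_size)
{
	unsigned int i;

	/* One-time address resolution, outside the measured region. */
	for (i = 0; i < nb_bufs; i++) {
		srcs[i] = rte_pktmbuf_mtod(src_mbufs[i], void *);
		dsts[i] = rte_pktmbuf_mtod(dst_mbufs[i], void *);
	}

	/* Measured loop: no per-iteration mbuf dereference or mtod(). */
	for (i = 0; i < nb_bufs; i++)
		rte_memcpy(dsts[i], srcs[i], copy_size);
}
```

This would remove the per-iteration mtod cost discussed above, though, as noted in the thread, it would not address the 2MB and 1GB huge-page buffer allocation cases.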