From: "Varghese, Vipin"
To: Bruce Richardson, Morten Brørup
CC: "Yigit, Ferruh", dev@dpdk.org, stable@dpdk.org, honest.jiang@foxmail.com, "P, Thiyagarajan"
Subject: Re: [PATCH] app/dma-perf: replace pktmbuf with mempool objects
Date: Tue, 12 Dec 2023 17:13:42 +0000
References: <20231212103746.1910-1-vipin.varghese@amd.com> <98CBD80474FA8B44BF855DF32C47DC35E9F0BF@smartserver.smartshare.dk> <98CBD80474FA8B44BF855DF32C47DC35E9F0C0@smartserver.smartshare.dk>
[Public]

Sharing a few critical points based on my exposure to the dma-perf application below.

On Tue, Dec 12, 2023 at 04:16:20PM +0100, Morten Brørup wrote:
> +TO: Bruce, please stop me if I'm completely off track here.
>
> > From: Ferruh Yigit [mailto:ferruh.yigit@amd.com] Sent: Tuesday, 12
> > December 2023 15.38
> >
> > On 12/12/2023 11:40 AM, Morten Brørup wrote:
> > >> From: Vipin Varghese [mailto:vipin.varghese@amd.com] Sent: Tuesday,
> > >> 12 December 2023 11.38
> > >>
> > >> Replace the pktmbuf pool with a plain mempool; this increases MOPS,
> > >> especially at smaller buffer sizes, by avoiding the extra CPU cycles.
> > >
> > > I get the point of this change: It tests the performance of copying
> > > raw memory objects using respectively rte_memcpy and DMA, without the
> > > mbuf indirection overhead.
> > >
> > > However, I still consider the existing test relevant: The performance
> > > of copying packets using respectively rte_memcpy and DMA.
> >
> > This is a DMA performance test application and packets are not used;
> > using pktmbuf just introduces overhead to the main focus of the
> > application.
> >
> > I am not sure if pktmbuf was selected intentionally for this test
> > application, but I assume it is there for historical reasons.
>
> I think pktmbuf was selected intentionally, to provide more accurate
> results for application developers trying to determine when to use
> rte_memcpy and when to use DMA. Much like the "copy breakpoint" in Linux
> Ethernet drivers is used to determine which code path to take for each
> received packet.

Yes Ferruh, this is the right understanding. In the DPDK examples we
already have the dma-forward application, which copies one pktmbuf payload
into a new pktmbuf payload area.

By moving to mempool, we are now focusing on the source and destination
buffers themselves. This allows creating mempool objects with 2MB and 1GB
src-dst areas, keeping the measurement on the src-to-dst copy. With pktmbuf
we were not able to achieve the same.

> Most applications will be working with pktmbufs, so these applications
> will also experience the pktmbuf overhead. Performance testing with the
> same overhead as the application will be better to help the application
> developer determine when to use rte_memcpy and when to use DMA when
> working with pktmbufs.

Morten, thank you for the input, but as shared above the DPDK example
dma-fwd already covers that scenario. In line with test-compress-perf and
test-crypto-perf, IMHO test-dma-perf should focus on getting the best
values for the DMA engine versus a memcpy comparison.

> (Furthermore, for the pktmbuf tests, I wonder if copying performance
> could also depend on IOVA mode and RTE_IOVA_IN_MBUF.)
>
> Nonetheless, there may also be use cases where raw mempool objects are
> being copied by rte_memcpy or DMA, so adding tests for these use cases
> is useful.
>
> @Bruce, you were also deeply involved in the DMA library, and probably
> have more up-to-date practical experience with it. Am I right that
> pktmbuf overhead in these tests provides more "real life use"-like
> results? Or am I completely off track with my thinking here, i.e. the
> pktmbuf overhead is only noise?
>
I'm actually not that familiar with the dma-test application, so can't
comment on the specific overhead involved here. In the general case, if we
are just talking about the overhead of dereferencing the mbufs then I would
expect the overhead to be negligible. However, if we are looking to include
the cost of allocation and freeing of buffers, I'd try to avoid that, as it
is a cost that would have to be paid for both SW copies and HW copies, so
it should not count when calculating offload cost.

Bruce, as per test-dma-perf there is no repeated pktmbuf-alloc or
pktmbuf-free, so the overhead discussed here for pktmbuf is not related to
alloc and free. Based on my investigation, the cost goes into fetching the
cacheline and performing mtod on each iteration.

/Bruce

I can rewrite the logic to use pktmbuf objects, passing src and dst with
pre-computed mtod pointers to avoid that overhead. But this will not
resolve the 2MB and 1GB huge-page allocation failures for the copy buffers.
IMHO, in line with the other perf applications, dma-perf should focus on
actual device performance rather than application performance.