Date: Mon, 9 Nov 2020 10:06:05 +0000
From: Bruce Richardson
To: Morten Brørup
Cc: Jerin Jacob, Thomas Monjalon, dpdk-dev, David Marchand, Ferruh Yigit,
 Olivier Matz, "Ananyev, Konstantin", Andrew Rybchenko, Viacheslav Ovsiienko,
 Ajit Khaparde, Jerin Jacob, Hemant Agrawal, Ray Kinsella, Neil Horman,
 Nithin Dabilpuram, Kiran Kumar K, techboard@dpdk.org
Message-ID: <20201109100605.GB831@bricha3-MOBL.ger.corp.intel.com>
References: <20201107155306.463148-1-thomas@monjalon.net>
 <4509916.LqRtgDRpI1@thomas>
 <6088267.6fNGb03Fmp@thomas>
 <98CBD80474FA8B44BF855DF32C47DC35C61405@smartserver.smartshare.dk>
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35C61405@smartserver.smartshare.dk>
Subject: Re: [dpdk-dev] [dpdk-techboard] [PATCH 1/1] mbuf: move pool pointer in first half

On Mon, Nov 09, 2020 at 09:16:27AM +0100, Morten Brørup wrote:
> +CC techboard
>
> > From: Jerin Jacob [mailto:jerinjacobk@gmail.com]
> > Sent: Monday, November 9, 2020 6:18 AM
> >
> > On Sun, Nov 8, 2020 at 2:03 AM Thomas Monjalon wrote:
> > >
> > > 07/11/2020 20:05, Jerin Jacob:
> > > > On Sun, Nov 8, 2020 at 12:09 AM Thomas Monjalon wrote:
> > > > > 07/11/2020 18:12, Jerin Jacob:
> > > > > > On Sat, Nov 7, 2020 at 10:04 PM Thomas Monjalon wrote:
> > > > > > >
> > > > > > > The mempool pointer in the mbuf struct is moved
> > > > > > > from the second to the first half.
> > > > > > > It should increase performance on most systems having a
> > > > > > > 64-byte cache line, i.e. where the mbuf is split across
> > > > > > > two cache lines.
> > > > > >
> > > > > > But in any event, Tx needs to touch the pool when freeing back
> > > > > > to the pool upon Tx completion, right?
> > > > > > I am not able to understand the motivation for moving it to the
> > > > > > first 64B cache line.
> > > > > > The gain varies from driver to driver.
> > > > > > For example, a typical ARM-based NPU does not need to touch the
> > > > > > pool in Rx, as it has been filled by HW, whereas it needs to
> > > > > > touch it in Tx if reference counting is implemented.
> > > >
> > > > See below.
> > > >
> > > > > > > Due to this change, tx_offload is moved, so some vector data
> > > > > > > paths may need to be adjusted. Note: the OCTEON TX2 check is
> > > > > > > removed temporarily!
> > > > > >
> > > > > > It will break the Tx path. Please just don't remove the static
> > > > > > assert without adjusting the code.
> > > > >
> > > > > Of course not.
> > > > > I looked at the vector Tx path of OCTEON TX2;
> > > > > it's close to impossible to understand :)
> > > > > Please help!
> > > >
> > > > Of course. Could you check the above section and share the
> > > > rationale for this change - where it helps and how much it helps?
> > >
> > > It has been concluded in the techboard meeting you were part of.
> > > I don't understand why we are restarting this discussion.
> > > I won't have the energy to restart this process myself.
> > > If you don't want to apply the techboard decision, then please
> > > do the necessary to request another quick decision.
> >
> > Yes. Initially I thought it was OK, as we have 128B cache lines. After
> > looking into Thomas's change, I realized it is not good for ARM64 NPUs
> > with 64B cache lines, because:
> > - A typical ARM-based NPU does not need to touch the pool in Rx, as it
> >   has been filled by HW, whereas it needs to touch it in Tx if
> >   reference counting is implemented.
>
> Jerin, I don't understand what the problem is here...
>
> Since RX doesn't touch m->pool, it shouldn't matter for RX which cache
> line m->pool resides in. I get that.
>
> You are saying that TX needs to touch m->pool if the reference count is
> implemented. I get that. But I don't understand why it is worse having
> m->pool in the first cache line than in the second cache line; can you
> please clarify?
>
> > - Also, it will be affecting existing vector routines.
>
> That is unavoidable if we move something from the second to the first
> cache line.
>
> It may require some rework on the vector routines, but it shouldn't be
> too difficult for whoever wrote these vector routines.
>
> > I request that the techboard decision be reconsidered.
>
> I was at the techboard meeting as an observer (or whatever the correct
> term would be for non-members), and this is my impression of the
> decision taken at the meeting:
>
> The techboard clearly decided not to move any dynamic fields into the
> first cache line, on the grounds that if we move them away again in a
> later version, DPDK users utilizing a dynamic field in the first cache
> line might experience a performance drop at that later time. And that
> would be a very bad user experience, causing grief and complaints. To
> me, this seemed like a firm decision, based on solid arguments.
>
> Then the techboard discussed which other field to move to the freed-up
> space in the first cache line. There were no performance reports showing
> any improvement from moving any of the suggested fields (m->pool,
> m->next, m->tx_offload), and there was a performance report showing no
> improvement from moving m->next in a test case with large segmented
> packets. The techboard decided to move m->pool as originally suggested.
> To me, this seemed like a somewhat random choice between A, B, and C, on
> the grounds that moving one of them is probably better than moving none
> of them.

This largely tallies with what I remember of the discussion too.
I'd also add though that the choice between the next pointer and the pool
pointer came down to the fact that the next pointer is only used for
chained, multi-segment packets - which also tend to be larger packets -
while the pool pointer is relevant to all packets, big and small, single-
and multi-segment.
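
To make the Tx-side point concrete, here is a minimal sketch - illustrative
only, not taken from any driver; the tx_free_burst name is made up - of the
completion path that has to read m->pool for every packet it frees, plus
the kind of build-time guard the static asserts mentioned above provide
(the assert assumes the layout after this patch is applied):

#include <stddef.h>

#include <rte_common.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

/*
 * Tx completion: unlike Rx, this path reads m->pool for every packet it
 * frees, small or large, single- or multi-segment, so the cache line
 * holding the pool pointer is always touched here.
 */
static inline void
tx_free_burst(struct rte_mbuf **pkts, unsigned int n)
{
        unsigned int i;

        for (i = 0; i < n; i++) {
                /* Handles the reference count; returns NULL when the
                 * segment must not be freed yet. */
                struct rte_mbuf *m = rte_pktmbuf_prefree_seg(pkts[i]);

                if (m != NULL)
                        rte_mempool_put(m->pool, m);
        }
}

/*
 * If the pool pointer ever moves back out of the first cache line, this
 * fails the build instead of letting the fast path silently slow down.
 */
_Static_assert(offsetof(struct rte_mbuf, pool) < RTE_CACHE_LINE_MIN_SIZE,
               "m->pool expected in the first cache line");

/Bruce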