From: Slava Ovsiienko
To: Ruifeng Wang, Raslan Darawsheh, Matan Azrad, Shahaf Shuler
CC: "dev@dpdk.org", "jerinj@marvell.com", nd, Honnappa Nagarahalli, "stable@dpdk.org"
Date: Mon, 5 Jul 2021 10:01:59 +0000
References: <20210601083055.97261-1-ruifeng.wang@arm.com> <20210601083055.97261-2-ruifeng.wang@arm.com>
Subject: Re: [dpdk-stable] [PATCH 1/2] net/mlx5: remove redundant operations

Hi, Ruifeng

The invalid_mask is used to set error flags and calculate the statistics.
So the first CQE with an error or invalid status, and all the CQEs after it,
should be masked out.

IMO, what we could improve (apply just the part of the patch below):

>>>>
index 2234fbe6b2..98a75b09c6 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
@@ -768,18 +768,11 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq,
                                           comp_mask), 0)) /
                                           (sizeof(uint16_t) * 8);
                 /* D.6 mask out entries after the compressed CQE. */
-                mask = vcreate_u16(comp_idx < MLX5_VPMD_DESCS_PER_LOOP ?
-                                   -1UL >> (comp_idx * sizeof(uint16_t) * 8) :
-                                   0);
-                invalid_mask = vorr_u16(invalid_mask, mask);
+                invalid_mask = vorr_u16(invalid_mask, comp_mask);
                 /* D.7 count non-compressed valid CQEs. */
                 n = __builtin_clzl(vget_lane_u64(vreinterpret_u64_u16(
                                    invalid_mask), 0)) / (sizeof(uint16_t) * 8);
                 nocmp_n += n;
<<<<

And that's it. The rest of the patch:

>>>>
-                /* D.2 get the final invalid mask. */
-                mask = vcreate_u16(n < MLX5_VPMD_DESCS_PER_LOOP ?
-                                   -1UL >> (n * sizeof(uint16_t) * 8) : 0);
-                invalid_mask = vorr_u16(invalid_mask, mask);
<<<<

should not be applied, otherwise the following might be affected:

opcode = vbic_u16(opcode, invalid_mask);
...
opcode = vbic_u16(opcode, invalid_mask);

With best regards,
Slava

> -----Original Message-----
> From: Ruifeng Wang
> Sent: Friday, July 2, 2021 13:30
> To: Slava Ovsiienko; Raslan Darawsheh; Matan Azrad; Shahaf Shuler
> Cc: dev@dpdk.org; jerinj@marvell.com; nd; Honnappa Nagarahalli;
> stable@dpdk.org; nd
> Subject: RE: [PATCH 1/2] net/mlx5: remove redundant operations
>
> > -----Original Message-----
> > From: Slava Ovsiienko
> > Sent: Friday, July 2, 2021 4:13 PM
> > To: Ruifeng Wang; Raslan Darawsheh; Matan Azrad; Shahaf Shuler
> > Cc: dev@dpdk.org; jerinj@marvell.com; nd; Honnappa Nagarahalli;
> > stable@dpdk.org
> > Subject: RE: [PATCH 1/2] net/mlx5: remove redundant operations
> >
> > Hi, Ruifeng
> Hi, Slava
>
> > > -----Original Message-----
> > > From: Ruifeng Wang
> > > Sent: Tuesday, June 1, 2021 11:31
> > > To: Raslan Darawsheh; Matan Azrad; Shahaf Shuler; Slava Ovsiienko
> > > Cc: dev@dpdk.org; jerinj@marvell.com; nd@arm.com;
> > > honnappa.nagarahalli@arm.com; Ruifeng Wang; stable@dpdk.org
> > > Subject: [PATCH 1/2] net/mlx5: remove redundant operations
> > >
> > > Some operations on mask are redundant and can be removed.
> > > The change yielded 1.6% performance gain on N1SDP.
> > > On ThunderX2, slight performance uplift was also observed.
> > >
> > > Fixes: 570acdb1da8a ("net/mlx5: add vectorized Rx/Tx burst for ARM")
> > > Cc: stable@dpdk.org
> > >
> > > Signed-off-by: Ruifeng Wang
> > > ---
> > >  drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 9 +--------
> > >  1 file changed, 1 insertion(+), 8 deletions(-)
> > >
> > > diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> > > b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> > > index 2234fbe6b2..98a75b09c6 100644
> > > --- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> > > +++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> > > @@ -768,18 +768,11 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq,
> > > volatile struct mlx5_cqe *cq,
> > >                                            comp_mask), 0)) /
> > >                                            (sizeof(uint16_t) * 8);
> > >                  /* D.6 mask out entries after the compressed CQE. */
> > > -                mask = vcreate_u16(comp_idx < MLX5_VPMD_DESCS_PER_LOOP ?
> > > -                                   -1UL >> (comp_idx * sizeof(uint16_t) * 8) :
> > > -                                   0);
> > > -                invalid_mask = vorr_u16(invalid_mask, mask);
> > > +                invalid_mask = vorr_u16(invalid_mask, comp_mask);
> >
> > Mmmm... I'm not sure we can drop the masking that skips the compressed
> > (and following) CQEs.
> > Let's consider the completion scenario (a series of 4 CQEs, each
> > element is 64B long):
> >
> > 0: normal uncompressed CQE, ownership OK, format uncompressed, opcode
> >    OK, no error
> > 1: compressed CQE, ownership OK, format compressed, opcode OK, no
> >    error
> > 2: miniCQE array, format can be any!!, may be discovered as ownership
> >    OK, format uncompressed, opcode OK, no error
> > 3: miniCQE array, format can be any!!, may be discovered as ownership
> >    OK, format uncompressed, opcode OK, no error
>
> Thanks for your review and explanation about CQE processing details.
> I made the change based on the fact that some calculations don't change the
> data.
> So some intermediate calculations were removed.
>
> In the above diff section, the result of 'mask' always equals the nearest
> 'comp_mask' above it.
> So I just removed 'mask' and used 'comp_mask' instead.
> >
> > Obviously, we should unconditionally mask out 2 and 3, regardless of
> > their recognized formats/opcode/error/etc.
> > I think we can take the diff above and skip the diff below:
> >
> > >                  /* D.7 count non-compressed valid CQEs. */
> > >                  n = __builtin_clzl(vget_lane_u64(vreinterpret_u64_u16(
> > >                                     invalid_mask), 0)) / (sizeof(uint16_t) * 8);
> > >                  nocmp_n += n;
> > > -                /* D.2 get the final invalid mask. */
> > > -                mask = vcreate_u16(n < MLX5_VPMD_DESCS_PER_LOOP ?
> > > -                                   -1UL >> (n * sizeof(uint16_t) * 8) : 0);
> > > -                invalid_mask = vorr_u16(invalid_mask, mask);
> >
> > and get the correct final invalid_mask - all compressed and invalid
> > CQEs and the following ones will be masked out.
>
> This diff section is similar to the previous one.
> 'mask' always equals the nearest 'invalid_mask' above it.
> So the entire line "invalid_mask = vorr_u16(invalid_mask, mask);" can be
> removed.
>
> Code logic is not changed. But I'm not sure whether the code change impacts
> readability or maintainability in a way that may concern you.
>
> Thanks.
> >
> > With best regards,
> > Slava
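
For reference, the scalar arithmetic in the removed "mask" lines can be modelled without NEON. The following is only a minimal sketch under stated assumptions, not the driver code: the four 16-bit entries are packed into one uint64_t with entry 0 in the most significant bits (the ordering that the clz-based index in the quoted diff suggests), DESCS_PER_LOOP stands in for MLX5_VPMD_DESCS_PER_LOOP and is assumed to be 4, and the helper names first_marked() and tail_mask() are hypothetical. __builtin_clzll() is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for MLX5_VPMD_DESCS_PER_LOOP; the value 4 is an assumption. */
#define DESCS_PER_LOOP 4u
/* Width of one entry, i.e. sizeof(uint16_t) * 8. */
#define LANE_BITS 16u

/*
 * Index of the first non-zero 16-bit entry, counting from the most
 * significant end, or DESCS_PER_LOOP when no entry is set.
 */
static unsigned int first_marked(uint64_t lanes)
{
	/* __builtin_clzll() is undefined for 0, so handle that case first. */
	if (lanes == 0)
		return DESCS_PER_LOOP;
	return (unsigned int)__builtin_clzll(lanes) / LANE_BITS;
}

/* All-ones in entry idx and every entry after it; 0 if idx is out of range. */
static uint64_t tail_mask(unsigned int idx)
{
	return idx < DESCS_PER_LOOP ? UINT64_MAX >> (idx * LANE_BITS) : 0;
}

/* Hypothetical inputs, chosen only to exercise the arithmetic. */
static void tail_mask_examples(void)
{
	/* Only entry 1 marked: entries 1, 2 and 3 end up in the tail mask. */
	uint64_t marked = UINT64_C(0x0000FFFF00000000);

	assert(first_marked(marked) == 1);
	assert(tail_mask(first_marked(marked)) == UINT64_C(0x0000FFFFFFFFFFFF));
	/* Nothing marked: the index is out of range and the mask is empty. */
	assert(first_marked(0) == DESCS_PER_LOOP);
	assert(tail_mask(first_marked(0)) == 0);
}
```

In the first example the tail mask is wider than the marked input. Whether comp_mask or invalid_mask in the driver can actually hold such a value at those points is the question discussed in this thread, and the sketch takes no position on it.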