From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 5D2CA426A1 for ; Mon, 2 Oct 2023 12:40:09 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 4D3BC402EA; Mon, 2 Oct 2023 12:40:09 +0200 (CEST) Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.7]) by mails.dpdk.org (Postfix) with ESMTP id 300D04003C; Mon, 2 Oct 2023 12:40:06 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1696243207; x=1727779207; h=date:from:to:cc:subject:message-id:references: in-reply-to:mime-version; bh=BIjb4i6wAMdfIgu+tIBRD4QIiia3aGeGM7t4XzrAo9s=; b=hULB6mT2DzjbpB7mPs6k2fpq8rCHcfPtOxDvXJ669IFwhdJ0DM+ZuKsY 7jKmGt1CEk1DtfgLSS6MjqaPmgCZF619Qau/AUdDMLYP6a9mMYaj7Rdas C5+LbTLhXHGe3PpKnK4bbXQcYTvKyXZMNuxFCm6CSCYzstGSO9mmElnV3 p62U4TIVOA94tSTvp62X47b5rY16FkhRk9em6U+Wc+GTFR/r+5UwqQVxG yYNbqVvtAxtwz1aLIxMVNtTHLk0XEJhe5XCN5EDJpRtba4QlRP0kYLJCK /r9nKfirPfSOzFTFQGfvf0UBjnkmzMbJKO1y3Keqe5e7q7nrjSN3+5L/O g==; X-IronPort-AV: E=McAfee;i="6600,9927,10850"; a="4192886" X-IronPort-AV: E=Sophos;i="6.03,194,1694761200"; d="scan'208";a="4192886" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmvoesa101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Oct 2023 03:40:05 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10850"; a="894102634" X-IronPort-AV: E=Sophos;i="6.03,194,1694761200"; d="scan'208";a="894102634" Received: from orsmsx602.amr.corp.intel.com ([10.22.229.15]) by fmsmga001.fm.intel.com with ESMTP/TLS/AES256-GCM-SHA384; 02 Oct 2023 03:38:43 -0700 Received: from orsmsx610.amr.corp.intel.com (10.22.229.23) by ORSMSX602.amr.corp.intel.com (10.22.229.15) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.32; Mon, 2 Oct 2023 03:40:03 -0700 Received: from orsedg603.ED.cps.intel.com (10.7.248.4) by orsmsx610.amr.corp.intel.com (10.22.229.23) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.32 via Frontend Transport; Mon, 2 Oct 2023 03:40:03 -0700 Received: from NAM12-MW2-obe.outbound.protection.outlook.com (104.47.66.40) by edgegateway.intel.com (134.134.137.100) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.1.2507.32; Mon, 2 Oct 2023 03:40:03 -0700 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=Cg8vCRBI/pex0eWhUgmlbXMzuLAVm+isCpoK8cU1Ij9KmASrguUnV8A6XN8oxai6swkKj3pKN6wMuStQCzWkU1sHvAqIZPnW0QvCcrg7wYnkPQH0EE8oap0tJ99YYQkVfBmy6/V8HjwaIKdXuVG+7oZjbaDX7EbOSrwkli30jhdn1lw4jUalhPs7zW6X7vVP3zzLAeZ32d12TzHkD4dNZSYjG46YaWtdeSbOzFz6QqChFFXQtn9OA/fzM1lQ5FXePhJxzBAqrEUveK6sLIPT383w99HKZtrAqnZN1pFVdr1sO148oHWpuOIEjp4K1sFN1GNW7Xk4FjKBQ9FaNXegdg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=jJNxV+a9pmyH/y0F3e8nvscXLZBCZaExYYoyTjlnHms=; b=aanvKDoL6P8cxvlvwmBveZKhP9qdc0BV8grBSdc8dfnzNHillHR+Ynm8FHNulYO5Jby7NKhrgngo4LBAIA3jCcbHk6O/Utc8iEmMSY3dcb65061xvq6eelV8+m/VrtprwepAYLMMULkc2csdiIZikvd6NVc81PSKrwLfvMejE1AbeP0HpLheYT9DPAQB2EvRm9erh79FgvOZQimGNeMoZl/TUkV2XW1pB+1AVGqptlKsyUaRJqJGhrYo7U4AFpa8bQHbOv1SoBe7ks+rSQOIABKNNyZ82j/K3QMAEnp8UsaM8+9sCuK42OaCpmbUgeEb6KKKVciP5TUQKzgO4l3wKg== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=intel.com; dmarc=pass action=none header.from=intel.com; dkim=pass header.d=intel.com; arc=none Authentication-Results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=intel.com; Received: from DS0PR11MB7309.namprd11.prod.outlook.com (2603:10b6:8:13e::17) by SN7PR11MB6653.namprd11.prod.outlook.com (2603:10b6:806:26f::11) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6838.29; Mon, 2 Oct 2023 10:40:02 +0000 Received: from DS0PR11MB7309.namprd11.prod.outlook.com ([fe80::aa85:ead1:baa8:c652]) by DS0PR11MB7309.namprd11.prod.outlook.com ([fe80::aa85:ead1:baa8:c652%2]) with mapi id 15.20.6838.024; Mon, 2 Oct 2023 10:40:01 +0000 Date: Mon, 2 Oct 2023 11:39:54 +0100 From: Bruce Richardson To: Jieqiang Wang CC: Yipeng Wang , Sameh Gobriel , Vladimir Medvedkin , Honnappa Nagarahalli , Dharmik Thakkar , , , , Feifei Wang , Ruifeng Wang Subject: Re: [PATCH] hash: fix SSE comparison Message-ID: References: <20230906023100.3618303-1-jieqiang.wang@arm.com> Content-Type: text/plain; charset="us-ascii" Content-Disposition: inline In-Reply-To: <20230906023100.3618303-1-jieqiang.wang@arm.com> X-ClientProxiedBy: DU2PR04CA0279.eurprd04.prod.outlook.com (2603:10a6:10:28c::14) To DS0PR11MB7309.namprd11.prod.outlook.com (2603:10b6:8:13e::17) MIME-Version: 1.0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: DS0PR11MB7309:EE_|SN7PR11MB6653:EE_ X-MS-Office365-Filtering-Correlation-Id: 5704f7bf-fa4d-49cd-796c-08dbc333ee8f X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 2/blviGRUL8O4IeQau1PbbcZ7f649MDoblEVFoUhJObmuLhy4iKaDlXUIEPQu+xdIQKJsBNDw/oVLYpWhftAvwZ9DC5DfkNR+FIsY69/Elfwoq1rBSzfyejTi9OzLzw6NgAawWdrR9HuXQPyVrTbdkx8FbzyKLX1K9ohJgEgDo7nk+qiG4uEPvs3H7ASbIBUeFDEGeayJBOdCX4MSmlrHmC3zxSIou5r9n/6dJ4YvGl+/qHXOgmAMKWpHxxZpv+Y5Q1j2/x06QWyhb5JyXIFh+wjcDvEVRsHF8Id5yqJ3x+vGAHupMNYYhSieLo6fYAhbuPkYVdx7HxeVa1XAlyDKRKCaLQnHqdVBesriUWtCB4vGRDCH7zAX7PX1ssKCQRMt2Pr9y9Gu+zNEgskJnNTCky9zDToMGYz8X0aiOYZztyu3c1UhZi/qry/BKveR6UgkBRgvQ30x9MQnAxaSqbm4Ghr5pQXGyd2J5OnLUuLp+tPkShNItxnN08DFA/bKykzSl5qGMQwf0/LhbXgFcCpYYqkXiSOmHtLM1V0yfnF7XI9ALdYn3Il6aWy1t4ve4gUyBXluzzmR3GQB0PvcWpYom/NXQJ84PCcbXifAO5eJ2k= X-Forefront-Antispam-Report: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:DS0PR11MB7309.namprd11.prod.outlook.com; PTR:; CAT:NONE; SFS:(13230031)(396003)(39860400002)(136003)(366004)(346002)(376002)(230922051799003)(1800799009)(186009)(64100799003)(451199024)(6506007)(6666004)(2906002)(26005)(6512007)(66476007)(8936002)(54906003)(4326008)(66946007)(316002)(6916009)(44832011)(5660300002)(86362001)(8676002)(82960400001)(38100700002)(41300700001)(478600001)(66556008)(83380400001)(6486002)(67856001); DIR:OUT; SFP:1102; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: =?us-ascii?Q?oPfxeLCJSEk34J3dV4pCbS7iBCB2zlCLvS7OLR6WGpIvLMLdFKPMeih5Qczx?= =?us-ascii?Q?8Q+4KvJUT9mifpss/n6E+/fQEd0jVi2qEZ0U+fhUYtSGb0VV5XXN7GkBkgkf?= =?us-ascii?Q?aE4sFpUTHPbrhlMatyfYy3xiBGmsgkHNon14w9VdK4uMq68NT4HYe3lGGZG9?= =?us-ascii?Q?73fx0ZyS0oPDq4BJ+QWwlaYONrcsDSyluOB989P/2GNuv7HVO9ym5Fh5mnfu?= =?us-ascii?Q?pPQpQjQe1E9pCcT54ymNk1M2Q4hvr81ExnePOERfF/yAJjx2kUQajbzR1k21?= =?us-ascii?Q?C0Bj3/xbpurxtlACgU9r4T03Zk5EqwZImK1aRUV2Z6eVbBuHZrejSsPaHTgQ?= =?us-ascii?Q?85nuvlW2tvs1EWKdE0PhL3JDM050WZu169RWDEs5JIaxw+v1TZtREt1tCV4N?= =?us-ascii?Q?yiMv44EXzKWwCVAJduIwIkhqGMwxgSCU1YAagDGgRVMxlr6PP7sZU5NbVC3B?= =?us-ascii?Q?jH1Es1CdJY8IykL55zxUa5LE8glaYT9qo3Xjjez2Y3NyuO/yAah7uV2mDXdI?= =?us-ascii?Q?gUb+t19CuEvZ7MS+bVNCStbCsfRiLQ2uHjuCPYWEPI+LLeBc9WZciX/NdK2z?= =?us-ascii?Q?as/MvT2NUbVCs12XwgNSAzEUZ+pz5meDuI6cBhANxY770epCgobSpVi3T2tL?= =?us-ascii?Q?8ukcapGAUZ97++Ay1T7nB62ecZfWb8Qn7aDHC8ScHk/ul90kVsFqfEJQudWM?= =?us-ascii?Q?QAFVjEF8aKBUgX6AgMIOSteWBimXwZBvXsvZBBMDq2oRP15SXJbEe4eTAv1+?= =?us-ascii?Q?7HmHYQUIKY9Dh3mj0vTUJpaAA0v22pTRgx4k33eHftm5vYh+60iHb1uEBwDK?= =?us-ascii?Q?AAM0ZXqLL90xAcuG5vQONsiqgiRIiQKfqDpZ16Eus60OwATnK/hH1NGYTF7r?= =?us-ascii?Q?nExlzou0hDyr1OdlNk6b0z/2U8Pws+YtYlYIt5GeikQQQr2LROmhZ6t5++1e?= =?us-ascii?Q?4o6sCraWI/NYwNXNZmY1jcAL960yPyiWfSG5+ZqjOXrDY98kq1b+tDZH9s+i?= =?us-ascii?Q?y4nJ01W7a8Ddb131P5HVSVfTkn+NvW/iGoGae/Ujr3yvl0AdWA7/gGWu6aZL?= =?us-ascii?Q?7PbuvTYrMZj3nvZXaeFX9KsXIS4ScCsB21ApV4NGZMNmvWlxlFWrPTSXnDrM?= =?us-ascii?Q?xZaU6chfC9NNqVWin/wWw23kayBVC1C9sCFrZdINJAuJO/gezGLFAOISC1hM?= =?us-ascii?Q?Wy1rhnLRxwlcFIu6QHNef4JFo7k4e4bTR+wVdAlWzvCZl1AlvRUNusNZe+Fl?= =?us-ascii?Q?iguJfafnmxeO93uQCkD+GT1r/pZQZx1j2BRnGEBZvu3En5UJCxjQzeiJAMXF?= =?us-ascii?Q?Q1XzSkCjyv1xjSSR8cevJqlbf3ONvEniLQzojbVUUL9Q+IXWQJo0Jo8QUWlc?= =?us-ascii?Q?yeLhQMzVUm+FtEVsdUDPnGDcAKmcMtUwhgFbX5TazjwPrSa7CDjnG2w2NqJb?= =?us-ascii?Q?gUsgoUUigjKeBHOSZPE3cjdvSGieoCwlFSYC7BCPDgRAIKkR8jYcLkTyLFm2?= =?us-ascii?Q?0Ian2c+/jpDEWMiy3kIvlaRujyC8PoyDzVjQoDxQddp3AemPZXwm+Cb3dIi9?= =?us-ascii?Q?LInBM1i4SjSoVfdC8vc9roB2API2C+raOYDfpST9XpiXCR1nB3ECJ5z5CWFN?= =?us-ascii?Q?wA=3D=3D?= X-MS-Exchange-CrossTenant-Network-Message-Id: 5704f7bf-fa4d-49cd-796c-08dbc333ee8f X-MS-Exchange-CrossTenant-AuthSource: DS0PR11MB7309.namprd11.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 02 Oct 2023 10:40:01.8931 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 46c98d88-e344-4ed4-8496-4ed7712e255d X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: qb4P2ZNRfLgTkxZo8VFLPHp6nx9xJNdbvqRNAgABWMFfYQARGvY+x5L5/0J7NaoxMAyhGjeWzgtfL3yYkjDyh0mLvDUaj1J9k72fGAhhVsQ= X-MS-Exchange-Transport-CrossTenantHeadersStamped: SN7PR11MB6653 X-OriginatorOrg: intel.com X-BeenThere: stable@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: patches for DPDK stable branches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: stable-bounces@dpdk.org On Wed, Sep 06, 2023 at 10:31:00AM +0800, Jieqiang Wang wrote: > __mm_cmpeq_epi16 returns 0xFFFF if the corresponding 16-bit elements are > equal. In original SSE2 implementation for function compare_signatures, > it utilizes _mm_movemask_epi8 to create mask from the MSB of each 8-bit > element, while we should only care about the MSB of lower 8-bit in each > 16-bit element. > For example, if the comparison result is all equal, SSE2 path returns > 0xFFFF while NEON and default scalar path return 0x5555. > Although this bug is not causing any negative effects since the caller > function solely examines the trailing zeros of each match mask, we > recommend this fix to ensure consistency with NEON and default scalar > code behaviors. > > Fixes: c7d93df552c2 ("hash: use partial-key hashing") > Cc: yipeng1.wang@intel.com > Cc: stable@dpdk.org > > Signed-off-by: Feifei Wang > Signed-off-by: Jieqiang Wang > Reviewed-by: Ruifeng Wang Fix looks correct, but see comment below. I think we can convert the vector mask to a simpler - and possibly faster - scalar one. /Bruce > --- > lib/hash/rte_cuckoo_hash.c | 16 +++++++++------- > 1 file changed, 9 insertions(+), 7 deletions(-) > > diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c > index d92a903bb3..acaa8b74bd 100644 > --- a/lib/hash/rte_cuckoo_hash.c > +++ b/lib/hash/rte_cuckoo_hash.c > @@ -1862,17 +1862,19 @@ compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches, > /* For match mask the first bit of every two bits indicates the match */ > switch (sig_cmp_fn) { > #if defined(__SSE2__) > - case RTE_HASH_COMPARE_SSE: > + case RTE_HASH_COMPARE_SSE: { > /* Compare all signatures in the bucket */ > - *prim_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16( > - _mm_load_si128( > + __m128i shift_mask = _mm_set1_epi16(0x0080); Not sure that this variable name is the most descriptive, as we don't actually shift anything using this. How about "results_mask". > + __m128i prim_cmp = _mm_cmpeq_epi16(_mm_load_si128( > (__m128i const *)prim_bkt->sig_current), > - _mm_set1_epi16(sig))); > + _mm_set1_epi16(sig)); > + *prim_hash_matches = _mm_movemask_epi8(_mm_and_si128(prim_cmp, shift_mask)); While this will work like you describe, I would think the simpler solution here is not to do a vector mask, but instead to simply do a scalar one. This would save extra vector loads too, since all values could just be masked with compile-time constant 0xAAAA. > /* Compare all signatures in the bucket */ > - *sec_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16( > - _mm_load_si128( > + __m128i sec_cmp = _mm_cmpeq_epi16(_mm_load_si128( > (__m128i const *)sec_bkt->sig_current), > - _mm_set1_epi16(sig))); > + _mm_set1_epi16(sig)); > + *sec_hash_matches = _mm_movemask_epi8(_mm_and_si128(sec_cmp, shift_mask)); > + } > break; > #elif defined(__ARM_NEON) > case RTE_HASH_COMPARE_NEON: { > -- > 2.25.1 >