From mboxrd@z Thu Jan 1 00:00:00 1970
From: Pavan Nikhilesh Bhagavatula
To: Pavan Nikhilesh Bhagavatula, "Jerin Jacob Kollanukkaran"
CC: "dev@dpdk.org", Gavin Hu
Date: Wed, 3 Jul 2019 16:54:02 +0000
References: <20190703165220.1068-1-pbhagavatula@marvell.com>
In-Reply-To: <20190703165220.1068-1-pbhagavatula@marvell.com>
Subject: Re: [dpdk-dev] [PATCH] mempool/octeontx2: fix clang build failure
List-Id: DPDK patches and discussions

+Cc: Gavin Hu

For some reason send mail didn't pick up the Reported-by tag.

>-----Original Message-----
>From: pbhagavatula@marvell.com
>Sent: Wednesday, July 3, 2019 10:22 PM
>To: Jerin Jacob Kollanukkaran
>Cc: dev@dpdk.org; Pavan Nikhilesh Bhagavatula
>Subject: [dpdk-dev][PATCH] mempool/octeontx2: fix clang build failure
>
>From: Pavan Nikhilesh
>
>The ARMv8.1 CASP instruction works with even register pairs and since
>there is no register constraint in older versions of GCC/Clang, use
>explicit register allocation to satisfy CASP requirements.
>
>Fixes build issue with arm64-armv8a-linux-clang.
>
>Fixes: ee338015e7a9 ("mempool/octeontx2: add optimized dequeue operation for arm64")
>
>Reported-by: Gavin Hu
>Signed-off-by: Pavan Nikhilesh
>Signed-off-by: Jerin Jacob
>---
>
>Upstreamed gcc fix:
>https://github.com/gcc-mirror/gcc/commit/a1bdb8f296aac911
>
> drivers/mempool/octeontx2/otx2_mempool_ops.c | 278 +++++++++----------
> 1 file changed, 127 insertions(+), 151 deletions(-)
>
>diff --git a/drivers/mempool/octeontx2/otx2_mempool_ops.c b/drivers/mempool/octeontx2/otx2_mempool_ops.c
>index 97146d1fe..e1764b030 100644
>--- a/drivers/mempool/octeontx2/otx2_mempool_ops.c
>+++ b/drivers/mempool/octeontx2/otx2_mempool_ops.c
>@@ -54,233 +54,206 @@ npa_lf_aura_op_search_alloc(const int64_t wdata, int64_t * const addr,
> 	return 0;
> }
>
>-/*
>- * Some versions of the compiler don't have support for __int128_t for
>- * CASP inline-asm. i.e. if the optimization level is reduced to -O0 the
>- * CASP restrictions aren't followed and the compiler might end up violation the
>- * CASP rules. Fix it by explicitly providing ((optimize("-O3"))).
>- *
>- * Example:
>- * ccSPMGzq.s:1648: Error: reg pair must start from even reg at
>- * operand 1 - `casp x21,x22,x0,x1,[x19]'
>- */
>-static __attribute__((optimize("-O3"))) __rte_noinline int __hot
>+static __rte_always_inline int
> npa_lf_aura_op_alloc_bulk(const int64_t wdata, int64_t * const addr,
> 			  unsigned int n, void **obj_table)
> {
>-	const __uint128_t wdata128 = ((__uint128_t)wdata << 64) | wdata;
>+	register const uint64_t wdata64 __asm("x26") = wdata;
>+	register const uint64_t wdata128 __asm("x27") = wdata;
> 	uint64x2_t failed = vdupq_n_u64(~0);
>
> 	switch (n) {
> 	case 32:
> 	{
>-		__uint128_t t0, t1, t2, t3, t4, t5, t6, t7, t8, t9;
>-		__uint128_t t10, t11;
>-
> 		asm volatile (
> 		".cpu generic+lse\n"
>-		"casp %[t0], %H[t0], %[wdata], %H[wdata], [%[loc]]\n"
>-		"casp %[t1], %H[t1], %[wdata], %H[wdata], [%[loc]]\n"
>-		"casp %[t2], %H[t2], %[wdata], %H[wdata], [%[loc]]\n"
>-		"casp %[t3], %H[t3], %[wdata], %H[wdata], [%[loc]]\n"
>-		"casp %[t4], %H[t4], %[wdata], %H[wdata], [%[loc]]\n"
>-		"casp %[t5], %H[t5], %[wdata], %H[wdata], [%[loc]]\n"
>-		"casp %[t6], %H[t6], %[wdata], %H[wdata], [%[loc]]\n"
>-		"casp %[t7], %H[t7], %[wdata], %H[wdata], [%[loc]]\n"
>-		"casp %[t8], %H[t8], %[wdata], %H[wdata], [%[loc]]\n"
>-		"casp %[t9], %H[t9], %[wdata], %H[wdata], [%[loc]]\n"
>-		"casp %[t10], %H[t10], %[wdata], %H[wdata], [%[loc]]\n"
>-		"casp %[t11], %H[t11], %[wdata], %H[wdata], [%[loc]]\n"
>-		"fmov d16, %[t0]\n"
>-		"fmov v16.D[1], %H[t0]\n"
>-		"casp %[t0], %H[t0], %[wdata], %H[wdata], [%[loc]]\n"
>-		"fmov d17, %[t1]\n"
>-		"fmov v17.D[1], %H[t1]\n"
>-		"casp %[t1], %H[t1], %[wdata], %H[wdata], [%[loc]]\n"
>-		"fmov d18, %[t2]\n"
>-		"fmov v18.D[1], %H[t2]\n"
>-		"casp %[t2], %H[t2], %[wdata], %H[wdata], [%[loc]]\n"
>-		"fmov d19, %[t3]\n"
>-		"fmov v19.D[1], %H[t3]\n"
>-		"casp %[t3], %H[t3], %[wdata], %H[wdata], [%[loc]]\n"
>+		"casp x0, x1, %[wdata64], %[wdata128], [%[loc]]\n"
>+		"casp x2, x3, %[wdata64], %[wdata128], [%[loc]]\n"
>+		"casp x4, x5, %[wdata64], %[wdata128], [%[loc]]\n"
>+		"casp x6, x7, %[wdata64], %[wdata128], [%[loc]]\n"
>+		"casp x8, x9, %[wdata64], %[wdata128], [%[loc]]\n"
>+		"casp x10, x11, %[wdata64], %[wdata128], [%[loc]]\n"
>+		"casp x12, x13, %[wdata64], %[wdata128], [%[loc]]\n"
>+		"casp x14, x15, %[wdata64], %[wdata128], [%[loc]]\n"
>+		"casp x16, x17, %[wdata64], %[wdata128], [%[loc]]\n"
>+		"casp x18, x19, %[wdata64], %[wdata128], [%[loc]]\n"
>+		"casp x20, x21, %[wdata64], %[wdata128], [%[loc]]\n"
>+		"casp x22, x23, %[wdata64], %[wdata128], [%[loc]]\n"
>+		"fmov d16, x0\n"
>+		"fmov v16.D[1], x1\n"
>+		"casp x0, x1, %[wdata64], %[wdata128], [%[loc]]\n"
>+		"fmov d17, x2\n"
>+		"fmov v17.D[1], x3\n"
>+		"casp x2, x3, %[wdata64], %[wdata128], [%[loc]]\n"
>+		"fmov d18, x4\n"
>+		"fmov v18.D[1], x5\n"
>+		"casp x4, x5, %[wdata64], %[wdata128], [%[loc]]\n"
>+		"fmov d19, x6\n"
>+		"fmov v19.D[1], x7\n"
>+		"casp x6, x7, %[wdata64], %[wdata128], [%[loc]]\n"
> 		"and %[failed].16B, %[failed].16B, v16.16B\n"
> 		"and %[failed].16B, %[failed].16B, v17.16B\n"
> 		"and %[failed].16B, %[failed].16B, v18.16B\n"
> 		"and %[failed].16B, %[failed].16B, v19.16B\n"
>-		"fmov d20, %[t4]\n"
>-		"fmov v20.D[1], %H[t4]\n"
>-		"fmov d21, %[t5]\n"
>-		"fmov v21.D[1], %H[t5]\n"
>-		"fmov d22, %[t6]\n"
>-		"fmov v22.D[1], %H[t6]\n"
>-		"fmov d23, %[t7]\n"
>-		"fmov v23.D[1], %H[t7]\n"
>+		"fmov d20, x8\n"
>+		"fmov v20.D[1], x9\n"
>+		"fmov d21, x10\n"
>+		"fmov v21.D[1], x11\n"
>+		"fmov d22, x12\n"
>+		"fmov v22.D[1], x13\n"
>+		"fmov d23, x14\n"
>+		"fmov v23.D[1], x15\n"
> 		"and %[failed].16B, %[failed].16B, v20.16B\n"
> 		"and %[failed].16B, %[failed].16B, v21.16B\n"
> 		"and %[failed].16B, %[failed].16B, v22.16B\n"
> 		"and %[failed].16B, %[failed].16B, v23.16B\n"
> 		"st1 { v16.2d, v17.2d, v18.2d, v19.2d}, [%[dst]], 64\n"
> 		"st1 { v20.2d, v21.2d, v22.2d, v23.2d}, [%[dst]], 64\n"
>-		"fmov d16, %[t8]\n"
>-		"fmov v16.D[1], %H[t8]\n"
>-		"fmov d17, %[t9]\n"
>-		"fmov v17.D[1], %H[t9]\n"
>-		"fmov d18, %[t10]\n"
>-		"fmov v18.D[1], %H[t10]\n"
>-		"fmov d19, %[t11]\n"
>-		"fmov v19.D[1], %H[t11]\n"
>+		"fmov d16, x16\n"
>+		"fmov v16.D[1], x17\n"
>+		"fmov d17, x18\n"
>+		"fmov v17.D[1], x19\n"
>+		"fmov d18, x20\n"
>+		"fmov v18.D[1], x21\n"
>+		"fmov d19, x22\n"
>+		"fmov v19.D[1], x23\n"
> 		"and %[failed].16B, %[failed].16B, v16.16B\n"
> 		"and %[failed].16B, %[failed].16B, v17.16B\n"
> 		"and %[failed].16B, %[failed].16B, v18.16B\n"
> 		"and %[failed].16B, %[failed].16B, v19.16B\n"
>-		"fmov d20, %[t0]\n"
>-		"fmov v20.D[1], %H[t0]\n"
>-		"fmov d21, %[t1]\n"
>-		"fmov v21.D[1], %H[t1]\n"
>-		"fmov d22, %[t2]\n"
>-		"fmov v22.D[1], %H[t2]\n"
>-		"fmov d23, %[t3]\n"
>-		"fmov v23.D[1], %H[t3]\n"
>+		"fmov d20, x0\n"
>+		"fmov v20.D[1], x1\n"
>+		"fmov d21, x2\n"
>+		"fmov v21.D[1], x3\n"
>+		"fmov d22, x4\n"
>+		"fmov v22.D[1], x5\n"
>+		"fmov d23, x6\n"
>+		"fmov v23.D[1], x7\n"
> 		"and %[failed].16B, %[failed].16B, v20.16B\n"
> 		"and %[failed].16B, %[failed].16B, v21.16B\n"
> 		"and %[failed].16B, %[failed].16B, v22.16B\n"
> 		"and %[failed].16B, %[failed].16B, v23.16B\n"
> 		"st1 { v16.2d, v17.2d, v18.2d, v19.2d}, [%[dst]], 64\n"
> 		"st1 { v20.2d, v21.2d, v22.2d, v23.2d}, [%[dst]], 64\n"
>-		: "+Q" (*addr), [failed] "=&w" (failed),
>-		[t0] "=&r" (t0), [t1] "=&r" (t1), [t2] "=&r" (t2),
>-		[t3] "=&r" (t3), [t4] "=&r" (t4), [t5] "=&r" (t5),
>-		[t6] "=&r" (t6), [t7] "=&r" (t7), [t8] "=&r" (t8),
>-		[t9] "=&r" (t9), [t10] "=&r" (t10), [t11] "=&r" (t11)
>-		: [wdata] "r" (wdata128), [dst] "r" (obj_table),
>-		[loc] "r" (addr)
>-		: "memory", "v16", "v17", "v18",
>-		"v19", "v20", "v21", "v22", "v23"
>+		: "+Q" (*addr), [failed] "=&w" (failed)
>+		: [wdata64] "r" (wdata64), [wdata128] "r" (wdata128),
>+		[dst] "r" (obj_table), [loc] "r" (addr)
>+		: "memory", "x0", "x1", "x2", "x3", "x4", "x5", "x6", "x7",
>+		"x8", "x9", "x10", "x11", "x12", "x13", "x14", "x15", "x16",
>+		"x17", "x18", "x19", "x20", "x21", "x22", "x23", "v16", "v17",
>+		"v18", "v19", "v20", "v21", "v22", "v23"
> 		);
> 		break;
> 	}
> 	case 16:
> 	{
>-		__uint128_t t0, t1, t2, t3, t4, t5, t6, t7;
>-
> 		asm volatile (
> 		".cpu generic+lse\n"
>-		"casp %[t0], %H[t0], %[wdata], %H[wdata], [%[loc]]\n"
>-		"casp %[t1], %H[t1], %[wdata], %H[wdata], [%[loc]]\n"
>-		"casp %[t2], %H[t2], %[wdata], %H[wdata], [%[loc]]\n"
>-		"casp %[t3], %H[t3], %[wdata], %H[wdata], [%[loc]]\n"
>-		"casp %[t4], %H[t4], %[wdata], %H[wdata], [%[loc]]\n"
>-		"casp %[t5], %H[t5], %[wdata], %H[wdata], [%[loc]]\n"
>-		"casp %[t6], %H[t6], %[wdata], %H[wdata], [%[loc]]\n"
>-		"casp %[t7], %H[t7], %[wdata], %H[wdata], [%[loc]]\n"
>-		"fmov d16, %[t0]\n"
>-		"fmov v16.D[1], %H[t0]\n"
>-		"fmov d17, %[t1]\n"
>-		"fmov v17.D[1], %H[t1]\n"
>-		"fmov d18, %[t2]\n"
>-		"fmov v18.D[1], %H[t2]\n"
>-		"fmov d19, %[t3]\n"
>-		"fmov v19.D[1], %H[t3]\n"
>+		"casp x0, x1, %[wdata64], %[wdata128], [%[loc]]\n"
>+		"casp x2, x3, %[wdata64], %[wdata128], [%[loc]]\n"
>+		"casp x4, x5, %[wdata64], %[wdata128], [%[loc]]\n"
>+		"casp x6, x7, %[wdata64], %[wdata128], [%[loc]]\n"
>+		"casp x8, x9, %[wdata64], %[wdata128], [%[loc]]\n"
>+		"casp x10, x11, %[wdata64], %[wdata128], [%[loc]]\n"
>+		"casp x12, x13, %[wdata64], %[wdata128], [%[loc]]\n"
>+		"casp x14, x15, %[wdata64], %[wdata128], [%[loc]]\n"
>+		"fmov d16, x0\n"
>+		"fmov v16.D[1], x1\n"
>+		"fmov d17, x2\n"
>+		"fmov v17.D[1], x3\n"
>+		"fmov d18, x4\n"
>+		"fmov v18.D[1], x5\n"
>+		"fmov d19, x6\n"
>+		"fmov v19.D[1], x7\n"
> 		"and %[failed].16B, %[failed].16B, v16.16B\n"
> 		"and %[failed].16B, %[failed].16B, v17.16B\n"
> 		"and %[failed].16B, %[failed].16B, v18.16B\n"
> 		"and %[failed].16B, %[failed].16B, v19.16B\n"
>-		"fmov d20, %[t4]\n"
>-		"fmov v20.D[1], %H[t4]\n"
>-		"fmov d21, %[t5]\n"
>-		"fmov v21.D[1], %H[t5]\n"
>-		"fmov d22, %[t6]\n"
>-		"fmov v22.D[1], %H[t6]\n"
>-		"fmov d23, %[t7]\n"
>-		"fmov v23.D[1], %H[t7]\n"
>+		"fmov d20, x8\n"
>+		"fmov v20.D[1], x9\n"
>+		"fmov d21, x10\n"
>+		"fmov v21.D[1], x11\n"
>+		"fmov d22, x12\n"
>+		"fmov v22.D[1], x13\n"
>+		"fmov d23, x14\n"
>+		"fmov v23.D[1], x15\n"
> 		"and %[failed].16B, %[failed].16B, v20.16B\n"
> 		"and %[failed].16B, %[failed].16B, v21.16B\n"
> 		"and %[failed].16B, %[failed].16B, v22.16B\n"
> 		"and %[failed].16B, %[failed].16B, v23.16B\n"
> 		"st1 { v16.2d, v17.2d, v18.2d, v19.2d}, [%[dst]], 64\n"
> 		"st1 { v20.2d, v21.2d, v22.2d, v23.2d}, [%[dst]], 64\n"
>-		: "+Q" (*addr), [failed] "=&w" (failed),
>-		[t0] "=&r" (t0), [t1] "=&r" (t1), [t2] "=&r" (t2),
>-		[t3] "=&r" (t3), [t4] "=&r" (t4), [t5] "=&r" (t5),
>-		[t6] "=&r" (t6), [t7] "=&r" (t7)
>-		: [wdata] "r" (wdata128), [dst] "r" (obj_table),
>-		[loc] "r" (addr)
>-		: "memory", "v16", "v17", "v18", "v19",
>-		"v20", "v21", "v22", "v23"
>+		: "+Q" (*addr), [failed] "=&w" (failed)
>+		: [wdata64] "r" (wdata64), [wdata128] "r" (wdata128),
>+		[dst] "r" (obj_table), [loc] "r" (addr)
>+		: "memory", "x0", "x1", "x2", "x3", "x4", "x5", "x6", "x7",
>+		"x8", "x9", "x10", "x11", "x12", "x13", "x14", "x15", "v16",
>+		"v17", "v18", "v19", "v20", "v21", "v22", "v23"
> 		);
> 		break;
> 	}
> 	case 8:
> 	{
>-		__uint128_t t0, t1, t2, t3;
>-
> 		asm volatile (
> 		".cpu generic+lse\n"
>-		"casp %[t0], %H[t0], %[wdata], %H[wdata], [%[loc]]\n"
>-		"casp %[t1], %H[t1], %[wdata], %H[wdata], [%[loc]]\n"
>-		"casp %[t2], %H[t2], %[wdata], %H[wdata], [%[loc]]\n"
>-		"casp %[t3], %H[t3], %[wdata], %H[wdata], [%[loc]]\n"
>-		"fmov d16, %[t0]\n"
>-		"fmov v16.D[1], %H[t0]\n"
>-		"fmov d17, %[t1]\n"
>-		"fmov v17.D[1], %H[t1]\n"
>-		"fmov d18, %[t2]\n"
>-		"fmov v18.D[1], %H[t2]\n"
>-		"fmov d19, %[t3]\n"
>-		"fmov v19.D[1], %H[t3]\n"
>+		"casp x0, x1, %[wdata64], %[wdata128], [%[loc]]\n"
>+		"casp x2, x3, %[wdata64], %[wdata128], [%[loc]]\n"
>+		"casp x4, x5, %[wdata64], %[wdata128], [%[loc]]\n"
>+		"casp x6, x7, %[wdata64], %[wdata128], [%[loc]]\n"
>+		"fmov d16, x0\n"
>+		"fmov v16.D[1], x1\n"
>+		"fmov d17, x2\n"
>+		"fmov v17.D[1], x3\n"
>+		"fmov d18, x4\n"
>+		"fmov v18.D[1], x5\n"
>+		"fmov d19, x6\n"
>+		"fmov v19.D[1], x7\n"
> 		"and %[failed].16B, %[failed].16B, v16.16B\n"
> 		"and %[failed].16B, %[failed].16B, v17.16B\n"
> 		"and %[failed].16B, %[failed].16B, v18.16B\n"
> 		"and %[failed].16B, %[failed].16B, v19.16B\n"
> 		"st1 { v16.2d, v17.2d, v18.2d, v19.2d}, [%[dst]], 64\n"
>-		: "+Q" (*addr), [failed] "=&w" (failed),
>-		[t0] "=&r" (t0), [t1] "=&r" (t1), [t2] "=&r" (t2),
>-		[t3] "=&r" (t3)
>-		: [wdata] "r" (wdata128), [dst] "r" (obj_table),
>-		[loc] "r" (addr)
>-		: "memory", "v16", "v17", "v18", "v19"
>+		: "+Q" (*addr), [failed] "=&w" (failed)
>+		: [wdata64] "r" (wdata64), [wdata128] "r" (wdata128),
>+		[dst] "r" (obj_table), [loc] "r" (addr)
>+		: "memory", "x0", "x1", "x2", "x3", "x4", "x5", "x6", "x7",
>+		"v16", "v17", "v18", "v19"
> 		);
> 		break;
> 	}
> 	case 4:
> 	{
>-		__uint128_t t0, t1;
>-
> 		asm volatile (
> 		".cpu generic+lse\n"
>-		"casp %[t0], %H[t0], %[wdata], %H[wdata], [%[loc]]\n"
>-		"casp %[t1], %H[t1], %[wdata], %H[wdata], [%[loc]]\n"
>-		"fmov d16, %[t0]\n"
>-		"fmov v16.D[1], %H[t0]\n"
>-		"fmov d17, %[t1]\n"
>-		"fmov v17.D[1], %H[t1]\n"
>+		"casp x0, x1, %[wdata64], %[wdata128], [%[loc]]\n"
>+		"casp x2, x3, %[wdata64], %[wdata128], [%[loc]]\n"
>+		"fmov d16, x0\n"
>+		"fmov v16.D[1], x1\n"
>+		"fmov d17, x2\n"
>+		"fmov v17.D[1], x3\n"
> 		"and %[failed].16B, %[failed].16B, v16.16B\n"
> 		"and %[failed].16B, %[failed].16B, v17.16B\n"
> 		"st1 { v16.2d, v17.2d}, [%[dst]], 32\n"
>-		: "+Q" (*addr), [failed] "=&w" (failed),
>-		[t0] "=&r" (t0), [t1] "=&r" (t1)
>-		: [wdata] "r" (wdata128), [dst] "r" (obj_table),
>-		[loc] "r" (addr)
>-		: "memory", "v16", "v17"
>+		: "+Q" (*addr), [failed] "=&w" (failed)
>+		: [wdata64] "r" (wdata64), [wdata128] "r" (wdata128),
>+		[dst] "r" (obj_table), [loc] "r" (addr)
>+		: "memory", "x0", "x1", "x2", "x3", "v16", "v17"
> 		);
> 		break;
> 	}
> 	case 2:
> 	{
>-		__uint128_t t0;
>-
> 		asm volatile (
> 		".cpu generic+lse\n"
>-		"casp %[t0], %H[t0], %[wdata], %H[wdata], [%[loc]]\n"
>-		"fmov d16, %[t0]\n"
>-		"fmov v16.D[1], %H[t0]\n"
>+		"casp x0, x1, %[wdata64], %[wdata128], [%[loc]]\n"
>+		"fmov d16, x0\n"
>+		"fmov v16.D[1], x1\n"
> 		"and %[failed].16B, %[failed].16B, v16.16B\n"
> 		"st1 { v16.2d}, [%[dst]], 16\n"
>-		: "+Q" (*addr), [failed] "=&w" (failed),
>-		[t0] "=&r" (t0)
>-		: [wdata] "r" (wdata128), [dst] "r" (obj_table),
>-		[loc] "r" (addr)
>-		: "memory", "v16"
>+		: "+Q" (*addr), [failed] "=&w" (failed)
>+		: [wdata64] "r" (wdata64), [wdata128] "r" (wdata128),
>+		[dst] "r" (obj_table), [loc] "r" (addr)
>+		: "memory", "x0", "x1", "v16"
> 		);
> 		break;
> 	}
>@@ -308,7 +281,7 @@ otx2_npa_clear_alloc(struct rte_mempool *mp, void **obj_table, unsigned int n)
> 	}
> }
>
>-static inline int __hot
>+static __rte_noinline int __hot
> otx2_npa_deq_arm64(struct rte_mempool *mp, void **obj_table, unsigned int n)
> {
> 	const int64_t wdata = npa_lf_aura_handle_to_aura(mp->pool_id);
>@@ -332,7 +305,8 @@ otx2_npa_deq_arm64(struct rte_mempool *mp, void **obj_table, unsigned int n)
>
> 	return 0;
> }
>-#endif
>+
>+#else
>
> static inline int __hot
> otx2_npa_deq(struct rte_mempool *mp, void **obj_table, unsigned int n)
>@@ -359,6 +333,8 @@ otx2_npa_deq(struct rte_mempool *mp, void **obj_table, unsigned int n)
> 	return 0;
> }
>
>+#endif
>+
> static unsigned int
> otx2_npa_get_count(const struct rte_mempool *mp)
> {
>--
>2.22.0
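
For anyone reading this in the archive, the pairing rule that the commit message relies on can be modelled in a few lines of portable C. This is only a sketch and not part of the patch: casp_pair_ok() is a hypothetical helper that assumes the rule is "first register even, second register is the next one". The register numbers used in the note below are taken from the diff and from the assembler error quoted in the removed comment.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not part of the patch or of DPDK: model the CASP
 * operand rule that a register pair must start at an even-numbered
 * register and continue with the next register (x0/x1, x2/x3, ...).
 */
static bool
casp_pair_ok(unsigned int first, unsigned int second)
{
	return (first % 2 == 0) && (second == first + 1);
}
```

With this model, casp_pair_ok(21, 22) is false, which corresponds to the rejected `casp x21,x22,x0,x1,[x19]`, while the pairs hard-coded by the patch (x0/x1 up to x22/x23 for the results, x26/x27 for wdata) all pass.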