From mboxrd@z Thu Jan 1 00:00:00 1970
From: Pavan Nikhilesh Bhagavatula
To: Aaron Conole, Jerin Jacob Kollanukkaran
CC: "dev@dpdk.org", Nithin Kumar Dabilpuram, Vamsi Krishna Attunuru, Olivier Matz
Date: Tue, 18 Jun 2019 07:39:23 +0000
References: <20190601014905.45531-1-jerinj@marvell.com> <20190617155537.36144-1-jerinj@marvell.com> <20190617155537.36144-26-jerinj@marvell.com>
Subject: Re: [dpdk-dev] [EXT] Re: [PATCH v3 25/27] mempool/octeontx2: add optimized dequeue operation for arm64
List-Id: DPDK patches and discussions

Hi Aaron,

>-----Original Message-----
>From: Aaron Conole
>Sent: Tuesday, June 18, 2019 2:55 AM
>To: Jerin Jacob Kollanukkaran
>Cc: dev@dpdk.org; Nithin Kumar Dabilpuram; Vamsi Krishna Attunuru;
>Pavan Nikhilesh Bhagavatula; Olivier Matz
>Subject: [EXT] Re: [dpdk-dev] [PATCH v3 25/27] mempool/octeontx2:
>add optimized dequeue operation for arm64
>
>> From: Pavan Nikhilesh
>>
>> This patch adds an optimized arm64 instruction based routine to leverage
>> CPU pipeline characteristics of octeontx2. The theme is to fill the
>> pipeline with CASP operations as much HW can do so that HW can do alloc()
>> HW ops in full throttle.
>>
>> Cc: Olivier Matz
>> Cc: Aaron Conole
>>
>> Signed-off-by: Pavan Nikhilesh
>> Signed-off-by: Jerin Jacob
>> Signed-off-by: Vamsi Attunuru
>> ---
>>  drivers/mempool/octeontx2/otx2_mempool_ops.c | 291 +++++++++++++++++++
>>  1 file changed, 291 insertions(+)
>>
>> diff --git a/drivers/mempool/octeontx2/otx2_mempool_ops.c b/drivers/mempool/octeontx2/otx2_mempool_ops.c
>> index c59bd73c0..e6737abda 100644
>> --- a/drivers/mempool/octeontx2/otx2_mempool_ops.c
>> +++ b/drivers/mempool/octeontx2/otx2_mempool_ops.c
>> @@ -37,6 +37,293 @@ npa_lf_aura_op_alloc_one(const int64_t wdata, int64_t * const addr,
>>  	return -ENOENT;
>>  }
>>
>> +#if defined(RTE_ARCH_ARM64)
>> +static __rte_noinline int
>> +npa_lf_aura_op_search_alloc(const int64_t wdata, int64_t * const addr,
>> +		void **obj_table, unsigned int n)
>> +{
>> +	uint8_t i;
>> +
>> +	for (i = 0; i < n; i++) {
>> +		if (obj_table[i] != NULL)
>> +			continue;
>> +		if (npa_lf_aura_op_alloc_one(wdata, addr, obj_table, i))
>> +			return -ENOENT;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static __attribute__((optimize("-O3"))) __rte_noinline int __hot
>
>Sorry if I missed this before.
>
>Is there a good reason to hard-code this optimization, rather than let
>the build system provide it?

Some compiler versions don't properly support __int128_t operands in CASP
inline asm, i.e. if the optimization level is reduced to -O0, the CASP
restrictions aren't followed and the compiler might end up violating the
CASP rules. For example:

/tmp/ccSPMGzq.s:1648: Error: reg pair must start from even reg at operand 1 - `casp x21,x22,x0,x1,[x19]'
/tmp/ccSPMGzq.s:1706: Error: reg pair must start from even reg at operand 1 - `casp x13,x14,x0,x1,[x11]'
/tmp/ccSPMGzq.s:1745: Error: reg pair must start from even reg at operand 1 - `casp x9,x10,x0,x1,[x7]'
/tmp/ccSPMGzq.s:1775: Error: reg pair must start from even reg at operand 1 - `casp x7,x8,x0,x1,[x5]'

Forcing -O3 with __rte_noinline in place fixes it, since the register
pairs then come out even-aligned.
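
As a rough illustration of the rule behind those assembler errors: CASP takes its compare and swap values as register pairs, and each pair has to start at an even-numbered register followed by the next consecutive one. The helper below is only a sketch of that check; the name casp_pair_is_valid is hypothetical and is not part of the patch or of DPDK, and it ignores the upper limit on register numbers.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, for illustration only (not part of the patch):
 * returns true when the general-purpose register pair (first, second)
 * satisfies the CASP operand rule, i.e. the first register is
 * even-numbered and the second is the immediately following register.
 */
static bool
casp_pair_is_valid(unsigned int first, unsigned int second)
{
	return (first % 2 == 0) && (second == first + 1);
}
```

Applied to the errors above, the pairs x21,x22 / x13,x14 / x9,x10 / x7,x8 all start on an odd register and fail the check, whereas the second operand pair x0,x1 passes it.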

Regards,
Pavan.

>
>> +npa_lf_aura_op_alloc_bulk(const int64_t wdata, int64_t * const addr,
>> +			  unsigned int n, void **obj_table)
>> +{
>> +	const __uint128_t wdata128 = ((__uint128_t)wdata << 64) | wdata;
>> +	uint64x2_t failed = vdupq_n_u64(~0);
>> +
>> +	switch (n) {
>> +	case 32:
>> +	{
>> +		__uint128_t t0, t1, t2, t3, t4, t5, t6, t7, t8, t9;
>> +		__uint128_t t10, t11;
>> +
>> +		asm volatile (
>> +		".cpu generic+lse\n"
>> +		"casp %[t0], %H[t0], %[wdata], %H[wdata], [%[loc]]\n"