From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from na01-by2-obe.outbound.protection.outlook.com (mail-by2on0091.outbound.protection.outlook.com [207.46.100.91]) by dpdk.org (Postfix) with ESMTP id 3CF8E568A for ; Wed, 2 Dec 2015 11:48:38 +0100 (CET) Authentication-Results: spf=none (sender IP is ) smtp.mailfrom=Jerin.Jacob@caviumnetworks.com; Received: from localhost.localdomain (122.167.201.210) by CY1PR0701MB1727.namprd07.prod.outlook.com (10.163.21.141) with Microsoft SMTP Server (TLS) id 15.1.331.20; Wed, 2 Dec 2015 10:48:34 +0000 Date: Wed, 2 Dec 2015 16:18:13 +0530 From: Jerin Jacob To: "Ananyev, Konstantin" Message-ID: <20151202104811.GA6337@localhost.localdomain> References: <1448995276-9599-1-git-send-email-jianbo.liu@linaro.org> <1448995276-9599-4-git-send-email-jianbo.liu@linaro.org> <20151201164139.GA12144@localhost.localdomain> <20151202080259.GA32494@localhost.localdomain> <2601191342CEEE43887BDE71AB97725836ACFADA@irsmsx105.ger.corp.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Disposition: inline In-Reply-To: <2601191342CEEE43887BDE71AB97725836ACFADA@irsmsx105.ger.corp.intel.com> User-Agent: Mutt/1.5.23 (2014-03-12) X-Originating-IP: [122.167.201.210] X-ClientProxiedBy: MAXPR01CA0012.INDPRD01.PROD.OUTLOOK.COM (25.164.147.19) To CY1PR0701MB1727.namprd07.prod.outlook.com (25.163.21.141) X-Microsoft-Exchange-Diagnostics: 1; CY1PR0701MB1727; 2:Wg9Rfwdi/E2tmNCJ/9BTN5CgwRYQTaRBMrSEqmaW/aczXNaUN9f/Lph+rEkv9PS1BQAZxWCxQvkA0XxTy+ANc/q4nTlJL7QdtMZMmXRh7tBy5dRuAvKA9wnqslPfwJBtWNfgm7fxbTbxCp6D9BK2RA==; 3:c7NMdjPzCqm/YN5gdC9o3HRyPmCuF+xGifc9J3UV9h4GHjPO7gy40lCGrmUCsYMOa9t5fRUkrG3pWaNFTyhpPrwBiEopnJkunZVJpIxcEaBkDLt0B6BgoeJxQG/6Md8q; 25:kMXcUb0Tl+uFO4vu0K9KnGU0VdR6jbND0VE0sNrh53VKDnBm1KL7CnzrN4qeFB/Kte62Y7kc5Gq5YXzcgFBJrRho/nPVwjEs2+tOJZfgsWp152pWdwblF0edCcvI/yswQ9VCIpVSyF4YX00lTH1qUD/4I9KYxu6WKOOA5HI0nyVFLa7UUzQqknymMFYoX2Bu+SFQCRc/hqknmrJx6g2YAYmOwKs0fNqLW+JE9KJP2SNtz1i+hMKV1Bf5sR8MZ28tw+21gx86aPgEkEz8HdRdug== X-Microsoft-Antispam: UriScan:;BCL:0;PCL:0;RULEID:;SRVR:CY1PR0701MB1727; X-Microsoft-Exchange-Diagnostics: 1; CY1PR0701MB1727; 20:yFIh/7iVrTlFQqEo2wVhAFNjn2NnY1L8P+NMLx0kmrfneM3P+SU/OjDFCmD7lzeddOqJOZ8e2R/eqwARGXNd9FiqkW4tzpNwDIgnmJrwfg4MXMP2/vDQzem4Y2Ie5ao1WmrrtHB7o08/17W8sJY/D9wRx2N3yzXm2mx2YqOpH3ie4VU8me7LQ149qJzG4SpF8MVMW8xc5DTKj9WVwHQ4k8S65hhVzBlHoSZFVg5W5FPR45k/bkHGDJ/4iHc0prRoc3/JcolLzVSP6raQH/Q8fEt/Q3qitPkYO3ci/3PPGoTdSdgnkcZymKpcZmQe9aWMmcrNqS5L5FRHmzBUDYkvk1+VKvLB0O0kJHOUkCe26OFqlKqnI1CfRt4xbxfDWXHK2V4/FWwdsBDigox/SIBiF5ciOm/hy/1mAGI/zxX3+y8tzx2kqsAG5NrNlu43i20QcBEA0ggtrmQYHsf0JgflaQvqoNZzuqP24K946sXUqo4NNFElYRhFGZOz6nJs+EF+G6WcFkpWR5cYnI31IwYuomI0M07dSoQMGtOLRs9iOSVaW74j/H+YKOpfU3/7tuIcKneV0weFc8WUJPYcRXocGDXG+USDaHVOo/WuwrPDiZ4= X-Microsoft-Antispam-PRVS: X-Exchange-Antispam-Report-Test: UriScan:(236414709691187); X-Exchange-Antispam-Report-CFA-Test: BCL:0; PCL:0; RULEID:(601004)(2401047)(8121501046)(520078)(5005006)(10201501046)(3002001); SRVR:CY1PR0701MB1727; BCL:0; PCL:0; RULEID:; SRVR:CY1PR0701MB1727; X-Microsoft-Exchange-Diagnostics: 1; CY1PR0701MB1727; 4:lW7BwW4YCddmHpuyY6l7K6Aj9bsncyfYTgmcbnsvgcO1hG4EdjofpafaiyiFYUsX/NrP+Nwkv4jgakGRyBAmw2zL49FkTMecNseEYcU/++pL3lkilrZ9pjEPP3sIHMPMzm2PZwwjXmiPag+ThVDFWNKwVK43U9vg8eD+apL0oDBp5dyaeh7066TPIzYP0GjRg8mCaW3SFxNhmNcBZ7gQ5dyTMudkdHtoH+XhLhq8yM+dlUzFl7STWiIgvfv6cfluaobcwWLc0KHM9zs1QKm/AKdq3n9Qc+KXWPpz4/GlyEdZXsJu0OiDdaYziqnMkU7Zh2UF3aoGnXqPSl2HsWPEE2skGN4R82n18ywJs/WbPRSYKUyKwNrRzpO8XhD+O04GgVyIUKqq+XdsYZgAREuB8Iq8165gsG9e6ZM5eqJaQzCwllyFbOA+G8adegloskWi X-Forefront-PRVS: 077884B8B5 X-Forefront-Antispam-Report: SFV:NSPM; SFS:(10009020)(6069001)(6009001)(53754006)(174864002)(189002)(199003)(377454003)(24454002)(13464003)(33656002)(77096005)(19580405001)(110136002)(23726003)(6116002)(5004730100002)(97756001)(19580395003)(81156007)(3846002)(87976001)(61506002)(42186005)(106356001)(2950100001)(86362001)(5008740100001)(586003)(46406003)(66066001)(40100003)(101416001)(105586002)(83506001)(54356999)(5001960100002)(189998001)(50466002)(4001350100001)(1076002)(97736004)(93886004)(1096002)(47776003)(122386002)(50986999)(92566002)(76176999)(7099028)(357404004); DIR:OUT; SFP:1101; SCL:1; SRVR:CY1PR0701MB1727; H:localhost.localdomain; FPR:; SPF:None; PTR:InfoNoRecords; MX:1; A:1; LANG:en; Received-SPF: None (protection.outlook.com: caviumnetworks.com does not designate permitted sender hosts) X-Microsoft-Exchange-Diagnostics: =?us-ascii?Q?1; CY1PR0701MB1727; 23:gQ1nc5iuorJGogGOheCLDtdUdDVhgSfVWFGed6K?= =?us-ascii?Q?98xW9GgYdlCgVI9OSsAZRzxSQBJKTuzzUmyTCqbN5vIA2uLUepn9U77Ur37J?= =?us-ascii?Q?oOMXalaiff5EvciMUjmislRsunLqE6CLJxz34RMOY/ckKkqGHvh4GTfakU8k?= =?us-ascii?Q?8IptEIGceyESWAwtx9tC358VGQqB/1dzwNu4/YqgKyHYXk5+3BUYqcwlOdJz?= =?us-ascii?Q?o9EB4+b2/qRs3LJGKKVksSSrNRhecKO8Moek5dIj1e0m3D3tqmjAvrum9i3P?= =?us-ascii?Q?7mlCYZaybebW6b81Svusit25KSjXlc4CaH7UnFVU4U7N9Udy8RrxSN8FZR73?= =?us-ascii?Q?dAyeDgV6uvJud3SHx7Gb1p9KFDH5nVit4oLoKSfpg7fsZMn3O4UunFxZeyM4?= =?us-ascii?Q?0Yevo+skAUmE77C+TUrlh2p7jYLfcXxToMR0wjpUkq/LdwZpr+JxJ3zRFLkl?= =?us-ascii?Q?KtJ8CG9GERF3bTRvRdT2c9X5yvQ4EeJdd2tcd8oS2mjkA47KYYiiaoCQqb1Z?= =?us-ascii?Q?9+mRBnmLaaiXFG62RETlZRVDMFUcu01xZu845ZK2Q/ovb3bYQ9YzruXyL6OV?= =?us-ascii?Q?kbW9NOUXfAAPJ/a7LwE/Xd110428huTmTefFC7OjRIM5AbIkYnrwfNHeQHFm?= =?us-ascii?Q?xgOnY9j82Ymfe7GlV2AwG+5Y3IduV+9Hr62ETp2r1tuFBb5dLUUcPHk+FueB?= =?us-ascii?Q?tBa8JlEcu5FdtuwaoTxTu0VfJbFYbo7NLH/KJxcZxwCmGTViEMTYtnGIAxWk?= =?us-ascii?Q?eBoeewS09FH62A+qpUlWDN0zjh/bwfcUYLMoFVFytQ8RIeOftK1iyiC7U9id?= =?us-ascii?Q?evvNfw/NQF6XwOO3ndNWBolVf3V4kJLI6XyboZOsCdDpnRqS9O/GWdls/Nsy?= =?us-ascii?Q?PoytVokCFQxXVO/a3OOF2SYtrnCmeuE7P5EFVPJRYLl/TxR+V6Ej1mJetlen?= =?us-ascii?Q?Yy6yEJTQmVbD7IK5AduDUmGeLKmKRKPItqjIVj42Ix+g4D62fkzIUYbaek2V?= =?us-ascii?Q?e3rTj//KF12Jwjyd+QChjzMpqXqbqP87eRgXeFDKZZJdU5NHQrrJKpGFKh6j?= =?us-ascii?Q?tQ/qlFc8+7tM6EL0sjAE+e4/lXlUGn4X1rqUOEIWNtWO1hV+NuXQjw6QGGt/?= =?us-ascii?Q?PepxZzWesulFPmmwlKIJ4vOqPqvb/Iuvl6MLjAc9Ur/l7mP8v4PRtlEoOqWa?= =?us-ascii?Q?vmwwZ5AQdL2VsQDqd22JzWHaq9FpXMkOOecwpjRUkfR62AKfvhk7n0cYjC5K?= =?us-ascii?Q?m5uwrYTNkRvqrl1CrsE6FQVfsen8ZUJNNQqKLJ5nP?= X-Microsoft-Exchange-Diagnostics: 1; CY1PR0701MB1727; 5:SRny2DFUQMn8QVquuiYDpKobOao2WS/Sv1WnkXxWWqdA/wjmSxEHJPxEtvGC17rSYvq0q6pnjYjvN1XOYHSuXk0dTcRMA0hciT9bmCmQDr3C+XlAA8LMBXc+AU/v3PrUlZzRdEC+wnz2M8BfelkdZQ==; 24:trUdXIJ1YesY+7CcnC1jR3XhX1W7PeQ4+iLEaItYqI5HvFn60FraSRPp0aPSHHlSmPlSGJTGEQiGmxOAzHFg8nDznGm4bBrk4PxzdaRIKMg= SpamDiagnosticOutput: 1:23 SpamDiagnosticMetadata: NSPM X-OriginatorOrg: caviumnetworks.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 02 Dec 2015 10:48:34.4494 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-Transport-CrossTenantHeadersStamped: CY1PR0701MB1727 Cc: "dev@dpdk.org" Subject: Re: [dpdk-dev] [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 02 Dec 2015 10:48:39 -0000 On Wed, Dec 02, 2015 at 10:33:44AM +0000, Ananyev, Konstantin wrote: > Hi everyone, > > > -----Original Message----- > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jianbo Liu > > Sent: Wednesday, December 02, 2015 9:50 AM > > To: Jerin Jacob > > Cc: dev@dpdk.org > > Subject: Re: [dpdk-dev] [PATCH 3/4] eal/arm: Enable lpm/table/pipeline libs > > > > On 2 December 2015 at 16:03, Jerin Jacob wrote: > > > On Wed, Dec 02, 2015 at 02:54:52PM +0800, Jianbo Liu wrote: > > >> On 2 December 2015 at 00:41, Jerin Jacob wrote: > > >> > On Tue, Dec 01, 2015 at 01:41:15PM -0500, Jianbo Liu wrote: > > >> >> Adds ARM NEON support for lpm. > > >> >> And enables table/pipeline libraries which depend on lpm. > > >> > > > >> > I already sent the patch on the same yesterday. > > >> > We can converge the patches after the discussion. > > >> > Please check "[dpdk-dev] [PATCH 0/3] add lpm support for NEON" on ml > > >> > > > >> Yes, I have read your patch. But there are many differences, so I sent > > >> mine for your reviewing :) > > >> > > >> > > > >> >> > > >> >> Signed-off-by: Jianbo Liu > > >> >> --- > > >> >> config/defconfig_arm-armv7a-linuxapp-gcc | 3 - > > >> >> config/defconfig_arm64-armv8a-linuxapp-gcc | 3 - > > >> >> lib/librte_eal/common/include/arch/arm/rte_vect.h | 28 ++++++++++ > > >> >> lib/librte_lpm/rte_lpm.h | 68 ++++++++++++++++------- > > >> >> 4 files changed, 77 insertions(+), 25 deletions(-) > > >> >> > > >> >> diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc > > >> >> index cbebd64..efffa1f 100644 > > >> >> --- a/config/defconfig_arm-armv7a-linuxapp-gcc > > >> >> +++ b/config/defconfig_arm-armv7a-linuxapp-gcc > > >> >> @@ -53,9 +53,6 @@ CONFIG_RTE_LIBRTE_KNI=n > > >> >> CONFIG_RTE_EAL_IGB_UIO=n > > >> >> > > >> >> # fails to compile on ARM > > >> >> -CONFIG_RTE_LIBRTE_LPM=n > > >> >> -CONFIG_RTE_LIBRTE_TABLE=n > > >> >> -CONFIG_RTE_LIBRTE_PIPELINE=n > > >> >> CONFIG_RTE_SCHED_VECTOR=n > > >> >> > > >> >> # cannot use those on ARM > > >> >> diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc b/config/defconfig_arm64-armv8a-linuxapp-gcc > > >> >> index 504f3ed..57f7941 100644 > > >> >> --- a/config/defconfig_arm64-armv8a-linuxapp-gcc > > >> >> +++ b/config/defconfig_arm64-armv8a-linuxapp-gcc > > >> >> @@ -51,7 +51,4 @@ CONFIG_RTE_LIBRTE_IVSHMEM=n > > >> >> CONFIG_RTE_LIBRTE_FM10K_PMD=n > > >> >> CONFIG_RTE_LIBRTE_I40E_PMD=n > > >> >> > > >> >> -CONFIG_RTE_LIBRTE_LPM=n > > >> >> -CONFIG_RTE_LIBRTE_TABLE=n > > >> >> -CONFIG_RTE_LIBRTE_PIPELINE=n > > >> >> CONFIG_RTE_SCHED_VECTOR=n > > >> >> diff --git a/lib/librte_eal/common/include/arch/arm/rte_vect.h b/lib/librte_eal/common/include/arch/arm/rte_vect.h > > >> >> index a33c054..7437711 100644 > > >> >> --- a/lib/librte_eal/common/include/arch/arm/rte_vect.h > > >> >> +++ b/lib/librte_eal/common/include/arch/arm/rte_vect.h > > >> >> @@ -41,6 +41,8 @@ extern "C" { > > >> >> > > >> >> typedef int32x4_t xmm_t; > > >> >> > > >> >> +typedef int32x4_t __m128i; > > >> >> + > > >> >> #define XMM_SIZE (sizeof(xmm_t)) > > >> >> #define XMM_MASK (XMM_SIZE - 1) > > >> >> > > >> >> @@ -53,6 +55,32 @@ typedef union rte_xmm { > > >> >> double pd[XMM_SIZE / sizeof(double)]; > > >> >> } __attribute__((aligned(16))) rte_xmm_t; > > >> >> > > >> >> +static __inline __m128i > > >> >> +_mm_set_epi32(int i3, int i2, int i1, int i0) > > >> >> +{ > > >> >> + int32_t r[4] = {i0, i1, i2, i3}; > > >> >> + > > >> >> + return vld1q_s32(r); > > >> >> +} > > >> >> + > > >> >> +static __inline __m128i > > >> >> +_mm_loadu_si128(__m128i *p) > > >> >> +{ > > >> >> + return vld1q_s32((int32_t *)p); > > >> >> +} > > >> >> + > > >> >> +static __inline __m128i > > >> >> +_mm_set1_epi32(int i) > > >> >> +{ > > >> >> + return vdupq_n_s32(i); > > >> >> +} > > >> >> + > > >> >> +static __inline __m128i > > >> >> +_mm_and_si128(__m128i a, __m128i b) > > >> >> +{ > > >> >> + return vandq_s32(a, b); > > >> >> +} > > >> >> + > > > > > > IMO, it's not always good to emulate GCC defined intrinsics of > > > other architecture. What if a legacy DPDK application has such mappings > > > then BOOM, multiple definition, which one is correct? which one > > > to comment it out? Integration pain starts for DPDK library consumer:-( > > > > > They can include rte_vect.h in build/include directly, which is linked correctly > > to the one for that ARCH, so there is no need to worry about. > > > > > > >> > > > >> > IMO, it makes sense to not emulate the SSE intrinsics with NEON > > >> > Let's create the rte_vect_* as required. look at the existing patch. > > >> > > > >> I thought of creating a layer of SIMD over all the platforms before. > > >> But can't you see it make things complicated, considering there are > > >> only few simple intrinsic to implement? > > > > > > Not true, There were, a lot of SSE intrinsics needs be to emulated for ACL NEON > > > implementation if I were to take this approach and emulation comes with > > > the cost. > > > > > No, I will not re-implement all the intrinsic like that . > > I only do with the simple intrinsic, such as load/store, as you said below. > > > > > So my take is, > > > lets the each architecture implementation for specific SIMD version of DPDK > > > API in the library should have the freedom to implement the API in > > > NATIVE. > > > > > > And let's create only rte_vect_* abstraction only for using > > > that API/library. Which boils down to have very minimal rte_vect_* > > > abstraction to load, store, set not beyond that. > > > > > > This makes clear "contract" between DPDK library and the applications. > > > and make easy for remaning new architecture porting effort in DPDK. > > > > > Agree. > > But I reuse existing intrinsic names, and you recreate new ones. > > And I try to do as few changes as possible, and try to avoid any > > mistaken which may cause code un-compiled. > > I think it's design level question, we need to hear what others talk about it. > > > > > Imagine how your proposed function will look like if new architecture > > > wants to implement "optimized" version of rte_lpm_lookupx4 > > > > > There is no optimization for this (simple) rte_lpm_lookupx4, otherwise > > you have done that in your patch. > > If there is for other new platform, defintely they should do like > > yours, as you did for NEON ACL. > > > > > > > >> If do so, we also need to explain to others how to use these interfaces. > > >> Besides, this patch did the smallest changes to the original code, and > > >> more likely to be accepted by others. > > > > > > other patch makes no changes to IA version of rte_lpm_lookupx4.I thought > > > that make reviewer easy to review the changes in architecture > > > perspective. > > > > > As I know, they don't enable LPM for PPC, and ARM is the first one to > > touch this issue. > > > > >> > > >> > > > >> >> #ifdef RTE_ARCH_ARM > > >> >> /* NEON intrinsic vqtbl1q_u8() is not supported in ARMv7-A(AArch32) */ > > >> >> static __inline uint8x16_t > > >> >> diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h > > >> >> index c299ce2..c76c07d 100644 > > >> >> --- a/lib/librte_lpm/rte_lpm.h > > >> >> +++ b/lib/librte_lpm/rte_lpm.h > > >> >> @@ -361,6 +361,47 @@ rte_lpm_lookup_bulk_func(const struct rte_lpm *lpm, const uint32_t * ips, > > >> >> /* Mask four results. */ > > >> >> #define RTE_LPM_MASKX4_RES UINT64_C(0x00ff00ff00ff00ff) > > >> >> > > >> >> +#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64) > > >> > > > >> > Separate out arm implementation to the different header file. > > >> > Too many ifdef looks odd in the header file and difficult to manage. > > >> > > > >> But there are many ifdefs already. > > >> And It seems unreasonable to add a new file only for one small function. > > >> > > > > > > small or big, its matter of each architecture to have > > > the freedom for the optimized version for the implementation. > > > > > > What if other architecture demands to write this function in assembly > > > or restructure it for performance improvement? > > > > > If there is such demands, should do like that. > > But I don't see any restructure in your patch, and you still follow > > the logic as x86, is it worth adding a new file? > > > > My preference would also be to put architecture dependent implementation > into different files. > Might be create lib/librte_lpm/arch/(arm|x86)/... here? > Konstantin +1 my existing patch creates lib/librte_lpm/rte_lpm_neon.h instead of lib/librte_lpm/arch/arm/rte_lpm_neon.h like lib/librte_hash/rte_cmp_x86.h I am OK for changing the directory structure as proposed in my next revision of patch. Let me know if anyone has any objections/concerns. Jerin