From: Honnappa Nagarahalli
To: "Ananyev, Konstantin", "olivier.matz@6wind.com",
 "sthemmin@microsoft.com", "jerinj@marvell.com", "Richardson, Bruce",
 "david.marchand@redhat.com", "pbhagavatula@marvell.com",
 David Christensen
CC: "dev@dpdk.org", Dharmik Thakkar,
 "Ruifeng Wang (Arm Technology China)",
 "Gavin Hu (Arm Technology China)", "stephen@networkplumber.org", nd, nd
Thread-Topic: [PATCH v4 1/2] lib/ring: apis to support configurable element size
Date: Thu, 17 Oct 2019 20:16:45 +0000
References: <20190906190510.11146-1-honnappa.nagarahalli@arm.com>
 <20191009024709.38144-1-honnappa.nagarahalli@arm.com>
 <20191009024709.38144-2-honnappa.nagarahalli@arm.com>
 <2601191342CEEE43887BDE71AB97725801A8C68545@IRSMSX104.ger.corp.intel.com>
 <2601191342CEEE43887BDE71AB97725801A8C68A99@IRSMSX104.ger.corp.intel.com>
 <2601191342CEEE43887BDE71AB97725801A8C6A2DA@IRSMSX104.ger.corp.intel.com>
In-Reply-To: <2601191342CEEE43887BDE71AB97725801A8C6A2DA@IRSMSX104.ger.corp.intel.com>
Subject: Re: [dpdk-dev] [PATCH v4 1/2] lib/ring: apis to support configurable element size
List-Id: DPDK patches and discussions

+ David Christensen for Power architecture

> > >
> > > > It
> > > > would mean extra work for the users.
> > > >
> > > > > 2. A lot of code duplication with these 3 copies of
> > > > > ENQUEUE/DEQUEUE macros.
> > > > >
> > > > > Looking at ENQUEUE/DEQUEUE macros, I can see that main loop
> > > > > always does 32B copy per iteration.
> > > > Yes, I tried to keep it the same as the existing one (originally,
> > > > I guess the intention was to allow for 256b vector instructions to
> > > > be generated)
> > > >
> > > > > So wonder can we make a generic function that would do 32B copy
> > > > > per iteration in a main loop, and copy tail by 4B chunks?
> > > > > That would avoid copy duplication and will allow user to have
> > > > > any elem size (multiple of 4B) he wants.
> > > > > Something like that (note didn't test it, just a rough idea):
> > > > >
> > > > > static inline void
> > > > > copy_elems(uint32_t du32[], const uint32_t su32[], uint32_t num,
> > > > >     uint32_t esize) {
> > > > >     uint32_t i, sz;
> > > > >
> > > > >     sz = (num * esize) / sizeof(uint32_t);
> > > > If 'num' is a compile time constant, 'sz' will be a compile time constant.
> > > > Otherwise, this will result in a multiplication operation.
> > >
> > > Not always.
> > > If esize is compile time constant, then for esize as power of 2
> > > (4,8,16,...), it would be just one shift.
> > > For other constant values it could be a 'mul' or in many cases just
> > > 2 shifts plus 'add' (if compiler is smart enough).
> > > I.E. let say for 24B elem it would be either num * 6 or (num << 2) +
> > > (num << 1).
> > With num * 15 it has to be (num << 3) + (num << 2) + (num << 1) + num
> > Not sure if the compiler will do this.
>
> For 15, it can be just (num << 4) - num
>
> >
> > > I suppose for non-power of 2 elems it might be ok to get such small perf hit.
> > Agree, should be ok not to focus on right now.
> >
> > >
> > > > I have tried to avoid the multiplication operation and try to use
> > > > shift and mask operations (just like how the rest of the ring code does).
> > > >
> > > > >
> > > > >     for (i = 0; i < (sz & ~7); i += 8)
> > > > >         memcpy(du32 + i, su32 + i, 8 * sizeof(uint32_t));
> > > > I had used memcpy to start with (for the entire copy operation),
> > > > performance is not the same for 64b elements when compared with
> > > > the existing ring APIs (some cases more and some cases less).
> > >
> > > I remember that from one of your previous mails, that's why here I
> > > suggest to use in a loop memcpy() with fixed size.
> > > That way for each iteration compiler will replace memcpy() with
> > > instructions to copy 32B in a way he thinks is optimal (same as for
> > > original macro, I think).
> > I tried this. On x86 (Xeon(R) Gold 6132 CPU @ 2.60GHz), the results are as
> > follows. The numbers in brackets are with the code on master.
> > gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
> >
> > RTE>>ring_perf_elem_autotest
> > ### Testing single element and burst enq/deq ###
> > SP/SC single enq/dequeue: 5
> > MP/MC single enq/dequeue: 40 (35)
> > SP/SC burst enq/dequeue (size: 8): 2
> > MP/MC burst enq/dequeue (size: 8): 6
> > SP/SC burst enq/dequeue (size: 32): 1 (2)
> > MP/MC burst enq/dequeue (size: 32): 2
> >
> > ### Testing empty dequeue ###
> > SC empty dequeue: 2.11
> > MC empty dequeue: 1.41 (2.11)
> >
> > ### Testing using a single lcore ###
> > SP/SC bulk enq/dequeue (size: 8): 2.15 (2.86)
> > MP/MC bulk enq/dequeue (size: 8): 6.35 (6.91)
> > SP/SC bulk enq/dequeue (size: 32): 1.35 (2.06)
> > MP/MC bulk enq/dequeue (size: 32): 2.38 (2.95)
> >
> > ### Testing using two physical cores ###
> > SP/SC bulk enq/dequeue (size: 8): 73.81 (15.33)
> > MP/MC bulk enq/dequeue (size: 8): 75.10 (71.27)
> > SP/SC bulk enq/dequeue (size: 32): 21.14 (9.58)
> > MP/MC bulk enq/dequeue (size: 32): 25.74 (20.91)
> >
> > ### Testing using two NUMA nodes ###
> > SP/SC bulk enq/dequeue (size: 8): 164.32 (50.66)
> > MP/MC bulk enq/dequeue (size: 8): 176.02 (173.43)
> > SP/SC bulk enq/dequeue (size: 32): 50.78 (23)
> > MP/MC bulk enq/dequeue (size: 32): 63.17 (46.74)
> >
> > On one of the Arm platforms
> > MP/MC bulk enq/dequeue (size: 32): 0.37 (0.33) (~12% hit, the rest are ok)
>
> So it shows better numbers for one core, but worse on 2, right?
>
>
> > On another Arm platform, all numbers are same or slightly better.
> >
> > I can post the patch with this change if you want to run some benchmarks on
> > your platform.
>
> Sure, please do.
> I'll try to run on my boxes.
Sent v5, please check.
Other platform owners should run this as well.

>
> > I have not used the same code you have suggested, instead I have used the
> > same logic in a single macro with memcpy.
> >
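
For anyone following the thread, the rough copy_elems() idea quoted above
(32B copy per iteration in the main loop, tail copied in 4B chunks) could be
completed roughly as below. This is only a sketch and not the code in the v5
patch, which uses the same logic in a single macro. The tail loop is an
assumption based on the description in the thread, and esize is assumed to be
a multiple of 4 bytes.

```c
#include <stdint.h>
#include <string.h>

/*
 * Sketch only: completes the rough idea from the thread, not the v5 patch.
 * Copies 'num' elements of 'esize' bytes each; esize is assumed to be a
 * multiple of 4, so the total size is a whole number of 32-bit words.
 */
static inline void
copy_elems(uint32_t du32[], const uint32_t su32[], uint32_t num,
	uint32_t esize)
{
	uint32_t i, sz;

	/* total number of 32-bit words to copy */
	sz = (num * esize) / sizeof(uint32_t);

	/* main loop: fixed-size 32B memcpy, which the compiler can inline */
	for (i = 0; i < (sz & ~7u); i += 8)
		memcpy(du32 + i, su32 + i, 8 * sizeof(uint32_t));

	/* tail (assumed): remaining words, 4B at a time */
	for (; i < sz; i++)
		du32[i] = su32[i];
}
```

With num = 5 and esize = 12 this copies 15 words: one 32B iteration of the
main loop followed by a 7-word tail.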