From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Eads, Gage"
To: Honnappa Nagarahalli, "dev@dpdk.org"
CC: "olivier.matz@6wind.com", "arybchenko@solarflare.com",
 "Richardson, Bruce", "Ananyev, Konstantin", "stephen@networkplumber.org",
 nd, "thomas@monjalon.net", Ola Liljedahl,
 "Gavin Hu (Arm Technology China)", "Song Zhu (Arm Technology China)", nd
Subject: Re: [dpdk-dev] [PATCH v3 0/5] Add non-blocking ring
Date: Fri, 25 Jan 2019 17:42:53 +0000
Message-ID: <9184057F7FC11744A2107296B6B8EB1E541CB743@FMSMSX108.amr.corp.intel.com>
References: <20190115235227.14013-1-gage.eads@intel.com> <20190118152326.22686-1-gage.eads@intel.com>
List-Id: DPDK patches and discussions

Hi Honnappa,

Works for me -- I'm in favor of the best performing implementation, whoever
provides it.

To allow an apples-to-apples comparison, I suggest Ola's/ARM's implementation
be made to fit into the rte_ring API with an associated mempool handler.
That'll allow us to use the existing ring and mempool performance tests as
well. Feel free to use code from this patchset for the rte_ring integration,
if that helps, of course.

I expect to have v4 available within the next week.

Thanks,
Gage

> -----Original Message-----
> From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> Sent: Thursday, January 24, 2019 11:21 PM
> To: Eads, Gage; dev@dpdk.org
> Cc: olivier.matz@6wind.com; arybchenko@solarflare.com; Richardson, Bruce;
> Ananyev, Konstantin; stephen@networkplumber.org; nd; thomas@monjalon.net;
> Ola Liljedahl; Gavin Hu (Arm Technology China);
> Song Zhu (Arm Technology China); nd
> Subject: RE: [dpdk-dev] [PATCH v3 0/5] Add non-blocking ring
>
> Hi Gage,
> Thank you for this patch.
> Arm (Ola Liljedahl) had worked on a non-blocking ring algorithm. We were
> planning to add it to DPDK at some point this year. I am wondering if you
> would be open to taking a look at the algorithm and collaborating?
>
> I am yet to fully understand both the algorithms. But, Ola has reviewed your
> patch and can provide a quick overview of the differences here.
>
> If you agree, we can send an RFC patch. You can review that and do performance
> benchmarking on your platforms. I can also benchmark your patch (maybe once
> you fix the issue identified in the __rte_ring_do_nb_enqueue_mp function?) on
> Arm platforms. Maybe we can end up with a better combined algorithm.
>
> Hi Thomas/Bruce,
> Please let me know if this is ok and if there is a better way to do this.
>
> Thank you,
> Honnappa
>
> > -----Original Message-----
> > From: dev On Behalf Of Gage Eads
> > Sent: Friday, January 18, 2019 9:23 AM
> > To: dev@dpdk.org
> > Cc: olivier.matz@6wind.com; arybchenko@solarflare.com;
> > bruce.richardson@intel.com; konstantin.ananyev@intel.com;
> > stephen@networkplumber.org
> > Subject: [dpdk-dev] [PATCH v3 0/5] Add non-blocking ring
> >
> > For some users, the rte ring's "non-preemptive" constraint is not
> > acceptable; for example, if the application uses a mixture of pinned
> > high-priority threads and multiplexed low-priority threads that share a
> > mempool.
> >
> > This patchset introduces a non-blocking ring, on top of which a mempool can
> > run. Crucially, the non-blocking algorithm relies on a 128-bit
> > compare-and-swap, so it is currently limited to x86_64 machines. This is
> > also an experimental API, so RING_F_NB users must build with the
> > ALLOW_EXPERIMENTAL_API flag.
> >
> > The ring uses more compare-and-swap atomic operations than the regular rte
> > ring: With no contention, an enqueue of n pointers uses (1 + 2n) CAS
> > operations and a dequeue of n pointers uses 2.
> > This algorithm has worse average-case performance than the regular rte ring
> > (particularly a highly-contended ring with large bulk accesses), however:
> > - For applications with preemptible pthreads, the regular rte ring's
> >   worst-case performance (i.e. one thread being preempted in the
> >   update_tail() critical section) is much worse than the non-blocking
> >   ring's.
> > - Software caching can mitigate the average-case performance penalty for
> >   ring-based algorithms. For example, a non-blocking ring based mempool (a
> >   likely use case for this ring) with per-thread caching.
> >
> > The non-blocking ring is enabled via a new flag, RING_F_NB. For ease-of-use,
> > existing ring enqueue/dequeue functions work with both "regular" and
> > non-blocking rings.
> >
> > This patchset also adds non-blocking versions of ring_autotest and
> > ring_perf_autotest, and a non-blocking ring based mempool.
> >
> > This patchset makes one API change; a deprecation notice will be posted in a
> > separate commit.
> >
> > This patchset depends on the non-blocking stack patchset[1].
> >
> > [1] http://mails.dpdk.org/archives/dev/2019-January/123653.html
> >
> > v3:
> >  - Avoid the ABI break by putting 64-bit head and tail values in the same
> >    cacheline as struct rte_ring's prod and cons members.
> >  - Don't attempt to compile rte_atomic128_cmpset without
> >    ALLOW_EXPERIMENTAL_API, as this would break a large number of libraries.
> >  - Add a helpful warning to __rte_ring_do_nb_enqueue_mp() in case someone
> >    tries to use RING_F_NB without the ALLOW_EXPERIMENTAL_API flag.
> >  - Update the ring mempool to use experimental APIs
> >  - Clarify that RING_F_NB is only limited to x86_64 currently; ARMv8.1-A
> >    builds can eventually support it with the CASP instruction.
> >
> > v2:
> >  - Merge separate docs commit into patch #5
> >  - Convert uintptr_t to size_t
> >  - Add a compile-time check for the size of size_t
> >  - Fix a space-after-typecast issue
> >  - Fix an unnecessary-parentheses checkpatch warning
> >  - Bump librte_ring's library version
> >
> > Gage Eads (5):
> >   ring: add 64-bit headtail structure
> >   ring: add a non-blocking implementation
> >   test_ring: add non-blocking ring autotest
> >   test_ring_perf: add non-blocking ring perf test
> >   mempool/ring: add non-blocking ring handlers
> >
> >  doc/guides/prog_guide/env_abstraction_layer.rst |   2 +-
> >  drivers/mempool/ring/Makefile                   |   1 +
> >  drivers/mempool/ring/meson.build                |   2 +
> >  drivers/mempool/ring/rte_mempool_ring.c         |  58 ++-
> >  lib/librte_eventdev/rte_event_ring.h            |   2 +-
> >  lib/librte_ring/Makefile                        |   3 +-
> >  lib/librte_ring/rte_ring.c                      |  72 ++-
> >  lib/librte_ring/rte_ring.h                      | 574 ++++++++++++++++++++++--
> >  lib/librte_ring/rte_ring_generic_64.h           | 152 +++++++
> >  lib/librte_ring/rte_ring_version.map            |   7 +
> >  test/test/test_ring.c                           |  57 ++-
> >  test/test/test_ring_perf.c                      |  19 +-
> >  12 files changed, 874 insertions(+), 75 deletions(-)
> >  create mode 100644 lib/librte_ring/rte_ring_generic_64.h
> >
> > --
> > 2.13.6
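
To make the CAS counts quoted in the cover letter concrete, here is a small
worked sketch of the arithmetic: (1 + 2n) CAS operations for an uncontended
enqueue of n pointers and 2 for an uncontended dequeue. It only restates those
two figures. The function names are hypothetical and are not part of the
patchset or of rte_ring, and the numbers say nothing about the contended case.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: CAS operations for an uncontended non-blocking
 * enqueue of n pointers, per the cover letter's figure of (1 + 2n). */
unsigned int nb_enqueue_cas(unsigned int n)
{
	return 1 + 2 * n;
}

/* Hypothetical helper: CAS operations for an uncontended non-blocking
 * dequeue of n pointers. The cover letter gives a constant 2. */
unsigned int nb_dequeue_cas(unsigned int n)
{
	(void)n; /* the count does not depend on the bulk size */
	return 2;
}

/* CAS operations to move n pointers through the ring once
 * (one bulk enqueue followed by one bulk dequeue). */
unsigned int nb_roundtrip_cas(unsigned int n)
{
	return nb_enqueue_cas(n) + nb_dequeue_cas(n);
}

/* Round-trip CAS operations amortized per pointer. */
double nb_cas_per_pointer(unsigned int n)
{
	assert(n > 0); /* a bulk of zero pointers has no per-pointer cost */
	return (double)nb_roundtrip_cas(n) / n;
}

/* Print the counts for a few illustrative bulk sizes. */
void nb_print_cas_table(void)
{
	static const unsigned int sizes[] = { 1, 8, 32 };
	size_t i;

	for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
		unsigned int n = sizes[i];

		printf("n=%u enqueue=%u dequeue=%u per-pointer=%.2f\n",
		       n, nb_enqueue_cas(n), nb_dequeue_cas(n),
		       nb_cas_per_pointer(n));
	}
}
```

For a bulk of 32 pointers this gives 65 CAS operations on enqueue and 2 on
dequeue. The enqueue count grows linearly with the bulk size, which is
consistent with the cover letter's remark that large bulk accesses are where
the non-blocking ring compares least favorably.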