From mboxrd@z Thu Jan  1 00:00:00 1970
From: Gage Eads
To: dev@dpdk.org
Cc: olivier.matz@6wind.com, arybchenko@solarflare.com, bruce.richardson@intel.com, konstantin.ananyev@intel.com, gavin.hu@arm.com, Honnappa.Nagarahalli@arm.com, nd@arm.com, thomas@monjalon.net
Date: Thu, 28 Mar 2019 13:00:15 -0500
Message-Id: <20190328180015.13878-9-gage.eads@intel.com>
X-Mailer: git-send-email 2.13.6
In-Reply-To: <20190328180015.13878-1-gage.eads@intel.com>
References: <20190306144559.391-1-gage.eads@intel.com> <20190328180015.13878-1-gage.eads@intel.com>
Subject: [dpdk-dev] [PATCH v4 8/8] mempool/stack: add lock-free stack mempool handler
List-Id: DPDK patches and discussions

This commit adds support for a lock-free (linked list based) stack mempool
handler.

In mempool_perf_autotest the lock-based stack outperforms the
lock-free handler for certain lcore/alloc count/free count
combinations*, however:
- For applications with preemptible pthreads, a standard (lock-based)
  stack's worst-case performance (i.e. one thread being preempted while
  holding the spinlock) is much worse than the lock-free stack's.
- Using per-thread mempool caches will largely mitigate the performance
  difference.

*Test setup: x86_64 build with default config, dual-socket Xeon E5-2699 v4,
running on isolcpus cores with a tickless scheduler. The lock-based stack's
rate_persec was 0.6x-3.5x the lock-free stack's.

Signed-off-by: Gage Eads
Reviewed-by: Olivier Matz
---
 doc/guides/prog_guide/env_abstraction_layer.rst | 10 ++++++++++
 doc/guides/rel_notes/release_19_05.rst          |  5 +++++
 drivers/mempool/stack/rte_mempool_stack.c       | 26 +++++++++++++++++++++++--
 3 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 2361c3b8f..d22f72f65 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -563,6 +563,16 @@ Known Issues
 
   5. It MUST not be used by multi-producer/consumer pthreads, whose scheduling policies are SCHED_FIFO or SCHED_RR.
 
+  Alternatively, applications can use the lock-free stack mempool handler. When
+  considering this handler, note that:
+
+  - It is currently limited to the x86_64 platform, because it uses an
+    instruction (16-byte compare-and-swap) that is not yet available on other
+    platforms.
+  - It has worse average-case performance than the non-preemptive rte_ring, but
+    software caching (e.g. the mempool cache) can mitigate this by reducing the
+    number of stack accesses.
+
 + rte_timer
 
   Running ``rte_timer_manage()`` on a non-EAL pthread is not allowed. However, resetting/stopping the timer from a non-EAL pthread is allowed.
diff --git a/doc/guides/rel_notes/release_19_05.rst b/doc/guides/rel_notes/release_19_05.rst
index 96e851e13..9e56d1058 100644
--- a/doc/guides/rel_notes/release_19_05.rst
+++ b/doc/guides/rel_notes/release_19_05.rst
@@ -114,6 +114,11 @@ New Features
   The library supports two stack implementations: standard (lock-based) and lock-free.
   The lock-free implementation is currently limited to x86-64 platforms.
 
+* **Added Lock-Free Stack Mempool Handler.**
+
+  Added a new lock-free stack handler, which uses the newly added stack
+  library.
+
 
 Removed Items
 -------------
diff --git a/drivers/mempool/stack/rte_mempool_stack.c b/drivers/mempool/stack/rte_mempool_stack.c
index 25ccdb9af..7e85c8d6b 100644
--- a/drivers/mempool/stack/rte_mempool_stack.c
+++ b/drivers/mempool/stack/rte_mempool_stack.c
@@ -7,7 +7,7 @@
 #include
 
 static int
-stack_alloc(struct rte_mempool *mp)
+__stack_alloc(struct rte_mempool *mp, uint32_t flags)
 {
 	char name[RTE_STACK_NAMESIZE];
 	struct rte_stack *s;
@@ -20,7 +20,7 @@ stack_alloc(struct rte_mempool *mp)
 		return -rte_errno;
 	}
 
-	s = rte_stack_create(name, mp->size, mp->socket_id, 0);
+	s = rte_stack_create(name, mp->size, mp->socket_id, flags);
 	if (s == NULL)
 		return -rte_errno;
 
@@ -30,6 +30,18 @@ stack_alloc(struct rte_mempool *mp)
 }
 
 static int
+stack_alloc(struct rte_mempool *mp)
+{
+	return __stack_alloc(mp, 0);
+}
+
+static int
+lf_stack_alloc(struct rte_mempool *mp)
+{
+	return __stack_alloc(mp, RTE_STACK_F_LF);
+}
+
+static int
 stack_enqueue(struct rte_mempool *mp, void * const *obj_table,
 	      unsigned int n)
 {
@@ -72,4 +84,14 @@ static struct rte_mempool_ops ops_stack = {
 	.get_count = stack_get_count
 };
 
+static struct rte_mempool_ops ops_lf_stack = {
+	.name = "lf_stack",
+	.alloc = lf_stack_alloc,
+	.free = stack_free,
+	.enqueue = stack_enqueue,
+	.dequeue = stack_dequeue,
+	.get_count = stack_get_count
+};
+
 MEMPOOL_REGISTER_OPS(ops_stack);
+MEMPOOL_REGISTER_OPS(ops_lf_stack);
-- 
2.13.6
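
For readers skimming the diff, the design can be sketched in isolation: one
allocation helper takes a flags argument, two thin wrappers fix that argument,
and two ops tables differ only in their name and alloc callback. The block
below is a minimal stand-alone model of that pattern, not the driver code.
Every demo_* identifier and DEMO_STACK_F_LF is hypothetical and merely stands
in for the corresponding rte_* symbol, and the by-name lookup is an assumed
simplification of how registered handlers get selected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for RTE_STACK_F_LF; not the real DPDK value. */
#define DEMO_STACK_F_LF 0x1u

/* Hypothetical pool: only records the flags its stack was created with. */
struct demo_pool {
	uint32_t stack_flags;
};

/* Hypothetical, reduced ops table: a name plus an alloc callback. */
struct demo_ops {
	const char *name;
	int (*alloc)(struct demo_pool *p);
};

/* Shared helper, mirroring the role of __stack_alloc(mp, flags). */
static int
demo_alloc_common(struct demo_pool *p, uint32_t flags)
{
	p->stack_flags = flags;
	return 0;
}

/* Lock-based variant: no flags. */
static int
demo_stack_alloc(struct demo_pool *p)
{
	return demo_alloc_common(p, 0);
}

/* Lock-free variant: same helper, lock-free flag set. */
static int
demo_lf_stack_alloc(struct demo_pool *p)
{
	return demo_alloc_common(p, DEMO_STACK_F_LF);
}

static const struct demo_ops demo_ops_stack = {
	.name = "stack",
	.alloc = demo_stack_alloc,
};

static const struct demo_ops demo_ops_lf_stack = {
	.name = "lf_stack",
	.alloc = demo_lf_stack_alloc,
};

/* Static table standing in for the registration macro. */
static const struct demo_ops *demo_table[] = {
	&demo_ops_stack,
	&demo_ops_lf_stack,
};

/* Initialize a pool with the named handler; -1 if the name is unknown. */
static int
demo_pool_init(struct demo_pool *p, const char *ops_name)
{
	size_t i;

	for (i = 0; i < sizeof(demo_table) / sizeof(demo_table[0]); i++)
		if (strcmp(demo_table[i]->name, ops_name) == 0)
			return demo_table[i]->alloc(p);
	return -1;
}

/* Self-check: each name must select the matching flags. */
static void
demo_self_check(void)
{
	struct demo_pool p = { .stack_flags = 0xffu };

	assert(demo_pool_init(&p, "stack") == 0);
	assert(p.stack_flags == 0);
	assert(demo_pool_init(&p, "lf_stack") == 0);
	assert(p.stack_flags == DEMO_STACK_F_LF);
	assert(demo_pool_init(&p, "no_such_handler") == -1);
}
```

Calling demo_self_check() from a test program exercises both handlers. In a
real application the handler would presumably be chosen by passing the name
"lf_stack" to rte_mempool_set_ops_byname() on an empty mempool before it is
populated; that call is not part of this patch and is mentioned here only as
the likely usage path.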