Subject: Re: [dpdk-dev] [PATCH v4 16/17] ring: add sched_yield to avoid spin forever
From: Olivier MATZ
To: Cunming Liang, dev@dpdk.org
Date: Fri, 06 Feb 2015 16:19:59 +0100
Message-ID: <54D4DB9F.6080601@6wind.com>
In-Reply-To: <1422842559-13617-17-git-send-email-cunming.liang@intel.com>

Hi,

On 02/02/2015 03:02 AM, Cunming Liang wrote:
> Add a sched_yield() syscall if the thread spins for too long, waiting
> other thread to finish its operations on the ring.
> That gives pre-empted thread a chance to proceed and finish with ring
> enqnue/dequeue operation.
> The purpose is to reduce contention on the ring.
>
> Signed-off-by: Cunming Liang
> ---
>  lib/librte_ring/rte_ring.h | 35 +++++++++++++++++++++++++++++------
>  1 file changed, 29 insertions(+), 6 deletions(-)
>
> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> index 39bacdd..c402c73 100644
> --- a/lib/librte_ring/rte_ring.h
> +++ b/lib/librte_ring/rte_ring.h
> @@ -126,6 +126,7 @@ struct rte_ring_debug_stats {
>
>  #define RTE_RING_NAMESIZE 32 /**< The maximum length of a ring name. */
>  #define RTE_RING_MZ_PREFIX "RG_"
> +#define RTE_RING_PAUSE_REP 0x100 /**< yield after num of times pause. */
>
>  /**
>   * An RTE ring structure.
> @@ -410,7 +411,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
>  	uint32_t cons_tail, free_entries;
>  	const unsigned max = n;
>  	int success;
> -	unsigned i;
> +	unsigned i, rep;
>  	uint32_t mask = r->prod.mask;
>  	int ret;
>
> @@ -468,8 +469,19 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
>  	 * If there are other enqueues in progress that preceded us,
>  	 * we need to wait for them to complete
>  	 */
> -	while (unlikely(r->prod.tail != prod_head))
> -		rte_pause();
> +	do {
> +		/* avoid spin too long waiting for other thread finish */
> +		for (rep = RTE_RING_PAUSE_REP;
> +			rep != 0 && r->prod.tail != prod_head; rep--)
> +			rte_pause();
> +
> +		/*
> +		 * It gives pre-empted thread a chance to proceed and
> +		 * finish with ring enqnue operation.
> +		 */
> +		if (rep == 0)
> +			sched_yield();
> +	} while (rep == 0);
>
>  	r->prod.tail = prod_next;
>  	return ret;
> @@ -589,7 +601,7 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
>  	uint32_t cons_next, entries;
>  	const unsigned max = n;
>  	int success;
> -	unsigned i;
> +	unsigned i, rep;
>  	uint32_t mask = r->prod.mask;
>
>  	/* move cons.head atomically */
> @@ -634,8 +646,19 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
>  	 * If there are other dequeues in progress that preceded us,
>  	 * we need to wait for them to complete
>  	 */
> -	while (unlikely(r->cons.tail != cons_head))
> -		rte_pause();
> +	do {
> +		/* avoid spin too long waiting for other thread finish */
> +		for (rep = RTE_RING_PAUSE_REP;
> +			rep != 0 && r->cons.tail != cons_head; rep--)
> +			rte_pause();
> +
> +		/*
> +		 * It gives pre-empted thread a chance to proceed and
> +		 * finish with ring denqnue operation.
> +		 */
> +		if (rep == 0)
> +			sched_yield();
> +	} while (rep == 0);
>
>  	__RING_STAT_ADD(r, deq_success, n);
>  	r->cons.tail = cons_next;
>

The ring library was designed with the assumption that the code is not
preemptable. The code is lock-less but not wait-less: if a thread is
preempted at a bad moment, the other threads can spin for a long time,
until the preempted thread is scheduled again.

I wonder if adding a sched_yield() could penalize the current use case
where there is only one pthread per core. Even if there is only one
pthread in the scheduler queue for this CPU, calling the scheduler code
may cost thousands of cycles.

Also, where does the value "RTE_RING_PAUSE_REP 0x100" come from? Why is
0x100 better than 42 or 10000?

I think it would be good to check whether this change has a performance
impact, especially when there is a lot of contention on the ring. If it
has an impact, what about adding a compile or runtime option?

Regards,
Olivier
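
PS: to make the compile-time suggestion concrete, here is a rough,
untested sketch of what I have in mind. RTE_RING_PAUSE_REP_ENABLED is a
hypothetical build flag (it does not exist in the current config files);
when it is unset, the code reduces to the original busy-wait loop, so
the one-pthread-per-core case is unaffected:

	/* needs <sched.h> for sched_yield() */

	#ifdef RTE_RING_PAUSE_REP_ENABLED /* hypothetical config flag */
	#define RTE_RING_PAUSE_REP 0x100 /**< yield after this many pauses */
	#endif

		/*
		 * Tail-wait of __rte_ring_mp_do_enqueue(); 'rep' would
		 * only be declared when the option is enabled.
		 */
	#ifdef RTE_RING_PAUSE_REP_ENABLED
		do {
			/* bounded spin first, to keep the common case cheap */
			for (rep = RTE_RING_PAUSE_REP;
			     rep != 0 && r->prod.tail != prod_head; rep--)
				rte_pause();

			/* out of retries: let a preempted thread run */
			if (rep == 0)
				sched_yield();
		} while (rep == 0);
	#else
		/* original behavior: pure busy-wait */
		while (unlikely(r->prod.tail != prod_head))
			rte_pause();
	#endif

The dequeue side would get the same guard around r->cons.tail. A runtime
option would avoid rebuilding, but it adds a test in the hot path, so
measuring both variants under contention would tell which one is
acceptable.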