From: "Ananyev, Konstantin"
To: Olivier MATZ, "Liang, Cunming", dev@dpdk.org
Date: Mon, 9 Feb 2015 15:43:10 +0000
Message-ID: <2601191342CEEE43887BDE71AB977258213E461C@irsmsx105.ger.corp.intel.com>
In-Reply-To: <54D4DB9F.6080601@6wind.com>
Subject: Re: [dpdk-dev] [PATCH v4 16/17] ring: add sched_yield to avoid spin forever

Hi Olivier,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
> Sent: Friday, February 06, 2015 3:20 PM
> To: Liang, Cunming; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 16/17] ring: add sched_yield to avoid spin forever
>
> Hi,
>
> On 02/02/2015 03:02 AM, Cunming Liang wrote:
> > Add a sched_yield() syscall if the thread spins for too long while waiting for another thread to finish its operations on the ring.
> > That gives the pre-empted thread a chance to proceed and finish the ring enqueue/dequeue operation.
> > The purpose is to reduce contention on the ring.
> >
> > Signed-off-by: Cunming Liang
> > ---
> >  lib/librte_ring/rte_ring.h | 35 +++++++++++++++++++++++++++++------
> >  1 file changed, 29 insertions(+), 6 deletions(-)
> >
> > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > index 39bacdd..c402c73 100644
> > --- a/lib/librte_ring/rte_ring.h
> > +++ b/lib/librte_ring/rte_ring.h
> > @@ -126,6 +126,7 @@ struct rte_ring_debug_stats {
> >
> >  #define RTE_RING_NAMESIZE 32 /**< The maximum length of a ring name. */
> >  #define RTE_RING_MZ_PREFIX "RG_"
> > +#define RTE_RING_PAUSE_REP 0x100  /**< yield after this many pauses. */
> >
> >  /**
> >   * An RTE ring structure.
> > @@ -410,7 +411,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
> >  	uint32_t cons_tail, free_entries;
> >  	const unsigned max = n;
> >  	int success;
> > -	unsigned i;
> > +	unsigned i, rep;
> >  	uint32_t mask = r->prod.mask;
> >  	int ret;
> >
> > @@ -468,8 +469,19 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
> >  	 * If there are other enqueues in progress that preceded us,
> >  	 * we need to wait for them to complete
> >  	 */
> > -	while (unlikely(r->prod.tail != prod_head))
> > -		rte_pause();
> > +	do {
> > +		/* avoid spinning too long waiting for the other thread to finish */
> > +		for (rep = RTE_RING_PAUSE_REP;
> > +		     rep != 0 && r->prod.tail != prod_head; rep--)
> > +			rte_pause();
> > +
> > +		/*
> > +		 * This gives a pre-empted thread a chance to proceed and
> > +		 * finish the ring enqueue operation.
> > +		 */
> > +		if (rep == 0)
> > +			sched_yield();
> > +	} while (rep == 0);
> >
> >  	r->prod.tail = prod_next;
> >  	return ret;
> > @@ -589,7 +601,7 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
> >  	uint32_t cons_next, entries;
> >  	const unsigned max = n;
> >  	int success;
> > -	unsigned i;
> > +	unsigned i, rep;
> >  	uint32_t mask = r->prod.mask;
> >
> >  	/* move cons.head atomically */
> > @@ -634,8 +646,19 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
> >  	 * If there are other dequeues in progress that preceded us,
> >  	 * we need to wait for them to complete
> >  	 */
> > -	while (unlikely(r->cons.tail != cons_head))
> > -		rte_pause();
> > +	do {
> > +		/* avoid spinning too long waiting for the other thread to finish */
> > +		for (rep = RTE_RING_PAUSE_REP;
> > +		     rep != 0 && r->cons.tail != cons_head; rep--)
> > +			rte_pause();
> > +
> > +		/*
> > +		 * This gives a pre-empted thread a chance to proceed and
> > +		 * finish the ring dequeue operation.
> > +		 */
> > +		if (rep == 0)
> > +			sched_yield();
> > +	} while (rep == 0);
> >
> >  	__RING_STAT_ADD(r, deq_success, n);
> >  	r->cons.tail = cons_next;
> >
>
> The ring library was designed with the assumption that the code is not
> preemptible. The code is lock-free but not wait-free: if a thread is
> preempted at a bad moment, the other threads can spin until it gets
> scheduled again.
>
> I wonder whether adding a sched_yield() might penalize the current
> implementations that only use one pthread per core. Even if there is
> only one pthread in the scheduler queue for this CPU, calling into the
> scheduler may cost thousands of cycles.
>
> Also, where does the value "RTE_RING_PAUSE_REP 0x100" come from?
> Why is 0x100 better than 42 or than 10000?

The idea was to pick something a few times bigger than the actual number of
active cores in the system, to minimise the chance of sched_yield() being
called in the case where we have one thread per physical core.
My thought was that this many repetitions would make that chance negligible.
Though, right now, I don't have any data to back it up.

> I think it would be good to check whether there is a performance impact
> with this change, especially when there is a lot of contention on the
> ring. If it has an impact, what about adding a compile or runtime
> option?

Good idea. We could probably make RTE_RING_PAUSE_REP a build-time configuration
option and, say, not emit the sched_yield() call at all when
RTE_RING_PAUSE_REP == 0.

Konstantin

>
> Regards,
> Olivier
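
For illustration, here is a minimal, self-contained sketch of the "pause N
times, then yield" wait loop discussed above, combined with the suggested
compile-time knob. The names RING_PAUSE_REP and ring_wait_for_tail() are made
up for the example, and _mm_pause() (x86) stands in for rte_pause(); this is a
sketch of the idea, not the actual librte_ring code.

#include <sched.h>
#include <stdint.h>
#include <emmintrin.h>          /* _mm_pause(), x86 stand-in for rte_pause() */

#ifndef RING_PAUSE_REP
#define RING_PAUSE_REP 0x100    /* pauses before yielding; 0 disables the yield path */
#endif

/* Wait until the tail index catches up with the head value we reserved. */
static inline void
ring_wait_for_tail(volatile uint32_t *tail, uint32_t expected)
{
#if RING_PAUSE_REP == 0
	/* Pure busy-wait, equivalent to the pre-patch behaviour. */
	while (*tail != expected)
		_mm_pause();
#else
	unsigned int rep;

	do {
		/* Spin for a bounded number of pauses... */
		for (rep = RING_PAUSE_REP; rep != 0 && *tail != expected; rep--)
			_mm_pause();

		/* ...then let a preempted writer run before spinning again. */
		if (rep == 0)
			sched_yield();
	} while (rep == 0);
#endif
}

With RING_PAUSE_REP defined to 0 at build time, the yield branch compiles away
entirely, leaving the original busy-wait behaviour for one-thread-per-core
deployments.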