From: "Ananyev, Konstantin"
To: Olivier MATZ, "Liang, Cunming", dev@dpdk.org
Date: Mon, 9 Feb 2015 15:43:10 +0000
Message-ID: <2601191342CEEE43887BDE71AB977258213E461C@irsmsx105.ger.corp.intel.com>
In-Reply-To: <54D4DB9F.6080601@6wind.com>
Subject: Re: [dpdk-dev] [PATCH v4 16/17] ring: add sched_yield to avoid spin forever

Hi Olivier,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
> Sent: Friday, February 06, 2015 3:20 PM
> To: Liang, Cunming; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4 16/17] ring: add sched_yield to avoid spin forever
>
> Hi,
>
> On 02/02/2015 03:02 AM, Cunming Liang wrote:
> > Add a sched_yield() syscall if the thread spins for too long while waiting for another thread to finish its operations on the ring.
> > That gives the pre-empted thread a chance to proceed and finish the ring enqueue/dequeue operation.
> > The purpose is to reduce contention on the ring.
> >
> > Signed-off-by: Cunming Liang
> > ---
> >  lib/librte_ring/rte_ring.h | 35 +++++++++++++++++++++++++++++------
> >  1 file changed, 29 insertions(+), 6 deletions(-)
> >
> > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > index 39bacdd..c402c73 100644
> > --- a/lib/librte_ring/rte_ring.h
> > +++ b/lib/librte_ring/rte_ring.h
> > @@ -126,6 +126,7 @@ struct rte_ring_debug_stats {
> >
> >  #define RTE_RING_NAMESIZE 32 /**< The maximum length of a ring name. */
> >  #define RTE_RING_MZ_PREFIX "RG_"
> > +#define RTE_RING_PAUSE_REP 0x100  /**< yield after this many pauses. */
> >
> >  /**
> >   * An RTE ring structure.
> > @@ -410,7 +411,7 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
> >  	uint32_t cons_tail, free_entries;
> >  	const unsigned max = n;
> >  	int success;
> > -	unsigned i;
> > +	unsigned i, rep;
> >  	uint32_t mask = r->prod.mask;
> >  	int ret;
> >
> > @@ -468,8 +469,19 @@ __rte_ring_mp_do_enqueue(struct rte_ring *r, void * const *obj_table,
> >  	 * If there are other enqueues in progress that preceded us,
> >  	 * we need to wait for them to complete
> >  	 */
> > -	while (unlikely(r->prod.tail != prod_head))
> > -		rte_pause();
> > +	do {
> > +		/* avoid spinning too long waiting for the other thread to finish */
> > +		for (rep = RTE_RING_PAUSE_REP;
> > +		     rep != 0 && r->prod.tail != prod_head; rep--)
> > +			rte_pause();
> > +
> > +		/*
> > +		 * This gives a pre-empted thread a chance to proceed and
> > +		 * finish the ring enqueue operation.
> > +		 */
> > +		if (rep == 0)
> > +			sched_yield();
> > +	} while (rep == 0);
> >
> >  	r->prod.tail = prod_next;
> >  	return ret;
> > @@ -589,7 +601,7 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
> >  	uint32_t cons_next, entries;
> >  	const unsigned max = n;
> >  	int success;
> > -	unsigned i;
> > +	unsigned i, rep;
> >  	uint32_t mask = r->prod.mask;
> >
> >  	/* move cons.head atomically */
> > @@ -634,8 +646,19 @@ __rte_ring_mc_do_dequeue(struct rte_ring *r, void **obj_table,
> >  	 * If there are other dequeues in progress that preceded us,
> >  	 * we need to wait for them to complete
> >  	 */
> > -	while (unlikely(r->cons.tail != cons_head))
> > -		rte_pause();
> > +	do {
> > +		/* avoid spinning too long waiting for the other thread to finish */
> > +		for (rep = RTE_RING_PAUSE_REP;
> > +		     rep != 0 && r->cons.tail != cons_head; rep--)
> > +			rte_pause();
> > +
> > +		/*
> > +		 * This gives a pre-empted thread a chance to proceed and
> > +		 * finish the ring dequeue operation.
> > +		 */
> > +		if (rep == 0)
> > +			sched_yield();
> > +	} while (rep == 0);
> >
> >  	__RING_STAT_ADD(r, deq_success, n);
> >  	r->cons.tail = cons_next;
> >
>
> The ring library was designed with the assumption that the code is not
> preemptible. The code is lock-free but not wait-free: if a thread is
> preempted at a bad moment, the other threads can spin until it gets
> scheduled again.
>
> I wonder whether adding a sched_yield() might penalize the current
> implementations that only use one pthread per core. Even if there is
> only one pthread in the scheduler queue for this CPU, calling into the
> scheduler may cost thousands of cycles.
>
> Also, where does the value "RTE_RING_PAUSE_REP 0x100" come from?
> Why is 0x100 better than 42 or than 10000?

The idea was to pick something a few times bigger than the actual number of
active cores in the system, to minimise the chance of sched_yield() being
called in the case where we have one thread per physical core.
My thought was that this many repetitions would make that chance negligible.
Though, right now, I don't have any data to back it up.

> I think it would be good to check whether there is a performance impact
> with this change, especially when there is a lot of contention on the
> ring. If it has an impact, what about adding a compile or runtime
> option?

Good idea. We could probably make RTE_RING_PAUSE_REP a build-time configuration
option and, say, not emit the sched_yield() call at all when
RTE_RING_PAUSE_REP == 0.

Konstantin

>
> Regards,
> Olivier
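
For illustration, here is a minimal, self-contained sketch of the "pause N
times, then yield" wait loop discussed above, combined with the suggested
compile-time knob. The names RING_PAUSE_REP and ring_wait_for_tail() are made
up for the example, and _mm_pause() (x86) stands in for rte_pause(); this is a
sketch of the idea, not the actual librte_ring code.

#include <sched.h>
#include <stdint.h>
#include <emmintrin.h>          /* _mm_pause(), x86 stand-in for rte_pause() */

#ifndef RING_PAUSE_REP
#define RING_PAUSE_REP 0x100    /* pauses before yielding; 0 disables the yield path */
#endif

/* Wait until the tail index catches up with the head value we reserved. */
static inline void
ring_wait_for_tail(volatile uint32_t *tail, uint32_t expected)
{
#if RING_PAUSE_REP == 0
	/* Pure busy-wait, equivalent to the pre-patch behaviour. */
	while (*tail != expected)
		_mm_pause();
#else
	unsigned int rep;

	do {
		/* Spin for a bounded number of pauses... */
		for (rep = RING_PAUSE_REP; rep != 0 && *tail != expected; rep--)
			_mm_pause();

		/* ...then let a preempted writer run before spinning again. */
		if (rep == 0)
			sched_yield();
	} while (rep == 0);
#endif
}

With RING_PAUSE_REP defined to 0 at build time, the yield branch compiles away
entirely, leaving the original busy-wait behaviour for one-thread-per-core
deployments.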