From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from dpdk.org (dpdk.org [92.243.14.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 3CE7CA2F18
	for <public@inbox.dpdk.org>; Thu,  3 Oct 2019 14:26:59 +0200 (CEST)
Received: from [92.243.14.124] (localhost [127.0.0.1])
	by dpdk.org (Postfix) with ESMTP id 8B00E1C038;
	Thu,  3 Oct 2019 14:26:58 +0200 (CEST)
Received: from mga12.intel.com (mga12.intel.com [192.55.52.136])
 by dpdk.org (Postfix) with ESMTP id 4E6341C02F
 for <dev@dpdk.org>; Thu,  3 Oct 2019 14:26:57 +0200 (CEST)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from orsmga008.jf.intel.com ([10.7.209.65])
 by fmsmga106.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 03 Oct 2019 05:26:56 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.67,252,1566889200"; d="scan'208";a="185898266"
Received: from irsmsx104.ger.corp.intel.com ([163.33.3.159])
 by orsmga008.jf.intel.com with ESMTP; 03 Oct 2019 05:26:47 -0700
Received: from irsmsx105.ger.corp.intel.com ([169.254.7.164]) by
 IRSMSX104.ger.corp.intel.com ([169.254.5.103]) with mapi id 14.03.0439.000;
 Thu, 3 Oct 2019 13:26:46 +0100
From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
 "stephen@networkplumber.org" <stephen@networkplumber.org>,
 "paulmck@linux.ibm.com" <paulmck@linux.ibm.com>
CC: "Wang, Yipeng1" <yipeng1.wang@intel.com>, "Medvedkin, Vladimir"
 <vladimir.medvedkin@intel.com>, "Ruifeng Wang (Arm Technology China)"
 <Ruifeng.Wang@arm.com>, Dharmik Thakkar <Dharmik.Thakkar@arm.com>,
 "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>, nd <nd@arm.com>
Thread-Topic: [PATCH v3 2/3] lib/rcu: add resource reclamation APIs
Thread-Index: AQHVeCGdR5HV3Y+YjUaapXAJb26OSqdHkBoQgADWrwCAAGr9MA==
Date: Thu, 3 Oct 2019 12:26:45 +0000
Message-ID: <2601191342CEEE43887BDE71AB977258019197083A@irsmsx105.ger.corp.intel.com>
References: <20190906094534.36060-1-ruifeng.wang@arm.com>
 <20191001062917.35578-1-honnappa.nagarahalli@arm.com>
 <20191001062917.35578-3-honnappa.nagarahalli@arm.com>
 <2601191342CEEE43887BDE71AB977258019196FF8E@irsmsx105.ger.corp.intel.com>
 <VE1PR08MB514912CDFC19476AF82B197F989F0@VE1PR08MB5149.eurprd08.prod.outlook.com>
In-Reply-To: <VE1PR08MB514912CDFC19476AF82B197F989F0@VE1PR08MB5149.eurprd08.prod.outlook.com>
Accept-Language: en-IE, en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [163.33.239.180]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: Re: [dpdk-dev] [PATCH v3 2/3] lib/rcu: add resource reclamation APIs
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

Hi Honnappa,

> > > Add resource reclamation APIs to make it simple for applications and
> > > libraries to integrate rte_rcu library.
> > >
> > > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > Reviewed-by: Ola Liljedhal <ola.liljedhal@arm.com>
> > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > ---
> > >  app/test/test_rcu_qsbr.c           | 291 +++++++++++++++++++++++++++=
+-
> > >  lib/librte_rcu/meson.build         |   2 +
> > >  lib/librte_rcu/rte_rcu_qsbr.c      | 185 ++++++++++++++++++
> > >  lib/librte_rcu/rte_rcu_qsbr.h      | 169 +++++++++++++++++
> > >  lib/librte_rcu/rte_rcu_qsbr_pvt.h  |  46 +++++
> > >  lib/librte_rcu/rte_rcu_version.map |   4 +
> > >  lib/meson.build                    |   6 +-
> > >  7 files changed, 700 insertions(+), 3 deletions(-)  create mode
> > > 100644 lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > >
> > > diff --git a/lib/librte_rcu/rte_rcu_qsbr.c
> > > b/lib/librte_rcu/rte_rcu_qsbr.c index ce7f93dd3..76814f50b 100644
> > > --- a/lib/librte_rcu/rte_rcu_qsbr.c
> > > +++ b/lib/librte_rcu/rte_rcu_qsbr.c
> > > @@ -21,6 +21,7 @@
> > >  #include <rte_errno.h>
> > >
> > >  #include "rte_rcu_qsbr.h"
> > > +#include "rte_rcu_qsbr_pvt.h"
> > >
> > >  /* Get the memory size of QSBR variable */  size_t @@ -267,6 +268,19=
0
> > > @@ rte_rcu_qsbr_dump(FILE *f, struct rte_rcu_qsbr *v)
> > >  	return 0;
> > >  }
> > >
> > > +/* Create a queue used to store the data structure elements that can
> > > + * be freed later. This queue is referred to as 'defer queue'.
> > > + */
> > > +struct rte_rcu_qsbr_dq *
> > > +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters
> > > +*params) {
> > > +	struct rte_rcu_qsbr_dq *dq;
> > > +	uint32_t qs_fifo_size;
> > > +
> > > +	if (params =3D=3D NULL || params->f =3D=3D NULL ||
> > > +		params->v =3D=3D NULL || params->name =3D=3D NULL ||
> > > +		params->size =3D=3D 0 || params->esize =3D=3D 0 ||
> > > +		(params->esize % 8 !=3D 0)) {
> > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > +			"%s(): Invalid input parameter\n", __func__);
> > > +		rte_errno =3D EINVAL;
> > > +
> > > +		return NULL;
> > > +	}
> > > +
> > > +	dq =3D rte_zmalloc(NULL,
> > > +		(sizeof(struct rte_rcu_qsbr_dq) + params->esize),
> > > +		RTE_CACHE_LINE_SIZE);
> > > +	if (dq =3D=3D NULL) {
> > > +		rte_errno =3D ENOMEM;
> > > +
> > > +		return NULL;
> > > +	}
> > > +
> > > +	/* round up qs_fifo_size to next power of two that is not less than
> > > +	 * max_size.
> > > +	 */
> > > +	qs_fifo_size =3D rte_align32pow2((((params->esize/8) + 1)
> > > +					* params->size) + 1);
> > > +	dq->r =3D rte_ring_create(params->name, qs_fifo_size,
> > > +					SOCKET_ID_ANY, 0);
> >
> > If it is going to be not MT safe, then why not to create the ring with
> > (RING_F_SP_ENQ | RING_F_SC_DEQ) flags set?
> Agree.
>=20
> > Though I think it could be changed to allow MT safe multiple
> > enqueue/single dequeue, see below.
> The MT safe issue is due to reclaim code. The reclaim code has the follow=
ing sequence:
>=20
> rte_ring_peek
> rte_rcu_qsbr_check
> rte_ring_dequeue
>=20
> This entire sequence needs to be atomic as the entry cannot be dequeued w=
ithout knowing that the grace period for that entry is over.

I understand that, though I believe it should be possible to support at
least a multiple-enqueuer/single-dequeuer-and-reclaimer mode.
With a serialized dequeue(), even multiple dequeuers should be possible.

> Note that due to optimizations in rte_rcu_qsbr_check API, this sequence s=
hould not be large in most cases. I do not have ideas on how to
> make this sequence lock-free.
>=20
> If the writer is on the control plane, most use cases will use mutex lock=
s for synchronization if they are multi-threaded. That lock should be
> enough to provide the thread safety for these APIs.

In that case, why do we need a ring at all?
People can certainly create their own queue quite easily with a mutex and
a TAILQ.
If performance is not an issue, they can even add a pthread_cond to it, so
that the consumer can sleep/wake up on an empty/full queue.
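
A minimal sketch of such a queue, assuming plain POSIX (pthread and
<sys/queue.h>) rather than any DPDK API. All names below (defer_queue,
defer_queue_push, defer_queue_pop) are hypothetical and exist only for
this illustration. Only the "sleep on empty" side is shown, and the grace
period check that a real defer queue needs is left out:

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical names; nothing here is part of rte_rcu. */
struct defer_entry {
	TAILQ_ENTRY(defer_entry) next;
	void *res;                      /* resource to free later */
};

TAILQ_HEAD(defer_list, defer_entry);

struct defer_queue {
	pthread_mutex_t lock;
	pthread_cond_t not_empty;
	struct defer_list head;
	unsigned int count;
};

static void
defer_queue_init(struct defer_queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->not_empty, NULL);
	TAILQ_INIT(&q->head);
	q->count = 0;
}

/* Safe for any number of producers. Returns 0 or -ENOMEM. */
static int
defer_queue_push(struct defer_queue *q, void *res)
{
	struct defer_entry *e = malloc(sizeof(*e));

	if (e == NULL)
		return -ENOMEM;
	e->res = res;

	pthread_mutex_lock(&q->lock);
	TAILQ_INSERT_TAIL(&q->head, e, next);
	q->count++;
	pthread_cond_signal(&q->not_empty);
	pthread_mutex_unlock(&q->lock);
	return 0;
}

/* The consumer sleeps while the queue is empty. */
static void *
defer_queue_pop(struct defer_queue *q)
{
	struct defer_entry *e;
	void *res;

	pthread_mutex_lock(&q->lock);
	while (TAILQ_EMPTY(&q->head))
		pthread_cond_wait(&q->not_empty, &q->lock);
	e = TAILQ_FIRST(&q->head);
	TAILQ_REMOVE(&q->head, e, next);
	q->count--;
	pthread_mutex_unlock(&q->lock);

	res = e->res;
	free(e);
	return res;
}
```

Entries come out in FIFO order, so the oldest deleted resource is always
the first candidate for reclamation.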

>=20
> If the writer is multi-threaded and lock-free, then one should use per th=
read defer queue.

If that's the only working model, then the question is: why do we need
that API at all?
A simple array with a counter, or a linked list, should do for the
majority of cases.
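
To make the "array with a counter" idea concrete, here is a rough sketch,
assuming one instance per writer thread so that no locking is needed. The
names (defer_array, defer_array_add, defer_array_reclaim, count_free) and
the fixed capacity are hypothetical. The 'done' argument is a stand-in for
"the latest token whose grace period has completed"; with rte_rcu that
information would come from rte_rcu_qsbr_check() instead:

```c
#include <errno.h>
#include <stdint.h>

#define DEFER_MAX 64    /* arbitrary capacity for the example */

struct defer_slot {
	uint64_t token;     /* grace-period token taken at delete time */
	void *res;          /* resource to free once the token completes */
};

/* One instance per writer thread, so no locking is needed. */
struct defer_array {
	struct defer_slot slot[DEFER_MAX];
	uint32_t n;         /* number of used slots */
};

typedef void (*free_fn_t)(void *res);

/* Example callback: only counts calls. A real one would release 'res'. */
static uint32_t nb_freed;

static void
count_free(void *res)
{
	(void)res;
	nb_freed++;
}

/* Returns 0 on success, -ENOSPC if the array is full. */
static int
defer_array_add(struct defer_array *d, uint64_t token, void *res)
{
	if (d->n == DEFER_MAX)
		return -ENOSPC;
	d->slot[d->n].token = token;
	d->slot[d->n].res = res;
	d->n++;
	return 0;
}

/* Free every entry whose token is <= 'done'; return the number freed. */
static uint32_t
defer_array_reclaim(struct defer_array *d, uint64_t done, free_fn_t fn)
{
	uint32_t i, kept = 0, freed = 0;

	for (i = 0; i < d->n; i++) {
		if (d->slot[i].token <= done) {
			fn(d->slot[i].res);
			freed++;
		} else {
			/* Compact the entries that must still wait. */
			d->slot[kept++] = d->slot[i];
		}
	}
	d->n = kept;
	return freed;
}
```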

>=20
> >
> > > +	if (dq->r =3D=3D NULL) {
> > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > +			"%s(): defer queue create failed\n", __func__);
> > > +		rte_free(dq);
> > > +		return NULL;
> > > +	}
> > > +
> > > +	dq->v =3D params->v;
> > > +	dq->size =3D params->size;
> > > +	dq->esize =3D params->esize;
> > > +	dq->f =3D params->f;
> > > +	dq->p =3D params->p;
> > > +
> > > +	return dq;
> > > +}
> > > +
> > > +/* Enqueue one resource to the defer queue to free after the grace
> > > + * period is over.
> > > + */
> > > +int rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e) {
> > > +	uint64_t token;
> > > +	uint64_t *tmp;
> > > +	uint32_t i;
> > > +	uint32_t cur_size, free_size;
> > > +
> > > +	if (dq =3D=3D NULL || e =3D=3D NULL) {
> > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > +			"%s(): Invalid input parameter\n", __func__);
> > > +		rte_errno =3D EINVAL;
> > > +
> > > +		return 1;
> >
> > Why just not to return -EINVAL straightway?
> > I think there is no much point to set rte_errno in that function at all=
, just
> > return value should do.
> I am trying to keep these consistent with the existing APIs. They return =
0 or 1 and set the rte_errno.

A lot of public DPDK API functions use the return value to report the
status code (0 or some positive number on success, a negative errno value
on failure). I am not inventing anything new here.

>=20
> >
> > > +	}
> > > +
> > > +	/* Start the grace period */
> > > +	token =3D rte_rcu_qsbr_start(dq->v);
> > > +
> > > +	/* Reclaim resources if the queue is 1/8th full. This helps
> > > +	 * the queue from growing too large and allows time for reader
> > > +	 * threads to report their quiescent state.
> > > +	 */
> > > +	cur_size =3D rte_ring_count(dq->r) / (dq->esize/8 + 1);
> >
> > Probably would be a bit easier if you just store in dq->esize (elt size=
 + token
> > size) / 8.
> Agree
>=20
> >
> > > +	if (cur_size > (dq->size >> RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT)) {
> >
> > Why to make this threshold value hard-coded?
> > Why either not to put it into create parameter, or just return a specia=
l return
> > value, to indicate that threshold is reached?
> My thinking was to keep the programming interface easy to use. The more t=
he parameters, the more painful it is for the user. IMO, the
> constants chosen should be good enough for most cases. More advanced user=
s could modify the constants. However, we could make these
> as part of the parameters, but make them optional for the user. For ex: i=
f they set them to 0, default values can be used.
>=20
> > Or even return number of filled/free entries on success, so caller can
> > decide to reclaim or not based on that information on his own?
> This means more code on the user side.=20

I personally think it really wouldn't be a big problem for the user to
pass an extra parameter to the function.
Again, what if the user doesn't want to reclaim() in the enqueue() thread
at all?

> I think adding these to parameters seems like a better option.
>=20
> >
> > > +		rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> > > +			"%s(): Triggering reclamation\n", __func__);
> > > +		rte_rcu_qsbr_dq_reclaim(dq);
> > > +	}
> > > +
> > > +	/* Check if there is space for atleast for 1 resource */
> > > +	free_size =3D rte_ring_free_count(dq->r) / (dq->esize/8 + 1);
> > > +	if (!free_size) {
> > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > +			"%s(): Defer queue is full\n", __func__);
> > > +		rte_errno =3D ENOSPC;
> > > +		return 1;
> > > +	}
> > > +
> > > +	/* Enqueue the resource */
> > > +	rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)token);
> > > +
> > > +	/* The resource to enqueue needs to be a multiple of 64b
> > > +	 * due to the limitation of the rte_ring implementation.
> > > +	 */
> > > +	for (i =3D 0, tmp =3D (uint64_t *)e; i < dq->esize/8; i++, tmp++)
> > > +		rte_ring_sp_enqueue(dq->r, (void *)(uintptr_t)*tmp);
> >
> >
> > That whole construction above looks a bit clumsy and error prone...
> > I suppose just:
> >
> > const uint32_t nb_elt =3D  dq->elt_size/8 + 1; uint32_t free, n; ...
> > n =3D rte_ring_enqueue_bulk(dq->r, e, nb_elt, &free); if (n =3D=3D 0)
> Yes, bulk enqueue can be used. But note that once the flexible element si=
ze ring patch is done, this code will use that.

Well, once it is in the mainline and provides a better way, this code can
certainly be updated to use the new API (if it brings some improvements).
But as I understand, right now it is not there, while bulk enqueue/dequeue
are.

>=20
> >   return -ENOSPC;
> > return free;
> >
> > That way I think you can have MT-safe version of that function.
> Please see the description of MT safe issue above.
>=20
> >
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +/* Reclaim resources from the defer queue. */ int
> > > +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq) {
> > > +	uint32_t max_cnt;
> > > +	uint32_t cnt;
> > > +	void *token;
> > > +	uint64_t *tmp;
> > > +	uint32_t i;
> > > +
> > > +	if (dq =3D=3D NULL) {
> > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > +			"%s(): Invalid input parameter\n", __func__);
> > > +		rte_errno =3D EINVAL;
> > > +
> > > +		return 1;
> >
> > Same story as above - I think rte_errno is excessive in this function.
> > Just return value should be enough.
> >
> >
> > > +	}
> > > +
> > > +	/* Anything to reclaim? */
> > > +	if (rte_ring_count(dq->r) =3D=3D 0)
> > > +		return 0;
> >
> > Not sure you need that, see below.
> >
> > > +
> > > +	/* Reclaim at the max 1/16th the total number of entries. */
> > > +	max_cnt =3D dq->size >> RTE_RCU_QSBR_MAX_RECLAIM_LIMIT;
> > > +	max_cnt =3D (max_cnt =3D=3D 0) ? dq->size : max_cnt;
> >
> > Again why not to make max_cnt a configurable at create() parameter?
> I think making this as an optional parameter for creating defer queue is =
a better option.
>=20
> > Or even a parameter for that function?
> >
> > > +	cnt =3D 0;
> > > +
> > > +	/* Check reader threads quiescent state and reclaim resources */
> > > +	while ((cnt < max_cnt) && (rte_ring_peek(dq->r, &token) =3D=3D 0) &=
&
> > > +		(rte_rcu_qsbr_check(dq->v, (uint64_t)((uintptr_t)token), false)
> > > +			=3D=3D 1)) {
> >
> >
> > > +		(void)rte_ring_sc_dequeue(dq->r, &token);
> > > +		/* The resource to dequeue needs to be a multiple of 64b
> > > +		 * due to the limitation of the rte_ring implementation.
> > > +		 */
> > > +		for (i =3D 0, tmp =3D (uint64_t *)dq->e; i < dq->esize/8;
> > > +			i++, tmp++)
> > > +			(void)rte_ring_sc_dequeue(dq->r,
> > > +					(void *)(uintptr_t)tmp);
> >
> > Again, no need for such constructs with multiple dequeuer I believe.
> > Just:
> >
> > const uint32_t nb_elt =3D  dq->elt_size/8 + 1; uint32_t n; uintptr_t
> > elt[nb_elt]; ...
> > n =3D rte_ring_dequeue_bulk(dq->r, elt, nb_elt, NULL); if (n !=3D 0) {d=
q->f(dq->p,
> > elt);}
> Agree on bulk API use.
>=20
> >
> > Seems enough.
> > Again in that case you can have enqueue/reclaim running in different th=
reads
> > simultaneously, plus you don't need dq->e at all.
> Will check on dq->e
>=20
> >
> > > +		dq->f(dq->p, dq->e);
> > > +
> > > +		cnt++;
> > > +	}
> > > +
> > > +	rte_log(RTE_LOG_INFO, rte_rcu_log_type,
> > > +		"%s(): Reclaimed %u resources\n", __func__, cnt);
> > > +
> > > +	if (cnt =3D=3D 0) {
> > > +		/* No resources were reclaimed */
> > > +		rte_errno =3D EAGAIN;
> > > +		return 1;
> > > +	}
> > > +
> > > +	return 0;
> >
> > I'd suggest to return cnt on success.
> I am trying to keep the APIs simple. I do not see much use for 'cnt' as r=
eturn value to the user. It exposes more details which I think are
> internal to the library.

Not sure what the hassle is in returning the number of completed
reclamations. If the user doesn't need that information, it can simply be
ignored.
But it might be useful: the caller can decide whether to attempt another
reclaim() immediately or whether it is ok to do something else.
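
A small sketch of how a caller could use such a return value, under the
assumption that reclaim takes a batch limit and returns the number of
entries reclaimed or a negative errno. The struct toy_dq and both
functions are hypothetical stand-ins, not the proposed rte_rcu API:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a defer queue; not the rte_rcu structure. */
struct toy_dq {
	unsigned int pending;   /* entries still on the queue */
	unsigned int ready;     /* of those, entries past their grace period */
};

/*
 * Reclaim up to 'max' entries.
 * Returns the number of entries reclaimed (possibly 0),
 * or -EINVAL on bad input.
 */
static int
toy_dq_reclaim(struct toy_dq *dq, unsigned int max)
{
	unsigned int n;

	if (dq == NULL || max == 0)
		return -EINVAL;

	n = dq->ready < max ? dq->ready : max;
	dq->ready -= n;
	dq->pending -= n;
	return (int)n;
}

/* Keep reclaiming while full batches come back; stop on a short batch. */
static int
drain_ready(struct toy_dq *dq, unsigned int batch)
{
	int n, total = 0;

	do {
		n = toy_dq_reclaim(dq, batch);
		if (n < 0)
			return n;
		total += n;
	} while ((unsigned int)n == batch);

	return total;
}
```

A short batch tells the caller that nothing more is ready right now, so it
can go and do something else instead of polling.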

>=20
> >
> > > +}
> > > +
> > > +/* Delete a defer queue. */
> > > +int
> > > +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq) {
> > > +	if (dq =3D=3D NULL) {
> > > +		rte_log(RTE_LOG_ERR, rte_rcu_log_type,
> > > +			"%s(): Invalid input parameter\n", __func__);
> > > +		rte_errno =3D EINVAL;
> > > +
> > > +		return 1;
> > > +	}
> > > +
> > > +	/* Reclaim all the resources */
> > > +	if (rte_rcu_qsbr_dq_reclaim(dq) !=3D 0)
> > > +		/* Error number is already set by the reclaim API */
> > > +		return 1;
> >
> > How do you know that you have reclaimed everything?
> Good point, will come back with a different solution.
>=20
> >
> > > +
> > > +	rte_ring_free(dq->r);
> > > +	rte_free(dq);
> > > +
> > > +	return 0;
> > > +}
> > > +
> > >  int rte_rcu_log_type;
> > >
> > >  RTE_INIT(rte_rcu_register)
> > > diff --git a/lib/librte_rcu/rte_rcu_qsbr.h
> > > b/lib/librte_rcu/rte_rcu_qsbr.h index c80f15c00..185d4b50a 100644
> > > --- a/lib/librte_rcu/rte_rcu_qsbr.h
> > > +++ b/lib/librte_rcu/rte_rcu_qsbr.h
> > > @@ -34,6 +34,7 @@ extern "C" {
> > >  #include <rte_lcore.h>
> > >  #include <rte_debug.h>
> > >  #include <rte_atomic.h>
> > > +#include <rte_ring.h>
> > >
> > >  extern int rte_rcu_log_type;
> > >
> > > @@ -109,6 +110,67 @@ struct rte_rcu_qsbr {
> > >  	 */
> > >  } __rte_cache_aligned;
> > >
> > > +/**
> > > + * Call back function called to free the resources.
> > > + *
> > > + * @param p
> > > + *   Pointer provided while creating the defer queue
> > > + * @param e
> > > + *   Pointer to the resource data stored on the defer queue
> > > + *
> > > + * @return
> > > + *   None
> > > + */
> > > +typedef void (*rte_rcu_qsbr_free_resource)(void *p, void *e);
> >
> > Stylish thing - usually in DPDK we have typedef newtype_t ...
> > Though I am not sure you need a new typedef at all - just a function
> > pointer inside the struct seems enough.
> Other libraries (for ex: rte_hash) use this approach. I think it is bette=
r to keep it out of the structure to allow for better commenting.

I am saying that the majority of DPDK code uses the _t suffix for typedefs:
typedef void (*rte_rcu_qsbr_free_resource_t)(void *p, void *e);

>=20
> >
> > > +
> > > +#define RTE_RCU_QSBR_DQ_NAMESIZE RTE_RING_NAMESIZE
> > > +
> > > +/**
> > > + *  Trigger automatic reclamation after 1/8th the defer queue is ful=
l.
> > > + */
> > > +#define RTE_RCU_QSBR_AUTO_RECLAIM_LIMIT 3
> > > +
> > > +/**
> > > + *  Reclaim at the max 1/16th the total number of resources.
> > > + */
> > > +#define RTE_RCU_QSBR_MAX_RECLAIM_LIMIT 4
> >
> >
> > As I said above, I don't think these thresholds need to be hardcoded.
> > In any case, there seems not much point to put them in the public heade=
r file.
> >
> > > +
> > > +/**
> > > + * Parameters used when creating the defer queue.
> > > + */
> > > +struct rte_rcu_qsbr_dq_parameters {
> > > +	const char *name;
> > > +	/**< Name of the queue. */
> > > +	uint32_t size;
> > > +	/**< Number of entries in queue. Typically, this will be
> > > +	 *   the same as the maximum number of entries supported in the
> > > +	 *   lock free data structure.
> > > +	 *   Data structures with unbounded number of entries is not
> > > +	 *   supported currently.
> > > +	 */
> > > +	uint32_t esize;
> > > +	/**< Size (in bytes) of each element in the defer queue.
> > > +	 *   This has to be multiple of 8B as the rte_ring APIs
> > > +	 *   support 8B element sizes only.
> > > +	 */
> > > +	rte_rcu_qsbr_free_resource f;
> > > +	/**< Function to call to free the resource. */
> > > +	void *p;
> >
> > Style nit again - I like short names myself, but that seems a bit extre=
me... :)
> > Might be at least:
> > void (*reclaim)(void *, void *);
> May be 'free_fn'?
>=20
> > void * reclaim_data;
> > ?
> This is the pointer to the data structure to free the resource into. For =
ex: In LPM data structure, it will be pointer to LPM. 'reclaim_data'
> does not convey the meaning correctly.

Ok, please feel free to come up with your own names.
I just wanted to say that 'f' and 'p' are a bit extreme for a public API.

>=20
> >
> > > +	/**< Pointer passed to the free function. Typically, this is the
> > > +	 *   pointer to the data structure to which the resource to free
> > > +	 *   belongs. This can be NULL.
> > > +	 */
> > > +	struct rte_rcu_qsbr *v;
> >
> > Does it need to be inside that struct?
> > Might be better:
> > rte_rcu_qsbr_dq_create(struct rte_rcu_qsbr *v, const struct
> > rte_rcu_qsbr_dq_parameters *params);
> The API takes a parameter structure as input anyway, why to add another a=
rgument to the function? QSBR variable is also another
> parameter.
>=20
> >
> > Another alternative: make both reclaim() and enqueue() to take v as a
> > parameter.
> But both of them need access to some of the parameters provided in rte_rc=
u_qsbr_dq_create API. We would end up passing 2 arguments to
> the functions.

Purely a style thing.
In my view it just gives better visibility into what is going on in the
code:
For QSBR var 'v' create a new deferred queue.
But no strong opinion here.

>=20
> >
> > > +	/**< RCU QSBR variable to use for this defer queue */ };
> > > +
> > > +/* RTE defer queue structure.
> > > + * This structure holds the defer queue. The defer queue is used to
> > > + * hold the deleted entries from the data structure that are not
> > > + * yet freed.
> > > + */
> > > +struct rte_rcu_qsbr_dq;
> > > +
> > >  /**
> > >   * @warning
> > >   * @b EXPERIMENTAL: this API may change without prior notice @@
> > > -648,6 +710,113 @@ __rte_experimental  int  rte_rcu_qsbr_dump(FILE *f=
,
> > > struct rte_rcu_qsbr *v);
> > >
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > + *
> > > + * Create a queue used to store the data structure elements that can
> > > + * be freed later. This queue is referred to as 'defer queue'.
> > > + *
> > > + * @param params
> > > + *   Parameters to create a defer queue.
> > > + * @return
> > > + *   On success - Valid pointer to defer queue
> > > + *   On error - NULL
> > > + *   Possible rte_errno codes are:
> > > + *   - EINVAL - NULL parameters are passed
> > > + *   - ENOMEM - Not enough memory
> > > + */
> > > +__rte_experimental
> > > +struct rte_rcu_qsbr_dq *
> > > +rte_rcu_qsbr_dq_create(const struct rte_rcu_qsbr_dq_parameters
> > > +*params);
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > + *
> > > + * Enqueue one resource to the defer queue and start the grace perio=
d.
> > > + * The resource will be freed later after at least one grace period
> > > + * is over.
> > > + *
> > > + * If the defer queue is full, it will attempt to reclaim resources.
> > > + * It will also reclaim resources at regular intervals to avoid
> > > + * the defer queue from growing too big.
> > > + *
> > > + * This API is not multi-thread safe. It is expected that the caller
> > > + * provides multi-thread safety by locking a mutex or some other mea=
ns.
> > > + *
> > > + * A lock free multi-thread writer algorithm could achieve
> > > +multi-thread
> > > + * safety by creating and using one defer queue per thread.
> > > + *
> > > + * @param dq
> > > + *   Defer queue to allocate an entry from.
> > > + * @param e
> > > + *   Pointer to resource data to copy to the defer queue. The size o=
f
> > > + *   the data to copy is equal to the element size provided when the
> > > + *   defer queue was created.
> > > + * @return
> > > + *   On success - 0
> > > + *   On error - 1 with rte_errno set to
> > > + *   - EINVAL - NULL parameters are passed
> > > + *   - ENOSPC - Defer queue is full. This condition can not happen
> > > + *		if the defer queue size is equal (or larger) than the
> > > + *		number of elements in the data structure.
> > > + */
> > > +__rte_experimental
> > > +int
> > > +rte_rcu_qsbr_dq_enqueue(struct rte_rcu_qsbr_dq *dq, void *e);
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > + *
> > > + * Reclaim resources from the defer queue.
> > > + *
> > > + * This API is not multi-thread safe. It is expected that the caller
> > > + * provides multi-thread safety by locking a mutex or some other mea=
ns.
> > > + *
> > > + * A lock free multi-thread writer algorithm could achieve
> > > +multi-thread
> > > + * safety by creating and using one defer queue per thread.
> > > + *
> > > + * @param dq
> > > + *   Defer queue to reclaim an entry from.
> > > + * @return
> > > + *   On successful reclamation of at least 1 resource - 0
> > > + *   On error - 1 with rte_errno set to
> > > + *   - EINVAL - NULL parameters are passed
> > > + *   - EAGAIN - None of the resources have completed at least 1 grac=
e
> > period,
> > > + *		try again.
> > > + */
> > > +__rte_experimental
> > > +int
> > > +rte_rcu_qsbr_dq_reclaim(struct rte_rcu_qsbr_dq *dq);
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > + *
> > > + * Delete a defer queue.
> > > + *
> > > + * It tries to reclaim all the resources on the defer queue.
> > > + * If any of the resources have not completed the grace period
> > > + * the reclamation stops and returns immediately. The rest of
> > > + * the resources are not reclaimed and the defer queue is not
> > > + * freed.
> > > + *
> > > + * @param dq
> > > + *   Defer queue to delete.
> > > + * @return
> > > + *   On success - 0
> > > + *   On error - 1
> > > + *   Possible rte_errno codes are:
> > > + *   - EINVAL - NULL parameters are passed
> > > + *   - EAGAIN - Some of the resources have not completed at least 1 =
grace
> > > + *		period, try again.
> > > + */
> > > +__rte_experimental
> > > +int
> > > +rte_rcu_qsbr_dq_delete(struct rte_rcu_qsbr_dq *dq);
> > > +
> > >  #ifdef __cplusplus
> > >  }
> > >  #endif
> > > diff --git a/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > > b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> > > new file mode 100644
> > > index 000000000..2122bc36a
> > > --- /dev/null
> > > +++ b/lib/librte_rcu/rte_rcu_qsbr_pvt.h
> >
> > Again style suggestion: as it is not public header - don't use rte_
> > prefix for naming.
> > From my perspective - easier to realize for reader what is public
> > header, what is not.
> Looks like the guidelines are not defined very well. I see one private fi=
le with rte_ prefix. I see Stephen not using rte_ prefix. I do not have
> any preference. But, a consistent approach is required.

That's just a suggestion.
For me (and I hope for others) it would be a bit easier.
When looking at the code for the first time, I had to look at meson.build
to check whether it is a public header or not.
If the file doesn't have the 'rte_' prefix, I assume straight away that it
is an internal one.
But, as you said, there are no exact guidelines here, so it is up to you
to decide.

>=20
> >
> > > @@ -0,0 +1,46 @@
> > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > + * Copyright (c) 2019 Arm Limited
> > > + */
> > > +
> > > +#ifndef _RTE_RCU_QSBR_PVT_H_
> > > +#define _RTE_RCU_QSBR_PVT_H_
> > > +
> > > +/**
> > > + * This file is private to the RCU library. It should not be include=
d
> > > + * by the user of this library.
> > > + */
> > > +
> > > +#ifdef __cplusplus
> > > +extern "C" {
> > > +#endif
> > > +
> > > +#include "rte_rcu_qsbr.h"
> > > +
> > > +/* RTE defer queue structure.
> > > + * This structure holds the defer queue. The defer queue is used to
> > > + * hold the deleted entries from the data structure that are not
> > > + * yet freed.
> > > + */
> > > +struct rte_rcu_qsbr_dq {
> > > +	struct rte_rcu_qsbr *v; /**< RCU QSBR variable used by this queue.*=
/
> > > +	struct rte_ring *r;     /**< RCU QSBR defer queue. */
> > > +	uint32_t size;
> > > +	/**< Number of elements in the defer queue */
> > > +	uint32_t esize;
> > > +	/**< Size (in bytes) of data stored on the defer queue */
> > > +	rte_rcu_qsbr_free_resource f;
> > > +	/**< Function to call to free the resource. */
> > > +	void *p;
> > > +	/**< Pointer passed to the free function. Typically, this is the
> > > +	 *   pointer to the data structure to which the resource to free
> > > +	 *   belongs.
> > > +	 */
> > > +	char e[0];
> > > +	/**< Temporary storage to copy the defer queue element. */
> >
> > Do you really need 'e' at all?
> > Can't it be just temporary stack variable?
> Ok, will check.
>=20
> >
> > > +};
> > > +
> > > +#ifdef __cplusplus
> > > +}
> > > +#endif
> > > +
> > > +#endif /* _RTE_RCU_QSBR_PVT_H_ */
> > > diff --git a/lib/librte_rcu/rte_rcu_version.map
> > > b/lib/librte_rcu/rte_rcu_version.map
> > > index f8b9ef2ab..dfac88a37 100644
> > > --- a/lib/librte_rcu/rte_rcu_version.map
> > > +++ b/lib/librte_rcu/rte_rcu_version.map
> > > @@ -8,6 +8,10 @@ EXPERIMENTAL {
> > >  	rte_rcu_qsbr_synchronize;
> > >  	rte_rcu_qsbr_thread_register;
> > >  	rte_rcu_qsbr_thread_unregister;
> > > +	rte_rcu_qsbr_dq_create;
> > > +	rte_rcu_qsbr_dq_enqueue;
> > > +	rte_rcu_qsbr_dq_reclaim;
> > > +	rte_rcu_qsbr_dq_delete;
> > >
> > >  	local: *;
> > >  };
> > > diff --git a/lib/meson.build b/lib/meson.build index
> > > e5ff83893..0e1be8407 100644
> > > --- a/lib/meson.build
> > > +++ b/lib/meson.build
> > > @@ -11,7 +11,9 @@
> > >  libraries =3D [
> > >  	'kvargs', # eal depends on kvargs
> > >  	'eal', # everything depends on eal
> > > -	'ring', 'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> > > +	'ring',
> > > +	'rcu', # rcu depends on ring
> > > +	'mempool', 'mbuf', 'net', 'meter', 'ethdev', 'pci', # core
> > >  	'cmdline',
> > >  	'metrics', # bitrate/latency stats depends on this
> > >  	'hash',    # efd depends on this
> > > @@ -22,7 +24,7 @@ libraries =3D [
> > >  	'gro', 'gso', 'ip_frag', 'jobstats',
> > >  	'kni', 'latencystats', 'lpm', 'member',
> > >  	'power', 'pdump', 'rawdev',
> > > -	'rcu', 'reorder', 'sched', 'security', 'stack', 'vhost',
> > > +	'reorder', 'sched', 'security', 'stack', 'vhost',
> > >  	# ipsec lib depends on net, crypto and security
> > >  	'ipsec',
> > >  	# add pkt framework libs which use other libs from above
> > > --
> > > 2.17.1