From: "Ananyev, Konstantin"
To: Honnappa Nagarahalli, stephen@networkplumber.org, paulmck@linux.ibm.com
Cc: "Wang, Yipeng1", "Medvedkin, Vladimir",
	"Ruifeng Wang (Arm Technology China)", Dharmik Thakkar, dev@dpdk.org, nd
Subject: Re: [dpdk-dev] [PATCH v3 1/3] lib/ring: add peek API
Date: Fri, 11 Oct 2019 14:41:01 +0000
Message-ID: <2601191342CEEE43887BDE71AB9772580191975AE3@irsmsx105.ger.corp.intel.com>
References: <20190906094534.36060-1-ruifeng.wang@arm.com>
	<20191001062917.35578-1-honnappa.nagarahalli@arm.com>
	<20191001062917.35578-2-honnappa.nagarahalli@arm.com>
	<2601191342CEEE43887BDE71AB9772580191970014@irsmsx105.ger.corp.intel.com>
	<2601191342CEEE43887BDE71AB9772580191971EBE@irsmsx105.ger.corp.intel.com>
	<2601191342CEEE43887BDE71AB9772580191975145@irsmsx105.ger.corp.intel.com>

> -----Original Message-----
> From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> Sent: Friday, October 11, 2019 6:04 AM
> To: Ananyev, Konstantin; stephen@networkplumber.org; paulmck@linux.ibm.com
> Cc: Wang, Yipeng1; Medvedkin, Vladimir; Ruifeng Wang (Arm Technology China);
>	Dharmik Thakkar; dev@dpdk.org; nd
> Subject: RE: [PATCH v3 1/3] lib/ring: add peek API
>
> > > > > > >
> > > > > > > Subject: [PATCH v3 1/3] lib/ring: add peek API
> > > > > > >
> > > > > > > From: Ruifeng Wang
> > > > > > >
> > > > > > > The peek API allows fetching the next available object in the
> > > > > > > ring without dequeuing it. This helps in scenarios where
> > > > > > > dequeuing of objects depends on their value.
> > > > > > >
> > > > > > > Signed-off-by: Dharmik Thakkar
> > > > > > > Signed-off-by: Ruifeng Wang
> > > > > > > Reviewed-by: Honnappa Nagarahalli
> > > > > > > Reviewed-by: Gavin Hu
> > > > > > > ---
> > > > > > >  lib/librte_ring/rte_ring.h | 30 ++++++++++++++++++++++++++++++
> > > > > > >  1 file changed, 30 insertions(+)
> > > > > > >
> > > > > > > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > > > > > > index 2a9f768a1..d3d0d5e18 100644
> > > > > > > --- a/lib/librte_ring/rte_ring.h
> > > > > > > +++ b/lib/librte_ring/rte_ring.h
> > > > > > > @@ -953,6 +953,36 @@ rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
> > > > > > >  			r->cons.single, available);
> > > > > > >  }
> > > > > > >
> > > > > > > +/**
> > > > > > > + * Peek one object from a ring.
> > > > > > > + *
> > > > > > > + * The peek API allows fetching the next available object in the ring
> > > > > > > + * without dequeuing it. This API is not multi-thread safe with respect
> > > > > > > + * to other consumer threads.
> > > > > > > + *
> > > > > > > + * @param r
> > > > > > > + *   A pointer to the ring structure.
> > > > > > > + * @param obj_p
> > > > > > > + *   A pointer to a void * pointer (object) that will be filled.
> > > > > > > + * @return
> > > > > > > + *   - 0: Success, object available
> > > > > > > + *   - -ENOENT: Not enough entries in the ring.
> > > > > > > + */
> > > > > > > +__rte_experimental
> > > > > > > +static __rte_always_inline int
> > > > > > > +rte_ring_peek(struct rte_ring *r, void **obj_p)
> > > > > >
> > > > > > As it is not MT safe, I think we need _sc_ in the name, to
> > > > > > follow the other rte_ring functions' naming conventions
> > > > > > (rte_ring_sc_peek() or so).
> > > > > Agree
> > > > >
> > > > > >
> > > > > > As a better alternative, what do you think about introducing
> > > > > > serialized versions of the DPDK rte_ring dequeue functions?
> > > > > > Something like that:
> > > > > >
> > > > > > /* same as original ring dequeue, but:
> > > > > >  * 1) move cons.head only if cons.head == cons.tail
> > > > > >  * 2) don't update cons.tail
> > > > > >  */
> > > > > > unsigned int
> > > > > > rte_ring_serial_dequeue_bulk(struct rte_ring *r, void **obj_table,
> > > > > > 	unsigned int n, unsigned int *available);
> > > > > >
> > > > > > /* sets both cons.head and cons.tail to cons.head + num */
> > > > > > void
> > > > > > rte_ring_serial_dequeue_finish(struct rte_ring *r, uint32_t num);
> > > > > >
> > > > > > /* resets cons.head to cons.tail value */
> > > > > > void
> > > > > > rte_ring_serial_dequeue_abort(struct rte_ring *r);
> > > > > >
> > > > > > Then your dq_reclaim cycle function will look like that:
> > > > > >
> > > > > > const uint32_t nb_elt = dq->elt_size/8 + 1;
> > > > > > uint32_t avl, n;
> > > > > > uintptr_t elt[nb_elt];
> > > > > > ...
> > > > > >
> > > > > > do {
> > > > > >
> > > > > > 	/* read next elem from the queue */
> > > > > > 	n = rte_ring_serial_dequeue_bulk(dq->r, elt, nb_elt, &avl);
> > > > > > 	if (n == 0)
> > > > > > 		break;
> > > > > >
> > > > > > 	/* wrong period, keep elem in the queue */
> > > > > > 	if (rte_rcu_qsbr_check(dr->v, elt[0]) != 1) {
> > > > > > 		rte_ring_serial_dequeue_abort(dq->r);
> > > > > > 		break;
> > > > > > 	}
> > > > > >
> > > > > > 	/* can reclaim, remove elem from the queue */
> > > > > > 	rte_ring_serial_dequeue_finish(dr->q, nb_elt);
> > > > > >
> > > > > > 	/* call reclaim function */
> > > > > > 	dr->f(dr->p, elt);
> > > > > >
> > > > > > } while (avl >= nb_elt);
> > > > > >
> > > > > > That way, I think even rte_rcu_qsbr_dq_reclaim() can be MT safe.
> > > > > > As long as the actual reclamation callback itself is MT safe, of course.
> > > > >
> > > > > I think it is a great idea. The other writers would still be
> > > > > polling for the current writer to update the tail or update the
> > > > > head. This makes it a blocking solution.
> > > >
> > > > Yep, it is a blocking one.
> > > >
> > > > > We can make the other threads not poll, i.e. they will quit
> > > > > reclaiming if they see that other writers are dequeuing from the queue.
> > > >
> > > > Actually didn't think about that possibility, but yes, it should be
> > > > possible to have _try_ semantics too.
> > > >
> > > > > The other way is to use per-thread queues.
> > > > >
> > > > > The other requirement I see is to support unbounded-size data
> > > > > structures, where the data structures do not have a pre-determined
> > > > > number of entries. Also, currently the defer queue size is equal to
> > > > > the total number of entries in a given data structure. There are
> > > > > plans to support a dynamically resizable defer queue. This means
> > > > > memory allocation, which will affect the lock-free-ness of the solution.
> > > > >
> > > > > So, IMO:
> > > > > 1) The API should provide the capability to support different
> > > > > algorithms - maybe through some flags?
> > > > > 2) The requirements for the ring are pretty unique to the problem
> > > > > we have here (for ex: move the cons-head only if cons-tail is also
> > > > > the same, skip polling). So, we should probably implement a ring
> > > > > within the RCU library?
> > > >
> > > > Personally, I think such a serialization ring API would be useful for
> > > > other cases too.
> > > > There are a few cases when the user needs to read the contents of the
> > > > queue without removing elements from it.
> > > > Let's say we use a similar approach inside TLDK to implement the TCP
> > > > transmit queue.
> > > > If such an API existed in DPDK, we could just use it straight away,
> > > > without maintaining a separate one.
> > > ok
> > > >
> > > > > From the timeline perspective, adding all these capabilities would
> > > > > be difficult to get done within the 19.11 timeline. What I have here
> > > > > satisfies my current needs. I suggest that we make provisions in the
> > > > > APIs now to support all these features, but do the implementation in
> > > > > the coming releases. Does this sound ok for you?
> > > >
> > > > Not sure I understand your suggestion here...
> > > > Could you explain it a bit more - how would the new API look and
> > > > what would be left for the future?
> > > For this patch, I suggest we do not add any more complexity.
> > > If someone wants a lock-free/block-free mechanism, it is available by
> > > creating per-thread defer queues.
> > >
> > > We push the following to the future:
> > > 1) Dynamically size-adjustable defer queue. IMO, with this, the
> > > lock-free/block-free reclamation will not be available (memory allocation
> > > requires locking). The memory for the defer queue will be allocated/freed
> > > in chunks of 'size' elements as the queue grows/shrinks.
> >
> > That one is fine by me.
> > In fact I don't know whether there would be a real use-case for a dynamic
> > defer queue for an rcu var...
> > But I suppose that's a subject for another discussion.
> Currently, the defer queue size is equal to the number of resources in the
> data structure. This is unnecessary, as the reclamation is done regularly.
> If a smaller queue size is used, the queue might get full (even after
> reclamation), in which case the queue size should be increased.

I understand the intention.
Though I am not very happy with an approach where, to free one resource, we
first have to allocate another one.
Sounds like a source of deadlocks and, for that case, probably an unnecessary
complication.
But again, as it is not for 19.11, we don't have to discuss it now.

> >
> > > 2) Constant-size defer queue with lock-free and block-free reclamation
> > > (single option). The defer queue will be of fixed length 'size'. If
> > > the queue gets full, an error is returned. The user could provide a
> > > 'size' equal to the number of elements in a data structure to ensure
> > > the queue never gets full.
> >
> > Ok, so for 19.11 what enqueue/dequeue model do you plan to support?
> > - MP/MC
> > - MP/SC
> > - SP/SC
> Just SP/SC

Ok, just to confirm we are on the same page: there would be a possibility
for one thread to do dq_enqueue() and a second one to do dq_reclaim()
simultaneously (of course, if the actual reclamation function is thread safe)?

> > - non MT at all (only the same single thread can do enqueue and dequeue)
> If MT safety is required, one should use 1 defer queue per thread for now.
>
> > And a related question:
> > What additional rte_ring API do you plan to introduce in that case?
> > - None
> > - rte_ring_sc_peek()
> rte_ring_peek will be changed to rte_ring_sc_peek
>
> > - rte_ring_serial_dequeue()
> >
> > > I would add a 'flags' field in rte_rcu_qsbr_dq_parameters and provide
> > > 2 #defines, one for the dynamically variable-size defer queue and the
> > > other for the constant-size defer queue.
> > >
> > > However, IMO, using a per-thread defer queue is a much simpler way to
> > > achieve 2. It does not add any significant burden to the user either.
> > > >
> > > > > > > +{
> > > > > > > +	uint32_t prod_tail = r->prod.tail;
> > > > > > > +	uint32_t cons_head = r->cons.head;
> > > > > > > +	uint32_t count = (prod_tail - cons_head) & r->mask;
> > > > > > > +	unsigned int n = 1;
> > > > > > > +	if (count) {
> > > > > > > +		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
> > > > > > > +		return 0;
> > > > > > > +	}
> > > > > > > +	return -ENOENT;
> > > > > > > +}
> > > > > > > +
> > > > > > >  #ifdef __cplusplus
> > > > > > >  }
> > > > > > >  #endif
> > > > > > > --
> > > > > > > 2.17.1
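
For reference, combining the two points agreed in this thread - the body
quoted in the patch above plus the _sc_ rename - the single-consumer peek
would presumably end up looking roughly like the sketch below. This is only
an illustration of the rename, not the final patch; it assumes the same
rte_ring.h context (DEQUEUE_PTRS, r->prod/r->cons) the patch itself relies
on, and the serialized dequeue variants and the 'flags' option stay out of
scope for 19.11 as discussed.

__rte_experimental
static __rte_always_inline int
rte_ring_sc_peek(struct rte_ring *r, void **obj_p)
{
	uint32_t prod_tail = r->prod.tail;
	uint32_t cons_head = r->cons.head;
	uint32_t count = (prod_tail - cons_head) & r->mask;
	unsigned int n = 1;

	/* copy out one object; neither cons.head nor cons.tail moves,
	 * so the entry stays in the ring (single consumer only). */
	if (count) {
		DEQUEUE_PTRS(r, &r[1], cons_head, obj_p, n, void *);
		return 0;
	}
	return -ENOENT;
}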