DPDK patches and discussions
 help / color / mirror / Atom feed
From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
	Olivier Matz <olivier.matz@6wind.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
	"david.marchand@redhat.com" <david.marchand@redhat.com>,
	nd <nd@arm.com>, nd <nd@arm.com>
Subject: Re: [dpdk-dev] [RFC v2 1/1] lib/ring: add scatter gather APIs
Date: Mon, 12 Oct 2020 17:06:30 +0000	[thread overview]
Message-ID: <BYAPR11MB3301191B27890F387D48D34B9A070@BYAPR11MB3301.namprd11.prod.outlook.com> (raw)
In-Reply-To: <DBAPR08MB581406C02A163D1F2601841D98080@DBAPR08MB5814.eurprd08.prod.outlook.com>



 
> <snip>
> 
> > > > > Hi Honnappa,
> > > > >
> > > > > From a quick walkthrough, I have some questions/comments, please
> > > > > see below.
> > > > Hi Olivier, appreciate your input.
> > > >
> > > > >
> > > > > On Tue, Oct 06, 2020 at 08:29:05AM -0500, Honnappa Nagarahalli wrote:
> > > > > > Add scatter gather APIs to avoid intermediate memcpy. Use cases
> > > > > > that involve copying large amount of data to/from the ring can
> > > > > > benefit from these APIs.
> > > > > >
> > > > > > Signed-off-by: Honnappa Nagarahalli
> > > > > > <honnappa.nagarahalli@arm.com>
> > > > > > ---
> > > > > >  lib/librte_ring/meson.build        |   3 +-
> > > > > >  lib/librte_ring/rte_ring_elem.h    |   1 +
> > > > > >  lib/librte_ring/rte_ring_peek_sg.h | 552
> > > > > > +++++++++++++++++++++++++++++
> > > > > >  3 files changed, 555 insertions(+), 1 deletion(-)  create mode
> > > > > > 100644 lib/librte_ring/rte_ring_peek_sg.h
> > > > > >
> > > > > > diff --git a/lib/librte_ring/meson.build
> > > > > > b/lib/librte_ring/meson.build index 31c0b4649..377694713 100644
> > > > > > --- a/lib/librte_ring/meson.build
> > > > > > +++ b/lib/librte_ring/meson.build
> > > > > > @@ -12,4 +12,5 @@ headers = files('rte_ring.h',
> > > > > >  		'rte_ring_peek.h',
> > > > > >  		'rte_ring_peek_c11_mem.h',
> > > > > >  		'rte_ring_rts.h',
> > > > > > -		'rte_ring_rts_c11_mem.h')
> > > > > > +		'rte_ring_rts_c11_mem.h',
> > > > > > +		'rte_ring_peek_sg.h')
> > > > > > diff --git a/lib/librte_ring/rte_ring_elem.h
> > > > > > b/lib/librte_ring/rte_ring_elem.h index 938b398fc..7d3933f15
> > > > > > 100644
> > > > > > --- a/lib/librte_ring/rte_ring_elem.h
> > > > > > +++ b/lib/librte_ring/rte_ring_elem.h
> > > > > > @@ -1079,6 +1079,7 @@ rte_ring_dequeue_burst_elem(struct
> > > > > > rte_ring *r, void *obj_table,
> > > > > >
> > > > > >  #ifdef ALLOW_EXPERIMENTAL_API
> > > > > >  #include <rte_ring_peek.h>
> > > > > > +#include <rte_ring_peek_sg.h>
> > > > > >  #endif
> > > > > >
> > > > > >  #include <rte_ring.h>
> > > > > > diff --git a/lib/librte_ring/rte_ring_peek_sg.h
> > > > > > b/lib/librte_ring/rte_ring_peek_sg.h
> > > > > > new file mode 100644
> > > > > > index 000000000..97d5764a6
> > > > > > --- /dev/null
> > > > > > +++ b/lib/librte_ring/rte_ring_peek_sg.h
> > > > > > @@ -0,0 +1,552 @@
> > > > > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > > > > + *
> > > > > > + * Copyright (c) 2020 Arm
> > > > > > + * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
> > > > > > + * All rights reserved.
> > > > > > + * Derived from FreeBSD's bufring.h
> > > > > > + * Used as BSD-3 Licensed with permission from Kip Macy.
> > > > > > + */
> > > > > > +
> > > > > > +#ifndef _RTE_RING_PEEK_SG_H_
> > > > > > +#define _RTE_RING_PEEK_SG_H_
> > > > > > +
> > > > > > +/**
> > > > > > + * @file
> > > > > > + * @b EXPERIMENTAL: this API may change without prior notice
> > > > > > + * It is not recommended to include this file directly.
> > > > > > + * Please include <rte_ring_elem.h> instead.
> > > > > > + *
> > > > > > + * Ring Peek Scatter Gather APIs
> > > > >
> > > > > I am not fully convinced by the API name. To me, "scatter/gather"
> > > > > is associated to iovecs, like for instance in [1]. The wikipedia
> > > > > definition [2] may be closer though.
> > > > >
> > > > > [1]
> > > > >
> > https://www.gnu.org/software/libc/manual/html_node/Scatter_002dGat
> > > > > he
> > > > > r.html
> > > > > [2]
> > > > > https://en.wikipedia.org/wiki/Gather-scatter_(vector_addressing)
> > > > The way I understand scatter-gather is, the data to be sent to
> > > > something (like a device) is coming from multiple sources. It would
> > > > require
> > > copying to put the data together to be contiguous. If the device
> > > supports scatter-gather, such copying is not required. The device can
> > collect data from multiple locations and make it contiguous.
> > > >
> > > > In the case I was looking at, one part of the data was coming from
> > > > the user of the API and another was generated by the API itself. If
> > > these two pieces of information have to be enqueued as a single object
> > > on the ring, I had to create an intermediate copy. But by exposing the ring
> > memory to the user, the intermediate copy is avoided. Hence I called it
> > scatter-gather.
> > > >
> > > > >
> > > > > What about "zero-copy"?
> > > > I think no-copy (nc for short) or user-copy (uc for short) would
> > > > convey the meaning better. These would indicate that the rte_ring
> > > > APIs are
> > > not copying the objects and it is left to the user to do the actual copy.


+1 for _ZC_ in naming.
_NC_ is probably ok too, but sounds really strange to me.

> > > >
> > > > >
> > > > > Also, the "peek" term looks also a bit confusing to me, but it
> > > > > existed before your patch. I understand it for dequeue, but not for
> > enqueue.
> > > > I kept 'peek' there because the API still offers the 'peek' API
> > > > capabilities. I am also not sure on what 'peek' means for enqueue
> > > > API. The
> > > enqueue 'peek' API was provided to be symmetric with dequeue peek API.
> > > >
> > > > >
> > > > > Or, what about replacing the existing experimental peek API by this one?
> > > > > They look quite similar to me.
> > > > I agree, scatter gather APIs provide the peek capability and the no-copy
> > benefits.
> > > > Konstantin, any opinions here?

I am still not very comfortable with API that allows users to access
elems locations directly. I do understand that it could be beneficial in some
special cases (you provided some good examples below), so I don't object to
have it as addon.
But I still think it shouldn't be the _only_ API. 

> >
> > Sorry, didn't have time yet, to look at this RFC properly.
> > Will try to do it next week, as I understand that's for 21.02 anyway?
> This is committed for 20.11. We should be able to get into RC2.

Sounds really tight..., but ok, let's see how it goes.
 
> >
> > > > >
> > > > > > + * Introduction of rte_ring with scatter gather serialized
> > > > > > + producer/consumer
> > > > > > + * (HTS sync mode) makes it possible to split public
> > > > > > + enqueue/dequeue API
> > > > > > + * into 3 phases:
> > > > > > + * - enqueue/dequeue start
> > > > > > + * - copy data to/from the ring
> > > > > > + * - enqueue/dequeue finish
> > > > > > + * Along with the advantages of the peek APIs, these APIs
> > > > > > + provide the ability
> > > > > > + * to avoid copying of the data to temporary area.
> > > > > > + *
> > > > > > + * Note that right now this new API is available only for two sync
> > modes:
> > > > > > + * 1) Single Producer/Single Consumer (RTE_RING_SYNC_ST)
> > > > > > + * 2) Serialized Producer/Serialized Consumer
> > (RTE_RING_SYNC_MT_HTS).
> > > > > > + * It is a user responsibility to create/init ring with
> > > > > > + appropriate sync
> > > > > > + * modes selected.
> > > > > > + *
> > > > > > + * Example usage:
> > > > > > + * // read 1 elem from the ring:
> > > > >
> > > > > Comment should be "prepare enqueuing 32 objects"
> > > > >
> > > > > > + * n = rte_ring_enqueue_sg_bulk_start(ring, 32, &sgd, NULL);
> > > > > > + * if (n != 0) {
> > > > > > + *	//Copy objects in the ring
> > > > > > + *	memcpy (sgd->ptr1, obj, sgd->n1 * sizeof(uintptr_t));
> > > > > > + *	if (n != sgd->n1)
> > > > > > + *		//Second memcpy because of wrapround
> > > > > > + *		n2 = n - sgd->n1;
> > > > > > + *		memcpy (sgd->ptr2, obj[n2], n2 * sizeof(uintptr_t));
> > > > >
> > > > > Missing { }
> > > > >
> > > > > > + *	rte_ring_dequeue_sg_finish(ring, n);
> > > > >
> > > > > Should be enqueue
> > > > >
> > > > Thanks, will correct them.
> > > >
> > > > > > + * }
> > > > > > + *
> > > > > > + * Note that between _start_ and _finish_ none other thread can
> > > > > > + proceed
> > > > > > + * with enqueue(/dequeue) operation till _finish_ completes.
> > > > > > + */
> > > > > > +
> > > > > > +#ifdef __cplusplus
> > > > > > +extern "C" {
> > > > > > +#endif
> > > > > > +
> > > > > > +#include <rte_ring_peek_c11_mem.h>
> > > > > > +
> > > > > > +/* Rock that needs to be passed between reserve and commit APIs
> > > > > > +*/ struct rte_ring_sg_data {
> > > > > > +	/* Pointer to the first space in the ring */
> > > > > > +	void **ptr1;
> > > > > > +	/* Pointer to the second space in the ring if there is wrap-
> > around */
> > > > > > +	void **ptr2;
> > > > > > +	/* Number of elements in the first pointer. If this is equal to
> > > > > > +	 * the number of elements requested, then ptr2 is NULL.
> > > > > > +	 * Otherwise, subtracting n1 from number of elements
> > requested
> > > > > > +	 * will give the number of elements available at ptr2.
> > > > > > +	 */
> > > > > > +	unsigned int n1;
> > > > > > +};
> > > > >
> > > > > Would it be possible to simply return the offset instead of this structure?
> > > > > The wrap could be managed by a __rte_ring_enqueue_elems()
> > function.
> > > > Trying to use __rte_ring_enqueue_elems() will force temporary copy.
> > See below.
> > > >
> > > > >
> > > > > I mean something like this:
> > > > >
> > > > > 	uint32_t start;
> > > > > 	n = rte_ring_enqueue_sg_bulk_start(ring, 32, &start, NULL);
> > > > > 	if (n != 0) {
> > > > > 		/* Copy objects in the ring. */
> > > > > 		__rte_ring_enqueue_elems(ring, start, obj, sizeof(uintptr_t),
> > > > > n);
> > > > For ex: 'obj' here is temporary copy.
> > > >
> > > > > 		rte_ring_enqueue_sg_finish(ring, n);
> > > > > 	}
> > > > >
> > > > > It would require to slightly change __rte_ring_enqueue_elems() to
> > > > > support to be called with prod_head >= size, and wrap in that case.
> > > > >
> > > > The alternate solution I can think of requires 3 things 1) the base
> > > > address of the ring 2) Index to where to copy 3) the mask. With
> > > > these 3
> > > things one could write the code like below:
> > > > for (i = 0; i < n; i++) {
> > > > 	ring_addr[(index + i) & mask] = obj[i]; // ANDing with mask will take
> > care of wrap-around.
> > > > }
> > > >
> > > > However, I think this does not allow for passing the address in the
> > > > ring to another function/API to copy the data (It is possible, but
> > > > the user
> > > has to calculate the actual address, worry about the wrap-around, second
> > pointer etc).
> > > >
> > > > The current approach hides some details and provides flexibility to the
> > application to use the pointer the way it wants.
> > >
> > > I agree that doing the access + masking manually looks too complex.
> > >
> > > However I'm not sure to get why using __rte_ring_enqueue_elems()
> > would
> > > result in an additional copy. I suppose the object that you want to
> > > enqueue is already stored somewhere?
> I think this is the key. The object is not stored any where (yet), it is getting generated. When it is generated, it should get stored directly into
> the ring. I have provided some examples below.
> 
> > >
> > > For instance, let's say you have 10 objects to enqueue, located at
> > > different places:
> > >
> > > 	void *obj_0_to_3 = <place where objects 0 to 3 are stored>;
> > > 	void *obj_4_to_7 = ...;
> > > 	void *obj_8_to_9 = ...;
> > > 	uint32_t start;
> > > 	n = rte_ring_enqueue_sg_bulk_start(ring, 10, &start, NULL);
> > > 	if (n != 0) {
> > > 		__rte_ring_enqueue_elems(ring, start, obj_0_to_3,
> > > 			sizeof(uintptr_t), 4);
> > > 		__rte_ring_enqueue_elems(ring, start + 4, obj_4_to_7,
> > > 			sizeof(uintptr_t), 4);
> > > 		__rte_ring_enqueue_elems(ring, start + 8, obj_8_to_9,
> > > 			sizeof(uintptr_t), 2);
> > > 		rte_ring_enqueue_sg_finish(ring, 10);
> > > 	}
> > >
> >
> >
> > As I understand, It is not about different objects stored in different places, it
> > is about:
> > a) object is relatively big (16B+ ?)
> > b) You compose objects from values stored in few different places.
> >
> > Let say you have:
> > struct elem_obj {uint64_t a; uint32_t b, c;};
> >
> > And then you'd like to copy 'a' value from one location, 'b' from second, and
> > 'c' from third one.
> >
> > Konstantin
> >
> I think there are multiple use cases. Some I have in mind are:
> 
> 1)
> Code without this patch:
> 
> struct rte_mbuf *pkts_burst[32];
> 
> /* Create ring with sync type RTE_RING_SYNC_ST or RTE_RING_SYNC_MT_HTS */
> 
> /* Pkt I/O core polls packets from the NIC, pkts_burst is the temporary store */
> nb_rx = rte_eth_rx_burst(portid, queueid, pkts_burst, 32);
> /* Provide packets to the packet processing cores */
> rte_ring_enqueue_burst(ring, pkts_burst, 32, &free_space);
> 
> Code with the patch:
> 
> /* Create ring with sync type RTE_RING_SYNC_ST or RTE_RING_SYNC_MT_HTS */
> 
> /* Reserve space on the ring */
> n = rte_ring_enqueue_sg_burst_start(ring, 32, &sgd, NULL);
> /* Pkt I/O core polls packets from the NIC */
> if (n == 32)
> 	nb_rx = rte_eth_rx_burst(portid, queueid, sgd->ptr1, 32);
> else
> 	nb_rx = rte_eth_rx_burst(portid, queueid, sgd->ptr1, sgd->n1);
> /* Provide packets to the packet processing cores */
> /* Temporary storage 'pkts_burst' is not required */
> rte_ring_enqueue_sg_finish(ring, nb_rx);
> 
> 
> 2) This is same/similar to what Konstantin mentioned
> 
> Code without this patch:
> 
> struct elem_obj {uint64_t a; uint32_t b, c;};
> struct elem_obj obj;
> 
> /* Create ring with sync type RTE_RING_SYNC_ST or RTE_RING_SYNC_MT_HTS */
> 
> obj.a = rte_get_a();
> obj.b = rte_get_b();
> obj.c = rte_get_c();
> /* obj is the temporary storage and results in memcpy in the following call */
> rte_ring_enqueue_elem(ring, sizeof(struct elem_obj), 1, &obj, NULL);
> 
> Code with the patch:
> 
> struct elem_obj *obj;
> /* Reserve space on the ring */
> n = rte_ring_enqueue_sg_bulk_elem_start(ring, sizeof(elem_obj), 1, &sgd, NULL);
> 
> obj = (struct elem_obj *)sgd->ptr1;
> obj.a = rte_get_a();
> obj.b = rte_get_b();
> obj.c = rte_get_c();
> /* obj is not a temporary storage */
> rte_ring_enqueue_sg_elem_finish(ring, n);

  reply	other threads:[~2020-10-12 17:06 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-24 20:39 [dpdk-dev] [RFC 0/1] lib/ring: add scatter gather and serial dequeue APIs Honnappa Nagarahalli
2020-02-24 20:39 ` [dpdk-dev] [RFC 1/1] " Honnappa Nagarahalli
2020-02-26 20:38   ` Ananyev, Konstantin
2020-02-26 23:21     ` Ananyev, Konstantin
2020-02-28  0:18     ` Honnappa Nagarahalli
2020-03-02 18:20       ` Ananyev, Konstantin
2020-03-04 23:21         ` Honnappa Nagarahalli
2020-03-05 18:26           ` Ananyev, Konstantin
2020-03-25 20:43             ` Honnappa Nagarahalli
2020-10-06 13:29 ` [dpdk-dev] [RFC v2 0/1] lib/ring: add scatter gather APIs Honnappa Nagarahalli
2020-10-06 13:29   ` [dpdk-dev] [RFC v2 1/1] " Honnappa Nagarahalli
2020-10-07  8:27     ` Olivier Matz
2020-10-08 20:44       ` Honnappa Nagarahalli
2020-10-08 20:47         ` Honnappa Nagarahalli
2020-10-09  7:33         ` Olivier Matz
2020-10-09  8:05           ` Ananyev, Konstantin
2020-10-09 22:54             ` Honnappa Nagarahalli
2020-10-12 17:06               ` Ananyev, Konstantin [this message]
2020-10-12 16:20     ` Ananyev, Konstantin
2020-10-12 22:31       ` Honnappa Nagarahalli
2020-10-13 11:38         ` Ananyev, Konstantin
2020-10-23  4:43 ` [dpdk-dev] [PATCH v3 0/5] lib/ring: add zero copy APIs Honnappa Nagarahalli
2020-10-23  4:43   ` [dpdk-dev] [PATCH v3 1/5] test/ring: fix the memory dump size Honnappa Nagarahalli
2020-10-23 13:24     ` Ananyev, Konstantin
2020-10-23  4:43   ` [dpdk-dev] [PATCH v3 2/5] lib/ring: add zero copy APIs Honnappa Nagarahalli
2020-10-23 13:59     ` Ananyev, Konstantin
2020-10-24 15:45       ` Honnappa Nagarahalli
2020-10-23  4:43   ` [dpdk-dev] [PATCH v3 3/5] test/ring: move common function to header file Honnappa Nagarahalli
2020-10-23 14:22     ` Ananyev, Konstantin
2020-10-23 23:54       ` Honnappa Nagarahalli
2020-10-24  0:29         ` Stephen Hemminger
2020-10-24  0:31           ` Honnappa Nagarahalli
2020-10-23  4:43   ` [dpdk-dev] [PATCH v3 4/5] test/ring: add functional tests for zero copy APIs Honnappa Nagarahalli
2020-10-23 14:20     ` Ananyev, Konstantin
2020-10-23 22:47       ` Honnappa Nagarahalli
2020-10-23  4:43   ` [dpdk-dev] [PATCH v3 5/5] test/ring: add stress " Honnappa Nagarahalli
2020-10-23 14:11     ` Ananyev, Konstantin
2020-10-24 16:11 ` [dpdk-dev] [PATCH v4 0/8] lib/ring: add " Honnappa Nagarahalli
2020-10-24 16:11   ` [dpdk-dev] [PATCH v4 1/8] " Honnappa Nagarahalli
2020-10-24 16:11   ` [dpdk-dev] [PATCH v4 2/8] test/ring: move common function to header file Honnappa Nagarahalli
2020-10-24 16:11   ` [dpdk-dev] [PATCH v4 3/8] test/ring: add functional tests for zero copy APIs Honnappa Nagarahalli
2020-10-24 16:11   ` [dpdk-dev] [PATCH v4 4/8] test/ring: add stress " Honnappa Nagarahalli
2020-10-24 16:11   ` [dpdk-dev] [PATCH v4 5/8] doc/ring: add zero copy peek APIs Honnappa Nagarahalli
2020-10-24 16:11   ` [dpdk-dev] [PATCH v4 6/8] test/ring: fix the memory dump size Honnappa Nagarahalli
2020-10-24 16:11   ` [dpdk-dev] [PATCH v4 7/8] test/ring: remove unnecessary braces Honnappa Nagarahalli
2020-10-24 16:11   ` [dpdk-dev] [PATCH v4 8/8] test/ring: user uintptr_t instead of unsigned long Honnappa Nagarahalli
2020-10-24 16:18   ` [dpdk-dev] [PATCH v4 0/8] lib/ring: add zero copy APIs Honnappa Nagarahalli
2020-10-25  7:16     ` David Marchand
2020-10-25  8:14       ` Thomas Monjalon
2020-10-25  5:45 ` [dpdk-dev] [PATCH v5 " Honnappa Nagarahalli
2020-10-25  5:45   ` [dpdk-dev] [PATCH v5 1/8] " Honnappa Nagarahalli
2020-10-27 14:11     ` Ananyev, Konstantin
2020-10-25  5:45   ` [dpdk-dev] [PATCH v5 2/8] test/ring: move common function to header file Honnappa Nagarahalli
2020-10-27 13:51     ` Ananyev, Konstantin
2020-10-25  5:45   ` [dpdk-dev] [PATCH v5 3/8] test/ring: add functional tests for zero copy APIs Honnappa Nagarahalli
2020-10-25  5:45   ` [dpdk-dev] [PATCH v5 4/8] test/ring: add stress " Honnappa Nagarahalli
2020-10-25  5:45   ` [dpdk-dev] [PATCH v5 5/8] doc/ring: add zero copy peek APIs Honnappa Nagarahalli
2020-10-27 14:11     ` Ananyev, Konstantin
2020-10-29 10:52     ` David Marchand
2020-10-29 11:28       ` Ananyev, Konstantin
2020-10-29 12:35         ` David Marchand
2020-10-29 17:29           ` Honnappa Nagarahalli
2020-10-25  5:45   ` [dpdk-dev] [PATCH v5 6/8] test/ring: fix the memory dump size Honnappa Nagarahalli
2020-10-25  5:45   ` [dpdk-dev] [PATCH v5 7/8] test/ring: remove unnecessary braces Honnappa Nagarahalli
2020-10-27 14:13     ` Ananyev, Konstantin
2020-10-25  5:45   ` [dpdk-dev] [PATCH v5 8/8] test/ring: user uintptr_t instead of unsigned long Honnappa Nagarahalli
2020-10-27 14:14     ` Ananyev, Konstantin
2020-10-29 13:57   ` [dpdk-dev] [PATCH v5 0/8] lib/ring: add zero copy APIs David Marchand
2020-10-29 22:11     ` Honnappa Nagarahalli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=BYAPR11MB3301191B27890F387D48D34B9A070@BYAPR11MB3301.namprd11.prod.outlook.com \
    --to=konstantin.ananyev@intel.com \
    --cc=Honnappa.Nagarahalli@arm.com \
    --cc=david.marchand@redhat.com \
    --cc=dev@dpdk.org \
    --cc=nd@arm.com \
    --cc=olivier.matz@6wind.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).