From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 1F75AA04BC; Fri, 9 Oct 2020 09:33:49 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 8A6B21C12C; Fri, 9 Oct 2020 09:33:46 +0200 (CEST) Received: from mail-wr1-f51.google.com (mail-wr1-f51.google.com [209.85.221.51]) by dpdk.org (Postfix) with ESMTP id 3F8261C129 for ; Fri, 9 Oct 2020 09:33:44 +0200 (CEST) Received: by mail-wr1-f51.google.com with SMTP id w5so9159559wrp.8 for ; Fri, 09 Oct 2020 00:33:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=6wind.com; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=PMudCVP16qVhF8EUQvfS94Z3OBnB0gkrl5Vjq5LbU28=; b=SHroD/YOeRmedc73Rl3eyP+XCEkfU4m3CwJg2ln9WT65US2StDnLSFn+F9MzuGlumA bS9XgK1YW61ggLB698uKl4FAs2SybVzY86VCQQ0vL+qhD/+03hpaOh++VH+hCf8993ek /zot6tLBWdyVrC+s4fvKg4PvzVVTQwKPtA8RBll19MsIQID1lpYD+vJQE23xA7aqsV1v QRRolyMVXHTA62hHyVq2C4/Cbud2RLRnAcAkT98yln+xFVQe85Un6KfPTfjA1ljYYQ36 d7IkYMC13i1ViqfhI7lz6cuET5JbyUMchd7eczEe/huRJcGddhz/b1/3MNbR0SVxIUdZ ATfQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=PMudCVP16qVhF8EUQvfS94Z3OBnB0gkrl5Vjq5LbU28=; b=IXBvd65uh+40Vf9VZ6sLVZEthAywu2rt2pIIFwz54XLCnwJgYr8JZJhrdQSocBGI/G yZ1OLi9JZfBWzQTDKmFUdIoPDYTasUw8pjSqWP6FV8/7G9yuEVoHN8r9Oh1WlK9iksV6 zFbXBTlxxmYTd4ltyRX8MXu5ZHeIpjjg0HXpwALsOfuPVWtWscPtqBwKQaal7hdLq7VN F5h+xgYhF7NrqFmA8PKTyAunIVt+hD4w+zWkKvOpjAkjpCHV6mt0EK4y0QmQ59G/Ejnq wy/O4RXJewk7EpEDpLlzPMBwlO9USQzpzTjG7HnKpoZ4Rbgk9vkEvpAcZ1NVBxr6+pQJ eqBw== X-Gm-Message-State: AOAM5338i6Wfyh4263s0gIE/A+SjwhCCWgXncSper+viO3mdwgLQccA1 g8TFZw+a1XTjVk1y/bP6573AyA== X-Google-Smtp-Source: ABdhPJw110l9KhFswS0yvqxn3tvKhi9ZWGhVvbogXXTAR562WQ3Qflsisi4QfNa8aHMW/+JWbkBGIA== X-Received: by 2002:adf:cd0c:: with SMTP id w12mr13111278wrm.305.1602228823822; Fri, 09 Oct 2020 00:33:43 -0700 (PDT) Received: from 6wind.com (2a01cb0c0005a600345636f7e65ed1a0.ipv6.abo.wanadoo.fr. [2a01:cb0c:5:a600:3456:36f7:e65e:d1a0]) by smtp.gmail.com with ESMTPSA id l3sm4940756wmg.32.2020.10.09.00.33.42 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 09 Oct 2020 00:33:43 -0700 (PDT) Date: Fri, 9 Oct 2020 09:33:42 +0200 From: Olivier Matz To: Honnappa Nagarahalli Cc: "dev@dpdk.org" , "konstantin.ananyev@intel.com" , "david.marchand@redhat.com" , nd Message-ID: <20201009073342.GZ21395@platinum> References: <20200224203931.21256-1-honnappa.nagarahalli@arm.com> <20201006132905.46205-1-honnappa.nagarahalli@arm.com> <20201006132905.46205-2-honnappa.nagarahalli@arm.com> <20201007082739.GL21395@platinum> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) Subject: Re: [dpdk-dev] [RFC v2 1/1] lib/ring: add scatter gather APIs X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On Thu, Oct 08, 2020 at 08:44:13PM +0000, Honnappa Nagarahalli wrote: > > > > > > Hi Honnappa, > > > > From a quick walkthrough, I have some questions/comments, please see > > below. > Hi Olivier, appreciate your input. > > > > > On Tue, Oct 06, 2020 at 08:29:05AM -0500, Honnappa Nagarahalli wrote: > > > Add scatter gather APIs to avoid intermediate memcpy. Use cases that > > > involve copying large amount of data to/from the ring can benefit from > > > these APIs. > > > > > > Signed-off-by: Honnappa Nagarahalli > > > --- > > > lib/librte_ring/meson.build | 3 +- > > > lib/librte_ring/rte_ring_elem.h | 1 + > > > lib/librte_ring/rte_ring_peek_sg.h | 552 > > > +++++++++++++++++++++++++++++ > > > 3 files changed, 555 insertions(+), 1 deletion(-) create mode 100644 > > > lib/librte_ring/rte_ring_peek_sg.h > > > > > > diff --git a/lib/librte_ring/meson.build b/lib/librte_ring/meson.build > > > index 31c0b4649..377694713 100644 > > > --- a/lib/librte_ring/meson.build > > > +++ b/lib/librte_ring/meson.build > > > @@ -12,4 +12,5 @@ headers = files('rte_ring.h', > > > 'rte_ring_peek.h', > > > 'rte_ring_peek_c11_mem.h', > > > 'rte_ring_rts.h', > > > - 'rte_ring_rts_c11_mem.h') > > > + 'rte_ring_rts_c11_mem.h', > > > + 'rte_ring_peek_sg.h') > > > diff --git a/lib/librte_ring/rte_ring_elem.h > > > b/lib/librte_ring/rte_ring_elem.h index 938b398fc..7d3933f15 100644 > > > --- a/lib/librte_ring/rte_ring_elem.h > > > +++ b/lib/librte_ring/rte_ring_elem.h > > > @@ -1079,6 +1079,7 @@ rte_ring_dequeue_burst_elem(struct rte_ring *r, > > > void *obj_table, > > > > > > #ifdef ALLOW_EXPERIMENTAL_API > > > #include > > > +#include > > > #endif > > > > > > #include > > > diff --git a/lib/librte_ring/rte_ring_peek_sg.h > > > b/lib/librte_ring/rte_ring_peek_sg.h > > > new file mode 100644 > > > index 000000000..97d5764a6 > > > --- /dev/null > > > +++ b/lib/librte_ring/rte_ring_peek_sg.h > > > @@ -0,0 +1,552 @@ > > > +/* SPDX-License-Identifier: BSD-3-Clause > > > + * > > > + * Copyright (c) 2020 Arm > > > + * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org > > > + * All rights reserved. > > > + * Derived from FreeBSD's bufring.h > > > + * Used as BSD-3 Licensed with permission from Kip Macy. > > > + */ > > > + > > > +#ifndef _RTE_RING_PEEK_SG_H_ > > > +#define _RTE_RING_PEEK_SG_H_ > > > + > > > +/** > > > + * @file > > > + * @b EXPERIMENTAL: this API may change without prior notice > > > + * It is not recommended to include this file directly. > > > + * Please include instead. > > > + * > > > + * Ring Peek Scatter Gather APIs > > > > I am not fully convinced by the API name. To me, "scatter/gather" is > > associated to iovecs, like for instance in [1]. The wikipedia definition [2] may > > be closer though. > > > > [1] > > https://www.gnu.org/software/libc/manual/html_node/Scatter_002dGathe > > r.html > > [2] https://en.wikipedia.org/wiki/Gather-scatter_(vector_addressing) > The way I understand scatter-gather is, the data to be sent to something (like a device) is coming from multiple sources. It would require copying to put the data together to be contiguous. If the device supports scatter-gather, such copying is not required. The device can collect data from multiple locations and make it contiguous. > > In the case I was looking at, one part of the data was coming from the user of the API and another was generated by the API itself. If these two pieces of information have to be enqueued as a single object on the ring, I had to create an intermediate copy. But by exposing the ring memory to the user, the intermediate copy is avoided. Hence I called it scatter-gather. > > > > > What about "zero-copy"? > I think no-copy (nc for short) or user-copy (uc for short) would convey the meaning better. These would indicate that the rte_ring APIs are not copying the objects and it is left to the user to do the actual copy. > > > > > Also, the "peek" term looks also a bit confusing to me, but it existed before > > your patch. I understand it for dequeue, but not for enqueue. > I kept 'peek' there because the API still offers the 'peek' API capabilities. I am also not sure on what 'peek' means for enqueue API. The enqueue 'peek' API was provided to be symmetric with dequeue peek API. > > > > > Or, what about replacing the existing experimental peek API by this one? > > They look quite similar to me. > I agree, scatter gather APIs provide the peek capability and the no-copy benefits. > Konstantin, any opinions here? > > > > > > + * Introduction of rte_ring with scatter gather serialized > > > + producer/consumer > > > + * (HTS sync mode) makes it possible to split public enqueue/dequeue > > > + API > > > + * into 3 phases: > > > + * - enqueue/dequeue start > > > + * - copy data to/from the ring > > > + * - enqueue/dequeue finish > > > + * Along with the advantages of the peek APIs, these APIs provide the > > > + ability > > > + * to avoid copying of the data to temporary area. > > > + * > > > + * Note that right now this new API is available only for two sync modes: > > > + * 1) Single Producer/Single Consumer (RTE_RING_SYNC_ST) > > > + * 2) Serialized Producer/Serialized Consumer (RTE_RING_SYNC_MT_HTS). > > > + * It is a user responsibility to create/init ring with appropriate > > > + sync > > > + * modes selected. > > > + * > > > + * Example usage: > > > + * // read 1 elem from the ring: > > > > Comment should be "prepare enqueuing 32 objects" > > > > > + * n = rte_ring_enqueue_sg_bulk_start(ring, 32, &sgd, NULL); > > > + * if (n != 0) { > > > + * //Copy objects in the ring > > > + * memcpy (sgd->ptr1, obj, sgd->n1 * sizeof(uintptr_t)); > > > + * if (n != sgd->n1) > > > + * //Second memcpy because of wrapround > > > + * n2 = n - sgd->n1; > > > + * memcpy (sgd->ptr2, obj[n2], n2 * sizeof(uintptr_t)); > > > > Missing { } > > > > > + * rte_ring_dequeue_sg_finish(ring, n); > > > > Should be enqueue > > > Thanks, will correct them. > > > > + * } > > > + * > > > + * Note that between _start_ and _finish_ none other thread can > > > + proceed > > > + * with enqueue(/dequeue) operation till _finish_ completes. > > > + */ > > > + > > > +#ifdef __cplusplus > > > +extern "C" { > > > +#endif > > > + > > > +#include > > > + > > > +/* Rock that needs to be passed between reserve and commit APIs */ > > > +struct rte_ring_sg_data { > > > + /* Pointer to the first space in the ring */ > > > + void **ptr1; > > > + /* Pointer to the second space in the ring if there is wrap-around */ > > > + void **ptr2; > > > + /* Number of elements in the first pointer. If this is equal to > > > + * the number of elements requested, then ptr2 is NULL. > > > + * Otherwise, subtracting n1 from number of elements requested > > > + * will give the number of elements available at ptr2. > > > + */ > > > + unsigned int n1; > > > +}; > > > > Would it be possible to simply return the offset instead of this structure? > > The wrap could be managed by a __rte_ring_enqueue_elems() function. > Trying to use __rte_ring_enqueue_elems() will force temporary copy. See below. > > > > > I mean something like this: > > > > uint32_t start; > > n = rte_ring_enqueue_sg_bulk_start(ring, 32, &start, NULL); > > if (n != 0) { > > /* Copy objects in the ring. */ > > __rte_ring_enqueue_elems(ring, start, obj, sizeof(uintptr_t), > > n); > For ex: 'obj' here is temporary copy. > > > rte_ring_enqueue_sg_finish(ring, n); > > } > > > > It would require to slightly change __rte_ring_enqueue_elems() to support > > to be called with prod_head >= size, and wrap in that case. > > > The alternate solution I can think of requires 3 things 1) the base address of the ring 2) Index to where to copy 3) the mask. With these 3 things one could write the code like below: > for (i = 0; i < n; i++) { > ring_addr[(index + i) & mask] = obj[i]; // ANDing with mask will take care of wrap-around. > } > > However, I think this does not allow for passing the address in the ring to another function/API to copy the data (It is possible, but the user has to calculate the actual address, worry about the wrap-around, second pointer etc). > > The current approach hides some details and provides flexibility to the application to use the pointer the way it wants. I agree that doing the access + masking manually looks too complex. However I'm not sure to get why using __rte_ring_enqueue_elems() would result in an additional copy. I suppose the object that you want to enqueue is already stored somewhere? For instance, let's say you have 10 objects to enqueue, located at different places: void *obj_0_to_3 = ; void *obj_4_to_7 = ...; void *obj_8_to_9 = ...; uint32_t start; n = rte_ring_enqueue_sg_bulk_start(ring, 10, &start, NULL); if (n != 0) { __rte_ring_enqueue_elems(ring, start, obj_0_to_3, sizeof(uintptr_t), 4); __rte_ring_enqueue_elems(ring, start + 4, obj_4_to_7, sizeof(uintptr_t), 4); __rte_ring_enqueue_elems(ring, start + 8, obj_8_to_9, sizeof(uintptr_t), 2); rte_ring_enqueue_sg_finish(ring, 10); } Thanks, Olivier