From: Bruce Richardson <bruce.richardson@intel.com>
To: "Morten Brørup" <mb@smartsharesystems.com>
Cc: Jerin Jacob <jerinjacobk@gmail.com>,
fengchengwen <fengchengwen@huawei.com>,
Jerin Jacob <jerinj@marvell.com>,
Nipun Gupta <nipun.gupta@nxp.com>,
Thomas Monjalon <thomas@monjalon.net>,
Ferruh Yigit <ferruh.yigit@intel.com>, dpdk-dev <dev@dpdk.org>,
Hemant Agrawal <hemant.agrawal@nxp.com>,
Maxime Coquelin <maxime.coquelin@redhat.com>,
Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>,
David Marchand <david.marchand@redhat.com>,
Satananda Burla <sburla@marvell.com>,
Prasun Kapoor <pkapoor@marvell.com>
Subject: Re: [dpdk-dev] dmadev discussion summary
Date: Fri, 2 Jul 2021 11:05:39 +0100
Message-ID: <YN7k8wgre0aONbf8@bricha3-MOBL.ger.corp.intel.com>
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35C618D1@smartserver.smartshare.dk>
On Fri, Jul 02, 2021 at 09:39:10AM +0200, Morten Brørup wrote:
> > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > Sent: Thursday, 1 July 2021 18.34
> >
> > On Thu, Jul 01, 2021 at 08:31:00PM +0530, Jerin Jacob wrote:
> > > On Sat, Jun 26, 2021 at 9:29 AM fengchengwen <fengchengwen@huawei.com> wrote:
> > > >
> > > > Hi, all
> > > > I analyzed the current DPDK DMA driver and drew this summary in
> > > > conjunction with the previous discussion, and this will serve as a
> > > > basis for the V2 implementation.
> > > > Feedback is welcome, thanks
> > >
> > > Thanks for the write-up.
> > >
> > > >
> > <snip>
> > > >     [queue_setup]: create one virt-queue, with the following main parameters:
> > > >         HW-queue-index: the HW-queue index used
> > > >         nb_desc: the number of HW descriptors
> > > >         opaque: driver's specific info
> > > >         Note1: this API returns a virt-queue index which will be used in
> > > >                later APIs. If the user wants to create multiple virt-queues
> > > >                on the same HW-queue, they can do so by calling queue_setup
> > > >                with the same HW-queue-index.
> > > >         Note2: I think it's hard to define a queue_setup config parameter,
> > > >                and also this is a control API, so I think it's OK to use an
> > > >                opaque pointer to implement it.
> > > >     [dma_copy/memset/sg]: all have a vq_id input parameter.
> > > >         Note: I notice dpaa can't support single and sg in one virt-queue,
> > > >               and I think this is maybe a software implementation policy
> > > >               rather than a HW restriction, because virt-queues could share
> > > >               the same HW-queue.
> > > >     Here we use vq_id to handle different scenarios, like local-to-local,
> > > >     local-to-host and so on.
> > >
> > > IMO, the index representation has an additional overhead as one needs
> > > to translate it to a memory pointer. I prefer to avoid that by having an
> > > object handle and using a _lookup() API to make it work in multi-process
> > > cases and avoid the additional indirection, like the mempool object.
> >
> > While it doesn't matter to me that much which is chosen, an index seems
> > cleaner to me and more consistent with other device types in DPDK.
> > Can't the objects pointed to by the memory pointers you refer to just be
> > stored in an internal array in your driver and accessed directly by
> > index, saving that layer of indirection?
>
> The rte_eth_rx/tx_burst() functions use the parameter "uint16_t port_id"
> to identify the device.
>
> rte_eth_rx/tx_burst() needs to look up the rte_eth_dev pointer in the
> rte_eth_devices array, which could be avoided by using "struct
> rte_eth_dev *dev" instead of "uint16_t port_id". It would be faster.
>
Actually, it looks up the structure directly: since the devices are all in
an array there is no ethdev pointer array, so when passing the structure
pointer to the individual functions there is just a little bit of
arithmetic to convert index to pointer.
In the case of the RX/TX fastpath functions, yes, there is an additional
lookup, because the pointer for the individual queue on each ethdev needs
to be looked up.
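To illustrate, here is roughly what that looks like (a simplified sketch
only, not the exact ethdev code):

#include <rte_ethdev.h>

/* Simplified sketch: converting the port_id index to the device structure
 * is just pointer arithmetic on the global array, while the per-queue
 * pointer is the extra lookup needed on the fast path. */
static inline void *
get_rxq_sketch(uint16_t port_id, uint16_t queue_id)
{
    struct rte_eth_dev *dev = &rte_eth_devices[port_id]; /* base + index */

    return dev->data->rx_queues[queue_id]; /* the additional lookup */
}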
> However, in the case of rte_eth_rx/tx_burst(), the port_id is rapidly changing at runtime, and often comes from the mbuf or similar, where it is preferable storing a 16 bit value rather than a 64 bit pointer.
>
> I think that the choice of DMA device (and virt-queue) for DMA fast path operations will be much more constant than the choice of port_id (and queue_id) in Ethdev fast path operations. If you agree with this, we should use "struct rte_dma_dev *dev" rather than "uint16_t dev_id" as parameter to the DMA fast path functions.
>
> Going even further, why do we need to pass both dev_id and virt_queue_id to the DMA fast path functions, instead of just passing a pointer to the virt-queue? That pointer could be returned by the rte_dma_queue_setup() function. And it could be opaque or a well defined structure. Or even more exotic: It could be a structure where the first part is common and well defined, and the rest of the structure is driver specific.
>
That is an interesting possibility, and I actually quite like the idea. I'm
not sure about having the well-defined common part, so I'd suggest
initially using a queue pointer typedefed to "void *". The "queue_setup"
(and virt-queue/channel setup) functions can return queue pointers which
would then be used as the first parameter to the dataplane functions. In
the case of the copy engines on our Intel platforms, which don't have this
virt-queue concept, that pointer can just point to the dmadev private
structure directly, making the APIs a little more efficient.
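Something like the following is what I have in mind (purely illustrative,
none of these names or parameters are fixed yet):

/* Illustrative sketch only; names and parameters are placeholders. */
typedef void *rte_dmadev_queue_t;

/* Setup returns an opaque queue handle rather than an index ... */
rte_dmadev_queue_t rte_dmadev_queue_setup(uint16_t dev_id,
        const struct rte_dmadev_queue_conf *conf);

/* ... which then becomes the first parameter to the dataplane calls.
 * A driver without a virt-queue concept can simply return a pointer to
 * its private device structure here. */
int rte_dmadev_memcpy(rte_dmadev_queue_t q, rte_iova_t src, rte_iova_t dst,
        uint32_t length, uint64_t flags);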
> >
> > > >     5) And the dmadev public data-plane API (just prototype):
> > > >        dma_cookie_t rte_dmadev_memset(dev, vq_id, pattern, dst, len, flags)
> > > >          -- flags: used as an extended parameter, it could be uint32_t
> > > >        dma_cookie_t rte_dmadev_memcpy(dev, vq_id, src, dst, len, flags)
> > > >        dma_cookie_t rte_dmadev_memcpy_sg(dev, vq_id, sg, sg_len, flags)
> > > >          -- sg: struct dma_scatterlist array
> > > >        uint16_t rte_dmadev_completed(dev, vq_id, dma_cookie_t *cookie,
> > > >                                      uint16_t nb_cpls, bool *has_error)
> > > >          -- nb_cpls: indicates the max number of operations to process
> > > >          -- has_error: indicates if there is an error
> > > >          -- return value: the number of successfully completed operations.
> > > >          -- example:
> > > >             1) If there are already 32 completed ops, the 4th is an error,
> > > >                and nb_cpls is 32, then the ret will be 3 (because the
> > > >                1st/2nd/3rd are OK), and has_error will be true.
> > > >             2) If there are already 32 completed ops, and all completed
> > > >                successfully, then the ret will be min(32, nb_cpls), and
> > > >                has_error will be false.
> > > >             3) If there are already 32 completed ops, and all completed
> > > >                with failure, then the ret will be 0, and has_error will
> > > >                be true.
> > >
> > > +1. IMO, it is better to call it ring_idx instead of a cookie, to
> > > enforce that it is the ring index.
> > >
> > +1, I like that name better too.
>
> If it is "ring index" then it probably should be an uintXX_t type, and thus the functions cannot return <0 to indicate error.
>
The ring index is defined to be a uint16_t type, allowing the enqueue
functions to return an int: negative on error, or the uint16_t index (as a
non-negative value) on success.
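In sketch form, keeping the dev_id/vq_id style from the summary above
(placeholder prototypes, not a settled proposal):

/* Enqueue returns the uint16_t ring index as a non-negative int on
 * success, or a negative errno value on failure. */
int rte_dmadev_memcpy(uint16_t dev_id, uint16_t vq_id,
        rte_iova_t src, rte_iova_t dst, uint32_t len, uint64_t flags);

/* Completion reports how many jobs finished and the ring index of the
 * last one, rather than returning per-job handles. */
uint16_t rte_dmadev_completed(uint16_t dev_id, uint16_t vq_id,
        uint16_t *last_idx, uint16_t nb_cpls, bool *has_error);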
> By "ring index", do you actually mean "descriptor index" (for the DMA engine's descriptors)? Does the application need to process this value as anything but an opaque handle? Wouldn't an opaque pointer type provide performance? It would also allow using NULL to indicate error.
>
> Do you expect applications to store the cookie in structures that are instantiated many times (e.g. like the port_id is stored in the mbuf structure), so there is an advantage to using an uint16_t instead of a pointer?
>
This idea of using indexes rather than pointer handles comes from my
experience with the ioat driver. In the initial prototype versions of that
driver (done longer ago than I like to remember) I had it work as you
describe, managing a single opaque pointer for each job. However, when I
tried integrating it into a simple application for copying packets, I
discovered that for many cases we need two opaque handles - one for the
source buffer, one for the destination buffer. That is what was implemented
in the upstreamed ioat driver.
However, no sooner was that added than a request came from those looking
to integrate copy offload into vhost to remove the use of those handles, as
their design did not require them. Other integrations in that area have
used other types of handles of between 8 and 16 bytes.
The other consideration in this is also the format of the metadata. If for
each transaction we have a handle which points to a structure of metadata
for that transaction, it may be sub-optimal. Taking again the simple case
of a source mbuf and destination mbuf: if you store a pointer to both of
these in a struct and pass the pointer to that struct as the opaque handle
to the driver, then when a job is completed and you get back the relevant
handle(s), the completed elements are not contiguous in the format you want
them. The source buffers may all need to be freed en masse and the
destination buffers may need to be enqueued to another core or sent to NIC
TX. In both these cases you essentially need to regather the source and
destination buffers into flat arrays to call mempool_put_bulk() or
tx_burst() on. Therefore, a better arrangement is to have the source and
dest pointers stored in parallel arrays, so on return of a set of jobs no
gathering of handles is needed.
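As a rough application-side sketch of that arrangement (assuming the
placeholder completion prototype above, and ignoring ring wrap-around for
brevity):

#include <stdbool.h>
#include <stdint.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

/* Placeholder prototype as discussed above, not an existing API. */
uint16_t rte_dmadev_completed(uint16_t dev_id, uint16_t vq_id,
        uint16_t *last_idx, uint16_t nb_cpls, bool *has_error);

#define DMA_RING_SZ   1024               /* power of two, matches DMA ring */
#define DMA_RING_MASK (DMA_RING_SZ - 1)

/* Parallel arrays indexed by the returned ring index, with no per-job
 * metadata struct and no opaque handles to gather afterwards. The enqueue
 * path stores the src/dst mbufs at (ring_idx & DMA_RING_MASK). */
static struct rte_mbuf *src_bufs[DMA_RING_SZ];
static struct rte_mbuf *dst_bufs[DMA_RING_SZ];
static uint16_t next_to_complete;

static void
handle_completions(uint16_t dev_id, uint16_t vq_id, uint16_t port_id)
{
    uint16_t last_idx, n;
    bool has_error;
    uint16_t start = next_to_complete & DMA_RING_MASK;

    n = rte_dmadev_completed(dev_id, vq_id, &last_idx, 64, &has_error);

    /* Source buffers are already contiguous in their own array, so free
     * them in bulk; destinations go straight to NIC TX. */
    rte_pktmbuf_free_bulk(&src_bufs[start], n);
    rte_eth_tx_burst(port_id, 0, &dst_bufs[start], n);
    next_to_complete += n;
}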
So, in summary, using indexes rather than opaque handles allows the end
user/application to choose the data format most relevant for data
management for the job, and ensures a minimal memory footprint. It also
has the advantages of:
* reducing the number of parameters to the functions, meaning fewer
  registers need to be set up for each function call (or that we don't
  overflow the register count per-function and start needing to write
  parameters to the stack)
* saving on stores done by the completion call, as well as the memory
  allocation needed for the return status parameter. If 32 enqueued jobs
  have completed, we can just return that 32, along with the info that the
  last job-id done was N, saving 32x8 bytes of stores for returning opaque
  pointers for those jobs.
All of these should improve performance by reducing our offload cost.
> >
> > > >        uint16_t rte_dmadev_completed_status(dev_id, vq_id, dma_cookie_t *cookie,
> > > >                                             uint16_t nb_status, uint32_t *status)
> > > >          -- return value: the number of failed completed operations.
> > >
> > > See above. Here we are assuming it is an index, otherwise we need to
> > > pass an array of cookies.
> > >
> > > >     And here I agree with Morten: we should design an API which adapts
> > > >     to DPDK service scenarios. So we don't support some sound-card DMA,
> > > >     or the 2D memory copy which is mainly used in video scenarios.
> > > >     6) The dma_cookie_t is a signed int type; when <0 it means error. It is
> > > >        monotonically increasing based on the HW-queue (rather than the
> > > >        virt-queue). The driver needs to make sure of this because the
> > > >        dmadev framework doesn't manage the dma_cookie's creation.
> > >
> > > +1 and see above.
> > >
> > > >     7) Because the data-plane APIs are not thread-safe, and the user can
> > > >        determine the virt-queue to HW-queue mapping (at the queue-setup
> > > >        stage), it is the user's duty to ensure thread safety.
> > >
> > > +1. But I am not sure how easy it is for the fast-path application to
> > > have this logic. Instead, I think it is better for the driver to report
> > > the capability per queue, and in the channel configuration the
> > > application can request what it requires (i.e. whether multiple
> > > producers enqueue to the same HW queue or not).
> > > Based on the request, the implementation can pick the correct function
> > > pointer for enqueue (lock vs lockless version, if the HW does not
> > > support lockless).
> > >
> >
> > Non-thread safety is the norm in DPDK for all other queue resources;
> > however, having multi-thread safety as a capability sounds reasonable.
> > >
> > > >     8) One example:
> > > >        vq_id = rte_dmadev_queue_setup(dev, config.{HW-queue-index=x, opaque});
> > > >        if (vq_id < 0) {
> > > >            // create virt-queue failed
> > > >            return;
> > > >        }
> > > >        // submit memcpy task
> > > >        cookie = rte_dmadev_memcpy(dev, vq_id, src, dst, len, flags);
> > > >        if (cookie < 0) {
> > > >            // submit failed
> > > >            return;
> > > >        }
> > > >        // get completed task
> > > >        ret = rte_dmadev_completed(dev, vq_id, &cookie, 1, &has_error);
> > > >        if (!has_error && ret == 1) {
> > > >            // the memcpy completed successfully
> > > >        }
> > >
> > > +1
> > >
> > > >     9) As octeontx2_dma supports an sg-list which has many valid buffers
> > > >        in dpi_dma_buf_ptr_s, it could call the rte_dmadev_memcpy_sg API.
> > >
> > > +1
> > >
> > > >     10) As for ioat, it could declare support for one HW-queue at the
> > > >         dev_configure stage, and only support creating one virt-queue.
> > > >     11) As for dpaa2_qdma, I think it could migrate to the new framework,
> > > >         but we still wait for the dpaa2_qdma guys' feedback.
> > > >     12) About the prototype src/dst parameters of the rte_dmadev_memcpy
> > > >         API, we have two candidates which are iova and void *; how about
> > > >         introducing a dma_addr_t type which could be va or iova?
> > >
> > > I think the conversion looks ugly; better to have void * and share the
> > > constraints of void * as a limitation/capability using a flag, so that
> > > the driver can update it.
> > >
> > I'm ok with either rte_iova_t or void * as parameter type. Let's not
> > define a new type though,
>
> +1
>
> > and +1 to just using capabilities to define what kinds
> > of addressing are supported by the device instances.
>
> +1
>
> >
> > /Bruce
>
> And regarding naming: Consider rte_dma_xx() instead of rte_dmadev_xx().
Definite +1.
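And on the addressing question above, the sort of thing I'd expect the
capability reporting to look like (flag names purely illustrative, nothing
defined yet):

/* A device advertises which address types its copy functions accept,
 * rather than us inventing a new dma_addr_t type. Names are placeholders. */
#define RTE_DMADEV_CAPA_ADDR_VA   (1ULL << 0) /* accepts virtual addresses */
#define RTE_DMADEV_CAPA_ADDR_IOVA (1ULL << 1) /* accepts IOVA addresses */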