DPDK patches and discussions
 help / color / mirror / Atom feed
From: Bruce Richardson <bruce.richardson@intel.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
Cc: fengchengwen <fengchengwen@huawei.com>,
	"Jerin Jacob" <jerinjacobk@gmail.com>,
	"Jerin Jacob" <jerinj@marvell.com>,
	"Morten Brørup" <mb@smartsharesystems.com>,
	"Nipun Gupta" <nipun.gupta@nxp.com>,
	"Thomas Monjalon" <thomas@monjalon.net>,
	"Yigit, Ferruh" <ferruh.yigit@intel.com>, dpdk-dev <dev@dpdk.org>,
	"Hemant Agrawal" <hemant.agrawal@nxp.com>,
	"Maxime Coquelin" <maxime.coquelin@redhat.com>,
	"Honnappa Nagarahalli" <honnappa.nagarahalli@arm.com>,
	"David Marchand" <david.marchand@redhat.com>,
	"Satananda Burla" <sburla@marvell.com>,
	"Prasun Kapoor" <pkapoor@marvell.com>
Subject: Re: [dpdk-dev] dmadev discussion summary
Date: Mon, 28 Jun 2021 13:53:29 +0100	[thread overview]
Message-ID: <YNnGSR8mCGioKg5t@bricha3-MOBL.ger.corp.intel.com> (raw)
In-Reply-To: <DM6PR11MB449103B60676A0EF3B1C05489A039@DM6PR11MB4491.namprd11.prod.outlook.com>

On Mon, Jun 28, 2021 at 12:14:31PM +0100, Ananyev, Konstantin wrote:
> 
> Hi everyone,
> 
> > On Sat, Jun 26, 2021 at 11:59:49AM +0800, fengchengwen wrote:
> > > Hi, all
> > >   I analyzed the current DPAM DMA driver and drew this summary in conjunction
> > > with the previous discussion, and this will as a basis for the V2 implementation.
> > >   Feedback is welcome, thanks
> > >
> > Fantastic review and summary, many thanks for the work. Some comments
> > inline in API part below, but nothing too major, I hope.
> >
> > /Bruce
> >
> > <snip>
> > >
> > > Summary:
> > >   1) The dpaa2/octeontx2/Kunpeng are all ARM soc, there may acts as endpoint of
> > >      x86 host (e.g. smart NIC), multiple memory transfer requirements may exist,
> > >      e.g. local-to-host/local-to-host..., from the point of view of API design,
> > >      I think we should adopt a similar 'channel' or 'virt-queue' concept.
> > >   2) Whether to create a separate dmadev for each HW-queue? We previously
> > >      discussed this, and due HW-queue could indepent management (like
> > >      Kunpeng_dma and Intel DSA), we prefer create a separate dmadev for each
> > >      HW-queue before. But I'm not sure if that's the case with dpaa. I think
> > >      that can be left to the specific driver, no restriction is imposed on the
> > >      framework API layer.
> > >   3) I think we could setup following abstraction at dmadev device:
> > >       ------------    ------------
> > >       |virt-queue|    |virt-queue|
> > >       ------------    ------------
> > >              \           /
> > >               \         /
> > >                \       /
> > >              ------------     ------------
> > >              | HW-queue |     | HW-queue |
> > >              ------------     ------------
> > >                     \            /
> > >                      \          /
> > >                       \        /
> > >                         dmadev
> > >   4) The driver's ops design (here we only list key points):
> > >      [dev_info_get]: mainly return the number of HW-queues
> > >      [dev_configure]: nothing important
> > >      [queue_setup]: create one virt-queue, has following main parameters:
> > >          HW-queue-index: the HW-queue index used
> > >          nb_desc: the number of HW descriptors
> > >          opaque: driver's specific info
> > >          Note1: this API return virt-queue index which will used in later API.
> > >                 If user want create multiple virt-queue one the same HW-queue,
> > >                 they could achieved by call queue_setup with the same
> > >                 HW-queue-index.
> > >          Note2: I think it's hard to define queue_setup config paramter, and
> > >                 also this is control API, so I think it's OK to use opaque
> > >                 pointer to implement it.
> > I'm not sure opaque pointer will work in practice, so I think we should try
> > and standardize the parameters as much as possible. Since it's a control
> > plane API, using a struct with a superset of parameters may be workable.
> > Let's start with a minimum set and build up from there.
> >
> > >       [dma_copy/memset/sg]: all has vq_id input parameter.
> > >          Note: I notice dpaa can't support single and sg in one virt-queue, and
> > >                I think it's maybe software implement policy other than HW
> > >                restriction because virt-queue could share the same HW-queue.
> > Presumably for queues which support sq, the single-enqueue APIs can use a
> > single sg list internally?
> >
> > >       Here we use vq_id to tackle different scenario, like local-to-local/
> > >       local-to-host and etc.
> > >   5) And the dmadev public data-plane API (just prototype):
> > >      dma_cookie_t rte_dmadev_memset(dev, vq_id, pattern, dst, len, flags)
> > >        -- flags: used as an extended parameter, it could be uint32_t
> >
> > Suggest uint64_t rather than uint32_t to ensure we have expansion room?
> > Otherwise +1
> >
> > >      dma_cookie_t rte_dmadev_memcpy(dev, vq_id, src, dst, len, flags)
> > +1
> >
> > >      dma_cookie_t rte_dmadev_memcpy_sg(dev, vq_id, sg, sg_len, flags)
> > >        -- sg: struct dma_scatterlist array
> > I don't think our drivers will be directly implementing this API, but so
> > long as SG support is listed as a capability flag I'm fine with this as an
> > API. [We can't fudge it as a bunch of single copies, because that would
> > cause us to have multiple cookies rather than one]
> >
> > >      uint16_t rte_dmadev_completed(dev, vq_id, dma_cookie_t *cookie,
> > >                                    uint16_t nb_cpls, bool *has_error)
> > >        -- nb_cpls: indicate max process operations number
> > >        -- has_error: indicate if there is an error
> > >        -- return value: the number of successful completed operations.
> > >        -- example:
> > >           1) If there are already 32 completed ops, and 4th is error, and
> > >              nb_cpls is 32, then the ret will be 3(because 1/2/3th is OK), and
> > >              has_error will be true.
> > >           2) If there are already 32 completed ops, and all successful
> > >              completed, then the ret will be min(32, nb_cpls), and has_error
> > >              will be false.
> > >           3) If there are already 32 completed ops, and all failed completed,
> > >              then the ret will be 0, and has_error will be true.
> > +1 for this
> >
> > >      uint16_t rte_dmadev_completed_status(dev_id, vq_id, dma_cookie_t *cookie,
> > >                                           uint16_t nb_status, uint32_t *status)
> > >        -- return value: the number of failed completed operations.
> > >      And here I agree with Morten: we should design API which adapts to DPDK
> > >      service scenarios. So we don't support some sound-cards DMA, and 2D memory
> > >      copy which mainly used in video scenarios.
> >
> > Can I suggest a few adjustments here to the semantics of this API. In
> > future we may have operations which return a status value, e.g. our
> > hardware can support ops like compare equal/not-equal, which means that
> > this API would be meaningful even in case of success. Therefore, I suggest
> > that the return value be changed to allow success also to be returned in
> > the array, and the return value is not the number of failed ops, but the
> > number of ops for which status is being returned.
> >
> > Also for consideration: when trying to implement this in a prototype in our
> > driver, it would be easier if we relax the restriction on the "completed"
> > API so that we can flag has_error when an error is detected rather than
> > guaranteeing to return all elements right up to the error. For example, if
> > we have a burst of packets and one is problematic, it may be easier to flag
> > the error at the start of the burst and then have a few successful entries
> > at the start of the completed_status array. [Separate from this] We should
> > also have a "has_error" or "more_errors" flag on this API too, to indicate
> > when the user can switch back to using the regular "completed" API. This
> > means that apps switch from one API to the other when "has_error" is true,
> > and only switch back when it becomes false again.
> >
> > >   6) The dma_cookie_t is signed int type, when <0 it mean error, it's
> > >      monotonically increasing base on HW-queue (other than virt-queue). The
> > >      driver needs to make sure this because the damdev framework don't manage
> > >      the dma_cookie's creation.
> > +1 to this.
> > I think we also should specify that the cookie is guaranteed to wrap at a
> > power of 2 value (UINT16_MAX??). This allows it to be used as an
> > index into a circular buffer just by masking.
> >
> > >   7) Because data-plane APIs are not thread-safe, and user could determine
> > >      virt-queue to HW-queue's map (at the queue-setup stage), so it is user's
> > >      duty to ensure thread-safe.
> > >   8) One example:
> > >      vq_id = rte_dmadev_queue_setup(dev, config.{HW-queue-index=x, opaque});
> > >      if (vq_id < 0) {
> > >         // create virt-queue failed
> > >         return;
> > >      }
> > >      // submit memcpy task
> > >      cookit = rte_dmadev_memcpy(dev, vq_id, src, dst, len, flags);
> > >      if (cookie < 0) {
> > >         // submit failed
> > >         return;
> > >      }
> > >      // get complete task
> > >      ret = rte_dmadev_completed(dev, vq_id, &cookie, 1, has_error);
> > >      if (!has_error && ret == 1) {
> > >         // the memcpy successful complete
> > >      }
> > +1
> 
> I have two questions on the proposed API:
> 1. Would it make sense to split submission API into two stages:
>     a) reserve and prepare
>     b) actual submit.
> Similar to what DPDK ioat/idxd PMDs have right now:
> /* reserve and prepare */
>  for (i=0;i<num;i++) {cookie = rte_dmadev_memcpy(...);}
> /* submit to HW */
> rte_dmadev_issue_pending(...);
> 
> For those PMDs that prefer to do actual submission to HW at rte_dmadev_memcpy(),
> issue_pending()  will be just a NOP.
> 
> As I can it will make API more flexible and will help PMD developers to choose
> most suitable approach for their HW.
> As a side notice - linux DMA framework uses such approach too.
> 

Thanks for pointing out the omission Konstantin. I understood that to be
part of the original API proposals since we weren't doing burst enqueues,
but it would be good to see it explicitly called out here.

> 2) I wonder what would be MT-safe requirements for submit/completion API?
> I.E. should all PMD support the case when one thread does rte_dmadev_memcpy(..)
> while another one does  rte_dmadev_completed(...) on the same queue simultaneously?
> Or should such combination be ST only?
> Or might be new capability flag per device?
> 

I suggest we just add a capability flag for it into our library. It was
something we looked to support with ioat in the past and may do so again in
the future.

/Bruce

  reply	other threads:[~2021-06-28 12:53 UTC|newest]

Thread overview: 79+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-15 13:22 [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library Chengwen Feng
2021-06-15 16:38 ` Bruce Richardson
2021-06-16  7:09   ` Morten Brørup
2021-06-16 10:17     ` fengchengwen
2021-06-16 12:09       ` Morten Brørup
2021-06-16 13:06       ` Bruce Richardson
2021-06-16 14:37       ` Jerin Jacob
2021-06-17  9:15         ` Bruce Richardson
2021-06-18  5:52           ` Jerin Jacob
2021-06-18  9:41             ` fengchengwen
2021-06-22 17:25               ` Jerin Jacob
2021-06-23  3:30                 ` fengchengwen
2021-06-23  7:21                   ` Jerin Jacob
2021-06-23  9:37                     ` Bruce Richardson
2021-06-23 11:40                       ` Jerin Jacob
2021-06-23 14:19                         ` Bruce Richardson
2021-06-24  6:49                           ` Jerin Jacob
2021-06-23  9:41                 ` Bruce Richardson
2021-06-23 10:10                   ` Morten Brørup
2021-06-23 11:46                   ` Jerin Jacob
2021-06-23 14:22                     ` Bruce Richardson
2021-06-18  9:55             ` Bruce Richardson
2021-06-22 17:31               ` Jerin Jacob
2021-06-22 19:17                 ` Bruce Richardson
2021-06-23  7:00                   ` Jerin Jacob
2021-06-16  9:41   ` fengchengwen
2021-06-16 17:31     ` Bruce Richardson
2021-06-16 18:08       ` Jerin Jacob
2021-06-16 19:13         ` Bruce Richardson
2021-06-17  7:42           ` Jerin Jacob
2021-06-17  8:00             ` Bruce Richardson
2021-06-18  5:16               ` Jerin Jacob
2021-06-18 10:03                 ` Bruce Richardson
2021-06-22 17:36                   ` Jerin Jacob
2021-06-17  9:48       ` fengchengwen
2021-06-17 11:02         ` Bruce Richardson
2021-06-17 14:18           ` Bruce Richardson
2021-06-18  8:52             ` fengchengwen
2021-06-18  9:30               ` Bruce Richardson
2021-06-22 17:51               ` Jerin Jacob
2021-06-23  3:50                 ` fengchengwen
2021-06-23 11:00                   ` Jerin Jacob
2021-06-23 14:56                   ` Bruce Richardson
2021-06-24 12:19                     ` fengchengwen
2021-06-26  3:59                       ` [dpdk-dev] dmadev discussion summary fengchengwen
2021-06-28 10:00                         ` Bruce Richardson
2021-06-28 11:14                           ` Ananyev, Konstantin
2021-06-28 12:53                             ` Bruce Richardson [this message]
2021-07-02 13:31                           ` fengchengwen
2021-07-01 15:01                         ` Jerin Jacob
2021-07-01 16:33                           ` Bruce Richardson
2021-07-02  7:39                             ` Morten Brørup
2021-07-02 10:05                               ` Bruce Richardson
2021-07-02 13:45                           ` fengchengwen
2021-07-02 14:57                             ` Morten Brørup
2021-07-03  0:32                               ` fengchengwen
2021-07-03  8:53                                 ` Morten Brørup
2021-07-03  9:08                                   ` Jerin Jacob
2021-07-03 12:24                                     ` Morten Brørup
2021-07-04  7:43                                       ` Jerin Jacob
2021-07-05 10:28                                         ` Morten Brørup
2021-07-06  7:11                                           ` fengchengwen
2021-07-03  9:45                                   ` fengchengwen
2021-07-03 12:00                                     ` Morten Brørup
2021-07-04  7:34                                       ` Jerin Jacob
2021-07-02  7:07                         ` Liang Ma
2021-07-02 13:59                           ` fengchengwen
2021-06-24  7:03                   ` [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library Jerin Jacob
2021-06-24  7:59                     ` Morten Brørup
2021-06-24  8:05                       ` Jerin Jacob
2021-06-23  5:34       ` Hu, Jiayu
2021-06-23 11:07         ` Jerin Jacob
2021-06-16  2:17 ` Wang, Haiyue
2021-06-16  8:04   ` Bruce Richardson
2021-06-16  8:16     ` Wang, Haiyue
2021-06-16 12:14 ` David Marchand
2021-06-16 13:11   ` Bruce Richardson
2021-06-16 16:48     ` Honnappa Nagarahalli
2021-06-16 19:10       ` Bruce Richardson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YNnGSR8mCGioKg5t@bricha3-MOBL.ger.corp.intel.com \
    --to=bruce.richardson@intel.com \
    --cc=david.marchand@redhat.com \
    --cc=dev@dpdk.org \
    --cc=fengchengwen@huawei.com \
    --cc=ferruh.yigit@intel.com \
    --cc=hemant.agrawal@nxp.com \
    --cc=honnappa.nagarahalli@arm.com \
    --cc=jerinj@marvell.com \
    --cc=jerinjacobk@gmail.com \
    --cc=konstantin.ananyev@intel.com \
    --cc=maxime.coquelin@redhat.com \
    --cc=mb@smartsharesystems.com \
    --cc=nipun.gupta@nxp.com \
    --cc=pkapoor@marvell.com \
    --cc=sburla@marvell.com \
    --cc=thomas@monjalon.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).