DPDK patches and discussions
From: Bruce Richardson <bruce.richardson@intel.com>
To: Jerin Jacob <jerinjacobk@gmail.com>
Cc: fengchengwen <fengchengwen@huawei.com>,
	"Morten Brørup" <mb@smartsharesystems.com>,
	"Thomas Monjalon" <thomas@monjalon.net>,
	"Ferruh Yigit" <ferruh.yigit@intel.com>, dpdk-dev <dev@dpdk.org>,
	"Nipun Gupta" <nipun.gupta@nxp.com>,
	"Hemant Agrawal" <hemant.agrawal@nxp.com>,
	"Maxime Coquelin" <maxime.coquelin@redhat.com>,
	"Honnappa Nagarahalli" <honnappa.nagarahalli@arm.com>,
	"Jerin Jacob" <jerinj@marvell.com>,
	"David Marchand" <david.marchand@redhat.com>,
	"Satananda Burla" <sburla@marvell.com>,
	"Prasun Kapoor" <pkapoor@marvell.com>
Subject: Re: [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library
Date: Wed, 23 Jun 2021 10:41:19 +0100	[thread overview]
Message-ID: <YNMBv31unTE4B1E/@bricha3-MOBL.ger.corp.intel.com> (raw)
In-Reply-To: <CALBAE1NgLz2bV53X3DzjG+_tjN4B0M4J83ykeXgh2KTyP0szTw@mail.gmail.com>

On Tue, Jun 22, 2021 at 10:55:24PM +0530, Jerin Jacob wrote:
> On Fri, Jun 18, 2021 at 3:11 PM fengchengwen <fengchengwen@huawei.com> wrote:
> >
> > On 2021/6/18 13:52, Jerin Jacob wrote:
> > > On Thu, Jun 17, 2021 at 2:46 PM Bruce Richardson
> > > <bruce.richardson@intel.com> wrote:
> > >>
> > >> On Wed, Jun 16, 2021 at 08:07:26PM +0530, Jerin Jacob wrote:
> > >>> On Wed, Jun 16, 2021 at 3:47 PM fengchengwen <fengchengwen@huawei.com> wrote:
> > >>>>
> > >>>> On 2021/6/16 15:09, Morten Brørup wrote:
> > >>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
> > >>>>>> Sent: Tuesday, 15 June 2021 18.39
> > >>>>>>
> > >>>>>> On Tue, Jun 15, 2021 at 09:22:07PM +0800, Chengwen Feng wrote:
> > >>>>>>> This patch introduces 'dmadevice' which is a generic type of DMA
> > >>>>>>> device.
> > >>>>>>>
> > >>>>>>> The APIs of dmadev library exposes some generic operations which can
> > >>>>>>> enable configuration and I/O with the DMA devices.
> > >>>>>>>
> > >>>>>>> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
> > >>>>>>> ---
> > >>>>>> Thanks for sending this.
> > >>>>>>
> > >>>>>> Of most interest to me right now are the key data-plane APIs. While we
> > >>>>>> are
> > >>>>>> still in the prototyping phase, below is a draft of what we are
> > >>>>>> thinking
> > >>>>>> for the key enqueue/perform_ops/completed_ops APIs.
> > >>>>>>
> > >>>>>> Some key differences I note in below vs your original RFC:
> > >>>>>> * Use of void pointers rather than iova addresses. While using iova's
> > >>>>>> makes
> > >>>>>>   sense in the general case when using hardware, in that it can work
> > >>>>>> with
> > >>>>>>   both physical addresses and virtual addresses, if we change the APIs
> > >>>>>> to use
> > >>>>>>   void pointers instead it will still work for DPDK in VA mode, while
> > >>>>>> at the
> > >>>>>>   same time allow use of software fallbacks in error cases, and also a
> > >>>>>> stub
> > >>>>>>   driver that uses memcpy in the background. Finally, using iova's
> > >>>>>> makes the
> > >>>>>>   APIs a lot more awkward to use with anything but mbufs or similar
> > >>>>>> buffers
> > >>>>>>   where we already have a pre-computed physical address.
> > >>>>>> * Use of id values rather than user-provided handles. Allowing the
> > >>>>>> user/app
> > >>>>>>   to manage the amount of data stored per operation is a better
> > >>>>>> solution, I
> > >>>>>>   feel, than prescribing a certain amount of in-driver tracking. Some
> > >>>>>> apps may
> > >>>>>>   not care about anything other than a job being completed, while other
> > >>>>>> apps
> > >>>>>>   may have significant metadata to be tracked. Taking the user-context
> > >>>>>>   handles out of the API also makes the driver code simpler.
> > >>>>>> * I've kept a single combined API for completions, which differs from
> > >>>>>> the
> > >>>>>>   separate error handling completion API you propose. I need to give
> > >>>>>> the
> > >>>>>>   two function approach a bit of thought, but likely both could work.
> > >>>>>> If we
> > >>>>>>   (likely) never expect failed ops, then the specifics of error
> > >>>>>> handling
> > >>>>>>   should not matter that much.
> > >>>>>>
> > >>>>>> For the rest, the control / setup APIs are likely to be rather
> > >>>>>> uncontroversial, I suspect. However, I think that rather than xstats
> > >>>>>> APIs,
> > >>>>>> the library should first provide a set of standardized stats like
> > >>>>>> ethdev
> > >>>>>> does. If driver-specific stats are needed, we can add xstats later to
> > >>>>>> the
> > >>>>>> API.
> > >>>>>>
> > >>>>>> Appreciate your further thoughts on this, thanks.
> > >>>>>>
> > >>>>>> Regards,
> > >>>>>> /Bruce
> > >>>>>
> > >>>>> I generally agree with Bruce's points above.
> > >>>>>
> > >>>>> I would like to share a couple of ideas for further discussion:
> > >>>
> > >>>
> > >>> I believe some of the other requirements and comments for generic DMA will be
> > >>>
> > >>> 1) Support for the _channel_. Each channel may have different
> > >>> capabilities and functionalities.
> > >>> Typical cases are: each channel has separate source and destination
> > >>> devices, like
> > >>> DMA between PCIe EP and host memory, host memory to host memory, PCIe
> > >>> EP to PCIe EP.
> > >>> So we need some notion of the channel in the specification.
> > >>>
> > >>
> > >> Can you share a bit more detail on what constitutes a channel in this case?
> > >> Is it equivalent to a device queue (which we are flattening to individual
> > >> devices in this API), or to a specific configuration on a queue?
> > >
> > > It is not a queue. It is one of the attributes of a transfer.
> > > I.e. in the same queue, for a given transfer it can specify
> > > different "source" and "destination" devices,
> > > like CPU to sound card, CPU to network card, etc.
> > >
> > >
> > >>
> > >>> 2) I assume current data plane APIs are not thread-safe. Is it right?
> > >>>
> > >> Yes.
> > >>
> > >>>
> > >>> 3) The cookie scheme outlined earlier looks good to me, instead of
> > >>> having a generic dequeue() API.
> > >>>
> > >>> 4) Can we split rte_dmadev_enqueue_copy(uint16_t dev_id, void *src,
> > >>> void *dst, unsigned int length)
> > >>> into a two-stage API, where one stage will be used in the fastpath and
> > >>> the other one will be used in the slowpath?
> > >>>
> > >>> - The slowpath API will take the channel and the other fixed attributes
> > >>> of the transfer.
> > >>>
> > >>> Example syntax would be:
> > >>>
> > >>> struct rte_dmadev_desc {
> > >>>         channel id;
> > >>>         ops; /* copy, xor, fill etc. */
> > >>>         /* other arguments specific to the DMA transfer; these can be
> > >>>          * set based on capability */
> > >>> };
> > >>>
> > >>> rte_dmadev_desc_t rte_dmadev_prepare(uint16_t dev_id, struct
> > >>> rte_dmadev_desc *desc);
> > >>>
> > >>> - The fastpath takes the arguments that need to change per transfer,
> > >>> along with the slow-path handle.
> > >>>
> > >>> rte_dmadev_enqueue(uint16_t dev_id, void *src, void *dst,
> > >>> unsigned int length, rte_dmadev_desc_t desc)
> > >>>
> > >>> This will help the driver to:
> > >>> - with the former API, form the device-specific descriptors in the
> > >>> slow path for a given channel and the fixed per-transfer attributes
> > >>> - with the latter API, blend the "variable" arguments such as src and
> > >>> dest addresses with the slow-path created descriptors
> > >>>
> > >>
> > >> This seems like an API for a context-aware device, where the channel is the
> > >> config data/context that is preserved across operations - is that correct?
> > >> At least from the Intel DMA accelerators side, we have no concept of this
> > >> context, and each operation is completely self-described. The location or
> > >> type of memory for copies is irrelevant, you just pass the src/dst
> > >> addresses to reference.
> > >
> > > It is not a context-aware device. Each HW job is self-described.
> > > You can view it as different attributes of the transfer.
> > >
> > >
> > >>
> > >>> The above will give better performance and is the best trade-off
> > >>> between performance and per-transfer variables.
> > >>
> > >> We may need to have different APIs for context-aware and context-unaware
> > >> processing, with which to use determined by the capabilities discovery.
> > >> Given that for these DMA devices the offload cost is critical, more so than
> > >> any other dev class I've looked at before, I'd like to avoid having APIs
> > >> with extra parameters that need to be passed about, since that just adds
> > >> extra CPU cycles to the offload.
> > >
> > > If the driver does not support additional attributes and/or the
> > > application does not need them, rte_dmadev_desc_t can be NULL,
> > > so that it won't have any cost in the datapath. I think we can go to
> > > different API
> > > cases if we cannot abstract the problems without performance impact.
> > > Otherwise, it will be too much
> > > pain for applications.
> >
> > Yes, currently we plan to use different APIs for different cases, e.g.
> >   rte_dmadev_memcpy()  -- deal with local-to-local memcopy
> >   rte_dmadev_memset()  -- deal with filling local memory with a pattern
> > maybe:
> >   rte_dmadev_imm_data()  -- deal with copying very little data
> >   rte_dmadev_p2pcopy()   -- deal with peer-to-peer copy between different PCIe addresses
> >
> > These API capabilities will be reflected in the device capability set so that
> > the application can discover them via a standard API.
> 
> 
> There will be a lot of combinations like that; it will be like an M x N
> cross of base cases. It won't scale.
> 

What are the various cases that are so significantly different? Using the
examples above, the "imm_data" and "p2p_copy" operations are still copy
ops, and the fact of one being a small copy or a p2p copy could be expressed
just using flags. [Also, you are not likely to want to offload a small
copy, are you?]
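
To illustrate, a single copy entry point whose variants are expressed via
flags could be sketched roughly as below. All names here (dmadev_enqueue_copy,
the flag macros) are hypothetical, since the real rte_dmadev API is still
being drafted; the stub just uses memcpy as a software fallback and returns a
job id for later completion matching, as in the draft enqueue/completed_ops
scheme:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bits -- illustrative names only, not an agreed API.
 * A small copy or a peer-to-peer copy remains a plain copy operation,
 * distinguished by hint flags rather than by a separate entry point. */
#define DMADEV_FLAG_IMM_DATA (1u << 0) /* tiny copy; data may be inlined */
#define DMADEV_FLAG_P2P      (1u << 1) /* src/dst behind different PCIe EPs */

static uint16_t next_job_id;

/* Stub enqueue: a memcpy-backed software fallback that returns a job id,
 * much as a stub driver might. A real driver would build a HW descriptor
 * and only report the job done later via a completed_ops-style call. */
static int
dmadev_enqueue_copy(uint16_t dev_id, void *src, void *dst,
		    unsigned int length, uint32_t flags)
{
	(void)dev_id;
	(void)flags; /* hints only in this software fallback */
	memcpy(dst, src, length);
	return next_job_id++;
}
```

With this shape, capability discovery only needs to advertise which flags a
device honours, rather than a separate function per src/dst combination.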

> >
> > >
> > > Just to understand: I think we need to look at the HW capabilities and
> > > how to have a common API.
> > > I assume the HW will have some HW job descriptors which will be filled
> > > in SW and submitted to the HW.
> > > In our HW, the job descriptor has the following main elements:
> > >
> > > - Channel   // We don't expect the application to change this per transfer
> > > - Source address - it can be scatter-gather too - will change per transfer
> > > - Destination address - it can be scatter-gather too - will change
> > > per transfer
> > > - Transfer length - it can be scatter-gather too - will change
> > > per transfer
> > > - IOVA address where HW posts the job completion status per job
> > > descriptor - will change per transfer
> > > - Other sideband information related to the channel  // We don't expect
> > > the application to change this per transfer
> > > - As an option, job completion can be posted as an event to an
> > > rte_event_queue too  // We don't expect the application to change this
> > > per transfer
> >
> > The 'option' field looks like a software interface field, but not a HW
> > descriptor field.
> 
> It is in HW descriptor.
> 
> >
> > >
> > > @Richardson, Bruce @fengchengwen @Hemant Agrawal
> > >
> > > Could you share the options for your HW descriptors which you are
> > > planning to expose through the API, like the above, so that we can
> > > easily converge on the fastpath API?
> > >
> >
> > The Kunpeng HW descriptor is self-describing, and doesn't need to refer
> > to context info.
> >
> > Maybe the fields which are fixed for a given transfer type could be set
> > up by the driver, and not exposed to the application.
> 
> Yes, I agree. I think that's the reason why I thought to have an
> rte_dmadev_prep() call to convert DPDK DMA transfer attributes to
> HW-specific descriptors.

What are all these attributes? Do you have a reference link for them?

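For reference, the prepare/enqueue split being discussed might look roughly
like the sketch below. Everything here is hypothetical (names, types,
fields are mine, not from any patch); the stub "driver" simply falls back to
memcpy/memset, and passing desc == NULL keeps the no-attribute fastpath free
of extra cost, per the suggestion above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical types mirroring the rte_dmadev_prep()/enqueue() split
 * under discussion; all names and fields are illustrative only. */
enum dma_op { DMA_OP_COPY, DMA_OP_FILL };

struct dmadev_desc {            /* slow-path, per-channel fixed attributes */
	uint16_t channel_id;
	enum dma_op op;
	uint8_t fill_pattern;   /* used only for DMA_OP_FILL */
};

/* Slow path: called once per channel/configuration; a real driver would
 * pre-build the fixed words of its HW descriptor here. */
static struct dmadev_desc *
dmadev_prepare(uint16_t dev_id, struct dmadev_desc *desc)
{
	(void)dev_id;
	return desc;
}

/* Fast path: blends the per-transfer src/dst/length with the prepared
 * state; desc == NULL means a plain copy with no extra attributes. */
static int
dmadev_enqueue(uint16_t dev_id, void *src, void *dst,
	       unsigned int length, struct dmadev_desc *desc)
{
	(void)dev_id;
	if (desc == NULL || desc->op == DMA_OP_COPY)
		memcpy(dst, src, length);
	else if (desc->op == DMA_OP_FILL)
		memset(dst, desc->fill_pattern, length);
	return 0;
}
```

The design question in this thread is essentially whether the extra desc
parameter costs anything on the fastpath when unused, and whether one such
handle can cover the attribute combinations across vendors.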


Thread overview: 79+ messages
2021-06-15 13:22 Chengwen Feng
2021-06-15 16:38 ` Bruce Richardson
2021-06-16  7:09   ` Morten Brørup
2021-06-16 10:17     ` fengchengwen
2021-06-16 12:09       ` Morten Brørup
2021-06-16 13:06       ` Bruce Richardson
2021-06-16 14:37       ` Jerin Jacob
2021-06-17  9:15         ` Bruce Richardson
2021-06-18  5:52           ` Jerin Jacob
2021-06-18  9:41             ` fengchengwen
2021-06-22 17:25               ` Jerin Jacob
2021-06-23  3:30                 ` fengchengwen
2021-06-23  7:21                   ` Jerin Jacob
2021-06-23  9:37                     ` Bruce Richardson
2021-06-23 11:40                       ` Jerin Jacob
2021-06-23 14:19                         ` Bruce Richardson
2021-06-24  6:49                           ` Jerin Jacob
2021-06-23  9:41                 ` Bruce Richardson [this message]
2021-06-23 10:10                   ` Morten Brørup
2021-06-23 11:46                   ` Jerin Jacob
2021-06-23 14:22                     ` Bruce Richardson
2021-06-18  9:55             ` Bruce Richardson
2021-06-22 17:31               ` Jerin Jacob
2021-06-22 19:17                 ` Bruce Richardson
2021-06-23  7:00                   ` Jerin Jacob
2021-06-16  9:41   ` fengchengwen
2021-06-16 17:31     ` Bruce Richardson
2021-06-16 18:08       ` Jerin Jacob
2021-06-16 19:13         ` Bruce Richardson
2021-06-17  7:42           ` Jerin Jacob
2021-06-17  8:00             ` Bruce Richardson
2021-06-18  5:16               ` Jerin Jacob
2021-06-18 10:03                 ` Bruce Richardson
2021-06-22 17:36                   ` Jerin Jacob
2021-06-17  9:48       ` fengchengwen
2021-06-17 11:02         ` Bruce Richardson
2021-06-17 14:18           ` Bruce Richardson
2021-06-18  8:52             ` fengchengwen
2021-06-18  9:30               ` Bruce Richardson
2021-06-22 17:51               ` Jerin Jacob
2021-06-23  3:50                 ` fengchengwen
2021-06-23 11:00                   ` Jerin Jacob
2021-06-23 14:56                   ` Bruce Richardson
2021-06-24 12:19                     ` fengchengwen
2021-06-26  3:59                       ` [dpdk-dev] dmadev discussion summary fengchengwen
2021-06-28 10:00                         ` Bruce Richardson
2021-06-28 11:14                           ` Ananyev, Konstantin
2021-06-28 12:53                             ` Bruce Richardson
2021-07-02 13:31                           ` fengchengwen
2021-07-01 15:01                         ` Jerin Jacob
2021-07-01 16:33                           ` Bruce Richardson
2021-07-02  7:39                             ` Morten Brørup
2021-07-02 10:05                               ` Bruce Richardson
2021-07-02 13:45                           ` fengchengwen
2021-07-02 14:57                             ` Morten Brørup
2021-07-03  0:32                               ` fengchengwen
2021-07-03  8:53                                 ` Morten Brørup
2021-07-03  9:08                                   ` Jerin Jacob
2021-07-03 12:24                                     ` Morten Brørup
2021-07-04  7:43                                       ` Jerin Jacob
2021-07-05 10:28                                         ` Morten Brørup
2021-07-06  7:11                                           ` fengchengwen
2021-07-03  9:45                                   ` fengchengwen
2021-07-03 12:00                                     ` Morten Brørup
2021-07-04  7:34                                       ` Jerin Jacob
2021-07-02  7:07                         ` Liang Ma
2021-07-02 13:59                           ` fengchengwen
2021-06-24  7:03                   ` [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library Jerin Jacob
2021-06-24  7:59                     ` Morten Brørup
2021-06-24  8:05                       ` Jerin Jacob
2021-06-23  5:34       ` Hu, Jiayu
2021-06-23 11:07         ` Jerin Jacob
2021-06-16  2:17 ` Wang, Haiyue
2021-06-16  8:04   ` Bruce Richardson
2021-06-16  8:16     ` Wang, Haiyue
2021-06-16 12:14 ` David Marchand
2021-06-16 13:11   ` Bruce Richardson
2021-06-16 16:48     ` Honnappa Nagarahalli
2021-06-16 19:10       ` Bruce Richardson
