From: Bruce Richardson <bruce.richardson@intel.com>
To: Jerin Jacob <jerinjacobk@gmail.com>
Cc: fengchengwen <fengchengwen@huawei.com>,
"Morten Brørup" <mb@smartsharesystems.com>,
"Thomas Monjalon" <thomas@monjalon.net>,
"Ferruh Yigit" <ferruh.yigit@intel.com>, dpdk-dev <dev@dpdk.org>,
"Nipun Gupta" <nipun.gupta@nxp.com>,
"Hemant Agrawal" <hemant.agrawal@nxp.com>,
"Maxime Coquelin" <maxime.coquelin@redhat.com>,
"Honnappa Nagarahalli" <honnappa.nagarahalli@arm.com>,
"Jerin Jacob" <jerinj@marvell.com>,
"David Marchand" <david.marchand@redhat.com>,
"Satananda Burla" <sburla@marvell.com>,
"Prasun Kapoor" <pkapoor@marvell.com>
Subject: Re: [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library
Date: Fri, 18 Jun 2021 10:55:23 +0100 [thread overview]
Message-ID: <YMxtiwvy9WeCVGjA@bricha3-MOBL.ger.corp.intel.com> (raw)
In-Reply-To: <CALBAE1M=JXAKt6VxZ3uOwyM3c04D9Kw556WtMeBHYhQx3ywJWA@mail.gmail.com>
On Fri, Jun 18, 2021 at 11:22:28AM +0530, Jerin Jacob wrote:
> On Thu, Jun 17, 2021 at 2:46 PM Bruce Richardson
> <bruce.richardson@intel.com> wrote:
> >
> > On Wed, Jun 16, 2021 at 08:07:26PM +0530, Jerin Jacob wrote:
> > > On Wed, Jun 16, 2021 at 3:47 PM fengchengwen <fengchengwen@huawei.com> wrote:
> > > >
> > > > On 2021/6/16 15:09, Morten Brørup wrote:
> > > > >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
> > > > >> Sent: Tuesday, 15 June 2021 18.39
> > > > >>
> > > > >> On Tue, Jun 15, 2021 at 09:22:07PM +0800, Chengwen Feng wrote:
> > > > >>> This patch introduces 'dmadevice' which is a generic type of DMA
> > > > >>> device.
> > > > >>>
> > > > >>> The APIs of dmadev library exposes some generic operations which can
> > > > >>> enable configuration and I/O with the DMA devices.
> > > > >>>
> > > > >>> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
> > > > >>> ---
> > > > >> Thanks for sending this.
> > > > >>
> > > > >> Of most interest to me right now are the key data-plane APIs. While we
> > > > >> are
> > > > >> still in the prototyping phase, below is a draft of what we are
> > > > >> thinking
> > > > >> for the key enqueue/perform_ops/completed_ops APIs.
> > > > >>
> > > > >> Some key differences I note in below vs your original RFC:
> > > > >> * Use of void pointers rather than iova addresses. While using iova's
> > > > >> makes
> > > > >> sense in the general case when using hardware, in that it can work
> > > > >> with
> > > > >> both physical addresses and virtual addresses, if we change the APIs
> > > > >> to use
> > > > >> void pointers instead it will still work for DPDK in VA mode, while
> > > > >> at the
> > > > >> same time allow use of software fallbacks in error cases, and also a
> > > > >> stub
> > > > >> driver than uses memcpy in the background. Finally, using iova's
> > > > >> makes the
> > > > >> APIs a lot more awkward to use with anything but mbufs or similar
> > > > >> buffers
> > > > >> where we already have a pre-computed physical address.
> > > > >> * Use of id values rather than user-provided handles. Allowing the
> > > > >> user/app
> > > > >> to manage the amount of data stored per operation is a better
> > > > >> solution, I
> > > > >> feel than proscribing a certain about of in-driver tracking. Some
> > > > >> apps may
> > > > >> not care about anything other than a job being completed, while other
> > > > >> apps
> > > > >> may have significant metadata to be tracked. Taking the user-context
> > > > >> handles out of the API also makes the driver code simpler.
> > > > >> * I've kept a single combined API for completions, which differs from
> > > > >> the
> > > > >> separate error handling completion API you propose. I need to give
> > > > >> the
> > > > >> two function approach a bit of thought, but likely both could work.
> > > > >> If we
> > > > >> (likely) never expect failed ops, then the specifics of error
> > > > >> handling
> > > > >> should not matter that much.
> > > > >>
> > > > >> For the rest, the control / setup APIs are likely to be rather
> > > > >> uncontroversial, I suspect. However, I think that rather than xstats
> > > > >> APIs,
> > > > >> the library should first provide a set of standardized stats like
> > > > >> ethdev
> > > > >> does. If driver-specific stats are needed, we can add xstats later to
> > > > >> the
> > > > >> API.
> > > > >>
> > > > >> Appreciate your further thoughts on this, thanks.
> > > > >>
> > > > >> Regards,
> > > > >> /Bruce
> > > > >
> > > > > I generally agree with Bruce's points above.
> > > > >
> > > > > I would like to share a couple of ideas for further discussion:
> > >
> > >
> > > I believe some of the other requirements and comments for generic DMA will be
> > >
> > > 1) Support for the _channel_, Each channel may have different
> > > capabilities and functionalities.
> > > Typical cases are, each channel have separate source and destination
> > > devices like
> > > DMA between PCIe EP to Host memory, Host memory to Host memory, PCIe
> > > EP to PCIe EP.
> > > So we need some notion of the channel in the specification.
> > >
> >
> > Can you share a bit more detail on what constitutes a channel in this case?
> > Is it equivalent to a device queue (which we are flattening to individual
> > devices in this API), or to a specific configuration on a queue?
>
> It not a queue. It is one of the attributes for transfer.
> I.e in the same queue, for a given transfer it can specify the
> different "source" and "destination" device.
> Like CPU to Sound card, CPU to network card etc.
>
Ok. Thanks for clarifying. Do you think it's best given as a
device-specific parameter to the various functions, and NULL for hardware
that doesn't need it?
>
> >
> > > 2) I assume current data plane APIs are not thread-safe. Is it right?
> > >
> > Yes.
> >
> > >
> > > 3) Cookie scheme outlined earlier looks good to me. Instead of having
> > > generic dequeue() API
> > >
> > > 4) Can split the rte_dmadev_enqueue_copy(uint16_t dev_id, void * src,
> > > void * dst, unsigned int length);
> > > to two stage API like, Where one will be used in fastpath and other
> > > one will use used in slowpath.
> > >
> > > - slowpath API will for take channel and take other attributes for transfer
> > >
> > > Example syantx will be:
> > >
> > > struct rte_dmadev_desc {
> > > channel id;
> > > ops ; // copy, xor, fill etc
> > > other arguments specific to dma transfer // it can be set
> > > based on capability.
> > >
> > > };
> > >
> > > rte_dmadev_desc_t rte_dmadev_preprare(uint16_t dev_id, struct
> > > rte_dmadev_desc *dec);
> > >
> > > - Fastpath takes arguments that need to change per transfer along with
> > > slow-path handle.
> > >
> > > rte_dmadev_enqueue(uint16_t dev_id, void * src, void * dst, unsigned
> > > int length, rte_dmadev_desc_t desc)
> > >
> > > This will help to driver to
> > > -Former API form the device-specific descriptors in slow path for a
> > > given channel and fixed attributes per transfer
> > > -Later API blend "variable" arguments such as src, dest address with
> > > slow-path created descriptors
> > >
> >
> > This seems like an API for a context-aware device, where the channel is the
> > config data/context that is preserved across operations - is that correct?
> > At least from the Intel DMA accelerators side, we have no concept of this
> > context, and each operation is completely self-described. The location or
> > type of memory for copies is irrelevant, you just pass the src/dst
> > addresses to reference.
>
> it is not context-aware device. Each HW JOB is self-described.
> You can view it different attributes of transfer.
>
>
> >
> > > The above will give better performance and is the best trade-off c
> > > between performance and per transfer variables.
> >
> > We may need to have different APIs for context-aware and context-unaware
> > processing, with which to use determined by the capabilities discovery.
> > Given that for these DMA devices the offload cost is critical, more so than
> > any other dev class I've looked at before, I'd like to avoid having APIs
> > with extra parameters than need to be passed about since that just adds
> > extra CPU cycles to the offload.
>
> If driver does not support additional attributes and/or the
> application does not need it, rte_dmadev_desc_t can be NULL.
> So that it won't have any cost in the datapath. I think, we can go to
> different API
> cases if we can not abstract problems without performance impact.
> Otherwise, it will be too much
> pain for applications.
Ok. Having one extra parameter ignored by some drivers should not be that
big of a deal. [With all these, we'll only really know for sure when
implemented and offload cost measured]
>
> Just to understand, I think, we need to HW capabilities and how to
> have a common API.
> I assume HW will have some HW JOB descriptors which will be filled in
> SW and submitted to HW.
> In our HW, Job descriptor has the following main elements
>
> - Channel // We don't expect the application to change per transfer
> - Source address - It can be scatter-gather too - Will be changed per transfer
> - Destination address - It can be scatter-gather too - Will be changed
> per transfer
> - Transfer Length - - It can be scatter-gather too - Will be changed
> per transfer
> - IOVA address where HW post Job completion status PER Job descriptor
> - Will be changed per transfer
> - Another sideband information related to channel // We don't expect
> the application to change per transfer
> - As an option, Job completion can be posted as an event to
> rte_event_queue too // We don't expect the application to change per
> transfer
>
> @Richardson, Bruce @fengchengwen @Hemant Agrawal
>
> Could you share the options for your HW descriptors which you are
> planning to expose through API like above so that we can easily
> converge on fastpath API
>
Taking the case of a simple copy op, the parameters we need are:
* src
* dst
* length
Depending on the specific hardware there will also be passed in the
descriptor a completion address, but we plan for these cases to always have
the completions written back to a set location so that we have essentially
ring-writeback, as with the hardware which doesn't explicitly have a
separate completion address. Beyond that, I believe the only descriptor
fields we will use are just the flags field indicating the op type etc.
/Bruce
next prev parent reply other threads:[~2021-06-18 9:55 UTC|newest]
Thread overview: 79+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-06-15 13:22 Chengwen Feng
2021-06-15 16:38 ` Bruce Richardson
2021-06-16 7:09 ` Morten Brørup
2021-06-16 10:17 ` fengchengwen
2021-06-16 12:09 ` Morten Brørup
2021-06-16 13:06 ` Bruce Richardson
2021-06-16 14:37 ` Jerin Jacob
2021-06-17 9:15 ` Bruce Richardson
2021-06-18 5:52 ` Jerin Jacob
2021-06-18 9:41 ` fengchengwen
2021-06-22 17:25 ` Jerin Jacob
2021-06-23 3:30 ` fengchengwen
2021-06-23 7:21 ` Jerin Jacob
2021-06-23 9:37 ` Bruce Richardson
2021-06-23 11:40 ` Jerin Jacob
2021-06-23 14:19 ` Bruce Richardson
2021-06-24 6:49 ` Jerin Jacob
2021-06-23 9:41 ` Bruce Richardson
2021-06-23 10:10 ` Morten Brørup
2021-06-23 11:46 ` Jerin Jacob
2021-06-23 14:22 ` Bruce Richardson
2021-06-18 9:55 ` Bruce Richardson [this message]
2021-06-22 17:31 ` Jerin Jacob
2021-06-22 19:17 ` Bruce Richardson
2021-06-23 7:00 ` Jerin Jacob
2021-06-16 9:41 ` fengchengwen
2021-06-16 17:31 ` Bruce Richardson
2021-06-16 18:08 ` Jerin Jacob
2021-06-16 19:13 ` Bruce Richardson
2021-06-17 7:42 ` Jerin Jacob
2021-06-17 8:00 ` Bruce Richardson
2021-06-18 5:16 ` Jerin Jacob
2021-06-18 10:03 ` Bruce Richardson
2021-06-22 17:36 ` Jerin Jacob
2021-06-17 9:48 ` fengchengwen
2021-06-17 11:02 ` Bruce Richardson
2021-06-17 14:18 ` Bruce Richardson
2021-06-18 8:52 ` fengchengwen
2021-06-18 9:30 ` Bruce Richardson
2021-06-22 17:51 ` Jerin Jacob
2021-06-23 3:50 ` fengchengwen
2021-06-23 11:00 ` Jerin Jacob
2021-06-23 14:56 ` Bruce Richardson
2021-06-24 12:19 ` fengchengwen
2021-06-26 3:59 ` [dpdk-dev] dmadev discussion summary fengchengwen
2021-06-28 10:00 ` Bruce Richardson
2021-06-28 11:14 ` Ananyev, Konstantin
2021-06-28 12:53 ` Bruce Richardson
2021-07-02 13:31 ` fengchengwen
2021-07-01 15:01 ` Jerin Jacob
2021-07-01 16:33 ` Bruce Richardson
2021-07-02 7:39 ` Morten Brørup
2021-07-02 10:05 ` Bruce Richardson
2021-07-02 13:45 ` fengchengwen
2021-07-02 14:57 ` Morten Brørup
2021-07-03 0:32 ` fengchengwen
2021-07-03 8:53 ` Morten Brørup
2021-07-03 9:08 ` Jerin Jacob
2021-07-03 12:24 ` Morten Brørup
2021-07-04 7:43 ` Jerin Jacob
2021-07-05 10:28 ` Morten Brørup
2021-07-06 7:11 ` fengchengwen
2021-07-03 9:45 ` fengchengwen
2021-07-03 12:00 ` Morten Brørup
2021-07-04 7:34 ` Jerin Jacob
2021-07-02 7:07 ` Liang Ma
2021-07-02 13:59 ` fengchengwen
2021-06-24 7:03 ` [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library Jerin Jacob
2021-06-24 7:59 ` Morten Brørup
2021-06-24 8:05 ` Jerin Jacob
2021-06-23 5:34 ` Hu, Jiayu
2021-06-23 11:07 ` Jerin Jacob
2021-06-16 2:17 ` Wang, Haiyue
2021-06-16 8:04 ` Bruce Richardson
2021-06-16 8:16 ` Wang, Haiyue
2021-06-16 12:14 ` David Marchand
2021-06-16 13:11 ` Bruce Richardson
2021-06-16 16:48 ` Honnappa Nagarahalli
2021-06-16 19:10 ` Bruce Richardson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YMxtiwvy9WeCVGjA@bricha3-MOBL.ger.corp.intel.com \
--to=bruce.richardson@intel.com \
--cc=david.marchand@redhat.com \
--cc=dev@dpdk.org \
--cc=fengchengwen@huawei.com \
--cc=ferruh.yigit@intel.com \
--cc=hemant.agrawal@nxp.com \
--cc=honnappa.nagarahalli@arm.com \
--cc=jerinj@marvell.com \
--cc=jerinjacobk@gmail.com \
--cc=maxime.coquelin@redhat.com \
--cc=mb@smartsharesystems.com \
--cc=nipun.gupta@nxp.com \
--cc=pkapoor@marvell.com \
--cc=sburla@marvell.com \
--cc=thomas@monjalon.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).