* Re: [dpdk-dev] RFC: Kunpeng DMA driver API design decision
2021-06-12 8:41 ` Jerin Jacob
@ 2021-06-12 11:53 ` Fengchengwen
2021-06-14 18:18 ` Bruce Richardson
1 sibling, 0 replies; 5+ messages in thread
From: Fengchengwen @ 2021-06-12 11:53 UTC (permalink / raw)
To: Jerin Jacob, Thomas Monjalon
Cc: Ferruh Yigit, dev, Nipun Gupta, Hemant Agrawal, Richardson,
Bruce, Maxime Coquelin, Honnappa Nagarahalli, Jerin Jacob,
David Marchand
OK, I will send one. Thanks.
From: Jerin Jacob <jerinjacobk@gmail.com>
To: Thomas Monjalon <thomas@monjalon.net>
Cc: Fengchengwen <fengchengwen@huawei.com>; Ferruh Yigit <ferruh.yigit@intel.com>; dev <dev@dpdk.org>; Nipun Gupta <nipun.gupta@nxp.com>; Hemant Agrawal <hemant.agrawal@nxp.com>; Richardson, Bruce <bruce.richardson@intel.com>; Maxime Coquelin <maxime.coquelin@redhat.com>; Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>; Jerin Jacob <jerinj@marvell.com>; David Marchand <david.marchand@redhat.com>
Date: 2021-06-12 16:41:32
Subject: Re: [dpdk-dev] RFC: Kunpeng DMA driver API design decision
On Sat, Jun 12, 2021 at 2:01 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 12/06/2021 09:01, fengchengwen:
> > Hi all,
> >
> > We are preparing to support the Kunpeng DMA engine under the rawdev framework, and
> > have observed that there are two different implementations of the dataplane API:
> > 1. rte_rawdev_enqueue/dequeue_buffers, which is implemented by the dpaa2_qdma and
> > octeontx2_dma drivers.
> > 2. rte_ioat_enqueue_xxx/rte_ioat_completed_ops, which is implemented by the ioat
> > driver.
> >
> > For the following reasons (mainly performance), we plan to implement a dataplane
> > API like ioat's (not identical; there are some differences):
> > 1. rte_rawdev_enqueue_buffers takes opaque buffer references whose contents are
> > vendor-specific, so application parameters must first be translated into the
> > opaque format before the driver writes them to hardware; this extra step may hurt
> > performance (as sketched below).
> > 2. rte_rawdev_xxx provides no memory-barrier API, so barriers would have to be
> > expressed through the opaque data (e.g. a flag on every request), which introduces
> > some complexity.
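To make the contrast concrete, here is a rough sketch of the two enqueue styles (illustrative only: vendor_desc stands in for whichever opaque descriptor layout a given rawdev driver defines, and dev_id, queue_context, the IOVAs and the buffer pointers are assumed to be set up by the application):

    /* rawdev style (rte_rawdev.h): the app fills a vendor-specific
     * descriptor and hands it over as an opaque buffer; the driver must
     * translate it again before writing it to the hardware ring. */
    struct vendor_desc desc = { 0 };    /* vendor-specific fields */
    struct rte_rawdev_buf buf = { .buf_addr = &desc };
    struct rte_rawdev_buf *bufs[1] = { &buf };
    rte_rawdev_enqueue_buffers(dev_id, bufs, 1, queue_context);

    /* ioat style (rte_ioat_rawdev.h): a typed call carrying the copy
     * parameters directly, with no intermediate translation step. */
    rte_ioat_enqueue_copy(dev_id, src_iova, dst_iova, length,
                          (uintptr_t)src_buf, (uintptr_t)dst_buf);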
> >
> > Also, examples/ioat is used to compare DMA and CPU-memcpy performance.
> > Could we generalize it so that it supports multiple vendors?
> >
> > I don't know whether the community would accept this kind of implementation, so if
> > you have any comments, please provide feedback.
>
> I would love having a common generic API.
> I would prefer having drivers under a drivers/dma/ directory,
> rather than under rawdev.
+1 for rte_dmadev.
Now that we have multiple DMA drivers, it is better to have a common
generic API.
@fengchengwen If you would like to pursue a generic DMA API, please
propose an RFC for the dmadev public API before implementing it.
We can help you review the API proposal.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [dpdk-dev] RFC: Kunpeng DMA driver API design decision
2021-06-12 8:41 ` Jerin Jacob
2021-06-12 11:53 ` Fengchengwen
@ 2021-06-14 18:18 ` Bruce Richardson
1 sibling, 0 replies; 5+ messages in thread
From: Bruce Richardson @ 2021-06-14 18:18 UTC (permalink / raw)
To: Jerin Jacob
Cc: Thomas Monjalon, fengchengwen, Ferruh Yigit, dev, Nipun Gupta,
Hemant Agrawal, Maxime Coquelin, Honnappa Nagarahalli,
Jerin Jacob, David Marchand
On Sat, Jun 12, 2021 at 02:11:10PM +0530, Jerin Jacob wrote:
> On Sat, Jun 12, 2021 at 2:01 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> >
> > 12/06/2021 09:01, fengchengwen:
> > > Hi all,
> > >
> > > We are preparing to support the Kunpeng DMA engine under the rawdev framework, and
> > > have observed that there are two different implementations of the dataplane API:
> > > 1. rte_rawdev_enqueue/dequeue_buffers, which is implemented by the dpaa2_qdma and
> > > octeontx2_dma drivers.
> > > 2. rte_ioat_enqueue_xxx/rte_ioat_completed_ops, which is implemented by the ioat
> > > driver.
> > >
> > > For the following reasons (mainly performance), we plan to implement a dataplane
> > > API like ioat's (not identical; there are some differences):
> > > 1. rte_rawdev_enqueue_buffers takes opaque buffer references whose contents are
> > > vendor-specific, so application parameters must first be translated into the
> > > opaque format before the driver writes them to hardware; this extra step may hurt
> > > performance.
> > > 2. rte_rawdev_xxx provides no memory-barrier API, so barriers would have to be
> > > expressed through the opaque data (e.g. a flag on every request), which introduces
> > > some complexity.
> > >
> > > Also, examples/ioat is used to compare DMA and CPU-memcpy performance.
> > > Could we generalize it so that it supports multiple vendors?
> > >
> > > I don't know whether the community would accept this kind of implementation, so if
> > > you have any comments, please provide feedback.
> >
> > I would love having a common generic API.
> > I would prefer having drivers under a drivers/dma/ directory,
> > rather than under rawdev.
>
> +1 for rte_dmadev.
>
> Now that we have multiple DMA drivers, it is better to have a common
> generic API.
>
> @fengchengwen If you would like to pursue a generic DMA API, please
> propose an RFC for the dmadev public API before implementing it.
> We can help you review the API proposal.
>
I'd like to volunteer to help with this effort as well, having a large
interest in it from my work on the ioat driver (thanks for the positive
words on the API :-)).

Based on our experience with the ioat driver, we are also looking into
possible prototypes for a dmadev device type, and hopefully will have an
RFC to share soon. As might be expected, this will be very similar to the
existing ioat APIs, though with one change to the dataplane API that I'll
call out here initially. The use of explicit source and destination
handles for each operation is a little inflexible, so we are looking at
replacing that mechanism with one where the APIs return a (sequentially
increasing) job id after each enqueue, and where the completion function
returns the id of the last completed job (or error info in case of an
error). This would have the advantage of allowing each app or library
using the dmadev to store as much or as little context information as
desired in its own circular buffer or buffers, rather than being limited
to just two uint64_t's. It would also simplify the drivers, since they
would have less data to manage.
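A rough sketch of what such an API might look like (all names, signatures,
and the app_ctx structure here are purely illustrative, not a final
proposal):

    /* Enqueue a copy; on success return a sequentially increasing job id,
     * on failure (e.g. the ring is full) return a negative error code. */
    int dmadev_copy(uint16_t dev_id, rte_iova_t src, rte_iova_t dst,
                    uint32_t length);

    /* Poll for completions; return how many jobs completed since the last
     * call, write the id of the last completed job to *last_idx, and flag
     * errors via *has_error. */
    uint16_t dmadev_completed(uint16_t dev_id, uint16_t max_ops,
                              uint16_t *last_idx, bool *has_error);

    /* The app keeps whatever context it needs in its own ring, indexed by
     * job id, instead of passing two uint64_t handles per operation. */
    struct app_ctx ring[RING_SIZE];     /* RING_SIZE: a power of two */

    int id = dmadev_copy(dev_id, src_iova, dst_iova, len);
    if (id >= 0)
            ring[id & (RING_SIZE - 1)].mbuf = m;

    uint16_t last_idx;
    bool has_error;
    uint16_t n = dmadev_completed(dev_id, 64, &last_idx, &has_error);
    /* jobs (last_idx - n + 1) .. last_idx are now done; the app can free
     * or reuse the contexts stored at those slots */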
I hope to have a more complete API description to send out very shortly
to kick off reviews and discussion.
Regards,
/Bruce
^ permalink raw reply [flat|nested] 5+ messages in thread