Date: Mon, 28 Jun 2021 13:53:29 +0100
From: Bruce Richardson
To: "Ananyev, Konstantin"
Cc: fengchengwen, Jerin Jacob, Jerin Jacob, Morten Brørup, Nipun Gupta,
 Thomas Monjalon, "Yigit, Ferruh", dpdk-dev, Hemant Agrawal,
 Maxime Coquelin, Honnappa Nagarahalli, David Marchand, Satananda Burla,
 Prasun Kapoor
Subject: Re: [dpdk-dev] dmadev discussion summary

On Mon, Jun 28, 2021 at 12:14:31PM +0100, Ananyev, Konstantin wrote:
> 
> Hi everyone,
> 
> > On Sat, Jun 26, 2021 at 11:59:49AM +0800, fengchengwen wrote:
> > > Hi, all
> > >   I analyzed the current DPDK DMA drivers and drew up this summary in
> > > conjunction with the previous discussion, and this will serve as a
> > > basis for the V2 implementation. Feedback is welcome, thanks.
> > >
> > Fantastic review and summary, many thanks for the work. Some comments
> > inline in the API part below, but nothing too major, I hope.
> > 
> > /Bruce
> > 
> > > Summary:
> > >   1) The dpaa2/octeontx2/Kunpeng devices are all ARM SoCs which may
> > >      act as endpoints of an x86 host (e.g. a smart NIC), so multiple
> > >      memory-transfer requirements may exist, e.g.
> > >      local-to-local/local-to-host/... From the point of view of API
> > >      design, I think we should adopt a similar 'channel' or
> > >      'virt-queue' concept.
> > >   2) Whether to create a separate dmadev for each HW-queue? We
> > >      previously discussed this, and because HW-queues can be managed
> > >      independently (as in Kunpeng_dma and Intel DSA), we preferred
> > >      creating a separate dmadev for each HW-queue. But I'm not sure if
> > >      that's the case with dpaa. I think that can be left to the
> > >      specific driver; no restriction is imposed at the framework API
> > >      layer.
> > >   3) I think we could set up the following abstraction at the dmadev
> > >      device level:
> > >
> > >        ------------    ------------
> > >        |virt-queue|    |virt-queue|
> > >        ------------    ------------
> > >               \            /
> > >                \          /
> > >                 \        /
> > >        ------------    ------------
> > >        | HW-queue |    | HW-queue |
> > >        ------------    ------------
> > >               \            /
> > >                \          /
> > >                 \        /
> > >                   dmadev
> > >
> > >   4) The driver's ops design (here we only list the key points):
> > >      [dev_info_get]: mainly returns the number of HW-queues
> > >      [dev_configure]: nothing important
> > >      [queue_setup]: creates one virt-queue, with the following main
> > >                     parameters:
> > >          HW-queue-index: the HW-queue index used
> > >          nb_desc: the number of HW descriptors
> > >          opaque: driver-specific info
> > >          Note1: this API returns a virt-queue index which will be used
> > >                 in the later APIs. If the user wants to create multiple
> > >                 virt-queues on the same HW-queue, this can be achieved
> > >                 by calling queue_setup with the same HW-queue-index.
> > >          Note2: I think it's hard to define the queue_setup config
> > >                 parameters, and since this is a control API, I think
> > >                 it's OK to use an opaque pointer to implement it.
> > I'm not sure an opaque pointer will work in practice, so I think we
> > should try to standardize the parameters as much as possible. Since it's
> > a control-plane API, using a struct with a superset of parameters may be
> > workable. Let's start with a minimum set and build up from there.
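To make that suggestion a little more concrete, the kind of thing I have in
mind is a small standard struct along the lines of the sketch below. This
is purely illustrative - the struct and field names are hypothetical, not a
proposal for the final API:

    #include <stdint.h>

    /* Hypothetical sketch of a standardized virt-queue config. The idea
     * is a minimal common set, extendable via flags, rather than a fully
     * opaque pointer. */
    struct rte_dmadev_queue_conf {
            uint16_t hw_queue_index; /* HW-queue this virt-queue maps to */
            uint16_t nb_desc;        /* number of HW descriptors */
            uint64_t flags;          /* capability-gated options, e.g. SG */
            void *priv;              /* escape hatch for driver extras */
    };

Drivers needing more than the common fields could advertise that via
capability flags rather than hiding everything behind an opaque pointer.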
> > >      [dma_copy/memset/sg]: all have a vq_id input parameter.
> > >          Note: I notice dpaa can't support single and sg in one
> > >                virt-queue, and I think that's maybe a software
> > >                implementation policy rather than a HW restriction,
> > >                because virt-queues can share the same HW-queue.
> > Presumably for queues which support sg, the single-enqueue APIs can use
> > a single sg list internally?
> > 
> > >      Here we use vq_id to tackle different scenarios, like
> > >      local-to-local/local-to-host etc.
> > >   5) And the dmadev public data-plane API (just a prototype):
> > >
> > >      dma_cookie_t rte_dmadev_memset(dev, vq_id, pattern, dst, len, flags)
> > >        -- flags: used as an extended parameter, it could be uint32_t
> > 
> > Suggest uint64_t rather than uint32_t to ensure we have expansion room?
> > Otherwise +1.
> > 
> > >      dma_cookie_t rte_dmadev_memcpy(dev, vq_id, src, dst, len, flags)
> > +1
> > 
> > >      dma_cookie_t rte_dmadev_memcpy_sg(dev, vq_id, sg, sg_len, flags)
> > >        -- sg: struct dma_scatterlist array
> > I don't think our drivers will be directly implementing this API, but so
> > long as SG support is listed as a capability flag I'm fine with this as
> > an API. [We can't fudge it as a bunch of single copies, because that
> > would cause us to have multiple cookies rather than one.]
> > 
> > >      uint16_t rte_dmadev_completed(dev, vq_id, dma_cookie_t *cookie,
> > >                                    uint16_t nb_cpls, bool *has_error)
> > >        -- nb_cpls: indicates the maximum number of operations to process
> > >        -- has_error: indicates whether an error occurred
> > >        -- return value: the number of successfully completed operations
> > >        -- examples:
> > >           1) If there are already 32 completed ops, the 4th is an
> > >              error, and nb_cpls is 32, then the return will be 3
> > >              (because the 1st/2nd/3rd are OK), and has_error will be
> > >              true.
> > >           2) If there are already 32 completed ops, all successfully
> > >              completed, then the return will be min(32, nb_cpls), and
> > >              has_error will be false.
> > >           3) If there are already 32 completed ops, all failed, then
> > >              the return will be 0, and has_error will be true.
> > +1 for this
> > 
> > >      uint16_t rte_dmadev_completed_status(dev_id, vq_id,
> > >                                           dma_cookie_t *cookie,
> > >                                           uint16_t nb_status,
> > >                                           uint32_t *status)
> > >        -- return value: the number of failed completed operations.
> > >      And here I agree with Morten: we should design an API which
> > >      adapts to DPDK service scenarios. So we don't support sound-card
> > >      DMA, or the 2D memory copy which is mainly used in video
> > >      scenarios.
> > 
> > Can I suggest a few adjustments here to the semantics of this API. In
> > future we may have operations which return a status value, e.g. our
> > hardware can support ops like compare equal/not-equal, which means that
> > this API would be meaningful even in the case of success. Therefore, I
> > suggest that the return value be changed to allow success also to be
> > returned in the array, and that the return value be not the number of
> > failed ops, but the number of ops for which status is being returned.
> > 
> > Also for consideration: when trying to implement this in a prototype in
> > our driver, it would be easier if we relaxed the restriction on the
> > "completed" API so that we can flag has_error when an error is detected,
> > rather than guaranteeing to return all elements right up to the error.
> > For example, if we have a burst of packets and one is problematic, it
> > may be easier to flag the error at the start of the burst and then have
> > a few successful entries at the start of the completed_status array.
> > [Separate from this] We should also have a "has_error" or "more_errors"
> > flag on this API too, to indicate when the user can switch back to using
> > the regular "completed" API. This means that apps switch from one API to
> > the other when "has_error" is true, and only switch back when it becomes
> > false again.
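To make that switching pattern concrete, a rough sketch of the app-side
polling loop I have in mind is below. Note this is illustrative only: the
extra "more_errors" out-parameter on completed_status() is the addition
suggested above, not part of the current prototype, and BURST is just a
placeholder:

    #define BURST 32

    bool has_error = false, more_errors;
    uint32_t status[BURST];
    dma_cookie_t cookie;
    uint16_t n;

    /* fast path: batch-report successes until an error shows up */
    n = rte_dmadev_completed(dev, vq_id, &cookie, BURST, &has_error);
    // ... n ops up to "cookie" finished OK, recycle their buffers ...

    more_errors = has_error;
    while (more_errors) {
        /* slow path: per-op status until the errors are drained */
        n = rte_dmadev_completed_status(dev, vq_id, &cookie, BURST,
                                        status, &more_errors);
        // ... status[0..n-1] holds per-op results, handle failures ...
    }
    // more_errors is false again: switch back to plain completed()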
> > >   6) The dma_cookie_t is a signed int type; when < 0 it means error.
> > >      It is monotonically increasing based on the HW-queue (rather than
> > >      the virt-queue). The driver needs to ensure this because the
> > >      dmadev framework doesn't manage the dma_cookie's creation.
> > +1 to this.
> > I think we should also specify that the cookie is guaranteed to wrap at
> > a power-of-2 value (UINT16_MAX??). This allows it to be used as an index
> > into a circular buffer just by masking.
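For example (sketch only - "struct op_ctx", "my_ctx" and the ring size are
all illustrative names, not proposed API), the app could then track per-op
context like so:

    #define RING_SIZE 1024 /* power of 2, no larger than the wrap value */

    static struct op_ctx ring[RING_SIZE];

    cookie = rte_dmadev_memcpy(dev, vq_id, src, dst, len, flags);
    if (cookie >= 0)
            /* cookies increase monotonically and wrap at a power of 2,
             * so masking gives an O(1) slot lookup at completion time */
            ring[cookie & (RING_SIZE - 1)] = my_ctx;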
> > >   7) Because the data-plane APIs are not thread-safe, and the user
> > >      determines the virt-queue to HW-queue mapping (at the queue-setup
> > >      stage), it is the user's duty to ensure thread safety.
> > >   8) One example:
> > >
> > >      vq_id = rte_dmadev_queue_setup(dev, config.{HW-queue-index=x, opaque});
> > >      if (vq_id < 0) {
> > >          // creating the virt-queue failed
> > >          return;
> > >      }
> > >      // submit memcpy task
> > >      cookie = rte_dmadev_memcpy(dev, vq_id, src, dst, len, flags);
> > >      if (cookie < 0) {
> > >          // submit failed
> > >          return;
> > >      }
> > >      // check for completion of the task
> > >      ret = rte_dmadev_completed(dev, vq_id, &cookie, 1, &has_error);
> > >      if (!has_error && ret == 1) {
> > >          // the memcpy completed successfully
> > >      }
> > +1
> 
> I have two questions on the proposed API:
> 1. Would it make sense to split the submission API into two stages:
>      a) reserve and prepare
>      b) actual submit,
>    similar to what the DPDK ioat/idxd PMDs have right now:
> 
>      /* reserve and prepare */
>      for (i = 0; i < num; i++) {cookie = rte_dmadev_memcpy(...);}
>      /* submit to HW */
>      rte_dmadev_issue_pending(...);
> 
> For those PMDs that prefer to do the actual submission to HW in
> rte_dmadev_memcpy(), issue_pending() will be just a NOP.
> 
> As I see it, this will make the API more flexible and will help PMD
> developers choose the most suitable approach for their HW.
> As a side note, the Linux DMA framework uses such an approach too.
> 
Thanks for pointing out the omission, Konstantin. I understood that to be
part of the original API proposals, since we weren't doing burst enqueues,
but it would be good to see it explicitly called out here.

> 2. I wonder what the MT-safety requirements for the submit/completion
> APIs would be. I.e. should all PMDs support the case where one thread
> does rte_dmadev_memcpy(...) while another one does rte_dmadev_completed(...)
> on the same queue simultaneously?
> Or should such a combination be ST only?
> Or might this be a new capability flag per device?
> 
I suggest we just add a capability flag for it into our library. It was
something we looked to support with ioat in the past, and may do so again
in the future.

/Bruce