Date: Wed, 23 Jun 2021 10:41:19 +0100
From: Bruce Richardson
To: Jerin Jacob
Cc: fengchengwen, Morten Brørup, Thomas Monjalon, Ferruh Yigit, dpdk-dev,
 Nipun Gupta, Hemant Agrawal, Maxime Coquelin, Honnappa Nagarahalli,
 Jerin Jacob, David Marchand, Satananda Burla, Prasun Kapoor
Subject: Re: [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library

On Tue, Jun 22, 2021 at 10:55:24PM +0530, Jerin Jacob wrote:
> On Fri, Jun 18, 2021 at 3:11 PM fengchengwen wrote:
> >
> > On 2021/6/18 13:52, Jerin Jacob wrote:
> > > On Thu, Jun 17, 2021 at 2:46 PM Bruce Richardson wrote:
> > >>
> > >> On Wed, Jun 16, 2021 at 08:07:26PM +0530, Jerin Jacob wrote:
> > >>> On Wed, Jun 16, 2021 at 3:47 PM fengchengwen wrote:
> > >>>>
> > >>>> On 2021/6/16 15:09, Morten Brørup wrote:
> > >>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
> > >>>>>> Sent: Tuesday, 15 June 2021 18.39
> > >>>>>>
> > >>>>>> On Tue, Jun 15, 2021 at 09:22:07PM +0800, Chengwen Feng wrote:
> > >>>>>>> This patch introduces 'dmadevice', which is a generic type of
> > >>>>>>> DMA device.
> > >>>>>>>
> > >>>>>>> The APIs of the dmadev library expose some generic operations
> > >>>>>>> which can enable configuration and I/O with the DMA devices.
> > >>>>>>>
> > >>>>>>> Signed-off-by: Chengwen Feng
> > >>>>>>> ---
> > >>>>>> Thanks for sending this.
> > >>>>>>
> > >>>>>> Of most interest to me right now are the key data-plane APIs.
> > >>>>>> While we are still in the prototyping phase, below is a draft of
> > >>>>>> what we are thinking for the key
> > >>>>>> enqueue/perform_ops/completed_ops APIs.
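[Illustration: the draft APIs referred to above are not reproduced in this
excerpt. As a stand-in, the following is a toy, memcpy-backed model of the
enqueue/perform_ops/completed_ops shape being discussed, with void-pointer
arguments, id-based completion tracking, and no user handles. All names and
signatures are assumptions, not the actual draft.]

/* Toy software model; a real driver would program hardware instead. */
#include <stdint.h>
#include <string.h>

#define TOY_RING_SZ 1024	/* power of two, so 16-bit id wrap stays aligned */

struct toy_op { void *src, *dst; unsigned int len; };

static struct toy_op toy_ring[TOY_RING_SZ];
static uint16_t toy_head;	/* next id to hand out */
static uint16_t toy_tail;	/* next op to execute */
static uint16_t toy_seen;	/* completions already reported to the app */

/* Enqueue a copy; returns the op's id. Void pointers (VA mode) and an id
 * rather than a user handle, per the points below. No full-ring check. */
static int
toy_enqueue_copy(void *src, void *dst, unsigned int len)
{
	uint16_t id = toy_head++;
	struct toy_op *op = &toy_ring[id % TOY_RING_SZ];

	op->src = src;
	op->dst = dst;
	op->len = len;
	return id;
}

/* "Doorbell": make all enqueued ops execute. The stub just copies. */
static void
toy_perform_ops(void)
{
	while (toy_tail != toy_head) {
		struct toy_op *op = &toy_ring[toy_tail % TOY_RING_SZ];

		memcpy(op->dst, op->src, op->len);
		toy_tail++;
	}
}

/* Single combined completion API: returns the number of newly completed
 * ops and the id of the last one, so the app can free its own metadata. */
static int
toy_completed_ops(uint16_t *last_id)
{
	int n = (uint16_t)(toy_tail - toy_seen);

	if (n > 0)
		*last_id = (uint16_t)(toy_tail - 1);
	toy_seen = toy_tail;
	return n;
}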
> > >>>>>>
> > >>>>>> Some key differences I note below vs. your original RFC:
> > >>>>>> * Use of void pointers rather than iova addresses. While using
> > >>>>>>   iovas makes sense in the general case when using hardware, in
> > >>>>>>   that it can work with both physical and virtual addresses, if
> > >>>>>>   we change the APIs to use void pointers instead it will still
> > >>>>>>   work for DPDK in VA mode, while at the same time allowing the
> > >>>>>>   use of software fallbacks in error cases, and also a stub
> > >>>>>>   driver that uses memcpy in the background. Finally, using
> > >>>>>>   iovas makes the APIs a lot more awkward to use with anything
> > >>>>>>   but mbufs or similar buffers where we already have a
> > >>>>>>   pre-computed physical address.
> > >>>>>> * Use of id values rather than user-provided handles. Allowing
> > >>>>>>   the user/app to manage the amount of data stored per operation
> > >>>>>>   is a better solution, I feel, than prescribing a certain
> > >>>>>>   amount of in-driver tracking. Some apps may not care about
> > >>>>>>   anything other than a job being completed, while other apps
> > >>>>>>   may have significant metadata to be tracked. Taking the
> > >>>>>>   user-context handles out of the API also makes the driver
> > >>>>>>   code simpler.
> > >>>>>> * I've kept a single combined API for completions, which differs
> > >>>>>>   from the separate error-handling completion API you propose. I
> > >>>>>>   need to give the two-function approach a bit of thought, but
> > >>>>>>   likely both could work. If we (likely) never expect failed
> > >>>>>>   ops, then the specifics of error handling should not matter
> > >>>>>>   that much.
> > >>>>>>
> > >>>>>> For the rest, the control / setup APIs are likely to be rather
> > >>>>>> uncontroversial, I suspect. However, I think that rather than
> > >>>>>> xstats APIs, the library should first provide a set of
> > >>>>>> standardized stats like ethdev does. If driver-specific stats
> > >>>>>> are needed, we can add xstats to the API later.
> > >>>>>>
> > >>>>>> Appreciate your further thoughts on this, thanks.
> > >>>>>>
> > >>>>>> Regards,
> > >>>>>> /Bruce
> > >>>>>
> > >>>>> I generally agree with Bruce's points above.
> > >>>>>
> > >>>>> I would like to share a couple of ideas for further discussion:
> > >>>
> > >>>
> > >>> I believe some of the other requirements and comments for generic
> > >>> DMA will be:
> > >>>
> > >>> 1) Support for _channels_. Each channel may have different
> > >>> capabilities and functionalities. Typical cases are that each
> > >>> channel has separate source and destination devices, like DMA
> > >>> between PCIe EP and host memory, host memory and host memory, or
> > >>> PCIe EP and PCIe EP. So we need some notion of a channel in the
> > >>> specification.
> > >>>
> > >>
> > >> Can you share a bit more detail on what constitutes a channel in
> > >> this case? Is it equivalent to a device queue (which we are
> > >> flattening to individual devices in this API), or to a specific
> > >> configuration on a queue?
> > >
> > > It is not a queue. It is one of the attributes of a transfer, i.e.
> > > in the same queue, a given transfer can specify different "source"
> > > and "destination" devices, like CPU to sound card, CPU to network
> > > card, etc.
> > >
> > >
> > >>
> > >>> 2) I assume the current data-plane APIs are not thread-safe. Is
> > >>> that right?
> > >>>
> > >> Yes.
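[Illustration: one possible reading of the "channel" notion described
above, i.e. a per-transfer attribute naming the endpoint types, distinct
from the queue the job is submitted on. Every name here is invented.]

#include <stdint.h>

enum dma_endpoint_type {
	DMA_EP_HOST_MEM,	/* local host memory */
	DMA_EP_PCIE_EP,		/* memory behind a PCIe endpoint */
};

struct dma_channel_attr {
	enum dma_endpoint_type src_type;	/* e.g. DMA_EP_PCIE_EP */
	enum dma_endpoint_type dst_type;	/* e.g. DMA_EP_HOST_MEM */
	uint16_t pcie_fn;	/* which endpoint/function, when one is involved */
};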
> > >>
> > >>>
> > >>> 3) The cookie scheme outlined earlier looks good to me, instead of
> > >>> having a generic dequeue() API.
> > >>>
> > >>> 4) We can split
> > >>> rte_dmadev_enqueue_copy(uint16_t dev_id, void *src, void *dst,
> > >>> unsigned int length);
> > >>> into a two-stage API, where one stage will be used in the fast
> > >>> path and the other in the slow path.
> > >>>
> > >>> - The slow-path API will take the channel and the other attributes
> > >>> for the transfer.
> > >>>
> > >>> Example syntax would be:
> > >>>
> > >>> struct rte_dmadev_desc {
> > >>>         channel id;
> > >>>         ops;    // copy, xor, fill etc.
> > >>>         other arguments specific to the DMA transfer
> > >>>                 // can be set based on capability
> > >>> };
> > >>>
> > >>> rte_dmadev_desc_t rte_dmadev_prepare(uint16_t dev_id, struct
> > >>> rte_dmadev_desc *desc);
> > >>>
> > >>> - The fast path takes the arguments that change per transfer,
> > >>> along with the slow-path handle:
> > >>>
> > >>> rte_dmadev_enqueue(uint16_t dev_id, void *src, void *dst,
> > >>> unsigned int length, rte_dmadev_desc_t desc)
> > >>>
> > >>> This will help the driver to:
> > >>> - in the former API, form the device-specific descriptors in the
> > >>>   slow path for a given channel and the attributes that are fixed
> > >>>   per transfer;
> > >>> - in the latter API, blend the "variable" arguments such as src
> > >>>   and dst addresses into the slow-path-created descriptors.
> > >>>
> > >>
> > >> This seems like an API for a context-aware device, where the
> > >> channel is the config data/context that is preserved across
> > >> operations - is that correct? At least from the Intel DMA
> > >> accelerators side, we have no concept of this context, and each
> > >> operation is completely self-described. The location or type of
> > >> memory for copies is irrelevant, you just pass the src/dst
> > >> addresses to reference.
> > >
> > > It is not a context-aware device. Each HW job is self-described.
> > > You can view it as different attributes of the transfer.
> > >
> > >
> > >>
> > >>> The above will give better performance and is the best trade-off
> > >>> between performance and per-transfer variables.
> > >>
> > >> We may need to have different APIs for context-aware and
> > >> context-unaware processing, with which one to use determined by
> > >> capabilities discovery. Given that for these DMA devices the
> > >> offload cost is critical, more so than for any other dev class
> > >> I've looked at before, I'd like to avoid having APIs with extra
> > >> parameters that need to be passed about, since that just adds
> > >> extra CPU cycles to the offload.
> > >
> > > If the driver does not support additional attributes and/or the
> > > application does not need them, rte_dmadev_desc_t can be NULL, so
> > > that it won't have any cost in the datapath. I think we can go to
> > > different APIs for different cases if we cannot abstract the
> > > problem without performance impact. Otherwise, it will be too much
> > > pain for applications.
> >
> > Yes, currently we plan to use different APIs for different cases, e.g.
> >   rte_dmadev_memcpy()   -- local-to-local memory copy
> >   rte_dmadev_memset()   -- fill local memory with a pattern
> > maybe:
> >   rte_dmadev_imm_data() -- copy of very little data
> >   rte_dmadev_p2pcopy()  -- peer-to-peer copy between different PCIe
> >                            addresses
> >
> > These API capabilities will be reflected in the device capability set,
> > so that the application can discover them through the standard API.
>
> There will be a lot of combinations of that; it will be like an M x N
> cross of base cases. It won't scale.
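[Illustration: a minimal sketch of how the two-stage prepare()/enqueue()
split proposed above could look from the driver side, assuming a simple
descriptor-ring device. Every type and name is hypothetical.]

#include <stdint.h>

/* Imaginary device descriptor layout; no real hardware implied. */
struct hw_desc {
	uint32_t ctrl;	/* op, channel, sideband bits: fixed per channel */
	uint64_t src;
	uint64_t dst;
	uint32_t len;
};

/* Built once in the slow path by the hypothetical prepare() call. */
struct dma_prepared {
	struct hw_desc tmpl;
};

/* Fast path: copy the template, then patch only the per-transfer fields.
 * The fixed attributes are never rebuilt per operation, which is the
 * point of the split. */
static inline void
dma_enqueue_prepared(struct hw_desc *ring, uint16_t idx,
		     const struct dma_prepared *p,
		     void *src, void *dst, uint32_t len)
{
	struct hw_desc *d = &ring[idx];

	*d = p->tmpl;
	d->src = (uint64_t)(uintptr_t)src;
	d->dst = (uint64_t)(uintptr_t)dst;
	d->len = len;
}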
>
What are the various cases that are so significantly different? Using the
examples above, the "imm_data" and "p2p_copy" operations are still copy
ops, and the fact of it being a small copy or a p2p one can be expressed
just using flags? [Also, you are not likely to want to offload a small
copy, are you?]

> >
> > >
> > > Just to understand: I think we need to look at the HW capabilities
> > > and how to have a common API. I assume the HW will have some HW job
> > > descriptors which will be filled in by SW and submitted to HW. In
> > > our HW, the job descriptor has the following main elements:
> > >
> > > - Channel // We don't expect the application to change this per
> > >   transfer
> > > - Source address - it can be scatter-gather too - will be changed
> > >   per transfer
> > > - Destination address - it can be scatter-gather too - will be
> > >   changed per transfer
> > > - Transfer length - it can be scatter-gather too - will be changed
> > >   per transfer
> > > - IOVA address where HW posts the job completion status, per job
> > >   descriptor - will be changed per transfer
> > > - Other sideband information related to the channel // We don't
> > >   expect the application to change this per transfer
> > > - As an option, job completion can be posted as an event to an
> > >   rte_event_queue too // We don't expect the application to change
> > >   this per transfer
> >
> > The 'option' field looks like a software interface field, not a HW
> > descriptor field.
>
> It is in the HW descriptor.
>
> > >
> > > @Richardson, Bruce @fengchengwen @Hemant Agrawal
> > >
> > > Could you share the options for your HW descriptors which you are
> > > planning to expose through an API like the above, so that we can
> > > easily converge on the fastpath API?
> > >
> >
> > The Kunpeng HW descriptor is self-describing and doesn't need to
> > refer to context info.
> >
> > Maybe the fields which are fixed for a given transfer type could be
> > set up by the driver and not exposed to the application.
>
> Yes, I agree. I think that is the reason why I thought of having an
> rte_dmadev_prepare() call to convert the DPDK DMA transfer attributes
> to HW-specific descriptors.

What are all these attributes? Do you have a reference link for them?
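[Illustration: the job-descriptor elements listed above, restated as a
hypothetical C layout to make the fixed-per-channel vs. per-transfer
split visible. Field names are invented and no real device is implied.]

#include <stdint.h>

struct hw_job_desc {
	/* Fixed per channel: candidates for a prepare()-time template. */
	uint16_t channel;
	uint32_t sideband;	/* channel-related sideband information */
	uint8_t  post_event;	/* optionally post completion as an event */

	/* Changed per transfer: */
	uint64_t src_iova;	/* may point at a scatter-gather list */
	uint64_t dst_iova;	/* may point at a scatter-gather list */
	uint32_t length;
	uint64_t comp_iova;	/* HW writes per-job completion status here */
};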