From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 1E47FA0C41;
	Wed, 23 Jun 2021 05:30:31 +0200 (CEST)
Received: from [217.70.189.124] (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id 8DEBC4003F;
	Wed, 23 Jun 2021 05:30:30 +0200 (CEST)
Received: from szxga03-in.huawei.com (szxga03-in.huawei.com [45.249.212.189])
	by mails.dpdk.org (Postfix) with ESMTP id 5DA334003E
	for ; Wed, 23 Jun 2021 05:30:28 +0200 (CEST)
Received: from dggemv711-chm.china.huawei.com (unknown [172.30.72.55])
	by szxga03-in.huawei.com (SkyGuard) with ESMTP id 4G8pZg2YV1z71dm;
	Wed, 23 Jun 2021 11:26:19 +0800 (CST)
Received: from dggpeml500024.china.huawei.com (7.185.36.10)
	by dggemv711-chm.china.huawei.com (10.1.198.66) with Microsoft SMTP Server
	(version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256)
	id 15.1.2176.2; Wed, 23 Jun 2021 11:30:25 +0800
Received: from [127.0.0.1] (10.40.190.165)
	by dggpeml500024.china.huawei.com (7.185.36.10) with Microsoft SMTP Server
	(version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256)
	id 15.1.2176.2; Wed, 23 Jun 2021 11:30:24 +0800
To: Jerin Jacob 
CC: Bruce Richardson , =?UTF-8?Q?Morten_Br=c3=b8rup?= ,
	Thomas Monjalon , Ferruh Yigit , dpdk-dev ,
	Nipun Gupta , Hemant Agrawal , Maxime Coquelin ,
	Honnappa Nagarahalli , Jerin Jacob , David Marchand ,
	"Satananda Burla" , Prasun Kapoor 
References: <1623763327-30987-1-git-send-email-fengchengwen@huawei.com>
	<98CBD80474FA8B44BF855DF32C47DC35C61860@smartserver.smartshare.dk>
	<3cb0bd01-2b0d-cf96-d173-920947466041@huawei.com>
From: fengchengwen 
Message-ID: 
Date: Wed, 23 Jun 2021 11:30:24 +0800
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101
	Thunderbird/68.11.0
MIME-Version: 1.0
In-Reply-To: 
Content-Type: text/plain; charset="utf-8"
Content-Language: en-US
Content-Transfer-Encoding: 8bit
X-Originating-IP: [10.40.190.165]
X-ClientProxiedBy: 
 dggems706-chm.china.huawei.com (10.3.19.183) To
 dggpeml500024.china.huawei.com (7.185.36.10)
X-CFilter-Loop: Reflected
Subject: Re: [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions 
List-Unsubscribe: ,
List-Archive: 
List-Post: 
List-Help: 
List-Subscribe: ,
Errors-To: dev-bounces@dpdk.org
Sender: "dev" 

On 2021/6/23 1:25, Jerin Jacob wrote:
> On Fri, Jun 18, 2021 at 3:11 PM fengchengwen  wrote:
>>
>> On 2021/6/18 13:52, Jerin Jacob wrote:
>>> On Thu, Jun 17, 2021 at 2:46 PM Bruce Richardson  wrote:
>>>>
>>>> On Wed, Jun 16, 2021 at 08:07:26PM +0530, Jerin Jacob wrote:
>>>>> On Wed, Jun 16, 2021 at 3:47 PM fengchengwen  wrote:
>>>>>>
>>>>>> On 2021/6/16 15:09, Morten Brørup wrote:
>>>>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
>>>>>>>> Sent: Tuesday, 15 June 2021 18.39
>>>>>>>>
>>>>>>>> On Tue, Jun 15, 2021 at 09:22:07PM +0800, Chengwen Feng wrote:
>>>>>>>>> This patch introduces 'dmadevice' which is a generic type of DMA
>>>>>>>>> device.
>>>>>>>>>
>>>>>>>>> The APIs of the dmadev library expose some generic operations which
>>>>>>>>> can enable configuration and I/O with the DMA devices.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Chengwen Feng
>>>>>>>>> ---
>>>>>>>> Thanks for sending this.
>>>>>>>>
>>>>>>>> Of most interest to me right now are the key data-plane APIs. While we
>>>>>>>> are still in the prototyping phase, below is a draft of what we are
>>>>>>>> thinking for the key enqueue/perform_ops/completed_ops APIs.
>>>>>>>>
>>>>>>>> Some key differences I note in below vs your original RFC:
>>>>>>>> * Use of void pointers rather than iova addresses. While using iova's
>>>>>>>>   makes sense in the general case when using hardware, in that it can
>>>>>>>>   work with both physical addresses and virtual addresses, if we change
>>>>>>>>   the APIs to use void pointers instead it will still work for DPDK in
>>>>>>>>   VA mode, while at the same time allowing use of software fallbacks in
>>>>>>>>   error cases, and also a stub driver that uses memcpy in the
>>>>>>>>   background. Finally, using iova's makes the APIs a lot more awkward
>>>>>>>>   to use with anything but mbufs or similar buffers where we already
>>>>>>>>   have a pre-computed physical address.
>>>>>>>> * Use of id values rather than user-provided handles. Allowing the
>>>>>>>>   user/app to manage the amount of data stored per operation is a
>>>>>>>>   better solution, I feel, than prescribing a certain amount of
>>>>>>>>   in-driver tracking. Some apps may not care about anything other than
>>>>>>>>   a job being completed, while other apps may have significant metadata
>>>>>>>>   to be tracked. Taking the user-context handles out of the API also
>>>>>>>>   makes the driver code simpler.
>>>>>>>> * I've kept a single combined API for completions, which differs from
>>>>>>>>   the separate error-handling completion API you propose. I need to
>>>>>>>>   give the two-function approach a bit of thought, but likely both
>>>>>>>>   could work. If we (likely) never expect failed ops, then the
>>>>>>>>   specifics of error handling should not matter that much.
>>>>>>>>
>>>>>>>> For the rest, the control / setup APIs are likely to be rather
>>>>>>>> uncontroversial, I suspect. However, I think that rather than xstats
>>>>>>>> APIs, the library should first provide a set of standardized stats like
>>>>>>>> ethdev does. If driver-specific stats are needed, we can add xstats
>>>>>>>> later to the API.
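[Editor's note: to make the two design points above concrete (void pointers for src/dst, plus returned job ids instead of user-provided handles), here is a minimal toy model in C. All names are hypothetical, none of this is the real DPDK dmadev API, and the "driver" is just the memcpy stub Bruce mentions.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy device state: the only bookkeeping is two counters of job ids. */
struct toy_dmadev {
	uint16_t next_id;      /* id handed out by the next enqueue        */
	uint16_t completed_id; /* count of ops the "hardware" has finished */
};

/* Enqueue a copy described by plain void pointers (works in VA mode).
 * The returned id is the only per-op handle the application tracks;
 * any richer metadata stays on the application side, keyed by this id. */
static int
toy_enqueue_copy(struct toy_dmadev *dev, void *dst, const void *src,
		 size_t len)
{
	memcpy(dst, src, len);   /* stub driver: memcpy in the background */
	return dev->next_id++;
}

/* Single combined completion poll: returns the number of newly completed
 * ops and reports the id of the last one; this toy backend completes
 * everything instantly. */
static int
toy_completed_ops(struct toy_dmadev *dev, uint16_t *last_id)
{
	int n = dev->next_id - dev->completed_id;

	dev->completed_id = dev->next_id;
	*last_id = dev->completed_id ? (uint16_t)(dev->completed_id - 1) : 0;
	return n;
}
```

An application would batch several toy_enqueue_copy() calls, then reap them all with one toy_completed_ops() call, mapping the returned last id back to its own per-job metadata.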
>>>>>>>>
>>>>>>>> Appreciate your further thoughts on this, thanks.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> /Bruce
>>>>>>>
>>>>>>> I generally agree with Bruce's points above.
>>>>>>>
>>>>>>> I would like to share a couple of ideas for further discussion:
>>>>>
>>>>> I believe some of the other requirements and comments for generic DMA will be:
>>>>>
>>>>> 1) Support for the _channel_. Each channel may have different
>>>>> capabilities and functionalities.
>>>>> Typical cases are: each channel has separate source and destination
>>>>> devices, like DMA between PCIe EP and host memory, host memory and
>>>>> host memory, or PCIe EP and PCIe EP.
>>>>> So we need some notion of the channel in the specification.
>>>>
>>>> Can you share a bit more detail on what constitutes a channel in this case?
>>>> Is it equivalent to a device queue (which we are flattening to individual
>>>> devices in this API), or to a specific configuration on a queue?
>>>
>>> It is not a queue. It is one of the attributes of a transfer.
>>> I.e. in the same queue, a given transfer can specify different
>>> "source" and "destination" devices, like CPU to sound card,
>>> CPU to network card, etc.
>>>
>>>>> 2) I assume the current data-plane APIs are not thread-safe. Is that right?
>>>>
>>>> Yes.
>>>>
>>>>> 3) The cookie scheme outlined earlier looks good to me, instead of having a
>>>>> generic dequeue() API.
>>>>>
>>>>> 4) We can split rte_dmadev_enqueue_copy(uint16_t dev_id, void *src,
>>>>> void *dst, unsigned int length);
>>>>> into a two-stage API, where one part is used in the fast path and the
>>>>> other in the slow path.
>>>>>
>>>>> - The slow-path API will take the channel and the other fixed attributes
>>>>> of the transfer.
>>>>>
>>>>> Example syntax would be:
>>>>>
>>>>> struct rte_dmadev_desc {
>>>>>     channel id;
>>>>>     ops; // copy, xor, fill etc
>>>>>     other arguments specific to dma transfer // can be set based on capability
>>>>> };
>>>>>
>>>>> rte_dmadev_desc_t rte_dmadev_prepare(uint16_t dev_id, struct
>>>>> rte_dmadev_desc *desc);
>>>>>
>>>>> - The fast path takes the arguments that need to change per transfer,
>>>>> along with the slow-path handle:
>>>>>
>>>>> rte_dmadev_enqueue(uint16_t dev_id, void *src, void *dst, unsigned
>>>>> int length, rte_dmadev_desc_t desc)
>>>>>
>>>>> This will help the driver:
>>>>> - the former API forms the device-specific descriptors in the slow path
>>>>> for a given channel and the fixed per-transfer attributes
>>>>> - the latter API blends the "variable" arguments such as src and dest
>>>>> addresses with the slow-path-created descriptors
>>>>
>>>> This seems like an API for a context-aware device, where the channel is the
>>>> config data/context that is preserved across operations - is that correct?
>>>> At least from the Intel DMA accelerators side, we have no concept of this
>>>> context, and each operation is completely self-described. The location or
>>>> type of memory for copies is irrelevant, you just pass the src/dst
>>>> addresses to reference.
>>>
>>> It is not a context-aware device. Each HW job is self-described.
>>> You can view it as different attributes of a transfer.
>>>
>>>>> The above will give better performance and is the best trade-off
>>>>> between performance and per-transfer variables.
>>>>
>>>> We may need to have different APIs for context-aware and context-unaware
>>>> processing, with which to use determined by the capabilities discovery.
>>>> Given that for these DMA devices the offload cost is critical, more so than
>>>> any other dev class I've looked at before, I'd like to avoid having APIs
>>>> with extra parameters that need to be passed about, since that just adds
>>>> extra CPU cycles to the offload.
>>>
>>> If the driver does not support additional attributes and/or the
>>> application does not need them, rte_dmadev_desc_t can be NULL,
>>> so that it won't have any cost in the datapath. I think we can go to
>>> different API cases if we cannot abstract the problems without a
>>> performance impact. Otherwise, it will be too much pain for applications.
>>
>> Yes, currently we plan to use different APIs for the different cases, e.g.
>> rte_dmadev_memcpy()   -- deal with local-to-local memory copy
>> rte_dmadev_memset()   -- deal with filling local memory with a pattern
>> maybe:
>> rte_dmadev_imm_data() -- deal with copying very small amounts of data
>> rte_dmadev_p2pcopy()  -- deal with peer-to-peer copy between different PCIe addresses
>>
>> These API capabilities will be reflected in the device capability set, so
>> that the application can discover them through the standard API.
>
> There will be a lot of combinations of that; it will be like an M x N
> cross-product of base cases. It won't scale.

Currently it is hard to define a generic DMA descriptor; I think well-defined APIs are feasible.

>
>>> Just to understand - I think we need to look at the HW capabilities and how
>>> to have a common API.
>>> I assume the HW will have some HW job descriptors which will be filled in
>>> SW and submitted to HW.
>>> In our HW, the job descriptor has the following main elements:
>>>
>>> - Channel // We don't expect the application to change this per transfer
>>> - Source address - it can be scatter-gather too - will be changed per transfer
>>> - Destination address - it can be scatter-gather too - will be changed
>>> per transfer
>>> - Transfer length - it can be scatter-gather too - will be changed
>>> per transfer
>>> - IOVA address where HW posts the job completion status, per job descriptor
>>> - will be changed per transfer
>>> - Other sideband information related to the channel // We don't expect
>>> the application to change this per transfer
>>> - As an option, job completion can be posted as an event to an
>>> rte_event_queue too // We don't expect the application to change this
>>> per transfer
>>
>> The 'option' field looks like a software interface field, not a HW
>> descriptor field.
>
> It is in the HW descriptor.
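[Editor's note: for reference, the per-job fields enumerated above could be modelled roughly as the following C struct. This is purely an illustrative sketch - the names, field widths, and the flat (non scatter-gather) layout are assumptions, not any vendor's actual descriptor format.]

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical self-describing HW job descriptor, one per transfer. */
struct toy_dma_job_desc {
	uint16_t channel;       /* channel attribute, fixed across transfers   */
	uint64_t src_iova;      /* per transfer; real HW may take an SG list   */
	uint64_t dst_iova;      /* per transfer; real HW may take an SG list   */
	uint32_t length;        /* per transfer                                */
	uint64_t compl_iova;    /* per transfer: where HW posts job completion */
	uint32_t sideband;      /* channel-related sideband, fixed per channel */
	uint8_t  post_as_event; /* option: post completion to an event queue   */
};
```

Under the prep/enqueue split discussed above, the driver would fill channel, sideband, and post_as_event once in the slow path, and only rewrite src_iova, dst_iova, length, and compl_iova per transfer.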
The HW is interesting: something like, the DMA can send the completion
directly to an event HW queue, with the DMA and the event HW queue linked
in hardware rather than by software.

Could you point to a public driver for this HW? Then we could learn more
about its working mechanism and the software-hardware collaboration.

>
>>>
>>> @Richardson, Bruce @fengchengwen @Hemant Agrawal
>>>
>>> Could you share the options for your HW descriptors which you are
>>> planning to expose through the API, like above, so that we can easily
>>> converge on a fastpath API?
>>
>> The Kunpeng HW descriptor is self-describing and doesn't need to refer
>> to context info.
>>
>> Maybe the fields which are fixed for a given transfer type could be set
>> up by the driver and not exposed to the application.
>
> Yes, I agree. I think that is the reason why I thought to have a
> rte_dmadev_prep() call to convert the DPDK DMA transfer attributes to
> HW-specific descriptors, and a single enq() operation with the variable
> arguments (through enq parameters) and the fixed arguments through the
> rte_dmadev_prep() call object.
>
>>
>> So that we could use a more generic way to define the API.
>>
>>>>
>>>> /Bruce
>>>
>>> .
>>
>
> .
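[Editor's note: the prep()/enq() split re-stated in the exchange above can be sketched as toy C code. All names here are hypothetical (not the eventual rte_dmadev API), with a memcpy stub standing in for the hardware; it only shows how fixed attributes move into a slow-path handle while src/dst/length stay in the fast path.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum toy_dma_op { TOY_DMA_COPY, TOY_DMA_FILL };

/* Fixed attributes, set once in the slow path per channel. */
struct toy_dma_desc {
	uint16_t channel;
	enum toy_dma_op op;   /* copy, xor, fill etc. */
};

typedef struct toy_dma_desc *toy_dma_desc_t;

/* Slow path: a real driver would pre-build the fixed words of the HW
 * descriptor here; this stub just hands the attributes back as a handle. */
static toy_dma_desc_t
toy_dmadev_prepare(struct toy_dma_desc *desc)
{
	return desc;
}

/* Fast path: blend the per-transfer arguments with the prepared handle.
 * Only the variable fields are touched per call, keeping offload cost low. */
static int
toy_dmadev_enqueue(toy_dma_desc_t desc, void *dst, const void *src,
		   size_t len)
{
	if (desc == NULL || desc->op != TOY_DMA_COPY)
		return -1;         /* only the copy op is modelled here */
	memcpy(dst, src, len);     /* stub backend standing in for the HW */
	return 0;
}
```

A NULL handle (or a driver that ignores it) falls straight through to the error return here; in the real API the NULL-desc case discussed above would instead mean "no extra attributes, zero extra datapath cost".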