Date: Wed, 23 Jun 2021 15:22:13 +0100
From: Bruce Richardson
To: Jerin Jacob
Cc: fengchengwen, Morten Brørup, Thomas Monjalon, Ferruh Yigit, dpdk-dev, Nipun Gupta, Hemant Agrawal, Maxime Coquelin, Honnappa Nagarahalli, Jerin Jacob, David Marchand, Satananda Burla, Prasun Kapoor
Subject: Re: [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library

On Wed, Jun 23, 2021 at 05:16:28PM +0530, Jerin Jacob wrote:
> On Wed, Jun 23, 2021 at 3:11 PM Bruce Richardson wrote:
> >
> > On Tue, Jun 22, 2021 at 10:55:24PM +0530, Jerin Jacob wrote:
> > > On Fri, Jun 18, 2021 at 3:11 PM fengchengwen wrote:
> > > >
> > > > On 2021/6/18 13:52, Jerin Jacob wrote:
> > > > > On Thu, Jun 17, 2021 at 2:46 PM Bruce Richardson wrote:
> > > > >>
> > > > >> On Wed, Jun 16, 2021 at 08:07:26PM +0530, Jerin Jacob wrote:
> > > > >>> On Wed, Jun 16, 2021 at 3:47 PM fengchengwen wrote:
> > > > >>>>
> > > > >>>> On 2021/6/16 15:09, Morten Brørup wrote:
> > > > >>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
> > > > >>>>>> Sent: Tuesday, 15 June 2021 18.39
> > > > >>>>>>
> > > > >>>>>> On Tue, Jun 15, 2021 at 09:22:07PM +0800, Chengwen Feng wrote:
> > > > >>>>>>> This patch introduces 'dmadevice' which is a generic type of DMA device.
> > > > >>>>>>>
> > > > >>>>>>> The APIs of the dmadev library expose some generic operations which can enable configuration and I/O with the DMA devices.
> > > > >>>>>>>
> > > > >>>>>>> Signed-off-by: Chengwen Feng
> > > > >>>>>>> ---
> > > > >>>>>> Thanks for sending this.
> > > > >>>>>>
> > > > >>>>>> Of most interest to me right now are the key data-plane APIs. While we are still in the prototyping phase, below is a draft of what we are thinking for the key enqueue/perform_ops/completed_ops APIs.
> > > > >>>>>>
> > > > >>>>>> Some key differences I note in below vs your original RFC:
> > > > >>>>>> * Use of void pointers rather than iova addresses. While using iova's makes sense in the general case when using hardware, in that it can work with both physical addresses and virtual addresses, if we change the APIs to use void pointers instead it will still work for DPDK in VA mode, while at the same time allowing use of software fallbacks in error cases, and also a stub driver that uses memcpy in the background. Finally, using iova's makes the APIs a lot more awkward to use with anything but mbufs or similar buffers where we already have a pre-computed physical address.
> > > > >>>>>> * Use of id values rather than user-provided handles. Allowing the user/app to manage the amount of data stored per operation is a better solution, I feel, than prescribing a certain amount of in-driver tracking. Some apps may not care about anything other than a job being completed, while other apps may have significant metadata to be tracked. Taking the user-context handles out of the API also makes the driver code simpler.
> > > > >>>>>> * I've kept a single combined API for completions, which differs from the separate error-handling completion API you propose. I need to give the two-function approach a bit of thought, but likely both could work. If we (likely) never expect failed ops, then the specifics of error handling should not matter that much.
> > > > >>>>>>
> > > > >>>>>> For the rest, the control / setup APIs are likely to be rather uncontroversial, I suspect. However, I think that rather than xstats APIs, the library should first provide a set of standardized stats like ethdev does. If driver-specific stats are needed, we can add xstats later to the API.
> > > > >>>>>>
> > > > >>>>>> Appreciate your further thoughts on this, thanks.
> > > > >>>>>>
> > > > >>>>>> Regards, /Bruce
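(For illustration only: a rough sketch of the shape such enqueue/perform_ops/completed_ops calls could take, following the points above -- void pointers, id values returned from the enqueue call, and a single combined completion call. The names and exact signatures here are assumptions for discussion, not the actual draft referred to above.)

    #include <stdint.h>
    #include <stdbool.h>

    /* Enqueue a copy; returns an op id on success, negative on error.
     * Takes void pointers rather than iova addresses. */
    int rte_dmadev_enqueue_copy(uint16_t dev_id, void *src, void *dst, unsigned int length);

    /* Ring the doorbell: make all previously enqueued ops visible to the hardware. */
    void rte_dmadev_perform_ops(uint16_t dev_id);

    /* Single combined completion call: returns the number of ops completed and the
     * id of the last completed op; *has_error is set if any op failed, instead of
     * having a separate error-handling completion API. */
    int rte_dmadev_completed_ops(uint16_t dev_id, uint8_t max_ops, uintptr_t *last_id, bool *has_error);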
> > > > >>>>>
> > > > >>>>> I generally agree with Bruce's points above.
> > > > >>>>>
> > > > >>>>> I would like to share a couple of ideas for further discussion:
> > > > >>>
> > > > >>> I believe some of the other requirements and comments for generic DMA will be:
> > > > >>>
> > > > >>> 1) Support for the _channel_. Each channel may have different capabilities and functionalities. Typical cases are: each channel has separate source and destination devices, like DMA between PCIe EP and Host memory, Host memory and Host memory, or PCIe EP and PCIe EP. So we need some notion of the channel in the specification.
> > > > >>>
> > > > >> Can you share a bit more detail on what constitutes a channel in this case? Is it equivalent to a device queue (which we are flattening to individual devices in this API), or to a specific configuration on a queue?
> > > > >
> > > > > It is not a queue. It is one of the attributes of the transfer.
> > > > > I.e. in the same queue, for a given transfer it can specify different "source" and "destination" devices, like CPU to sound card, CPU to network card, etc.
> > > > >
> > > > >>> 2) I assume the current data-plane APIs are not thread-safe. Is that right?
> > > > >>>
> > > > >> Yes.
> > > > >>
> > > > >>> 3) The cookie scheme outlined earlier looks good to me, instead of having a generic dequeue() API.
> > > > >>>
> > > > >>> 4) Can we split rte_dmadev_enqueue_copy(uint16_t dev_id, void *src, void *dst, unsigned int length); into a two-stage API, where one will be used in the fastpath and the other one will be used in the slowpath?
> > > > >>>
> > > > >>> - The slowpath API will take the channel and the other attributes for the transfer.
> > > > >>>
> > > > >>> Example syntax will be:
> > > > >>>
> > > > >>> struct rte_dmadev_desc {
> > > > >>>         channel id;
> > > > >>>         ops; // copy, xor, fill etc
> > > > >>>         other arguments specific to the dma transfer // can be set based on capability
> > > > >>> };
> > > > >>>
> > > > >>> rte_dmadev_desc_t rte_dmadev_prepare(uint16_t dev_id, struct rte_dmadev_desc *desc);
> > > > >>>
> > > > >>> - The fastpath takes the arguments that need to change per transfer, along with the slow-path handle.
> > > > >>>
> > > > >>> rte_dmadev_enqueue(uint16_t dev_id, void *src, void *dst, unsigned int length, rte_dmadev_desc_t desc);
> > > > >>>
> > > > >>> This will help the driver to:
> > > > >>> - former API: form the device-specific descriptors in the slow path for a given channel and the fixed attributes per transfer;
> > > > >>> - latter API: blend the "variable" arguments such as src and dest address with the slow-path created descriptors.
> > > > >>>
> > > > >> This seems like an API for a context-aware device, where the channel is the config data/context that is preserved across operations - is that correct? At least from the Intel DMA accelerators side, we have no concept of this context, and each operation is completely self-described. The location or type of memory for copies is irrelevant, you just pass the src/dst addresses to reference.
> > > > >
> > > > > It is not a context-aware device. Each HW job is self-described. You can view it as different attributes of the transfer.
> > > > >
> > > > >>> The above will give better performance and is the best trade-off between performance and per-transfer variables.
> > > > >>
> > > > >> We may need to have different APIs for context-aware and context-unaware processing, with which to use determined by the capabilities discovery. Given that for these DMA devices the offload cost is critical, more so than any other dev class I've looked at before, I'd like to avoid having APIs with extra parameters that need to be passed about, since that just adds extra CPU cycles to the offload.
> > > > >
> > > > > If the driver does not support additional attributes and/or the application does not need them, rte_dmadev_desc_t can be NULL, so that it won't have any cost in the datapath. I think we can go to different API cases if we cannot abstract the problems without performance impact. Otherwise, it will be too much pain for applications.
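(To make the two-stage idea above concrete, here is a minimal sketch; the field names, the opaque handle type and the exact signatures are illustrative assumptions paraphrasing the proposal above, not a settled definition.)

    #include <stdint.h>

    /* Slow-path template: the fixed attributes of a transfer. */
    struct rte_dmadev_desc {
        uint16_t channel_id;   /* which source/destination pairing to use */
        uint8_t  op;           /* copy, xor, fill, ... subject to capability */
        /* other fixed, per-channel attributes of the dma transfer */
    };

    /* Opaque handle to the device-specific descriptor template. */
    typedef void *rte_dmadev_desc_t;

    /* Slow path: form the device-specific descriptor once per channel/attribute set. */
    rte_dmadev_desc_t rte_dmadev_prepare(uint16_t dev_id, const struct rte_dmadev_desc *desc);

    /* Fast path: only the per-transfer variables plus the prepared handle.
     * Passing desc == NULL would mean "no extra attributes", so the common
     * case pays no extra datapath cost. */
    int rte_dmadev_enqueue(uint16_t dev_id, void *src, void *dst, unsigned int length, rte_dmadev_desc_t desc);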
> > > >
> > > > Yes, currently we plan to use different APIs for the different cases, e.g.
> > > >   rte_dmadev_memcpy()   -- deal with local-to-local memory copy
> > > >   rte_dmadev_memset()   -- deal with filling local memory with a pattern
> > > > maybe:
> > > >   rte_dmadev_imm_data() -- deal with copying very little data
> > > >   rte_dmadev_p2pcopy()  -- deal with peer-to-peer copy between different PCIe addresses
> > > >
> > > > These API capabilities will be reflected in the device capability set, so that the application can discover them through the standard API.
> > > >
> > > There will be a lot of combinations of that; it will be like an M x N cross of base cases. It won't scale.
> > >
> > What are the various cases that are so significantly different? Using the examples above, the "imm_data" and "p2p_copy" operations are still copy ops, and the fact of it being a small copy or a p2p one can be expressed just using flags? [Also, you are not likely to want to offload a small copy, are you?]
> >
> I meant, the p2p version can have memcpy, memset, _imm_data. So it has gone from 4 to 6 now; if we add one more op, it becomes 8 functions.
>
> IMO, a separate function is good if the driver needs to do a radically different thing. In our hardware, it is about updating the descriptor fields differently. Is it the same with other HW? If so, _prep() makes life easy.
>
I disagree. Sure, there is a matrix of possibilities, but using the set above, memcpy == copy, and both memset and imm_data seem like a "fill op" to me, so to have those work with both p2p and DRAM you should only need two functions, with a flag to indicate p2p or mem-mem (or two flags if you want to indicate src and dest in pci or memory independently). I'm just not seeing where the massive structs need to be passed around and slow things down.

/Bruce
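(For illustration of the flag-based shape argued for above: one copy op and one fill op, with per-op flags indicating whether the source and destination are PCIe/device addresses or plain memory. The flag names and exact signatures are assumptions for discussion only, not a concrete proposal.)

    #include <stdint.h>

    /* Per-op hints instead of separate per-case functions. */
    #define RTE_DMADEV_FLAG_SRC_PCI (1ULL << 0)  /* source is a PCIe/device address */
    #define RTE_DMADEV_FLAG_DST_PCI (1ULL << 1)  /* destination is a PCIe/device address */

    /* One copy op covers mem-to-mem, p2p and small ("imm_data") copies. */
    int rte_dmadev_enqueue_copy(uint16_t dev_id, void *src, void *dst, unsigned int length, uint64_t flags);

    /* One fill op covers memset- and pattern-style operations. */
    int rte_dmadev_enqueue_fill(uint16_t dev_id, uint64_t pattern, void *dst, unsigned int length, uint64_t flags);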