From: Jerin Jacob
Date: Wed, 23 Jun 2021 16:37:03 +0530
To: "Hu, Jiayu"
Cc: "Richardson, Bruce", fengchengwen, "thomas@monjalon.net",
 "Yigit, Ferruh", "dev@dpdk.org", "nipun.gupta@nxp.com",
 "hemant.agrawal@nxp.com", "maxime.coquelin@redhat.com",
 "honnappa.nagarahalli@arm.com", "jerinj@marvell.com",
 "david.marchand@redhat.com"
Subject: Re: [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library

On Wed, Jun 23, 2021 at 11:04 AM Hu, Jiayu wrote:
>
> > -----Original Message-----
> > From: dev On Behalf Of Bruce Richardson
> > Sent: Thursday, June 17, 2021 1:31 AM
> > To: fengchengwen
> > Cc: thomas@monjalon.net; Yigit, Ferruh; dev@dpdk.org;
> > nipun.gupta@nxp.com; hemant.agrawal@nxp.com;
> > maxime.coquelin@redhat.com; honnappa.nagarahalli@arm.com;
> > jerinj@marvell.com; david.marchand@redhat.com; jerinjacobk@gmail.com
> > Subject: Re: [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library
> >
> > On Wed, Jun 16, 2021 at 05:41:45PM +0800, fengchengwen wrote:
> > > On 2021/6/16 0:38, Bruce Richardson wrote:
> > > > On Tue, Jun 15, 2021 at 09:22:07PM +0800, Chengwen Feng wrote:
> > > >> This patch introduces 'dmadevice', which is a generic type of
> > > >> DMA device.
> > > >>
> > > >> The APIs of the dmadev library expose generic operations that
> > > >> enable configuration and I/O with the DMA devices.
> > > >>
> > > >> Signed-off-by: Chengwen Feng
> > > >> ---
> > > > Thanks for sending this.
> > > >
> > > > Of most interest to me right now are the key data-plane APIs. While
> > > > we are still in the prototyping phase, below is a draft of what we
> > > > are thinking for the key enqueue/perform_ops/completed_ops APIs.
> > > >
> > > > Some key differences I note below vs your original RFC:
> > > > * Use of void pointers rather than iova addresses. While using iovas
> > > >   makes sense in the general case when using hardware, in that it can
> > > >   work with both physical and virtual addresses, if we change the
> > > >   APIs to use void pointers instead, they will still work for DPDK in
> > > >   VA mode, while at the same time allowing software fallbacks in
> > > >   error cases, and also a stub driver that uses memcpy in the
> > > >   background. Finally, using iovas makes the APIs a lot more awkward
> > > >   to use with anything but mbufs or similar buffers where we already
> > > >   have a pre-computed physical address.
> > >
> > > The iova is a hint to the application, and widely used in DPDK.
> > > If we switch to void pointers, how do we pass the address (iova or
> > > just va)? This may introduce implementation dependencies here.
> > >
> > > Or we always pass the va and the driver performs address translation,
> > > but this translation may cost too much CPU, I think.
> > >
> >
> > On the latter point, about the driver doing address translation, I
> > would agree. However, we probably need more discussion about the use of
> > iovas vs just virtual addresses. My thinking on this is that if we
> > specify the API using iovas, it will severely hurt the usability of the
> > API, since it forces the user to take more inefficient codepaths in a
> > large number of cases. Given a pointer to the middle of an mbuf, one
> > cannot just pass that straight as an iova, but must instead do a
> > translation into an offset from the mbuf pointer and then re-add the
> > offset to the mbuf base address.
>
> Agree. Vhost is one consumer of DMA devices. To support SW fallback in
> case of DMA copy errors, vhost needs to pass the VA of both the DPDK
> mbuf and the guest buffer to the callback layer (a middle layer between
> vhost and the dma device). If DMA devices use iova, it will require the
> callback layer to call rte_mem_virt2iova() to translate va to iova in
> the data path, even if iova is va in some cases. But if DMA devices
> claim to use va, device differences can be hidden inside the driver,
> which makes the DMA callback layer simpler and more efficient.

+1 to Bruce's suggestion.

I think we can make it void * by:
- adding RTE_PCI_DRV_NEED_IOVA_AS_VA in our driver, and
- adding a capability to say that DMA addresses must come from hugepage
  memory (mapped by the IOMMU), not from random heap and stack areas,
  i.e. a capability to say that the
  https://www.kernel.org/doc/html/latest/x86/sva.html feature is not
  supported for those devices.
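
To make the ergonomics argument concrete, here is a minimal sketch of the
two addressing flavours; rte_dmadev_copy() and rte_dmadev_copy_iova() are
hypothetical names following this thread, while the mbuf helpers are the
existing ones:

    #include <rte_mbuf.h>

    /* Hypothetical enqueue prototypes, following this thread's naming. */
    int rte_dmadev_copy(uint16_t dev_id, void *src, void *dst, uint32_t len);
    int rte_dmadev_copy_iova(uint16_t dev_id, rte_iova_t src, rte_iova_t dst,
                             uint32_t len);

    /* void * flavour: a pointer into the middle of an mbuf goes straight
     * through when DPDK runs in VA mode. */
    static int copy_from_mbuf_va(uint16_t dev_id, struct rte_mbuf *m,
                                 uint32_t off, void *dst, uint32_t len)
    {
        return rte_dmadev_copy(dev_id,
                               rte_pktmbuf_mtod_offset(m, void *, off),
                               dst, len);
    }

    /* iova flavour: the caller must translate back to buf_iova + offset
     * before every enqueue - the awkward codepath described above. */
    static int copy_from_mbuf_iova(uint16_t dev_id, struct rte_mbuf *m,
                                   uint32_t off, rte_iova_t dst, uint32_t len)
    {
        return rte_dmadev_copy_iova(dev_id, rte_pktmbuf_iova_offset(m, off),
                                    dst, len);
    }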
Once an iommu > > is present, DPDK will run in VA mode, allowing virtual addresses to our > > hugepage memory to be sent directly to hardware. Also, when using > > dmadevs on top of an in-kernel driver, that kernel driver may do all iommu > > management for the app, removing further the restrictions on what memory > > can be addressed by hardware. > > > > > > * Use of id values rather than user-provided handles. Allowing the > > user/app > > > > to manage the amount of data stored per operation is a better solution, > > I > > > > feel than proscribing a certain about of in-driver tracking. Some apps > > may > > > > not care about anything other than a job being completed, while other > > apps > > > > may have significant metadata to be tracked. Taking the user-context > > > > handles out of the API also makes the driver code simpler. > > > > > > The user-provided handle was mainly used to simply application > > > implementation, It provides the ability to quickly locate contexts. > > > > > > The "use of id values" seem like the dma_cookie of Linux DMA engine > > > framework, user will get a unique dma_cookie after calling > > > dmaengine_submit(), and then could use it to call > > dma_async_is_tx_complete() to get completion status. > > > > > > > Yes, the idea of the id is the same - to locate contexts. The main difference is > > that if we have the driver manage contexts or pointer to contexts, as well as > > giving more work to the driver, it complicates the APIs for measuring > > completions. If we use an ID-based approach, where the app maintains its > > own ring of contexts (if any), it avoids the need to have an "out" parameter > > array for returning those contexts, which needs to be appropriately sized. > > Instead we can just report that all ids up to N are completed. [This would be > > similar to your suggestion that N jobs be reported as done, in that no > > contexts are provided, it's just that knowing the ID of what is completed is > > generally more useful than the number (which can be obviously got by > > subtracting the old value)] > > > > We are still working on prototyping all this, but would hope to have a > > functional example of all this soon. > > > > > How about define the copy prototype as following: > > > dma_cookie_t rte_dmadev_copy(uint16_t dev_id, xxx) while the > > > dma_cookie_t is int32 and is monotonically increasing, when >=0 mean > > > enqueue successful else fail. > > > when complete the dmadev will return latest completed dma_cookie, and > > > the application could use the dma_cookie to quick locate contexts. > > > > > > > If I understand this correctly, I believe this is largely what I was suggesting - > > just with the typedef for the type? In which case it obviously looks good to > > me. > > > > > > * I've kept a single combined API for completions, which differs from the > > > > separate error handling completion API you propose. I need to give the > > > > two function approach a bit of thought, but likely both could work. If we > > > > (likely) never expect failed ops, then the specifics of error handling > > > > should not matter that much. > > > > > > The rte_ioat_completed_ops API is too complex, and consider some > > > applications may never copy fail, so split them as two API. > > > It's indeed not friendly to other scenarios that always require error > > handling. 
> > > > * I've kept a single combined API for completions, which differs
> > > >   from the separate error-handling completion API you propose. I
> > > >   need to give the two-function approach a bit of thought, but
> > > >   likely both could work. If we (likely) never expect failed ops,
> > > >   then the specifics of error handling should not matter that much.
> > >
> > > The rte_ioat_completed_ops API is too complex, and considering that
> > > some applications may never have a copy fail, we split it into two
> > > APIs. It's indeed not friendly to other scenarios that always require
> > > error handling.
> > >
> > > I prefer using the number of completed operations as the return value
> > > rather than the ID, so that the application can easily judge whether
> > > there are new completed operations. The new prototype:
> > >     uint16_t rte_dmadev_completed(uint16_t dev_id, dma_cookie_t *cookie,
> > >         uint32_t *status, uint16_t max_status, uint16_t *num_fails);
> > >
> > > 1) for the normal case, which never expects failed ops, just call:
> > >     ret = rte_dmadev_completed(dev_id, &cookie, NULL, 0, NULL);
> > > 2) for other cases:
> > >     ret = rte_dmadev_completed(dev_id, &cookie, &status, max_status,
> > >         &fails);
> > >    at this point, fails <= ret <= max_status
> > >
> > Completely agree that we need to plan for the happy-day case where
> > everything passes. Looking at the prototypes you have above, I am ok
> > with returning the number of completed ops as the return value, with
> > the final completed cookie as an "out" parameter.
> > For handling errors, I'm ok with what you propose above, just with one
> > small adjustment - I would remove the restriction that
> > ret <= max_status.
> >
> > In the case of zero failures, we can report as many ops succeeding as
> > we like, and even in the case of failure, we can still report as many
> > successful ops as we like before we start filling in the status field.
> > For example, if 32 ops are completed and the last one fails, we can
> > just fill in one entry into status and return 32. Alternatively, if the
> > 4th-last one fails, we fill in 4 entries and return 32. The only
> > requirements would be:
> > * fails <= max_status
> > * fails <= ret
> > * cookie holds the id of the last entry in status.
> >
> > A further possible variation is to have separate "completed" and
> > "completed_status" APIs, where "completed_status" is as above, but
> > "completed" skips the final 3 parameters and returns -1 on error. In
> > that case the user can fall back to the completed_status call.
> >
> > >
> > > > For the rest, the control / setup APIs are likely to be rather
> > > > uncontroversial, I suspect. However, I think that rather than xstats
> > > > APIs, the library should first provide a set of standardized stats
> > > > like ethdev does. If driver-specific stats are needed, we can add
> > > > xstats to the API later.
> > >
> > > Agree, will fix in v2.
> > >
> > Thanks. In parallel, we will be working on our prototype implementation
> > too, taking in the feedback here, and hopefully we will send it as an
> > RFC soon. Then we can look to compare and contrast and arrive at an
> > agreed API. It might also be worthwhile to set up a community call for
> > all interested parties in this API to discuss things with a more rapid
> > turnaround. That was done in the past for other new device-class APIs
> > that were developed, e.g. eventdev.
> >
> > Regards,
> > /Bruce
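
For reference, a quick sketch of how the completion flow discussed above
might look from the application side (the prototype is the one proposed
in this thread, not a settled API, and the exact fill semantics of
status[] are still being discussed):

    #include <stddef.h>
    #include <stdint.h>

    typedef int32_t dma_cookie_t;

    /* Prototype as proposed above (hypothetical, not a final API). */
    uint16_t rte_dmadev_completed(uint16_t dev_id, dma_cookie_t *cookie,
                                  uint32_t *status, uint16_t max_status,
                                  uint16_t *num_fails);

    /* Happy path: no per-op status wanted, just count and last cookie. */
    static uint16_t poll_fast(uint16_t dev_id, dma_cookie_t *last)
    {
        return rte_dmadev_completed(dev_id, last, NULL, 0, NULL);
    }

    /* Error-aware path: room for up to 32 per-op status entries. */
    static uint16_t poll_with_status(uint16_t dev_id, dma_cookie_t *last)
    {
        uint32_t status[32];
        uint16_t fails = 0;
        uint16_t done = rte_dmadev_completed(dev_id, last, status, 32,
                                             &fails);

        /* Per the discussion above: fails <= 32 and fails <= done, the
         * status entries describe the tail of the completed ops, and
         * *last holds the id of the last entry in status. */
        for (uint16_t i = 0; i < fails; i++) {
            /* inspect status[i] and recover, e.g. retry or SW fallback */
        }
        return done;
    }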