Date: Tue, 6 Jul 2021 11:01:00 +0100
From: Bruce Richardson
To: fengchengwen
Cc: Jerin Jacob, Thomas Monjalon, Ferruh Yigit, Jerin Jacob, dpdk-dev,
 Morten Brørup, Nipun Gupta, Hemant Agrawal, Maxime Coquelin,
 Honnappa Nagarahalli, David Marchand, Satananda Burla, Prasun Kapoor,
 "Ananyev, Konstantin", liangma@liangbit.com, Radha Mohan Chintakuntla
References: <1625231891-2963-1-git-send-email-fengchengwen@huawei.com>
 <2e3b5fb4-5752-47f2-5adc-79aae5ed9482@huawei.com>
In-Reply-To: <2e3b5fb4-5752-47f2-5adc-79aae5ed9482@huawei.com>
Subject: Re: [dpdk-dev] [PATCH] dmadev: introduce DMA device library

On Tue, Jul 06, 2021 at 11:01:17AM +0800, fengchengwen wrote:
> Many thanks, mostly OK, and a few comments inline
>
> On 2021/7/4 17:30, Jerin Jacob wrote:
> > On Fri, Jul 2, 2021 at 6:51 PM Chengwen Feng wrote:
> >>
> >> This patch introduces 'dmadevice' which is a generic type of DMA
> >> device.
> ...
> >> +#include
> >
> > Sort in alphabetical order.
> >
> >> +
> >> +/**
> >> + * dma_cookie_t - an opaque DMA cookie
> >
> > Since we are defining the behaviour, it is not opaque any more.
> > I think it is better to call it ring_idx or so.
> >
>
> This type is designed to have two meanings: return <0 on failure and return >=0 on success.
>
> How about the following definition:
>     typedef int dma_ring_index_t;
>
> if >= 0, its value range is [0, 65535] = uint16_t, so the driver implementation will be simple.
> if < 0, it means enqueue failure.
>
> For the driver, it could hold a uint16_t ring_index; if enqueue fails just return failure,
> else return the current ring_index, and update it by: ring_index++;
>

Well, yes and no on the "two meanings". For the enqueue function, yes, the
return value can have two meanings, but I don't consider them one type. On
the completion call, however, this can only take positive values < UINT16_MAX,
so having two meanings there is actually confusing. Better to have:

* enqueue return a regular int, with the doxygen comment "@return Negative on
  error, otherwise job index between 0 and UINT16_MAX"
* for completions, take a uint16_t* parameter for the last completed index,
  since no negative values are needed.
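As a purely illustrative sketch of those two points (the function names and
parameters below are placeholders for discussion, not a final proposal), the
prototypes could look something like:

    #include <stdint.h>

    /* Enqueue a copy: a plain int return, no cookie typedef needed.
     * @return Negative on error, otherwise the job index (0..UINT16_MAX). */
    int rte_dmadev_copy(uint16_t dev_id, uint16_t vq_id,
                        void *src, void *dst,
                        uint32_t length, uint64_t flags);

    /* Gather completions: the count of completed jobs is returned directly,
     * and the last completed job index comes back through a uint16_t
     * pointer, since no negative values are ever needed there. */
    uint16_t rte_dmadev_completed(uint16_t dev_id, uint16_t vq_id,
                                  uint16_t *last_idx);
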
Beyond this, we generally don't use typedefs in DPDK for basic types (with a
few exceptions, e.g. rte_iova_t), and save their use only for function
pointers.

> >> +
> >> +/**
> >> + * A structure used to retrieve the contextual information of
> >> + * a DMA device
> >> + */
> >> +struct rte_dmadev_info {
> >> +	/**
> >> +	 * Fields filled by framewok
> >
> > typo.
> >
> >> +	 */
> >> +	struct rte_device *device; /**< Generic Device information */
> >> +	const char *driver_name; /**< Device driver name */
> >> +	int socket_id; /**< Socket ID where memory is allocated */
> >> +
> >> +	/**
> >> +	 * Specification fields filled by driver
> >> +	 */
> >> +	uint64_t dev_capa; /**< Device capabilities (RTE_DMA_DEV_CAPA_) */
> >> +	uint16_t max_hw_queues; /**< Maximum number of HW queues. */
> >> +	uint16_t max_vqs_per_hw_queue;
> >> +	/**< Maximum number of virt queues to allocate per HW queue */
> >> +	uint16_t max_desc;
> >> +	/**< Maximum allowed number of virt queue descriptors */
> >> +	uint16_t min_desc;
> >> +	/**< Minimum allowed number of virt queue descriptors */
> >
> > Please add max_nb_segs, i.e. the maximum number of segments supported.
>
> Do you mean something like "burst_size"?
>
> >
> >> +
> >> +	/**
> >> +	 * Status fields filled by driver
> >> +	 */
> >> +	uint16_t nb_hw_queues; /**< Number of HW queues configured */
> >> +	uint16_t nb_vqs; /**< Number of virt queues configured */
> >> +};
> >> +
> >> +/**
> >> + * dma_address_type
> >> + */
> >> +enum dma_address_type {
> >> +	DMA_ADDRESS_TYPE_IOVA, /**< Use IOVA as dma address */
> >> +	DMA_ADDRESS_TYPE_VA, /**< Use VA as dma address */
> >> +};
> >> +
> >> +/**
> >> + * A structure used to configure a DMA device.
> >> + */
> >> +struct rte_dmadev_conf {
> >> +	enum dma_address_type addr_type; /**< Address type to used */
> >
> > I think there are 3 kinds of limitations/capabilities.
> >
> > When the system is configured as IOVA as VA:
> > 1) Device supports any VA address, like memory from rte_malloc(),
> > rte_memzone(), malloc, stack memory.
> > 2) Device supports only VA addresses from rte_malloc(), rte_memzone(), i.e.
> > memory backed by hugepages and added to the DMA map.
> >
> > When the system is configured as IOVA as PA:
> > 1) Devices support only PA addresses.
> >
> > IMO, the above needs to be advertised as a capability, and the application
> > needs to align with that; I don't think the application should request the
> > driver to work in any of the modes.
> >
>
> OK, let's put together our ideas on address type:
>
> There are three modes, which we may define as:
>   IOVA_as_VA-ALL  ---for devices which may need to support the SVA feature
>                   ---may also be a CPU memcpy 'device'
>   IOVA_as_VA      ---for devices which need IOMMU support
>   IOVA_as_PA
>
> There are many combinations of modes which a device may support: e.g. some
> devices may only support IOVA_as_PA, some may only support IOVA_as_VA, and
> some support both IOVA_as_PA and IOVA_as_VA. The specific runtime type is
> determined by vfio and the driver capability (e.g. RTE_PCI_DRV_NEED_IOVA_AS_VA).
>
> So we already define two capabilities for this:
>   #define RTE_DMA_DEV_CAPA_IOVA (1ull << 8) /**< Support IOVA as DMA address */
>       ---this covers IOVA_as_VA and IOVA_as_PA
>   #define RTE_DMA_DEV_CAPA_VA (1ull << 9) /**< Support VA as DMA address */
>       ---this covers IOVA_as_VA-ALL
> for a device which doesn't support SVA:
>     only declare RTE_DMA_DEV_CAPA_IOVA
> for a device which supports SVA:
>     declare RTE_DMA_DEV_CAPA_IOVA
>     declare RTE_DMA_DEV_CAPA_VA (only when IOMMU is enabled and the 'SVA flag' was set)
> for a CPU memcpy device:
>     only declare RTE_DMA_DEV_CAPA_VA
>
> As for the application:
> - if RTE_DMA_DEV_CAPA_VA is supported, it can pass any VA address to the DMA,
> - else if RTE_DMA_DEV_CAPA_IOVA is supported, it should pass an IOVA address to the DMA,
> - else the DMA device should not exist.
>

I still don't think we need all of this. DPDK already has support through the
existing bus infrastructure for determining if DPDK needs to use physical or
virtual addresses, so we should not be duplicating that, as devices *cannot*
use a different addressing mode to DPDK itself. Given that, the only flag we
need is one to indicate SVA support.

> >
> >
> > I also think we may need to set the status as a bitmask and enumerate all
> > the combinations of error codes of all the drivers, and return a string
> > from the driver like the existing rte_flow_error.
> >
>
> A bitmask is limited to at most 32 bits (or we could extend it to 64), and
> rte_flow_error is also heavy.
>
> Considering that errors occur in only a small number of scenarios, it's OK
> to pass a status array; each status is 32-bit, so it can denote a very
> large number of error codes.
>

+1 to this.

> >> +
> >> +struct rte_dmadev_stats {
> >> +	uint64_t enqueue_fail_count;
> >> +	/**< Count of all operations which failed to enqueue */
> >> +	uint64_t enqueued_count;
> >> +	/**< Count of all operations which successfully enqueued */
> >> +	uint64_t completed_fail_count;
> >> +	/**< Count of all operations which failed to complete */
> >> +	uint64_t completed_count;
> >> +	/**< Count of all operations which successfully completed */
> >> +};
> >
> > We need to have a capability API to tell which items are
> > updated/supported by the driver.
> >
>
> There are only a few fields, and I don't think it's necessary to add a
> capability API; drivers which don't support stats can simply not implement
> the callback. For those which do, these fields are the minimum set.
>
> >
> >> diff --git a/lib/dmadev/rte_dmadev_core.h b/lib/dmadev/rte_dmadev_core.h
> >> new file mode 100644
> >> index 0000000..a3afea2
> >> --- /dev/null
> >> +++ b/lib/dmadev/rte_dmadev_core.h
> >> @@ -0,0 +1,98 @@
> >> +/* SPDX-License-Identifier: BSD-3-Clause
> >> + * Copyright 2021 HiSilicon Limited.
> >> + */
> >> +
> >> +#ifndef _RTE_DMADEV_CORE_H_
> >> +#define _RTE_DMADEV_CORE_H_
> >> +
> >> +/**
> >> + * @file
> >> + *
> >> + * RTE DMA Device internal header.
> >> + *
> >> + * This header contains internal data types. But they are still part of the
> >> + * public API because they are used by inline public functions.
> >> + */
> >> +
> >> +struct rte_dmadev;
> >> +
> >> +typedef dma_cookie_t (*dmadev_copy_t)(struct rte_dmadev *dev, uint16_t vq_id,
> >> +				       void *src, void *dst,
> >> +				       uint32_t length, uint64_t flags);
> >> +/**< @internal Function used to enqueue a copy operation. */
> >
> > To avoid namespace conflicts (as it is a public API) use rte_
>
> These are internal functions used by the driver, not the application,
> and the ethdev/regexdev _core headers are also defined without rte_.
>
> So I think it should remain as it is.
>

Even if only used by a driver, APIs are exported from the .so built for the
library, which means that they become public for apps using the lib. Even for
header-only symbols for drivers, it's good practice to put the prefix on them,
since they are for use outside the compilation unit.

> >
> >
> >> +
> >> +/**
> >> + * The data structure associated with each DMA device.
> >> + */
> >> +struct rte_dmadev {
> >> +	/**< Enqueue a copy operation onto the DMA device. */
> >> +	dmadev_copy_t copy;
> >> +	/**< Enqueue a scatter list copy operation onto the DMA device. */
> >> +	dmadev_copy_sg_t copy_sg;
> >> +	/**< Enqueue a fill operation onto the DMA device. */
> >> +	dmadev_fill_t fill;
> >> +	/**< Enqueue a scatter list fill operation onto the DMA device. */
> >> +	dmadev_fill_sg_t fill_sg;
> >> +	/**< Add a fence to force ordering between operations. */
> >> +	dmadev_fence_t fence;
> >> +	/**< Trigger hardware to begin performing enqueued operations. */
> >> +	dmadev_perform_t perform;
> >> +	/**< Returns the number of operations that successfully completed. */
> >> +	dmadev_completed_t completed;
> >> +	/**< Returns the number of operations that failed to complete. */
> >> +	dmadev_completed_fails_t completed_fails;
> >
> > We need to limit the fast-path items to 1 cache line.
>
> Yes, currently there are 8 callbacks, which exactly fill one cache line.
>

Before we get overly concerned about this, I think we should benchmark it to
see how much our "one cacheline" is giving us compared to having them in ops.
For example, the "perform" doorbell function and the completed function are
only called once per burst, so it would be interesting to see how much
difference it really makes for those.
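To make the comparison concrete, the two layouts being weighed are roughly as
follows (a sketch only, reusing the dmadev_*_t callback typedefs from the
rte_dmadev_core.h hunk above; the struct names here are placeholders, not
proposals):

    /* (a) fast-path callbacks embedded directly in the device structure,
     *     sized so that all of them fit in a single cache line: */
    struct rte_dmadev_embedded {
            dmadev_copy_t copy;
            dmadev_perform_t perform;
            dmadev_completed_t completed;
            /* ... remaining fast-path callbacks ... */
    } __rte_cache_aligned;

    /* (b) the usual ops-table indirection used by other device classes,
     *     costing one extra pointer dereference per call: */
    struct rte_dmadev_ops {
            dmadev_copy_t copy;
            dmadev_perform_t perform;
            dmadev_completed_t completed;
            /* ... */
    };

    struct rte_dmadev_indirect {
            const struct rte_dmadev_ops *dev_ops;
    };

The benchmark question is whether the extra dereference in (b) is measurable
at all for callbacks like "perform" and "completed" that run only once per
burst.

/Bruce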