From mboxrd@z Thu Jan 1 00:00:00 1970
To: Jerin Jacob
CC: Thomas Monjalon, Ferruh Yigit, "Richardson, Bruce", Jerin Jacob, dpdk-dev,
 Morten Brørup, Nipun Gupta, Hemant Agrawal, "Maxime Coquelin",
 Honnappa Nagarahalli, David Marchand, Satananda Burla, Prasun Kapoor,
 "Ananyev, Konstantin", Radha Mohan Chintakuntla
References: <1625231891-2963-1-git-send-email-fengchengwen@huawei.com>
From: fengchengwen
Message-ID: <2e3b5fb4-5752-47f2-5adc-79aae5ed9482@huawei.com>
Date: Tue, 6 Jul 2021 11:01:17 +0800
Subject: Re: [dpdk-dev] [PATCH] dmadev: introduce DMA device library

Many thanks, mostly OK, with a few comments inline.

On 2021/7/4 17:30, Jerin Jacob wrote:
> On Fri, Jul 2, 2021 at 6:51 PM Chengwen Feng wrote:
>>
>> This patch introduces 'dmadevice' which is a generic type of DMA
>> device.

...

>> +#include
>
> Sort in alphabetical order.
>
>> +
>> +/**
>> + * dma_cookie_t - an opaque DMA cookie
>
> Since we are defining the behaviour is not opaque any more.
> I think, it is better to call ring_idx or so.
>

This type is designed to have two meanings: return <0 on failure and return >=0 on success.

How about the following definition:
    typedef int dma_ring_index_t;
If >= 0, its value range is [0, 65535] (i.e. uint16_t), so the driver implementation stays simple.
If < 0, it means the enqueue failed.

The driver could hold a uint16_t ring_index: if the enqueue fails just return failure,
else return the current ring_index and then update it with ring_index++.

>> +
>> +/**
>> + * A structure used to retrieve the contextual information of
>> + * an DMA device
>> + */
>> +struct rte_dmadev_info {
>> +	/**
>> +	 * Fields filled by framewok
>
> typo.
>
>> +	 */
>> +	struct rte_device *device; /**< Generic Device information */
>> +	const char *driver_name; /**< Device driver name */
>> +	int socket_id; /**< Socket ID where memory is allocated */
>> +
>> +	/**
>> +	 * Specification fields filled by driver
>> +	 */
>> +	uint64_t dev_capa; /**< Device capabilities (RTE_DMA_DEV_CAPA_) */
>> +	uint16_t max_hw_queues; /**< Maximum number of HW queues. */
>> +	uint16_t max_vqs_per_hw_queue;
>> +	/**< Maximum number of virt queues to allocate per HW queue */
>> +	uint16_t max_desc;
>> +	/**< Maximum allowed number of virt queue descriptors */
>> +	uint16_t min_desc;
>> +	/**< Minimum allowed number of virt queue descriptors */
>
> Please add max_nb_segs. i.e maximum number of segments supported.

Do you mean something like "burst_size"?

>
>> +
>> +	/**
>> +	 * Status fields filled by driver
>> +	 */
>> +	uint16_t nb_hw_queues; /**< Number of HW queues configured */
>> +	uint16_t nb_vqs; /**< Number of virt queues configured */
>> +};
>> +
>> +/**
>> + * dma_address_type
>> + */
>> +enum dma_address_type {
>> +	DMA_ADDRESS_TYPE_IOVA, /**< Use IOVA as dma address */
>> +	DMA_ADDRESS_TYPE_VA, /**< Use VA as dma address */
>> +};
>> +
>> +/**
>> + * A structure used to configure a DMA device.
>> + */
>> +struct rte_dmadev_conf {
>> +	enum dma_address_type addr_type; /**< Address type to used */
>
> I think, there are 3 kinds of limitations/capabilities.
>
> When the system is configured as IOVA as VA
> 1) Device supports any VA address like memory from rte_malloc(),
> rte_memzone(), malloc, stack memory
> 2) Device support only VA address from rte_malloc(), rte_memzone() i.e
> memory backed by hugepage and added to DMA map.
>
> When the system is configured as IOVA as PA
> 1) Devices support only PA addresses .
>
> IMO, Above needs to be advertised as capability and application needs
> to align with that
> and I dont think application requests the driver to work in any of the modes.
>

OK, let's put together our ideas on the address type.

There are three modes, which we may define as:
    IOVA_as_VA-ALL  ---for a device which supports the SVA feature
                    ---may also be a CPU memcpy 'device'
    IOVA_as_VA      ---for a device which requires the IOMMU
    IOVA_as_PA

There are many combinations of modes which a device may support: e.g. some devices
only support IOVA_as_PA, some only support IOVA_as_VA, and some support both
IOVA_as_PA and IOVA_as_VA. The specific runtime type is determined by the vfio mode
and the driver capability (e.g. RTE_PCI_DRV_NEED_IOVA_AS_VA).

So we already define two capabilities for this:
    #define RTE_DMA_DEV_CAPA_IOVA (1ull << 8) /**< Support IOVA as DMA address */
        ---this covers IOVA_as_VA and IOVA_as_PA
    #define RTE_DMA_DEV_CAPA_VA   (1ull << 9) /**< Support VA as DMA address */
        ---this covers IOVA_as_VA-ALL

For a device which doesn't support SVA:
    only declare RTE_DMA_DEV_CAPA_IOVA
For a device which supports SVA:
    declare RTE_DMA_DEV_CAPA_IOVA
    declare RTE_DMA_DEV_CAPA_VA (only when the IOMMU is enabled and the 'SVA flag' is set)
For a CPU memcpy device:
    only declare RTE_DMA_DEV_CAPA_VA

As the application:
- if RTE_DMA_DEV_CAPA_VA is supported, it can pass any VA address to the DMA;
- else if RTE_DMA_DEV_CAPA_IOVA is supported, it should pass an IOVA address to the DMA;
- else such a DMA device should not exist.
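
To make the application-side choice concrete, a rough sketch (illustrative only:
it assumes the rte_dmadev_info_get() query and header from this patch plus the two
capability bits above, and uses the existing rte_mem_virt2iova() helper):

    #include <stdint.h>
    #include <rte_memory.h>    /* rte_mem_virt2iova() */
    #include "rte_dmadev.h"    /* proposed header from this patch */

    static void *
    dma_addr_for_dev(uint16_t dev_id, void *buf)
    {
        struct rte_dmadev_info info;

        rte_dmadev_info_get(dev_id, &info);

        if (info.dev_capa & RTE_DMA_DEV_CAPA_VA) {
            /* SVA-capable or CPU memcpy 'device': any VA can be passed as is. */
            return buf;
        } else if (info.dev_capa & RTE_DMA_DEV_CAPA_IOVA) {
            /* Device needs an IOVA; valid for rte_malloc()/memzone memory,
             * which is already DMA-mapped.
             */
            return (void *)(uintptr_t)rte_mem_virt2iova(buf);
        }

        /* Such a device should not exist. */
        return NULL;
    }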
>
>> +__rte_experimental
>> +static inline dma_cookie_t
>> +rte_dmadev_copy_sg(uint16_t dev_id, uint16_t vq_id,
>> +		   const struct dma_scatterlist *sg,
>> +		   uint32_t sg_len, uint64_t flags)
>
> I would like to change this as:
> rte_dmadev_copy_sg(uint16_t dev_id, uint16_t vq_id, const struct
> rte_dma_sg *src, uint32_t nb_src,
> const struct rte_dma_sg *dst, uint32_t nb_dst) or so allow the use case like
> src 30 MB copy can be splitted as written as 1 MB x 30 dst.
>

There are already too many arguments, and the above use case can be covered by
splitting the copy into 30 sg items.

>> +__rte_experimental
>> +static inline int
>> +rte_dmadev_fence(uint16_t dev_id, uint16_t vq_id)
>> +{
>> +	struct rte_dmadev *dev = &rte_dmadevices[dev_id];
>> +	return (*dev->fence)(dev, vq_id);
>> +}
>
> Since HW submission is in a queue(FIFO) the ordering is always
> maintained. Right?
> Could you share more details and use case of fence() from
> driver/application PoV?
>

For Kunpeng DMA, the hardware supports parallel execution of requests in the same
queue. The fence applies to scenarios where communication with the remote end is
involved: the driver must ensure the 'doorbell' is issued only after the data has
been fully written.

>
>> +
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice.
>> + *
>> + * Trigger hardware to begin performing enqueued operations
>> + *
>> + * This API is used to write the "doorbell" to the hardware to trigger it
>> + * to begin the operations previously enqueued by rte_dmadev_copy/fill()
>> + *
>> + * @param dev_id
>> + *   The identifier of the device.
>> + * @param vq_id
>> + *   The identifier of virt queue.
>> + *
>> + * @return
>> + *   - =0: Successful trigger hardware.
>> + *   - <0: Failure to trigger hardware.
>> + *
>> + * NOTE: The caller must ensure that the input parameter is valid and the
>> + *       corresponding device supports the operation.
>> + */
>> +__rte_experimental
>> +static inline int
>> +rte_dmadev_perform(uint16_t dev_id, uint16_t vq_id)
>> +{
>> +	struct rte_dmadev *dev = &rte_dmadevices[dev_id];
>> +	return (*dev->perform)(dev, vq_id);
>> +}
>
> Since we have additional function call overhead in all the
> applications for this scheme, I would like to understand
> the use of doing this way vs enq does the doorbell implicitly from
> driver/application PoV?
>

Because we split the burst operation into multiple substeps: each enqueue does not
issue the 'doorbell'; perform() is called at the end to issue it once.

On the ARM platform, a memory-barrier operation is needed when issuing the
'doorbell'; if a barrier were issued on every enqueue, it would lead to significant
performance degradation.

>> +__rte_experimental
>> +static inline uint16_t
>> +rte_dmadev_completed_fails(uint16_t dev_id, uint16_t vq_id,
>> +			   const uint16_t nb_status, uint32_t *status,
>> +			   dma_cookie_t *cookie)
>
> IMO, it is better to move cookie/rind_idx at 3.
> Why it would return any array of errors? since it called after
> rte_dmadev_completed() has
> has_error. Is it better to change
>
> rte_dmadev_error_status((uint16_t dev_id, uint16_t vq_id, dma_cookie_t
> *cookie, uint32_t *status)
>
> I also think, we may need to set status as bitmask and enumerate all
> the combination of error codes
> of all the driver and return string from driver existing rte_flow_error
>

A bitmask is limited to at most 32 bits (or we could extend it to 64), and
rte_flow_error is also heavy. Considering that errors happen in only a small number
of scenarios, it's OK to pass a status array; since each status is 32 bits, it can
denote a very large number of error codes.
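
To show why the explicit doorbell pays off, a rough usage sketch of the intended
fast path (names as proposed in this patch; src[]/dst[]/len[] are assumed arrays of
nb_ops elements prepared by the application, error handling trimmed):

    uint16_t i;
    dma_cookie_t cookie;

    /* Enqueue a burst of copies; no doorbell (and no barrier) per element. */
    for (i = 0; i < nb_ops; i++) {
        cookie = rte_dmadev_copy(dev_id, vq_id, src[i], dst[i], len[i], 0);
        if (cookie < 0)
            break;   /* virt queue full, stop enqueueing */
    }

    /* Single doorbell write (with its memory barrier) for the whole burst. */
    rte_dmadev_perform(dev_id, vq_id);

Completed operations are later collected with rte_dmadev_completed() on the same
virt queue.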
>> +
>> +struct rte_dmadev_stats {
>> +	uint64_t enqueue_fail_count;
>> +	/**< Conut of all operations which failed enqueued */
>> +	uint64_t enqueued_count;
>> +	/**< Count of all operations which successful enqueued */
>> +	uint64_t completed_fail_count;
>> +	/**< Count of all operations which failed to complete */
>> +	uint64_t completed_count;
>> +	/**< Count of all operations which successful complete */
>> +};
>
> We need to have capability API to tell which items are
> updated/supported by the driver.
>

There are only a few fields, and I don't think it's necessary to add a capability
API: drivers which don't support stats can simply not implement the callback, and
for those which do, these fields are the minimum set.

>
>> diff --git a/lib/dmadev/rte_dmadev_core.h b/lib/dmadev/rte_dmadev_core.h
>> new file mode 100644
>> index 0000000..a3afea2
>> --- /dev/null
>> +++ b/lib/dmadev/rte_dmadev_core.h
>> @@ -0,0 +1,98 @@
>> +/* SPDX-License-Identifier: BSD-3-Clause
>> + * Copyright 2021 HiSilicon Limited.
>> + */
>> +
>> +#ifndef _RTE_DMADEV_CORE_H_
>> +#define _RTE_DMADEV_CORE_H_
>> +
>> +/**
>> + * @file
>> + *
>> + * RTE DMA Device internal header.
>> + *
>> + * This header contains internal data types. But they are still part of the
>> + * public API because they are used by inline public functions.
>> + */
>> +
>> +struct rte_dmadev;
>> +
>> +typedef dma_cookie_t (*dmadev_copy_t)(struct rte_dmadev *dev, uint16_t vq_id,
>> +				      void *src, void *dst,
>> +				      uint32_t length, uint64_t flags);
>> +/**< @internal Function used to enqueue a copy operation. */
>
> To avoid namespace conflict(as it is public API) use rte_

These are internal functions used by drivers, not applications, and the
ethdev/regexdev _core headers also define them without the rte_ prefix, so I think
it should remain as it is.

>
>
>> +
>> +/**
>> + * The data structure associated with each DMA device.
>> + */
>> +struct rte_dmadev {
>> +	/**< Enqueue a copy operation onto the DMA device. */
>> +	dmadev_copy_t copy;
>> +	/**< Enqueue a scatter list copy operation onto the DMA device. */
>> +	dmadev_copy_sg_t copy_sg;
>> +	/**< Enqueue a fill operation onto the DMA device. */
>> +	dmadev_fill_t fill;
>> +	/**< Enqueue a scatter list fill operation onto the DMA device. */
>> +	dmadev_fill_sg_t fill_sg;
>> +	/**< Add a fence to force ordering between operations. */
>> +	dmadev_fence_t fence;
>> +	/**< Trigger hardware to begin performing enqueued operations. */
>> +	dmadev_perform_t perform;
>> +	/**< Returns the number of operations that successful completed. */
>> +	dmadev_completed_t completed;
>> +	/**< Returns the number of operations that failed to complete. */
>> +	dmadev_completed_fails_t completed_fails;
>
> We need to limit fastpath items in 1 CL

Yes, currently there are 8 callbacks, which exactly fill one cache line.
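
For reference, the one-cache-line claim can be checked with a small sketch
(illustrative stand-in types, not the patch itself): on an LP64 platform, 8 function
pointers of 8 bytes each occupy exactly 64 bytes:

    typedef void (*cb_t)(void);    /* stand-in for the dmadev_*_t types */

    struct fastpath_cbs {
        cb_t copy;
        cb_t copy_sg;
        cb_t fill;
        cb_t fill_sg;
        cb_t fence;
        cb_t perform;
        cb_t completed;
        cb_t completed_fails;
    };

    /* 8 * 8 = 64 bytes, i.e. one cache line on most 64-bit platforms. */
    _Static_assert(sizeof(struct fastpath_cbs) == 64,
                   "fast-path callbacks must fit in one cache line");

So adding a ninth fast-path callback would spill into a second cache line.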