From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id B1286A0A0F; Mon, 5 Jul 2021 17:56:04 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 41B6B41184; Mon, 5 Jul 2021 17:56:04 +0200 (CEST) Received: from mail-il1-f179.google.com (mail-il1-f179.google.com [209.85.166.179]) by mails.dpdk.org (Postfix) with ESMTP id 809454068C for ; Mon, 5 Jul 2021 17:56:02 +0200 (CEST) Received: by mail-il1-f179.google.com with SMTP id e13so1340511ilc.1 for ; Mon, 05 Jul 2021 08:56:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=UtkL5glptt7u/1qd85RMV6pXpl4gCQ4xKHqcN2XzZHU=; b=fdowqAnKyD8D70aNv84KRac3fgcIknd5tJC5Bj4dJhTcwn7EbH5PTLeNsI2IPFyarH qlqGRa4pkAPbALQ/hFZx+HyF2oNKHI2HCcjNjwvJiA8eFtNgvRVWXh7ArCAsf/jGyt5n GsNevhh2fSHV2utHU9lpYCLaahCOoHYVm8OwuOIyP/HwsCM49x2VEO/Vi/tBwEZUm5Co UcVKpLC4/BiyH7PePLI+53jaEN2/9enMkdXfH+Z8dpQ4FL6C5kZshxSsxFzWcpAm6eqC SpFOEliCXjFe1bedLQ8uIbeEjzCIio0iITv4Cz8IEZy7D2hu5Hx6bFtkGpIbCHGIwqMB 16CA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=UtkL5glptt7u/1qd85RMV6pXpl4gCQ4xKHqcN2XzZHU=; b=YMakrlIYRHyooK+VM9ZwaZveroxeVYCaG5C5ZvI+qniUMsG+OHozEaxPI9NTTzZYJS h/eJew9YPs+ZTbrQXqveLhNEGFo2vs4qdlEct+DUsOy2+YryNDrTv94mJXFAacjIdvTQ hHAu79gEYABNp8O8azfyQRXbrpD27nETXNfmAVXTeE0cNUKG7wCJey9xzst5EJgZCx3f TYoUxUwT69QBkBBIzE2HJsDlom+bzBE/KRRPOR4+gQLIwDYRiFWVLV7J7rHsnjtM+cD8 I0fl6iPAsWquu0Q1MSDAJOmeeK3ewIzZieup95qN4taQiwNSNaS7K8xtS+c8Y41FY3Mq 0e/A== X-Gm-Message-State: AOAM532LNdagsOwGELIB7/zdT71jamVcPwvIkl81EU4w0RpEd8MZxYmU hBG+ky1/N7aZI6cyGqNxXbwWO5Bc4rXMzbjxWVw= X-Google-Smtp-Source: ABdhPJw1kUfiO6d8PzT8kvZH+TPXbC6ZLKCv0qgSERm0KI88ZwDkWnnPcyGR0sn0nLqtQAoElkiKXIHsy2jozPDhov4= X-Received: by 2002:a92:bf0b:: with SMTP id z11mr11639438ilh.60.1625500561435; Mon, 05 Jul 2021 08:56:01 -0700 (PDT) MIME-Version: 1.0 References: <1625231891-2963-1-git-send-email-fengchengwen@huawei.com> In-Reply-To: From: Jerin Jacob Date: Mon, 5 Jul 2021 21:25:34 +0530 Message-ID: To: Bruce Richardson Cc: Chengwen Feng , Thomas Monjalon , Ferruh Yigit , Jerin Jacob , dpdk-dev , =?UTF-8?Q?Morten_Br=C3=B8rup?= , Nipun Gupta , Hemant Agrawal , Maxime Coquelin , Honnappa Nagarahalli , David Marchand , Satananda Burla , Prasun Kapoor , "Ananyev, Konstantin" , liangma@liangbit.com, Radha Mohan Chintakuntla Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Subject: Re: [dpdk-dev] [PATCH] dmadev: introduce DMA device library X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" need On Mon, Jul 5, 2021 at 4:22 PM Bruce Richardson wrote: > > On Sun, Jul 04, 2021 at 03:00:30PM +0530, Jerin Jacob wrote: > > On Fri, Jul 2, 2021 at 6:51 PM Chengwen Feng = wrote: > > > > > > This patch introduces 'dmadevice' which is a generic type of DMA > > > device. > > > > > > The APIs of dmadev library exposes some generic operations which can > > > enable configuration and I/O with the DMA devices. > > > > > > Signed-off-by: Chengwen Feng > > > > Thanks for v1. > > > > I would suggest finalizing lib/dmadev/rte_dmadev.h before doing the > > implementation so that you don't need > > to waste time on rewoking the implementation. > > > > I actually like having the .c file available too. Before we lock down the > .h file and the API, I want to verify the performance of our drivers with > the implementation, and having a working .c file is obviously necessary f= or > that. So I appreciate having it as part of the RFC. Ack. > > > Comments inline. > > > > > --- > > > > + * > > > + * The DMA framework is built on the following abstraction model: > > > + * > > > + * ------------ ------------ > > > + * |virt-queue| |virt-queue| > > > + * ------------ ------------ > > > + * \ / > > > + * \ / > > > + * \ / > > > + * ------------ ------------ > > > + * | HW-queue | | HW-queue | > > > + * ------------ ------------ > > > + * \ / > > > + * \ / > > > + * \ / > > > + * ---------- > > > + * | dmadev | > > > + * ---------- > > > > Continuing the discussion with @Morten Br=C3=B8rup , I think, we need t= o > > finalize the model. > > > > +1 and the terminology with regards to queues and channels. With our ioat > hardware, each HW queue was called a channel for instance. Looks like <> can cover all the use cases, if the HW has more than 1 queues it can be exposed as separate dmadev dev. > > > > + * a) The DMA operation request must be submitted to the virt queu= e, virt > > > + * queues must be created based on HW queues, the DMA device co= uld have > > > + * multiple HW queues. > > > + * b) The virt queues on the same HW-queue could represent differe= nt contexts, > > > + * e.g. user could create virt-queue-0 on HW-queue-0 for mem-to= -mem > > > + * transfer scenario, and create virt-queue-1 on the same HW-qu= eue for > > > + * mem-to-dev transfer scenario. > > > + * NOTE: user could also create multiple virt queues for mem-to-me= m transfer > > > + * scenario as long as the corresponding driver supports. > > > + * > > > + * The control plane APIs include configure/queue_setup/queue_releas= e/start/ > > > + * stop/reset/close, in order to start device work, the call sequenc= e must be > > > + * as follows: > > > + * - rte_dmadev_configure() > > > + * - rte_dmadev_queue_setup() > > > + * - rte_dmadev_start() > > > > Please add reconfigure behaviour etc, Please check the > > lib/regexdev/rte_regexdev.h > > introduction. I have added similar ones so you could reuse as much as p= ossible. > > > > > > > + * The dataplane APIs include two parts: > > > + * a) The first part is the submission of operation requests: > > > + * - rte_dmadev_copy() > > > + * - rte_dmadev_copy_sg() - scatter-gather form of copy > > > + * - rte_dmadev_fill() > > > + * - rte_dmadev_fill_sg() - scatter-gather form of fill > > > + * - rte_dmadev_fence() - add a fence force ordering betwee= n operations > > > + * - rte_dmadev_perform() - issue doorbell to hardware > > > + * These APIs could work with different virt queues which have = different > > > + * contexts. > > > + * The first four APIs are used to submit the operation request= to the virt > > > + * queue, if the submission is successful, a cookie (as type > > > + * 'dma_cookie_t') is returned, otherwise a negative number is = returned. > > > + * b) The second part is to obtain the result of requests: > > > + * - rte_dmadev_completed() > > > + * - return the number of operation requests completed su= ccessfully. > > > + * - rte_dmadev_completed_fails() > > > + * - return the number of operation requests failed to co= mplete. > > > + * > > > + * The misc APIs include info_get/queue_info_get/stats/xstats/selfte= st, provide > > > + * information query and self-test capabilities. > > > + * > > > + * About the dataplane APIs MT-safe, there are two dimensions: > > > + * a) For one virt queue, the submit/completion API could be MT-sa= fe, > > > + * e.g. one thread do submit operation, another thread do compl= etion > > > + * operation. > > > + * If driver support it, then declare RTE_DMA_DEV_CAPA_MT_VQ. > > > + * If driver don't support it, it's up to the application to gu= arantee > > > + * MT-safe. > > > + * b) For multiple virt queues on the same HW queue, e.g. one thre= ad do > > > + * operation on virt-queue-0, another thread do operation on vi= rt-queue-1. > > > + * If driver support it, then declare RTE_DMA_DEV_CAPA_MT_MVQ. > > > + * If driver don't support it, it's up to the application to gu= arantee > > > + * MT-safe. > > > > From an application PoV it may not be good to write portable > > applications. Please check > > latest thread with @Morten Br=C3=B8rup > > > > > + */ > > > + > > > +#ifdef __cplusplus > > > +extern "C" { > > > +#endif > > > + > > > +#include > > > +#include > > > +#include > > > +#include > > > > Sort in alphabetical order. > > > > > + > > > +/** > > > + * dma_cookie_t - an opaque DMA cookie > > > > Since we are defining the behaviour is not opaque any more. > > I think, it is better to call ring_idx or so. > > > > +1 for ring index. We don't need a separate type for it though, just > document the index as an unsigned return value. > > > > > > +#define RTE_DMA_DEV_CAPA_MT_MVQ (1ull << 11) /**< Support MT-safe of= multiple virt queues */ > > > > Please lot of @see for all symbols where it is being used. So that one > > can understand the full scope of > > symbols. See below example. > > > > #define RTE_REGEXDEV_CAPA_RUNTIME_COMPILATION_F (1ULL << 0) > > /**< RegEx device does support compiling the rules at runtime unlike > > * loading only the pre-built rule database using > > * struct rte_regexdev_config::rule_db in rte_regexdev_configure() > > * > > * @see struct rte_regexdev_config::rule_db, rte_regexdev_configure() > > * @see struct rte_regexdev_info::regexdev_capa > > */ > > > > > + * > > > + * If dma_cookie_t is >=3D0 it's a DMA operation request cookie, <0 = it's a error > > > + * code. > > > + * When using cookies, comply with the following rules: > > > + * a) Cookies for each virtual queue are independent. > > > + * b) For a virt queue, the cookie are monotonically incremented, wh= en it reach > > > + * the INT_MAX, it wraps back to zero. > > I disagree with the INT_MAX (or INT32_MAX) value here. If we use that > value, it means that we cannot use implicit wrap-around inside the CPU an= d > have to check for the INT_MAX value. Better to: > 1. Specify that it wraps at UINT16_MAX which allows us to just use a > uint16_t internally and wrap-around automatically, or: > 2. Specify that it wraps at a power-of-2 value >=3D UINT16_MAX, giving > drivers the flexibility at what value to wrap around. I think, (2) better than 1. I think, even better to wrap around the number = of descriptors configured in dev_configure()(We cake make this as the power of= 2), > > > > + * c) The initial cookie of a virt queue is zero, after the device i= s stopped or > > > + * reset, the virt queue's cookie needs to be reset to zero. > > > + * Example: > > > + * step-1: start one dmadev > > > + * step-2: enqueue a copy operation, the cookie return is 0 > > > + * step-3: enqueue a copy operation again, the cookie return is 1 > > > + * ... > > > + * step-101: stop the dmadev > > > + * step-102: start the dmadev > > > + * step-103: enqueue a copy operation, the cookie return is 0 > > > + * ... > > > + */ > > > > Good explanation. > > > > > +typedef int32_t dma_cookie_t; > > > > As I mentioned before, I'd just remove this, and use regular int types, > with "ring_idx" as the name. +1 > > > > > > + > > > +/** > > > + * dma_scatterlist - can hold scatter DMA operation request > > > + */ > > > +struct dma_scatterlist { > > > > I prefer to change scatterlist -> sg > > i.e rte_dma_sg > > > > > + void *src; > > > + void *dst; > > > + uint32_t length; > > > +}; > > > + > > > > > + > > > +/** > > > + * A structure used to retrieve the contextual information of > > > + * an DMA device > > > + */ > > > +struct rte_dmadev_info { > > > + /** > > > + * Fields filled by framewok > > > > typo. > > > > > + */ > > > + struct rte_device *device; /**< Generic Device information */ > > > + const char *driver_name; /**< Device driver name */ > > > + int socket_id; /**< Socket ID where memory is allocated */ > > > + > > > + /** > > > + * Specification fields filled by driver > > > + */ > > > + uint64_t dev_capa; /**< Device capabilities (RTE_DMA_DEV_CAPA= _) */ > > > + uint16_t max_hw_queues; /**< Maximum number of HW queues. */ > > > + uint16_t max_vqs_per_hw_queue; > > > + /**< Maximum number of virt queues to allocate per HW queue *= / > > > + uint16_t max_desc; > > > + /**< Maximum allowed number of virt queue descriptors */ > > > + uint16_t min_desc; > > > + /**< Minimum allowed number of virt queue descriptors */ > > > > Please add max_nb_segs. i.e maximum number of segments supported. > > > > > + > > > + /** > > > + * Status fields filled by driver > > > + */ > > > + uint16_t nb_hw_queues; /**< Number of HW queues configured */ > > > + uint16_t nb_vqs; /**< Number of virt queues configured */ > > > +}; > > > + i > > > + > > > +/** > > > + * dma_address_type > > > + */ > > > +enum dma_address_type { > > > + DMA_ADDRESS_TYPE_IOVA, /**< Use IOVA as dma address */ > > > + DMA_ADDRESS_TYPE_VA, /**< Use VA as dma address */ > > > +}; > > > + > > > +/** > > > + * A structure used to configure a DMA device. > > > + */ > > > +struct rte_dmadev_conf { > > > + enum dma_address_type addr_type; /**< Address type to used */ > > > > I think, there are 3 kinds of limitations/capabilities. > > > > When the system is configured as IOVA as VA > > 1) Device supports any VA address like memory from rte_malloc(), > > rte_memzone(), malloc, stack memory > > 2) Device support only VA address from rte_malloc(), rte_memzone() i.e > > memory backed by hugepage and added to DMA map. > > > > When the system is configured as IOVA as PA > > 1) Devices support only PA addresses . > > > > IMO, Above needs to be advertised as capability and application needs > > to align with that > > and I dont think application requests the driver to work in any of the = modes. > > > > > > I don't think we need this level of detail for addressing capabilities. > Unless I'm missing something, the hardware should behave exactly as other > hardware does taking in iova's. If the user wants to check whether virtu= al > addresses to pinned memory can be used directly, the user can call > "rte_eal_iova_mode". We can't have a situation where some hardware uses o= ne > type of addresses and another hardware the other. > > Therefore, the only additional addressing capability we should need to > report is that the hardware can use SVM/SVA and use virtual addresses not > in hugepage memory. +1. > > > > > > + uint16_t nb_hw_queues; /**< Number of HW-queues enable to use= */ > > > + uint16_t max_vqs; /**< Maximum number of virt queues to use *= / > > > > You need to what is max value allowed etc i.e it is based on > > info_get() and mention the field > > in info structure > > > > > > > + > > > +/** > > > + * dma_transfer_direction > > > + */ > > > +enum dma_transfer_direction { > > > > rte_dma_transter_direction > > > > > + DMA_MEM_TO_MEM, > > > + DMA_MEM_TO_DEV, > > > + DMA_DEV_TO_MEM, > > > + DMA_DEV_TO_DEV, > > > +}; > > > + > > > +/** > > > + * A structure used to configure a DMA virt queue. > > > + */ > > > +struct rte_dmadev_queue_conf { > > > + enum dma_transfer_direction direction; > > > > > > > + /**< Associated transfer direction */ > > > + uint16_t hw_queue_id; /**< The HW queue on which to create vi= rt queue */ > > > + uint16_t nb_desc; /**< Number of descriptor for this virt que= ue */ > > > + uint64_t dev_flags; /**< Device specific flags */ > > > > Use of this? Need more comments on this. > > Since it is in slowpath, We can have non opaque names here based on > > each driver capability. > > > > > > > + void *dev_ctx; /**< Device specific context */ > > > > Use of this ? Need more comment ont this. > > > > I think this should be dropped. We should not have any opaque > device-specific info in these structs, rather if a particular device need= s > parameters we should call them out. Drivers for which it's not relevant c= an > ignore them (and report same in capability if necessary). Since this is n= ot > a dataplane API, we aren't concerned too much about perf and can size the > struct appropriately. > > > > > Please add some good amount of reserved bits and have API to init this > > structure for future ABI stability, say rte_dmadev_queue_config_init() > > or so. > > > > I don't think that is necessary. Since the config struct is used only as > parameter to the config function, any changes to it can be managed by > versioning that single function. Padding would only be necessary if we ha= d > an array of these config structs somewhere. OK. For some reason, the versioning API looks ugly to me in code instead of kee= ping some rsvd fields look cool to me with init function. But I agree. function versioning works in this case. No need to find other = API if tt is not general DPDK API practice. In other libraries, I have seen such _init or function that can use for this as well as filling default value in some cases implementation values is not zero). So that application can avoid memset for param structure. Added rte_event_queue_default_conf_get() in eventdev spec for this. No strong opinion on this. > > > > > > + > > > +/** > > > + * A structure used to retrieve information of a DMA virt queue. > > > + */ > > > +struct rte_dmadev_queue_info { > > > + enum dma_transfer_direction direction; > > > > A queue may support all directions so I think it should be a bitfield. > > > > > + /**< Associated transfer direction */ > > > + uint16_t hw_queue_id; /**< The HW queue on which to create vi= rt queue */ > > > + uint16_t nb_desc; /**< Number of descriptor for this virt que= ue */ > > > + uint64_t dev_flags; /**< Device specific flags */ > > > +}; > > > + > > > > > +__rte_experimental > > > +static inline dma_cookie_t > > > +rte_dmadev_copy_sg(uint16_t dev_id, uint16_t vq_id, > > > + const struct dma_scatterlist *sg, > > > + uint32_t sg_len, uint64_t flags) > > > > I would like to change this as: > > rte_dmadev_copy_sg(uint16_t dev_id, uint16_t vq_id, const struct > > rte_dma_sg *src, uint32_t nb_src, > > const struct rte_dma_sg *dst, uint32_t nb_dst) or so allow the use case= like > > src 30 MB copy can be splitted as written as 1 MB x 30 dst. > > > > > > > > > +{ > > > + struct rte_dmadev *dev =3D &rte_dmadevices[dev_id]; > > > + return (*dev->copy_sg)(dev, vq_id, sg, sg_len, flags); > > > +} > > > + > > > +/** > > > + * @warning > > > + * @b EXPERIMENTAL: this API may change without prior notice. > > > + * > > > + * Enqueue a fill operation onto the DMA virt queue > > > + * > > > + * This queues up a fill operation to be performed by hardware, but = does not > > > + * trigger hardware to begin that operation. > > > + * > > > + * @param dev_id > > > + * The identifier of the device. > > > + * @param vq_id > > > + * The identifier of virt queue. > > > + * @param pattern > > > + * The pattern to populate the destination buffer with. > > > + * @param dst > > > + * The address of the destination buffer. > > > + * @param length > > > + * The length of the destination buffer. > > > + * @param flags > > > + * An opaque flags for this operation. > > > > PLEASE REMOVE opaque stuff from fastpath it will be a pain for > > application writers as > > they need to write multiple combinations of fastpath. flags are OK, if > > we have a valid > > generic flag now to control the transfer behavior. > > > > +1. Flags need to be explicitly listed. If we don't have any flags for no= w, > we can specify that the value must be given as zero and it's for future > use. OK. > > > > > > +/** > > > + * @warning > > > + * @b EXPERIMENTAL: this API may change without prior notice. > > > + * > > > + * Add a fence to force ordering between operations > > > + * > > > + * This adds a fence to a sequence of operations to enforce ordering= , such that > > > + * all operations enqueued before the fence must be completed before= operations > > > + * after the fence. > > > + * NOTE: Since this fence may be added as a flag to the last operati= on enqueued, > > > + * this API may not function correctly when called immediately after= an > > > + * "rte_dmadev_perform" call i.e. before any new operations are enqu= eued. > > > + * > > > + * @param dev_id > > > + * The identifier of the device. > > > + * @param vq_id > > > + * The identifier of virt queue. > > > + * > > > + * @return > > > + * - =3D0: Successful add fence. > > > + * - <0: Failure to add fence. > > > + * > > > + * NOTE: The caller must ensure that the input parameter is valid an= d the > > > + * corresponding device supports the operation. > > > + */ > > > +__rte_experimental > > > +static inline int > > > +rte_dmadev_fence(uint16_t dev_id, uint16_t vq_id) > > > +{ > > > + struct rte_dmadev *dev =3D &rte_dmadevices[dev_id]; > > > + return (*dev->fence)(dev, vq_id); > > > +} > > > > Since HW submission is in a queue(FIFO) the ordering is always > > maintained. Right? > > Could you share more details and use case of fence() from > > driver/application PoV? > > > > There are different kinds of ordering to consider, ordering of completion= s > and the ordering of operations. While jobs are reported as completed to t= he > user in order, for performance hardware, may overlap individual jobs with= in > a burst (or even across bursts). Therefore, we need a fence operation to > inform hardware that one job should not be started until the other has > fully completed. Got it. In order to save space if first CL size for fastpath(Saving 8B for the pointer) and to avoid function overhead, Can we use one bit of flags of op function to enable the fence? > > > > > > + > > > +/** > > > + * @warning > > > + * @b EXPERIMENTAL: this API may change without prior notice. > > > + * > > > + * Trigger hardware to begin performing enqueued operations > > > + * > > > + * This API is used to write the "doorbell" to the hardware to trigg= er it > > > + * to begin the operations previously enqueued by rte_dmadev_copy/fi= ll() > > > + * > > > + * @param dev_id > > > + * The identifier of the device. > > > + * @param vq_id > > > + * The identifier of virt queue. > > > + * > > > + * @return > > > + * - =3D0: Successful trigger hardware. > > > + * - <0: Failure to trigger hardware. > > > + * > > > + * NOTE: The caller must ensure that the input parameter is valid an= d the > > > + * corresponding device supports the operation. > > > + */ > > > +__rte_experimental > > > +static inline int > > > +rte_dmadev_perform(uint16_t dev_id, uint16_t vq_id) > > > +{ > > > + struct rte_dmadev *dev =3D &rte_dmadevices[dev_id]; > > > + return (*dev->perform)(dev, vq_id); > > > +} > > > > Since we have additional function call overhead in all the > > applications for this scheme, I would like to understand > > the use of doing this way vs enq does the doorbell implicitly from > > driver/application PoV? > > > > In our benchmarks it's just faster. When we tested it, the overhead of th= e > function calls was noticably less than the cost of building up the > parameter array(s) for passing the jobs in as a burst. [We don't see this > cost with things like NIC I/O since DPDK tends to already have the mbuf > fully populated before the TX call anyway.] OK. I agree with stack population. My question was more on doing implicit doorbell update enq. Is doorbell wri= te costly in other HW compare to a function call? In our HW, it is just write = of the number of instructions written in a register. Also, we need to again access the internal PMD memory structure to find where to write etc if it is a separate function. > > > > > > + > > > +/** > > > + * @warning > > > + * @b EXPERIMENTAL: this API may change without prior notice. > > > + * > > > + * Returns the number of operations that have been successful comple= ted. > > > + * > > > + * @param dev_id > > > + * The identifier of the device. > > > + * @param vq_id > > > + * The identifier of virt queue. > > > + * @param nb_cpls > > > + * The maximum number of completed operations that can be processe= d. > > > + * @param[out] cookie > > > + * The last completed operation's cookie. > > > + * @param[out] has_error > > > + * Indicates if there are transfer error. > > > + * > > > + * @return > > > + * The number of operations that successful completed. > > > > successfully > > > > > + * > > > + * NOTE: The caller must ensure that the input parameter is valid an= d the > > > + * corresponding device supports the operation. > > > + */ > > > +__rte_experimental > > > +static inline uint16_t > > > +rte_dmadev_completed(uint16_t dev_id, uint16_t vq_id, const uint16_t= nb_cpls, > > > + dma_cookie_t *cookie, bool *has_error) > > > +{ > > > + struct rte_dmadev *dev =3D &rte_dmadevices[dev_id]; > > > + has_error =3D false; > > > + return (*dev->completed)(dev, vq_id, nb_cpls, cookie, has_err= or); > > > > It may be better to have cookie/ring_idx as third argument. > > > > No strong opinions here, but having it as in the code above means all > input parameters come before all output, which makes sense to me. +1 > > > > +} > > > + > > > +/** > > > + * @warning > > > + * @b EXPERIMENTAL: this API may change without prior notice. > > > + * > > > + * Returns the number of operations that failed to complete. > > > + * NOTE: This API was used when rte_dmadev_completed has_error was s= et. > > > + * > > > + * @param dev_id > > > + * The identifier of the device. > > > + * @param vq_id > > > + * The identifier of virt queue. > > (> + * @param nb_status > > > + * Indicates the size of status array. > > > + * @param[out] status > > > + * The error code of operations that failed to complete. > > > + * @param[out] cookie > > > + * The last failed completed operation's cookie. > > > + * > > > + * @return > > > + * The number of operations that failed to complete. > > > + * > > > + * NOTE: The caller must ensure that the input parameter is valid an= d the > > > + * corresponding device supports the operation. > > > + */ > > > +__rte_experimental > > > +static inline uint16_t > > > +rte_dmadev_completed_fails(uint16_t dev_id, uint16_t vq_id, > > > + const uint16_t nb_status, uint32_t *status= , > > > + dma_cookie_t *cookie) > > > > IMO, it is better to move cookie/rind_idx at 3. > > Why it would return any array of errors? since it called after > > rte_dmadev_completed() has > > has_error. Is it better to change > > > > rte_dmadev_error_status((uint16_t dev_id, uint16_t vq_id, dma_cookie_t > > *cookie, uint32_t *status) > > > > I also think, we may need to set status as bitmask and enumerate all > > the combination of error codes > > of all the driver and return string from driver existing rte_flow_error > > > > See > > struct rte_flow_error { > > enum rte_flow_error_type type; /**< Cause field and error types= . */ > > const void *cause; /**< Object responsible for the error. */ > > const char *message; /**< Human-readable error message. */ > > }; > > > > I think we need a multi-return value API here, as we may add operations i= n > future which have non-error status values to return. The obvious case is > DMA engines which support "compare" operations. In that case a successful > compare (as in there were no DMA or HW errors) can return "equal" or > "not-equal" as statuses. For general "copy" operations, the faster > completion op can be used to just return successful values (and only call > this status version on error), while apps using those compare ops or a > mixture of copy and compare ops, would always use the slower one that > returns status values for each and every op.. > > The ioat APIs used 32-bit integer values for this status array so as to > allow e.g. 16-bits for error code and 16-bits for future status values. F= or > most operations there should be a fairly small set of things that can go > wrong, i.e. bad source address, bad destination address or invalid length= . > Within that we may have a couple of specifics for why an address is bad, > but even so I don't think we need to start having multiple bit > combinations. OK. What is the purpose of errors status? Is it for application printing it= or Does the application need to take any action based on specific error reques= ts? If the former is scope, then we need to define the standard enum value for the error right? ie. uint32_t *status needs to change to enum rte_dma_error or so. > > > > +{ > > > + struct rte_dmadev *dev =3D &rte_dmadevices[dev_id]; > > > + return (*dev->completed_fails)(dev, vq_id, nb_status, status,= cookie); > > > +} > > > + > > > +struct rte_dmadev_stats { > > > + uint64_t enqueue_fail_count; > > > + /**< Conut of all operations which failed enqueued */ > > > + uint64_t enqueued_count; > > > + /**< Count of all operations which successful enqueued */ > > > + uint64_t completed_fail_count; > > > + /**< Count of all operations which failed to complete */ > > > + uint64_t completed_count; > > > + /**< Count of all operations which successful complete */ > > > +}; > > > > We need to have capability API to tell which items are > > updated/supported by the driver. > > > > I also would remove the enqueue fail counts, since they are better counte= d > by the app. If a driver reports 20,000 failures we have no way of knowing > if that is 20,000 unique operations which failed to enqueue or a single > operation which failed to enqueue 20,000 times but succeeded on attempt > 20,001. > > > > > > diff --git a/lib/dmadev/rte_dmadev_core.h b/lib/dmadev/rte_dmadev_cor= e.h > > > new file mode 100644 > > > index 0000000..a3afea2 > > > --- /dev/null > > > +++ b/lib/dmadev/rte_dmadev_core.h > > > @@ -0,0 +1,98 @@ > > > +/* SPDX-License-Identifier: BSD-3-Clause > > > + * Copyright 2021 HiSilicon Limited. > > > + */ > > > + > > > +#ifndef _RTE_DMADEV_CORE_H_ > > > +#define _RTE_DMADEV_CORE_H_ > > > + > > > +/** > > > + * @file > > > + * > > > + * RTE DMA Device internal header. > > > + * > > > + * This header contains internal data types. But they are still part= of the > > > + * public API because they are used by inline public functions. > > > + */ > > > + > > > +struct rte_dmadev; > > > + > > > +typedef dma_cookie_t (*dmadev_copy_t)(struct rte_dmadev *dev, uint16= _t vq_id, > > > + void *src, void *dst, > > > + uint32_t length, uint64_t flags= ); > > > +/**< @internal Function used to enqueue a copy operation. */ > > > > To avoid namespace conflict(as it is public API) use rte_ > > > > > > > + > > > +/** > > > + * The data structure associated with each DMA device. > > > + */ > > > +struct rte_dmadev { > > > + /**< Enqueue a copy operation onto the DMA device. */ > > > + dmadev_copy_t copy; > > > + /**< Enqueue a scatter list copy operation onto the DMA devic= e. */ > > > + dmadev_copy_sg_t copy_sg; > > > + /**< Enqueue a fill operation onto the DMA device. */ > > > + dmadev_fill_t fill; > > > + /**< Enqueue a scatter list fill operation onto the DMA devic= e. */ > > > + dmadev_fill_sg_t fill_sg; > > > + /**< Add a fence to force ordering between operations. */ > > > + dmadev_fence_t fence; > > > + /**< Trigger hardware to begin performing enqueued operations= . */ > > > + dmadev_perform_t perform; > > > + /**< Returns the number of operations that successful complet= ed. */ > > > + dmadev_completed_t completed; > > > + /**< Returns the number of operations that failed to complete= . */ > > > + dmadev_completed_fails_t completed_fails; > > > > We need to limit fastpath items in 1 CL > > > > I don't think that is going to be possible. I also would like to see > numbers to check if we benefit much from having these fastpath ops separa= te > from the regular ops. > > > > + > > > + void *dev_private; /**< PMD-specific private data */ > > > + const struct rte_dmadev_ops *dev_ops; /**< Functions exported= by PMD */ > > > + > > > + uint16_t dev_id; /**< Device ID for this instance */ > > > + int socket_id; /**< Socket ID where memory is allocated */ > > > + struct rte_device *device; > > > + /**< Device info. supplied during device initialization */ > > > + const char *driver_name; /**< Driver info. supplied by probin= g */ > > > + char name[RTE_DMADEV_NAME_MAX_LEN]; /**< Device name */ > > > + > > > + RTE_STD_C11 > > > + uint8_t attached : 1; /**< Flag indicating the device is atta= ched */ > > > + uint8_t started : 1; /**< Device state: STARTED(1)/STOPPED(0)= */ > > > > Add a couple of reserved fields for future ABI stability. > > > > > + > > > +} __rte_cache_aligned; > > > + > > > +extern struct rte_dmadev rte_dmadevices[]; > > > +