DPDK patches and discussions
* [PATCH v1 0/3] Add support for inter-domain DMA operations
@ 2023-08-11 16:14 Anatoly Burakov
  2023-08-11 16:14 ` [PATCH v1 1/3] dmadev: add inter-domain operations Anatoly Burakov
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Anatoly Burakov @ 2023-08-11 16:14 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson

This patchset adds inter-domain DMA operations and implements driver support
for them in the Intel(R) IDXD driver.

Inter-domain DMA operations are similar to regular DMA operations, except that
the source and/or destination addresses are in the virtual address space of
another process. In this patchset, the DMA device API is extended to support
two new data plane operations: inter-domain copy and inter-domain fill. No
control plane API is provided in dmadev to set up inter-domain communication
(see below for more info).

The DMA device API is extended with inter-domain operations, along with a
corresponding capability flag. Two new op flags are also added, allowing
inter-domain operations to select whether the source and/or destination address
is in the address space of another process. Finally, the `rte_dma_info` struct
is extended with a "controller ID" value (set to -1 by default for all drivers
that don't implement it), representing a hardware DMA controller ID. This is
needed because, in the current IDXD implementation, the IDPTE (Inter-Domain
Permission Table Entry) table is global to each device: even though there may
be multiple dmadev devices used by the IDXD driver, they all share their IDPTE
entries if they belong to the same hardware controller, so some value
indicating where each dmadev belongs was needed.

Similarly, the IDXD driver is extended to support the new dmadev API, as well
as to use the new "controller ID" value. The IDXD driver also gains a private
API for control-plane operations related to creating and attaching to memory
regions shared between processes.

In the current implementation, control-plane operations are provided as a
private API, instead of extending the DMA device API. This is because,
technically, only the submitter (a process which uses the IDXD driver to
perform inter-domain operations) has to have a DMA device available, while the
owner (a process which shares its memory regions with the submitter) does not
have to manage a DMA device to give another process access to its memory.
Another consideration is that this API is currently Linux*-specific and relies
on passing file descriptors over IPC; if implemented on other vendors'
hardware, this process may not map to the same scheme.
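
To illustrate the intended split between owner and submitter, here is a
minimal usage sketch based on the APIs proposed in this patchset (the function
names, variables and absence of error handling are purely illustrative):

#include <stdint.h>

#include <rte_dmadev.h>
#include <rte_idxd_inter_dom.h> /* private IDXD API proposed in patch 3 */

/* Process A (owner): expose a memory region; the returned fd is then sent to
 * the submitter over IPC (e.g. a Unix domain socket).
 */
static int
owner_share_region(int controller_id, void *base, unsigned int len)
{
        /* allow the remote process to write into [base, base + len) */
        return rte_idxd_window_create(controller_id, base, len,
                        RTE_IDXD_WIN_FLAGS_PROT_WRITE |
                        RTE_IDXD_WIN_FLAGS_WIN_CHECK);
}

/* Process B (submitter): attach to the received fd and copy local memory into
 * the owner's address space ('dst' is an address valid in process A).
 */
static int
submitter_copy(int16_t dev_id, uint16_t vchan, int controller_id,
                int idpte_fd, rte_iova_t src, rte_iova_t dst, uint32_t len)
{
        uint16_t dst_handle;

        if (rte_idxd_window_attach(controller_id, idpte_fd, &dst_handle) < 0)
                return -1;
        /* destination handle is in use, so set RTE_DMA_OP_FLAG_DST_HANDLE */
        return rte_dma_copy_inter_dom(dev_id, vchan, src, dst, len,
                        0 /* no source handle */, dst_handle,
                        RTE_DMA_OP_FLAG_DST_HANDLE | RTE_DMA_OP_FLAG_SUBMIT);
}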

NOTE: currently, no publicly released hardware is available to test this
feature or this patchset.

We are seeking community review on the following aspects of the patchset:
- The fact that the control-plane API is meant to be private to specific drivers
- The design of inter-domain data-plane operations API with respect to how
  "inter-domain handles" are being used and whether it's possible to make the
  API more vendor-neutral
- New data-plane ops in dmadev will extend the data plane struct into the second
  cache line - this should not be an issue, since non-inter-domain operations
  are still in the first cache line, and thus the existing fast path is not
  affected
- Any other feedback is welcome as well!

Anatoly Burakov (3):
  dmadev: add inter-domain operations
  dma/idxd: implement inter-domain operations
  dma/idxd: add API to create and attach to window

 doc/guides/dmadevs/idxd.rst           |  52 ++++++++
 doc/guides/prog_guide/dmadev.rst      |  22 ++++
 drivers/dma/idxd/idxd_bus.c           |  35 ++++++
 drivers/dma/idxd/idxd_common.c        | 123 ++++++++++++++++---
 drivers/dma/idxd/idxd_hw_defs.h       |  14 ++-
 drivers/dma/idxd/idxd_inter_dom.c     | 166 ++++++++++++++++++++++++++
 drivers/dma/idxd/idxd_internal.h      |   7 ++
 drivers/dma/idxd/meson.build          |   7 +-
 drivers/dma/idxd/rte_idxd_inter_dom.h |  79 ++++++++++++
 drivers/dma/idxd/version.map          |  11 ++
 lib/dmadev/rte_dmadev.c               |   2 +
 lib/dmadev/rte_dmadev.h               | 133 +++++++++++++++++++++
 lib/dmadev/rte_dmadev_core.h          |  12 ++
 13 files changed, 644 insertions(+), 19 deletions(-)
 create mode 100644 drivers/dma/idxd/idxd_inter_dom.c
 create mode 100644 drivers/dma/idxd/rte_idxd_inter_dom.h
 create mode 100644 drivers/dma/idxd/version.map

-- 
2.37.2



* [PATCH v1 1/3] dmadev: add inter-domain operations
  2023-08-11 16:14 [PATCH v1 0/3] Add support for inter-domain DMA operations Anatoly Burakov
@ 2023-08-11 16:14 ` Anatoly Burakov
  2023-08-18  8:08   ` [EXT] " Anoob Joseph
  2023-10-08  2:33   ` fengchengwen
  2023-08-11 16:14 ` [PATCH v1 2/3] dma/idxd: implement " Anatoly Burakov
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 12+ messages in thread
From: Anatoly Burakov @ 2023-08-11 16:14 UTC (permalink / raw)
  To: dev, Chengwen Feng, Kevin Laatz, Bruce Richardson; +Cc: Vladimir Medvedkin

Add a flag to indicate that a specific device supports inter-domain
operations, and add an API for inter-domain copy and fill.

An inter-domain operation is very similar to a regular DMA operation, except
that the source and/or destination addresses can be in a different process's
address space, as indicated by source and destination handle values. These
values are currently meant to be provided by drivers' private APIs.

This commit also adds a controller ID field into the DMA device API.
This is an arbitrary value that may not be implemented by hardware, but
it is meant to represent some kind of device hierarchy.
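
As a rough usage sketch (not part of this patch; names are illustrative), an
application could discover support and the controller ID as follows:

#include <rte_dmadev.h>

/* return non-zero if 'dev_id' supports inter-domain operations; also report
 * the hardware controller the device belongs to (-1 if the driver does not
 * implement controller IDs)
 */
static int
dev_supports_inter_dom(int16_t dev_id, int16_t *controller_id)
{
        struct rte_dma_info info;

        if (rte_dma_info_get(dev_id, &info) != 0)
                return 0;
        *controller_id = info.controller_id;
        return (info.dev_capa & RTE_DMA_CAPA_OPS_INTER_DOM) != 0;
}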

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 doc/guides/prog_guide/dmadev.rst |  18 +++++
 lib/dmadev/rte_dmadev.c          |   2 +
 lib/dmadev/rte_dmadev.h          | 133 +++++++++++++++++++++++++++++++
 lib/dmadev/rte_dmadev_core.h     |  12 +++
 4 files changed, 165 insertions(+)

diff --git a/doc/guides/prog_guide/dmadev.rst b/doc/guides/prog_guide/dmadev.rst
index 2aa26d33b8..e4e5196416 100644
--- a/doc/guides/prog_guide/dmadev.rst
+++ b/doc/guides/prog_guide/dmadev.rst
@@ -108,6 +108,24 @@ completed operations along with the status of each operation (filled into the
 completed operation's ``ring_idx`` which could help user track operations within
 their own application-defined rings.
 
+.. _dmadev_inter_dom:
+
+
+Inter-domain operations
+~~~~~~~~~~~~~~~~~~~~~~~
+
+For some devices, inter-domain DMA operations may be supported (indicated by
+`RTE_DMA_CAPA_OPS_INTER_DOM` flag being set in DMA device capabilities flag). An
+inter-domain operation (such as `rte_dma_copy_inter_dom`) is similar to regular
+DMA device operation, except the user also needs to specify source and
+destination handles, which the hardware will then use to get source and/or
+destination PASID to perform the operation. When `src_handle` value is set,
+`RTE_DMA_OP_FLAG_SRC_HANDLE` op flag must also be set. Similarly, when
+`dst_handle` value is set, `RTE_DMA_OP_FLAG_DST_HANDLE` op flag must be set.
+
+Currently, source and destination handles are opaque values the user has to get
+from private API's of those DMA device drivers that support the operation.
+
 
 Querying Device Statistics
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/lib/dmadev/rte_dmadev.c b/lib/dmadev/rte_dmadev.c
index 8c095e1f35..ff00612f84 100644
--- a/lib/dmadev/rte_dmadev.c
+++ b/lib/dmadev/rte_dmadev.c
@@ -425,6 +425,8 @@ rte_dma_info_get(int16_t dev_id, struct rte_dma_info *dev_info)
 	if (*dev->dev_ops->dev_info_get == NULL)
 		return -ENOTSUP;
 	memset(dev_info, 0, sizeof(struct rte_dma_info));
+	/* set to -1 by default, as other drivers may not implement this */
+	dev_info->controller_id = -1;
 	ret = (*dev->dev_ops->dev_info_get)(dev, dev_info,
 					    sizeof(struct rte_dma_info));
 	if (ret != 0)
diff --git a/lib/dmadev/rte_dmadev.h b/lib/dmadev/rte_dmadev.h
index e61d71959e..1cad36f0b6 100644
--- a/lib/dmadev/rte_dmadev.h
+++ b/lib/dmadev/rte_dmadev.h
@@ -278,6 +278,8 @@ int16_t rte_dma_next_dev(int16_t start_dev_id);
 #define RTE_DMA_CAPA_OPS_COPY_SG	RTE_BIT64(33)
 /** Support fill operation. */
 #define RTE_DMA_CAPA_OPS_FILL		RTE_BIT64(34)
+/** Support inter-domain operation. */
+#define RTE_DMA_CAPA_OPS_INTER_DOM	RTE_BIT64(48)
 /**@}*/
 
 /**
@@ -307,6 +309,8 @@ struct rte_dma_info {
 	int16_t numa_node;
 	/** Number of virtual DMA channel configured. */
 	uint16_t nb_vchans;
+	/** Controller ID, -1 if unknown. */
+	int16_t controller_id;
 };
 
 /**
@@ -819,6 +823,16 @@ struct rte_dma_sge {
  * capability bit for this, driver should not return error if this flag was set.
  */
 #define RTE_DMA_OP_FLAG_LLC     RTE_BIT64(2)
+/** Source handle is set.
+ * Used for inter-domain operations to indicate source handle value will be
+ * meaningful and can be used by hardware to learn source PASID.
+ */
+#define RTE_DMA_OP_FLAG_SRC_HANDLE RTE_BIT64(16)
+/** Destination handle is set.
+ * Used for inter-domain operations to indicate destination handle value will be
+ * meaningful and can be used by hardware to learn destination PASID.
+ */
+#define RTE_DMA_OP_FLAG_DST_HANDLE RTE_BIT64(17)
 /**@}*/
 
 /**
@@ -1141,6 +1155,125 @@ rte_dma_burst_capacity(int16_t dev_id, uint16_t vchan)
 	return (*obj->burst_capacity)(obj->dev_private, vchan);
 }
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Enqueue an inter-domain copy operation.
+ *
+ * This queues up an inter-domain copy operation to be performed by hardware, if
+ * the 'flags' parameter contains RTE_DMA_OP_FLAG_SUBMIT then trigger doorbell
+ * to begin this operation, otherwise do not trigger doorbell.
+ *
+ * The source and destination handle parameters are arbitrary opaque values,
+ * currently meant to be provided by private device driver API's. If the source
+ * handle value is meaningful, RTE_DMA_OP_FLAG_SRC_HANDLE flag must be set.
+ * Similarly, if the destination handle value is meaningful,
+ * RTE_DMA_OP_FLAG_DST_HANDLE flag must be set. Source and destination handle
+ * values are meant to provide information to the hardware about source and/or
+ * destination PASID for the inter-domain copy operation.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param vchan
+ *   The identifier of virtual DMA channel.
+ * @param src
+ *   The address of the source buffer (if `src_handle` is set, source address
+ *   will be in address space of process referred to by source handle).
+ * @param dst
+ *   The address of the destination buffer (if `dst_handle` is set, destination
+ *   address will be in address space of process referred to by destination
+ *   handle).
+ * @param length
+ *   The length of the data to be copied.
+ * @param src_handle
+ *   Source handle value (if used, RTE_DMA_OP_FLAG_SRC_HANDLE flag must be set).
+ * @param dst_handle
+ *   Destination handle value (if used, RTE_DMA_OP_FLAG_DST_HANDLE flag must be
+ *   set).
+ * @param flags
+ *   Flags for this operation.
+ * @return
+ *   - 0..UINT16_MAX: index of enqueued job.
+ *   - -ENOSPC: if no space left to enqueue.
+ *   - other values < 0 on failure.
+ */
+__rte_experimental
+static inline int
+rte_dma_copy_inter_dom(int16_t dev_id, uint16_t vchan, rte_iova_t src,
+		rte_iova_t dst, uint32_t length, uint16_t src_handle,
+		uint16_t dst_handle, uint64_t flags)
+{
+	struct rte_dma_fp_object *obj = &rte_dma_fp_objs[dev_id];
+
+#ifdef RTE_DMADEV_DEBUG
+	if (!rte_dma_is_valid(dev_id) || length == 0)
+		return -EINVAL;
+	if (*obj->copy_inter_dom == NULL)
+		return -ENOTSUP;
+#endif
+	return (*obj->copy_inter_dom)(obj->dev_private, vchan, src, dst, length,
+			src_handle, dst_handle, flags);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Enqueue an inter-domain fill operation.
+ *
+ * This queues up an inter-domain fill operation to be performed by hardware, if
+ * the 'flags' parameter contains RTE_DMA_OP_FLAG_SUBMIT then trigger doorbell
+ * to begin this operation, otherwise do not trigger doorbell.
+ *
+ * The source and destination handle parameters are arbitrary opaque values,
+ * currently meant to be provided by private device driver API's. If the source
+ * handle value is meaningful, RTE_DMA_OP_FLAG_SRC_HANDLE flag must be set.
+ * Similarly, if the destination handle value is meaningful,
+ * RTE_DMA_OP_FLAG_DST_HANDLE flag must be set. Source and destination handle
+ * values are meant to provide information to the hardware about source and/or
+ * destination PASID for the inter-domain fill operation.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param vchan
+ *   The identifier of virtual DMA channel.
+ * @param pattern
+ *   The pattern to populate the destination buffer with.
+ * @param dst
+ *   The address of the destination buffer.
+ * @param length
+ *   The length of the destination buffer.
+ * @param dst_handle
+ *   Destination handle value (if used, RTE_DMA_OP_FLAG_DST_HANDLE flag must be
+ *   set).
+ * @param flags
+ *   Flags for this operation.
+ * @return
+ *   - 0..UINT16_MAX: index of enqueued job.
+ *   - -ENOSPC: if no space left to enqueue.
+ *   - other values < 0 on failure.
+ */
+__rte_experimental
+static inline int
+rte_dma_fill_inter_dom(int16_t dev_id, uint16_t vchan, uint64_t pattern,
+		rte_iova_t dst, uint32_t length, uint16_t dst_handle,
+		uint64_t flags)
+{
+	struct rte_dma_fp_object *obj = &rte_dma_fp_objs[dev_id];
+
+#ifdef RTE_DMADEV_DEBUG
+	if (!rte_dma_is_valid(dev_id) || length == 0)
+		return -EINVAL;
+	if (*obj->fill_inter_dom == NULL)
+		return -ENOTSUP;
+#endif
+
+	return (*obj->fill_inter_dom)(obj->dev_private, vchan, pattern, dst,
+			length, dst_handle, flags);
+}
+
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/dmadev/rte_dmadev_core.h b/lib/dmadev/rte_dmadev_core.h
index 064785686f..b3a020f9de 100644
--- a/lib/dmadev/rte_dmadev_core.h
+++ b/lib/dmadev/rte_dmadev_core.h
@@ -50,6 +50,16 @@ typedef uint16_t (*rte_dma_completed_status_t)(void *dev_private,
 /** @internal Used to check the remaining space in descriptor ring. */
 typedef uint16_t (*rte_dma_burst_capacity_t)(const void *dev_private, uint16_t vchan);
 
+/** @internal Used to enqueue an inter-domain copy operation. */
+typedef int (*rte_dma_copy_inter_dom_t)(void *dev_private, uint16_t vchan,
+			rte_iova_t src, rte_iova_t dst,	unsigned int length,
+			uint16_t src_handle, uint16_t dst_handle, uint64_t flags);
+/** @internal Used to enqueue an inter-domain fill operation. */
+typedef int (*rte_dma_fill_inter_dom_t)(void *dev_private, uint16_t vchan,
+			uint64_t pattern, rte_iova_t dst, uint32_t length,
+			uint16_t dst_handle, uint64_t flags);
+
+
 /**
  * @internal
  * Fast-path dmadev functions and related data are hold in a flat array.
@@ -73,6 +83,8 @@ struct rte_dma_fp_object {
 	rte_dma_completed_t        completed;
 	rte_dma_completed_status_t completed_status;
 	rte_dma_burst_capacity_t   burst_capacity;
+	rte_dma_copy_inter_dom_t   copy_inter_dom;
+	rte_dma_fill_inter_dom_t   fill_inter_dom;
 } __rte_aligned(128);
 
 extern struct rte_dma_fp_object *rte_dma_fp_objs;
-- 
2.37.2



* [PATCH v1 2/3] dma/idxd: implement inter-domain operations
  2023-08-11 16:14 [PATCH v1 0/3] Add support for inter-domain DMA operations Anatoly Burakov
  2023-08-11 16:14 ` [PATCH v1 1/3] dmadev: add inter-domain operations Anatoly Burakov
@ 2023-08-11 16:14 ` Anatoly Burakov
  2023-08-11 16:14 ` [PATCH v1 3/3] dma/idxd: add API to create and attach to window Anatoly Burakov
  2023-08-15 19:20 ` [EXT] [PATCH v1 0/3] Add support for inter-domain DMA operations Satananda Burla
  3 siblings, 0 replies; 12+ messages in thread
From: Anatoly Burakov @ 2023-08-11 16:14 UTC (permalink / raw)
  To: dev, Chengwen Feng, Kevin Laatz, Bruce Richardson; +Cc: Vladimir Medvedkin

Implement inter-domain copy and fill operations defined in the newly
added DMA device API.

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 doc/guides/prog_guide/dmadev.rst |   4 +
 drivers/dma/idxd/idxd_bus.c      |  35 +++++++++
 drivers/dma/idxd/idxd_common.c   | 123 +++++++++++++++++++++++++++----
 drivers/dma/idxd/idxd_hw_defs.h  |  14 +++-
 drivers/dma/idxd/idxd_internal.h |   7 ++
 5 files changed, 165 insertions(+), 18 deletions(-)

diff --git a/doc/guides/prog_guide/dmadev.rst b/doc/guides/prog_guide/dmadev.rst
index e4e5196416..c2a957e971 100644
--- a/doc/guides/prog_guide/dmadev.rst
+++ b/doc/guides/prog_guide/dmadev.rst
@@ -126,6 +126,10 @@ destination PASID to perform the operation. When `src_handle` value is set,
 Currently, source and destination handles are opaque values the user has to get
 from private API's of those DMA device drivers that support the operation.
 
+List of drivers supporting inter-domain operations:
+
+- Intel(R) IDXD driver
+
 
 Querying Device Statistics
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/drivers/dma/idxd/idxd_bus.c b/drivers/dma/idxd/idxd_bus.c
index 3b2d4c2b65..787bc4e2d7 100644
--- a/drivers/dma/idxd/idxd_bus.c
+++ b/drivers/dma/idxd/idxd_bus.c
@@ -7,6 +7,7 @@
 #include <unistd.h>
 #include <sys/mman.h>
 #include <libgen.h>
+#include <inttypes.h>
 
 #include <bus_driver.h>
 #include <dev_driver.h>
@@ -187,6 +188,31 @@ read_wq_int(struct rte_dsa_device *dev, const char *filename,
 	return ret;
 }
 
+static int
+read_gen_cap(struct rte_dsa_device *dev, uint64_t *gen_cap)
+{
+	char sysfs_node[PATH_MAX];
+	FILE *f;
+
+	snprintf(sysfs_node, sizeof(sysfs_node), "%s/dsa%d/gen_cap",
+		dsa_get_sysfs_path(), dev->addr.device_id);
+	f = fopen(sysfs_node, "r");
+	if (f == NULL) {
+		IDXD_PMD_ERR("%s(): opening file '%s' failed: %s",
+				__func__, sysfs_node, strerror(errno));
+		return -1;
+	}
+
+	if (fscanf(f, "%" PRIx64, gen_cap) != 1) {
+		IDXD_PMD_ERR("%s(): error reading file '%s': %s",
+				__func__, sysfs_node, strerror(errno));
+		return -1;
+	}
+
+	fclose(f);
+	return 0;
+}
+
 static int
 read_device_int(struct rte_dsa_device *dev, const char *filename,
 		int *value)
@@ -219,6 +245,7 @@ idxd_probe_dsa(struct rte_dsa_device *dev)
 {
 	struct idxd_dmadev idxd = {0};
 	int ret = 0;
+	uint64_t gen_cap;
 
 	IDXD_PMD_INFO("Probing device %s on numa node %d",
 			dev->wq_name, dev->device.numa_node);
@@ -232,6 +259,14 @@ idxd_probe_dsa(struct rte_dsa_device *dev)
 	idxd.u.bus.dsa_id = dev->addr.device_id;
 	idxd.sva_support = 1;
 
+	ret = read_gen_cap(dev, &gen_cap);
+	if (ret) {
+		IDXD_PMD_ERR("Failed to read gen_cap for %s", dev->wq_name);
+		return ret;
+	}
+	if (gen_cap & IDXD_INTERDOM_SUPPORT)
+		idxd.inter_dom_support = 1;
+
 	idxd.portal = idxd_bus_mmap_wq(dev);
 	if (idxd.portal == NULL) {
 		IDXD_PMD_ERR("WQ mmap failed");
diff --git a/drivers/dma/idxd/idxd_common.c b/drivers/dma/idxd/idxd_common.c
index 83d53942eb..ffe8614d16 100644
--- a/drivers/dma/idxd/idxd_common.c
+++ b/drivers/dma/idxd/idxd_common.c
@@ -41,7 +41,57 @@ __idxd_movdir64b(volatile void *dst, const struct idxd_hw_desc *src)
 
 __use_avx2
 static __rte_always_inline void
-__submit(struct idxd_dmadev *idxd)
+__idxd_enqcmd(volatile void *dst, const struct idxd_hw_desc *src)
+{
+	asm volatile (".byte 0xf2, 0x0f, 0x38, 0xf8, 0x02"
+			:
+			: "a" (dst), "d" (src)
+			: "memory");
+}
+
+static inline uint32_t
+__idxd_get_inter_dom_flags(const enum rte_idxd_ops op)
+{
+	switch (op) {
+	case idxd_op_memmove:
+		return IDXD_FLAG_SRC_ALT_PASID | IDXD_FLAG_DST_ALT_PASID;
+	case idxd_op_fill:
+		return IDXD_FLAG_DST_ALT_PASID;
+	default:
+		/* no flags needed */
+		return 0;
+	}
+}
+
+static inline uint32_t
+__idxd_get_op_flags(enum rte_idxd_ops op, uint64_t flags, bool inter_dom)
+{
+	uint32_t op_flags = op;
+	uint32_t flag_mask = IDXD_FLAG_FENCE;
+	if (inter_dom) {
+		flag_mask |=  __idxd_get_inter_dom_flags(op);
+		op_flags |= idxd_op_inter_dom;
+	}
+	op_flags = op_flags << IDXD_CMD_OP_SHIFT;
+	return op_flags | (flags & flag_mask) | IDXD_FLAG_CACHE_CONTROL;
+}
+
+static inline uint64_t
+__idxd_get_alt_pasid(uint64_t flags, uint64_t src_idpte_id,
+		uint64_t dst_idpte_id)
+{
+	/* hardware is intolerant to inactive fields being non-zero */
+	if (!(flags & RTE_DMA_OP_FLAG_SRC_HANDLE))
+		src_idpte_id = 0;
+	if (!(flags & RTE_DMA_OP_FLAG_DST_HANDLE))
+		dst_idpte_id = 0;
+	return (src_idpte_id << IDXD_CMD_SRC_IDPTE_IDX_SHIFT) |
+			(dst_idpte_id << IDXD_CMD_DST_IDPTE_IDX_SHIFT);
+}
+
+__use_avx2
+static __rte_always_inline void
+__submit(struct idxd_dmadev *idxd, const bool use_enqcmd)
 {
 	rte_prefetch1(&idxd->batch_comp_ring[idxd->batch_idx_read]);
 
@@ -59,7 +109,10 @@ __submit(struct idxd_dmadev *idxd)
 		desc.completion = comp_addr;
 		desc.op_flags |= IDXD_FLAG_REQUEST_COMPLETION;
 		_mm_sfence(); /* fence before writing desc to device */
-		__idxd_movdir64b(idxd->portal, &desc);
+		if (use_enqcmd)
+			__idxd_enqcmd(idxd->portal, &desc);
+		else
+			__idxd_movdir64b(idxd->portal, &desc);
 	} else {
 		const struct idxd_hw_desc batch_desc = {
 				.op_flags = (idxd_op_batch << IDXD_CMD_OP_SHIFT) |
@@ -71,7 +124,10 @@ __submit(struct idxd_dmadev *idxd)
 				.size = idxd->batch_size,
 		};
 		_mm_sfence(); /* fence before writing desc to device */
-		__idxd_movdir64b(idxd->portal, &batch_desc);
+		if (use_enqcmd)
+			__idxd_enqcmd(idxd->portal, &batch_desc);
+		else
+			__idxd_movdir64b(idxd->portal, &batch_desc);
 	}
 
 	if (++idxd->batch_idx_write > idxd->max_batches)
@@ -93,7 +149,9 @@ __idxd_write_desc(struct idxd_dmadev *idxd,
 		const rte_iova_t src,
 		const rte_iova_t dst,
 		const uint32_t size,
-		const uint32_t flags)
+		const uint32_t flags,
+		const uint64_t alt_pasid,
+		const bool use_enqcmd)
 {
 	uint16_t mask = idxd->desc_ring_mask;
 	uint16_t job_id = idxd->batch_start + idxd->batch_size;
@@ -113,14 +171,14 @@ __idxd_write_desc(struct idxd_dmadev *idxd,
 	_mm256_store_si256((void *)&idxd->desc_ring[write_idx],
 			_mm256_set_epi64x(dst, src, comp_addr, op_flags64));
 	_mm256_store_si256((void *)&idxd->desc_ring[write_idx].size,
-			_mm256_set_epi64x(0, 0, 0, size));
+			_mm256_set_epi64x(alt_pasid, 0, 0, size));
 
 	idxd->batch_size++;
 
 	rte_prefetch0_write(&idxd->desc_ring[write_idx + 1]);
 
 	if (flags & RTE_DMA_OP_FLAG_SUBMIT)
-		__submit(idxd);
+		__submit(idxd, use_enqcmd);
 
 	return job_id;
 }
@@ -134,10 +192,26 @@ idxd_enqueue_copy(void *dev_private, uint16_t qid __rte_unused, rte_iova_t src,
 	 * but check it at compile time to be sure.
 	 */
 	RTE_BUILD_BUG_ON(RTE_DMA_OP_FLAG_FENCE != IDXD_FLAG_FENCE);
-	uint32_t memmove = (idxd_op_memmove << IDXD_CMD_OP_SHIFT) |
-			IDXD_FLAG_CACHE_CONTROL | (flags & IDXD_FLAG_FENCE);
-	return __idxd_write_desc(dev_private, memmove, src, dst, length,
-			flags);
+	uint32_t op_flags = __idxd_get_op_flags(idxd_op_memmove, flags, false);
+	return __idxd_write_desc(dev_private, op_flags, src, dst, length,
+			flags, 0, false);
+}
+
+__use_avx2
+int
+idxd_enqueue_copy_inter_dom(void *dev_private, uint16_t qid __rte_unused, rte_iova_t src,
+		rte_iova_t dst, unsigned int length,
+		uint16_t src_idpte_id, uint16_t dst_idpte_id, uint64_t flags)
+{
+	/* we can take advantage of the fact that the fence flag in dmadev and
+	 * DSA are the same, but check it at compile time to be sure.
+	 */
+	RTE_BUILD_BUG_ON(RTE_DMA_OP_FLAG_FENCE != IDXD_FLAG_FENCE);
+	uint32_t op_flags = __idxd_get_op_flags(idxd_op_memmove, flags, true);
+	uint64_t alt_pasid = __idxd_get_alt_pasid(flags, src_idpte_id, dst_idpte_id);
+	/* currently, we require inter-domain copies to use enqcmd */
+	return __idxd_write_desc(dev_private, op_flags, src, dst, length,
+			flags, alt_pasid, true);
 }
 
 __use_avx2
@@ -145,17 +219,28 @@ int
 idxd_enqueue_fill(void *dev_private, uint16_t qid __rte_unused, uint64_t pattern,
 		rte_iova_t dst, unsigned int length, uint64_t flags)
 {
-	uint32_t fill = (idxd_op_fill << IDXD_CMD_OP_SHIFT) |
-			IDXD_FLAG_CACHE_CONTROL | (flags & IDXD_FLAG_FENCE);
-	return __idxd_write_desc(dev_private, fill, pattern, dst, length,
-			flags);
+	uint32_t op_flags = __idxd_get_op_flags(idxd_op_fill, flags, false);
+	return __idxd_write_desc(dev_private, op_flags, pattern, dst, length,
+			flags, 0, false);
+}
+
+__use_avx2
+int
+idxd_enqueue_fill_inter_dom(void *dev_private, uint16_t qid __rte_unused,
+		uint64_t pattern, rte_iova_t dst, unsigned int length,
+		uint16_t dst_idpte_id, uint64_t flags)
+{
+	uint32_t op_flags = __idxd_get_op_flags(idxd_op_fill, flags, true);
+	uint64_t alt_pasid = __idxd_get_alt_pasid(flags, 0, dst_idpte_id);
+	return __idxd_write_desc(dev_private, op_flags, pattern, dst, length,
+			flags, alt_pasid, true);
 }
 
 __use_avx2
 int
 idxd_submit(void *dev_private, uint16_t qid __rte_unused)
 {
-	__submit(dev_private);
+	__submit(dev_private, false);
 	return 0;
 }
 
@@ -490,6 +575,12 @@ idxd_info_get(const struct rte_dma_dev *dev, struct rte_dma_info *info, uint32_t
 	};
 	if (idxd->sva_support)
 		info->dev_capa |= RTE_DMA_CAPA_SVA;
+
+	if (idxd->inter_dom_support) {
+		info->dev_capa |= RTE_DMA_CAPA_OPS_INTER_DOM;
+		info->controller_id = idxd->u.bus.dsa_id;
+	}
+
 	return 0;
 }
 
@@ -600,6 +691,8 @@ idxd_dmadev_create(const char *name, struct rte_device *dev,
 	dmadev->fp_obj->completed_status = idxd_completed_status;
 	dmadev->fp_obj->burst_capacity = idxd_burst_capacity;
 	dmadev->fp_obj->dev_private = dmadev->data->dev_private;
+	dmadev->fp_obj->copy_inter_dom = idxd_enqueue_copy_inter_dom;
+	dmadev->fp_obj->fill_inter_dom = idxd_enqueue_fill_inter_dom;
 
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY)
 		return 0;
diff --git a/drivers/dma/idxd/idxd_hw_defs.h b/drivers/dma/idxd/idxd_hw_defs.h
index a38540f283..441e9d29a4 100644
--- a/drivers/dma/idxd/idxd_hw_defs.h
+++ b/drivers/dma/idxd/idxd_hw_defs.h
@@ -9,18 +9,24 @@
  * Defines used in the data path for interacting with IDXD hardware.
  */
 #define IDXD_CMD_OP_SHIFT 24
+#define IDXD_CMD_SRC_IDPTE_IDX_SHIFT 32
+#define IDXD_CMD_DST_IDPTE_IDX_SHIFT 48
 enum rte_idxd_ops {
 	idxd_op_nop = 0,
 	idxd_op_batch,
 	idxd_op_drain,
 	idxd_op_memmove,
-	idxd_op_fill
+	idxd_op_fill,
+	idxd_op_inter_dom = 0x20
 };
 
 #define IDXD_FLAG_FENCE                 (1 << 0)
 #define IDXD_FLAG_COMPLETION_ADDR_VALID (1 << 2)
 #define IDXD_FLAG_REQUEST_COMPLETION    (1 << 3)
+#define IDXD_INTERDOM_SUPPORT           (1 << 6)
 #define IDXD_FLAG_CACHE_CONTROL         (1 << 8)
+#define IDXD_FLAG_SRC_ALT_PASID         (1 << 16)
+#define IDXD_FLAG_DST_ALT_PASID         (1 << 17)
 
 /**
  * Hardware descriptor used by DSA hardware, for both bursts and
@@ -42,8 +48,10 @@ struct idxd_hw_desc {
 
 	uint16_t intr_handle; /* completion interrupt handle */
 
-	/* remaining 26 bytes are reserved */
-	uint16_t reserved[13];
+	/* next 22 bytes are reserved */
+	uint16_t reserved[11];
+	uint16_t src_pasid_hndl;  /* pasid handle for source */
+	uint16_t dest_pasid_hndl; /* pasid handle for destination */
 } __rte_aligned(64);
 
 #define IDXD_COMP_STATUS_INCOMPLETE        0
diff --git a/drivers/dma/idxd/idxd_internal.h b/drivers/dma/idxd/idxd_internal.h
index cd4177721d..fb999d29f7 100644
--- a/drivers/dma/idxd/idxd_internal.h
+++ b/drivers/dma/idxd/idxd_internal.h
@@ -70,6 +70,7 @@ struct idxd_dmadev {
 	struct rte_dma_dev *dmadev;
 	struct rte_dma_vchan_conf qcfg;
 	uint8_t sva_support;
+	uint8_t	inter_dom_support;
 	uint8_t qid;
 
 	union {
@@ -92,8 +93,14 @@ int idxd_info_get(const struct rte_dma_dev *dev, struct rte_dma_info *dev_info,
 		uint32_t size);
 int idxd_enqueue_copy(void *dev_private, uint16_t qid, rte_iova_t src,
 		rte_iova_t dst, unsigned int length, uint64_t flags);
+int idxd_enqueue_copy_inter_dom(void *dev_private, uint16_t qid, rte_iova_t src,
+		rte_iova_t dst, unsigned int length,
+		uint16_t src_idpte_id, uint16_t dst_idpte_id, uint64_t flags);
 int idxd_enqueue_fill(void *dev_private, uint16_t qid, uint64_t pattern,
 		rte_iova_t dst, unsigned int length, uint64_t flags);
+int idxd_enqueue_fill_inter_dom(void *dev_private, uint16_t qid, uint64_t pattern,
+		rte_iova_t dst, unsigned int length, uint16_t dst_idpte_id,
+		uint64_t flags);
 int idxd_submit(void *dev_private, uint16_t qid);
 uint16_t idxd_completed(void *dev_private, uint16_t qid, uint16_t max_ops,
 		uint16_t *last_idx, bool *has_error);
-- 
2.37.2



* [PATCH v1 3/3] dma/idxd: add API to create and attach to window
  2023-08-11 16:14 [PATCH v1 0/3] Add support for inter-domain DMA operations Anatoly Burakov
  2023-08-11 16:14 ` [PATCH v1 1/3] dmadev: add inter-domain operations Anatoly Burakov
  2023-08-11 16:14 ` [PATCH v1 2/3] dma/idxd: implement " Anatoly Burakov
@ 2023-08-11 16:14 ` Anatoly Burakov
  2023-08-14  4:39   ` Jerin Jacob
  2023-08-15 19:20 ` [EXT] [PATCH v1 0/3] Add support for inter-domain DMA operations Satananda Burla
  3 siblings, 1 reply; 12+ messages in thread
From: Anatoly Burakov @ 2023-08-11 16:14 UTC (permalink / raw)
  To: dev, Bruce Richardson, Kevin Laatz; +Cc: Vladimir Medvedkin

This commit implements the functions necessary to use inter-domain
operations with the idxd driver.

The process is as follows:

1. Process A that wishes to share its memory with others, shall call
   `rte_idxd_window_create()`, which will return a file descriptor
2. Process A is to send above mentioned file descriptor to any
   recipient process (usually over kernel IPC) that wishes to attach to
   that window
3. Process B, after receiving above mentioned file descriptor from
   process A over IPC, shall call `rte_idxd_window_attach()` and
   receive an inter-pasid handle
4. Process B shall use this handle as an argument for inter-domain
   operations using DMA device API

Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 doc/guides/dmadevs/idxd.rst           |  52 ++++++++
 drivers/dma/idxd/idxd_inter_dom.c     | 166 ++++++++++++++++++++++++++
 drivers/dma/idxd/meson.build          |   7 +-
 drivers/dma/idxd/rte_idxd_inter_dom.h |  79 ++++++++++++
 drivers/dma/idxd/version.map          |  11 ++
 5 files changed, 314 insertions(+), 1 deletion(-)
 create mode 100644 drivers/dma/idxd/idxd_inter_dom.c
 create mode 100644 drivers/dma/idxd/rte_idxd_inter_dom.h
 create mode 100644 drivers/dma/idxd/version.map

diff --git a/doc/guides/dmadevs/idxd.rst b/doc/guides/dmadevs/idxd.rst
index cb8f1fe729..b0439377f8 100644
--- a/doc/guides/dmadevs/idxd.rst
+++ b/doc/guides/dmadevs/idxd.rst
@@ -225,3 +225,55 @@ which operation failed and kick off the device to continue processing operations
    if (error){
       status_count = rte_dma_completed_status(dev_id, vchan, COMP_BURST_SZ, &idx, status);
    }
+
+Performing Inter-Domain operations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Refer to the :ref:`Enqueue / Dequeue APIs <dmadev_enqueue_dequeue>` section of the dmadev library
+documentation for details on operation enqueue, submission and completion API usage.
+
+Refer to the :ref:`Inter-domain operations <dmadev_inter_dom>` section of the dmadev library
+documentation for details on inter-domain operations.
+
+Intel(R) IDXD currently supports the following inter-domain operations:
+
+* Copy operation
+* Fill operation
+
+To use these operations with the IDXD driver, the following program flow should
+be adhered to:
+
+* Process A that wishes to share its memory with others, shall call
+  ``rte_idxd_window_create()``, which will return a file descriptor
+* Process A is to send above mentioned file descriptor to any recipient process
+  (usually over IPC) that wishes to attach to that window
+* Process B, after receiving above mentioned file descriptor from process A over
+  IPC, shall call ``rte_idxd_window_attach()`` and receive an inter-pasid handle
+* Process B shall use this handle as an argument for inter-domain operations
+  using DMA device API
+
+The controller ID parameter for the create/attach functions in this case would
+be the controller ID of the configured DSA2 device (found in the ``rte_dma_info``
+structure), which can also be read from the ``accel-config`` tool, or derived
+from the DSA2 work queue name (e.g. work queue ``wq0.3`` would have ``0`` as its
+controller ID).
+
+The ``rte_idxd_window_create()`` call will accept a ``flags`` argument, which
+can contain the following bits:
+
+* ``RTE_IDXD_WIN_FLAGS_PROT_READ`` - allow other process to read from memory
+  region to be shared
+  - In this case, the remote process will be using the resulting inter-pasid
+    handle as source handle for inter-domain DMA operations (and set the
+    ``RTE_DMA_OP_FLAG_SRC_HANDLE`` DMA operation flag)
+* ``RTE_IDXD_WIN_FLAGS_PROT_WRITE`` - allow other process to write into memory
+  region to be shared
+  - In this case, the remote process will be using the resulting inter-pasid
+    handle as destination handle for inter-domain DMA operations (and set the
+    ``RTE_DMA_OP_FLAG_DST_HANDLE`` DMA operation flag)
+* ``RTE_IDXD_WIN_FLAGS_WIN_CHECK`` - if this flag is not set, the remote process
+  will be allowed unrestricted access to entire memory space of the owner
+  process
+* ``RTE_IDXD_WIN_FLAGS_OFFSET_MODE`` - addresses for DMA operations will have to
+  be specified as offsets from base address of the memory region to be shared
+* ``RTE_IDXD_WIN_FLAGS_TYPE_SAMS`` - enable multi-submitter mode.
diff --git a/drivers/dma/idxd/idxd_inter_dom.c b/drivers/dma/idxd/idxd_inter_dom.c
new file mode 100644
index 0000000000..21dcd6980d
--- /dev/null
+++ b/drivers/dma/idxd/idxd_inter_dom.c
@@ -0,0 +1,166 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2023 Intel Corporation
+ */
+
+#include <stdlib.h>
+#include <stdint.h>
+#include <sys/ioctl.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <dirent.h>
+
+#include <rte_common.h>
+#include <rte_errno.h>
+#include <rte_idxd_inter_dom.h>
+
+#include "idxd_internal.h"
+
+#define IDXD_TYPE       ('d')
+#define IDXD_IOC_BASE   100
+#define IDXD_WIN_BASE   200
+
+enum idxd_win_type {
+	IDXD_WIN_TYPE_SA_SS = 0,
+	IDXD_WIN_TYPE_SA_MS,
+};
+
+#define IDXD_WIN_FLAGS_MASK (RTE_IDXD_WIN_FLAGS_PROT_READ | RTE_IDXD_WIN_FLAGS_PROT_WRITE |\
+		RTE_IDXD_WIN_FLAGS_WIN_CHECK | RTE_IDXD_WIN_FLAGS_OFFSET_MODE|\
+		RTE_IDXD_WIN_FLAGS_TYPE_SAMS)
+
+struct idxd_win_param {
+	uint64_t base;          /* Window base */
+	uint64_t size;          /* Window size */
+	uint32_t type;          /* Window type, see enum idxd_win_type */
+	uint16_t flags;         /* See IDXD windows flags */
+	uint16_t handle;        /* Window handle returned by driver */
+} __attribute__((packed));
+
+struct idxd_win_attach {
+	uint32_t fd;            /* Window file descriptor returned by IDXD_WIN_CREATE */
+	uint16_t handle;        /* Window handle returned by driver */
+} __attribute__((packed));
+
+struct idxd_win_fault {
+	uint64_t offset;        /* Window offset of faulting address */
+	uint64_t len;           /* Faulting range */
+	uint32_t write_fault;   /* Fault generated on write */
+} __attribute__((packed));
+
+#define IDXD_WIN_CREATE         _IOWR(IDXD_TYPE, IDXD_IOC_BASE + 1, struct idxd_win_param)
+#define IDXD_WIN_ATTACH         _IOR(IDXD_TYPE, IDXD_IOC_BASE + 2, struct idxd_win_attach)
+#define IDXD_WIN_FAULT          _IOR(IDXD_TYPE, IDXD_WIN_BASE + 1, struct idxd_win_fault)
+#define DSA_DEV_PATH "/dev/dsa"
+
+static inline const char *
+dsa_get_dev_path(void)
+{
+	const char *path = getenv("DSA_DEV_PATH");
+	return path ? path : DSA_DEV_PATH;
+}
+
+static int
+dsa_find_work_queue(int controller_id)
+{
+	char dev_templ[PATH_MAX], path_templ[PATH_MAX];
+	const char *path = dsa_get_dev_path();
+	struct dirent *wq;
+	DIR *dev_dir;
+	int fd = -1;
+
+	/* construct work queue path template */
+	snprintf(dev_templ, sizeof(dev_templ), "wq%d.", controller_id);
+
+	/* open the DSA device directory */
+	dev_dir = opendir(path);
+	if (dev_dir == NULL)
+		return -1;
+
+	/* find any available work queue */
+	while ((wq = readdir(dev_dir)) != NULL) {
+		/* skip things that aren't work queues */
+		if (strncmp(wq->d_name, dev_templ, strlen(dev_templ)) != 0)
+			continue;
+
+		/* try this work queue */
+		snprintf(path_templ, sizeof(path_templ), "%s/%s", path, wq->d_name);
+
+		fd = open(path_templ, O_RDWR);
+		if (fd < 0)
+			continue;
+
+		break;
+	}
+
+	return fd;
+}
+
+int
+rte_idxd_window_create(int controller_id, void *win_addr,
+	unsigned int win_len, int flags)
+{
+	struct idxd_win_param param = {0};
+	int idpte_fd, fd;
+
+	fd = dsa_find_work_queue(controller_id);
+
+	/* did we find anything? */
+	if (fd < 0) {
+		IDXD_PMD_ERR("%s(): creating idpt window failed", __func__);
+		return -1;
+	}
+
+	/* create a wormhole into a parallel reality... */
+	param.base = (uint64_t)win_addr;
+	param.size = win_len;
+	param.flags = flags & IDXD_WIN_FLAGS_MASK;
+	param.type = (flags & RTE_IDXD_WIN_FLAGS_TYPE_SAMS) ?
+		IDXD_WIN_TYPE_SA_MS : IDXD_WIN_TYPE_SA_SS;
+
+	idpte_fd = ioctl(fd, IDXD_WIN_CREATE, &param);
+
+	close(fd);
+
+	if (idpte_fd < 0)
+		rte_errno = idpte_fd;
+
+	return idpte_fd;
+}
+
+int
+rte_idxd_window_attach(int controller_id, int idpte_fd,
+	uint16_t *handle)
+{
+
+	struct idxd_win_attach win_attach = {0};
+	int ret, fd;
+
+	if (handle == NULL) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	fd = dsa_find_work_queue(controller_id);
+
+	/* did we find anything? */
+	if (fd < 0) {
+		IDXD_PMD_ERR("%s(): attaching idpt window failed", __func__);
+		rte_errno = ENOENT;
+		return -1;
+	}
+
+	/* get access to someone else's wormhole */
+	win_attach.fd = idpte_fd;
+
+	ret = ioctl(fd, IDXD_WIN_ATTACH, &win_attach);
+	if (ret != 0) {
+		IDXD_PMD_ERR("%s(): attaching idpt window failed: %s",
+				__func__, strerror(ret));
+		rte_errno = ret;
+		return -1;
+	}
+
+	*handle = win_attach.handle;
+
+	return 0;
+}
diff --git a/drivers/dma/idxd/meson.build b/drivers/dma/idxd/meson.build
index c5403b431c..da73ab340c 100644
--- a/drivers/dma/idxd/meson.build
+++ b/drivers/dma/idxd/meson.build
@@ -22,5 +22,10 @@ sources = files(
 )
 
 if is_linux
-    sources += files('idxd_bus.c')
+    sources += files(
+            'idxd_bus.c',
+            'idxd_inter_dom.c',
+    )
 endif
+
+headers = files('rte_idxd_inter_dom.h')
diff --git a/drivers/dma/idxd/rte_idxd_inter_dom.h b/drivers/dma/idxd/rte_idxd_inter_dom.h
new file mode 100644
index 0000000000..c31f3777c9
--- /dev/null
+++ b/drivers/dma/idxd/rte_idxd_inter_dom.h
@@ -0,0 +1,79 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2023 Intel Corporation
+ */
+
+#ifndef _RTE_IDXD_INTER_DOM_H_
+#define _RTE_IDXD_INTER_DOM_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <stdint.h>
+
+#include <rte_compat.h>
+
+/** Allow reading from address space. */
+#define RTE_IDXD_WIN_FLAGS_PROT_READ    0x0001
+/** Allow writing to address space. */
+#define RTE_IDXD_WIN_FLAGS_PROT_WRITE   0x0002
+/** If this flag is not set, the entire address space will be accessible. */
+#define RTE_IDXD_WIN_FLAGS_WIN_CHECK    0x0004
+/** Destination addresses are offsets from window base address. */
+#define RTE_IDXD_WIN_FLAGS_OFFSET_MODE  0x0008
+/** Multiple submitter flag. If not set, single-submitter type will be used. */
+#define RTE_IDXD_WIN_FLAGS_TYPE_SAMS    0x0010
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Create an inter-pasid window to allow another process to access this process'
+ * memory. This function returns a file descriptor for the window, that can be
+ * used by another process to access this window.
+ *
+ * @param controller_id
+ *   IDXD controller device ID.
+ * @param win_addr
+ *   Base address of memory chunk being shared (ignored if
+ *   `RTE_IDXD_WIN_FLAGS_WIN_CHECK` is not set).
+ * @param win_len
+ *   Length of memory chunk being shared (ignored if
+ *   `RTE_IDXD_WIN_FLAGS_WIN_CHECK` is not set).
+ * @param flags
+ *   Flags to configure the window.
+ * @return
+ *   Non-negative on success.
+ *   Negative on error.
+ */
+__rte_experimental
+int rte_idxd_window_create(int controller_id, void *win_addr,
+	unsigned int win_len, int flags);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Attach to an inter-pasid window of another process. This function expects a
+ * file descriptor returned by `rte_idxd_window_create()`, and will set the
+ * value pointed to by `handle`. This handle can then be used to perform
+ * inter-domain DMA operations.
+ *
+ * @param controller_id
+ *   IDXD controller device ID.
+ * @param idpte_fd
+ *   File descriptor for another process's window
+ * @param handle
+ *   Pointer to a variable to receive the handle.
+ * @return
+ *   0 on success.
+ *   Negative on error.
+ */
+__rte_experimental
+int rte_idxd_window_attach(int controller_id, int idpte_fd, uint16_t *handle);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_IDXD_INTER_DOM_H_ */
diff --git a/drivers/dma/idxd/version.map b/drivers/dma/idxd/version.map
new file mode 100644
index 0000000000..e091bb7c09
--- /dev/null
+++ b/drivers/dma/idxd/version.map
@@ -0,0 +1,11 @@
+DPDK_23 {
+	local: *;
+};
+
+
+EXPERIMENTAL {
+	global:
+
+	rte_idxd_window_create;
+	rte_idxd_window_attach;
+};
-- 
2.37.2



* Re: [PATCH v1 3/3] dma/idxd: add API to create and attach to window
  2023-08-11 16:14 ` [PATCH v1 3/3] dma/idxd: add API to create and attach to window Anatoly Burakov
@ 2023-08-14  4:39   ` Jerin Jacob
  2023-08-14  9:55     ` Burakov, Anatoly
  0 siblings, 1 reply; 12+ messages in thread
From: Jerin Jacob @ 2023-08-14  4:39 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev, Bruce Richardson, Kevin Laatz, Vladimir Medvedkin

On Fri, Aug 11, 2023 at 9:45 PM Anatoly Burakov
<anatoly.burakov@intel.com> wrote:
>
> This commit implements functions necessary to use inter-domain
> operations with idxd driver.
>
> The process is as follows:
>
> 1. Process A that wishes to share its memory with others, shall call
>    `rte_idxd_window_create()`, which will return a file descriptor
> 2. Process A is to send above mentioned file descriptor to any
>    recipient process (usually over kernel IPC) that wishes to attach to
>    that window
> 3. Process B, after receiving above mentioned file descriptor from
>    process A over IPC, shall call `rte_idxd_window_attach()` and
>    receive an inter-pasid handle
> 4. Process B shall use this handle as an argument for inter-domain
>    operations using DMA device API

> +};
> +
> +
> +EXPERIMENTAL {
> +       global:
> +
> +       rte_idxd_window_create;
> +       rte_idxd_window_attach;

PMD specific API starts with rte_pmd_

> +};
> --
> 2.37.2
>


* Re: [PATCH v1 3/3] dma/idxd: add API to create and attach to window
  2023-08-14  4:39   ` Jerin Jacob
@ 2023-08-14  9:55     ` Burakov, Anatoly
  0 siblings, 0 replies; 12+ messages in thread
From: Burakov, Anatoly @ 2023-08-14  9:55 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev, Bruce Richardson, Kevin Laatz, Vladimir Medvedkin

On 8/14/2023 5:39 AM, Jerin Jacob wrote:
> On Fri, Aug 11, 2023 at 9:45 PM Anatoly Burakov
> <anatoly.burakov@intel.com> wrote:
>>
>> This commit implements functions necessary to use inter-domain
>> operations with idxd driver.
>>
>> The process is as follows:
>>
>> 1. Process A that wishes to share its memory with others, shall call
>>     `rte_idxd_window_create()`, which will return a file descriptor
>> 2. Process A is to send above mentioned file descriptor to any
>>     recipient process (usually over kernel IPC) that wishes to attach to
>>     that window
>> 3. Process B, after receiving above mentioned file descriptor from
>>     process A over IPC, shall call `rte_idxd_window_attach()` and
>>     receive an inter-pasid handle
>> 4. Process B shall use this handle as an argument for inter-domain
>>     operations using DMA device API
> 
>> +};
>> +
>> +
>> +EXPERIMENTAL {
>> +       global:
>> +
>> +       rte_idxd_window_create;
>> +       rte_idxd_window_attach;
> 
> PMD specific API starts with rte_pmd_

Thanks, will fix in next revisions.

> 
>> +};
>> --
>> 2.37.2
>>

-- 
Thanks,
Anatoly



* RE: [EXT] [PATCH v1 0/3] Add support for inter-domain DMA operations
  2023-08-11 16:14 [PATCH v1 0/3] Add support for inter-domain DMA operations Anatoly Burakov
                   ` (2 preceding siblings ...)
  2023-08-11 16:14 ` [PATCH v1 3/3] dma/idxd: add API to create and attach to window Anatoly Burakov
@ 2023-08-15 19:20 ` Satananda Burla
  3 siblings, 0 replies; 12+ messages in thread
From: Satananda Burla @ 2023-08-15 19:20 UTC (permalink / raw)
  To: Anatoly Burakov, dev; +Cc: bruce.richardson

Hi Anatoly

> -----Original Message-----
> From: Anatoly Burakov <anatoly.burakov@intel.com>
> Sent: Friday, August 11, 2023 9:15 AM
> To: dev@dpdk.org
> Cc: bruce.richardson@intel.com
> Subject: [EXT] [PATCH v1 0/3] Add support for inter-domain DMA
> operations
> 
> External Email
> 
> ----------------------------------------------------------------------
> This patchset adds inter-domain DMA operations, and implements driver
> support
> for them in Intel(R) IDXD driver.
> 
> Inter-domain DMA operations are similar to regular DMA operations,
> except that
> source and/or destination addresses will be in virtual address space of
> another
> process. In this patchset, DMA device is extended to support two new
> data plane
> operations: inter-domain copy, and inter-domain fill. No control plane
> API is
> provided for dmadev to set up inter-domain communication (see below for
> more
> info).
Thanks for posting this.
Do you have use cases where a process from a third domain sets up a transfer
between memories from two domains? I.e., process 1 is the source, process 2 is
the destination, and process 3 executes the transfer. The SDXI spec also
defines this kind of transfer.

Have you considered extending rte_dma_port_param and rte_dma_vchan_conf
to represent inter-domain memory transfer setup as a separate port type, like
RTE_DMA_PORT_INTER_DOMAIN?
We could then have a separate vchan dedicated to this transfer.
The rte_dma_vchan can be set up with a separate struct rte_dma_port_param
each for source and destination. The union could be extended to provide
the necessary information to the PMD: a set of fields needed by different
architectures, such as controller ID, PASID, SMMU stream ID, substream ID,
etc. If an opaque handle is needed, it could also be accommodated in the union.
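
For illustration, such a setup could look roughly like this (the
RTE_DMA_PORT_INTER_DOMAIN port type and the .inter_dom fields are hypothetical
and do not exist in dmadev today; dev_id, vchan and dst_handle are
placeholders):

struct rte_dma_vchan_conf conf = {
        .direction = RTE_DMA_DIR_MEM_TO_MEM,
        .nb_desc = 1024,
        .src_port = { .port_type = RTE_DMA_PORT_NONE }, /* local memory */
        .dst_port = {
                .port_type = RTE_DMA_PORT_INTER_DOMAIN, /* proposed port type */
                /* proposed union member carrying arch-specific fields such as
                 * controller ID, PASID, SMMU stream/substream ID, or an
                 * opaque handle
                 */
                .inter_dom = { .controller_id = 0, .handle = dst_handle },
        },
};
rte_dma_vchan_setup(dev_id, vchan, &conf);
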
These transfers could also be initiated between two processes, each having two
dmadev VFs from the same PF; Marvell hardware supports this mode.
Since the control plane for this can differ between PMDs, it is better to set
up the memory sharing outside dmadev and only pass the fields of interest to
the PMD for completing the transfer. For instance, for PCIe EP to host
DMA transactions (MEM_TO_DEV and DEV_TO_MEM), the process of setting up
shared memory from the PCIe host is not part of dmadev.
If we wish to make the memory sharing interface a part of dmadev, then
preferably the control plane has to be abstracted to work for all modes
and architectures.

Regards
Satananda



* RE: [EXT] [PATCH v1 1/3] dmadev: add inter-domain operations
  2023-08-11 16:14 ` [PATCH v1 1/3] dmadev: add inter-domain operations Anatoly Burakov
@ 2023-08-18  8:08   ` Anoob Joseph
  2023-10-08  2:33   ` fengchengwen
  1 sibling, 0 replies; 12+ messages in thread
From: Anoob Joseph @ 2023-08-18  8:08 UTC (permalink / raw)
  To: Anatoly Burakov
  Cc: Vladimir Medvedkin, dev, Chengwen Feng, Kevin Laatz,
	Bruce Richardson, Jerin Jacob Kollanukkaran,
	Vamsi Krishna Attunuru, Amit Prakash Shukla,
	Vidya Sagar Velumuri

Hi Anatoly,

Marvell CNXK DMA hardware also supports this, and it would be a good feature to add. Thanks for introducing it. Please see inline.

Thanks,
Anoob

> -----Original Message-----
> From: Anatoly Burakov <anatoly.burakov@intel.com>
> Sent: Friday, August 11, 2023 9:45 PM
> To: dev@dpdk.org; Chengwen Feng <fengchengwen@huawei.com>; Kevin
> Laatz <kevin.laatz@intel.com>; Bruce Richardson
> <bruce.richardson@intel.com>
> Cc: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
> Subject: [EXT] [PATCH v1 1/3] dmadev: add inter-domain operations
> 
> External Email
> 
> ----------------------------------------------------------------------
> Add a flag to indicate that a specific device supports inter-domain operations,
> and add an API for inter-domain copy and fill.
> 
> Inter-domain operation is an operation that is very similar to regular DMA
> operation, except either source or destination addresses can be in a
> different process's address space, indicated by source and destination
> handle values. These values are currently meant to be provided by private
> drivers' API's.
> 
> This commit also adds a controller ID field into the DMA device API.
> This is an arbitrary value that may not be implemented by hardware, but it is
> meant to represent some kind of device hierarchy.
> 
> Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>  doc/guides/prog_guide/dmadev.rst |  18 +++++
>  lib/dmadev/rte_dmadev.c          |   2 +
>  lib/dmadev/rte_dmadev.h          | 133
> +++++++++++++++++++++++++++++++
>  lib/dmadev/rte_dmadev_core.h     |  12 +++
>  4 files changed, 165 insertions(+)
> 
<snip>

> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Enqueue an inter-domain copy operation.
> + *
> + * This queues up an inter-domain copy operation to be performed by
> +hardware, if
> + * the 'flags' parameter contains RTE_DMA_OP_FLAG_SUBMIT then trigger
> +doorbell
> + * to begin this operation, otherwise do not trigger doorbell.
> + *
> + * The source and destination handle parameters are arbitrary opaque
> +values,
> + * currently meant to be provided by private device driver API's. If
> +the source
> + * handle value is meaningful, RTE_DMA_OP_FLAG_SRC_HANDLE flag must
> be set.
> + * Similarly, if the destination handle value is meaningful,
> + * RTE_DMA_OP_FLAG_DST_HANDLE flag must be set. Source and
> destination
> +handle
> + * values are meant to provide information to the hardware about source
> +and/or
> + * destination PASID for the inter-domain copy operation.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel.
> + * @param src
> + *   The address of the source buffer (if `src_handle` is set, source address
> + *   will be in address space of process referred to by source handle).
> + * @param dst
> + *   The address of the destination buffer (if `dst_handle` is set, destination
> + *   address will be in address space of process referred to by destination
> + *   handle).
> + * @param length
> + *   The length of the data to be copied.
> + * @param src_handle
> + *   Source handle value (if used, RTE_DMA_OP_FLAG_SRC_HANDLE flag
> must be set).
> + * @param dst_handle
> + *   Destination handle value (if used, RTE_DMA_OP_FLAG_DST_HANDLE
> flag must be
> + *   set).
> + * @param flags
> + *   Flags for this operation.
> + * @return
> + *   - 0..UINT16_MAX: index of enqueued job.
> + *   - -ENOSPC: if no space left to enqueue.
> + *   - other values < 0 on failure.
> + */
> +__rte_experimental
> +static inline int
> +rte_dma_copy_inter_dom(int16_t dev_id, uint16_t vchan, rte_iova_t src,
> +		rte_iova_t dst, uint32_t length, uint16_t src_handle,
> +		uint16_t dst_handle, uint64_t flags)
> +{

[Anoob] Won't this lead to duplication of all datapath APIs? Also, this approach assumes that 'inter-domain' operations always support run-time setting of 'src_handle' and 'dst_handle' within one DMA channel, which need not be supported by all platforms.

Can we move this 'src_handle' and 'dst_handle' registration to rte_dma_vchan_setup, so that the handles can be configured in the control path and the existing datapath APIs can work as is? The op flags (as proposed) can be used to determine whether an 'inter-domain' operation is requested. Having a fixed 'src_handle' & 'dst_handle' per vchan would be better for performance as well.
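
For illustration, under such a scheme the fast path could stay as it is today
(the commented-out vchan_conf fields are hypothetical; dev_id, vchan, src,
dst, len and the handles are placeholders):

/* control path: handles supplied once, when the vchan is configured */
struct rte_dma_vchan_conf conf = {
        .direction = RTE_DMA_DIR_MEM_TO_MEM,
        .nb_desc = 1024,
        /* hypothetical fields, not in today's struct rte_dma_vchan_conf:
         * .src_handle = src_handle, .dst_handle = dst_handle,
         */
};
rte_dma_vchan_setup(dev_id, vchan, &conf);

/* data path: unchanged rte_dma_copy(); the op flag (as proposed in this
 * patch) only marks the operation as using the pre-configured handles
 */
rte_dma_copy(dev_id, vchan, src, dst, len,
                RTE_DMA_OP_FLAG_DST_HANDLE | RTE_DMA_OP_FLAG_SUBMIT);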

<snip>


* Re: [PATCH v1 1/3] dmadev: add inter-domain operations
  2023-08-11 16:14 ` [PATCH v1 1/3] dmadev: add inter-domain operations Anatoly Burakov
  2023-08-18  8:08   ` [EXT] " Anoob Joseph
@ 2023-10-08  2:33   ` fengchengwen
  2023-10-09  5:05     ` Jerin Jacob
  1 sibling, 1 reply; 12+ messages in thread
From: fengchengwen @ 2023-10-08  2:33 UTC (permalink / raw)
  To: Anatoly Burakov, dev, Kevin Laatz, Bruce Richardson; +Cc: Vladimir Medvedkin

Hi Anatoly,

On 2023/8/12 0:14, Anatoly Burakov wrote:
> Add a flag to indicate that a specific device supports inter-domain
> operations, and add an API for inter-domain copy and fill.
> 
> Inter-domain operation is an operation that is very similar to regular
> DMA operation, except either source or destination addresses can be in a
> different process's address space, indicated by source and destination
> handle values. These values are currently meant to be provided by
> private drivers' API's.
> 
> This commit also adds a controller ID field into the DMA device API.
> This is an arbitrary value that may not be implemented by hardware, but
> it is meant to represent some kind of device hierarchy.
> 
> Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---

...

> +__rte_experimental
> +static inline int
> +rte_dma_copy_inter_dom(int16_t dev_id, uint16_t vchan, rte_iova_t src,
> +		rte_iova_t dst, uint32_t length, uint16_t src_handle,
> +		uint16_t dst_handle, uint64_t flags)

I would suggest adding a more general extension:
rte_dma_copy*(int16_t dev_id, uint16_t vchan, rte_iova_t src, rte_iova_t dst,
              uint32_t length, uint64_t flags, void *param)
The param would only be valid under certain flag bits.
As for this inter-domain extension: we could define an inter-domain param struct.
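
For example (sketch only; neither the struct, the flag, nor the function name
below exist in dmadev today):

/* hypothetical parameter block, only examined when the (equally hypothetical)
 * RTE_DMA_OP_FLAG_INTER_DOM flag is set
 */
struct rte_dma_inter_dom_param {
        uint16_t src_handle;
        uint16_t dst_handle;
};

struct rte_dma_inter_dom_param idp = {
        .src_handle = src_handle,
        .dst_handle = dst_handle,
};
rte_dma_copy_ex(dev_id, vchan, src, dst, length,
                RTE_DMA_OP_FLAG_INTER_DOM | RTE_DMA_OP_FLAG_SUBMIT, &idp);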


Whether to add this to the current rte_dma_copy() API or to add a new API
mainly depends, I think, on the performance impact of passing the extra
parameter. I suggest more discussion of different platforms and calling
conventions.


And last, could you describe the application scenarios for this feature?


Thanks.



* Re: [PATCH v1 1/3] dmadev: add inter-domain operations
  2023-10-08  2:33   ` fengchengwen
@ 2023-10-09  5:05     ` Jerin Jacob
  2023-10-27 13:46       ` Medvedkin, Vladimir
  0 siblings, 1 reply; 12+ messages in thread
From: Jerin Jacob @ 2023-10-09  5:05 UTC (permalink / raw)
  To: fengchengwen
  Cc: Anatoly Burakov, dev, Kevin Laatz, Bruce Richardson, Vladimir Medvedkin

On Sun, Oct 8, 2023 at 8:03 AM fengchengwen <fengchengwen@huawei.com> wrote:
>
> Hi Anatoly,
>
> On 2023/8/12 0:14, Anatoly Burakov wrote:
> > Add a flag to indicate that a specific device supports inter-domain
> > operations, and add an API for inter-domain copy and fill.
> >
> > Inter-domain operation is an operation that is very similar to regular
> > DMA operation, except either source or destination addresses can be in a
> > different process's address space, indicated by source and destination
> > handle values. These values are currently meant to be provided by
> > private drivers' API's.
> >
> > This commit also adds a controller ID field into the DMA device API.
> > This is an arbitrary value that may not be implemented by hardware, but
> > it is meant to represent some kind of device hierarchy.
> >
> > Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
> > Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> > ---
>
> ...
>
> > +__rte_experimental
> > +static inline int
> > +rte_dma_copy_inter_dom(int16_t dev_id, uint16_t vchan, rte_iova_t src,
> > +             rte_iova_t dst, uint32_t length, uint16_t src_handle,
> > +             uint16_t dst_handle, uint64_t flags)
>
> I would suggest add more general extension:
> rte_dma_copy*(int16_t dev_id, uint16_t vchan, rte_iova_t src, rte_iova_t dst,
>               uint32_t length, uint64_t flags, void *param)
> The param only valid under some flags bits.
> As for this inter-domain extension: we could define inter-domain param struct.
>
>
> Whether add in current rte_dma_copy() API or add one new API, I think it mainly
> depend on performance impact of parameter transfer. Suggest more discuss for
> differnt platform and call specification.

Or move src_handle/dst_handle to vchan config to enable better performance.
The application creates N vchans based on its requirements.

>
>
> And lastly, could you describe the application scenarios for this feature?

Looks like VM to VM or container to container copy.

>
>
> Thanks.
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v1 1/3] dmadev: add inter-domain operations
  2023-10-09  5:05     ` Jerin Jacob
@ 2023-10-27 13:46       ` Medvedkin, Vladimir
  2023-11-23  5:24         ` Jerin Jacob
  0 siblings, 1 reply; 12+ messages in thread
From: Medvedkin, Vladimir @ 2023-10-27 13:46 UTC (permalink / raw)
  To: Jerin Jacob, fengchengwen, sburla, anoobj
  Cc: Anatoly Burakov, dev, Kevin Laatz, Bruce Richardson

Hi Satananda, Anoob, Chengwen, Jerin, all,

After a number of internal discussions we have decided that we're going
to postpone this feature/patchset until the next release.

 >[Satananda] Have you considered extending  rte_dma_port_param and 
rte_dma_vchan_conf to represent interdomain memory transfer setup as a 
separate port type like RTE_DMA_PORT_INTER_DOMAIN ?

 >[Anoob] Can we move this 'src_handle' and 'dst_handle' registration to 
rte_dma_vchan_setup so that the 'src_handle' and 'dst_handle' can be 
configured in control path and the existing datapath APIs can work as is.

 >[Jerin] Or move src_handle/dst_handle to vchan config

We've listened to feedback on implementation, and have prototyped a 
vchan-based interface. This has a number of advantages and 
disadvantages, both in terms of API usage and in terms of our specific 
driver.

Setting up inter-domain operations as separate vchans allows us to store
data inside the PMD and not duplicate any API paths, so having multiple
vchans addresses that problem. However, this also means that any new
vchans added while the PMD is active (such as attaching to a new
process) will have to be gated by start/stop. This is probably fine from
an API point of view, but a hassle for the user (previously, we could have
just started using the new inter-domain handle right away).

Another usability issue with the multiple-vchan approach is that each
vchan now has its own enqueue/submit/completion cycle, so any use case
relying on one thread communicating with many processes will have to
process each vchan separately, instead of everything going into one
vchan. Again, this looks fine API-wise, but it is a hassle for the user,
since it requires calling submit and completion for each vchan, and in
some cases it requires maintaining some kind of reordering queue. (On the
other hand, it would be much easier to separate operations intended for
different processes with this approach, so perhaps this is not such a
big issue.)
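
For illustration, the per-vchan cycle would look roughly like the following
(the loop structure, nb_peer_vchans, BURST_SIZE and handle_completions() are
placeholders of ours, not actual application code):

uint16_t last_idx;
bool has_error;

/* each peer process maps to its own vchan, so each one needs its own
 * submit call... */
for (uint16_t vc = 0; vc < nb_peer_vchans; vc++)
	rte_dma_submit(dev_id, vc);

/* ...and its own completion poll; results come back per vchan, so any
 * ordering across peers must be reconstructed by the application */
for (uint16_t vc = 0; vc < nb_peer_vchans; vc++) {
	uint16_t done = rte_dma_completed(dev_id, vc, BURST_SIZE,
			&last_idx, &has_error);
	handle_completions(vc, done, has_error);
}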

Finally, there is also an IDXD-specific issue. Currently, IDXD HW 
acceleration is implemented in such a way that each work queue will have 
a unique DMA device ID (rather than a unique vchan), and each device can 
technically process requests for both local and remote memory (local to 
remote, remote to local, remote to remote), all in one queue - as it was 
in our original implementation.

By changing the implementation to use vchans, we're essentially bifurcating
this single queue - all vchans would have their own rings etc., but the
enqueue-to-hardware operation is still common to all vchans, because there
is a single underlying queue as far as hardware is concerned. The queue is
atomic in hardware, and the ENQCMD instruction returns a status on enqueue
failure (such as when too many requests are in flight), so in principle we
could ignore the number of in-flight operations and rely on ENQCMD failures
for error handling/retry. The problem with that is that this failure only
happens on submit, not on enqueue.

So, in essence, with the IDXD driver we have two choices: either we
implement some kind of in-flight counter to prevent our driver from
submitting too many requests (that is, vchans will have to cooperate -
using atomics or similar), or every user will have to handle errors not
just on enqueue but also on submit (which I don't believe many people
do currently, even though technically submit can return failure - all
non-test usage in DPDK seems to assume submit realistically won't fail,
and I'd like to keep it that way).
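
To make the first option concrete, here is a minimal sketch (plain C11
atomics, illustrative names, not the actual driver code) of how vchans
sharing one hardware work queue could cooperate through an in-flight counter:

#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

/* shared by all vchans that map onto the same hardware work queue */
struct shared_wq_state {
	uint32_t max_inflight;       /* hardware work queue depth */
	_Atomic uint32_t inflight;   /* descriptors currently outstanding */
};

/* enqueue path of any vchan: reserve a slot, or report the queue as full */
static int
wq_reserve_slot(struct shared_wq_state *q)
{
	uint32_t cur = atomic_load_explicit(&q->inflight, memory_order_relaxed);
	do {
		if (cur >= q->max_inflight)
			return -ENOSPC; /* caller backs off and retries */
	} while (!atomic_compare_exchange_weak_explicit(&q->inflight, &cur,
			cur + 1, memory_order_acquire, memory_order_relaxed));
	return 0;
}

/* completion path of any vchan: return the slots it consumed */
static void
wq_release_slots(struct shared_wq_state *q, uint32_t n)
{
	atomic_fetch_sub_explicit(&q->inflight, n, memory_order_release);
}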

We're in the process of measuring the performance impact of the different
implementations. However, I should note that while atomic operations on the
data path are unfortunate, realistically these atomics are accessed only at
the beginning/end of every 'enqueue-submit-complete' cycle, and not on every
operation. At first glance there is no observable performance penalty in the
regular use case (assuming we are not calling submit for every enqueued job).

 >[Satananda]Do you have usecases where a process from 3rd domain sets 
up transfer between memories from 2 domains? i.e process 1 is src, 
process 2 is dest and process 3 executes transfer.

This use case works with the proposed API on our hardware.

 >[Chengwen] And lastly, could you describe the application scenarios for
this feature?

We have used this feature to improve performance of the memif driver.


On 09/10/2023 06:05, Jerin Jacob wrote:
> On Sun, Oct 8, 2023 at 8:03 AM fengchengwen <fengchengwen@huawei.com> wrote:
>> Hi Anatoly,
>>
>> On 2023/8/12 0:14, Anatoly Burakov wrote:
>>> Add a flag to indicate that a specific device supports inter-domain
>>> operations, and add an API for inter-domain copy and fill.
>>>
>>> Inter-domain operation is an operation that is very similar to regular
>>> DMA operation, except either source or destination addresses can be in a
>>> different process's address space, indicated by source and destination
>>> handle values. These values are currently meant to be provided by
>>> private drivers' API's.
>>>
>>> This commit also adds a controller ID field into the DMA device API.
>>> This is an arbitrary value that may not be implemented by hardware, but
>>> it is meant to represent some kind of device hierarchy.
>>>
>>> Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
>>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>>> ---
>> ...
>>
>>> +__rte_experimental
>>> +static inline int
>>> +rte_dma_copy_inter_dom(int16_t dev_id, uint16_t vchan, rte_iova_t src,
>>> +             rte_iova_t dst, uint32_t length, uint16_t src_handle,
>>> +             uint16_t dst_handle, uint64_t flags)
>> I would suggest adding a more general extension:
>> rte_dma_copy*(int16_t dev_id, uint16_t vchan, rte_iova_t src, rte_iova_t dst,
>>                uint32_t length, uint64_t flags, void *param)
>> The param would only be valid under certain flag bits.
>> As for this inter-domain extension: we could define an inter-domain param struct.
>>
>>
>> Whether to extend the current rte_dma_copy() API or add a new API mainly
>> depends on the performance impact of passing the extra parameter. I suggest
>> more discussion covering different platforms and calling conventions.
> Or move src_handle/dst_handle to the vchan config to enable better performance.
> The application creates N vchans based on its requirements.
>
>>
>> And lastly, could you describe the application scenarios for this feature?
> Looks like VM to VM or container to container copy.
>
>>
>> Thanks.
>>
-- 
Regards,
Vladimir


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v1 1/3] dmadev: add inter-domain operations
  2023-10-27 13:46       ` Medvedkin, Vladimir
@ 2023-11-23  5:24         ` Jerin Jacob
  0 siblings, 0 replies; 12+ messages in thread
From: Jerin Jacob @ 2023-11-23  5:24 UTC (permalink / raw)
  To: Medvedkin, Vladimir
  Cc: fengchengwen, sburla, anoobj, Anatoly Burakov, dev, Kevin Laatz,
	Bruce Richardson

On Fri, Oct 27, 2023 at 7:16 PM Medvedkin, Vladimir
<vladimir.medvedkin@intel.com> wrote:
>
> Hi Satananda, Anoob, Chengwen, Jerin, all,
>
> After a number of internal discussions we have decided that we're going
> to postpone this feature/patchset until the next release.
>
>  >[Satananda] Have you considered extending  rte_dma_port_param and
> rte_dma_vchan_conf to represent interdomain memory transfer setup as a
> separate port type like RTE_DMA_PORT_INTER_DOMAIN ?
>
>  >[Anoob] Can we move this 'src_handle' and 'dst_handle' registration to
> rte_dma_vchan_setup so that the 'src_handle' and 'dst_handle' can be
> configured in control path and the existing datapath APIs can work as is.
>
>  >[Jerin] Or move src_handle/dst_handle to vchan config
>
> We've listened to feedback on implementation, and have prototyped a
> vchan-based interface. This has a number of advantages and
> disadvantages, both in terms of API usage and in terms of our specific
> driver.
>
> Setting up inter-domain operations as separate vchans allows us to store
> data inside the PMD and not duplicate any API paths, so having multiple
> vchans addresses that problem. However, this also means that any new
> vchans added while the PMD is active (such as attaching to a new

This could be mitigated by setting up the maximum number of vchans up front,
before start(), and using them as needed.

> process) will have to be gated by start/stop. This is probably fine from
> an API point of view, but a hassle for the user (previously, we could have
> just started using the new inter-domain handle right away).
>
> Another usability issue with the multiple-vchan approach is that each
> vchan now has its own enqueue/submit/completion cycle, so any use case
> relying on one thread communicating with many processes will have to
> process each vchan separately, instead of everything going into one
> vchan. Again, this looks fine API-wise, but it is a hassle for the user,
> since it requires calling submit and completion for each vchan, and in
> some cases it requires maintaining some kind of reordering queue. (On the
> other hand, it would be much easier to separate operations intended for
> different processes with this approach, so perhaps this is not such a
> big issue.)

IMO, the design principle behind vchan was:
- A single HW queue serves N vchans.
- A vchan is nothing but a slow-path template of the desired HW instruction
format to be used in the fast path, or a set of slow-path register writes
that define the vchan's attributes.

IMO, the above-mentioned usability constraints will be there in all
PMDs, as vchans are muxing a single HW queue.

IMO, the decision between vchan and a fast-path API could be based on:
a) The number of vchans required - in this case we are using it for VM-to-VM
or container-to-container copy, so I think it is limited.
b) HW support - on some HW, in order to reduce the HW descriptor size, some
features will be configured in the slow path only (they cannot be changed at
runtime without reconfiguring the vchan).

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-11-23  5:25 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-11 16:14 [PATCH v1 0/3] Add support for inter-domain DMA operations Anatoly Burakov
2023-08-11 16:14 ` [PATCH v1 1/3] dmadev: add inter-domain operations Anatoly Burakov
2023-08-18  8:08   ` [EXT] " Anoob Joseph
2023-10-08  2:33   ` fengchengwen
2023-10-09  5:05     ` Jerin Jacob
2023-10-27 13:46       ` Medvedkin, Vladimir
2023-11-23  5:24         ` Jerin Jacob
2023-08-11 16:14 ` [PATCH v1 2/3] dma/idxd: implement " Anatoly Burakov
2023-08-11 16:14 ` [PATCH v1 3/3] dma/idxd: add API to create and attach to window Anatoly Burakov
2023-08-14  4:39   ` Jerin Jacob
2023-08-14  9:55     ` Burakov, Anatoly
2023-08-15 19:20 ` [EXT] [PATCH v1 0/3] Add support for inter-domain DMA operations Satananda Burla

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).