DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] [PATCH 0/1] net/af_xdp: shared UMEM support
@ 2020-09-07 16:16 Ciara Loftus
  2020-09-07 16:16 ` [dpdk-dev] [PATCH 1/1] " Ciara Loftus
  0 siblings, 1 reply; 5+ messages in thread
From: Ciara Loftus @ 2020-09-07 16:16 UTC (permalink / raw)
  To: dev; +Cc: Ciara Loftus

This patch integrates support for shared UMEMs into the AF_XDP PMD.
The feature is currently available in the linux-next tree [1] and
should be available in the v5.10 kernel.

Detailed information on the shared UMEM feature can be found in the
kernel documentation.

Support for the kernel feature is detected in DPDK by querying the
LINUX_KERNEL_VERSION and ensuring a kernel >= 5.10.0. Since this
kernel is not yet released, if one wishes to test this patch now it
is suggested to use the aforementioned linux-next tree and modify the
version check to align with the version of that tree. Once the kernel
is released (~December) this will no longer be necessary.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=acabf32805f7

Ciara Loftus (1):
  net/af_xdp: shared UMEM support

 doc/guides/nics/af_xdp.rst          |   6 +-
 drivers/net/af_xdp/compat.h         |  10 +
 drivers/net/af_xdp/meson.build      |   4 +
 drivers/net/af_xdp/rte_eth_af_xdp.c | 307 ++++++++++++++++++++++------
 4 files changed, 260 insertions(+), 67 deletions(-)
 create mode 100644 drivers/net/af_xdp/compat.h

-- 
2.17.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [dpdk-dev] [PATCH 1/1] net/af_xdp: shared UMEM support
  2020-09-07 16:16 [dpdk-dev] [PATCH 0/1] net/af_xdp: shared UMEM support Ciara Loftus
@ 2020-09-07 16:16 ` Ciara Loftus
  2020-09-10 11:55   ` Tahhan, Maryam
  0 siblings, 1 reply; 5+ messages in thread
From: Ciara Loftus @ 2020-09-07 16:16 UTC (permalink / raw)
  To: dev; +Cc: Ciara Loftus

Kernel v5.10 will introduce the ability to efficiently share a UMEM
between AF_XDP sockets bound to different queue ids on the same or
different devices. This patch integrates that functionality into the AF_XDP
PMD.

A PMD will attempt to share a UMEM with others if the shared_umem=1 vdev
arg is set. UMEMs can only be shared across PMDs with the same mempool, up
to a limited number of PMDs goverened by the size of the given mempool.
Sharing UMEMs is not supported for non-zero-copy (aligned) mode.

The benefit of sharing UMEM across PMDs is a saving in memory due to not
having to register the UMEM multiple times. Throughput was measured to
remain within 2% of the default mode (not sharing UMEM).

A version of libbpf >= v0.2.0 is required and the appropriate pkg-config
file for libbpf must be installed such that meson can determine the
version.

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
---
 doc/guides/nics/af_xdp.rst          |   6 +-
 drivers/net/af_xdp/compat.h         |  10 +
 drivers/net/af_xdp/meson.build      |   4 +
 drivers/net/af_xdp/rte_eth_af_xdp.c | 307 ++++++++++++++++++++++------
 4 files changed, 260 insertions(+), 67 deletions(-)
 create mode 100644 drivers/net/af_xdp/compat.h

diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
index 07bdd29e29..d22e2c9fa8 100644
--- a/doc/guides/nics/af_xdp.rst
+++ b/doc/guides/nics/af_xdp.rst
@@ -1,5 +1,5 @@
 ..  SPDX-License-Identifier: BSD-3-Clause
-    Copyright(c) 2019 Intel Corporation.
+    Copyright(c) 2020 Intel Corporation.
 
 AF_XDP Poll Mode Driver
 ==========================
@@ -32,6 +32,8 @@ The following options can be provided to set up an af_xdp port in DPDK.
 *   ``iface`` - name of the Kernel interface to attach to (required);
 *   ``start_queue`` - starting netdev queue id (optional, default 0);
 *   ``queue_count`` - total netdev queue number (optional, default 1);
+*   ``shared_umem`` - PMD will attempt to share UMEM with others (optional,
+    default 0);
 
 Prerequisites
 -------------
@@ -45,6 +47,8 @@ This is a Linux-specific PMD, thus the following prerequisites apply:
 *  A Kernel bound interface to attach to;
 *  For need_wakeup feature, it requires kernel version later than v5.3-rc1;
 *  For PMD zero copy, it requires kernel version later than v5.4-rc1;
+*  For shared_umem, it requires kernel version v5.10 or later and libbpf version
+   v0.2.0 or later.
 
 Set up an af_xdp interface
 -----------------------------
diff --git a/drivers/net/af_xdp/compat.h b/drivers/net/af_xdp/compat.h
new file mode 100644
index 0000000000..7a3e825d8f
--- /dev/null
+++ b/drivers/net/af_xdp/compat.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation.
+ */
+
+#include <linux/version.h>
+
+#if KERNEL_VERSION(5, 10, 0) <= LINUX_VERSION_CODE && \
+	defined(RTE_LIBRTE_AF_XDP_PMD_SHARED_UMEM)
+#define ETH_AF_XDP_SHARED_UMEM 1
+#endif
diff --git a/drivers/net/af_xdp/meson.build b/drivers/net/af_xdp/meson.build
index 307aa0e388..fead8dd99f 100644
--- a/drivers/net/af_xdp/meson.build
+++ b/drivers/net/af_xdp/meson.build
@@ -10,6 +10,10 @@ endif
 
 if bpf_dep.found() and cc.has_header('bpf/xsk.h') and cc.has_header('linux/if_xdp.h')
 	ext_deps += bpf_dep
+	bpf_ver_dep = dependency('libbpf', version : '>=0.2.0', required: false)
+	if bpf_ver_dep.found()
+		dpdk_conf.set('RTE_LIBRTE_AF_XDP_PMD_SHARED_UMEM', 1)
+	endif
 else
 	build = false
 	reason = 'missing dependency, "libbpf"'
diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index 936d4a7d5f..39536edb73 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -1,5 +1,5 @@
 /* SPDX-License-Identifier: BSD-3-Clause
- * Copyright(c) 2019 Intel Corporation.
+ * Copyright(c) 2020 Intel Corporation.
  */
 #include <unistd.h>
 #include <errno.h>
@@ -37,6 +37,10 @@
 #include <rte_mbuf.h>
 #include <rte_malloc.h>
 #include <rte_ring.h>
+#include <rte_spinlock.h>
+
+#include "compat.h"
+
 
 #ifndef SOL_XDP
 #define SOL_XDP 283
@@ -67,13 +71,13 @@ RTE_LOG_REGISTER(af_xdp_logtype, pmd.net.af_xdp, NOTICE);
 
 
 struct xsk_umem_info {
-	struct xsk_ring_prod fq;
-	struct xsk_ring_cons cq;
 	struct xsk_umem *umem;
 	struct rte_ring *buf_ring;
 	const struct rte_memzone *mz;
 	struct rte_mempool *mb_pool;
 	void *buffer;
+	uint8_t refcnt;
+	uint32_t max_xsks;
 };
 
 struct rx_stats {
@@ -90,6 +94,9 @@ struct pkt_rx_queue {
 
 	struct rx_stats stats;
 
+	struct xsk_ring_prod fq;
+	struct xsk_ring_cons cq;
+
 	struct pkt_tx_queue *pair;
 	struct pollfd fds[1];
 	int xsk_queue_idx;
@@ -118,6 +125,7 @@ struct pmd_internals {
 	int queue_cnt;
 	int max_queue_cnt;
 	int combined_queue_cnt;
+	bool shared_umem;
 
 	struct rte_ether_addr eth_addr;
 
@@ -128,11 +136,13 @@ struct pmd_internals {
 #define ETH_AF_XDP_IFACE_ARG			"iface"
 #define ETH_AF_XDP_START_QUEUE_ARG		"start_queue"
 #define ETH_AF_XDP_QUEUE_COUNT_ARG		"queue_count"
+#define ETH_AF_XDP_SHARED_UMEM_ARG		"shared_umem"
 
 static const char * const valid_arguments[] = {
 	ETH_AF_XDP_IFACE_ARG,
 	ETH_AF_XDP_START_QUEUE_ARG,
 	ETH_AF_XDP_QUEUE_COUNT_ARG,
+	ETH_AF_XDP_SHARED_UMEM_ARG,
 	NULL
 };
 
@@ -143,12 +153,23 @@ static const struct rte_eth_link pmd_link = {
 	.link_autoneg = ETH_LINK_AUTONEG
 };
 
+/* List which tracks PMDs to facilitate sharing UMEMs across them. */
+struct internal_list {
+	TAILQ_ENTRY(internal_list) next;
+	struct rte_eth_dev *eth_dev;
+};
+
+TAILQ_HEAD(internal_list_head, internal_list);
+static struct internal_list_head internal_list =
+	TAILQ_HEAD_INITIALIZER(internal_list);
+
+static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
+
 #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
 static inline int
 reserve_fill_queue_zc(struct xsk_umem_info *umem, uint16_t reserve_size,
-		      struct rte_mbuf **bufs)
+		      struct rte_mbuf **bufs, struct xsk_ring_prod *fq)
 {
-	struct xsk_ring_prod *fq = &umem->fq;
 	uint32_t idx;
 	uint16_t i;
 
@@ -176,9 +197,9 @@ reserve_fill_queue_zc(struct xsk_umem_info *umem, uint16_t reserve_size,
 #else
 static inline int
 reserve_fill_queue_cp(struct xsk_umem_info *umem, uint16_t reserve_size,
-		      struct rte_mbuf **bufs __rte_unused)
+		      struct rte_mbuf **bufs __rte_unused,
+		      struct xsk_ring_prod *fq)
 {
-	struct xsk_ring_prod *fq = &umem->fq;
 	void *addrs[reserve_size];
 	uint32_t idx;
 	uint16_t i;
@@ -211,12 +232,12 @@ reserve_fill_queue_cp(struct xsk_umem_info *umem, uint16_t reserve_size,
 
 static inline int
 reserve_fill_queue(struct xsk_umem_info *umem, uint16_t reserve_size,
-		   struct rte_mbuf **bufs)
+		   struct rte_mbuf **bufs, struct xsk_ring_prod *fq)
 {
 #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
-	return reserve_fill_queue_zc(umem, reserve_size, bufs);
+	return reserve_fill_queue_zc(umem, reserve_size, bufs, fq);
 #else
-	return reserve_fill_queue_cp(umem, reserve_size, bufs);
+	return reserve_fill_queue_cp(umem, reserve_size, bufs, fq);
 #endif
 }
 
@@ -226,6 +247,7 @@ af_xdp_rx_zc(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 {
 	struct pkt_rx_queue *rxq = queue;
 	struct xsk_ring_cons *rx = &rxq->rx;
+	struct xsk_ring_prod *fq = &rxq->fq;
 	struct xsk_umem_info *umem = rxq->umem;
 	uint32_t idx_rx = 0;
 	unsigned long rx_bytes = 0;
@@ -243,7 +265,7 @@ af_xdp_rx_zc(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 
 	if (rcvd == 0) {
 #if defined(XDP_USE_NEED_WAKEUP)
-		if (xsk_ring_prod__needs_wakeup(&umem->fq))
+		if (xsk_ring_prod__needs_wakeup(fq))
 			(void)poll(rxq->fds, 1, 1000);
 #endif
 
@@ -277,7 +299,7 @@ af_xdp_rx_zc(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 
 	xsk_ring_cons__release(rx, rcvd);
 
-	(void)reserve_fill_queue(umem, rcvd, fq_bufs);
+	(void)reserve_fill_queue(umem, rcvd, fq_bufs, fq);
 
 	/* statistics */
 	rxq->stats.rx_pkts += rcvd;
@@ -297,7 +319,7 @@ af_xdp_rx_cp(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 	struct pkt_rx_queue *rxq = queue;
 	struct xsk_ring_cons *rx = &rxq->rx;
 	struct xsk_umem_info *umem = rxq->umem;
-	struct xsk_ring_prod *fq = &umem->fq;
+	struct xsk_ring_prod *fq = &rxq->fq;
 	uint32_t idx_rx = 0;
 	unsigned long rx_bytes = 0;
 	int rcvd, i;
@@ -318,7 +340,8 @@ af_xdp_rx_cp(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 	}
 
 	if (xsk_prod_nb_free(fq, free_thresh) >= free_thresh)
-		(void)reserve_fill_queue(umem, ETH_AF_XDP_RX_BATCH_SIZE, NULL);
+		(void)reserve_fill_queue(umem, ETH_AF_XDP_RX_BATCH_SIZE,
+					 NULL, fq);
 
 	for (i = 0; i < rcvd; i++) {
 		const struct xdp_desc *desc;
@@ -367,9 +390,8 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 }
 
 static void
-pull_umem_cq(struct xsk_umem_info *umem, int size)
+pull_umem_cq(struct xsk_umem_info *umem, int size, struct xsk_ring_cons *cq)
 {
-	struct xsk_ring_cons *cq = &umem->cq;
 	size_t i, n;
 	uint32_t idx_cq = 0;
 
@@ -392,11 +414,11 @@ pull_umem_cq(struct xsk_umem_info *umem, int size)
 }
 
 static void
-kick_tx(struct pkt_tx_queue *txq)
+kick_tx(struct pkt_tx_queue *txq, struct xsk_ring_cons *cq)
 {
 	struct xsk_umem_info *umem = txq->umem;
 
-	pull_umem_cq(umem, XSK_RING_CONS__DEFAULT_NUM_DESCS);
+	pull_umem_cq(umem, XSK_RING_CONS__DEFAULT_NUM_DESCS, cq);
 
 #if defined(XDP_USE_NEED_WAKEUP)
 	if (xsk_ring_prod__needs_wakeup(&txq->tx))
@@ -410,7 +432,8 @@ kick_tx(struct pkt_tx_queue *txq)
 			/* pull from completion queue to leave more space */
 			if (errno == EAGAIN)
 				pull_umem_cq(umem,
-					     XSK_RING_CONS__DEFAULT_NUM_DESCS);
+					     XSK_RING_CONS__DEFAULT_NUM_DESCS,
+					     cq);
 		}
 }
 
@@ -427,17 +450,18 @@ af_xdp_tx_zc(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 	uint16_t count = 0;
 	struct xdp_desc *desc;
 	uint64_t addr, offset;
-	uint32_t free_thresh = umem->cq.size >> 1;
+	struct xsk_ring_cons *cq = &txq->pair->cq;
+	uint32_t free_thresh = cq->size >> 1;
 
-	if (xsk_cons_nb_avail(&umem->cq, free_thresh) >= free_thresh)
-		pull_umem_cq(umem, XSK_RING_CONS__DEFAULT_NUM_DESCS);
+	if (xsk_cons_nb_avail(cq, free_thresh) >= free_thresh)
+		pull_umem_cq(umem, XSK_RING_CONS__DEFAULT_NUM_DESCS, cq);
 
 	for (i = 0; i < nb_pkts; i++) {
 		mbuf = bufs[i];
 
 		if (mbuf->pool == umem->mb_pool) {
 			if (!xsk_ring_prod__reserve(&txq->tx, 1, &idx_tx)) {
-				kick_tx(txq);
+				kick_tx(txq, cq);
 				if (!xsk_ring_prod__reserve(&txq->tx, 1,
 							    &idx_tx))
 					goto out;
@@ -462,7 +486,7 @@ af_xdp_tx_zc(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 
 			if (!xsk_ring_prod__reserve(&txq->tx, 1, &idx_tx)) {
 				rte_pktmbuf_free(local_mbuf);
-				kick_tx(txq);
+				kick_tx(txq, cq);
 				goto out;
 			}
 
@@ -486,7 +510,7 @@ af_xdp_tx_zc(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		tx_bytes += mbuf->pkt_len;
 	}
 
-	kick_tx(txq);
+	kick_tx(txq, cq);
 
 out:
 	xsk_ring_prod__submit(&txq->tx, count);
@@ -508,10 +532,11 @@ af_xdp_tx_cp(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 	unsigned long tx_bytes = 0;
 	int i;
 	uint32_t idx_tx;
+	struct xsk_ring_cons *cq = &txq->pair->cq;
 
 	nb_pkts = RTE_MIN(nb_pkts, ETH_AF_XDP_TX_BATCH_SIZE);
 
-	pull_umem_cq(umem, nb_pkts);
+	pull_umem_cq(umem, nb_pkts, cq);
 
 	nb_pkts = rte_ring_dequeue_bulk(umem->buf_ring, addrs,
 					nb_pkts, NULL);
@@ -519,7 +544,7 @@ af_xdp_tx_cp(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		return 0;
 
 	if (xsk_ring_prod__reserve(&txq->tx, nb_pkts, &idx_tx) != nb_pkts) {
-		kick_tx(txq);
+		kick_tx(txq, cq);
 		rte_ring_enqueue_bulk(umem->buf_ring, addrs, nb_pkts, NULL);
 		return 0;
 	}
@@ -542,7 +567,7 @@ af_xdp_tx_cp(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 
 	xsk_ring_prod__submit(&txq->tx, nb_pkts);
 
-	kick_tx(txq);
+	kick_tx(txq, cq);
 
 	txq->stats.tx_pkts += nb_pkts;
 	txq->stats.tx_bytes += tx_bytes;
@@ -576,13 +601,101 @@ eth_dev_stop(struct rte_eth_dev *dev)
 	dev->data->dev_link.link_status = ETH_LINK_DOWN;
 }
 
+/* Find ethdev in list */
+static inline struct internal_list *
+find_internal_resource(struct pmd_internals *port_int)
+{
+	int found = 0;
+	struct internal_list *list;
+	struct pmd_internals *list_int;
+
+	if (port_int == NULL)
+		return NULL;
+
+	pthread_mutex_lock(&internal_list_lock);
+
+	TAILQ_FOREACH(list, &internal_list, next) {
+		list_int = list->eth_dev->data->dev_private;
+		if (list_int == port_int) {
+			found = 1;
+			break;
+		}
+	}
+
+	pthread_mutex_unlock(&internal_list_lock);
+
+	if (!found)
+		return NULL;
+
+	return list;
+}
+
+/* Get a pointer to an existing UMEM which overlays the rxq's mb_pool */
+static inline struct xsk_umem_info *
+get_shared_umem(struct pkt_rx_queue *rxq) {
+	struct internal_list *list;
+	struct pmd_internals *internals;
+	int i = 0;
+	struct rte_mempool *mb_pool = rxq->mb_pool;
+
+	if (mb_pool == NULL)
+		return NULL;
+
+	pthread_mutex_lock(&internal_list_lock);
+
+	TAILQ_FOREACH(list, &internal_list, next) {
+		internals = list->eth_dev->data->dev_private;
+		for (i = 0; i < internals->queue_cnt; i++) {
+			struct pkt_rx_queue *list_rxq =
+						&internals->rx_queues[i];
+			if (rxq == list_rxq)
+				continue;
+			if (mb_pool == internals->rx_queues[i].mb_pool) {
+				if (__atomic_load_n(
+					&internals->rx_queues[i].umem->refcnt,
+							__ATOMIC_ACQUIRE)) {
+					pthread_mutex_unlock(
+							&internal_list_lock);
+					return internals->rx_queues[i].umem;
+				}
+			}
+		}
+	}
+
+	pthread_mutex_unlock(&internal_list_lock);
+
+	return NULL;
+}
+
 static int
 eth_dev_configure(struct rte_eth_dev *dev)
 {
+	struct pmd_internals *internal = dev->data->dev_private;
+
 	/* rx/tx must be paired */
 	if (dev->data->nb_rx_queues != dev->data->nb_tx_queues)
 		return -EINVAL;
 
+	if (internal->shared_umem) {
+		struct internal_list *list = NULL;
+		const char *name = dev->device->name;
+
+		/* Ensure PMD is not already inserted into the list */
+		list = find_internal_resource(internal);
+		if (list)
+			return 0;
+
+		list = rte_zmalloc_socket(name, sizeof(*list), 0,
+					dev->device->numa_node);
+		if (list == NULL)
+			return -1;
+
+		list->eth_dev = dev;
+		pthread_mutex_lock(&internal_list_lock);
+		TAILQ_INSERT_TAIL(&internal_list, list, next);
+		pthread_mutex_unlock(&internal_list_lock);
+	}
+
 	return 0;
 }
 
@@ -716,8 +829,12 @@ eth_dev_close(struct rte_eth_dev *dev)
 		if (rxq->umem == NULL)
 			break;
 		xsk_socket__delete(rxq->xsk);
-		(void)xsk_umem__delete(rxq->umem->umem);
-		xdp_umem_destroy(rxq->umem);
+
+		if (__atomic_sub_fetch(&rxq->umem->refcnt, 1, __ATOMIC_ACQUIRE)
+				== 0) {
+			(void)xsk_umem__delete(rxq->umem->umem);
+			xdp_umem_destroy(rxq->umem);
+		}
 
 		/* free pkt_tx_queue */
 		rte_free(rxq->pair);
@@ -731,6 +848,19 @@ eth_dev_close(struct rte_eth_dev *dev)
 	dev->data->mac_addrs = NULL;
 
 	remove_xdp_program(internals);
+
+	if (internals->shared_umem) {
+		struct internal_list *list;
+
+		/* Remove ethdev from list used to track and share UMEMs */
+		list = find_internal_resource(internals);
+		if (list) {
+			pthread_mutex_lock(&internal_list_lock);
+			TAILQ_REMOVE(&internal_list, list, next);
+			pthread_mutex_unlock(&internal_list_lock);
+			rte_free(list);
+		}
+	}
 }
 
 static void
@@ -755,10 +885,10 @@ static inline uint64_t get_base_addr(struct rte_mempool *mp)
 }
 
 static struct
-xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals __rte_unused,
+xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals,
 				  struct pkt_rx_queue *rxq)
 {
-	struct xsk_umem_info *umem;
+	struct xsk_umem_info *umem = NULL;
 	int ret;
 	struct xsk_umem_config usr_config = {
 		.fill_size = ETH_AF_XDP_DFLT_NUM_DESCS * 2,
@@ -767,33 +897,53 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals __rte_unused,
 	void *base_addr = NULL;
 	struct rte_mempool *mb_pool = rxq->mb_pool;
 
-	usr_config.frame_size = rte_mempool_calc_obj_size(mb_pool->elt_size,
-								mb_pool->flags,
-								NULL);
-	usr_config.frame_headroom = mb_pool->header_size +
-					sizeof(struct rte_mbuf) +
-					rte_pktmbuf_priv_size(mb_pool) +
-					RTE_PKTMBUF_HEADROOM;
-
-	umem = rte_zmalloc_socket("umem", sizeof(*umem), 0, rte_socket_id());
-	if (umem == NULL) {
-		AF_XDP_LOG(ERR, "Failed to allocate umem info");
-		return NULL;
+	if (internals->shared_umem) {
+		umem = get_shared_umem(rxq);
+		if (umem != NULL &&
+			__atomic_load_n(&umem->refcnt, __ATOMIC_ACQUIRE) <
+					umem->max_xsks) {
+			AF_XDP_LOG(INFO, "%s,qid%i sharing UMEM\n",
+					internals->if_name, rxq->xsk_queue_idx);
+			__atomic_fetch_add(&umem->refcnt, 1, __ATOMIC_ACQUIRE);
+		}
 	}
 
-	umem->mb_pool = mb_pool;
-	base_addr = (void *)get_base_addr(mb_pool);
-
-	ret = xsk_umem__create(&umem->umem, base_addr,
-			       mb_pool->populated_size * usr_config.frame_size,
-			       &umem->fq, &umem->cq,
-			       &usr_config);
+	if (umem == NULL) {
+		usr_config.frame_size =
+			rte_mempool_calc_obj_size(mb_pool->elt_size,
+						  mb_pool->flags, NULL);
+		usr_config.frame_headroom = mb_pool->header_size +
+						sizeof(struct rte_mbuf) +
+						rte_pktmbuf_priv_size(mb_pool) +
+						RTE_PKTMBUF_HEADROOM;
+
+		umem = rte_zmalloc_socket("umem", sizeof(*umem), 0,
+					  rte_socket_id());
+		if (umem == NULL) {
+			AF_XDP_LOG(ERR, "Failed to allocate umem info");
+			return NULL;
+		}
 
-	if (ret) {
-		AF_XDP_LOG(ERR, "Failed to create umem");
-		goto err;
+		umem->mb_pool = mb_pool;
+		base_addr = (void *)get_base_addr(mb_pool);
+		ret = xsk_umem__create(&umem->umem, base_addr,
+				mb_pool->populated_size * usr_config.frame_size,
+				&rxq->fq, &rxq->cq,
+				&usr_config);
+		if (ret) {
+			AF_XDP_LOG(ERR, "Failed to create umem");
+			goto err;
+		}
+		umem->buffer = base_addr;
+
+		if (internals->shared_umem) {
+			umem->max_xsks = mb_pool->populated_size /
+						ETH_AF_XDP_NUM_BUFFERS;
+			AF_XDP_LOG(INFO, "Max xsks for UMEM %s: %u\n",
+						mb_pool->name, umem->max_xsks);
+			__atomic_store_n(&umem->refcnt, 1, __ATOMIC_RELEASE);
+		}
 	}
-	umem->buffer = base_addr;
 
 #else
 static struct
@@ -846,7 +996,7 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals,
 
 	ret = xsk_umem__create(&umem->umem, mz->addr,
 			       ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE,
-			       &umem->fq, &umem->cq,
+			       &rxq->fq, &rxq->cq,
 			       &usr_config);
 
 	if (ret) {
@@ -888,9 +1038,17 @@ xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
 	cfg.bind_flags |= XDP_USE_NEED_WAKEUP;
 #endif
 
-	ret = xsk_socket__create(&rxq->xsk, internals->if_name,
-			rxq->xsk_queue_idx, rxq->umem->umem, &rxq->rx,
-			&txq->tx, &cfg);
+	if (internals->shared_umem) {
+#ifdef ETH_AF_XDP_SHARED_UMEM
+		ret = xsk_socket__create_shared(&rxq->xsk, internals->if_name,
+				rxq->xsk_queue_idx, rxq->umem->umem, &rxq->rx,
+				&txq->tx, &rxq->fq, &rxq->cq, &cfg);
+#endif
+	} else {
+		ret = xsk_socket__create(&rxq->xsk, internals->if_name,
+				rxq->xsk_queue_idx, rxq->umem->umem, &rxq->rx,
+				&txq->tx, &cfg);
+	}
 	if (ret) {
 		AF_XDP_LOG(ERR, "Failed to create xsk socket.\n");
 		goto err;
@@ -902,7 +1060,7 @@ xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
 		goto err;
 	}
 #endif
-	ret = reserve_fill_queue(rxq->umem, reserve_size, fq_bufs);
+	ret = reserve_fill_queue(rxq->umem, reserve_size, fq_bufs, &rxq->fq);
 	if (ret) {
 		xsk_socket__delete(rxq->xsk);
 		AF_XDP_LOG(ERR, "Failed to reserve fill queue.\n");
@@ -912,7 +1070,8 @@ xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
 	return 0;
 
 err:
-	xdp_umem_destroy(rxq->umem);
+	if (__atomic_sub_fetch(&rxq->umem->refcnt, 1, __ATOMIC_ACQUIRE) == 0)
+		xdp_umem_destroy(rxq->umem);
 
 	return ret;
 }
@@ -1142,7 +1301,7 @@ xdp_get_channels_info(const char *if_name, int *max_queues,
 
 static int
 parse_parameters(struct rte_kvargs *kvlist, char *if_name, int *start_queue,
-			int *queue_cnt)
+			int *queue_cnt, int *shared_umem)
 {
 	int ret;
 
@@ -1163,6 +1322,11 @@ parse_parameters(struct rte_kvargs *kvlist, char *if_name, int *start_queue,
 		goto free_kvlist;
 	}
 
+	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_SHARED_UMEM_ARG,
+				&parse_integer_arg, shared_umem);
+	if (ret < 0)
+		goto free_kvlist;
+
 free_kvlist:
 	rte_kvargs_free(kvlist);
 	return ret;
@@ -1200,7 +1364,7 @@ get_iface_info(const char *if_name,
 
 static struct rte_eth_dev *
 init_internals(struct rte_vdev_device *dev, const char *if_name,
-			int start_queue_idx, int queue_cnt)
+			int start_queue_idx, int queue_cnt, int shared_umem)
 {
 	const char *name = rte_vdev_device_name(dev);
 	const unsigned int numa_node = dev->device.numa_node;
@@ -1217,6 +1381,15 @@ init_internals(struct rte_vdev_device *dev, const char *if_name,
 	internals->queue_cnt = queue_cnt;
 	strlcpy(internals->if_name, if_name, IFNAMSIZ);
 
+	if (shared_umem) {
+#ifndef ETH_AF_XDP_SHARED_UMEM
+		AF_XDP_LOG(ERR, "Shared UMEM feature not available. "
+				"Check kernel and libbpf version\n");
+		goto err_free_internals;
+#endif
+	}
+	internals->shared_umem = shared_umem;
+
 	if (xdp_get_channels_info(if_name, &internals->max_queue_cnt,
 				  &internals->combined_queue_cnt)) {
 		AF_XDP_LOG(ERR, "Failed to get channel info of interface: %s\n",
@@ -1292,6 +1465,7 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
 	char if_name[IFNAMSIZ] = {'\0'};
 	int xsk_start_queue_idx = ETH_AF_XDP_DFLT_START_QUEUE_IDX;
 	int xsk_queue_cnt = ETH_AF_XDP_DFLT_QUEUE_COUNT;
+	int shared_umem = 0;
 	struct rte_eth_dev *eth_dev = NULL;
 	const char *name;
 
@@ -1321,7 +1495,7 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
 		dev->device.numa_node = rte_socket_id();
 
 	if (parse_parameters(kvlist, if_name, &xsk_start_queue_idx,
-			     &xsk_queue_cnt) < 0) {
+			     &xsk_queue_cnt, &shared_umem) < 0) {
 		AF_XDP_LOG(ERR, "Invalid kvargs value\n");
 		return -EINVAL;
 	}
@@ -1332,7 +1506,7 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
 	}
 
 	eth_dev = init_internals(dev, if_name, xsk_start_queue_idx,
-					xsk_queue_cnt);
+					xsk_queue_cnt, shared_umem);
 	if (eth_dev == NULL) {
 		AF_XDP_LOG(ERR, "Failed to init internals\n");
 		return -1;
@@ -1375,4 +1549,5 @@ RTE_PMD_REGISTER_VDEV(net_af_xdp, pmd_af_xdp_drv);
 RTE_PMD_REGISTER_PARAM_STRING(net_af_xdp,
 			      "iface=<string> "
 			      "start_queue=<int> "
-			      "queue_count=<int> ");
+			      "queue_count=<int> "
+			      "shared_umem=<int> ");
-- 
2.17.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [dpdk-dev] [PATCH 1/1] net/af_xdp: shared UMEM support
  2020-09-07 16:16 ` [dpdk-dev] [PATCH 1/1] " Ciara Loftus
@ 2020-09-10 11:55   ` Tahhan, Maryam
  2020-09-17  8:54     ` Loftus, Ciara
  0 siblings, 1 reply; 5+ messages in thread
From: Tahhan, Maryam @ 2020-09-10 11:55 UTC (permalink / raw)
  To: Loftus, Ciara, dev



> -----Original Message-----
> From: Loftus, Ciara <ciara.loftus@intel.com>
> Sent: Monday 7 September 2020 17:16
> To: dev@dpdk.org
> Cc: Loftus, Ciara <ciara.loftus@intel.com>
> Subject: [PATCH 1/1] net/af_xdp: shared UMEM support
> 
> Kernel v5.10 will introduce the ability to efficiently share a UMEM between
> AF_XDP sockets bound to different queue ids on the same or different
> devices. This patch integrates that functionality into the AF_XDP PMD.
> 
> A PMD will attempt to share a UMEM with others if the shared_umem=1
> vdev arg is set. UMEMs can only be shared across PMDs with the same
> mempool, up to a limited number of PMDs goverened by the size of the
> given mempool.
> Sharing UMEMs is not supported for non-zero-copy (aligned) mode.
> 
> The benefit of sharing UMEM across PMDs is a saving in memory due to not
> having to register the UMEM multiple times. Throughput was measured to
> remain within 2% of the default mode (not sharing UMEM).
> 
> A version of libbpf >= v0.2.0 is required and the appropriate pkg-config file
> for libbpf must be installed such that meson can determine the version.
> 
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>

<snip>

> 
> +/* List which tracks PMDs to facilitate sharing UMEMs across them. */
> +struct internal_list {
> +	TAILQ_ENTRY(internal_list) next;
> +	struct rte_eth_dev *eth_dev;
> +};
> +
> +TAILQ_HEAD(internal_list_head, internal_list); static struct
> +internal_list_head internal_list =
> +	TAILQ_HEAD_INITIALIZER(internal_list);
> +
> +static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;

[Tahhan, Maryam] do multiple threads typically initialize and ethdev/invoke the underlying driver? 
Most apps I've seen initialize the ports one after the other in the starting thread - so if there's not multiple threads doing initialization - we may want to consider removing this mutex...
Or maybe do you see something potentially removing a port while a port is being added?

<snip>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [dpdk-dev] [PATCH 1/1] net/af_xdp: shared UMEM support
  2020-09-10 11:55   ` Tahhan, Maryam
@ 2020-09-17  8:54     ` Loftus, Ciara
  2020-09-17  9:49       ` Tahhan, Maryam
  0 siblings, 1 reply; 5+ messages in thread
From: Loftus, Ciara @ 2020-09-17  8:54 UTC (permalink / raw)
  To: Tahhan, Maryam; +Cc: dev

> >
> > Kernel v5.10 will introduce the ability to efficiently share a UMEM between
> > AF_XDP sockets bound to different queue ids on the same or different
> > devices. This patch integrates that functionality into the AF_XDP PMD.
> >
> > A PMD will attempt to share a UMEM with others if the shared_umem=1
> > vdev arg is set. UMEMs can only be shared across PMDs with the same
> > mempool, up to a limited number of PMDs goverened by the size of the
> > given mempool.
> > Sharing UMEMs is not supported for non-zero-copy (aligned) mode.
> >
> > The benefit of sharing UMEM across PMDs is a saving in memory due to not
> > having to register the UMEM multiple times. Throughput was measured to
> > remain within 2% of the default mode (not sharing UMEM).
> >
> > A version of libbpf >= v0.2.0 is required and the appropriate pkg-config file
> > for libbpf must be installed such that meson can determine the version.
> >
> > Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> 
> <snip>
> 
> >
> > +/* List which tracks PMDs to facilitate sharing UMEMs across them. */
> > +struct internal_list {
> > +TAILQ_ENTRY(internal_list) next;
> > +struct rte_eth_dev *eth_dev;
> > +};
> > +
> > +TAILQ_HEAD(internal_list_head, internal_list); static struct
> > +internal_list_head internal_list =
> > +TAILQ_HEAD_INITIALIZER(internal_list);
> > +
> > +static pthread_mutex_t internal_list_lock =
> PTHREAD_MUTEX_INITIALIZER;
> 
> [Tahhan, Maryam] do multiple threads typically initialize and ethdev/invoke
> the underlying driver?
> Most apps I've seen initialize the ports one after the other in the starting
> thread - so if there's not multiple threads doing initialization - we may want
> to consider removing this mutex...
> Or maybe do you see something potentially removing a port while a port is
> being added?

Hi Maryam,

Yes. Although unlikely, I'm not aware of any guarantee that a port A cannot be removed when port B is being added and since both operations can touch the tailq I'm inclined to keep the mutex. But I'm open to correction.

Thanks,
Ciara

> 
> <snip>


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [dpdk-dev] [PATCH 1/1] net/af_xdp: shared UMEM support
  2020-09-17  8:54     ` Loftus, Ciara
@ 2020-09-17  9:49       ` Tahhan, Maryam
  0 siblings, 0 replies; 5+ messages in thread
From: Tahhan, Maryam @ 2020-09-17  9:49 UTC (permalink / raw)
  To: Loftus, Ciara; +Cc: dev



> -----Original Message-----
> From: Loftus, Ciara <ciara.loftus@intel.com>
> Sent: Thursday 17 September 2020 09:55
> To: Tahhan, Maryam <maryam.tahhan@intel.com>
> Cc: dev@dpdk.org
> Subject: RE: [PATCH 1/1] net/af_xdp: shared UMEM support
> 
> > >
> > > Kernel v5.10 will introduce the ability to efficiently share a UMEM
> > > between AF_XDP sockets bound to different queue ids on the same or
> > > different devices. This patch integrates that functionality into the AF_XDP
> PMD.
> > >
> > > A PMD will attempt to share a UMEM with others if the shared_umem=1
> > > vdev arg is set. UMEMs can only be shared across PMDs with the same
> > > mempool, up to a limited number of PMDs goverened by the size of the
> > > given mempool.
> > > Sharing UMEMs is not supported for non-zero-copy (aligned) mode.
> > >
> > > The benefit of sharing UMEM across PMDs is a saving in memory due to
> > > not having to register the UMEM multiple times. Throughput was
> > > measured to remain within 2% of the default mode (not sharing UMEM).
> > >
> > > A version of libbpf >= v0.2.0 is required and the appropriate
> > > pkg-config file for libbpf must be installed such that meson can determine
> the version.
> > >
> > > Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> >
> > <snip>
> >
> > >
> > > +/* List which tracks PMDs to facilitate sharing UMEMs across them.
> > > +*/ struct internal_list {
> > > +TAILQ_ENTRY(internal_list) next;
> > > +struct rte_eth_dev *eth_dev;
> > > +};
> > > +
> > > +TAILQ_HEAD(internal_list_head, internal_list); static struct
> > > +internal_list_head internal_list =
> > > +TAILQ_HEAD_INITIALIZER(internal_list);
> > > +
> > > +static pthread_mutex_t internal_list_lock =
> > PTHREAD_MUTEX_INITIALIZER;
> >
> > [Tahhan, Maryam] do multiple threads typically initialize and
> > ethdev/invoke the underlying driver?
> > Most apps I've seen initialize the ports one after the other in the
> > starting thread - so if there's not multiple threads doing
> > initialization - we may want to consider removing this mutex...
> > Or maybe do you see something potentially removing a port while a port
> > is being added?
> 
> Hi Maryam,
> 
> Yes. Although unlikely, I'm not aware of any guarantee that a port A cannot
> be removed when port B is being added and since both operations can touch
> the tailq I'm inclined to keep the mutex. But I'm open to correction.
> 
> Thanks,
> Ciara
> 

No worries :) better to be safe than sorry 

> >
> > <snip>
> 


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-09-17  9:50 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-07 16:16 [dpdk-dev] [PATCH 0/1] net/af_xdp: shared UMEM support Ciara Loftus
2020-09-07 16:16 ` [dpdk-dev] [PATCH 1/1] " Ciara Loftus
2020-09-10 11:55   ` Tahhan, Maryam
2020-09-17  8:54     ` Loftus, Ciara
2020-09-17  9:49       ` Tahhan, Maryam

DPDK patches and discussions

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://inbox.dpdk.org/dev/0 dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dev dev/ https://inbox.dpdk.org/dev \
		dev@dpdk.org
	public-inbox-index dev

Example config snippet for mirrors.
Newsgroup available over NNTP:
	nntp://inbox.dpdk.org/inbox.dpdk.dev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git