DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] [PATCH RFC 0/1] net/af_xdp: shared UMEM support
@ 2020-08-11  9:50 Ciara Loftus
  2020-08-11  9:50 ` [dpdk-dev] [PATCH RFC 1/1] " Ciara Loftus
  2020-08-25 12:14 ` [dpdk-dev] [PATCH RFC 0/1] " Ferruh Yigit
  0 siblings, 2 replies; 6+ messages in thread
From: Ciara Loftus @ 2020-08-11  9:50 UTC (permalink / raw)
  To: dev; +Cc: Ciara Loftus

This RFC integrates shared UMEM support into the AF_XDP PMD. It is based on the
WIP kernel series [1] by Magnus Karlsson.

Detailed information on the shared UMEM feature can be found in the final patch
in the aforementioned series.

Support for the kernel feature can eventually be detected in DPDK by querying
the LINUX_KERNEL_VERSION. As of now the feature is not yet merged upstream, so
for this RFC it is assumed the user is using a patched version of v5.8.

Shared UMEM is only available for zero copy mode.

In order to share UMEM information between PMDs, the ethdevs wishing to share
must be tracked somehow. The method chosen to do so is similar to methods used
in the vHost [2] and vDPA drivers, where pointers to the ethdevs are maintained
in an internal list. Proposals for alternate solutions are welcome.

Performance data to follow with the v1.

[1] https://patchwork.ozlabs.org/project/netdev/cover/1595307848-20719-1-git-send-email-magnus.karlsson@intel.com/
[2] https://git.dpdk.org/dpdk/commit/?id=ee584e9710b9abd60ee9faef664e106dcea10085

Ciara Loftus (1):
  net/af_xdp: shared UMEM support

 doc/guides/nics/af_xdp.rst          |   5 +-
 drivers/net/af_xdp/rte_eth_af_xdp.c | 315 ++++++++++++++++++++++------
 2 files changed, 252 insertions(+), 68 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [dpdk-dev] [PATCH RFC 1/1] net/af_xdp: shared UMEM support
  2020-08-11  9:50 [dpdk-dev] [PATCH RFC 0/1] net/af_xdp: shared UMEM support Ciara Loftus
@ 2020-08-11  9:50 ` Ciara Loftus
  2020-08-25 14:09   ` Ferruh Yigit
  2020-08-25 12:14 ` [dpdk-dev] [PATCH RFC 0/1] " Ferruh Yigit
  1 sibling, 1 reply; 6+ messages in thread
From: Ciara Loftus @ 2020-08-11  9:50 UTC (permalink / raw)
  To: dev; +Cc: Ciara Loftus

A future kernel will introduce the ability to efficiently share a UMEM
between AF_XDP sockets bound to different queue ids on the same or
different devices. This patch integrates that functionality into the AF_XDP
PMD.

A PMD will attempt to share a UMEM with others if the shared_umem=1 vdev
arg is set. UMEMs can only be shared across PMDs with the same mempool, up
to a limited number of PMDs goverened by the size of the given mempool.

The benefit of sharing UMEM across PMDs is a saving in memory due to not
having to register the UMEM multiple times.

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
---
 doc/guides/nics/af_xdp.rst          |   5 +-
 drivers/net/af_xdp/rte_eth_af_xdp.c | 315 ++++++++++++++++++++++------
 2 files changed, 252 insertions(+), 68 deletions(-)

diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
index 07bdd29e29..26f7d171f8 100644
--- a/doc/guides/nics/af_xdp.rst
+++ b/doc/guides/nics/af_xdp.rst
@@ -1,5 +1,5 @@
 ..  SPDX-License-Identifier: BSD-3-Clause
-    Copyright(c) 2019 Intel Corporation.
+    Copyright(c) 2020 Intel Corporation.
 
 AF_XDP Poll Mode Driver
 ==========================
@@ -32,6 +32,8 @@ The following options can be provided to set up an af_xdp port in DPDK.
 *   ``iface`` - name of the Kernel interface to attach to (required);
 *   ``start_queue`` - starting netdev queue id (optional, default 0);
 *   ``queue_count`` - total netdev queue number (optional, default 1);
+*   ``shared_umem`` - PMD will attempt to share UMEM with others (optional,
+    default 0);
 
 Prerequisites
 -------------
@@ -45,6 +47,7 @@ This is a Linux-specific PMD, thus the following prerequisites apply:
 *  A Kernel bound interface to attach to;
 *  For need_wakeup feature, it requires kernel version later than v5.3-rc1;
 *  For PMD zero copy, it requires kernel version later than v5.4-rc1;
+*  For shared_umem, it requires kernel version later than v5.TBD;
 
 Set up an af_xdp interface
 -----------------------------
diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index 936d4a7d5f..24f9458712 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -1,5 +1,5 @@
 /* SPDX-License-Identifier: BSD-3-Clause
- * Copyright(c) 2019 Intel Corporation.
+ * Copyright(c) 2020 Intel Corporation.
  */
 #include <unistd.h>
 #include <errno.h>
@@ -15,6 +15,7 @@
 #include <linux/if_link.h>
 #include <linux/ethtool.h>
 #include <linux/sockios.h>
+#include <linux/version.h>
 #include "af_xdp_deps.h"
 #include <bpf/xsk.h>
 
@@ -37,6 +38,11 @@
 #include <rte_mbuf.h>
 #include <rte_malloc.h>
 #include <rte_ring.h>
+#include <rte_spinlock.h>
+
+#if KERNEL_VERSION(5, 7, 0) < LINUX_VERSION_CODE
+#define ETH_AF_XDP_SHARED_UMEM 1
+#endif
 
 #ifndef SOL_XDP
 #define SOL_XDP 283
@@ -67,13 +73,13 @@ RTE_LOG_REGISTER(af_xdp_logtype, pmd.net.af_xdp, NOTICE);
 
 
 struct xsk_umem_info {
-	struct xsk_ring_prod fq;
-	struct xsk_ring_cons cq;
 	struct xsk_umem *umem;
 	struct rte_ring *buf_ring;
 	const struct rte_memzone *mz;
 	struct rte_mempool *mb_pool;
 	void *buffer;
+	uint8_t refcnt;
+	uint32_t max_xsks;
 };
 
 struct rx_stats {
@@ -90,6 +96,9 @@ struct pkt_rx_queue {
 
 	struct rx_stats stats;
 
+	struct xsk_ring_prod fq;
+	struct xsk_ring_cons cq;
+
 	struct pkt_tx_queue *pair;
 	struct pollfd fds[1];
 	int xsk_queue_idx;
@@ -118,6 +127,8 @@ struct pmd_internals {
 	int queue_cnt;
 	int max_queue_cnt;
 	int combined_queue_cnt;
+	bool shared_umem_requested;
+	bool shared_umem_configured;
 
 	struct rte_ether_addr eth_addr;
 
@@ -128,11 +139,13 @@ struct pmd_internals {
 #define ETH_AF_XDP_IFACE_ARG			"iface"
 #define ETH_AF_XDP_START_QUEUE_ARG		"start_queue"
 #define ETH_AF_XDP_QUEUE_COUNT_ARG		"queue_count"
+#define ETH_AF_XDP_SHARED_UMEM_ARG		"shared_umem"
 
 static const char * const valid_arguments[] = {
 	ETH_AF_XDP_IFACE_ARG,
 	ETH_AF_XDP_START_QUEUE_ARG,
 	ETH_AF_XDP_QUEUE_COUNT_ARG,
+	ETH_AF_XDP_SHARED_UMEM_ARG,
 	NULL
 };
 
@@ -143,12 +156,23 @@ static const struct rte_eth_link pmd_link = {
 	.link_autoneg = ETH_LINK_AUTONEG
 };
 
+/* List which tracks PMDs to facilitate sharing UMEMs across them. */
+struct internal_list {
+	TAILQ_ENTRY(internal_list) next;
+	struct rte_eth_dev *eth_dev;
+};
+
+TAILQ_HEAD(internal_list_head, internal_list);
+static struct internal_list_head internal_list =
+	TAILQ_HEAD_INITIALIZER(internal_list);
+
+static pthread_mutex_t internal_list_lock = PTHREAD_MUTEX_INITIALIZER;
+
 #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
 static inline int
 reserve_fill_queue_zc(struct xsk_umem_info *umem, uint16_t reserve_size,
-		      struct rte_mbuf **bufs)
+		      struct rte_mbuf **bufs, struct xsk_ring_prod *fq)
 {
-	struct xsk_ring_prod *fq = &umem->fq;
 	uint32_t idx;
 	uint16_t i;
 
@@ -176,9 +200,9 @@ reserve_fill_queue_zc(struct xsk_umem_info *umem, uint16_t reserve_size,
 #else
 static inline int
 reserve_fill_queue_cp(struct xsk_umem_info *umem, uint16_t reserve_size,
-		      struct rte_mbuf **bufs __rte_unused)
+		      struct rte_mbuf **bufs __rte_unused,
+		      struct xsk_ring_prod *fq)
 {
-	struct xsk_ring_prod *fq = &umem->fq;
 	void *addrs[reserve_size];
 	uint32_t idx;
 	uint16_t i;
@@ -211,12 +235,12 @@ reserve_fill_queue_cp(struct xsk_umem_info *umem, uint16_t reserve_size,
 
 static inline int
 reserve_fill_queue(struct xsk_umem_info *umem, uint16_t reserve_size,
-		   struct rte_mbuf **bufs)
+		   struct rte_mbuf **bufs, struct xsk_ring_prod *fq)
 {
 #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
-	return reserve_fill_queue_zc(umem, reserve_size, bufs);
+	return reserve_fill_queue_zc(umem, reserve_size, bufs, fq);
 #else
-	return reserve_fill_queue_cp(umem, reserve_size, bufs);
+	return reserve_fill_queue_cp(umem, reserve_size, bufs, fq);
 #endif
 }
 
@@ -226,6 +250,7 @@ af_xdp_rx_zc(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 {
 	struct pkt_rx_queue *rxq = queue;
 	struct xsk_ring_cons *rx = &rxq->rx;
+	struct xsk_ring_prod *fq = &rxq->fq;
 	struct xsk_umem_info *umem = rxq->umem;
 	uint32_t idx_rx = 0;
 	unsigned long rx_bytes = 0;
@@ -243,7 +268,7 @@ af_xdp_rx_zc(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 
 	if (rcvd == 0) {
 #if defined(XDP_USE_NEED_WAKEUP)
-		if (xsk_ring_prod__needs_wakeup(&umem->fq))
+		if (xsk_ring_prod__needs_wakeup(fq))
 			(void)poll(rxq->fds, 1, 1000);
 #endif
 
@@ -277,7 +302,7 @@ af_xdp_rx_zc(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 
 	xsk_ring_cons__release(rx, rcvd);
 
-	(void)reserve_fill_queue(umem, rcvd, fq_bufs);
+	(void)reserve_fill_queue(umem, rcvd, fq_bufs, fq);
 
 	/* statistics */
 	rxq->stats.rx_pkts += rcvd;
@@ -297,7 +322,7 @@ af_xdp_rx_cp(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 	struct pkt_rx_queue *rxq = queue;
 	struct xsk_ring_cons *rx = &rxq->rx;
 	struct xsk_umem_info *umem = rxq->umem;
-	struct xsk_ring_prod *fq = &umem->fq;
+	struct xsk_ring_prod *fq = &rxq->fq;
 	uint32_t idx_rx = 0;
 	unsigned long rx_bytes = 0;
 	int rcvd, i;
@@ -318,7 +343,8 @@ af_xdp_rx_cp(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 	}
 
 	if (xsk_prod_nb_free(fq, free_thresh) >= free_thresh)
-		(void)reserve_fill_queue(umem, ETH_AF_XDP_RX_BATCH_SIZE, NULL);
+		(void)reserve_fill_queue(umem, ETH_AF_XDP_RX_BATCH_SIZE,
+					 NULL, fq);
 
 	for (i = 0; i < rcvd; i++) {
 		const struct xdp_desc *desc;
@@ -367,9 +393,8 @@ eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 }
 
 static void
-pull_umem_cq(struct xsk_umem_info *umem, int size)
+pull_umem_cq(struct xsk_umem_info *umem, int size, struct xsk_ring_cons *cq)
 {
-	struct xsk_ring_cons *cq = &umem->cq;
 	size_t i, n;
 	uint32_t idx_cq = 0;
 
@@ -392,11 +417,11 @@ pull_umem_cq(struct xsk_umem_info *umem, int size)
 }
 
 static void
-kick_tx(struct pkt_tx_queue *txq)
+kick_tx(struct pkt_tx_queue *txq, struct xsk_ring_cons *cq)
 {
 	struct xsk_umem_info *umem = txq->umem;
 
-	pull_umem_cq(umem, XSK_RING_CONS__DEFAULT_NUM_DESCS);
+	pull_umem_cq(umem, XSK_RING_CONS__DEFAULT_NUM_DESCS, cq);
 
 #if defined(XDP_USE_NEED_WAKEUP)
 	if (xsk_ring_prod__needs_wakeup(&txq->tx))
@@ -410,7 +435,8 @@ kick_tx(struct pkt_tx_queue *txq)
 			/* pull from completion queue to leave more space */
 			if (errno == EAGAIN)
 				pull_umem_cq(umem,
-					     XSK_RING_CONS__DEFAULT_NUM_DESCS);
+					     XSK_RING_CONS__DEFAULT_NUM_DESCS,
+					     cq);
 		}
 }
 
@@ -427,17 +453,18 @@ af_xdp_tx_zc(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 	uint16_t count = 0;
 	struct xdp_desc *desc;
 	uint64_t addr, offset;
-	uint32_t free_thresh = umem->cq.size >> 1;
+	struct xsk_ring_cons *cq = &txq->pair->cq;
+	uint32_t free_thresh = cq->size >> 1;
 
-	if (xsk_cons_nb_avail(&umem->cq, free_thresh) >= free_thresh)
-		pull_umem_cq(umem, XSK_RING_CONS__DEFAULT_NUM_DESCS);
+	if (xsk_cons_nb_avail(cq, free_thresh) >= free_thresh)
+		pull_umem_cq(umem, XSK_RING_CONS__DEFAULT_NUM_DESCS, cq);
 
 	for (i = 0; i < nb_pkts; i++) {
 		mbuf = bufs[i];
 
 		if (mbuf->pool == umem->mb_pool) {
 			if (!xsk_ring_prod__reserve(&txq->tx, 1, &idx_tx)) {
-				kick_tx(txq);
+				kick_tx(txq, cq);
 				if (!xsk_ring_prod__reserve(&txq->tx, 1,
 							    &idx_tx))
 					goto out;
@@ -462,7 +489,7 @@ af_xdp_tx_zc(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 
 			if (!xsk_ring_prod__reserve(&txq->tx, 1, &idx_tx)) {
 				rte_pktmbuf_free(local_mbuf);
-				kick_tx(txq);
+				kick_tx(txq, cq);
 				goto out;
 			}
 
@@ -486,7 +513,7 @@ af_xdp_tx_zc(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		tx_bytes += mbuf->pkt_len;
 	}
 
-	kick_tx(txq);
+	kick_tx(txq, cq);
 
 out:
 	xsk_ring_prod__submit(&txq->tx, count);
@@ -508,10 +535,11 @@ af_xdp_tx_cp(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 	unsigned long tx_bytes = 0;
 	int i;
 	uint32_t idx_tx;
+	struct xsk_ring_cons *cq = &txq->pair->cq;
 
 	nb_pkts = RTE_MIN(nb_pkts, ETH_AF_XDP_TX_BATCH_SIZE);
 
-	pull_umem_cq(umem, nb_pkts);
+	pull_umem_cq(umem, nb_pkts, cq);
 
 	nb_pkts = rte_ring_dequeue_bulk(umem->buf_ring, addrs,
 					nb_pkts, NULL);
@@ -519,7 +547,7 @@ af_xdp_tx_cp(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		return 0;
 
 	if (xsk_ring_prod__reserve(&txq->tx, nb_pkts, &idx_tx) != nb_pkts) {
-		kick_tx(txq);
+		kick_tx(txq, cq);
 		rte_ring_enqueue_bulk(umem->buf_ring, addrs, nb_pkts, NULL);
 		return 0;
 	}
@@ -542,7 +570,7 @@ af_xdp_tx_cp(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 
 	xsk_ring_prod__submit(&txq->tx, nb_pkts);
 
-	kick_tx(txq);
+	kick_tx(txq, cq);
 
 	txq->stats.tx_pkts += nb_pkts;
 	txq->stats.tx_bytes += tx_bytes;
@@ -576,6 +604,72 @@ eth_dev_stop(struct rte_eth_dev *dev)
 	dev->data->dev_link.link_status = ETH_LINK_DOWN;
 }
 
+/* Find ethdev in list */
+static inline struct internal_list *
+find_internal_resource(struct pmd_internals *port_int)
+{
+	int found = 0;
+	struct internal_list *list;
+	struct pmd_internals *list_int;
+
+	if (port_int == NULL)
+		return NULL;
+
+	pthread_mutex_lock(&internal_list_lock);
+
+	TAILQ_FOREACH(list, &internal_list, next) {
+		list_int = list->eth_dev->data->dev_private;
+		if (list_int == port_int) {
+			found = 1;
+			break;
+		}
+	}
+
+	pthread_mutex_unlock(&internal_list_lock);
+
+	if (!found)
+		return NULL;
+
+	return list;
+}
+
+/* Get a pointer to an existing UMEM which overlays the rxq's mb_pool */
+static inline struct xsk_umem_info *
+get_shared_umem(struct pkt_rx_queue *rxq) {
+	struct internal_list *list;
+	struct pmd_internals *internals;
+	int i = 0;
+	struct rte_mempool *mb_pool = rxq->mb_pool;
+
+	if (mb_pool == NULL)
+		return NULL;
+
+	pthread_mutex_lock(&internal_list_lock);
+
+	TAILQ_FOREACH(list, &internal_list, next) {
+		internals = list->eth_dev->data->dev_private;
+		for (i = 0; i < internals->queue_cnt; i++) {
+			struct pkt_rx_queue *list_rxq =
+						&internals->rx_queues[i];
+			if (rxq == list_rxq)
+				continue;
+			if (mb_pool == internals->rx_queues[i].mb_pool) {
+				if (__atomic_load_n(
+					&internals->rx_queues[i].umem->refcnt,
+							__ATOMIC_ACQUIRE)) {
+					pthread_mutex_unlock(
+							&internal_list_lock);
+					return internals->rx_queues[i].umem;
+				}
+			}
+		}
+	}
+
+	pthread_mutex_unlock(&internal_list_lock);
+
+	return NULL;
+}
+
 static int
 eth_dev_configure(struct rte_eth_dev *dev)
 {
@@ -583,6 +677,29 @@ eth_dev_configure(struct rte_eth_dev *dev)
 	if (dev->data->nb_rx_queues != dev->data->nb_tx_queues)
 		return -EINVAL;
 
+#ifdef ETH_AF_XDP_SHARED_UMEM
+	struct internal_list *list = NULL;
+	const char *name = dev->device->name;
+	struct pmd_internals *internal = dev->data->dev_private;
+
+	if (internal->shared_umem_requested) {
+		/* Ensure PMD is not already inserted into the list */
+		list = find_internal_resource(internal);
+		if (list)
+			return 0;
+
+		list = rte_zmalloc_socket(name, sizeof(*list), 0,
+					dev->device->numa_node);
+		if (list == NULL)
+			return -1;
+
+		list->eth_dev = dev;
+		pthread_mutex_lock(&internal_list_lock);
+		TAILQ_INSERT_TAIL(&internal_list, list, next);
+		pthread_mutex_unlock(&internal_list_lock);
+	}
+#endif
+
 	return 0;
 }
 
@@ -716,8 +833,12 @@ eth_dev_close(struct rte_eth_dev *dev)
 		if (rxq->umem == NULL)
 			break;
 		xsk_socket__delete(rxq->xsk);
-		(void)xsk_umem__delete(rxq->umem->umem);
-		xdp_umem_destroy(rxq->umem);
+
+		if (__atomic_sub_fetch(&rxq->umem->refcnt, 1, __ATOMIC_ACQUIRE)
+				== 0) {
+			(void)xsk_umem__delete(rxq->umem->umem);
+			xdp_umem_destroy(rxq->umem);
+		}
 
 		/* free pkt_tx_queue */
 		rte_free(rxq->pair);
@@ -731,6 +852,21 @@ eth_dev_close(struct rte_eth_dev *dev)
 	dev->data->mac_addrs = NULL;
 
 	remove_xdp_program(internals);
+
+#ifdef ETH_AF_XDP_SHARED_UMEM
+	struct internal_list *list;
+
+	if (internals->shared_umem_requested) {
+		/* Remove ethdev from list used to track and share UMEMs */
+		list = find_internal_resource(internals);
+		if (list) {
+			pthread_mutex_lock(&internal_list_lock);
+			TAILQ_REMOVE(&internal_list, list, next);
+			pthread_mutex_unlock(&internal_list_lock);
+			rte_free(list);
+		}
+	}
+#endif
 }
 
 static void
@@ -755,10 +891,10 @@ static inline uint64_t get_base_addr(struct rte_mempool *mp)
 }
 
 static struct
-xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals __rte_unused,
+xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals,
 				  struct pkt_rx_queue *rxq)
 {
-	struct xsk_umem_info *umem;
+	struct xsk_umem_info *umem = NULL;
 	int ret;
 	struct xsk_umem_config usr_config = {
 		.fill_size = ETH_AF_XDP_DFLT_NUM_DESCS * 2,
@@ -767,33 +903,57 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals __rte_unused,
 	void *base_addr = NULL;
 	struct rte_mempool *mb_pool = rxq->mb_pool;
 
-	usr_config.frame_size = rte_mempool_calc_obj_size(mb_pool->elt_size,
-								mb_pool->flags,
-								NULL);
-	usr_config.frame_headroom = mb_pool->header_size +
-					sizeof(struct rte_mbuf) +
-					rte_pktmbuf_priv_size(mb_pool) +
-					RTE_PKTMBUF_HEADROOM;
-
-	umem = rte_zmalloc_socket("umem", sizeof(*umem), 0, rte_socket_id());
-	if (umem == NULL) {
-		AF_XDP_LOG(ERR, "Failed to allocate umem info");
-		return NULL;
+#ifdef ETH_AF_XDP_SHARED_UMEM
+	if (internals->shared_umem_requested) {
+		umem = get_shared_umem(rxq);
+		if (umem != NULL &&
+			__atomic_load_n(&umem->refcnt, __ATOMIC_ACQUIRE) <
+					umem->max_xsks) {
+			AF_XDP_LOG(INFO, "%s,qid%i sharing UMEM\n",
+					internals->if_name, rxq->xsk_queue_idx);
+			internals->shared_umem_configured = true;
+			__atomic_fetch_add(&umem->refcnt, 1, __ATOMIC_ACQUIRE);
+		}
 	}
+#endif
 
-	umem->mb_pool = mb_pool;
-	base_addr = (void *)get_base_addr(mb_pool);
-
-	ret = xsk_umem__create(&umem->umem, base_addr,
-			       mb_pool->populated_size * usr_config.frame_size,
-			       &umem->fq, &umem->cq,
-			       &usr_config);
+	if (!internals->shared_umem_configured) {
+		usr_config.frame_size =
+			rte_mempool_calc_obj_size(mb_pool->elt_size,
+						  mb_pool->flags, NULL);
+		usr_config.frame_headroom = mb_pool->header_size +
+						sizeof(struct rte_mbuf) +
+						rte_pktmbuf_priv_size(mb_pool) +
+						RTE_PKTMBUF_HEADROOM;
+
+		umem = rte_zmalloc_socket("umem", sizeof(*umem), 0,
+					  rte_socket_id());
+		if (umem == NULL) {
+			AF_XDP_LOG(ERR, "Failed to allocate umem info");
+			return NULL;
+		}
 
-	if (ret) {
-		AF_XDP_LOG(ERR, "Failed to create umem");
-		goto err;
+		umem->mb_pool = mb_pool;
+		base_addr = (void *)get_base_addr(mb_pool);
+		ret = xsk_umem__create(&umem->umem, base_addr,
+				mb_pool->populated_size * usr_config.frame_size,
+				&rxq->fq, &rxq->cq,
+				&usr_config);
+		if (ret) {
+			AF_XDP_LOG(ERR, "Failed to create umem\n");
+			goto err;
+		}
+		umem->buffer = base_addr;
+#ifdef ETH_AF_XDP_SHARED_UMEM
+		if (internals->shared_umem_requested) {
+			umem->max_xsks = mb_pool->populated_size /
+						ETH_AF_XDP_NUM_BUFFERS;
+			AF_XDP_LOG(INFO, "Max xsks for UMEM %s: %u\n",
+						mb_pool->name, umem->max_xsks);
+			__atomic_store_n(&umem->refcnt, 1, __ATOMIC_RELEASE);
+		}
+#endif
 	}
-	umem->buffer = base_addr;
 
 #else
 static struct
@@ -846,11 +1006,11 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals,
 
 	ret = xsk_umem__create(&umem->umem, mz->addr,
 			       ETH_AF_XDP_NUM_BUFFERS * ETH_AF_XDP_FRAME_SIZE,
-			       &umem->fq, &umem->cq,
+			       &rxq->fq, &rxq->cq,
 			       &usr_config);
 
 	if (ret) {
-		AF_XDP_LOG(ERR, "Failed to create umem");
+		AF_XDP_LOG(ERR, "Failed to create umem\n");
 		goto err;
 	}
 	umem->mz = mz;
@@ -888,9 +1048,15 @@ xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
 	cfg.bind_flags |= XDP_USE_NEED_WAKEUP;
 #endif
 
-	ret = xsk_socket__create(&rxq->xsk, internals->if_name,
-			rxq->xsk_queue_idx, rxq->umem->umem, &rxq->rx,
-			&txq->tx, &cfg);
+	if (!internals->shared_umem_configured) {
+		ret = xsk_socket__create(&rxq->xsk, internals->if_name,
+				rxq->xsk_queue_idx, rxq->umem->umem, &rxq->rx,
+				&txq->tx, &cfg);
+	} else {
+		ret = xsk_socket__create_shared(&rxq->xsk, internals->if_name,
+				rxq->xsk_queue_idx, rxq->umem->umem, &rxq->rx,
+				&txq->tx, &rxq->fq, &rxq->cq, &cfg);
+	}
 	if (ret) {
 		AF_XDP_LOG(ERR, "Failed to create xsk socket.\n");
 		goto err;
@@ -902,7 +1068,7 @@ xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
 		goto err;
 	}
 #endif
-	ret = reserve_fill_queue(rxq->umem, reserve_size, fq_bufs);
+	ret = reserve_fill_queue(rxq->umem, reserve_size, fq_bufs, &rxq->fq);
 	if (ret) {
 		xsk_socket__delete(rxq->xsk);
 		AF_XDP_LOG(ERR, "Failed to reserve fill queue.\n");
@@ -912,7 +1078,8 @@ xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
 	return 0;
 
 err:
-	xdp_umem_destroy(rxq->umem);
+	if (__atomic_sub_fetch(&rxq->umem->refcnt, 1, __ATOMIC_ACQUIRE) == 0)
+		xdp_umem_destroy(rxq->umem);
 
 	return ret;
 }
@@ -1142,7 +1309,7 @@ xdp_get_channels_info(const char *if_name, int *max_queues,
 
 static int
 parse_parameters(struct rte_kvargs *kvlist, char *if_name, int *start_queue,
-			int *queue_cnt)
+			int *queue_cnt, int *shared_umem)
 {
 	int ret;
 
@@ -1163,6 +1330,11 @@ parse_parameters(struct rte_kvargs *kvlist, char *if_name, int *start_queue,
 		goto free_kvlist;
 	}
 
+	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_SHARED_UMEM_ARG,
+				&parse_integer_arg, shared_umem);
+	if (ret < 0)
+		goto free_kvlist;
+
 free_kvlist:
 	rte_kvargs_free(kvlist);
 	return ret;
@@ -1200,7 +1372,7 @@ get_iface_info(const char *if_name,
 
 static struct rte_eth_dev *
 init_internals(struct rte_vdev_device *dev, const char *if_name,
-			int start_queue_idx, int queue_cnt)
+			int start_queue_idx, int queue_cnt, int shared_umem)
 {
 	const char *name = rte_vdev_device_name(dev);
 	const unsigned int numa_node = dev->device.numa_node;
@@ -1216,6 +1388,13 @@ init_internals(struct rte_vdev_device *dev, const char *if_name,
 	internals->start_queue_idx = start_queue_idx;
 	internals->queue_cnt = queue_cnt;
 	strlcpy(internals->if_name, if_name, IFNAMSIZ);
+#ifndef ETH_AF_XDP_SHARED_UMEM
+	if (shared_umem) {
+		AF_XDP_LOG(ERR, "Shared UMEM feature not available. Check kernel version\n");
+		goto err_free_internals;
+	}
+#endif
+	internals->shared_umem_requested = shared_umem;
 
 	if (xdp_get_channels_info(if_name, &internals->max_queue_cnt,
 				  &internals->combined_queue_cnt)) {
@@ -1292,6 +1471,7 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
 	char if_name[IFNAMSIZ] = {'\0'};
 	int xsk_start_queue_idx = ETH_AF_XDP_DFLT_START_QUEUE_IDX;
 	int xsk_queue_cnt = ETH_AF_XDP_DFLT_QUEUE_COUNT;
+	int shared_umem = 0;
 	struct rte_eth_dev *eth_dev = NULL;
 	const char *name;
 
@@ -1321,7 +1501,7 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
 		dev->device.numa_node = rte_socket_id();
 
 	if (parse_parameters(kvlist, if_name, &xsk_start_queue_idx,
-			     &xsk_queue_cnt) < 0) {
+			     &xsk_queue_cnt, &shared_umem) < 0) {
 		AF_XDP_LOG(ERR, "Invalid kvargs value\n");
 		return -EINVAL;
 	}
@@ -1332,7 +1512,7 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
 	}
 
 	eth_dev = init_internals(dev, if_name, xsk_start_queue_idx,
-					xsk_queue_cnt);
+					xsk_queue_cnt, shared_umem);
 	if (eth_dev == NULL) {
 		AF_XDP_LOG(ERR, "Failed to init internals\n");
 		return -1;
@@ -1375,4 +1555,5 @@ RTE_PMD_REGISTER_VDEV(net_af_xdp, pmd_af_xdp_drv);
 RTE_PMD_REGISTER_PARAM_STRING(net_af_xdp,
 			      "iface=<string> "
 			      "start_queue=<int> "
-			      "queue_count=<int> ");
+			      "queue_count=<int> "
+			      "shared_umem=<int> ");
-- 
2.17.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [dpdk-dev] [PATCH RFC 0/1] net/af_xdp: shared UMEM support
  2020-08-11  9:50 [dpdk-dev] [PATCH RFC 0/1] net/af_xdp: shared UMEM support Ciara Loftus
  2020-08-11  9:50 ` [dpdk-dev] [PATCH RFC 1/1] " Ciara Loftus
@ 2020-08-25 12:14 ` Ferruh Yigit
  2020-08-26  7:45   ` Loftus, Ciara
  1 sibling, 1 reply; 6+ messages in thread
From: Ferruh Yigit @ 2020-08-25 12:14 UTC (permalink / raw)
  To: Ciara Loftus, dev

On 8/11/2020 10:50 AM, Ciara Loftus wrote:
> This RFC integrates shared UMEM support into the AF_XDP PMD. It is based on the
> WIP kernel series [1] by Magnus Karlsson.
> 
> Detailed information on the shared UMEM feature can be found in the final patch
> in the aforementioned series.
> 
> Support for the kernel feature can eventually be detected in DPDK by querying
> the LINUX_KERNEL_VERSION. As of now the feature is not yet merged upstream, so
> for this RFC it is assumed the user is using a patched version of v5.8.

Hi Ciara,

I don't see this feature in kernel in 5.9.0-rc2, is there any update in the
kernel side?

> 
> Shared UMEM is only available for zero copy mode.
> 
> In order to share UMEM information between PMDs, the ethdevs wishing to share
> must be tracked somehow. The method chosen to do so is similar to methods used
> in the vHost [2] and vDPA drivers, where pointers to the ethdevs are maintained
> in an internal list. Proposals for alternate solutions are welcome.
> 
> Performance data to follow with the v1.
> 
> [1] https://patchwork.ozlabs.org/project/netdev/cover/1595307848-20719-1-git-send-email-magnus.karlsson@intel.com/
> [2] https://git.dpdk.org/dpdk/commit/?id=ee584e9710b9abd60ee9faef664e106dcea10085
> 
> Ciara Loftus (1):
>   net/af_xdp: shared UMEM support
> 
>  doc/guides/nics/af_xdp.rst          |   5 +-
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 315 ++++++++++++++++++++++------
>  2 files changed, 252 insertions(+), 68 deletions(-)
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [dpdk-dev] [PATCH RFC 1/1] net/af_xdp: shared UMEM support
  2020-08-11  9:50 ` [dpdk-dev] [PATCH RFC 1/1] " Ciara Loftus
@ 2020-08-25 14:09   ` Ferruh Yigit
  2020-08-26  8:37     ` Loftus, Ciara
  0 siblings, 1 reply; 6+ messages in thread
From: Ferruh Yigit @ 2020-08-25 14:09 UTC (permalink / raw)
  To: Ciara Loftus, dev

On 8/11/2020 10:50 AM, Ciara Loftus wrote:
> A future kernel will introduce the ability to efficiently share a UMEM
> between AF_XDP sockets bound to different queue ids on the same or
> different devices. This patch integrates that functionality into the AF_XDP
> PMD.
> 
> A PMD will attempt to share a UMEM with others if the shared_umem=1 vdev
> arg is set. UMEMs can only be shared across PMDs with the same mempool, up
> to a limited number of PMDs goverened by the size of the given mempool.
> 
> The benefit of sharing UMEM across PMDs is a saving in memory due to not
> having to register the UMEM multiple times.
> 
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>

<...>

> @@ -1,5 +1,5 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
> - * Copyright(c) 2019 Intel Corporation.
> + * Copyright(c) 2020 Intel Corporation.
>   */
>  #include <unistd.h>
>  #include <errno.h>
> @@ -15,6 +15,7 @@
>  #include <linux/if_link.h>
>  #include <linux/ethtool.h>
>  #include <linux/sockios.h>
> +#include <linux/version.h>
>  #include "af_xdp_deps.h"
>  #include <bpf/xsk.h>
>  
> @@ -37,6 +38,11 @@
>  #include <rte_mbuf.h>
>  #include <rte_malloc.h>
>  #include <rte_ring.h>
> +#include <rte_spinlock.h>
> +
> +#if KERNEL_VERSION(5, 7, 0) < LINUX_VERSION_CODE
> +#define ETH_AF_XDP_SHARED_UMEM 1
> +#endif

I think better to separate these version checks from the actual code, what do
you think creating a compat.h under 'net/af_xdp' and move above logic there?

<...>

> @@ -888,9 +1048,15 @@ xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
>  	cfg.bind_flags |= XDP_USE_NEED_WAKEUP;
>  #endif
>  
> -	ret = xsk_socket__create(&rxq->xsk, internals->if_name,
> -			rxq->xsk_queue_idx, rxq->umem->umem, &rxq->rx,
> -			&txq->tx, &cfg);
> +	if (!internals->shared_umem_configured) {
> +		ret = xsk_socket__create(&rxq->xsk, internals->if_name,
> +				rxq->xsk_queue_idx, rxq->umem->umem, &rxq->rx,
> +				&txq->tx, &cfg);
> +	} else {
> +		ret = xsk_socket__create_shared(&rxq->xsk, internals->if_name,
> +				rxq->xsk_queue_idx, rxq->umem->umem, &rxq->rx,
> +				&txq->tx, &rxq->fq, &rxq->cq, &cfg);
> +	}

Is the above dependency (ETH_AF_XDP_SHARED_UMEM) for the kernel 'af_xdp' code,
or for 'libbpf.so'?

The 'xsk_socket__create_shared()' API is not available in the latest
'libbpf.so', I wonder if the kernel version check is to align with the correct
'libbpf.so' version.
If not how the dependent version of the 'libbpf.so' managed for DPDK?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [dpdk-dev] [PATCH RFC 0/1] net/af_xdp: shared UMEM support
  2020-08-25 12:14 ` [dpdk-dev] [PATCH RFC 0/1] " Ferruh Yigit
@ 2020-08-26  7:45   ` Loftus, Ciara
  0 siblings, 0 replies; 6+ messages in thread
From: Loftus, Ciara @ 2020-08-26  7:45 UTC (permalink / raw)
  To: Yigit, Ferruh; +Cc: dev

> 
> On 8/11/2020 10:50 AM, Ciara Loftus wrote:
> > This RFC integrates shared UMEM support into the AF_XDP PMD. It is
> based on the
> > WIP kernel series [1] by Magnus Karlsson.
> >
> > Detailed information on the shared UMEM feature can be found in the final
> patch
> > in the aforementioned series.
> >
> > Support for the kernel feature can eventually be detected in DPDK by
> querying
> > the LINUX_KERNEL_VERSION. As of now the feature is not yet merged
> upstream, so
> > for this RFC it is assumed the user is using a patched version of v5.8.
> 
> Hi Ciara,
> 
> I don't see this feature in kernel in 5.9.0-rc2, is there any update in the
> kernel side?

Hi Ferruh,

Some performance quirks have been identified with the latest version of the kernel patch.
Once those are resolved we should expect a new revision, hopefully within the 5.9 window.

Thanks,
Ciara

> 
> >
> > Shared UMEM is only available for zero copy mode.
> >
> > In order to share UMEM information between PMDs, the ethdevs wishing
> to share
> > must be tracked somehow. The method chosen to do so is similar to
> methods used
> > in the vHost [2] and vDPA drivers, where pointers to the ethdevs are
> maintained
> > in an internal list. Proposals for alternate solutions are welcome.
> >
> > Performance data to follow with the v1.
> >
> > [1] https://patchwork.ozlabs.org/project/netdev/cover/1595307848-
> 20719-1-git-send-email-magnus.karlsson@intel.com/
> > [2]
> https://git.dpdk.org/dpdk/commit/?id=ee584e9710b9abd60ee9faef664e106
> dcea10085
> >
> > Ciara Loftus (1):
> >   net/af_xdp: shared UMEM support
> >
> >  doc/guides/nics/af_xdp.rst          |   5 +-
> >  drivers/net/af_xdp/rte_eth_af_xdp.c | 315 ++++++++++++++++++++++--
> ----
> >  2 files changed, 252 insertions(+), 68 deletions(-)
> >


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [dpdk-dev] [PATCH RFC 1/1] net/af_xdp: shared UMEM support
  2020-08-25 14:09   ` Ferruh Yigit
@ 2020-08-26  8:37     ` Loftus, Ciara
  0 siblings, 0 replies; 6+ messages in thread
From: Loftus, Ciara @ 2020-08-26  8:37 UTC (permalink / raw)
  To: Yigit, Ferruh; +Cc: dev


> 
> On 8/11/2020 10:50 AM, Ciara Loftus wrote:
> > A future kernel will introduce the ability to efficiently share a UMEM
> > between AF_XDP sockets bound to different queue ids on the same or
> > different devices. This patch integrates that functionality into the AF_XDP
> > PMD.
> >
> > A PMD will attempt to share a UMEM with others if the shared_umem=1
> vdev
> > arg is set. UMEMs can only be shared across PMDs with the same
> mempool, up
> > to a limited number of PMDs goverened by the size of the given mempool.
> >
> > The benefit of sharing UMEM across PMDs is a saving in memory due to not
> > having to register the UMEM multiple times.
> >
> > Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> 
> <...>
> 
> > @@ -1,5 +1,5 @@
> >  /* SPDX-License-Identifier: BSD-3-Clause
> > - * Copyright(c) 2019 Intel Corporation.
> > + * Copyright(c) 2020 Intel Corporation.
> >   */
> >  #include <unistd.h>
> >  #include <errno.h>
> > @@ -15,6 +15,7 @@
> >  #include <linux/if_link.h>
> >  #include <linux/ethtool.h>
> >  #include <linux/sockios.h>
> > +#include <linux/version.h>
> >  #include "af_xdp_deps.h"
> >  #include <bpf/xsk.h>
> >
> > @@ -37,6 +38,11 @@
> >  #include <rte_mbuf.h>
> >  #include <rte_malloc.h>
> >  #include <rte_ring.h>
> > +#include <rte_spinlock.h>
> > +
> > +#if KERNEL_VERSION(5, 7, 0) < LINUX_VERSION_CODE
> > +#define ETH_AF_XDP_SHARED_UMEM 1
> > +#endif
> 
> I think better to separate these version checks from the actual code, what do
> you think creating a compat.h under 'net/af_xdp' and move above logic
> there?

Good suggestion. Much cleaner. I'll implement this.

> 
> <...>
> 
> > @@ -888,9 +1048,15 @@ xsk_configure(struct pmd_internals *internals,
> struct pkt_rx_queue *rxq,
> >  	cfg.bind_flags |= XDP_USE_NEED_WAKEUP;
> >  #endif
> >
> > -	ret = xsk_socket__create(&rxq->xsk, internals->if_name,
> > -			rxq->xsk_queue_idx, rxq->umem->umem, &rxq->rx,
> > -			&txq->tx, &cfg);
> > +	if (!internals->shared_umem_configured) {
> > +		ret = xsk_socket__create(&rxq->xsk, internals->if_name,
> > +				rxq->xsk_queue_idx, rxq->umem->umem,
> &rxq->rx,
> > +				&txq->tx, &cfg);
> > +	} else {
> > +		ret = xsk_socket__create_shared(&rxq->xsk, internals-
> >if_name,
> > +				rxq->xsk_queue_idx, rxq->umem->umem,
> &rxq->rx,
> > +				&txq->tx, &rxq->fq, &rxq->cq, &cfg);
> > +	}
> 
> Is the above dependency (ETH_AF_XDP_SHARED_UMEM) for the kernel
> 'af_xdp' code,
> or for 'libbpf.so'?
> 
> The 'xsk_socket__create_shared()' API is not available in the latest
> 'libbpf.so', I wonder if the kernel version check is to align with the correct
> 'libbpf.so' version.
> If not how the dependent version of the 'libbpf.so' managed for DPDK?

Good point.
For the RFC I'm assuming the user is using libbpf packaged with the kernel both with appropriate support.
In the next RFC/v1 I'll introduce a check for both the correct libbpf version and underlying kernel support, as both are required.

Thanks,
Ciara


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-08-26  8:37 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-11  9:50 [dpdk-dev] [PATCH RFC 0/1] net/af_xdp: shared UMEM support Ciara Loftus
2020-08-11  9:50 ` [dpdk-dev] [PATCH RFC 1/1] " Ciara Loftus
2020-08-25 14:09   ` Ferruh Yigit
2020-08-26  8:37     ` Loftus, Ciara
2020-08-25 12:14 ` [dpdk-dev] [PATCH RFC 0/1] " Ferruh Yigit
2020-08-26  7:45   ` Loftus, Ciara

DPDK patches and discussions

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://inbox.dpdk.org/dev/0 dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dev dev/ https://inbox.dpdk.org/dev \
		dev@dpdk.org
	public-inbox-index dev

Example config snippet for mirrors.
Newsgroup available over NNTP:
	nntp://inbox.dpdk.org/inbox.dpdk.dev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git