From: Kevin Traynor <ktraynor@redhat.com>
To: Yongseok Koh <yskoh@mellanox.com>
Cc: dpdk stable <stable@dpdk.org>
Subject: [dpdk-stable] patch 'net/mlx4: support externally allocated static memory' has been queued to stable release 18.08.1
Date: Wed, 21 Nov 2018 16:48:03 +0000
Message-ID: <20181121164828.32249-49-ktraynor@redhat.com>
In-Reply-To: <20181121164828.32249-1-ktraynor@redhat.com>
Hi,
FYI, your patch has been queued to stable release 18.08.1
Note it hasn't been pushed to http://dpdk.org/browse/dpdk-stable yet.
It will be pushed if I get no objections before 11/27/18. So please
shout if anyone has objections.
Also note that after the patch there's a diff of the upstream commit vs the patch applied
to the branch. If the code is different (i.e. not only metadata diffs), for example due to
a change in context or macro names, please double-check it.
Thanks.
Kevin Traynor
---
From 95f04411e1af6649ff40d7bbd3ee37a5a34832a6 Mon Sep 17 00:00:00 2001
From: Yongseok Koh <yskoh@mellanox.com>
Date: Mon, 24 Sep 2018 18:36:45 +0000
Subject: [PATCH] net/mlx4: support externally allocated static memory
[ upstream commit 31912d9924039c3a4f58e1bb00f380e5b4c7bd81 ]
When MLX PMD registers memory for DMA, it accesses the global memseg list
of DPDK to maximize the range of registration so that LKey search can be
more efficient. Granularity of MR registration is per page.
Externally allocated memory shouldn't be used for DMA because it can't be
searched in the memseg list and free event can't be tracked by DPDK. If it
is used, the following error will occur:
net_mlx5: port 0 unable to find virtually contiguous chunk for
address (0x5600017587c0). rte_memseg_contig_walk() failed.
There's a pending patchset [1] which enables externally allocated memory.
Once it is merged, users can register their own memory out of EAL then that
will resolve this issue.
Meanwhile, if the external memory is static (allocated on startup and never
freed), such memory can also be registered by little tweak in the code.
[1] http://patches.dpdk.org/project/dpdk/list/?series=1415
This patch is not a bug fix but needs to be included in stable versions.
Fixes: 9797bfcce1c9 ("net/mlx4: add new memory region support")
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
---
drivers/net/mlx4/mlx4_mr.c | 149 +++++++++++++++++++++++++++++++++++
drivers/net/mlx4/mlx4_rxtx.h | 35 +++++++-
2 files changed, 183 insertions(+), 1 deletion(-)
diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c
index d23d3c613..bee858643 100644
--- a/drivers/net/mlx4/mlx4_mr.c
+++ b/drivers/net/mlx4/mlx4_mr.c
@@ -290,4 +290,21 @@ mr_find_next_chunk(struct mlx4_mr *mr, struct mlx4_mr_cache *entry,
uint32_t idx = 0;
+ /* MR for external memory doesn't have memseg list. */
+ if (mr->msl == NULL) {
+ struct ibv_mr *ibv_mr = mr->ibv_mr;
+
+ assert(mr->ms_bmp_n == 1);
+ assert(mr->ms_n == 1);
+ assert(base_idx == 0);
+ /*
+ * Can't search it from memseg list but get it directly from
+ * verbs MR as there's only one chunk.
+ */
+ entry->start = (uintptr_t)ibv_mr->addr;
+ entry->end = (uintptr_t)ibv_mr->addr + mr->ibv_mr->length;
+ entry->lkey = rte_cpu_to_be_32(mr->ibv_mr->lkey);
+ /* Returning 1 ends iteration. */
+ return 1;
+ }
for (idx = base_idx; idx < mr->ms_bmp_n; ++idx) {
if (rte_bitmap_get(mr->ms_bmp, idx)) {
@@ -810,4 +827,5 @@ mlx4_mr_mem_event_free_cb(struct rte_eth_dev *dev, const void *addr, size_t len)
if (mr == NULL)
continue;
+ assert(mr->msl); /* Can't be external memory. */
ms = rte_mem_virt2memseg((void *)start, msl);
assert(ms != NULL);
@@ -1056,4 +1074,131 @@ mlx4_mr_flush_local_cache(struct mlx4_mr_ctrl *mr_ctrl)
}
+/**
+ * Called during rte_mempool_mem_iter() by mlx4_mr_update_ext_mp().
+ *
+ * Externally allocated chunk is registered and a MR is created for the chunk.
+ * The MR object is added to the global list. If memseg list of a MR object
+ * (mr->msl) is null, the MR object can be regarded as externally allocated
+ * memory.
+ *
+ * Once external memory is registered, it should be static. If the memory is
+ * freed and the virtual address range has different physical memory mapped
+ * again, it may cause crash on device due to the wrong translation entry. PMD
+ * can't track the free event of the external memory for now.
+ */
+static void
+mlx4_mr_update_ext_mp_cb(struct rte_mempool *mp, void *opaque,
+ struct rte_mempool_memhdr *memhdr,
+ unsigned mem_idx __rte_unused)
+{
+ struct mr_update_mp_data *data = opaque;
+ struct rte_eth_dev *dev = data->dev;
+ struct priv *priv = dev->data->dev_private;
+ struct mlx4_mr_ctrl *mr_ctrl = data->mr_ctrl;
+ struct mlx4_mr *mr = NULL;
+ uintptr_t addr = (uintptr_t)memhdr->addr;
+ size_t len = memhdr->len;
+ struct mlx4_mr_cache entry;
+ uint32_t lkey;
+
+ /* If already registered, it should return. */
+ rte_rwlock_read_lock(&priv->mr.rwlock);
+ lkey = mr_lookup_dev(dev, &entry, addr);
+ rte_rwlock_read_unlock(&priv->mr.rwlock);
+ if (lkey != UINT32_MAX)
+ return;
+ mr = rte_zmalloc_socket(NULL,
+ RTE_ALIGN_CEIL(sizeof(*mr),
+ RTE_CACHE_LINE_SIZE),
+ RTE_CACHE_LINE_SIZE, mp->socket_id);
+ if (mr == NULL) {
+ WARN("port %u unable to allocate memory for a new MR of"
+ " mempool (%s).",
+ dev->data->port_id, mp->name);
+ data->ret = -1;
+ return;
+ }
+ DEBUG("port %u register MR for chunk #%d of mempool (%s)",
+ dev->data->port_id, mem_idx, mp->name);
+ mr->ibv_mr = mlx4_glue->reg_mr(priv->pd, (void *)addr, len,
+ IBV_ACCESS_LOCAL_WRITE);
+ if (mr->ibv_mr == NULL) {
+ WARN("port %u fail to create a verbs MR for address (%p)",
+ dev->data->port_id, (void *)addr);
+ rte_free(mr);
+ data->ret = -1;
+ return;
+ }
+ mr->msl = NULL; /* Mark it is external memory. */
+ mr->ms_bmp = NULL;
+ mr->ms_n = 1;
+ mr->ms_bmp_n = 1;
+ rte_rwlock_write_lock(&priv->mr.rwlock);
+ LIST_INSERT_HEAD(&priv->mr.mr_list, mr, mr);
+ DEBUG("port %u MR CREATED (%p) for external memory %p:\n"
+ " [0x%" PRIxPTR ", 0x%" PRIxPTR "),"
+ " lkey=0x%x base_idx=%u ms_n=%u, ms_bmp_n=%u",
+ dev->data->port_id, (void *)mr, (void *)addr,
+ addr, addr + len, rte_cpu_to_be_32(mr->ibv_mr->lkey),
+ mr->ms_base_idx, mr->ms_n, mr->ms_bmp_n);
+ /* Insert to the global cache table. */
+ mr_insert_dev_cache(dev, mr);
+ rte_rwlock_write_unlock(&priv->mr.rwlock);
+ /* Insert to the local cache table */
+ mlx4_mr_addr2mr_bh(dev, mr_ctrl, addr);
+}
+
+/**
+ * Register MR for entire memory chunks in a Mempool having externally allocated
+ * memory and fill in local cache.
+ *
+ * @param dev
+ * Pointer to Ethernet device.
+ * @param mr_ctrl
+ * Pointer to per-queue MR control structure.
+ * @param mp
+ * Pointer to registering Mempool.
+ *
+ * @return
+ * 0 on success, -1 on failure.
+ */
+static uint32_t
+mlx4_mr_update_ext_mp(struct rte_eth_dev *dev, struct mlx4_mr_ctrl *mr_ctrl,
+ struct rte_mempool *mp)
+{
+ struct mr_update_mp_data data = {
+ .dev = dev,
+ .mr_ctrl = mr_ctrl,
+ .ret = 0,
+ };
+
+ rte_mempool_mem_iter(mp, mlx4_mr_update_ext_mp_cb, &data);
+ return data.ret;
+}
+
+/**
+ * Register MR entire memory chunks in a Mempool having externally allocated
+ * memory and search LKey of the address to return.
+ *
+ * @param dev
+ * Pointer to Ethernet device.
+ * @param addr
+ * Search key.
+ * @param mp
+ * Pointer to registering Mempool where addr belongs.
+ *
+ * @return
+ * LKey for address on success, UINT32_MAX on failure.
+ */
+uint32_t
+mlx4_tx_update_ext_mp(struct txq *txq, uintptr_t addr, struct rte_mempool *mp)
+{
+ struct mlx4_mr_ctrl *mr_ctrl = &txq->mr_ctrl;
+ struct priv *priv = txq->priv;
+
+ mlx4_mr_update_ext_mp(priv->dev, mr_ctrl, mp);
+ return mlx4_tx_addr2mr_bh(txq, addr);
+}
+
/* Called during rte_mempool_mem_iter() by mlx4_mr_update_mp(). */
static void
@@ -1099,4 +1244,8 @@ mlx4_mr_update_mp(struct rte_eth_dev *dev, struct mlx4_mr_ctrl *mr_ctrl,
rte_mempool_mem_iter(mp, mlx4_mr_update_mp_cb, &data);
+ if (data.ret < 0 && rte_errno == ENXIO) {
+ /* Mempool may have externally allocated memory. */
+ return mlx4_mr_update_ext_mp(dev, mr_ctrl, mp);
+ }
return data.ret;
}
diff --git a/drivers/net/mlx4/mlx4_rxtx.h b/drivers/net/mlx4/mlx4_rxtx.h
index ffa8abfca..1be060cda 100644
--- a/drivers/net/mlx4/mlx4_rxtx.h
+++ b/drivers/net/mlx4/mlx4_rxtx.h
@@ -164,4 +164,24 @@ void mlx4_mr_flush_local_cache(struct mlx4_mr_ctrl *mr_ctrl);
uint32_t mlx4_rx_addr2mr_bh(struct rxq *rxq, uintptr_t addr);
uint32_t mlx4_tx_addr2mr_bh(struct txq *txq, uintptr_t addr);
+uint32_t mlx4_tx_update_ext_mp(struct txq *txq, uintptr_t addr,
+ struct rte_mempool *mp);
+
+/**
+ * Get Memory Pool (MP) from mbuf. If mbuf is indirect, the pool from which the
+ * cloned mbuf is allocated is returned instead.
+ *
+ * @param buf
+ * Pointer to mbuf.
+ *
+ * @return
+ * Memory pool where data is located for given mbuf.
+ */
+static struct rte_mempool *
+mlx4_mb2mp(struct rte_mbuf *buf)
+{
+ if (unlikely(RTE_MBUF_INDIRECT(buf)))
+ return rte_mbuf_from_indirect(buf)->pool;
+ return buf->pool;
+}
/**
@@ -223,5 +243,18 @@ mlx4_tx_addr2mr(struct txq *txq, uintptr_t addr)
}
-#define mlx4_tx_mb2mr(rxq, mb) mlx4_tx_addr2mr(rxq, (uintptr_t)((mb)->buf_addr))
+static __rte_always_inline uint32_t
+mlx4_tx_mb2mr(struct txq *txq, struct rte_mbuf *mb)
+{
+ uintptr_t addr = (uintptr_t)mb->buf_addr;
+ uint32_t lkey = mlx4_tx_addr2mr(txq, addr);
+
+ if (likely(lkey != UINT32_MAX))
+ return lkey;
+ if (rte_errno == ENXIO) {
+ /* Mempool may have externally allocated memory. */
+ lkey = mlx4_tx_update_ext_mp(txq, addr, mlx4_mb2mp(mb));
+ }
+ return lkey;
+}
#endif /* MLX4_RXTX_H_ */
--
2.19.0
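For readers following the Tx-path change, the lookup-then-register fallback in mlx4_tx_mb2mr() can be illustrated with a minimal standalone sketch. This is not the driver code: struct mr_table, mr_lookup(), mr_register_ext() and mr_lookup_or_register() are hypothetical stand-ins, the table is a fixed-size array rather than the PMD's cache, lkeys are made up, and a plain miss sets errno to ENXIO (in the patch that condition is signalled through rte_errno). It only shows the idea that a static external chunk is recorded once as a single [start, end) entry and that later lookups hit it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define LKEY_INVALID UINT32_MAX
#define MAX_MR 8

/* Hypothetical stand-in for an MR cache entry: [start, end) -> lkey. */
struct mr_entry {
	uintptr_t start;
	uintptr_t end;
	uint32_t lkey;
};

/* Hypothetical fixed-size table replacing the PMD's MR list and caches. */
struct mr_table {
	struct mr_entry e[MAX_MR];
	unsigned int n;
};

/* Return the lkey covering addr, or LKEY_INVALID with errno = ENXIO. */
static uint32_t
mr_lookup(const struct mr_table *t, uintptr_t addr)
{
	for (unsigned int i = 0; i < t->n; i++)
		if (addr >= t->e[i].start && addr < t->e[i].end)
			return t->e[i].lkey;
	errno = ENXIO;
	return LKEY_INVALID;
}

/* Record one static external chunk as a single entry (assumed never freed). */
static int
mr_register_ext(struct mr_table *t, uintptr_t addr, size_t len)
{
	assert(len > 0);
	if (mr_lookup(t, addr) != LKEY_INVALID)
		return 0; /* Already registered, nothing to do. */
	if (t->n == MAX_MR)
		return -1;
	t->e[t->n].start = addr;
	t->e[t->n].end = addr + len;
	t->e[t->n].lkey = 0x100 + t->n; /* Made-up key for illustration. */
	t->n++;
	return 0;
}

/* Fast path first; on an ENXIO miss, register the chunk and retry once. */
static uint32_t
mr_lookup_or_register(struct mr_table *t, uintptr_t addr,
		      uintptr_t chunk, size_t chunk_len)
{
	uint32_t lkey = mr_lookup(t, addr);

	if (lkey != LKEY_INVALID)
		return lkey;
	if (errno == ENXIO && mr_register_ext(t, chunk, chunk_len) == 0)
		lkey = mr_lookup(t, addr);
	return lkey;
}
```

With a zero-initialised table, the first call to mr_lookup_or_register() for an address inside a chunk registers that chunk, and later calls for any address in the same chunk return the same key without adding another entry.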
---
Diff of the applied patch vs upstream commit (please double-check if non-empty):
---
--- - 2018-11-21 16:44:32.279722373 +0000
+++ 0049-net-mlx4-support-externally-allocated-static-memory.patch 2018-11-21 16:44:30.000000000 +0000
@@ -1,8 +1,10 @@
-From 31912d9924039c3a4f58e1bb00f380e5b4c7bd81 Mon Sep 17 00:00:00 2001
+From 95f04411e1af6649ff40d7bbd3ee37a5a34832a6 Mon Sep 17 00:00:00 2001
From: Yongseok Koh <yskoh@mellanox.com>
Date: Mon, 24 Sep 2018 18:36:45 +0000
Subject: [PATCH] net/mlx4: support externally allocated static memory
+[ upstream commit 31912d9924039c3a4f58e1bb00f380e5b4c7bd81 ]
+
When MLX PMD registers memory for DMA, it accesses the global memseg list
of DPDK to maximize the range of registration so that LKey search can be
more efficient. Granularity of MR registration is per page.
@@ -26,7 +28,6 @@
This patch is not a bug fix but needs to be included in stable versions.
Fixes: 9797bfcce1c9 ("net/mlx4: add new memory region support")
-Cc: stable@dpdk.org
Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
---