* [dpdk-dev] [PATCH v2 01/23] mlx4: fix possible crash on scattered mbuf allocation failure
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 00/23] mlx4: MOFED 3.0 support, bugfixes and enhancements Adrien Mazarguil
@ 2015-06-30 9:27 ` Adrien Mazarguil
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 02/23] mlx4: add MOFED 3.0 compatibility to interfaces names retrieval Adrien Mazarguil
` (22 subsequent siblings)
23 siblings, 0 replies; 42+ messages in thread
From: Adrien Mazarguil @ 2015-06-30 9:27 UTC (permalink / raw)
To: dev
When a segment cannot be allocated, mlx4_rx_burst_sp() may call
rte_pktmbuf_free() on an incomplete scattered mbuf whose next pointer in the
last segment is not set, which can crash when the chain is walked during the
free operation.
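For clarity, here is a minimal sketch (not part of the patch) of the cleanup it adds; the helper name is hypothetical:

#include <rte_mbuf.h>

/* Hypothetical helper mirroring the fix: pkt_buf_next points at the next
 * field of the last linked segment, which has not been written yet. */
static void
free_partial_chain(struct rte_mbuf *pkt_buf, struct rte_mbuf **pkt_buf_next)
{
        if (pkt_buf == NULL)
                return;
        /* Terminate the chain before freeing it, otherwise
         * rte_pktmbuf_free() follows an unset next pointer. */
        *pkt_buf_next = NULL;
        rte_pktmbuf_free(pkt_buf);
}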
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
drivers/net/mlx4/mlx4.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 5391b7a..d1166b2 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -2370,8 +2370,10 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
DEBUG("rxq=%p, wr_id=%" PRIu64 ":"
" can't allocate a new mbuf",
(void *)rxq, wr_id);
- if (pkt_buf != NULL)
+ if (pkt_buf != NULL) {
+ *pkt_buf_next = NULL;
rte_pktmbuf_free(pkt_buf);
+ }
/* Increase out of memory counters. */
++rxq->stats.rx_nombuf;
++rxq->priv->dev->data->rx_mbuf_alloc_failed;
--
2.1.0
* [dpdk-dev] [PATCH v2 02/23] mlx4: add MOFED 3.0 compatibility to interfaces names retrieval
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 00/23] mlx4: MOFED 3.0 support, bugfixes and enhancements Adrien Mazarguil
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 01/23] mlx4: fix possible crash on scattered mbuf allocation failure Adrien Mazarguil
@ 2015-06-30 9:27 ` Adrien Mazarguil
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 03/23] mlx4: make sure experimental device query function is implemented Adrien Mazarguil
` (21 subsequent siblings)
23 siblings, 0 replies; 42+ messages in thread
From: Adrien Mazarguil @ 2015-06-30 9:27 UTC (permalink / raw)
To: dev
Since Mellanox OFED 3.0 and Linux 3.15, interface port numbers are stored
in the dev_port sysfs file instead of dev_id.
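For reference, a rough standalone sketch of the lookup this patch implements (error handling trimmed; the patch additionally falls back to dev_id when dev_port reports the same value for every port, which is omitted here):

#include <stdio.h>

/* Sketch only: read the port number of a netdev entry, preferring dev_port
 * (MOFED 3.0 / Linux >= 3.15) and falling back to dev_id on older kernels. */
static int
read_port_number(const char *ibdev_path, const char *name, unsigned int *port)
{
        char path[512];
        FILE *file;
        int r;

        snprintf(path, sizeof(path), "%s/device/net/%s/dev_port",
                 ibdev_path, name);
        file = fopen(path, "rb");
        if (file != NULL) {
                r = fscanf(file, "%u", port);
        } else {
                /* dev_port is missing: assume Linux < 3.15 and use dev_id. */
                snprintf(path, sizeof(path), "%s/device/net/%s/dev_id",
                         ibdev_path, name);
                file = fopen(path, "rb");
                if (file == NULL)
                        return -1;
                r = fscanf(file, "%x", port);
        }
        fclose(file);
        return (r == 1) ? 0 : -1;
}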
Signed-off-by: Or Ami <ora@mellanox.com>
Signed-off-by: Nitzan Weller <nitzanwe@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
drivers/net/mlx4/mlx4.c | 51 +++++++++++++++++++++++++++++++++++++------------
1 file changed, 39 insertions(+), 12 deletions(-)
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index d1166b2..ad37e01 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -337,9 +337,11 @@ priv_unlock(struct priv *priv)
static int
priv_get_ifname(const struct priv *priv, char (*ifname)[IF_NAMESIZE])
{
- int ret = -1;
DIR *dir;
struct dirent *dent;
+ unsigned int dev_type = 0;
+ unsigned int dev_port_prev = ~0u;
+ char match[IF_NAMESIZE] = "";
{
MKSTR(path, "%s/device/net", priv->ctx->device->ibdev_path);
@@ -351,7 +353,7 @@ priv_get_ifname(const struct priv *priv, char (*ifname)[IF_NAMESIZE])
while ((dent = readdir(dir)) != NULL) {
char *name = dent->d_name;
FILE *file;
- unsigned int dev_id;
+ unsigned int dev_port;
int r;
if ((name[0] == '.') &&
@@ -359,22 +361,47 @@ priv_get_ifname(const struct priv *priv, char (*ifname)[IF_NAMESIZE])
((name[1] == '.') && (name[2] == '\0'))))
continue;
- MKSTR(path, "%s/device/net/%s/dev_id",
- priv->ctx->device->ibdev_path, name);
+ MKSTR(path, "%s/device/net/%s/%s",
+ priv->ctx->device->ibdev_path, name,
+ (dev_type ? "dev_id" : "dev_port"));
file = fopen(path, "rb");
- if (file == NULL)
+ if (file == NULL) {
+ if (errno != ENOENT)
+ continue;
+ /*
+ * Switch to dev_id when dev_port does not exist as
+ * is the case with Linux kernel versions < 3.15.
+ */
+try_dev_id:
+ match[0] = '\0';
+ if (dev_type)
+ break;
+ dev_type = 1;
+ dev_port_prev = ~0u;
+ rewinddir(dir);
continue;
- r = fscanf(file, "%x", &dev_id);
- fclose(file);
- if ((r == 1) && (dev_id == (priv->port - 1u))) {
- snprintf(*ifname, sizeof(*ifname), "%s", name);
- ret = 0;
- break;
}
+ r = fscanf(file, (dev_type ? "%x" : "%u"), &dev_port);
+ fclose(file);
+ if (r != 1)
+ continue;
+ /*
+ * Switch to dev_id when dev_port returns the same value for
+ * all ports. May happen when using a MOFED release older than
+ * 3.0 with a Linux kernel >= 3.15.
+ */
+ if (dev_port == dev_port_prev)
+ goto try_dev_id;
+ dev_port_prev = dev_port;
+ if (dev_port == (priv->port - 1u))
+ snprintf(match, sizeof(match), "%s", name);
}
closedir(dir);
- return ret;
+ if (match[0] == '\0')
+ return -1;
+ strncpy(*ifname, match, sizeof(*ifname));
+ return 0;
}
/**
--
2.1.0
* [dpdk-dev] [PATCH v2 03/23] mlx4: make sure experimental device query function is implemented
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 00/23] mlx4: MOFED 3.0 support, bugfixes and enhancements Adrien Mazarguil
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 01/23] mlx4: fix possible crash on scattered mbuf allocation failure Adrien Mazarguil
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 02/23] mlx4: add MOFED 3.0 compatibility to interfaces names retrieval Adrien Mazarguil
@ 2015-06-30 9:27 ` Adrien Mazarguil
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 04/23] mlx4: avoid looking up WR ID to improve RX performance Adrien Mazarguil
` (20 subsequent siblings)
23 siblings, 0 replies; 42+ messages in thread
From: Adrien Mazarguil @ 2015-06-30 9:27 UTC (permalink / raw)
To: dev
From: Olga Shern <olgas@mellanox.com>
The HAVE_EXP_QUERY_DEVICE macro indicates whether ibv_exp_query_device() is
available. The RSS and inline receive features depend on it.
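In other words, the Makefile check below probes infiniband/verbs.h for the ibv_exp_device_attr type and, when found, emits the macro into the generated mlx4_autoconf.h, which the driver then tests (hedged sketch, assuming the check emits a plain #define):

/* mlx4_autoconf.h (generated): present only when the type is available. */
#define HAVE_EXP_QUERY_DEVICE 1

/* mlx4.c: code paths needing ibv_exp_query_device() are guarded. */
#ifdef HAVE_EXP_QUERY_DEVICE
        struct ibv_exp_device_attr exp_device_attr;

        if (ibv_exp_query_device(ctx, &exp_device_attr))
                goto port_error;
#endif /* HAVE_EXP_QUERY_DEVICE */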
Signed-off-by: Olga Shern <olgas@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
drivers/net/mlx4/Makefile | 4 ++++
drivers/net/mlx4/mlx4.c | 17 ++++++++++-------
2 files changed, 14 insertions(+), 7 deletions(-)
diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index 97b364a..ce1f2b0 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -112,6 +112,10 @@ mlx4_autoconf.h: $(RTE_SDK)/scripts/auto-config-h.sh
SEND_RAW_WR_SUPPORT \
infiniband/verbs.h \
type 'struct ibv_send_wr_raw' $(AUTOCONF_OUTPUT)
+ $Q sh -- '$<' '$@' \
+ HAVE_EXP_QUERY_DEVICE \
+ infiniband/verbs.h \
+ type 'struct ibv_exp_device_attr' $(AUTOCONF_OUTPUT)
mlx4.o: mlx4_autoconf.h
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index ad37e01..bd20569 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -4458,17 +4458,18 @@ mlx4_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
struct ibv_pd *pd = NULL;
struct priv *priv = NULL;
struct rte_eth_dev *eth_dev;
-#if defined(INLINE_RECV) || defined(RSS_SUPPORT)
+#ifdef HAVE_EXP_QUERY_DEVICE
struct ibv_exp_device_attr exp_device_attr;
-#endif
+#endif /* HAVE_EXP_QUERY_DEVICE */
struct ether_addr mac;
union ibv_gid temp_gid;
+#ifdef HAVE_EXP_QUERY_DEVICE
+ exp_device_attr.comp_mask = IBV_EXP_DEVICE_ATTR_EXP_CAP_FLAGS;
#ifdef RSS_SUPPORT
- exp_device_attr.comp_mask =
- (IBV_EXP_DEVICE_ATTR_EXP_CAP_FLAGS |
- IBV_EXP_DEVICE_ATTR_RSS_TBL_SZ);
+ exp_device_attr.comp_mask |= IBV_EXP_DEVICE_ATTR_RSS_TBL_SZ;
#endif /* RSS_SUPPORT */
+#endif /* HAVE_EXP_QUERY_DEVICE */
DEBUG("using port %u (%08" PRIx32 ")", port, test);
@@ -4513,11 +4514,12 @@ mlx4_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
priv->port = port;
priv->pd = pd;
priv->mtu = ETHER_MTU;
-#ifdef RSS_SUPPORT
+#ifdef HAVE_EXP_QUERY_DEVICE
if (ibv_exp_query_device(ctx, &exp_device_attr)) {
- INFO("experimental ibv_exp_query_device");
+ ERROR("ibv_exp_query_device() failed");
goto port_error;
}
+#ifdef RSS_SUPPORT
if ((exp_device_attr.exp_device_cap_flags &
IBV_EXP_DEVICE_QPG) &&
(exp_device_attr.exp_device_cap_flags &
@@ -4569,6 +4571,7 @@ mlx4_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
priv->inl_recv_size);
}
#endif /* INLINE_RECV */
+#endif /* HAVE_EXP_QUERY_DEVICE */
(void)mlx4_getenv_int;
priv->vf = vf;
--
2.1.0
* [dpdk-dev] [PATCH v2 04/23] mlx4: avoid looking up WR ID to improve RX performance
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 00/23] mlx4: MOFED 3.0 support, bugfixes and enhancements Adrien Mazarguil
` (2 preceding siblings ...)
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 03/23] mlx4: make sure experimental device query function is implemented Adrien Mazarguil
@ 2015-06-30 9:27 ` Adrien Mazarguil
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 05/23] mlx4: merge RX queue setup functions Adrien Mazarguil
` (19 subsequent siblings)
23 siblings, 0 replies; 42+ messages in thread
From: Adrien Mazarguil @ 2015-06-30 9:27 UTC (permalink / raw)
To: dev; +Cc: Alex Rosenbaum
From: Alex Rosenbaum <Alexr@mellanox.com>
Avoid the WR ID lookup by storing the current element index (elts_head) in the
RX queue structure; completions are processed in ring order, so the index
identifies the matching element directly.
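The idea reduces to a ring-order head index; a minimal sketch of the pattern (types simplified, not the driver's own):

/* Sketch only: completions arrive in posting order, so tracking a head
 * index is enough to find the matching RX element. */
struct ring {
        unsigned int elts_n;    /* number of elements in the ring */
        unsigned int elts_head; /* next element expected to complete */
};

/* Return the index of the completed element and advance the head. */
static unsigned int
ring_next_completed(struct ring *r)
{
        unsigned int head = r->elts_head;

        if (++r->elts_head >= r->elts_n) /* wrap around */
                r->elts_head = 0;
        return head;
}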
Signed-off-by: Alex Rosenbaum <Alexr@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
drivers/net/mlx4/mlx4.c | 24 ++++++++++++++++++++++--
1 file changed, 22 insertions(+), 2 deletions(-)
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index bd20569..08b1b81 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -200,6 +200,7 @@ struct rxq {
struct ibv_exp_flow *allmulti_flow; /* Multicast flow. */
unsigned int port_id; /* Port ID for incoming packets. */
unsigned int elts_n; /* (*elts)[] length. */
+ unsigned int elts_head; /* Current index in (*elts)[]. */
union {
struct rxq_elt_sp (*sp)[]; /* Scattered RX elements. */
struct rxq_elt (*no_sp)[]; /* RX elements. */
@@ -1640,6 +1641,7 @@ rxq_alloc_elts_sp(struct rxq *rxq, unsigned int elts_n,
DEBUG("%p: allocated and configured %u WRs (%zu segments)",
(void *)rxq, elts_n, (elts_n * elemof((*elts)[0].sges)));
rxq->elts_n = elts_n;
+ rxq->elts_head = 0;
rxq->elts.sp = elts;
assert(ret == 0);
return 0;
@@ -1785,6 +1787,7 @@ rxq_alloc_elts(struct rxq *rxq, unsigned int elts_n, struct rte_mbuf **pool)
DEBUG("%p: allocated and configured %u single-segment WRs",
(void *)rxq, elts_n);
rxq->elts_n = elts_n;
+ rxq->elts_head = 0;
rxq->elts.no_sp = elts;
assert(ret == 0);
return 0;
@@ -2320,6 +2323,8 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
{
struct rxq *rxq = (struct rxq *)dpdk_rxq;
struct rxq_elt_sp (*elts)[rxq->elts_n] = rxq->elts.sp;
+ const unsigned int elts_n = rxq->elts_n;
+ unsigned int elts_head = rxq->elts_head;
struct ibv_wc wcs[pkts_n];
struct ibv_recv_wr head;
struct ibv_recv_wr **next = &head.next;
@@ -2346,7 +2351,7 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
struct ibv_wc *wc = &wcs[i];
uint64_t wr_id = wc->wr_id;
uint32_t len = wc->byte_len;
- struct rxq_elt_sp *elt = &(*elts)[wr_id];
+ struct rxq_elt_sp *elt = &(*elts)[elts_head];
struct ibv_recv_wr *wr = &elt->wr;
struct rte_mbuf *pkt_buf = NULL; /* Buffer returned in pkts. */
struct rte_mbuf **pkt_buf_next = &pkt_buf;
@@ -2354,10 +2359,15 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
unsigned int j = 0;
/* Sanity checks. */
+#ifdef NDEBUG
+ (void)wr_id;
+#endif
assert(wr_id < rxq->elts_n);
assert(wr_id == wr->wr_id);
assert(wr->sg_list == elt->sges);
assert(wr->num_sge == elemof(elt->sges));
+ assert(elts_head < rxq->elts_n);
+ assert(rxq->elts_head < rxq->elts_n);
/* Link completed WRs together for repost. */
*next = wr;
next = &wr->next;
@@ -2468,6 +2478,8 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
rxq->stats.ibytes += wc->byte_len;
#endif
repost:
+ if (++elts_head >= elts_n)
+ elts_head = 0;
continue;
}
*next = NULL;
@@ -2485,6 +2497,7 @@ repost:
strerror(i));
abort();
}
+ rxq->elts_head = elts_head;
#ifdef MLX4_PMD_SOFT_COUNTERS
/* Increase packets counter. */
rxq->stats.ipackets += ret;
@@ -2514,6 +2527,8 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
{
struct rxq *rxq = (struct rxq *)dpdk_rxq;
struct rxq_elt (*elts)[rxq->elts_n] = rxq->elts.no_sp;
+ const unsigned int elts_n = rxq->elts_n;
+ unsigned int elts_head = rxq->elts_head;
struct ibv_wc wcs[pkts_n];
struct ibv_recv_wr head;
struct ibv_recv_wr **next = &head.next;
@@ -2538,7 +2553,7 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
struct ibv_wc *wc = &wcs[i];
uint64_t wr_id = wc->wr_id;
uint32_t len = wc->byte_len;
- struct rxq_elt *elt = &(*elts)[WR_ID(wr_id).id];
+ struct rxq_elt *elt = &(*elts)[elts_head];
struct ibv_recv_wr *wr = &elt->wr;
struct rte_mbuf *seg = (void *)((uintptr_t)elt->sge.addr -
WR_ID(wr_id).offset);
@@ -2549,6 +2564,8 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
assert(wr_id == wr->wr_id);
assert(wr->sg_list == &elt->sge);
assert(wr->num_sge == 1);
+ assert(elts_head < rxq->elts_n);
+ assert(rxq->elts_head < rxq->elts_n);
/* Link completed WRs together for repost. */
*next = wr;
next = &wr->next;
@@ -2609,6 +2626,8 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
rxq->stats.ibytes += wc->byte_len;
#endif
repost:
+ if (++elts_head >= elts_n)
+ elts_head = 0;
continue;
}
*next = NULL;
@@ -2626,6 +2645,7 @@ repost:
strerror(i));
abort();
}
+ rxq->elts_head = elts_head;
#ifdef MLX4_PMD_SOFT_COUNTERS
/* Increase packets counter. */
rxq->stats.ipackets += ret;
--
2.1.0
* [dpdk-dev] [PATCH v2 05/23] mlx4: merge RX queue setup functions
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 00/23] mlx4: MOFED 3.0 support, bugfixes and enhancements Adrien Mazarguil
` (3 preceding siblings ...)
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 04/23] mlx4: avoid looking up WR ID to improve RX performance Adrien Mazarguil
@ 2015-06-30 9:27 ` Adrien Mazarguil
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 06/23] mlx4: allow applications to partially use fork() Adrien Mazarguil
` (18 subsequent siblings)
23 siblings, 0 replies; 42+ messages in thread
From: Adrien Mazarguil @ 2015-06-30 9:27 UTC (permalink / raw)
To: dev; +Cc: Alex Rosenbaum
From: Alex Rosenbaum <Alexr@mellanox.com>
Make rxq_setup_qp() handle inline receive support the same way
rxq_setup_qp_rss() does, instead of keeping two separate implementations
selected at compile time.
Signed-off-by: Alex Rosenbaum <Alexr@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
drivers/net/mlx4/mlx4.c | 61 ++++++++-----------------------------------------
1 file changed, 9 insertions(+), 52 deletions(-)
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 08b1b81..8be1574 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -2653,10 +2653,9 @@ repost:
return ret;
}
-#ifdef INLINE_RECV
-
/**
- * Allocate a Queue Pair in case inline receive is supported.
+ * Allocate a Queue Pair.
+ * Optionally setup inline receive if supported.
*
* @param priv
* Pointer to private structure.
@@ -2676,7 +2675,6 @@ rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc)
.send_cq = cq,
/* CQ to be associated with the receive queue. */
.recv_cq = cq,
- .max_inl_recv = priv->inl_recv_size,
.cap = {
/* Max number of outstanding WRs. */
.max_recv_wr = ((priv->device_attr.max_qp_wr < desc) ?
@@ -2689,61 +2687,22 @@ rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc)
MLX4_PMD_SGE_WR_N),
},
.qp_type = IBV_QPT_RAW_PACKET,
- .pd = priv->pd
+ .comp_mask = IBV_EXP_QP_INIT_ATTR_PD,
+ .pd = priv->pd,
};
- attr.comp_mask = IBV_EXP_QP_INIT_ATTR_PD;
+#ifdef INLINE_RECV
+ attr.max_inl_recv = priv->inl_recv_size;
attr.comp_mask |= IBV_EXP_QP_INIT_ATTR_INL_RECV;
-
+#endif
return ibv_exp_create_qp(priv->ctx, &attr);
}
-#else /* INLINE_RECV */
-
-/**
- * Allocate a Queue Pair.
- *
- * @param priv
- * Pointer to private structure.
- * @param cq
- * Completion queue to associate with QP.
- * @param desc
- * Number of descriptors in QP (hint only).
- *
- * @return
- * QP pointer or NULL in case of error.
- */
-static struct ibv_qp *
-rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc)
-{
- struct ibv_qp_init_attr attr = {
- /* CQ to be associated with the send queue. */
- .send_cq = cq,
- /* CQ to be associated with the receive queue. */
- .recv_cq = cq,
- .cap = {
- /* Max number of outstanding WRs. */
- .max_recv_wr = ((priv->device_attr.max_qp_wr < desc) ?
- priv->device_attr.max_qp_wr :
- desc),
- /* Max number of scatter/gather elements in a WR. */
- .max_recv_sge = ((priv->device_attr.max_sge <
- MLX4_PMD_SGE_WR_N) ?
- priv->device_attr.max_sge :
- MLX4_PMD_SGE_WR_N),
- },
- .qp_type = IBV_QPT_RAW_PACKET
- };
-
- return ibv_create_qp(priv->pd, &attr);
-}
-
-#endif /* INLINE_RECV */
-
#ifdef RSS_SUPPORT
/**
* Allocate a RSS Queue Pair.
+ * Optionally setup inline receive if supported.
*
* @param priv
* Pointer to private structure.
@@ -2766,9 +2725,6 @@ rxq_setup_qp_rss(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
.send_cq = cq,
/* CQ to be associated with the receive queue. */
.recv_cq = cq,
-#ifdef INLINE_RECV
- .max_inl_recv = priv->inl_recv_size,
-#endif
.cap = {
/* Max number of outstanding WRs. */
.max_recv_wr = ((priv->device_attr.max_qp_wr < desc) ?
@@ -2787,6 +2743,7 @@ rxq_setup_qp_rss(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
};
#ifdef INLINE_RECV
+ attr.max_inl_recv = priv->inl_recv_size;
attr.comp_mask |= IBV_EXP_QP_INIT_ATTR_INL_RECV;
#endif
if (parent) {
--
2.1.0
* [dpdk-dev] [PATCH v2 06/23] mlx4: allow applications to partially use fork()
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 00/23] mlx4: MOFED 3.0 support, bugfixes and enhancements Adrien Mazarguil
` (4 preceding siblings ...)
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 05/23] mlx4: merge RX queue setup functions Adrien Mazarguil
@ 2015-06-30 9:27 ` Adrien Mazarguil
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 07/23] mlx4: improve accuracy of link status information Adrien Mazarguil
` (17 subsequent siblings)
23 siblings, 0 replies; 42+ messages in thread
From: Adrien Mazarguil @ 2015-06-30 9:27 UTC (permalink / raw)
To: dev
From: Olga Shern <olgas@mellanox.com>
Although using the PMD from a forked process is still unsupported, this
commit makes Verbs safe enough for applications to call fork() for other
purposes.
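As a usage illustration (assumption: only the parent process keeps using DPDK and the PMD after the fork), an application can now safely spawn helpers such as:

#include <unistd.h>
#include <sys/wait.h>

/* Sketch only: fork a short-lived helper that never touches the PMD. */
static int
run_helper(void)
{
        pid_t pid = fork(); /* safe: the PMD called ibv_fork_init() at init */

        if (pid < 0)
                return -1;
        if (pid == 0) {
                execlp("logger", "logger", "mlx4: link event", (char *)NULL);
                _exit(1); /* exec failed */
        }
        return (waitpid(pid, NULL, 0) < 0) ? -1 : 0;
}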
Signed-off-by: Olga Shern <olgas@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
drivers/net/mlx4/mlx4.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 8be1574..ed68beb 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -4686,6 +4686,14 @@ rte_mlx4_pmd_init(const char *name, const char *args)
{
(void)name;
(void)args;
+ /*
+ * RDMAV_HUGEPAGES_SAFE tells ibv_fork_init() we intend to use
+ * huge pages. Calling ibv_fork_init() during init allows
+ * applications to use fork() safely for purposes other than
+ * using this PMD, which is not supported in forked processes.
+ */
+ setenv("RDMAV_HUGEPAGES_SAFE", "1", 1);
+ ibv_fork_init();
rte_eal_pci_register(&mlx4_driver.pci_drv);
return 0;
}
--
2.1.0
* [dpdk-dev] [PATCH v2 07/23] mlx4: improve accuracy of link status information
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 00/23] mlx4: MOFED 3.0 support, bugfixes and enhancements Adrien Mazarguil
` (5 preceding siblings ...)
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 06/23] mlx4: allow applications to partially use fork() Adrien Mazarguil
@ 2015-06-30 9:27 ` Adrien Mazarguil
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 08/23] mlx4: use MOFED 3.0 extended flow steering API Adrien Mazarguil
` (16 subsequent siblings)
23 siblings, 0 replies; 42+ messages in thread
From: Adrien Mazarguil @ 2015-06-30 9:27 UTC (permalink / raw)
To: dev; +Cc: Alex Rosenbaum
From: Olga Shern <olgas@mellanox.com>
Query interface properties using the ethtool API instead of Verbs
through ibv_query_port(). The returned information is more accurate for
Ethernet links since several link speeds cannot be mapped to Verbs
semantics.
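For reference, a minimal standalone version of the ethtool query used here (the driver goes through its priv_ifreq() helper instead of opening a socket itself):

#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

/* Sketch only: query link speed and duplex of "ifname" via ETHTOOL_GSET. */
static int
get_link_speed(const char *ifname, unsigned int *speed, int *full_duplex)
{
        struct ethtool_cmd edata = { .cmd = ETHTOOL_GSET };
        struct ifreq ifr;
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        int ret = -1;

        if (fd < 0)
                return -1;
        memset(&ifr, 0, sizeof(ifr));
        strncpy(ifr.ifr_name, ifname, sizeof(ifr.ifr_name) - 1);
        ifr.ifr_data = (void *)&edata;
        if (ioctl(fd, SIOCETHTOOL, &ifr) == 0) {
                *speed = ethtool_cmd_speed(&edata);
                *full_duplex = (edata.duplex == DUPLEX_FULL);
                ret = 0;
        }
        close(fd);
        return ret;
}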
Signed-off-by: Olga Shern <olgas@mellanox.com>
Signed-off-by: Alex Rosenbaum <Alexr@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
drivers/net/mlx4/mlx4.c | 44 +++++++++++++++++++++++++-------------------
1 file changed, 25 insertions(+), 19 deletions(-)
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index ed68beb..02dd894 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -254,7 +254,6 @@ struct priv {
struct rte_eth_dev *dev; /* Ethernet device. */
struct ibv_context *ctx; /* Verbs context. */
struct ibv_device_attr device_attr; /* Device properties. */
- struct ibv_port_attr port_attr; /* Physical port properties. */
struct ibv_pd *pd; /* Protection Domain. */
/*
* MAC addresses array and configuration bit-field.
@@ -3820,29 +3819,37 @@ static int
mlx4_link_update_unlocked(struct rte_eth_dev *dev, int wait_to_complete)
{
struct priv *priv = dev->data->dev_private;
- struct ibv_port_attr port_attr;
- static const uint8_t width_mult[] = {
- /* Multiplier values taken from devinfo.c in libibverbs. */
- 0, 1, 4, 0, 8, 0, 0, 0, 12, 0
+ struct ethtool_cmd edata = {
+ .cmd = ETHTOOL_GSET
};
+ struct ifreq ifr;
+ struct rte_eth_link dev_link;
+ int link_speed = 0;
(void)wait_to_complete;
- errno = ibv_query_port(priv->ctx, priv->port, &port_attr);
- if (errno) {
- WARN("port query failed: %s", strerror(errno));
+ if (priv_ifreq(priv, SIOCGIFFLAGS, &ifr)) {
+ WARN("ioctl(SIOCGIFFLAGS) failed: %s", strerror(errno));
return -1;
}
- dev->data->dev_link = (struct rte_eth_link){
- .link_speed = (ibv_rate_to_mbps(mult_to_ibv_rate
- (port_attr.active_speed)) *
- width_mult[(port_attr.active_width %
- sizeof(width_mult))]),
- .link_duplex = ETH_LINK_FULL_DUPLEX,
- .link_status = (port_attr.state == IBV_PORT_ACTIVE)
- };
- if (memcmp(&port_attr, &priv->port_attr, sizeof(port_attr))) {
+ memset(&dev_link, 0, sizeof(dev_link));
+ dev_link.link_status = ((ifr.ifr_flags & IFF_UP) &&
+ (ifr.ifr_flags & IFF_RUNNING));
+ ifr.ifr_data = &edata;
+ if (priv_ifreq(priv, SIOCETHTOOL, &ifr)) {
+ WARN("ioctl(SIOCETHTOOL, ETHTOOL_GSET) failed: %s",
+ strerror(errno));
+ return -1;
+ }
+ link_speed = ethtool_cmd_speed(&edata);
+ if (link_speed == -1)
+ dev_link.link_speed = 0;
+ else
+ dev_link.link_speed = link_speed;
+ dev_link.link_duplex = ((edata.duplex == DUPLEX_HALF) ?
+ ETH_LINK_HALF_DUPLEX : ETH_LINK_FULL_DUPLEX);
+ if (memcmp(&dev_link, &dev->data->dev_link, sizeof(dev_link))) {
/* Link status changed. */
- priv->port_attr = port_attr;
+ dev->data->dev_link = dev_link;
return 0;
}
/* Link status is still the same. */
@@ -4487,7 +4494,6 @@ mlx4_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
priv->ctx = ctx;
priv->device_attr = device_attr;
- priv->port_attr = port_attr;
priv->port = port;
priv->pd = pd;
priv->mtu = ETHER_MTU;
--
2.1.0
* [dpdk-dev] [PATCH v2 08/23] mlx4: use MOFED 3.0 extended flow steering API
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 00/23] mlx4: MOFED 3.0 support, bugfixes and enhancements Adrien Mazarguil
` (6 preceding siblings ...)
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 07/23] mlx4: improve accuracy of link status information Adrien Mazarguil
@ 2015-06-30 9:27 ` Adrien Mazarguil
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 09/23] mlx4: fix error message for invalid number of descriptors Adrien Mazarguil
` (15 subsequent siblings)
23 siblings, 0 replies; 42+ messages in thread
From: Adrien Mazarguil @ 2015-06-30 9:27 UTC (permalink / raw)
To: dev; +Cc: Alex Rosenbaum
From: Alex Rosenbaum <alexr@mellanox.com>
Flow steering is no longer experimental as of MOFED 3.0; drop the "exp"
prefix from the related function and type names to use the standard verbs
API instead.
Signed-off-by: Olga Shern <olgas@mellanox.com>
Signed-off-by: Alex Rosenbaum <Alexr@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
drivers/net/mlx4/mlx4.c | 54 ++++++++++++++++++++++++-------------------------
1 file changed, 27 insertions(+), 27 deletions(-)
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 02dd894..028e455 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -195,9 +195,9 @@ struct rxq {
* may contain several specifications, one per configured VLAN ID.
*/
BITFIELD_DECLARE(mac_configured, uint32_t, MLX4_MAX_MAC_ADDRESSES);
- struct ibv_exp_flow *mac_flow[MLX4_MAX_MAC_ADDRESSES];
- struct ibv_exp_flow *promisc_flow; /* Promiscuous flow. */
- struct ibv_exp_flow *allmulti_flow; /* Multicast flow. */
+ struct ibv_flow *mac_flow[MLX4_MAX_MAC_ADDRESSES];
+ struct ibv_flow *promisc_flow; /* Promiscuous flow. */
+ struct ibv_flow *allmulti_flow; /* Multicast flow. */
unsigned int port_id; /* Port ID for incoming packets. */
unsigned int elts_n; /* (*elts)[] length. */
unsigned int elts_head; /* Current index in (*elts)[]. */
@@ -1872,7 +1872,7 @@ rxq_mac_addr_del(struct rxq *rxq, unsigned int mac_index)
(*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5],
mac_index);
assert(rxq->mac_flow[mac_index] != NULL);
- claim_zero(ibv_exp_destroy_flow(rxq->mac_flow[mac_index]));
+ claim_zero(ibv_destroy_flow(rxq->mac_flow[mac_index]));
rxq->mac_flow[mac_index] = NULL;
BITFIELD_RESET(rxq->mac_configured, mac_index);
}
@@ -1917,7 +1917,7 @@ rxq_mac_addr_add(struct rxq *rxq, unsigned int mac_index)
unsigned int vlans = 0;
unsigned int specs = 0;
unsigned int i, j;
- struct ibv_exp_flow *flow;
+ struct ibv_flow *flow;
assert(mac_index < elemof(priv->mac));
if (BITFIELD_ISSET(rxq->mac_configured, mac_index))
@@ -1929,28 +1929,28 @@ rxq_mac_addr_add(struct rxq *rxq, unsigned int mac_index)
specs = (vlans ? vlans : 1);
/* Allocate flow specification on the stack. */
- struct ibv_exp_flow_attr data
+ struct ibv_flow_attr data
[1 +
- (sizeof(struct ibv_exp_flow_spec_eth[specs]) /
- sizeof(struct ibv_exp_flow_attr)) +
- !!(sizeof(struct ibv_exp_flow_spec_eth[specs]) %
- sizeof(struct ibv_exp_flow_attr))];
- struct ibv_exp_flow_attr *attr = (void *)&data[0];
- struct ibv_exp_flow_spec_eth *spec = (void *)&data[1];
+ (sizeof(struct ibv_flow_spec_eth[specs]) /
+ sizeof(struct ibv_flow_attr)) +
+ !!(sizeof(struct ibv_flow_spec_eth[specs]) %
+ sizeof(struct ibv_flow_attr))];
+ struct ibv_flow_attr *attr = (void *)&data[0];
+ struct ibv_flow_spec_eth *spec = (void *)&data[1];
/*
* No padding must be inserted by the compiler between attr and spec.
* This layout is expected by libibverbs.
*/
assert(((uint8_t *)attr + sizeof(*attr)) == (uint8_t *)spec);
- *attr = (struct ibv_exp_flow_attr){
- .type = IBV_EXP_FLOW_ATTR_NORMAL,
+ *attr = (struct ibv_flow_attr){
+ .type = IBV_FLOW_ATTR_NORMAL,
.num_of_specs = specs,
.port = priv->port,
.flags = 0
};
- *spec = (struct ibv_exp_flow_spec_eth){
- .type = IBV_EXP_FLOW_SPEC_ETH,
+ *spec = (struct ibv_flow_spec_eth){
+ .type = IBV_FLOW_SPEC_ETH,
.size = sizeof(*spec),
.val = {
.dst_mac = {
@@ -1981,7 +1981,7 @@ rxq_mac_addr_add(struct rxq *rxq, unsigned int mac_index)
vlans);
/* Create related flow. */
errno = 0;
- flow = ibv_exp_create_flow(rxq->qp, attr);
+ flow = ibv_create_flow(rxq->qp, attr);
if (flow == NULL) {
int err = errno;
@@ -2168,9 +2168,9 @@ end:
static int
rxq_allmulticast_enable(struct rxq *rxq)
{
- struct ibv_exp_flow *flow;
- struct ibv_exp_flow_attr attr = {
- .type = IBV_EXP_FLOW_ATTR_MC_DEFAULT,
+ struct ibv_flow *flow;
+ struct ibv_flow_attr attr = {
+ .type = IBV_FLOW_ATTR_MC_DEFAULT,
.num_of_specs = 0,
.port = rxq->priv->port,
.flags = 0
@@ -2180,7 +2180,7 @@ rxq_allmulticast_enable(struct rxq *rxq)
if (rxq->allmulti_flow != NULL)
return EBUSY;
errno = 0;
- flow = ibv_exp_create_flow(rxq->qp, &attr);
+ flow = ibv_create_flow(rxq->qp, &attr);
if (flow == NULL) {
/* It's not clear whether errno is always set in this case. */
ERROR("%p: flow configuration failed, errno=%d: %s",
@@ -2207,7 +2207,7 @@ rxq_allmulticast_disable(struct rxq *rxq)
DEBUG("%p: disabling allmulticast mode", (void *)rxq);
if (rxq->allmulti_flow == NULL)
return;
- claim_zero(ibv_exp_destroy_flow(rxq->allmulti_flow));
+ claim_zero(ibv_destroy_flow(rxq->allmulti_flow));
rxq->allmulti_flow = NULL;
DEBUG("%p: allmulticast mode disabled", (void *)rxq);
}
@@ -2224,9 +2224,9 @@ rxq_allmulticast_disable(struct rxq *rxq)
static int
rxq_promiscuous_enable(struct rxq *rxq)
{
- struct ibv_exp_flow *flow;
- struct ibv_exp_flow_attr attr = {
- .type = IBV_EXP_FLOW_ATTR_ALL_DEFAULT,
+ struct ibv_flow *flow;
+ struct ibv_flow_attr attr = {
+ .type = IBV_FLOW_ATTR_ALL_DEFAULT,
.num_of_specs = 0,
.port = rxq->priv->port,
.flags = 0
@@ -2238,7 +2238,7 @@ rxq_promiscuous_enable(struct rxq *rxq)
if (rxq->promisc_flow != NULL)
return EBUSY;
errno = 0;
- flow = ibv_exp_create_flow(rxq->qp, &attr);
+ flow = ibv_create_flow(rxq->qp, &attr);
if (flow == NULL) {
/* It's not clear whether errno is always set in this case. */
ERROR("%p: flow configuration failed, errno=%d: %s",
@@ -2267,7 +2267,7 @@ rxq_promiscuous_disable(struct rxq *rxq)
DEBUG("%p: disabling promiscuous mode", (void *)rxq);
if (rxq->promisc_flow == NULL)
return;
- claim_zero(ibv_exp_destroy_flow(rxq->promisc_flow));
+ claim_zero(ibv_destroy_flow(rxq->promisc_flow));
rxq->promisc_flow = NULL;
DEBUG("%p: promiscuous mode disabled", (void *)rxq);
}
--
2.1.0
* [dpdk-dev] [PATCH v2 09/23] mlx4: fix error message for invalid number of descriptors
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 00/23] mlx4: MOFED 3.0 support, bugfixes and enhancements Adrien Mazarguil
` (7 preceding siblings ...)
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 08/23] mlx4: use MOFED 3.0 extended flow steering API Adrien Mazarguil
@ 2015-06-30 9:27 ` Adrien Mazarguil
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 10/23] mlx4: remove provision for flow creation failure in DMFS A0 mode Adrien Mazarguil
` (14 subsequent siblings)
23 siblings, 0 replies; 42+ messages in thread
From: Adrien Mazarguil @ 2015-06-30 9:27 UTC (permalink / raw)
To: dev
From: Or Ami <ora@mellanox.com>
The error message should report MLX4_PMD_SGE_WR_N, the value the number of
descriptors must be a multiple of, rather than the descriptor count itself.
Signed-off-by: Or Ami <ora@mellanox.com>
Signed-off-by: Olga Shern <olgas@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
drivers/net/mlx4/mlx4.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 028e455..c87facb 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -1353,7 +1353,7 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
(void)conf; /* Thresholds configuration (ignored). */
if ((desc == 0) || (desc % MLX4_PMD_SGE_WR_N)) {
ERROR("%p: invalid number of TX descriptors (must be a"
- " multiple of %d)", (void *)dev, desc);
+ " multiple of %d)", (void *)dev, MLX4_PMD_SGE_WR_N);
return EINVAL;
}
desc /= MLX4_PMD_SGE_WR_N;
@@ -3002,7 +3002,7 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
}
if ((desc == 0) || (desc % MLX4_PMD_SGE_WR_N)) {
ERROR("%p: invalid number of RX descriptors (must be a"
- " multiple of %d)", (void *)dev, desc);
+ " multiple of %d)", (void *)dev, MLX4_PMD_SGE_WR_N);
return EINVAL;
}
/* Get mbuf length. */
--
2.1.0
* [dpdk-dev] [PATCH v2 10/23] mlx4: remove provision for flow creation failure in DMFS A0 mode
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 00/23] mlx4: MOFED 3.0 support, bugfixes and enhancements Adrien Mazarguil
` (8 preceding siblings ...)
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 09/23] mlx4: fix error message for invalid number of descriptors Adrien Mazarguil
@ 2015-06-30 9:27 ` Adrien Mazarguil
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 11/23] mlx4: fix support for multiple VLAN filters Adrien Mazarguil
` (13 subsequent siblings)
23 siblings, 0 replies; 42+ messages in thread
From: Adrien Mazarguil @ 2015-06-30 9:27 UTC (permalink / raw)
To: dev
From: Or Ami <ora@mellanox.com>
Starting with MLNX_OFED 3.0 and FW 2.34.5000, QPs can be attached to the
port's MAC address even in optimized steering mode
(mlx4_core log_num_mgm_entry_size=-7), so this fallback is no longer needed.
Signed-off-by: Or Ami <ora@mellanox.com>
Signed-off-by: Olga Shern <olgas@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
drivers/net/mlx4/mlx4.c | 20 --------------------
1 file changed, 20 deletions(-)
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index c87facb..8da21cd 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -272,7 +272,6 @@ struct priv {
uint8_t port; /* Physical port number. */
unsigned int started:1; /* Device started, flows enabled. */
unsigned int promisc:1; /* Device in promiscuous mode. */
- unsigned int promisc_ok:1; /* Promiscuous flow is supported. */
unsigned int allmulti:1; /* Device receives all multicast packets. */
unsigned int hw_qpg:1; /* QP groups are supported. */
unsigned int hw_tss:1; /* TSS is supported. */
@@ -1983,25 +1982,6 @@ rxq_mac_addr_add(struct rxq *rxq, unsigned int mac_index)
errno = 0;
flow = ibv_create_flow(rxq->qp, attr);
if (flow == NULL) {
- int err = errno;
-
- /* Flow creation failure is not fatal when in DMFS A0 mode.
- * Ignore error if promiscuity is already enabled or can be
- * enabled. */
- if (priv->promisc_ok)
- return 0;
- if ((rxq->promisc_flow != NULL) ||
- (rxq_promiscuous_enable(rxq) == 0)) {
- if (rxq->promisc_flow != NULL)
- rxq_promiscuous_disable(rxq);
- WARN("cannot configure normal flow but promiscuous"
- " mode is fine, assuming promiscuous optimization"
- " is enabled"
- " (options mlx4_core log_num_mgm_entry_size=-7)");
- priv->promisc_ok = 1;
- return 0;
- }
- errno = err;
/* It's not clear whether errno is always set in this case. */
ERROR("%p: flow configuration failed, errno=%d: %s",
(void *)rxq, errno,
--
2.1.0
* [dpdk-dev] [PATCH v2 11/23] mlx4: fix support for multiple VLAN filters
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 00/23] mlx4: MOFED 3.0 support, bugfixes and enhancements Adrien Mazarguil
` (9 preceding siblings ...)
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 10/23] mlx4: remove provision for flow creation failure in DMFS A0 mode Adrien Mazarguil
@ 2015-06-30 9:27 ` Adrien Mazarguil
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 12/23] mlx4: query netdevice to get initial MAC address Adrien Mazarguil
` (12 subsequent siblings)
23 siblings, 0 replies; 42+ messages in thread
From: Adrien Mazarguil @ 2015-06-30 9:27 UTC (permalink / raw)
To: dev
From: Olga Shern <olgas@mellanox.com>
This commit fixes the "Multiple RX VLAN filters can be configured, but only
the first one works properly" limitation. Since a single flow specification
cannot contain several VLAN definitions, the flow table is extended to hold
up to MLX4_MAX_VLAN_IDS flow rules per configured MAC address, one per
enabled VLAN filter.
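In outline, the change turns the per-MAC flow pointer into a small table and creates one rule per enabled VLAN filter; a simplified sketch of the resulting logic (types and sizes are placeholders, see the diff for the real MLX4_MAX_* constants and rollback handling):

/* Sketch only: one steering rule handle per (MAC index, VLAN index) pair. */
#define N_MACS  8
#define N_VLANS 16

struct flow_table {
        void *mac_flow[N_MACS][N_VLANS];
};

static int
add_mac_rules(struct flow_table *t, unsigned int mac_index,
              const int vlan_enabled[N_VLANS],
              void *(*add_flow)(unsigned int mac, unsigned int vlan))
{
        unsigned int i, vlans = 0;

        for (i = 0; i != N_VLANS; ++i) {
                if (!vlan_enabled[i])
                        continue;
                t->mac_flow[mac_index][i] = add_flow(mac_index, i);
                if (t->mac_flow[mac_index][i] == NULL)
                        return -1; /* the patch also rolls back earlier rules */
                vlans++;
        }
        if (!vlans) { /* no VLAN filter: one rule without VLAN matching */
                t->mac_flow[mac_index][0] = add_flow(mac_index, -1u);
                if (t->mac_flow[mac_index][0] == NULL)
                        return -1;
        }
        return 0;
}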
Signed-off-by: Olga Shern <olgas@mellanox.com>
Signed-off-by: Or Ami <ora@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
drivers/net/mlx4/mlx4.c | 174 ++++++++++++++++++++++++++++++++----------------
1 file changed, 115 insertions(+), 59 deletions(-)
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 8da21cd..37aca55 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -33,8 +33,6 @@
/*
* Known limitations:
- * - Multiple RX VLAN filters can be configured, but only the first one
- * works properly.
* - RSS hash key and options cannot be modified.
* - Hardware counters aren't implemented.
*/
@@ -191,11 +189,10 @@ struct rxq {
struct ibv_cq *cq; /* Completion Queue. */
struct ibv_qp *qp; /* Queue Pair. */
/*
- * There is exactly one flow configured per MAC address. Each flow
- * may contain several specifications, one per configured VLAN ID.
+ * Each VLAN ID requires a separate flow steering rule.
*/
BITFIELD_DECLARE(mac_configured, uint32_t, MLX4_MAX_MAC_ADDRESSES);
- struct ibv_flow *mac_flow[MLX4_MAX_MAC_ADDRESSES];
+ struct ibv_flow *mac_flow[MLX4_MAX_MAC_ADDRESSES][MLX4_MAX_VLAN_IDS];
struct ibv_flow *promisc_flow; /* Promiscuous flow. */
struct ibv_flow *allmulti_flow; /* Multicast flow. */
unsigned int port_id; /* Port ID for incoming packets. */
@@ -1843,15 +1840,17 @@ rxq_free_elts(struct rxq *rxq)
}
/**
- * Unregister a MAC address from a RX queue.
+ * Delete flow steering rule.
*
* @param rxq
* Pointer to RX queue structure.
* @param mac_index
* MAC address index.
+ * @param vlan_index
+ * VLAN index.
*/
static void
-rxq_mac_addr_del(struct rxq *rxq, unsigned int mac_index)
+rxq_del_flow(struct rxq *rxq, unsigned int mac_index, unsigned int vlan_index)
{
#ifndef NDEBUG
struct priv *priv = rxq->priv;
@@ -1859,20 +1858,43 @@ rxq_mac_addr_del(struct rxq *rxq, unsigned int mac_index)
(const uint8_t (*)[ETHER_ADDR_LEN])
priv->mac[mac_index].addr_bytes;
#endif
+ assert(rxq->mac_flow[mac_index][vlan_index] != NULL);
+ DEBUG("%p: removing MAC address %02x:%02x:%02x:%02x:%02x:%02x index %u"
+ " (VLAN ID %" PRIu16 ")",
+ (void *)rxq,
+ (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5],
+ mac_index, priv->vlan_filter[vlan_index].id);
+ claim_zero(ibv_destroy_flow(rxq->mac_flow[mac_index][vlan_index]));
+ rxq->mac_flow[mac_index][vlan_index] = NULL;
+}
+
+/**
+ * Unregister a MAC address from a RX queue.
+ *
+ * @param rxq
+ * Pointer to RX queue structure.
+ * @param mac_index
+ * MAC address index.
+ */
+static void
+rxq_mac_addr_del(struct rxq *rxq, unsigned int mac_index)
+{
+ struct priv *priv = rxq->priv;
+ unsigned int i;
+ unsigned int vlans = 0;
assert(mac_index < elemof(priv->mac));
- if (!BITFIELD_ISSET(rxq->mac_configured, mac_index)) {
- assert(rxq->mac_flow[mac_index] == NULL);
+ if (!BITFIELD_ISSET(rxq->mac_configured, mac_index))
return;
+ for (i = 0; (i != elemof(priv->vlan_filter)); ++i) {
+ if (!priv->vlan_filter[i].enabled)
+ continue;
+ rxq_del_flow(rxq, mac_index, i);
+ vlans++;
+ }
+ if (!vlans) {
+ rxq_del_flow(rxq, mac_index, 0);
}
- DEBUG("%p: removing MAC address %02x:%02x:%02x:%02x:%02x:%02x"
- " index %u",
- (void *)rxq,
- (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5],
- mac_index);
- assert(rxq->mac_flow[mac_index] != NULL);
- claim_zero(ibv_destroy_flow(rxq->mac_flow[mac_index]));
- rxq->mac_flow[mac_index] = NULL;
BITFIELD_RESET(rxq->mac_configured, mac_index);
}
@@ -1896,47 +1918,37 @@ static int rxq_promiscuous_enable(struct rxq *);
static void rxq_promiscuous_disable(struct rxq *);
/**
- * Register a MAC address in a RX queue.
+ * Add single flow steering rule.
*
* @param rxq
* Pointer to RX queue structure.
* @param mac_index
* MAC address index to register.
+ * @param vlan_index
+ * VLAN index. Use -1 for a flow without VLAN.
*
* @return
* 0 on success, errno value on failure.
*/
static int
-rxq_mac_addr_add(struct rxq *rxq, unsigned int mac_index)
+rxq_add_flow(struct rxq *rxq, unsigned int mac_index, unsigned int vlan_index)
{
+ struct ibv_flow *flow;
struct priv *priv = rxq->priv;
const uint8_t (*mac)[ETHER_ADDR_LEN] =
- (const uint8_t (*)[ETHER_ADDR_LEN])
- priv->mac[mac_index].addr_bytes;
- unsigned int vlans = 0;
- unsigned int specs = 0;
- unsigned int i, j;
- struct ibv_flow *flow;
-
- assert(mac_index < elemof(priv->mac));
- if (BITFIELD_ISSET(rxq->mac_configured, mac_index))
- rxq_mac_addr_del(rxq, mac_index);
- /* Number of configured VLANs. */
- for (i = 0; (i != elemof(priv->vlan_filter)); ++i)
- if (priv->vlan_filter[i].enabled)
- ++vlans;
- specs = (vlans ? vlans : 1);
+ (const uint8_t (*)[ETHER_ADDR_LEN])
+ priv->mac[mac_index].addr_bytes;
/* Allocate flow specification on the stack. */
- struct ibv_flow_attr data
- [1 +
- (sizeof(struct ibv_flow_spec_eth[specs]) /
- sizeof(struct ibv_flow_attr)) +
- !!(sizeof(struct ibv_flow_spec_eth[specs]) %
- sizeof(struct ibv_flow_attr))];
- struct ibv_flow_attr *attr = (void *)&data[0];
- struct ibv_flow_spec_eth *spec = (void *)&data[1];
+ struct __attribute__((packed)) {
+ struct ibv_flow_attr attr;
+ struct ibv_flow_spec_eth spec;
+ } data;
+ struct ibv_flow_attr *attr = &data.attr;
+ struct ibv_flow_spec_eth *spec = &data.spec;
+ assert(mac_index < elemof(priv->mac));
+ assert((vlan_index < elemof(priv->vlan_filter)) || (vlan_index == -1u));
/*
* No padding must be inserted by the compiler between attr and spec.
* This layout is expected by libibverbs.
@@ -1944,7 +1956,7 @@ rxq_mac_addr_add(struct rxq *rxq, unsigned int mac_index)
assert(((uint8_t *)attr + sizeof(*attr)) == (uint8_t *)spec);
*attr = (struct ibv_flow_attr){
.type = IBV_FLOW_ATTR_NORMAL,
- .num_of_specs = specs,
+ .num_of_specs = 1,
.port = priv->port,
.flags = 0
};
@@ -1955,29 +1967,23 @@ rxq_mac_addr_add(struct rxq *rxq, unsigned int mac_index)
.dst_mac = {
(*mac)[0], (*mac)[1], (*mac)[2],
(*mac)[3], (*mac)[4], (*mac)[5]
- }
+ },
+ .vlan_tag = ((vlan_index != -1u) ?
+ htons(priv->vlan_filter[vlan_index].id) :
+ 0),
},
.mask = {
.dst_mac = "\xff\xff\xff\xff\xff\xff",
- .vlan_tag = (vlans ? htons(0xfff) : 0)
+ .vlan_tag = ((vlan_index != -1u) ? htons(0xfff) : 0),
}
};
- /* Fill VLAN specifications. */
- for (i = 0, j = 0; (i != elemof(priv->vlan_filter)); ++i) {
- if (!priv->vlan_filter[i].enabled)
- continue;
- assert(j != vlans);
- if (j)
- spec[j] = spec[0];
- spec[j].val.vlan_tag = htons(priv->vlan_filter[i].id);
- ++j;
- }
DEBUG("%p: adding MAC address %02x:%02x:%02x:%02x:%02x:%02x index %u"
- " (%u VLAN(s) configured)",
+ " (VLAN %s %" PRIu16 ")",
(void *)rxq,
(*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5],
mac_index,
- vlans);
+ ((vlan_index != -1u) ? "ID" : "index"),
+ ((vlan_index != -1u) ? priv->vlan_filter[vlan_index].id : -1u));
/* Create related flow. */
errno = 0;
flow = ibv_create_flow(rxq->qp, attr);
@@ -1990,8 +1996,58 @@ rxq_mac_addr_add(struct rxq *rxq, unsigned int mac_index)
return errno;
return EINVAL;
}
- assert(rxq->mac_flow[mac_index] == NULL);
- rxq->mac_flow[mac_index] = flow;
+ if (vlan_index == -1u)
+ vlan_index = 0;
+ assert(rxq->mac_flow[mac_index][vlan_index] == NULL);
+ rxq->mac_flow[mac_index][vlan_index] = flow;
+ return 0;
+}
+
+/**
+ * Register a MAC address in a RX queue.
+ *
+ * @param rxq
+ * Pointer to RX queue structure.
+ * @param mac_index
+ * MAC address index to register.
+ *
+ * @return
+ * 0 on success, errno value on failure.
+ */
+static int
+rxq_mac_addr_add(struct rxq *rxq, unsigned int mac_index)
+{
+ struct priv *priv = rxq->priv;
+ unsigned int i;
+ unsigned int vlans = 0;
+ int ret;
+
+ assert(mac_index < elemof(priv->mac));
+ if (BITFIELD_ISSET(rxq->mac_configured, mac_index))
+ rxq_mac_addr_del(rxq, mac_index);
+ /* Fill VLAN specifications. */
+ for (i = 0; (i != elemof(priv->vlan_filter)); ++i) {
+ if (!priv->vlan_filter[i].enabled)
+ continue;
+ /* Create related flow. */
+ ret = rxq_add_flow(rxq, mac_index, i);
+ if (!ret) {
+ vlans++;
+ continue;
+ }
+ /* Failure, rollback. */
+ while (i != 0)
+ if (priv->vlan_filter[--i].enabled)
+ rxq_del_flow(rxq, mac_index, i);
+ assert(ret > 0);
+ return ret;
+ }
+ /* In case there is no VLAN filter. */
+ if (!vlans) {
+ ret = rxq_add_flow(rxq, mac_index, -1);
+ if (ret)
+ return ret;
+ }
BITFIELD_SET(rxq->mac_configured, mac_index);
return 0;
}
--
2.1.0
* [dpdk-dev] [PATCH v2 12/23] mlx4: query netdevice to get initial MAC address
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 00/23] mlx4: MOFED 3.0 support, bugfixes and enhancements Adrien Mazarguil
` (10 preceding siblings ...)
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 11/23] mlx4: fix support for multiple VLAN filters Adrien Mazarguil
@ 2015-06-30 9:27 ` Adrien Mazarguil
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 13/23] mlx4: use MOFED 3.0 fast verbs interface for RX operations Adrien Mazarguil
` (11 subsequent siblings)
23 siblings, 0 replies; 42+ messages in thread
From: Adrien Mazarguil @ 2015-06-30 9:27 UTC (permalink / raw)
To: dev
From: Or Ami <ora@mellanox.com>
Querying the netdevice is less error-prone than deriving the port's MAC
address from its GID: there is no guarantee that the GID will always embed
the MAC address nor that the derivation algorithm will never change.
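For reference, a minimal standalone version of that query (the driver performs it through its priv_ifreq() helper, shown in the diff as priv_get_mac()):

#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>

/* Sketch only: read the MAC address of "ifname" from the netdevice. */
static int
get_mac(const char *ifname, unsigned char mac[6])
{
        struct ifreq ifr;
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        int ret = -1;

        if (fd < 0)
                return -1;
        memset(&ifr, 0, sizeof(ifr));
        strncpy(ifr.ifr_name, ifname, sizeof(ifr.ifr_name) - 1);
        if (ioctl(fd, SIOCGIFHWADDR, &ifr) == 0) {
                memcpy(mac, ifr.ifr_hwaddr.sa_data, 6);
                ret = 0;
        }
        close(fd);
        return ret;
}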
Signed-off-by: Or Ami <ora@mellanox.com>
Signed-off-by: Olga Shern <olgas@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
drivers/net/mlx4/mlx4.c | 34 ++++++++++++++++++----------------
1 file changed, 18 insertions(+), 16 deletions(-)
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 37aca55..cdc679a 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -4305,22 +4305,25 @@ mlx4_ibv_device_to_pci_addr(const struct ibv_device *device,
}
/**
- * Derive MAC address from port GID.
+ * Get MAC address by querying netdevice.
*
+ * @param[in] priv
+ * struct priv for the requested device.
* @param[out] mac
* MAC address output buffer.
- * @param port
- * Physical port number.
- * @param[in] gid
- * Port GID.
+ *
+ * @return
+ * 0 on success, -1 on failure and errno is set.
*/
-static void
-mac_from_gid(uint8_t (*mac)[ETHER_ADDR_LEN], uint32_t port, uint8_t *gid)
+static int
+priv_get_mac(struct priv *priv, uint8_t (*mac)[ETHER_ADDR_LEN])
{
- memcpy(&(*mac)[0], gid + 8, 3);
- memcpy(&(*mac)[3], gid + 13, 3);
- if (port == 1)
- (*mac)[0] ^= 2;
+ struct ifreq request;
+
+ if (priv_ifreq(priv, SIOCGIFHWADDR, &request))
+ return -1;
+ memcpy(mac, request.ifr_hwaddr.sa_data, ETHER_ADDR_LEN);
+ return 0;
}
/* Support up to 32 adapters. */
@@ -4482,7 +4485,6 @@ mlx4_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
struct ibv_exp_device_attr exp_device_attr;
#endif /* HAVE_EXP_QUERY_DEVICE */
struct ether_addr mac;
- union ibv_gid temp_gid;
#ifdef HAVE_EXP_QUERY_DEVICE
exp_device_attr.comp_mask = IBV_EXP_DEVICE_ATTR_EXP_CAP_FLAGS;
@@ -4594,12 +4596,12 @@ mlx4_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
(void)mlx4_getenv_int;
priv->vf = vf;
- if (ibv_query_gid(ctx, port, 0, &temp_gid)) {
- ERROR("ibv_query_gid() failure");
+ /* Configure the first MAC address by default. */
+ if (priv_get_mac(priv, &mac.addr_bytes)) {
+ ERROR("cannot get MAC address, is mlx4_en loaded?"
+ " (errno: %s)", strerror(errno));
goto port_error;
}
- /* Configure the first MAC address by default. */
- mac_from_gid(&mac.addr_bytes, port, temp_gid.raw);
INFO("port %u MAC address is %02x:%02x:%02x:%02x:%02x:%02x",
priv->port,
mac.addr_bytes[0], mac.addr_bytes[1],
--
2.1.0
* [dpdk-dev] [PATCH v2 13/23] mlx4: use MOFED 3.0 fast verbs interface for RX operations
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 00/23] mlx4: MOFED 3.0 support, bugfixes and enhancements Adrien Mazarguil
` (11 preceding siblings ...)
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 12/23] mlx4: query netdevice to get initial MAC address Adrien Mazarguil
@ 2015-06-30 9:27 ` Adrien Mazarguil
2015-06-30 9:28 ` [dpdk-dev] [PATCH v2 14/23] mlx4: improve performance by requesting TX completion events less often Adrien Mazarguil
` (10 subsequent siblings)
23 siblings, 0 replies; 42+ messages in thread
From: Adrien Mazarguil @ 2015-06-30 9:27 UTC (permalink / raw)
To: dev; +Cc: Alex Rosenbaum
From: Alex Rosenbaum <Alexr@mellanox.com>
This commit replaces CQ polling with the new low-level verbs interface in
both RX burst functions and, in mlx4_rx_burst() only, also replaces WR
reposting with its burst equivalent to improve performance.
Signed-off-by: Alex Rosenbaum <Alexr@mellanox.com>
Signed-off-by: Gilad Berman <giladb@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
drivers/net/mlx4/mlx4.c | 249 +++++++++++++++++++++++++++++++-----------------
1 file changed, 162 insertions(+), 87 deletions(-)
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index cdc679a..1881f5b 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -188,6 +188,8 @@ struct rxq {
struct ibv_mr *mr; /* Memory Region (for mp). */
struct ibv_cq *cq; /* Completion Queue. */
struct ibv_qp *qp; /* Queue Pair. */
+ struct ibv_exp_qp_burst_family *if_qp; /* QP burst interface. */
+ struct ibv_exp_cq_family *if_cq; /* CQ interface. */
/*
* Each VLAN ID requires a separate flow steering rule.
*/
@@ -2319,11 +2321,35 @@ rxq_promiscuous_disable(struct rxq *rxq)
static void
rxq_cleanup(struct rxq *rxq)
{
+ struct ibv_exp_release_intf_params params;
+
DEBUG("cleaning up %p", (void *)rxq);
if (rxq->sp)
rxq_free_elts_sp(rxq);
else
rxq_free_elts(rxq);
+ if (rxq->if_qp != NULL) {
+ assert(rxq->priv != NULL);
+ assert(rxq->priv->ctx != NULL);
+ assert(rxq->qp != NULL);
+ params = (struct ibv_exp_release_intf_params){
+ .comp_mask = 0,
+ };
+ claim_zero(ibv_exp_release_intf(rxq->priv->ctx,
+ rxq->if_qp,
+ ¶ms));
+ }
+ if (rxq->if_cq != NULL) {
+ assert(rxq->priv != NULL);
+ assert(rxq->priv->ctx != NULL);
+ assert(rxq->cq != NULL);
+ params = (struct ibv_exp_release_intf_params){
+ .comp_mask = 0,
+ };
+ claim_zero(ibv_exp_release_intf(rxq->priv->ctx,
+ rxq->if_cq,
+ ¶ms));
+ }
if (rxq->qp != NULL) {
rxq_promiscuous_disable(rxq);
rxq_allmulticast_disable(rxq);
@@ -2360,34 +2386,23 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
struct rxq_elt_sp (*elts)[rxq->elts_n] = rxq->elts.sp;
const unsigned int elts_n = rxq->elts_n;
unsigned int elts_head = rxq->elts_head;
- struct ibv_wc wcs[pkts_n];
struct ibv_recv_wr head;
struct ibv_recv_wr **next = &head.next;
struct ibv_recv_wr *bad_wr;
- int ret = 0;
- int wcs_n;
- int i;
+ unsigned int i;
+ unsigned int pkts_ret = 0;
+ int ret;
if (unlikely(!rxq->sp))
return mlx4_rx_burst(dpdk_rxq, pkts, pkts_n);
if (unlikely(elts == NULL)) /* See RTE_DEV_CMD_SET_MTU. */
return 0;
- wcs_n = ibv_poll_cq(rxq->cq, pkts_n, wcs);
- if (unlikely(wcs_n == 0))
- return 0;
- if (unlikely(wcs_n < 0)) {
- DEBUG("rxq=%p, ibv_poll_cq() failed (wc_n=%d)",
- (void *)rxq, wcs_n);
- return 0;
- }
- assert(wcs_n <= (int)pkts_n);
- /* For each work completion. */
- for (i = 0; (i != wcs_n); ++i) {
- struct ibv_wc *wc = &wcs[i];
- uint64_t wr_id = wc->wr_id;
- uint32_t len = wc->byte_len;
+ for (i = 0; (i != pkts_n); ++i) {
struct rxq_elt_sp *elt = &(*elts)[elts_head];
struct ibv_recv_wr *wr = &elt->wr;
+ uint64_t wr_id = wr->wr_id;
+ unsigned int len;
+ unsigned int pkt_buf_len;
struct rte_mbuf *pkt_buf = NULL; /* Buffer returned in pkts. */
struct rte_mbuf **pkt_buf_next = &pkt_buf;
unsigned int seg_headroom = RTE_PKTMBUF_HEADROOM;
@@ -2398,26 +2413,51 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
(void)wr_id;
#endif
assert(wr_id < rxq->elts_n);
- assert(wr_id == wr->wr_id);
assert(wr->sg_list == elt->sges);
assert(wr->num_sge == elemof(elt->sges));
assert(elts_head < rxq->elts_n);
assert(rxq->elts_head < rxq->elts_n);
- /* Link completed WRs together for repost. */
- *next = wr;
- next = &wr->next;
- if (unlikely(wc->status != IBV_WC_SUCCESS)) {
- /* Whatever, just repost the offending WR. */
- DEBUG("rxq=%p, wr_id=%" PRIu64 ": bad work completion"
- " status (%d): %s",
- (void *)rxq, wc->wr_id, wc->status,
- ibv_wc_status_str(wc->status));
+ ret = rxq->if_cq->poll_length(rxq->cq, NULL, NULL);
+ if (unlikely(ret < 0)) {
+ struct ibv_wc wc;
+ int wcs_n;
+
+ DEBUG("rxq=%p, poll_length() failed (ret=%d)",
+ (void *)rxq, ret);
+ /* ibv_poll_cq() must be used in case of failure. */
+ wcs_n = ibv_poll_cq(rxq->cq, 1, &wc);
+ if (unlikely(wcs_n == 0))
+ break;
+ if (unlikely(wcs_n < 0)) {
+ DEBUG("rxq=%p, ibv_poll_cq() failed (wcs_n=%d)",
+ (void *)rxq, wcs_n);
+ break;
+ }
+ assert(wcs_n == 1);
+ if (unlikely(wc.status != IBV_WC_SUCCESS)) {
+ /* Whatever, just repost the offending WR. */
+ DEBUG("rxq=%p, wr_id=%" PRIu64 ": bad work"
+ " completion status (%d): %s",
+ (void *)rxq, wc.wr_id, wc.status,
+ ibv_wc_status_str(wc.status));
#ifdef MLX4_PMD_SOFT_COUNTERS
- /* Increase dropped packets counter. */
- ++rxq->stats.idropped;
+ /* Increment dropped packets counter. */
+ ++rxq->stats.idropped;
#endif
- goto repost;
+ /* Link completed WRs together for repost. */
+ *next = wr;
+ next = &wr->next;
+ goto repost;
+ }
+ ret = wc.byte_len;
}
+ if (ret == 0)
+ break;
+ len = ret;
+ pkt_buf_len = len;
+ /* Link completed WRs together for repost. */
+ *next = wr;
+ next = &wr->next;
/*
* Replace spent segments with new ones, concatenate and
* return them as pkt_buf.
@@ -2502,42 +2542,43 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
assert(j != 0);
NB_SEGS(pkt_buf) = j;
PORT(pkt_buf) = rxq->port_id;
- PKT_LEN(pkt_buf) = wc->byte_len;
+ PKT_LEN(pkt_buf) = pkt_buf_len;
pkt_buf->ol_flags = 0;
/* Return packet. */
*(pkts++) = pkt_buf;
- ++ret;
+ ++pkts_ret;
#ifdef MLX4_PMD_SOFT_COUNTERS
/* Increase bytes counter. */
- rxq->stats.ibytes += wc->byte_len;
+ rxq->stats.ibytes += pkt_buf_len;
#endif
repost:
if (++elts_head >= elts_n)
elts_head = 0;
continue;
}
+ if (unlikely(i == 0))
+ return 0;
*next = NULL;
/* Repost WRs. */
#ifdef DEBUG_RECV
- DEBUG("%p: reposting %d WRs starting from %" PRIu64 " (%p)",
- (void *)rxq, wcs_n, wcs[0].wr_id, (void *)head.next);
+ DEBUG("%p: reposting %d WRs", (void *)rxq, i);
#endif
- i = ibv_post_recv(rxq->qp, head.next, &bad_wr);
- if (unlikely(i)) {
+ ret = ibv_post_recv(rxq->qp, head.next, &bad_wr);
+ if (unlikely(ret)) {
/* Inability to repost WRs is fatal. */
DEBUG("%p: ibv_post_recv(): failed for WR %p: %s",
(void *)rxq->priv,
(void *)bad_wr,
- strerror(i));
+ strerror(ret));
abort();
}
rxq->elts_head = elts_head;
#ifdef MLX4_PMD_SOFT_COUNTERS
/* Increase packets counter. */
- rxq->stats.ipackets += ret;
+ rxq->stats.ipackets += pkts_ret;
#endif
- return ret;
+ return pkts_ret;
}
/**
@@ -2564,58 +2605,64 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
struct rxq_elt (*elts)[rxq->elts_n] = rxq->elts.no_sp;
const unsigned int elts_n = rxq->elts_n;
unsigned int elts_head = rxq->elts_head;
- struct ibv_wc wcs[pkts_n];
- struct ibv_recv_wr head;
- struct ibv_recv_wr **next = &head.next;
- struct ibv_recv_wr *bad_wr;
- int ret = 0;
- int wcs_n;
- int i;
+ struct ibv_sge sges[pkts_n];
+ unsigned int i;
+ unsigned int pkts_ret = 0;
+ int ret;
if (unlikely(rxq->sp))
return mlx4_rx_burst_sp(dpdk_rxq, pkts, pkts_n);
- wcs_n = ibv_poll_cq(rxq->cq, pkts_n, wcs);
- if (unlikely(wcs_n == 0))
- return 0;
- if (unlikely(wcs_n < 0)) {
- DEBUG("rxq=%p, ibv_poll_cq() failed (wc_n=%d)",
- (void *)rxq, wcs_n);
- return 0;
- }
- assert(wcs_n <= (int)pkts_n);
- /* For each work completion. */
- for (i = 0; (i != wcs_n); ++i) {
- struct ibv_wc *wc = &wcs[i];
- uint64_t wr_id = wc->wr_id;
- uint32_t len = wc->byte_len;
+ for (i = 0; (i != pkts_n); ++i) {
struct rxq_elt *elt = &(*elts)[elts_head];
struct ibv_recv_wr *wr = &elt->wr;
+ uint64_t wr_id = wr->wr_id;
+ unsigned int len;
struct rte_mbuf *seg = (void *)((uintptr_t)elt->sge.addr -
WR_ID(wr_id).offset);
struct rte_mbuf *rep;
/* Sanity checks. */
assert(WR_ID(wr_id).id < rxq->elts_n);
- assert(wr_id == wr->wr_id);
assert(wr->sg_list == &elt->sge);
assert(wr->num_sge == 1);
assert(elts_head < rxq->elts_n);
assert(rxq->elts_head < rxq->elts_n);
- /* Link completed WRs together for repost. */
- *next = wr;
- next = &wr->next;
- if (unlikely(wc->status != IBV_WC_SUCCESS)) {
- /* Whatever, just repost the offending WR. */
- DEBUG("rxq=%p, wr_id=%" PRIu32 ": bad work completion"
- " status (%d): %s",
- (void *)rxq, WR_ID(wr_id).id, wc->status,
- ibv_wc_status_str(wc->status));
+ ret = rxq->if_cq->poll_length(rxq->cq, NULL, NULL);
+ if (unlikely(ret < 0)) {
+ struct ibv_wc wc;
+ int wcs_n;
+
+ DEBUG("rxq=%p, poll_length() failed (ret=%d)",
+ (void *)rxq, ret);
+ /* ibv_poll_cq() must be used in case of failure. */
+ wcs_n = ibv_poll_cq(rxq->cq, 1, &wc);
+ if (unlikely(wcs_n == 0))
+ break;
+ if (unlikely(wcs_n < 0)) {
+ DEBUG("rxq=%p, ibv_poll_cq() failed (wcs_n=%d)",
+ (void *)rxq, wcs_n);
+ break;
+ }
+ assert(wcs_n == 1);
+ if (unlikely(wc.status != IBV_WC_SUCCESS)) {
+ /* Whatever, just repost the offending WR. */
+ DEBUG("rxq=%p, wr_id=%" PRIu64 ": bad work"
+ " completion status (%d): %s",
+ (void *)rxq, wc.wr_id, wc.status,
+ ibv_wc_status_str(wc.status));
#ifdef MLX4_PMD_SOFT_COUNTERS
- /* Increase dropped packets counter. */
- ++rxq->stats.idropped;
+ /* Increment dropped packets counter. */
+ ++rxq->stats.idropped;
#endif
- goto repost;
+ /* Add SGE to array for repost. */
+ sges[i] = elt->sge;
+ goto repost;
+ }
+ ret = wc.byte_len;
}
+ if (ret == 0)
+ break;
+ len = ret;
/*
* Fetch initial bytes of packet descriptor into a
* cacheline while allocating rep.
@@ -2644,6 +2691,9 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
(uintptr_t)rep);
assert(WR_ID(wr->wr_id).id == WR_ID(wr_id).id);
+ /* Add SGE to array for repost. */
+ sges[i] = elt->sge;
+
/* Update seg information. */
SET_DATA_OFF(seg, RTE_PKTMBUF_HEADROOM);
NB_SEGS(seg) = 1;
@@ -2655,37 +2705,36 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
/* Return packet. */
*(pkts++) = seg;
- ++ret;
+ ++pkts_ret;
#ifdef MLX4_PMD_SOFT_COUNTERS
/* Increase bytes counter. */
- rxq->stats.ibytes += wc->byte_len;
+ rxq->stats.ibytes += len;
#endif
repost:
if (++elts_head >= elts_n)
elts_head = 0;
continue;
}
- *next = NULL;
+ if (unlikely(i == 0))
+ return 0;
/* Repost WRs. */
#ifdef DEBUG_RECV
- DEBUG("%p: reposting %d WRs starting from %" PRIu32 " (%p)",
- (void *)rxq, wcs_n, WR_ID(wcs[0].wr_id).id, (void *)head.next);
+ DEBUG("%p: reposting %u WRs", (void *)rxq, i);
#endif
- i = ibv_post_recv(rxq->qp, head.next, &bad_wr);
- if (unlikely(i)) {
+ ret = rxq->if_qp->recv_burst(rxq->qp, sges, i);
+ if (unlikely(ret)) {
/* Inability to repost WRs is fatal. */
- DEBUG("%p: ibv_post_recv(): failed for WR %p: %s",
+ DEBUG("%p: recv_burst(): failed (ret=%d)",
(void *)rxq->priv,
- (void *)bad_wr,
- strerror(i));
+ ret);
abort();
}
rxq->elts_head = elts_head;
#ifdef MLX4_PMD_SOFT_COUNTERS
/* Increase packets counter. */
- rxq->stats.ipackets += ret;
+ rxq->stats.ipackets += pkts_ret;
#endif
- return ret;
+ return pkts_ret;
}
/**
@@ -3019,6 +3068,10 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
.socket = socket
};
struct ibv_exp_qp_attr mod;
+ union {
+ struct ibv_exp_query_intf_params params;
+ } attr;
+ enum ibv_exp_query_intf_status status;
struct ibv_recv_wr *bad_wr;
struct rte_mbuf *buf;
int ret = 0;
@@ -3160,6 +3213,28 @@ skip_alloc:
/* Save port ID. */
tmpl.port_id = dev->data->port_id;
DEBUG("%p: RTE port ID: %u", (void *)rxq, tmpl.port_id);
+ attr.params = (struct ibv_exp_query_intf_params){
+ .intf_scope = IBV_EXP_INTF_GLOBAL,
+ .intf = IBV_EXP_INTF_CQ,
+ .obj = tmpl.cq,
+ };
+ tmpl.if_cq = ibv_exp_query_intf(priv->ctx, &attr.params, &status);
+ if (tmpl.if_cq == NULL) {
+ ERROR("%p: CQ interface family query failed with status %d",
+ (void *)dev, status);
+ goto error;
+ }
+ attr.params = (struct ibv_exp_query_intf_params){
+ .intf_scope = IBV_EXP_INTF_GLOBAL,
+ .intf = IBV_EXP_INTF_QP_BURST,
+ .obj = tmpl.qp,
+ };
+ tmpl.if_qp = ibv_exp_query_intf(priv->ctx, &attr.params, &status);
+ if (tmpl.if_qp == NULL) {
+ ERROR("%p: QP interface family query failed with status %d",
+ (void *)dev, status);
+ goto error;
+ }
/* Clean up rxq in case we're reinitializing it. */
DEBUG("%p: cleaning-up old rxq just in case", (void *)rxq);
rxq_cleanup(rxq);
--
2.1.0
^ permalink raw reply [flat|nested] 42+ messages in thread
* [dpdk-dev] [PATCH v2 14/23] mlx4: improve performance by requesting TX completion events less often
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 00/23] mlx4: MOFED 3.0 support, bugfixes and enhancements Adrien Mazarguil
` (12 preceding siblings ...)
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 13/23] mlx4: use MOFED 3.0 fast verbs interface for RX operations Adrien Mazarguil
@ 2015-06-30 9:28 ` Adrien Mazarguil
2015-06-30 9:28 ` [dpdk-dev] [PATCH v2 15/23] mlx4: use MOFED 3.0 fast verbs interface for TX operations Adrien Mazarguil
` (9 subsequent siblings)
23 siblings, 0 replies; 42+ messages in thread
From: Adrien Mazarguil @ 2015-06-30 9:28 UTC (permalink / raw)
To: dev; +Cc: Alex Rosenbaum
From: Alex Rosenbaum <Alexr@mellanox.com>
Instead of requesting a completion event for each TX burst, request it on a
fixed schedule once every MLX4_PMD_TX_PER_COMP_REQ (currently 64) packets to
improve performance.
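For illustration only (not part of the patch), a minimal standalone sketch of the countdown logic follows; the 128-element ring size is a made-up value and the loop merely simulates sends:

#include <stdio.h>

#define MLX4_PMD_TX_PER_COMP_REQ 64

/* Minimal model of the countdown: request a signaled completion every
 * MLX4_PMD_TX_PER_COMP_REQ sends, or at least 4 times per ring. */
int
main(void)
{
        unsigned int elts_n = 128; /* hypothetical ring size */
        unsigned int elts_comp_cd_init =
                (MLX4_PMD_TX_PER_COMP_REQ < (elts_n / 4)) ?
                MLX4_PMD_TX_PER_COMP_REQ : (elts_n / 4);
        unsigned int elts_comp_cd = elts_comp_cd_init;
        unsigned int sent;

        for (sent = 1; sent <= 100; ++sent) {
                if (--elts_comp_cd == 0) {
                        elts_comp_cd = elts_comp_cd_init;
                        printf("packet %u requests a completion event\n",
                               sent);
                }
        }
        return 0;
}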
Signed-off-by: Alex Rosenbaum <Alexr@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
drivers/net/mlx4/mlx4.c | 54 ++++++++++++++++++++++++++++++++-----------------
drivers/net/mlx4/mlx4.h | 3 +++
2 files changed, 39 insertions(+), 18 deletions(-)
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 1881f5b..f76f415 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -243,6 +243,8 @@ struct txq {
unsigned int elts_head; /* Current index in (*elts)[]. */
unsigned int elts_tail; /* First element awaiting completion. */
unsigned int elts_comp; /* Number of completion requests. */
+ unsigned int elts_comp_cd; /* Countdown for next completion request. */
+ unsigned int elts_comp_cd_init; /* Initial value for countdown. */
struct mlx4_txq_stats stats; /* TX queue counters. */
linear_t (*elts_linear)[]; /* Linearized buffers. */
struct ibv_mr *mr_linear; /* Memory Region for linearized buffers. */
@@ -810,6 +812,12 @@ txq_alloc_elts(struct txq *txq, unsigned int elts_n)
txq->elts_head = 0;
txq->elts_tail = 0;
txq->elts_comp = 0;
+ /* Request send completion every MLX4_PMD_TX_PER_COMP_REQ packets or
+ * at least 4 times per ring. */
+ txq->elts_comp_cd_init =
+ ((MLX4_PMD_TX_PER_COMP_REQ < (elts_n / 4)) ?
+ MLX4_PMD_TX_PER_COMP_REQ : (elts_n / 4));
+ txq->elts_comp_cd = txq->elts_comp_cd_init;
txq->elts_linear = elts_linear;
txq->mr_linear = mr_linear;
assert(ret == 0);
@@ -896,9 +904,9 @@ txq_cleanup(struct txq *txq)
* Manage TX completions.
*
* When sending a burst, mlx4_tx_burst() posts several WRs.
- * To improve performance, a completion event is only required for the last of
- * them. Doing so discards completion information for other WRs, but this
- * information would not be used anyway.
+ * To improve performance, a completion event is only required once every
+ * MLX4_PMD_TX_PER_COMP_REQ sends. Doing so discards completion information
+ * for other WRs, but this information would not be used anyway.
*
* @param txq
* Pointer to TX queue structure.
@@ -910,7 +918,7 @@ static int
txq_complete(struct txq *txq)
{
unsigned int elts_comp = txq->elts_comp;
- unsigned int elts_tail;
+ unsigned int elts_tail = txq->elts_tail;
const unsigned int elts_n = txq->elts_n;
struct ibv_wc wcs[elts_comp];
int wcs_n;
@@ -932,17 +940,12 @@ txq_complete(struct txq *txq)
elts_comp -= wcs_n;
assert(elts_comp <= txq->elts_comp);
/*
- * Work Completion ID contains the associated element index in
- * (*txq->elts)[]. Since WCs are returned in order, we only need to
- * look at the last WC to clear older Work Requests.
- *
* Assume WC status is successful as nothing can be done about it
* anyway.
*/
- elts_tail = WR_ID(wcs[wcs_n - 1].wr_id).id;
- /* Consume the last WC. */
- if (++elts_tail >= elts_n)
- elts_tail = 0;
+ elts_tail += wcs_n * txq->elts_comp_cd_init;
+ if (elts_tail >= elts_n)
+ elts_tail -= elts_n;
txq->elts_tail = elts_tail;
txq->elts_comp = elts_comp;
return 0;
@@ -1062,10 +1065,13 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
unsigned int elts_head = txq->elts_head;
const unsigned int elts_tail = txq->elts_tail;
const unsigned int elts_n = txq->elts_n;
+ unsigned int elts_comp_cd = txq->elts_comp_cd;
+ unsigned int elts_comp = 0;
unsigned int i;
unsigned int max;
int err;
+ assert(elts_comp_cd != 0);
txq_complete(txq);
max = (elts_n - (elts_head - elts_tail));
if (max > elts_n)
@@ -1243,6 +1249,12 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
else
#endif
wr->send_flags = 0;
+ /* Request TX completion. */
+ if (unlikely(--elts_comp_cd == 0)) {
+ elts_comp_cd = txq->elts_comp_cd_init;
+ ++elts_comp;
+ wr->send_flags |= IBV_SEND_SIGNALED;
+ }
if (++elts_head >= elts_n)
elts_head = 0;
#ifdef MLX4_PMD_SOFT_COUNTERS
@@ -1259,14 +1271,11 @@ stop:
txq->stats.opackets += i;
#endif
*wr_next = NULL;
- /* The last WR is the only one asking for a completion event. */
- containerof(wr_next, mlx4_send_wr_t, next)->
- send_flags |= IBV_SEND_SIGNALED;
err = mlx4_post_send(txq->qp, head.next, &bad_wr);
if (unlikely(err)) {
unsigned int unsent = 0;
- /* An error occurred, completion event is lost. Fix counters. */
+ /* An error occurred, fix counters. */
while (bad_wr != NULL) {
struct txq_elt *elt =
containerof(bad_wr, struct txq_elt, wr);
@@ -1285,6 +1294,14 @@ stop:
txq->stats.obytes -= wr->sg_list[j].length;
#endif
++unsent;
+ if (wr->send_flags & IBV_SEND_SIGNALED) {
+ assert(elts_comp != 0);
+ --elts_comp;
+ }
+ if (elts_comp_cd == txq->elts_comp_cd_init)
+ elts_comp_cd = 1;
+ else
+ ++elts_comp_cd;
#ifndef NDEBUG
/* For assert(). */
for (j = 0; ((int)j < wr->num_sge); ++j) {
@@ -1310,9 +1327,10 @@ stop:
DEBUG("%p: mlx4_post_send() failed, %u unprocessed WRs: %s",
(void *)txq, unsent,
((err <= -1) ? "Internal error" : strerror(err)));
- } else
- ++txq->elts_comp;
+ }
txq->elts_head = elts_head;
+ txq->elts_comp += elts_comp;
+ txq->elts_comp_cd = elts_comp_cd;
return i;
}
diff --git a/drivers/net/mlx4/mlx4.h b/drivers/net/mlx4/mlx4.h
index 433aa3b..151c34b 100644
--- a/drivers/net/mlx4/mlx4.h
+++ b/drivers/net/mlx4/mlx4.h
@@ -51,6 +51,9 @@
/* Maximum number of simultaneous VLAN filters supported. See above. */
#define MLX4_MAX_VLAN_IDS 127
+/* Request send completion once in every 64 sends, might be less. */
+#define MLX4_PMD_TX_PER_COMP_REQ 64
+
/* Maximum number of Scatter/Gather Elements per Work Request. */
#ifndef MLX4_PMD_SGE_WR_N
#define MLX4_PMD_SGE_WR_N 4
--
2.1.0
^ permalink raw reply [flat|nested] 42+ messages in thread
* [dpdk-dev] [PATCH v2 15/23] mlx4: use MOFED 3.0 fast verbs interface for TX operations
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 00/23] mlx4: MOFED 3.0 support, bugfixes and enhancements Adrien Mazarguil
` (13 preceding siblings ...)
2015-06-30 9:28 ` [dpdk-dev] [PATCH v2 14/23] mlx4: improve performance by requesting TX completion events less often Adrien Mazarguil
@ 2015-06-30 9:28 ` Adrien Mazarguil
2015-06-30 9:28 ` [dpdk-dev] [PATCH v2 16/23] mlx4: move scattered TX processing to helper function Adrien Mazarguil
` (8 subsequent siblings)
23 siblings, 0 replies; 42+ messages in thread
From: Adrien Mazarguil @ 2015-06-30 9:28 UTC (permalink / raw)
To: dev; +Cc: Alex Rosenbaum
The "raw" post send interface was experimental and has been deprecated. This
commit replaces it with a new low level interface that dissociates post and
flush (doorbell) operations for improved QP performance.
The CQ polling function is updated as well.
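As an illustration of the post/flush split (not part of the patch, and using mock function pointers instead of the real struct ibv_exp_qp_burst_family callbacks), a toy model could look like this:

#include <stdio.h>

/* Toy stand-in for the burst interface: operations are buffered by
 * send_pending() and only made visible to the HW by send_flush(), which
 * rings the doorbell once for the whole burst. */
struct burst_if {
        int (*send_pending)(void *qp, const void *addr, unsigned int len);
        int (*send_flush)(void *qp);
};

static int
mock_pending(void *qp, const void *addr, unsigned int len)
{
        (void)qp;
        printf("queued buffer %p (%u bytes)\n", (void *)addr, len);
        return 0;
}

static int
mock_flush(void *qp)
{
        (void)qp;
        printf("doorbell rung once for the whole burst\n");
        return 0;
}

int
main(void)
{
        struct burst_if qp_if = { mock_pending, mock_flush };
        char bufs[4][64];
        unsigned int i;

        for (i = 0; i != 4; ++i)
                qp_if.send_pending(NULL, bufs[i], sizeof(bufs[i]));
        /* A single flush replaces the per-burst ibv_post_send() call. */
        qp_if.send_flush(NULL);
        return 0;
}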
Signed-off-by: Alex Rosenbaum <Alexr@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
drivers/net/mlx4/Makefile | 4 --
drivers/net/mlx4/mlx4.c | 167 +++++++++++++++++++++++-----------------------
2 files changed, 85 insertions(+), 86 deletions(-)
diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index ce1f2b0..fd74dc8 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -109,10 +109,6 @@ mlx4_autoconf.h: $(RTE_SDK)/scripts/auto-config-h.sh
infiniband/verbs.h \
enum IBV_EXP_DEVICE_ATTR_INLINE_RECV_SZ $(AUTOCONF_OUTPUT)
$Q sh -- '$<' '$@' \
- SEND_RAW_WR_SUPPORT \
- infiniband/verbs.h \
- type 'struct ibv_send_wr_raw' $(AUTOCONF_OUTPUT)
- $Q sh -- '$<' '$@' \
HAVE_EXP_QUERY_DEVICE \
infiniband/verbs.h \
type 'struct ibv_exp_device_attr' $(AUTOCONF_OUTPUT)
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index f76f415..3dff64d 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -139,15 +139,6 @@ static inline void wr_id_t_check(void)
(void)wr_id_t_check;
}
-/* If raw send operations are available, use them since they are faster. */
-#ifdef SEND_RAW_WR_SUPPORT
-typedef struct ibv_send_wr_raw mlx4_send_wr_t;
-#define mlx4_post_send ibv_post_send_raw
-#else
-typedef struct ibv_send_wr mlx4_send_wr_t;
-#define mlx4_post_send ibv_post_send
-#endif
-
struct mlx4_rxq_stats {
unsigned int idx; /**< Mapping index. */
#ifdef MLX4_PMD_SOFT_COUNTERS
@@ -212,7 +203,7 @@ struct rxq {
/* TX element. */
struct txq_elt {
- mlx4_send_wr_t wr; /* Work Request. */
+ struct ibv_send_wr wr; /* Work Request. */
struct ibv_sge sges[MLX4_PMD_SGE_WR_N]; /* Scatter/Gather Elements. */
/* mbuf pointer is derived from WR_ID(wr.wr_id).offset. */
};
@@ -235,6 +226,8 @@ struct txq {
} mp2mr[MLX4_PMD_TX_MP_CACHE]; /* MP to MR translation table. */
struct ibv_cq *cq; /* Completion Queue. */
struct ibv_qp *qp; /* Queue Pair. */
+ struct ibv_exp_qp_burst_family *if_qp; /* QP burst interface. */
+ struct ibv_exp_cq_family *if_cq; /* CQ interface. */
#if MLX4_PMD_MAX_INLINE > 0
uint32_t max_inline; /* Max inline send size <= MLX4_PMD_MAX_INLINE. */
#endif
@@ -797,7 +790,7 @@ txq_alloc_elts(struct txq *txq, unsigned int elts_n)
}
for (i = 0; (i != elts_n); ++i) {
struct txq_elt *elt = &(*elts)[i];
- mlx4_send_wr_t *wr = &elt->wr;
+ struct ibv_send_wr *wr = &elt->wr;
/* Configure WR. */
WR_ID(wr->wr_id).id = i;
@@ -883,10 +876,33 @@ txq_free_elts(struct txq *txq)
static void
txq_cleanup(struct txq *txq)
{
+ struct ibv_exp_release_intf_params params;
size_t i;
DEBUG("cleaning up %p", (void *)txq);
txq_free_elts(txq);
+ if (txq->if_qp != NULL) {
+ assert(txq->priv != NULL);
+ assert(txq->priv->ctx != NULL);
+ assert(txq->qp != NULL);
+ params = (struct ibv_exp_release_intf_params){
+ .comp_mask = 0,
+ };
+ claim_zero(ibv_exp_release_intf(txq->priv->ctx,
+ txq->if_qp,
+ &params));
+ }
+ if (txq->if_cq != NULL) {
+ assert(txq->priv != NULL);
+ assert(txq->priv->ctx != NULL);
+ assert(txq->cq != NULL);
+ params = (struct ibv_exp_release_intf_params){
+ .comp_mask = 0,
+ };
+ claim_zero(ibv_exp_release_intf(txq->priv->ctx,
+ txq->if_cq,
+ &params));
+ }
if (txq->qp != NULL)
claim_zero(ibv_destroy_qp(txq->qp));
if (txq->cq != NULL)
@@ -920,7 +936,6 @@ txq_complete(struct txq *txq)
unsigned int elts_comp = txq->elts_comp;
unsigned int elts_tail = txq->elts_tail;
const unsigned int elts_n = txq->elts_n;
- struct ibv_wc wcs[elts_comp];
int wcs_n;
if (unlikely(elts_comp == 0))
@@ -929,7 +944,7 @@ txq_complete(struct txq *txq)
DEBUG("%p: processing %u work requests completions",
(void *)txq, elts_comp);
#endif
- wcs_n = ibv_poll_cq(txq->cq, elts_comp, wcs);
+ wcs_n = txq->if_cq->poll_cnt(txq->cq, elts_comp);
if (unlikely(wcs_n == 0))
return 0;
if (unlikely(wcs_n < 0)) {
@@ -1059,9 +1074,8 @@ static uint16_t
mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
{
struct txq *txq = (struct txq *)dpdk_txq;
- mlx4_send_wr_t head;
- mlx4_send_wr_t **wr_next = &head.next;
- mlx4_send_wr_t *bad_wr;
+ struct ibv_send_wr head;
+ struct ibv_send_wr **wr_next = &head.next;
unsigned int elts_head = txq->elts_head;
const unsigned int elts_tail = txq->elts_tail;
const unsigned int elts_n = txq->elts_n;
@@ -1087,13 +1101,14 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
for (i = 0; (i != max); ++i) {
struct rte_mbuf *buf = pkts[i];
struct txq_elt *elt = &(*txq->elts)[elts_head];
- mlx4_send_wr_t *wr = &elt->wr;
+ struct ibv_send_wr *wr = &elt->wr;
unsigned int segs = NB_SEGS(buf);
-#if (MLX4_PMD_MAX_INLINE > 0) || defined(MLX4_PMD_SOFT_COUNTERS)
+#ifdef MLX4_PMD_SOFT_COUNTERS
unsigned int sent_size = 0;
#endif
unsigned int j;
int linearize = 0;
+ uint32_t send_flags = 0;
/* Clean up old buffer. */
if (likely(WR_ID(wr->wr_id).offset != 0)) {
@@ -1179,7 +1194,7 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
(uintptr_t)sge->addr);
sge->length = DATA_LEN(buf);
sge->lkey = lkey;
-#if (MLX4_PMD_MAX_INLINE > 0) || defined(MLX4_PMD_SOFT_COUNTERS)
+#ifdef MLX4_PMD_SOFT_COUNTERS
sent_size += sge->length;
#endif
buf = NEXT(buf);
@@ -1236,24 +1251,19 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
sge->addr = (uintptr_t)&(*linear)[0];
sge->length = size;
sge->lkey = txq->mr_linear->lkey;
-#if (MLX4_PMD_MAX_INLINE > 0) || defined(MLX4_PMD_SOFT_COUNTERS)
+#ifdef MLX4_PMD_SOFT_COUNTERS
sent_size += size;
#endif
}
/* Link WRs together for ibv_post_send(). */
*wr_next = wr;
wr_next = &wr->next;
-#if MLX4_PMD_MAX_INLINE > 0
- if (sent_size <= txq->max_inline)
- wr->send_flags = IBV_SEND_INLINE;
- else
-#endif
- wr->send_flags = 0;
+ assert(wr->send_flags == 0);
/* Request TX completion. */
if (unlikely(--elts_comp_cd == 0)) {
elts_comp_cd = txq->elts_comp_cd_init;
++elts_comp;
- wr->send_flags |= IBV_SEND_SIGNALED;
+ send_flags |= IBV_EXP_QP_BURST_SIGNALED;
}
if (++elts_head >= elts_n)
elts_head = 0;
@@ -1261,6 +1271,24 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
/* Increment sent bytes counter. */
txq->stats.obytes += sent_size;
#endif
+ /* Put SG list into send queue and ask for completion event. */
+#if MLX4_PMD_MAX_INLINE > 0
+ if ((segs == 1) &&
+ (elt->sges[0].length <= txq->max_inline))
+ err = txq->if_qp->send_pending_inline
+ (txq->qp,
+ (void *)(uintptr_t)elt->sges[0].addr,
+ elt->sges[0].length,
+ send_flags);
+ else
+#endif
+ err = txq->if_qp->send_pending_sg_list
+ (txq->qp,
+ elt->sges,
+ segs,
+ send_flags);
+ if (unlikely(err))
+ goto stop;
}
stop:
/* Take a shortcut if nothing must be sent. */
@@ -1271,62 +1299,13 @@ stop:
txq->stats.opackets += i;
#endif
*wr_next = NULL;
- err = mlx4_post_send(txq->qp, head.next, &bad_wr);
+ /* Ring QP doorbell. */
+ err = txq->if_qp->send_flush(txq->qp);
if (unlikely(err)) {
- unsigned int unsent = 0;
-
- /* An error occurred, fix counters. */
- while (bad_wr != NULL) {
- struct txq_elt *elt =
- containerof(bad_wr, struct txq_elt, wr);
- mlx4_send_wr_t *wr = &elt->wr;
- mlx4_send_wr_t *next = wr->next;
-#if defined(MLX4_PMD_SOFT_COUNTERS) || !defined(NDEBUG)
- unsigned int j;
-#endif
-
- assert(wr == bad_wr);
- /* Clean up TX element without freeing it, caller
- * should take care of this. */
- WR_ID(elt->wr.wr_id).offset = 0;
-#ifdef MLX4_PMD_SOFT_COUNTERS
- for (j = 0; ((int)j < wr->num_sge); ++j)
- txq->stats.obytes -= wr->sg_list[j].length;
-#endif
- ++unsent;
- if (wr->send_flags & IBV_SEND_SIGNALED) {
- assert(elts_comp != 0);
- --elts_comp;
- }
- if (elts_comp_cd == txq->elts_comp_cd_init)
- elts_comp_cd = 1;
- else
- ++elts_comp_cd;
-#ifndef NDEBUG
- /* For assert(). */
- for (j = 0; ((int)j < wr->num_sge); ++j) {
- elt->sges[j].addr = 0;
- elt->sges[j].length = 0;
- elt->sges[j].lkey = 0;
- }
- wr->next = NULL;
- wr->num_sge = 0;
-#endif
- bad_wr = next;
- }
-#ifdef MLX4_PMD_SOFT_COUNTERS
- txq->stats.opackets -= unsent;
-#endif
- assert(i >= unsent);
- i -= unsent;
- /* "Unsend" remaining packets. */
- elts_head -= unsent;
- if (elts_head >= elts_n)
- elts_head += elts_n;
- assert(elts_head < elts_n);
- DEBUG("%p: mlx4_post_send() failed, %u unprocessed WRs: %s",
- (void *)txq, unsent,
- ((err <= -1) ? "Internal error" : strerror(err)));
+ /* A nonzero value is not supposed to be returned.
+ * Nothing can be done about it. */
+ DEBUG("%p: send_flush() failed with error %d",
+ (void *)txq, err);
}
txq->elts_head = elts_head;
txq->elts_comp += elts_comp;
@@ -1361,9 +1340,11 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
.socket = socket
};
union {
+ struct ibv_exp_query_intf_params params;
struct ibv_qp_init_attr init;
struct ibv_exp_qp_attr mod;
} attr;
+ enum ibv_exp_query_intf_status status;
int ret = 0;
(void)conf; /* Thresholds configuration (ignored). */
@@ -1455,6 +1436,28 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
(void *)dev, strerror(ret));
goto error;
}
+ attr.params = (struct ibv_exp_query_intf_params){
+ .intf_scope = IBV_EXP_INTF_GLOBAL,
+ .intf = IBV_EXP_INTF_CQ,
+ .obj = tmpl.cq,
+ };
+ tmpl.if_cq = ibv_exp_query_intf(priv->ctx, &attr.params, &status);
+ if (tmpl.if_cq == NULL) {
+ ERROR("%p: CQ interface family query failed with status %d",
+ (void *)dev, status);
+ goto error;
+ }
+ attr.params = (struct ibv_exp_query_intf_params){
+ .intf_scope = IBV_EXP_INTF_GLOBAL,
+ .intf = IBV_EXP_INTF_QP_BURST,
+ .obj = tmpl.qp,
+ };
+ tmpl.if_qp = ibv_exp_query_intf(priv->ctx, &attr.params, &status);
+ if (tmpl.if_qp == NULL) {
+ ERROR("%p: QP interface family query failed with status %d",
+ (void *)dev, status);
+ goto error;
+ }
/* Clean up txq in case we're reinitializing it. */
DEBUG("%p: cleaning-up old txq just in case", (void *)txq);
txq_cleanup(txq);
--
2.1.0
^ permalink raw reply [flat|nested] 42+ messages in thread
* [dpdk-dev] [PATCH v2 16/23] mlx4: move scattered TX processing to helper function
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 00/23] mlx4: MOFED 3.0 support, bugfixes and enhancements Adrien Mazarguil
` (14 preceding siblings ...)
2015-06-30 9:28 ` [dpdk-dev] [PATCH v2 15/23] mlx4: use MOFED 3.0 fast verbs interface for TX operations Adrien Mazarguil
@ 2015-06-30 9:28 ` Adrien Mazarguil
2015-06-30 9:28 ` [dpdk-dev] [PATCH v2 17/23] mlx4: shrink TX queue elements for better performance Adrien Mazarguil
` (7 subsequent siblings)
23 siblings, 0 replies; 42+ messages in thread
From: Adrien Mazarguil @ 2015-06-30 9:28 UTC (permalink / raw)
To: dev; +Cc: Alex Rosenbaum
This commit makes scattered TX support entirely optional by moving it to a
separate function that is only available when MLX4_PMD_SGE_WR_N > 1.
This improves performance when scattered support is not needed.
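For illustration only, a standalone sketch of the compile-time split, with hypothetical tx_one()/tx_one_sg() helpers standing in for the real TX path:

#include <stdio.h>

#ifndef MLX4_PMD_SGE_WR_N
#define MLX4_PMD_SGE_WR_N 4 /* hypothetical default */
#endif

#if MLX4_PMD_SGE_WR_N > 1
/* Heavier multi-segment path, compiled only when needed. */
static unsigned int
tx_one_sg(unsigned int segs)
{
        printf("scattered path: %u segments\n", segs);
        return segs;
}
#endif /* MLX4_PMD_SGE_WR_N > 1 */

static unsigned int
tx_one(unsigned int segs)
{
        if (segs == 1) {
                printf("fast single-segment path\n");
                return 1;
        }
#if MLX4_PMD_SGE_WR_N > 1
        return tx_one_sg(segs);
#else
        printf("scattered buffers not compiled in, dropping\n");
        return 0;
#endif
}

int
main(void)
{
        tx_one(1);
        tx_one(3);
        return 0;
}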
Signed-off-by: Alex Rosenbaum <Alexr@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
drivers/net/mlx4/mlx4.c | 248 +++++++++++++++++++++++++++++++++---------------
1 file changed, 170 insertions(+), 78 deletions(-)
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 3dff64d..acf1290 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -1025,6 +1025,8 @@ txq_mp2mr(struct txq *txq, struct rte_mempool *mp)
return txq->mp2mr[i].lkey;
}
+#if MLX4_PMD_SGE_WR_N > 1
+
/**
* Copy scattered mbuf contents to a single linear buffer.
*
@@ -1058,6 +1060,146 @@ linearize_mbuf(linear_t *linear, struct rte_mbuf *buf)
}
/**
+ * Handle scattered buffers for mlx4_tx_burst().
+ *
+ * @param txq
+ * TX queue structure.
+ * @param segs
+ * Number of segments in buf.
+ * @param elt
+ * TX queue element to fill.
+ * @param[in] buf
+ * Buffer to process.
+ * @param elts_head
+ * Index of the linear buffer to use if necessary (normally txq->elts_head).
+ *
+ * @return
+ * Processed packet size in bytes or (unsigned int)-1 in case of failure.
+ */
+static unsigned int
+tx_burst_sg(struct txq *txq, unsigned int segs, struct txq_elt *elt,
+ struct rte_mbuf *buf, unsigned int elts_head)
+{
+ struct ibv_send_wr *wr = &elt->wr;
+ unsigned int sent_size = 0;
+ unsigned int j;
+ int linearize = 0;
+
+ /* When there are too many segments, extra segments are
+ * linearized in the last SGE. */
+ if (unlikely(segs > elemof(elt->sges))) {
+ segs = (elemof(elt->sges) - 1);
+ linearize = 1;
+ }
+ /* Set WR fields. */
+ assert((rte_pktmbuf_mtod(buf, uintptr_t) -
+ (uintptr_t)buf) <= 0xffff);
+ WR_ID(wr->wr_id).offset =
+ (rte_pktmbuf_mtod(buf, uintptr_t) -
+ (uintptr_t)buf);
+ wr->num_sge = segs;
+ /* Register segments as SGEs. */
+ for (j = 0; (j != segs); ++j) {
+ struct ibv_sge *sge = &elt->sges[j];
+ uint32_t lkey;
+
+ /* Retrieve Memory Region key for this memory pool. */
+ lkey = txq_mp2mr(txq, buf->pool);
+ if (unlikely(lkey == (uint32_t)-1)) {
+ /* MR does not exist. */
+ DEBUG("%p: unable to get MP <-> MR association",
+ (void *)txq);
+ /* Clean up TX element. */
+ WR_ID(elt->wr.wr_id).offset = 0;
+#ifndef NDEBUG
+ /* For assert(). */
+ while (j) {
+ --j;
+ --sge;
+ sge->addr = 0;
+ sge->length = 0;
+ sge->lkey = 0;
+ }
+ wr->num_sge = 0;
+#endif
+ goto stop;
+ }
+ /* Sanity checks, only relevant with debugging enabled. */
+ assert(sge->addr == 0);
+ assert(sge->length == 0);
+ assert(sge->lkey == 0);
+ /* Update SGE. */
+ sge->addr = rte_pktmbuf_mtod(buf, uintptr_t);
+ if (txq->priv->vf)
+ rte_prefetch0((volatile void *)
+ (uintptr_t)sge->addr);
+ sge->length = DATA_LEN(buf);
+ sge->lkey = lkey;
+ sent_size += sge->length;
+ buf = NEXT(buf);
+ }
+ /* If buf is not NULL here and is not going to be linearized,
+ * nb_segs is not valid. */
+ assert(j == segs);
+ assert((buf == NULL) || (linearize));
+ /* Linearize extra segments. */
+ if (linearize) {
+ struct ibv_sge *sge = &elt->sges[segs];
+ linear_t *linear = &(*txq->elts_linear)[elts_head];
+ unsigned int size = linearize_mbuf(linear, buf);
+
+ assert(segs == (elemof(elt->sges) - 1));
+ if (size == 0) {
+ /* Invalid packet. */
+ DEBUG("%p: packet too large to be linearized.",
+ (void *)txq);
+ /* Clean up TX element. */
+ WR_ID(elt->wr.wr_id).offset = 0;
+#ifndef NDEBUG
+ /* For assert(). */
+ while (j) {
+ --j;
+ --sge;
+ sge->addr = 0;
+ sge->length = 0;
+ sge->lkey = 0;
+ }
+ wr->num_sge = 0;
+#endif
+ goto stop;
+ }
+ /* If MLX4_PMD_SGE_WR_N is 1, free mbuf immediately
+ * and clear offset from WR ID. */
+ if (elemof(elt->sges) == 1) {
+ do {
+ struct rte_mbuf *next = NEXT(buf);
+
+ rte_pktmbuf_free_seg(buf);
+ buf = next;
+ } while (buf != NULL);
+ WR_ID(wr->wr_id).offset = 0;
+ }
+ /* Set WR fields and fill SGE with linear buffer. */
+ ++wr->num_sge;
+ /* Sanity checks, only relevant with debugging
+ * enabled. */
+ assert(sge->addr == 0);
+ assert(sge->length == 0);
+ assert(sge->lkey == 0);
+ /* Update SGE. */
+ sge->addr = (uintptr_t)&(*linear)[0];
+ sge->length = size;
+ sge->lkey = txq->mr_linear->lkey;
+ sent_size += size;
+ }
+ return sent_size;
+stop:
+ return -1;
+}
+
+#endif /* MLX4_PMD_SGE_WR_N > 1 */
+
+/**
* DPDK callback for TX.
*
* @param dpdk_txq
@@ -1106,8 +1248,9 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
#ifdef MLX4_PMD_SOFT_COUNTERS
unsigned int sent_size = 0;
#endif
+#ifndef NDEBUG
unsigned int j;
- int linearize = 0;
+#endif
uint32_t send_flags = 0;
/* Clean up old buffer. */
@@ -1143,24 +1286,19 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
assert(wr->sg_list == &elt->sges[0]);
assert(wr->num_sge == 0);
assert(wr->opcode == IBV_WR_SEND);
- /* When there are too many segments, extra segments are
- * linearized in the last SGE. */
- if (unlikely(segs > elemof(elt->sges))) {
- segs = (elemof(elt->sges) - 1);
- linearize = 1;
- }
- /* Set WR fields. */
- assert((rte_pktmbuf_mtod(buf, uintptr_t) -
- (uintptr_t)buf) <= 0xffff);
- WR_ID(wr->wr_id).offset =
- (rte_pktmbuf_mtod(buf, uintptr_t) -
- (uintptr_t)buf);
- wr->num_sge = segs;
- /* Register segments as SGEs. */
- for (j = 0; (j != segs); ++j) {
- struct ibv_sge *sge = &elt->sges[j];
+ if (likely(segs == 1)) {
+ struct ibv_sge *sge = &elt->sges[0];
uint32_t lkey;
+ /* Set WR fields. */
+ assert((rte_pktmbuf_mtod(buf, uintptr_t) -
+ (uintptr_t)buf) <= 0xffff);
+ WR_ID(wr->wr_id).offset =
+ (rte_pktmbuf_mtod(buf, uintptr_t) -
+ (uintptr_t)buf);
+ wr->num_sge = segs;
+ /* Register segment as SGE. */
+ sge = &elt->sges[0];
/* Retrieve Memory Region key for this memory pool. */
lkey = txq_mp2mr(txq, buf->pool);
if (unlikely(lkey == (uint32_t)-1)) {
@@ -1171,13 +1309,9 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
WR_ID(elt->wr.wr_id).offset = 0;
#ifndef NDEBUG
/* For assert(). */
- while (j) {
- --j;
- --sge;
- sge->addr = 0;
- sge->length = 0;
- sge->lkey = 0;
- }
+ sge->addr = 0;
+ sge->length = 0;
+ sge->lkey = 0;
wr->num_sge = 0;
#endif
goto stop;
@@ -1197,63 +1331,21 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
#ifdef MLX4_PMD_SOFT_COUNTERS
sent_size += sge->length;
#endif
- buf = NEXT(buf);
- }
- /* If buf is not NULL here and is not going to be linearized,
- * nb_segs is not valid. */
- assert(j == segs);
- assert((buf == NULL) || (linearize));
- /* Linearize extra segments. */
- if (linearize) {
- struct ibv_sge *sge = &elt->sges[segs];
- linear_t *linear = &(*txq->elts_linear)[elts_head];
- unsigned int size = linearize_mbuf(linear, buf);
-
- assert(segs == (elemof(elt->sges) - 1));
- if (size == 0) {
- /* Invalid packet. */
- DEBUG("%p: packet too large to be linearized.",
- (void *)txq);
- /* Clean up TX element. */
- WR_ID(elt->wr.wr_id).offset = 0;
-#ifndef NDEBUG
- /* For assert(). */
- while (j) {
- --j;
- --sge;
- sge->addr = 0;
- sge->length = 0;
- sge->lkey = 0;
- }
- wr->num_sge = 0;
-#endif
- goto stop;
- }
- /* If MLX4_PMD_SGE_WR_N is 1, free mbuf immediately
- * and clear offset from WR ID. */
- if (elemof(elt->sges) == 1) {
- do {
- struct rte_mbuf *next = NEXT(buf);
+ } else {
+#if MLX4_PMD_SGE_WR_N > 1
+ unsigned int ret;
- rte_pktmbuf_free_seg(buf);
- buf = next;
- } while (buf != NULL);
- WR_ID(wr->wr_id).offset = 0;
- }
- /* Set WR fields and fill SGE with linear buffer. */
- ++wr->num_sge;
- /* Sanity checks, only relevant with debugging
- * enabled. */
- assert(sge->addr == 0);
- assert(sge->length == 0);
- assert(sge->lkey == 0);
- /* Update SGE. */
- sge->addr = (uintptr_t)&(*linear)[0];
- sge->length = size;
- sge->lkey = txq->mr_linear->lkey;
+ ret = tx_burst_sg(txq, segs, elt, buf, elts_head);
+ if (ret == (unsigned int)-1)
+ goto stop;
#ifdef MLX4_PMD_SOFT_COUNTERS
- sent_size += size;
+ sent_size += ret;
#endif
+#else /* MLX4_PMD_SGE_WR_N > 1 */
+ DEBUG("%p: TX scattered buffers support not"
+ " compiled in", (void *)txq);
+ goto stop;
+#endif /* MLX4_PMD_SGE_WR_N > 1 */
}
/* Link WRs together for ibv_post_send(). */
*wr_next = wr;
--
2.1.0
^ permalink raw reply [flat|nested] 42+ messages in thread
* [dpdk-dev] [PATCH v2 17/23] mlx4: shrink TX queue elements for better performance
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 00/23] mlx4: MOFED 3.0 support, bugfixes and enhancements Adrien Mazarguil
` (15 preceding siblings ...)
2015-06-30 9:28 ` [dpdk-dev] [PATCH v2 16/23] mlx4: move scattered TX processing to helper function Adrien Mazarguil
@ 2015-06-30 9:28 ` Adrien Mazarguil
2015-06-30 9:28 ` [dpdk-dev] [PATCH v2 18/23] mlx4: prefetch completed TX mbufs before releasing them Adrien Mazarguil
` (6 subsequent siblings)
23 siblings, 0 replies; 42+ messages in thread
From: Adrien Mazarguil @ 2015-06-30 9:28 UTC (permalink / raw)
To: dev; +Cc: Alex Rosenbaum
From: Alex Rosenbaum <Alexr@mellanox.com>
TX queue elements (struct txq_elt) contain WR and SGE structures required by
ibv_post_send(). This commit replaces them with a single pointer to the
related TX mbuf considering that:
- There is no need to keep these structures around forever since the
hardware doesn't access them after ibv_post_send() and send_pending*()
have returned.
- The TX queue index stored in the WR ID field is not used for completions
anymore since they use a separate counter (elts_comp_cd).
- The WR structure itself was only useful for ibv_post_send(); it is
currently only used to store the mbuf data address and an offset to the
mbuf structure in the WR ID field. send_pending*() callbacks only require
SGEs or buffer pointers.
Therefore for single segment mbufs, send_pending() or send_pending_inline()
can be used directly without involving SGEs. For scattered mbufs, SGEs are
allocated on the stack and passed to send_pending_sg_list().
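A toy comparison of the two element layouts (not part of the patch; the mock_* structures and their sizes are illustrative only, not the real Verbs definitions):

#include <stdio.h>

/* Toy stand-ins for the Verbs structures (sizes are illustrative only). */
struct mock_sge { unsigned long addr; unsigned int length, lkey; };
struct mock_send_wr { unsigned long wr_id; void *next, *sg_list; int num_sge; };

/* Old element: full WR plus an SGE array kept per descriptor. */
struct txq_elt_old {
        struct mock_send_wr wr;
        struct mock_sge sges[4];
};

/* New element: only the pointer needed to free the mbuf later. */
struct txq_elt_new {
        void *buf; /* stands in for struct rte_mbuf * */
};

int
main(void)
{
        printf("old element: %zu bytes, new element: %zu bytes\n",
               sizeof(struct txq_elt_old), sizeof(struct txq_elt_new));
        return 0;
}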
Signed-off-by: Alex Rosenbaum <Alexr@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
drivers/net/mlx4/mlx4.c | 244 +++++++++++++++++-------------------------------
1 file changed, 84 insertions(+), 160 deletions(-)
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index acf1290..f251eb4 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -203,9 +203,7 @@ struct rxq {
/* TX element. */
struct txq_elt {
- struct ibv_send_wr wr; /* Work Request. */
- struct ibv_sge sges[MLX4_PMD_SGE_WR_N]; /* Scatter/Gather Elements. */
- /* mbuf pointer is derived from WR_ID(wr.wr_id).offset. */
+ struct rte_mbuf *buf;
};
/* Linear buffer type. It is used when transmitting buffers with too many
@@ -790,14 +788,8 @@ txq_alloc_elts(struct txq *txq, unsigned int elts_n)
}
for (i = 0; (i != elts_n); ++i) {
struct txq_elt *elt = &(*elts)[i];
- struct ibv_send_wr *wr = &elt->wr;
- /* Configure WR. */
- WR_ID(wr->wr_id).id = i;
- WR_ID(wr->wr_id).offset = 0;
- wr->sg_list = &elt->sges[0];
- wr->opcode = IBV_WR_SEND;
- /* Other fields are updated during TX. */
+ elt->buf = NULL;
}
DEBUG("%p: allocated and configured %u WRs", (void *)txq, elts_n);
txq->elts_n = elts_n;
@@ -856,10 +848,9 @@ txq_free_elts(struct txq *txq)
for (i = 0; (i != elemof(*elts)); ++i) {
struct txq_elt *elt = &(*elts)[i];
- if (WR_ID(elt->wr.wr_id).offset == 0)
+ if (elt->buf == NULL)
continue;
- rte_pktmbuf_free((void *)((uintptr_t)elt->sges[0].addr -
- WR_ID(elt->wr.wr_id).offset));
+ rte_pktmbuf_free(elt->buf);
}
rte_free(elts);
}
@@ -1072,35 +1063,37 @@ linearize_mbuf(linear_t *linear, struct rte_mbuf *buf)
* Buffer to process.
* @param elts_head
* Index of the linear buffer to use if necessary (normally txq->elts_head).
+ * @param[out] sges
+ * Array filled with SGEs on success.
*
* @return
- * Processed packet size in bytes or (unsigned int)-1 in case of failure.
+ * A structure containing the processed packet size in bytes and the
+ * number of SGEs. Both fields are set to (unsigned int)-1 in case of
+ * failure.
*/
-static unsigned int
+static struct tx_burst_sg_ret {
+ unsigned int length;
+ unsigned int num;
+}
tx_burst_sg(struct txq *txq, unsigned int segs, struct txq_elt *elt,
- struct rte_mbuf *buf, unsigned int elts_head)
+ struct rte_mbuf *buf, unsigned int elts_head,
+ struct ibv_sge (*sges)[MLX4_PMD_SGE_WR_N])
{
- struct ibv_send_wr *wr = &elt->wr;
unsigned int sent_size = 0;
unsigned int j;
int linearize = 0;
/* When there are too many segments, extra segments are
* linearized in the last SGE. */
- if (unlikely(segs > elemof(elt->sges))) {
- segs = (elemof(elt->sges) - 1);
+ if (unlikely(segs > elemof(*sges))) {
+ segs = (elemof(*sges) - 1);
linearize = 1;
}
- /* Set WR fields. */
- assert((rte_pktmbuf_mtod(buf, uintptr_t) -
- (uintptr_t)buf) <= 0xffff);
- WR_ID(wr->wr_id).offset =
- (rte_pktmbuf_mtod(buf, uintptr_t) -
- (uintptr_t)buf);
- wr->num_sge = segs;
+ /* Update element. */
+ elt->buf = buf;
/* Register segments as SGEs. */
for (j = 0; (j != segs); ++j) {
- struct ibv_sge *sge = &elt->sges[j];
+ struct ibv_sge *sge = &(*sges)[j];
uint32_t lkey;
/* Retrieve Memory Region key for this memory pool. */
@@ -1110,24 +1103,9 @@ tx_burst_sg(struct txq *txq, unsigned int segs, struct txq_elt *elt,
DEBUG("%p: unable to get MP <-> MR association",
(void *)txq);
/* Clean up TX element. */
- WR_ID(elt->wr.wr_id).offset = 0;
-#ifndef NDEBUG
- /* For assert(). */
- while (j) {
- --j;
- --sge;
- sge->addr = 0;
- sge->length = 0;
- sge->lkey = 0;
- }
- wr->num_sge = 0;
-#endif
+ elt->buf = NULL;
goto stop;
}
- /* Sanity checks, only relevant with debugging enabled. */
- assert(sge->addr == 0);
- assert(sge->length == 0);
- assert(sge->lkey == 0);
/* Update SGE. */
sge->addr = rte_pktmbuf_mtod(buf, uintptr_t);
if (txq->priv->vf)
@@ -1144,57 +1122,44 @@ tx_burst_sg(struct txq *txq, unsigned int segs, struct txq_elt *elt,
assert((buf == NULL) || (linearize));
/* Linearize extra segments. */
if (linearize) {
- struct ibv_sge *sge = &elt->sges[segs];
+ struct ibv_sge *sge = &(*sges)[segs];
linear_t *linear = &(*txq->elts_linear)[elts_head];
unsigned int size = linearize_mbuf(linear, buf);
- assert(segs == (elemof(elt->sges) - 1));
+ assert(segs == (elemof(*sges) - 1));
if (size == 0) {
/* Invalid packet. */
DEBUG("%p: packet too large to be linearized.",
(void *)txq);
/* Clean up TX element. */
- WR_ID(elt->wr.wr_id).offset = 0;
-#ifndef NDEBUG
- /* For assert(). */
- while (j) {
- --j;
- --sge;
- sge->addr = 0;
- sge->length = 0;
- sge->lkey = 0;
- }
- wr->num_sge = 0;
-#endif
+ elt->buf = NULL;
goto stop;
}
- /* If MLX4_PMD_SGE_WR_N is 1, free mbuf immediately
- * and clear offset from WR ID. */
- if (elemof(elt->sges) == 1) {
+ /* If MLX4_PMD_SGE_WR_N is 1, free mbuf immediately. */
+ if (elemof(*sges) == 1) {
do {
struct rte_mbuf *next = NEXT(buf);
rte_pktmbuf_free_seg(buf);
buf = next;
} while (buf != NULL);
- WR_ID(wr->wr_id).offset = 0;
+ elt->buf = NULL;
}
- /* Set WR fields and fill SGE with linear buffer. */
- ++wr->num_sge;
- /* Sanity checks, only relevant with debugging
- * enabled. */
- assert(sge->addr == 0);
- assert(sge->length == 0);
- assert(sge->lkey == 0);
/* Update SGE. */
sge->addr = (uintptr_t)&(*linear)[0];
sge->length = size;
sge->lkey = txq->mr_linear->lkey;
sent_size += size;
}
- return sent_size;
+ return (struct tx_burst_sg_ret){
+ .length = sent_size,
+ .num = segs,
+ };
stop:
- return -1;
+ return (struct tx_burst_sg_ret){
+ .length = -1,
+ .num = -1,
+ };
}
#endif /* MLX4_PMD_SGE_WR_N > 1 */
@@ -1216,8 +1181,6 @@ static uint16_t
mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
{
struct txq *txq = (struct txq *)dpdk_txq;
- struct ibv_send_wr head;
- struct ibv_send_wr **wr_next = &head.next;
unsigned int elts_head = txq->elts_head;
const unsigned int elts_tail = txq->elts_tail;
const unsigned int elts_n = txq->elts_n;
@@ -1243,21 +1206,15 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
for (i = 0; (i != max); ++i) {
struct rte_mbuf *buf = pkts[i];
struct txq_elt *elt = &(*txq->elts)[elts_head];
- struct ibv_send_wr *wr = &elt->wr;
unsigned int segs = NB_SEGS(buf);
#ifdef MLX4_PMD_SOFT_COUNTERS
unsigned int sent_size = 0;
#endif
-#ifndef NDEBUG
- unsigned int j;
-#endif
uint32_t send_flags = 0;
/* Clean up old buffer. */
- if (likely(WR_ID(wr->wr_id).offset != 0)) {
- struct rte_mbuf *tmp = (void *)
- ((uintptr_t)elt->sges[0].addr -
- WR_ID(wr->wr_id).offset);
+ if (likely(elt->buf != NULL)) {
+ struct rte_mbuf *tmp = elt->buf;
/* Faster than rte_pktmbuf_free(). */
do {
@@ -1267,38 +1224,20 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
tmp = next;
} while (tmp != NULL);
}
-#ifndef NDEBUG
- /* For assert(). */
- WR_ID(wr->wr_id).offset = 0;
- for (j = 0; ((int)j < wr->num_sge); ++j) {
- elt->sges[j].addr = 0;
- elt->sges[j].length = 0;
- elt->sges[j].lkey = 0;
+ /* Request TX completion. */
+ if (unlikely(--elts_comp_cd == 0)) {
+ elts_comp_cd = txq->elts_comp_cd_init;
+ ++elts_comp;
+ send_flags |= IBV_EXP_QP_BURST_SIGNALED;
}
- wr->next = NULL;
- wr->num_sge = 0;
-#endif
- /* Sanity checks, most of which are only relevant with
- * debugging enabled. */
- assert(WR_ID(wr->wr_id).id == elts_head);
- assert(WR_ID(wr->wr_id).offset == 0);
- assert(wr->next == NULL);
- assert(wr->sg_list == &elt->sges[0]);
- assert(wr->num_sge == 0);
- assert(wr->opcode == IBV_WR_SEND);
if (likely(segs == 1)) {
- struct ibv_sge *sge = &elt->sges[0];
+ uintptr_t addr;
+ uint32_t length;
uint32_t lkey;
- /* Set WR fields. */
- assert((rte_pktmbuf_mtod(buf, uintptr_t) -
- (uintptr_t)buf) <= 0xffff);
- WR_ID(wr->wr_id).offset =
- (rte_pktmbuf_mtod(buf, uintptr_t) -
- (uintptr_t)buf);
- wr->num_sge = segs;
- /* Register segment as SGE. */
- sge = &elt->sges[0];
+ /* Retrieve buffer information. */
+ addr = rte_pktmbuf_mtod(buf, uintptr_t);
+ length = DATA_LEN(buf);
/* Retrieve Memory Region key for this memory pool. */
lkey = txq_mp2mr(txq, buf->pool);
if (unlikely(lkey == (uint32_t)-1)) {
@@ -1306,40 +1245,54 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
DEBUG("%p: unable to get MP <-> MR"
" association", (void *)txq);
/* Clean up TX element. */
- WR_ID(elt->wr.wr_id).offset = 0;
-#ifndef NDEBUG
- /* For assert(). */
- sge->addr = 0;
- sge->length = 0;
- sge->lkey = 0;
- wr->num_sge = 0;
-#endif
+ elt->buf = NULL;
goto stop;
}
- /* Sanity checks, only relevant with debugging
- * enabled. */
- assert(sge->addr == 0);
- assert(sge->length == 0);
- assert(sge->lkey == 0);
- /* Update SGE. */
- sge->addr = rte_pktmbuf_mtod(buf, uintptr_t);
+ /* Update element. */
+ elt->buf = buf;
if (txq->priv->vf)
rte_prefetch0((volatile void *)
- (uintptr_t)sge->addr);
- sge->length = DATA_LEN(buf);
- sge->lkey = lkey;
+ (uintptr_t)addr);
+ /* Put packet into send queue. */
+#if MLX4_PMD_MAX_INLINE > 0
+ if (length <= txq->max_inline)
+ err = txq->if_qp->send_pending_inline
+ (txq->qp,
+ (void *)addr,
+ length,
+ send_flags);
+ else
+#endif
+ err = txq->if_qp->send_pending
+ (txq->qp,
+ addr,
+ length,
+ lkey,
+ send_flags);
+ if (unlikely(err))
+ goto stop;
#ifdef MLX4_PMD_SOFT_COUNTERS
- sent_size += sge->length;
+ sent_size += length;
#endif
} else {
#if MLX4_PMD_SGE_WR_N > 1
- unsigned int ret;
+ struct ibv_sge sges[MLX4_PMD_SGE_WR_N];
+ struct tx_burst_sg_ret ret;
- ret = tx_burst_sg(txq, segs, elt, buf, elts_head);
- if (ret == (unsigned int)-1)
+ ret = tx_burst_sg(txq, segs, elt, buf, elts_head,
+ &sges);
+ if (ret.length == (unsigned int)-1)
+ goto stop;
+ /* Put SG list into send queue. */
+ err = txq->if_qp->send_pending_sg_list
+ (txq->qp,
+ sges,
+ ret.num,
+ send_flags);
+ if (unlikely(err))
goto stop;
#ifdef MLX4_PMD_SOFT_COUNTERS
- sent_size += ret;
+ sent_size += ret.length;
#endif
#else /* MLX4_PMD_SGE_WR_N > 1 */
DEBUG("%p: TX scattered buffers support not"
@@ -1347,40 +1300,12 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
goto stop;
#endif /* MLX4_PMD_SGE_WR_N > 1 */
}
- /* Link WRs together for ibv_post_send(). */
- *wr_next = wr;
- wr_next = &wr->next;
- assert(wr->send_flags == 0);
- /* Request TX completion. */
- if (unlikely(--elts_comp_cd == 0)) {
- elts_comp_cd = txq->elts_comp_cd_init;
- ++elts_comp;
- send_flags |= IBV_EXP_QP_BURST_SIGNALED;
- }
if (++elts_head >= elts_n)
elts_head = 0;
#ifdef MLX4_PMD_SOFT_COUNTERS
/* Increment sent bytes counter. */
txq->stats.obytes += sent_size;
#endif
- /* Put SG list into send queue and ask for completion event. */
-#if MLX4_PMD_MAX_INLINE > 0
- if ((segs == 1) &&
- (elt->sges[0].length <= txq->max_inline))
- err = txq->if_qp->send_pending_inline
- (txq->qp,
- (void *)(uintptr_t)elt->sges[0].addr,
- elt->sges[0].length,
- send_flags);
- else
-#endif
- err = txq->if_qp->send_pending_sg_list
- (txq->qp,
- elt->sges,
- segs,
- send_flags);
- if (unlikely(err))
- goto stop;
}
stop:
/* Take a shortcut if nothing must be sent. */
@@ -1390,7 +1315,6 @@ stop:
/* Increment sent packets counter. */
txq->stats.opackets += i;
#endif
- *wr_next = NULL;
/* Ring QP doorbell. */
err = txq->if_qp->send_flush(txq->qp);
if (unlikely(err)) {
--
2.1.0
^ permalink raw reply [flat|nested] 42+ messages in thread
* [dpdk-dev] [PATCH v2 18/23] mlx4: prefetch completed TX mbufs before releasing them
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 00/23] mlx4: MOFED 3.0 support, bugfixes and enhancements Adrien Mazarguil
` (16 preceding siblings ...)
2015-06-30 9:28 ` [dpdk-dev] [PATCH v2 17/23] mlx4: shrink TX queue elements for better performance Adrien Mazarguil
@ 2015-06-30 9:28 ` Adrien Mazarguil
2015-06-30 9:28 ` [dpdk-dev] [PATCH v2 19/23] mlx4: add L3 and L4 checksum offload support Adrien Mazarguil
` (5 subsequent siblings)
23 siblings, 0 replies; 42+ messages in thread
From: Adrien Mazarguil @ 2015-06-30 9:28 UTC (permalink / raw)
To: dev; +Cc: Alex Rosenbaum
From: Alex Rosenbaum <Alexr@mellanox.com>
Signed-off-by: Alex Rosenbaum <Alexr@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
drivers/net/mlx4/mlx4.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index f251eb4..52f3fbb 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -1205,6 +1205,9 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
max = pkts_n;
for (i = 0; (i != max); ++i) {
struct rte_mbuf *buf = pkts[i];
+ unsigned int elts_head_next =
+ (((elts_head + 1) == elts_n) ? 0 : elts_head + 1);
+ struct txq_elt *elt_next = &(*txq->elts)[elts_head_next];
struct txq_elt *elt = &(*txq->elts)[elts_head];
unsigned int segs = NB_SEGS(buf);
#ifdef MLX4_PMD_SOFT_COUNTERS
@@ -1253,6 +1256,7 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
if (txq->priv->vf)
rte_prefetch0((volatile void *)
(uintptr_t)addr);
+ RTE_MBUF_PREFETCH_TO_FREE(elt_next->buf);
/* Put packet into send queue. */
#if MLX4_PMD_MAX_INLINE > 0
if (length <= txq->max_inline)
@@ -1283,6 +1287,7 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
&sges);
if (ret.length == (unsigned int)-1)
goto stop;
+ RTE_MBUF_PREFETCH_TO_FREE(elt_next->buf);
/* Put SG list into send queue. */
err = txq->if_qp->send_pending_sg_list
(txq->qp,
@@ -1300,8 +1305,7 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
goto stop;
#endif /* MLX4_PMD_SGE_WR_N > 1 */
}
- if (++elts_head >= elts_n)
- elts_head = 0;
+ elts_head = elts_head_next;
#ifdef MLX4_PMD_SOFT_COUNTERS
/* Increment sent bytes counter. */
txq->stats.obytes += sent_size;
--
2.1.0
^ permalink raw reply [flat|nested] 42+ messages in thread
* [dpdk-dev] [PATCH v2 19/23] mlx4: add L3 and L4 checksum offload support
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 00/23] mlx4: MOFED 3.0 support, bugfixes and enhancements Adrien Mazarguil
` (17 preceding siblings ...)
2015-06-30 9:28 ` [dpdk-dev] [PATCH v2 18/23] mlx4: prefetch completed TX mbufs before releasing them Adrien Mazarguil
@ 2015-06-30 9:28 ` Adrien Mazarguil
2015-06-30 9:28 ` [dpdk-dev] [PATCH v2 20/23] mlx4: add L2 tunnel (VXLAN) " Adrien Mazarguil
` (4 subsequent siblings)
23 siblings, 0 replies; 42+ messages in thread
From: Adrien Mazarguil @ 2015-06-30 9:28 UTC (permalink / raw)
To: dev
From: Gilad Berman <giladb@mellanox.com>
Mellanox ConnectX-3 adapters can handle L3 (IPv4) and L4 (TCP, UDP, TCP6,
UDP6) RX checksum validation and TX checksum generation, with and without
802.1Q (VLAN) headers.
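The patch converts completion flags to mbuf ol_flags with a TRANSPOSE() bit-scaling macro. A standalone sketch of that idea, using made-up flag values rather than the real IBV_EXP_CQ_RX_* and PKT_RX_* bits:

#include <stdio.h>

/* Same idea as the TRANSPOSE() macro added by this patch: move a flag bit
 * from one position ("from") to another ("to") by scaling it. The bit
 * values below are made up for the example. */
#define TRANSPOSE(val, from, to) \
        (((from) >= (to)) ? \
         (((val) & (from)) / ((from) / (to))) : \
         (((val) & (from)) * ((to) / (from))))

#define HW_IPV4_PACKET (1u << 5) /* hypothetical completion flag */
#define SW_RX_IPV4_HDR (1u << 1) /* hypothetical mbuf ol_flag */

int
main(void)
{
        unsigned int hw_flags = HW_IPV4_PACKET;
        unsigned int ol_flags =
                TRANSPOSE(hw_flags, HW_IPV4_PACKET, SW_RX_IPV4_HDR);

        printf("hw_flags=0x%x -> ol_flags=0x%x\n", hw_flags, ol_flags);
        return 0;
}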
Signed-off-by: Gilad Berman <giladb@mellanox.com>
Signed-off-by: Olga Shern <olgas@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
drivers/net/mlx4/mlx4.c | 82 ++++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 78 insertions(+), 4 deletions(-)
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 52f3fbb..fa9216f 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -139,6 +139,12 @@ static inline void wr_id_t_check(void)
(void)wr_id_t_check;
}
+/* Transpose flags. Useful to convert IBV to DPDK flags. */
+#define TRANSPOSE(val, from, to) \
+ (((from) >= (to)) ? \
+ (((val) & (from)) / ((from) / (to))) : \
+ (((val) & (from)) * ((to) / (from))))
+
struct mlx4_rxq_stats {
unsigned int idx; /**< Mapping index. */
#ifdef MLX4_PMD_SOFT_COUNTERS
@@ -196,6 +202,7 @@ struct rxq {
struct rxq_elt (*no_sp)[]; /* RX elements. */
} elts;
unsigned int sp:1; /* Use scattered RX elements. */
+ unsigned int csum:1; /* Enable checksum offloading. */
uint32_t mb_len; /* Length of a mp-issued mbuf. */
struct mlx4_rxq_stats stats; /* RX queue counters. */
unsigned int socket; /* CPU socket ID for allocations. */
@@ -268,6 +275,7 @@ struct priv {
unsigned int hw_qpg:1; /* QP groups are supported. */
unsigned int hw_tss:1; /* TSS is supported. */
unsigned int hw_rss:1; /* RSS is supported. */
+ unsigned int hw_csum:1; /* Checksum offload is supported. */
unsigned int rss:1; /* RSS is enabled. */
unsigned int vf:1; /* This is a VF device. */
#ifdef INLINE_RECV
@@ -1233,6 +1241,10 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
++elts_comp;
send_flags |= IBV_EXP_QP_BURST_SIGNALED;
}
+ /* Should we enable HW CKSUM offload */
+ if (buf->ol_flags &
+ (PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM | PKT_TX_UDP_CKSUM))
+ send_flags |= IBV_EXP_QP_BURST_IP_CSUM;
if (likely(segs == 1)) {
uintptr_t addr;
uint32_t length;
@@ -2404,6 +2416,36 @@ rxq_cleanup(struct rxq *rxq)
memset(rxq, 0, sizeof(*rxq));
}
+/**
+ * Translate RX completion flags to offload flags.
+ *
+ * @param[in] rxq
+ * Pointer to RX queue structure.
+ * @param flags
+ * RX completion flags returned by poll_length_flags().
+ *
+ * @return
+ * Offload flags (ol_flags) for struct rte_mbuf.
+ */
+static inline uint32_t
+rxq_cq_to_ol_flags(const struct rxq *rxq, uint32_t flags)
+{
+ uint32_t ol_flags;
+
+ ol_flags =
+ TRANSPOSE(flags, IBV_EXP_CQ_RX_IPV4_PACKET, PKT_RX_IPV4_HDR) |
+ TRANSPOSE(flags, IBV_EXP_CQ_RX_IPV6_PACKET, PKT_RX_IPV6_HDR);
+ if (rxq->csum)
+ ol_flags |=
+ TRANSPOSE(~flags,
+ IBV_EXP_CQ_RX_IP_CSUM_OK,
+ PKT_RX_IP_CKSUM_BAD) |
+ TRANSPOSE(~flags,
+ IBV_EXP_CQ_RX_TCP_UDP_CSUM_OK,
+ PKT_RX_L4_CKSUM_BAD);
+ return ol_flags;
+}
+
static uint16_t
mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n);
@@ -2448,6 +2490,7 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
struct rte_mbuf **pkt_buf_next = &pkt_buf;
unsigned int seg_headroom = RTE_PKTMBUF_HEADROOM;
unsigned int j = 0;
+ uint32_t flags;
/* Sanity checks. */
#ifdef NDEBUG
@@ -2458,7 +2501,8 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
assert(wr->num_sge == elemof(elt->sges));
assert(elts_head < rxq->elts_n);
assert(rxq->elts_head < rxq->elts_n);
- ret = rxq->if_cq->poll_length(rxq->cq, NULL, NULL);
+ ret = rxq->if_cq->poll_length_flags(rxq->cq, NULL, NULL,
+ &flags);
if (unlikely(ret < 0)) {
struct ibv_wc wc;
int wcs_n;
@@ -2584,7 +2628,7 @@ mlx4_rx_burst_sp(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
NB_SEGS(pkt_buf) = j;
PORT(pkt_buf) = rxq->port_id;
PKT_LEN(pkt_buf) = pkt_buf_len;
- pkt_buf->ol_flags = 0;
+ pkt_buf->ol_flags = rxq_cq_to_ol_flags(rxq, flags);
/* Return packet. */
*(pkts++) = pkt_buf;
@@ -2661,6 +2705,7 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
struct rte_mbuf *seg = (void *)((uintptr_t)elt->sge.addr -
WR_ID(wr_id).offset);
struct rte_mbuf *rep;
+ uint32_t flags;
/* Sanity checks. */
assert(WR_ID(wr_id).id < rxq->elts_n);
@@ -2668,7 +2713,8 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
assert(wr->num_sge == 1);
assert(elts_head < rxq->elts_n);
assert(rxq->elts_head < rxq->elts_n);
- ret = rxq->if_cq->poll_length(rxq->cq, NULL, NULL);
+ ret = rxq->if_cq->poll_length_flags(rxq->cq, NULL, NULL,
+ &flags);
if (unlikely(ret < 0)) {
struct ibv_wc wc;
int wcs_n;
@@ -2742,7 +2788,7 @@ mlx4_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
NEXT(seg) = NULL;
PKT_LEN(seg) = len;
DATA_LEN(seg) = len;
- seg->ol_flags = 0;
+ seg->ol_flags = rxq_cq_to_ol_flags(rxq, flags);
/* Return packet. */
*(pkts++) = seg;
@@ -2925,6 +2971,11 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
/* Number of descriptors and mbufs currently allocated. */
desc_n = (tmpl.elts_n * (tmpl.sp ? MLX4_PMD_SGE_WR_N : 1));
mbuf_n = desc_n;
+ /* Toggle RX checksum offload if hardware supports it. */
+ if (priv->hw_csum) {
+ tmpl.csum = !!dev->data->dev_conf.rxmode.hw_ip_checksum;
+ rxq->csum = tmpl.csum;
+ }
/* Enable scattered packets support for this queue if necessary. */
if ((dev->data->dev_conf.rxmode.jumbo_frame) &&
(dev->data->dev_conf.rxmode.max_rx_pkt_len >
@@ -3146,6 +3197,9 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
rte_pktmbuf_tailroom(buf)) == tmpl.mb_len);
assert(rte_pktmbuf_headroom(buf) == RTE_PKTMBUF_HEADROOM);
rte_pktmbuf_free(buf);
+ /* Toggle RX checksum offload if hardware supports it. */
+ if (priv->hw_csum)
+ tmpl.csum = !!dev->data->dev_conf.rxmode.hw_ip_checksum;
/* Enable scattered packets support for this queue if necessary. */
if ((dev->data->dev_conf.rxmode.jumbo_frame) &&
(dev->data->dev_conf.rxmode.max_rx_pkt_len >
@@ -3643,6 +3697,18 @@ mlx4_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
info->max_rx_queues = max;
info->max_tx_queues = max;
info->max_mac_addrs = elemof(priv->mac);
+ info->rx_offload_capa =
+ (priv->hw_csum ?
+ (DEV_RX_OFFLOAD_IPV4_CKSUM |
+ DEV_RX_OFFLOAD_UDP_CKSUM |
+ DEV_RX_OFFLOAD_TCP_CKSUM) :
+ 0);
+ info->tx_offload_capa =
+ (priv->hw_csum ?
+ (DEV_TX_OFFLOAD_IPV4_CKSUM |
+ DEV_TX_OFFLOAD_UDP_CKSUM |
+ DEV_TX_OFFLOAD_TCP_CKSUM) :
+ 0);
priv_unlock(priv);
}
@@ -4683,6 +4749,14 @@ mlx4_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
exp_device_attr.max_rss_tbl_sz);
#endif /* RSS_SUPPORT */
+ priv->hw_csum =
+ ((exp_device_attr.exp_device_cap_flags &
+ IBV_EXP_DEVICE_RX_CSUM_TCP_UDP_PKT) &&
+ (exp_device_attr.exp_device_cap_flags &
+ IBV_EXP_DEVICE_RX_CSUM_IP_PKT));
+ DEBUG("checksum offloading is %ssupported",
+ (priv->hw_csum ? "" : "not "));
+
#ifdef INLINE_RECV
priv->inl_recv_size = mlx4_getenv_int("MLX4_INLINE_RECV_SIZE");
--
2.1.0
^ permalink raw reply [flat|nested] 42+ messages in thread
* [dpdk-dev] [PATCH v2 20/23] mlx4: add L2 tunnel (VXLAN) checksum offload support
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 00/23] mlx4: MOFED 3.0 support, bugfixes and enhancements Adrien Mazarguil
` (18 preceding siblings ...)
2015-06-30 9:28 ` [dpdk-dev] [PATCH v2 19/23] mlx4: add L3 and L4 checksum offload support Adrien Mazarguil
@ 2015-06-30 9:28 ` Adrien Mazarguil
2015-06-30 9:28 ` [dpdk-dev] [PATCH v2 21/23] mlx4: associate resource domain with CQs and QPs to enhance performance Adrien Mazarguil
` (3 subsequent siblings)
23 siblings, 0 replies; 42+ messages in thread
From: Adrien Mazarguil @ 2015-06-30 9:28 UTC (permalink / raw)
To: dev
Depending on adapter features and VXLAN support in the kernel, VXLAN frames
can be automatically recognized, in which case checksum validation and
generation occur on both the inner and outer L3 and L4 headers.
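The TX side relies on a simple L2 header length heuristic to detect encapsulated frames. A standalone sketch of that check (looks_like_tunnel() is a made-up helper mirroring the patch's condition, not part of the driver):

#include <stdio.h>

#define ETHER_HDR_LEN 14 /* standard Ethernet header size */

/* An "unusual" L2 header length (neither plain Ethernet nor
 * Ethernet + 802.1Q tag) is taken as a hint that the packet carries an
 * encapsulated (e.g. VXLAN) frame. */
static int
looks_like_tunnel(unsigned int l2_len)
{
        return (l2_len != ETHER_HDR_LEN) &&
               (l2_len != (ETHER_HDR_LEN + 4));
}

int
main(void)
{
        printf("l2_len=14 -> tunnel? %d\n", looks_like_tunnel(14));
        printf("l2_len=18 -> tunnel? %d\n", looks_like_tunnel(18));
        printf("l2_len=50 -> tunnel? %d\n", looks_like_tunnel(50));
        return 0;
}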
Signed-off-by: Olga Shern <olgas@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
drivers/net/mlx4/mlx4.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 48 insertions(+), 1 deletion(-)
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index fa9216f..3c72235 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -203,6 +203,7 @@ struct rxq {
} elts;
unsigned int sp:1; /* Use scattered RX elements. */
unsigned int csum:1; /* Enable checksum offloading. */
+ unsigned int csum_l2tun:1; /* Same for L2 tunnels. */
uint32_t mb_len; /* Length of a mp-issued mbuf. */
struct mlx4_rxq_stats stats; /* RX queue counters. */
unsigned int socket; /* CPU socket ID for allocations. */
@@ -276,6 +277,7 @@ struct priv {
unsigned int hw_tss:1; /* TSS is supported. */
unsigned int hw_rss:1; /* RSS is supported. */
unsigned int hw_csum:1; /* Checksum offload is supported. */
+ unsigned int hw_csum_l2tun:1; /* Same for L2 tunnels. */
unsigned int rss:1; /* RSS is enabled. */
unsigned int vf:1; /* This is a VF device. */
#ifdef INLINE_RECV
@@ -1243,8 +1245,21 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, uint16_t pkts_n)
}
/* Should we enable HW CKSUM offload */
if (buf->ol_flags &
- (PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM | PKT_TX_UDP_CKSUM))
+ (PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM | PKT_TX_UDP_CKSUM)) {
send_flags |= IBV_EXP_QP_BURST_IP_CSUM;
+ /* HW does not support checksum offloads at arbitrary
+ * offsets but automatically recognizes the packet
+ * type. For inner L3/L4 checksums, only VXLAN (UDP)
+ * tunnels are currently supported.
+ *
+ * FIXME: since PKT_TX_UDP_TUNNEL_PKT has been removed,
+ * the outer packet type is unknown. All we know is
+ * that the L2 header is of unusual length (not
+ * ETHER_HDR_LEN with or without 802.1Q header). */
+ if ((buf->l2_len != ETHER_HDR_LEN) &&
+ (buf->l2_len != (ETHER_HDR_LEN + 4)))
+ send_flags |= IBV_EXP_QP_BURST_TUNNEL;
+ }
if (likely(segs == 1)) {
uintptr_t addr;
uint32_t length;
@@ -2443,6 +2458,25 @@ rxq_cq_to_ol_flags(const struct rxq *rxq, uint32_t flags)
TRANSPOSE(~flags,
IBV_EXP_CQ_RX_TCP_UDP_CSUM_OK,
PKT_RX_L4_CKSUM_BAD);
+ /*
+ * PKT_RX_IP_CKSUM_BAD and PKT_RX_L4_CKSUM_BAD are used in place
+ * of PKT_RX_EIP_CKSUM_BAD because the latter is not functional
+ * (its value is 0).
+ */
+ if ((flags & IBV_EXP_CQ_RX_TUNNEL_PACKET) && (rxq->csum_l2tun))
+ ol_flags |=
+ TRANSPOSE(flags,
+ IBV_EXP_CQ_RX_OUTER_IPV4_PACKET,
+ PKT_RX_TUNNEL_IPV4_HDR) |
+ TRANSPOSE(flags,
+ IBV_EXP_CQ_RX_OUTER_IPV6_PACKET,
+ PKT_RX_TUNNEL_IPV6_HDR) |
+ TRANSPOSE(~flags,
+ IBV_EXP_CQ_RX_OUTER_IP_CSUM_OK,
+ PKT_RX_IP_CKSUM_BAD) |
+ TRANSPOSE(~flags,
+ IBV_EXP_CQ_RX_OUTER_TCP_UDP_CSUM_OK,
+ PKT_RX_L4_CKSUM_BAD);
return ol_flags;
}
@@ -2976,6 +3010,10 @@ rxq_rehash(struct rte_eth_dev *dev, struct rxq *rxq)
tmpl.csum = !!dev->data->dev_conf.rxmode.hw_ip_checksum;
rxq->csum = tmpl.csum;
}
+ if (priv->hw_csum_l2tun) {
+ tmpl.csum_l2tun = !!dev->data->dev_conf.rxmode.hw_ip_checksum;
+ rxq->csum_l2tun = tmpl.csum_l2tun;
+ }
/* Enable scattered packets support for this queue if necessary. */
if ((dev->data->dev_conf.rxmode.jumbo_frame) &&
(dev->data->dev_conf.rxmode.max_rx_pkt_len >
@@ -3200,6 +3238,8 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
/* Toggle RX checksum offload if hardware supports it. */
if (priv->hw_csum)
tmpl.csum = !!dev->data->dev_conf.rxmode.hw_ip_checksum;
+ if (priv->hw_csum_l2tun)
+ tmpl.csum_l2tun = !!dev->data->dev_conf.rxmode.hw_ip_checksum;
/* Enable scattered packets support for this queue if necessary. */
if ((dev->data->dev_conf.rxmode.jumbo_frame) &&
(dev->data->dev_conf.rxmode.max_rx_pkt_len >
@@ -4427,6 +4467,8 @@ static const struct eth_dev_ops mlx4_dev_ops = {
.mac_addr_remove = mlx4_mac_addr_remove,
.mac_addr_add = mlx4_mac_addr_add,
.mtu_set = mlx4_dev_set_mtu,
+ .udp_tunnel_add = NULL,
+ .udp_tunnel_del = NULL,
.fdir_add_signature_filter = NULL,
.fdir_update_signature_filter = NULL,
.fdir_remove_signature_filter = NULL,
@@ -4757,6 +4799,11 @@ mlx4_pci_devinit(struct rte_pci_driver *pci_drv, struct rte_pci_device *pci_dev)
DEBUG("checksum offloading is %ssupported",
(priv->hw_csum ? "" : "not "));
+ priv->hw_csum_l2tun = !!(exp_device_attr.exp_device_cap_flags &
+ IBV_EXP_DEVICE_VXLAN_SUPPORT);
+ DEBUG("L2 tunnel checksum offloads are %ssupported",
+ (priv->hw_csum_l2tun ? "" : "not "));
+
#ifdef INLINE_RECV
priv->inl_recv_size = mlx4_getenv_int("MLX4_INLINE_RECV_SIZE");
--
2.1.0
^ permalink raw reply [flat|nested] 42+ messages in thread
* [dpdk-dev] [PATCH v2 21/23] mlx4: associate resource domain with CQs and QPs to enhance performance
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 00/23] mlx4: MOFED 3.0 support, bugfixes and enhancements Adrien Mazarguil
` (19 preceding siblings ...)
2015-06-30 9:28 ` [dpdk-dev] [PATCH v2 20/23] mlx4: add L2 tunnel (VXLAN) " Adrien Mazarguil
@ 2015-06-30 9:28 ` Adrien Mazarguil
2015-06-30 9:28 ` [dpdk-dev] [PATCH v2 22/23] mlx4: disable multicast echo when device is not VF Adrien Mazarguil
` (2 subsequent siblings)
23 siblings, 0 replies; 42+ messages in thread
From: Adrien Mazarguil @ 2015-06-30 9:28 UTC (permalink / raw)
To: dev; +Cc: Alex Rosenbaum
From: Alex Rosenbaum <Alexr@mellanox.com>
RDs (resource domains) are a new feature of MOFED 3.0 that makes Verbs aware of
how CQ and QP resources are going to be used, so it can apply internal
performance tuning.
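For readers new to the experimental verbs interface, the pattern introduced by
this patch is roughly the following (a minimal sketch that simply mirrors the
calls added below; `ctx` stands for an already opened verbs context and `desc`
for the descriptor count, both placeholders here):

	struct ibv_exp_res_domain_init_attr rd_attr = {
		.comp_mask = (IBV_EXP_RES_DOMAIN_THREAD_MODEL |
			      IBV_EXP_RES_DOMAIN_MSG_MODEL),
		.thread_model = IBV_EXP_THREAD_SINGLE, /* single-threaded queue access */
		.msg_model = IBV_EXP_MSG_HIGH_BW,      /* favor throughput over latency */
	};
	struct ibv_exp_res_domain *rd;
	struct ibv_exp_cq_init_attr cq_attr;
	struct ibv_cq *cq;

	/* Create the resource domain first... */
	rd = ibv_exp_create_res_domain(ctx, &rd_attr);
	if (rd == NULL)
		goto error; /* treated as ENOMEM in the patch */
	/* ...then associate it with the CQ (and likewise with the QP). */
	cq_attr = (struct ibv_exp_cq_init_attr){
		.comp_mask = IBV_EXP_CQ_INIT_ATTR_RES_DOMAIN,
		.res_domain = rd,
	};
	cq = ibv_exp_create_cq(ctx, desc, NULL, NULL, 0, &cq_attr);

Both the TX and RX queue setup paths below follow this sequence before creating
their QPs, and tear the domain down again in the cleanup functions.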
Signed-off-by: Alex Rosenbaum <Alexr@mellanox.com>
Signed-off-by: Olga Shern <olgas@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
drivers/net/mlx4/mlx4.c | 96 ++++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 84 insertions(+), 12 deletions(-)
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 3c72235..631ab02 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -207,6 +207,7 @@ struct rxq {
uint32_t mb_len; /* Length of a mp-issued mbuf. */
struct mlx4_rxq_stats stats; /* RX queue counters. */
unsigned int socket; /* CPU socket ID for allocations. */
+ struct ibv_exp_res_domain *rd; /* Resource Domain. */
};
/* TX element. */
@@ -248,6 +249,7 @@ struct txq {
linear_t (*elts_linear)[]; /* Linearized buffers. */
struct ibv_mr *mr_linear; /* Memory Region for linearized buffers. */
unsigned int socket; /* CPU socket ID for allocations. */
+ struct ibv_exp_res_domain *rd; /* Resource Domain. */
};
struct priv {
@@ -908,6 +910,17 @@ txq_cleanup(struct txq *txq)
claim_zero(ibv_destroy_qp(txq->qp));
if (txq->cq != NULL)
claim_zero(ibv_destroy_cq(txq->cq));
+ if (txq->rd != NULL) {
+ struct ibv_exp_destroy_res_domain_attr attr = {
+ .comp_mask = 0,
+ };
+
+ assert(txq->priv != NULL);
+ assert(txq->priv->ctx != NULL);
+ claim_zero(ibv_exp_destroy_res_domain(txq->priv->ctx,
+ txq->rd,
+ &attr));
+ }
for (i = 0; (i != elemof(txq->mp2mr)); ++i) {
if (txq->mp2mr[i].mp == NULL)
break;
@@ -1388,7 +1401,9 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
};
union {
struct ibv_exp_query_intf_params params;
- struct ibv_qp_init_attr init;
+ struct ibv_exp_qp_init_attr init;
+ struct ibv_exp_res_domain_init_attr rd;
+ struct ibv_exp_cq_init_attr cq;
struct ibv_exp_qp_attr mod;
} attr;
enum ibv_exp_query_intf_status status;
@@ -1402,7 +1417,24 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
}
desc /= MLX4_PMD_SGE_WR_N;
/* MRs will be registered in mp2mr[] later. */
- tmpl.cq = ibv_create_cq(priv->ctx, desc, NULL, NULL, 0);
+ attr.rd = (struct ibv_exp_res_domain_init_attr){
+ .comp_mask = (IBV_EXP_RES_DOMAIN_THREAD_MODEL |
+ IBV_EXP_RES_DOMAIN_MSG_MODEL),
+ .thread_model = IBV_EXP_THREAD_SINGLE,
+ .msg_model = IBV_EXP_MSG_HIGH_BW,
+ };
+ tmpl.rd = ibv_exp_create_res_domain(priv->ctx, &attr.rd);
+ if (tmpl.rd == NULL) {
+ ret = ENOMEM;
+ ERROR("%p: RD creation failure: %s",
+ (void *)dev, strerror(ret));
+ goto error;
+ }
+ attr.cq = (struct ibv_exp_cq_init_attr){
+ .comp_mask = IBV_EXP_CQ_INIT_ATTR_RES_DOMAIN,
+ .res_domain = tmpl.rd,
+ };
+ tmpl.cq = ibv_exp_create_cq(priv->ctx, desc, NULL, NULL, 0, &attr.cq);
if (tmpl.cq == NULL) {
ret = ENOMEM;
ERROR("%p: CQ creation failure: %s",
@@ -1413,7 +1445,7 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
priv->device_attr.max_qp_wr);
DEBUG("priv->device_attr.max_sge is %d",
priv->device_attr.max_sge);
- attr.init = (struct ibv_qp_init_attr){
+ attr.init = (struct ibv_exp_qp_init_attr){
/* CQ to be associated with the send queue. */
.send_cq = tmpl.cq,
/* CQ to be associated with the receive queue. */
@@ -1435,9 +1467,13 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
.qp_type = IBV_QPT_RAW_PACKET,
/* Do *NOT* enable this, completions events are managed per
* TX burst. */
- .sq_sig_all = 0
+ .sq_sig_all = 0,
+ .pd = priv->pd,
+ .res_domain = tmpl.rd,
+ .comp_mask = (IBV_EXP_QP_INIT_ATTR_PD |
+ IBV_EXP_QP_INIT_ATTR_RES_DOMAIN),
};
- tmpl.qp = ibv_create_qp(priv->pd, &attr.init);
+ tmpl.qp = ibv_exp_create_qp(priv->ctx, &attr.init);
if (tmpl.qp == NULL) {
ret = (errno ? errno : EINVAL);
ERROR("%p: QP creation failure: %s",
@@ -2426,6 +2462,17 @@ rxq_cleanup(struct rxq *rxq)
}
if (rxq->cq != NULL)
claim_zero(ibv_destroy_cq(rxq->cq));
+ if (rxq->rd != NULL) {
+ struct ibv_exp_destroy_res_domain_attr attr = {
+ .comp_mask = 0,
+ };
+
+ assert(rxq->priv != NULL);
+ assert(rxq->priv->ctx != NULL);
+ claim_zero(ibv_exp_destroy_res_domain(rxq->priv->ctx,
+ rxq->rd,
+ &attr));
+ }
if (rxq->mr != NULL)
claim_zero(ibv_dereg_mr(rxq->mr));
memset(rxq, 0, sizeof(*rxq));
@@ -2873,7 +2920,8 @@ repost:
* QP pointer or NULL in case of error.
*/
static struct ibv_qp *
-rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc)
+rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
+ struct ibv_exp_res_domain *rd)
{
struct ibv_exp_qp_init_attr attr = {
/* CQ to be associated with the send queue. */
@@ -2892,8 +2940,10 @@ rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc)
MLX4_PMD_SGE_WR_N),
},
.qp_type = IBV_QPT_RAW_PACKET,
- .comp_mask = IBV_EXP_QP_INIT_ATTR_PD,
+ .comp_mask = (IBV_EXP_QP_INIT_ATTR_PD |
+ IBV_EXP_QP_INIT_ATTR_RES_DOMAIN),
.pd = priv->pd,
+ .res_domain = rd,
};
#ifdef INLINE_RECV
@@ -2923,7 +2973,7 @@ rxq_setup_qp(struct priv *priv, struct ibv_cq *cq, uint16_t desc)
*/
static struct ibv_qp *
rxq_setup_qp_rss(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
- int parent)
+ int parent, struct ibv_exp_res_domain *rd)
{
struct ibv_exp_qp_init_attr attr = {
/* CQ to be associated with the send queue. */
@@ -2943,8 +2993,10 @@ rxq_setup_qp_rss(struct priv *priv, struct ibv_cq *cq, uint16_t desc,
},
.qp_type = IBV_QPT_RAW_PACKET,
.comp_mask = (IBV_EXP_QP_INIT_ATTR_PD |
+ IBV_EXP_QP_INIT_ATTR_RES_DOMAIN |
IBV_EXP_QP_INIT_ATTR_QPG),
- .pd = priv->pd
+ .pd = priv->pd,
+ .res_domain = rd,
};
#ifdef INLINE_RECV
@@ -3200,6 +3252,8 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
struct ibv_exp_qp_attr mod;
union {
struct ibv_exp_query_intf_params params;
+ struct ibv_exp_cq_init_attr cq;
+ struct ibv_exp_res_domain_init_attr rd;
} attr;
enum ibv_exp_query_intf_status status;
struct ibv_recv_wr *bad_wr;
@@ -3262,7 +3316,24 @@ rxq_setup(struct rte_eth_dev *dev, struct rxq *rxq, uint16_t desc,
goto error;
}
skip_mr:
- tmpl.cq = ibv_create_cq(priv->ctx, desc, NULL, NULL, 0);
+ attr.rd = (struct ibv_exp_res_domain_init_attr){
+ .comp_mask = (IBV_EXP_RES_DOMAIN_THREAD_MODEL |
+ IBV_EXP_RES_DOMAIN_MSG_MODEL),
+ .thread_model = IBV_EXP_THREAD_SINGLE,
+ .msg_model = IBV_EXP_MSG_HIGH_BW,
+ };
+ tmpl.rd = ibv_exp_create_res_domain(priv->ctx, &attr.rd);
+ if (tmpl.rd == NULL) {
+ ret = ENOMEM;
+ ERROR("%p: RD creation failure: %s",
+ (void *)dev, strerror(ret));
+ goto error;
+ }
+ attr.cq = (struct ibv_exp_cq_init_attr){
+ .comp_mask = IBV_EXP_CQ_INIT_ATTR_RES_DOMAIN,
+ .res_domain = tmpl.rd,
+ };
+ tmpl.cq = ibv_exp_create_cq(priv->ctx, desc, NULL, NULL, 0, &attr.cq);
if (tmpl.cq == NULL) {
ret = ENOMEM;
ERROR("%p: CQ creation failure: %s",
@@ -3275,10 +3346,11 @@ skip_mr:
priv->device_attr.max_sge);
#ifdef RSS_SUPPORT
if (priv->rss)
- tmpl.qp = rxq_setup_qp_rss(priv, tmpl.cq, desc, parent);
+ tmpl.qp = rxq_setup_qp_rss(priv, tmpl.cq, desc, parent,
+ tmpl.rd);
else
#endif /* RSS_SUPPORT */
- tmpl.qp = rxq_setup_qp(priv, tmpl.cq, desc);
+ tmpl.qp = rxq_setup_qp(priv, tmpl.cq, desc, tmpl.rd);
if (tmpl.qp == NULL) {
ret = (errno ? errno : EINVAL);
ERROR("%p: QP creation failure: %s",
--
2.1.0
^ permalink raw reply [flat|nested] 42+ messages in thread
* [dpdk-dev] [PATCH v2 22/23] mlx4: disable multicast echo when device is not VF
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 00/23] mlx4: MOFED 3.0 support, bugfixes and enhancements Adrien Mazarguil
` (20 preceding siblings ...)
2015-06-30 9:28 ` [dpdk-dev] [PATCH v2 21/23] mlx4: associate resource domain with CQs and QPs to enhance performance Adrien Mazarguil
@ 2015-06-30 9:28 ` Adrien Mazarguil
2015-06-30 9:28 ` [dpdk-dev] [PATCH v2 23/23] doc: update mlx4 documentation following MOFED 3.0 changes Adrien Mazarguil
2015-07-01 9:33 ` [dpdk-dev] [PATCH v2 00/23] mlx4: MOFED 3.0 support, bugfixes and enhancements Thomas Monjalon
23 siblings, 0 replies; 42+ messages in thread
From: Adrien Mazarguil @ 2015-06-30 9:28 UTC (permalink / raw)
To: dev
From: Olga Shern <olgas@mellanox.com>
Multicast loopback must be disabled on PF devices to prevent the adapter from
echoing multicast frames back to their sender. This is required with MOFED 3.0.
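The Makefile hunk below probes the headers so the new flag is only used when the
installed MOFED provides it; when available, it is passed while querying the QP
burst family interface, roughly as in this sketch (the surrounding variables
`ctx`, `qp` and `priv->vf` are placeholders for the driver's own state, and the
return value is kept as a plain pointer for brevity):

	struct ibv_exp_query_intf_params params = {
		.intf_scope = IBV_EXP_INTF_GLOBAL,
		.intf = IBV_EXP_INTF_QP_BURST,
		.obj = qp,
	#ifdef HAVE_EXP_QP_BURST_CREATE_DISABLE_ETH_LOOPBACK
		/* Only disable multicast loopback when not running on a VF. */
		.family_flags = (!priv->vf ?
				 IBV_EXP_QP_BURST_CREATE_DISABLE_ETH_LOOPBACK :
				 0),
	#endif
	};
	enum ibv_exp_query_intf_status status;
	void *if_qp = ibv_exp_query_intf(ctx, &params, &status);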
Signed-off-by: Olga Shern <olgas@mellanox.com>
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
drivers/net/mlx4/Makefile | 5 +++++
drivers/net/mlx4/mlx4.c | 7 +++++++
2 files changed, 12 insertions(+)
diff --git a/drivers/net/mlx4/Makefile b/drivers/net/mlx4/Makefile
index fd74dc8..725717f 100644
--- a/drivers/net/mlx4/Makefile
+++ b/drivers/net/mlx4/Makefile
@@ -112,6 +112,11 @@ mlx4_autoconf.h: $(RTE_SDK)/scripts/auto-config-h.sh
HAVE_EXP_QUERY_DEVICE \
infiniband/verbs.h \
type 'struct ibv_exp_device_attr' $(AUTOCONF_OUTPUT)
+ $Q sh -- '$<' '$@' \
+ HAVE_EXP_QP_BURST_CREATE_DISABLE_ETH_LOOPBACK \
+ infiniband/verbs.h \
+ enum IBV_EXP_QP_BURST_CREATE_DISABLE_ETH_LOOPBACK \
+ $(AUTOCONF_OUTPUT)
mlx4.o: mlx4_autoconf.h
diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
index 631ab02..f4491e7 100644
--- a/drivers/net/mlx4/mlx4.c
+++ b/drivers/net/mlx4/mlx4.c
@@ -1534,6 +1534,13 @@ txq_setup(struct rte_eth_dev *dev, struct txq *txq, uint16_t desc,
.intf_scope = IBV_EXP_INTF_GLOBAL,
.intf = IBV_EXP_INTF_QP_BURST,
.obj = tmpl.qp,
+#ifdef HAVE_EXP_QP_BURST_CREATE_DISABLE_ETH_LOOPBACK
+ /* MC loopback must be disabled when not using a VF. */
+ .family_flags =
+ (!priv->vf ?
+ IBV_EXP_QP_BURST_CREATE_DISABLE_ETH_LOOPBACK :
+ 0),
+#endif
};
tmpl.if_qp = ibv_exp_query_intf(priv->ctx, &attr.params, &status);
if (tmpl.if_qp == NULL) {
--
2.1.0
^ permalink raw reply [flat|nested] 42+ messages in thread
* [dpdk-dev] [PATCH v2 23/23] doc: update mlx4 documentation following MOFED 3.0 changes
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 00/23] mlx4: MOFED 3.0 support, bugfixes and enhancements Adrien Mazarguil
` (21 preceding siblings ...)
2015-06-30 9:28 ` [dpdk-dev] [PATCH v2 22/23] mlx4: disable multicast echo when device is not VF Adrien Mazarguil
@ 2015-06-30 9:28 ` Adrien Mazarguil
2015-07-01 9:33 ` [dpdk-dev] [PATCH v2 00/23] mlx4: MOFED 3.0 support, bugfixes and enhancements Thomas Monjalon
23 siblings, 0 replies; 42+ messages in thread
From: Adrien Mazarguil @ 2015-06-30 9:28 UTC (permalink / raw)
To: dev
- Add RX/TX L3/L4 checksum offloading and validation.
- Update kernel module parameters section.
- Update prerequisites for MOFED and firmware versions.
- Remove optimized external libraries section. MOFED now provides enhanced
support directly without having to install modified libraries.
Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
doc/guides/nics/mlx4.rst | 29 ++++++-----------------------
1 file changed, 6 insertions(+), 23 deletions(-)
diff --git a/doc/guides/nics/mlx4.rst b/doc/guides/nics/mlx4.rst
index ac2dd56..c33aa38 100644
--- a/doc/guides/nics/mlx4.rst
+++ b/doc/guides/nics/mlx4.rst
@@ -84,12 +84,13 @@ Features and limitations
- All multicast mode is supported.
- Multiple MAC addresses (unicast, multicast) can be configured.
- Scattered packets are supported for TX and RX.
+- Inner L3/L4 (IP, TCP and UDP) TX/RX checksum offloading and validation.
+- Outer L3 (IP) TX/RX checksum offloading and validation for VXLAN frames.
.. break
- RSS hash key cannot be modified.
- Hardware counters are not implemented (they are software counters).
-- Checksum offloads are not supported yet.
Configuration
-------------
@@ -175,9 +176,8 @@ below.
- **-1**: force device-managed flow steering (DMFS).
- **-7**: configure optimized steering mode to improve performance with the
- following limitation: Ethernet frames with the port MAC address as the
- destination cannot be received, even in promiscuous mode. Additional MAC
- addresses can still be set by ``rte_eth_dev_mac_addr_addr()``.
+ following limitation: VLAN filtering is not supported with this mode.
+ This is the recommended mode in case VLAN filter is not needed.
Prerequisites
-------------
@@ -232,8 +232,8 @@ DPDK and must be installed separately:
Currently supported by DPDK:
-- Mellanox OFED **2.4-1**.
-- Firmware version **2.33.5000** and higher.
+- Mellanox OFED **3.0**.
+- Firmware version **2.34.5000** and higher.
Getting Mellanox OFED
~~~~~~~~~~~~~~~~~~~~~
@@ -255,23 +255,6 @@ required from that distribution.
this DPDK release was developed and tested against is strongly
recommended. Please check the `prerequisites`_.
-Getting libibverbs and libmlx4 from DPDK.org
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Based on Mellanox OFED, optimized libibverbs and libmlx4 versions can be
-optionally downloaded from DPDK.org:
-
-`<http://www.dpdk.org/download/mlx4>`_
-
-Some enhancements are done for better performance with DPDK applications and
-are not merged upstream yet.
-
-Since it is partly achieved by tuning compilation options to disable features
-not needed by DPDK, linking these libraries statically and avoid system-wide
-installation is the preferred method.
-
-Installation documentation is available from the above link.
-
Usage example
-------------
--
2.1.0
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [dpdk-dev] [PATCH v2 00/23] mlx4: MOFED 3.0 support, bugfixes and enhancements
2015-06-30 9:27 ` [dpdk-dev] [PATCH v2 00/23] mlx4: MOFED 3.0 support, bugfixes and enhancements Adrien Mazarguil
` (22 preceding siblings ...)
2015-06-30 9:28 ` [dpdk-dev] [PATCH v2 23/23] doc: update mlx4 documentation following MOFED 3.0 changes Adrien Mazarguil
@ 2015-07-01 9:33 ` Thomas Monjalon
23 siblings, 0 replies; 42+ messages in thread
From: Thomas Monjalon @ 2015-07-01 9:33 UTC (permalink / raw)
To: Adrien Mazarguil; +Cc: dev
2015-06-30 11:27, Adrien Mazarguil:
> This patchset adds compatibility with the upcoming Mellanox OFED 3.0
> release (new kernel drivers and userland support libraries), which supports
> new features such as L3/L4 checksum validation offloads and addresses
> several bugs and limitations at the same time.
>
> v2:
> - Bugfix for a possible crash when allocating mbufs.
> - Several API changes following the release of Mellanox OFED 3.0.
> - Performance improvements made possible by the new API.
> - Add TX checksum offloads.
> - Update documentation to reflect the changes.
>
> Adrien Mazarguil (6):
> mlx4: fix possible crash on scattered mbuf allocation failure
> mlx4: add MOFED 3.0 compatibility to interfaces names retrieval
> mlx4: use MOFED 3.0 fast verbs interface for TX operations
> mlx4: move scattered TX processing to helper function
> mlx4: add L2 tunnel (VXLAN) checksum offload support
> doc: update mlx4 documentation following MOFED 3.0 changes
>
> Alex Rosenbaum (8):
> mlx4: avoid looking up WR ID to improve RX performance
> mlx4: merge RX queue setup functions
> mlx4: use MOFED 3.0 extended flow steering API
> mlx4: use MOFED 3.0 fast verbs interface for RX operations
> mlx4: improve performance by requesting TX completion events less
> often
> mlx4: shrink TX queue elements for better performance
> mlx4: prefetch completed TX mbufs before releasing them
> mlx4: associate resource domain with CQs and QPs to enhance
> performance
>
> Gilad Berman (1):
> mlx4: add L3 and L4 checksum offload support
>
> Olga Shern (5):
> mlx4: make sure experimental device query function is implemented
> mlx4: allow applications to partially use fork()
> mlx4: improve accuracy of link status information
> mlx4: fix support for multiple VLAN filters
> mlx4: disable multicast echo when device is not VF
>
> Or Ami (3):
> mlx4: fix error message for invalid number of descriptors
> mlx4: remove provision for flow creation failure in DMFS A0 mode
> mlx4: query netdevice to get initial MAC address
Applied, thanks
^ permalink raw reply [flat|nested] 42+ messages in thread