* [dpdk-dev] [PATCH v1 0/9] mlx5: Handle data-path completions with error
@ 2019-05-30 10:20 Matan Azrad
2019-05-30 10:20 ` [dpdk-dev] [PATCH v1 1/9] net/mlx5: remove Rx queues indexes correlation Matan Azrad
` (9 more replies)
0 siblings, 10 replies; 12+ messages in thread
From: Matan Azrad @ 2019-05-30 10:20 UTC (permalink / raw)
To: Shahaf Shuler, Yongseok Koh; +Cc: dev
Add support for data-path Rx and Tx completions with error handling:
1. Detect the error.
2. Do not crash.
3. Report it in statistics counters.
4. Dump debug information to system log file.
5. Recover the error under the hood.
6. Add support for secondary process recovery.
No performance impact was observed.
Matan Azrad (9):
net/mlx5: remove Rx queues indexes correlation
net/mlx5: add log file procedure for debug data
net/mlx5: fix device arguments error detection
net/mlx5: mitigate Rx doorbell memory barrier
net/mlx5: separate Rx queue initialization
net/mlx5: extend Rx completion with error handling
net/mlx5: handle Tx completion with error
net/mlx5: recover secondary process Rx errors
net/mlx5: recover secondary process Tx errors
doc/guides/nics/mlx5.rst | 7 +
drivers/net/mlx5/mlx5.c | 14 +-
drivers/net/mlx5/mlx5.h | 12 +
drivers/net/mlx5/mlx5_mp.c | 46 +++
drivers/net/mlx5/mlx5_prm.h | 11 +
drivers/net/mlx5/mlx5_rxq.c | 42 +--
drivers/net/mlx5/mlx5_rxtx.c | 673 ++++++++++++++++++++++++++++------
drivers/net/mlx5/mlx5_rxtx.h | 193 +++++-----
drivers/net/mlx5/mlx5_rxtx_vec.c | 5 +-
drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 36 +-
drivers/net/mlx5/mlx5_rxtx_vec_sse.h | 36 +-
drivers/net/mlx5/mlx5_trigger.c | 1 +
drivers/net/mlx5/mlx5_txq.c | 4 +-
13 files changed, 792 insertions(+), 288 deletions(-)
--
1.8.3.1
* [dpdk-dev] [PATCH v1 1/9] net/mlx5: remove Rx queues indexes correlation
2019-05-30 10:20 [dpdk-dev] [PATCH v1 0/9] mlx5: Handle data-path completions with error Matan Azrad
@ 2019-05-30 10:20 ` Matan Azrad
2019-05-30 10:20 ` [dpdk-dev] [PATCH v1 2/9] net/mlx5: add log file procedure for debug data Matan Azrad
` (8 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Matan Azrad @ 2019-05-30 10:20 UTC (permalink / raw)
To: Shahaf Shuler, Yongseok Koh; +Cc: dev, stable
The vectorized Rx queue management assumes a full correlation between
the CQE indexes and the WQE indexes.
When the RQ is moved to the reset state, this correlation may break
because the HW restarts RQ polling from index 0 while CQ polling
continues as usual.
In preparation for CQE error handling, where the RQ may be reset, this
correlation dependence must be removed from all Rx queue index
management.
Remove the aforementioned dependence from the vectorized Rx burst
functions.
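As an illustration only (not the driver's actual data structures), the
bookkeeping change can be sketched as follows, where struct toy_rxq and
toy_pending_mbufs() are hypothetical names:

    #include <stdint.h>

    /* Simplified bookkeeping only -- not the real mlx5_rxq_data layout. */
    struct toy_rxq {
            uint32_t rq_ci;        /* WQEs posted to the HW */
            uint32_t rq_pi;        /* mbufs already returned to the application */
            uint32_t cq_ci;        /* CQEs consumed from the CQ */
            uint16_t decompressed; /* ready mbufs decompressed from the CQ */
    };

    /*
     * Before this patch, the number of pending (decompressed but not yet
     * returned) mbufs was derived as cq_ci - rq_pi, which assumes the CQ
     * and RQ indexes stay in lock-step. After an RQ reset, the RQ index
     * restarts from 0 while cq_ci keeps running, so the difference becomes
     * meaningless. An explicit counter removes that assumption.
     */
    static uint16_t
    toy_pending_mbufs(const struct toy_rxq *rxq)
    {
            return rxq->decompressed; /* was: rxq->cq_ci - rxq->rq_pi */
    }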
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
drivers/net/mlx5/mlx5_rxq.c | 1 +
drivers/net/mlx5/mlx5_rxtx.h | 6 +++++-
drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 26 +++++++++++++-------------
drivers/net/mlx5/mlx5_rxtx_vec_sse.h | 26 +++++++++++++-------------
4 files changed, 32 insertions(+), 27 deletions(-)
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index a00cb12..b248f38 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -1006,6 +1006,7 @@ struct mlx5_rxq_ibv *
rxq_data->cq_uar = cq_info.cq_uar;
rxq_data->cqn = cq_info.cqn;
rxq_data->cq_arm_sn = 0;
+ rxq_data->decompressed = 0;
/* Update doorbell counter. */
rxq_data->rq_ci = wqe_n >> rxq_data->sges_n;
rte_wmb();
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 4339aaf..7bacdba 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -101,11 +101,15 @@ struct mlx5_rxq_data {
uint32_t rq_pi;
uint32_t cq_ci;
uint16_t rq_repl_thresh; /* Threshold for buffer replenishment. */
+ union {
+ struct rxq_zip zip; /* Compressed context. */
+ uint16_t decompressed;
+ /* Number of ready mbufs decompressed from the CQ. */
+ };
struct mlx5_mr_ctrl mr_ctrl; /* MR control descriptor. */
uint16_t mprq_max_memcpy_len; /* Maximum size of packet to memcpy. */
volatile void *wqes;
volatile struct mlx5_cqe(*cqes)[];
- struct rxq_zip zip; /* Compressed context. */
RTE_STD_C11
union {
struct rte_mbuf *(*elts)[];
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
index 38e915c..6a1b2bb 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
@@ -352,8 +352,11 @@
* @param elts
* Pointer to SW ring to be filled. The first mbuf has to be pre-built from
* the title completion descriptor to be copied to the rest of mbufs.
+ *
+ * @return
+ * Number of mini-CQEs successfully decompressed.
*/
-static inline void
+static inline uint16_t
rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq,
struct rte_mbuf **elts)
{
@@ -505,6 +508,7 @@
rxq->stats.ibytes += rcvd_byte;
#endif
rxq->cq_ci += mcqe_n;
+ return mcqe_n;
}
/**
@@ -729,24 +733,17 @@
rte_prefetch_non_temporal(cq + 2);
rte_prefetch_non_temporal(cq + 3);
pkts_n = RTE_MIN(pkts_n, MLX5_VPMD_RX_MAX_BURST);
- /*
- * Order of indexes:
- * rq_ci >= cq_ci >= rq_pi
- * Definition of indexes:
- * rq_ci - cq_ci := # of buffers owned by HW (posted).
- * cq_ci - rq_pi := # of buffers not returned to app (decompressed).
- * N - (rq_ci - rq_pi) := # of buffers consumed (to be replenished).
- */
repl_n = q_n - (rxq->rq_ci - rxq->rq_pi);
if (repl_n >= rxq->rq_repl_thresh)
mlx5_rx_replenish_bulk_mbuf(rxq, repl_n);
/* See if there're unreturned mbufs from compressed CQE. */
- rcvd_pkt = rxq->cq_ci - rxq->rq_pi;
+ rcvd_pkt = rxq->decompressed;
if (rcvd_pkt > 0) {
rcvd_pkt = RTE_MIN(rcvd_pkt, pkts_n);
rxq_copy_mbuf_v(rxq, pkts, rcvd_pkt);
rxq->rq_pi += rcvd_pkt;
pkts += rcvd_pkt;
+ rxq->decompressed -= rcvd_pkt;
}
elts_idx = rxq->rq_pi & q_mask;
elts = &(*rxq->elts)[elts_idx];
@@ -754,10 +751,11 @@
pkts_n = RTE_ALIGN_FLOOR(pkts_n - rcvd_pkt, MLX5_VPMD_DESCS_PER_LOOP);
/* Not to cross queue end. */
pkts_n = RTE_MIN(pkts_n, q_n - elts_idx);
+ pkts_n = RTE_MIN(pkts_n, q_n - cq_idx);
if (!pkts_n)
return rcvd_pkt;
/* At this point, there shouldn't be any remained packets. */
- assert(rxq->rq_pi == rxq->cq_ci);
+ assert(rxq->decompressed == 0);
/*
* Note that vectors have reverse order - {v3, v2, v1, v0}, because
* there's no instruction to count trailing zeros. __builtin_clzl() is
@@ -1003,15 +1001,17 @@
/* Decompress the last CQE if compressed. */
if (comp_idx < MLX5_VPMD_DESCS_PER_LOOP && comp_idx == n) {
assert(comp_idx == (nocmp_n % MLX5_VPMD_DESCS_PER_LOOP));
- rxq_cq_decompress_v(rxq, &cq[nocmp_n], &elts[nocmp_n]);
+ rxq->decompressed = rxq_cq_decompress_v(rxq, &cq[nocmp_n],
+ &elts[nocmp_n]);
/* Return more packets if needed. */
if (nocmp_n < pkts_n) {
- uint16_t n = rxq->cq_ci - rxq->rq_pi;
+ uint16_t n = rxq->decompressed;
n = RTE_MIN(n, pkts_n - nocmp_n);
rxq_copy_mbuf_v(rxq, &pkts[nocmp_n], n);
rxq->rq_pi += n;
rcvd_pkt += n;
+ rxq->decompressed -= n;
}
}
rte_compiler_barrier();
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
index fb384ef..cc2f251 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
@@ -349,8 +349,11 @@
* @param elts
* Pointer to SW ring to be filled. The first mbuf has to be pre-built from
* the title completion descriptor to be copied to the rest of mbufs.
+ *
+ * @return
+ * Number of mini-CQEs successfully decompressed.
*/
-static inline void
+static inline uint16_t
rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq,
struct rte_mbuf **elts)
{
@@ -486,6 +489,7 @@
rxq->stats.ibytes += rcvd_byte;
#endif
rxq->cq_ci += mcqe_n;
+ return mcqe_n;
}
/**
@@ -712,23 +716,16 @@
rte_prefetch0(cq + 2);
rte_prefetch0(cq + 3);
pkts_n = RTE_MIN(pkts_n, MLX5_VPMD_RX_MAX_BURST);
- /*
- * Order of indexes:
- * rq_ci >= cq_ci >= rq_pi
- * Definition of indexes:
- * rq_ci - cq_ci := # of buffers owned by HW (posted).
- * cq_ci - rq_pi := # of buffers not returned to app (decompressed).
- * N - (rq_ci - rq_pi) := # of buffers consumed (to be replenished).
- */
repl_n = q_n - (rxq->rq_ci - rxq->rq_pi);
if (repl_n >= rxq->rq_repl_thresh)
mlx5_rx_replenish_bulk_mbuf(rxq, repl_n);
/* See if there're unreturned mbufs from compressed CQE. */
- rcvd_pkt = rxq->cq_ci - rxq->rq_pi;
+ rcvd_pkt = rxq->decompressed;
if (rcvd_pkt > 0) {
rcvd_pkt = RTE_MIN(rcvd_pkt, pkts_n);
rxq_copy_mbuf_v(rxq, pkts, rcvd_pkt);
rxq->rq_pi += rcvd_pkt;
+ rxq->decompressed -= rcvd_pkt;
pkts += rcvd_pkt;
}
elts_idx = rxq->rq_pi & q_mask;
@@ -737,10 +734,11 @@
pkts_n = RTE_ALIGN_FLOOR(pkts_n - rcvd_pkt, MLX5_VPMD_DESCS_PER_LOOP);
/* Not to cross queue end. */
pkts_n = RTE_MIN(pkts_n, q_n - elts_idx);
+ pkts_n = RTE_MIN(pkts_n, q_n - cq_idx);
if (!pkts_n)
return rcvd_pkt;
/* At this point, there shouldn't be any remained packets. */
- assert(rxq->rq_pi == rxq->cq_ci);
+ assert(rxq->decompressed == 0);
/*
* A. load first Qword (8bytes) in one loop.
* B. copy 4 mbuf pointers from elts ring to returing pkts.
@@ -953,15 +951,17 @@
/* Decompress the last CQE if compressed. */
if (comp_idx < MLX5_VPMD_DESCS_PER_LOOP && comp_idx == n) {
assert(comp_idx == (nocmp_n % MLX5_VPMD_DESCS_PER_LOOP));
- rxq_cq_decompress_v(rxq, &cq[nocmp_n], &elts[nocmp_n]);
+ rxq->decompressed = rxq_cq_decompress_v(rxq, &cq[nocmp_n],
+ &elts[nocmp_n]);
/* Return more packets if needed. */
if (nocmp_n < pkts_n) {
- uint16_t n = rxq->cq_ci - rxq->rq_pi;
+ uint16_t n = rxq->decompressed;
n = RTE_MIN(n, pkts_n - nocmp_n);
rxq_copy_mbuf_v(rxq, &pkts[nocmp_n], n);
rxq->rq_pi += n;
rcvd_pkt += n;
+ rxq->decompressed -= n;
}
}
rte_compiler_barrier();
--
1.8.3.1
* [dpdk-dev] [PATCH v1 2/9] net/mlx5: add log file procedure for debug data
2019-05-30 10:20 [dpdk-dev] [PATCH v1 0/9] mlx5: Handle data-path completions with error Matan Azrad
2019-05-30 10:20 ` [dpdk-dev] [PATCH v1 1/9] net/mlx5: remove Rx queues indexes correlation Matan Azrad
@ 2019-05-30 10:20 ` Matan Azrad
2019-05-30 10:20 ` [dpdk-dev] [PATCH v1 3/9] net/mlx5: fix device arguments error detection Matan Azrad
` (7 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Matan Azrad @ 2019-05-30 10:20 UTC (permalink / raw)
To: Shahaf Shuler, Yongseok Koh; +Cc: dev, stable
Add a global function to the PMD which dumps debug information to a
dedicated file.
The data can be printed in hexadecimal format or as a regular string.
The number of debug files per PMD entity is limited by a new PMD probe
parameter called max_dump_files_num.
The files are created in the /var/log directory or, if that fails, in
the current working directory.
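A minimal usage sketch of the new helper (the real callers are added by
later patches in this series; toy_dump_cq() and its arguments are
illustrative only):

    #include <stdio.h>
    #include <stdint.h>

    /* Prototype as declared by this patch in mlx5_rxtx.h. */
    void mlx5_dump_debug_information(const char *fname, const char *hex_title,
                                     const void *buf, unsigned int hex_len);

    static void
    toy_dump_cq(const void *cq_buf, unsigned int cq_size,
                uint16_t port, uint16_t queue)
    {
            char name[64];

            snprintf(name, sizeof(name), "dpdk_mlx5_port_%u_rxq_%u", port, queue);
            /* With hex_title == NULL the buffer is written as a plain string. */
            mlx5_dump_debug_information(name, NULL, "Unexpected CQE error", 0);
            /* With a title the buffer is written as a hexadecimal dump. */
            mlx5_dump_debug_information(name, "MLX5 Error CQ:", cq_buf, cq_size);
    }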
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
doc/guides/nics/mlx5.rst | 7 +++++++
drivers/net/mlx5/mlx5.c | 8 ++++++++
drivers/net/mlx5/mlx5.h | 1 +
drivers/net/mlx5/mlx5_rxtx.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
drivers/net/mlx5/mlx5_rxtx.h | 2 ++
5 files changed, 62 insertions(+)
diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 325e9f6..aa89bd9 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -507,6 +507,13 @@ Run-time configuration
representor=[0-2]
+- ``max_dump_files_num`` parameter [int]
+
+ The maximum number of files per PMD entity that may be created for debug information.
+ The files will be created in /var/log directory or in current directory.
+
+ set to 128 by default.
+
Firmware configuration
~~~~~~~~~~~~~~~~~~~~~~
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 9f5ec97..ebb49c8 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -116,6 +116,9 @@
/* Select port representors to instantiate. */
#define MLX5_REPRESENTOR "representor"
+/* Device parameter to configure the maximum number of dump files per queue. */
+#define MLX5_MAX_DUMP_FILES_NUM "max_dump_files_num"
+
#ifndef HAVE_IBV_MLX5_MOD_MPW
#define MLX5DV_CONTEXT_FLAGS_MPW_ALLOWED (1 << 2)
#define MLX5DV_CONTEXT_FLAGS_ENHANCED_MPW (1 << 3)
@@ -926,6 +929,8 @@ struct mlx5_dev_spawn_data {
config->dv_flow_en = !!tmp;
} else if (strcmp(MLX5_MR_EXT_MEMSEG_EN, key) == 0) {
config->mr_ext_memseg_en = !!tmp;
+ } else if (strcmp(MLX5_MAX_DUMP_FILES_NUM, key) == 0) {
+ config->max_dump_files_num = tmp;
} else {
DRV_LOG(WARNING, "%s: unknown parameter", key);
rte_errno = EINVAL;
@@ -970,6 +975,7 @@ struct mlx5_dev_spawn_data {
MLX5_DV_FLOW_EN,
MLX5_MR_EXT_MEMSEG_EN,
MLX5_REPRESENTOR,
+ MLX5_MAX_DUMP_FILES_NUM,
NULL,
};
struct rte_kvargs *kvlist;
@@ -1433,6 +1439,8 @@ struct mlx5_dev_spawn_data {
DRV_LOG(WARNING, "Multi-Packet RQ isn't supported");
config.mprq.enabled = 0;
}
+ if (config.max_dump_files_num == 0)
+ config.max_dump_files_num = 128;
eth_dev = rte_eth_dev_allocate(name);
if (eth_dev == NULL) {
DRV_LOG(ERR, "can not allocate rte ethdev");
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 3eaaafd..4c339d0 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -204,6 +204,7 @@ struct mlx5_dev_config {
unsigned int flow_prio; /* Number of flow priorities. */
unsigned int tso_max_payload_sz; /* Maximum TCP payload for TSO. */
unsigned int ind_table_max_size; /* Maximum indirection table size. */
+ unsigned int max_dump_files_num; /* Maximum dump files per queue. */
int txq_inline; /* Maximum packet size for inlining. */
int txqs_inline; /* Queue number threshold for inlining. */
int txqs_vec; /* Queue number threshold for vectorized Tx. */
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 3da3f62..2c8d066 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -524,6 +524,50 @@
return rx_queue_count(rxq);
}
+#define MLX5_SYSTEM_LOG_DIR "/var/log"
+/**
+ * Dump debug information to log file.
+ *
+ * @param fname
+ * The file name.
+ * @param hex_title
+ * If not NULL this string is printed as a header to the output
+ * and the output will be in hexadecimal view.
+ * @param buf
+ * This is the buffer address to print out.
+ * @param len
+ * The number of bytes to dump out.
+ */
+void
+mlx5_dump_debug_information(const char *fname, const char *hex_title,
+ const void *buf, unsigned int hex_len)
+{
+ FILE *fd;
+
+ MKSTR(path, "%s/%s", MLX5_SYSTEM_LOG_DIR, fname);
+ fd = fopen(path, "a+");
+ if (!fd) {
+ DRV_LOG(WARNING, "cannot open %s for debug dump\n",
+ path);
+ MKSTR(path2, "./%s", fname);
+ fd = fopen(path2, "a+");
+ if (!fd) {
+ DRV_LOG(ERR, "cannot open %s for debug dump\n",
+ path2);
+ return;
+ }
+ DRV_LOG(INFO, "New debug dump in file %s\n", path2);
+ } else {
+ DRV_LOG(INFO, "New debug dump in file %s\n", path);
+ }
+ if (hex_title)
+ rte_hexdump(fd, hex_title, buf, hex_len);
+ else
+ fprintf(fd, "%s", (const char *)buf);
+ fprintf(fd, "\n\n\n");
+ fclose(fd);
+}
+
/**
* DPDK callback for TX.
*
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 7bacdba..35e53fc 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -356,6 +356,8 @@ uint16_t removed_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts,
int mlx5_rx_descriptor_status(void *rx_queue, uint16_t offset);
int mlx5_tx_descriptor_status(void *tx_queue, uint16_t offset);
uint32_t mlx5_rx_queue_count(struct rte_eth_dev *dev, uint16_t rx_queue_id);
+void mlx5_dump_debug_information(const char *path, const char *title,
+ const void *buf, unsigned int len);
/* Vectorized version of mlx5_rxtx.c */
int mlx5_check_raw_vec_tx_support(struct rte_eth_dev *dev);
--
1.8.3.1
* [dpdk-dev] [PATCH v1 3/9] net/mlx5: fix device arguments error detection
2019-05-30 10:20 [dpdk-dev] [PATCH v1 0/9] mlx5: Handle data-path completions with error Matan Azrad
2019-05-30 10:20 ` [dpdk-dev] [PATCH v1 1/9] net/mlx5: remove Rx queues indexes correlation Matan Azrad
2019-05-30 10:20 ` [dpdk-dev] [PATCH v1 2/9] net/mlx5: add log file procedure for debug data Matan Azrad
@ 2019-05-30 10:20 ` Matan Azrad
2019-05-30 10:20 ` [dpdk-dev] [PATCH v1 4/9] net/mlx5: mitigate Rx doorbell memory barrier Matan Azrad
` (6 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Matan Azrad @ 2019-05-30 10:20 UTC (permalink / raw)
To: Shahaf Shuler, Yongseok Koh; +Cc: dev, stable
When bad device arguments are passed on the DPDK command line, the PMD
ignores all the command line arguments specified by the user and uses
the default values instead.
This behavior doesn't make sense because the user's intention is to
force some device parameters and to get an error back when the
arguments are problematic.
Stop probing and report an error in case of problematic command line
arguments.
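A rough sketch of the fixed parsing path (simplified; toy_args_parse()
is a hypothetical stand-in for the PMD's argument parser):

    #include <errno.h>
    #include <rte_errno.h>
    #include <rte_kvargs.h>

    static int
    toy_args_parse(const char *args, const char *const params[])
    {
            struct rte_kvargs *kvlist = rte_kvargs_parse(args, params);

            if (kvlist == NULL) {
                    /*
                     * Previously the PMD returned 0 here and silently fell
                     * back to the defaults; now bad arguments abort probing.
                     */
                    rte_errno = EINVAL;
                    return -rte_errno;
            }
            /* ... process the parameters as before ... */
            rte_kvargs_free(kvlist);
            return 0;
    }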
Fixes: e72dd09b614e ("net/mlx5: add support for configuration through kvargs")
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
drivers/net/mlx5/mlx5.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index ebb49c8..23e397e 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -986,8 +986,10 @@ struct mlx5_dev_spawn_data {
return 0;
/* Following UGLY cast is done to pass checkpatch. */
kvlist = rte_kvargs_parse(devargs->args, params);
- if (kvlist == NULL)
- return 0;
+ if (kvlist == NULL) {
+ rte_errno = EINVAL;
+ return -rte_errno;
+ }
/* Process parameters. */
for (i = 0; (params[i] != NULL); ++i) {
if (rte_kvargs_count(kvlist, params[i])) {
--
1.8.3.1
* [dpdk-dev] [PATCH v1 4/9] net/mlx5: mitigate Rx doorbell memory barrier
2019-05-30 10:20 [dpdk-dev] [PATCH v1 0/9] mlx5: Handle data-path completions with error Matan Azrad
` (2 preceding siblings ...)
2019-05-30 10:20 ` [dpdk-dev] [PATCH v1 3/9] net/mlx5: fix device arguments error detection Matan Azrad
@ 2019-05-30 10:20 ` Matan Azrad
2019-05-30 10:20 ` [dpdk-dev] [PATCH v1 5/9] net/mlx5: separate Rx queue initialization Matan Azrad
` (5 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Matan Azrad @ 2019-05-30 10:20 UTC (permalink / raw)
To: Shahaf Shuler, Yongseok Koh; +Cc: dev, stable
The RQ WQEs must be written to memory before the HW reads the RQ
doorbell, hence a memory barrier must be issued after writing the WQEs
and before writing the doorbell.
The current code uses the rte_wmb barrier, which orders all memory
stores, while the lighter rte_cio_wmb barrier is sufficient here
because the WQEs reside in local memory.
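The ordering pattern, sketched with hypothetical types (struct toy_rq
is not the real mlx5 RQ layout):

    #include <stdint.h>
    #include <rte_atomic.h>     /* rte_cio_wmb() */
    #include <rte_byteorder.h>  /* rte_cpu_to_be_32() */

    struct toy_rq {
            volatile uint64_t *wqes;  /* descriptors read by the device */
            volatile uint32_t *rq_db; /* doorbell record read by the device */
            uint32_t rq_ci;           /* doorbell counter */
    };

    static void
    toy_post_wqes(struct toy_rq *rq, const uint64_t *addrs, unsigned int n)
    {
            unsigned int i;

            for (i = 0; i < n; ++i)
                    rq->wqes[i] = addrs[i];           /* 1. write the WQEs */
            rq->rq_ci += n;
            /*
             * 2. Order the WQE stores against the doorbell store. The WQEs
             * are in local (coherent) memory, so rte_cio_wmb() is enough;
             * rte_wmb() would also order stores to device memory, which is
             * not needed here.
             */
            rte_cio_wmb();
            *rq->rq_db = rte_cpu_to_be_32(rq->rq_ci); /* 3. ring the doorbell */
    }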
CC: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
drivers/net/mlx5/mlx5_rxq.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index b248f38..282295f 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -1009,7 +1009,7 @@ struct mlx5_rxq_ibv *
rxq_data->decompressed = 0;
/* Update doorbell counter. */
rxq_data->rq_ci = wqe_n >> rxq_data->sges_n;
- rte_wmb();
+ rte_cio_wmb();
*rxq_data->rq_db = rte_cpu_to_be_32(rxq_data->rq_ci);
DRV_LOG(DEBUG, "port %u rxq %u updated with %p", dev->data->port_id,
idx, (void *)&tmpl);
--
1.8.3.1
* [dpdk-dev] [PATCH v1 5/9] net/mlx5: separate Rx queue initialization
2019-05-30 10:20 [dpdk-dev] [PATCH v1 0/9] mlx5: Handle data-path completions with error Matan Azrad
` (3 preceding siblings ...)
2019-05-30 10:20 ` [dpdk-dev] [PATCH v1 4/9] net/mlx5: mitigate Rx doorbell memory barrier Matan Azrad
@ 2019-05-30 10:20 ` Matan Azrad
2019-05-30 10:20 ` [dpdk-dev] [PATCH v1 6/9] net/mlx5: extend Rx completion with error handling Matan Azrad
` (4 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Matan Azrad @ 2019-05-30 10:20 UTC (permalink / raw)
To: Shahaf Shuler, Yongseok Koh; +Cc: dev, stable
Move the RQ WQE initialization code to a separate function so it can be
reused by the upcoming CQE error recovery.
CC: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
drivers/net/mlx5/mlx5_rxq.c | 43 ++---------------------------------
drivers/net/mlx5/mlx5_rxtx.c | 53 ++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 55 insertions(+), 41 deletions(-)
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 282295f..90e8c49 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -779,7 +779,6 @@ struct mlx5_rxq_ibv *
struct mlx5_rxq_ibv *tmpl;
struct mlx5dv_cq cq_info;
struct mlx5dv_rwq rwq;
- unsigned int i;
int ret = 0;
struct mlx5dv_obj obj;
struct mlx5_dev_config *config = &priv->config;
@@ -964,53 +963,15 @@ struct mlx5_rxq_ibv *
}
/* Fill the rings. */
rxq_data->wqes = rwq.buf;
- for (i = 0; (i != wqe_n); ++i) {
- volatile struct mlx5_wqe_data_seg *scat;
- uintptr_t addr;
- uint32_t byte_count;
-
- if (mprq_en) {
- struct mlx5_mprq_buf *buf = (*rxq_data->mprq_bufs)[i];
-
- scat = &((volatile struct mlx5_wqe_mprq *)
- rxq_data->wqes)[i].dseg;
- addr = (uintptr_t)mlx5_mprq_buf_addr(buf);
- byte_count = (1 << rxq_data->strd_sz_n) *
- (1 << rxq_data->strd_num_n);
- } else {
- struct rte_mbuf *buf = (*rxq_data->elts)[i];
-
- scat = &((volatile struct mlx5_wqe_data_seg *)
- rxq_data->wqes)[i];
- addr = rte_pktmbuf_mtod(buf, uintptr_t);
- byte_count = DATA_LEN(buf);
- }
- /* scat->addr must be able to store a pointer. */
- assert(sizeof(scat->addr) >= sizeof(uintptr_t));
- *scat = (struct mlx5_wqe_data_seg){
- .addr = rte_cpu_to_be_64(addr),
- .byte_count = rte_cpu_to_be_32(byte_count),
- .lkey = mlx5_rx_addr2mr(rxq_data, addr),
- };
- }
rxq_data->rq_db = rwq.dbrec;
rxq_data->cqe_n = log2above(cq_info.cqe_cnt);
- rxq_data->cq_ci = 0;
- rxq_data->consumed_strd = 0;
- rxq_data->rq_pi = 0;
- rxq_data->zip = (struct rxq_zip){
- .ai = 0,
- };
rxq_data->cq_db = cq_info.dbrec;
rxq_data->cqes = (volatile struct mlx5_cqe (*)[])(uintptr_t)cq_info.buf;
rxq_data->cq_uar = cq_info.cq_uar;
rxq_data->cqn = cq_info.cqn;
rxq_data->cq_arm_sn = 0;
- rxq_data->decompressed = 0;
- /* Update doorbell counter. */
- rxq_data->rq_ci = wqe_n >> rxq_data->sges_n;
- rte_cio_wmb();
- *rxq_data->rq_db = rte_cpu_to_be_32(rxq_data->rq_ci);
+ mlx5_rxq_initialize(rxq_data);
+ rxq_data->cq_ci = 0;
DRV_LOG(DEBUG, "port %u rxq %u updated with %p", dev->data->port_id,
idx, (void *)&tmpl);
rte_atomic32_inc(&tmpl->refcnt);
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 2c8d066..aec0185 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -1831,6 +1831,59 @@
}
/**
+ * Initialize Rx WQ and indexes.
+ *
+ * @param[in] rxq
+ * Pointer to RX queue structure.
+ */
+void
+mlx5_rxq_initialize(struct mlx5_rxq_data *rxq)
+{
+ const unsigned int wqe_n = 1 << rxq->elts_n;
+ unsigned int i;
+
+ for (i = 0; (i != wqe_n); ++i) {
+ volatile struct mlx5_wqe_data_seg *scat;
+ uintptr_t addr;
+ uint32_t byte_count;
+
+ if (mlx5_rxq_mprq_enabled(rxq)) {
+ struct mlx5_mprq_buf *buf = (*rxq->mprq_bufs)[i];
+
+ scat = &((volatile struct mlx5_wqe_mprq *)
+ rxq->wqes)[i].dseg;
+ addr = (uintptr_t)mlx5_mprq_buf_addr(buf);
+ byte_count = (1 << rxq->strd_sz_n) *
+ (1 << rxq->strd_num_n);
+ } else {
+ struct rte_mbuf *buf = (*rxq->elts)[i];
+
+ scat = &((volatile struct mlx5_wqe_data_seg *)
+ rxq->wqes)[i];
+ addr = rte_pktmbuf_mtod(buf, uintptr_t);
+ byte_count = DATA_LEN(buf);
+ }
+ /* scat->addr must be able to store a pointer. */
+ assert(sizeof(scat->addr) >= sizeof(uintptr_t));
+ *scat = (struct mlx5_wqe_data_seg){
+ .addr = rte_cpu_to_be_64(addr),
+ .byte_count = rte_cpu_to_be_32(byte_count),
+ .lkey = mlx5_rx_addr2mr(rxq, addr),
+ };
+ }
+ rxq->consumed_strd = 0;
+ rxq->decompressed = 0;
+ rxq->rq_pi = 0;
+ rxq->zip = (struct rxq_zip){
+ .ai = 0,
+ };
+ /* Update doorbell counter. */
+ rxq->rq_ci = wqe_n >> rxq->sges_n;
+ rte_cio_wmb();
+ *rxq->rq_db = rte_cpu_to_be_32(rxq->rq_ci);
+}
+
+/**
* Get size of the next packet for a given CQE. For compressed CQEs, the
* consumer index is updated only once all packets of the current one have
* been processed.
--
1.8.3.1
* [dpdk-dev] [PATCH v1 6/9] net/mlx5: extend Rx completion with error handling
2019-05-30 10:20 [dpdk-dev] [PATCH v1 0/9] mlx5: Handle data-path completions with error Matan Azrad
` (4 preceding siblings ...)
2019-05-30 10:20 ` [dpdk-dev] [PATCH v1 5/9] net/mlx5: separate Rx queue initialization Matan Azrad
@ 2019-05-30 10:20 ` Matan Azrad
2019-05-30 10:20 ` [dpdk-dev] [PATCH v1 7/9] net/mlx5: handle Tx completion with error Matan Azrad
` (3 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Matan Azrad @ 2019-05-30 10:20 UTC (permalink / raw)
To: Shahaf Shuler, Yongseok Koh; +Cc: dev, stable
When WQEs are posted to the HW to receive packets, the PMD may get a
completion report with an error from the HW, aka an error CQE, which is
associated with a bad WQE.
The error reason may be a bad address, a wrong lkey, a too-small buffer
size, etc., wrongly configured by the PMD or by the user.
Checking for every possible mistake to prevent error CQEs doesn't make
sense due to the performance impact; moreover, some error CQEs can be
triggered by packets coming from the wire, over which the DPDK
application has no control.
Most error CQE types move the RQ to the error state, which causes all
subsequently received packets to be dropped by the HW and completed
with a CQE flush error forever.
The current code detects these error CQEs and reports the errors to the
user via the statistics error counters, but without recovery, so once
the RQ enters the error state it never moves back to the ready state
and all subsequent packets are dropped.
Extend the error CQE handling to recover by moving the RQ state back to
ready and rearranging the RQ WQEs and the management variables
appropriately.
Sometimes the root cause of an error CQE is very hard to debug and may
be related to corner cases which are not easily reproducible, hence a
dump file with debug information is created for the first error CQEs;
their number can be configured by the max_dump_files_num PMD probe
parameter.
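The recovery flow can be summarized by the following condensed sketch
(hypothetical names; Verbs calls, debug dumps and mbuf handling are
elided -- the real logic is in mlx5_rx_err_handle() below):

    enum toy_rxq_err_state {
            TOY_ERR_NONE = 0,   /* normal operation */
            TOY_ERR_NEED_RESET, /* error CQE seen: move the RQ to RESET */
            TOY_ERR_NEED_READY, /* CQ drained: move the RQ back to RDY */
    };

    static int
    toy_rx_err_step(enum toy_rxq_err_state *state, int cq_empty)
    {
            switch (*state) {
            case TOY_ERR_NONE:
                    *state = TOY_ERR_NEED_RESET;
                    /* fall-through */
            case TOY_ERR_NEED_RESET:
                    /* modify_wq(RESET); dump CQ/RQ up to max_dump_files_num */
                    *state = TOY_ERR_NEED_READY;
                    /* fall-through */
            case TOY_ERR_NEED_READY:
                    if (!cq_empty)
                            return 0; /* keep draining the erroneous CQEs */
                    /* zero rq_db; modify_wq(RDY); re-init WQEs and indexes */
                    *state = TOY_ERR_NONE;
                    return 1;         /* recovered */
            }
            return -1;
    }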
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
drivers/net/mlx5/mlx5_rxtx.c | 328 +++++++++++++++++++++++++++------------
drivers/net/mlx5/mlx5_rxtx.h | 101 ++++--------
drivers/net/mlx5/mlx5_rxtx_vec.c | 5 +-
3 files changed, 266 insertions(+), 168 deletions(-)
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index aec0185..5369fc1 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -25,6 +25,7 @@
#include <rte_common.h>
#include <rte_branch_prediction.h>
#include <rte_ether.h>
+#include <rte_cycles.h>
#include "mlx5.h"
#include "mlx5_utils.h"
@@ -444,7 +445,7 @@
cq_ci = rxq->cq_ci;
}
cqe = &(*rxq->cqes)[cq_ci & cqe_cnt];
- while (check_cqe(cqe, cqe_n, cq_ci) == 0) {
+ while (check_cqe(cqe, cqe_n, cq_ci) != MLX5_CQE_STATUS_HW_OWN) {
int8_t op_own;
unsigned int n;
@@ -1884,6 +1885,130 @@
}
/**
+ * Handle a Rx error.
+ * The function inserts the RQ state to reset when the first error CQE is
+ * shown, then drains the CQ by the caller function loop. When the CQ is empty,
+ * it moves the RQ state to ready and initializes the RQ.
+ * Next CQE identification and error counting are in the caller responsibility.
+ *
+ * @param[in] rxq
+ * Pointer to RX queue structure.
+ * @param[in] mbuf_prepare
+ * Whether to prepare mbufs for the RQ.
+ *
+ * @return
+ * -1 in case of recovery error, otherwise the CQE status.
+ */
+int
+mlx5_rx_err_handle(struct mlx5_rxq_data *rxq, uint8_t mbuf_prepare)
+{
+ const uint16_t cqe_n = 1 << rxq->cqe_n;
+ const uint16_t cqe_mask = cqe_n - 1;
+ const unsigned int wqe_n = 1 << rxq->elts_n;
+ struct mlx5_rxq_ctrl *rxq_ctrl =
+ container_of(rxq, struct mlx5_rxq_ctrl, rxq);
+ struct ibv_wq_attr mod = {
+ .attr_mask = IBV_WQ_ATTR_STATE,
+ };
+ union {
+ volatile struct mlx5_cqe *cqe;
+ volatile struct mlx5_err_cqe *err_cqe;
+ } u = {
+ .cqe = &(*rxq->cqes)[rxq->cq_ci & cqe_mask],
+ };
+ int ret;
+
+ switch (rxq->err_state) {
+ case MLX5_RXQ_ERR_STATE_NO_ERROR:
+ rxq->err_state = MLX5_RXQ_ERR_STATE_NEED_RESET;
+ /* Fall-through */
+ case MLX5_RXQ_ERR_STATE_NEED_RESET:
+ if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+ return -1;
+ mod.wq_state = IBV_WQS_RESET;
+ ret = mlx5_glue->modify_wq(rxq_ctrl->ibv->wq, &mod);
+ if (ret) {
+ DRV_LOG(ERR, "Cannot change Rx WQ state to RESET %s\n",
+ strerror(errno));
+ return -1;
+ }
+ if (rxq_ctrl->dump_file_n <
+ rxq_ctrl->priv->config.max_dump_files_num) {
+ MKSTR(err_str, "Unexpected CQE error syndrome "
+ "0x%02x CQN = %u RQN = %u wqe_counter = %u"
+ " rq_ci = %u cq_ci = %u", u.err_cqe->syndrome,
+ rxq->cqn, rxq_ctrl->ibv->wq->wq_num,
+ rte_be_to_cpu_16(u.err_cqe->wqe_counter),
+ rxq->rq_ci << rxq->sges_n, rxq->cq_ci);
+ MKSTR(name, "dpdk_mlx5_port_%u_rxq_%u_%u",
+ rxq->port_id, rxq->idx, (uint32_t)rte_rdtsc());
+ mlx5_dump_debug_information(name, NULL, err_str, 0);
+ mlx5_dump_debug_information(name, "MLX5 Error CQ:",
+ (const void *)((uintptr_t)
+ rxq->cqes),
+ sizeof(*u.cqe) * cqe_n);
+ mlx5_dump_debug_information(name, "MLX5 Error RQ:",
+ (const void *)((uintptr_t)
+ rxq->wqes),
+ 16 * wqe_n);
+ rxq_ctrl->dump_file_n++;
+ }
+ rxq->err_state = MLX5_RXQ_ERR_STATE_NEED_READY;
+ /* Fall-through */
+ case MLX5_RXQ_ERR_STATE_NEED_READY:
+ ret = check_cqe(u.cqe, cqe_n, rxq->cq_ci);
+ if (ret == MLX5_CQE_STATUS_HW_OWN) {
+ rte_cio_wmb();
+ *rxq->cq_db = rte_cpu_to_be_32(rxq->cq_ci);
+ rte_cio_wmb();
+ /*
+ * The RQ consumer index must be zeroed while moving
+ * from RESET state to RDY state.
+ */
+ *rxq->rq_db = rte_cpu_to_be_32(0);
+ rte_cio_wmb();
+ mod.wq_state = IBV_WQS_RDY;
+ ret = mlx5_glue->modify_wq(rxq_ctrl->ibv->wq, &mod);
+ if (ret) {
+ DRV_LOG(ERR, "Cannot change Rx WQ state to RDY"
+ " %s\n", strerror(errno));
+ return -1;
+ }
+ if (mbuf_prepare) {
+ const uint16_t q_mask = wqe_n - 1;
+ uint16_t elt_idx;
+ struct rte_mbuf **elt;
+ int i;
+ unsigned int n = wqe_n - (rxq->rq_ci -
+ rxq->rq_pi);
+
+ for (i = 0; i < (int)n; ++i) {
+ elt_idx = (rxq->rq_ci + i) & q_mask;
+ elt = &(*rxq->elts)[elt_idx];
+ *elt = rte_mbuf_raw_alloc(rxq->mp);
+ if (!*elt) {
+ for (i--; i >= 0; --i) {
+ elt_idx = (rxq->rq_ci +
+ i) & q_mask;
+ elt = &(*rxq->elts)
+ [elt_idx];
+ rte_pktmbuf_free_seg
+ (*elt);
+ }
+ return -1;
+ }
+ }
+ }
+ mlx5_rxq_initialize(rxq);
+ rxq->err_state = MLX5_RXQ_ERR_STATE_NO_ERROR;
+ }
+ return ret;
+ default:
+ return -1;
+ }
+}
+
+/**
* Get size of the next packet for a given CQE. For compressed CQEs, the
* consumer index is updated only once all packets of the current one have
* been processed.
@@ -1897,8 +2022,7 @@
* written.
*
* @return
- * Packet size in bytes (0 if there is none), -1 in case of completion
- * with error.
+ * 0 in case of empty CQE, otherwise the packet size in bytes.
*/
static inline int
mlx5_rx_poll_len(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe,
@@ -1906,98 +2030,118 @@
{
struct rxq_zip *zip = &rxq->zip;
uint16_t cqe_n = cqe_cnt + 1;
- int len = 0;
+ int len;
uint16_t idx, end;
- /* Process compressed data in the CQE and mini arrays. */
- if (zip->ai) {
- volatile struct mlx5_mini_cqe8 (*mc)[8] =
- (volatile struct mlx5_mini_cqe8 (*)[8])
- (uintptr_t)(&(*rxq->cqes)[zip->ca & cqe_cnt].pkt_info);
-
- len = rte_be_to_cpu_32((*mc)[zip->ai & 7].byte_cnt);
- *mcqe = &(*mc)[zip->ai & 7];
- if ((++zip->ai & 7) == 0) {
- /* Invalidate consumed CQEs */
- idx = zip->ca;
- end = zip->na;
- while (idx != end) {
- (*rxq->cqes)[idx & cqe_cnt].op_own =
- MLX5_CQE_INVALIDATE;
- ++idx;
- }
- /*
- * Increment consumer index to skip the number of
- * CQEs consumed. Hardware leaves holes in the CQ
- * ring for software use.
- */
- zip->ca = zip->na;
- zip->na += 8;
- }
- if (unlikely(rxq->zip.ai == rxq->zip.cqe_cnt)) {
- /* Invalidate the rest */
- idx = zip->ca;
- end = zip->cq_ci;
-
- while (idx != end) {
- (*rxq->cqes)[idx & cqe_cnt].op_own =
- MLX5_CQE_INVALIDATE;
- ++idx;
- }
- rxq->cq_ci = zip->cq_ci;
- zip->ai = 0;
- }
- /* No compressed data, get next CQE and verify if it is compressed. */
- } else {
- int ret;
- int8_t op_own;
-
- ret = check_cqe(cqe, cqe_n, rxq->cq_ci);
- if (unlikely(ret == 1))
- return 0;
- ++rxq->cq_ci;
- op_own = cqe->op_own;
- rte_cio_rmb();
- if (MLX5_CQE_FORMAT(op_own) == MLX5_COMPRESSED) {
+ do {
+ len = 0;
+ /* Process compressed data in the CQE and mini arrays. */
+ if (zip->ai) {
volatile struct mlx5_mini_cqe8 (*mc)[8] =
(volatile struct mlx5_mini_cqe8 (*)[8])
- (uintptr_t)(&(*rxq->cqes)[rxq->cq_ci &
+ (uintptr_t)(&(*rxq->cqes)[zip->ca &
cqe_cnt].pkt_info);
- /* Fix endianness. */
- zip->cqe_cnt = rte_be_to_cpu_32(cqe->byte_cnt);
- /*
- * Current mini array position is the one returned by
- * check_cqe64().
- *
- * If completion comprises several mini arrays, as a
- * special case the second one is located 7 CQEs after
- * the initial CQE instead of 8 for subsequent ones.
- */
- zip->ca = rxq->cq_ci;
- zip->na = zip->ca + 7;
- /* Compute the next non compressed CQE. */
- --rxq->cq_ci;
- zip->cq_ci = rxq->cq_ci + zip->cqe_cnt;
- /* Get packet size to return. */
- len = rte_be_to_cpu_32((*mc)[0].byte_cnt);
- *mcqe = &(*mc)[0];
- zip->ai = 1;
- /* Prefetch all the entries to be invalidated */
- idx = zip->ca;
- end = zip->cq_ci;
- while (idx != end) {
- rte_prefetch0(&(*rxq->cqes)[(idx) & cqe_cnt]);
- ++idx;
+ len = rte_be_to_cpu_32((*mc)[zip->ai & 7].byte_cnt);
+ *mcqe = &(*mc)[zip->ai & 7];
+ if ((++zip->ai & 7) == 0) {
+ /* Invalidate consumed CQEs */
+ idx = zip->ca;
+ end = zip->na;
+ while (idx != end) {
+ (*rxq->cqes)[idx & cqe_cnt].op_own =
+ MLX5_CQE_INVALIDATE;
+ ++idx;
+ }
+ /*
+ * Increment consumer index to skip the number
+ * of CQEs consumed. Hardware leaves holes in
+ * the CQ ring for software use.
+ */
+ zip->ca = zip->na;
+ zip->na += 8;
+ }
+ if (unlikely(rxq->zip.ai == rxq->zip.cqe_cnt)) {
+ /* Invalidate the rest */
+ idx = zip->ca;
+ end = zip->cq_ci;
+
+ while (idx != end) {
+ (*rxq->cqes)[idx & cqe_cnt].op_own =
+ MLX5_CQE_INVALIDATE;
+ ++idx;
+ }
+ rxq->cq_ci = zip->cq_ci;
+ zip->ai = 0;
+ }
+ /*
+ * No compressed data, get next CQE and verify if it is
+ * compressed.
+ */
+ } else {
+ int ret;
+ int8_t op_own;
+
+ ret = check_cqe(cqe, cqe_n, rxq->cq_ci);
+ if (unlikely(ret != MLX5_CQE_STATUS_SW_OWN)) {
+ if (unlikely(ret == MLX5_CQE_STATUS_ERR ||
+ rxq->err_state)) {
+ ret = mlx5_rx_err_handle(rxq, 0);
+ if (ret == MLX5_CQE_STATUS_HW_OWN ||
+ ret == -1)
+ return 0;
+ } else {
+ return 0;
+ }
}
+ ++rxq->cq_ci;
+ op_own = cqe->op_own;
+ if (MLX5_CQE_FORMAT(op_own) == MLX5_COMPRESSED) {
+ volatile struct mlx5_mini_cqe8 (*mc)[8] =
+ (volatile struct mlx5_mini_cqe8 (*)[8])
+ (uintptr_t)(&(*rxq->cqes)
+ [rxq->cq_ci &
+ cqe_cnt].pkt_info);
+
+ /* Fix endianness. */
+ zip->cqe_cnt = rte_be_to_cpu_32(cqe->byte_cnt);
+ /*
+ * Current mini array position is the one
+ * returned by check_cqe64().
+ *
+ * If completion comprises several mini arrays,
+ * as a special case the second one is located
+ * 7 CQEs after the initial CQE instead of 8
+ * for subsequent ones.
+ */
+ zip->ca = rxq->cq_ci;
+ zip->na = zip->ca + 7;
+ /* Compute the next non compressed CQE. */
+ --rxq->cq_ci;
+ zip->cq_ci = rxq->cq_ci + zip->cqe_cnt;
+ /* Get packet size to return. */
+ len = rte_be_to_cpu_32((*mc)[0].byte_cnt);
+ *mcqe = &(*mc)[0];
+ zip->ai = 1;
+ /* Prefetch all to be invalidated */
+ idx = zip->ca;
+ end = zip->cq_ci;
+ while (idx != end) {
+ rte_prefetch0(&(*rxq->cqes)[(idx) &
+ cqe_cnt]);
+ ++idx;
+ }
+ } else {
+ len = rte_be_to_cpu_32(cqe->byte_cnt);
+ }
+ }
+ if (unlikely(rxq->err_state)) {
+ cqe = &(*rxq->cqes)[rxq->cq_ci & cqe_cnt];
+ ++rxq->stats.idropped;
} else {
- len = rte_be_to_cpu_32(cqe->byte_cnt);
+ return len;
}
- /* Error while receiving packet. */
- if (unlikely(MLX5_CQE_OPCODE(op_own) == MLX5_CQE_RESP_ERR))
- return -1;
- }
- return len;
+ } while (1);
}
/**
@@ -2140,12 +2284,6 @@
rte_mbuf_raw_free(rep);
break;
}
- if (unlikely(len == -1)) {
- /* RX error, packet is likely too large. */
- rte_mbuf_raw_free(rep);
- ++rxq->stats.idropped;
- goto skip;
- }
pkt = seg;
assert(len >= (rxq->crc_present << 2));
pkt->ol_flags = 0;
@@ -2188,7 +2326,6 @@
pkt = NULL;
--pkts_n;
++i;
-skip:
/* Align consumer index to the next stride. */
rq_ci >>= sges_n;
++rq_ci;
@@ -2321,11 +2458,6 @@
ret = mlx5_rx_poll_len(rxq, cqe, cq_mask, &mcqe);
if (!ret)
break;
- if (unlikely(ret == -1)) {
- /* RX error, packet is likely too large. */
- ++rxq->stats.idropped;
- continue;
- }
byte_cnt = ret;
strd_cnt = (byte_cnt & MLX5_MPRQ_STRIDE_NUM_MASK) >>
MLX5_MPRQ_STRIDE_NUM_SHIFT;
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 35e53fc..d944fbe 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -36,6 +36,7 @@
#include "mlx5_autoconf.h"
#include "mlx5_defs.h"
#include "mlx5_prm.h"
+#include "mlx5_glue.h"
/* Support tunnel matching. */
#define MLX5_FLOW_TUNNEL 5
@@ -78,6 +79,12 @@ struct mlx5_mprq_buf {
/* Get pointer to the first stride. */
#define mlx5_mprq_buf_addr(ptr) ((ptr) + 1)
+enum mlx5_rxq_err_state {
+ MLX5_RXQ_ERR_STATE_NO_ERROR = 0,
+ MLX5_RXQ_ERR_STATE_NEED_RESET,
+ MLX5_RXQ_ERR_STATE_NEED_READY,
+};
+
/* RX queue descriptor. */
struct mlx5_rxq_data {
unsigned int csum:1; /* Enable checksum offloading. */
@@ -92,7 +99,8 @@ struct mlx5_rxq_data {
unsigned int strd_num_n:5; /* Log 2 of the number of stride. */
unsigned int strd_sz_n:4; /* Log 2 of stride size. */
unsigned int strd_shift_en:1; /* Enable 2bytes shift on a stride. */
- unsigned int :6; /* Remaining bits. */
+ unsigned int err_state:2; /* enum mlx5_rxq_err_state. */
+ unsigned int :4; /* Remaining bits. */
volatile uint32_t *rq_db;
volatile uint32_t *cq_db;
uint16_t port_id;
@@ -153,6 +161,7 @@ struct mlx5_rxq_ctrl {
unsigned int irq:1; /* Whether IRQ is enabled. */
uint32_t flow_mark_n; /* Number of Mark/Flag flows using this Queue. */
uint32_t flow_tunnels_n[MLX5_FLOW_TUNNEL]; /* Tunnels counters. */
+ uint16_t dump_file_n; /* Number of dump files. */
};
/* Indirection table. */
@@ -345,6 +354,9 @@ uint16_t mlx5_tx_burst_mpw_inline(void *dpdk_txq, struct rte_mbuf **pkts,
uint16_t mlx5_tx_burst_empw(void *dpdk_txq, struct rte_mbuf **pkts,
uint16_t pkts_n);
uint16_t mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n);
+void mlx5_rxq_initialize(struct mlx5_rxq_data *rxq);
+__rte_noinline int mlx5_rx_err_handle(struct mlx5_rxq_data *rxq,
+ uint8_t mbuf_prepare);
void mlx5_mprq_buf_free_cb(void *addr, void *opaque);
void mlx5_mprq_buf_free(struct mlx5_mprq_buf *buf);
uint16_t mlx5_rx_burst_mprq(void *dpdk_rxq, struct rte_mbuf **pkts,
@@ -439,32 +451,12 @@ int mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr, uint64_t iova,
#define mlx5_uar_write64(val, dst, lock) __mlx5_uar_write64(val, dst, lock)
#endif
-#ifndef NDEBUG
-/**
- * Verify or set magic value in CQE.
- *
- * @param cqe
- * Pointer to CQE.
- *
- * @return
- * 0 the first time.
- */
-static inline int
-check_cqe_seen(volatile struct mlx5_cqe *cqe)
-{
- static const uint8_t magic[] = "seen";
- volatile uint8_t (*buf)[sizeof(cqe->rsvd1)] = &cqe->rsvd1;
- int ret = 1;
- unsigned int i;
-
- for (i = 0; i < sizeof(magic) && i < sizeof(*buf); ++i)
- if (!ret || (*buf)[i] != magic[i]) {
- ret = 0;
- (*buf)[i] = magic[i];
- }
- return ret;
-}
-#endif /* NDEBUG */
+/* CQE status. */
+enum mlx5_cqe_status {
+ MLX5_CQE_STATUS_SW_OWN,
+ MLX5_CQE_STATUS_HW_OWN,
+ MLX5_CQE_STATUS_ERR,
+};
/**
* Check whether CQE is valid.
@@ -477,51 +469,24 @@ int mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr, uint64_t iova,
* Consumer index.
*
* @return
- * 0 on success, 1 on failure.
+ * The CQE status.
*/
-static __rte_always_inline int
-check_cqe(volatile struct mlx5_cqe *cqe,
- unsigned int cqes_n, const uint16_t ci)
+static __rte_always_inline enum mlx5_cqe_status
+check_cqe(volatile struct mlx5_cqe *cqe, const uint16_t cqes_n,
+ const uint16_t ci)
{
- uint16_t idx = ci & cqes_n;
- uint8_t op_own = cqe->op_own;
- uint8_t op_owner = MLX5_CQE_OWNER(op_own);
- uint8_t op_code = MLX5_CQE_OPCODE(op_own);
+ const uint16_t idx = ci & cqes_n;
+ const uint8_t op_own = cqe->op_own;
+ const uint8_t op_owner = MLX5_CQE_OWNER(op_own);
+ const uint8_t op_code = MLX5_CQE_OPCODE(op_own);
if (unlikely((op_owner != (!!(idx))) || (op_code == MLX5_CQE_INVALID)))
- return 1; /* No CQE. */
-#ifndef NDEBUG
- if ((op_code == MLX5_CQE_RESP_ERR) ||
- (op_code == MLX5_CQE_REQ_ERR)) {
- volatile struct mlx5_err_cqe *err_cqe = (volatile void *)cqe;
- uint8_t syndrome = err_cqe->syndrome;
-
- if ((syndrome == MLX5_CQE_SYNDROME_LOCAL_LENGTH_ERR) ||
- (syndrome == MLX5_CQE_SYNDROME_REMOTE_ABORTED_ERR))
- return 0;
- if (!check_cqe_seen(cqe)) {
- DRV_LOG(ERR,
- "unexpected CQE error %u (0x%02x) syndrome"
- " 0x%02x",
- op_code, op_code, syndrome);
- rte_hexdump(stderr, "MLX5 Error CQE:",
- (const void *)((uintptr_t)err_cqe),
- sizeof(*cqe));
- }
- return 1;
- } else if ((op_code != MLX5_CQE_RESP_SEND) &&
- (op_code != MLX5_CQE_REQ)) {
- if (!check_cqe_seen(cqe)) {
- DRV_LOG(ERR, "unexpected CQE opcode %u (0x%02x)",
- op_code, op_code);
- rte_hexdump(stderr, "MLX5 CQE:",
- (const void *)((uintptr_t)cqe),
- sizeof(*cqe));
- }
- return 1;
- }
-#endif /* NDEBUG */
- return 0;
+ return MLX5_CQE_STATUS_HW_OWN;
+ rte_cio_rmb();
+ if (unlikely(op_code == MLX5_CQE_RESP_ERR ||
+ op_code == MLX5_CQE_REQ_ERR))
+ return MLX5_CQE_STATUS_ERR;
+ return MLX5_CQE_STATUS_SW_OWN;
}
/**
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec.c b/drivers/net/mlx5/mlx5_rxtx_vec.c
index 9a3a5ae..073044f 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec.c
+++ b/drivers/net/mlx5/mlx5_rxtx_vec.c
@@ -197,7 +197,7 @@
for (i = 0; i < pkts_n; ++i) {
struct rte_mbuf *pkt = pkts[i];
- if (pkt->packet_type == RTE_PTYPE_ALL_MASK) {
+ if (pkt->packet_type == RTE_PTYPE_ALL_MASK || rxq->err_state) {
#ifdef MLX5_PMD_SOFT_COUNTERS
err_bytes += PKT_LEN(pkt);
#endif
@@ -212,6 +212,7 @@
rxq->stats.ipackets -= (pkts_n - n);
rxq->stats.ibytes -= err_bytes;
#endif
+ mlx5_rx_err_handle(rxq, 1);
return n;
}
@@ -236,7 +237,7 @@
uint64_t err = 0;
nb_rx = rxq_burst_v(rxq, pkts, pkts_n, &err);
- if (unlikely(err))
+ if (unlikely(err | rxq->err_state))
nb_rx = rxq_handle_pending_error(rxq, pkts, nb_rx);
return nb_rx;
}
--
1.8.3.1
* [dpdk-dev] [PATCH v1 7/9] net/mlx5: handle Tx completion with error
2019-05-30 10:20 [dpdk-dev] [PATCH v1 0/9] mlx5: Handle data-path completions with error Matan Azrad
` (5 preceding siblings ...)
2019-05-30 10:20 ` [dpdk-dev] [PATCH v1 6/9] net/mlx5: extend Rx completion with error handling Matan Azrad
@ 2019-05-30 10:20 ` Matan Azrad
2019-05-30 10:20 ` [dpdk-dev] [PATCH v1 8/9] net/mlx5: recover secondary process Rx errors Matan Azrad
` (2 subsequent siblings)
9 siblings, 0 replies; 12+ messages in thread
From: Matan Azrad @ 2019-05-30 10:20 UTC (permalink / raw)
To: Shahaf Shuler, Yongseok Koh; +Cc: dev, stable
When WQEs are posted to the HW to send packets, the PMD may get a
completion report with an error from the HW, aka an error CQE, which is
associated with a bad WQE.
The error reason may be a bad address, a wrong lkey, bad sizes, etc.,
wrongly configured by the PMD or by the user.
Checking for every possible mistake to prevent error CQEs doesn't make
sense due to the performance impact and the huge complexity.
The error CQEs move the SQ to the error state, which causes all
subsequently posted WQEs to be completed with a CQE flush error
forever.
Currently, the PMD doesn't handle Tx error CQEs and may even crash when
one of them appears.
Extend the Tx data-path to detect these error CQEs, to report them via
the statistics error counters, and to recover the SQ by moving its
state back to ready and adjusting the management variables
appropriately.
Sometimes the root cause of an error CQE is very hard to debug and may
be related to corner cases which are not easily reproducible, hence a
dump file with debug information is created for the first error CQEs;
their number can be configured by the max_dump_files_num PMD probe
parameter.
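A hedged sketch of the SQ recovery step, cycling the QP back to the
ready state (toy_tx_recover_qp() is illustrative; it assumes a raw
packet QP, as used by the driver, where only the state and port
attributes are required, and the real code also resets wqe_ci, wqe_pi
and elts_comp):

    #include <string.h>
    #include <infiniband/verbs.h>

    static int
    toy_tx_recover_qp(struct ibv_qp *qp)
    {
            static const enum ibv_qp_state states[] = {
                    IBV_QPS_RESET, IBV_QPS_INIT, IBV_QPS_RTR, IBV_QPS_RTS,
            };
            unsigned int i;

            for (i = 0; i < 4; ++i) {
                    struct ibv_qp_attr attr;
                    int flags = IBV_QP_STATE;

                    memset(&attr, 0, sizeof(attr));
                    attr.qp_state = states[i];
                    if (states[i] == IBV_QPS_INIT) {
                            attr.port_num = 1; /* same fixed port as the patch */
                            flags |= IBV_QP_PORT;
                    }
                    if (ibv_modify_qp(qp, &attr, flags))
                            return -1; /* retry on the next error CQE */
            }
            return 0;
    }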
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
drivers/net/mlx5/mlx5_prm.h | 11 +++
drivers/net/mlx5/mlx5_rxtx.c | 166 ++++++++++++++++++++++++++++++++--
drivers/net/mlx5/mlx5_rxtx.h | 81 ++++++++++-------
drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 10 +-
drivers/net/mlx5/mlx5_rxtx_vec_sse.h | 10 +-
drivers/net/mlx5/mlx5_txq.c | 4 +-
6 files changed, 231 insertions(+), 51 deletions(-)
diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
index 8c42380..22db86b 100644
--- a/drivers/net/mlx5/mlx5_prm.h
+++ b/drivers/net/mlx5/mlx5_prm.h
@@ -153,6 +153,17 @@
/* Maximum number of DS in WQE. */
#define MLX5_DSEG_MAX 63
+/* The completion mode offset in the WQE control segment line 2. */
+#define MLX5_COMP_MODE_OFFSET 2
+
+/* Completion mode. */
+enum mlx5_completion_mode {
+ MLX5_COMP_ONLY_ERR = 0x0,
+ MLX5_COMP_ONLY_FIRST_ERR = 0x1,
+ MLX5_COMP_ALWAYS = 0x2,
+ MLX5_COMP_CQE_AND_EQE = 0x3,
+};
+
/* Subset of struct mlx5_wqe_eth_seg. */
struct mlx5_wqe_eth_seg_small {
uint32_t rsvd0;
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 5369fc1..36e2dd3 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -570,6 +570,141 @@
}
/**
+ * Move QP from error state to running state.
+ *
+ * @param txq
+ * Pointer to TX queue structure.
+ * @param qp
+ * The qp pointer for recovery.
+ *
+ * @return
+ * 0 on success, else errno value.
+ */
+static int
+tx_recover_qp(struct mlx5_txq_data *txq, struct ibv_qp *qp)
+{
+ int ret;
+ struct ibv_qp_attr mod = {
+ .qp_state = IBV_QPS_RESET,
+ .port_num = 1,
+ };
+ ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);
+ if (ret) {
+ DRV_LOG(ERR, "Cannot change the Tx QP state to RESET %d\n",
+ ret);
+ return ret;
+ }
+ mod.qp_state = IBV_QPS_INIT;
+ ret = mlx5_glue->modify_qp(qp, &mod,
+ (IBV_QP_STATE | IBV_QP_PORT));
+ if (ret) {
+ DRV_LOG(ERR, "Cannot change Tx QP state to INIT %d\n", ret);
+ return ret;
+ }
+ mod.qp_state = IBV_QPS_RTR;
+ ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);
+ if (ret) {
+ DRV_LOG(ERR, "Cannot change Tx QP state to RTR %d\n", ret);
+ return ret;
+ }
+ mod.qp_state = IBV_QPS_RTS;
+ ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);
+ if (ret) {
+ DRV_LOG(ERR, "Cannot change Tx QP state to RTS %d\n", ret);
+ return ret;
+ }
+ txq->wqe_ci = 0;
+ txq->wqe_pi = 0;
+ txq->elts_comp = 0;
+ return 0;
+}
+
+/* Return 1 if the error CQE is signed otherwise, sign it and return 0. */
+static int
+check_err_cqe_seen(volatile struct mlx5_err_cqe *err_cqe)
+{
+ static const uint8_t magic[] = "seen";
+ int ret = 1;
+ unsigned int i;
+
+ for (i = 0; i < sizeof(magic); ++i)
+ if (!ret || err_cqe->rsvd1[i] != magic[i]) {
+ ret = 0;
+ err_cqe->rsvd1[i] = magic[i];
+ }
+ return ret;
+}
+
+/**
+ * Handle error CQE.
+ *
+ * @param txq
+ * Pointer to TX queue structure.
+ * @param error_cqe
+ * Pointer to the error CQE.
+ *
+ * @return
+ * The last Tx buffer element to free.
+ */
+uint16_t
+mlx5_tx_error_cqe_handle(struct mlx5_txq_data *txq,
+ volatile struct mlx5_err_cqe *err_cqe)
+{
+ if (err_cqe->syndrome != MLX5_CQE_SYNDROME_WR_FLUSH_ERR) {
+ const uint16_t wqe_m = ((1 << txq->wqe_n) - 1);
+ struct mlx5_txq_ctrl *txq_ctrl =
+ container_of(txq, struct mlx5_txq_ctrl, txq);
+ uint16_t new_wqe_pi = rte_be_to_cpu_16(err_cqe->wqe_counter);
+ int seen = check_err_cqe_seen(err_cqe);
+
+ if (!seen && txq_ctrl->dump_file_n <
+ txq_ctrl->priv->config.max_dump_files_num) {
+ MKSTR(err_str, "Unexpected CQE error syndrome "
+ "0x%02x CQN = %u SQN = %u wqe_counter = %u "
+ "wq_ci = %u cq_ci = %u", err_cqe->syndrome,
+ txq_ctrl->cqn, txq->qp_num_8s >> 8,
+ rte_be_to_cpu_16(err_cqe->wqe_counter),
+ txq->wqe_ci, txq->cq_ci);
+ MKSTR(name, "dpdk_mlx5_port_%u_txq_%u_index_%u_%u",
+ PORT_ID(txq_ctrl->priv), txq->idx,
+ txq_ctrl->dump_file_n, (uint32_t)rte_rdtsc());
+ mlx5_dump_debug_information(name, NULL, err_str, 0);
+ mlx5_dump_debug_information(name, "MLX5 Error CQ:",
+ (const void *)((uintptr_t)
+ &(*txq->cqes)[0]),
+ sizeof(*err_cqe) *
+ (1 << txq->cqe_n));
+ mlx5_dump_debug_information(name, "MLX5 Error SQ:",
+ (const void *)((uintptr_t)
+ tx_mlx5_wqe(txq, 0)),
+ MLX5_WQE_SIZE *
+ (1 << txq->wqe_n));
+ txq_ctrl->dump_file_n++;
+ }
+ if (!seen)
+ /*
+ * Count errors in WQEs units.
+ * Later it can be improved to count error packets,
+ * for example, by SQ parsing to find how much packets
+ * should be counted for each WQE.
+ */
+ txq->stats.oerrors += ((txq->wqe_ci & wqe_m) -
+ new_wqe_pi) & wqe_m;
+ if ((rte_eal_process_type() == RTE_PROC_PRIMARY) &&
+ tx_recover_qp(txq, txq_ctrl->ibv->qp) == 0) {
+ txq->cq_ci++;
+ /* Release all the remaining buffers. */
+ return txq->elts_head;
+ }
+ /* Recovering failed - try again later on the same WQE. */
+ } else {
+ txq->cq_ci++;
+ }
+ /* Do not release buffers. */
+ return txq->elts_tail;
+}
+
+/**
* DPDK callback for TX.
*
* @param dpdk_txq
@@ -709,7 +844,9 @@
wqe->ctrl = (rte_v128u32_t){
rte_cpu_to_be_32(txq->wqe_ci << 8),
rte_cpu_to_be_32(txq->qp_num_8s | 1),
- 0,
+ rte_cpu_to_be_32
+ (MLX5_COMP_ONLY_FIRST_ERR <<
+ MLX5_COMP_MODE_OFFSET),
0,
};
ds = 1;
@@ -882,7 +1019,8 @@
rte_cpu_to_be_32((txq->wqe_ci << 8) |
MLX5_OPCODE_TSO),
rte_cpu_to_be_32(txq->qp_num_8s | ds),
- 0,
+ rte_cpu_to_be_32(MLX5_COMP_ONLY_FIRST_ERR <<
+ MLX5_COMP_MODE_OFFSET),
0,
};
wqe->eseg = (rte_v128u32_t){
@@ -897,7 +1035,8 @@
rte_cpu_to_be_32((txq->wqe_ci << 8) |
MLX5_OPCODE_SEND),
rte_cpu_to_be_32(txq->qp_num_8s | ds),
- 0,
+ rte_cpu_to_be_32(MLX5_COMP_ONLY_FIRST_ERR <<
+ MLX5_COMP_MODE_OFFSET),
0,
};
wqe->eseg = (rte_v128u32_t){
@@ -926,7 +1065,8 @@
/* A CQE slot must always be available. */
assert((1u << txq->cqe_n) - (txq->cq_pi++ - txq->cq_ci));
/* Request completion on last WQE. */
- last_wqe->ctrl2 = rte_cpu_to_be_32(8);
+ last_wqe->ctrl2 = rte_cpu_to_be_32(MLX5_COMP_ALWAYS <<
+ MLX5_COMP_MODE_OFFSET);
/* Save elts_head in unused "immediate" field of WQE. */
last_wqe->ctrl3 = txq->elts_head;
txq->elts_comp = 0;
@@ -973,7 +1113,8 @@
mpw->wqe->ctrl[0] = rte_cpu_to_be_32((MLX5_OPC_MOD_MPW << 24) |
(txq->wqe_ci << 8) |
MLX5_OPCODE_TSO);
- mpw->wqe->ctrl[2] = 0;
+ mpw->wqe->ctrl[2] = rte_cpu_to_be_32(MLX5_COMP_ONLY_FIRST_ERR <<
+ MLX5_COMP_MODE_OFFSET);
mpw->wqe->ctrl[3] = 0;
mpw->data.dseg[0] = (volatile struct mlx5_wqe_data_seg *)
(((uintptr_t)mpw->wqe) + (2 * MLX5_WQE_DWORD_SIZE));
@@ -1145,7 +1286,8 @@
/* A CQE slot must always be available. */
assert((1u << txq->cqe_n) - (txq->cq_pi++ - txq->cq_ci));
/* Request completion on last WQE. */
- wqe->ctrl[2] = rte_cpu_to_be_32(8);
+ wqe->ctrl[2] = rte_cpu_to_be_32(MLX5_COMP_ALWAYS <<
+ MLX5_COMP_MODE_OFFSET);
/* Save elts_head in unused "immediate" field of WQE. */
wqe->ctrl[3] = elts_head;
txq->elts_comp = 0;
@@ -1189,7 +1331,8 @@
mpw->wqe->ctrl[0] = rte_cpu_to_be_32((MLX5_OPC_MOD_MPW << 24) |
(txq->wqe_ci << 8) |
MLX5_OPCODE_TSO);
- mpw->wqe->ctrl[2] = 0;
+ mpw->wqe->ctrl[2] = rte_cpu_to_be_32(MLX5_COMP_ONLY_FIRST_ERR <<
+ MLX5_COMP_MODE_OFFSET);
mpw->wqe->ctrl[3] = 0;
mpw->wqe->eseg.mss = rte_cpu_to_be_16(length);
mpw->wqe->eseg.inline_hdr_sz = 0;
@@ -1447,7 +1590,8 @@
/* A CQE slot must always be available. */
assert((1u << txq->cqe_n) - (txq->cq_pi++ - txq->cq_ci));
/* Request completion on last WQE. */
- wqe->ctrl[2] = rte_cpu_to_be_32(8);
+ wqe->ctrl[2] = rte_cpu_to_be_32(MLX5_COMP_ALWAYS <<
+ MLX5_COMP_MODE_OFFSET);
/* Save elts_head in unused "immediate" field of WQE. */
wqe->ctrl[3] = elts_head;
txq->elts_comp = 0;
@@ -1491,7 +1635,8 @@
rte_cpu_to_be_32((MLX5_OPC_MOD_ENHANCED_MPSW << 24) |
(txq->wqe_ci << 8) |
MLX5_OPCODE_ENHANCED_MPSW);
- mpw->wqe->ctrl[2] = 0;
+ mpw->wqe->ctrl[2] = rte_cpu_to_be_32(MLX5_COMP_ONLY_FIRST_ERR <<
+ MLX5_COMP_MODE_OFFSET);
mpw->wqe->ctrl[3] = 0;
memset((void *)(uintptr_t)&mpw->wqe->eseg, 0, MLX5_WQE_DWORD_SIZE);
if (unlikely(padding)) {
@@ -1738,7 +1883,8 @@
/* A CQE slot must always be available. */
assert((1u << txq->cqe_n) - (txq->cq_pi++ - txq->cq_ci));
/* Request completion on last WQE. */
- wqe->ctrl[2] = rte_cpu_to_be_32(8);
+ wqe->ctrl[2] = rte_cpu_to_be_32(MLX5_COMP_ALWAYS <<
+ MLX5_COMP_MODE_OFFSET);
/* Save elts_head in unused "immediate" field of WQE. */
wqe->ctrl[3] = elts_head;
txq->elts_comp = 0;
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index d944fbe..f4538eb 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -248,6 +248,8 @@ struct mlx5_txq_ctrl {
struct mlx5_priv *priv; /* Back pointer to private data. */
off_t uar_mmap_offset; /* UAR mmap offset for non-primary process. */
void *bf_reg; /* BlueFlame register from Verbs. */
+ uint32_t cqn; /* CQ number. */
+ uint16_t dump_file_n; /* Number of dump files. */
};
#define MLX5_TX_BFREG(txq) \
@@ -353,6 +355,8 @@ uint16_t mlx5_tx_burst_mpw_inline(void *dpdk_txq, struct rte_mbuf **pkts,
uint16_t pkts_n);
uint16_t mlx5_tx_burst_empw(void *dpdk_txq, struct rte_mbuf **pkts,
uint16_t pkts_n);
+__rte_noinline uint16_t mlx5_tx_error_cqe_handle(struct mlx5_txq_data *txq,
+ volatile struct mlx5_err_cqe *err_cqe);
uint16_t mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n);
void mlx5_rxq_initialize(struct mlx5_rxq_data *rxq);
__rte_noinline int mlx5_rx_err_handle(struct mlx5_rxq_data *rxq,
@@ -508,6 +512,51 @@ enum mlx5_cqe_status {
}
/**
+ * Handle the next CQE.
+ *
+ * @param txq
+ * Pointer to TX queue structure.
+ *
+ * @return
+ * The last Tx buffer element to free.
+ */
+static __rte_always_inline uint16_t
+mlx5_tx_cqe_handle(struct mlx5_txq_data *txq)
+{
+ const unsigned int cqe_n = 1 << txq->cqe_n;
+ const unsigned int cqe_cnt = cqe_n - 1;
+ uint16_t last_elts;
+ union {
+ volatile struct mlx5_cqe *cqe;
+ volatile struct mlx5_err_cqe *err_cqe;
+ } u = {
+ .cqe = &(*txq->cqes)[txq->cq_ci & cqe_cnt],
+ };
+ int ret = check_cqe(u.cqe, cqe_n, txq->cq_ci);
+
+ if (unlikely(ret != MLX5_CQE_STATUS_SW_OWN)) {
+ if (unlikely(ret == MLX5_CQE_STATUS_ERR))
+ last_elts = mlx5_tx_error_cqe_handle(txq, u.err_cqe);
+ else
+ /* Do not release buffers. */
+ return txq->elts_tail;
+ } else {
+ uint16_t new_wqe_pi = rte_be_to_cpu_16(u.cqe->wqe_counter);
+ volatile struct mlx5_wqe_ctrl *ctrl =
+ (volatile struct mlx5_wqe_ctrl *)
+ tx_mlx5_wqe(txq, new_wqe_pi);
+
+ /* Release completion burst buffers. */
+ last_elts = ctrl->ctrl3;
+ txq->wqe_pi = new_wqe_pi;
+ txq->cq_ci++;
+ }
+ rte_compiler_barrier();
+ *txq->cq_db = rte_cpu_to_be_32(txq->cq_ci);
+ return last_elts;
+}
+
+/**
* Manage TX completions.
*
* When sending a burst, mlx5_tx_burst() posts several WRs.
@@ -520,39 +569,13 @@ enum mlx5_cqe_status {
{
const uint16_t elts_n = 1 << txq->elts_n;
const uint16_t elts_m = elts_n - 1;
- const unsigned int cqe_n = 1 << txq->cqe_n;
- const unsigned int cqe_cnt = cqe_n - 1;
uint16_t elts_free = txq->elts_tail;
uint16_t elts_tail;
- uint16_t cq_ci = txq->cq_ci;
- volatile struct mlx5_cqe *cqe = NULL;
- volatile struct mlx5_wqe_ctrl *ctrl;
struct rte_mbuf *m, *free[elts_n];
struct rte_mempool *pool = NULL;
unsigned int blk_n = 0;
- cqe = &(*txq->cqes)[cq_ci & cqe_cnt];
- if (unlikely(check_cqe(cqe, cqe_n, cq_ci)))
- return;
-#ifndef NDEBUG
- if ((MLX5_CQE_OPCODE(cqe->op_own) == MLX5_CQE_RESP_ERR) ||
- (MLX5_CQE_OPCODE(cqe->op_own) == MLX5_CQE_REQ_ERR)) {
- if (!check_cqe_seen(cqe)) {
- DRV_LOG(ERR, "unexpected error CQE, Tx stopped");
- rte_hexdump(stderr, "MLX5 TXQ:",
- (const void *)((uintptr_t)txq->wqes),
- ((1 << txq->wqe_n) *
- MLX5_WQE_SIZE));
- }
- return;
- }
-#endif /* NDEBUG */
- ++cq_ci;
- rte_cio_rmb();
- txq->wqe_pi = rte_be_to_cpu_16(cqe->wqe_counter);
- ctrl = (volatile struct mlx5_wqe_ctrl *)
- tx_mlx5_wqe(txq, txq->wqe_pi);
- elts_tail = ctrl->ctrl3;
+ elts_tail = mlx5_tx_cqe_handle(txq);
assert((elts_tail & elts_m) < (1 << txq->wqe_n));
/* Free buffers. */
while (elts_free != elts_tail) {
@@ -583,11 +606,7 @@ enum mlx5_cqe_status {
++elts_free;
}
#endif
- txq->cq_ci = cq_ci;
txq->elts_tail = elts_tail;
- /* Update the consumer index. */
- rte_compiler_barrier();
- *txq->cq_db = rte_cpu_to_be_32(cq_ci);
}
/**
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
index 6a1b2bb..fd64a6e 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
@@ -165,7 +165,7 @@
ctrl = vreinterpretq_u8_u32((uint32x4_t) {
MLX5_OPC_MOD_MPW << 24 |
txq->wqe_ci << 8 | MLX5_OPCODE_TSO,
- txq->qp_num_8s | ds, 0, 0});
+ txq->qp_num_8s | ds, 4, 0});
ctrl = vqtbl1q_u8(ctrl, ctrl_shuf_m);
vst1q_u8((void *)t_wqe, ctrl);
/* Fill ESEG in the header. */
@@ -182,7 +182,8 @@
if (txq->elts_comp >= MLX5_TX_COMP_THRESH) {
/* A CQE slot must always be available. */
assert((1u << txq->cqe_n) - (txq->cq_pi++ - txq->cq_ci));
- wqe->ctrl[2] = rte_cpu_to_be_32(8);
+ wqe->ctrl[2] = rte_cpu_to_be_32(MLX5_COMP_ALWAYS <<
+ MLX5_COMP_MODE_OFFSET);
wqe->ctrl[3] = txq->elts_head;
txq->elts_comp = 0;
}
@@ -229,7 +230,7 @@
unsigned int pos;
uint16_t max_elts;
uint16_t max_wqe;
- uint32_t comp_req = 0;
+ uint32_t comp_req;
const uint16_t wq_n = 1 << txq->wqe_n;
const uint16_t wq_mask = wq_n - 1;
uint16_t wq_idx = txq->wqe_ci & wq_mask;
@@ -284,12 +285,13 @@
}
if (txq->elts_comp + pkts_n < MLX5_TX_COMP_THRESH) {
txq->elts_comp += pkts_n;
+ comp_req = MLX5_COMP_ONLY_FIRST_ERR << MLX5_COMP_MODE_OFFSET;
} else {
/* A CQE slot must always be available. */
assert((1u << txq->cqe_n) - (txq->cq_pi++ - txq->cq_ci));
/* Request a completion. */
txq->elts_comp = 0;
- comp_req = 8;
+ comp_req = MLX5_COMP_ALWAYS << MLX5_COMP_MODE_OFFSET;
}
/* Fill CTRL in the header. */
ctrl = vreinterpretq_u8_u32((uint32x4_t) {
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
index cc2f251..a495cd9 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
@@ -163,7 +163,7 @@
} while (--segs_n);
++wqe_ci;
/* Fill CTRL in the header. */
- ctrl = _mm_set_epi32(0, 0, txq->qp_num_8s | ds,
+ ctrl = _mm_set_epi32(0, 4, txq->qp_num_8s | ds,
MLX5_OPC_MOD_MPW << 24 |
txq->wqe_ci << 8 | MLX5_OPCODE_TSO);
ctrl = _mm_shuffle_epi8(ctrl, shuf_mask_ctrl);
@@ -182,7 +182,8 @@
if (txq->elts_comp >= MLX5_TX_COMP_THRESH) {
/* A CQE slot must always be available. */
assert((1u << txq->cqe_n) - (txq->cq_pi++ - txq->cq_ci));
- wqe->ctrl[2] = rte_cpu_to_be_32(8);
+ wqe->ctrl[2] = rte_cpu_to_be_32(MLX5_COMP_ALWAYS <<
+ MLX5_COMP_MODE_OFFSET);
wqe->ctrl[3] = txq->elts_head;
txq->elts_comp = 0;
}
@@ -229,7 +230,7 @@
unsigned int pos;
uint16_t max_elts;
uint16_t max_wqe;
- uint32_t comp_req = 0;
+ uint32_t comp_req;
const uint16_t wq_n = 1 << txq->wqe_n;
const uint16_t wq_mask = wq_n - 1;
uint16_t wq_idx = txq->wqe_ci & wq_mask;
@@ -284,12 +285,13 @@
}
if (txq->elts_comp + pkts_n < MLX5_TX_COMP_THRESH) {
txq->elts_comp += pkts_n;
+ comp_req = MLX5_COMP_ONLY_FIRST_ERR << MLX5_COMP_MODE_OFFSET;
} else {
/* A CQE slot must always be available. */
assert((1u << txq->cqe_n) - (txq->cq_pi++ - txq->cq_ci));
/* Request a completion. */
txq->elts_comp = 0;
- comp_req = 8;
+ comp_req = MLX5_COMP_ALWAYS << MLX5_COMP_MODE_OFFSET;
}
/* Fill CTRL in the header. */
ctrl = _mm_set_epi32(txq->elts_head, comp_req,
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index b281c45..ff6c564 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -430,8 +430,7 @@ struct mlx5_txq_ibv *
attr.cq = (struct ibv_cq_init_attr_ex){
.comp_mask = 0,
};
- cqe_n = ((desc / MLX5_TX_COMP_THRESH) - 1) ?
- ((desc / MLX5_TX_COMP_THRESH) - 1) : 1;
+ cqe_n = desc / MLX5_TX_COMP_THRESH + 1;
if (is_empw_burst_func(tx_pkt_burst))
cqe_n += MLX5_TX_COMP_THRESH_INLINE_DIV;
tmpl.cq = mlx5_glue->create_cq(priv->sh->ctx, cqe_n, NULL, NULL, 0);
@@ -563,6 +562,7 @@ struct mlx5_txq_ibv *
txq_ibv->cq = tmpl.cq;
rte_atomic32_inc(&txq_ibv->refcnt);
txq_ctrl->bf_reg = qp.bf.reg;
+ txq_ctrl->cqn = cq_info.cqn;
txq_uar_init(txq_ctrl);
if (qp.comp_mask & MLX5DV_QP_MASK_UAR_MMAP_OFFSET) {
txq_ctrl->uar_mmap_offset = qp.uar_mmap_offset;
--
1.8.3.1
^ permalink raw reply [flat|nested] 12+ messages in thread
* [dpdk-dev] [PATCH v1 8/9] net/mlx5: recover secondary process Rx errors
2019-05-30 10:20 [dpdk-dev] [PATCH v1 0/9] mlx5: Handle data-path completions with error Matan Azrad
` (6 preceding siblings ...)
2019-05-30 10:20 ` [dpdk-dev] [PATCH v1 7/9] net/mlx5: handle Tx completion with error Matan Azrad
@ 2019-05-30 10:20 ` Matan Azrad
2019-05-30 10:20 ` [dpdk-dev] [PATCH v1 9/9] net/mlx5: recover secondary process Tx errors Matan Azrad
2019-09-12 12:14 ` [dpdk-dev] [PATCH v1 0/9] mlx5: Handle data-path completions with error Kevin Traynor
9 siblings, 0 replies; 12+ messages in thread
From: Matan Azrad @ 2019-05-30 10:20 UTC (permalink / raw)
To: Shahaf Shuler, Yongseok Koh; +Cc: dev, stable
The RQ error recovery mechanism in the PMD invokes Verbs functions to
modify the RQ state in order to reset the RQ and to reactivate it.
These Verbs functions are not allowed to be invoked from a secondary
process, hence the PMD skips the recovery when the error is captured
on queues owned by a secondary process.
Using the DPDK IPC mechanism, the secondary process can request Verbs
queue state modifications to be done synchronously by the primary
process.
Add support for secondary process Rx error recovery.
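For clarity, the recovery dispatch added by this patch boils down to the
condensed sketch below. It is illustrative only: all names are taken from
the diff that follows, and error handling and the IPC plumbing are omitted.

	struct mlx5_mp_arg_queue_state_modify sm = {
		.is_wq = 1,              /* Rx WQ, not a Tx QP. */
		.queue_id = rxq->idx,    /* DPDK Rx queue index. */
		.state = IBV_WQS_RESET,  /* Requested Verbs WQ state. */
	};

	if (rte_eal_process_type() == RTE_PROC_SECONDARY)
		/* Ask the primary process over IPC and wait for its reply. */
		ret = mlx5_mp_req_queue_state_modify(dev, &sm);
	else
		/* The primary process may call Verbs directly. */
		ret = mlx5_queue_state_modify_primary(dev, &sm);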
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
drivers/net/mlx5/mlx5.h | 11 +++++
drivers/net/mlx5/mlx5_mp.c | 46 +++++++++++++++++++
drivers/net/mlx5/mlx5_rxtx.c | 98 +++++++++++++++++++++++++++++++++--------
drivers/net/mlx5/mlx5_rxtx.h | 3 ++
drivers/net/mlx5/mlx5_trigger.c | 1 +
5 files changed, 141 insertions(+), 18 deletions(-)
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 4c339d0..85a6d02 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -61,6 +61,13 @@ enum mlx5_mp_req_type {
MLX5_MP_REQ_CREATE_MR,
MLX5_MP_REQ_START_RXTX,
MLX5_MP_REQ_STOP_RXTX,
+ MLX5_MP_REQ_QUEUE_STATE_MODIFY,
+};
+
+struct mlx5_mp_arg_queue_state_modify {
+ uint8_t is_wq; /* Set if WQ. */
+ uint16_t queue_id; /* DPDK queue ID. */
+ enum ibv_wq_state state; /* WQ requested state. */
};
/* Pameters for IPC. */
@@ -71,6 +78,8 @@ struct mlx5_mp_param {
RTE_STD_C11
union {
uintptr_t addr; /* MLX5_MP_REQ_CREATE_MR */
+ struct mlx5_mp_arg_queue_state_modify state_modify;
+ /* MLX5_MP_REQ_QUEUE_STATE_MODIFY */
} args;
};
@@ -542,6 +551,8 @@ int mlx5_ctrl_flow(struct rte_eth_dev *dev,
void mlx5_mp_req_stop_rxtx(struct rte_eth_dev *dev);
int mlx5_mp_req_mr_create(struct rte_eth_dev *dev, uintptr_t addr);
int mlx5_mp_req_verbs_cmd_fd(struct rte_eth_dev *dev);
+int mlx5_mp_req_queue_state_modify(struct rte_eth_dev *dev,
+ struct mlx5_mp_arg_queue_state_modify *sm);
void mlx5_mp_init_primary(void);
void mlx5_mp_uninit_primary(void);
void mlx5_mp_init_secondary(void);
diff --git a/drivers/net/mlx5/mlx5_mp.c b/drivers/net/mlx5/mlx5_mp.c
index cea74ad..3ccae51 100644
--- a/drivers/net/mlx5/mlx5_mp.c
+++ b/drivers/net/mlx5/mlx5_mp.c
@@ -85,6 +85,12 @@
res->result = 0;
ret = rte_mp_reply(&mp_res, peer);
break;
+ case MLX5_MP_REQ_QUEUE_STATE_MODIFY:
+ mp_init_msg(dev, &mp_res, param->type);
+ res->result = mlx5_queue_state_modify_primary
+ (dev, &param->args.state_modify);
+ ret = rte_mp_reply(&mp_res, peer);
+ break;
default:
rte_errno = EINVAL;
DRV_LOG(ERR, "port %u invalid mp request type",
@@ -271,6 +277,46 @@
}
/**
+ * Request Verbs queue state modification to the primary process.
+ *
+ * @param[in] dev
+ * Pointer to Ethernet structure.
+ * @param sm
+ * State modify parameters.
+ *
+ * @return
+ * 0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_mp_req_queue_state_modify(struct rte_eth_dev *dev,
+ struct mlx5_mp_arg_queue_state_modify *sm)
+{
+ struct rte_mp_msg mp_req;
+ struct rte_mp_msg *mp_res;
+ struct rte_mp_reply mp_rep;
+ struct mlx5_mp_param *req = (struct mlx5_mp_param *)mp_req.param;
+ struct mlx5_mp_param *res;
+ struct timespec ts = {.tv_sec = MLX5_MP_REQ_TIMEOUT_SEC, .tv_nsec = 0};
+ int ret;
+
+ assert(rte_eal_process_type() == RTE_PROC_SECONDARY);
+ mp_init_msg(dev, &mp_req, MLX5_MP_REQ_QUEUE_STATE_MODIFY);
+ req->args.state_modify = *sm;
+ ret = rte_mp_request_sync(&mp_req, &mp_rep, &ts);
+ if (ret) {
+ DRV_LOG(ERR, "port %u request to primary process failed",
+ dev->data->port_id);
+ return -rte_errno;
+ }
+ assert(mp_rep.nb_received == 1);
+ mp_res = &mp_rep.msgs[0];
+ res = (struct mlx5_mp_param *)mp_res->param;
+ ret = res->result;
+ free(mp_rep.msgs);
+ return ret;
+}
+
+/**
* Request Verbs command file descriptor for mmap to the primary process.
*
* @param[in] dev
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 36e2dd3..cb3baad 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -2031,6 +2031,75 @@
}
/**
+ * Modify a Verbs queue state.
+ * This must be called from the primary process.
+ *
+ * @param dev
+ * Pointer to Ethernet device.
+ * @param sm
+ * State modify request parameters.
+ *
+ * @return
+ * 0 in case of success else non-zero value and rte_errno is set.
+ */
+int
+mlx5_queue_state_modify_primary(struct rte_eth_dev *dev,
+ const struct mlx5_mp_arg_queue_state_modify *sm)
+{
+ int ret;
+ struct mlx5_priv *priv = dev->data->dev_private;
+
+ if (sm->is_wq) {
+ struct ibv_wq_attr mod = {
+ .attr_mask = IBV_WQ_ATTR_STATE,
+ .wq_state = sm->state,
+ };
+ struct mlx5_rxq_data *rxq = (*priv->rxqs)[sm->queue_id];
+ struct mlx5_rxq_ctrl *rxq_ctrl =
+ container_of(rxq, struct mlx5_rxq_ctrl, rxq);
+
+ ret = mlx5_glue->modify_wq(rxq_ctrl->ibv->wq, &mod);
+ if (ret) {
+ DRV_LOG(ERR, "Cannot change Rx WQ state to %u - %s\n",
+ sm->state, strerror(errno));
+ rte_errno = errno;
+ return ret;
+ }
+ }
+ return 0;
+}
+
+/**
+ * Modify a Verbs queue state.
+ *
+ * @param dev
+ * Pointer to Ethernet device.
+ * @param sm
+ * State modify request parameters.
+ *
+ * @return
+ * 0 in case of success else non-zero value.
+ */
+static int
+mlx5_queue_state_modify(struct rte_eth_dev *dev,
+ struct mlx5_mp_arg_queue_state_modify *sm)
+{
+ int ret = 0;
+
+ switch (rte_eal_process_type()) {
+ case RTE_PROC_PRIMARY:
+ ret = mlx5_queue_state_modify_primary(dev, sm);
+ break;
+ case RTE_PROC_SECONDARY:
+ ret = mlx5_mp_req_queue_state_modify(dev, sm);
+ break;
+ default:
+ break;
+ }
+ return ret;
+}
+
+/**
* Handle a Rx error.
* The function inserts the RQ state to reset when the first error CQE is
* shown, then drains the CQ by the caller function loop. When the CQ is empty,
@@ -2053,15 +2122,13 @@
const unsigned int wqe_n = 1 << rxq->elts_n;
struct mlx5_rxq_ctrl *rxq_ctrl =
container_of(rxq, struct mlx5_rxq_ctrl, rxq);
- struct ibv_wq_attr mod = {
- .attr_mask = IBV_WQ_ATTR_STATE,
- };
union {
volatile struct mlx5_cqe *cqe;
volatile struct mlx5_err_cqe *err_cqe;
} u = {
.cqe = &(*rxq->cqes)[rxq->cq_ci & cqe_mask],
};
+ struct mlx5_mp_arg_queue_state_modify sm;
int ret;
switch (rxq->err_state) {
@@ -2069,21 +2136,17 @@
rxq->err_state = MLX5_RXQ_ERR_STATE_NEED_RESET;
/* Fall-through */
case MLX5_RXQ_ERR_STATE_NEED_RESET:
- if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+ sm.is_wq = 1;
+ sm.queue_id = rxq->idx;
+ sm.state = IBV_WQS_RESET;
+ if (mlx5_queue_state_modify(ETH_DEV(rxq_ctrl->priv), &sm))
return -1;
- mod.wq_state = IBV_WQS_RESET;
- ret = mlx5_glue->modify_wq(rxq_ctrl->ibv->wq, &mod);
- if (ret) {
- DRV_LOG(ERR, "Cannot change Rx WQ state to RESET %s\n",
- strerror(errno));
- return -1;
- }
if (rxq_ctrl->dump_file_n <
rxq_ctrl->priv->config.max_dump_files_num) {
MKSTR(err_str, "Unexpected CQE error syndrome "
"0x%02x CQN = %u RQN = %u wqe_counter = %u"
" rq_ci = %u cq_ci = %u", u.err_cqe->syndrome,
- rxq->cqn, rxq_ctrl->ibv->wq->wq_num,
+ rxq->cqn, rxq_ctrl->wqn,
rte_be_to_cpu_16(u.err_cqe->wqe_counter),
rxq->rq_ci << rxq->sges_n, rxq->cq_ci);
MKSTR(name, "dpdk_mlx5_port_%u_rxq_%u_%u",
@@ -2113,13 +2176,12 @@
*/
*rxq->rq_db = rte_cpu_to_be_32(0);
rte_cio_wmb();
- mod.wq_state = IBV_WQS_RDY;
- ret = mlx5_glue->modify_wq(rxq_ctrl->ibv->wq, &mod);
- if (ret) {
- DRV_LOG(ERR, "Cannot change Rx WQ state to RDY"
- " %s\n", strerror(errno));
+ sm.is_wq = 1;
+ sm.queue_id = rxq->idx;
+ sm.state = IBV_WQS_RDY;
+ if (mlx5_queue_state_modify(ETH_DEV(rxq_ctrl->priv),
+ &sm))
return -1;
- }
if (mbuf_prepare) {
const uint16_t q_mask = wqe_n - 1;
uint16_t elt_idx;
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index f4538eb..92fba29 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -161,6 +161,7 @@ struct mlx5_rxq_ctrl {
unsigned int irq:1; /* Whether IRQ is enabled. */
uint32_t flow_mark_n; /* Number of Mark/Flag flows using this Queue. */
uint32_t flow_tunnels_n[MLX5_FLOW_TUNNEL]; /* Tunnels counters. */
+ uint32_t wqn; /* WQ number. */
uint16_t dump_file_n; /* Number of dump files. */
};
@@ -374,6 +375,8 @@ uint16_t removed_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts,
uint32_t mlx5_rx_queue_count(struct rte_eth_dev *dev, uint16_t rx_queue_id);
void mlx5_dump_debug_information(const char *path, const char *title,
const void *buf, unsigned int len);
+int mlx5_queue_state_modify_primary(struct rte_eth_dev *dev,
+ const struct mlx5_mp_arg_queue_state_modify *sm);
/* Vectorized version of mlx5_rxtx.c */
int mlx5_check_raw_vec_tx_support(struct rte_eth_dev *dev);
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index b7fde35..b6af539 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -126,6 +126,7 @@
rxq_ctrl->ibv = mlx5_rxq_ibv_new(dev, i);
if (!rxq_ctrl->ibv)
goto error;
+ rxq_ctrl->wqn = rxq_ctrl->ibv->wq->wq_num;
}
return 0;
error:
--
1.8.3.1
^ permalink raw reply [flat|nested] 12+ messages in thread
* [dpdk-dev] [PATCH v1 9/9] net/mlx5: recover secondary process Tx errors
2019-05-30 10:20 [dpdk-dev] [PATCH v1 0/9] mlx5: Handle data-path completions with error Matan Azrad
` (7 preceding siblings ...)
2019-05-30 10:20 ` [dpdk-dev] [PATCH v1 8/9] net/mlx5: recover secondary process Rx errors Matan Azrad
@ 2019-05-30 10:20 ` Matan Azrad
2019-09-12 12:14 ` [dpdk-dev] [PATCH v1 0/9] mlx5: Handle data-path completions with error Kevin Traynor
9 siblings, 0 replies; 12+ messages in thread
From: Matan Azrad @ 2019-05-30 10:20 UTC (permalink / raw)
To: Shahaf Shuler, Yongseok Koh; +Cc: dev, stable
The SQ error recovery mechanism in the PMD invokes Verbs functions to
modify the QP state in order to reset the SQ and to reactivate it.
These Verbs functions are not allowed to be invoked from a secondary
process, hence the PMD skips the recovery when the error is captured
on queues owned by a secondary process.
Using the DPDK IPC mechanism, the secondary process can request Verbs
queue state modifications to be done synchronously by the primary
process.
Add support for secondary process Tx error recovery.
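As an illustration only, the Verbs state ladder the primary process walks
to recover the Tx QP is sketched below, condensed from the diff that
follows; return-value checks are omitted and all names come from the patch.

	struct ibv_qp_attr mod = {
		.qp_state = IBV_QPS_RESET,
		.port_num = (uint8_t)priv->ibv_port,
	};

	mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);               /* -> RESET */
	mod.qp_state = IBV_QPS_INIT;
	mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE | IBV_QP_PORT); /* -> INIT */
	mod.qp_state = IBV_QPS_RTR;
	mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);               /* -> RTR */
	mod.qp_state = IBV_QPS_RTS;
	mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);               /* -> RTS */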
Cc: stable@dpdk.org
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
drivers/net/mlx5/mlx5_rxtx.c | 104 ++++++++++++++++++++++++++-----------------
1 file changed, 62 insertions(+), 42 deletions(-)
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index cb3baad..9659478 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -51,6 +51,10 @@
static __rte_always_inline void
mprq_buf_replace(struct mlx5_rxq_data *rxq, uint16_t rq_idx);
+static int
+mlx5_queue_state_modify(struct rte_eth_dev *dev,
+ struct mlx5_mp_arg_queue_state_modify *sm);
+
uint32_t mlx5_ptype_table[] __rte_cache_aligned = {
[0xff] = RTE_PTYPE_ALL_MASK, /* Last entry for errored packet. */
};
@@ -570,52 +574,27 @@
}
/**
- * Move QP from error state to running state.
+ * Move QP from error state to running state and initialize indexes.
*
- * @param txq
- * Pointer to TX queue structure.
- * @param qp
- * The qp pointer for recovery.
+ * @param txq_ctrl
+ * Pointer to TX queue control structure.
*
* @return
- * 0 on success, else errno value.
+ * 0 on success, else -1.
*/
static int
-tx_recover_qp(struct mlx5_txq_data *txq, struct ibv_qp *qp)
+tx_recover_qp(struct mlx5_txq_ctrl *txq_ctrl)
{
- int ret;
- struct ibv_qp_attr mod = {
- .qp_state = IBV_QPS_RESET,
- .port_num = 1,
- };
- ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);
- if (ret) {
- DRV_LOG(ERR, "Cannot change the Tx QP state to RESET %d\n",
- ret);
- return ret;
- }
- mod.qp_state = IBV_QPS_INIT;
- ret = mlx5_glue->modify_qp(qp, &mod,
- (IBV_QP_STATE | IBV_QP_PORT));
- if (ret) {
- DRV_LOG(ERR, "Cannot change Tx QP state to INIT %d\n", ret);
- return ret;
- }
- mod.qp_state = IBV_QPS_RTR;
- ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);
- if (ret) {
- DRV_LOG(ERR, "Cannot change Tx QP state to RTR %d\n", ret);
- return ret;
- }
- mod.qp_state = IBV_QPS_RTS;
- ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);
- if (ret) {
- DRV_LOG(ERR, "Cannot change Tx QP state to RTS %d\n", ret);
- return ret;
- }
- txq->wqe_ci = 0;
- txq->wqe_pi = 0;
- txq->elts_comp = 0;
+ struct mlx5_mp_arg_queue_state_modify sm = {
+ .is_wq = 0,
+ .queue_id = txq_ctrl->txq.idx,
+ };
+
+ if (mlx5_queue_state_modify(ETH_DEV(txq_ctrl->priv), &sm))
+ return -1;
+ txq_ctrl->txq.wqe_ci = 0;
+ txq_ctrl->txq.wqe_pi = 0;
+ txq_ctrl->txq.elts_comp = 0;
return 0;
}
@@ -690,8 +669,7 @@
*/
txq->stats.oerrors += ((txq->wqe_ci & wqe_m) -
new_wqe_pi) & wqe_m;
- if ((rte_eal_process_type() == RTE_PROC_PRIMARY) &&
- tx_recover_qp(txq, txq_ctrl->ibv->qp) == 0) {
+ if (tx_recover_qp(txq_ctrl) == 0) {
txq->cq_ci++;
/* Release all the remaining buffers. */
return txq->elts_head;
@@ -2065,6 +2043,48 @@
rte_errno = errno;
return ret;
}
+ } else {
+ struct mlx5_txq_data *txq = (*priv->txqs)[sm->queue_id];
+ struct mlx5_txq_ctrl *txq_ctrl =
+ container_of(txq, struct mlx5_txq_ctrl, txq);
+ struct ibv_qp_attr mod = {
+ .qp_state = IBV_QPS_RESET,
+ .port_num = (uint8_t)priv->ibv_port,
+ };
+ struct ibv_qp *qp = txq_ctrl->ibv->qp;
+
+ ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);
+ if (ret) {
+ DRV_LOG(ERR, "Cannot change the Tx QP state to RESET "
+ "%s\n", strerror(errno));
+ rte_errno = errno;
+ return ret;
+ }
+ mod.qp_state = IBV_QPS_INIT;
+ ret = mlx5_glue->modify_qp(qp, &mod,
+ (IBV_QP_STATE | IBV_QP_PORT));
+ if (ret) {
+ DRV_LOG(ERR, "Cannot change Tx QP state to INIT %s\n",
+ strerror(errno));
+ rte_errno = errno;
+ return ret;
+ }
+ mod.qp_state = IBV_QPS_RTR;
+ ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);
+ if (ret) {
+ DRV_LOG(ERR, "Cannot change Tx QP state to RTR %s\n",
+ strerror(errno));
+ rte_errno = errno;
+ return ret;
+ }
+ mod.qp_state = IBV_QPS_RTS;
+ ret = mlx5_glue->modify_qp(qp, &mod, IBV_QP_STATE);
+ if (ret) {
+ DRV_LOG(ERR, "Cannot change Tx QP state to RTS %s\n",
+ strerror(errno));
+ rte_errno = errno;
+ return ret;
+ }
}
return 0;
}
--
1.8.3.1
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [dpdk-dev] [PATCH v1 0/9] mlx5: Handle data-path completions with error
2019-05-30 10:20 [dpdk-dev] [PATCH v1 0/9] mlx5: Handle data-path completions with error Matan Azrad
` (8 preceding siblings ...)
2019-05-30 10:20 ` [dpdk-dev] [PATCH v1 9/9] net/mlx5: recover secondary process Tx errors Matan Azrad
@ 2019-09-12 12:14 ` Kevin Traynor
2019-09-22 7:03 ` Matan Azrad
9 siblings, 1 reply; 12+ messages in thread
From: Kevin Traynor @ 2019-09-12 12:14 UTC (permalink / raw)
To: Matan Azrad, Shahaf Shuler, Yongseok Koh; +Cc: dev
On 30/05/2019 11:20, Matan Azrad wrote:
> Add support for data-path Rx and Tx completions with error handling:
>
> 1. Detect the error.
> 2. Do not crash.
> 3. Report it in statistics counters.
> 4. Dump debug information to system log file.
> 5. Recover the error under the hood.
> 6. Add support for secondary process recovery.
>
> No performance impact was shown.
>
> Matan Azrad (9):
> net/mlx5: remove Rx queues indexes correlation
> net/mlx5: add log file procedure for debug data
> net/mlx5: fix device arguments error detection
> net/mlx5: mitigate Rx doorbell memory barrier
> net/mlx5: separate Rx queue initialization
> net/mlx5: extend Rx completion with error handling
> net/mlx5: handle Tx completion with error
> net/mlx5: recover secondary process Rx errors
> net/mlx5: recover secondary process Tx errors
>
> doc/guides/nics/mlx5.rst | 7 +
> drivers/net/mlx5/mlx5.c | 14 +-
> drivers/net/mlx5/mlx5.h | 12 +
> drivers/net/mlx5/mlx5_mp.c | 46 +++
> drivers/net/mlx5/mlx5_prm.h | 11 +
> drivers/net/mlx5/mlx5_rxq.c | 42 +--
> drivers/net/mlx5/mlx5_rxtx.c | 673 ++++++++++++++++++++++++++++------
> drivers/net/mlx5/mlx5_rxtx.h | 193 +++++-----
> drivers/net/mlx5/mlx5_rxtx_vec.c | 5 +-
> drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 36 +-
> drivers/net/mlx5/mlx5_rxtx_vec_sse.h | 36 +-
> drivers/net/mlx5/mlx5_trigger.c | 1 +
> drivers/net/mlx5/mlx5_txq.c | 4 +-
> 13 files changed, 792 insertions(+), 288 deletions(-)
>
Hi - these changes are very invasive ^^^. I'm not really comfortable
taking this for 18.11.3. See
http://doc.dpdk.org/guides/contributing/stable.html#what-changes-should-be-backported
I will take patch 3/9 as it is a fix with a Fixes: tag.
thanks,
Kevin.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [dpdk-dev] [PATCH v1 0/9] mlx5: Handle data-path completions with error
2019-09-12 12:14 ` [dpdk-dev] [PATCH v1 0/9] mlx5: Handle data-path completions with error Kevin Traynor
@ 2019-09-22 7:03 ` Matan Azrad
0 siblings, 0 replies; 12+ messages in thread
From: Matan Azrad @ 2019-09-22 7:03 UTC (permalink / raw)
To: Kevin Traynor, Shahaf Shuler; +Cc: dev
From: Kevin Traynor
> On 30/05/2019 11:20, Matan Azrad wrote:
> > Add support for data-path Rx and Tx completions with error handling:
> >
> > 1. Detect the error.
> > 2. Do not crash.
> > 3. Report it in statistics counters.
> > 4. Dump debug information to system log file.
> > 5. Recover the error under the hood.
> > 6. Add support for secondary process recovery.
> >
> > No performance impact was shown.
> >
> > Matan Azrad (9):
> > net/mlx5: remove Rx queues indexes correlation
> > net/mlx5: add log file procedure for debug data
> > net/mlx5: fix device arguments error detection
> > net/mlx5: mitigate Rx doorbell memory barrier
> > net/mlx5: separate Rx queue initialization
> > net/mlx5: extend Rx completion with error handling
> > net/mlx5: handle Tx completion with error
> > net/mlx5: recover secondary process Rx errors
> > net/mlx5: recover secondary process Tx errors
> >
> > doc/guides/nics/mlx5.rst | 7 +
> > drivers/net/mlx5/mlx5.c | 14 +-
> > drivers/net/mlx5/mlx5.h | 12 +
> > drivers/net/mlx5/mlx5_mp.c | 46 +++
> > drivers/net/mlx5/mlx5_prm.h | 11 +
> > drivers/net/mlx5/mlx5_rxq.c | 42 +--
> > drivers/net/mlx5/mlx5_rxtx.c | 673
> ++++++++++++++++++++++++++++------
> > drivers/net/mlx5/mlx5_rxtx.h | 193 +++++-----
> > drivers/net/mlx5/mlx5_rxtx_vec.c | 5 +-
> > drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 36 +-
> > drivers/net/mlx5/mlx5_rxtx_vec_sse.h | 36 +-
> > drivers/net/mlx5/mlx5_trigger.c | 1 +
> > drivers/net/mlx5/mlx5_txq.c | 4 +-
> > 13 files changed, 792 insertions(+), 288 deletions(-)
> >
>
>
> Hi - these changes are very invasive ^^^. I'm not really comfortable
> taking this for 18.11.3. See
> http://doc.dpdk.org/guides/contributing/stable.html#what-changes-should-be-backported
>
> I will take patch 3/9 as it is a fix with a Fixes: tag.
It's ok, the others are not a must in this version.
>
> thanks,
> Kevin.
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2019-09-22 7:03 UTC | newest]
Thread overview: 12+ messages
2019-05-30 10:20 [dpdk-dev] [PATCH v1 0/9] mlx5: Handle data-path completions with error Matan Azrad
2019-05-30 10:20 ` [dpdk-dev] [PATCH v1 1/9] net/mlx5: remove Rx queues indexes correlation Matan Azrad
2019-05-30 10:20 ` [dpdk-dev] [PATCH v1 2/9] net/mlx5: add log file procedure for debug data Matan Azrad
2019-05-30 10:20 ` [dpdk-dev] [PATCH v1 3/9] net/mlx5: fix device arguments error detection Matan Azrad
2019-05-30 10:20 ` [dpdk-dev] [PATCH v1 4/9] net/mlx5: mitigate Rx doorbell memory barrier Matan Azrad
2019-05-30 10:20 ` [dpdk-dev] [PATCH v1 5/9] net/mlx5: separate Rx queue initialization Matan Azrad
2019-05-30 10:20 ` [dpdk-dev] [PATCH v1 6/9] net/mlx5: extend Rx completion with error handling Matan Azrad
2019-05-30 10:20 ` [dpdk-dev] [PATCH v1 7/9] net/mlx5: handle Tx completion with error Matan Azrad
2019-05-30 10:20 ` [dpdk-dev] [PATCH v1 8/9] net/mlx5: recover secondary process Rx errors Matan Azrad
2019-05-30 10:20 ` [dpdk-dev] [PATCH v1 9/9] net/mlx5: recover secondary process Tx errors Matan Azrad
2019-09-12 12:14 ` [dpdk-dev] [PATCH v1 0/9] mlx5: Handle data-path completions with error Kevin Traynor
2019-09-22 7:03 ` Matan Azrad