DPDK patches and discussions
* [dpdk-dev] [PATCH 0/3] net/mlx5: transmit datapath cumulative fix pack
@ 2019-07-29 12:41 Viacheslav Ovsiienko
  2019-07-29 12:41 ` [dpdk-dev] [PATCH 1/3] net/mlx5: fix Tx completion descriptors fetching loop Viacheslav Ovsiienko
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: Viacheslav Ovsiienko @ 2019-07-29 12:41 UTC (permalink / raw)
  To: dev; +Cc: yskoh, shahafs

This series contains a pack of bug fixes and performance
improvements:

  - limit the number of buffers freed per tx_burst call
  - set the completion-generating request in a more uniform
    fashion
  - limit the number of packets in one descriptor to improve
    latency

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

Viacheslav Ovsiienko (3):
  net/mlx5: fix Tx completion descriptors fetching loop
  net/mlx5: fix ConnectX-4LX minimal inline data limit
  net/mlx5: fix the Tx completion request generation

 drivers/net/mlx5/mlx5.c      |   7 +--
 drivers/net/mlx5/mlx5_defs.h |   9 +++-
 drivers/net/mlx5/mlx5_prm.h  |  17 ++++---
 drivers/net/mlx5/mlx5_rxtx.c | 110 +++++++++++++++++++++++++++++--------------
 4 files changed, 97 insertions(+), 46 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [dpdk-dev] [PATCH 1/3] net/mlx5: fix Tx completion descriptors fetching loop
  2019-07-29 12:41 [dpdk-dev] [PATCH 0/3] net/mlx5: transmit datapath cumulative fix pack Viacheslav Ovsiienko
@ 2019-07-29 12:41 ` Viacheslav Ovsiienko
  2019-07-29 12:41 ` [dpdk-dev] [PATCH 2/3] net/mlx5: fix ConnectX-4LX minimal inline data limit Viacheslav Ovsiienko
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Viacheslav Ovsiienko @ 2019-07-29 12:41 UTC (permalink / raw)
  To: dev; +Cc: yskoh, shahafs

This patch limits the number of completion descriptors fetched
and processed in one tx_burst routine call.

Completion processing involves freeing the transmitted buffers,
which may be time consuming and introduce significant latency,
so limiting the number of processed completions mitigates the
latency issue.
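
The bounded-drain idea can be sketched outside the PMD as a small
model (illustrative only; cq_poll and MAX_CQE_PER_BURST are
hypothetical names, not the mlx5 driver API):

```c
#include <assert.h>

/*
 * Illustrative sketch: drain a completion queue with an upper bound
 * on CQEs handled per poll, as the patch does with
 * MLX5_TX_COMP_MAX_CQE. Error CQEs in the real driver lift this
 * budget to flush the whole queue; that path is omitted here.
 */
#define MAX_CQE_PER_BURST 2u

/* Consume completions from 'available', never exceeding the
 * per-call budget; return how many were actually processed. */
static inline unsigned int
cq_poll(unsigned int available)
{
	unsigned int budget = MAX_CQE_PER_BURST;
	unsigned int done = 0;

	while (available && budget--) {
		/* ... free transmitted mbufs for this CQE ... */
		available--;
		done++;
	}
	return done;
}
```

Any completions left over are simply picked up by the next tx_burst
call, which is what keeps per-call latency bounded.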

Fixes: 18a1c20044c0 ("net/mlx5: implement Tx burst template")

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5_defs.h |  7 +++++++
 drivers/net/mlx5/mlx5_rxtx.c | 46 +++++++++++++++++++++++++++++---------------
 2 files changed, 38 insertions(+), 15 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index 8c118d5..461e916 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -37,6 +37,13 @@
  */
 #define MLX5_TX_COMP_THRESH_INLINE_DIV (1 << 3)
 
+/*
+ * Maximal amount of normal completion CQEs
+ * processed in one call of tx_burst() routine.
+ */
+#define MLX5_TX_COMP_MAX_CQE 2u
+
+
 /* Size of per-queue MR cache array for linear search. */
 #define MLX5_MR_CACHE_N 8
 
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 007df8f..c2b93c6 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -1992,13 +1992,13 @@ enum mlx5_txcmp_code {
 mlx5_tx_handle_completion(struct mlx5_txq_data *restrict txq,
 			  unsigned int olx __rte_unused)
 {
+	unsigned int count = MLX5_TX_COMP_MAX_CQE;
 	bool update = false;
+	uint16_t tail = txq->elts_tail;
 	int ret;
 
 	do {
-		volatile struct mlx5_wqe_cseg *cseg;
 		volatile struct mlx5_cqe *cqe;
-		uint16_t tail;
 
 		cqe = &txq->cqes[txq->cq_ci & txq->cqe_m];
 		ret = check_cqe(cqe, txq->cqe_s, txq->cq_ci);
@@ -2006,19 +2006,21 @@ enum mlx5_txcmp_code {
 			if (likely(ret != MLX5_CQE_STATUS_ERR)) {
 				/* No new CQEs in completion queue. */
 				assert(ret == MLX5_CQE_STATUS_HW_OWN);
-				if (likely(update)) {
-					/* Update the consumer index. */
-					rte_compiler_barrier();
-					*txq->cq_db =
-						rte_cpu_to_be_32(txq->cq_ci);
-				}
-				return;
+				break;
 			}
 			/* Some error occurred, try to restart. */
 			rte_wmb();
 			tail = mlx5_tx_error_cqe_handle
 				(txq, (volatile struct mlx5_err_cqe *)cqe);
+			if (likely(tail != txq->elts_tail)) {
+				mlx5_tx_free_elts(txq, tail, olx);
+				assert(tail == txq->elts_tail);
+			}
+			/* Allow flushing all CQEs from the queue. */
+			count = txq->cqe_s;
 		} else {
+			volatile struct mlx5_wqe_cseg *cseg;
+
 			/* Normal transmit completion. */
 			++txq->cq_ci;
 			rte_cio_rmb();
@@ -2031,13 +2033,27 @@ enum mlx5_txcmp_code {
 		if (txq->cq_pi)
 			--txq->cq_pi;
 #endif
-		if (likely(tail != txq->elts_tail)) {
-			/* Free data buffers from elts. */
-			mlx5_tx_free_elts(txq, tail, olx);
-			assert(tail == txq->elts_tail);
-		}
 		update = true;
-	} while (true);
+	/*
+	 * We have to restrict the amount of processed CQEs
+	 * in one tx_burst routine call. The CQ may be large
+	 * and many CQEs may be updated by the NIC in one
+	 * transaction. Buffers freeing is time consuming,
+	 * multiple iterations may introduce significant
+	 * latency.
+	 */
+	} while (--count);
+	if (likely(tail != txq->elts_tail)) {
+		/* Free data buffers from elts. */
+		mlx5_tx_free_elts(txq, tail, olx);
+		assert(tail == txq->elts_tail);
+	}
+	if (likely(update)) {
+		/* Update the consumer index. */
+		rte_compiler_barrier();
+		*txq->cq_db =
+		rte_cpu_to_be_32(txq->cq_ci);
+	}
 }
 
 /**
-- 
1.8.3.1



* [dpdk-dev] [PATCH 2/3] net/mlx5: fix ConnectX-4LX minimal inline data limit
  2019-07-29 12:41 [dpdk-dev] [PATCH 0/3] net/mlx5: transmit datapath cumulative fix pack Viacheslav Ovsiienko
  2019-07-29 12:41 ` [dpdk-dev] [PATCH 1/3] net/mlx5: fix Tx completion descriptors fetching loop Viacheslav Ovsiienko
@ 2019-07-29 12:41 ` Viacheslav Ovsiienko
  2019-08-01  7:43   ` [dpdk-dev] [PATCH] net/mlx5: fix default minimal data inline Viacheslav Ovsiienko
  2019-07-29 12:41 ` [dpdk-dev] [PATCH 3/3] net/mlx5: fix the Tx completion request generation Viacheslav Ovsiienko
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 7+ messages in thread
From: Viacheslav Ovsiienko @ 2019-07-29 12:41 UTC (permalink / raw)
  To: dev; +Cc: yskoh, shahafs

The Mellanox ConnectX-4LX NIC, in configurations with E-Switch
disabled, can operate without any minimal required inline data
in the Tx descriptor. The PMD had a hardcoded limit of 18B;
this is fixed to impose no limit (0B).

Fixes: 38b4b397a57d ("net/mlx5: add Tx configuration and setup")

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index ad0883d..ef8c4aa 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1253,8 +1253,6 @@ struct mlx5_dev_spawn_data {
 		switch (spawn->pci_dev->id.device_id) {
 		case PCI_DEVICE_ID_MELLANOX_CONNECTX4:
 		case PCI_DEVICE_ID_MELLANOX_CONNECTX4VF:
-		case PCI_DEVICE_ID_MELLANOX_CONNECTX4LX:
-		case PCI_DEVICE_ID_MELLANOX_CONNECTX4LXVF:
 			if (config->txq_inline_min <
 				       (int)MLX5_INLINE_HSIZE_L2) {
 				DRV_LOG(DEBUG,
@@ -1325,9 +1323,12 @@ struct mlx5_dev_spawn_data {
 	switch (spawn->pci_dev->id.device_id) {
 	case PCI_DEVICE_ID_MELLANOX_CONNECTX4:
 	case PCI_DEVICE_ID_MELLANOX_CONNECTX4VF:
+		config->txq_inline_min = MLX5_INLINE_HSIZE_L2;
+		config->hw_vlan_insert = 0;
+		break;
 	case PCI_DEVICE_ID_MELLANOX_CONNECTX4LX:
 	case PCI_DEVICE_ID_MELLANOX_CONNECTX4LXVF:
-		config->txq_inline_min = MLX5_INLINE_HSIZE_L2;
+		config->txq_inline_min = MLX5_INLINE_HSIZE_NONE;
 		config->hw_vlan_insert = 0;
 		break;
 	case PCI_DEVICE_ID_MELLANOX_CONNECTX5:
-- 
1.8.3.1



* [dpdk-dev] [PATCH 3/3] net/mlx5: fix the Tx completion request generation
  2019-07-29 12:41 [dpdk-dev] [PATCH 0/3] net/mlx5: transmit datapath cumulative fix pack Viacheslav Ovsiienko
  2019-07-29 12:41 ` [dpdk-dev] [PATCH 1/3] net/mlx5: fix Tx completion descriptors fetching loop Viacheslav Ovsiienko
  2019-07-29 12:41 ` [dpdk-dev] [PATCH 2/3] net/mlx5: fix ConnectX-4LX minimal inline data limit Viacheslav Ovsiienko
@ 2019-07-29 12:41 ` Viacheslav Ovsiienko
  2019-07-29 15:13 ` [dpdk-dev] [PATCH 0/3] net/mlx5: transmit datapath cumulative fix pack Matan Azrad
  2019-07-29 15:23 ` Raslan Darawsheh
  4 siblings, 0 replies; 7+ messages in thread
From: Viacheslav Ovsiienko @ 2019-07-29 12:41 UTC (permalink / raw)
  To: dev; +Cc: yskoh, shahafs

Packet transmission in mlx5 is performed by building Tx
descriptors (WQEs) and posting them to the NIC. A descriptor
can carry special flags telling the NIC to generate Tx
completion notifications (CQEs). At the beginning of the
tx_burst() routine the PMD checks whether there are any Tx
completions and frees the transmitted packet buffers.

The completion request flags must be set once per specified
number of packets to provide a uniform stream of completions
and free the Tx queue in a uniform fashion. The previous
implementation set the completion request once per burst;
if the burst size is big enough this may delay CQE generation
and cause a large number of buffers to be freed in the
tx_burst routine on multiple completions, which also affects
latency and can even cause Tx queue overflow and Tx drops.

This patch enforces that the completion request is set in the
exact Tx descriptor at which the specified number of packets
has been sent.
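
The threshold scheme can be modeled in a few lines of C (a hedged
sketch; need_completion and COMP_THRESH are illustrative names, not
the PMD's actual symbols, and the real code additionally checks a
WQE-based threshold when inlining is enabled):

```c
#include <assert.h>
#include <stdbool.h>

/* Request a CQE exactly when the number of packets queued since the
 * last request crosses the threshold, rather than once per burst. */
#define COMP_THRESH 32u

struct txq_state {
	unsigned short elts_head; /* packets queued so far */
	unsigned short elts_comp; /* head at last completion request */
};

/* Return true if the current descriptor must carry a completion
 * request; the cast mirrors the PMD's uint16_t wraparound math. */
static bool
need_completion(struct txq_state *txq)
{
	if ((unsigned short)(txq->elts_head - txq->elts_comp) >=
	    COMP_THRESH) {
		txq->elts_comp = txq->elts_head;
		return true;
	}
	return false;
}
```

Calling this per built descriptor, as the patch does, pins the
request to the exact WQE that crosses the threshold, so completions
arrive in a steady stream regardless of burst size.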

Fixes: 18a1c20044c0 ("net/mlx5: implement Tx burst template")

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5_defs.h |  2 +-
 drivers/net/mlx5/mlx5_prm.h  | 17 +++++++-----
 drivers/net/mlx5/mlx5_rxtx.c | 64 +++++++++++++++++++++++++++++---------------
 3 files changed, 55 insertions(+), 28 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index 461e916..d7440fd 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -28,7 +28,7 @@
  * Request TX completion every time descriptors reach this threshold since
  * the previous request. Must be a power of two for performance reasons.
  */
-#define MLX5_TX_COMP_THRESH 32
+#define MLX5_TX_COMP_THRESH 32u
 
 /*
  * Request TX completion every time the total number of WQEBBs used for inlining
diff --git a/drivers/net/mlx5/mlx5_prm.h b/drivers/net/mlx5/mlx5_prm.h
index 32bc7a6..89548d4 100644
--- a/drivers/net/mlx5/mlx5_prm.h
+++ b/drivers/net/mlx5/mlx5_prm.h
@@ -72,7 +72,7 @@
  * boundary with accounting the title Control and Ethernet
  * segments.
  */
-#define MLX5_EMPW_DEF_INLINE_LEN (3U * MLX5_WQE_SIZE + \
+#define MLX5_EMPW_DEF_INLINE_LEN (3u * MLX5_WQE_SIZE + \
 				  MLX5_DSEG_MIN_INLINE_SIZE - \
 				  MLX5_WQE_DSEG_SIZE)
 /*
@@ -90,11 +90,16 @@
  * If there are no enough resources to built minimal
  * EMPW the sending loop exits.
  */
-#define MLX5_EMPW_MIN_PACKETS (2 + 3 * 4)
-#define MLX5_EMPW_MAX_PACKETS ((MLX5_WQE_SIZE_MAX - \
-				MLX5_WQE_CSEG_SIZE - \
-				MLX5_WQE_ESEG_SIZE) / \
-				MLX5_WSEG_SIZE)
+#define MLX5_EMPW_MIN_PACKETS (2u + 3u * 4u)
+/*
+ * Maximal amount of packets to be sent with EMPW.
+ * This value is not recommended to exceed MLX5_TX_COMP_THRESH,
+ * otherwise there might be up to MLX5_EMPW_MAX_PACKETS mbufs
+ * without CQE generation request, being multiplied by
+ * MLX5_TX_COMP_MAX_CQE it may cause significant latency
+ * in tx burst routine at the moment of freeing multiple mbufs.
+ */
+#define MLX5_EMPW_MAX_PACKETS MLX5_TX_COMP_THRESH
 /*
  * Default packet length threshold to be inlined with
  * ordinary SEND. Inlining saves the MR key search
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index c2b93c6..5984c50 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -2063,8 +2063,6 @@ enum mlx5_txcmp_code {
  *
  * @param txq
  *   Pointer to TX queue structure.
- * @param n_mbuf
- *   Number of mbuf not stored yet in elts array.
  * @param loc
  *   Pointer to burst routine local context.
  * @param olx
@@ -2073,18 +2071,23 @@ enum mlx5_txcmp_code {
  */
 static __rte_always_inline void
 mlx5_tx_request_completion(struct mlx5_txq_data *restrict txq,
-			   unsigned int n_mbuf,
 			   struct mlx5_txq_local *restrict loc,
-			   unsigned int olx __rte_unused)
+			   unsigned int olx)
 {
-	uint16_t head = txq->elts_head + n_mbuf;
+	uint16_t head = txq->elts_head;
+	unsigned int part;
 
+	part = MLX5_TXOFF_CONFIG(INLINE) ? 0 : loc->pkts_sent -
+		(MLX5_TXOFF_CONFIG(MULTI) ? loc->pkts_copy : 0);
+	head += part;
 	if ((uint16_t)(head - txq->elts_comp) >= MLX5_TX_COMP_THRESH ||
-	    (uint16_t)(txq->wqe_ci - txq->wqe_comp) >= txq->wqe_thres) {
+	     (MLX5_TXOFF_CONFIG(INLINE) &&
+	     (uint16_t)(txq->wqe_ci - txq->wqe_comp) >= txq->wqe_thres)) {
 		volatile struct mlx5_wqe *last = loc->wqe_last;
 
 		txq->elts_comp = head;
-		txq->wqe_comp = txq->wqe_ci;
+		if (MLX5_TXOFF_CONFIG(INLINE))
+			txq->wqe_comp = txq->wqe_ci;
 		/* Request unconditional completion on last WQE. */
 		last->cseg.flags = RTE_BE32(MLX5_COMP_ALWAYS <<
 					    MLX5_COMP_MODE_OFFSET);
@@ -3023,6 +3026,8 @@ enum mlx5_txcmp_code {
 	wqe->cseg.sq_ds = rte_cpu_to_be_32(txq->qp_num_8s | ds);
 	txq->wqe_ci += (ds + 3) / 4;
 	loc->wqe_free -= (ds + 3) / 4;
+	/* Request CQE generation if limits are reached. */
+	mlx5_tx_request_completion(txq, loc, olx);
 	return MLX5_TXCMP_CODE_MULTI;
 }
 
@@ -3131,6 +3136,8 @@ enum mlx5_txcmp_code {
 	} while (true);
 	txq->wqe_ci += (ds + 3) / 4;
 	loc->wqe_free -= (ds + 3) / 4;
+	/* Request CQE generation if limits are reached. */
+	mlx5_tx_request_completion(txq, loc, olx);
 	return MLX5_TXCMP_CODE_MULTI;
 }
 
@@ -3287,6 +3294,8 @@ enum mlx5_txcmp_code {
 	wqe->cseg.sq_ds = rte_cpu_to_be_32(txq->qp_num_8s | ds);
 	txq->wqe_ci += (ds + 3) / 4;
 	loc->wqe_free -= (ds + 3) / 4;
+	/* Request CQE generation if limits are reached. */
+	mlx5_tx_request_completion(txq, loc, olx);
 	return MLX5_TXCMP_CODE_MULTI;
 }
 
@@ -3496,6 +3505,8 @@ enum mlx5_txcmp_code {
 		--loc->elts_free;
 		++loc->pkts_sent;
 		--pkts_n;
+		/* Request CQE generation if limits are reached. */
+		mlx5_tx_request_completion(txq, loc, olx);
 		if (unlikely(!pkts_n || !loc->elts_free || !loc->wqe_free))
 			return MLX5_TXCMP_CODE_EXIT;
 		loc->mbuf = *pkts++;
@@ -3637,7 +3648,7 @@ enum mlx5_txcmp_code {
 		   struct mlx5_txq_local *restrict loc,
 		   unsigned int ds,
 		   unsigned int slen,
-		   unsigned int olx __rte_unused)
+		   unsigned int olx)
 {
 	assert(!MLX5_TXOFF_CONFIG(INLINE));
 #ifdef MLX5_PMD_SOFT_COUNTERS
@@ -3652,6 +3663,8 @@ enum mlx5_txcmp_code {
 	loc->wqe_last->cseg.sq_ds = rte_cpu_to_be_32(txq->qp_num_8s | ds);
 	txq->wqe_ci += (ds + 3) / 4;
 	loc->wqe_free -= (ds + 3) / 4;
+	/* Request CQE generation if limits are reached. */
+	mlx5_tx_request_completion(txq, loc, olx);
 }
 
 /*
@@ -3694,6 +3707,8 @@ enum mlx5_txcmp_code {
 	loc->wqe_last->cseg.sq_ds = rte_cpu_to_be_32(txq->qp_num_8s | len);
 	txq->wqe_ci += (len + 3) / 4;
 	loc->wqe_free -= (len + 3) / 4;
+	/* Request CQE generation if limits are reached. */
+	mlx5_tx_request_completion(txq, loc, olx);
 }
 
 /**
@@ -3865,6 +3880,7 @@ enum mlx5_txcmp_code {
 				if (unlikely(!loc->elts_free ||
 					     !loc->wqe_free))
 					return MLX5_TXCMP_CODE_EXIT;
+				pkts_n -= part;
 				goto next_empw;
 			}
 			/* Packet attributes match, continue the same eMPW. */
@@ -3884,6 +3900,8 @@ enum mlx5_txcmp_code {
 		txq->wqe_ci += (2 + part + 3) / 4;
 		loc->wqe_free -= (2 + part + 3) / 4;
 		pkts_n -= part;
+		/* Request CQE generation if limits are reached. */
+		mlx5_tx_request_completion(txq, loc, olx);
 		if (unlikely(!pkts_n || !loc->elts_free || !loc->wqe_free))
 			return MLX5_TXCMP_CODE_EXIT;
 		loc->mbuf = *pkts++;
@@ -3922,10 +3940,14 @@ enum mlx5_txcmp_code {
 		struct mlx5_wqe_dseg *restrict dseg;
 		struct mlx5_wqe_eseg *restrict eseg;
 		enum mlx5_txcmp_code ret;
-		unsigned int room, part;
+		unsigned int room, part, nlim;
 		unsigned int slen = 0;
 
-next_empw:
+		/*
+		 * Limits the amount of packets in one WQE
+		 * to improve CQE latency generation.
+		 */
+		nlim = RTE_MIN(pkts_n, MLX5_EMPW_MAX_PACKETS);
 		/* Check whether we have minimal amount WQEs */
 		if (unlikely(loc->wqe_free <
 			    ((2 + MLX5_EMPW_MIN_PACKETS + 3) / 4)))
@@ -4044,12 +4066,6 @@ enum mlx5_txcmp_code {
 				mlx5_tx_idone_empw(txq, loc, part, slen, olx);
 				return MLX5_TXCMP_CODE_EXIT;
 			}
-			/* Check if we have minimal room left. */
-			if (room < MLX5_WQE_DSEG_SIZE) {
-				part -= room;
-				mlx5_tx_idone_empw(txq, loc, part, slen, olx);
-				goto next_empw;
-			}
 			loc->mbuf = *pkts++;
 			if (likely(pkts_n > 1))
 				rte_prefetch0(*pkts);
@@ -4089,6 +4105,10 @@ enum mlx5_txcmp_code {
 				mlx5_tx_idone_empw(txq, loc, part, slen, olx);
 				return MLX5_TXCMP_CODE_ERROR;
 			}
+			/* Check if we have minimal room left. */
+			nlim--;
+			if (unlikely(!nlim || room < MLX5_WQE_DSEG_SIZE))
+				break;
 			/*
 			 * Check whether packet parameters coincide
 			 * within assumed eMPW batch:
@@ -4114,7 +4134,7 @@ enum mlx5_txcmp_code {
 		if (unlikely(!loc->elts_free ||
 			     !loc->wqe_free))
 			return MLX5_TXCMP_CODE_EXIT;
-		goto next_empw;
+		/* Continue the loop with new eMPW session. */
 	}
 	assert(false);
 }
@@ -4355,6 +4375,8 @@ enum mlx5_txcmp_code {
 		}
 		++loc->pkts_sent;
 		--pkts_n;
+		/* Request CQE generation if limits are reached. */
+		mlx5_tx_request_completion(txq, loc, olx);
 		if (unlikely(!pkts_n || !loc->elts_free || !loc->wqe_free))
 			return MLX5_TXCMP_CODE_EXIT;
 		loc->mbuf = *pkts++;
@@ -4630,9 +4652,6 @@ enum mlx5_txcmp_code {
 	/* Take a shortcut if nothing is sent. */
 	if (unlikely(loc.pkts_sent == 0))
 		return 0;
-	/* Not all of the mbufs may be stored into elts yet. */
-	part = MLX5_TXOFF_CONFIG(INLINE) ? 0 : loc.pkts_sent - loc.pkts_copy;
-	mlx5_tx_request_completion(txq, part, &loc, olx);
 	/*
 	 * Ring QP doorbell immediately after WQE building completion
 	 * to improve latencies. The pure software related data treatment
@@ -4640,10 +4659,13 @@ enum mlx5_txcmp_code {
 	 * processed in this thread only by the polling.
 	 */
 	mlx5_tx_dbrec_cond_wmb(txq, loc.wqe_last, 0);
+	/* Not all of the mbufs may be stored into elts yet. */
+	part = MLX5_TXOFF_CONFIG(INLINE) ? 0 : loc.pkts_sent -
+		(MLX5_TXOFF_CONFIG(MULTI) ? loc.pkts_copy : 0);
 	if (!MLX5_TXOFF_CONFIG(INLINE) && part) {
 		/*
 		 * There are some single-segment mbufs not stored in elts.
-		 * It can be only if last packet was single-segment.
+		 * It can be only if the last packet was single-segment.
 		 * The copying is gathered into one place due to it is
 		 * a good opportunity to optimize that with SIMD.
 		 * Unfortunately if inlining is enabled the gaps in
-- 
1.8.3.1



* Re: [dpdk-dev] [PATCH 0/3] net/mlx5: transmit datapath cumulative fix pack
  2019-07-29 12:41 [dpdk-dev] [PATCH 0/3] net/mlx5: transmit datapath cumulative fix pack Viacheslav Ovsiienko
                   ` (2 preceding siblings ...)
  2019-07-29 12:41 ` [dpdk-dev] [PATCH 3/3] net/mlx5: fix the Tx completion request generation Viacheslav Ovsiienko
@ 2019-07-29 15:13 ` Matan Azrad
  2019-07-29 15:23 ` Raslan Darawsheh
  4 siblings, 0 replies; 7+ messages in thread
From: Matan Azrad @ 2019-07-29 15:13 UTC (permalink / raw)
  To: Slava Ovsiienko, dev; +Cc: Yongseok Koh, Shahaf Shuler



From: Viacheslav Ovsiienko
> This series contains the pack of bug fixes and performance
> improvements:
> 
>   - limit the amount of freed buffers per one tx_burst call
>   - set the completion generating request in more uniform
>     fashion
>   - limit the number of packets in one descriptor to improve
>     latency
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>

> Viacheslav Ovsiienko (3):
>   net/mlx5: fix Tx completion descriptors fetching loop
>   net/mlx5: fix ConnectX-4LX minimal inline data limit
>   net/mlx5: fix the Tx completion request generation
> 
>  drivers/net/mlx5/mlx5.c      |   7 +--
>  drivers/net/mlx5/mlx5_defs.h |   9 +++-
>  drivers/net/mlx5/mlx5_prm.h  |  17 ++++---
>  drivers/net/mlx5/mlx5_rxtx.c | 110 +++++++++++++++++++++++++++++--------------
>  4 files changed, 97 insertions(+), 46 deletions(-)
> 
> --
> 1.8.3.1



* Re: [dpdk-dev] [PATCH 0/3] net/mlx5: transmit datapath cumulative fix pack
  2019-07-29 12:41 [dpdk-dev] [PATCH 0/3] net/mlx5: transmit datapath cumulative fix pack Viacheslav Ovsiienko
                   ` (3 preceding siblings ...)
  2019-07-29 15:13 ` [dpdk-dev] [PATCH 0/3] net/mlx5: transmit datapath cumulative fix pack Matan Azrad
@ 2019-07-29 15:23 ` Raslan Darawsheh
  4 siblings, 0 replies; 7+ messages in thread
From: Raslan Darawsheh @ 2019-07-29 15:23 UTC (permalink / raw)
  To: Slava Ovsiienko, dev; +Cc: Yongseok Koh, Shahaf Shuler

Hi,

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Viacheslav Ovsiienko
> Sent: Monday, July 29, 2019 3:41 PM
> To: dev@dpdk.org
> Cc: Yongseok Koh <yskoh@mellanox.com>; Shahaf Shuler
> <shahafs@mellanox.com>
> Subject: [dpdk-dev] [PATCH 0/3] net/mlx5: transmit datapath cumulative fix
> pack
> 
> This series contains the pack of bug fixes and performance
> improvements:
> 
>   - limit the amount of freed buffers per one tx_burst call
>   - set the completion generating request in more uniform
>     fashion
>   - limit the number of packets in one descriptor to improve
>     latency
> 
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> 
> Viacheslav Ovsiienko (3):
>   net/mlx5: fix Tx completion descriptors fetching loop
>   net/mlx5: fix ConnectX-4LX minimal inline data limit
>   net/mlx5: fix the Tx completion request generation
> 
>  drivers/net/mlx5/mlx5.c      |   7 +--
>  drivers/net/mlx5/mlx5_defs.h |   9 +++-
>  drivers/net/mlx5/mlx5_prm.h  |  17 ++++---
>  drivers/net/mlx5/mlx5_rxtx.c | 110 +++++++++++++++++++++++++++++--------------
>  4 files changed, 97 insertions(+), 46 deletions(-)
> 
> --
> 1.8.3.1

Series applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh


* [dpdk-dev] [PATCH] net/mlx5: fix default minimal data inline
  2019-07-29 12:41 ` [dpdk-dev] [PATCH 2/3] net/mlx5: fix ConnectX-4LX minimal inline data limit Viacheslav Ovsiienko
@ 2019-08-01  7:43   ` Viacheslav Ovsiienko
  0 siblings, 0 replies; 7+ messages in thread
From: Viacheslav Ovsiienko @ 2019-08-01  7:43 UTC (permalink / raw)
  To: dev; +Cc: yskoh, shahafs

The patch [Fixes] set the default value of the required minimal
inline data to 0 bytes. In some configurations (depending on
switchdev/legacy settings and FW version/settings) the
ConnectX-4LX NIC requires a minimum of 18 bytes of Tx
descriptor inline data to operate correctly.

A default wrongly set to 0 may prevent the NIC from operating
with out-of-the-box settings, so this patch reverts the default
value for ConnectX-4LX back to 18 bytes (inline L2).

Fixes: 9f350504bb32 ("net/mlx5: fix ConnectX-4LX minimal inline data limit")

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
 drivers/net/mlx5/mlx5.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index f5bc31f..5a1ea80 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -1325,12 +1325,9 @@ struct mlx5_dev_spawn_data {
 	switch (spawn->pci_dev->id.device_id) {
 	case PCI_DEVICE_ID_MELLANOX_CONNECTX4:
 	case PCI_DEVICE_ID_MELLANOX_CONNECTX4VF:
-		config->txq_inline_min = MLX5_INLINE_HSIZE_L2;
-		config->hw_vlan_insert = 0;
-		break;
 	case PCI_DEVICE_ID_MELLANOX_CONNECTX4LX:
 	case PCI_DEVICE_ID_MELLANOX_CONNECTX4LXVF:
-		config->txq_inline_min = MLX5_INLINE_HSIZE_NONE;
+		config->txq_inline_min = MLX5_INLINE_HSIZE_L2;
 		config->hw_vlan_insert = 0;
 		break;
 	case PCI_DEVICE_ID_MELLANOX_CONNECTX5:
-- 
1.8.3.1


