DPDK patches and discussions
 help / color / mirror / Atom feed
From: Konstantin Ananyev <konstantin.ananyev@intel.com>
To: dev@dpdk.org
Subject: [dpdk-dev] [PATCHv6 2/2] ixgbe: fix TX hang when RS distance exceeds HW limit
Date: Mon,  9 Nov 2015 19:21:13 +0000	[thread overview]
Message-ID: <1447096873-8353-3-git-send-email-konstantin.ananyev@intel.com> (raw)
In-Reply-To: <1447096873-8353-1-git-send-email-konstantin.ananyev@intel.com>
In-Reply-To: <1440085070-13989-1-git-send-email-vladz@cloudius-systems.com>

One of the ways to reproduce the issue:

testpmd <EAL-OPTIONS> -- -i --txqflags=0
testpmd> set fwd txonly
testpmd> set txpkts 64,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4
testpmd> set txsplit rand
testpmd> start

After some time TX on ixgbe queue will hang,
and all packet transmission on that queue will stop.

This bug was first reported and investigated by
Vlad Zolotarov <vladz@cloudius-systems.com>:
"We can reproduce this issue when stressed the xmit path with a lot of highly
fragmented TCP frames (packets with up to 33 fragments with non-headers
fragments as small as 4 bytes) with all offload features enabled."

The root cause is that ixgbe_xmit_pkts() in some cases violates the HW rule
that the distance between TDs with RS bit set should not exceed 40 TDs.

>From the latest 82599 spec update:
"When WTHRESH is set to zero, the software device driver should set the RS bit
in the Tx descriptors with the EOP bit set and at least once in the 40
descriptors."

The fix is to make sure that the distance between TDs with RS bit set
would never exceed HW limit.
As part of that fix, tx_rs_thresh for ixgbe PMD is not allowed to be greater
then to 32 to comply with HW restrictions.

With that fix slight slowdown for the full-featured ixgbe TX path
might be observed (from our testing - up to 4%).

ixgbe simple TX path is unaffected by that patch.

v5 changes:
- rework the patch to avoid setting RS bit on every EOP descriptor
 (while that approach is valid, it causes significant slowdown
  on the TX path: up to 25%).

v6 changes:
- fix pmd_perf_autotest
- fix error description
- update RN

Reported-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test/test_pmd_perf.c             |  8 ++++----
 doc/guides/rel_notes/release_2_2.rst |  7 +++++++
 drivers/net/ixgbe/ixgbe_rxtx.c       | 32 +++++++++++++++++++++++++++-----
 3 files changed, 38 insertions(+), 9 deletions(-)

diff --git a/app/test/test_pmd_perf.c b/app/test/test_pmd_perf.c
index 1fd6843..ef9262c 100644
--- a/app/test/test_pmd_perf.c
+++ b/app/test/test_pmd_perf.c
@@ -841,10 +841,10 @@ test_set_rxtx_conf(cmdline_fixed_string_t mode)
 		port_conf.rxmode.enable_scatter = 0;
 		return 0;
 	} else if (!strcmp(mode, "scalar")) {
-		/* bulk alloc rx, simple tx */
-		tx_conf.txq_flags = 0xf01;
-		tx_conf.tx_rs_thresh = 128;
-		tx_conf.tx_free_thresh = 128;
+		/* bulk alloc rx, full-featured tx */
+		tx_conf.txq_flags = 0;
+		tx_conf.tx_rs_thresh = 32;
+		tx_conf.tx_free_thresh = 32;
 		port_conf.rxmode.hw_ip_checksum = 1;
 		port_conf.rxmode.enable_scatter = 0;
 		return 0;
diff --git a/doc/guides/rel_notes/release_2_2.rst b/doc/guides/rel_notes/release_2_2.rst
index 59dda59..62e225b 100644
--- a/doc/guides/rel_notes/release_2_2.rst
+++ b/doc/guides/rel_notes/release_2_2.rst
@@ -134,6 +134,13 @@ Drivers
 
   VF needs the PF interrupt support initialized even if not started.
 
+* **ixgbe: Fixed TX hang when RS distance exceeds HW limit.**
+
+  Fixed an issue when TX queue can hang when a lot of highly fragmented
+  packets have to be sent.
+  As part of that fix, tx_rs_thresh for ixgbe PMD is not allowed to be greater
+  then to 32 to comply with HW restrictions.
+
 * **i40e: Fixed base driver allocation when not using first numa node.**
 
   Fixed i40e issue that occurred when a DPDK application didn't initialize
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 5561195..ca6fb69 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -572,7 +572,7 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ixgbe_tx_entry *sw_ring;
 	struct ixgbe_tx_entry *txe, *txn;
 	volatile union ixgbe_adv_tx_desc *txr;
-	volatile union ixgbe_adv_tx_desc *txd;
+	volatile union ixgbe_adv_tx_desc *txd, *txp;
 	struct rte_mbuf     *tx_pkt;
 	struct rte_mbuf     *m_seg;
 	uint64_t buf_dma_addr;
@@ -595,6 +595,7 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	txr     = txq->tx_ring;
 	tx_id   = txq->tx_tail;
 	txe = &sw_ring[tx_id];
+	txp = NULL;
 
 	/* Determine if the descriptor ring needs to be cleaned. */
 	if (txq->nb_tx_free < txq->tx_free_thresh)
@@ -638,6 +639,12 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		 */
 		nb_used = (uint16_t)(tx_pkt->nb_segs + new_ctx);
 
+		if (txp != NULL &&
+				nb_used + txq->nb_tx_used >= txq->tx_rs_thresh)
+			/* set RS on the previous packet in the burst */
+			txp->read.cmd_type_len |=
+				rte_cpu_to_le_32(IXGBE_TXD_CMD_RS);
+
 		/*
 		 * The number of descriptors that must be allocated for a
 		 * packet is the number of segments of that packet, plus 1
@@ -840,10 +847,18 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 			/* Update txq RS bit counters */
 			txq->nb_tx_used = 0;
-		}
+			txp = NULL;
+		} else
+			txp = txd;
+
 		txd->read.cmd_type_len |= rte_cpu_to_le_32(cmd_type_len);
 	}
+
 end_of_tx:
+	/* set RS on last packet in the burst */
+	if (txp != NULL)
+		txp->read.cmd_type_len |= rte_cpu_to_le_32(IXGBE_TXD_CMD_RS);
+
 	rte_wmb();
 
 	/*
@@ -2019,9 +2034,16 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
 			tx_conf->tx_free_thresh : DEFAULT_TX_FREE_THRESH);
 	if (tx_rs_thresh >= (nb_desc - 2)) {
 		PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than the number "
-			     "of TX descriptors minus 2. (tx_rs_thresh=%u "
-			     "port=%d queue=%d)", (unsigned int)tx_rs_thresh,
-			     (int)dev->data->port_id, (int)queue_idx);
+			"of TX descriptors minus 2. (tx_rs_thresh=%u "
+			"port=%d queue=%d)", (unsigned int)tx_rs_thresh,
+			(int)dev->data->port_id, (int)queue_idx);
+		return -(EINVAL);
+	}
+	if (tx_rs_thresh > DEFAULT_TX_RS_THRESH) {
+		PMD_INIT_LOG(ERR, "tx_rs_thresh must be less or equal than %u. "
+			"(tx_rs_thresh=%u port=%d queue=%d)",
+			DEFAULT_TX_RS_THRESH, (unsigned int)tx_rs_thresh,
+			(int)dev->data->port_id, (int)queue_idx);
 		return -(EINVAL);
 	}
 	if (tx_free_thresh >= (nb_desc - 3)) {
-- 
1.8.3.1

  parent reply	other threads:[~2015-11-09 19:21 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-08-20 15:37 [dpdk-dev] [PATCH v4] ixgbe_pmd: enforce RS bit on every EOP descriptor for devices newer than 82598 Vlad Zolotarov
2015-08-24  8:11 ` Vlad Zolotarov
2015-10-27 18:09   ` Thomas Monjalon
2015-10-27 18:47     ` Vlad Zolotarov
2015-10-27 18:50       ` Brandeburg, Jesse
2015-10-27 19:10       ` Ananyev, Konstantin
2015-10-27 19:14         ` Vlad Zolotarov
2015-11-09 19:21 ` [dpdk-dev] [PATCHv6 0/2] ixgbe: fix TX hang when RS distance exceeds HW limit Konstantin Ananyev
2015-11-09 19:21 ` [dpdk-dev] [PATCHv6 1/2] testpmd: add ability to split outgoing packets Konstantin Ananyev
2015-11-09 19:21 ` Konstantin Ananyev [this message]
2015-11-10 13:48 ` [dpdk-dev] [PATCHv7 0/2] ixgbe: fix TX hang when RS distance exceeds HW limit Konstantin Ananyev
2015-11-10 14:06   ` De Lara Guarch, Pablo
2015-11-11 23:15     ` Thomas Monjalon
2015-11-10 13:48 ` [dpdk-dev] [PATCHv7 1/2] testpmd: add ability to split outgoing packets Konstantin Ananyev
2015-11-10 13:48 ` [dpdk-dev] [PATCHv7 2/2] ixgbe: fix TX hang when RS distance exceeds HW limit Konstantin Ananyev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1447096873-8353-3-git-send-email-konstantin.ananyev@intel.com \
    --to=konstantin.ananyev@intel.com \
    --cc=dev@dpdk.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).