* [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 @ 2015-08-13 18:06 Vlad Zolotarov 2015-08-13 20:28 ` Zhang, Helin 0 siblings, 1 reply; 48+ messages in thread From: Vlad Zolotarov @ 2015-08-13 18:06 UTC (permalink / raw) To: dev According to 82599 and x540 HW specifications RS bit *must* be set in the last descriptor of *every* packet. This patch fixes the Tx hang we were constantly hitting with a seastar-based application on x540 NIC. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> --- drivers/net/ixgbe/ixgbe_ethdev.c | 9 +++++++++ drivers/net/ixgbe/ixgbe_rxtx.c | 23 ++++++++++++++++++++++- 2 files changed, 31 insertions(+), 1 deletion(-) diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c index b8ee1e9..6714fd9 100644 --- a/drivers/net/ixgbe/ixgbe_ethdev.c +++ b/drivers/net/ixgbe/ixgbe_ethdev.c @@ -2414,6 +2414,15 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info) .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS | ETH_TXQ_FLAGS_NOOFFLOADS, }; + + /* + * According to 82599 and x540 specifications RS bit *must* be set on the + * last descriptor of *every* packet. Therefore we will not allow the + * tx_rs_thresh above 1 for all NICs newer than 82598. + */ + if (hw->mac.type > ixgbe_mac_82598EB) + dev_info->default_txconf.tx_rs_thresh = 1; + dev_info->hash_key_size = IXGBE_HKEY_MAX_INDEX * sizeof(uint32_t); dev_info->reta_size = ETH_RSS_RETA_SIZE_128; dev_info->flow_type_rss_offloads = IXGBE_RSS_OFFLOAD_ALL; diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c index 91023b9..8dbdffc 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx.c +++ b/drivers/net/ixgbe/ixgbe_rxtx.c @@ -2085,11 +2085,19 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev, struct ixgbe_tx_queue *txq; struct ixgbe_hw *hw; uint16_t tx_rs_thresh, tx_free_thresh; + bool rs_deferring_allowed; PMD_INIT_FUNC_TRACE(); hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private); /* + * According to 82599 and x540 specifications RS bit *must* be set on the + * last descriptor of *every* packet. Therefore we will not allow the + * tx_rs_thresh above 1 for all NICs newer than 82598. + */ + rs_deferring_allowed = (hw->mac.type <= ixgbe_mac_82598EB); + + /* * Validate number of transmit descriptors. * It must not exceed hardware maximum, and must be multiple * of IXGBE_ALIGN. @@ -2110,6 +2118,8 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev, * to transmit a packet is greater than the number of free TX * descriptors. * The following constraints must be satisfied: + * tx_rs_thresh must be less than 2 for NICs for which RS deferring is + * forbidden (all but 82598). * tx_rs_thresh must be greater than 0. * tx_rs_thresh must be less than the size of the ring minus 2. * tx_rs_thresh must be less than or equal to tx_free_thresh. @@ -2121,9 +2131,20 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev, * When set to zero use default values. */ tx_rs_thresh = (uint16_t)((tx_conf->tx_rs_thresh) ? - tx_conf->tx_rs_thresh : DEFAULT_TX_RS_THRESH); + tx_conf->tx_rs_thresh : + (rs_deferring_allowed ? DEFAULT_TX_RS_THRESH : 1)); tx_free_thresh = (uint16_t)((tx_conf->tx_free_thresh) ? tx_conf->tx_free_thresh : DEFAULT_TX_FREE_THRESH); + + if (!rs_deferring_allowed && tx_rs_thresh > 1) { + PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than 2 since RS " + "must be set for every packet for this HW. " + "(tx_rs_thresh=%u port=%d queue=%d)", + (unsigned int)tx_rs_thresh, + (int)dev->data->port_id, (int)queue_idx); + return -(EINVAL); + } + if (tx_rs_thresh >= (nb_desc - 2)) { PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than the number " "of TX descriptors minus 2. (tx_rs_thresh=%u " -- 2.1.0 ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-08-13 18:06 [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 Vlad Zolotarov @ 2015-08-13 20:28 ` Zhang, Helin 2015-08-14 5:37 ` Vlad Zolotarov 0 siblings, 1 reply; 48+ messages in thread From: Zhang, Helin @ 2015-08-13 20:28 UTC (permalink / raw) To: Vlad Zolotarov, dev Hi Vlad I don't think the changes are needed. It says in datasheet that the RS bit should be set on the last descriptor of every packet, ONLY WHEN TXDCTL.WTHRESH equals to ZERO. Regards, Helin > -----Original Message----- > From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com] > Sent: Thursday, August 13, 2015 11:07 AM > To: dev@dpdk.org > Cc: Zhang, Helin; Ananyev, Konstantin; avi@cloudius-systems.com; Vlad > Zolotarov > Subject: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all > NICs but 82598 > > According to 82599 and x540 HW specifications RS bit *must* be set in the last > descriptor of *every* packet. There is a condition that if TXDCTL.WTHRESH equal to zero. > > This patch fixes the Tx hang we were constantly hitting with a seastar-based > application on x540 NIC. Could you help to share with us how to reproduce the tx hang issue, with using typical DPDK examples? > > Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> > --- > drivers/net/ixgbe/ixgbe_ethdev.c | 9 +++++++++ > drivers/net/ixgbe/ixgbe_rxtx.c | 23 ++++++++++++++++++++++- > 2 files changed, 31 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c > b/drivers/net/ixgbe/ixgbe_ethdev.c > index b8ee1e9..6714fd9 100644 > --- a/drivers/net/ixgbe/ixgbe_ethdev.c > +++ b/drivers/net/ixgbe/ixgbe_ethdev.c > @@ -2414,6 +2414,15 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, struct > rte_eth_dev_info *dev_info) > .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS | > ETH_TXQ_FLAGS_NOOFFLOADS, > }; > + > + /* > + * According to 82599 and x540 specifications RS bit *must* be set on the > + * last descriptor of *every* packet. Therefore we will not allow the > + * tx_rs_thresh above 1 for all NICs newer than 82598. > + */ > + if (hw->mac.type > ixgbe_mac_82598EB) > + dev_info->default_txconf.tx_rs_thresh = 1; > + > dev_info->hash_key_size = IXGBE_HKEY_MAX_INDEX * sizeof(uint32_t); > dev_info->reta_size = ETH_RSS_RETA_SIZE_128; > dev_info->flow_type_rss_offloads = IXGBE_RSS_OFFLOAD_ALL; diff --git > a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c index > 91023b9..8dbdffc 100644 > --- a/drivers/net/ixgbe/ixgbe_rxtx.c > +++ b/drivers/net/ixgbe/ixgbe_rxtx.c > @@ -2085,11 +2085,19 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev > *dev, > struct ixgbe_tx_queue *txq; > struct ixgbe_hw *hw; > uint16_t tx_rs_thresh, tx_free_thresh; > + bool rs_deferring_allowed; > > PMD_INIT_FUNC_TRACE(); > hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private); > > /* > + * According to 82599 and x540 specifications RS bit *must* be set on the > + * last descriptor of *every* packet. Therefore we will not allow the > + * tx_rs_thresh above 1 for all NICs newer than 82598. > + */ > + rs_deferring_allowed = (hw->mac.type <= ixgbe_mac_82598EB); > + > + /* > * Validate number of transmit descriptors. > * It must not exceed hardware maximum, and must be multiple > * of IXGBE_ALIGN. > @@ -2110,6 +2118,8 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev, > * to transmit a packet is greater than the number of free TX > * descriptors. > * The following constraints must be satisfied: > + * tx_rs_thresh must be less than 2 for NICs for which RS deferring is > + * forbidden (all but 82598). > * tx_rs_thresh must be greater than 0. > * tx_rs_thresh must be less than the size of the ring minus 2. > * tx_rs_thresh must be less than or equal to tx_free_thresh. > @@ -2121,9 +2131,20 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev, > * When set to zero use default values. > */ > tx_rs_thresh = (uint16_t)((tx_conf->tx_rs_thresh) ? > - tx_conf->tx_rs_thresh : DEFAULT_TX_RS_THRESH); > + tx_conf->tx_rs_thresh : > + (rs_deferring_allowed ? DEFAULT_TX_RS_THRESH : 1)); > tx_free_thresh = (uint16_t)((tx_conf->tx_free_thresh) ? > tx_conf->tx_free_thresh : DEFAULT_TX_FREE_THRESH); > + > + if (!rs_deferring_allowed && tx_rs_thresh > 1) { > + PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than 2 since RS " > + "must be set for every packet for this HW. " > + "(tx_rs_thresh=%u port=%d queue=%d)", > + (unsigned int)tx_rs_thresh, > + (int)dev->data->port_id, (int)queue_idx); > + return -(EINVAL); > + } > + > if (tx_rs_thresh >= (nb_desc - 2)) { > PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than the number " > "of TX descriptors minus 2. (tx_rs_thresh=%u " > -- > 2.1.0 ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-08-13 20:28 ` Zhang, Helin @ 2015-08-14 5:37 ` Vlad Zolotarov 2015-08-19 0:42 ` Lu, Wenzhuo 0 siblings, 1 reply; 48+ messages in thread From: Vlad Zolotarov @ 2015-08-14 5:37 UTC (permalink / raw) To: Zhang, Helin, dev On 08/13/15 23:28, Zhang, Helin wrote: > Hi Vlad > > I don't think the changes are needed. It says in datasheet that the RS bit should be > set on the last descriptor of every packet, ONLY WHEN TXDCTL.WTHRESH equals to ZERO. Of course it's needed! See below. Exactly the same spec a few lines above the place u've just quoted states: "Software should not set the RS bit when TXDCTL.WTHRESH is greater than zero." And since all three (3) ixgbe xmit callbacks are utilizing RS bit notation ixgbe PMD is actually not supporting any value of WTHRESH different from zero. > > Regards, > Helin > >> -----Original Message----- >> From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com] >> Sent: Thursday, August 13, 2015 11:07 AM >> To: dev@dpdk.org >> Cc: Zhang, Helin; Ananyev, Konstantin; avi@cloudius-systems.com; Vlad >> Zolotarov >> Subject: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all >> NICs but 82598 >> >> According to 82599 and x540 HW specifications RS bit *must* be set in the last >> descriptor of *every* packet. > There is a condition that if TXDCTL.WTHRESH equal to zero. Right and ixgbe PMD requires this condition to be fulfilled in order to function. See above. > >> This patch fixes the Tx hang we were constantly hitting with a seastar-based >> application on x540 NIC. > Could you help to share with us how to reproduce the tx hang issue, with using > typical DPDK examples? Sorry. I'm not very familiar with the typical DPDK examples to help u here. However this is quite irrelevant since without this this patch ixgbe PMD obviously abuses the HW spec as has been explained above. We saw the issue when u stressed the xmit path with a lot of highly fragmented TCP frames (packets with up to 33 fragments with non-headers fragments as small as 4 bytes) with all offload features enabled. Thanks, vlad > >> Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> >> --- >> drivers/net/ixgbe/ixgbe_ethdev.c | 9 +++++++++ >> drivers/net/ixgbe/ixgbe_rxtx.c | 23 ++++++++++++++++++++++- >> 2 files changed, 31 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c >> b/drivers/net/ixgbe/ixgbe_ethdev.c >> index b8ee1e9..6714fd9 100644 >> --- a/drivers/net/ixgbe/ixgbe_ethdev.c >> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c >> @@ -2414,6 +2414,15 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, struct >> rte_eth_dev_info *dev_info) >> .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS | >> ETH_TXQ_FLAGS_NOOFFLOADS, >> }; >> + >> + /* >> + * According to 82599 and x540 specifications RS bit *must* be set on the >> + * last descriptor of *every* packet. Therefore we will not allow the >> + * tx_rs_thresh above 1 for all NICs newer than 82598. >> + */ >> + if (hw->mac.type > ixgbe_mac_82598EB) >> + dev_info->default_txconf.tx_rs_thresh = 1; >> + >> dev_info->hash_key_size = IXGBE_HKEY_MAX_INDEX * sizeof(uint32_t); >> dev_info->reta_size = ETH_RSS_RETA_SIZE_128; >> dev_info->flow_type_rss_offloads = IXGBE_RSS_OFFLOAD_ALL; diff --git >> a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c index >> 91023b9..8dbdffc 100644 >> --- a/drivers/net/ixgbe/ixgbe_rxtx.c >> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c >> @@ -2085,11 +2085,19 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev >> *dev, >> struct ixgbe_tx_queue *txq; >> struct ixgbe_hw *hw; >> uint16_t tx_rs_thresh, tx_free_thresh; >> + bool rs_deferring_allowed; >> >> PMD_INIT_FUNC_TRACE(); >> hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private); >> >> /* >> + * According to 82599 and x540 specifications RS bit *must* be set on the >> + * last descriptor of *every* packet. Therefore we will not allow the >> + * tx_rs_thresh above 1 for all NICs newer than 82598. >> + */ >> + rs_deferring_allowed = (hw->mac.type <= ixgbe_mac_82598EB); >> + >> + /* >> * Validate number of transmit descriptors. >> * It must not exceed hardware maximum, and must be multiple >> * of IXGBE_ALIGN. >> @@ -2110,6 +2118,8 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev, >> * to transmit a packet is greater than the number of free TX >> * descriptors. >> * The following constraints must be satisfied: >> + * tx_rs_thresh must be less than 2 for NICs for which RS deferring is >> + * forbidden (all but 82598). >> * tx_rs_thresh must be greater than 0. >> * tx_rs_thresh must be less than the size of the ring minus 2. >> * tx_rs_thresh must be less than or equal to tx_free_thresh. >> @@ -2121,9 +2131,20 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev, >> * When set to zero use default values. >> */ >> tx_rs_thresh = (uint16_t)((tx_conf->tx_rs_thresh) ? >> - tx_conf->tx_rs_thresh : DEFAULT_TX_RS_THRESH); >> + tx_conf->tx_rs_thresh : >> + (rs_deferring_allowed ? DEFAULT_TX_RS_THRESH : 1)); >> tx_free_thresh = (uint16_t)((tx_conf->tx_free_thresh) ? >> tx_conf->tx_free_thresh : DEFAULT_TX_FREE_THRESH); >> + >> + if (!rs_deferring_allowed && tx_rs_thresh > 1) { >> + PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than 2 since RS " >> + "must be set for every packet for this HW. " >> + "(tx_rs_thresh=%u port=%d queue=%d)", >> + (unsigned int)tx_rs_thresh, >> + (int)dev->data->port_id, (int)queue_idx); >> + return -(EINVAL); >> + } >> + >> if (tx_rs_thresh >= (nb_desc - 2)) { >> PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than the number " >> "of TX descriptors minus 2. (tx_rs_thresh=%u " >> -- >> 2.1.0 ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-08-14 5:37 ` Vlad Zolotarov @ 2015-08-19 0:42 ` Lu, Wenzhuo 2015-08-19 4:55 ` Vladislav Zolotarov 0 siblings, 1 reply; 48+ messages in thread From: Lu, Wenzhuo @ 2015-08-19 0:42 UTC (permalink / raw) To: Zhang, Helin; +Cc: dev Hi Helin, > -----Original Message----- > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vlad Zolotarov > Sent: Friday, August 14, 2015 1:38 PM > To: Zhang, Helin; dev@dpdk.org > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for > all NICs but 82598 > > > > On 08/13/15 23:28, Zhang, Helin wrote: > > Hi Vlad > > > > I don't think the changes are needed. It says in datasheet that the RS > > bit should be set on the last descriptor of every packet, ONLY WHEN > TXDCTL.WTHRESH equals to ZERO. > > Of course it's needed! See below. > Exactly the same spec a few lines above the place u've just quoted states: > > "Software should not set the RS bit when TXDCTL.WTHRESH is greater than > zero." > > And since all three (3) ixgbe xmit callbacks are utilizing RS bit notation ixgbe PMD > is actually not supporting any value of WTHRESH different from zero. I think Vlad is right. We need to fix this issue. Any suggestion? If not, I'd like to ack this patch. > > > > > Regards, > > Helin > > > >> -----Original Message----- > >> From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com] > >> Sent: Thursday, August 13, 2015 11:07 AM > >> To: dev@dpdk.org > >> Cc: Zhang, Helin; Ananyev, Konstantin; avi@cloudius-systems.com; Vlad > >> Zolotarov > >> Subject: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for > all > >> NICs but 82598 > >> > >> According to 82599 and x540 HW specifications RS bit *must* be set in the > last > >> descriptor of *every* packet. > > There is a condition that if TXDCTL.WTHRESH equal to zero. > > Right and ixgbe PMD requires this condition to be fulfilled in order to > function. See above. > > > > >> This patch fixes the Tx hang we were constantly hitting with a seastar-based > >> application on x540 NIC. > > Could you help to share with us how to reproduce the tx hang issue, with using > > typical DPDK examples? > > Sorry. I'm not very familiar with the typical DPDK examples to help u > here. However this is quite irrelevant since without this this patch > ixgbe PMD obviously abuses the HW spec as has been explained above. > > We saw the issue when u stressed the xmit path with a lot of highly > fragmented TCP frames (packets with up to 33 fragments with non-headers > fragments as small as 4 bytes) with all offload features enabled. > > Thanks, > vlad > > > >> Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> > >> --- > >> drivers/net/ixgbe/ixgbe_ethdev.c | 9 +++++++++ > >> drivers/net/ixgbe/ixgbe_rxtx.c | 23 ++++++++++++++++++++++- > >> 2 files changed, 31 insertions(+), 1 deletion(-) > >> > >> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c > >> b/drivers/net/ixgbe/ixgbe_ethdev.c > >> index b8ee1e9..6714fd9 100644 > >> --- a/drivers/net/ixgbe/ixgbe_ethdev.c > >> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c > >> @@ -2414,6 +2414,15 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, > struct > >> rte_eth_dev_info *dev_info) > >> .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS | > >> ETH_TXQ_FLAGS_NOOFFLOADS, > >> }; > >> + > >> + /* > >> + * According to 82599 and x540 specifications RS bit *must* be set on > the > >> + * last descriptor of *every* packet. Therefore we will not allow the > >> + * tx_rs_thresh above 1 for all NICs newer than 82598. > >> + */ > >> + if (hw->mac.type > ixgbe_mac_82598EB) > >> + dev_info->default_txconf.tx_rs_thresh = 1; > >> + > >> dev_info->hash_key_size = IXGBE_HKEY_MAX_INDEX * sizeof(uint32_t); > >> dev_info->reta_size = ETH_RSS_RETA_SIZE_128; > >> dev_info->flow_type_rss_offloads = IXGBE_RSS_OFFLOAD_ALL; diff -- > git > >> a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c index > >> 91023b9..8dbdffc 100644 > >> --- a/drivers/net/ixgbe/ixgbe_rxtx.c > >> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c > >> @@ -2085,11 +2085,19 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev > >> *dev, > >> struct ixgbe_tx_queue *txq; > >> struct ixgbe_hw *hw; > >> uint16_t tx_rs_thresh, tx_free_thresh; > >> + bool rs_deferring_allowed; > >> > >> PMD_INIT_FUNC_TRACE(); > >> hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private); > >> > >> /* > >> + * According to 82599 and x540 specifications RS bit *must* be set on > the > >> + * last descriptor of *every* packet. Therefore we will not allow the > >> + * tx_rs_thresh above 1 for all NICs newer than 82598. > >> + */ > >> + rs_deferring_allowed = (hw->mac.type <= ixgbe_mac_82598EB); > >> + > >> + /* > >> * Validate number of transmit descriptors. > >> * It must not exceed hardware maximum, and must be multiple > >> * of IXGBE_ALIGN. > >> @@ -2110,6 +2118,8 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev > *dev, > >> * to transmit a packet is greater than the number of free TX > >> * descriptors. > >> * The following constraints must be satisfied: > >> + * tx_rs_thresh must be less than 2 for NICs for which RS deferring is > >> + * forbidden (all but 82598). > >> * tx_rs_thresh must be greater than 0. > >> * tx_rs_thresh must be less than the size of the ring minus 2. > >> * tx_rs_thresh must be less than or equal to tx_free_thresh. > >> @@ -2121,9 +2131,20 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev > *dev, > >> * When set to zero use default values. > >> */ > >> tx_rs_thresh = (uint16_t)((tx_conf->tx_rs_thresh) ? > >> - tx_conf->tx_rs_thresh : DEFAULT_TX_RS_THRESH); > >> + tx_conf->tx_rs_thresh : > >> + (rs_deferring_allowed ? DEFAULT_TX_RS_THRESH : 1)); > >> tx_free_thresh = (uint16_t)((tx_conf->tx_free_thresh) ? > >> tx_conf->tx_free_thresh : DEFAULT_TX_FREE_THRESH); > >> + > >> + if (!rs_deferring_allowed && tx_rs_thresh > 1) { > >> + PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than 2 since RS > " > >> + "must be set for every packet for this HW. " > >> + "(tx_rs_thresh=%u port=%d queue=%d)", > >> + (unsigned int)tx_rs_thresh, > >> + (int)dev->data->port_id, (int)queue_idx); > >> + return -(EINVAL); > >> + } > >> + > >> if (tx_rs_thresh >= (nb_desc - 2)) { > >> PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than the > number " > >> "of TX descriptors minus 2. (tx_rs_thresh=%u " > >> -- > >> 2.1.0 ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-08-19 0:42 ` Lu, Wenzhuo @ 2015-08-19 4:55 ` Vladislav Zolotarov 2015-08-19 7:43 ` Ananyev, Konstantin 2015-08-19 17:29 ` Zhang, Helin 0 siblings, 2 replies; 48+ messages in thread From: Vladislav Zolotarov @ 2015-08-19 4:55 UTC (permalink / raw) To: Lu, Wenzhuo; +Cc: dev On Aug 19, 2015 03:42, "Lu, Wenzhuo" <wenzhuo.lu@intel.com> wrote: > > Hi Helin, > > > -----Original Message----- > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vlad Zolotarov > > Sent: Friday, August 14, 2015 1:38 PM > > To: Zhang, Helin; dev@dpdk.org > > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for > > all NICs but 82598 > > > > > > > > On 08/13/15 23:28, Zhang, Helin wrote: > > > Hi Vlad > > > > > > I don't think the changes are needed. It says in datasheet that the RS > > > bit should be set on the last descriptor of every packet, ONLY WHEN > > TXDCTL.WTHRESH equals to ZERO. > > > > Of course it's needed! See below. > > Exactly the same spec a few lines above the place u've just quoted states: > > > > "Software should not set the RS bit when TXDCTL.WTHRESH is greater than > > zero." > > > > And since all three (3) ixgbe xmit callbacks are utilizing RS bit notation ixgbe PMD > > is actually not supporting any value of WTHRESH different from zero. > I think Vlad is right. We need to fix this issue. Any suggestion? If not, I'd like to ack this patch. Pls., note that there is a v2 of this patch on the list. I forgot to patch ixgbevf_dev_info_get() in v1. > > > > > > > > > Regards, > > > Helin > > > > > >> -----Original Message----- > > >> From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com] > > >> Sent: Thursday, August 13, 2015 11:07 AM > > >> To: dev@dpdk.org > > >> Cc: Zhang, Helin; Ananyev, Konstantin; avi@cloudius-systems.com; Vlad > > >> Zolotarov > > >> Subject: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for > > all > > >> NICs but 82598 > > >> > > >> According to 82599 and x540 HW specifications RS bit *must* be set in the > > last > > >> descriptor of *every* packet. > > > There is a condition that if TXDCTL.WTHRESH equal to zero. > > > > Right and ixgbe PMD requires this condition to be fulfilled in order to > > function. See above. > > > > > > > >> This patch fixes the Tx hang we were constantly hitting with a seastar-based > > >> application on x540 NIC. > > > Could you help to share with us how to reproduce the tx hang issue, with using > > > typical DPDK examples? > > > > Sorry. I'm not very familiar with the typical DPDK examples to help u > > here. However this is quite irrelevant since without this this patch > > ixgbe PMD obviously abuses the HW spec as has been explained above. > > > > We saw the issue when u stressed the xmit path with a lot of highly > > fragmented TCP frames (packets with up to 33 fragments with non-headers > > fragments as small as 4 bytes) with all offload features enabled. > > > > Thanks, > > vlad > > > > > >> Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> > > >> --- > > >> drivers/net/ixgbe/ixgbe_ethdev.c | 9 +++++++++ > > >> drivers/net/ixgbe/ixgbe_rxtx.c | 23 ++++++++++++++++++++++- > > >> 2 files changed, 31 insertions(+), 1 deletion(-) > > >> > > >> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c > > >> b/drivers/net/ixgbe/ixgbe_ethdev.c > > >> index b8ee1e9..6714fd9 100644 > > >> --- a/drivers/net/ixgbe/ixgbe_ethdev.c > > >> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c > > >> @@ -2414,6 +2414,15 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, > > struct > > >> rte_eth_dev_info *dev_info) > > >> .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS | > > >> ETH_TXQ_FLAGS_NOOFFLOADS, > > >> }; > > >> + > > >> + /* > > >> + * According to 82599 and x540 specifications RS bit *must* be set on > > the > > >> + * last descriptor of *every* packet. Therefore we will not allow the > > >> + * tx_rs_thresh above 1 for all NICs newer than 82598. > > >> + */ > > >> + if (hw->mac.type > ixgbe_mac_82598EB) > > >> + dev_info->default_txconf.tx_rs_thresh = 1; > > >> + > > >> dev_info->hash_key_size = IXGBE_HKEY_MAX_INDEX * sizeof(uint32_t); > > >> dev_info->reta_size = ETH_RSS_RETA_SIZE_128; > > >> dev_info->flow_type_rss_offloads = IXGBE_RSS_OFFLOAD_ALL; diff -- > > git > > >> a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c index > > >> 91023b9..8dbdffc 100644 > > >> --- a/drivers/net/ixgbe/ixgbe_rxtx.c > > >> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c > > >> @@ -2085,11 +2085,19 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev > > >> *dev, > > >> struct ixgbe_tx_queue *txq; > > >> struct ixgbe_hw *hw; > > >> uint16_t tx_rs_thresh, tx_free_thresh; > > >> + bool rs_deferring_allowed; > > >> > > >> PMD_INIT_FUNC_TRACE(); > > >> hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private); > > >> > > >> /* > > >> + * According to 82599 and x540 specifications RS bit *must* be set on > > the > > >> + * last descriptor of *every* packet. Therefore we will not allow the > > >> + * tx_rs_thresh above 1 for all NICs newer than 82598. > > >> + */ > > >> + rs_deferring_allowed = (hw->mac.type <= ixgbe_mac_82598EB); > > >> + > > >> + /* > > >> * Validate number of transmit descriptors. > > >> * It must not exceed hardware maximum, and must be multiple > > >> * of IXGBE_ALIGN. > > >> @@ -2110,6 +2118,8 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev > > *dev, > > >> * to transmit a packet is greater than the number of free TX > > >> * descriptors. > > >> * The following constraints must be satisfied: > > >> + * tx_rs_thresh must be less than 2 for NICs for which RS deferring is > > >> + * forbidden (all but 82598). > > >> * tx_rs_thresh must be greater than 0. > > >> * tx_rs_thresh must be less than the size of the ring minus 2. > > >> * tx_rs_thresh must be less than or equal to tx_free_thresh. > > >> @@ -2121,9 +2131,20 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev > > *dev, > > >> * When set to zero use default values. > > >> */ > > >> tx_rs_thresh = (uint16_t)((tx_conf->tx_rs_thresh) ? > > >> - tx_conf->tx_rs_thresh : DEFAULT_TX_RS_THRESH); > > >> + tx_conf->tx_rs_thresh : > > >> + (rs_deferring_allowed ? DEFAULT_TX_RS_THRESH : 1)); > > >> tx_free_thresh = (uint16_t)((tx_conf->tx_free_thresh) ? > > >> tx_conf->tx_free_thresh : DEFAULT_TX_FREE_THRESH); > > >> + > > >> + if (!rs_deferring_allowed && tx_rs_thresh > 1) { > > >> + PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than 2 since RS > > " > > >> + "must be set for every packet for this HW. " > > >> + "(tx_rs_thresh=%u port=%d queue=%d)", > > >> + (unsigned int)tx_rs_thresh, > > >> + (int)dev->data->port_id, (int)queue_idx); > > >> + return -(EINVAL); > > >> + } > > >> + > > >> if (tx_rs_thresh >= (nb_desc - 2)) { > > >> PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than the > > number " > > >> "of TX descriptors minus 2. (tx_rs_thresh=%u " > > >> -- > > >> 2.1.0 > ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-08-19 4:55 ` Vladislav Zolotarov @ 2015-08-19 7:43 ` Ananyev, Konstantin 2015-08-19 10:02 ` Vlad Zolotarov 2015-08-19 17:29 ` Zhang, Helin 1 sibling, 1 reply; 48+ messages in thread From: Ananyev, Konstantin @ 2015-08-19 7:43 UTC (permalink / raw) To: Vladislav Zolotarov, Lu, Wenzhuo; +Cc: dev Hi Vlad, Sorry for delay with review, I am OOO till next week. Meanwhile, few questions/comments from me. > > > > > > > >> This patch fixes the Tx hang we were constantly hitting with a > seastar-based > > > >> application on x540 NIC. > > > > Could you help to share with us how to reproduce the tx hang issue, > with using > > > > typical DPDK examples? > > > > > > Sorry. I'm not very familiar with the typical DPDK examples to help u > > > here. However this is quite irrelevant since without this this patch > > > ixgbe PMD obviously abuses the HW spec as has been explained above. > > > > > > We saw the issue when u stressed the xmit path with a lot of highly > > > fragmented TCP frames (packets with up to 33 fragments with non-headers > > > fragments as small as 4 bytes) with all offload features enabled. Could you provide us with the pcap file to reproduce the issue? My concern with you approach is that it would affect TX performance. Right now, for simple TX PMD usually reads only (nb_tx_desc/tx_rs_thresh) TXDs, While with your patch (if I understand it correctly) it has to read all TXDs in the HW TX ring. Even if we really need to setup RS bit in each TXD (I still doubt we really do) - , I think inside PMD it still should be possible to check TX completion in chunks. Konstantin > > > > > > Thanks, > > > vlad > > > > > > > >> Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> > > > >> --- > > > >> drivers/net/ixgbe/ixgbe_ethdev.c | 9 +++++++++ > > > >> drivers/net/ixgbe/ixgbe_rxtx.c | 23 ++++++++++++++++++++++- > > > >> 2 files changed, 31 insertions(+), 1 deletion(-) > > > >> > > > >> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c > > > >> b/drivers/net/ixgbe/ixgbe_ethdev.c > > > >> index b8ee1e9..6714fd9 100644 > > > >> --- a/drivers/net/ixgbe/ixgbe_ethdev.c > > > >> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c > > > >> @@ -2414,6 +2414,15 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, > > > struct > > > >> rte_eth_dev_info *dev_info) > > > >> .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS | > > > >> ETH_TXQ_FLAGS_NOOFFLOADS, > > > >> }; > > > >> + > > > >> + /* > > > >> + * According to 82599 and x540 specifications RS bit *must* be > set on > > > the > > > >> + * last descriptor of *every* packet. Therefore we will not allow > the > > > >> + * tx_rs_thresh above 1 for all NICs newer than 82598. > > > >> + */ > > > >> + if (hw->mac.type > ixgbe_mac_82598EB) > > > >> + dev_info->default_txconf.tx_rs_thresh = 1; > > > >> + > > > >> dev_info->hash_key_size = IXGBE_HKEY_MAX_INDEX * sizeof(uint32_t); > > > >> dev_info->reta_size = ETH_RSS_RETA_SIZE_128; > > > >> dev_info->flow_type_rss_offloads = IXGBE_RSS_OFFLOAD_ALL; diff -- > > > git > > > >> a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c > index > > > >> 91023b9..8dbdffc 100644 > > > >> --- a/drivers/net/ixgbe/ixgbe_rxtx.c > > > >> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c > > > >> @@ -2085,11 +2085,19 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev > > > >> *dev, > > > >> struct ixgbe_tx_queue *txq; > > > >> struct ixgbe_hw *hw; > > > >> uint16_t tx_rs_thresh, tx_free_thresh; > > > >> + bool rs_deferring_allowed; > > > >> > > > >> PMD_INIT_FUNC_TRACE(); > > > >> hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private); > > > >> > > > >> /* > > > >> + * According to 82599 and x540 specifications RS bit *must* be > set on > > > the > > > >> + * last descriptor of *every* packet. Therefore we will not allow > the > > > >> + * tx_rs_thresh above 1 for all NICs newer than 82598. > > > >> + */ > > > >> + rs_deferring_allowed = (hw->mac.type <= ixgbe_mac_82598EB); > > > >> + > > > >> + /* > > > >> * Validate number of transmit descriptors. > > > >> * It must not exceed hardware maximum, and must be multiple > > > >> * of IXGBE_ALIGN. > > > >> @@ -2110,6 +2118,8 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev > > > *dev, > > > >> * to transmit a packet is greater than the number of free TX > > > >> * descriptors. > > > >> * The following constraints must be satisfied: > > > >> + * tx_rs_thresh must be less than 2 for NICs for which RS > deferring is > > > >> + * forbidden (all but 82598). > > > >> * tx_rs_thresh must be greater than 0. > > > >> * tx_rs_thresh must be less than the size of the ring minus 2. > > > >> * tx_rs_thresh must be less than or equal to tx_free_thresh. > > > >> @@ -2121,9 +2131,20 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev > > > *dev, > > > >> * When set to zero use default values. > > > >> */ > > > >> tx_rs_thresh = (uint16_t)((tx_conf->tx_rs_thresh) ? > > > >> - tx_conf->tx_rs_thresh : DEFAULT_TX_RS_THRESH); > > > >> + tx_conf->tx_rs_thresh : > > > >> + (rs_deferring_allowed ? DEFAULT_TX_RS_THRESH : > 1)); > > > >> tx_free_thresh = (uint16_t)((tx_conf->tx_free_thresh) ? > > > >> tx_conf->tx_free_thresh : DEFAULT_TX_FREE_THRESH); > > > >> + > > > >> + if (!rs_deferring_allowed && tx_rs_thresh > 1) { > > > >> + PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than 2 since > RS > > > " > > > >> + "must be set for every packet for this > HW. " > > > >> + "(tx_rs_thresh=%u port=%d queue=%d)", > > > >> + (unsigned int)tx_rs_thresh, > > > >> + (int)dev->data->port_id, (int)queue_idx); > > > >> + return -(EINVAL); > > > >> + } > > > >> + > > > >> if (tx_rs_thresh >= (nb_desc - 2)) { > > > >> PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than the > > > number " > > > >> "of TX descriptors minus 2. (tx_rs_thresh=%u > " > > > >> -- > > > >> 2.1.0 > > ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-08-19 7:43 ` Ananyev, Konstantin @ 2015-08-19 10:02 ` Vlad Zolotarov 2015-08-20 8:41 ` Ananyev, Konstantin 0 siblings, 1 reply; 48+ messages in thread From: Vlad Zolotarov @ 2015-08-19 10:02 UTC (permalink / raw) To: Ananyev, Konstantin, Lu, Wenzhuo; +Cc: dev On 08/19/15 10:43, Ananyev, Konstantin wrote: > Hi Vlad, > Sorry for delay with review, I am OOO till next week. > Meanwhile, few questions/comments from me. Hi, Konstantin, long time no see... ;) > >>>>>> This patch fixes the Tx hang we were constantly hitting with a >> seastar-based >>>>>> application on x540 NIC. >>>>> Could you help to share with us how to reproduce the tx hang issue, >> with using >>>>> typical DPDK examples? >>>> Sorry. I'm not very familiar with the typical DPDK examples to help u >>>> here. However this is quite irrelevant since without this this patch >>>> ixgbe PMD obviously abuses the HW spec as has been explained above. >>>> >>>> We saw the issue when u stressed the xmit path with a lot of highly >>>> fragmented TCP frames (packets with up to 33 fragments with non-headers >>>> fragments as small as 4 bytes) with all offload features enabled. > Could you provide us with the pcap file to reproduce the issue? Well, the thing is it takes some time to reproduce it (a few minutes of heavy load) therefore a pcap would be quite large. > My concern with you approach is that it would affect TX performance. It certainly will ;) But it seem inevitable. See below. > Right now, for simple TX PMD usually reads only (nb_tx_desc/tx_rs_thresh) TXDs, > While with your patch (if I understand it correctly) it has to read all TXDs in the HW TX ring. If by "simple" u refer an always single fragment per Tx packet - then u are absolutely correct. My initial patch was to only set RS on every EOP descriptor without changing the rs_thresh value and this patch worked. However HW spec doesn't ensure in a general case that packets are always handled/completion write-back completes in the same order the packets are placed on the ring (see "Tx arbitration schemes" chapter in 82599 spec for instance). Therefore AFAIU one should not assume that if packet[x+1] DD bit is set then packet[x] is completed too. That's why I changed the patch to be as u see it now. However if I miss something here and your HW people ensure the in-order completion this of course may be changed back. > Even if we really need to setup RS bit in each TXD (I still doubt we really do) - , Well, if u doubt u may ask the guys from the Intel networking division that wrote the 82599 and x540 HW specs where they clearly state that. ;) > I think inside PMD it still should be possible to check TX completion in chunks. > Konstantin > > >>>> Thanks, >>>> vlad >>>>>> Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> >>>>>> --- >>>>>> drivers/net/ixgbe/ixgbe_ethdev.c | 9 +++++++++ >>>>>> drivers/net/ixgbe/ixgbe_rxtx.c | 23 ++++++++++++++++++++++- >>>>>> 2 files changed, 31 insertions(+), 1 deletion(-) >>>>>> >>>>>> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c >>>>>> b/drivers/net/ixgbe/ixgbe_ethdev.c >>>>>> index b8ee1e9..6714fd9 100644 >>>>>> --- a/drivers/net/ixgbe/ixgbe_ethdev.c >>>>>> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c >>>>>> @@ -2414,6 +2414,15 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, >>>> struct >>>>>> rte_eth_dev_info *dev_info) >>>>>> .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS | >>>>>> ETH_TXQ_FLAGS_NOOFFLOADS, >>>>>> }; >>>>>> + >>>>>> + /* >>>>>> + * According to 82599 and x540 specifications RS bit *must* be >> set on >>>> the >>>>>> + * last descriptor of *every* packet. Therefore we will not allow >> the >>>>>> + * tx_rs_thresh above 1 for all NICs newer than 82598. >>>>>> + */ >>>>>> + if (hw->mac.type > ixgbe_mac_82598EB) >>>>>> + dev_info->default_txconf.tx_rs_thresh = 1; >>>>>> + >>>>>> dev_info->hash_key_size = IXGBE_HKEY_MAX_INDEX * sizeof(uint32_t); >>>>>> dev_info->reta_size = ETH_RSS_RETA_SIZE_128; >>>>>> dev_info->flow_type_rss_offloads = IXGBE_RSS_OFFLOAD_ALL; diff -- >>>> git >>>>>> a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c >> index >>>>>> 91023b9..8dbdffc 100644 >>>>>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c >>>>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c >>>>>> @@ -2085,11 +2085,19 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev >>>>>> *dev, >>>>>> struct ixgbe_tx_queue *txq; >>>>>> struct ixgbe_hw *hw; >>>>>> uint16_t tx_rs_thresh, tx_free_thresh; >>>>>> + bool rs_deferring_allowed; >>>>>> >>>>>> PMD_INIT_FUNC_TRACE(); >>>>>> hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private); >>>>>> >>>>>> /* >>>>>> + * According to 82599 and x540 specifications RS bit *must* be >> set on >>>> the >>>>>> + * last descriptor of *every* packet. Therefore we will not allow >> the >>>>>> + * tx_rs_thresh above 1 for all NICs newer than 82598. >>>>>> + */ >>>>>> + rs_deferring_allowed = (hw->mac.type <= ixgbe_mac_82598EB); >>>>>> + >>>>>> + /* >>>>>> * Validate number of transmit descriptors. >>>>>> * It must not exceed hardware maximum, and must be multiple >>>>>> * of IXGBE_ALIGN. >>>>>> @@ -2110,6 +2118,8 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev >>>> *dev, >>>>>> * to transmit a packet is greater than the number of free TX >>>>>> * descriptors. >>>>>> * The following constraints must be satisfied: >>>>>> + * tx_rs_thresh must be less than 2 for NICs for which RS >> deferring is >>>>>> + * forbidden (all but 82598). >>>>>> * tx_rs_thresh must be greater than 0. >>>>>> * tx_rs_thresh must be less than the size of the ring minus 2. >>>>>> * tx_rs_thresh must be less than or equal to tx_free_thresh. >>>>>> @@ -2121,9 +2131,20 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev >>>> *dev, >>>>>> * When set to zero use default values. >>>>>> */ >>>>>> tx_rs_thresh = (uint16_t)((tx_conf->tx_rs_thresh) ? >>>>>> - tx_conf->tx_rs_thresh : DEFAULT_TX_RS_THRESH); >>>>>> + tx_conf->tx_rs_thresh : >>>>>> + (rs_deferring_allowed ? DEFAULT_TX_RS_THRESH : >> 1)); >>>>>> tx_free_thresh = (uint16_t)((tx_conf->tx_free_thresh) ? >>>>>> tx_conf->tx_free_thresh : DEFAULT_TX_FREE_THRESH); >>>>>> + >>>>>> + if (!rs_deferring_allowed && tx_rs_thresh > 1) { >>>>>> + PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than 2 since >> RS >>>> " >>>>>> + "must be set for every packet for this >> HW. " >>>>>> + "(tx_rs_thresh=%u port=%d queue=%d)", >>>>>> + (unsigned int)tx_rs_thresh, >>>>>> + (int)dev->data->port_id, (int)queue_idx); >>>>>> + return -(EINVAL); >>>>>> + } >>>>>> + >>>>>> if (tx_rs_thresh >= (nb_desc - 2)) { >>>>>> PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than the >>>> number " >>>>>> "of TX descriptors minus 2. (tx_rs_thresh=%u >> " >>>>>> -- >>>>>> 2.1.0 ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-08-19 10:02 ` Vlad Zolotarov @ 2015-08-20 8:41 ` Ananyev, Konstantin 2015-08-20 8:56 ` Vlad Zolotarov 0 siblings, 1 reply; 48+ messages in thread From: Ananyev, Konstantin @ 2015-08-20 8:41 UTC (permalink / raw) To: Vlad Zolotarov, Lu, Wenzhuo; +Cc: dev Hi Vlad, > -----Original Message----- > From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com] > Sent: Wednesday, August 19, 2015 11:03 AM > To: Ananyev, Konstantin; Lu, Wenzhuo > Cc: dev@dpdk.org > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 > > > > On 08/19/15 10:43, Ananyev, Konstantin wrote: > > Hi Vlad, > > Sorry for delay with review, I am OOO till next week. > > Meanwhile, few questions/comments from me. > > Hi, Konstantin, long time no see... ;) > > > > >>>>>> This patch fixes the Tx hang we were constantly hitting with a > >> seastar-based > >>>>>> application on x540 NIC. > >>>>> Could you help to share with us how to reproduce the tx hang issue, > >> with using > >>>>> typical DPDK examples? > >>>> Sorry. I'm not very familiar with the typical DPDK examples to help u > >>>> here. However this is quite irrelevant since without this this patch > >>>> ixgbe PMD obviously abuses the HW spec as has been explained above. > >>>> > >>>> We saw the issue when u stressed the xmit path with a lot of highly > >>>> fragmented TCP frames (packets with up to 33 fragments with non-headers > >>>> fragments as small as 4 bytes) with all offload features enabled. > > Could you provide us with the pcap file to reproduce the issue? > > Well, the thing is it takes some time to reproduce it (a few minutes of > heavy load) therefore a pcap would be quite large. Probably you can upload it to some place, from which we will be able to download it? Or might be you have some sort of scapy script to generate it? I suppose we'll need something to reproduce the issue and verify the fix. > > > My concern with you approach is that it would affect TX performance. > > It certainly will ;) But it seem inevitable. See below. > > > Right now, for simple TX PMD usually reads only (nb_tx_desc/tx_rs_thresh) TXDs, > > While with your patch (if I understand it correctly) it has to read all TXDs in the HW TX ring. > > If by "simple" u refer an always single fragment per Tx packet - then u > are absolutely correct. > > My initial patch was to only set RS on every EOP descriptor without > changing the rs_thresh value and this patch worked. > However HW spec doesn't ensure in a general case that packets are always > handled/completion write-back completes in the same order the packets > are placed on the ring (see "Tx arbitration schemes" chapter in 82599 > spec for instance). Therefore AFAIU one should not assume that if > packet[x+1] DD bit is set then packet[x] is completed too. From my understanding, TX arbitration controls the order in which TXDs from different queues are fetched/processed. But descriptors from the same TX queue are processed in FIFO order. So, I think that - yes, if TXD[x+1] DD bit is set, then TXD[x] is completed too, and setting RS on every EOP TXD should be enough. > > That's why I changed the patch to be as u see it now. However if I miss > something here and your HW people ensure the in-order completion this of > course may be changed back. > > > Even if we really need to setup RS bit in each TXD (I still doubt we really do) - , > > Well, if u doubt u may ask the guys from the Intel networking division > that wrote the 82599 and x540 HW specs where they clearly state that. ;) Good point, we'll see what we can do here :) Konstantin > > > I think inside PMD it still should be possible to check TX completion in chunks. > > Konstantin > > > > > >>>> Thanks, > >>>> vlad > >>>>>> Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> > >>>>>> --- > >>>>>> drivers/net/ixgbe/ixgbe_ethdev.c | 9 +++++++++ > >>>>>> drivers/net/ixgbe/ixgbe_rxtx.c | 23 ++++++++++++++++++++++- > >>>>>> 2 files changed, 31 insertions(+), 1 deletion(-) > >>>>>> > >>>>>> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c > >>>>>> b/drivers/net/ixgbe/ixgbe_ethdev.c > >>>>>> index b8ee1e9..6714fd9 100644 > >>>>>> --- a/drivers/net/ixgbe/ixgbe_ethdev.c > >>>>>> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c > >>>>>> @@ -2414,6 +2414,15 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, > >>>> struct > >>>>>> rte_eth_dev_info *dev_info) > >>>>>> .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS | > >>>>>> ETH_TXQ_FLAGS_NOOFFLOADS, > >>>>>> }; > >>>>>> + > >>>>>> + /* > >>>>>> + * According to 82599 and x540 specifications RS bit *must* be > >> set on > >>>> the > >>>>>> + * last descriptor of *every* packet. Therefore we will not allow > >> the > >>>>>> + * tx_rs_thresh above 1 for all NICs newer than 82598. > >>>>>> + */ > >>>>>> + if (hw->mac.type > ixgbe_mac_82598EB) > >>>>>> + dev_info->default_txconf.tx_rs_thresh = 1; > >>>>>> + > >>>>>> dev_info->hash_key_size = IXGBE_HKEY_MAX_INDEX * sizeof(uint32_t); > >>>>>> dev_info->reta_size = ETH_RSS_RETA_SIZE_128; > >>>>>> dev_info->flow_type_rss_offloads = IXGBE_RSS_OFFLOAD_ALL; diff -- > >>>> git > >>>>>> a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c > >> index > >>>>>> 91023b9..8dbdffc 100644 > >>>>>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c > >>>>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c > >>>>>> @@ -2085,11 +2085,19 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev > >>>>>> *dev, > >>>>>> struct ixgbe_tx_queue *txq; > >>>>>> struct ixgbe_hw *hw; > >>>>>> uint16_t tx_rs_thresh, tx_free_thresh; > >>>>>> + bool rs_deferring_allowed; > >>>>>> > >>>>>> PMD_INIT_FUNC_TRACE(); > >>>>>> hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private); > >>>>>> > >>>>>> /* > >>>>>> + * According to 82599 and x540 specifications RS bit *must* be > >> set on > >>>> the > >>>>>> + * last descriptor of *every* packet. Therefore we will not allow > >> the > >>>>>> + * tx_rs_thresh above 1 for all NICs newer than 82598. > >>>>>> + */ > >>>>>> + rs_deferring_allowed = (hw->mac.type <= ixgbe_mac_82598EB); > >>>>>> + > >>>>>> + /* > >>>>>> * Validate number of transmit descriptors. > >>>>>> * It must not exceed hardware maximum, and must be multiple > >>>>>> * of IXGBE_ALIGN. > >>>>>> @@ -2110,6 +2118,8 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev > >>>> *dev, > >>>>>> * to transmit a packet is greater than the number of free TX > >>>>>> * descriptors. > >>>>>> * The following constraints must be satisfied: > >>>>>> + * tx_rs_thresh must be less than 2 for NICs for which RS > >> deferring is > >>>>>> + * forbidden (all but 82598). > >>>>>> * tx_rs_thresh must be greater than 0. > >>>>>> * tx_rs_thresh must be less than the size of the ring minus 2. > >>>>>> * tx_rs_thresh must be less than or equal to tx_free_thresh. > >>>>>> @@ -2121,9 +2131,20 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev > >>>> *dev, > >>>>>> * When set to zero use default values. > >>>>>> */ > >>>>>> tx_rs_thresh = (uint16_t)((tx_conf->tx_rs_thresh) ? > >>>>>> - tx_conf->tx_rs_thresh : DEFAULT_TX_RS_THRESH); > >>>>>> + tx_conf->tx_rs_thresh : > >>>>>> + (rs_deferring_allowed ? DEFAULT_TX_RS_THRESH : > >> 1)); > >>>>>> tx_free_thresh = (uint16_t)((tx_conf->tx_free_thresh) ? > >>>>>> tx_conf->tx_free_thresh : DEFAULT_TX_FREE_THRESH); > >>>>>> + > >>>>>> + if (!rs_deferring_allowed && tx_rs_thresh > 1) { > >>>>>> + PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than 2 since > >> RS > >>>> " > >>>>>> + "must be set for every packet for this > >> HW. " > >>>>>> + "(tx_rs_thresh=%u port=%d queue=%d)", > >>>>>> + (unsigned int)tx_rs_thresh, > >>>>>> + (int)dev->data->port_id, (int)queue_idx); > >>>>>> + return -(EINVAL); > >>>>>> + } > >>>>>> + > >>>>>> if (tx_rs_thresh >= (nb_desc - 2)) { > >>>>>> PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than the > >>>> number " > >>>>>> "of TX descriptors minus 2. (tx_rs_thresh=%u > >> " > >>>>>> -- > >>>>>> 2.1.0 ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-08-20 8:41 ` Ananyev, Konstantin @ 2015-08-20 8:56 ` Vlad Zolotarov 2015-08-20 9:05 ` Vlad Zolotarov 0 siblings, 1 reply; 48+ messages in thread From: Vlad Zolotarov @ 2015-08-20 8:56 UTC (permalink / raw) To: Ananyev, Konstantin, Lu, Wenzhuo; +Cc: dev On 08/20/15 11:41, Ananyev, Konstantin wrote: > Hi Vlad, > >> -----Original Message----- >> From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com] >> Sent: Wednesday, August 19, 2015 11:03 AM >> To: Ananyev, Konstantin; Lu, Wenzhuo >> Cc: dev@dpdk.org >> Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 >> >> >> >> On 08/19/15 10:43, Ananyev, Konstantin wrote: >>> Hi Vlad, >>> Sorry for delay with review, I am OOO till next week. >>> Meanwhile, few questions/comments from me. >> Hi, Konstantin, long time no see... ;) >> >>>>>>>> This patch fixes the Tx hang we were constantly hitting with a >>>> seastar-based >>>>>>>> application on x540 NIC. >>>>>>> Could you help to share with us how to reproduce the tx hang issue, >>>> with using >>>>>>> typical DPDK examples? >>>>>> Sorry. I'm not very familiar with the typical DPDK examples to help u >>>>>> here. However this is quite irrelevant since without this this patch >>>>>> ixgbe PMD obviously abuses the HW spec as has been explained above. >>>>>> >>>>>> We saw the issue when u stressed the xmit path with a lot of highly >>>>>> fragmented TCP frames (packets with up to 33 fragments with non-headers >>>>>> fragments as small as 4 bytes) with all offload features enabled. >>> Could you provide us with the pcap file to reproduce the issue? >> Well, the thing is it takes some time to reproduce it (a few minutes of >> heavy load) therefore a pcap would be quite large. > Probably you can upload it to some place, from which we will be able to download it? I'll see what I can do but no promises... > Or might be you have some sort of scapy script to generate it? > I suppose we'll need something to reproduce the issue and verify the fix. Since the original code abuses the HW spec u don't have to... ;) > >>> My concern with you approach is that it would affect TX performance. >> It certainly will ;) But it seem inevitable. See below. >> >>> Right now, for simple TX PMD usually reads only (nb_tx_desc/tx_rs_thresh) TXDs, >>> While with your patch (if I understand it correctly) it has to read all TXDs in the HW TX ring. >> If by "simple" u refer an always single fragment per Tx packet - then u >> are absolutely correct. >> >> My initial patch was to only set RS on every EOP descriptor without >> changing the rs_thresh value and this patch worked. >> However HW spec doesn't ensure in a general case that packets are always >> handled/completion write-back completes in the same order the packets >> are placed on the ring (see "Tx arbitration schemes" chapter in 82599 >> spec for instance). Therefore AFAIU one should not assume that if >> packet[x+1] DD bit is set then packet[x] is completed too. > From my understanding, TX arbitration controls the order in which TXDs from > different queues are fetched/processed. > But descriptors from the same TX queue are processed in FIFO order. > So, I think that - yes, if TXD[x+1] DD bit is set, then TXD[x] is completed too, > and setting RS on every EOP TXD should be enough. Ok. I'll rework the patch under this assumption then. > >> That's why I changed the patch to be as u see it now. However if I miss >> something here and your HW people ensure the in-order completion this of >> course may be changed back. >> >>> Even if we really need to setup RS bit in each TXD (I still doubt we really do) - , >> Well, if u doubt u may ask the guys from the Intel networking division >> that wrote the 82599 and x540 HW specs where they clearly state that. ;) > Good point, we'll see what we can do here :) > Konstantin > >>> I think inside PMD it still should be possible to check TX completion in chunks. >>> Konstantin >>> >>> >>>>>> Thanks, >>>>>> vlad >>>>>>>> Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> >>>>>>>> --- >>>>>>>> drivers/net/ixgbe/ixgbe_ethdev.c | 9 +++++++++ >>>>>>>> drivers/net/ixgbe/ixgbe_rxtx.c | 23 ++++++++++++++++++++++- >>>>>>>> 2 files changed, 31 insertions(+), 1 deletion(-) >>>>>>>> >>>>>>>> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c >>>>>>>> b/drivers/net/ixgbe/ixgbe_ethdev.c >>>>>>>> index b8ee1e9..6714fd9 100644 >>>>>>>> --- a/drivers/net/ixgbe/ixgbe_ethdev.c >>>>>>>> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c >>>>>>>> @@ -2414,6 +2414,15 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, >>>>>> struct >>>>>>>> rte_eth_dev_info *dev_info) >>>>>>>> .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS | >>>>>>>> ETH_TXQ_FLAGS_NOOFFLOADS, >>>>>>>> }; >>>>>>>> + >>>>>>>> + /* >>>>>>>> + * According to 82599 and x540 specifications RS bit *must* be >>>> set on >>>>>> the >>>>>>>> + * last descriptor of *every* packet. Therefore we will not allow >>>> the >>>>>>>> + * tx_rs_thresh above 1 for all NICs newer than 82598. >>>>>>>> + */ >>>>>>>> + if (hw->mac.type > ixgbe_mac_82598EB) >>>>>>>> + dev_info->default_txconf.tx_rs_thresh = 1; >>>>>>>> + >>>>>>>> dev_info->hash_key_size = IXGBE_HKEY_MAX_INDEX * sizeof(uint32_t); >>>>>>>> dev_info->reta_size = ETH_RSS_RETA_SIZE_128; >>>>>>>> dev_info->flow_type_rss_offloads = IXGBE_RSS_OFFLOAD_ALL; diff -- >>>>>> git >>>>>>>> a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c >>>> index >>>>>>>> 91023b9..8dbdffc 100644 >>>>>>>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c >>>>>>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c >>>>>>>> @@ -2085,11 +2085,19 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev >>>>>>>> *dev, >>>>>>>> struct ixgbe_tx_queue *txq; >>>>>>>> struct ixgbe_hw *hw; >>>>>>>> uint16_t tx_rs_thresh, tx_free_thresh; >>>>>>>> + bool rs_deferring_allowed; >>>>>>>> >>>>>>>> PMD_INIT_FUNC_TRACE(); >>>>>>>> hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private); >>>>>>>> >>>>>>>> /* >>>>>>>> + * According to 82599 and x540 specifications RS bit *must* be >>>> set on >>>>>> the >>>>>>>> + * last descriptor of *every* packet. Therefore we will not allow >>>> the >>>>>>>> + * tx_rs_thresh above 1 for all NICs newer than 82598. >>>>>>>> + */ >>>>>>>> + rs_deferring_allowed = (hw->mac.type <= ixgbe_mac_82598EB); >>>>>>>> + >>>>>>>> + /* >>>>>>>> * Validate number of transmit descriptors. >>>>>>>> * It must not exceed hardware maximum, and must be multiple >>>>>>>> * of IXGBE_ALIGN. >>>>>>>> @@ -2110,6 +2118,8 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev >>>>>> *dev, >>>>>>>> * to transmit a packet is greater than the number of free TX >>>>>>>> * descriptors. >>>>>>>> * The following constraints must be satisfied: >>>>>>>> + * tx_rs_thresh must be less than 2 for NICs for which RS >>>> deferring is >>>>>>>> + * forbidden (all but 82598). >>>>>>>> * tx_rs_thresh must be greater than 0. >>>>>>>> * tx_rs_thresh must be less than the size of the ring minus 2. >>>>>>>> * tx_rs_thresh must be less than or equal to tx_free_thresh. >>>>>>>> @@ -2121,9 +2131,20 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev >>>>>> *dev, >>>>>>>> * When set to zero use default values. >>>>>>>> */ >>>>>>>> tx_rs_thresh = (uint16_t)((tx_conf->tx_rs_thresh) ? >>>>>>>> - tx_conf->tx_rs_thresh : DEFAULT_TX_RS_THRESH); >>>>>>>> + tx_conf->tx_rs_thresh : >>>>>>>> + (rs_deferring_allowed ? DEFAULT_TX_RS_THRESH : >>>> 1)); >>>>>>>> tx_free_thresh = (uint16_t)((tx_conf->tx_free_thresh) ? >>>>>>>> tx_conf->tx_free_thresh : DEFAULT_TX_FREE_THRESH); >>>>>>>> + >>>>>>>> + if (!rs_deferring_allowed && tx_rs_thresh > 1) { >>>>>>>> + PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than 2 since >>>> RS >>>>>> " >>>>>>>> + "must be set for every packet for this >>>> HW. " >>>>>>>> + "(tx_rs_thresh=%u port=%d queue=%d)", >>>>>>>> + (unsigned int)tx_rs_thresh, >>>>>>>> + (int)dev->data->port_id, (int)queue_idx); >>>>>>>> + return -(EINVAL); >>>>>>>> + } >>>>>>>> + >>>>>>>> if (tx_rs_thresh >= (nb_desc - 2)) { >>>>>>>> PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than the >>>>>> number " >>>>>>>> "of TX descriptors minus 2. (tx_rs_thresh=%u >>>> " >>>>>>>> -- >>>>>>>> 2.1.0 ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-08-20 8:56 ` Vlad Zolotarov @ 2015-08-20 9:05 ` Vlad Zolotarov 2015-08-20 9:06 ` Vlad Zolotarov 0 siblings, 1 reply; 48+ messages in thread From: Vlad Zolotarov @ 2015-08-20 9:05 UTC (permalink / raw) To: Ananyev, Konstantin, Lu, Wenzhuo; +Cc: dev On 08/20/15 11:56, Vlad Zolotarov wrote: > > > On 08/20/15 11:41, Ananyev, Konstantin wrote: >> Hi Vlad, >> >>> -----Original Message----- >>> From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com] >>> Sent: Wednesday, August 19, 2015 11:03 AM >>> To: Ananyev, Konstantin; Lu, Wenzhuo >>> Cc: dev@dpdk.org >>> Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh >>> above 1 for all NICs but 82598 >>> >>> >>> >>> On 08/19/15 10:43, Ananyev, Konstantin wrote: >>>> Hi Vlad, >>>> Sorry for delay with review, I am OOO till next week. >>>> Meanwhile, few questions/comments from me. >>> Hi, Konstantin, long time no see... ;) >>> >>>>>>>>> This patch fixes the Tx hang we were constantly hitting with a >>>>> seastar-based >>>>>>>>> application on x540 NIC. >>>>>>>> Could you help to share with us how to reproduce the tx hang >>>>>>>> issue, >>>>> with using >>>>>>>> typical DPDK examples? >>>>>>> Sorry. I'm not very familiar with the typical DPDK examples to >>>>>>> help u >>>>>>> here. However this is quite irrelevant since without this this >>>>>>> patch >>>>>>> ixgbe PMD obviously abuses the HW spec as has been explained above. >>>>>>> >>>>>>> We saw the issue when u stressed the xmit path with a lot of highly >>>>>>> fragmented TCP frames (packets with up to 33 fragments with >>>>>>> non-headers >>>>>>> fragments as small as 4 bytes) with all offload features enabled. >>>> Could you provide us with the pcap file to reproduce the issue? >>> Well, the thing is it takes some time to reproduce it (a few minutes of >>> heavy load) therefore a pcap would be quite large. >> Probably you can upload it to some place, from which we will be able >> to download it? > > I'll see what I can do but no promises... On a second thought pcap file won't help u much since in order to reproduce the issue u have to reproduce exactly the same structure of clusters i give to HW and it's not what u see on wire in a TSO case. > >> Or might be you have some sort of scapy script to generate it? >> I suppose we'll need something to reproduce the issue and verify the >> fix. > > Since the original code abuses the HW spec u don't have to... ;) > >> >>>> My concern with you approach is that it would affect TX performance. >>> It certainly will ;) But it seem inevitable. See below. >>> >>>> Right now, for simple TX PMD usually reads only >>>> (nb_tx_desc/tx_rs_thresh) TXDs, >>>> While with your patch (if I understand it correctly) it has to read >>>> all TXDs in the HW TX ring. >>> If by "simple" u refer an always single fragment per Tx packet - then u >>> are absolutely correct. >>> >>> My initial patch was to only set RS on every EOP descriptor without >>> changing the rs_thresh value and this patch worked. >>> However HW spec doesn't ensure in a general case that packets are >>> always >>> handled/completion write-back completes in the same order the packets >>> are placed on the ring (see "Tx arbitration schemes" chapter in 82599 >>> spec for instance). Therefore AFAIU one should not assume that if >>> packet[x+1] DD bit is set then packet[x] is completed too. >> From my understanding, TX arbitration controls the order in which >> TXDs from >> different queues are fetched/processed. >> But descriptors from the same TX queue are processed in FIFO order. >> So, I think that - yes, if TXD[x+1] DD bit is set, then TXD[x] is >> completed too, >> and setting RS on every EOP TXD should be enough. > > Ok. I'll rework the patch under this assumption then. > >> >>> That's why I changed the patch to be as u see it now. However if I miss >>> something here and your HW people ensure the in-order completion >>> this of >>> course may be changed back. >>> >>>> Even if we really need to setup RS bit in each TXD (I still doubt >>>> we really do) - , >>> Well, if u doubt u may ask the guys from the Intel networking division >>> that wrote the 82599 and x540 HW specs where they clearly state >>> that. ;) >> Good point, we'll see what we can do here :) >> Konstantin >> >>>> I think inside PMD it still should be possible to check TX >>>> completion in chunks. >>>> Konstantin >>>> >>>> >>>>>>> Thanks, >>>>>>> vlad >>>>>>>>> Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> >>>>>>>>> --- >>>>>>>>> drivers/net/ixgbe/ixgbe_ethdev.c | 9 +++++++++ >>>>>>>>> drivers/net/ixgbe/ixgbe_rxtx.c | 23 ++++++++++++++++++++++- >>>>>>>>> 2 files changed, 31 insertions(+), 1 deletion(-) >>>>>>>>> >>>>>>>>> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c >>>>>>>>> b/drivers/net/ixgbe/ixgbe_ethdev.c >>>>>>>>> index b8ee1e9..6714fd9 100644 >>>>>>>>> --- a/drivers/net/ixgbe/ixgbe_ethdev.c >>>>>>>>> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c >>>>>>>>> @@ -2414,6 +2414,15 @@ ixgbe_dev_info_get(struct rte_eth_dev >>>>>>>>> *dev, >>>>>>> struct >>>>>>>>> rte_eth_dev_info *dev_info) >>>>>>>>> .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS | >>>>>>>>> ETH_TXQ_FLAGS_NOOFFLOADS, >>>>>>>>> }; >>>>>>>>> + >>>>>>>>> + /* >>>>>>>>> + * According to 82599 and x540 specifications RS bit *must* be >>>>> set on >>>>>>> the >>>>>>>>> + * last descriptor of *every* packet. Therefore we will not >>>>>>>>> allow >>>>> the >>>>>>>>> + * tx_rs_thresh above 1 for all NICs newer than 82598. >>>>>>>>> + */ >>>>>>>>> + if (hw->mac.type > ixgbe_mac_82598EB) >>>>>>>>> + dev_info->default_txconf.tx_rs_thresh = 1; >>>>>>>>> + >>>>>>>>> dev_info->hash_key_size = IXGBE_HKEY_MAX_INDEX * >>>>>>>>> sizeof(uint32_t); >>>>>>>>> dev_info->reta_size = ETH_RSS_RETA_SIZE_128; >>>>>>>>> dev_info->flow_type_rss_offloads = IXGBE_RSS_OFFLOAD_ALL; >>>>>>>>> diff -- >>>>>>> git >>>>>>>>> a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c >>>>> index >>>>>>>>> 91023b9..8dbdffc 100644 >>>>>>>>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c >>>>>>>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c >>>>>>>>> @@ -2085,11 +2085,19 @@ ixgbe_dev_tx_queue_setup(struct >>>>>>>>> rte_eth_dev >>>>>>>>> *dev, >>>>>>>>> struct ixgbe_tx_queue *txq; >>>>>>>>> struct ixgbe_hw *hw; >>>>>>>>> uint16_t tx_rs_thresh, tx_free_thresh; >>>>>>>>> + bool rs_deferring_allowed; >>>>>>>>> >>>>>>>>> PMD_INIT_FUNC_TRACE(); >>>>>>>>> hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private); >>>>>>>>> >>>>>>>>> /* >>>>>>>>> + * According to 82599 and x540 specifications RS bit *must* be >>>>> set on >>>>>>> the >>>>>>>>> + * last descriptor of *every* packet. Therefore we will not >>>>>>>>> allow >>>>> the >>>>>>>>> + * tx_rs_thresh above 1 for all NICs newer than 82598. >>>>>>>>> + */ >>>>>>>>> + rs_deferring_allowed = (hw->mac.type <= ixgbe_mac_82598EB); >>>>>>>>> + >>>>>>>>> + /* >>>>>>>>> * Validate number of transmit descriptors. >>>>>>>>> * It must not exceed hardware maximum, and must be multiple >>>>>>>>> * of IXGBE_ALIGN. >>>>>>>>> @@ -2110,6 +2118,8 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev >>>>>>> *dev, >>>>>>>>> * to transmit a packet is greater than the number of >>>>>>>>> free TX >>>>>>>>> * descriptors. >>>>>>>>> * The following constraints must be satisfied: >>>>>>>>> + * tx_rs_thresh must be less than 2 for NICs for which RS >>>>> deferring is >>>>>>>>> + * forbidden (all but 82598). >>>>>>>>> * tx_rs_thresh must be greater than 0. >>>>>>>>> * tx_rs_thresh must be less than the size of the ring >>>>>>>>> minus 2. >>>>>>>>> * tx_rs_thresh must be less than or equal to >>>>>>>>> tx_free_thresh. >>>>>>>>> @@ -2121,9 +2131,20 @@ ixgbe_dev_tx_queue_setup(struct >>>>>>>>> rte_eth_dev >>>>>>> *dev, >>>>>>>>> * When set to zero use default values. >>>>>>>>> */ >>>>>>>>> tx_rs_thresh = (uint16_t)((tx_conf->tx_rs_thresh) ? >>>>>>>>> - tx_conf->tx_rs_thresh : DEFAULT_TX_RS_THRESH); >>>>>>>>> + tx_conf->tx_rs_thresh : >>>>>>>>> + (rs_deferring_allowed ? DEFAULT_TX_RS_THRESH : >>>>> 1)); >>>>>>>>> tx_free_thresh = (uint16_t)((tx_conf->tx_free_thresh) ? >>>>>>>>> tx_conf->tx_free_thresh : >>>>>>>>> DEFAULT_TX_FREE_THRESH); >>>>>>>>> + >>>>>>>>> + if (!rs_deferring_allowed && tx_rs_thresh > 1) { >>>>>>>>> + PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than 2 >>>>>>>>> since >>>>> RS >>>>>>> " >>>>>>>>> + "must be set for every packet for this >>>>> HW. " >>>>>>>>> + "(tx_rs_thresh=%u port=%d queue=%d)", >>>>>>>>> + (unsigned int)tx_rs_thresh, >>>>>>>>> + (int)dev->data->port_id, (int)queue_idx); >>>>>>>>> + return -(EINVAL); >>>>>>>>> + } >>>>>>>>> + >>>>>>>>> if (tx_rs_thresh >= (nb_desc - 2)) { >>>>>>>>> PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than >>>>>>>>> the >>>>>>> number " >>>>>>>>> "of TX descriptors minus 2. (tx_rs_thresh=%u >>>>> " >>>>>>>>> -- >>>>>>>>> 2.1.0 > ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-08-20 9:05 ` Vlad Zolotarov @ 2015-08-20 9:06 ` Vlad Zolotarov 2015-08-25 17:33 ` Ananyev, Konstantin 0 siblings, 1 reply; 48+ messages in thread From: Vlad Zolotarov @ 2015-08-20 9:06 UTC (permalink / raw) To: Ananyev, Konstantin, Lu, Wenzhuo; +Cc: dev On 08/20/15 12:05, Vlad Zolotarov wrote: > > > On 08/20/15 11:56, Vlad Zolotarov wrote: >> >> >> On 08/20/15 11:41, Ananyev, Konstantin wrote: >>> Hi Vlad, >>> >>>> -----Original Message----- >>>> From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com] >>>> Sent: Wednesday, August 19, 2015 11:03 AM >>>> To: Ananyev, Konstantin; Lu, Wenzhuo >>>> Cc: dev@dpdk.org >>>> Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh >>>> above 1 for all NICs but 82598 >>>> >>>> >>>> >>>> On 08/19/15 10:43, Ananyev, Konstantin wrote: >>>>> Hi Vlad, >>>>> Sorry for delay with review, I am OOO till next week. >>>>> Meanwhile, few questions/comments from me. >>>> Hi, Konstantin, long time no see... ;) >>>> >>>>>>>>>> This patch fixes the Tx hang we were constantly hitting with a >>>>>> seastar-based >>>>>>>>>> application on x540 NIC. >>>>>>>>> Could you help to share with us how to reproduce the tx hang >>>>>>>>> issue, >>>>>> with using >>>>>>>>> typical DPDK examples? >>>>>>>> Sorry. I'm not very familiar with the typical DPDK examples to >>>>>>>> help u >>>>>>>> here. However this is quite irrelevant since without this this >>>>>>>> patch >>>>>>>> ixgbe PMD obviously abuses the HW spec as has been explained >>>>>>>> above. >>>>>>>> >>>>>>>> We saw the issue when u stressed the xmit path with a lot of >>>>>>>> highly >>>>>>>> fragmented TCP frames (packets with up to 33 fragments with >>>>>>>> non-headers >>>>>>>> fragments as small as 4 bytes) with all offload features enabled. >>>>> Could you provide us with the pcap file to reproduce the issue? >>>> Well, the thing is it takes some time to reproduce it (a few >>>> minutes of >>>> heavy load) therefore a pcap would be quite large. >>> Probably you can upload it to some place, from which we will be able >>> to download it? >> >> I'll see what I can do but no promises... > > On a second thought pcap file won't help u much since in order to > reproduce the issue u have to reproduce exactly the same structure of > clusters i give to HW and it's not what u see on wire in a TSO case. And not only in a TSO case... ;) > >> >>> Or might be you have some sort of scapy script to generate it? >>> I suppose we'll need something to reproduce the issue and verify the >>> fix. >> >> Since the original code abuses the HW spec u don't have to... ;) >> >>> >>>>> My concern with you approach is that it would affect TX performance. >>>> It certainly will ;) But it seem inevitable. See below. >>>> >>>>> Right now, for simple TX PMD usually reads only >>>>> (nb_tx_desc/tx_rs_thresh) TXDs, >>>>> While with your patch (if I understand it correctly) it has to >>>>> read all TXDs in the HW TX ring. >>>> If by "simple" u refer an always single fragment per Tx packet - >>>> then u >>>> are absolutely correct. >>>> >>>> My initial patch was to only set RS on every EOP descriptor without >>>> changing the rs_thresh value and this patch worked. >>>> However HW spec doesn't ensure in a general case that packets are >>>> always >>>> handled/completion write-back completes in the same order the packets >>>> are placed on the ring (see "Tx arbitration schemes" chapter in 82599 >>>> spec for instance). Therefore AFAIU one should not assume that if >>>> packet[x+1] DD bit is set then packet[x] is completed too. >>> From my understanding, TX arbitration controls the order in which >>> TXDs from >>> different queues are fetched/processed. >>> But descriptors from the same TX queue are processed in FIFO order. >>> So, I think that - yes, if TXD[x+1] DD bit is set, then TXD[x] is >>> completed too, >>> and setting RS on every EOP TXD should be enough. >> >> Ok. I'll rework the patch under this assumption then. >> >>> >>>> That's why I changed the patch to be as u see it now. However if I >>>> miss >>>> something here and your HW people ensure the in-order completion >>>> this of >>>> course may be changed back. >>>> >>>>> Even if we really need to setup RS bit in each TXD (I still doubt >>>>> we really do) - , >>>> Well, if u doubt u may ask the guys from the Intel networking division >>>> that wrote the 82599 and x540 HW specs where they clearly state >>>> that. ;) >>> Good point, we'll see what we can do here :) >>> Konstantin >>> >>>>> I think inside PMD it still should be possible to check TX >>>>> completion in chunks. >>>>> Konstantin >>>>> >>>>> >>>>>>>> Thanks, >>>>>>>> vlad >>>>>>>>>> Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> >>>>>>>>>> --- >>>>>>>>>> drivers/net/ixgbe/ixgbe_ethdev.c | 9 +++++++++ >>>>>>>>>> drivers/net/ixgbe/ixgbe_rxtx.c | 23 >>>>>>>>>> ++++++++++++++++++++++- >>>>>>>>>> 2 files changed, 31 insertions(+), 1 deletion(-) >>>>>>>>>> >>>>>>>>>> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c >>>>>>>>>> b/drivers/net/ixgbe/ixgbe_ethdev.c >>>>>>>>>> index b8ee1e9..6714fd9 100644 >>>>>>>>>> --- a/drivers/net/ixgbe/ixgbe_ethdev.c >>>>>>>>>> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c >>>>>>>>>> @@ -2414,6 +2414,15 @@ ixgbe_dev_info_get(struct rte_eth_dev >>>>>>>>>> *dev, >>>>>>>> struct >>>>>>>>>> rte_eth_dev_info *dev_info) >>>>>>>>>> .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS | >>>>>>>>>> ETH_TXQ_FLAGS_NOOFFLOADS, >>>>>>>>>> }; >>>>>>>>>> + >>>>>>>>>> + /* >>>>>>>>>> + * According to 82599 and x540 specifications RS bit >>>>>>>>>> *must* be >>>>>> set on >>>>>>>> the >>>>>>>>>> + * last descriptor of *every* packet. Therefore we will >>>>>>>>>> not allow >>>>>> the >>>>>>>>>> + * tx_rs_thresh above 1 for all NICs newer than 82598. >>>>>>>>>> + */ >>>>>>>>>> + if (hw->mac.type > ixgbe_mac_82598EB) >>>>>>>>>> + dev_info->default_txconf.tx_rs_thresh = 1; >>>>>>>>>> + >>>>>>>>>> dev_info->hash_key_size = IXGBE_HKEY_MAX_INDEX * >>>>>>>>>> sizeof(uint32_t); >>>>>>>>>> dev_info->reta_size = ETH_RSS_RETA_SIZE_128; >>>>>>>>>> dev_info->flow_type_rss_offloads = >>>>>>>>>> IXGBE_RSS_OFFLOAD_ALL; diff -- >>>>>>>> git >>>>>>>>>> a/drivers/net/ixgbe/ixgbe_rxtx.c >>>>>>>>>> b/drivers/net/ixgbe/ixgbe_rxtx.c >>>>>> index >>>>>>>>>> 91023b9..8dbdffc 100644 >>>>>>>>>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c >>>>>>>>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c >>>>>>>>>> @@ -2085,11 +2085,19 @@ ixgbe_dev_tx_queue_setup(struct >>>>>>>>>> rte_eth_dev >>>>>>>>>> *dev, >>>>>>>>>> struct ixgbe_tx_queue *txq; >>>>>>>>>> struct ixgbe_hw *hw; >>>>>>>>>> uint16_t tx_rs_thresh, tx_free_thresh; >>>>>>>>>> + bool rs_deferring_allowed; >>>>>>>>>> >>>>>>>>>> PMD_INIT_FUNC_TRACE(); >>>>>>>>>> hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private); >>>>>>>>>> >>>>>>>>>> /* >>>>>>>>>> + * According to 82599 and x540 specifications RS bit >>>>>>>>>> *must* be >>>>>> set on >>>>>>>> the >>>>>>>>>> + * last descriptor of *every* packet. Therefore we will >>>>>>>>>> not allow >>>>>> the >>>>>>>>>> + * tx_rs_thresh above 1 for all NICs newer than 82598. >>>>>>>>>> + */ >>>>>>>>>> + rs_deferring_allowed = (hw->mac.type <= ixgbe_mac_82598EB); >>>>>>>>>> + >>>>>>>>>> + /* >>>>>>>>>> * Validate number of transmit descriptors. >>>>>>>>>> * It must not exceed hardware maximum, and must be >>>>>>>>>> multiple >>>>>>>>>> * of IXGBE_ALIGN. >>>>>>>>>> @@ -2110,6 +2118,8 @@ ixgbe_dev_tx_queue_setup(struct >>>>>>>>>> rte_eth_dev >>>>>>>> *dev, >>>>>>>>>> * to transmit a packet is greater than the number of >>>>>>>>>> free TX >>>>>>>>>> * descriptors. >>>>>>>>>> * The following constraints must be satisfied: >>>>>>>>>> + * tx_rs_thresh must be less than 2 for NICs for which RS >>>>>> deferring is >>>>>>>>>> + * forbidden (all but 82598). >>>>>>>>>> * tx_rs_thresh must be greater than 0. >>>>>>>>>> * tx_rs_thresh must be less than the size of the ring >>>>>>>>>> minus 2. >>>>>>>>>> * tx_rs_thresh must be less than or equal to >>>>>>>>>> tx_free_thresh. >>>>>>>>>> @@ -2121,9 +2131,20 @@ ixgbe_dev_tx_queue_setup(struct >>>>>>>>>> rte_eth_dev >>>>>>>> *dev, >>>>>>>>>> * When set to zero use default values. >>>>>>>>>> */ >>>>>>>>>> tx_rs_thresh = (uint16_t)((tx_conf->tx_rs_thresh) ? >>>>>>>>>> - tx_conf->tx_rs_thresh : >>>>>>>>>> DEFAULT_TX_RS_THRESH); >>>>>>>>>> + tx_conf->tx_rs_thresh : >>>>>>>>>> + (rs_deferring_allowed ? >>>>>>>>>> DEFAULT_TX_RS_THRESH : >>>>>> 1)); >>>>>>>>>> tx_free_thresh = (uint16_t)((tx_conf->tx_free_thresh) ? >>>>>>>>>> tx_conf->tx_free_thresh : >>>>>>>>>> DEFAULT_TX_FREE_THRESH); >>>>>>>>>> + >>>>>>>>>> + if (!rs_deferring_allowed && tx_rs_thresh > 1) { >>>>>>>>>> + PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than >>>>>>>>>> 2 since >>>>>> RS >>>>>>>> " >>>>>>>>>> + "must be set for every packet for this >>>>>> HW. " >>>>>>>>>> + "(tx_rs_thresh=%u port=%d queue=%d)", >>>>>>>>>> + (unsigned int)tx_rs_thresh, >>>>>>>>>> + (int)dev->data->port_id, (int)queue_idx); >>>>>>>>>> + return -(EINVAL); >>>>>>>>>> + } >>>>>>>>>> + >>>>>>>>>> if (tx_rs_thresh >= (nb_desc - 2)) { >>>>>>>>>> PMD_INIT_LOG(ERR, "tx_rs_thresh must be less >>>>>>>>>> than the >>>>>>>> number " >>>>>>>>>> "of TX descriptors minus 2. (tx_rs_thresh=%u >>>>>> " >>>>>>>>>> -- >>>>>>>>>> 2.1.0 >> > ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-08-20 9:06 ` Vlad Zolotarov @ 2015-08-25 17:33 ` Ananyev, Konstantin 2015-08-25 17:39 ` Avi Kivity 0 siblings, 1 reply; 48+ messages in thread From: Ananyev, Konstantin @ 2015-08-25 17:33 UTC (permalink / raw) To: Vlad Zolotarov, Lu, Wenzhuo; +Cc: dev Hi Vlad, > -----Original Message----- > From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com] > Sent: Thursday, August 20, 2015 10:07 AM > To: Ananyev, Konstantin; Lu, Wenzhuo > Cc: dev@dpdk.org > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 > > > > On 08/20/15 12:05, Vlad Zolotarov wrote: > > > > > > On 08/20/15 11:56, Vlad Zolotarov wrote: > >> > >> > >> On 08/20/15 11:41, Ananyev, Konstantin wrote: > >>> Hi Vlad, > >>> > >>>> -----Original Message----- > >>>> From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com] > >>>> Sent: Wednesday, August 19, 2015 11:03 AM > >>>> To: Ananyev, Konstantin; Lu, Wenzhuo > >>>> Cc: dev@dpdk.org > >>>> Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh > >>>> above 1 for all NICs but 82598 > >>>> > >>>> > >>>> > >>>> On 08/19/15 10:43, Ananyev, Konstantin wrote: > >>>>> Hi Vlad, > >>>>> Sorry for delay with review, I am OOO till next week. > >>>>> Meanwhile, few questions/comments from me. > >>>> Hi, Konstantin, long time no see... ;) > >>>> > >>>>>>>>>> This patch fixes the Tx hang we were constantly hitting with a > >>>>>> seastar-based > >>>>>>>>>> application on x540 NIC. > >>>>>>>>> Could you help to share with us how to reproduce the tx hang > >>>>>>>>> issue, > >>>>>> with using > >>>>>>>>> typical DPDK examples? > >>>>>>>> Sorry. I'm not very familiar with the typical DPDK examples to > >>>>>>>> help u > >>>>>>>> here. However this is quite irrelevant since without this this > >>>>>>>> patch > >>>>>>>> ixgbe PMD obviously abuses the HW spec as has been explained > >>>>>>>> above. > >>>>>>>> > >>>>>>>> We saw the issue when u stressed the xmit path with a lot of > >>>>>>>> highly > >>>>>>>> fragmented TCP frames (packets with up to 33 fragments with > >>>>>>>> non-headers > >>>>>>>> fragments as small as 4 bytes) with all offload features enabled. > >>>>> Could you provide us with the pcap file to reproduce the issue? > >>>> Well, the thing is it takes some time to reproduce it (a few > >>>> minutes of > >>>> heavy load) therefore a pcap would be quite large. > >>> Probably you can upload it to some place, from which we will be able > >>> to download it? > >> > >> I'll see what I can do but no promises... > > > > On a second thought pcap file won't help u much since in order to > > reproduce the issue u have to reproduce exactly the same structure of > > clusters i give to HW and it's not what u see on wire in a TSO case. > > And not only in a TSO case... ;) I understand that, but my thought was you can add some sort of TX callback for the rte_eth_tx_burst() into your code that would write the packet into pcap file and then re-run your hang scenario. I know that it means extra work for you - but I think it would be very helpful if we would be able to reproduce your hang scenario: - if HW guys would confirm that setting RS bit for every EOP packet is not really required, then we probably have to look at what else can cause it. - it might be added to our validation cycle, to prevent hitting similar problem in future. Thanks Konstantin > > > > >> > >>> Or might be you have some sort of scapy script to generate it? > >>> I suppose we'll need something to reproduce the issue and verify the > >>> fix. > >> > >> Since the original code abuses the HW spec u don't have to... ;) > >> > >>> > >>>>> My concern with you approach is that it would affect TX performance. > >>>> It certainly will ;) But it seem inevitable. See below. > >>>> > >>>>> Right now, for simple TX PMD usually reads only > >>>>> (nb_tx_desc/tx_rs_thresh) TXDs, > >>>>> While with your patch (if I understand it correctly) it has to > >>>>> read all TXDs in the HW TX ring. > >>>> If by "simple" u refer an always single fragment per Tx packet - > >>>> then u > >>>> are absolutely correct. > >>>> > >>>> My initial patch was to only set RS on every EOP descriptor without > >>>> changing the rs_thresh value and this patch worked. > >>>> However HW spec doesn't ensure in a general case that packets are > >>>> always > >>>> handled/completion write-back completes in the same order the packets > >>>> are placed on the ring (see "Tx arbitration schemes" chapter in 82599 > >>>> spec for instance). Therefore AFAIU one should not assume that if > >>>> packet[x+1] DD bit is set then packet[x] is completed too. > >>> From my understanding, TX arbitration controls the order in which > >>> TXDs from > >>> different queues are fetched/processed. > >>> But descriptors from the same TX queue are processed in FIFO order. > >>> So, I think that - yes, if TXD[x+1] DD bit is set, then TXD[x] is > >>> completed too, > >>> and setting RS on every EOP TXD should be enough. > >> > >> Ok. I'll rework the patch under this assumption then. > >> > >>> > >>>> That's why I changed the patch to be as u see it now. However if I > >>>> miss > >>>> something here and your HW people ensure the in-order completion > >>>> this of > >>>> course may be changed back. > >>>> > >>>>> Even if we really need to setup RS bit in each TXD (I still doubt > >>>>> we really do) - , > >>>> Well, if u doubt u may ask the guys from the Intel networking division > >>>> that wrote the 82599 and x540 HW specs where they clearly state > >>>> that. ;) > >>> Good point, we'll see what we can do here :) > >>> Konstantin > >>> > >>>>> I think inside PMD it still should be possible to check TX > >>>>> completion in chunks. > >>>>> Konstantin > >>>>> > >>>>> > >>>>>>>> Thanks, > >>>>>>>> vlad > >>>>>>>>>> Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> > >>>>>>>>>> --- > >>>>>>>>>> drivers/net/ixgbe/ixgbe_ethdev.c | 9 +++++++++ > >>>>>>>>>> drivers/net/ixgbe/ixgbe_rxtx.c | 23 > >>>>>>>>>> ++++++++++++++++++++++- > >>>>>>>>>> 2 files changed, 31 insertions(+), 1 deletion(-) > >>>>>>>>>> > >>>>>>>>>> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c > >>>>>>>>>> b/drivers/net/ixgbe/ixgbe_ethdev.c > >>>>>>>>>> index b8ee1e9..6714fd9 100644 > >>>>>>>>>> --- a/drivers/net/ixgbe/ixgbe_ethdev.c > >>>>>>>>>> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c > >>>>>>>>>> @@ -2414,6 +2414,15 @@ ixgbe_dev_info_get(struct rte_eth_dev > >>>>>>>>>> *dev, > >>>>>>>> struct > >>>>>>>>>> rte_eth_dev_info *dev_info) > >>>>>>>>>> .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS | > >>>>>>>>>> ETH_TXQ_FLAGS_NOOFFLOADS, > >>>>>>>>>> }; > >>>>>>>>>> + > >>>>>>>>>> + /* > >>>>>>>>>> + * According to 82599 and x540 specifications RS bit > >>>>>>>>>> *must* be > >>>>>> set on > >>>>>>>> the > >>>>>>>>>> + * last descriptor of *every* packet. Therefore we will > >>>>>>>>>> not allow > >>>>>> the > >>>>>>>>>> + * tx_rs_thresh above 1 for all NICs newer than 82598. > >>>>>>>>>> + */ > >>>>>>>>>> + if (hw->mac.type > ixgbe_mac_82598EB) > >>>>>>>>>> + dev_info->default_txconf.tx_rs_thresh = 1; > >>>>>>>>>> + > >>>>>>>>>> dev_info->hash_key_size = IXGBE_HKEY_MAX_INDEX * > >>>>>>>>>> sizeof(uint32_t); > >>>>>>>>>> dev_info->reta_size = ETH_RSS_RETA_SIZE_128; > >>>>>>>>>> dev_info->flow_type_rss_offloads = > >>>>>>>>>> IXGBE_RSS_OFFLOAD_ALL; diff -- > >>>>>>>> git > >>>>>>>>>> a/drivers/net/ixgbe/ixgbe_rxtx.c > >>>>>>>>>> b/drivers/net/ixgbe/ixgbe_rxtx.c > >>>>>> index > >>>>>>>>>> 91023b9..8dbdffc 100644 > >>>>>>>>>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c > >>>>>>>>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c > >>>>>>>>>> @@ -2085,11 +2085,19 @@ ixgbe_dev_tx_queue_setup(struct > >>>>>>>>>> rte_eth_dev > >>>>>>>>>> *dev, > >>>>>>>>>> struct ixgbe_tx_queue *txq; > >>>>>>>>>> struct ixgbe_hw *hw; > >>>>>>>>>> uint16_t tx_rs_thresh, tx_free_thresh; > >>>>>>>>>> + bool rs_deferring_allowed; > >>>>>>>>>> > >>>>>>>>>> PMD_INIT_FUNC_TRACE(); > >>>>>>>>>> hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private); > >>>>>>>>>> > >>>>>>>>>> /* > >>>>>>>>>> + * According to 82599 and x540 specifications RS bit > >>>>>>>>>> *must* be > >>>>>> set on > >>>>>>>> the > >>>>>>>>>> + * last descriptor of *every* packet. Therefore we will > >>>>>>>>>> not allow > >>>>>> the > >>>>>>>>>> + * tx_rs_thresh above 1 for all NICs newer than 82598. > >>>>>>>>>> + */ > >>>>>>>>>> + rs_deferring_allowed = (hw->mac.type <= ixgbe_mac_82598EB); > >>>>>>>>>> + > >>>>>>>>>> + /* > >>>>>>>>>> * Validate number of transmit descriptors. > >>>>>>>>>> * It must not exceed hardware maximum, and must be > >>>>>>>>>> multiple > >>>>>>>>>> * of IXGBE_ALIGN. > >>>>>>>>>> @@ -2110,6 +2118,8 @@ ixgbe_dev_tx_queue_setup(struct > >>>>>>>>>> rte_eth_dev > >>>>>>>> *dev, > >>>>>>>>>> * to transmit a packet is greater than the number of > >>>>>>>>>> free TX > >>>>>>>>>> * descriptors. > >>>>>>>>>> * The following constraints must be satisfied: > >>>>>>>>>> + * tx_rs_thresh must be less than 2 for NICs for which RS > >>>>>> deferring is > >>>>>>>>>> + * forbidden (all but 82598). > >>>>>>>>>> * tx_rs_thresh must be greater than 0. > >>>>>>>>>> * tx_rs_thresh must be less than the size of the ring > >>>>>>>>>> minus 2. > >>>>>>>>>> * tx_rs_thresh must be less than or equal to > >>>>>>>>>> tx_free_thresh. > >>>>>>>>>> @@ -2121,9 +2131,20 @@ ixgbe_dev_tx_queue_setup(struct > >>>>>>>>>> rte_eth_dev > >>>>>>>> *dev, > >>>>>>>>>> * When set to zero use default values. > >>>>>>>>>> */ > >>>>>>>>>> tx_rs_thresh = (uint16_t)((tx_conf->tx_rs_thresh) ? > >>>>>>>>>> - tx_conf->tx_rs_thresh : > >>>>>>>>>> DEFAULT_TX_RS_THRESH); > >>>>>>>>>> + tx_conf->tx_rs_thresh : > >>>>>>>>>> + (rs_deferring_allowed ? > >>>>>>>>>> DEFAULT_TX_RS_THRESH : > >>>>>> 1)); > >>>>>>>>>> tx_free_thresh = (uint16_t)((tx_conf->tx_free_thresh) ? > >>>>>>>>>> tx_conf->tx_free_thresh : > >>>>>>>>>> DEFAULT_TX_FREE_THRESH); > >>>>>>>>>> + > >>>>>>>>>> + if (!rs_deferring_allowed && tx_rs_thresh > 1) { > >>>>>>>>>> + PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than > >>>>>>>>>> 2 since > >>>>>> RS > >>>>>>>> " > >>>>>>>>>> + "must be set for every packet for this > >>>>>> HW. " > >>>>>>>>>> + "(tx_rs_thresh=%u port=%d queue=%d)", > >>>>>>>>>> + (unsigned int)tx_rs_thresh, > >>>>>>>>>> + (int)dev->data->port_id, (int)queue_idx); > >>>>>>>>>> + return -(EINVAL); > >>>>>>>>>> + } > >>>>>>>>>> + > >>>>>>>>>> if (tx_rs_thresh >= (nb_desc - 2)) { > >>>>>>>>>> PMD_INIT_LOG(ERR, "tx_rs_thresh must be less > >>>>>>>>>> than the > >>>>>>>> number " > >>>>>>>>>> "of TX descriptors minus 2. (tx_rs_thresh=%u > >>>>>> " > >>>>>>>>>> -- > >>>>>>>>>> 2.1.0 > >> > > ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-08-25 17:33 ` Ananyev, Konstantin @ 2015-08-25 17:39 ` Avi Kivity 0 siblings, 0 replies; 48+ messages in thread From: Avi Kivity @ 2015-08-25 17:39 UTC (permalink / raw) To: Ananyev, Konstantin, Vlad Zolotarov, Lu, Wenzhuo; +Cc: dev On 08/25/2015 08:33 PM, Ananyev, Konstantin wrote: > Hi Vlad, > >> -----Original Message----- >> From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com] >> Sent: Thursday, August 20, 2015 10:07 AM >> To: Ananyev, Konstantin; Lu, Wenzhuo >> Cc: dev@dpdk.org >> Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 >> >> >> >> On 08/20/15 12:05, Vlad Zolotarov wrote: >>> >>> On 08/20/15 11:56, Vlad Zolotarov wrote: >>>> >>>> On 08/20/15 11:41, Ananyev, Konstantin wrote: >>>>> Hi Vlad, >>>>> >>>>>> -----Original Message----- >>>>>> From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com] >>>>>> Sent: Wednesday, August 19, 2015 11:03 AM >>>>>> To: Ananyev, Konstantin; Lu, Wenzhuo >>>>>> Cc: dev@dpdk.org >>>>>> Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh >>>>>> above 1 for all NICs but 82598 >>>>>> >>>>>> >>>>>> >>>>>> On 08/19/15 10:43, Ananyev, Konstantin wrote: >>>>>>> Hi Vlad, >>>>>>> Sorry for delay with review, I am OOO till next week. >>>>>>> Meanwhile, few questions/comments from me. >>>>>> Hi, Konstantin, long time no see... ;) >>>>>> >>>>>>>>>>>> This patch fixes the Tx hang we were constantly hitting with a >>>>>>>> seastar-based >>>>>>>>>>>> application on x540 NIC. >>>>>>>>>>> Could you help to share with us how to reproduce the tx hang >>>>>>>>>>> issue, >>>>>>>> with using >>>>>>>>>>> typical DPDK examples? >>>>>>>>>> Sorry. I'm not very familiar with the typical DPDK examples to >>>>>>>>>> help u >>>>>>>>>> here. However this is quite irrelevant since without this this >>>>>>>>>> patch >>>>>>>>>> ixgbe PMD obviously abuses the HW spec as has been explained >>>>>>>>>> above. >>>>>>>>>> >>>>>>>>>> We saw the issue when u stressed the xmit path with a lot of >>>>>>>>>> highly >>>>>>>>>> fragmented TCP frames (packets with up to 33 fragments with >>>>>>>>>> non-headers >>>>>>>>>> fragments as small as 4 bytes) with all offload features enabled. >>>>>>> Could you provide us with the pcap file to reproduce the issue? >>>>>> Well, the thing is it takes some time to reproduce it (a few >>>>>> minutes of >>>>>> heavy load) therefore a pcap would be quite large. >>>>> Probably you can upload it to some place, from which we will be able >>>>> to download it? >>>> I'll see what I can do but no promises... >>> On a second thought pcap file won't help u much since in order to >>> reproduce the issue u have to reproduce exactly the same structure of >>> clusters i give to HW and it's not what u see on wire in a TSO case. >> And not only in a TSO case... ;) > I understand that, but my thought was you can add some sort of TX callback for the rte_eth_tx_burst() > into your code that would write the packet into pcap file and then re-run your hang scenario. > I know that it means extra work for you - but I think it would be very helpful if we would be able to reproduce your hang scenario: > - if HW guys would confirm that setting RS bit for every EOP packet is not really required, > then we probably have to look at what else can cause it. > - it might be added to our validation cycle, to prevent hitting similar problem in future. > Thanks > Konstantin > I think if you send packets with random fragment chains up to 32 mbufs you might see this. TSO was not required to trigger this problem. ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-08-19 4:55 ` Vladislav Zolotarov 2015-08-19 7:43 ` Ananyev, Konstantin @ 2015-08-19 17:29 ` Zhang, Helin 2015-08-25 18:13 ` Zhang, Helin 1 sibling, 1 reply; 48+ messages in thread From: Zhang, Helin @ 2015-08-19 17:29 UTC (permalink / raw) To: Vladislav Zolotarov; +Cc: dev Hi Vlad Thank you very much for the patches! Give me a few more time to double check with more guys, and possibly hardware experts. Regards, Helin From: Vladislav Zolotarov [mailto:vladz@cloudius-systems.com] Sent: Tuesday, August 18, 2015 9:56 PM To: Lu, Wenzhuo Cc: dev@dpdk.org; Zhang, Helin Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 On Aug 19, 2015 03:42, "Lu, Wenzhuo" <wenzhuo.lu@intel.com<mailto:wenzhuo.lu@intel.com>> wrote: > > Hi Helin, > > > -----Original Message----- > > From: dev [mailto:dev-bounces@dpdk.org<mailto:dev-bounces@dpdk.org>] On Behalf Of Vlad Zolotarov > > Sent: Friday, August 14, 2015 1:38 PM > > To: Zhang, Helin; dev@dpdk.org<mailto:dev@dpdk.org> > > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for > > all NICs but 82598 > > > > > > > > On 08/13/15 23:28, Zhang, Helin wrote: > > > Hi Vlad > > > > > > I don't think the changes are needed. It says in datasheet that the RS > > > bit should be set on the last descriptor of every packet, ONLY WHEN > > TXDCTL.WTHRESH equals to ZERO. > > > > Of course it's needed! See below. > > Exactly the same spec a few lines above the place u've just quoted states: > > > > "Software should not set the RS bit when TXDCTL.WTHRESH is greater than > > zero." > > > > And since all three (3) ixgbe xmit callbacks are utilizing RS bit notation ixgbe PMD > > is actually not supporting any value of WTHRESH different from zero. > I think Vlad is right. We need to fix this issue. Any suggestion? If not, I'd like to ack this patch. Pls., note that there is a v2 of this patch on the list. I forgot to patch ixgbevf_dev_info_get() in v1. > > > > > > > > > Regards, > > > Helin > > > > > >> -----Original Message----- > > >> From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com<mailto:vladz@cloudius-systems.com>] > > >> Sent: Thursday, August 13, 2015 11:07 AM > > >> To: dev@dpdk.org<mailto:dev@dpdk.org> > > >> Cc: Zhang, Helin; Ananyev, Konstantin; avi@cloudius-systems.com<mailto:avi@cloudius-systems.com>; Vlad > > >> Zolotarov > > >> Subject: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for > > all > > >> NICs but 82598 > > >> > > >> According to 82599 and x540 HW specifications RS bit *must* be set in the > > last > > >> descriptor of *every* packet. > > > There is a condition that if TXDCTL.WTHRESH equal to zero. > > > > Right and ixgbe PMD requires this condition to be fulfilled in order to > > function. See above. > > > > > > > >> This patch fixes the Tx hang we were constantly hitting with a seastar-based > > >> application on x540 NIC. > > > Could you help to share with us how to reproduce the tx hang issue, with using > > > typical DPDK examples? > > > > Sorry. I'm not very familiar with the typical DPDK examples to help u > > here. However this is quite irrelevant since without this this patch > > ixgbe PMD obviously abuses the HW spec as has been explained above. > > > > We saw the issue when u stressed the xmit path with a lot of highly > > fragmented TCP frames (packets with up to 33 fragments with non-headers > > fragments as small as 4 bytes) with all offload features enabled. > > > > Thanks, > > vlad > > > > > >> Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com<mailto:vladz@cloudius-systems.com>> > > >> --- > > >> drivers/net/ixgbe/ixgbe_ethdev.c | 9 +++++++++ > > >> drivers/net/ixgbe/ixgbe_rxtx.c | 23 ++++++++++++++++++++++- > > >> 2 files changed, 31 insertions(+), 1 deletion(-) > > >> > > >> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c > > >> b/drivers/net/ixgbe/ixgbe_ethdev.c > > >> index b8ee1e9..6714fd9 100644 > > >> --- a/drivers/net/ixgbe/ixgbe_ethdev.c > > >> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c > > >> @@ -2414,6 +2414,15 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, > > struct > > >> rte_eth_dev_info *dev_info) > > >> .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS | > > >> ETH_TXQ_FLAGS_NOOFFLOADS, > > >> }; > > >> + > > >> + /* > > >> + * According to 82599 and x540 specifications RS bit *must* be set on > > the > > >> + * last descriptor of *every* packet. Therefore we will not allow the > > >> + * tx_rs_thresh above 1 for all NICs newer than 82598. > > >> + */ > > >> + if (hw->mac.type > ixgbe_mac_82598EB) > > >> + dev_info->default_txconf.tx_rs_thresh = 1; > > >> + > > >> dev_info->hash_key_size = IXGBE_HKEY_MAX_INDEX * sizeof(uint32_t); > > >> dev_info->reta_size = ETH_RSS_RETA_SIZE_128; > > >> dev_info->flow_type_rss_offloads = IXGBE_RSS_OFFLOAD_ALL; diff -- > > git > > >> a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c index > > >> 91023b9..8dbdffc 100644 > > >> --- a/drivers/net/ixgbe/ixgbe_rxtx.c > > >> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c > > >> @@ -2085,11 +2085,19 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev > > >> *dev, > > >> struct ixgbe_tx_queue *txq; > > >> struct ixgbe_hw *hw; > > >> uint16_t tx_rs_thresh, tx_free_thresh; > > >> + bool rs_deferring_allowed; > > >> > > >> PMD_INIT_FUNC_TRACE(); > > >> hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private); > > >> > > >> /* > > >> + * According to 82599 and x540 specifications RS bit *must* be set on > > the > > >> + * last descriptor of *every* packet. Therefore we will not allow the > > >> + * tx_rs_thresh above 1 for all NICs newer than 82598. > > >> + */ > > >> + rs_deferring_allowed = (hw->mac.type <= ixgbe_mac_82598EB); > > >> + > > >> + /* > > >> * Validate number of transmit descriptors. > > >> * It must not exceed hardware maximum, and must be multiple > > >> * of IXGBE_ALIGN. > > >> @@ -2110,6 +2118,8 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev > > *dev, > > >> * to transmit a packet is greater than the number of free TX > > >> * descriptors. > > >> * The following constraints must be satisfied: > > >> + * tx_rs_thresh must be less than 2 for NICs for which RS deferring is > > >> + * forbidden (all but 82598). > > >> * tx_rs_thresh must be greater than 0. > > >> * tx_rs_thresh must be less than the size of the ring minus 2. > > >> * tx_rs_thresh must be less than or equal to tx_free_thresh. > > >> @@ -2121,9 +2131,20 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev > > *dev, > > >> * When set to zero use default values. > > >> */ > > >> tx_rs_thresh = (uint16_t)((tx_conf->tx_rs_thresh) ? > > >> - tx_conf->tx_rs_thresh : DEFAULT_TX_RS_THRESH); > > >> + tx_conf->tx_rs_thresh : > > >> + (rs_deferring_allowed ? DEFAULT_TX_RS_THRESH : 1)); > > >> tx_free_thresh = (uint16_t)((tx_conf->tx_free_thresh) ? > > >> tx_conf->tx_free_thresh : DEFAULT_TX_FREE_THRESH); > > >> + > > >> + if (!rs_deferring_allowed && tx_rs_thresh > 1) { > > >> + PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than 2 since RS > > " > > >> + "must be set for every packet for this HW. " > > >> + "(tx_rs_thresh=%u port=%d queue=%d)", > > >> + (unsigned int)tx_rs_thresh, > > >> + (int)dev->data->port_id, (int)queue_idx); > > >> + return -(EINVAL); > > >> + } > > >> + > > >> if (tx_rs_thresh >= (nb_desc - 2)) { > > >> PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than the > > number " > > >> "of TX descriptors minus 2. (tx_rs_thresh=%u " > > >> -- > > >> 2.1.0 > ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-08-19 17:29 ` Zhang, Helin @ 2015-08-25 18:13 ` Zhang, Helin 2015-08-25 18:33 ` Vladislav Zolotarov 0 siblings, 1 reply; 48+ messages in thread From: Zhang, Helin @ 2015-08-25 18:13 UTC (permalink / raw) To: 'Vladislav Zolotarov'; +Cc: 'dev@dpdk.org' Hi Vlad In addition, I’d double check with you what’s the maximum number of descriptors would be used for a single packet transmitting? Datasheet said that it supports up to 8. I am wondering if more than 8 were used in your case? Thank you very much! Regards, Helin From: Zhang, Helin Sent: Wednesday, August 19, 2015 10:29 AM To: Vladislav Zolotarov Cc: Lu, Wenzhuo; dev@dpdk.org Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 Hi Vlad Thank you very much for the patches! Give me a few more time to double check with more guys, and possibly hardware experts. Regards, Helin From: Vladislav Zolotarov [mailto:vladz@cloudius-systems.com] Sent: Tuesday, August 18, 2015 9:56 PM To: Lu, Wenzhuo Cc: dev@dpdk.org<mailto:dev@dpdk.org>; Zhang, Helin Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 On Aug 19, 2015 03:42, "Lu, Wenzhuo" <wenzhuo.lu@intel.com<mailto:wenzhuo.lu@intel.com>> wrote: > > Hi Helin, > > > -----Original Message----- > > From: dev [mailto:dev-bounces@dpdk.org<mailto:dev-bounces@dpdk.org>] On Behalf Of Vlad Zolotarov > > Sent: Friday, August 14, 2015 1:38 PM > > To: Zhang, Helin; dev@dpdk.org<mailto:dev@dpdk.org> > > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for > > all NICs but 82598 > > > > > > > > On 08/13/15 23:28, Zhang, Helin wrote: > > > Hi Vlad > > > > > > I don't think the changes are needed. It says in datasheet that the RS > > > bit should be set on the last descriptor of every packet, ONLY WHEN > > TXDCTL.WTHRESH equals to ZERO. > > > > Of course it's needed! See below. > > Exactly the same spec a few lines above the place u've just quoted states: > > > > "Software should not set the RS bit when TXDCTL.WTHRESH is greater than > > zero." > > > > And since all three (3) ixgbe xmit callbacks are utilizing RS bit notation ixgbe PMD > > is actually not supporting any value of WTHRESH different from zero. > I think Vlad is right. We need to fix this issue. Any suggestion? If not, I'd like to ack this patch. Pls., note that there is a v2 of this patch on the list. I forgot to patch ixgbevf_dev_info_get() in v1. > > > > > > > > > Regards, > > > Helin > > > > > >> -----Original Message----- > > >> From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com<mailto:vladz@cloudius-systems.com>] > > >> Sent: Thursday, August 13, 2015 11:07 AM > > >> To: dev@dpdk.org<mailto:dev@dpdk.org> > > >> Cc: Zhang, Helin; Ananyev, Konstantin; avi@cloudius-systems.com<mailto:avi@cloudius-systems.com>; Vlad > > >> Zolotarov > > >> Subject: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for > > all > > >> NICs but 82598 > > >> > > >> According to 82599 and x540 HW specifications RS bit *must* be set in the > > last > > >> descriptor of *every* packet. > > > There is a condition that if TXDCTL.WTHRESH equal to zero. > > > > Right and ixgbe PMD requires this condition to be fulfilled in order to > > function. See above. > > > > > > > >> This patch fixes the Tx hang we were constantly hitting with a seastar-based > > >> application on x540 NIC. > > > Could you help to share with us how to reproduce the tx hang issue, with using > > > typical DPDK examples? > > > > Sorry. I'm not very familiar with the typical DPDK examples to help u > > here. However this is quite irrelevant since without this this patch > > ixgbe PMD obviously abuses the HW spec as has been explained above. > > > > We saw the issue when u stressed the xmit path with a lot of highly > > fragmented TCP frames (packets with up to 33 fragments with non-headers > > fragments as small as 4 bytes) with all offload features enabled. > > > > Thanks, > > vlad > > > > > >> Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com<mailto:vladz@cloudius-systems.com>> > > >> --- > > >> drivers/net/ixgbe/ixgbe_ethdev.c | 9 +++++++++ > > >> drivers/net/ixgbe/ixgbe_rxtx.c | 23 ++++++++++++++++++++++- > > >> 2 files changed, 31 insertions(+), 1 deletion(-) > > >> > > >> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c > > >> b/drivers/net/ixgbe/ixgbe_ethdev.c > > >> index b8ee1e9..6714fd9 100644 > > >> --- a/drivers/net/ixgbe/ixgbe_ethdev.c > > >> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c > > >> @@ -2414,6 +2414,15 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, > > struct > > >> rte_eth_dev_info *dev_info) > > >> .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS | > > >> ETH_TXQ_FLAGS_NOOFFLOADS, > > >> }; > > >> + > > >> + /* > > >> + * According to 82599 and x540 specifications RS bit *must* be set on > > the > > >> + * last descriptor of *every* packet. Therefore we will not allow the > > >> + * tx_rs_thresh above 1 for all NICs newer than 82598. > > >> + */ > > >> + if (hw->mac.type > ixgbe_mac_82598EB) > > >> + dev_info->default_txconf.tx_rs_thresh = 1; > > >> + > > >> dev_info->hash_key_size = IXGBE_HKEY_MAX_INDEX * sizeof(uint32_t); > > >> dev_info->reta_size = ETH_RSS_RETA_SIZE_128; > > >> dev_info->flow_type_rss_offloads = IXGBE_RSS_OFFLOAD_ALL; diff -- > > git > > >> a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c index > > >> 91023b9..8dbdffc 100644 > > >> --- a/drivers/net/ixgbe/ixgbe_rxtx.c > > >> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c > > >> @@ -2085,11 +2085,19 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev > > >> *dev, > > >> struct ixgbe_tx_queue *txq; > > >> struct ixgbe_hw *hw; > > >> uint16_t tx_rs_thresh, tx_free_thresh; > > >> + bool rs_deferring_allowed; > > >> > > >> PMD_INIT_FUNC_TRACE(); > > >> hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private); > > >> > > >> /* > > >> + * According to 82599 and x540 specifications RS bit *must* be set on > > the > > >> + * last descriptor of *every* packet. Therefore we will not allow the > > >> + * tx_rs_thresh above 1 for all NICs newer than 82598. > > >> + */ > > >> + rs_deferring_allowed = (hw->mac.type <= ixgbe_mac_82598EB); > > >> + > > >> + /* > > >> * Validate number of transmit descriptors. > > >> * It must not exceed hardware maximum, and must be multiple > > >> * of IXGBE_ALIGN. > > >> @@ -2110,6 +2118,8 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev > > *dev, > > >> * to transmit a packet is greater than the number of free TX > > >> * descriptors. > > >> * The following constraints must be satisfied: > > >> + * tx_rs_thresh must be less than 2 for NICs for which RS deferring is > > >> + * forbidden (all but 82598). > > >> * tx_rs_thresh must be greater than 0. > > >> * tx_rs_thresh must be less than the size of the ring minus 2. > > >> * tx_rs_thresh must be less than or equal to tx_free_thresh. > > >> @@ -2121,9 +2131,20 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev > > *dev, > > >> * When set to zero use default values. > > >> */ > > >> tx_rs_thresh = (uint16_t)((tx_conf->tx_rs_thresh) ? > > >> - tx_conf->tx_rs_thresh : DEFAULT_TX_RS_THRESH); > > >> + tx_conf->tx_rs_thresh : > > >> + (rs_deferring_allowed ? DEFAULT_TX_RS_THRESH : 1)); > > >> tx_free_thresh = (uint16_t)((tx_conf->tx_free_thresh) ? > > >> tx_conf->tx_free_thresh : DEFAULT_TX_FREE_THRESH); > > >> + > > >> + if (!rs_deferring_allowed && tx_rs_thresh > 1) { > > >> + PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than 2 since RS > > " > > >> + "must be set for every packet for this HW. " > > >> + "(tx_rs_thresh=%u port=%d queue=%d)", > > >> + (unsigned int)tx_rs_thresh, > > >> + (int)dev->data->port_id, (int)queue_idx); > > >> + return -(EINVAL); > > >> + } > > >> + > > >> if (tx_rs_thresh >= (nb_desc - 2)) { > > >> PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than the > > number " > > >> "of TX descriptors minus 2. (tx_rs_thresh=%u " > > >> -- > > >> 2.1.0 > ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-08-25 18:13 ` Zhang, Helin @ 2015-08-25 18:33 ` Vladislav Zolotarov 2015-08-25 18:43 ` Zhang, Helin 0 siblings, 1 reply; 48+ messages in thread From: Vladislav Zolotarov @ 2015-08-25 18:33 UTC (permalink / raw) To: Helin Zhang; +Cc: dev On Aug 25, 2015 21:14, "Zhang, Helin" <helin.zhang@intel.com> wrote: > > Hi Vlad > > > > In addition, I’d double check with you what’s the maximum number of descriptors would be used for a single packet transmitting? > > Datasheet said that it supports up to 8. I am wondering if more than 8 were used in your case? If memory serves me well the maximum number of data descriptors per single xmit packet is 40 minus 2 minus WTHRESH. Since WTHRESH in DPDK is always zero it gives us 38 segments. We limit them by 33. > > Thank you very much! > > > > Regards, > > Helin > > > > From: Zhang, Helin > Sent: Wednesday, August 19, 2015 10:29 AM > To: Vladislav Zolotarov > Cc: Lu, Wenzhuo; dev@dpdk.org > > Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 > > > > Hi Vlad > > > > Thank you very much for the patches! Give me a few more time to double check with more guys, and possibly hardware experts. > > > > Regards, > > Helin > > > > From: Vladislav Zolotarov [mailto:vladz@cloudius-systems.com] > Sent: Tuesday, August 18, 2015 9:56 PM > To: Lu, Wenzhuo > Cc: dev@dpdk.org; Zhang, Helin > Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 > > > > > On Aug 19, 2015 03:42, "Lu, Wenzhuo" <wenzhuo.lu@intel.com> wrote: > > > > Hi Helin, > > > > > -----Original Message----- > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vlad Zolotarov > > > Sent: Friday, August 14, 2015 1:38 PM > > > To: Zhang, Helin; dev@dpdk.org > > > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for > > > all NICs but 82598 > > > > > > > > > > > > On 08/13/15 23:28, Zhang, Helin wrote: > > > > Hi Vlad > > > > > > > > I don't think the changes are needed. It says in datasheet that the RS > > > > bit should be set on the last descriptor of every packet, ONLY WHEN > > > TXDCTL.WTHRESH equals to ZERO. > > > > > > Of course it's needed! See below. > > > Exactly the same spec a few lines above the place u've just quoted states: > > > > > > "Software should not set the RS bit when TXDCTL.WTHRESH is greater than > > > zero." > > > > > > And since all three (3) ixgbe xmit callbacks are utilizing RS bit notation ixgbe PMD > > > is actually not supporting any value of WTHRESH different from zero. > > I think Vlad is right. We need to fix this issue. Any suggestion? If not, I'd like to ack this patch. > > Pls., note that there is a v2 of this patch on the list. I forgot to patch ixgbevf_dev_info_get() in v1. > > > > > > > > > > > > > > Regards, > > > > Helin > > > > > > > >> -----Original Message----- > > > >> From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com] > > > >> Sent: Thursday, August 13, 2015 11:07 AM > > > >> To: dev@dpdk.org > > > >> Cc: Zhang, Helin; Ananyev, Konstantin; avi@cloudius-systems.com; Vlad > > > >> Zolotarov > > > >> Subject: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for > > > all > > > >> NICs but 82598 > > > >> > > > >> According to 82599 and x540 HW specifications RS bit *must* be set in the > > > last > > > >> descriptor of *every* packet. > > > > There is a condition that if TXDCTL.WTHRESH equal to zero. > > > > > > Right and ixgbe PMD requires this condition to be fulfilled in order to > > > function. See above. > > > > > > > > > > >> This patch fixes the Tx hang we were constantly hitting with a seastar-based > > > >> application on x540 NIC. > > > > Could you help to share with us how to reproduce the tx hang issue, with using > > > > typical DPDK examples? > > > > > > Sorry. I'm not very familiar with the typical DPDK examples to help u > > > here. However this is quite irrelevant since without this this patch > > > ixgbe PMD obviously abuses the HW spec as has been explained above. > > > > > > We saw the issue when u stressed the xmit path with a lot of highly > > > fragmented TCP frames (packets with up to 33 fragments with non-headers > > > fragments as small as 4 bytes) with all offload features enabled. > > > > > > Thanks, > > > vlad > > > > > > > >> Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> > > > >> --- > > > >> drivers/net/ixgbe/ixgbe_ethdev.c | 9 +++++++++ > > > >> drivers/net/ixgbe/ixgbe_rxtx.c | 23 ++++++++++++++++++++++- > > > >> 2 files changed, 31 insertions(+), 1 deletion(-) > > > >> > > > >> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c > > > >> b/drivers/net/ixgbe/ixgbe_ethdev.c > > > >> index b8ee1e9..6714fd9 100644 > > > >> --- a/drivers/net/ixgbe/ixgbe_ethdev.c > > > >> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c > > > >> @@ -2414,6 +2414,15 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, > > > struct > > > >> rte_eth_dev_info *dev_info) > > > >> .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS | > > > >> ETH_TXQ_FLAGS_NOOFFLOADS, > > > >> }; > > > >> + > > > >> + /* > > > >> + * According to 82599 and x540 specifications RS bit *must* be set on > > > the > > > >> + * last descriptor of *every* packet. Therefore we will not allow the > > > >> + * tx_rs_thresh above 1 for all NICs newer than 82598. > > > >> + */ > > > >> + if (hw->mac.type > ixgbe_mac_82598EB) > > > >> + dev_info->default_txconf.tx_rs_thresh = 1; > > > >> + > > > >> dev_info->hash_key_size = IXGBE_HKEY_MAX_INDEX * sizeof(uint32_t); > > > >> dev_info->reta_size = ETH_RSS_RETA_SIZE_128; > > > >> dev_info->flow_type_rss_offloads = IXGBE_RSS_OFFLOAD_ALL; diff -- > > > git > > > >> a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c index > > > >> 91023b9..8dbdffc 100644 > > > >> --- a/drivers/net/ixgbe/ixgbe_rxtx.c > > > >> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c > > > >> @@ -2085,11 +2085,19 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev > > > >> *dev, > > > >> struct ixgbe_tx_queue *txq; > > > >> struct ixgbe_hw *hw; > > > >> uint16_t tx_rs_thresh, tx_free_thresh; > > > >> + bool rs_deferring_allowed; > > > >> > > > >> PMD_INIT_FUNC_TRACE(); > > > >> hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private); > > > >> > > > >> /* > > > >> + * According to 82599 and x540 specifications RS bit *must* be set on > > > the > > > >> + * last descriptor of *every* packet. Therefore we will not allow the > > > >> + * tx_rs_thresh above 1 for all NICs newer than 82598. > > > >> + */ > > > >> + rs_deferring_allowed = (hw->mac.type <= ixgbe_mac_82598EB); > > > >> + > > > >> + /* > > > >> * Validate number of transmit descriptors. > > > >> * It must not exceed hardware maximum, and must be multiple > > > >> * of IXGBE_ALIGN. > > > >> @@ -2110,6 +2118,8 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev > > > *dev, > > > >> * to transmit a packet is greater than the number of free TX > > > >> * descriptors. > > > >> * The following constraints must be satisfied: > > > >> + * tx_rs_thresh must be less than 2 for NICs for which RS deferring is > > > >> + * forbidden (all but 82598). > > > >> * tx_rs_thresh must be greater than 0. > > > >> * tx_rs_thresh must be less than the size of the ring minus 2. > > > >> * tx_rs_thresh must be less than or equal to tx_free_thresh. > > > >> @@ -2121,9 +2131,20 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev > > > *dev, > > > >> * When set to zero use default values. > > > >> */ > > > >> tx_rs_thresh = (uint16_t)((tx_conf->tx_rs_thresh) ? > > > >> - tx_conf->tx_rs_thresh : DEFAULT_TX_RS_THRESH); > > > >> + tx_conf->tx_rs_thresh : > > > >> + (rs_deferring_allowed ? DEFAULT_TX_RS_THRESH : 1)); > > > >> tx_free_thresh = (uint16_t)((tx_conf->tx_free_thresh) ? > > > >> tx_conf->tx_free_thresh : DEFAULT_TX_FREE_THRESH); > > > >> + > > > >> + if (!rs_deferring_allowed && tx_rs_thresh > 1) { > > > >> + PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than 2 since RS > > > " > > > >> + "must be set for every packet for this HW. " > > > >> + "(tx_rs_thresh=%u port=%d queue=%d)", > > > >> + (unsigned int)tx_rs_thresh, > > > >> + (int)dev->data->port_id, (int)queue_idx); > > > >> + return -(EINVAL); > > > >> + } > > > >> + > > > >> if (tx_rs_thresh >= (nb_desc - 2)) { > > > >> PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than the > > > number " > > > >> "of TX descriptors minus 2. (tx_rs_thresh=%u " > > > >> -- > > > >> 2.1.0 > > ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-08-25 18:33 ` Vladislav Zolotarov @ 2015-08-25 18:43 ` Zhang, Helin 2015-08-25 18:52 ` Vlad Zolotarov 0 siblings, 1 reply; 48+ messages in thread From: Zhang, Helin @ 2015-08-25 18:43 UTC (permalink / raw) To: Vladislav Zolotarov; +Cc: dev Hi Vlad I think this could possibly be the root cause of your TX hang issue. Please try to limit the number to 8 or less, and then see if the issue will still be there or not? It does not have any check for the number of descriptors to be used for a single packet, and it relies on the users to give correct mbuf chains. We may need a check of this somewhere. Of cause the point you indicated we also need to carefully investigate or fix. Regards, Helin From: Vladislav Zolotarov [mailto:vladz@cloudius-systems.com] Sent: Tuesday, August 25, 2015 11:34 AM To: Zhang, Helin Cc: Lu, Wenzhuo; dev@dpdk.org Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 On Aug 25, 2015 21:14, "Zhang, Helin" <helin.zhang@intel.com<mailto:helin.zhang@intel.com>> wrote: > > Hi Vlad > > > > In addition, I’d double check with you what’s the maximum number of descriptors would be used for a single packet transmitting? > > Datasheet said that it supports up to 8. I am wondering if more than 8 were used in your case? If memory serves me well the maximum number of data descriptors per single xmit packet is 40 minus 2 minus WTHRESH. Since WTHRESH in DPDK is always zero it gives us 38 segments. We limit them by 33. > > Thank you very much! > > > > Regards, > > Helin > > > > From: Zhang, Helin > Sent: Wednesday, August 19, 2015 10:29 AM > To: Vladislav Zolotarov > Cc: Lu, Wenzhuo; dev@dpdk.org<mailto:dev@dpdk.org> > > Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 > > > > Hi Vlad > > > > Thank you very much for the patches! Give me a few more time to double check with more guys, and possibly hardware experts. > > > > Regards, > > Helin > > > > From: Vladislav Zolotarov [mailto:vladz@cloudius-systems.com<mailto:vladz@cloudius-systems.com>] > Sent: Tuesday, August 18, 2015 9:56 PM > To: Lu, Wenzhuo > Cc: dev@dpdk.org<mailto:dev@dpdk.org>; Zhang, Helin > Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 > > > > > On Aug 19, 2015 03:42, "Lu, Wenzhuo" <wenzhuo.lu@intel.com<mailto:wenzhuo.lu@intel.com>> wrote: > > > > Hi Helin, > > > > > -----Original Message----- > > > From: dev [mailto:dev-bounces@dpdk.org<mailto:dev-bounces@dpdk.org>] On Behalf Of Vlad Zolotarov > > > Sent: Friday, August 14, 2015 1:38 PM > > > To: Zhang, Helin; dev@dpdk.org<mailto:dev@dpdk.org> > > > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for > > > all NICs but 82598 > > > > > > > > > > > > On 08/13/15 23:28, Zhang, Helin wrote: > > > > Hi Vlad > > > > > > > > I don't think the changes are needed. It says in datasheet that the RS > > > > bit should be set on the last descriptor of every packet, ONLY WHEN > > > TXDCTL.WTHRESH equals to ZERO. > > > > > > Of course it's needed! See below. > > > Exactly the same spec a few lines above the place u've just quoted states: > > > > > > "Software should not set the RS bit when TXDCTL.WTHRESH is greater than > > > zero." > > > > > > And since all three (3) ixgbe xmit callbacks are utilizing RS bit notation ixgbe PMD > > > is actually not supporting any value of WTHRESH different from zero. > > I think Vlad is right. We need to fix this issue. Any suggestion? If not, I'd like to ack this patch. > > Pls., note that there is a v2 of this patch on the list. I forgot to patch ixgbevf_dev_info_get() in v1. > > > > > > > > > > > > > > Regards, > > > > Helin > > > > > > > >> -----Original Message----- > > > >> From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com<mailto:vladz@cloudius-systems.com>] > > > >> Sent: Thursday, August 13, 2015 11:07 AM > > > >> To: dev@dpdk.org<mailto:dev@dpdk.org> > > > >> Cc: Zhang, Helin; Ananyev, Konstantin; avi@cloudius-systems.com<mailto:avi@cloudius-systems.com>; Vlad > > > >> Zolotarov > > > >> Subject: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for > > > all > > > >> NICs but 82598 > > > >> > > > >> According to 82599 and x540 HW specifications RS bit *must* be set in the > > > last > > > >> descriptor of *every* packet. > > > > There is a condition that if TXDCTL.WTHRESH equal to zero. > > > > > > Right and ixgbe PMD requires this condition to be fulfilled in order to > > > function. See above. > > > > > > > > > > >> This patch fixes the Tx hang we were constantly hitting with a seastar-based > > > >> application on x540 NIC. > > > > Could you help to share with us how to reproduce the tx hang issue, with using > > > > typical DPDK examples? > > > > > > Sorry. I'm not very familiar with the typical DPDK examples to help u > > > here. However this is quite irrelevant since without this this patch > > > ixgbe PMD obviously abuses the HW spec as has been explained above. > > > > > > We saw the issue when u stressed the xmit path with a lot of highly > > > fragmented TCP frames (packets with up to 33 fragments with non-headers > > > fragments as small as 4 bytes) with all offload features enabled. > > > > > > Thanks, > > > vlad > > > > > > > >> Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com<mailto:vladz@cloudius-systems.com>> > > > >> --- > > > >> drivers/net/ixgbe/ixgbe_ethdev.c | 9 +++++++++ > > > >> drivers/net/ixgbe/ixgbe_rxtx.c | 23 ++++++++++++++++++++++- > > > >> 2 files changed, 31 insertions(+), 1 deletion(-) > > > >> > > > >> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c > > > >> b/drivers/net/ixgbe/ixgbe_ethdev.c > > > >> index b8ee1e9..6714fd9 100644 > > > >> --- a/drivers/net/ixgbe/ixgbe_ethdev.c > > > >> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c > > > >> @@ -2414,6 +2414,15 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, > > > struct > > > >> rte_eth_dev_info *dev_info) > > > >> .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS | > > > >> ETH_TXQ_FLAGS_NOOFFLOADS, > > > >> }; > > > >> + > > > >> + /* > > > >> + * According to 82599 and x540 specifications RS bit *must* be set on > > > the > > > >> + * last descriptor of *every* packet. Therefore we will not allow the > > > >> + * tx_rs_thresh above 1 for all NICs newer than 82598. > > > >> + */ > > > >> + if (hw->mac.type > ixgbe_mac_82598EB) > > > >> + dev_info->default_txconf.tx_rs_thresh = 1; > > > >> + > > > >> dev_info->hash_key_size = IXGBE_HKEY_MAX_INDEX * sizeof(uint32_t); > > > >> dev_info->reta_size = ETH_RSS_RETA_SIZE_128; > > > >> dev_info->flow_type_rss_offloads = IXGBE_RSS_OFFLOAD_ALL; diff -- > > > git > > > >> a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c index > > > >> 91023b9..8dbdffc 100644 > > > >> --- a/drivers/net/ixgbe/ixgbe_rxtx.c > > > >> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c > > > >> @@ -2085,11 +2085,19 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev > > > >> *dev, > > > >> struct ixgbe_tx_queue *txq; > > > >> struct ixgbe_hw *hw; > > > >> uint16_t tx_rs_thresh, tx_free_thresh; > > > >> + bool rs_deferring_allowed; > > > >> > > > >> PMD_INIT_FUNC_TRACE(); > > > >> hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private); > > > >> > > > >> /* > > > >> + * According to 82599 and x540 specifications RS bit *must* be set on > > > the > > > >> + * last descriptor of *every* packet. Therefore we will not allow the > > > >> + * tx_rs_thresh above 1 for all NICs newer than 82598. > > > >> + */ > > > >> + rs_deferring_allowed = (hw->mac.type <= ixgbe_mac_82598EB); > > > >> + > > > >> + /* > > > >> * Validate number of transmit descriptors. > > > >> * It must not exceed hardware maximum, and must be multiple > > > >> * of IXGBE_ALIGN. > > > >> @@ -2110,6 +2118,8 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev > > > *dev, > > > >> * to transmit a packet is greater than the number of free TX > > > >> * descriptors. > > > >> * The following constraints must be satisfied: > > > >> + * tx_rs_thresh must be less than 2 for NICs for which RS deferring is > > > >> + * forbidden (all but 82598). > > > >> * tx_rs_thresh must be greater than 0. > > > >> * tx_rs_thresh must be less than the size of the ring minus 2. > > > >> * tx_rs_thresh must be less than or equal to tx_free_thresh. > > > >> @@ -2121,9 +2131,20 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev > > > *dev, > > > >> * When set to zero use default values. > > > >> */ > > > >> tx_rs_thresh = (uint16_t)((tx_conf->tx_rs_thresh) ? > > > >> - tx_conf->tx_rs_thresh : DEFAULT_TX_RS_THRESH); > > > >> + tx_conf->tx_rs_thresh : > > > >> + (rs_deferring_allowed ? DEFAULT_TX_RS_THRESH : 1)); > > > >> tx_free_thresh = (uint16_t)((tx_conf->tx_free_thresh) ? > > > >> tx_conf->tx_free_thresh : DEFAULT_TX_FREE_THRESH); > > > >> + > > > >> + if (!rs_deferring_allowed && tx_rs_thresh > 1) { > > > >> + PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than 2 since RS > > > " > > > >> + "must be set for every packet for this HW. " > > > >> + "(tx_rs_thresh=%u port=%d queue=%d)", > > > >> + (unsigned int)tx_rs_thresh, > > > >> + (int)dev->data->port_id, (int)queue_idx); > > > >> + return -(EINVAL); > > > >> + } > > > >> + > > > >> if (tx_rs_thresh >= (nb_desc - 2)) { > > > >> PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than the > > > number " > > > >> "of TX descriptors minus 2. (tx_rs_thresh=%u " > > > >> -- > > > >> 2.1.0 > > ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-08-25 18:43 ` Zhang, Helin @ 2015-08-25 18:52 ` Vlad Zolotarov 2015-08-25 19:16 ` Zhang, Helin 2015-09-11 14:25 ` didier.pallard 0 siblings, 2 replies; 48+ messages in thread From: Vlad Zolotarov @ 2015-08-25 18:52 UTC (permalink / raw) To: Zhang, Helin; +Cc: dev On 08/25/15 21:43, Zhang, Helin wrote: > > Hi Vlad > > I think this could possibly be the root cause of your TX hang issue. > Please try to limit the number to 8 or less, and then see if the issue > will still be there or not? > Helin, the issue has been seen on x540 devices. Pls., see a chapter 7.2.1.1 of x540 devices spec: A packet (or multiple packets in transmit segmentation) can span any number of buffers (and their descriptors) up to a limit of 40 minus WTHRESH minus 2 (see Section 7.2.3.3 for Tx Ring details and section Section 7.2.3.5.1 for WTHRESH details). For best performance it is recommended to minimize the number of buffers as possible. Could u, pls., clarify why do u think that the maximum number of data buffers is limited by 8? thanks, vlad > It does not have any check for the number of descriptors to be used > for a single packet, and it relies on the users to give correct mbuf > chains. > > We may need a check of this somewhere. Of cause the point you > indicated we also need to carefully investigate or fix. > > Regards, > > Helin > > *From:*Vladislav Zolotarov [mailto:vladz@cloudius-systems.com] > *Sent:* Tuesday, August 25, 2015 11:34 AM > *To:* Zhang, Helin > *Cc:* Lu, Wenzhuo; dev@dpdk.org > *Subject:* RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh > above 1 for all NICs but 82598 > > > On Aug 25, 2015 21:14, "Zhang, Helin" <helin.zhang@intel.com > <mailto:helin.zhang@intel.com>> wrote: > > > > Hi Vlad > > > > > > > > In addition, I’d double check with you what’s the maximum number of > descriptors would be used for a single packet transmitting? > > > > Datasheet said that it supports up to 8. I am wondering if more than > 8 were used in your case? > > If memory serves me well the maximum number of data descriptors per > single xmit packet is 40 minus 2 minus WTHRESH. Since WTHRESH in DPDK > is always zero it gives us 38 segments. We limit them by 33. > > > > > Thank you very much! > > > > > > > > Regards, > > > > Helin > > > > > > > > From: Zhang, Helin > > Sent: Wednesday, August 19, 2015 10:29 AM > > To: Vladislav Zolotarov > > Cc: Lu, Wenzhuo; dev@dpdk.org <mailto:dev@dpdk.org> > > > > Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh > above 1 for all NICs but 82598 > > > > > > > > Hi Vlad > > > > > > > > Thank you very much for the patches! Give me a few more time to > double check with more guys, and possibly hardware experts. > > > > > > > > Regards, > > > > Helin > > > > > > > > From: Vladislav Zolotarov [mailto:vladz@cloudius-systems.com > <mailto:vladz@cloudius-systems.com>] > > Sent: Tuesday, August 18, 2015 9:56 PM > > To: Lu, Wenzhuo > > Cc: dev@dpdk.org <mailto:dev@dpdk.org>; Zhang, Helin > > Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh > above 1 for all NICs but 82598 > > > > > > > > > > On Aug 19, 2015 03:42, "Lu, Wenzhuo" <wenzhuo.lu@intel.com > <mailto:wenzhuo.lu@intel.com>> wrote: > > > > > > Hi Helin, > > > > > > > -----Original Message----- > > > > From: dev [mailto:dev-bounces@dpdk.org > <mailto:dev-bounces@dpdk.org>] On Behalf Of Vlad Zolotarov > > > > Sent: Friday, August 14, 2015 1:38 PM > > > > To: Zhang, Helin; dev@dpdk.org <mailto:dev@dpdk.org> > > > > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid > tx_rs_thresh above 1 for > > > > all NICs but 82598 > > > > > > > > > > > > > > > > On 08/13/15 23:28, Zhang, Helin wrote: > > > > > Hi Vlad > > > > > > > > > > I don't think the changes are needed. It says in datasheet > that the RS > > > > > bit should be set on the last descriptor of every packet, ONLY > WHEN > > > > TXDCTL.WTHRESH equals to ZERO. > > > > > > > > Of course it's needed! See below. > > > > Exactly the same spec a few lines above the place u've just > quoted states: > > > > > > > > "Software should not set the RS bit when TXDCTL.WTHRESH is > greater than > > > > zero." > > > > > > > > And since all three (3) ixgbe xmit callbacks are utilizing RS > bit notation ixgbe PMD > > > > is actually not supporting any value of WTHRESH different from zero. > > > I think Vlad is right. We need to fix this issue. Any suggestion? > If not, I'd like to ack this patch. > > > > Pls., note that there is a v2 of this patch on the list. I forgot to > patch ixgbevf_dev_info_get() in v1. > > > > > > > > > > > > > > > > > > > Regards, > > > > > Helin > > > > > > > > > >> -----Original Message----- > > > > >> From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com > <mailto:vladz@cloudius-systems.com>] > > > > >> Sent: Thursday, August 13, 2015 11:07 AM > > > > >> To: dev@dpdk.org <mailto:dev@dpdk.org> > > > > >> Cc: Zhang, Helin; Ananyev, Konstantin; > avi@cloudius-systems.com <mailto:avi@cloudius-systems.com>; Vlad > > > > >> Zolotarov > > > > >> Subject: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh > above 1 for > > > > all > > > > >> NICs but 82598 > > > > >> > > > > >> According to 82599 and x540 HW specifications RS bit *must* > be set in the > > > > last > > > > >> descriptor of *every* packet. > > > > > There is a condition that if TXDCTL.WTHRESH equal to zero. > > > > > > > > Right and ixgbe PMD requires this condition to be fulfilled in > order to > > > > function. See above. > > > > > > > > > > > > > >> This patch fixes the Tx hang we were constantly hitting with > a seastar-based > > > > >> application on x540 NIC. > > > > > Could you help to share with us how to reproduce the tx hang > issue, with using > > > > > typical DPDK examples? > > > > > > > > Sorry. I'm not very familiar with the typical DPDK examples to > help u > > > > here. However this is quite irrelevant since without this this patch > > > > ixgbe PMD obviously abuses the HW spec as has been explained above. > > > > > > > > We saw the issue when u stressed the xmit path with a lot of highly > > > > fragmented TCP frames (packets with up to 33 fragments with > non-headers > > > > fragments as small as 4 bytes) with all offload features enabled. > > > > > > > > Thanks, > > > > vlad > > > > > > > > > >> Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com > <mailto:vladz@cloudius-systems.com>> > > > > >> --- > > > > >> drivers/net/ixgbe/ixgbe_ethdev.c | 9 +++++++++ > > > > >> drivers/net/ixgbe/ixgbe_rxtx.c | 23 ++++++++++++++++++++++- > > > > >> 2 files changed, 31 insertions(+), 1 deletion(-) > > > > >> > > > > >> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c > > > > >> b/drivers/net/ixgbe/ixgbe_ethdev.c > > > > >> index b8ee1e9..6714fd9 100644 > > > > >> --- a/drivers/net/ixgbe/ixgbe_ethdev.c > > > > >> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c > > > > >> @@ -2414,6 +2414,15 @@ ixgbe_dev_info_get(struct rte_eth_dev > *dev, > > > > struct > > > > >> rte_eth_dev_info *dev_info) > > > > >> .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS | > > > > >> ETH_TXQ_FLAGS_NOOFFLOADS, > > > > >> }; > > > > >> + > > > > >> + /* > > > > >> + * According to 82599 and x540 specifications RS bit > *must* be set on > > > > the > > > > >> + * last descriptor of *every* packet. Therefore we will > not allow the > > > > >> + * tx_rs_thresh above 1 for all NICs newer than 82598. > > > > >> + */ > > > > >> + if (hw->mac.type > ixgbe_mac_82598EB) > > > > >> + dev_info->default_txconf.tx_rs_thresh = 1; > > > > >> + > > > > >> dev_info->hash_key_size = IXGBE_HKEY_MAX_INDEX * > sizeof(uint32_t); > > > > >> dev_info->reta_size = ETH_RSS_RETA_SIZE_128; > > > > >> dev_info->flow_type_rss_offloads = IXGBE_RSS_OFFLOAD_ALL; diff -- > > > > git > > > > >> a/drivers/net/ixgbe/ixgbe_rxtx.c > b/drivers/net/ixgbe/ixgbe_rxtx.c index > > > > >> 91023b9..8dbdffc 100644 > > > > >> --- a/drivers/net/ixgbe/ixgbe_rxtx.c > > > > >> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c > > > > >> @@ -2085,11 +2085,19 @@ ixgbe_dev_tx_queue_setup(struct > rte_eth_dev > > > > >> *dev, > > > > >> struct ixgbe_tx_queue *txq; > > > > >> struct ixgbe_hw *hw; > > > > >> uint16_t tx_rs_thresh, tx_free_thresh; > > > > >> + bool rs_deferring_allowed; > > > > >> > > > > >> PMD_INIT_FUNC_TRACE(); > > > > >> hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private); > > > > >> > > > > >> /* > > > > >> + * According to 82599 and x540 specifications RS bit > *must* be set on > > > > the > > > > >> + * last descriptor of *every* packet. Therefore we will > not allow the > > > > >> + * tx_rs_thresh above 1 for all NICs newer than 82598. > > > > >> + */ > > > > >> + rs_deferring_allowed = (hw->mac.type <= ixgbe_mac_82598EB); > > > > >> + > > > > >> + /* > > > > >> * Validate number of transmit descriptors. > > > > >> * It must not exceed hardware maximum, and must be multiple > > > > >> * of IXGBE_ALIGN. > > > > >> @@ -2110,6 +2118,8 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev > > > > *dev, > > > > >> * to transmit a packet is greater than the number of free TX > > > > >> * descriptors. > > > > >> * The following constraints must be satisfied: > > > > >> + * tx_rs_thresh must be less than 2 for NICs for which RS > deferring is > > > > >> + * forbidden (all but 82598). > > > > >> * tx_rs_thresh must be greater than 0. > > > > >> * tx_rs_thresh must be less than the size of the ring > minus 2. > > > > >> * tx_rs_thresh must be less than or equal to tx_free_thresh. > > > > >> @@ -2121,9 +2131,20 @@ ixgbe_dev_tx_queue_setup(struct > rte_eth_dev > > > > *dev, > > > > >> * When set to zero use default values. > > > > >> */ > > > > >> tx_rs_thresh = (uint16_t)((tx_conf->tx_rs_thresh) ? > > > > >> - tx_conf->tx_rs_thresh : DEFAULT_TX_RS_THRESH); > > > > >> + tx_conf->tx_rs_thresh : > > > > >> + (rs_deferring_allowed ? DEFAULT_TX_RS_THRESH : 1)); > > > > >> tx_free_thresh = (uint16_t)((tx_conf->tx_free_thresh) ? > > > > >> tx_conf->tx_free_thresh : DEFAULT_TX_FREE_THRESH); > > > > >> + > > > > >> + if (!rs_deferring_allowed && tx_rs_thresh > 1) { > > > > >> + PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than > 2 since RS > > > > " > > > > >> + "must be set for every packet > for this HW. " > > > > >> + "(tx_rs_thresh=%u port=%d queue=%d)", > > > > >> + (unsigned int)tx_rs_thresh, > > > > >> + (int)dev->data->port_id, (int)queue_idx); > > > > >> + return -(EINVAL); > > > > >> + } > > > > >> + > > > > >> if (tx_rs_thresh >= (nb_desc - 2)) { > > > > >> PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than the > > > > number " > > > > >> "of TX descriptors minus 2. > (tx_rs_thresh=%u " > > > > >> -- > > > > >> 2.1.0 > > > > ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-08-25 18:52 ` Vlad Zolotarov @ 2015-08-25 19:16 ` Zhang, Helin 2015-08-25 19:23 ` Avi Kivity 2015-08-25 19:30 ` Vladislav Zolotarov 2015-09-11 14:25 ` didier.pallard 1 sibling, 2 replies; 48+ messages in thread From: Zhang, Helin @ 2015-08-25 19:16 UTC (permalink / raw) To: Vlad Zolotarov; +Cc: dev > -----Original Message----- > From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com] > Sent: Tuesday, August 25, 2015 11:53 AM > To: Zhang, Helin > Cc: Lu, Wenzhuo; dev@dpdk.org > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for > all NICs but 82598 > > > > On 08/25/15 21:43, Zhang, Helin wrote: > > > > Hi Vlad > > > > I think this could possibly be the root cause of your TX hang issue. > > Please try to limit the number to 8 or less, and then see if the issue > > will still be there or not? > > > > Helin, the issue has been seen on x540 devices. Pls., see a chapter > 7.2.1.1 of x540 devices spec: > > A packet (or multiple packets in transmit segmentation) can span any number of > buffers (and their descriptors) up to a limit of 40 minus WTHRESH minus 2 (see > Section 7.2.3.3 for Tx Ring details and section Section 7.2.3.5.1 for WTHRESH > details). For best performance it is recommended to minimize the number of > buffers as possible. > > Could u, pls., clarify why do u think that the maximum number of data buffers is > limited by 8? OK, i40e hardware is 8, so I'd assume x540 could have a similar one. Yes, in your case, the limit could be around 38, right? Could you help to make sure there is no packet to be transmitted uses more than 38 descriptors? I heard that there is a similar hang issue on X710 if using more than 8 descriptors for a single packet. I am wondering if the issue is similar on x540. Regards, Helin > > thanks, > vlad > > > It does not have any check for the number of descriptors to be used > > for a single packet, and it relies on the users to give correct mbuf > > chains. > > > > We may need a check of this somewhere. Of cause the point you > > indicated we also need to carefully investigate or fix. > > > > Regards, > > > > Helin > > > > *From:*Vladislav Zolotarov [mailto:vladz@cloudius-systems.com] > > *Sent:* Tuesday, August 25, 2015 11:34 AM > > *To:* Zhang, Helin > > *Cc:* Lu, Wenzhuo; dev@dpdk.org > > *Subject:* RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh > > above 1 for all NICs but 82598 > > > > > > On Aug 25, 2015 21:14, "Zhang, Helin" <helin.zhang@intel.com > > <mailto:helin.zhang@intel.com>> wrote: > > > > > > Hi Vlad > > > > > > > > > > > > In addition, I’d double check with you what’s the maximum number of > > descriptors would be used for a single packet transmitting? > > > > > > Datasheet said that it supports up to 8. I am wondering if more than > > 8 were used in your case? > > > > If memory serves me well the maximum number of data descriptors per > > single xmit packet is 40 minus 2 minus WTHRESH. Since WTHRESH in DPDK > > is always zero it gives us 38 segments. We limit them by 33. > > > > > > > > Thank you very much! > > > > > > > > > > > > Regards, > > > > > > Helin > > > > > > > > > > > > From: Zhang, Helin > > > Sent: Wednesday, August 19, 2015 10:29 AM > > > To: Vladislav Zolotarov > > > Cc: Lu, Wenzhuo; dev@dpdk.org <mailto:dev@dpdk.org> > > > > > > Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh > > above 1 for all NICs but 82598 > > > > > > > > > > > > Hi Vlad > > > > > > > > > > > > Thank you very much for the patches! Give me a few more time to > > double check with more guys, and possibly hardware experts. > > > > > > > > > > > > Regards, > > > > > > Helin > > > > > > > > > > > > From: Vladislav Zolotarov [mailto:vladz@cloudius-systems.com > > <mailto:vladz@cloudius-systems.com>] > > > Sent: Tuesday, August 18, 2015 9:56 PM > > > To: Lu, Wenzhuo > > > Cc: dev@dpdk.org <mailto:dev@dpdk.org>; Zhang, Helin > > > Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh > > above 1 for all NICs but 82598 > > > > > > > > > > > > > > > On Aug 19, 2015 03:42, "Lu, Wenzhuo" <wenzhuo.lu@intel.com > > <mailto:wenzhuo.lu@intel.com>> wrote: > > > > > > > > Hi Helin, > > > > > > > > > -----Original Message----- > > > > > From: dev [mailto:dev-bounces@dpdk.org > > <mailto:dev-bounces@dpdk.org>] On Behalf Of Vlad Zolotarov > > > > > Sent: Friday, August 14, 2015 1:38 PM > > > > > To: Zhang, Helin; dev@dpdk.org <mailto:dev@dpdk.org> > > > > > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid > > tx_rs_thresh above 1 for > > > > > all NICs but 82598 > > > > > > > > > > > > > > > > > > > > On 08/13/15 23:28, Zhang, Helin wrote: > > > > > > Hi Vlad > > > > > > > > > > > > I don't think the changes are needed. It says in datasheet > > that the RS > > > > > > bit should be set on the last descriptor of every packet, ONLY > > WHEN > > > > > TXDCTL.WTHRESH equals to ZERO. > > > > > > > > > > Of course it's needed! See below. > > > > > Exactly the same spec a few lines above the place u've just > > quoted states: > > > > > > > > > > "Software should not set the RS bit when TXDCTL.WTHRESH is > > greater than > > > > > zero." > > > > > > > > > > And since all three (3) ixgbe xmit callbacks are utilizing RS > > bit notation ixgbe PMD > > > > > is actually not supporting any value of WTHRESH different from zero. > > > > I think Vlad is right. We need to fix this issue. Any suggestion? > > If not, I'd like to ack this patch. > > > > > > Pls., note that there is a v2 of this patch on the list. I forgot to > > patch ixgbevf_dev_info_get() in v1. > > > > > > > > > > > > > > > > > > > > > > > > Regards, > > > > > > Helin > > > > > > > > > > > >> -----Original Message----- > > > > > >> From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com > > <mailto:vladz@cloudius-systems.com>] > > > > > >> Sent: Thursday, August 13, 2015 11:07 AM > > > > > >> To: dev@dpdk.org <mailto:dev@dpdk.org> > > > > > >> Cc: Zhang, Helin; Ananyev, Konstantin; > > avi@cloudius-systems.com <mailto:avi@cloudius-systems.com>; Vlad > > > > > >> Zolotarov > > > > > >> Subject: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh > > above 1 for > > > > > all > > > > > >> NICs but 82598 > > > > > >> > > > > > >> According to 82599 and x540 HW specifications RS bit *must* > > be set in the > > > > > last > > > > > >> descriptor of *every* packet. > > > > > > There is a condition that if TXDCTL.WTHRESH equal to zero. > > > > > > > > > > Right and ixgbe PMD requires this condition to be fulfilled in > > order to > > > > > function. See above. > > > > > > > > > > > > > > > > >> This patch fixes the Tx hang we were constantly hitting with > > a seastar-based > > > > > >> application on x540 NIC. > > > > > > Could you help to share with us how to reproduce the tx hang > > issue, with using > > > > > > typical DPDK examples? > > > > > > > > > > Sorry. I'm not very familiar with the typical DPDK examples to > > help u > > > > > here. However this is quite irrelevant since without this this > > > > > patch ixgbe PMD obviously abuses the HW spec as has been explained > above. > > > > > > > > > > We saw the issue when u stressed the xmit path with a lot of > > > > > highly fragmented TCP frames (packets with up to 33 fragments > > > > > with > > non-headers > > > > > fragments as small as 4 bytes) with all offload features enabled. > > > > > > > > > > Thanks, > > > > > vlad > > > > > > > > > > > >> Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com > > <mailto:vladz@cloudius-systems.com>> > > > > > >> --- > > > > > >> drivers/net/ixgbe/ixgbe_ethdev.c | 9 +++++++++ > > > > > >> drivers/net/ixgbe/ixgbe_rxtx.c | 23 ++++++++++++++++++++++- > > > > > >> 2 files changed, 31 insertions(+), 1 deletion(-) > > > > > >> > > > > > >> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c > > > > > >> b/drivers/net/ixgbe/ixgbe_ethdev.c > > > > > >> index b8ee1e9..6714fd9 100644 > > > > > >> --- a/drivers/net/ixgbe/ixgbe_ethdev.c > > > > > >> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c > > > > > >> @@ -2414,6 +2414,15 @@ ixgbe_dev_info_get(struct rte_eth_dev > > *dev, > > > > > struct > > > > > >> rte_eth_dev_info *dev_info) > > > > > >> .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS | > > > > > >> ETH_TXQ_FLAGS_NOOFFLOADS, > > > > > >> }; > > > > > >> + > > > > > >> + /* > > > > > >> + * According to 82599 and x540 specifications RS bit > > *must* be set on > > > > > the > > > > > >> + * last descriptor of *every* packet. Therefore we will > > not allow the > > > > > >> + * tx_rs_thresh above 1 for all NICs newer than 82598. > > > > > >> + */ > > > > > >> + if (hw->mac.type > ixgbe_mac_82598EB) > > > > > >> + dev_info->default_txconf.tx_rs_thresh = 1; > > > > > >> + > > > > > >> dev_info->hash_key_size = IXGBE_HKEY_MAX_INDEX * > > sizeof(uint32_t); > > > > > >> dev_info->reta_size = ETH_RSS_RETA_SIZE_128; > > > > > >> dev_info->flow_type_rss_offloads = IXGBE_RSS_OFFLOAD_ALL; > > > > > >> diff -- > > > > > git > > > > > >> a/drivers/net/ixgbe/ixgbe_rxtx.c > > b/drivers/net/ixgbe/ixgbe_rxtx.c index > > > > > >> 91023b9..8dbdffc 100644 > > > > > >> --- a/drivers/net/ixgbe/ixgbe_rxtx.c > > > > > >> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c > > > > > >> @@ -2085,11 +2085,19 @@ ixgbe_dev_tx_queue_setup(struct > > rte_eth_dev > > > > > >> *dev, > > > > > >> struct ixgbe_tx_queue *txq; > > > > > >> struct ixgbe_hw *hw; > > > > > >> uint16_t tx_rs_thresh, tx_free_thresh; > > > > > >> + bool rs_deferring_allowed; > > > > > >> > > > > > >> PMD_INIT_FUNC_TRACE(); > > > > > >> hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private); > > > > > >> > > > > > >> /* > > > > > >> + * According to 82599 and x540 specifications RS bit > > *must* be set on > > > > > the > > > > > >> + * last descriptor of *every* packet. Therefore we will > > not allow the > > > > > >> + * tx_rs_thresh above 1 for all NICs newer than 82598. > > > > > >> + */ > > > > > >> + rs_deferring_allowed = (hw->mac.type <= > > > > > >> + ixgbe_mac_82598EB); > > > > > >> + > > > > > >> + /* > > > > > >> * Validate number of transmit descriptors. > > > > > >> * It must not exceed hardware maximum, and must be multiple > > > > > >> * of IXGBE_ALIGN. > > > > > >> @@ -2110,6 +2118,8 @@ ixgbe_dev_tx_queue_setup(struct > > > > > >> rte_eth_dev > > > > > *dev, > > > > > >> * to transmit a packet is greater than the number of free TX > > > > > >> * descriptors. > > > > > >> * The following constraints must be satisfied: > > > > > >> + * tx_rs_thresh must be less than 2 for NICs for which RS > > deferring is > > > > > >> + * forbidden (all but 82598). > > > > > >> * tx_rs_thresh must be greater than 0. > > > > > >> * tx_rs_thresh must be less than the size of the ring > > minus 2. > > > > > >> * tx_rs_thresh must be less than or equal to tx_free_thresh. > > > > > >> @@ -2121,9 +2131,20 @@ ixgbe_dev_tx_queue_setup(struct > > rte_eth_dev > > > > > *dev, > > > > > >> * When set to zero use default values. > > > > > >> */ > > > > > >> tx_rs_thresh = (uint16_t)((tx_conf->tx_rs_thresh) ? > > > > > >> - tx_conf->tx_rs_thresh : DEFAULT_TX_RS_THRESH); > > > > > >> + tx_conf->tx_rs_thresh : > > > > > >> + (rs_deferring_allowed ? DEFAULT_TX_RS_THRESH : 1)); > > > > > >> tx_free_thresh = (uint16_t)((tx_conf->tx_free_thresh) ? > > > > > >> tx_conf->tx_free_thresh : DEFAULT_TX_FREE_THRESH); > > > > > >> + > > > > > >> + if (!rs_deferring_allowed && tx_rs_thresh > 1) { > > > > > >> + PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than > > 2 since RS > > > > > " > > > > > >> + "must be set for every packet > > for this HW. " > > > > > >> + "(tx_rs_thresh=%u port=%d queue=%d)", > > > > > >> + (unsigned int)tx_rs_thresh, > > > > > >> + (int)dev->data->port_id, (int)queue_idx); > > > > > >> + return -(EINVAL); > > > > > >> + } > > > > > >> + > > > > > >> if (tx_rs_thresh >= (nb_desc - 2)) { > > > > > >> PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than > > > > > >> the > > > > > number " > > > > > >> "of TX descriptors minus 2. > > (tx_rs_thresh=%u " > > > > > >> -- > > > > > >> 2.1.0 > > > > > > ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-08-25 19:16 ` Zhang, Helin @ 2015-08-25 19:23 ` Avi Kivity 2015-08-25 19:30 ` Vladislav Zolotarov 1 sibling, 0 replies; 48+ messages in thread From: Avi Kivity @ 2015-08-25 19:23 UTC (permalink / raw) To: Zhang, Helin, Vlad Zolotarov; +Cc: dev On 08/25/2015 10:16 PM, Zhang, Helin wrote: > >> -----Original Message----- >> From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com] >> Sent: Tuesday, August 25, 2015 11:53 AM >> To: Zhang, Helin >> Cc: Lu, Wenzhuo; dev@dpdk.org >> Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for >> all NICs but 82598 >> >> >> >> On 08/25/15 21:43, Zhang, Helin wrote: >>> Hi Vlad >>> >>> I think this could possibly be the root cause of your TX hang issue. >>> Please try to limit the number to 8 or less, and then see if the issue >>> will still be there or not? >>> >> Helin, the issue has been seen on x540 devices. Pls., see a chapter >> 7.2.1.1 of x540 devices spec: >> >> A packet (or multiple packets in transmit segmentation) can span any number of >> buffers (and their descriptors) up to a limit of 40 minus WTHRESH minus 2 (see >> Section 7.2.3.3 for Tx Ring details and section Section 7.2.3.5.1 for WTHRESH >> details). For best performance it is recommended to minimize the number of >> buffers as possible. >> >> Could u, pls., clarify why do u think that the maximum number of data buffers is >> limited by 8? > OK, i40e hardware is 8, so I'd assume x540 could have a similar one. Yes, in your case, > the limit could be around 38, right? > Could you help to make sure there is no packet to be transmitted uses more than > 38 descriptors? > I heard that there is a similar hang issue on X710 if using more than 8 descriptors for > a single packet. I am wondering if the issue is similar on x540. > > I believe that the ixgbe Linux driver does not limit packets to 8 fragments, so apparently the hardware is capable. ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-08-25 19:16 ` Zhang, Helin 2015-08-25 19:23 ` Avi Kivity @ 2015-08-25 19:30 ` Vladislav Zolotarov 2015-08-25 20:07 ` Vlad Zolotarov 2015-08-25 20:13 ` Zhang, Helin 1 sibling, 2 replies; 48+ messages in thread From: Vladislav Zolotarov @ 2015-08-25 19:30 UTC (permalink / raw) To: Helin Zhang; +Cc: dev On Aug 25, 2015 22:16, "Zhang, Helin" <helin.zhang@intel.com> wrote: > > > > > -----Original Message----- > > From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com] > > Sent: Tuesday, August 25, 2015 11:53 AM > > To: Zhang, Helin > > Cc: Lu, Wenzhuo; dev@dpdk.org > > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for > > all NICs but 82598 > > > > > > > > On 08/25/15 21:43, Zhang, Helin wrote: > > > > > > Hi Vlad > > > > > > I think this could possibly be the root cause of your TX hang issue. > > > Please try to limit the number to 8 or less, and then see if the issue > > > will still be there or not? > > > > > > > Helin, the issue has been seen on x540 devices. Pls., see a chapter > > 7.2.1.1 of x540 devices spec: > > > > A packet (or multiple packets in transmit segmentation) can span any number of > > buffers (and their descriptors) up to a limit of 40 minus WTHRESH minus 2 (see > > Section 7.2.3.3 for Tx Ring details and section Section 7.2.3.5.1 for WTHRESH > > details). For best performance it is recommended to minimize the number of > > buffers as possible. > > > > Could u, pls., clarify why do u think that the maximum number of data buffers is > > limited by 8? > OK, i40e hardware is 8 For i40 it's a bit more complicated than just "not more than 8" - it's not more than 8 for a non-TSO packet and not more than 8 for each MSS including headers buffers for TSO. But this thread is not about i40e so this doesn't seem to be relevant anyway. , so I'd assume x540 could have a similar one. x540 spec assumes otherwise... 😉 Yes, in your case, > the limit could be around 38, right? If by "around 38" u mean "exactly 38" then u are absolutely right... 😉 > Could you help to make sure there is no packet to be transmitted uses more than > 38 descriptors? Just like i've already mentioned, we limit the cluster by at most 33 data segments. Therefore we are good here... > I heard that there is a similar hang issue on X710 if using more than 8 descriptors for > a single packet. I am wondering if the issue is similar on x540. What's x710? If that's xl710 40G nics (i40e driver), then it has its own specs with its own HW limitations i've mentioned above. It has nothing to do with this thread that is all about 10G nics managed by ixgbe driver. There is a different thread, where i've raised the 40G NICs xmit issues. See "i40e xmit path HW limitation" thread. > > Regards, > Helin > > > > > thanks, > > vlad > > > > > It does not have any check for the number of descriptors to be used > > > for a single packet, and it relies on the users to give correct mbuf > > > chains. > > > > > > We may need a check of this somewhere. Of cause the point you > > > indicated we also need to carefully investigate or fix. > > > > > > Regards, > > > > > > Helin > > > > > > *From:*Vladislav Zolotarov [mailto:vladz@cloudius-systems.com] > > > *Sent:* Tuesday, August 25, 2015 11:34 AM > > > *To:* Zhang, Helin > > > *Cc:* Lu, Wenzhuo; dev@dpdk.org > > > *Subject:* RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh > > > above 1 for all NICs but 82598 > > > > > > > > > On Aug 25, 2015 21:14, "Zhang, Helin" <helin.zhang@intel.com > > > <mailto:helin.zhang@intel.com>> wrote: > > > > > > > > Hi Vlad > > > > > > > > > > > > > > > > In addition, I’d double check with you what’s the maximum number of > > > descriptors would be used for a single packet transmitting? > > > > > > > > Datasheet said that it supports up to 8. I am wondering if more than > > > 8 were used in your case? > > > > > > If memory serves me well the maximum number of data descriptors per > > > single xmit packet is 40 minus 2 minus WTHRESH. Since WTHRESH in DPDK > > > is always zero it gives us 38 segments. We limit them by 33. > > > > > > > > > > > Thank you very much! > > > > > > > > > > > > > > > > Regards, > > > > > > > > Helin > > > > > > > > > > > > > > > > From: Zhang, Helin > > > > Sent: Wednesday, August 19, 2015 10:29 AM > > > > To: Vladislav Zolotarov > > > > Cc: Lu, Wenzhuo; dev@dpdk.org <mailto:dev@dpdk.org> > > > > > > > > Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh > > > above 1 for all NICs but 82598 > > > > > > > > > > > > > > > > Hi Vlad > > > > > > > > > > > > > > > > Thank you very much for the patches! Give me a few more time to > > > double check with more guys, and possibly hardware experts. > > > > > > > > > > > > > > > > Regards, > > > > > > > > Helin > > > > > > > > > > > > > > > > From: Vladislav Zolotarov [mailto:vladz@cloudius-systems.com > > > <mailto:vladz@cloudius-systems.com>] > > > > Sent: Tuesday, August 18, 2015 9:56 PM > > > > To: Lu, Wenzhuo > > > > Cc: dev@dpdk.org <mailto:dev@dpdk.org>; Zhang, Helin > > > > Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh > > > above 1 for all NICs but 82598 > > > > > > > > > > > > > > > > > > > > On Aug 19, 2015 03:42, "Lu, Wenzhuo" <wenzhuo.lu@intel.com > > > <mailto:wenzhuo.lu@intel.com>> wrote: > > > > > > > > > > Hi Helin, > > > > > > > > > > > -----Original Message----- > > > > > > From: dev [mailto:dev-bounces@dpdk.org > > > <mailto:dev-bounces@dpdk.org>] On Behalf Of Vlad Zolotarov > > > > > > Sent: Friday, August 14, 2015 1:38 PM > > > > > > To: Zhang, Helin; dev@dpdk.org <mailto:dev@dpdk.org> > > > > > > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid > > > tx_rs_thresh above 1 for > > > > > > all NICs but 82598 > > > > > > > > > > > > > > > > > > > > > > > > On 08/13/15 23:28, Zhang, Helin wrote: > > > > > > > Hi Vlad > > > > > > > > > > > > > > I don't think the changes are needed. It says in datasheet > > > that the RS > > > > > > > bit should be set on the last descriptor of every packet, ONLY > > > WHEN > > > > > > TXDCTL.WTHRESH equals to ZERO. > > > > > > > > > > > > Of course it's needed! See below. > > > > > > Exactly the same spec a few lines above the place u've just > > > quoted states: > > > > > > > > > > > > "Software should not set the RS bit when TXDCTL.WTHRESH is > > > greater than > > > > > > zero." > > > > > > > > > > > > And since all three (3) ixgbe xmit callbacks are utilizing RS > > > bit notation ixgbe PMD > > > > > > is actually not supporting any value of WTHRESH different from zero. > > > > > I think Vlad is right. We need to fix this issue. Any suggestion? > > > If not, I'd like to ack this patch. > > > > > > > > Pls., note that there is a v2 of this patch on the list. I forgot to > > > patch ixgbevf_dev_info_get() in v1. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Regards, > > > > > > > Helin > > > > > > > > > > > > > >> -----Original Message----- > > > > > > >> From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com > > > <mailto:vladz@cloudius-systems.com>] > > > > > > >> Sent: Thursday, August 13, 2015 11:07 AM > > > > > > >> To: dev@dpdk.org <mailto:dev@dpdk.org> > > > > > > >> Cc: Zhang, Helin; Ananyev, Konstantin; > > > avi@cloudius-systems.com <mailto:avi@cloudius-systems.com>; Vlad > > > > > > >> Zolotarov > > > > > > >> Subject: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh > > > above 1 for > > > > > > all > > > > > > >> NICs but 82598 > > > > > > >> > > > > > > >> According to 82599 and x540 HW specifications RS bit *must* > > > be set in the > > > > > > last > > > > > > >> descriptor of *every* packet. > > > > > > > There is a condition that if TXDCTL.WTHRESH equal to zero. > > > > > > > > > > > > Right and ixgbe PMD requires this condition to be fulfilled in > > > order to > > > > > > function. See above. > > > > > > > > > > > > > > > > > > > >> This patch fixes the Tx hang we were constantly hitting with > > > a seastar-based > > > > > > >> application on x540 NIC. > > > > > > > Could you help to share with us how to reproduce the tx hang > > > issue, with using > > > > > > > typical DPDK examples? > > > > > > > > > > > > Sorry. I'm not very familiar with the typical DPDK examples to > > > help u > > > > > > here. However this is quite irrelevant since without this this > > > > > > patch ixgbe PMD obviously abuses the HW spec as has been explained > > above. > > > > > > > > > > > > We saw the issue when u stressed the xmit path with a lot of > > > > > > highly fragmented TCP frames (packets with up to 33 fragments > > > > > > with > > > non-headers > > > > > > fragments as small as 4 bytes) with all offload features enabled. > > > > > > > > > > > > Thanks, > > > > > > vlad > > > > > > > > > > > > > >> Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com > > > <mailto:vladz@cloudius-systems.com>> > > > > > > >> --- > > > > > > >> drivers/net/ixgbe/ixgbe_ethdev.c | 9 +++++++++ > > > > > > >> drivers/net/ixgbe/ixgbe_rxtx.c | 23 ++++++++++++++++++++++- > > > > > > >> 2 files changed, 31 insertions(+), 1 deletion(-) > > > > > > >> > > > > > > >> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c > > > > > > >> b/drivers/net/ixgbe/ixgbe_ethdev.c > > > > > > >> index b8ee1e9..6714fd9 100644 > > > > > > >> --- a/drivers/net/ixgbe/ixgbe_ethdev.c > > > > > > >> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c > > > > > > >> @@ -2414,6 +2414,15 @@ ixgbe_dev_info_get(struct rte_eth_dev > > > *dev, > > > > > > struct > > > > > > >> rte_eth_dev_info *dev_info) > > > > > > >> .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS | > > > > > > >> ETH_TXQ_FLAGS_NOOFFLOADS, > > > > > > >> }; > > > > > > >> + > > > > > > >> + /* > > > > > > >> + * According to 82599 and x540 specifications RS bit > > > *must* be set on > > > > > > the > > > > > > >> + * last descriptor of *every* packet. Therefore we will > > > not allow the > > > > > > >> + * tx_rs_thresh above 1 for all NICs newer than 82598. > > > > > > >> + */ > > > > > > >> + if (hw->mac.type > ixgbe_mac_82598EB) > > > > > > >> + dev_info->default_txconf.tx_rs_thresh = 1; > > > > > > >> + > > > > > > >> dev_info->hash_key_size = IXGBE_HKEY_MAX_INDEX * > > > sizeof(uint32_t); > > > > > > >> dev_info->reta_size = ETH_RSS_RETA_SIZE_128; > > > > > > >> dev_info->flow_type_rss_offloads = IXGBE_RSS_OFFLOAD_ALL; > > > > > > >> diff -- > > > > > > git > > > > > > >> a/drivers/net/ixgbe/ixgbe_rxtx.c > > > b/drivers/net/ixgbe/ixgbe_rxtx.c index > > > > > > >> 91023b9..8dbdffc 100644 > > > > > > >> --- a/drivers/net/ixgbe/ixgbe_rxtx.c > > > > > > >> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c > > > > > > >> @@ -2085,11 +2085,19 @@ ixgbe_dev_tx_queue_setup(struct > > > rte_eth_dev > > > > > > >> *dev, > > > > > > >> struct ixgbe_tx_queue *txq; > > > > > > >> struct ixgbe_hw *hw; > > > > > > >> uint16_t tx_rs_thresh, tx_free_thresh; > > > > > > >> + bool rs_deferring_allowed; > > > > > > >> > > > > > > >> PMD_INIT_FUNC_TRACE(); > > > > > > >> hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private); > > > > > > >> > > > > > > >> /* > > > > > > >> + * According to 82599 and x540 specifications RS bit > > > *must* be set on > > > > > > the > > > > > > >> + * last descriptor of *every* packet. Therefore we will > > > not allow the > > > > > > >> + * tx_rs_thresh above 1 for all NICs newer than 82598. > > > > > > >> + */ > > > > > > >> + rs_deferring_allowed = (hw->mac.type <= > > > > > > >> + ixgbe_mac_82598EB); > > > > > > >> + > > > > > > >> + /* > > > > > > >> * Validate number of transmit descriptors. > > > > > > >> * It must not exceed hardware maximum, and must be multiple > > > > > > >> * of IXGBE_ALIGN. > > > > > > >> @@ -2110,6 +2118,8 @@ ixgbe_dev_tx_queue_setup(struct > > > > > > >> rte_eth_dev > > > > > > *dev, > > > > > > >> * to transmit a packet is greater than the number of free TX > > > > > > >> * descriptors. > > > > > > >> * The following constraints must be satisfied: > > > > > > >> + * tx_rs_thresh must be less than 2 for NICs for which RS > > > deferring is > > > > > > >> + * forbidden (all but 82598). > > > > > > >> * tx_rs_thresh must be greater than 0. > > > > > > >> * tx_rs_thresh must be less than the size of the ring > > > minus 2. > > > > > > >> * tx_rs_thresh must be less than or equal to tx_free_thresh. > > > > > > >> @@ -2121,9 +2131,20 @@ ixgbe_dev_tx_queue_setup(struct > > > rte_eth_dev > > > > > > *dev, > > > > > > >> * When set to zero use default values. > > > > > > >> */ > > > > > > >> tx_rs_thresh = (uint16_t)((tx_conf->tx_rs_thresh) ? > > > > > > >> - tx_conf->tx_rs_thresh : DEFAULT_TX_RS_THRESH); > > > > > > >> + tx_conf->tx_rs_thresh : > > > > > > >> + (rs_deferring_allowed ? DEFAULT_TX_RS_THRESH : 1)); > > > > > > >> tx_free_thresh = (uint16_t)((tx_conf->tx_free_thresh) ? > > > > > > >> tx_conf->tx_free_thresh : DEFAULT_TX_FREE_THRESH); > > > > > > >> + > > > > > > >> + if (!rs_deferring_allowed && tx_rs_thresh > 1) { > > > > > > >> + PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than > > > 2 since RS > > > > > > " > > > > > > >> + "must be set for every packet > > > for this HW. " > > > > > > >> + "(tx_rs_thresh=%u port=%d queue=%d)", > > > > > > >> + (unsigned int)tx_rs_thresh, > > > > > > >> + (int)dev->data->port_id, (int)queue_idx); > > > > > > >> + return -(EINVAL); > > > > > > >> + } > > > > > > >> + > > > > > > >> if (tx_rs_thresh >= (nb_desc - 2)) { > > > > > > >> PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than > > > > > > >> the > > > > > > number " > > > > > > >> "of TX descriptors minus 2. > > > (tx_rs_thresh=%u " > > > > > > >> -- > > > > > > >> 2.1.0 > > > > > > > > > ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-08-25 19:30 ` Vladislav Zolotarov @ 2015-08-25 20:07 ` Vlad Zolotarov 2015-08-25 20:13 ` Zhang, Helin 1 sibling, 0 replies; 48+ messages in thread From: Vlad Zolotarov @ 2015-08-25 20:07 UTC (permalink / raw) To: Helin Zhang; +Cc: dev On 08/25/15 22:30, Vladislav Zolotarov wrote: > > > On Aug 25, 2015 22:16, "Zhang, Helin" <helin.zhang@intel.com > <mailto:helin.zhang@intel.com>> wrote: > > > > > > > > > -----Original Message----- > > > From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com > <mailto:vladz@cloudius-systems.com>] > > > Sent: Tuesday, August 25, 2015 11:53 AM > > > To: Zhang, Helin > > > Cc: Lu, Wenzhuo; dev@dpdk.org <mailto:dev@dpdk.org> > > > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh > above 1 for > > > all NICs but 82598 > > > > > > > > > > > > On 08/25/15 21:43, Zhang, Helin wrote: > > > > > > > > Hi Vlad > > > > > > > > I think this could possibly be the root cause of your TX hang issue. > > > > Please try to limit the number to 8 or less, and then see if the > issue > > > > will still be there or not? > > > > > > > > > > Helin, the issue has been seen on x540 devices. Pls., see a chapter > > > 7.2.1.1 of x540 devices spec: > > > > > > A packet (or multiple packets in transmit segmentation) can span > any number of > > > buffers (and their descriptors) up to a limit of 40 minus WTHRESH > minus 2 (see > > > Section 7.2.3.3 for Tx Ring details and section Section 7.2.3.5.1 > for WTHRESH > > > details). For best performance it is recommended to minimize the > number of > > > buffers as possible. > > > > > > Could u, pls., clarify why do u think that the maximum number of > data buffers is > > > limited by 8? > > OK, i40e hardware is 8 > > For i40 it's a bit more complicated than just "not more than 8" - it's > not more than 8 for a non-TSO packet and not more than 8 for each MSS > including headers buffers for TSO. But this thread is not about i40e > so this doesn't seem to be relevant anyway. > > , so I'd assume x540 could have a similar one. > > x540 spec assumes otherwise... 😉 > > Yes, in your case, > > the limit could be around 38, right? > > If by "around 38" u mean "exactly 38" then u are absolutely right... 😉 > > > Could you help to make sure there is no packet to be transmitted > uses more than > > 38 descriptors? > > Just like i've already mentioned, we limit the cluster by at most 33 > data segments. Therefore we are good here... > > > I heard that there is a similar hang issue on X710 if using more > than 8 descriptors for > > a single packet. I am wondering if the issue is similar on x540. > > What's x710? If that's xl710 40G nics (i40e driver), > I've found what x710 NICs are - they are another NICs family managed by i40e PMD. Therefore the rest of what I said stands the same... ;) > then it has its own specs with its own HW limitations i've mentioned > above. It has nothing to do with this thread that is all about 10G > nics managed by ixgbe driver. > > There is a different thread, where i've raised the 40G NICs xmit > issues. See "i40e xmit path HW limitation" thread. > > > > > Regards, > > Helin > > > > > > > > thanks, > > > vlad > > > > > > > It does not have any check for the number of descriptors to be used > > > > for a single packet, and it relies on the users to give correct mbuf > > > > chains. > > > > > > > > We may need a check of this somewhere. Of cause the point you > > > > indicated we also need to carefully investigate or fix. > > > > > > > > Regards, > > > > > > > > Helin > > > > > > > > *From:*Vladislav Zolotarov [mailto:vladz@cloudius-systems.com > <mailto:vladz@cloudius-systems.com>] > > > > *Sent:* Tuesday, August 25, 2015 11:34 AM > > > > *To:* Zhang, Helin > > > > *Cc:* Lu, Wenzhuo; dev@dpdk.org <mailto:dev@dpdk.org> > > > > *Subject:* RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh > > > > above 1 for all NICs but 82598 > > > > > > > > > > > > On Aug 25, 2015 21:14, "Zhang, Helin" <helin.zhang@intel.com > <mailto:helin.zhang@intel.com> > > > > <mailto:helin.zhang@intel.com <mailto:helin.zhang@intel.com>>> > wrote: > > > > > > > > > > Hi Vlad > > > > > > > > > > > > > > > > > > > > In addition, I’d double check with you what’s the maximum > number of > > > > descriptors would be used for a single packet transmitting? > > > > > > > > > > Datasheet said that it supports up to 8. I am wondering if > more than > > > > 8 were used in your case? > > > > > > > > If memory serves me well the maximum number of data descriptors per > > > > single xmit packet is 40 minus 2 minus WTHRESH. Since WTHRESH in > DPDK > > > > is always zero it gives us 38 segments. We limit them by 33. > > > > > > > > > > > > > > Thank you very much! > > > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > > Helin > > > > > > > > > > > > > > > > > > > > From: Zhang, Helin > > > > > Sent: Wednesday, August 19, 2015 10:29 AM > > > > > To: Vladislav Zolotarov > > > > > Cc: Lu, Wenzhuo; dev@dpdk.org <mailto:dev@dpdk.org> > <mailto:dev@dpdk.org <mailto:dev@dpdk.org>> > > > > > > > > > > Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh > > > > above 1 for all NICs but 82598 > > > > > > > > > > > > > > > > > > > > Hi Vlad > > > > > > > > > > > > > > > > > > > > Thank you very much for the patches! Give me a few more time to > > > > double check with more guys, and possibly hardware experts. > > > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > > > Helin > > > > > > > > > > > > > > > > > > > > From: Vladislav Zolotarov [mailto:vladz@cloudius-systems.com > <mailto:vladz@cloudius-systems.com> > > > > <mailto:vladz@cloudius-systems.com > <mailto:vladz@cloudius-systems.com>>] > > > > > Sent: Tuesday, August 18, 2015 9:56 PM > > > > > To: Lu, Wenzhuo > > > > > Cc: dev@dpdk.org <mailto:dev@dpdk.org> <mailto:dev@dpdk.org > <mailto:dev@dpdk.org>>; Zhang, Helin > > > > > Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh > > > > above 1 for all NICs but 82598 > > > > > > > > > > > > > > > > > > > > > > > > > On Aug 19, 2015 03:42, "Lu, Wenzhuo" <wenzhuo.lu@intel.com > <mailto:wenzhuo.lu@intel.com> > > > > <mailto:wenzhuo.lu@intel.com <mailto:wenzhuo.lu@intel.com>>> wrote: > > > > > > > > > > > > Hi Helin, > > > > > > > > > > > > > -----Original Message----- > > > > > > > From: dev [mailto:dev-bounces@dpdk.org > <mailto:dev-bounces@dpdk.org> > > > > <mailto:dev-bounces@dpdk.org <mailto:dev-bounces@dpdk.org>>] On > Behalf Of Vlad Zolotarov > > > > > > > Sent: Friday, August 14, 2015 1:38 PM > > > > > > > To: Zhang, Helin; dev@dpdk.org <mailto:dev@dpdk.org> > <mailto:dev@dpdk.org <mailto:dev@dpdk.org>> > > > > > > > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid > > > > tx_rs_thresh above 1 for > > > > > > > all NICs but 82598 > > > > > > > > > > > > > > > > > > > > > > > > > > > > On 08/13/15 23:28, Zhang, Helin wrote: > > > > > > > > Hi Vlad > > > > > > > > > > > > > > > > I don't think the changes are needed. It says in datasheet > > > > that the RS > > > > > > > > bit should be set on the last descriptor of every > packet, ONLY > > > > WHEN > > > > > > > TXDCTL.WTHRESH equals to ZERO. > > > > > > > > > > > > > > Of course it's needed! See below. > > > > > > > Exactly the same spec a few lines above the place u've just > > > > quoted states: > > > > > > > > > > > > > > "Software should not set the RS bit when TXDCTL.WTHRESH is > > > > greater than > > > > > > > zero." > > > > > > > > > > > > > > And since all three (3) ixgbe xmit callbacks are utilizing RS > > > > bit notation ixgbe PMD > > > > > > > is actually not supporting any value of WTHRESH different > from zero. > > > > > > I think Vlad is right. We need to fix this issue. Any > suggestion? > > > > If not, I'd like to ack this patch. > > > > > > > > > > Pls., note that there is a v2 of this patch on the list. I > forgot to > > > > patch ixgbevf_dev_info_get() in v1. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Regards, > > > > > > > > Helin > > > > > > > > > > > > > > > >> -----Original Message----- > > > > > > > >> From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com > <mailto:vladz@cloudius-systems.com> > > > > <mailto:vladz@cloudius-systems.com > <mailto:vladz@cloudius-systems.com>>] > > > > > > > >> Sent: Thursday, August 13, 2015 11:07 AM > > > > > > > >> To: dev@dpdk.org <mailto:dev@dpdk.org> > <mailto:dev@dpdk.org <mailto:dev@dpdk.org>> > > > > > > > >> Cc: Zhang, Helin; Ananyev, Konstantin; > > > > avi@cloudius-systems.com <mailto:avi@cloudius-systems.com> > <mailto:avi@cloudius-systems.com <mailto:avi@cloudius-systems.com>>; Vlad > > > > > > > >> Zolotarov > > > > > > > >> Subject: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid > tx_rs_thresh > > > > above 1 for > > > > > > > all > > > > > > > >> NICs but 82598 > > > > > > > >> > > > > > > > >> According to 82599 and x540 HW specifications RS bit *must* > > > > be set in the > > > > > > > last > > > > > > > >> descriptor of *every* packet. > > > > > > > > There is a condition that if TXDCTL.WTHRESH equal to zero. > > > > > > > > > > > > > > Right and ixgbe PMD requires this condition to be fulfilled in > > > > order to > > > > > > > function. See above. > > > > > > > > > > > > > > > > > > > > > > >> This patch fixes the Tx hang we were constantly hitting > with > > > > a seastar-based > > > > > > > >> application on x540 NIC. > > > > > > > > Could you help to share with us how to reproduce the tx hang > > > > issue, with using > > > > > > > > typical DPDK examples? > > > > > > > > > > > > > > Sorry. I'm not very familiar with the typical DPDK examples to > > > > help u > > > > > > > here. However this is quite irrelevant since without this this > > > > > > > patch ixgbe PMD obviously abuses the HW spec as has been > explained > > > above. > > > > > > > > > > > > > > We saw the issue when u stressed the xmit path with a lot of > > > > > > > highly fragmented TCP frames (packets with up to 33 fragments > > > > > > > with > > > > non-headers > > > > > > > fragments as small as 4 bytes) with all offload features > enabled. > > > > > > > > > > > > > > Thanks, > > > > > > > vlad > > > > > > > > > > > > > > > >> Signed-off-by: Vlad Zolotarov > <vladz@cloudius-systems.com <mailto:vladz@cloudius-systems.com> > > > > <mailto:vladz@cloudius-systems.com > <mailto:vladz@cloudius-systems.com>>> > > > > > > > >> --- > > > > > > > >> drivers/net/ixgbe/ixgbe_ethdev.c | 9 +++++++++ > > > > > > > >> drivers/net/ixgbe/ixgbe_rxtx.c | 23 > ++++++++++++++++++++++- > > > > > > > >> 2 files changed, 31 insertions(+), 1 deletion(-) > > > > > > > >> > > > > > > > >> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c > > > > > > > >> b/drivers/net/ixgbe/ixgbe_ethdev.c > > > > > > > >> index b8ee1e9..6714fd9 100644 > > > > > > > >> --- a/drivers/net/ixgbe/ixgbe_ethdev.c > > > > > > > >> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c > > > > > > > >> @@ -2414,6 +2414,15 @@ ixgbe_dev_info_get(struct > rte_eth_dev > > > > *dev, > > > > > > > struct > > > > > > > >> rte_eth_dev_info *dev_info) > > > > > > > >> .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS | > > > > > > > >> ETH_TXQ_FLAGS_NOOFFLOADS, > > > > > > > >> }; > > > > > > > >> + > > > > > > > >> + /* > > > > > > > >> + * According to 82599 and x540 specifications RS bit > > > > *must* be set on > > > > > > > the > > > > > > > >> + * last descriptor of *every* packet. Therefore we will > > > > not allow the > > > > > > > >> + * tx_rs_thresh above 1 for all NICs newer than 82598. > > > > > > > >> + */ > > > > > > > >> + if (hw->mac.type > ixgbe_mac_82598EB) > > > > > > > >> + dev_info->default_txconf.tx_rs_thresh = 1; > > > > > > > >> + > > > > > > > >> dev_info->hash_key_size = IXGBE_HKEY_MAX_INDEX * > > > > sizeof(uint32_t); > > > > > > > >> dev_info->reta_size = ETH_RSS_RETA_SIZE_128; > > > > > > > >> dev_info->flow_type_rss_offloads = IXGBE_RSS_OFFLOAD_ALL; > > > > > > > >> diff -- > > > > > > > git > > > > > > > >> a/drivers/net/ixgbe/ixgbe_rxtx.c > > > > b/drivers/net/ixgbe/ixgbe_rxtx.c index > > > > > > > >> 91023b9..8dbdffc 100644 > > > > > > > >> --- a/drivers/net/ixgbe/ixgbe_rxtx.c > > > > > > > >> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c > > > > > > > >> @@ -2085,11 +2085,19 @@ ixgbe_dev_tx_queue_setup(struct > > > > rte_eth_dev > > > > > > > >> *dev, > > > > > > > >> struct ixgbe_tx_queue *txq; > > > > > > > >> struct ixgbe_hw *hw; > > > > > > > >> uint16_t tx_rs_thresh, tx_free_thresh; > > > > > > > >> + bool rs_deferring_allowed; > > > > > > > >> > > > > > > > >> PMD_INIT_FUNC_TRACE(); > > > > > > > >> hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private); > > > > > > > >> > > > > > > > >> /* > > > > > > > >> + * According to 82599 and x540 specifications RS bit > > > > *must* be set on > > > > > > > the > > > > > > > >> + * last descriptor of *every* packet. Therefore we will > > > > not allow the > > > > > > > >> + * tx_rs_thresh above 1 for all NICs newer than 82598. > > > > > > > >> + */ > > > > > > > >> + rs_deferring_allowed = (hw->mac.type <= > > > > > > > >> + ixgbe_mac_82598EB); > > > > > > > >> + > > > > > > > >> + /* > > > > > > > >> * Validate number of transmit descriptors. > > > > > > > >> * It must not exceed hardware maximum, and must be > multiple > > > > > > > >> * of IXGBE_ALIGN. > > > > > > > >> @@ -2110,6 +2118,8 @@ ixgbe_dev_tx_queue_setup(struct > > > > > > > >> rte_eth_dev > > > > > > > *dev, > > > > > > > >> * to transmit a packet is greater than the number > of free TX > > > > > > > >> * descriptors. > > > > > > > >> * The following constraints must be satisfied: > > > > > > > >> + * tx_rs_thresh must be less than 2 for NICs for > which RS > > > > deferring is > > > > > > > >> + * forbidden (all but 82598). > > > > > > > >> * tx_rs_thresh must be greater than 0. > > > > > > > >> * tx_rs_thresh must be less than the size of the ring > > > > minus 2. > > > > > > > >> * tx_rs_thresh must be less than or equal to > tx_free_thresh. > > > > > > > >> @@ -2121,9 +2131,20 @@ ixgbe_dev_tx_queue_setup(struct > > > > rte_eth_dev > > > > > > > *dev, > > > > > > > >> * When set to zero use default values. > > > > > > > >> */ > > > > > > > >> tx_rs_thresh = (uint16_t)((tx_conf->tx_rs_thresh) ? > > > > > > > >> - tx_conf->tx_rs_thresh : DEFAULT_TX_RS_THRESH); > > > > > > > >> + tx_conf->tx_rs_thresh : > > > > > > > >> + (rs_deferring_allowed ? DEFAULT_TX_RS_THRESH : 1)); > > > > > > > >> tx_free_thresh = (uint16_t)((tx_conf->tx_free_thresh) ? > > > > > > > >> tx_conf->tx_free_thresh : DEFAULT_TX_FREE_THRESH); > > > > > > > >> + > > > > > > > >> + if (!rs_deferring_allowed && tx_rs_thresh > 1) { > > > > > > > >> + PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than > > > > 2 since RS > > > > > > > " > > > > > > > >> + "must be set for every packet > > > > for this HW. " > > > > > > > >> + "(tx_rs_thresh=%u port=%d queue=%d)", > > > > > > > >> + (unsigned int)tx_rs_thresh, > > > > > > > >> + (int)dev->data->port_id, (int)queue_idx); > > > > > > > >> + return -(EINVAL); > > > > > > > >> + } > > > > > > > >> + > > > > > > > >> if (tx_rs_thresh >= (nb_desc - 2)) { > > > > > > > >> PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than > > > > > > > >> the > > > > > > > number " > > > > > > > >> "of TX descriptors minus 2. > > > > (tx_rs_thresh=%u " > > > > > > > >> -- > > > > > > > >> 2.1.0 > > > > > > > > > > > > > ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-08-25 19:30 ` Vladislav Zolotarov 2015-08-25 20:07 ` Vlad Zolotarov @ 2015-08-25 20:13 ` Zhang, Helin 2015-09-09 12:18 ` Thomas Monjalon 1 sibling, 1 reply; 48+ messages in thread From: Zhang, Helin @ 2015-08-25 20:13 UTC (permalink / raw) To: Vladislav Zolotarov; +Cc: dev Yes, I got the perfect answers. Thank you very much! I just wanted to make sure the test case was OK with the limit of maximum number of descriptors, as I heard there is a hang issue on other NICs of using more descriptors than hardware allowed. OK. I am still waiting for the answers/confirmation from x540 hardware designers. We need all agree on your patches to avoid risks. Regards, Helin From: Vladislav Zolotarov [mailto:vladz@cloudius-systems.com] Sent: Tuesday, August 25, 2015 12:30 PM To: Zhang, Helin Cc: Lu, Wenzhuo; dev@dpdk.org Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 On Aug 25, 2015 22:16, "Zhang, Helin" <helin.zhang@intel.com<mailto:helin.zhang@intel.com>> wrote: > > > > > -----Original Message----- > > From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com<mailto:vladz@cloudius-systems.com>] > > Sent: Tuesday, August 25, 2015 11:53 AM > > To: Zhang, Helin > > Cc: Lu, Wenzhuo; dev@dpdk.org<mailto:dev@dpdk.org> > > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for > > all NICs but 82598 > > > > > > > > On 08/25/15 21:43, Zhang, Helin wrote: > > > > > > Hi Vlad > > > > > > I think this could possibly be the root cause of your TX hang issue. > > > Please try to limit the number to 8 or less, and then see if the issue > > > will still be there or not? > > > > > > > Helin, the issue has been seen on x540 devices. Pls., see a chapter > > 7.2.1.1 of x540 devices spec: > > > > A packet (or multiple packets in transmit segmentation) can span any number of > > buffers (and their descriptors) up to a limit of 40 minus WTHRESH minus 2 (see > > Section 7.2.3.3 for Tx Ring details and section Section 7.2.3.5.1 for WTHRESH > > details). For best performance it is recommended to minimize the number of > > buffers as possible. > > > > Could u, pls., clarify why do u think that the maximum number of data buffers is > > limited by 8? > OK, i40e hardware is 8 For i40 it's a bit more complicated than just "not more than 8" - it's not more than 8 for a non-TSO packet and not more than 8 for each MSS including headers buffers for TSO. But this thread is not about i40e so this doesn't seem to be relevant anyway. , so I'd assume x540 could have a similar one. x540 spec assumes otherwise... 😉 Yes, in your case, > the limit could be around 38, right? If by "around 38" u mean "exactly 38" then u are absolutely right... 😉 > Could you help to make sure there is no packet to be transmitted uses more than > 38 descriptors? Just like i've already mentioned, we limit the cluster by at most 33 data segments. Therefore we are good here... > I heard that there is a similar hang issue on X710 if using more than 8 descriptors for > a single packet. I am wondering if the issue is similar on x540. What's x710? If that's xl710 40G nics (i40e driver), then it has its own specs with its own HW limitations i've mentioned above. It has nothing to do with this thread that is all about 10G nics managed by ixgbe driver. There is a different thread, where i've raised the 40G NICs xmit issues. See "i40e xmit path HW limitation" thread. > > Regards, > Helin > > > > > thanks, > > vlad > > > > > It does not have any check for the number of descriptors to be used > > > for a single packet, and it relies on the users to give correct mbuf > > > chains. > > > > > > We may need a check of this somewhere. Of cause the point you > > > indicated we also need to carefully investigate or fix. > > > > > > Regards, > > > > > > Helin > > > > > > *From:*Vladislav Zolotarov [mailto:vladz@cloudius-systems.com<mailto:vladz@cloudius-systems.com>] > > > *Sent:* Tuesday, August 25, 2015 11:34 AM > > > *To:* Zhang, Helin > > > *Cc:* Lu, Wenzhuo; dev@dpdk.org<mailto:dev@dpdk.org> > > > *Subject:* RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh > > > above 1 for all NICs but 82598 > > > > > > > > > On Aug 25, 2015 21:14, "Zhang, Helin" <helin.zhang@intel.com<mailto:helin.zhang@intel.com> > > > <mailto:helin.zhang@intel.com<mailto:helin.zhang@intel.com>>> wrote: > > > > > > > > Hi Vlad > > > > > > > > > > > > > > > > In addition, I’d double check with you what’s the maximum number of > > > descriptors would be used for a single packet transmitting? > > > > > > > > Datasheet said that it supports up to 8. I am wondering if more than > > > 8 were used in your case? > > > > > > If memory serves me well the maximum number of data descriptors per > > > single xmit packet is 40 minus 2 minus WTHRESH. Since WTHRESH in DPDK > > > is always zero it gives us 38 segments. We limit them by 33. > > > > > > > > > > > Thank you very much! > > > > > > > > > > > > > > > > Regards, > > > > > > > > Helin > > > > > > > > > > > > > > > > From: Zhang, Helin > > > > Sent: Wednesday, August 19, 2015 10:29 AM > > > > To: Vladislav Zolotarov > > > > Cc: Lu, Wenzhuo; dev@dpdk.org<mailto:dev@dpdk.org> <mailto:dev@dpdk.org<mailto:dev@dpdk.org>> > > > > > > > > Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh > > > above 1 for all NICs but 82598 > > > > > > > > > > > > > > > > Hi Vlad > > > > > > > > > > > > > > > > Thank you very much for the patches! Give me a few more time to > > > double check with more guys, and possibly hardware experts. > > > > > > > > > > > > > > > > Regards, > > > > > > > > Helin > > > > > > > > > > > > > > > > From: Vladislav Zolotarov [mailto:vladz@cloudius-systems.com<mailto:vladz@cloudius-systems.com> > > > <mailto:vladz@cloudius-systems.com<mailto:vladz@cloudius-systems.com>>] > > > > Sent: Tuesday, August 18, 2015 9:56 PM > > > > To: Lu, Wenzhuo > > > > Cc: dev@dpdk.org<mailto:dev@dpdk.org> <mailto:dev@dpdk.org<mailto:dev@dpdk.org>>; Zhang, Helin > > > > Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh > > > above 1 for all NICs but 82598 > > > > > > > > > > > > > > > > > > > > On Aug 19, 2015 03:42, "Lu, Wenzhuo" <wenzhuo.lu@intel.com<mailto:wenzhuo.lu@intel.com> > > > <mailto:wenzhuo.lu@intel.com<mailto:wenzhuo.lu@intel.com>>> wrote: > > > > > > > > > > Hi Helin, > > > > > > > > > > > -----Original Message----- > > > > > > From: dev [mailto:dev-bounces@dpdk.org<mailto:dev-bounces@dpdk.org> > > > <mailto:dev-bounces@dpdk.org<mailto:dev-bounces@dpdk.org>>] On Behalf Of Vlad Zolotarov > > > > > > Sent: Friday, August 14, 2015 1:38 PM > > > > > > To: Zhang, Helin; dev@dpdk.org<mailto:dev@dpdk.org> <mailto:dev@dpdk.org<mailto:dev@dpdk.org>> > > > > > > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid > > > tx_rs_thresh above 1 for > > > > > > all NICs but 82598 > > > > > > > > > > > > > > > > > > > > > > > > On 08/13/15 23:28, Zhang, Helin wrote: > > > > > > > Hi Vlad > > > > > > > > > > > > > > I don't think the changes are needed. It says in datasheet > > > that the RS > > > > > > > bit should be set on the last descriptor of every packet, ONLY > > > WHEN > > > > > > TXDCTL.WTHRESH equals to ZERO. > > > > > > > > > > > > Of course it's needed! See below. > > > > > > Exactly the same spec a few lines above the place u've just > > > quoted states: > > > > > > > > > > > > "Software should not set the RS bit when TXDCTL.WTHRESH is > > > greater than > > > > > > zero." > > > > > > > > > > > > And since all three (3) ixgbe xmit callbacks are utilizing RS > > > bit notation ixgbe PMD > > > > > > is actually not supporting any value of WTHRESH different from zero. > > > > > I think Vlad is right. We need to fix this issue. Any suggestion? > > > If not, I'd like to ack this patch. > > > > > > > > Pls., note that there is a v2 of this patch on the list. I forgot to > > > patch ixgbevf_dev_info_get() in v1. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Regards, > > > > > > > Helin > > > > > > > > > > > > > >> -----Original Message----- > > > > > > >> From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com<mailto:vladz@cloudius-systems.com> > > > <mailto:vladz@cloudius-systems.com<mailto:vladz@cloudius-systems.com>>] > > > > > > >> Sent: Thursday, August 13, 2015 11:07 AM > > > > > > >> To: dev@dpdk.org<mailto:dev@dpdk.org> <mailto:dev@dpdk.org<mailto:dev@dpdk.org>> > > > > > > >> Cc: Zhang, Helin; Ananyev, Konstantin; > > > avi@cloudius-systems.com<mailto:avi@cloudius-systems.com> <mailto:avi@cloudius-systems.com<mailto:avi@cloudius-systems.com>>; Vlad > > > > > > >> Zolotarov > > > > > > >> Subject: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh > > > above 1 for > > > > > > all > > > > > > >> NICs but 82598 > > > > > > >> > > > > > > >> According to 82599 and x540 HW specifications RS bit *must* > > > be set in the > > > > > > last > > > > > > >> descriptor of *every* packet. > > > > > > > There is a condition that if TXDCTL.WTHRESH equal to zero. > > > > > > > > > > > > Right and ixgbe PMD requires this condition to be fulfilled in > > > order to > > > > > > function. See above. > > > > > > > > > > > > > > > > > > > >> This patch fixes the Tx hang we were constantly hitting with > > > a seastar-based > > > > > > >> application on x540 NIC. > > > > > > > Could you help to share with us how to reproduce the tx hang > > > issue, with using > > > > > > > typical DPDK examples? > > > > > > > > > > > > Sorry. I'm not very familiar with the typical DPDK examples to > > > help u > > > > > > here. However this is quite irrelevant since without this this > > > > > > patch ixgbe PMD obviously abuses the HW spec as has been explained > > above. > > > > > > > > > > > > We saw the issue when u stressed the xmit path with a lot of > > > > > > highly fragmented TCP frames (packets with up to 33 fragments > > > > > > with > > > non-headers > > > > > > fragments as small as 4 bytes) with all offload features enabled. > > > > > > > > > > > > Thanks, > > > > > > vlad > > > > > > > > > > > > > >> Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com<mailto:vladz@cloudius-systems.com> > > > <mailto:vladz@cloudius-systems.com<mailto:vladz@cloudius-systems.com>>> > > > > > > >> --- > > > > > > >> drivers/net/ixgbe/ixgbe_ethdev.c | 9 +++++++++ > > > > > > >> drivers/net/ixgbe/ixgbe_rxtx.c | 23 ++++++++++++++++++++++- > > > > > > >> 2 files changed, 31 insertions(+), 1 deletion(-) > > > > > > >> > > > > > > >> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c > > > > > > >> b/drivers/net/ixgbe/ixgbe_ethdev.c > > > > > > >> index b8ee1e9..6714fd9 100644 > > > > > > >> --- a/drivers/net/ixgbe/ixgbe_ethdev.c > > > > > > >> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c > > > > > > >> @@ -2414,6 +2414,15 @@ ixgbe_dev_info_get(struct rte_eth_dev > > > *dev, > > > > > > struct > > > > > > >> rte_eth_dev_info *dev_info) > > > > > > >> .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS | > > > > > > >> ETH_TXQ_FLAGS_NOOFFLOADS, > > > > > > >> }; > > > > > > >> + > > > > > > >> + /* > > > > > > >> + * According to 82599 and x540 specifications RS bit > > > *must* be set on > > > > > > the > > > > > > >> + * last descriptor of *every* packet. Therefore we will > > > not allow the > > > > > > >> + * tx_rs_thresh above 1 for all NICs newer than 82598. > > > > > > >> + */ > > > > > > >> + if (hw->mac.type > ixgbe_mac_82598EB) > > > > > > >> + dev_info->default_txconf.tx_rs_thresh = 1; > > > > > > >> + > > > > > > >> dev_info->hash_key_size = IXGBE_HKEY_MAX_INDEX * > > > sizeof(uint32_t); > > > > > > >> dev_info->reta_size = ETH_RSS_RETA_SIZE_128; > > > > > > >> dev_info->flow_type_rss_offloads = IXGBE_RSS_OFFLOAD_ALL; > > > > > > >> diff -- > > > > > > git > > > > > > >> a/drivers/net/ixgbe/ixgbe_rxtx.c > > > b/drivers/net/ixgbe/ixgbe_rxtx.c index > > > > > > >> 91023b9..8dbdffc 100644 > > > > > > >> --- a/drivers/net/ixgbe/ixgbe_rxtx.c > > > > > > >> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c > > > > > > >> @@ -2085,11 +2085,19 @@ ixgbe_dev_tx_queue_setup(struct > > > rte_eth_dev > > > > > > >> *dev, > > > > > > >> struct ixgbe_tx_queue *txq; > > > > > > >> struct ixgbe_hw *hw; > > > > > > >> uint16_t tx_rs_thresh, tx_free_thresh; > > > > > > >> + bool rs_deferring_allowed; > > > > > > >> > > > > > > >> PMD_INIT_FUNC_TRACE(); > > > > > > >> hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private); > > > > > > >> > > > > > > >> /* > > > > > > >> + * According to 82599 and x540 specifications RS bit > > > *must* be set on > > > > > > the > > > > > > >> + * last descriptor of *every* packet. Therefore we will > > > not allow the > > > > > > >> + * tx_rs_thresh above 1 for all NICs newer than 82598. > > > > > > >> + */ > > > > > > >> + rs_deferring_allowed = (hw->mac.type <= > > > > > > >> + ixgbe_mac_82598EB); > > > > > > >> + > > > > > > >> + /* > > > > > > >> * Validate number of transmit descriptors. > > > > > > >> * It must not exceed hardware maximum, and must be multiple > > > > > > >> * of IXGBE_ALIGN. > > > > > > >> @@ -2110,6 +2118,8 @@ ixgbe_dev_tx_queue_setup(struct > > > > > > >> rte_eth_dev > > > > > > *dev, > > > > > > >> * to transmit a packet is greater than the number of free TX > > > > > > >> * descriptors. > > > > > > >> * The following constraints must be satisfied: > > > > > > >> + * tx_rs_thresh must be less than 2 for NICs for which RS > > > deferring is > > > > > > >> + * forbidden (all but 82598). > > > > > > >> * tx_rs_thresh must be greater than 0. > > > > > > >> * tx_rs_thresh must be less than the size of the ring > > > minus 2. > > > > > > >> * tx_rs_thresh must be less than or equal to tx_free_thresh. > > > > > > >> @@ -2121,9 +2131,20 @@ ixgbe_dev_tx_queue_setup(struct > > > rte_eth_dev > > > > > > *dev, > > > > > > >> * When set to zero use default values. > > > > > > >> */ > > > > > > >> tx_rs_thresh = (uint16_t)((tx_conf->tx_rs_thresh) ? > > > > > > >> - tx_conf->tx_rs_thresh : DEFAULT_TX_RS_THRESH); > > > > > > >> + tx_conf->tx_rs_thresh : > > > > > > >> + (rs_deferring_allowed ? DEFAULT_TX_RS_THRESH : 1)); > > > > > > >> tx_free_thresh = (uint16_t)((tx_conf->tx_free_thresh) ? > > > > > > >> tx_conf->tx_free_thresh : DEFAULT_TX_FREE_THRESH); > > > > > > >> + > > > > > > >> + if (!rs_deferring_allowed && tx_rs_thresh > 1) { > > > > > > >> + PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than > > > 2 since RS > > > > > > " > > > > > > >> + "must be set for every packet > > > for this HW. " > > > > > > >> + "(tx_rs_thresh=%u port=%d queue=%d)", > > > > > > >> + (unsigned int)tx_rs_thresh, > > > > > > >> + (int)dev->data->port_id, (int)queue_idx); > > > > > > >> + return -(EINVAL); > > > > > > >> + } > > > > > > >> + > > > > > > >> if (tx_rs_thresh >= (nb_desc - 2)) { > > > > > > >> PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than > > > > > > >> the > > > > > > number " > > > > > > >> "of TX descriptors minus 2. > > > (tx_rs_thresh=%u " > > > > > > >> -- > > > > > > >> 2.1.0 > > > > > > > > > ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-08-25 20:13 ` Zhang, Helin @ 2015-09-09 12:18 ` Thomas Monjalon 2015-09-09 13:19 ` Ananyev, Konstantin 0 siblings, 1 reply; 48+ messages in thread From: Thomas Monjalon @ 2015-09-09 12:18 UTC (permalink / raw) To: Zhang, Helin; +Cc: dev 2015-08-25 20:13, Zhang, Helin: > Yes, I got the perfect answers. Thank you very much! > I just wanted to make sure the test case was OK with the limit of maximum number of descriptors, as I heard there is a hang issue on other NICs of using more descriptors than hardware allowed. > OK. I am still waiting for the answers/confirmation from x540 hardware designers. We need all agree on your patches to avoid risks. Helin, any news? Can we apply v4 of this patch? ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-09-09 12:18 ` Thomas Monjalon @ 2015-09-09 13:19 ` Ananyev, Konstantin 2015-09-11 15:17 ` Vladislav Zolotarov 0 siblings, 1 reply; 48+ messages in thread From: Ananyev, Konstantin @ 2015-09-09 13:19 UTC (permalink / raw) To: Thomas Monjalon, Zhang, Helin; +Cc: dev Hi Thomas, > -----Original Message----- > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas Monjalon > Sent: Wednesday, September 09, 2015 1:19 PM > To: Zhang, Helin > Cc: dev@dpdk.org > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 > > 2015-08-25 20:13, Zhang, Helin: > > Yes, I got the perfect answers. Thank you very much! > > I just wanted to make sure the test case was OK with the limit of maximum number of descriptors, as I heard there is a hang issue on > other NICs of using more descriptors than hardware allowed. > > OK. I am still waiting for the answers/confirmation from x540 hardware designers. We need all agree on your patches to avoid risks. > > Helin, any news? > Can we apply v4 of this patch? Unfortunately we are seeing a huge performance drop with that patch: On my box bi-directional traffic (64B packet) over one port can't reach even 11 Mpps. We are still doing some experiments and consultations with ND guys how we can overcome this problem and keep performance intact. Konstantin ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-09-09 13:19 ` Ananyev, Konstantin @ 2015-09-11 15:17 ` Vladislav Zolotarov 0 siblings, 0 replies; 48+ messages in thread From: Vladislav Zolotarov @ 2015-09-11 15:17 UTC (permalink / raw) To: Konstantin Ananyev; +Cc: dev On Sep 9, 2015 4:19 PM, "Ananyev, Konstantin" <konstantin.ananyev@intel.com> wrote: > > Hi Thomas, > > > -----Original Message----- > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas Monjalon > > Sent: Wednesday, September 09, 2015 1:19 PM > > To: Zhang, Helin > > Cc: dev@dpdk.org > > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 > > > > 2015-08-25 20:13, Zhang, Helin: > > > Yes, I got the perfect answers. Thank you very much! > > > I just wanted to make sure the test case was OK with the limit of maximum number of descriptors, as I heard there is a hang issue on > > other NICs of using more descriptors than hardware allowed. > > > OK. I am still waiting for the answers/confirmation from x540 hardware designers. We need all agree on your patches to avoid risks. > > > > Helin, any news? > > Can we apply v4 of this patch? > > Unfortunately we are seeing a huge performance drop with that patch: > On my box bi-directional traffic (64B packet) over one port can't reach even 11 Mpps. Konstantin, could u clarify - u saw "only" 11 Mpps with v3 of this patch which doesn't change the rs_thresh and only sets RS on every packet? What is the performance in the same test without this patch? > We are still doing some experiments and consultations with ND guys how we can overcome > this problem and keep performance intact. > > Konstantin ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-08-25 18:52 ` Vlad Zolotarov 2015-08-25 19:16 ` Zhang, Helin @ 2015-09-11 14:25 ` didier.pallard 2015-09-11 14:47 ` Avi Kivity 1 sibling, 1 reply; 48+ messages in thread From: didier.pallard @ 2015-09-11 14:25 UTC (permalink / raw) To: Vlad Zolotarov, Zhang, Helin; +Cc: dev On 08/25/2015 08:52 PM, Vlad Zolotarov wrote: > > Helin, the issue has been seen on x540 devices. Pls., see a chapter > 7.2.1.1 of x540 devices spec: > > A packet (or multiple packets in transmit segmentation) can span any > number of > buffers (and their descriptors) up to a limit of 40 minus WTHRESH > minus 2 (see > Section 7.2.3.3 for Tx Ring details and section Section 7.2.3.5.1 for > WTHRESH > details). For best performance it is recommended to minimize the > number of buffers > as possible. > > Could u, pls., clarify why do u think that the maximum number of data > buffers is limited by 8? > > thanks, > vlad Hi vlad, Documentation states that a packet (or multiple packets in transmit segmentation) can span any number of buffers (and their descriptors) up to a limit of 40 minus WTHRESH minus 2. Shouldn't there be a test in transmit function that drops properly the mbufs with a too large number of segments, while incrementing a statistic; otherwise transmit function may be locked by the faulty packet without notification. thanks didier ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-09-11 14:25 ` didier.pallard @ 2015-09-11 14:47 ` Avi Kivity 2015-09-11 14:55 ` Thomas Monjalon 0 siblings, 1 reply; 48+ messages in thread From: Avi Kivity @ 2015-09-11 14:47 UTC (permalink / raw) To: didier.pallard, Vlad Zolotarov, Zhang, Helin; +Cc: dev On 09/11/2015 05:25 PM, didier.pallard wrote: > On 08/25/2015 08:52 PM, Vlad Zolotarov wrote: >> >> Helin, the issue has been seen on x540 devices. Pls., see a chapter >> 7.2.1.1 of x540 devices spec: >> >> A packet (or multiple packets in transmit segmentation) can span any >> number of >> buffers (and their descriptors) up to a limit of 40 minus WTHRESH >> minus 2 (see >> Section 7.2.3.3 for Tx Ring details and section Section 7.2.3.5.1 for >> WTHRESH >> details). For best performance it is recommended to minimize the >> number of buffers >> as possible. >> >> Could u, pls., clarify why do u think that the maximum number of data >> buffers is limited by 8? >> >> thanks, >> vlad > > Hi vlad, > > Documentation states that a packet (or multiple packets in transmit > segmentation) can span any number of > buffers (and their descriptors) up to a limit of 40 minus WTHRESH > minus 2. > > Shouldn't there be a test in transmit function that drops properly the > mbufs with a too large number of > segments, while incrementing a statistic; otherwise transmit function > may be locked by the faulty packet without > notification. > What we proposed is that the pmd expose to dpdk, and dpdk expose to the application, an mbuf check function. This way applications that can generate complex packets can verify that the device will be able to process them, and applications that only generate simple mbufs can avoid the overhead by not calling the function. ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-09-11 14:47 ` Avi Kivity @ 2015-09-11 14:55 ` Thomas Monjalon 2015-09-11 15:12 ` Vladislav Zolotarov 0 siblings, 1 reply; 48+ messages in thread From: Thomas Monjalon @ 2015-09-11 14:55 UTC (permalink / raw) To: Avi Kivity, didier.pallard, Vlad Zolotarov; +Cc: dev 2015-09-11 17:47, Avi Kivity: > On 09/11/2015 05:25 PM, didier.pallard wrote: > > On 08/25/2015 08:52 PM, Vlad Zolotarov wrote: > >> > >> Helin, the issue has been seen on x540 devices. Pls., see a chapter > >> 7.2.1.1 of x540 devices spec: > >> > >> A packet (or multiple packets in transmit segmentation) can span any > >> number of > >> buffers (and their descriptors) up to a limit of 40 minus WTHRESH > >> minus 2 (see > >> Section 7.2.3.3 for Tx Ring details and section Section 7.2.3.5.1 for > >> WTHRESH > >> details). For best performance it is recommended to minimize the > >> number of buffers > >> as possible. > >> > >> Could u, pls., clarify why do u think that the maximum number of data > >> buffers is limited by 8? > >> > >> thanks, > >> vlad > > > > Hi vlad, > > > > Documentation states that a packet (or multiple packets in transmit > > segmentation) can span any number of > > buffers (and their descriptors) up to a limit of 40 minus WTHRESH > > minus 2. > > > > Shouldn't there be a test in transmit function that drops properly the > > mbufs with a too large number of > > segments, while incrementing a statistic; otherwise transmit function > > may be locked by the faulty packet without > > notification. > > > > What we proposed is that the pmd expose to dpdk, and dpdk expose to the > application, an mbuf check function. This way applications that can > generate complex packets can verify that the device will be able to > process them, and applications that only generate simple mbufs can avoid > the overhead by not calling the function. More than a check, it should be exposed as a capability of the port. Anyway, if the application sends too much segments, the driver must drop it to avoid hang, and maintain a dedicated statistic counter to allow easy debugging. ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-09-11 14:55 ` Thomas Monjalon @ 2015-09-11 15:12 ` Vladislav Zolotarov 2015-09-11 15:43 ` Avi Kivity 2015-09-11 16:00 ` Richardson, Bruce 0 siblings, 2 replies; 48+ messages in thread From: Vladislav Zolotarov @ 2015-09-11 15:12 UTC (permalink / raw) To: Thomas Monjalon; +Cc: dev On Sep 11, 2015 5:55 PM, "Thomas Monjalon" <thomas.monjalon@6wind.com> wrote: > > 2015-09-11 17:47, Avi Kivity: > > On 09/11/2015 05:25 PM, didier.pallard wrote: > > > On 08/25/2015 08:52 PM, Vlad Zolotarov wrote: > > >> > > >> Helin, the issue has been seen on x540 devices. Pls., see a chapter > > >> 7.2.1.1 of x540 devices spec: > > >> > > >> A packet (or multiple packets in transmit segmentation) can span any > > >> number of > > >> buffers (and their descriptors) up to a limit of 40 minus WTHRESH > > >> minus 2 (see > > >> Section 7.2.3.3 for Tx Ring details and section Section 7.2.3.5.1 for > > >> WTHRESH > > >> details). For best performance it is recommended to minimize the > > >> number of buffers > > >> as possible. > > >> > > >> Could u, pls., clarify why do u think that the maximum number of data > > >> buffers is limited by 8? > > >> > > >> thanks, > > >> vlad > > > > > > Hi vlad, > > > > > > Documentation states that a packet (or multiple packets in transmit > > > segmentation) can span any number of > > > buffers (and their descriptors) up to a limit of 40 minus WTHRESH > > > minus 2. > > > > > > Shouldn't there be a test in transmit function that drops properly the > > > mbufs with a too large number of > > > segments, while incrementing a statistic; otherwise transmit function > > > may be locked by the faulty packet without > > > notification. > > > > > > > What we proposed is that the pmd expose to dpdk, and dpdk expose to the > > application, an mbuf check function. This way applications that can > > generate complex packets can verify that the device will be able to > > process them, and applications that only generate simple mbufs can avoid > > the overhead by not calling the function. > > More than a check, it should be exposed as a capability of the port. > Anyway, if the application sends too much segments, the driver must > drop it to avoid hang, and maintain a dedicated statistic counter to allow > easy debugging. I agree with Thomas - this should not be optional. Malformed packets should be dropped. In the icgbe case it's a very simple test - it's a single branch per packet so i doubt that it could impose any measurable performance degradation. > ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-09-11 15:12 ` Vladislav Zolotarov @ 2015-09-11 15:43 ` Avi Kivity 2015-09-11 16:04 ` Vladislav Zolotarov 2015-09-11 16:08 ` Thomas Monjalon 2015-09-11 16:00 ` Richardson, Bruce 1 sibling, 2 replies; 48+ messages in thread From: Avi Kivity @ 2015-09-11 15:43 UTC (permalink / raw) To: Vladislav Zolotarov, Thomas Monjalon; +Cc: dev On 09/11/2015 06:12 PM, Vladislav Zolotarov wrote: > > > On Sep 11, 2015 5:55 PM, "Thomas Monjalon" <thomas.monjalon@6wind.com > <mailto:thomas.monjalon@6wind.com>> wrote: > > > > 2015-09-11 17:47, Avi Kivity: > > > On 09/11/2015 05:25 PM, didier.pallard wrote: > > > > On 08/25/2015 08:52 PM, Vlad Zolotarov wrote: > > > >> > > > >> Helin, the issue has been seen on x540 devices. Pls., see a chapter > > > >> 7.2.1.1 of x540 devices spec: > > > >> > > > >> A packet (or multiple packets in transmit segmentation) can > span any > > > >> number of > > > >> buffers (and their descriptors) up to a limit of 40 minus WTHRESH > > > >> minus 2 (see > > > >> Section 7.2.3.3 for Tx Ring details and section Section > 7.2.3.5.1 for > > > >> WTHRESH > > > >> details). For best performance it is recommended to minimize the > > > >> number of buffers > > > >> as possible. > > > >> > > > >> Could u, pls., clarify why do u think that the maximum number > of data > > > >> buffers is limited by 8? > > > >> > > > >> thanks, > > > >> vlad > > > > > > > > Hi vlad, > > > > > > > > Documentation states that a packet (or multiple packets in transmit > > > > segmentation) can span any number of > > > > buffers (and their descriptors) up to a limit of 40 minus WTHRESH > > > > minus 2. > > > > > > > > Shouldn't there be a test in transmit function that drops > properly the > > > > mbufs with a too large number of > > > > segments, while incrementing a statistic; otherwise transmit > function > > > > may be locked by the faulty packet without > > > > notification. > > > > > > > > > > What we proposed is that the pmd expose to dpdk, and dpdk expose > to the > > > application, an mbuf check function. This way applications that can > > > generate complex packets can verify that the device will be able to > > > process them, and applications that only generate simple mbufs can > avoid > > > the overhead by not calling the function. > > > > More than a check, it should be exposed as a capability of the port. > > Anyway, if the application sends too much segments, the driver must > > drop it to avoid hang, and maintain a dedicated statistic counter to > allow > > easy debugging. > > I agree with Thomas - this should not be optional. Malformed packets > should be dropped. In the icgbe case it's a very simple test - it's a > single branch per packet so i doubt that it could impose any > measurable performance degradation. > > A drop allows the application no chance to recover. The driver must either provide the ability for the application to know that it cannot accept the packet, or it must fix it up itself. ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-09-11 15:43 ` Avi Kivity @ 2015-09-11 16:04 ` Vladislav Zolotarov 2015-09-11 16:07 ` Richardson, Bruce 2015-09-11 16:08 ` Thomas Monjalon 1 sibling, 1 reply; 48+ messages in thread From: Vladislav Zolotarov @ 2015-09-11 16:04 UTC (permalink / raw) To: Avi Kivity; +Cc: dev On Sep 11, 2015 6:43 PM, "Avi Kivity" <avi@cloudius-systems.com> wrote: > > On 09/11/2015 06:12 PM, Vladislav Zolotarov wrote: >> >> >> On Sep 11, 2015 5:55 PM, "Thomas Monjalon" <thomas.monjalon@6wind.com> wrote: >> > >> > 2015-09-11 17:47, Avi Kivity: >> > > On 09/11/2015 05:25 PM, didier.pallard wrote: >> > > > On 08/25/2015 08:52 PM, Vlad Zolotarov wrote: >> > > >> >> > > >> Helin, the issue has been seen on x540 devices. Pls., see a chapter >> > > >> 7.2.1.1 of x540 devices spec: >> > > >> >> > > >> A packet (or multiple packets in transmit segmentation) can span any >> > > >> number of >> > > >> buffers (and their descriptors) up to a limit of 40 minus WTHRESH >> > > >> minus 2 (see >> > > >> Section 7.2.3.3 for Tx Ring details and section Section 7.2.3.5.1 for >> > > >> WTHRESH >> > > >> details). For best performance it is recommended to minimize the >> > > >> number of buffers >> > > >> as possible. >> > > >> >> > > >> Could u, pls., clarify why do u think that the maximum number of data >> > > >> buffers is limited by 8? >> > > >> >> > > >> thanks, >> > > >> vlad >> > > > >> > > > Hi vlad, >> > > > >> > > > Documentation states that a packet (or multiple packets in transmit >> > > > segmentation) can span any number of >> > > > buffers (and their descriptors) up to a limit of 40 minus WTHRESH >> > > > minus 2. >> > > > >> > > > Shouldn't there be a test in transmit function that drops properly the >> > > > mbufs with a too large number of >> > > > segments, while incrementing a statistic; otherwise transmit function >> > > > may be locked by the faulty packet without >> > > > notification. >> > > > >> > > >> > > What we proposed is that the pmd expose to dpdk, and dpdk expose to the >> > > application, an mbuf check function. This way applications that can >> > > generate complex packets can verify that the device will be able to >> > > process them, and applications that only generate simple mbufs can avoid >> > > the overhead by not calling the function. >> > >> > More than a check, it should be exposed as a capability of the port. >> > Anyway, if the application sends too much segments, the driver must >> > drop it to avoid hang, and maintain a dedicated statistic counter to allow >> > easy debugging. >> >> I agree with Thomas - this should not be optional. Malformed packets should be dropped. In the icgbe case it's a very simple test - it's a single branch per packet so i doubt that it could impose any measurable performance degradation. >> >> > > A drop allows the application no chance to recover. The driver must either provide the ability for the application to know that it cannot accept the packet, or it must fix it up itself. An appropriate statistics counter would be a perfect tool to detect such issues. Knowingly sending a packet that will cause a HW to hang is not acceptable. ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-09-11 16:04 ` Vladislav Zolotarov @ 2015-09-11 16:07 ` Richardson, Bruce 2015-09-11 16:14 ` Vladislav Zolotarov 2015-09-11 17:44 ` Avi Kivity 0 siblings, 2 replies; 48+ messages in thread From: Richardson, Bruce @ 2015-09-11 16:07 UTC (permalink / raw) To: Vladislav Zolotarov, Avi Kivity; +Cc: dev > -----Original Message----- > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vladislav Zolotarov > Sent: Friday, September 11, 2015 5:04 PM > To: Avi Kivity > Cc: dev@dpdk.org > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 > for all NICs but 82598 > > On Sep 11, 2015 6:43 PM, "Avi Kivity" <avi@cloudius-systems.com> wrote: > > > > On 09/11/2015 06:12 PM, Vladislav Zolotarov wrote: > >> > >> > >> On Sep 11, 2015 5:55 PM, "Thomas Monjalon" > >> <thomas.monjalon@6wind.com> > wrote: > >> > > >> > 2015-09-11 17:47, Avi Kivity: > >> > > On 09/11/2015 05:25 PM, didier.pallard wrote: > >> > > > On 08/25/2015 08:52 PM, Vlad Zolotarov wrote: > >> > > >> > >> > > >> Helin, the issue has been seen on x540 devices. Pls., see a > chapter > >> > > >> 7.2.1.1 of x540 devices spec: > >> > > >> > >> > > >> A packet (or multiple packets in transmit segmentation) can > >> > > >> span > any > >> > > >> number of > >> > > >> buffers (and their descriptors) up to a limit of 40 minus > >> > > >> WTHRESH minus 2 (see Section 7.2.3.3 for Tx Ring details and > >> > > >> section Section 7.2.3.5.1 > for > >> > > >> WTHRESH > >> > > >> details). For best performance it is recommended to minimize > >> > > >> the number of buffers as possible. > >> > > >> > >> > > >> Could u, pls., clarify why do u think that the maximum number > >> > > >> of > data > >> > > >> buffers is limited by 8? > >> > > >> > >> > > >> thanks, > >> > > >> vlad > >> > > > > >> > > > Hi vlad, > >> > > > > >> > > > Documentation states that a packet (or multiple packets in > >> > > > transmit > >> > > > segmentation) can span any number of buffers (and their > >> > > > descriptors) up to a limit of 40 minus WTHRESH minus 2. > >> > > > > >> > > > Shouldn't there be a test in transmit function that drops > >> > > > properly > the > >> > > > mbufs with a too large number of segments, while incrementing a > >> > > > statistic; otherwise transmit > function > >> > > > may be locked by the faulty packet without notification. > >> > > > > >> > > > >> > > What we proposed is that the pmd expose to dpdk, and dpdk expose > >> > > to > the > >> > > application, an mbuf check function. This way applications that > >> > > can generate complex packets can verify that the device will be > >> > > able to process them, and applications that only generate simple > >> > > mbufs can > avoid > >> > > the overhead by not calling the function. > >> > > >> > More than a check, it should be exposed as a capability of the port. > >> > Anyway, if the application sends too much segments, the driver must > >> > drop it to avoid hang, and maintain a dedicated statistic counter > >> > to > allow > >> > easy debugging. > >> > >> I agree with Thomas - this should not be optional. Malformed packets > should be dropped. In the icgbe case it's a very simple test - it's a > single branch per packet so i doubt that it could impose any measurable > performance degradation. > >> > >> > > > > A drop allows the application no chance to recover. The driver must > either provide the ability for the application to know that it cannot > accept the packet, or it must fix it up itself. > > An appropriate statistics counter would be a perfect tool to detect such > issues. Knowingly sending a packet that will cause a HW to hang is not > acceptable. I would agree. Drivers should provide a function to query the max number of segments they can accept and the driver should be able to discard any packets exceeding that number, and just track it via a stat. /Bruce ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-09-11 16:07 ` Richardson, Bruce @ 2015-09-11 16:14 ` Vladislav Zolotarov 2015-09-11 17:44 ` Avi Kivity 1 sibling, 0 replies; 48+ messages in thread From: Vladislav Zolotarov @ 2015-09-11 16:14 UTC (permalink / raw) To: Bruce Richardson; +Cc: dev On Sep 11, 2015 7:07 PM, "Richardson, Bruce" <bruce.richardson@intel.com> wrote: > > > > > -----Original Message----- > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vladislav Zolotarov > > Sent: Friday, September 11, 2015 5:04 PM > > To: Avi Kivity > > Cc: dev@dpdk.org > > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 > > for all NICs but 82598 > > > > On Sep 11, 2015 6:43 PM, "Avi Kivity" <avi@cloudius-systems.com> wrote: > > > > > > On 09/11/2015 06:12 PM, Vladislav Zolotarov wrote: > > >> > > >> > > >> On Sep 11, 2015 5:55 PM, "Thomas Monjalon" > > >> <thomas.monjalon@6wind.com> > > wrote: > > >> > > > >> > 2015-09-11 17:47, Avi Kivity: > > >> > > On 09/11/2015 05:25 PM, didier.pallard wrote: > > >> > > > On 08/25/2015 08:52 PM, Vlad Zolotarov wrote: > > >> > > >> > > >> > > >> Helin, the issue has been seen on x540 devices. Pls., see a > > chapter > > >> > > >> 7.2.1.1 of x540 devices spec: > > >> > > >> > > >> > > >> A packet (or multiple packets in transmit segmentation) can > > >> > > >> span > > any > > >> > > >> number of > > >> > > >> buffers (and their descriptors) up to a limit of 40 minus > > >> > > >> WTHRESH minus 2 (see Section 7.2.3.3 for Tx Ring details and > > >> > > >> section Section 7.2.3.5.1 > > for > > >> > > >> WTHRESH > > >> > > >> details). For best performance it is recommended to minimize > > >> > > >> the number of buffers as possible. > > >> > > >> > > >> > > >> Could u, pls., clarify why do u think that the maximum number > > >> > > >> of > > data > > >> > > >> buffers is limited by 8? > > >> > > >> > > >> > > >> thanks, > > >> > > >> vlad > > >> > > > > > >> > > > Hi vlad, > > >> > > > > > >> > > > Documentation states that a packet (or multiple packets in > > >> > > > transmit > > >> > > > segmentation) can span any number of buffers (and their > > >> > > > descriptors) up to a limit of 40 minus WTHRESH minus 2. > > >> > > > > > >> > > > Shouldn't there be a test in transmit function that drops > > >> > > > properly > > the > > >> > > > mbufs with a too large number of segments, while incrementing a > > >> > > > statistic; otherwise transmit > > function > > >> > > > may be locked by the faulty packet without notification. > > >> > > > > > >> > > > > >> > > What we proposed is that the pmd expose to dpdk, and dpdk expose > > >> > > to > > the > > >> > > application, an mbuf check function. This way applications that > > >> > > can generate complex packets can verify that the device will be > > >> > > able to process them, and applications that only generate simple > > >> > > mbufs can > > avoid > > >> > > the overhead by not calling the function. > > >> > > > >> > More than a check, it should be exposed as a capability of the port. > > >> > Anyway, if the application sends too much segments, the driver must > > >> > drop it to avoid hang, and maintain a dedicated statistic counter > > >> > to > > allow > > >> > easy debugging. > > >> > > >> I agree with Thomas - this should not be optional. Malformed packets > > should be dropped. In the icgbe case it's a very simple test - it's a > > single branch per packet so i doubt that it could impose any measurable > > performance degradation. > > >> > > >> > > > > > > A drop allows the application no chance to recover. The driver must > > either provide the ability for the application to know that it cannot > > accept the packet, or it must fix it up itself. > > > > An appropriate statistics counter would be a perfect tool to detect such > > issues. Knowingly sending a packet that will cause a HW to hang is not > > acceptable. > > I would agree. Drivers should provide a function to query the max number of > segments they can accept and the driver should be able to discard any packets > exceeding that number, and just track it via a stat. +1 > > /Bruce ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-09-11 16:07 ` Richardson, Bruce 2015-09-11 16:14 ` Vladislav Zolotarov @ 2015-09-11 17:44 ` Avi Kivity 1 sibling, 0 replies; 48+ messages in thread From: Avi Kivity @ 2015-09-11 17:44 UTC (permalink / raw) To: Richardson, Bruce, Vladislav Zolotarov; +Cc: dev On 09/11/2015 07:07 PM, Richardson, Bruce wrote: > >> -----Original Message----- >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vladislav Zolotarov >> Sent: Friday, September 11, 2015 5:04 PM >> To: Avi Kivity >> Cc: dev@dpdk.org >> Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 >> for all NICs but 82598 >> >> On Sep 11, 2015 6:43 PM, "Avi Kivity" <avi@cloudius-systems.com> wrote: >>> On 09/11/2015 06:12 PM, Vladislav Zolotarov wrote: >>>> >>>> On Sep 11, 2015 5:55 PM, "Thomas Monjalon" >>>> <thomas.monjalon@6wind.com> >> wrote: >>>>> 2015-09-11 17:47, Avi Kivity: >>>>>> On 09/11/2015 05:25 PM, didier.pallard wrote: >>>>>>> On 08/25/2015 08:52 PM, Vlad Zolotarov wrote: >>>>>>>> Helin, the issue has been seen on x540 devices. Pls., see a >> chapter >>>>>>>> 7.2.1.1 of x540 devices spec: >>>>>>>> >>>>>>>> A packet (or multiple packets in transmit segmentation) can >>>>>>>> span >> any >>>>>>>> number of >>>>>>>> buffers (and their descriptors) up to a limit of 40 minus >>>>>>>> WTHRESH minus 2 (see Section 7.2.3.3 for Tx Ring details and >>>>>>>> section Section 7.2.3.5.1 >> for >>>>>>>> WTHRESH >>>>>>>> details). For best performance it is recommended to minimize >>>>>>>> the number of buffers as possible. >>>>>>>> >>>>>>>> Could u, pls., clarify why do u think that the maximum number >>>>>>>> of >> data >>>>>>>> buffers is limited by 8? >>>>>>>> >>>>>>>> thanks, >>>>>>>> vlad >>>>>>> Hi vlad, >>>>>>> >>>>>>> Documentation states that a packet (or multiple packets in >>>>>>> transmit >>>>>>> segmentation) can span any number of buffers (and their >>>>>>> descriptors) up to a limit of 40 minus WTHRESH minus 2. >>>>>>> >>>>>>> Shouldn't there be a test in transmit function that drops >>>>>>> properly >> the >>>>>>> mbufs with a too large number of segments, while incrementing a >>>>>>> statistic; otherwise transmit >> function >>>>>>> may be locked by the faulty packet without notification. >>>>>>> >>>>>> What we proposed is that the pmd expose to dpdk, and dpdk expose >>>>>> to >> the >>>>>> application, an mbuf check function. This way applications that >>>>>> can generate complex packets can verify that the device will be >>>>>> able to process them, and applications that only generate simple >>>>>> mbufs can >> avoid >>>>>> the overhead by not calling the function. >>>>> More than a check, it should be exposed as a capability of the port. >>>>> Anyway, if the application sends too much segments, the driver must >>>>> drop it to avoid hang, and maintain a dedicated statistic counter >>>>> to >> allow >>>>> easy debugging. >>>> I agree with Thomas - this should not be optional. Malformed packets >> should be dropped. In the icgbe case it's a very simple test - it's a >> single branch per packet so i doubt that it could impose any measurable >> performance degradation.allows >>>> >>> A drop allows the application no chance to recover. The driver must >> either provide the ability for the application to know that it cannot >> accept the packet, or it must fix it up itself. >> >> An appropriate statistics counter would be a perfect tool to detect such >> issues. Knowingly sending a packet that will cause a HW to hang is not >> acceptable. > I would agree. Drivers should provide a function to query the max number of > segments they can accept and the driver should be able to discard any packets > exceeding that number, and just track it via a stat. > There is no such max number of segments. The i40e card, as an extreme example, allows 8 fragments per packet, but that is after TSO segmentation. So if the header is in three fragments, that leaves 5 data fragments per packet. Another card (ixgbe) has a 38-fragment pre-TSO limit. With such a variety of limitations, the only generic way to expose them is via a function. ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-09-11 15:43 ` Avi Kivity 2015-09-11 16:04 ` Vladislav Zolotarov @ 2015-09-11 16:08 ` Thomas Monjalon 2015-09-11 16:18 ` Vladislav Zolotarov 2015-09-11 17:48 ` Avi Kivity 1 sibling, 2 replies; 48+ messages in thread From: Thomas Monjalon @ 2015-09-11 16:08 UTC (permalink / raw) To: Avi Kivity, Vladislav Zolotarov, didier.pallard; +Cc: dev 2015-09-11 18:43, Avi Kivity: > On 09/11/2015 06:12 PM, Vladislav Zolotarov wrote: > > On Sep 11, 2015 5:55 PM, "Thomas Monjalon" <thomas.monjalon@6wind.com > > <mailto:thomas.monjalon@6wind.com>> wrote: > > > 2015-09-11 17:47, Avi Kivity: > > > > On 09/11/2015 05:25 PM, didier.pallard wrote: > > > > > Hi vlad, > > > > > > > > > > Documentation states that a packet (or multiple packets in transmit > > > > > segmentation) can span any number of > > > > > buffers (and their descriptors) up to a limit of 40 minus WTHRESH > > > > > minus 2. > > > > > > > > > > Shouldn't there be a test in transmit function that drops > > properly the > > > > > mbufs with a too large number of > > > > > segments, while incrementing a statistic; otherwise transmit > > function > > > > > may be locked by the faulty packet without > > > > > notification. > > > > > > > > > > > > > What we proposed is that the pmd expose to dpdk, and dpdk expose > > to the > > > > application, an mbuf check function. This way applications that can > > > > generate complex packets can verify that the device will be able to > > > > process them, and applications that only generate simple mbufs can > > avoid > > > > the overhead by not calling the function. > > > > > > More than a check, it should be exposed as a capability of the port. > > > Anyway, if the application sends too much segments, the driver must > > > drop it to avoid hang, and maintain a dedicated statistic counter to > > > allow easy debugging. > > > > I agree with Thomas - this should not be optional. Malformed packets > > should be dropped. In the icgbe case it's a very simple test - it's a > > single branch per packet so i doubt that it could impose any > > measurable performance degradation. > > A drop allows the application no chance to recover. The driver must > either provide the ability for the application to know that it cannot > accept the packet, or it must fix it up itself. I have the feeling that everybody agrees on the same thing: the application must be able to make a well formed packet by checking limitations of the port. What about a field rte_eth_dev_info.max_tx_segs? In case the application fails in its checks, the driver must drop it and notify the user via a stat counter. The driver can also remove the hardware limitation by gathering the segments but it may be hard to implement and would be a slow operation. ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-09-11 16:08 ` Thomas Monjalon @ 2015-09-11 16:18 ` Vladislav Zolotarov 2015-09-11 17:17 ` Matthew Hall 2015-09-11 17:48 ` Avi Kivity 1 sibling, 1 reply; 48+ messages in thread From: Vladislav Zolotarov @ 2015-09-11 16:18 UTC (permalink / raw) To: Thomas Monjalon; +Cc: dev On Sep 11, 2015 7:09 PM, "Thomas Monjalon" <thomas.monjalon@6wind.com> wrote: > > 2015-09-11 18:43, Avi Kivity: > > On 09/11/2015 06:12 PM, Vladislav Zolotarov wrote: > > > On Sep 11, 2015 5:55 PM, "Thomas Monjalon" <thomas.monjalon@6wind.com > > > <mailto:thomas.monjalon@6wind.com>> wrote: > > > > 2015-09-11 17:47, Avi Kivity: > > > > > On 09/11/2015 05:25 PM, didier.pallard wrote: > > > > > > Hi vlad, > > > > > > > > > > > > Documentation states that a packet (or multiple packets in transmit > > > > > > segmentation) can span any number of > > > > > > buffers (and their descriptors) up to a limit of 40 minus WTHRESH > > > > > > minus 2. > > > > > > > > > > > > Shouldn't there be a test in transmit function that drops > > > properly the > > > > > > mbufs with a too large number of > > > > > > segments, while incrementing a statistic; otherwise transmit > > > function > > > > > > may be locked by the faulty packet without > > > > > > notification. > > > > > > > > > > > > > > > > What we proposed is that the pmd expose to dpdk, and dpdk expose > > > to the > > > > > application, an mbuf check function. This way applications that can > > > > > generate complex packets can verify that the device will be able to > > > > > process them, and applications that only generate simple mbufs can > > > avoid > > > > > the overhead by not calling the function. > > > > > > > > More than a check, it should be exposed as a capability of the port. > > > > Anyway, if the application sends too much segments, the driver must > > > > drop it to avoid hang, and maintain a dedicated statistic counter to > > > > allow easy debugging. > > > > > > I agree with Thomas - this should not be optional. Malformed packets > > > should be dropped. In the icgbe case it's a very simple test - it's a > > > single branch per packet so i doubt that it could impose any > > > measurable performance degradation. > > > > A drop allows the application no chance to recover. The driver must > > either provide the ability for the application to know that it cannot > > accept the packet, or it must fix it up itself. > > I have the feeling that everybody agrees on the same thing: > the application must be able to make a well formed packet by checking > limitations of the port. What about a field rte_eth_dev_info.max_tx_segs? > In case the application fails in its checks, the driver must drop it and > notify the user via a stat counter. > The driver can also remove the hardware limitation by gathering the segments > but it may be hard to implement and would be a slow operation. We thought about linearization too. It's doable with extra mempool and it may be optional so that those that don't need could compile it out and/or disable it in a runtime... ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-09-11 16:18 ` Vladislav Zolotarov @ 2015-09-11 17:17 ` Matthew Hall 2015-09-11 17:42 ` Ananyev, Konstantin 0 siblings, 1 reply; 48+ messages in thread From: Matthew Hall @ 2015-09-11 17:17 UTC (permalink / raw) To: Vladislav Zolotarov; +Cc: dev On Fri, Sep 11, 2015 at 07:18:20PM +0300, Vladislav Zolotarov wrote: > We thought about linearization too. It's doable with extra mempool and it > may be optional so that those that don't need could compile it out and/or > disable it in a runtime... High-level question. How realistic is sending a 40-segment frame in the first place? This whole thing seems kind of academic to me unless I missed something. I usually use 2K pktmbufs and I don't think this is an uncommon size. Most jumbo frame hardware only does 9.5KB max frame size or so. So I am having a hard time imagining how I'd end up with more than 10 segments as a worst-case scenario. Matthew. ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-09-11 17:17 ` Matthew Hall @ 2015-09-11 17:42 ` Ananyev, Konstantin 2015-09-11 17:58 ` Matthew Hall 0 siblings, 1 reply; 48+ messages in thread From: Ananyev, Konstantin @ 2015-09-11 17:42 UTC (permalink / raw) To: Matthew Hall, Vladislav Zolotarov; +Cc: dev > -----Original Message----- > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Matthew Hall > Sent: Friday, September 11, 2015 6:18 PM > To: Vladislav Zolotarov > Cc: dev@dpdk.org > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 > > On Fri, Sep 11, 2015 at 07:18:20PM +0300, Vladislav Zolotarov wrote: > > We thought about linearization too. It's doable with extra mempool and it > > may be optional so that those that don't need could compile it out and/or > > disable it in a runtime... > > High-level question. How realistic is sending a 40-segment frame in the first > place? This whole thing seems kind of academic to me unless I missed > something. > > I usually use 2K pktmbufs and I don't think this is an uncommon size. Most > jumbo frame hardware only does 9.5KB max frame size or so. > > So I am having a hard time imagining how I'd end up with more than 10 segments > as a worst-case scenario. As I remember, with freebsd stack when TSO is on it was not unusual to see chains of ~30 segments. That's over port with 'normal' mtu (1.5K). Konstantin > > Matthew. ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-09-11 17:42 ` Ananyev, Konstantin @ 2015-09-11 17:58 ` Matthew Hall 0 siblings, 0 replies; 48+ messages in thread From: Matthew Hall @ 2015-09-11 17:58 UTC (permalink / raw) To: Ananyev, Konstantin; +Cc: dev On Fri, Sep 11, 2015 at 05:42:48PM +0000, Ananyev, Konstantin wrote: > As I remember, with freebsd stack when TSO is on it was not unusual to see chains of ~30 segments. > That's over port with 'normal' mtu (1.5K). > Konstantin This makes things quite tricky, because the TSO logic itself would need to handle the segment generation versus the limit, as the app would not really be able to force TSO to change its behavior, right? Matthew. ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-09-11 16:08 ` Thomas Monjalon 2015-09-11 16:18 ` Vladislav Zolotarov @ 2015-09-11 17:48 ` Avi Kivity 2015-09-13 11:47 ` Ananyev, Konstantin 1 sibling, 1 reply; 48+ messages in thread From: Avi Kivity @ 2015-09-11 17:48 UTC (permalink / raw) To: Thomas Monjalon, Vladislav Zolotarov, didier.pallard; +Cc: dev On 09/11/2015 07:08 PM, Thomas Monjalon wrote: > 2015-09-11 18:43, Avi Kivity: >> On 09/11/2015 06:12 PM, Vladislav Zolotarov wrote: >>> On Sep 11, 2015 5:55 PM, "Thomas Monjalon" <thomas.monjalon@6wind.com >>> <mailto:thomas.monjalon@6wind.com>> wrote: >>>> 2015-09-11 17:47, Avi Kivity: >>>>> On 09/11/2015 05:25 PM, didier.pallard wrote: >>>>>> Hi vlad, >>>>>> >>>>>> Documentation states that a packet (or multiple packets in transmit >>>>>> segmentation) can span any number of >>>>>> buffers (and their descriptors) up to a limit of 40 minus WTHRESH >>>>>> minus 2. >>>>>> >>>>>> Shouldn't there be a test in transmit function that drops >>> properly the >>>>>> mbufs with a too large number of >>>>>> segments, while incrementing a statistic; otherwise transmit >>> function >>>>>> may be locked by the faulty packet without >>>>>> notification. >>>>>> >>>>> What we proposed is that the pmd expose to dpdk, and dpdk expose >>> to the >>>>> application, an mbuf check function. This way applications that can >>>>> generate complex packets can verify that the device will be able to >>>>> process them, and applications that only generate simple mbufs can >>> avoid >>>>> the overhead by not calling the function. >>>> More than a check, it should be exposed as a capability of the port. >>>> Anyway, if the application sends too much segments, the driver must >>>> drop it to avoid hang, and maintain a dedicated statistic counter to >>>> allow easy debugging. >>> I agree with Thomas - this should not be optional. Malformed packets >>> should be dropped. In the icgbe case it's a very simple test - it's a >>> single branch per packet so i doubt that it could impose any >>> measurable performance degradation. >> A drop allows the application no chance to recover. The driver must >> either provide the ability for the application to know that it cannot >> accept the packet, or it must fix it up itself. > I have the feeling that everybody agrees on the same thing: > the application must be able to make a well formed packet by checking > limitations of the port. What about a field rte_eth_dev_info.max_tx_segs? It is not generic enough. i40e has a limit that it imposes post-TSO. > In case the application fails in its checks, the driver must drop it and > notify the user via a stat counter. > The driver can also remove the hardware limitation by gathering the segments > but it may be hard to implement and would be a slow operation. I think that to satisfy both the 64b full line rate applications and the more complicated full stack applications, this must be made optional. In particular, and application that only forwards packets will never hit a NIC's limits, so it need not take any action. That's why I think a verification function is ideal; a forwarding application can ignore it, and a complex application can call it, and if it fails the packet, it can linearize it itself, removing complexity from dpdk itself. ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-09-11 17:48 ` Avi Kivity @ 2015-09-13 11:47 ` Ananyev, Konstantin 2015-09-13 12:24 ` Vlad Zolotarov 2015-09-13 12:32 ` Avi Kivity 0 siblings, 2 replies; 48+ messages in thread From: Ananyev, Konstantin @ 2015-09-13 11:47 UTC (permalink / raw) To: Avi Kivity, Thomas Monjalon, Vladislav Zolotarov, didier.pallard; +Cc: dev > -----Original Message----- > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Avi Kivity > Sent: Friday, September 11, 2015 6:48 PM > To: Thomas Monjalon; Vladislav Zolotarov; didier.pallard > Cc: dev@dpdk.org > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 > > On 09/11/2015 07:08 PM, Thomas Monjalon wrote: > > 2015-09-11 18:43, Avi Kivity: > >> On 09/11/2015 06:12 PM, Vladislav Zolotarov wrote: > >>> On Sep 11, 2015 5:55 PM, "Thomas Monjalon" <thomas.monjalon@6wind.com > >>> <mailto:thomas.monjalon@6wind.com>> wrote: > >>>> 2015-09-11 17:47, Avi Kivity: > >>>>> On 09/11/2015 05:25 PM, didier.pallard wrote: > >>>>>> Hi vlad, > >>>>>> > >>>>>> Documentation states that a packet (or multiple packets in transmit > >>>>>> segmentation) can span any number of > >>>>>> buffers (and their descriptors) up to a limit of 40 minus WTHRESH > >>>>>> minus 2. > >>>>>> > >>>>>> Shouldn't there be a test in transmit function that drops > >>> properly the > >>>>>> mbufs with a too large number of > >>>>>> segments, while incrementing a statistic; otherwise transmit > >>> function > >>>>>> may be locked by the faulty packet without > >>>>>> notification. > >>>>>> > >>>>> What we proposed is that the pmd expose to dpdk, and dpdk expose > >>> to the > >>>>> application, an mbuf check function. This way applications that can > >>>>> generate complex packets can verify that the device will be able to > >>>>> process them, and applications that only generate simple mbufs can > >>> avoid > >>>>> the overhead by not calling the function. > >>>> More than a check, it should be exposed as a capability of the port. > >>>> Anyway, if the application sends too much segments, the driver must > >>>> drop it to avoid hang, and maintain a dedicated statistic counter to > >>>> allow easy debugging. > >>> I agree with Thomas - this should not be optional. Malformed packets > >>> should be dropped. In the icgbe case it's a very simple test - it's a > >>> single branch per packet so i doubt that it could impose any > >>> measurable performance degradation. > >> A drop allows the application no chance to recover. The driver must > >> either provide the ability for the application to know that it cannot > >> accept the packet, or it must fix it up itself. > > I have the feeling that everybody agrees on the same thing: > > the application must be able to make a well formed packet by checking > > limitations of the port. What about a field rte_eth_dev_info.max_tx_segs? > > It is not generic enough. i40e has a limit that it imposes post-TSO. > > > > In case the application fails in its checks, the driver must drop it and > > notify the user via a stat counter. > > The driver can also remove the hardware limitation by gathering the segments > > but it may be hard to implement and would be a slow operation. > > I think that to satisfy both the 64b full line rate applications and the > more complicated full stack applications, this must be made optional. > In particular, and application that only forwards packets will never hit > a NIC's limits, so it need not take any action. That's why I think a > verification function is ideal; a forwarding application can ignore it, > and a complex application can call it, and if it fails the packet, it > can linearize it itself, removing complexity from dpdk itself. I think that's a good approach to that problem. As I remember we discussed something similar a while ago - A function (tx_prep() or something) that would check nb_segs and probably some other HW specific restrictions, calculate pseudo-header checksum, reset ip header len, etc. >From other hand we also can add two more fields into rte_eth_dev_info: 1) Max num of segs per TSO packet (tx_max_seg ?). 2) Max num of segs per single packet/TSO segment (tx_max_mtu_seg ?). So for ixgbe both will have value 40 - wthresh, while for i40e 1) would be UINT8_MAX and 2) will be 8. Then upper layer can use that information to select an optimal size for its TX buffers. Konstantin ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-09-13 11:47 ` Ananyev, Konstantin @ 2015-09-13 12:24 ` Vlad Zolotarov 2015-09-13 12:32 ` Avi Kivity 1 sibling, 0 replies; 48+ messages in thread From: Vlad Zolotarov @ 2015-09-13 12:24 UTC (permalink / raw) To: Ananyev, Konstantin, Avi Kivity, Thomas Monjalon, didier.pallard; +Cc: dev On 09/13/15 14:47, Ananyev, Konstantin wrote: > >> -----Original Message----- >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Avi Kivity >> Sent: Friday, September 11, 2015 6:48 PM >> To: Thomas Monjalon; Vladislav Zolotarov; didier.pallard >> Cc: dev@dpdk.org >> Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 >> >> On 09/11/2015 07:08 PM, Thomas Monjalon wrote: >>> 2015-09-11 18:43, Avi Kivity: >>>> On 09/11/2015 06:12 PM, Vladislav Zolotarov wrote: >>>>> On Sep 11, 2015 5:55 PM, "Thomas Monjalon" <thomas.monjalon@6wind.com >>>>> <mailto:thomas.monjalon@6wind.com>> wrote: >>>>>> 2015-09-11 17:47, Avi Kivity: >>>>>>> On 09/11/2015 05:25 PM, didier.pallard wrote: >>>>>>>> Hi vlad, >>>>>>>> >>>>>>>> Documentation states that a packet (or multiple packets in transmit >>>>>>>> segmentation) can span any number of >>>>>>>> buffers (and their descriptors) up to a limit of 40 minus WTHRESH >>>>>>>> minus 2. >>>>>>>> >>>>>>>> Shouldn't there be a test in transmit function that drops >>>>> properly the >>>>>>>> mbufs with a too large number of >>>>>>>> segments, while incrementing a statistic; otherwise transmit >>>>> function >>>>>>>> may be locked by the faulty packet without >>>>>>>> notification. >>>>>>>> >>>>>>> What we proposed is that the pmd expose to dpdk, and dpdk expose >>>>> to the >>>>>>> application, an mbuf check function. This way applications that can >>>>>>> generate complex packets can verify that the device will be able to >>>>>>> process them, and applications that only generate simple mbufs can >>>>> avoid >>>>>>> the overhead by not calling the function. >>>>>> More than a check, it should be exposed as a capability of the port. >>>>>> Anyway, if the application sends too much segments, the driver must >>>>>> drop it to avoid hang, and maintain a dedicated statistic counter to >>>>>> allow easy debugging. >>>>> I agree with Thomas - this should not be optional. Malformed packets >>>>> should be dropped. In the icgbe case it's a very simple test - it's a >>>>> single branch per packet so i doubt that it could impose any >>>>> measurable performance degradation. >>>> A drop allows the application no chance to recover. The driver must >>>> either provide the ability for the application to know that it cannot >>>> accept the packet, or it must fix it up itself. >>> I have the feeling that everybody agrees on the same thing: >>> the application must be able to make a well formed packet by checking >>> limitations of the port. What about a field rte_eth_dev_info.max_tx_segs? >> It is not generic enough. i40e has a limit that it imposes post-TSO. >> >> >>> In case the application fails in its checks, the driver must drop it and >>> notify the user via a stat counter. >>> The driver can also remove the hardware limitation by gathering the segments >>> but it may be hard to implement and would be a slow operation. >> I think that to satisfy both the 64b full line rate applications and the >> more complicated full stack applications, this must be made optional. >> In particular, and application that only forwards packets will never hit >> a NIC's limits, so it need not take any action. That's why I think a >> verification function is ideal; a forwarding application can ignore it, >> and a complex application can call it, and if it fails the packet, it >> can linearize it itself, removing complexity from dpdk itself. > I think that's a good approach to that problem. > As I remember we discussed something similar a while ago - > A function (tx_prep() or something) that would check nb_segs and probably some other HW specific restrictions, > calculate pseudo-header checksum, reset ip header len, etc. > > From other hand we also can add two more fields into rte_eth_dev_info: > 1) Max num of segs per TSO packet (tx_max_seg ?). > 2) Max num of segs per single packet/TSO segment (tx_max_mtu_seg ?). > So for ixgbe both will have value 40 - wthresh, > while for i40e 1) would be UINT8_MAX and 2) will be 8. > Then upper layer can use that information to select an optimal size for its TX buffers. HW limitations differ from HW to HW not only by values but also by their nature - for instance for Qlogic bnx2x NICs the limitations may not be expressed in the values above so this must be a callback. > > Konstantin > ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-09-13 11:47 ` Ananyev, Konstantin 2015-09-13 12:24 ` Vlad Zolotarov @ 2015-09-13 12:32 ` Avi Kivity 2015-09-13 15:54 ` Ananyev, Konstantin 1 sibling, 1 reply; 48+ messages in thread From: Avi Kivity @ 2015-09-13 12:32 UTC (permalink / raw) To: Ananyev, Konstantin, Thomas Monjalon, Vladislav Zolotarov, didier.pallard Cc: dev On 09/13/2015 02:47 PM, Ananyev, Konstantin wrote: > >> -----Original Message----- >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Avi Kivity >> Sent: Friday, September 11, 2015 6:48 PM >> To: Thomas Monjalon; Vladislav Zolotarov; didier.pallard >> Cc: dev@dpdk.org >> Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 >> >> On 09/11/2015 07:08 PM, Thomas Monjalon wrote: >>> 2015-09-11 18:43, Avi Kivity: >>>> On 09/11/2015 06:12 PM, Vladislav Zolotarov wrote: >>>>> On Sep 11, 2015 5:55 PM, "Thomas Monjalon" <thomas.monjalon@6wind.com >>>>> <mailto:thomas.monjalon@6wind.com>> wrote: >>>>>> 2015-09-11 17:47, Avi Kivity: >>>>>>> On 09/11/2015 05:25 PM, didier.pallard wrote: >>>>>>>> Hi vlad, >>>>>>>> >>>>>>>> Documentation states that a packet (or multiple packets in transmit >>>>>>>> segmentation) can span any number of >>>>>>>> buffers (and their descriptors) up to a limit of 40 minus WTHRESH >>>>>>>> minus 2. >>>>>>>> >>>>>>>> Shouldn't there be a test in transmit function that drops >>>>> properly the >>>>>>>> mbufs with a too large number of >>>>>>>> segments, while incrementing a statistic; otherwise transmit >>>>> function >>>>>>>> may be locked by the faulty packet without >>>>>>>> notification. >>>>>>>> >>>>>>> What we proposed is that the pmd expose to dpdk, and dpdk expose >>>>> to the >>>>>>> application, an mbuf check function. This way applications that can >>>>>>> generate complex packets can verify that the device will be able to >>>>>>> process them, and applications that only generate simple mbufs can >>>>> avoid >>>>>>> the overhead by not calling the function. >>>>>> More than a check, it should be exposed as a capability of the port. >>>>>> Anyway, if the application sends too much segments, the driver must >>>>>> drop it to avoid hang, and maintain a dedicated statistic counter to >>>>>> allow easy debugging. >>>>> I agree with Thomas - this should not be optional. Malformed packets >>>>> should be dropped. In the icgbe case it's a very simple test - it's a >>>>> single branch per packet so i doubt that it could impose any >>>>> measurable performance degradation. >>>> A drop allows the application no chance to recover. The driver must >>>> either provide the ability for the application to know that it cannot >>>> accept the packet, or it must fix it up itself. >>> I have the feeling that everybody agrees on the same thing: >>> the application must be able to make a well formed packet by checking >>> limitations of the port. What about a field rte_eth_dev_info.max_tx_segs? >> It is not generic enough. i40e has a limit that it imposes post-TSO. >> >> >>> In case the application fails in its checks, the driver must drop it and >>> notify the user via a stat counter. >>> The driver can also remove the hardware limitation by gathering the segments >>> but it may be hard to implement and would be a slow operation. >> I think that to satisfy both the 64b full line rate applications and the >> more complicated full stack applications, this must be made optional. >> In particular, and application that only forwards packets will never hit >> a NIC's limits, so it need not take any action. That's why I think a >> verification function is ideal; a forwarding application can ignore it, >> and a complex application can call it, and if it fails the packet, it >> can linearize it itself, removing complexity from dpdk itself. > I think that's a good approach to that problem. > As I remember we discussed something similar a while ago - > A function (tx_prep() or something) that would check nb_segs and probably some other HW specific restrictions, > calculate pseudo-header checksum, reset ip header len, etc. > > From other hand we also can add two more fields into rte_eth_dev_info: > 1) Max num of segs per TSO packet (tx_max_seg ?). > 2) Max num of segs per single packet/TSO segment (tx_max_mtu_seg ?). > So for ixgbe both will have value 40 - wthresh, > while for i40e 1) would be UINT8_MAX and 2) will be 8. > Then upper layer can use that information to select an optimal size for its TX buffers. > > This will break whenever the fevered imagination of hardware designers comes up with a new limit. We can have an internal function that accepts these two parameters, and then the driver-specific function can call this internal function: static bool i40e_validate_packet(mbuf* m) { return rte_generic_validate_packet(m, 0, 8); } static bool ixgbe_validate_packet(mbuf* m) { return rte_generic_validate_packet(m, 40, 2); } this way, the application is isolated from changes in how invalid packets are detected. ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-09-13 12:32 ` Avi Kivity @ 2015-09-13 15:54 ` Ananyev, Konstantin 2015-09-13 16:01 ` Avi Kivity 0 siblings, 1 reply; 48+ messages in thread From: Ananyev, Konstantin @ 2015-09-13 15:54 UTC (permalink / raw) To: Avi Kivity, Thomas Monjalon, Vladislav Zolotarov, didier.pallard; +Cc: dev > -----Original Message----- > From: Avi Kivity [mailto:avi@cloudius-systems.com] > Sent: Sunday, September 13, 2015 1:33 PM > To: Ananyev, Konstantin; Thomas Monjalon; Vladislav Zolotarov; didier.pallard > Cc: dev@dpdk.org > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 > > On 09/13/2015 02:47 PM, Ananyev, Konstantin wrote: > > > >> -----Original Message----- > >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Avi Kivity > >> Sent: Friday, September 11, 2015 6:48 PM > >> To: Thomas Monjalon; Vladislav Zolotarov; didier.pallard > >> Cc: dev@dpdk.org > >> Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 > >> > >> On 09/11/2015 07:08 PM, Thomas Monjalon wrote: > >>> 2015-09-11 18:43, Avi Kivity: > >>>> On 09/11/2015 06:12 PM, Vladislav Zolotarov wrote: > >>>>> On Sep 11, 2015 5:55 PM, "Thomas Monjalon" <thomas.monjalon@6wind.com > >>>>> <mailto:thomas.monjalon@6wind.com>> wrote: > >>>>>> 2015-09-11 17:47, Avi Kivity: > >>>>>>> On 09/11/2015 05:25 PM, didier.pallard wrote: > >>>>>>>> Hi vlad, > >>>>>>>> > >>>>>>>> Documentation states that a packet (or multiple packets in transmit > >>>>>>>> segmentation) can span any number of > >>>>>>>> buffers (and their descriptors) up to a limit of 40 minus WTHRESH > >>>>>>>> minus 2. > >>>>>>>> > >>>>>>>> Shouldn't there be a test in transmit function that drops > >>>>> properly the > >>>>>>>> mbufs with a too large number of > >>>>>>>> segments, while incrementing a statistic; otherwise transmit > >>>>> function > >>>>>>>> may be locked by the faulty packet without > >>>>>>>> notification. > >>>>>>>> > >>>>>>> What we proposed is that the pmd expose to dpdk, and dpdk expose > >>>>> to the > >>>>>>> application, an mbuf check function. This way applications that can > >>>>>>> generate complex packets can verify that the device will be able to > >>>>>>> process them, and applications that only generate simple mbufs can > >>>>> avoid > >>>>>>> the overhead by not calling the function. > >>>>>> More than a check, it should be exposed as a capability of the port. > >>>>>> Anyway, if the application sends too much segments, the driver must > >>>>>> drop it to avoid hang, and maintain a dedicated statistic counter to > >>>>>> allow easy debugging. > >>>>> I agree with Thomas - this should not be optional. Malformed packets > >>>>> should be dropped. In the icgbe case it's a very simple test - it's a > >>>>> single branch per packet so i doubt that it could impose any > >>>>> measurable performance degradation. > >>>> A drop allows the application no chance to recover. The driver must > >>>> either provide the ability for the application to know that it cannot > >>>> accept the packet, or it must fix it up itself. > >>> I have the feeling that everybody agrees on the same thing: > >>> the application must be able to make a well formed packet by checking > >>> limitations of the port. What about a field rte_eth_dev_info.max_tx_segs? > >> It is not generic enough. i40e has a limit that it imposes post-TSO. > >> > >> > >>> In case the application fails in its checks, the driver must drop it and > >>> notify the user via a stat counter. > >>> The driver can also remove the hardware limitation by gathering the segments > >>> but it may be hard to implement and would be a slow operation. > >> I think that to satisfy both the 64b full line rate applications and the > >> more complicated full stack applications, this must be made optional. > >> In particular, and application that only forwards packets will never hit > >> a NIC's limits, so it need not take any action. That's why I think a > >> verification function is ideal; a forwarding application can ignore it, > >> and a complex application can call it, and if it fails the packet, it > >> can linearize it itself, removing complexity from dpdk itself. > > I think that's a good approach to that problem. > > As I remember we discussed something similar a while ago - > > A function (tx_prep() or something) that would check nb_segs and probably some other HW specific restrictions, > > calculate pseudo-header checksum, reset ip header len, etc. > > > > From other hand we also can add two more fields into rte_eth_dev_info: > > 1) Max num of segs per TSO packet (tx_max_seg ?). > > 2) Max num of segs per single packet/TSO segment (tx_max_mtu_seg ?). > > So for ixgbe both will have value 40 - wthresh, > > while for i40e 1) would be UINT8_MAX and 2) will be 8. > > Then upper layer can use that information to select an optimal size for its TX buffers. > > > > > > This will break whenever the fevered imagination of hardware designers > comes up with a new limit. > > We can have an internal function that accepts these two parameters, and > then the driver-specific function can call this internal function: > > static bool i40e_validate_packet(mbuf* m) { > return rte_generic_validate_packet(m, 0, 8); > } > > static bool ixgbe_validate_packet(mbuf* m) { > return rte_generic_validate_packet(m, 40, 2); > } > > this way, the application is isolated from changes in how invalid > packets are detected. > > I am not saying we shouldn't have tx_prep (tx_validate?) function per PMD. As I said before I like that approach. I think we should have tx_prep (as you suggested) that most people using full-path TX would call, *plus* these extra fields in re_eth_dev_conf, so if someone needs that information - it would be there. Konstantin ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-09-13 15:54 ` Ananyev, Konstantin @ 2015-09-13 16:01 ` Avi Kivity 0 siblings, 0 replies; 48+ messages in thread From: Avi Kivity @ 2015-09-13 16:01 UTC (permalink / raw) To: Konstantin Ananyev; +Cc: <dev@dpdk.org> On Sep 13, 2015 6:54 PM, "Ananyev, Konstantin" <konstantin.ananyev@intel.com> wrote: > > > > > -----Original Message----- > > From: Avi Kivity [mailto:avi@cloudius-systems.com] > > Sent: Sunday, September 13, 2015 1:33 PM > > To: Ananyev, Konstantin; Thomas Monjalon; Vladislav Zolotarov; didier.pallard > > Cc: dev@dpdk.org > > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 > > > > On 09/13/2015 02:47 PM, Ananyev, Konstantin wrote: > > > > > >> -----Original Message----- > > >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Avi Kivity > > >> Sent: Friday, September 11, 2015 6:48 PM > > >> To: Thomas Monjalon; Vladislav Zolotarov; didier.pallard > > >> Cc: dev@dpdk.org > > >> Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 > > >> > > >> On 09/11/2015 07:08 PM, Thomas Monjalon wrote: > > >>> 2015-09-11 18:43, Avi Kivity: > > >>>> On 09/11/2015 06:12 PM, Vladislav Zolotarov wrote: > > >>>>> On Sep 11, 2015 5:55 PM, "Thomas Monjalon" < thomas.monjalon@6wind.com > > >>>>> <mailto:thomas.monjalon@6wind.com>> wrote: > > >>>>>> 2015-09-11 17:47, Avi Kivity: > > >>>>>>> On 09/11/2015 05:25 PM, didier.pallard wrote: > > >>>>>>>> Hi vlad, > > >>>>>>>> > > >>>>>>>> Documentation states that a packet (or multiple packets in transmit > > >>>>>>>> segmentation) can span any number of > > >>>>>>>> buffers (and their descriptors) up to a limit of 40 minus WTHRESH > > >>>>>>>> minus 2. > > >>>>>>>> > > >>>>>>>> Shouldn't there be a test in transmit function that drops > > >>>>> properly the > > >>>>>>>> mbufs with a too large number of > > >>>>>>>> segments, while incrementing a statistic; otherwise transmit > > >>>>> function > > >>>>>>>> may be locked by the faulty packet without > > >>>>>>>> notification. > > >>>>>>>> > > >>>>>>> What we proposed is that the pmd expose to dpdk, and dpdk expose > > >>>>> to the > > >>>>>>> application, an mbuf check function. This way applications that can > > >>>>>>> generate complex packets can verify that the device will be able to > > >>>>>>> process them, and applications that only generate simple mbufs can > > >>>>> avoid > > >>>>>>> the overhead by not calling the function. > > >>>>>> More than a check, it should be exposed as a capability of the port. > > >>>>>> Anyway, if the application sends too much segments, the driver must > > >>>>>> drop it to avoid hang, and maintain a dedicated statistic counter to > > >>>>>> allow easy debugging. > > >>>>> I agree with Thomas - this should not be optional. Malformed packets > > >>>>> should be dropped. In the icgbe case it's a very simple test - it's a > > >>>>> single branch per packet so i doubt that it could impose any > > >>>>> measurable performance degradation. > > >>>> A drop allows the application no chance to recover. The driver must > > >>>> either provide the ability for the application to know that it cannot > > >>>> accept the packet, or it must fix it up itself. > > >>> I have the feeling that everybody agrees on the same thing: > > >>> the application must be able to make a well formed packet by checking > > >>> limitations of the port. What about a field rte_eth_dev_info.max_tx_segs? > > >> It is not generic enough. i40e has a limit that it imposes post-TSO. > > >> > > >> > > >>> In case the application fails in its checks, the driver must drop it and > > >>> notify the user via a stat counter. > > >>> The driver can also remove the hardware limitation by gathering the segments > > >>> but it may be hard to implement and would be a slow operation. > > >> I think that to satisfy both the 64b full line rate applications and the > > >> more complicated full stack applications, this must be made optional. > > >> In particular, and application that only forwards packets will never hit > > >> a NIC's limits, so it need not take any action. That's why I think a > > >> verification function is ideal; a forwarding application can ignore it, > > >> and a complex application can call it, and if it fails the packet, it > > >> can linearize it itself, removing complexity from dpdk itself. > > > I think that's a good approach to that problem. > > > As I remember we discussed something similar a while ago - > > > A function (tx_prep() or something) that would check nb_segs and probably some other HW specific restrictions, > > > calculate pseudo-header checksum, reset ip header len, etc. > > > > > > From other hand we also can add two more fields into rte_eth_dev_info: > > > 1) Max num of segs per TSO packet (tx_max_seg ?). > > > 2) Max num of segs per single packet/TSO segment (tx_max_mtu_seg ?). > > > So for ixgbe both will have value 40 - wthresh, > > > while for i40e 1) would be UINT8_MAX and 2) will be 8. > > > Then upper layer can use that information to select an optimal size for its TX buffers. > > > > > > > > > > This will break whenever the fevered imagination of hardware designers > > comes up with a new limit. > > > > We can have an internal function that accepts these two parameters, and > > then the driver-specific function can call this internal function: > > > > static bool i40e_validate_packet(mbuf* m) { > > return rte_generic_validate_packet(m, 0, 8); > > } > > > > static bool ixgbe_validate_packet(mbuf* m) { > > return rte_generic_validate_packet(m, 40, 2); > > } > > > > this way, the application is isolated from changes in how invalid > > packets are detected. > > > > > > I am not saying we shouldn't have tx_prep (tx_validate?) function per PMD. > As I said before I like that approach. > I think we should have tx_prep (as you suggested) that most people using full-path TX would call, > *plus* these extra fields in re_eth_dev_conf, so if someone needs that information - it would be there. I think this is reasonable. Having those values can allow the application to avoid generating bad packets in the first place. > Konstantin > ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-09-11 15:12 ` Vladislav Zolotarov 2015-09-11 15:43 ` Avi Kivity @ 2015-09-11 16:00 ` Richardson, Bruce 2015-09-11 16:13 ` Vladislav Zolotarov 1 sibling, 1 reply; 48+ messages in thread From: Richardson, Bruce @ 2015-09-11 16:00 UTC (permalink / raw) To: Vladislav Zolotarov, Thomas Monjalon; +Cc: dev > -----Original Message----- > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vladislav Zolotarov > Sent: Friday, September 11, 2015 4:13 PM > To: Thomas Monjalon > Cc: dev@dpdk.org > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 > for all NICs but 82598 > > On Sep 11, 2015 5:55 PM, "Thomas Monjalon" <thomas.monjalon@6wind.com> > wrote: > > > > 2015-09-11 17:47, Avi Kivity: > > > On 09/11/2015 05:25 PM, didier.pallard wrote: > > > > On 08/25/2015 08:52 PM, Vlad Zolotarov wrote: > > > >> > > > >> Helin, the issue has been seen on x540 devices. Pls., see a > > > >> chapter > > > >> 7.2.1.1 of x540 devices spec: > > > >> > > > >> A packet (or multiple packets in transmit segmentation) can span > > > >> any number of buffers (and their descriptors) up to a limit of 40 > > > >> minus WTHRESH minus 2 (see Section 7.2.3.3 for Tx Ring details > > > >> and section Section 7.2.3.5.1 for WTHRESH details). For best > > > >> performance it is recommended to minimize the number of buffers > > > >> as possible. > > > >> > > > >> Could u, pls., clarify why do u think that the maximum number of > > > >> data buffers is limited by 8? > > > >> > > > >> thanks, > > > >> vlad > > > > > > > > Hi vlad, > > > > > > > > Documentation states that a packet (or multiple packets in > > > > transmit > > > > segmentation) can span any number of buffers (and their > > > > descriptors) up to a limit of 40 minus WTHRESH minus 2. > > > > > > > > Shouldn't there be a test in transmit function that drops properly > > > > the mbufs with a too large number of segments, while incrementing > > > > a statistic; otherwise transmit function may be locked by the > > > > faulty packet without notification. > > > > > > > > > > What we proposed is that the pmd expose to dpdk, and dpdk expose to > > > the application, an mbuf check function. This way applications that > > > can generate complex packets can verify that the device will be able > > > to process them, and applications that only generate simple mbufs > > > can avoid the overhead by not calling the function. > > > > More than a check, it should be exposed as a capability of the port. > > Anyway, if the application sends too much segments, the driver must > > drop it to avoid hang, and maintain a dedicated statistic counter to > > allow easy debugging. > > I agree with Thomas - this should not be optional. Malformed packets > should be dropped. In the icgbe case it's a very simple test - it's a > single branch per packet so i doubt that it could impose any measurable > performance degradation. > Actually, it could very well do - we'd have to test it. For the vector IO paths, every additional cycle in the RX or TX paths causes a noticeable perf drop. /Bruce ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 2015-09-11 16:00 ` Richardson, Bruce @ 2015-09-11 16:13 ` Vladislav Zolotarov 0 siblings, 0 replies; 48+ messages in thread From: Vladislav Zolotarov @ 2015-09-11 16:13 UTC (permalink / raw) To: Bruce Richardson; +Cc: dev On Sep 11, 2015 7:00 PM, "Richardson, Bruce" <bruce.richardson@intel.com> wrote: > > > > > -----Original Message----- > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Vladislav Zolotarov > > Sent: Friday, September 11, 2015 4:13 PM > > To: Thomas Monjalon > > Cc: dev@dpdk.org > > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 > > for all NICs but 82598 > > > > On Sep 11, 2015 5:55 PM, "Thomas Monjalon" <thomas.monjalon@6wind.com> > > wrote: > > > > > > 2015-09-11 17:47, Avi Kivity: > > > > On 09/11/2015 05:25 PM, didier.pallard wrote: > > > > > On 08/25/2015 08:52 PM, Vlad Zolotarov wrote: > > > > >> > > > > >> Helin, the issue has been seen on x540 devices. Pls., see a > > > > >> chapter > > > > >> 7.2.1.1 of x540 devices spec: > > > > >> > > > > >> A packet (or multiple packets in transmit segmentation) can span > > > > >> any number of buffers (and their descriptors) up to a limit of 40 > > > > >> minus WTHRESH minus 2 (see Section 7.2.3.3 for Tx Ring details > > > > >> and section Section 7.2.3.5.1 for WTHRESH details). For best > > > > >> performance it is recommended to minimize the number of buffers > > > > >> as possible. > > > > >> > > > > >> Could u, pls., clarify why do u think that the maximum number of > > > > >> data buffers is limited by 8? > > > > >> > > > > >> thanks, > > > > >> vlad > > > > > > > > > > Hi vlad, > > > > > > > > > > Documentation states that a packet (or multiple packets in > > > > > transmit > > > > > segmentation) can span any number of buffers (and their > > > > > descriptors) up to a limit of 40 minus WTHRESH minus 2. > > > > > > > > > > Shouldn't there be a test in transmit function that drops properly > > > > > the mbufs with a too large number of segments, while incrementing > > > > > a statistic; otherwise transmit function may be locked by the > > > > > faulty packet without notification. > > > > > > > > > > > > > What we proposed is that the pmd expose to dpdk, and dpdk expose to > > > > the application, an mbuf check function. This way applications that > > > > can generate complex packets can verify that the device will be able > > > > to process them, and applications that only generate simple mbufs > > > > can avoid the overhead by not calling the function. > > > > > > More than a check, it should be exposed as a capability of the port. > > > Anyway, if the application sends too much segments, the driver must > > > drop it to avoid hang, and maintain a dedicated statistic counter to > > > allow easy debugging. > > > > I agree with Thomas - this should not be optional. Malformed packets > > should be dropped. In the icgbe case it's a very simple test - it's a > > single branch per packet so i doubt that it could impose any measurable > > performance degradation. > > > Actually, it could very well do - we'd have to test it. For the vector IO > paths, every additional cycle in the RX or TX paths causes a noticeable perf > drop. Well if your application is willing to know all different HW limitations then u may not need it. However usually application doesn't want to know the HW technical details. And it this case ignoring them may cause HW to hang. Of course, if your app always sends single fragment packets of less than 1500 bytes then u r right and u will most likely not hit any HW limitation, however what i have in mind is a full featured case where packets are bit more big and complicated and where a single branch per packet will change nothing. This is regarding 40 segments case. In regard to the RS bit - this is related to any packet and according to spec it should be set in every packet. > > /Bruce ^ permalink raw reply [flat|nested] 48+ messages in thread
end of thread, other threads:[~2015-09-13 16:01 UTC | newest] Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2015-08-13 18:06 [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 Vlad Zolotarov 2015-08-13 20:28 ` Zhang, Helin 2015-08-14 5:37 ` Vlad Zolotarov 2015-08-19 0:42 ` Lu, Wenzhuo 2015-08-19 4:55 ` Vladislav Zolotarov 2015-08-19 7:43 ` Ananyev, Konstantin 2015-08-19 10:02 ` Vlad Zolotarov 2015-08-20 8:41 ` Ananyev, Konstantin 2015-08-20 8:56 ` Vlad Zolotarov 2015-08-20 9:05 ` Vlad Zolotarov 2015-08-20 9:06 ` Vlad Zolotarov 2015-08-25 17:33 ` Ananyev, Konstantin 2015-08-25 17:39 ` Avi Kivity 2015-08-19 17:29 ` Zhang, Helin 2015-08-25 18:13 ` Zhang, Helin 2015-08-25 18:33 ` Vladislav Zolotarov 2015-08-25 18:43 ` Zhang, Helin 2015-08-25 18:52 ` Vlad Zolotarov 2015-08-25 19:16 ` Zhang, Helin 2015-08-25 19:23 ` Avi Kivity 2015-08-25 19:30 ` Vladislav Zolotarov 2015-08-25 20:07 ` Vlad Zolotarov 2015-08-25 20:13 ` Zhang, Helin 2015-09-09 12:18 ` Thomas Monjalon 2015-09-09 13:19 ` Ananyev, Konstantin 2015-09-11 15:17 ` Vladislav Zolotarov 2015-09-11 14:25 ` didier.pallard 2015-09-11 14:47 ` Avi Kivity 2015-09-11 14:55 ` Thomas Monjalon 2015-09-11 15:12 ` Vladislav Zolotarov 2015-09-11 15:43 ` Avi Kivity 2015-09-11 16:04 ` Vladislav Zolotarov 2015-09-11 16:07 ` Richardson, Bruce 2015-09-11 16:14 ` Vladislav Zolotarov 2015-09-11 17:44 ` Avi Kivity 2015-09-11 16:08 ` Thomas Monjalon 2015-09-11 16:18 ` Vladislav Zolotarov 2015-09-11 17:17 ` Matthew Hall 2015-09-11 17:42 ` Ananyev, Konstantin 2015-09-11 17:58 ` Matthew Hall 2015-09-11 17:48 ` Avi Kivity 2015-09-13 11:47 ` Ananyev, Konstantin 2015-09-13 12:24 ` Vlad Zolotarov 2015-09-13 12:32 ` Avi Kivity 2015-09-13 15:54 ` Ananyev, Konstantin 2015-09-13 16:01 ` Avi Kivity 2015-09-11 16:00 ` Richardson, Bruce 2015-09-11 16:13 ` Vladislav Zolotarov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).