From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qg0-f45.google.com (mail-qg0-f45.google.com [209.85.192.45]) by dpdk.org (Postfix) with ESMTP id 886155954 for ; Tue, 25 Aug 2015 21:30:21 +0200 (CEST) Received: by qgeb6 with SMTP id b6so112823325qge.3 for ; Tue, 25 Aug 2015 12:30:21 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=l6h3vUu2/Yrf/MoKsFByrTlLZy8bk7uyvQs1lDVK5rg=; b=i/Uz/K4aMWIhVfzkvUvH7QdUaYFrupnNxdKCOyG9DxyK06aknx1TsEFwpoGylsrDA6 7VK2ZGenSZwfUQeI46tY4+F7kr0wk8ST6CdnwvMw35JoLG+kLjDg4qnk9fDRA/8XiRvI zmC7Ym+z3dRI0shUSfUBv8mE7J0q1EccY5umbFyV7U0yCLwB4giDPTHsXF7CeMEfNI+B ZeJ+r65V4rR61KHmcD7moaE/Ys8n6JeA+s3KiNLqP5VYaBSVFQ7JsVii/fJtok2A6EP+ 2xu3W9Ebt96Tuyqt327mf6AQvk8q18TQTSYX9FJAdEUO69GyuxIeQktWa+gpfnHVD9yF 1X6g== X-Gm-Message-State: ALoCoQkNJfDQA50N7Dq8j5huGmKEsx+oRvj8R+UxjR0S9G1BxU2XUNyPo2+UzoQksBgmM/zjHnk4 MIME-Version: 1.0 X-Received: by 10.140.235.14 with SMTP id g14mr72392716qhc.5.1440531020555; Tue, 25 Aug 2015 12:30:20 -0700 (PDT) Received: by 10.55.114.5 with HTTP; Tue, 25 Aug 2015 12:30:20 -0700 (PDT) Received: by 10.55.114.5 with HTTP; Tue, 25 Aug 2015 12:30:20 -0700 (PDT) In-Reply-To: References: <1439489195-31553-1-git-send-email-vladz@cloudius-systems.com> <55CD7EA5.6060100@cloudius-systems.com> <6A0DE07E22DDAD4C9103DF62FEBC0909D3E116@shsmsx102.ccr.corp.intel.com> <55DCB975.2030000@cloudius-systems.com> Date: Tue, 25 Aug 2015 22:30:20 +0300 Message-ID: From: Vladislav Zolotarov To: Helin Zhang Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.15 Cc: dev@dpdk.org Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598 X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 25 Aug 2015 19:30:22 -0000 On Aug 25, 2015 22:16, "Zhang, Helin" wrote: > > > > > -----Original Message----- > > From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com] > > Sent: Tuesday, August 25, 2015 11:53 AM > > To: Zhang, Helin > > Cc: Lu, Wenzhuo; dev@dpdk.org > > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for > > all NICs but 82598 > > > > > > > > On 08/25/15 21:43, Zhang, Helin wrote: > > > > > > Hi Vlad > > > > > > I think this could possibly be the root cause of your TX hang issue. > > > Please try to limit the number to 8 or less, and then see if the issu= e > > > will still be there or not? > > > > > > > Helin, the issue has been seen on x540 devices. Pls., see a chapter > > 7.2.1.1 of x540 devices spec: > > > > A packet (or multiple packets in transmit segmentation) can span any number of > > buffers (and their descriptors) up to a limit of 40 minus WTHRESH minus 2 (see > > Section 7.2.3.3 for Tx Ring details and section Section 7.2.3.5.1 for WTHRESH > > details). For best performance it is recommended to minimize the number of > > buffers as possible. > > > > Could u, pls., clarify why do u think that the maximum number of data buffers is > > limited by 8? > OK, i40e hardware is 8 For i40 it's a bit more complicated than just "not more than 8" - it's not more than 8 for a non-TSO packet and not more than 8 for each MSS including headers buffers for TSO. But this thread is not about i40e so this doesn't seem to be relevant anyway. , so I'd assume x540 could have a similar one. x540 spec assumes otherwise... =F0=9F=98=89 Yes, in your case, > the limit could be around 38, right? If by "around 38" u mean "exactly 38" then u are absolutely right... =F0=9F= =98=89 > Could you help to make sure there is no packet to be transmitted uses more than > 38 descriptors? Just like i've already mentioned, we limit the cluster by at most 33 data segments. Therefore we are good here... > I heard that there is a similar hang issue on X710 if using more than 8 descriptors for > a single packet. I am wondering if the issue is similar on x540. What's x710? If that's xl710 40G nics (i40e driver), then it has its own specs with its own HW limitations i've mentioned above. It has nothing to do with this thread that is all about 10G nics managed by ixgbe driver. There is a different thread, where i've raised the 40G NICs xmit issues. See "i40e xmit path HW limitation" thread. > > Regards, > Helin > > > > > thanks, > > vlad > > > > > It does not have any check for the number of descriptors to be used > > > for a single packet, and it relies on the users to give correct mbuf > > > chains. > > > > > > We may need a check of this somewhere. Of cause the point you > > > indicated we also need to carefully investigate or fix. > > > > > > Regards, > > > > > > Helin > > > > > > *From:*Vladislav Zolotarov [mailto:vladz@cloudius-systems.com] > > > *Sent:* Tuesday, August 25, 2015 11:34 AM > > > *To:* Zhang, Helin > > > *Cc:* Lu, Wenzhuo; dev@dpdk.org > > > *Subject:* RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh > > > above 1 for all NICs but 82598 > > > > > > > > > On Aug 25, 2015 21:14, "Zhang, Helin" > > > wrote: > > > > > > > > Hi Vlad > > > > > > > > > > > > > > > > In addition, I=E2=80=99d double check with you what=E2=80=99s the m= aximum number of > > > descriptors would be used for a single packet transmitting? > > > > > > > > Datasheet said that it supports up to 8. I am wondering if more tha= n > > > 8 were used in your case? > > > > > > If memory serves me well the maximum number of data descriptors per > > > single xmit packet is 40 minus 2 minus WTHRESH. Since WTHRESH in DPDK > > > is always zero it gives us 38 segments. We limit them by 33. > > > > > > > > > > > Thank you very much! > > > > > > > > > > > > > > > > Regards, > > > > > > > > Helin > > > > > > > > > > > > > > > > From: Zhang, Helin > > > > Sent: Wednesday, August 19, 2015 10:29 AM > > > > To: Vladislav Zolotarov > > > > Cc: Lu, Wenzhuo; dev@dpdk.org > > > > > > > > Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh > > > above 1 for all NICs but 82598 > > > > > > > > > > > > > > > > Hi Vlad > > > > > > > > > > > > > > > > Thank you very much for the patches! Give me a few more time to > > > double check with more guys, and possibly hardware experts. > > > > > > > > > > > > > > > > Regards, > > > > > > > > Helin > > > > > > > > > > > > > > > > From: Vladislav Zolotarov [mailto:vladz@cloudius-systems.com > > > ] > > > > Sent: Tuesday, August 18, 2015 9:56 PM > > > > To: Lu, Wenzhuo > > > > Cc: dev@dpdk.org ; Zhang, Helin > > > > Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh > > > above 1 for all NICs but 82598 > > > > > > > > > > > > > > > > > > > > On Aug 19, 2015 03:42, "Lu, Wenzhuo" > > > wrote: > > > > > > > > > > Hi Helin, > > > > > > > > > > > -----Original Message----- > > > > > > From: dev [mailto:dev-bounces@dpdk.org > > > ] On Behalf Of Vlad Zolotarov > > > > > > Sent: Friday, August 14, 2015 1:38 PM > > > > > > To: Zhang, Helin; dev@dpdk.org > > > > > > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid > > > tx_rs_thresh above 1 for > > > > > > all NICs but 82598 > > > > > > > > > > > > > > > > > > > > > > > > On 08/13/15 23:28, Zhang, Helin wrote: > > > > > > > Hi Vlad > > > > > > > > > > > > > > I don't think the changes are needed. It says in datasheet > > > that the RS > > > > > > > bit should be set on the last descriptor of every packet, ONL= Y > > > WHEN > > > > > > TXDCTL.WTHRESH equals to ZERO. > > > > > > > > > > > > Of course it's needed! See below. > > > > > > Exactly the same spec a few lines above the place u've just > > > quoted states: > > > > > > > > > > > > "Software should not set the RS bit when TXDCTL.WTHRESH is > > > greater than > > > > > > zero." > > > > > > > > > > > > And since all three (3) ixgbe xmit callbacks are utilizing RS > > > bit notation ixgbe PMD > > > > > > is actually not supporting any value of WTHRESH different from zero. > > > > > I think Vlad is right. We need to fix this issue. Any suggestion? > > > If not, I'd like to ack this patch. > > > > > > > > Pls., note that there is a v2 of this patch on the list. I forgot t= o > > > patch ixgbevf_dev_info_get() in v1. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Regards, > > > > > > > Helin > > > > > > > > > > > > > >> -----Original Message----- > > > > > > >> From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com > > > ] > > > > > > >> Sent: Thursday, August 13, 2015 11:07 AM > > > > > > >> To: dev@dpdk.org > > > > > > >> Cc: Zhang, Helin; Ananyev, Konstantin; > > > avi@cloudius-systems.com ; Vlad > > > > > > >> Zolotarov > > > > > > >> Subject: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thres= h > > > above 1 for > > > > > > all > > > > > > >> NICs but 82598 > > > > > > >> > > > > > > >> According to 82599 and x540 HW specifications RS bit *must* > > > be set in the > > > > > > last > > > > > > >> descriptor of *every* packet. > > > > > > > There is a condition that if TXDCTL.WTHRESH equal to zero. > > > > > > > > > > > > Right and ixgbe PMD requires this condition to be fulfilled in > > > order to > > > > > > function. See above. > > > > > > > > > > > > > > > > > > > >> This patch fixes the Tx hang we were constantly hitting with > > > a seastar-based > > > > > > >> application on x540 NIC. > > > > > > > Could you help to share with us how to reproduce the tx hang > > > issue, with using > > > > > > > typical DPDK examples? > > > > > > > > > > > > Sorry. I'm not very familiar with the typical DPDK examples to > > > help u > > > > > > here. However this is quite irrelevant since without this this > > > > > > patch ixgbe PMD obviously abuses the HW spec as has been explained > > above. > > > > > > > > > > > > We saw the issue when u stressed the xmit path with a lot of > > > > > > highly fragmented TCP frames (packets with up to 33 fragments > > > > > > with > > > non-headers > > > > > > fragments as small as 4 bytes) with all offload features enabled. > > > > > > > > > > > > Thanks, > > > > > > vlad > > > > > > > > > > > > > >> Signed-off-by: Vlad Zolotarov > > > > > > > > > >> --- > > > > > > >> drivers/net/ixgbe/ixgbe_ethdev.c | 9 +++++++++ > > > > > > >> drivers/net/ixgbe/ixgbe_rxtx.c | 23 ++++++++++++++++++++++- > > > > > > >> 2 files changed, 31 insertions(+), 1 deletion(-) > > > > > > >> > > > > > > >> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c > > > > > > >> b/drivers/net/ixgbe/ixgbe_ethdev.c > > > > > > >> index b8ee1e9..6714fd9 100644 > > > > > > >> --- a/drivers/net/ixgbe/ixgbe_ethdev.c > > > > > > >> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c > > > > > > >> @@ -2414,6 +2414,15 @@ ixgbe_dev_info_get(struct rte_eth_dev > > > *dev, > > > > > > struct > > > > > > >> rte_eth_dev_info *dev_info) > > > > > > >> .txq_flags =3D ETH_TXQ_FLAGS_NOMULTSEGS | > > > > > > >> ETH_TXQ_FLAGS_NOOFFLOADS, > > > > > > >> }; > > > > > > >> + > > > > > > >> + /* > > > > > > >> + * According to 82599 and x540 specifications RS bit > > > *must* be set on > > > > > > the > > > > > > >> + * last descriptor of *every* packet. Therefore we will > > > not allow the > > > > > > >> + * tx_rs_thresh above 1 for all NICs newer than 82598. > > > > > > >> + */ > > > > > > >> + if (hw->mac.type > ixgbe_mac_82598EB) > > > > > > >> + dev_info->default_txconf.tx_rs_thresh =3D 1; > > > > > > >> + > > > > > > >> dev_info->hash_key_size =3D IXGBE_HKEY_MAX_INDEX * > > > sizeof(uint32_t); > > > > > > >> dev_info->reta_size =3D ETH_RSS_RETA_SIZE_128; > > > > > > >> dev_info->flow_type_rss_offloads =3D IXGBE_RSS_OFFLOAD_ALL; > > > > > > >> diff -- > > > > > > git > > > > > > >> a/drivers/net/ixgbe/ixgbe_rxtx.c > > > b/drivers/net/ixgbe/ixgbe_rxtx.c index > > > > > > >> 91023b9..8dbdffc 100644 > > > > > > >> --- a/drivers/net/ixgbe/ixgbe_rxtx.c > > > > > > >> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c > > > > > > >> @@ -2085,11 +2085,19 @@ ixgbe_dev_tx_queue_setup(struct > > > rte_eth_dev > > > > > > >> *dev, > > > > > > >> struct ixgbe_tx_queue *txq; > > > > > > >> struct ixgbe_hw *hw; > > > > > > >> uint16_t tx_rs_thresh, tx_free_thresh; > > > > > > >> + bool rs_deferring_allowed; > > > > > > >> > > > > > > >> PMD_INIT_FUNC_TRACE(); > > > > > > >> hw =3D IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private); > > > > > > >> > > > > > > >> /* > > > > > > >> + * According to 82599 and x540 specifications RS bit > > > *must* be set on > > > > > > the > > > > > > >> + * last descriptor of *every* packet. Therefore we will > > > not allow the > > > > > > >> + * tx_rs_thresh above 1 for all NICs newer than 82598. > > > > > > >> + */ > > > > > > >> + rs_deferring_allowed =3D (hw->mac.type <=3D > > > > > > >> + ixgbe_mac_82598EB); > > > > > > >> + > > > > > > >> + /* > > > > > > >> * Validate number of transmit descriptors. > > > > > > >> * It must not exceed hardware maximum, and must be multiple > > > > > > >> * of IXGBE_ALIGN. > > > > > > >> @@ -2110,6 +2118,8 @@ ixgbe_dev_tx_queue_setup(struct > > > > > > >> rte_eth_dev > > > > > > *dev, > > > > > > >> * to transmit a packet is greater than the number of free TX > > > > > > >> * descriptors. > > > > > > >> * The following constraints must be satisfied: > > > > > > >> + * tx_rs_thresh must be less than 2 for NICs for which R= S > > > deferring is > > > > > > >> + * forbidden (all but 82598). > > > > > > >> * tx_rs_thresh must be greater than 0. > > > > > > >> * tx_rs_thresh must be less than the size of the ring > > > minus 2. > > > > > > >> * tx_rs_thresh must be less than or equal to tx_free_thresh. > > > > > > >> @@ -2121,9 +2131,20 @@ ixgbe_dev_tx_queue_setup(struct > > > rte_eth_dev > > > > > > *dev, > > > > > > >> * When set to zero use default values. > > > > > > >> */ > > > > > > >> tx_rs_thresh =3D (uint16_t)((tx_conf->tx_rs_thresh) ? > > > > > > >> - tx_conf->tx_rs_thresh : DEFAULT_TX_RS_THRESH); > > > > > > >> + tx_conf->tx_rs_thresh : > > > > > > >> + (rs_deferring_allowed ? DEFAULT_TX_RS_THRESH : 1)); > > > > > > >> tx_free_thresh =3D (uint16_t)((tx_conf->tx_free_thresh) ? > > > > > > >> tx_conf->tx_free_thresh : DEFAULT_TX_FREE_THRESH); > > > > > > >> + > > > > > > >> + if (!rs_deferring_allowed && tx_rs_thresh > 1) { > > > > > > >> + PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than > > > 2 since RS > > > > > > " > > > > > > >> + "must be set for every packet > > > for this HW. " > > > > > > >> + "(tx_rs_thresh=3D%u port=3D%d queue=3D%d)", > > > > > > >> + (unsigned int)tx_rs_thresh, > > > > > > >> + (int)dev->data->port_id, (int)queue_idx); > > > > > > >> + return -(EINVAL); > > > > > > >> + } > > > > > > >> + > > > > > > >> if (tx_rs_thresh >=3D (nb_desc - 2)) { > > > > > > >> PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than > > > > > > >> the > > > > > > number " > > > > > > >> "of TX descriptors minus 2. > > > (tx_rs_thresh=3D%u " > > > > > > >> -- > > > > > > >> 2.1.0 > > > > > > > > >