From: Vladislav Zolotarov <vladz@cloudius-systems.com>
To: Helin Zhang <helin.zhang@intel.com>
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598
Date: Tue, 25 Aug 2015 22:30:20 +0300 [thread overview]
Message-ID: <CAOYyTHagAdJa-NzPyo0ZfrVm9F8tBd+Z9TfnR2tZ5V4DU8LKkg@mail.gmail.com> (raw)
In-Reply-To: <F35DEAC7BCE34641BA9FAC6BCA4A12E70A8E66A4@SHSMSX104.ccr.corp.intel.com>
On Aug 25, 2015 22:16, "Zhang, Helin" <helin.zhang@intel.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com]
> > Sent: Tuesday, August 25, 2015 11:53 AM
> > To: Zhang, Helin
> > Cc: Lu, Wenzhuo; dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above
1 for
> > all NICs but 82598
> >
> >
> >
> > On 08/25/15 21:43, Zhang, Helin wrote:
> > >
> > > Hi Vlad
> > >
> > > I think this could possibly be the root cause of your TX hang issue.
> > > Please try to limit the number to 8 or less, and then see if the issue
> > > will still be there or not?
> > >
> >
> > Helin, the issue has been seen on x540 devices. Pls., see a chapter
> > 7.2.1.1 of x540 devices spec:
> >
> > A packet (or multiple packets in transmit segmentation) can span any
number of
> > buffers (and their descriptors) up to a limit of 40 minus WTHRESH minus
2 (see
> > Section 7.2.3.3 for Tx Ring details and section Section 7.2.3.5.1 for
WTHRESH
> > details). For best performance it is recommended to minimize the number
of
> > buffers as possible.
> >
> > Could u, pls., clarify why do u think that the maximum number of data
buffers is
> > limited by 8?
> OK, i40e hardware is 8
For i40 it's a bit more complicated than just "not more than 8" - it's not
more than 8 for a non-TSO packet and not more than 8 for each MSS including
headers buffers for TSO. But this thread is not about i40e so this doesn't
seem to be relevant anyway.
, so I'd assume x540 could have a similar one.
x540 spec assumes otherwise... 😉
Yes, in your case,
> the limit could be around 38, right?
If by "around 38" u mean "exactly 38" then u are absolutely right... 😉
> Could you help to make sure there is no packet to be transmitted uses
more than
> 38 descriptors?
Just like i've already mentioned, we limit the cluster by at most 33 data
segments. Therefore we are good here...
> I heard that there is a similar hang issue on X710 if using more than 8
descriptors for
> a single packet. I am wondering if the issue is similar on x540.
What's x710? If that's xl710 40G nics (i40e driver), then it has its own
specs with its own HW limitations i've mentioned above. It has nothing to
do with this thread that is all about 10G nics managed by ixgbe driver.
There is a different thread, where i've raised the 40G NICs xmit issues.
See "i40e xmit path HW limitation" thread.
>
> Regards,
> Helin
>
> >
> > thanks,
> > vlad
> >
> > > It does not have any check for the number of descriptors to be used
> > > for a single packet, and it relies on the users to give correct mbuf
> > > chains.
> > >
> > > We may need a check of this somewhere. Of cause the point you
> > > indicated we also need to carefully investigate or fix.
> > >
> > > Regards,
> > >
> > > Helin
> > >
> > > *From:*Vladislav Zolotarov [mailto:vladz@cloudius-systems.com]
> > > *Sent:* Tuesday, August 25, 2015 11:34 AM
> > > *To:* Zhang, Helin
> > > *Cc:* Lu, Wenzhuo; dev@dpdk.org
> > > *Subject:* RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh
> > > above 1 for all NICs but 82598
> > >
> > >
> > > On Aug 25, 2015 21:14, "Zhang, Helin" <helin.zhang@intel.com
> > > <mailto:helin.zhang@intel.com>> wrote:
> > > >
> > > > Hi Vlad
> > > >
> > > >
> > > >
> > > > In addition, I’d double check with you what’s the maximum number of
> > > descriptors would be used for a single packet transmitting?
> > > >
> > > > Datasheet said that it supports up to 8. I am wondering if more than
> > > 8 were used in your case?
> > >
> > > If memory serves me well the maximum number of data descriptors per
> > > single xmit packet is 40 minus 2 minus WTHRESH. Since WTHRESH in DPDK
> > > is always zero it gives us 38 segments. We limit them by 33.
> > >
> > > >
> > > > Thank you very much!
> > > >
> > > >
> > > >
> > > > Regards,
> > > >
> > > > Helin
> > > >
> > > >
> > > >
> > > > From: Zhang, Helin
> > > > Sent: Wednesday, August 19, 2015 10:29 AM
> > > > To: Vladislav Zolotarov
> > > > Cc: Lu, Wenzhuo; dev@dpdk.org <mailto:dev@dpdk.org>
> > > >
> > > > Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh
> > > above 1 for all NICs but 82598
> > > >
> > > >
> > > >
> > > > Hi Vlad
> > > >
> > > >
> > > >
> > > > Thank you very much for the patches! Give me a few more time to
> > > double check with more guys, and possibly hardware experts.
> > > >
> > > >
> > > >
> > > > Regards,
> > > >
> > > > Helin
> > > >
> > > >
> > > >
> > > > From: Vladislav Zolotarov [mailto:vladz@cloudius-systems.com
> > > <mailto:vladz@cloudius-systems.com>]
> > > > Sent: Tuesday, August 18, 2015 9:56 PM
> > > > To: Lu, Wenzhuo
> > > > Cc: dev@dpdk.org <mailto:dev@dpdk.org>; Zhang, Helin
> > > > Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh
> > > above 1 for all NICs but 82598
> > > >
> > > >
> > > >
> > > >
> > > > On Aug 19, 2015 03:42, "Lu, Wenzhuo" <wenzhuo.lu@intel.com
> > > <mailto:wenzhuo.lu@intel.com>> wrote:
> > > > >
> > > > > Hi Helin,
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: dev [mailto:dev-bounces@dpdk.org
> > > <mailto:dev-bounces@dpdk.org>] On Behalf Of Vlad Zolotarov
> > > > > > Sent: Friday, August 14, 2015 1:38 PM
> > > > > > To: Zhang, Helin; dev@dpdk.org <mailto:dev@dpdk.org>
> > > > > > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid
> > > tx_rs_thresh above 1 for
> > > > > > all NICs but 82598
> > > > > >
> > > > > >
> > > > > >
> > > > > > On 08/13/15 23:28, Zhang, Helin wrote:
> > > > > > > Hi Vlad
> > > > > > >
> > > > > > > I don't think the changes are needed. It says in datasheet
> > > that the RS
> > > > > > > bit should be set on the last descriptor of every packet, ONLY
> > > WHEN
> > > > > > TXDCTL.WTHRESH equals to ZERO.
> > > > > >
> > > > > > Of course it's needed! See below.
> > > > > > Exactly the same spec a few lines above the place u've just
> > > quoted states:
> > > > > >
> > > > > > "Software should not set the RS bit when TXDCTL.WTHRESH is
> > > greater than
> > > > > > zero."
> > > > > >
> > > > > > And since all three (3) ixgbe xmit callbacks are utilizing RS
> > > bit notation ixgbe PMD
> > > > > > is actually not supporting any value of WTHRESH different from
zero.
> > > > > I think Vlad is right. We need to fix this issue. Any suggestion?
> > > If not, I'd like to ack this patch.
> > > >
> > > > Pls., note that there is a v2 of this patch on the list. I forgot to
> > > patch ixgbevf_dev_info_get() in v1.
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > Regards,
> > > > > > > Helin
> > > > > > >
> > > > > > >> -----Original Message-----
> > > > > > >> From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com
> > > <mailto:vladz@cloudius-systems.com>]
> > > > > > >> Sent: Thursday, August 13, 2015 11:07 AM
> > > > > > >> To: dev@dpdk.org <mailto:dev@dpdk.org>
> > > > > > >> Cc: Zhang, Helin; Ananyev, Konstantin;
> > > avi@cloudius-systems.com <mailto:avi@cloudius-systems.com>; Vlad
> > > > > > >> Zolotarov
> > > > > > >> Subject: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh
> > > above 1 for
> > > > > > all
> > > > > > >> NICs but 82598
> > > > > > >>
> > > > > > >> According to 82599 and x540 HW specifications RS bit *must*
> > > be set in the
> > > > > > last
> > > > > > >> descriptor of *every* packet.
> > > > > > > There is a condition that if TXDCTL.WTHRESH equal to zero.
> > > > > >
> > > > > > Right and ixgbe PMD requires this condition to be fulfilled in
> > > order to
> > > > > > function. See above.
> > > > > >
> > > > > > >
> > > > > > >> This patch fixes the Tx hang we were constantly hitting with
> > > a seastar-based
> > > > > > >> application on x540 NIC.
> > > > > > > Could you help to share with us how to reproduce the tx hang
> > > issue, with using
> > > > > > > typical DPDK examples?
> > > > > >
> > > > > > Sorry. I'm not very familiar with the typical DPDK examples to
> > > help u
> > > > > > here. However this is quite irrelevant since without this this
> > > > > > patch ixgbe PMD obviously abuses the HW spec as has been
explained
> > above.
> > > > > >
> > > > > > We saw the issue when u stressed the xmit path with a lot of
> > > > > > highly fragmented TCP frames (packets with up to 33 fragments
> > > > > > with
> > > non-headers
> > > > > > fragments as small as 4 bytes) with all offload features
enabled.
> > > > > >
> > > > > > Thanks,
> > > > > > vlad
> > > > > > >
> > > > > > >> Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com
> > > <mailto:vladz@cloudius-systems.com>>
> > > > > > >> ---
> > > > > > >> drivers/net/ixgbe/ixgbe_ethdev.c | 9 +++++++++
> > > > > > >> drivers/net/ixgbe/ixgbe_rxtx.c | 23
++++++++++++++++++++++-
> > > > > > >> 2 files changed, 31 insertions(+), 1 deletion(-)
> > > > > > >>
> > > > > > >> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c
> > > > > > >> b/drivers/net/ixgbe/ixgbe_ethdev.c
> > > > > > >> index b8ee1e9..6714fd9 100644
> > > > > > >> --- a/drivers/net/ixgbe/ixgbe_ethdev.c
> > > > > > >> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
> > > > > > >> @@ -2414,6 +2414,15 @@ ixgbe_dev_info_get(struct rte_eth_dev
> > > *dev,
> > > > > > struct
> > > > > > >> rte_eth_dev_info *dev_info)
> > > > > > >> .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
> > > > > > >> ETH_TXQ_FLAGS_NOOFFLOADS,
> > > > > > >> };
> > > > > > >> +
> > > > > > >> + /*
> > > > > > >> + * According to 82599 and x540 specifications RS bit
> > > *must* be set on
> > > > > > the
> > > > > > >> + * last descriptor of *every* packet. Therefore we will
> > > not allow the
> > > > > > >> + * tx_rs_thresh above 1 for all NICs newer than 82598.
> > > > > > >> + */
> > > > > > >> + if (hw->mac.type > ixgbe_mac_82598EB)
> > > > > > >> + dev_info->default_txconf.tx_rs_thresh = 1;
> > > > > > >> +
> > > > > > >> dev_info->hash_key_size = IXGBE_HKEY_MAX_INDEX *
> > > sizeof(uint32_t);
> > > > > > >> dev_info->reta_size = ETH_RSS_RETA_SIZE_128;
> > > > > > >> dev_info->flow_type_rss_offloads = IXGBE_RSS_OFFLOAD_ALL;
> > > > > > >> diff --
> > > > > > git
> > > > > > >> a/drivers/net/ixgbe/ixgbe_rxtx.c
> > > b/drivers/net/ixgbe/ixgbe_rxtx.c index
> > > > > > >> 91023b9..8dbdffc 100644
> > > > > > >> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> > > > > > >> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> > > > > > >> @@ -2085,11 +2085,19 @@ ixgbe_dev_tx_queue_setup(struct
> > > rte_eth_dev
> > > > > > >> *dev,
> > > > > > >> struct ixgbe_tx_queue *txq;
> > > > > > >> struct ixgbe_hw *hw;
> > > > > > >> uint16_t tx_rs_thresh, tx_free_thresh;
> > > > > > >> + bool rs_deferring_allowed;
> > > > > > >>
> > > > > > >> PMD_INIT_FUNC_TRACE();
> > > > > > >> hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> > > > > > >>
> > > > > > >> /*
> > > > > > >> + * According to 82599 and x540 specifications RS bit
> > > *must* be set on
> > > > > > the
> > > > > > >> + * last descriptor of *every* packet. Therefore we will
> > > not allow the
> > > > > > >> + * tx_rs_thresh above 1 for all NICs newer than 82598.
> > > > > > >> + */
> > > > > > >> + rs_deferring_allowed = (hw->mac.type <=
> > > > > > >> + ixgbe_mac_82598EB);
> > > > > > >> +
> > > > > > >> + /*
> > > > > > >> * Validate number of transmit descriptors.
> > > > > > >> * It must not exceed hardware maximum, and must be
multiple
> > > > > > >> * of IXGBE_ALIGN.
> > > > > > >> @@ -2110,6 +2118,8 @@ ixgbe_dev_tx_queue_setup(struct
> > > > > > >> rte_eth_dev
> > > > > > *dev,
> > > > > > >> * to transmit a packet is greater than the number of
free TX
> > > > > > >> * descriptors.
> > > > > > >> * The following constraints must be satisfied:
> > > > > > >> + * tx_rs_thresh must be less than 2 for NICs for which RS
> > > deferring is
> > > > > > >> + * forbidden (all but 82598).
> > > > > > >> * tx_rs_thresh must be greater than 0.
> > > > > > >> * tx_rs_thresh must be less than the size of the ring
> > > minus 2.
> > > > > > >> * tx_rs_thresh must be less than or equal to
tx_free_thresh.
> > > > > > >> @@ -2121,9 +2131,20 @@ ixgbe_dev_tx_queue_setup(struct
> > > rte_eth_dev
> > > > > > *dev,
> > > > > > >> * When set to zero use default values.
> > > > > > >> */
> > > > > > >> tx_rs_thresh = (uint16_t)((tx_conf->tx_rs_thresh) ?
> > > > > > >> - tx_conf->tx_rs_thresh : DEFAULT_TX_RS_THRESH);
> > > > > > >> + tx_conf->tx_rs_thresh :
> > > > > > >> + (rs_deferring_allowed ? DEFAULT_TX_RS_THRESH : 1));
> > > > > > >> tx_free_thresh = (uint16_t)((tx_conf->tx_free_thresh) ?
> > > > > > >> tx_conf->tx_free_thresh : DEFAULT_TX_FREE_THRESH);
> > > > > > >> +
> > > > > > >> + if (!rs_deferring_allowed && tx_rs_thresh > 1) {
> > > > > > >> + PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than
> > > 2 since RS
> > > > > > "
> > > > > > >> + "must be set for every packet
> > > for this HW. "
> > > > > > >> + "(tx_rs_thresh=%u port=%d queue=%d)",
> > > > > > >> + (unsigned int)tx_rs_thresh,
> > > > > > >> + (int)dev->data->port_id, (int)queue_idx);
> > > > > > >> + return -(EINVAL);
> > > > > > >> + }
> > > > > > >> +
> > > > > > >> if (tx_rs_thresh >= (nb_desc - 2)) {
> > > > > > >> PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than
> > > > > > >> the
> > > > > > number "
> > > > > > >> "of TX descriptors minus 2.
> > > (tx_rs_thresh=%u "
> > > > > > >> --
> > > > > > >> 2.1.0
> > > > >
> > >
>
next prev parent reply other threads:[~2015-08-25 19:30 UTC|newest]
Thread overview: 48+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-08-13 18:06 Vlad Zolotarov
2015-08-13 20:28 ` Zhang, Helin
2015-08-14 5:37 ` Vlad Zolotarov
2015-08-19 0:42 ` Lu, Wenzhuo
2015-08-19 4:55 ` Vladislav Zolotarov
2015-08-19 7:43 ` Ananyev, Konstantin
2015-08-19 10:02 ` Vlad Zolotarov
2015-08-20 8:41 ` Ananyev, Konstantin
2015-08-20 8:56 ` Vlad Zolotarov
2015-08-20 9:05 ` Vlad Zolotarov
2015-08-20 9:06 ` Vlad Zolotarov
2015-08-25 17:33 ` Ananyev, Konstantin
2015-08-25 17:39 ` Avi Kivity
2015-08-19 17:29 ` Zhang, Helin
2015-08-25 18:13 ` Zhang, Helin
2015-08-25 18:33 ` Vladislav Zolotarov
2015-08-25 18:43 ` Zhang, Helin
2015-08-25 18:52 ` Vlad Zolotarov
2015-08-25 19:16 ` Zhang, Helin
2015-08-25 19:23 ` Avi Kivity
2015-08-25 19:30 ` Vladislav Zolotarov [this message]
2015-08-25 20:07 ` Vlad Zolotarov
2015-08-25 20:13 ` Zhang, Helin
2015-09-09 12:18 ` Thomas Monjalon
2015-09-09 13:19 ` Ananyev, Konstantin
2015-09-11 15:17 ` Vladislav Zolotarov
2015-09-11 14:25 ` didier.pallard
2015-09-11 14:47 ` Avi Kivity
2015-09-11 14:55 ` Thomas Monjalon
2015-09-11 15:12 ` Vladislav Zolotarov
2015-09-11 15:43 ` Avi Kivity
2015-09-11 16:04 ` Vladislav Zolotarov
2015-09-11 16:07 ` Richardson, Bruce
2015-09-11 16:14 ` Vladislav Zolotarov
2015-09-11 17:44 ` Avi Kivity
2015-09-11 16:08 ` Thomas Monjalon
2015-09-11 16:18 ` Vladislav Zolotarov
2015-09-11 17:17 ` Matthew Hall
2015-09-11 17:42 ` Ananyev, Konstantin
2015-09-11 17:58 ` Matthew Hall
2015-09-11 17:48 ` Avi Kivity
2015-09-13 11:47 ` Ananyev, Konstantin
2015-09-13 12:24 ` Vlad Zolotarov
2015-09-13 12:32 ` Avi Kivity
2015-09-13 15:54 ` Ananyev, Konstantin
2015-09-13 16:01 ` Avi Kivity
2015-09-11 16:00 ` Richardson, Bruce
2015-09-11 16:13 ` Vladislav Zolotarov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAOYyTHagAdJa-NzPyo0ZfrVm9F8tBd+Z9TfnR2tZ5V4DU8LKkg@mail.gmail.com \
--to=vladz@cloudius-systems.com \
--cc=dev@dpdk.org \
--cc=helin.zhang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).