From: Vlad Zolotarov
To: "Ananyev, Konstantin", "Lu, Wenzhuo"
Cc: "dev@dpdk.org"
Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598
Date: Wed, 19 Aug 2015 13:02:38 +0300
Message-ID: <55D4543E.9000808@cloudius-systems.com>
In-Reply-To: <2601191342CEEE43887BDE71AB97725836A76983@irsmsx105.ger.corp.intel.com>
References: <1439489195-31553-1-git-send-email-vladz@cloudius-systems.com> <55CD7EA5.6060100@cloudius-systems.com> <6A0DE07E22DDAD4C9103DF62FEBC0909D3E116@shsmsx102.ccr.corp.intel.com> <2601191342CEEE43887BDE71AB97725836A76983@irsmsx105.ger.corp.intel.com>
List-Id: patches and discussions about DPDK

On 08/19/15 10:43, Ananyev, Konstantin wrote:
> Hi Vlad,
> Sorry for the delay with the review, I am OOO till next week.
> Meanwhile, a few questions/comments from me.

Hi, Konstantin, long time no see... ;)

>
>>>>>> This patch fixes the Tx hang we were constantly hitting with a
>>>>>> seastar-based application on an x540 NIC.
>>>>> Could you help to share with us how to reproduce the tx hang issue
>>>>> using typical DPDK examples?
>>>> Sorry, I'm not familiar enough with the typical DPDK examples to help
>>>> you here. However, this is quite irrelevant, since without this patch
>>>> the ixgbe PMD obviously violates the HW spec, as has been explained
>>>> above.
>>>>
>>>> We saw the issue when we stressed the xmit path with a lot of highly
>>>> fragmented TCP frames (packets with up to 33 fragments, with non-header
>>>> fragments as small as 4 bytes) and with all offload features enabled.

> Could you provide us with the pcap file to reproduce the issue?

Well, the thing is that it takes some time to reproduce (a few minutes of
heavy load), therefore a pcap would be quite large.

> My concern with your approach is that it would affect TX performance.

It certainly will ;) But it seems inevitable. See below.

> Right now, for simple TX the PMD usually reads only (nb_tx_desc/tx_rs_thresh)
> TXDs, while with your patch (if I understand it correctly) it has to read
> all TXDs in the HW TX ring.

If by "simple" you mean always a single fragment per Tx packet, then you are
absolutely correct. My initial patch only set RS on every EOP descriptor
without changing the rs_thresh value, and that patch worked. However, the HW
spec doesn't ensure, in the general case, that packet handling/completion
write-back completes in the same order the packets are placed on the ring
(see the "Tx arbitration schemes" chapter in the 82599 spec, for instance).
Therefore, AFAIU, one should not assume that if packet[x+1]'s DD bit is set
then packet[x] has completed too. That's why I changed the patch to what you
see now. However, if I am missing something here and your HW people confirm
in-order completion, this of course may be changed back.

> Even if we really need to set the RS bit in each TXD (I still doubt we
> really do),

Well, if you doubt it, you may ask the guys from the Intel networking
division who wrote the 82599 and x540 HW specs, where they clearly state
that. ;)

> I think inside the PMD it should still be possible to check TX completion
> in chunks.
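[Editor's note: the "check TX completion in chunks" idea, combined with the
constraint discussed above (no assumed write-back ordering, so each DD bit
must be checked individually), could look roughly like the stand-alone
sketch below. The structures and names are simplified assumptions for
illustration, not the actual ixgbe PMD code.]

```c
#include <stdint.h>

#define RING_SIZE 256
#define CHUNK     32          /* descriptors reclaimed per call */
#define DD_BIT    (1u << 0)   /* "descriptor done" status bit (simplified) */

struct txd {                  /* simplified stand-in for a HW TX descriptor */
	volatile uint32_t status; /* written back by the NIC on completion */
};

struct tx_ring {
	struct txd desc[RING_SIZE];
	uint16_t next_to_clean;   /* first descriptor not yet reclaimed */
	uint16_t nb_free;
};

/*
 * Reclaim completed descriptors in a bounded chunk. Each descriptor's DD
 * bit is checked individually, and the scan stops at the first descriptor
 * that is still in flight: completions past an unfinished packet must not
 * be trusted when write-back order is not guaranteed by the HW.
 */
static uint16_t
tx_clean_chunk(struct tx_ring *r)
{
	uint16_t cleaned = 0;
	uint16_t idx = r->next_to_clean;

	while (cleaned < CHUNK) {
		if (!(r->desc[idx].status & DD_BIT))
			break;                       /* packet still in flight */
		r->desc[idx].status = 0;             /* hand descriptor back to SW */
		idx = (uint16_t)((idx + 1) % RING_SIZE);
		cleaned++;
	}
	r->next_to_clean = idx;
	r->nb_free = (uint16_t)(r->nb_free + cleaned);
	return cleaned;
}
```

This amortizes the per-packet RS cost somewhat (one bounded scan instead of
one check per transmitted packet), while still honoring the per-descriptor
DD semantics.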
> Konstantin
>
>
>>>> Thanks,
>>>> vlad
>>>>>> Signed-off-by: Vlad Zolotarov
>>>>>> ---
>>>>>>   drivers/net/ixgbe/ixgbe_ethdev.c |  9 +++++++++
>>>>>>   drivers/net/ixgbe/ixgbe_rxtx.c   | 23 ++++++++++++++++++++++-
>>>>>>   2 files changed, 31 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
>>>>>> index b8ee1e9..6714fd9 100644
>>>>>> --- a/drivers/net/ixgbe/ixgbe_ethdev.c
>>>>>> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
>>>>>> @@ -2414,6 +2414,15 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
>>>>>>  		.txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
>>>>>>  				ETH_TXQ_FLAGS_NOOFFLOADS,
>>>>>>  	};
>>>>>> +
>>>>>> +	/*
>>>>>> +	 * According to 82599 and x540 specifications RS bit *must* be set on the
>>>>>> +	 * last descriptor of *every* packet. Therefore we will not allow the
>>>>>> +	 * tx_rs_thresh above 1 for all NICs newer than 82598.
>>>>>> +	 */
>>>>>> +	if (hw->mac.type > ixgbe_mac_82598EB)
>>>>>> +		dev_info->default_txconf.tx_rs_thresh = 1;
>>>>>> +
>>>>>>  	dev_info->hash_key_size = IXGBE_HKEY_MAX_INDEX * sizeof(uint32_t);
>>>>>>  	dev_info->reta_size = ETH_RSS_RETA_SIZE_128;
>>>>>>  	dev_info->flow_type_rss_offloads = IXGBE_RSS_OFFLOAD_ALL;
>>>>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
>>>>>> index 91023b9..8dbdffc 100644
>>>>>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
>>>>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
>>>>>> @@ -2085,11 +2085,19 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
>>>>>>  	struct ixgbe_tx_queue *txq;
>>>>>>  	struct ixgbe_hw *hw;
>>>>>>  	uint16_t tx_rs_thresh, tx_free_thresh;
>>>>>> +	bool rs_deferring_allowed;
>>>>>>
>>>>>>  	PMD_INIT_FUNC_TRACE();
>>>>>>  	hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
>>>>>>
>>>>>>  	/*
>>>>>> +	 * According to 82599 and x540 specifications RS bit *must* be set on the
>>>>>> +	 * last descriptor of *every* packet. Therefore we will not allow the
>>>>>> +	 * tx_rs_thresh above 1 for all NICs newer than 82598.
>>>>>> +	 */
>>>>>> +	rs_deferring_allowed = (hw->mac.type <= ixgbe_mac_82598EB);
>>>>>> +
>>>>>> +	/*
>>>>>>  	 * Validate number of transmit descriptors.
>>>>>>  	 * It must not exceed hardware maximum, and must be multiple
>>>>>>  	 * of IXGBE_ALIGN.
>>>>>> @@ -2110,6 +2118,8 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
>>>>>>  	 * to transmit a packet is greater than the number of free TX
>>>>>>  	 * descriptors.
>>>>>>  	 * The following constraints must be satisfied:
>>>>>> +	 *  tx_rs_thresh must be less than 2 for NICs for which RS deferring is
>>>>>> +	 *  forbidden (all but 82598).
>>>>>>  	 *  tx_rs_thresh must be greater than 0.
>>>>>>  	 *  tx_rs_thresh must be less than the size of the ring minus 2.
>>>>>>  	 *  tx_rs_thresh must be less than or equal to tx_free_thresh.
>>>>>> @@ -2121,9 +2131,20 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
>>>>>>  	 * When set to zero use default values.
>>>>>>  	 */
>>>>>>  	tx_rs_thresh = (uint16_t)((tx_conf->tx_rs_thresh) ?
>>>>>> -			tx_conf->tx_rs_thresh : DEFAULT_TX_RS_THRESH);
>>>>>> +			tx_conf->tx_rs_thresh :
>>>>>> +			(rs_deferring_allowed ? DEFAULT_TX_RS_THRESH : 1));
>>>>>>  	tx_free_thresh = (uint16_t)((tx_conf->tx_free_thresh) ?
>>>>>>  			tx_conf->tx_free_thresh : DEFAULT_TX_FREE_THRESH);
>>>>>> +
>>>>>> +	if (!rs_deferring_allowed && tx_rs_thresh > 1) {
>>>>>> +		PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than 2 since RS "
>>>>>> +				  "must be set for every packet for this HW. "
>>>>>> +				  "(tx_rs_thresh=%u port=%d queue=%d)",
>>>>>> +			     (unsigned int)tx_rs_thresh,
>>>>>> +			     (int)dev->data->port_id, (int)queue_idx);
>>>>>> +		return -(EINVAL);
>>>>>> +	}
>>>>>> +
>>>>>>  	if (tx_rs_thresh >= (nb_desc - 2)) {
>>>>>>  		PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than the number "
>>>>>>  				  "of TX descriptors minus 2. (tx_rs_thresh=%u "
>>>>>> --
>>>>>> 2.1.0
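[Editor's note: the default-selection and validation rules the patch adds to
ixgbe_dev_tx_queue_setup() can be modeled in isolation. The sketch below is
a simplified stand-alone rendering of that logic; the function name,
signature, and the DEFAULT_TX_RS_THRESH value of 32 are assumptions made for
illustration, not the literal driver code.]

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define DEFAULT_TX_RS_THRESH 32 /* assumed ixgbe default when RS deferring is allowed */

/*
 * Model of the patched threshold logic: pick the effective tx_rs_thresh
 * (falling back to 1 when the HW requires RS on every packet) and reject
 * explicit values the HW cannot honor. Returns 0 on success, -EINVAL when
 * RS deferring is forbidden but the requested threshold is above 1.
 */
static int
pick_tx_rs_thresh(bool rs_deferring_allowed, uint16_t requested,
		  uint16_t *effective)
{
	uint16_t thresh = requested ? requested :
		(rs_deferring_allowed ? DEFAULT_TX_RS_THRESH : 1);

	if (!rs_deferring_allowed && thresh > 1)
		return -EINVAL;

	*effective = thresh;
	return 0;
}
```

With this shape, an 82598 queue configured with tx_rs_thresh=0 keeps the
usual batched default, while an 82599/x540 queue silently defaults to 1 and
loudly rejects any explicit value above 1.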