Date: Mon, 30 May 2022 09:02:35 -0500 (CDT)
From: Lewis Donzis
To: dev
Cc: qiming.yang@intel.com, wenjun1.wu@intel.com, anatoly.burakov@intel.com
Subject: Hang in ixgbe_dev_link_update_share()

Using DPDK 21.11.1 on FreeBSD 13.1, calling rte_eth_link_get_nowait() appears to hang waiting for the link to come up on an XL710 or 82599-based NIC.

This call eventually makes its way to ixgbe_dev_link_update_share() with wait_to_complete set to false. Inside that function, there is this code:

    /* BSD has no interrupt mechanism, so force NIC status synchronization. */
    #ifdef RTE_EXEC_ENV_FREEBSD
        wait = 1;
    #endif

This then calls ixgbe_check_link() with wait == true, which in turn calls ixgbe_check_mac_link_generic(). There the parameter is named "link_up_wait_to_complete", and the function loops forever waiting for the link to come up, with a 100 ms delay between polls.
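To make the effect visible without hardware, here is a small self-contained C model of that call chain. It is only a sketch: the model_* functions, the model_link_is_up flag, and the max_polls cap are hypothetical stand-ins, not the actual DPDK/ixgbe code. The cap exists only so the model terminates, and the 100 ms sleep between polls is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state, not DPDK code. */
static bool model_link_is_up = false; /* stand-in for the link-up bit: link is down */
static int model_polls = 0;           /* number of simulated register reads */

/* Mirrors the shape of ixgbe_check_mac_link_generic(): when
 * link_up_wait_to_complete is set, keep re-reading until the link is up.
 * max_polls is an assumption of this model so that it always returns. */
static bool model_check_mac_link(bool link_up_wait_to_complete, int max_polls)
{
    bool link_up = model_link_is_up;
    model_polls = 1;

    if (link_up_wait_to_complete) {
        while (!link_up && model_polls < max_polls) {
            /* the real driver sleeps 100 ms here before re-reading */
            link_up = model_link_is_up;
            model_polls++;
        }
    }
    return link_up;
}

/* Mirrors ixgbe_dev_link_update_share(): with freebsd_override set,
 * wait is forced on regardless of what the caller asked for. */
static bool model_link_update(bool wait_to_complete, bool freebsd_override,
                              int max_polls)
{
    bool wait = wait_to_complete;

    if (freebsd_override)
        wait = true; /* models "#ifdef RTE_EXEC_ENV_FREEBSD wait = 1;" */
    return model_check_mac_link(wait, max_polls);
}
```

With freebsd_override set, a caller that passed wait_to_complete = false still performs max_polls reads while the link is down. With it cleared (the equivalent of removing "wait = 1"), the call returns after a single read.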
Perhaps our understanding is incorrect, but we use rte_eth_link_get_nowait() precisely because we can't tolerate any significant delay in the call; we certainly don't want to wait for the link to come up. We're just trying to find out whether it's up or down.

Empirically, removing the "wait = 1" restores normal operation.

Thanks,
lew