Message-Id: <87c84612-4116-4fe7-a711-f5f364513c3d@www.fastmail.com>
In-Reply-To: <20211021214215.1633-1-vipul.ashri@oracle.com>
References: <20211021115139.2634-1-vipul.ashri@oracle.com>
 <20211021214215.1633-1-vipul.ashri@oracle.com>
Date: Mon, 22 Nov 2021 11:23:01 +0100
From: Gaëtan Rivet
To: vipul.ashri@oracle.com, dev@dpdk.org
Cc: stable@dpdk.org
Subject: Re: [PATCH v2] net/failsafe: link_update request crashing at boot
Content-Type: text/plain
List-Id: DPDK patches and discussions

On Thu, Oct 21, 2021, at 23:42, vipul.ashri@oracle.com wrote:
> From: Vipul Ashri
>
> failsafe crashed while sending early link_update request during
> boot time initialization.
> Based on debugging we found failsafe device was good but sub-
> devices were progressing towards initialization and SUBOPS macro
> where expanding macro gives [partial_dev]->dev_ops->link_update()
> execution of which triggered crash because dev_ops==0.
> similar crash seen at failsafe_eth_dev_close()
>
> Failsafe driver need a separate check for subdevices similar to
> "RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);" which is
> called to almost every eth_dev function.
>
> Fixes: a46f8d5 ("net/failsafe: add fail-safe PMD")
> Cc: stable@dpdk.org
> Signed-off-by: Vipul Ashri

Hello Vipul,

I'm sorry for the delay, I missed your fix on the mailing list.

IIUC, the issue is that failsafe finished init and received an ethdev
operation call, but one of its sub-devices, although marked DEV_ACTIVE,
has its eth_dev->dev_ops field set to NULL.

This is really surprising to me, because there aren't many ways for a
sub-device to become DEV_ACTIVE. The only two ways are:

  * by executing 'fs_dev_configure()', which first executes
    rte_eth_dev_configure() on the sub-device and, on error, stops
    *without* setting DEV_ACTIVE. rte_eth_dev_configure() itself
    executes RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV), so it
    would return a negative errno and fs_dev_configure() would abort.

  * by executing 'fs_dev_remove()' on a sub-device that was
    DEV_STARTED to begin with; it is then downgraded to DEV_ACTIVE
    once stopped.

So I don't yet understand how a sub-device can become DEV_ACTIVE
while its eth_dev->dev_ops is NULL. It looks more like a bug, memory
corruption, or just an unexpected execution pattern.

Could you describe the execution in more detail? In particular, please
set the failsafe log level to debug with the EAL option:

  --log-level pmd.net.failsafe:debug

for example while using testpmd or your DPDK app. It should show
ethdev-level accesses to the sub-devices, and error values.

Best regards,
--
Gaetan Rivet
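
P.S. For illustration only (this is not failsafe code): a minimal
standalone C sketch of the two DEV_ACTIVE transitions described above,
plus a diagnostic check for the state this thread is about, namely a
sub-device marked DEV_ACTIVE whose dev_ops is NULL. All sketch_* names,
the simplified structures, and the DEV_PROBED starting state are
assumptions made for the example; the real driver's definitions differ.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT the real DPDK/failsafe types. */
enum sketch_state {
    SKETCH_DEV_PROBED,  /* assumed state before configuration */
    SKETCH_DEV_ACTIVE,
    SKETCH_DEV_STARTED
};

struct sketch_ops {
    int (*link_update)(void);
};

struct sketch_eth_dev {
    const struct sketch_ops *dev_ops;
};

struct sketch_sub_device {
    enum sketch_state state;
    struct sketch_eth_dev *eth_dev;
};

/*
 * Path 1: configuration. configure_ret stands in for the value returned
 * by rte_eth_dev_configure() on the sub-device. On error the state is
 * left untouched, so the sub-device never becomes ACTIVE.
 */
static int
sketch_configure(struct sketch_sub_device *sdev, int configure_ret)
{
    if (configure_ret < 0)
        return configure_ret;
    sdev->state = SKETCH_DEV_ACTIVE;
    return 0;
}

/*
 * Path 2: removal. A sub-device that was STARTED is stopped and falls
 * back to ACTIVE; any other state is left as it is.
 */
static void
sketch_stop_on_remove(struct sketch_sub_device *sdev)
{
    if (sdev->state == SKETCH_DEV_STARTED)
        sdev->state = SKETCH_DEV_ACTIVE;
}

/*
 * Diagnostic: returns 0 when the sub-device is consistent, -ENODEV when
 * it is ACTIVE (or beyond) but has no usable dev_ops.
 */
static int
sketch_check_active(const struct sketch_sub_device *sdev)
{
    if (sdev->state < SKETCH_DEV_ACTIVE)
        return 0;
    if (sdev->eth_dev == NULL || sdev->eth_dev->dev_ops == NULL)
        return -ENODEV;
    return 0;
}
```

Calling sketch_check_active() right after each transition would show
which of the two paths, if either, leaves a sub-device ACTIVE without
dev_ops; if neither does, the state is being changed somewhere else.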