From mboxrd@z Thu Jan 1 00:00:00 1970
From: Edwin Brossette
Date: Thu, 5 Sep 2024 18:07:12 +0200
Subject: net/failsafe: segfault happens on hotplug alarm.
To: dev@dpdk.org
Cc: Didier Pallard, Olivier Matz, Laurent Hardy, grive@u256.net
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="0000000000006da3560621617c54"
List-Id: DPDK patches and discussions

--0000000000006da3560621617c54
Content-Type: text/plain; charset="UTF-8"

Hello,

I recently ran into an issue when using DPDK's failsafe pmd on a Microsoft Azure setup. On this setup, I have the failsafe pmd managing a netvsc interface with a Mellanox nic (which can be used through the hardware acceleration feature). A segfault sometimes occurs when I unplug the Mellanox device, which is a case that should be handled by the pmd. On a more recent DPDK version (I tested with stable v23.11.1), this segfault is systematic.

This seems to happen because the function rte_eth_dev_release_port() is called twice when the hotplug_alarm triggers. You can see it in this bit of code here:
https://git.dpdk.org/dpdk/tree/drivers/net/failsafe/failsafe_ether.c#n276

In the fs_dev_remove() function, the rte_eth_dev_close() call runs rte_eth_dev_release_port() a first time, and it is then called a second time when handling the DEV_PROBED case. I noticed when searching the mailing list that this problem was already seen once and a patch was suggested. Here is the link to the mail:
https://mails.dpdk.org/archives/dev/2022-November/256898.html

Applying this patch on my local branch fixed the issue on my end, although I cannot say for sure that it is the best possible fix. I could see a memory leak happening if fs_dev_remove() were called while sdev->state == DEV_PROBED, as the release would then never be done. But it is still weird that if sdev->state == DEV_STARTED, we call rte_eth_dev_release_port() first and then rte_dev_remove(), and if sdev->state == DEV_PROBED, we call them in the opposite order.

Perhaps someone could look into it?

Regards,
Edwin Brossette.

--0000000000006da3560621617c54
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

--0000000000006da3560621617c54--