DPDK patches and discussions
From: "Gaëtan Rivet" <grive@u256.net>
To: vipul.ashri@oracle.com, dev@dpdk.org
Cc: stable@dpdk.org
Subject: Re: [PATCH v2] net/failsafe: link_update request crashing at boot
Date: Mon, 22 Nov 2021 11:23:01 +0100
Message-ID: <87c84612-4116-4fe7-a711-f5f364513c3d@www.fastmail.com>
In-Reply-To: <20211021214215.1633-1-vipul.ashri@oracle.com>

On Thu, Oct 21, 2021, at 23:42, vipul.ashri@oracle.com wrote:
> From: Vipul Ashri <vipul.ashri@oracle.com>
>
> failsafe crashed while sending early link_update request during
> boot time initialization.
> Based on debugging we found failsafe device was good but sub-
> devices were progressing towards initialization and SUBOPS macro
> where expanding macro gives [partial_dev]->dev_ops->link_update()
> execution of which triggered crash because dev_ops==0. similar
> crash seen at failsafe_eth_dev_close()
>
> Failsafe driver need a separate check for subdevices similar to
> "RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);" which is
> called to almost every eth_dev function.
>
> Fixes: a46f8d5 ("net/failsafe: add fail-safe PMD")
> Cc: stable@dpdk.org
> Signed-off-by: Vipul Ashri <vipul.ashri@oracle.com>

Hello Vipul,

I'm sorry for the delay, I missed your fix on the mailing list.

IIUC, the issue is that failsafe finished init and received an ethdev
operation call, but one of its sub-devices, although marked DEV_ACTIVE,
has its eth_dev->dev_ops field set to NULL.
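
To make the failure mode concrete, here is a minimal stand-alone sketch of
a guarded sub-device call. It is only a model: 'struct mock_dev_ops',
'struct mock_port', 'mock_link_up()' and 'mock_sub_link_update()' are
hypothetical names, not the failsafe or ethdev API, and the -ENODEV return
value is an assumption borrowed from the port-validity check quoted above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the ethdev structures, for illustration only. */
struct mock_dev_ops {
	int (*link_update)(void *dev, int wait_to_complete);
};

struct mock_port {
	const struct mock_dev_ops *dev_ops; /* NULL while not yet initialized */
};

/* Example callback: reports success for any non-NULL device. */
static int
mock_link_up(void *dev, int wait_to_complete)
{
	assert(dev != NULL);
	(void)wait_to_complete;
	return 0;
}

/*
 * Guarded dispatch: refuse the call instead of dereferencing a NULL
 * dev_ops pointer, which is the crash described in the report.
 */
static int
mock_sub_link_update(struct mock_port *port, int wait_to_complete)
{
	if (port == NULL || port->dev_ops == NULL ||
	    port->dev_ops->link_update == NULL)
		return -ENODEV;
	return port->dev_ops->link_update(port, wait_to_complete);
}
```

With 'dev_ops' left NULL the call returns -ENODEV; once 'dev_ops' points to
an ops table containing 'mock_link_up', the call is forwarded normally.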

It is really surprising to me, because there aren't many ways for a sub-device
to become DEV_ACTIVE.

The only two ways are

  * by executing 'fs_dev_configure()', which will first execute
    rte_eth_dev_configure() on the sub-device, and on error would
    stop *without* setting DEV_ACTIVE.
    rte_eth_dev_configure() will itself execute
    RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV), so it would
    return negative errno and fs_dev_configure() would abort.

  * by executing 'fs_dev_remove()' on a sub-device that was 'DEV_STARTED'
    to begin with: it is then demoted to DEV_ACTIVE once stopped.
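
The two paths above can be modelled with a small stand-alone sketch. This
is a simplification under stated assumptions, not the driver code: all
'mock_*' names and the 'port_valid' flag are hypothetical, and the real
functions do considerably more work. Only the state names mirror the ones
used in this discussion.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Sub-device states, in increasing order of readiness (simplified). */
enum mock_dev_state {
	MOCK_DEV_PROBED,
	MOCK_DEV_ACTIVE,
	MOCK_DEV_STARTED,
};

struct mock_sub_device {
	int port_valid;            /* hypothetical: 0 models an invalid port */
	enum mock_dev_state state;
};

/* Models the port-validity check done by rte_eth_dev_configure(). */
static int
mock_eth_dev_configure(const struct mock_sub_device *sdev)
{
	assert(sdev != NULL);
	if (!sdev->port_valid)
		return -ENODEV;
	return 0;
}

/* Path 1: the state is promoted to ACTIVE only if configure succeeds. */
static int
mock_fs_dev_configure(struct mock_sub_device *sdev)
{
	int ret = mock_eth_dev_configure(sdev);

	if (ret < 0)
		return ret; /* abort, state left unchanged */
	sdev->state = MOCK_DEV_ACTIVE;
	return 0;
}

/* Path 2: a STARTED sub-device is demoted to ACTIVE once stopped. */
static void
mock_fs_dev_remove(struct mock_sub_device *sdev)
{
	if (sdev->state == MOCK_DEV_STARTED)
		sdev->state = MOCK_DEV_ACTIVE;
}
```

In this model there is no way to reach the ACTIVE state without either a
successful configure or a previous STARTED state, which is why the reported
crash is unexpected.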

So I don't understand yet how it is possible for a sub-device to become DEV_ACTIVE
while its eth_dev->dev_ops are NULL. It seems more like a bug, memory corruption or
just an unexpected execution pattern.

Could you describe the execution in more detail?
In particular, it would help to set the EAL log-level to debug with the option:
' --log-level pmd.net.failsafe:debug '
for example while using testpmd or your DPDK app.
It should show ethdev-level accesses to the sub-devices, and error values.

Best regards,
-- 
Gaetan Rivet
