From: "Zhang, Qi Z" <qi.z.zhang@intel.com>
To: "Ye, MingjinX" <mingjinx.ye@intel.com>, "dev@dpdk.org" <dev@dpdk.org>
Cc: "Yang, Qiming" <qiming.yang@intel.com>,
"Zhou, YidingX" <yidingx.zhou@intel.com>,
"stable@dpdk.org" <stable@dpdk.org>
Subject: RE: [PATCH v3] net/ice: fix crash on closing representor ports
Date: Thu, 2 Nov 2023 03:59:21 +0000 [thread overview]
Message-ID: <DM4PR11MB59940F596F1E6F481D51A305D7A6A@DM4PR11MB5994.namprd11.prod.outlook.com> (raw)
In-Reply-To: <SN7PR11MB7139CC496AF0A0B64CC17D5DE5A6A@SN7PR11MB7139.namprd11.prod.outlook.com>
> -----Original Message-----
> From: Ye, MingjinX <mingjinx.ye@intel.com>
> Sent: Thursday, November 2, 2023 10:39 AM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>; dev@dpdk.org
> Cc: Yang, Qiming <qiming.yang@intel.com>; Zhou, YidingX
> <yidingx.zhou@intel.com>; stable@dpdk.org
> Subject: RE: [PATCH v3] net/ice: fix crash on closing representor ports
>
>
>
> > -----Original Message-----
> > From: Zhang, Qi Z <qi.z.zhang@intel.com>
> > Sent: Wednesday, November 1, 2023 6:49 PM
> > To: Ye, MingjinX <mingjinx.ye@intel.com>; dev@dpdk.org
> > Cc: Yang, Qiming <qiming.yang@intel.com>; Zhou, YidingX
> > <yidingx.zhou@intel.com>; stable@dpdk.org
> > Subject: RE: [PATCH v3] net/ice: fix crash on closing representor
> > ports
> >
> >
> >
> > > -----Original Message-----
> > > From: Ye, MingjinX <mingjinx.ye@intel.com>
> > > Sent: Wednesday, November 1, 2023 6:14 PM
> > > To: dev@dpdk.org
> > > Cc: Yang, Qiming <qiming.yang@intel.com>; Zhou, YidingX
> > > <yidingx.zhou@intel.com>; Ye, MingjinX <mingjinx.ye@intel.com>;
> > > stable@dpdk.org; Zhang, Qi Z <qi.z.zhang@intel.com>
> > > Subject: [PATCH v3] net/ice: fix crash on closing representor ports
> > >
> > > The data resource in struct rte_eth_dev is cleared and points to
> > > NULL when the DCF port is closed.
> > >
> > > If the DCF representor port is closed after the DCF port is closed,
> > > a segmentation fault occurs because the representor port accesses
> > > the data resource released by the DCF port.
> > >
> > > This patch checks if the resource is present before accessing.
> > >
> > > Fixes: 5674465a32c8 ("net/ice: add DCF VLAN handling")
> > > Fixes: da9cdcd1f372 ("net/ice: fix crash on representor port
> > > closing")
> > > Cc: stable@dpdk.org
> > >
> > > Signed-off-by: Mingjin Ye <mingjinx.ye@intel.com>
> > > ---
> > > v3: New solution.
> > > ---
> > > drivers/net/ice/ice_dcf_vf_representor.c | 8 +++++---
> > > 1 file changed, 5 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/net/ice/ice_dcf_vf_representor.c
> > > b/drivers/net/ice/ice_dcf_vf_representor.c
> > > index b9fcfc80ad..8c45e28f02 100644
> > > --- a/drivers/net/ice/ice_dcf_vf_representor.c
> > > +++ b/drivers/net/ice/ice_dcf_vf_representor.c
> > > @@ -111,14 +111,16 @@ ice_dcf_vf_repr_link_update(__rte_unused
> > struct
> > > rte_eth_dev *ethdev, static __rte_always_inline struct ice_dcf_hw *
> > > ice_dcf_vf_repr_hw(struct ice_dcf_vf_repr *repr) {
> > > - struct ice_dcf_adapter *dcf_adapter =
> > > - repr->dcf_eth_dev->data->dev_private;
> > > + struct rte_eth_dev_data *dcf_data = repr->dcf_eth_dev->data;
> >
> > Seems this expose another issue, if dcf port already be closed, the
> > dcf_eth_dev instance could already be reused by another driver.
> Your analysis is spot on.
>
> The reason for the issue:
> Affected by 36c46e738120c381cf663c96692454c5aa75e203 commit.
> da9cdcd1f37220e87db23993d6352637d71df25b commit does not fix the
> issue.
>
> v2 patch:
> Notify every proxy port of DCF port status.
>
> v3 patch:
> Based on da9cdcd1f37220e87db23993d636352637d71df25b, made
> optimization to fix the issue.
>
> > So we can't assume dcf_eth_dev->data is NULL, I think you can refine
> > based on v2's method, but don't update dcf_valid flag in representor
> > port's dev_stop.
> Implementation difficulties:
> 1. When the DCF is created, all associated proxy ports are created and each
> proxy port (struct rte_eth_dev) pointer is recorded in an array.
> 2. When shutting down the DCF, all all proxy ports are notified at
> ice_dcf_vf_repr_stop_all. Finally the array is deleted.
>
> Based on the original scenario, notifying all agent ports of DCF state failure is
> the simplest and most effective way. This is very similar to the v2 patch
> scenario.
Yes, that's why I suggest to refine based on v2 after notice the issue on v3, seems we need to maintain a per repr flag for DCF status, but need to update it carefully.
>
> Check each DCF port flag?
> If the DCF port has failed and the resource has been released. the DCF port
> status, naturally, is not available. This is very similar to the v3 patch scenario.
Check dev->data is NULL does not work, a better way is to check eth_dev->state and device name to make sure it is still used by DCF.
But I think v2's method is more reliable.
>
> Other ideas
> 1. Re-implement DCF port and proxy port communication, can't directly use
> (struct rte_eth_dev) pointer to access the device resources on the other end.
>
> 2. Designed to return an error when removing the proxy port directly. The
> proxy port device is managed by the control DCF (the proxy port is created by
> the DCF port and the removal work should be done by it).
>
> The above 2 options will bring a huge amount of work. It seems beyond the
> scope of this discussion.
>
next prev parent reply other threads:[~2023-11-02 3:59 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-26 9:51 [PATCH] net/iavf: " Mingjin Ye
2023-10-27 7:17 ` David Marchand
2023-10-30 2:35 ` [PATCH v2] " Mingjin Ye
2023-10-30 8:44 ` [PATCH v2] net/ice: " Mingjin Ye
2023-11-01 1:11 ` Zhang, Qi Z
2023-11-01 10:13 ` [PATCH v3] " Mingjin Ye
2023-11-01 10:48 ` Zhang, Qi Z
2023-11-02 2:38 ` Ye, MingjinX
2023-11-02 3:59 ` Zhang, Qi Z [this message]
2023-11-02 10:11 ` [PATCH v4] " Mingjin Ye
2023-11-06 1:23 ` Zhang, Qi Z
2023-11-06 10:00 ` [PATCH v5] " Mingjin Ye
2023-11-06 12:05 ` Zhang, Qi Z
2023-11-07 10:12 ` [PATCH v6] " Mingjin Ye
2023-11-07 12:18 ` Zhang, Qi Z
2023-11-08 11:39 ` [PATCH v7] " Mingjin Ye
2023-11-09 0:26 ` Zhang, Qi Z
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=DM4PR11MB59940F596F1E6F481D51A305D7A6A@DM4PR11MB5994.namprd11.prod.outlook.com \
--to=qi.z.zhang@intel.com \
--cc=dev@dpdk.org \
--cc=mingjinx.ye@intel.com \
--cc=qiming.yang@intel.com \
--cc=stable@dpdk.org \
--cc=yidingx.zhou@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).