From: Matan Azrad
To: Gaëtan Rivet
CC: Adrien Mazarguil, Thomas Monjalon, "dev@dpdk.org", "stable@dpdk.org"
Date: Thu, 14 Dec 2017 10:40:22 +0000
Subject: Re: [dpdk-dev] [PATCH v2 4/4] net/failsafe: fix removed device handling

Hi Gaetan

> -----Original Message-----
> From: Gaëtan Rivet [mailto:gaetan.rivet@6wind.com]
> Sent: Wednesday, December 13, 2017 11:56 PM
> To: Matan Azrad
> Cc: Adrien Mazarguil; Thomas Monjalon; dev@dpdk.org; stable@dpdk.org
> Subject: Re: [PATCH v2 4/4] net/failsafe: fix removed device handling
>
> Hi again Matan,
>
> On Wed, Dec 13, 2017 at 05:09:16PM +0100, Gaëtan Rivet wrote:
> > On Wed, Dec 13, 2017 at 03:48:46PM +0000, Matan Azrad wrote:
> > > Hi Gaetan
> > > Thanks for the review.
> > > Some comments..
> > >
> > > > -----Original Message-----
> > > > From: Gaëtan Rivet [mailto:gaetan.rivet@6wind.com]
> > > > Sent: Wednesday, December 13, 2017 5:17 PM
> > > > To: Matan Azrad
> > > > Cc: Adrien Mazarguil; Thomas Monjalon; dev@dpdk.org; stable@dpdk.org
> > > > Subject: Re: [PATCH v2 4/4] net/failsafe: fix removed device handling
> > > >
> > > > Hi Matan,
> > > >
> > > > On Wed, Dec 13, 2017 at 02:29:30PM +0000, Matan Azrad wrote:
> > > > > There is time between the physical removal of the device until
> > > > > sub-device PMDs get a RMV interrupt. At this time DPDK PMDs and
> > > > > applications still don't know about the removal and may call
> > > > > sub-device control operation which should return an error.
> > > > >
> > > > > In previous code this error is reported to the application
> > > > > contrary to fail-safe principle that the app should not be aware of
> > > > > device removal.
> > > > >
> > > > > Add an removal check in each relevant control command error flow
> > > > > and prevent an error report to application when the sub-device is
> > > > > removed.
> > > > >
> > > > > Fixes: a46f8d5 ("net/failsafe: add fail-safe PMD")
> > > > > Fixes: b737a1e ("net/failsafe: support flow API")
> > > > > Cc: stable@dpdk.org
> > > > >
> > > >
> > > > This patch is not a fix.
> > > > It relies on an eth_dev API evolution. Without this evolution,
> > > > this patch is meaningless and would break compilation if backported in
> > > > stable branch.
> > > >
> > >
> > > It is a fix because the bug is finally solved by this patch.
> > > I agree that it cannot be backported itself, but maybe all the series should
> > > be backported.
> > > Other idea:
> > > Add new patch which documents the bug and backport it.
> > > Remove it in this patch and remove cc stable from it.
> > > What do you think?
> > >
> >
> > I think you could write a crude version that would not rely on the
> > ethdev evolution (checking sdev->remove only), which would be
> > incomplete but still better than nothing.
> > And why not in this patch document the issue.
> > Without any dependency outside failsafe, this could be backported.
> >
> > Then complete the fix with the API evolution if the new devops is
> > accepted.
> >
> > > > Please remove those tags.
> > > >
> > > > > Signed-off-by: Matan Azrad
> > > > > ---
> > > > >  drivers/net/failsafe/failsafe_flow.c    | 18 ++++++++++-------
> > > > >  drivers/net/failsafe/failsafe_ops.c     | 34 ++++++++++++++++++++++-----------
> > > > >  drivers/net/failsafe/failsafe_private.h | 10 ++++++++++
> > > > >  3 files changed, 44 insertions(+), 18 deletions(-)
> > > >
> > > > < ... >
> > > >
> > > > > +/*
> > > > > + * Check if sub device was removed.
> > > > > + */
> > > > > +static inline int
> > > > > +fs_is_removed(struct sub_device *sdev) {
> > > > > +	if (sdev->remove == 1 || rte_eth_dev_is_removed(PORT_ID(sdev)) != 0)
> > > > > +		return 1;
> > > > > +	return 0;
> > > > > +}
> > > >
> > > > Have you considered adding this check within the subdev iterator itself?
> > > > I think it would prevent you from having to add it to each return
> > > > value checks.
> > > >
> > > > It is still MT-unsafe anyway.
> > > >
> > >
> > > This fix doesn't come to solve the MT issue, It comes to solve the error
> > > report to application because of removal.
> > > Adding the check in subdev iterator doesn't make sense for this issue.
> > >
> > > Matan.
> >
> > If you add this check in the iterator itself, you would skip removed
> > devices before attempting operating upon them, right?
> >
> > Then it should probably help with your issue, unless you tested it and
> > verified that it didnt?
> >
> > Something like this:
> >
> > ---8<---
> >
> > diff --git a/drivers/net/failsafe/failsafe_private.h b/drivers/net/failsafe/failsafe_private.h
> > index d81cc3ca6..62ddc0689 100644
> > --- a/drivers/net/failsafe/failsafe_private.h
> > +++ b/drivers/net/failsafe/failsafe_private.h
> > @@ -316,8 +316,12 @@ fs_find_next(struct rte_eth_dev *dev,
> >  	subs = PRIV(dev)->subs;
> >  	tail = PRIV(dev)->subs_tail;
> >  	while (sid < tail) {
> > +		if (min_state > DEV_PROBED &&
> > +		    fs_is_removed(&subs[sid]))
> > +			goto next;
> >  		if (subs[sid].state >= min_state)
> >  			break;
> > +next:
> >  		sid++;
> >  	}
> >  	*sid_out = sid;
> >
> > --->8---
> >
> > Only issue being that it is completely racy, but as this MT-unsafe
> > property is inescapable we might as well ignore it and go for KISS.
> >
> > If that's enough, I would prefer instead of having this additional
> > check added to all rte_eth operations.
> >
>
> Ok, actually you were right here to do it this way. The "is_removed"
> check needs to happen after the operation attempt to effectively mitigate
> the possible race. Checking before attempting the call will be much less
> effective.
>
> That being said, would it be cleaner to have eth_dev ops return -ENODEV
> directly, and check against it within fail-safe?
>

I think that, given the "is_removed" semantics, it must return a Boolean
value (any value other than 0 means that the device is removed), like other
C library predicates such as isspace().

Thanks.

> --
> Gaëtan Rivet
> 6WIND