From mboxrd@z Thu Jan 1 00:00:00 1970
From: Matan Azrad
To: Adrien Mazarguil
CC: "dev@dpdk.org", Thomas Monjalon, "Olga Shern", "stable@dpdk.org"
Date: Tue, 1 Aug 2017 12:18:36 +0000
References: <1501499709-19873-1-git-send-email-matan@mellanox.com>
 <20170731141728.GO19852@6wind.com>
 <20170801094221.GQ19852@6wind.com>
 <20170801105037.GT19852@6wind.com>
In-Reply-To: <20170801105037.GT19852@6wind.com>
Subject: Re: [dpdk-dev] [PATCH] net/mlx4: workaround to verbs wrong error return

> -----Original Message-----
> From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
> Sent: Tuesday, August 1, 2017 1:51 PM
> To: Matan Azrad
> Cc: dev@dpdk.org; Thomas Monjalon; Olga Shern; stable@dpdk.org
> Subject: Re: [PATCH] net/mlx4: workaround to verbs wrong error return
>
> Hi Matan,
>
> (snipping a bit of unnecessary context)
>
> On Tue, Aug 01, 2017 at 10:12:29AM +0000, Matan Azrad wrote:
> [...]
> > > From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
> [...]
> > > On Mon, Jul 31, 2017 at 04:56:33PM +0000, Matan Azrad wrote:
> [...]
> > > > > From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
> [...]
> > > > > On Mon, Jul 31, 2017 at 02:15:09PM +0300, Matan Azrad wrote:
> [...]
> > > > > > @@ -1278,7 +1287,10 @@ struct rte_flow *
> > > > > >  		for (flow = LIST_FIRST(&priv->flows);
> > > > > >  		     flow;
> > > > > >  		     flow = LIST_NEXT(flow, next)) {
> > > > > > -			claim_zero(ibv_destroy_flow(flow->ibv_flow));
> > > > > > +			/* Current verbs does not allow to check real
> > > > > > +			 * errors when the device was plugged out.
> > > > > > +			 */
> > > > > > +			ibv_destroy_flow(flow->ibv_flow);
> > > > > >  			flow->ibv_flow = NULL;
> > > > > >  			DEBUG("Flow %p removed", (void *)flow);
> > > > > >  		}
> > > > > > --
> > > > > > 1.8.3.1
> > > > > >
> > > > >
> > > > > This approach looks way too intrusive. How about making the
> > > > > claim_zero() definition not fail but still complain when
> > > > > compiled against a broken Verbs version instead?
> > > > >
> > > > >  #include "mlx4_autoconf.h"
> > > > >
> > > > >  [...]
> > > > >
> > > > >  #ifndef HAVE_BROKEN_VERBS
> > > > >  #define claim_zero(...) assert((__VA_ARGS__) == 0)
> > > > >  #else /* HAVE_BROKEN_VERBS */
> > > > >  #define claim_zero(...) \
> > > > >      (void)(((__VA_ARGS__) == 0) || \
> > > > >             DEBUG("Assertion `" # __VA_ARGS__ "' failed (IGNORED)"))
> > > > >  #endif /* HAVE_BROKEN_VERBS */
> > > > >
> > > > > You could use auto-config-h.sh to generate the HAVE_BROKEN_VERBS
> > > > > definition in mlx4_autoconf.h (see mlx4 Makefile) based on some
> > > > > symbol, macro or type that only exists or doesn't exist yet in
> > > > > problematic releases for instance.
> > > > >
> > > >
> > > > I agree with the dependence on broken verbs but there are other
> > > > places in mlx4 code which use claim_zero assertion, So this
> > > > suggestion will hurt other validations.
> > > >
> > > Well, half broken is no better than completely broken in my opinion,
> > > so while Verbs is being repaired, users debugging the mlx4 PMD will
> > > temporarily get debug traces without the ensuing abort(). At least
> > > the behavior will be consistent.
> > >
> > > Think about it, they already have to go out of their way to enable
> > > CONFIG_RTE_LIBRTE_MLX4_DEBUG, if they know they aren't using
> > > hot-plug but still use a buggy Verbs version, they can disable
> > > HAVE_BROKEN_VERBS to revert to the normal behavior.
> > >
> >
> > priv_flow_validate and priv_mac_addr_add functions calls also are
> > wrapped by claim_zero, These are not ibv_destroy functions and don't
> > depend only in broken verbs, The user want to be aborted in those
> > cases otherwise he would have put there trace print as you suggest.
>
> As the only exceptions, if you had to change something it would be these
> instance in order to be less intrusive. But I suggest you not to since, again,
> this is a workaround for a problem that is not under PMD control.
>
> > > > What's about to create new define depend on broken verbs for the
> > > > specific assertions?
> > > > It will be still intrusive but more accurate.
> > >
> > > One reason I prefer the code to remain unchanged is that I'm
> > > currently refactoring the entire PMD. Maintaining the above patch
> > > (picking the right ibv_*() calls that return a consistent value)
> > > will be difficult and an intrusive patch won't be reverted easily
> > > once Verbs is fixed.
> > >
> >
> > You can just find all claim_zero_new and replace it with claim_zero.
>
> So what if I'm adding new ibv_destroy_qp() calls, can I expect them to work
> consistently or will each of them have to be validated against a possible
> assert() failure during hot-plug?
>
> Note that ibv_destroy_qp() is only one example among many, the work
> you've done for this patch will have to be performed every time new code is
> added. I don't find it particularly convenient.
>
> > > All these claim_zero() checks ensure the PMD destroys Verbs
> > > resources in the proper order (e.g. a flow before the QP it is
> > > associated with). If the return value of any of these cannot be
> > > relied on, it's useless to only check some of them.
> > >
> >
> > priv_flow_validate and priv_mac_addr_add functions are not destroy
> > verbs resources.
>
> Right, and that's not a problem. Unfortunate users still get a nice debugging
> message in the unlikely case of a failure for these.
>

I don't think the user wants a debugging message for these functions; he
wants the program to abort and stop.
He could have added DEBUG prints if he had wanted them; the two behaviors
are really different.

> > > Moreover if ibv_destroy_something() wrongly returns an error when
> > > the device is unplugged, I think this can happen to the calls not
> > > part of your patch, i.e. all of them, so working around it at the
> > > macro definition level makes sense.
> >
> > I checked with failsafe tests and found that only the specific destroy
> > functions are problematic.
>
> What about other applications and corner cases, such as when the device is
> unplugged while the PMD is being initialized? Since the control path is bound
> by system calls, the PMD might actually sleep for a non negligible amount of
> time (there is really no upper limit to how long) and the device could
> disappear at any point. Subsequent ibv_*() calls would return unexpected
> errors.
>
> I'm sure you cannot verify all corner cases, so let's avoid them.
>
> > > If you don't know what symbol can be relied on in OFED 4.2 to define
> > > HAVE_BROKEN_VERBS (which is just an example, you can use another
> > > name BTW), maybe you can add a compilation option to enable manually
> > > in case of trouble? Something verbose like:
> > >
> > >  CONFIG_RTE_LIBRTE_MLX4_DEBUG_BROKEN_VERBS_ASSERT=n
> > >
> > > Which will have to be documented.
>
> What about the above suggestion?
>

So, I don't think there is a perfect solution to this issue.
I will take your suggestion and make claim_zero depend on the Verbs version.
First, I will check whether I can get the broken-Verbs information from
somewhere; if not, I will use a compilation option.

> --
> Adrien Mazarguil
> 6WIND
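
For illustration only, here is a minimal, self-contained sketch of the
macro-level workaround discussed above. It is not the actual patch.
HAVE_BROKEN_VERBS is the example name from this thread and is defined
by hand here purely for demonstration; in the PMD it would presumably
come from mlx4_autoconf.h or a build option. report_ignored(),
ignored_failures and fake_destroy() are hypothetical stand-ins for the
PMD's DEBUG() macro and for an ibv_destroy_*() call that fails after a
hot-unplug.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: set by hand here; the thread suggests generating it
 * (or a similar symbol) at build time instead. */
#define HAVE_BROKEN_VERBS 1

/* Hypothetical counter of failures that were logged but not fatal. */
static int ignored_failures;

/* Hypothetical stand-in for DEBUG(): logs the failed expression and
 * returns 1 so it can be used on the right-hand side of ||. */
static int
report_ignored(const char *expr)
{
	++ignored_failures;
	fprintf(stderr, "Assertion `%s' failed (IGNORED)\n", expr);
	return 1;
}

#ifndef HAVE_BROKEN_VERBS
/* Normal behavior: a non-zero return value trips assert(). */
#define claim_zero(...) assert((__VA_ARGS__) == 0)
#else /* HAVE_BROKEN_VERBS */
/* Workaround: complain about a non-zero return value but carry on. */
#define claim_zero(...) \
	((void)(((__VA_ARGS__) == 0) || report_ignored(#__VA_ARGS__)))
#endif /* HAVE_BROKEN_VERBS */

/* Hypothetical destroy call: succeeds while the device is present,
 * returns an arbitrary non-zero error once it has been unplugged. */
static int
fake_destroy(int device_present)
{
	return device_present ? 0 : 5;
}
```

With HAVE_BROKEN_VERBS defined, claim_zero(fake_destroy(0)) only logs
the failure and continues. Without it, the same call would trip
assert() (unless NDEBUG is set), which is the behavior the thread wants
to keep for Verbs releases that report errors correctly.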