From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr0-f177.google.com (mail-wr0-f177.google.com [209.85.128.177]) by dpdk.org (Postfix) with ESMTP id 1B7607CB6 for ; Tue, 1 Aug 2017 12:50:47 +0200 (CEST) Received: by mail-wr0-f177.google.com with SMTP id y43so5154213wrd.3 for ; Tue, 01 Aug 2017 03:50:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=6wind-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=hFF1QRJj82+VKaaNorZJfFrYrryu7LTk+O5MHmLk358=; b=rkWnTBZnPJM2Yani33Vp4M4WR985s0LT/vTVwuEUawPqzQgBinPnTwlPzk2o5PNtSe 9A2f67X9lUOvDpM8ov73hknaatih8XacLgOTi0/bGCi2AzXPG3gIpzVBMepUJ0xgCaPa jLNX9su8eIycvsTiKEfrXXZ4oTTA5gV44x4lDvJWk29Vd0C0a04+qQ+X0q0pLTi35mD9 rzlBY9KUy8ls5f/GzB0vjEdUYSeT7lSjeqiGQYNoPrQfJXi2wOjt0g+PS1rNlNMZYDHg AH5sqxIQ4aNeLgOMt9yx/NkM7z9mR/7SnDNTXFCeuURt54iJbSq+qof6p97TCP0Iwdj8 OvoA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=hFF1QRJj82+VKaaNorZJfFrYrryu7LTk+O5MHmLk358=; b=lGoeDoplLrbTS4deTpitCf5DNlFT6VaTM/CiCir5P4PYGyt0lsJl49boklwRcJ6uRf UlSDjSoz5e5sqQivCeDeUhi5vi+RM6Kad8VLHvpaG75ktBsRHmJyxVeZTLog4x1L85JS du5hRcgiVRUvx2UvgTKzIdyW2iYAW8mvOt1zOdLlIHK/CcAs9VPMwGNjLBZ8i3Tgog68 z1si5UxaW4a+TyICRIPaB/Ov4toEw9nBSZxxItWHjq4VGph+8Rc2uxtQdycq5gAQp1Z7 lQV8ln4WKCiEaBYx07gDL6UMfuhCC2/+hZKlFLB7XSCeY0sh9fUnX7qvLtIhO8++HCSd 21nA== X-Gm-Message-State: AIVw111g/V/3neMTJKI2un6SgXvJN4QTjkvamlGsOgoBgIkzYxUDLaHh kjOmbsDCYN3hkABq2Qk= X-Received: by 10.223.168.110 with SMTP id l101mr13947349wrc.251.1501584646808; Tue, 01 Aug 2017 03:50:46 -0700 (PDT) Received: from 6wind.com (host.78.145.23.62.rev.coltfrance.com. [62.23.145.78]) by smtp.gmail.com with ESMTPSA id l124sm841367wml.26.2017.08.01.03.50.45 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 01 Aug 2017 03:50:45 -0700 (PDT) Date: Tue, 1 Aug 2017 12:50:37 +0200 From: Adrien Mazarguil To: Matan Azrad Cc: "dev@dpdk.org" , Thomas Monjalon , Olga Shern , "stable@dpdk.org" Message-ID: <20170801105037.GT19852@6wind.com> References: <1501499709-19873-1-git-send-email-matan@mellanox.com> <20170731141728.GO19852@6wind.com> <20170801094221.GQ19852@6wind.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Subject: Re: [dpdk-dev] [PATCH] net/mlx4: workaround to verbs wrong error return X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 01 Aug 2017 10:50:47 -0000 Hi Matan, (snipping a bit of unnecessary context) On Tue, Aug 01, 2017 at 10:12:29AM +0000, Matan Azrad wrote: [...] > > From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com] [...] > > On Mon, Jul 31, 2017 at 04:56:33PM +0000, Matan Azrad wrote: [...] > > > > From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com] [...] > > > > On Mon, Jul 31, 2017 at 02:15:09PM +0300, Matan Azrad wrote: [...] > > > > > @@ -1278,7 +1287,10 @@ struct rte_flow * > > > > > for (flow = LIST_FIRST(&priv->flows); > > > > > flow; > > > > > flow = LIST_NEXT(flow, next)) { > > > > > - claim_zero(ibv_destroy_flow(flow->ibv_flow)); > > > > > + /* Current verbs does not allow to check real > > > > > + * errors when the device was plugged out. > > > > > + */ > > > > > + ibv_destroy_flow(flow->ibv_flow); > > > > > flow->ibv_flow = NULL; > > > > > DEBUG("Flow %p removed", (void *)flow); > > > > > } > > > > > -- > > > > > 1.8.3.1 > > > > > > > > > > > > > This approach looks way too intrusive. How about making the > > > > claim_zero() definition not fail but still complain when compiled > > > > against a broken Verbs version instead? > > > > > > > > #include "mlx4_autoconf.h" > > > > > > > > [...] > > > > > > > > #ifndef HAVE_BROKEN_VERBS > > > > #define claim_zero(...) assert((__VA_ARGS__) == 0) #else /* > > > > HAVE_BROKEN_VERBS */ #define claim_zero(...) \ > > > > (void)(((__VA_ARGS__) == 0) || \ > > > > DEBUG("Assertion `" # __VA_ARGS__ "' failed (IGNORED)")) > > > > #endif /* HAVE_BROKEN_VERBS */ > > > > > > > > You could use auto-config-h.sh to generate the HAVE_BROKEN_VERBS > > > > definition in mlx4_autoconf.h (see mlx4 Makefile) based on some > > > > symbol, macro or type that only exists or doesn't exist yet in > > > > problematic releases for instance. > > > > > > > > > > I agree with the dependence on broken verbs but there are other places > > > in mlx4 code which use claim_zero assertion, So this suggestion will > > > hurt other validations. > > > > Well, half broken is no better than completely broken in my opinion, so while > > Verbs is being repaired, users debugging the mlx4 PMD will temporarily get > > debug traces without the ensuing abort(). At least the behavior will be > > consistent. > > > > Think about it, they already have to go out of their way to enable > > CONFIG_RTE_LIBRTE_MLX4_DEBUG, if they know they aren't using hot-plug > > but still use a buggy Verbs version, they can disable HAVE_BROKEN_VERBS to > > revert to the normal behavior. > > > > priv_flow_validate and priv_mac_addr_add functions calls also are wrapped by claim_zero, > These are not ibv_destroy functions and don't depend only in broken verbs, > The user want to be aborted in those cases otherwise he would have put there trace print > as you suggest. As the only exceptions, if you had to change something it would be these instance in order to be less intrusive. But I suggest you not to since, again, this is a workaround for a problem that is not under PMD control. > > > What's about to create new define depend on broken verbs for the specific > > assertions? > > > It will be still intrusive but more accurate. > > > > One reason I prefer the code to remain unchanged is that I'm currently > > refactoring the entire PMD. Maintaining the above patch (picking the right > > ibv_*() calls that return a consistent value) will be difficult and an intrusive > > patch won't be reverted easily once Verbs is fixed. > > > > You can just find all claim_zero_new and replace it with claim_zero. So what if I'm adding new ibv_destroy_qp() calls, can I expect them to work consistently or will each of them have to be validated against a possible assert() failure during hot-plug? Note that ibv_destroy_qp() is only one example among many, the work you've done for this patch will have to be performed every time new code is added. I don't find it particularly convenient. > > All these claim_zero() checks ensure the PMD destroys Verbs resources in > > the proper order (e.g. a flow before the QP it is associated with). If the > > return value of any of these cannot be relied on, it's useless to only check > > some of them. > > > priv_flow_validate and priv_mac_addr_add functions are not destroy verbs resources. Right, and that's not a problem. Unfortunate users still get a nice debugging message in the unlikely case of a failure for these. > > Moreover if ibv_destroy_something() wrongly returns an error when the > > device is unplugged, I think this can happen to the calls not part of your > > patch, i.e. all of them, so working around it at the macro definition level > > makes sense. > > I checked with failsafe tests and found that only the specific destroy functions are problematic. What about other applications and corner cases, such as when the device is unplugged while the PMD is being initialized? Since the control path is bound by system calls, the PMD might actually sleep for a non negligible amount of time (there is really no upper limit to how long) and the device could disappear at any point. Subsequent ibv_*() calls would return unexpected errors. I'm sure you cannot verify all corner cases, so let's avoid them. > > If you don't know what symbol can be relied on in OFED 4.2 to define > > HAVE_BROKEN_VERBS (which is just an example, you can use another name > > BTW), maybe you can add a compilation option to enable manually in case of > > trouble? Something verbose like: > > > > CONFIG_RTE_LIBRTE_MLX4_DEBUG_BROKEN_VERBS_ASSERT=n > > > > Which will have to be documented. What about the above suggestion? -- Adrien Mazarguil 6WIND