From: Matan Azrad <matan@mellanox.com>
To: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
Thomas Monjalon <thomas@monjalon.net>,
"Olga Shern" <olgas@mellanox.com>,
"stable@dpdk.org" <stable@dpdk.org>
Subject: Re: [dpdk-stable] [PATCH] net/mlx4: workaround to verbs wrong error return
Date: Tue, 1 Aug 2017 12:18:36 +0000 [thread overview]
Message-ID: <VI1PR0502MB305659C51E01B36CC54BE49BD2B30@VI1PR0502MB3056.eurprd05.prod.outlook.com> (raw)
In-Reply-To: <20170801105037.GT19852@6wind.com>
> -----Original Message-----
> From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
> Sent: Tuesday, August 1, 2017 1:51 PM
> To: Matan Azrad <matan@mellanox.com>
> Cc: dev@dpdk.org; Thomas Monjalon <thomas@monjalon.net>; Olga Shern
> <olgas@mellanox.com>; stable@dpdk.org
> Subject: Re: [PATCH] net/mlx4: workaround to verbs wrong error return
>
> Hi Matan,
>
> (snipping a bit of unnecessary context)
>
> On Tue, Aug 01, 2017 at 10:12:29AM +0000, Matan Azrad wrote:
> [...]
> > > From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
> [...]
> > > On Mon, Jul 31, 2017 at 04:56:33PM +0000, Matan Azrad wrote:
> [...]
> > > > > From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
> [...]
> > > > > On Mon, Jul 31, 2017 at 02:15:09PM +0300, Matan Azrad wrote:
> [...]
> > > > > > @@ -1278,7 +1287,10 @@ struct rte_flow *
> > > > > > for (flow = LIST_FIRST(&priv->flows);
> > > > > > flow;
> > > > > > flow = LIST_NEXT(flow, next)) {
> > > > > > - claim_zero(ibv_destroy_flow(flow->ibv_flow));
> > > > > > + /* Current verbs does not allow to check real
> > > > > > + * errors when the device was plugged out.
> > > > > > + */
> > > > > > + ibv_destroy_flow(flow->ibv_flow);
> > > > > > flow->ibv_flow = NULL;
> > > > > > DEBUG("Flow %p removed", (void *)flow);
> > > > > > }
> > > > > > --
> > > > > > 1.8.3.1
> > > > > >
> > > > >
> > > > > This approach looks way too intrusive. How about making the
> > > > > claim_zero() definition not fail but still complain when
> > > > > compiled against a broken Verbs version instead?
> > > > >
> > > > > #include "mlx4_autoconf.h"
> > > > >
> > > > > [...]
> > > > >
> > > > > #ifndef HAVE_BROKEN_VERBS
> > > > > #define claim_zero(...) assert((__VA_ARGS__) == 0) #else /*
> > > > > HAVE_BROKEN_VERBS */ #define claim_zero(...) \
> > > > > (void)(((__VA_ARGS__) == 0) || \
> > > > > DEBUG("Assertion `" # __VA_ARGS__ "' failed
> > > > > (IGNORED)")) #endif /* HAVE_BROKEN_VERBS */
> > > > >
> > > > > You could use auto-config-h.sh to generate the
> HAVE_BROKEN_VERBS
> > > > > definition in mlx4_autoconf.h (see mlx4 Makefile) based on some
> > > > > symbol, macro or type that only exists or doesn't exist yet in
> > > > > problematic releases for instance.
> > > > >
> > > >
> > > > I agree with the dependence on broken verbs but there are other
> > > > places in mlx4 code which use claim_zero assertion, So this
> > > > suggestion will hurt other validations.
> > >
> > > Well, half broken is no better than completely broken in my opinion,
> > > so while Verbs is being repaired, users debugging the mlx4 PMD will
> > > temporarily get debug traces without the ensuing abort(). At least
> > > the behavior will be consistent.
> > >
> > > Think about it, they already have to go out of their way to enable
> > > CONFIG_RTE_LIBRTE_MLX4_DEBUG, if they know they aren't using
> > > hot-plug but still use a buggy Verbs version, they can disable
> > > HAVE_BROKEN_VERBS to revert to the normal behavior.
> > >
> >
> > priv_flow_validate and priv_mac_addr_add functions calls also are
> > wrapped by claim_zero, These are not ibv_destroy functions and don't
> > depend only in broken verbs, The user want to be aborted in those
> > cases otherwise he would have put there trace print as you suggest.
>
> As the only exceptions, if you had to change something it would be these
> instance in order to be less intrusive. But I suggest you not to since, again,
> this is a workaround for a problem that is not under PMD control.
>
> > > > What's about to create new define depend on broken verbs for the
> > > > specific
> > > assertions?
> > > > It will be still intrusive but more accurate.
> > >
> > > One reason I prefer the code to remain unchanged is that I'm
> > > currently refactoring the entire PMD. Maintaining the above patch
> > > (picking the right
> > > ibv_*() calls that return a consistent value) will be difficult and
> > > an intrusive patch won't be reverted easily once Verbs is fixed.
> > >
> >
> > You can just find all claim_zero_new and replace it with claim_zero.
>
> So what if I'm adding new ibv_destroy_qp() calls, can I expect them to work
> consistently or will each of them have to be validated against a possible
> assert() failure during hot-plug?
>
> Note that ibv_destroy_qp() is only one example among many, the work
> you've done for this patch will have to be performed every time new code is
> added. I don't find it particularly convenient.
>
> > > All these claim_zero() checks ensure the PMD destroys Verbs
> > > resources in the proper order (e.g. a flow before the QP it is
> > > associated with). If the return value of any of these cannot be
> > > relied on, it's useless to only check some of them.
> >
> >
> > priv_flow_validate and priv_mac_addr_add functions are not destroy
> verbs resources.
>
> Right, and that's not a problem. Unfortunate users still get a nice debugging
> message in the unlikely case of a failure for these.
>
I don't think the user want debugging message for this functions - he want to be aborted
and to stop the program.
He could add DEBUG prints if he had want, those behaviors are really different.
> > > Moreover if ibv_destroy_something() wrongly returns an error when
> > > the device is unplugged, I think this can happen to the calls not
> > > part of your patch, i.e. all of them, so working around it at the
> > > macro definition level makes sense.
> >
> > I checked with failsafe tests and found that only the specific destroy
> functions are problematic.
>
> What about other applications and corner cases, such as when the device is
> unplugged while the PMD is being initialized? Since the control path is bound
> by system calls, the PMD might actually sleep for a non negligible amount of
> time (there is really no upper limit to how long) and the device could
> disappear at any point. Subsequent ibv_*() calls would return unexpected
> errors.
>
> I'm sure you cannot verify all corner cases, so let's avoid them.
>
> > > If you don't know what symbol can be relied on in OFED 4.2 to define
> > > HAVE_BROKEN_VERBS (which is just an example, you can use another
> > > name BTW), maybe you can add a compilation option to enable manually
> > > in case of trouble? Something verbose like:
> > >
> > > CONFIG_RTE_LIBRTE_MLX4_DEBUG_BROKEN_VERBS_ASSERT=n
> > >
> > > Which will have to be documented.
>
> What about the above suggestion?
>
So, I don't think there is perfect solution to this issue.
I will take your suggestion to depend claim_zero in verbs version.
Firstly, I will check if I can get the information for broken verbs from somewhere,
If not, I will use compilation option.
> --
> Adrien Mazarguil
> 6WIND
prev parent reply other threads:[~2017-08-01 12:18 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-07-31 11:15 Matan Azrad
2017-07-31 14:17 ` Adrien Mazarguil
2017-07-31 16:56 ` Matan Azrad
2017-08-01 9:42 ` Adrien Mazarguil
2017-08-01 10:12 ` Matan Azrad
2017-08-01 10:50 ` Adrien Mazarguil
2017-08-01 12:18 ` Matan Azrad [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=VI1PR0502MB305659C51E01B36CC54BE49BD2B30@VI1PR0502MB3056.eurprd05.prod.outlook.com \
--to=matan@mellanox.com \
--cc=adrien.mazarguil@6wind.com \
--cc=dev@dpdk.org \
--cc=olgas@mellanox.com \
--cc=stable@dpdk.org \
--cc=thomas@monjalon.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).