DPDK patches and discussions
 help / color / mirror / Atom feed
From: Adrien Mazarguil <adrien.mazarguil@6wind.com>
To: Matan Azrad <matan@mellanox.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
	Thomas Monjalon <thomas@monjalon.net>,
	Olga Shern <olgas@mellanox.com>,
	"stable@dpdk.org" <stable@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH] net/mlx4: workaround to verbs wrong error return
Date: Tue, 1 Aug 2017 12:50:37 +0200	[thread overview]
Message-ID: <20170801105037.GT19852@6wind.com> (raw)
In-Reply-To: <VI1PR0502MB30565471A11E693173A0EB0AD2B30@VI1PR0502MB3056.eurprd05.prod.outlook.com>

Hi Matan,

(snipping a bit of unnecessary context)

On Tue, Aug 01, 2017 at 10:12:29AM +0000, Matan Azrad wrote:
[...]
> > From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
[...]
> > On Mon, Jul 31, 2017 at 04:56:33PM +0000, Matan Azrad wrote:
[...]
> > > > From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
[...]
> > > > On Mon, Jul 31, 2017 at 02:15:09PM +0300, Matan Azrad wrote:
[...]
> > > > > @@ -1278,7 +1287,10 @@ struct rte_flow *
> > > > >  	for (flow = LIST_FIRST(&priv->flows);
> > > > >  	     flow;
> > > > >  	     flow = LIST_NEXT(flow, next)) {
> > > > > -		claim_zero(ibv_destroy_flow(flow->ibv_flow));
> > > > > +		/* Current verbs does not allow to check real
> > > > > +		 * errors when the device was plugged out.
> > > > > +		 */
> > > > > +		ibv_destroy_flow(flow->ibv_flow);
> > > > >  		flow->ibv_flow = NULL;
> > > > >  		DEBUG("Flow %p removed", (void *)flow);
> > > > >  	}
> > > > > --
> > > > > 1.8.3.1
> > > > >
> > > >
> > > > This approach looks way too intrusive. How about making the
> > > > claim_zero() definition not fail but still complain when compiled
> > > > against a broken Verbs version instead?
> > > >
> > > >  #include "mlx4_autoconf.h"
> > > >
> > > >  [...]
> > > >
> > > >  #ifndef HAVE_BROKEN_VERBS
> > > >  #define claim_zero(...) assert((__VA_ARGS__) == 0)  #else /*
> > > > HAVE_BROKEN_VERBS */  #define claim_zero(...) \
> > > >      (void)(((__VA_ARGS__) == 0) || \
> > > >             DEBUG("Assertion `" # __VA_ARGS__ "' failed (IGNORED)"))
> > > > #endif /* HAVE_BROKEN_VERBS */
> > > >
> > > > You could use auto-config-h.sh to generate the HAVE_BROKEN_VERBS
> > > > definition in mlx4_autoconf.h (see mlx4 Makefile) based on some
> > > > symbol, macro or type that only exists or doesn't exist yet in
> > > > problematic releases for instance.
> > > >
> > >
> > > I agree with the dependence on broken verbs but there are other places
> > > in mlx4 code which use claim_zero assertion, So this suggestion will
> > > hurt other validations.
> > 
> > Well, half broken is no better than completely broken in my opinion, so while
> > Verbs is being repaired, users debugging the mlx4 PMD will temporarily get
> > debug traces without the ensuing abort(). At least the behavior will be
> > consistent.
> > 
> > Think about it, they already have to go out of their way to enable
> > CONFIG_RTE_LIBRTE_MLX4_DEBUG, if they know they aren't using hot-plug
> > but still use a buggy Verbs version, they can disable HAVE_BROKEN_VERBS to
> > revert to the normal behavior.
> > 
> 
> priv_flow_validate and priv_mac_addr_add functions calls also are wrapped by claim_zero,
> These are not ibv_destroy functions and don't depend only in broken verbs,
> The user want to be aborted in those cases otherwise he would have put there trace print 
> as you suggest.

As the only exceptions, if you had to change something it would be these
instance in order to be less intrusive. But I suggest you not to since, 
again, this is a workaround for a problem that is not under PMD control.

> > > What's about to create new define depend on broken verbs for the specific
> > assertions?
> > > It will be still intrusive but more accurate.
> > 
> > One reason I prefer the code to remain unchanged is that I'm currently
> > refactoring the entire PMD. Maintaining the above patch (picking the right
> > ibv_*() calls that return a consistent value) will be difficult and an intrusive
> > patch won't be reverted easily once Verbs is fixed.
> >
> 
> You can just find all claim_zero_new and replace it with claim_zero. 

So what if I'm adding new ibv_destroy_qp() calls, can I expect them to work
consistently or will each of them have to be validated against a possible
assert() failure during hot-plug?

Note that ibv_destroy_qp() is only one example among many, the work you've
done for this patch will have to be performed every time new code is
added. I don't find it particularly convenient.

> > All these claim_zero() checks ensure the PMD destroys Verbs resources in
> > the proper order (e.g. a flow before the QP it is associated with). If the
> > return value of any of these cannot be relied on, it's useless to only check
> > some of them.
> 
> 
> priv_flow_validate and priv_mac_addr_add functions are not destroy verbs resources.

Right, and that's not a problem. Unfortunate users still get a nice
debugging message in the unlikely case of a failure for these.

> > Moreover if ibv_destroy_something() wrongly returns an error when the
> > device is unplugged, I think this can happen to the calls not part of your
> > patch, i.e. all of them, so working around it at the macro definition level
> > makes sense.
> 
> I checked with failsafe tests and found that only the specific destroy functions are problematic.

What about other applications and corner cases, such as when the device is
unplugged while the PMD is being initialized? Since the control path is
bound by system calls, the PMD might actually sleep for a non negligible
amount of time (there is really no upper limit to how long) and the device
could disappear at any point. Subsequent ibv_*() calls would return
unexpected errors.

I'm sure you cannot verify all corner cases, so let's avoid them.

> > If you don't know what symbol can be relied on in OFED 4.2 to define
> > HAVE_BROKEN_VERBS (which is just an example, you can use another name
> > BTW), maybe you can add a compilation option to enable manually in case of
> > trouble? Something verbose like:
> > 
> >  CONFIG_RTE_LIBRTE_MLX4_DEBUG_BROKEN_VERBS_ASSERT=n
> > 
> > Which will have to be documented.

What about the above suggestion?

-- 
Adrien Mazarguil
6WIND

  reply	other threads:[~2017-08-01 10:50 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-31 11:15 Matan Azrad
2017-07-31 14:17 ` Adrien Mazarguil
2017-07-31 16:56   ` Matan Azrad
2017-08-01  9:42     ` Adrien Mazarguil
2017-08-01 10:12       ` Matan Azrad
2017-08-01 10:50         ` Adrien Mazarguil [this message]
2017-08-01 12:18           ` Matan Azrad
2017-08-02 19:00           ` [dpdk-dev] [PATCH v2] " Matan Azrad
2017-08-03 21:04             ` Thomas Monjalon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170801105037.GT19852@6wind.com \
    --to=adrien.mazarguil@6wind.com \
    --cc=dev@dpdk.org \
    --cc=matan@mellanox.com \
    --cc=olgas@mellanox.com \
    --cc=stable@dpdk.org \
    --cc=thomas@monjalon.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).