From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <adrien.mazarguil@6wind.com>
Received: from mail-wm0-f50.google.com (mail-wm0-f50.google.com [74.125.82.50])
 by dpdk.org (Postfix) with ESMTP id 5031E9B7D
 for <dev@dpdk.org>; Tue,  1 Aug 2017 11:42:31 +0200 (CEST)
Received: by mail-wm0-f50.google.com with SMTP id t138so32209623wmt.1
 for <dev@dpdk.org>; Tue, 01 Aug 2017 02:42:31 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=6wind-com.20150623.gappssmtp.com; s=20150623;
 h=date:from:to:cc:subject:message-id:references:mime-version
 :content-disposition:in-reply-to;
 bh=JArU9+CqNtyEJvdaYZm7ccRxKMsuX2aHO4a3OYkUuh4=;
 b=YUzdpUFjgrESdeCh7pYx7XmQdOav8M4Ce7kACztfg3paisGu3w0uR9GQXs5MqkCf/B
 Nsc/hGuPhDk6mQfNykvPlPsSxAF08YIOxIZ6FIFaJFhSUIG2fbGxD94rvjKqu4329Y9b
 V86YqzhFJef65o8ToIhXcmfuX8UGwFJFRQKGdtJpVWhVtClzxuhnuo+eNLgOr0Yjks8x
 ph/9h/Nq/1GNrPAoPUwDtWhUrezYMva8tZd3Sn+SRLWwVR3KPd7N3zvIM3TVXlUKBmbU
 UUKlg7/8J/Hc+mdP5YyYxHo1S+/Huh4a2ThnoP4W0lfEoHRu7Fv2pDG489dsQcuv868R
 vqLA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:date:from:to:cc:subject:message-id:references
 :mime-version:content-disposition:in-reply-to;
 bh=JArU9+CqNtyEJvdaYZm7ccRxKMsuX2aHO4a3OYkUuh4=;
 b=KglHfz08zeMNhW3n0HzBxEiWLxe18H8MMQA9YA8oiOPDVIldWZbSM434QuTaMvvNDh
 Pk/0p7YlrUqer9YooZD69pBdTY05XQ5yaJtTJJAkdY5Q6qmi47l1Jg7rIEXNOPoy20WP
 KrTRvCaJlxFx4oRae0bJ7DDpMNT3v5cOHE38USZ+TiU+aEM9hKJtUXB5eiXNDwZQ5dhu
 wLSqTLS/XT/IWbUA8gOYaf8qmJ73BiqZTUmsvCn7zMkH8l/EAkQd93GHyHM/7du03xIA
 WwPW3+IUlmFc117EZRqKYWxX+qwgDh2HtOpWqHyQLKHHj4LIxjbbSqsNU7U2/HSMhSln
 c6Qw==
X-Gm-Message-State: AIVw110KdIKFbCdv3DpfnRY1bFQdw1rknlHxxFkC1iJ9IvX4iK1a588F
 siH5tSiP5po14Zyi
X-Received: by 10.28.229.207 with SMTP id c198mr931738wmh.108.1501580550802;
 Tue, 01 Aug 2017 02:42:30 -0700 (PDT)
Received: from 6wind.com (host.78.145.23.62.rev.coltfrance.com. [62.23.145.78])
 by smtp.gmail.com with ESMTPSA id 93sm34778939wra.82.2017.08.01.02.42.29
 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Tue, 01 Aug 2017 02:42:29 -0700 (PDT)
Date: Tue, 1 Aug 2017 11:42:21 +0200
From: Adrien Mazarguil <adrien.mazarguil@6wind.com>
To: Matan Azrad <matan@mellanox.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>, Thomas Monjalon <thomas@monjalon.net>,
 Olga Shern <olgas@mellanox.com>, "stable@dpdk.org" <stable@dpdk.org>
Message-ID: <20170801094221.GQ19852@6wind.com>
References: <1501499709-19873-1-git-send-email-matan@mellanox.com>
 <20170731141728.GO19852@6wind.com>
 <DB6PR0502MB30485A1B4B65E885AD193A8BD2B20@DB6PR0502MB3048.eurprd05.prod.outlook.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <DB6PR0502MB30485A1B4B65E885AD193A8BD2B20@DB6PR0502MB3048.eurprd05.prod.outlook.com>
Subject: Re: [dpdk-dev] [PATCH] net/mlx4: workaround to verbs wrong error
	return
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Tue, 01 Aug 2017 09:42:31 -0000

Hi Matan,

On Mon, Jul 31, 2017 at 04:56:33PM +0000, Matan Azrad wrote:
> Hi Adrien
> 
> > -----Original Message-----
> > From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
> > Sent: Monday, July 31, 2017 5:17 PM
> > To: Matan Azrad <matan@mellanox.com>
> > Cc: dev@dpdk.org; Thomas Monjalon <thomas@monjalon.net>; Olga Shern
> > <olgas@mellanox.com>; stable@dpdk.org
> > Subject: Re: [PATCH] net/mlx4: workaround to verbs wrong error return
> > 
> > Hi Matan,
> > 
> > On Mon, Jul 31, 2017 at 02:15:09PM +0300, Matan Azrad wrote:
> > > > The current mlx4 OFED version has a bug that makes the ibv destroy
> > > > functions return an error after the device was plugged out, even
> > > > though the resources were destroyed correctly.
> > > >
> > > > Hence, the failsafe PMD aborted, in debug mode only, when it tried to
> > > > remove the device during the plug-out process.
> > > >
> > > > The workaround removes the ibv destroy assertions.
> > > >
> > > > The DPDK 18.02 release should work with OFED-4.2, which will include the
> > > > Verbs fix for this bug; this patch will then be removed.
> > >
> > > Signed-off-by: Matan Azrad <matan@mellanox.com>
> > > Cc: stable@dpdk.org
> > 
> > Since this workaround is needed in order to validate hot-plug with mlx4
> > compiled in debug mode due to a problem in Verbs, I don't think
> > stable@dpdk.org should be involved.
> > 
> 
> Ok I'll remove it. 
> 
> > What will be back-ported, once fixed, is the minimum OFED version to install
> > to properly benefit from hot-plug functionality.
> > 
> > More comments about the patch below.
> > 
> > > ---
> > > >  drivers/net/mlx4/mlx4.c      | 70 +++++++++++++++++++++++++++++++++++---------
> > >  drivers/net/mlx4/mlx4_flow.c | 22 ++++++++++----
> > >  2 files changed, 73 insertions(+), 19 deletions(-)
> > >
> > > > diff --git a/drivers/net/mlx4/mlx4.c b/drivers/net/mlx4/mlx4.c
> > > > index 8451f5b..94782c2 100644
> > > --- a/drivers/net/mlx4/mlx4.c
> > > +++ b/drivers/net/mlx4/mlx4.c
> > > @@ -955,7 +955,10 @@ struct rxq *
> > >  	return 0;
> > >  error:
> > >  	if (mr_linear != NULL)
> > > -		claim_zero(ibv_dereg_mr(mr_linear));
> > > +		/* Current verbs does not allow to check real
> > > +		 * errors when the device was plugged out.
> > > +		 */
> > > +		ibv_dereg_mr(mr_linear);
> > >
> > >  	rte_free(elts_linear);
> > >  	rte_free(elts);
> > > @@ -992,7 +995,10 @@ struct rxq *
> > >  	txq->elts_linear = NULL;
> > >  	txq->mr_linear = NULL;
> > >  	if (mr_linear != NULL)
> > > -		claim_zero(ibv_dereg_mr(mr_linear));
> > > +		/* Current verbs does not allow to check real
> > > +		 * errors when the device was plugged out.
> > > +		 */
> > > +		ibv_dereg_mr(mr_linear);
> > >
> > >  	rte_free(elts_linear);
> > >  	if (elts == NULL)
> > > @@ -1052,9 +1058,15 @@ struct rxq *
> > >  						&params));
> > >  	}
> > >  	if (txq->qp != NULL)
> > > -		claim_zero(ibv_destroy_qp(txq->qp));
> > > +		/* Current verbs does not allow to check real
> > > +		 * errors when the device was plugged out.
> > > +		 */
> > > +		ibv_destroy_qp(txq->qp);
> > >  	if (txq->cq != NULL)
> > > -		claim_zero(ibv_destroy_cq(txq->cq));
> > > +		/* Current verbs does not allow to check real
> > > +		 * errors when the device was plugged out.
> > > +		 */
> > > +		ibv_destroy_cq(txq->cq);
> > >  	if (txq->rd != NULL) {
> > >  		struct ibv_exp_destroy_res_domain_attr attr = {
> > >  			.comp_mask = 0,
> > > @@ -1070,7 +1082,10 @@ struct rxq *
> > >  		if (txq->mp2mr[i].mp == NULL)
> > >  			break;
> > >  		assert(txq->mp2mr[i].mr != NULL);
> > > -		claim_zero(ibv_dereg_mr(txq->mp2mr[i].mr));
> > > +		/* Current verbs does not allow to check real
> > > +		 * errors when the device was plugged out.
> > > +		 */
> > > +		ibv_dereg_mr(txq->mp2mr[i].mr);
> > >  	}
> > >  	memset(txq, 0, sizeof(*txq));
> > >  }
> > > @@ -1302,7 +1317,10 @@ static struct ibv_mr *mlx4_mp2mr(struct ibv_pd
> > *, struct rte_mempool *)
> > >  		DEBUG("%p: MR <-> MP table full, dropping oldest entry.",
> > >  		      (void *)txq);
> > >  		--i;
> > > -		claim_zero(ibv_dereg_mr(txq->mp2mr[0].mr));
> > > +		/* Current verbs does not allow to check real
> > > +		 * errors when the device was plugged out.
> > > +		 */
> > > +		ibv_dereg_mr(txq->mp2mr[0].mr);
> > >  		memmove(&txq->mp2mr[0], &txq->mp2mr[1],
> > >  			(sizeof(txq->mp2mr) - sizeof(txq->mp2mr[0])));
> > >  	}
> > > @@ -2355,7 +2373,10 @@ struct txq_mp2mr_mbuf_check_data {
> > >  	      (void *)rxq,
> > >  	      (*mac)[0], (*mac)[1], (*mac)[2], (*mac)[3], (*mac)[4], (*mac)[5],
> > >  	      mac_index, priv->vlan_filter[vlan_index].id);
> > > -	claim_zero(ibv_destroy_flow(rxq-
> > >mac_flow[mac_index][vlan_index]));
> > > +	/* Current verbs does not allow to check real
> > > +	 * errors when the device was plugged out.
> > > +	 */
> > > +	ibv_destroy_flow(rxq->mac_flow[mac_index][vlan_index]);
> > >  	rxq->mac_flow[mac_index][vlan_index] = NULL;  }
> > >
> > > @@ -2736,7 +2757,10 @@ struct txq_mp2mr_mbuf_check_data {
> > >  	DEBUG("%p: disabling allmulticast mode", (void *)rxq);
> > >  	if (rxq->allmulti_flow == NULL)
> > >  		return;
> > > -	claim_zero(ibv_destroy_flow(rxq->allmulti_flow));
> > > +	/* Current verbs does not allow to check real
> > > +	 * errors when the device was plugged out.
> > > +	 */
> > > +	ibv_destroy_flow(rxq->allmulti_flow);
> > >  	rxq->allmulti_flow = NULL;
> > >  	DEBUG("%p: allmulticast mode disabled", (void *)rxq);  } @@ -2796,7
> > > +2820,10 @@ struct txq_mp2mr_mbuf_check_data {
> > >  	DEBUG("%p: disabling promiscuous mode", (void *)rxq);
> > >  	if (rxq->promisc_flow == NULL)
> > >  		return;
> > > -	claim_zero(ibv_destroy_flow(rxq->promisc_flow));
> > > +	/* Current verbs does not allow to check real
> > > +	 * errors when the device was plugged out.
> > > +	 */
> > > +	ibv_destroy_flow(rxq->promisc_flow);
> > >  	rxq->promisc_flow = NULL;
> > >  	DEBUG("%p: promiscuous mode disabled", (void *)rxq);  } @@ -
> > 2847,9
> > > +2874,15 @@ struct txq_mp2mr_mbuf_check_data {
> > >  		rxq_mac_addrs_del(rxq);
> > >  	}
> > >  	if (rxq->qp != NULL)
> > > -		claim_zero(ibv_destroy_qp(rxq->qp));
> > > +		/* Current verbs does not allow to check real
> > > +		 * errors when the device was plugged out.
> > > +		 */
> > > +		ibv_destroy_qp(rxq->qp);
> > >  	if (rxq->cq != NULL)
> > > -		claim_zero(ibv_destroy_cq(rxq->cq));
> > > +		/* Current verbs does not allow to check real
> > > +		 * errors when the device was plugged out.
> > > +		 */
> > > +		ibv_destroy_cq(rxq->cq);
> > >  	if (rxq->channel != NULL)
> > >  		claim_zero(ibv_destroy_comp_channel(rxq->channel));
> > >  	if (rxq->rd != NULL) {
> > > @@ -2864,7 +2897,10 @@ struct txq_mp2mr_mbuf_check_data {
> > >  						      &attr));
> > >  	}
> > >  	if (rxq->mr != NULL)
> > > -		claim_zero(ibv_dereg_mr(rxq->mr));
> > > +		/* Current verbs does not allow to check real
> > > +		 * errors when the device was plugged out.
> > > +		 */
> > > +		ibv_dereg_mr(rxq->mr);
> > >  	memset(rxq, 0, sizeof(*rxq));
> > >  }
> > >
> > > @@ -4374,7 +4410,10 @@ struct txq_mp2mr_mbuf_check_data {
> > >  		priv_parent_list_cleanup(priv);
> > >  	if (priv->pd != NULL) {
> > >  		assert(priv->ctx != NULL);
> > > -		claim_zero(ibv_dealloc_pd(priv->pd));
> > > +		/* Current verbs does not allow to check real
> > > +		 * errors when the device was plugged out.
> > > +		 */
> > > +		ibv_dealloc_pd(priv->pd);
> > >  		claim_zero(ibv_close_device(priv->ctx));
> > >  	} else
> > >  		assert(priv->ctx == NULL);
> > > @@ -6389,7 +6428,10 @@ struct txq_mp2mr_mbuf_check_data {
> > >  port_error:
> > >  		rte_free(priv);
> > >  		if (pd)
> > > -			claim_zero(ibv_dealloc_pd(pd));
> > > +			/* Current verbs does not allow to check real
> > > +			 * errors when the device was plugged out.
> > > +			 */
> > > +			ibv_dealloc_pd(pd);
> > >  		if (ctx)
> > >  			claim_zero(ibv_close_device(ctx));
> > >  		if (eth_dev)
> > > > diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
> > > > index 925c89c..daa62e3 100644
> > > --- a/drivers/net/mlx4/mlx4_flow.c
> > > +++ b/drivers/net/mlx4/mlx4_flow.c
> > > @@ -799,8 +799,11 @@ struct rte_flow_drop {
> > >  		struct rte_flow_drop *fdq = priv->flow_drop_queue;
> > >
> > >  		priv->flow_drop_queue = NULL;
> > > -		claim_zero(ibv_destroy_qp(fdq->qp));
> > > -		claim_zero(ibv_destroy_cq(fdq->cq));
> > > +		/* Current verbs does not allow to check real
> > > +		 * errors when the device was plugged out.
> > > +		 */
> > > +		ibv_destroy_qp(fdq->qp);
> > > +		ibv_destroy_cq(fdq->cq);
> > >  		rte_free(fdq);
> > >  	}
> > >  }
> > > @@ -860,7 +863,10 @@ struct rte_flow_drop {
> > >  	priv->flow_drop_queue = fdq;
> > >  	return 0;
> > >  err_create_qp:
> > > -	claim_zero(ibv_destroy_cq(cq));
> > > +	/* Current verbs does not allow to check real
> > > +	 * errors when the device was plugged out.
> > > +	 */
> > > +	ibv_destroy_cq(cq);
> > >  err_create_cq:
> > >  	rte_free(fdq);
> > >  err:
> > > @@ -1200,7 +1206,10 @@ struct rte_flow *
> > >  	(void)priv;
> > >  	LIST_REMOVE(flow, next);
> > >  	if (flow->ibv_flow)
> > > -		claim_zero(ibv_destroy_flow(flow->ibv_flow));
> > > +		/* Current verbs does not allow to check real
> > > +		 * errors when the device was plugged out.
> > > +		 */
> > > +		ibv_destroy_flow(flow->ibv_flow);
> > >  	rte_free(flow->ibv_attr);
> > >  	DEBUG("Flow destroyed %p", (void *)flow);
> > >  	rte_free(flow);
> > > @@ -1278,7 +1287,10 @@ struct rte_flow *
> > >  	for (flow = LIST_FIRST(&priv->flows);
> > >  	     flow;
> > >  	     flow = LIST_NEXT(flow, next)) {
> > > -		claim_zero(ibv_destroy_flow(flow->ibv_flow));
> > > +		/* Current verbs does not allow to check real
> > > +		 * errors when the device was plugged out.
> > > +		 */
> > > +		ibv_destroy_flow(flow->ibv_flow);
> > >  		flow->ibv_flow = NULL;
> > >  		DEBUG("Flow %p removed", (void *)flow);
> > >  	}
> > > --
> > > 1.8.3.1
> > >
> > 
> > This approach looks way too intrusive. How about making the claim_zero()
> > definition not fail but still complain when compiled against a broken Verbs
> > version instead?
> > 
> >  #include "mlx4_autoconf.h"
> > 
> >  [...]
> > 
> >  #ifndef HAVE_BROKEN_VERBS
> >  #define claim_zero(...) assert((__VA_ARGS__) == 0)
> >  #else /* HAVE_BROKEN_VERBS */
> >  #define claim_zero(...) \
> >      (void)(((__VA_ARGS__) == 0) || \
> >             DEBUG("Assertion `" # __VA_ARGS__ "' failed (IGNORED)"))
> >  #endif /* HAVE_BROKEN_VERBS */
> > 
> > You could use auto-config-h.sh to generate the HAVE_BROKEN_VERBS definition
> > in mlx4_autoconf.h (see mlx4 Makefile) based on some symbol, macro or type
> > that only exists or doesn't exist yet in problematic releases for instance.
> > 
> 
> I agree with making this depend on broken Verbs, but
> there are other places in the mlx4 code which use the claim_zero() assertion,
> so this suggestion will hurt other validations.

Well, half broken is no better than completely broken in my opinion, so
while Verbs is being repaired, users debugging the mlx4 PMD will temporarily
get debug traces without the ensuing abort(). At least the behavior will be
consistent.

Think about it: they already have to go out of their way to enable
CONFIG_RTE_LIBRTE_MLX4_DEBUG. If they know they aren't using hot-plug but
still use a buggy Verbs version, they can disable HAVE_BROKEN_VERBS to
revert to the normal behavior.

> What about creating a new define, dependent on broken Verbs, for the
> specific assertions? It would still be intrusive but more accurate.

One reason I prefer the code to remain unchanged is that I'm currently
refactoring the entire PMD. Maintaining the above patch (picking the right
ibv_*() calls that return a consistent value) will be difficult and an
intrusive patch won't be reverted easily once Verbs is fixed.

All these claim_zero() checks ensure the PMD destroys Verbs resources in the
proper order (e.g. a flow before the QP it is associated with). If the
return value of any of these cannot be relied on, it's useless to only check
some of them.
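
To illustrate the ordering point, here is a minimal sketch using mock
objects. Everything in it (struct mock_qp, struct mock_flow and the
mock_*() functions, including the EBUSY/EINVAL return values) is
hypothetical and only stands in for Verbs resources; it is not the
libibverbs API and makes no claim about what real Verbs calls return:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for Verbs objects; not the libibverbs types. */
struct mock_qp {
	int flows; /* Number of flows still attached to this QP. */
	int alive; /* Nonzero until the QP is destroyed. */
};

struct mock_flow {
	struct mock_qp *qp; /* QP this flow is attached to, or NULL. */
};

/* Attach a flow to a live QP. */
static void
mock_create_flow(struct mock_flow *flow, struct mock_qp *qp)
{
	assert(qp->alive);
	flow->qp = qp;
	++qp->flows;
}

/* Detach a flow from its QP; fails if it was already destroyed. */
static int
mock_destroy_flow(struct mock_flow *flow)
{
	if (flow->qp == NULL)
		return EINVAL;
	--flow->qp->flows;
	flow->qp = NULL;
	return 0;
}

/* Refuse to destroy a QP that still has flows attached. */
static int
mock_destroy_qp(struct mock_qp *qp)
{
	if (qp->flows != 0)
		return EBUSY;
	qp->alive = 0;
	return 0;
}
```

Destroying the mock QP while a flow is still attached returns a nonzero
value, which is the kind of mistake a claim_zero() wrapper around each
destroy call is meant to catch in debug builds.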

Moreover, if ibv_destroy_something() wrongly returns an error when the device
is unplugged, I think this can also happen with the calls that are not part
of your patch, i.e. all of them, so working around it at the macro definition
level makes sense.
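
A minimal, self-contained sketch of the macro-level workaround follows. It
assumes HAVE_BROKEN_VERBS is defined by hand rather than generated, and it
uses hypothetical helpers (debug_ignored(), ignored_failures and the
mock_destroy_*() functions) in place of the PMD's DEBUG() macro and real
Verbs calls:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Assumption: defined manually here; the name is only an example. */
#define HAVE_BROKEN_VERBS 1

/* Counts ignored failures so the effect is observable (sketch only). */
static int ignored_failures;

/* Stand-in for DEBUG(); returns a nonzero value like fprintf() does. */
static int
debug_ignored(const char *expr)
{
	++ignored_failures;
	return fprintf(stderr, "Assertion `%s' failed (IGNORED)\n", expr);
}

#ifndef HAVE_BROKEN_VERBS
/* Normal behavior: abort on any nonzero return value. */
#define claim_zero(...) assert((__VA_ARGS__) == 0)
#else /* HAVE_BROKEN_VERBS */
/* Workaround: still evaluate the call, complain on failure, go on. */
#define claim_zero(...) \
	(void)(((__VA_ARGS__) == 0) || debug_ignored(#__VA_ARGS__))
#endif /* HAVE_BROKEN_VERBS */

/* Mock destroy call that wrongly reports an error after unplug. */
static int
mock_destroy_after_unplug(void)
{
	return EIO;
}

/* Mock destroy call that succeeds. */
static int
mock_destroy_ok(void)
{
	return 0;
}
```

With HAVE_BROKEN_VERBS defined, claim_zero(mock_destroy_after_unplug())
prints the failed expression and execution continues; without it, the same
call trips assert(). In both cases the wrapped call is evaluated exactly
once, so call sites do not need to change.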

If you don't know what symbol can be relied on in OFED 4.2 to define
HAVE_BROKEN_VERBS (which is just an example, you can use another name BTW),
maybe you can add a compilation option that users enable manually in case
of trouble? Something verbose like:

 CONFIG_RTE_LIBRTE_MLX4_DEBUG_BROKEN_VERBS_ASSERT=n

This option will have to be documented.

-- 
Adrien Mazarguil
6WIND