DPDK patches and discussions
 help / color / mirror / Atom feed
From: Adrien Mazarguil <adrien.mazarguil@6wind.com>
To: Matan Azrad <matan@mellanox.com>
Cc: dev@dpdk.org, Ophir Munk <ophirmu@mellanox.com>
Subject: Re: [dpdk-dev] [PATCH v3 6/7] net/mlx4: mitigate Tx path memory barriers
Date: Mon, 30 Oct 2017 15:23:50 +0100	[thread overview]
Message-ID: <20171030142350.GC26782@6wind.com> (raw)
In-Reply-To: <1509358049-18854-7-git-send-email-matan@mellanox.com>

On Mon, Oct 30, 2017 at 10:07:28AM +0000, Matan Azrad wrote:
> Replace most of the memory barriers by compiler barriers since they are
> all targeted to the DRAM; This improves code efficiency for systems
> which force store order between different addresses.
> 
> Only the doorbell record store should be protected by memory barrier
> since it is targeted to the PCI memory domain.
> 
> Limit pre byte count store compiler barrier for systems with cache line
> size smaller than 64B (TXBB size).
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>

This sounds like an interesting performance improvement, can you share the
typical or expected amount (percentage/hard numbers) for a given use case as
part of the commit log?

More comments below.

> ---
>  drivers/net/mlx4/mlx4_rxtx.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/mlx4/mlx4_rxtx.c b/drivers/net/mlx4/mlx4_rxtx.c
> index 8ea8851..482c399 100644
> --- a/drivers/net/mlx4/mlx4_rxtx.c
> +++ b/drivers/net/mlx4/mlx4_rxtx.c
> @@ -168,7 +168,7 @@ struct pv {
>  		/*
>  		 * Make sure we read the CQE after we read the ownership bit.
>  		 */
> -		rte_rmb();
> +		rte_io_rmb();

OK for this one since the rest of the code should not be run due to the
condition (I'm not even sure even a compiler barrier is necessary at all
here).

>  #ifndef NDEBUG
>  		if (unlikely((cqe->owner_sr_opcode & MLX4_CQE_OPCODE_MASK) ==
>  			     MLX4_CQE_OPCODE_ERROR)) {
> @@ -203,7 +203,7 @@ struct pv {
>  	 */
>  	cq->cons_index = cons_index;
>  	*cq->set_ci_db = rte_cpu_to_be_32(cq->cons_index & MLX4_CQ_DB_CI_MASK);
> -	rte_wmb();
> +	rte_io_wmb();

This one could be removed entirely as well, which is more or less what the
move to a compiler barrier does. Nothing in subsequent code depends on this
doorbell being written, so this can piggy back on any subsequent rte_wmb().

On the other hand in my opinion a barrier (compiler or otherwise) might be
needed before the doorbell write, to make clear it cannot somehow be done
earlier in case something attempts to optimize it away.

>  	sq->tail = sq->tail + nr_txbbs;
>  	/* Update the list of packets posted for transmission. */
>  	elts_comp -= pkts;
> @@ -321,6 +321,7 @@ static int handle_multi_segs(struct rte_mbuf *buf,
>  		 * control segment.
>  		 */
>  		if ((uintptr_t)dseg & (uintptr_t)(MLX4_TXBB_SIZE - 1)) {
> +#if RTE_CACHE_LINE_SIZE < 64
>  			/*
>  			 * Need a barrier here before writing the byte_count
>  			 * fields to make sure that all the data is visible
> @@ -331,6 +332,7 @@ static int handle_multi_segs(struct rte_mbuf *buf,
>  			 * data, and end up sending the wrong data.
>  			 */
>  			rte_io_wmb();
> +#endif /* RTE_CACHE_LINE_SIZE */

Interesting one.

>  			dseg->byte_count = byte_count;
>  		} else {
>  			/*
> @@ -469,8 +471,7 @@ static int handle_multi_segs(struct rte_mbuf *buf,
>  				break;
>  			}
>  #endif /* NDEBUG */
> -			/* Need a barrier here before byte count store. */
> -			rte_io_wmb();
> +			/* Never be TXBB aligned, no need compiler barrier. */

The reason there was a barrier here at all was unclear, so if it's really
useless, you don't even need to describe why.

>  			dseg->byte_count = rte_cpu_to_be_32(buf->data_len);
>  
>  			/* Fill the control parameters for this packet. */
> @@ -533,7 +534,7 @@ static int handle_multi_segs(struct rte_mbuf *buf,
>  		 * setting ownership bit (because HW can start
>  		 * executing as soon as we do).
>  		 */
> -		rte_wmb();
> +		rte_io_wmb();

This one looks dangerous. A compiler barrier is not strong enough to
guarantee the order in which CPU will execute instructions, it only makes
sure what follows the barrier doesn't appear before it in the generated
code.

Unless the comment above this barrier is wrong, this change may cause
hard-to-debug issues down the road, you should drop it.

>  		ctrl->owner_opcode = rte_cpu_to_be_32(owner_opcode |
>  					      ((sq->head & sq->txbb_cnt) ?
>  						       MLX4_BIT_WQE_OWN : 0));
> -- 
> 1.8.3.1
> 

-- 
Adrien Mazarguil
6WIND

  reply	other threads:[~2017-10-30 14:24 UTC|newest]

Thread overview: 84+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <1508752838-30408-1-git-send-email-ophirmu@mellanox.com>
2017-10-23 14:21 ` [dpdk-dev] [PATCH v2 0/7] net/mlx4: follow-up on new TX datapath introduced in RC1 Ophir Munk
2017-10-23 14:21   ` [dpdk-dev] [PATCH v2 1/7] net/mlx4: remove error flows from Tx fast path Ophir Munk
2017-10-25 16:49     ` Adrien Mazarguil
2017-10-23 14:21   ` [dpdk-dev] [PATCH v2 2/7] net/mlx4: inline more Tx functions Ophir Munk
2017-10-25 16:49     ` Adrien Mazarguil
2017-10-25 21:42       ` Ophir Munk
2017-10-26  7:48         ` Adrien Mazarguil
2017-10-26 14:27           ` Ophir Munk
2017-10-29 19:30             ` Ophir Munk
2017-10-23 14:21   ` [dpdk-dev] [PATCH v2 3/7] net/mlx4: save lkey in big-endian format Ophir Munk
2017-10-23 15:24     ` Nélio Laranjeiro
2017-10-23 14:21   ` [dpdk-dev] [PATCH v2 4/7] net/mlx4: merge Tx path functions Ophir Munk
2017-10-24 13:51     ` Nélio Laranjeiro
2017-10-24 20:36       ` Ophir Munk
2017-10-25  7:50         ` Nélio Laranjeiro
2017-10-26 10:31           ` Matan Azrad
2017-10-26 12:12             ` Nélio Laranjeiro
2017-10-26 12:30               ` Matan Azrad
2017-10-26 13:44                 ` Nélio Laranjeiro
2017-10-26 16:21                   ` Matan Azrad
2017-10-23 14:21   ` [dpdk-dev] [PATCH v2 5/7] net/mlx4: remove unnecessary variables in Tx burst Ophir Munk
2017-10-25 16:49     ` Adrien Mazarguil
2017-10-23 14:21   ` [dpdk-dev] [PATCH v2 6/7] net/mlx4: improve performance of one Tx segment Ophir Munk
2017-10-25 16:50     ` Adrien Mazarguil
2017-10-23 14:22   ` [dpdk-dev] [PATCH v2 7/7] net/mlx4: separate Tx for multi-segments Ophir Munk
2017-10-25 16:50     ` Adrien Mazarguil
2017-10-30  8:15       ` Ophir Munk
2017-10-30 10:07   ` [dpdk-dev] [PATCH v3 0/7] Tx path improvements Matan Azrad
2017-10-30 10:07     ` [dpdk-dev] [PATCH v3 1/7] net/mlx4: remove error flows from Tx fast path Matan Azrad
2017-10-30 14:23       ` Adrien Mazarguil
2017-10-30 18:11         ` Matan Azrad
2017-10-31 10:16           ` Adrien Mazarguil
2017-10-30 10:07     ` [dpdk-dev] [PATCH v3 2/7] net/mlx4: associate MR to MP in a short function Matan Azrad
2017-10-30 14:23       ` Adrien Mazarguil
2017-10-31 13:25         ` Ophir Munk
2017-10-30 10:07     ` [dpdk-dev] [PATCH v3 3/7] net/mlx4: merge Tx path functions Matan Azrad
2017-10-30 14:23       ` Adrien Mazarguil
2017-10-30 18:12         ` Matan Azrad
2017-10-30 10:07     ` [dpdk-dev] [PATCH v3 4/7] net/mlx4: remove completion counter in Tx burst Matan Azrad
2017-10-30 14:23       ` Adrien Mazarguil
2017-10-30 10:07     ` [dpdk-dev] [PATCH v3 5/7] net/mlx4: separate Tx segment cases Matan Azrad
2017-10-30 14:23       ` Adrien Mazarguil
2017-10-30 18:23         ` Matan Azrad
2017-10-31 10:17           ` Adrien Mazarguil
2017-10-30 10:07     ` [dpdk-dev] [PATCH v3 6/7] net/mlx4: mitigate Tx path memory barriers Matan Azrad
2017-10-30 14:23       ` Adrien Mazarguil [this message]
2017-10-30 19:47         ` Matan Azrad
2017-10-31 10:17           ` Adrien Mazarguil
2017-10-31 11:35             ` Matan Azrad
2017-10-31 13:21               ` Adrien Mazarguil
2017-10-30 10:07     ` [dpdk-dev] [PATCH v3 7/7] net/mlx4: remove empty Tx segment support Matan Azrad
2017-10-30 14:24       ` Adrien Mazarguil
2017-10-31 18:21     ` [dpdk-dev] [PATCH v4 0/8] net/mlx4: Tx path improvements Matan Azrad
2017-10-31 18:21       ` [dpdk-dev] [PATCH v4 1/8] net/mlx4: remove error flows from Tx fast path Matan Azrad
2017-10-31 18:21       ` [dpdk-dev] [PATCH v4 2/8] net/mlx4: associate MR to MP in a short function Matan Azrad
2017-11-02 13:42         ` Adrien Mazarguil
2017-10-31 18:21       ` [dpdk-dev] [PATCH v4 3/8] net/mlx4: fix ring wraparound compiler hint Matan Azrad
2017-11-02 13:42         ` Adrien Mazarguil
2017-10-31 18:21       ` [dpdk-dev] [PATCH v4 4/8] net/mlx4: merge Tx path functions Matan Azrad
2017-11-02 13:42         ` Adrien Mazarguil
2017-10-31 18:21       ` [dpdk-dev] [PATCH v4 5/8] net/mlx4: remove duplicate handling in Tx burst Matan Azrad
2017-11-02 13:42         ` Adrien Mazarguil
2017-10-31 18:21       ` [dpdk-dev] [PATCH v4 6/8] net/mlx4: separate Tx segment cases Matan Azrad
2017-11-02 13:43         ` Adrien Mazarguil
2017-10-31 18:21       ` [dpdk-dev] [PATCH v4 7/8] net/mlx4: fix HW memory optimizations careless Matan Azrad
2017-11-02 13:43         ` Adrien Mazarguil
2017-10-31 18:21       ` [dpdk-dev] [PATCH v4 8/8] net/mlx4: mitigate Tx path memory barriers Matan Azrad
2017-11-02 13:43         ` Adrien Mazarguil
2017-11-02 13:41       ` [dpdk-dev] [PATCH] net/mlx4: fix missing include Adrien Mazarguil
2017-11-02 20:35         ` Ferruh Yigit
2017-11-02 16:42     ` [dpdk-dev] [PATCH v5 0/8] net/mlx4: Tx path improvements Matan Azrad
2017-11-02 16:42       ` [dpdk-dev] [PATCH v5 1/8] net/mlx4: remove error flows from Tx fast path Matan Azrad
2017-11-02 16:42       ` [dpdk-dev] [PATCH v5 2/8] net/mlx4: associate MR to MP in a short function Matan Azrad
2017-11-02 16:42       ` [dpdk-dev] [PATCH v5 3/8] net/mlx4: fix ring wraparound compiler hint Matan Azrad
2017-11-02 16:42       ` [dpdk-dev] [PATCH v5 4/8] net/mlx4: merge Tx path functions Matan Azrad
2017-11-02 16:42       ` [dpdk-dev] [PATCH v5 5/8] net/mlx4: remove duplicate handling in Tx burst Matan Azrad
2017-11-02 16:42       ` [dpdk-dev] [PATCH v5 6/8] net/mlx4: separate Tx segment cases Matan Azrad
2017-11-02 16:42       ` [dpdk-dev] [PATCH v5 7/8] net/mlx4: fix HW memory optimizations careless Matan Azrad
2017-11-02 16:42       ` [dpdk-dev] [PATCH v5 8/8] net/mlx4: mitigate Tx path memory barriers Matan Azrad
2017-11-02 17:07       ` [dpdk-dev] [PATCH v5 0/8] net/mlx4: Tx path improvements Adrien Mazarguil
2017-11-02 20:35         ` Ferruh Yigit
2017-11-02 20:41       ` Ferruh Yigit
2017-11-03  9:48         ` Adrien Mazarguil
2017-11-03 19:25       ` Ferruh Yigit

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171030142350.GC26782@6wind.com \
    --to=adrien.mazarguil@6wind.com \
    --cc=dev@dpdk.org \
    --cc=matan@mellanox.com \
    --cc=ophirmu@mellanox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).