Date: Mon, 30 Oct 2017 15:23:50 +0100
From: Adrien Mazarguil
To: Matan Azrad
Cc: dev@dpdk.org, Ophir Munk
Message-ID: <20171030142350.GC26782@6wind.com>
References: <1508768520-4810-1-git-send-email-ophirmu@mellanox.com>
 <1509358049-18854-1-git-send-email-matan@mellanox.com>
 <1509358049-18854-7-git-send-email-matan@mellanox.com>
In-Reply-To: <1509358049-18854-7-git-send-email-matan@mellanox.com>
Subject: Re: [dpdk-dev] [PATCH v3 6/7] net/mlx4: mitigate Tx path memory barriers

On Mon, Oct 30, 2017 at 10:07:28AM +0000, Matan Azrad wrote:
> Replace most of the memory barriers by compiler barriers since they are
> all targeted to the DRAM; This improves code efficiency for systems
> which force store order between different addresses.
>
> Only the doorbell record store should be protected by memory barrier
> since it is targeted to the PCI memory domain.
>
> Limit pre byte count store compiler barrier for systems with cache line
> size smaller than 64B (TXBB size).
>
> Signed-off-by: Matan Azrad

This sounds like an interesting performance improvement. Can you share the
typical or expected gain (percentage/hard numbers) for a given use case as
part of the commit log?

More comments below.

> ---
>  drivers/net/mlx4/mlx4_rxtx.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/mlx4/mlx4_rxtx.c b/drivers/net/mlx4/mlx4_rxtx.c
> index 8ea8851..482c399 100644
> --- a/drivers/net/mlx4/mlx4_rxtx.c
> +++ b/drivers/net/mlx4/mlx4_rxtx.c
> @@ -168,7 +168,7 @@ struct pv {
>  		/*
>  		 * Make sure we read the CQE after we read the ownership bit.
>  		 */
> -		rte_rmb();
> +		rte_io_rmb();

OK for this one since the rest of the code should not be run due to the
condition (I'm not even sure a compiler barrier is necessary at all here).

>  #ifndef NDEBUG
>  		if (unlikely((cqe->owner_sr_opcode & MLX4_CQE_OPCODE_MASK) ==
>  			     MLX4_CQE_OPCODE_ERROR)) {
> @@ -203,7 +203,7 @@ struct pv {
>  	 */
>  	cq->cons_index = cons_index;
>  	*cq->set_ci_db = rte_cpu_to_be_32(cq->cons_index & MLX4_CQ_DB_CI_MASK);
> -	rte_wmb();
> +	rte_io_wmb();

This one could be removed entirely as well, which is more or less what the
move to a compiler barrier does. Nothing in subsequent code depends on this
doorbell being written, so this can piggyback on any subsequent rte_wmb().

On the other hand, in my opinion a barrier (compiler or otherwise) might be
needed before the doorbell write, to make clear it cannot somehow be done
earlier in case something attempts to optimize it away.

>  	sq->tail = sq->tail + nr_txbbs;
>  	/* Update the list of packets posted for transmission. */
>  	elts_comp -= pkts;
> @@ -321,6 +321,7 @@ static int handle_multi_segs(struct rte_mbuf *buf,
>  			 * control segment.
>  			 */
>  			if ((uintptr_t)dseg & (uintptr_t)(MLX4_TXBB_SIZE - 1)) {
> +#if RTE_CACHE_LINE_SIZE < 64
>  				/*
>  				 * Need a barrier here before writing the byte_count
>  				 * fields to make sure that all the data is visible
> @@ -331,6 +332,7 @@ static int handle_multi_segs(struct rte_mbuf *buf,
>  				 * data, and end up sending the wrong data.
>  				 */
>  				rte_io_wmb();
> +#endif /* RTE_CACHE_LINE_SIZE */

Interesting one.

>  				dseg->byte_count = byte_count;
>  			} else {
>  				/*
> @@ -469,8 +471,7 @@ static int handle_multi_segs(struct rte_mbuf *buf,
>  				break;
>  			}
>  #endif /* NDEBUG */
> -			/* Need a barrier here before byte count store. */
> -			rte_io_wmb();
> +			/* Never be TXBB aligned, no need compiler barrier. */

The reason there was a barrier here at all was unclear, so if it's really
useless, you don't even need to describe why.

>  			dseg->byte_count = rte_cpu_to_be_32(buf->data_len);
>
>  			/* Fill the control parameters for this packet. */
> @@ -533,7 +534,7 @@ static int handle_multi_segs(struct rte_mbuf *buf,
>  		 * setting ownership bit (because HW can start
>  		 * executing as soon as we do).
>  		 */
> -		rte_wmb();
> +		rte_io_wmb();

This one looks dangerous. A compiler barrier is not strong enough to
guarantee the order in which the CPU will execute instructions; it only
makes sure that what follows the barrier doesn't appear before it in the
generated code.

Unless the comment above this barrier is wrong, this change may cause
hard-to-debug issues down the road. You should drop it.

>  		ctrl->owner_opcode = rte_cpu_to_be_32(owner_opcode |
>  					      ((sq->head & sq->txbb_cnt) ?
>  					       MLX4_BIT_WQE_OWN : 0));
> --
> 1.8.3.1
>

-- 
Adrien Mazarguil
6WIND
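
To make the compiler barrier vs. memory barrier distinction above concrete,
here is a minimal sketch in plain C11. It is not mlx4 code: struct fake_wqe,
FAKE_OWNER_BIT and fake_post() are hypothetical names, and the C11 fences
only stand in for the DPDK macros (how rte_wmb() and rte_io_wmb() map to
actual instructions depends on the architecture and is not shown here). The
C11 fences also only describe ordering between CPU threads, so this is an
analogy for the device case rather than a model of it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical descriptor, loosely modelled on a Tx work queue entry. */
struct fake_wqe {
	uint64_t addr;                 /* payload address */
	uint32_t byte_count;           /* payload length */
	_Atomic uint32_t owner_opcode; /* consumer polls this field */
};

#define FAKE_OWNER_BIT (UINT32_C(1) << 31)

/*
 * Fill the descriptor, then publish it by setting the ownership bit.
 * With use_fence == false only the compiler is constrained: the stores
 * keep their order in the generated code, but no fence instruction is
 * requested. With use_fence == true a release fence additionally orders
 * the earlier stores before the ownership store for other observers.
 */
static uint32_t
fake_post(struct fake_wqe *wqe, uint64_t addr, uint32_t len,
	  uint32_t opcode, bool use_fence)
{
	assert(!(opcode & FAKE_OWNER_BIT));
	wqe->addr = addr;
	wqe->byte_count = len;
	if (use_fence)
		atomic_thread_fence(memory_order_release);
	else
		atomic_signal_fence(memory_order_seq_cst);
	uint32_t value = opcode | FAKE_OWNER_BIT;
	atomic_store_explicit(&wqe->owner_opcode, value,
			      memory_order_relaxed);
	return value;
}
```

Both variants produce the same final memory contents in a single thread. The
difference only shows up for a concurrent observer on a weakly ordered CPU,
which is why a missing fence of this kind tends to be hard to reproduce.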