From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <bruce.richardson@intel.com>
Received: from mga07.intel.com (mga07.intel.com [134.134.136.100])
 by dpdk.org (Postfix) with ESMTP id 19F731B42B;
 Fri,  2 Nov 2018 12:43:49 +0100 (CET)
X-Amp-Result: UNSCANNABLE
X-Amp-File-Uploaded: False
Received: from fmsmga008.fm.intel.com ([10.253.24.58])
 by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 02 Nov 2018 04:43:49 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.54,455,1534834800"; d="scan'208";a="83508160"
Received: from bricha3-mobl.ger.corp.intel.com ([10.237.221.107])
 by fmsmga008.fm.intel.com with SMTP; 02 Nov 2018 04:43:46 -0700
Received: by  (sSMTP sendmail emulation); Fri, 02 Nov 2018 11:43:45 +0000
Date: Fri, 2 Nov 2018 11:43:44 +0000
From: Bruce Richardson <bruce.richardson@intel.com>
To: Gavin Hu <gavin.hu@arm.com>
Cc: dev@dpdk.org, thomas@monjalon.net, stephen@networkplumber.org,
 olivier.matz@6wind.com, chaozhu@linux.vnet.ibm.com,
 konstantin.ananyev@intel.com, jerin.jacob@caviumnetworks.com,
 Honnappa.Nagarahalli@arm.com, stable@dpdk.org
Message-ID: <20181102114344.GA13324@bricha3-MOBL.ger.corp.intel.com>
References: <1541066031-29125-1-git-send-email-gavin.hu@arm.com>
 <1541157688-40012-3-git-send-email-gavin.hu@arm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1541157688-40012-3-git-send-email-gavin.hu@arm.com>
Organization: Intel Research and Development Ireland Ltd.
User-Agent: Mutt/1.10.1 (2018-07-13)
Subject: Re: [dpdk-dev] [PATCH v5 2/2] ring: move the atomic load of head
	above the loop
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Fri, 02 Nov 2018 11:43:50 -0000

On Fri, Nov 02, 2018 at 07:21:28PM +0800, Gavin Hu wrote:
> In __rte_ring_move_prod_head, move the __atomic_load_n up and out of
> the do {} while loop: upon failure the compare-exchange updates old_head,
> so another load is costly and unnecessary.
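
For readers following the thread, the pattern described in the quoted
paragraph can be sketched roughly as below. This is a minimal illustration,
not the code in rte_ring_c11_mem.h: the struct and function names
(sketch_head, sketch_move_head) are hypothetical, the free-space check is
left out, and the memory orderings are assumptions chosen for the sketch.
It relies only on the GCC/Clang __atomic builtins, where a failed
__atomic_compare_exchange_n writes the current value back into the
"expected" argument.

```c
#include <stdint.h>

/* Hypothetical stand-in for a ring's producer head; not the DPDK struct. */
struct sketch_head {
	uint32_t head;
};

/*
 * Reserve n slots by advancing head with a compare-exchange loop.
 * Returns the head value observed before the successful update.
 */
static inline uint32_t
sketch_move_head(struct sketch_head *h, uint32_t n)
{
	/* Single load, done once before the loop. */
	uint32_t old_head = __atomic_load_n(&h->head, __ATOMIC_RELAXED);
	uint32_t new_head;

	do {
		new_head = old_head + n;
		/*
		 * On failure the builtin stores the current value of
		 * h->head into old_head, so no explicit reload is needed
		 * before the next iteration.
		 */
	} while (!__atomic_compare_exchange_n(&h->head, &old_head, new_head,
			0, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED));

	return old_head;
}
```

Calling sketch_move_head(&h, 3) on a head of 5 returns 5 and leaves the head
at 8. Under contention only the compare-exchange is retried.
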
> 
> This helps a little with latency, by about 1~5%.
> 
>  Test results with the patch (two cores):
>  SP/SC bulk enq/dequeue (size: 8): 5.64
>  MP/MC bulk enq/dequeue (size: 8): 9.58
>  SP/SC bulk enq/dequeue (size: 32): 1.98
>  MP/MC bulk enq/dequeue (size: 32): 2.30
> 
> Fixes: 39368ebfc606 ("ring: introduce C11 memory model barrier option")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Gavin Hu <gavin.hu@arm.com>
> Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Reviewed-by: Steve Capper <steve.capper@arm.com>
> Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
> Reviewed-by: Jia He <justin.he@arm.com>
> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> ---
>  doc/guides/rel_notes/release_18_11.rst |  7 +++++++
>  lib/librte_ring/rte_ring_c11_mem.h     | 10 ++++------
>  2 files changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
> index 376128f..b68afab 100644
> --- a/doc/guides/rel_notes/release_18_11.rst
> +++ b/doc/guides/rel_notes/release_18_11.rst
> @@ -69,6 +69,13 @@ New Features
>    checked out against that dma mask and rejected if out of range. If more than
>    one device has addressing limitations, the dma mask is the more restricted one.
>  
> +* **Updated the ring library with C11 memory model.**
> +
> +  Updated the ring library with C11 memory model, in our tests the changes
> +  decreased latency by 27~29% and 3~15% for MPMC and SPSC cases respectively.
> +  The real improvements may vary with the number of contending lcores and the
> +  size of ring.
> +
Isn't this a little misleading? Users may come to expect massive performance
improvements generally, yet the C11 model seems to be used only on some, but
not all, Arm platforms, and then only with "make" builds:

config/arm/meson.build: ['RTE_USE_C11_MEM_MODEL', false]]
config/common_armv8a_linuxapp:CONFIG_RTE_USE_C11_MEM_MODEL=y
config/common_base:CONFIG_RTE_USE_C11_MEM_MODEL=n
config/defconfig_arm64-thunderx-linuxapp-gcc:CONFIG_RTE_USE_C11_MEM_MODEL=n
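
To make the effect of those settings concrete, here is a rough sketch of
how a build-time macro like this one gates which code path gets compiled.
Only the macro name RTE_USE_C11_MEM_MODEL comes from the config lines above.
The function sketch_uses_c11_model is hypothetical and exists only for
illustration.

```c
/*
 * Hypothetical helper: reports at run time which branch was compiled in.
 * When the build does not define RTE_USE_C11_MEM_MODEL, the C11 branch
 * (and therefore any change made only to that branch) is not built at all.
 */
static inline int
sketch_uses_c11_model(void)
{
#ifdef RTE_USE_C11_MEM_MODEL
	return 1;	/* C11 atomics code path selected */
#else
	return 0;	/* generic barrier code path selected */
#endif
}
```

On a build where the macro is left undefined, the function returns 0, which
is the situation for every target above other than the armv8a "make" default.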

/Bruce