From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Thu, 19 Mar 2020 10:30:25 +0100
From: Olivier Matz <olivier.matz@6wind.com>
To: Alexander Kozyrev <akozyrev@mellanox.com>
Cc: dev@dpdk.org, viacheslavo@mellanox.com, matan@mellanox.com,
 thomas@monjalon.net, stable@dpdk.org
Message-ID: <20200319093025.GT17125@platinum>
References: <1584383500-27482-1-git-send-email-akozyrev@mellanox.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1584383500-27482-1-git-send-email-akozyrev@mellanox.com>
User-Agent: Mutt/1.10.1 (2018-07-13)
Subject: Re: [dpdk-dev] [PATCH] mbuf: optimize memory loads during mbuf
	freeing
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>

Hi,

On Mon, Mar 16, 2020 at 06:31:40PM +0000, Alexander Kozyrev wrote:
> Introduction of pinned external buffers doubled memory loads in the
> rte_pktmbuf_prefree_seg() function. Analysis of the generated assembly
> code shows unnecessary load of the pool field of the rte_mbuf structure.
> Here is the snippet of the assembly for "if (!RTE_MBUF_DIRECT(m))":
> Before the change the code was:
> 	movq  0x18(%rbx), %rax // load the ol_flags field
> 	test %r13, %rax	       // check if ol_flags equals to 0x60...0
> 	jz 0x9a8718 <Block 2>  // jump out to "if (m->next != NULL)"
> After the change the code becomed:
> 	movq  0x18(%rbx), %rax // load ol_flags
> 	test %r14, %rax	       // check if ol_flags equals to 0x60...0
> 	jnz 0x9bea38 <Block 2> // jump in to "if (!RTE_MBUF_HAS_EXTBUF(m)"
> 	movq  0x48(%rbx), %rax // load the pool field
> 	jmp 0x9bea78 <Block 7> // jump out to "if (m->next != NULL)"
> Look like this absolutely unneeded memory load of the pool field is an
> optimization for the external buffer case in GCC (4.8.5), since Clang
> generates the same assembly for both before and after the chenge versions.
> Plus, GCC favors the extrnal buffer case over the simple case.
> This assembly code layout causes the performance degradation because the
> rte_pktmbuf_prefree_seg() function is a part of a very hot path.
> Workaround this compilation issue by moving the check for pinned buffer
> apart from the check for external buffer and restore the initial code
> flow that favors the direct mbuf case over the external one.
> 
> Fixes: 6ef1107ad4c6 ("mbuf: detach mbuf with pinned external buffer")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com>
> Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  lib/librte_mbuf/rte_mbuf.h | 14 ++++++--------
>  1 file changed, 6 insertions(+), 8 deletions(-)
> 
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index 34679e0..ab9d3f5 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -1335,10 +1335,9 @@ static inline int __rte_pktmbuf_pinned_extbuf_decref(struct rte_mbuf *m)
>  	if (likely(rte_mbuf_refcnt_read(m) == 1)) {
>  
>  		if (!RTE_MBUF_DIRECT(m)) {
> -			if (!RTE_MBUF_HAS_EXTBUF(m) ||
> -			    !RTE_MBUF_HAS_PINNED_EXTBUF(m))
> -				rte_pktmbuf_detach(m);
> -			else if (__rte_pktmbuf_pinned_extbuf_decref(m))
> +			rte_pktmbuf_detach(m);
> +			if (RTE_MBUF_HAS_PINNED_EXTBUF(m) &&
> +			    __rte_pktmbuf_pinned_extbuf_decref(m))
>  				return NULL;
>  		}
>  
[...]

Reading the previous code again, it was correct but not easy
to understand, especially this test:

  if (!RTE_MBUF_HAS_EXTBUF(m) || !RTE_MBUF_HAS_PINNED_EXTBUF(m))

Knowing we already checked it is not a direct mbuf, it is equivalent to:

  if (!RTE_MBUF_HAS_PINNED_EXTBUF(m))

I think the objective was to avoid an access to the pool flags if not
necessary.
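
To spell out the equivalence, here is a small standalone sketch (not
DPDK code). The struct and function names are hypothetical, and it
assumes an invariant that I believe holds but that is not stated in
the patch: an mbuf from a pinned-extbuf pool always has the EXT flag
set. It enumerates every non-direct state and counts where the
two-part test and the shorter test disagree:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: the three facts the tests look at for one mbuf. */
struct mbuf_state {
	bool ind_attached; /* models IND_ATTACHED_MBUF in ol_flags */
	bool ext_attached; /* models EXT_ATTACHED_MBUF in ol_flags */
	bool pool_pinned;  /* models the pinned-extbuf flag of the pool */
};

/* Models RTE_MBUF_DIRECT(): neither attach flag is set. */
bool is_direct(struct mbuf_state s)
{
	return !s.ind_attached && !s.ext_attached;
}

/* Models the original two-part test. */
bool long_test(struct mbuf_state s)
{
	return !s.ext_attached || !s.pool_pinned;
}

/* Models the shorter test. */
bool short_test(struct mbuf_state s)
{
	return !s.pool_pinned;
}

/*
 * Count the non-direct states in which the two tests disagree.
 * With assume_invariant set, states where the pool is pinned but the
 * mbuf lacks the EXT flag are skipped (assumed never to occur).
 */
int count_mismatches(bool assume_invariant)
{
	int mismatches = 0;

	for (int bits = 0; bits < 8; bits++) {
		struct mbuf_state s = {
			.ind_attached = (bits & 1) != 0,
			.ext_attached = (bits & 2) != 0,
			.pool_pinned = (bits & 4) != 0,
		};

		if (is_direct(s))
			continue;
		if (assume_invariant && s.pool_pinned && !s.ext_attached)
			continue;
		if (long_test(s) != short_test(s))
			mismatches++;
	}
	return mismatches;
}

/* Aborts if the equivalence does not hold under the assumed invariant. */
void check_equivalence(void)
{
	assert(count_mismatches(true) == 0);
}
```

In this model, count_mismatches(true) is 0 and count_mismatches(false)
is 1: the only disagreement is an indirect mbuf in a pinned pool, which
is exactly the state the assumed invariant rules out.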

Completely removing the test as you did is also functionally OK, because
rte_pktmbuf_detach() also does the check, and the code is even clearer.

I wonder, however, whether doing the following would avoid an access to
the pool flags for mbufs that have the IND_ATTACHED flag:

		if (!RTE_MBUF_DIRECT(m)) {
			rte_pktmbuf_detach(m);
			if (RTE_MBUF_HAS_EXTBUF(m) &&
			    RTE_MBUF_HAS_PINNED_EXTBUF(m) &&
			    __rte_pktmbuf_pinned_extbuf_decref(m))
				return NULL;
		}
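
To illustrate what the extra RTE_MBUF_HAS_EXTBUF() test is meant to
buy, here is a minimal standalone model (not DPDK code). All names and
flag values are hypothetical stand-ins, and it models only the
condition evaluated after the detach call, not rte_pktmbuf_detach()
itself. It counts how often the pool flags are read by each form of
the condition:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the mbuf and pool flags. */
#define IND_ATTACHED  (1ULL << 62)
#define EXT_ATTACHED  (1ULL << 61)
#define POOL_F_PINNED 1u

struct pool_model { uint32_t flags; };
struct mbuf_model { uint64_t ol_flags; struct pool_model *pool; };

/* Number of times the pool flags were read since the last reset. */
unsigned int pool_reads;

/* Models RTE_MBUF_HAS_EXTBUF(): looks at the mbuf only. */
static int has_extbuf(const struct mbuf_model *m)
{
	return (m->ol_flags & EXT_ATTACHED) != 0;
}

/* Models RTE_MBUF_HAS_PINNED_EXTBUF(): dereferences the pool. */
static int has_pinned_extbuf(const struct mbuf_model *m)
{
	assert(m != NULL && m->pool != NULL);
	pool_reads++;
	return (m->pool->flags & POOL_F_PINNED) != 0;
}

/* Condition as in the patch: the pool flags are always read. */
int cond_patch(const struct mbuf_model *m)
{
	return has_pinned_extbuf(m);
}

/* Suggested condition: && short-circuits before touching the pool. */
int cond_suggested(const struct mbuf_model *m)
{
	return has_extbuf(m) && has_pinned_extbuf(m);
}

/* Evaluate one condition on a fresh mbuf and report the pool reads. */
unsigned int count_pool_reads(int (*cond)(const struct mbuf_model *),
			      uint64_t ol_flags, uint32_t pool_flags)
{
	struct pool_model pool = { pool_flags };
	struct mbuf_model m = { ol_flags, &pool };

	pool_reads = 0;
	(void)cond(&m);
	return pool_reads;
}
```

In this model, an mbuf with only IND_ATTACHED set gives one pool read
with cond_patch and none with cond_suggested, while an EXT_ATTACHED
mbuf gives one read with either form. Whether the compiler keeps that
ordering in the generated code would need to be checked in the
assembly.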

What do you think?

Nit: if you wish to send a v2, there are a few English fixes that could
be made (becomed, chenge, extrnal).

Thanks