From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from dpdk.org (dpdk.org [92.243.14.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 31E7FA31F3
	for <public@inbox.dpdk.org>; Fri, 18 Oct 2019 18:11:44 +0200 (CEST)
Received: from [92.243.14.124] (localhost [127.0.0.1])
	by dpdk.org (Postfix) with ESMTP id A38791C0DA;
	Fri, 18 Oct 2019 18:11:43 +0200 (CEST)
Received: from mail-il1-f196.google.com (mail-il1-f196.google.com
 [209.85.166.196]) by dpdk.org (Postfix) with ESMTP id 46B7D1C0D7
 for <dev@dpdk.org>; Fri, 18 Oct 2019 18:11:42 +0200 (CEST)
Received: by mail-il1-f196.google.com with SMTP id f13so6008708ils.11
 for <dev@dpdk.org>; Fri, 18 Oct 2019 09:11:42 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
 h=mime-version:references:in-reply-to:from:date:message-id:subject:to
 :cc:content-transfer-encoding;
 bh=YXSvZWtHauOCGSsJ6fK9pHqNYwP/BODb78IVWbYmRbE=;
 b=qAf/1Np0/o9e/nzsEInMomPPXa/78wH5ITXtAdmaeHc8WlLMpSKLpFXlYiAG+Gqex9
 qJsZEJxk314250EXSUhr8RNZ0IrqoIF6gXY750KHtSrFba5n7i90nsPmSn+N8y5vQVz9
 qFZyaR/F2ORcUvnMQnEE75ljR5NujSf5xU2cKFnwZIPeiZ73x1H/thPxtMHUJf6vtfKR
 lQ3lWHn5Rx/P2KNCgP2ARxQ2y+pM1JQUGXBhRx7BqgX4hVr6mYUlDMTdv4wfhBHgji9z
 50tP8MSD4CzlIifdj2r5TCZhGGlhNVMa/Dat+IvhRce391xdkDLVm7I1pTMhSkgANGEQ
 hrTg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc:content-transfer-encoding;
 bh=YXSvZWtHauOCGSsJ6fK9pHqNYwP/BODb78IVWbYmRbE=;
 b=hTmbRlhMqbEhOLGZcZ8PUcxcLoN5cpqSRfzk4WRgDmsIOwqTFsIG3yv+VYLwiqp5uV
 VaHnpF2sgmUaCEOJvZeQrUZM7rDEGudHFU8B0sFQ9MoV6FoN3nDej+xwhcYUVU0cWQg4
 snn3UTrjR+dDp0DEXfCiEGFrF+1B0c1n/Ns/Ggn1ws5ln+v7nl7lnbBAe1ylk1Gfor50
 N++VswkQhFmMqoh9Uz3+9FOwzjqTvo4WPpdF1PEH5d8Xd6sxu6qZCN4wY2ehfOtrIzjU
 OmbxLVY3b6qhpxAiVUkqth4Y8P0kqzN9hVLrmuwwz+OBeziC8D2dHLTjfV0UYu4JbN/I
 vxLQ==
X-Gm-Message-State: APjAAAUGEssGO/s6+8wG9V7CMMcuqjdgk1Wt6nYWspvMTn3TwR29Epdh
 SJj1bAwASIMKS4bweDZzaElHZxoXaVKXAyECRzM=
X-Google-Smtp-Source: APXvYqzTorssaqhLFBlRJQk8QBd4lQhFVPhAndl3t2oPdYD7Mm4RuMquDZjrV+bZBxhXhXlN1XOgP3X2qBP1QBW3ui0=
X-Received: by 2002:a92:918b:: with SMTP id e11mr11547852ill.130.1571415101265; 
 Fri, 18 Oct 2019 09:11:41 -0700 (PDT)
MIME-Version: 1.0
References: <20190906190510.11146-1-honnappa.nagarahalli@arm.com>
 <20191009024709.38144-1-honnappa.nagarahalli@arm.com>
 <20191009024709.38144-2-honnappa.nagarahalli@arm.com>
 <VE1PR08MB5149D57CAA77B51392E5423898970@VE1PR08MB5149.eurprd08.prod.outlook.com>
 <2601191342CEEE43887BDE71AB97725801A8C68545@IRSMSX104.ger.corp.intel.com>
 <VE1PR08MB5149CD175CEB6B455C99F88D98900@VE1PR08MB5149.eurprd08.prod.outlook.com>
 <2601191342CEEE43887BDE71AB97725801A8C68A99@IRSMSX104.ger.corp.intel.com>
 <VE1PR08MB5149D51FA4EDB55D6DEFA129986D0@VE1PR08MB5149.eurprd08.prod.outlook.com>
 <2601191342CEEE43887BDE71AB97725801A8C6A2DA@IRSMSX104.ger.corp.intel.com>
 <VE1PR08MB51496EBD19AD797C17C29CF6986D0@VE1PR08MB5149.eurprd08.prod.outlook.com>
 <7df09c22-5b8b-77d8-1e8a-a2714e732036@linux.vnet.ibm.com>
 <VE1PR08MB5149DC6F20C8F1689E5D76E7986C0@VE1PR08MB5149.eurprd08.prod.outlook.com>
 <CALBAE1MdGRGV-n-Q=8fcXgy=4rfTO-_aG_LemdZ+RtzT4r8XpQ@mail.gmail.com>
In-Reply-To: <CALBAE1MdGRGV-n-Q=8fcXgy=4rfTO-_aG_LemdZ+RtzT4r8XpQ@mail.gmail.com>
From: Jerin Jacob <jerinjacobk@gmail.com>
Date: Fri, 18 Oct 2019 21:41:29 +0530
Message-ID: <CALBAE1N_=jZYD2OGuaaPx4oFO-kXVB9PF9qFcVZcb5dfUEuizg@mail.gmail.com>
To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
Cc: David Christensen <drc@linux.vnet.ibm.com>, 
 "Ananyev, Konstantin" <konstantin.ananyev@intel.com>, 
 "olivier.matz@6wind.com" <olivier.matz@6wind.com>,
 "sthemmin@microsoft.com" <sthemmin@microsoft.com>, 
 "jerinj@marvell.com" <jerinj@marvell.com>, "Richardson,
 Bruce" <bruce.richardson@intel.com>, 
 "david.marchand@redhat.com" <david.marchand@redhat.com>, 
 "pbhagavatula@marvell.com" <pbhagavatula@marvell.com>,
 "dev@dpdk.org" <dev@dpdk.org>, Dharmik Thakkar <Dharmik.Thakkar@arm.com>, 
 "Ruifeng Wang (Arm Technology China)" <Ruifeng.Wang@arm.com>,
 "Gavin Hu (Arm Technology China)" <Gavin.Hu@arm.com>, 
 "stephen@networkplumber.org" <stephen@networkplumber.org>, nd <nd@arm.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Subject: Re: [dpdk-dev] [PATCH v4 1/2] lib/ring: apis to support
 configurable element size
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

On Fri, Oct 18, 2019 at 1:34 PM Jerin Jacob <jerinjacobk@gmail.com> wrote:
>
> On Fri, Oct 18, 2019 at 8:48 AM Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com> wrote:
> >
> > <snip>
> >
> > > Subject: Re: [PATCH v4 1/2] lib/ring: apis to support configurable element
> > > size
> > >
> > > >>> I tried this. On x86 (Xeon(R) Gold 6132 CPU @ 2.60GHz), the results
> > > >>> are as follows. The numbers in brackets are with the code on master.
> > > >>> gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
> > > >>>
> > > >>> RTE>>ring_perf_elem_autotest
> > > >>> ### Testing single element and burst enq/deq ###
> > > >>> SP/SC single enq/dequeue: 5
> > > >>> MP/MC single enq/dequeue: 40 (35)
> > > >>> SP/SC burst enq/dequeue (size: 8): 2
> > > >>> MP/MC burst enq/dequeue (size: 8): 6
> > > >>> SP/SC burst enq/dequeue (size: 32): 1 (2)
> > > >>> MP/MC burst enq/dequeue (size: 32): 2
> > > >>>
> > > >>> ### Testing empty dequeue ###
> > > >>> SC empty dequeue: 2.11
> > > >>> MC empty dequeue: 1.41 (2.11)
> > > >>>
> > > >>> ### Testing using a single lcore ###
> > > >>> SP/SC bulk enq/dequeue (size: 8): 2.15 (2.86)
> > > >>> MP/MC bulk enq/dequeue (size: 8): 6.35 (6.91)
> > > >>> SP/SC bulk enq/dequeue (size: 32): 1.35 (2.06)
> > > >>> MP/MC bulk enq/dequeue (size: 32): 2.38 (2.95)
> > > >>>
> > > >>> ### Testing using two physical cores ###
> > > >>> SP/SC bulk enq/dequeue (size: 8): 73.81 (15.33)
> > > >>> MP/MC bulk enq/dequeue (size: 8): 75.10 (71.27)
> > > >>> SP/SC bulk enq/dequeue (size: 32): 21.14 (9.58)
> > > >>> MP/MC bulk enq/dequeue (size: 32): 25.74 (20.91)
> > > >>>
> > > >>> ### Testing using two NUMA nodes ###
> > > >>> SP/SC bulk enq/dequeue (size: 8): 164.32 (50.66)
> > > >>> MP/MC bulk enq/dequeue (size: 8): 176.02 (173.43)
> > > >>> SP/SC bulk enq/dequeue (size: 32): 50.78 (23)
> > > >>> MP/MC bulk enq/dequeue (size: 32): 63.17 (46.74)
> > > >>>
> > > >>> On one of the Arm platforms:
> > > >>> MP/MC bulk enq/dequeue (size: 32): 0.37 (0.33) (~12% hit, the rest
> > > >>> are ok)
> > >
> > > Tried this on a Power9 platform (3.6GHz), with two numa nodes and 16
> > > cores/node (SMT=4).  Applied all 3 patches in v5, test results are as
> > > follows:
> > >
> > > RTE>>ring_perf_elem_autotest
> > > ### Testing single element and burst enq/deq ###
> > > SP/SC single enq/dequeue: 42
> > > MP/MC single enq/dequeue: 59
> > > SP/SC burst enq/dequeue (size: 8): 5
> > > MP/MC burst enq/dequeue (size: 8): 7
> > > SP/SC burst enq/dequeue (size: 32): 2
> > > MP/MC burst enq/dequeue (size: 32): 2
> > >
> > > ### Testing empty dequeue ###
> > > SC empty dequeue: 7.81
> > > MC empty dequeue: 7.81
> > >
> > > ### Testing using a single lcore ###
> > > SP/SC bulk enq/dequeue (size: 8): 5.76
> > > MP/MC bulk enq/dequeue (size: 8): 7.66
> > > SP/SC bulk enq/dequeue (size: 32): 2.10
> > > MP/MC bulk enq/dequeue (size: 32): 2.57
> > >
> > > ### Testing using two hyperthreads ###
> > > SP/SC bulk enq/dequeue (size: 8): 13.13
> > > MP/MC bulk enq/dequeue (size: 8): 13.98
> > > SP/SC bulk enq/dequeue (size: 32): 3.41
> > > MP/MC bulk enq/dequeue (size: 32): 4.45
> > >
> > > ### Testing using two physical cores ###
> > > SP/SC bulk enq/dequeue (size: 8): 11.00
> > > MP/MC bulk enq/dequeue (size: 8): 10.95
> > > SP/SC bulk enq/dequeue (size: 32): 3.08
> > > MP/MC bulk enq/dequeue (size: 32): 3.40
> > >
> > > ### Testing using two NUMA nodes ###
> > > SP/SC bulk enq/dequeue (size: 8): 63.41
> > > MP/MC bulk enq/dequeue (size: 8): 62.70
> > > SP/SC bulk enq/dequeue (size: 32): 15.39
> > > MP/MC bulk enq/dequeue (size: 32): 22.96
> > >
> > Thanks for running this. There is another test 'ring_perf_autotest' which
> > provides the numbers with the original implementation. The goal is to make
> > sure the numbers with the original implementation are the same as these.
> > Can you please run that as well?
>
> Honnappa,
>
> Your earlier perf report shows cycle counts below 1. That is because it
> is using the 50 or 100MHz clock in EL0.
> Please check with the PMU counter. See "ARM64 profiling" in
>
> http://doc.dpdk.org/guides/prog_guide/profile_app.html
>
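
To illustrate the point about the sub-1 numbers: if the "cycles" are really
ticks of the 50 or 100MHz generic timer, each tick spans many CPU cycles.
Below is a minimal sketch of the scaling, not code from DPDK. The function
name is hypothetical, and the 2.0GHz core clock used in the usage note is
only an assumed value for illustration, since the Arm platform's frequency
was not given in this thread.

```c
/*
 * Scale a per-operation cost measured in timer ticks to an approximate
 * number of CPU cycles. Both frequencies are in Hz.
 *
 * Hypothetical helper for illustration only; not part of DPDK.
 */
double
timer_ticks_to_cpu_cycles(double ticks, double timer_hz, double cpu_hz)
{
	/* One timer tick lasts (cpu_hz / timer_hz) CPU cycles. */
	return ticks * (cpu_hz / timer_hz);
}
```

For example, timer_ticks_to_cpu_cycles(0.37, 100e6, 2.0e9) scales the
reported 0.37 by a factor of 20 under the assumed 2.0GHz clock. This is only
a rough conversion; reading the PMU cycle counter directly is still the
better check.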
>
> Here are the octeontx2 values. There is a regression in the two-core cases,
> as you reported earlier on x86.
>
>
> RTE>>ring_perf_autotest
> ### Testing single element and burst enq/deq ###
> SP/SC single enq/dequeue: 288
> MP/MC single enq/dequeue: 452
> SP/SC burst enq/dequeue (size: 8): 39
> MP/MC burst enq/dequeue (size: 8): 61
> SP/SC burst enq/dequeue (size: 32): 13
> MP/MC burst enq/dequeue (size: 32): 21
>
> ### Testing empty dequeue ###
> SC empty dequeue: 6.33
> MC empty dequeue: 6.67
>
> ### Testing using a single lcore ###
> SP/SC bulk enq/dequeue (size: 8): 38.35
> MP/MC bulk enq/dequeue (size: 8): 67.36
> SP/SC bulk enq/dequeue (size: 32): 13.10
> MP/MC bulk enq/dequeue (size: 32): 21.64
>
> ### Testing using two physical cores ###
> SP/SC bulk enq/dequeue (size: 8): 75.94
> MP/MC bulk enq/dequeue (size: 8): 107.66
> SP/SC bulk enq/dequeue (size: 32): 24.51
> MP/MC bulk enq/dequeue (size: 32): 33.23
> Test OK
> RTE>>
>
> ---- after applying v5 of the patch ------
>
> RTE>>ring_perf_autotest
> ### Testing single element and burst enq/deq ###
> SP/SC single enq/dequeue: 289
> MP/MC single enq/dequeue: 452
> SP/SC burst enq/dequeue (size: 8): 40
> MP/MC burst enq/dequeue (size: 8): 64
> SP/SC burst enq/dequeue (size: 32): 13
> MP/MC burst enq/dequeue (size: 32): 22
>
> ### Testing empty dequeue ###
> SC empty dequeue: 6.33
> MC empty dequeue: 6.67
>
> ### Testing using a single lcore ###
> SP/SC bulk enq/dequeue (size: 8): 39.73
> MP/MC bulk enq/dequeue (size: 8): 69.13
> SP/SC bulk enq/dequeue (size: 32): 13.44
> MP/MC bulk enq/dequeue (size: 32): 22.00
>
> ### Testing using two physical cores ###
> SP/SC bulk enq/dequeue (size: 8): 76.02
> MP/MC bulk enq/dequeue (size: 8): 112.50
> SP/SC bulk enq/dequeue (size: 32): 24.71
> MP/MC bulk enq/dequeue (size: 32): 33.34
> Test OK
> RTE>>
>
> RTE>>ring_perf_elem_autotest
> ### Testing single element and burst enq/deq ###
> SP/SC single enq/dequeue: 290
> MP/MC single enq/dequeue: 503
> SP/SC burst enq/dequeue (size: 8): 39
> MP/MC burst enq/dequeue (size: 8): 63
> SP/SC burst enq/dequeue (size: 32): 11
> MP/MC burst enq/dequeue (size: 32): 19
>
> ### Testing empty dequeue ###
> SC empty dequeue: 6.33
> MC empty dequeue: 6.67
>
> ### Testing using a single lcore ###
> SP/SC bulk enq/dequeue (size: 8): 38.92
> MP/MC bulk enq/dequeue (size: 8): 62.54
> SP/SC bulk enq/dequeue (size: 32): 11.46
> MP/MC bulk enq/dequeue (size: 32): 19.89
>
> ### Testing using two physical cores ###
> SP/SC bulk enq/dequeue (size: 8): 87.55
> MP/MC bulk enq/dequeue (size: 8): 99.10
> SP/SC bulk enq/dequeue (size: 32): 26.63
> MP/MC bulk enq/dequeue (size: 32): 29.91
> Test OK
> RTE>>

It looks like removing 3/3 and keeping only 1/3 and 2/3 shows better
results in some cases.


RTE>>ring_perf_autotest
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 288
MP/MC single enq/dequeue: 439
SP/SC burst enq/dequeue (size: 8): 39
MP/MC burst enq/dequeue (size: 8): 61
SP/SC burst enq/dequeue (size: 32): 13
MP/MC burst enq/dequeue (size: 32): 22

### Testing empty dequeue ###
SC empty dequeue: 6.33
MC empty dequeue: 6.67

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 38.35
MP/MC bulk enq/dequeue (size: 8): 67.48
SP/SC bulk enq/dequeue (size: 32): 13.40
MP/MC bulk enq/dequeue (size: 32): 22.03

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 75.94
MP/MC bulk enq/dequeue (size: 8): 105.84
SP/SC bulk enq/dequeue (size: 32): 25.11
MP/MC bulk enq/dequeue (size: 32): 33.48
Test OK
RTE>>


RTE>>ring_perf_elem_autotest
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 288
MP/MC single enq/dequeue: 452
SP/SC burst enq/dequeue (size: 8): 39
MP/MC burst enq/dequeue (size: 8): 61
SP/SC burst enq/dequeue (size: 32): 13
MP/MC burst enq/dequeue (size: 32): 22

### Testing empty dequeue ###
SC empty dequeue: 6.33
MC empty dequeue: 6.00

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 38.35
MP/MC bulk enq/dequeue (size: 8): 67.46
SP/SC bulk enq/dequeue (size: 32): 13.42
MP/MC bulk enq/dequeue (size: 32): 22.01

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 76.04
MP/MC bulk enq/dequeue (size: 8): 104.88
SP/SC bulk enq/dequeue (size: 32): 24.75
MP/MC bulk enq/dequeue (size: 32): 34.66
Test OK
RTE>>
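
To put a number on the comparison, here is a minimal sketch of the relative
change between two runs. The helper name is made up for illustration; the
inputs in the usage note are the "two physical cores, SP/SC bulk (size: 8)"
values from the runs in this mail.

```c
/*
 * Relative change of `patched` versus `base`, in percent.
 * Positive means more cycles per operation, i.e. slower.
 *
 * Hypothetical helper for illustration only.
 */
double
pct_change(double base, double patched)
{
	return (patched - base) / base * 100.0;
}
```

With ring_perf_autotest before the patch as the base (75.94),
pct_change(75.94, 87.55) gives roughly +15% for ring_perf_elem_autotest with
all of v5 applied, while pct_change(75.94, 76.04) gives well under +1% with
only 1/3 and 2/3 applied.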


>
>
>
> > > Dave