From: Jerin Jacob
Date: Fri, 18 Oct 2019 21:41:29 +0530
To: Honnappa Nagarahalli
Cc: David Christensen, "Ananyev, Konstantin", "olivier.matz@6wind.com", "sthemmin@microsoft.com", "jerinj@marvell.com", "Richardson, Bruce", "david.marchand@redhat.com", "pbhagavatula@marvell.com", "dev@dpdk.org", Dharmik Thakkar, "Ruifeng Wang (Arm Technology China)", "Gavin Hu (Arm Technology China)", "stephen@networkplumber.org", nd
Subject: Re: [dpdk-dev] [PATCH v4 1/2] lib/ring: apis to support configurable element size
List-Id: DPDK patches and discussions

On Fri, Oct 18, 2019 at 1:34 PM Jerin Jacob wrote:
>
> On Fri, Oct 18, 2019 at 8:48 AM Honnappa Nagarahalli
> wrote:
> >
> > > Subject: Re: [PATCH v4 1/2] lib/ring: apis to support configurable element size
> > >
> > > >>> I tried this. On x86 (Xeon(R) Gold 6132 CPU @ 2.60GHz), the results
> > > >>> are as follows. The numbers in brackets are with the code on master.
> > > >>> gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
> > > >>>
> > > >>> RTE>>ring_perf_elem_autotest
> > > >>> ### Testing single element and burst enq/deq ###
> > > >>> SP/SC single enq/dequeue: 5
> > > >>> MP/MC single enq/dequeue: 40 (35)
> > > >>> SP/SC burst enq/dequeue (size: 8): 2
> > > >>> MP/MC burst enq/dequeue (size: 8): 6
> > > >>> SP/SC burst enq/dequeue (size: 32): 1 (2)
> > > >>> MP/MC burst enq/dequeue (size: 32): 2
> > > >>>
> > > >>> ### Testing empty dequeue ###
> > > >>> SC empty dequeue: 2.11
> > > >>> MC empty dequeue: 1.41 (2.11)
> > > >>>
> > > >>> ### Testing using a single lcore ###
> > > >>> SP/SC bulk enq/dequeue (size: 8): 2.15 (2.86)
> > > >>> MP/MC bulk enq/dequeue (size: 8): 6.35 (6.91)
> > > >>> SP/SC bulk enq/dequeue (size: 32): 1.35 (2.06)
> > > >>> MP/MC bulk enq/dequeue (size: 32): 2.38 (2.95)
> > > >>>
> > > >>> ### Testing using two physical cores ###
> > > >>> SP/SC bulk enq/dequeue (size: 8): 73.81 (15.33)
> > > >>> MP/MC bulk enq/dequeue (size: 8): 75.10 (71.27)
> > > >>> SP/SC bulk enq/dequeue (size: 32): 21.14 (9.58)
> > > >>> MP/MC bulk enq/dequeue (size: 32): 25.74 (20.91)
> > > >>>
> > > >>> ### Testing using two NUMA nodes ###
> > > >>> SP/SC bulk enq/dequeue (size: 8): 164.32 (50.66)
> > > >>> MP/MC bulk enq/dequeue (size: 8): 176.02 (173.43)
> > > >>> SP/SC bulk enq/dequeue (size: 32): 50.78 (23)
> > > >>> MP/MC bulk enq/dequeue (size: 32): 63.17 (46.74)
> > > >>>
> > > >>> On one of the Arm platforms:
> > > >>> MP/MC bulk enq/dequeue (size: 32): 0.37 (0.33) (~12% hit, the rest
> > > >>> are ok)
> > >
> > > Tried this on a Power9 platform (3.6GHz), with two NUMA nodes and 16
> > > cores/node (SMT=4).
> > > Applied all 3 patches in v5; test results are as
> > > follows:
> > >
> > > RTE>>ring_perf_elem_autotest
> > > ### Testing single element and burst enq/deq ###
> > > SP/SC single enq/dequeue: 42
> > > MP/MC single enq/dequeue: 59
> > > SP/SC burst enq/dequeue (size: 8): 5
> > > MP/MC burst enq/dequeue (size: 8): 7
> > > SP/SC burst enq/dequeue (size: 32): 2
> > > MP/MC burst enq/dequeue (size: 32): 2
> > >
> > > ### Testing empty dequeue ###
> > > SC empty dequeue: 7.81
> > > MC empty dequeue: 7.81
> > >
> > > ### Testing using a single lcore ###
> > > SP/SC bulk enq/dequeue (size: 8): 5.76
> > > MP/MC bulk enq/dequeue (size: 8): 7.66
> > > SP/SC bulk enq/dequeue (size: 32): 2.10
> > > MP/MC bulk enq/dequeue (size: 32): 2.57
> > >
> > > ### Testing using two hyperthreads ###
> > > SP/SC bulk enq/dequeue (size: 8): 13.13
> > > MP/MC bulk enq/dequeue (size: 8): 13.98
> > > SP/SC bulk enq/dequeue (size: 32): 3.41
> > > MP/MC bulk enq/dequeue (size: 32): 4.45
> > >
> > > ### Testing using two physical cores ###
> > > SP/SC bulk enq/dequeue (size: 8): 11.00
> > > MP/MC bulk enq/dequeue (size: 8): 10.95
> > > SP/SC bulk enq/dequeue (size: 32): 3.08
> > > MP/MC bulk enq/dequeue (size: 32): 3.40
> > >
> > > ### Testing using two NUMA nodes ###
> > > SP/SC bulk enq/dequeue (size: 8): 63.41
> > > MP/MC bulk enq/dequeue (size: 8): 62.70
> > > SP/SC bulk enq/dequeue (size: 32): 15.39
> > > MP/MC bulk enq/dequeue (size: 32): 22.96
> > >
> > Thanks for running this. There is another test, 'ring_perf_autotest', which
> > provides the numbers with the original implementation. The goal is to make
> > sure the numbers with the original implementation are the same as these.
> > Can you please run that as well?
>
> Honnappa,
>
> Your earlier perf report shows cycle counts below 1. That is
> because it is using a 50 or 100 MHz clock in EL0.
> Please check with the PMU counter. See "ARM64 profiling" in
>
> http://doc.dpdk.org/guides/prog_guide/profile_app.html
>
>
> Here are the octeontx2 values.
> There is a regression in the two-core cases,
> as you reported earlier on x86.
>
>
> RTE>>ring_perf_autotest
> ### Testing single element and burst enq/deq ###
> SP/SC single enq/dequeue: 288
> MP/MC single enq/dequeue: 452
> SP/SC burst enq/dequeue (size: 8): 39
> MP/MC burst enq/dequeue (size: 8): 61
> SP/SC burst enq/dequeue (size: 32): 13
> MP/MC burst enq/dequeue (size: 32): 21
>
> ### Testing empty dequeue ###
> SC empty dequeue: 6.33
> MC empty dequeue: 6.67
>
> ### Testing using a single lcore ###
> SP/SC bulk enq/dequeue (size: 8): 38.35
> MP/MC bulk enq/dequeue (size: 8): 67.36
> SP/SC bulk enq/dequeue (size: 32): 13.10
> MP/MC bulk enq/dequeue (size: 32): 21.64
>
> ### Testing using two physical cores ###
> SP/SC bulk enq/dequeue (size: 8): 75.94
> MP/MC bulk enq/dequeue (size: 8): 107.66
> SP/SC bulk enq/dequeue (size: 32): 24.51
> MP/MC bulk enq/dequeue (size: 32): 33.23
> Test OK
> RTE>>
>
> ---- after applying v5 of the patch ------
>
> RTE>>ring_perf_autotest
> ### Testing single element and burst enq/deq ###
> SP/SC single enq/dequeue: 289
> MP/MC single enq/dequeue: 452
> SP/SC burst enq/dequeue (size: 8): 40
> MP/MC burst enq/dequeue (size: 8): 64
> SP/SC burst enq/dequeue (size: 32): 13
> MP/MC burst enq/dequeue (size: 32): 22
>
> ### Testing empty dequeue ###
> SC empty dequeue: 6.33
> MC empty dequeue: 6.67
>
> ### Testing using a single lcore ###
> SP/SC bulk enq/dequeue (size: 8): 39.73
> MP/MC bulk enq/dequeue (size: 8): 69.13
> SP/SC bulk enq/dequeue (size: 32): 13.44
> MP/MC bulk enq/dequeue (size: 32): 22.00
>
> ### Testing using two physical cores ###
> SP/SC bulk enq/dequeue (size: 8): 76.02
> MP/MC bulk enq/dequeue (size: 8): 112.50
> SP/SC bulk enq/dequeue (size: 32): 24.71
> MP/MC bulk enq/dequeue (size: 32): 33.34
> Test OK
> RTE>>
>
> RTE>>ring_perf_elem_autotest
> ### Testing single element and burst enq/deq ###
> SP/SC single enq/dequeue: 290
> MP/MC single enq/dequeue: 503
> SP/SC burst enq/dequeue (size: 8): 39
> MP/MC burst enq/dequeue (size: 8): 63
> SP/SC burst enq/dequeue (size: 32): 11
> MP/MC burst enq/dequeue (size: 32): 19
>
> ### Testing empty dequeue ###
> SC empty dequeue: 6.33
> MC empty dequeue: 6.67
>
> ### Testing using a single lcore ###
> SP/SC bulk enq/dequeue (size: 8): 38.92
> MP/MC bulk enq/dequeue (size: 8): 62.54
> SP/SC bulk enq/dequeue (size: 32): 11.46
> MP/MC bulk enq/dequeue (size: 32): 19.89
>
> ### Testing using two physical cores ###
> SP/SC bulk enq/dequeue (size: 8): 87.55
> MP/MC bulk enq/dequeue (size: 8): 99.10
> SP/SC bulk enq/dequeue (size: 32): 26.63
> MP/MC bulk enq/dequeue (size: 32): 29.91
> Test OK
> RTE>>

It looks like removing patch 3/3 and keeping only 1/3 and 2/3 gives
better results in some cases:

RTE>>ring_perf_autotest
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 288
MP/MC single enq/dequeue: 439
SP/SC burst enq/dequeue (size: 8): 39
MP/MC burst enq/dequeue (size: 8): 61
SP/SC burst enq/dequeue (size: 32): 13
MP/MC burst enq/dequeue (size: 32): 22

### Testing empty dequeue ###
SC empty dequeue: 6.33
MC empty dequeue: 6.67

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 38.35
MP/MC bulk enq/dequeue (size: 8): 67.48
SP/SC bulk enq/dequeue (size: 32): 13.40
MP/MC bulk enq/dequeue (size: 32): 22.03

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 75.94
MP/MC bulk enq/dequeue (size: 8): 105.84
SP/SC bulk enq/dequeue (size: 32): 25.11
MP/MC bulk enq/dequeue (size: 32): 33.48
Test OK
RTE>>

RTE>>ring_perf_elem_autotest
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 288
MP/MC single enq/dequeue: 452
SP/SC burst enq/dequeue (size: 8): 39
MP/MC burst enq/dequeue (size: 8): 61
SP/SC burst enq/dequeue (size: 32): 13
MP/MC burst enq/dequeue (size: 32): 22

### Testing empty dequeue ###
SC empty dequeue: 6.33
MC empty dequeue: 6.00

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 38.35
MP/MC bulk enq/dequeue (size: 8): 67.46
SP/SC bulk enq/dequeue (size: 32): 13.42
MP/MC bulk enq/dequeue (size: 32): 22.01

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 76.04
MP/MC bulk enq/dequeue (size: 8): 104.88
SP/SC bulk enq/dequeue (size: 32): 24.75
MP/MC bulk enq/dequeue (size: 32): 34.66
Test OK
RTE>>

> > >
> > > Dave
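
To make the two-core comparison easier to read, here is a rough sketch that turns the octeontx2 "two physical cores" numbers above into percent changes. It assumes the right baseline is ring_perf_autotest on master and that ring_perf_elem_autotest is the thing being compared against it; the dictionary names and the percent_change/compare helpers are only illustrative and are not part of the DPDK test code.

```python
# Cycle counts copied from the octeontx2 "two physical cores" results in this mail.
# Assumption: lower is better, and master's ring_perf_autotest is the baseline.
BASELINE_MASTER = {
    "SP/SC bulk (size: 8)": 75.94,
    "MP/MC bulk (size: 8)": 107.66,
    "SP/SC bulk (size: 32)": 24.51,
    "MP/MC bulk (size: 32)": 33.23,
}

# ring_perf_elem_autotest with all of v5 applied (1/3, 2/3 and 3/3).
ELEM_V5_ALL = {
    "SP/SC bulk (size: 8)": 87.55,
    "MP/MC bulk (size: 8)": 99.10,
    "SP/SC bulk (size: 32)": 26.63,
    "MP/MC bulk (size: 32)": 29.91,
}

# ring_perf_elem_autotest with only 1/3 and 2/3 applied.
ELEM_V5_NO_3 = {
    "SP/SC bulk (size: 8)": 76.04,
    "MP/MC bulk (size: 8)": 104.88,
    "SP/SC bulk (size: 32)": 24.75,
    "MP/MC bulk (size: 32)": 34.66,
}


def percent_change(before: float, after: float) -> float:
    """Relative change in percent; positive means more cycles (a regression)."""
    return (after - before) / before * 100.0


def compare(before: dict, after: dict) -> dict:
    """Percent change for every test case present in the baseline."""
    return {name: percent_change(before[name], after[name]) for name in before}


if __name__ == "__main__":
    for label, results in (("v5, all patches", ELEM_V5_ALL),
                           ("v5, without 3/3", ELEM_V5_NO_3)):
        print(label)
        for name, delta in compare(BASELINE_MASTER, results).items():
            print(f"  {name}: {delta:+.1f}%")
```

With these inputs the SP/SC size-8 case is roughly 15% slower with all three patches and within about 0.1% without 3/3, while the MP/MC cases move in the other direction, which is why the result is only "better in some cases".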