From: Jerin Jacob
Date: Fri, 18 Oct 2019 13:34:38 +0530
To: Honnappa Nagarahalli
Cc: David Christensen, "Ananyev, Konstantin", "olivier.matz@6wind.com",
 "sthemmin@microsoft.com", "jerinj@marvell.com", "Richardson, Bruce",
 "david.marchand@redhat.com", "pbhagavatula@marvell.com", "dev@dpdk.org",
 Dharmik Thakkar, "Ruifeng Wang (Arm Technology China)",
 "Gavin Hu (Arm Technology China)", "stephen@networkplumber.org", nd
Subject: Re: [dpdk-dev] [PATCH v4 1/2] lib/ring: apis to support configurable element size
References: <20190906190510.11146-1-honnappa.nagarahalli@arm.com>
 <20191009024709.38144-1-honnappa.nagarahalli@arm.com>
 <20191009024709.38144-2-honnappa.nagarahalli@arm.com>
 <2601191342CEEE43887BDE71AB97725801A8C68545@IRSMSX104.ger.corp.intel.com>
 <2601191342CEEE43887BDE71AB97725801A8C68A99@IRSMSX104.ger.corp.intel.com>
 <2601191342CEEE43887BDE71AB97725801A8C6A2DA@IRSMSX104.ger.corp.intel.com>
 <7df09c22-5b8b-77d8-1e8a-a2714e732036@linux.vnet.ibm.com>
Content-Type: text/plain; charset="UTF-8"

On Fri, Oct 18, 2019 at 8:48 AM Honnappa Nagarahalli wrote:
>
> > Subject: Re: [PATCH v4 1/2] lib/ring: apis to support configurable element
> > size
> >
> > >>> I tried this. On x86 (Xeon(R) Gold 6132 CPU @ 2.60GHz), the results
> > >>> are as follows. The numbers in brackets are with the code on master.
> > >>> gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
> > >>>
> > >>> RTE>>ring_perf_elem_autotest
> > >>> ### Testing single element and burst enq/deq ###
> > >>> SP/SC single enq/dequeue: 5
> > >>> MP/MC single enq/dequeue: 40 (35)
> > >>> SP/SC burst enq/dequeue (size: 8): 2
> > >>> MP/MC burst enq/dequeue (size: 8): 6
> > >>> SP/SC burst enq/dequeue (size: 32): 1 (2)
> > >>> MP/MC burst enq/dequeue (size: 32): 2
> > >>>
> > >>> ### Testing empty dequeue ###
> > >>> SC empty dequeue: 2.11
> > >>> MC empty dequeue: 1.41 (2.11)
> > >>>
> > >>> ### Testing using a single lcore ###
> > >>> SP/SC bulk enq/dequeue (size: 8): 2.15 (2.86)
> > >>> MP/MC bulk enq/dequeue (size: 8): 6.35 (6.91)
> > >>> SP/SC bulk enq/dequeue (size: 32): 1.35 (2.06)
> > >>> MP/MC bulk enq/dequeue (size: 32): 2.38 (2.95)
> > >>>
> > >>> ### Testing using two physical cores ###
> > >>> SP/SC bulk enq/dequeue (size: 8): 73.81 (15.33)
> > >>> MP/MC bulk enq/dequeue (size: 8): 75.10 (71.27)
> > >>> SP/SC bulk enq/dequeue (size: 32): 21.14 (9.58)
> > >>> MP/MC bulk enq/dequeue (size: 32): 25.74 (20.91)
> > >>>
> > >>> ### Testing using two NUMA nodes ###
> > >>> SP/SC bulk enq/dequeue (size: 8): 164.32 (50.66)
> > >>> MP/MC bulk enq/dequeue (size: 8): 176.02 (173.43)
> > >>> SP/SC bulk enq/dequeue (size: 32): 50.78 (23)
> > >>> MP/MC bulk enq/dequeue (size: 32): 63.17 (46.74)
> > >>>
> > >>> On one of the Arm platform
> > >>> MP/MC bulk enq/dequeue (size: 32): 0.37 (0.33) (~12% hit, the rest are ok)
> >
> > Tried this on a Power9 platform (3.6GHz), with two numa nodes and 16
> > cores/node (SMT=4).
> > Applied all 3 patches in v5, test results are as follows:
> >
> > RTE>>ring_perf_elem_autotest
> > ### Testing single element and burst enq/deq ###
> > SP/SC single enq/dequeue: 42
> > MP/MC single enq/dequeue: 59
> > SP/SC burst enq/dequeue (size: 8): 5
> > MP/MC burst enq/dequeue (size: 8): 7
> > SP/SC burst enq/dequeue (size: 32): 2
> > MP/MC burst enq/dequeue (size: 32): 2
> >
> > ### Testing empty dequeue ###
> > SC empty dequeue: 7.81
> > MC empty dequeue: 7.81
> >
> > ### Testing using a single lcore ###
> > SP/SC bulk enq/dequeue (size: 8): 5.76
> > MP/MC bulk enq/dequeue (size: 8): 7.66
> > SP/SC bulk enq/dequeue (size: 32): 2.10
> > MP/MC bulk enq/dequeue (size: 32): 2.57
> >
> > ### Testing using two hyperthreads ###
> > SP/SC bulk enq/dequeue (size: 8): 13.13
> > MP/MC bulk enq/dequeue (size: 8): 13.98
> > SP/SC bulk enq/dequeue (size: 32): 3.41
> > MP/MC bulk enq/dequeue (size: 32): 4.45
> >
> > ### Testing using two physical cores ###
> > SP/SC bulk enq/dequeue (size: 8): 11.00
> > MP/MC bulk enq/dequeue (size: 8): 10.95
> > SP/SC bulk enq/dequeue (size: 32): 3.08
> > MP/MC bulk enq/dequeue (size: 32): 3.40
> >
> > ### Testing using two NUMA nodes ###
> > SP/SC bulk enq/dequeue (size: 8): 63.41
> > MP/MC bulk enq/dequeue (size: 8): 62.70
> > SP/SC bulk enq/dequeue (size: 32): 15.39
> > MP/MC bulk enq/dequeue (size: 32): 22.96
> >
> Thanks for running this. There is another test 'ring_perf_autotest' which
> provides the numbers with the original implementation. The goal is to make
> sure the numbers with the original implementation are the same as these.
> Can you please run that as well?

Honnappa,

Your earlier perf report shows cycle counts below 1. That is because the
test is using a 50 or 100 MHz clock in EL0. Please check with the PMU
counter instead. See "ARM64 profiling" in
http://doc.dpdk.org/guides/prog_guide/profile_app.html

Here are the octeontx2 values. There is a regression in the two-core cases,
as you reported earlier on x86.
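On the sub-1 cycle counts mentioned above, a rough sketch of the arithmetic: if the per-operation cost is counted in ticks of a 50 or 100 MHz timer rather than in core clock cycles, each tick spans many core cycles, so the reported figure looks far smaller than the real cycle count. The Python below rescales such a figure. The 2 GHz core frequency and the helper name `ticks_to_cpu_cycles` are assumptions made purely for illustration; they are not values or APIs from this thread.

```python
# Sketch: rescale a cost counted in slow fixed-frequency timer ticks
# into CPU core cycles.
# ASSUMPTION: cpu_hz is a made-up example value, not a measured platform figure.


def ticks_to_cpu_cycles(ticks: float, timer_hz: float, cpu_hz: float) -> float:
    """Convert a duration counted in timer ticks into CPU core cycles."""
    if timer_hz <= 0 or cpu_hz <= 0:
        raise ValueError("frequencies must be positive")
    # One timer tick lasts cpu_hz / timer_hz core cycles.
    return ticks * (cpu_hz / timer_hz)


if __name__ == "__main__":
    reported = 0.37   # per-element figure from the quoted Arm result
    cpu_hz = 2.0e9    # hypothetical 2 GHz core clock (assumption)
    for timer_hz in (50e6, 100e6):  # the two timer rates named in the message
        cycles = ticks_to_cpu_cycles(reported, timer_hz, cpu_hz)
        print(f"{timer_hz / 1e6:.0f} MHz timer: "
              f"{reported} ticks is roughly {cycles:.1f} core cycles")
```

This only shows the scaling; a direct PMU cycle-counter measurement, as suggested above, avoids the conversion and the coarse timer resolution altogether.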
RTE>>ring_perf_autotest
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 288
MP/MC single enq/dequeue: 452
SP/SC burst enq/dequeue (size: 8): 39
MP/MC burst enq/dequeue (size: 8): 61
SP/SC burst enq/dequeue (size: 32): 13
MP/MC burst enq/dequeue (size: 32): 21

### Testing empty dequeue ###
SC empty dequeue: 6.33
MC empty dequeue: 6.67

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 38.35
MP/MC bulk enq/dequeue (size: 8): 67.36
SP/SC bulk enq/dequeue (size: 32): 13.10
MP/MC bulk enq/dequeue (size: 32): 21.64

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 75.94
MP/MC bulk enq/dequeue (size: 8): 107.66
SP/SC bulk enq/dequeue (size: 32): 24.51
MP/MC bulk enq/dequeue (size: 32): 33.23
Test OK
RTE>>

---- after applying v5 of the patch ------

RTE>>ring_perf_autotest
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 289
MP/MC single enq/dequeue: 452
SP/SC burst enq/dequeue (size: 8): 40
MP/MC burst enq/dequeue (size: 8): 64
SP/SC burst enq/dequeue (size: 32): 13
MP/MC burst enq/dequeue (size: 32): 22

### Testing empty dequeue ###
SC empty dequeue: 6.33
MC empty dequeue: 6.67

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 39.73
MP/MC bulk enq/dequeue (size: 8): 69.13
SP/SC bulk enq/dequeue (size: 32): 13.44
MP/MC bulk enq/dequeue (size: 32): 22.00

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 76.02
MP/MC bulk enq/dequeue (size: 8): 112.50
SP/SC bulk enq/dequeue (size: 32): 24.71
MP/MC bulk enq/dequeue (size: 32): 33.34
Test OK
RTE>>

RTE>>ring_perf_elem_autotest
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 290
MP/MC single enq/dequeue: 503
SP/SC burst enq/dequeue (size: 8): 39
MP/MC burst enq/dequeue (size: 8): 63
SP/SC burst enq/dequeue (size: 32): 11
MP/MC burst enq/dequeue (size: 32): 19

### Testing empty dequeue ###
SC empty dequeue: 6.33
MC empty dequeue: 6.67

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 38.92
MP/MC bulk enq/dequeue (size: 8): 62.54
SP/SC bulk enq/dequeue (size: 32): 11.46
MP/MC bulk enq/dequeue (size: 32): 19.89

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 87.55
MP/MC bulk enq/dequeue (size: 8): 99.10
SP/SC bulk enq/dequeue (size: 32): 26.63
MP/MC bulk enq/dequeue (size: 32): 29.91
Test OK
RTE>>

> > Dave
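To put the two-core figures above side by side, one possible approach is to parse the console output and compute the relative change per test. The sketch below does that for the "two physical cores" section, using the octeontx2 numbers from this message (ring_perf_autotest before the patch versus ring_perf_elem_autotest with v5 applied). The helper names `parse_results` and `percent_change` are hypothetical and are not part of DPDK or its test suite.

```python
import re

# Matches result lines such as "SP/SC bulk enq/dequeue (size: 8): 75.94".
RESULT_RE = re.compile(r"^(?P<name>.+?):\s+(?P<value>\d+(?:\.\d+)?)$")

# Figures copied from the octeontx2 runs in this message.
BEFORE = """
### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 75.94
MP/MC bulk enq/dequeue (size: 8): 107.66
SP/SC bulk enq/dequeue (size: 32): 24.51
MP/MC bulk enq/dequeue (size: 32): 33.23
"""

AFTER = """
### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 87.55
MP/MC bulk enq/dequeue (size: 8): 99.10
SP/SC bulk enq/dequeue (size: 32): 26.63
MP/MC bulk enq/dequeue (size: 32): 29.91
"""


def parse_results(text):
    """Return {(section, test name): value} from ring perf console output."""
    results = {}
    section = ""
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("###"):
            # Section headers look like "### Testing ... ###".
            section = line.strip("# ")
            continue
        match = RESULT_RE.match(line)
        if match:
            results[(section, match.group("name"))] = float(match.group("value"))
    return results


def percent_change(baseline, candidate):
    """Relative change in percent; positive means more cycles (slower)."""
    return (candidate - baseline) / baseline * 100.0


if __name__ == "__main__":
    before = parse_results(BEFORE)
    after = parse_results(AFTER)
    for key, base in before.items():
        section, name = key
        print(f"{name}: {base} -> {after[key]} "
              f"({percent_change(base, after[key]):+.1f}%)")
```

Keying on the section as well as the test name matters because the same test names repeat under the single-lcore and two-core headings.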