From: Bruce Richardson
To: "Liu, Yong"
Cc: Ilya Maximets, "Bie, Tiwei", "maxime.coquelin@redhat.com", "dev@dpdk.org"
Date: Fri, 6 Sep 2019 10:11:39 +0100
Subject: Re: [dpdk-dev] [PATCH v1 02/14] vhost: add burst enqueue function for packed ring

On Fri, Sep 06, 2019 at 01:42:44AM +0000, Liu, Yong wrote:
>
>
> > -----Original Message-----
> > From: Ilya Maximets [mailto:i.maximets@samsung.com]
> > Sent: Thursday, September 05, 2019 6:31 PM
> > To: Liu, Yong; Bie, Tiwei; maxime.coquelin@redhat.com; dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v1 02/14] vhost: add burst enqueue function
> > for packed ring
> >
> > On 05.09.2019 19:14, Marvin Liu wrote:
> > > The burst enqueue function first checks whether the descriptors are
> > > cache aligned, and it also checks the other prerequisites up front. The
> > > burst enqueue function does not support chained mbufs; the single-packet
> > > enqueue function handles those.
> > >
> > > Signed-off-by: Marvin Liu
> >
> > Hi.
> >
> > Can we rely on the compiler to unroll loops instead of repeating each
> > statement 4 times?
> >
> > For example:
> >
> >     uint64_t len[PACKED_DESCS_BURST];
> >
> >     for (i = 0; i < PACKED_DESCS_BURST; i++)
> >         len[i] = descs[avail_idx + i].len;
> >
> >
> > For the 'if's:
> >
> >     res = false;
> >     for (i = 0; i < PACKED_DESCS_BURST; i++)
> >         res |= pkts[i]->next != NULL;
> >     if (unlikely(res))
> >         return -1;
> >
> > or just
> >
> >     for (i = 0; i < PACKED_DESCS_BURST; i++)
> >         if (unlikely(pkts[i]->next != NULL))
> >             return -1;
> >
> > Since PACKED_DESCS_BURST is a fairly small constant, the compiler should
> > unroll these loops and produce almost the same code.
> >
> > This would significantly reduce code size and would also make it
> > possible to experiment with the PACKED_DESCS_BURST value without massive
> > code changes.
> >
> > The same applies to the other patches in the series.
> >
> > What do you think?
> >
>
> Hi Ilya,
> I previously tested how well various compilers unroll these loops.
> With every compiler listed below, loopback performance drops compared with
> the repeated-code version, most noticeably with GCC 7.4 and ICC.
> With Clang 6.0.0 and GCC 8.3.0 the throughput loss is much smaller (around 3%).
> If that is acceptable, the repeated code can be replaced with small loops.
>
> |----------------|--------------------|-----------------------------|------|
> | Compiler       | Auto-unrolled loop | Repeated code (fixed batch) | Gap  |
> |----------------|--------------------|-----------------------------|------|
> | Clang 6.0.0    | 13.1M              | 13.5M                       | 0.4M |
> |----------------|--------------------|-----------------------------|------|
> | GCC 8.3.0      | 13.9M              | 14.4M                       | 0.5M |
> |----------------|--------------------|-----------------------------|------|
> | GCC 7.4.0      | 12.6M              | 13.5M                       | 0.9M |
> |----------------|--------------------|-----------------------------|------|
> | ICC 19.0.4.243 | 11.0M              | 12.3M                       | 1.3M |
> |----------------|--------------------|-----------------------------|------|
>
> Thanks,
> Marvin
>

Did you verify that the compiler was actually unrolling the loops? You may
need to add __attribute__((optimize("unroll-loops"))) to the function
definition.

/Bruce
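For reference, a minimal self-contained sketch of the loop-based form Ilya describes, combined with the unroll hint Bruce suggests. It is not the patch's code: `struct pkt`, `struct desc`, `burst_check()` and `TRY_UNROLL` are hypothetical stand-ins for `rte_mbuf`, the packed-ring descriptor and the real enqueue checks. PACKED_DESCS_BURST is assumed to be 4 (matching the "repeating each statement 4 times" remark), the caller is assumed to have already checked that the burst does not wrap the ring, and the length check ignores the virtio-net header for brevity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed batch size; the patch repeats each statement 4 times. */
#define PACKED_DESCS_BURST 4

/* Hypothetical, simplified stand-in for struct rte_mbuf. */
struct pkt {
	struct pkt *next;   /* non-NULL for a chained (multi-segment) buffer */
	uint32_t pkt_len;
};

/* Hypothetical, simplified stand-in for a packed-ring descriptor. */
struct desc {
	uint64_t addr;
	uint32_t len;
	uint16_t id;
	uint16_t flags;
};

/* Request loop unrolling for this function on GCC only; Clang does not
 * implement the optimize attribute, so the macro expands to nothing there. */
#if defined(__GNUC__) && !defined(__clang__)
#define TRY_UNROLL __attribute__((optimize("unroll-loops")))
#else
#define TRY_UNROLL
#endif

/* Returns 0 if the whole burst can take the fast path, -1 otherwise.
 * Assumes descs[avail_idx .. avail_idx + PACKED_DESCS_BURST - 1] do not wrap. */
TRY_UNROLL int
burst_check(struct pkt **pkts, const struct desc *descs, uint16_t avail_idx)
{
	uint64_t len[PACKED_DESCS_BURST];
	bool res = false;
	unsigned int i;

	/* Branch-free form: OR the per-packet conditions, test once. */
	for (i = 0; i < PACKED_DESCS_BURST; i++)
		res |= pkts[i]->next != NULL;
	if (res)
		return -1; /* chained buffers go to the single-packet path */

	/* Gather descriptor lengths; the constant trip count makes this
	 * loop a candidate for full unrolling. */
	for (i = 0; i < PACKED_DESCS_BURST; i++)
		len[i] = descs[avail_idx + i].len;

	/* Early-return form of the per-packet check. */
	for (i = 0; i < PACKED_DESCS_BURST; i++)
		if (pkts[i]->pkt_len > len[i])
			return -1;

	return 0;
}
```

Changing the batch size then only means changing PACKED_DESCS_BURST. The GCC manual describes the optimize attribute as intended for debugging rather than production code; on GCC 8 and later, `#pragma GCC unroll N` placed before an individual loop is a narrower alternative. In either case, whether the loops were really unrolled is best confirmed by inspecting the generated assembly (for example with `gcc -O3 -S`) before comparing throughput numbers.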