From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 07800A2EDB for ; Fri, 6 Sep 2019 11:23:25 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 82AE11F27A; Fri, 6 Sep 2019 11:23:25 +0200 (CEST) Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by dpdk.org (Postfix) with ESMTP id 0C2961F279 for ; Fri, 6 Sep 2019 11:23:22 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga106.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Sep 2019 02:23:22 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,472,1559545200"; d="scan'208";a="208173369" Received: from fmsmsx105.amr.corp.intel.com ([10.18.124.203]) by fmsmga004.fm.intel.com with ESMTP; 06 Sep 2019 02:23:22 -0700 Received: from shsmsx102.ccr.corp.intel.com (10.239.4.154) by FMSMSX105.amr.corp.intel.com (10.18.124.203) with Microsoft SMTP Server (TLS) id 14.3.439.0; Fri, 6 Sep 2019 02:23:21 -0700 Received: from shsmsx103.ccr.corp.intel.com ([169.254.4.140]) by shsmsx102.ccr.corp.intel.com ([169.254.2.113]) with mapi id 14.03.0439.000; Fri, 6 Sep 2019 17:23:19 +0800 From: "Liu, Yong" To: "Richardson, Bruce" CC: Ilya Maximets , "Bie, Tiwei" , "maxime.coquelin@redhat.com" , "dev@dpdk.org" Thread-Topic: [dpdk-dev] [PATCH v1 02/14] vhost: add burst enqueue function for packed ring Thread-Index: AQHVY8TLwxJ6oPf880Krx+ELrrCn7KccXGWAgAGAl7D///uMgIAAhygQ Date: Fri, 6 Sep 2019 09:23:18 +0000 Message-ID: <86228AFD5BCD8E4EBFD2B90117B5E81E6339C634@SHSMSX103.ccr.corp.intel.com> References: <20190905161421.55981-3-yong.liu@intel.com> <9674491d-4ce0-ea60-e92c-4be2e3d540b8@samsung.com> <86228AFD5BCD8E4EBFD2B90117B5E81E6339BAE9@SHSMSX103.ccr.corp.intel.com> <20190906091139.GB1600@bricha3-MOBL.ger.corp.intel.com> In-Reply-To: <20190906091139.GB1600@bricha3-MOBL.ger.corp.intel.com> Accept-Language: zh-CN, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-titus-metadata-40: eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiMTVmNjI2OTctMmNmMi00Zjk3LTg1YjktOGJjMGUwZDg5MjNhIiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX05UIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE3LjEwLjE4MDQuNDkiLCJUcnVzdGVkTGFiZWxIYXNoIjoiT0xDSE9tZWt0cm5ZNWlmY1lCV0ZVWG9YaUZYRklrcFAzRlJVU0lTVDVCeXU1b1wvclEzdVVCbmp0cG82ZEdHdlEifQ== x-ctpclassification: CTP_NT dlp-product: dlpe-windows dlp-version: 11.2.0.6 dlp-reaction: no-action x-originating-ip: [10.239.127.40] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [dpdk-dev] [PATCH v1 02/14] vhost: add burst enqueue function for packed ring X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" > -----Original Message----- > From: Bruce Richardson [mailto:bruce.richardson@intel.com] > Sent: Friday, September 06, 2019 5:12 PM > To: Liu, Yong > Cc: Ilya Maximets ; Bie, Tiwei > ; maxime.coquelin@redhat.com; dev@dpdk.org > Subject: Re: [dpdk-dev] [PATCH v1 02/14] vhost: add burst enqueue functio= n > for packed ring >=20 > On Fri, Sep 06, 2019 at 01:42:44AM +0000, Liu, Yong wrote: > > > > > > > -----Original Message----- > > > From: Ilya Maximets [mailto:i.maximets@samsung.com] > > > Sent: Thursday, September 05, 2019 6:31 PM > > > To: Liu, Yong ; Bie, Tiwei ; > > > maxime.coquelin@redhat.com; dev@dpdk.org > > > Subject: Re: [dpdk-dev] [PATCH v1 02/14] vhost: add burst enqueue > function > > > for packed ring > > > > > > On 05.09.2019 19:14, Marvin Liu wrote: > > > > Burst enqueue function will first check whether descriptors are cac= he > > > > aligned. It will also check prerequisites in the beginning. Burst > > > > enqueue function not support chained mbufs, single packet enqueue > > > > function will handle it. > > > > > > > > Signed-off-by: Marvin Liu > > > > > > Hi. > > > > > > Can we rely on loop unrolling by compiler instead of repeating each > > > command 4 times? > > > > > > For example: > > > > > > uint64_t len[PACKED_DESCS_BURST]; > > > > > > for (i =3D 0; i < PACKED_DESCS_BURST; i++) > > > len[i] =3D descs[avail_idx + i].len; > > > > > > > > > For 'if's: > > > > > > res =3D false; > > > for (i =3D 0; i < PACKED_DESCS_BURST; i++) > > > res |=3D pkts[i]->next !=3D NULL; > > > if (unlikely(res)) > > > return -1; > > > > > > or just > > > > > > for (i =3D 0; i < PACKED_DESCS_BURST; i++) > > > if (unlikely(pkts[i]->next !=3D NULL)) > > > return -1; > > > > > > Since PACKED_DESCS_BURST is a fairly small constant, loops should be > > > unrolled by compiler producing almost same code. > > > > > > This will significantly reduce code size and will also allow to > > > play with PACKED_DESCS_BURST value without massive code changes. > > > > > > Same is applicable to other patches in the series. > > > > > > What do you think? > > > > > > > Hi Ilya, > > I did some test with the unroll availability of various compilers befor= e. > > All listed compilers will cause loopback performance drop compared to > repeating code version, especially GCC7.4 and ICC. > > Newer compilers will have much less impact (around 3%) on the throughpu= t. > > If we can accept that, repeating code can be replaced with small loop > function. > > > > |----------------|---------------|-------------|------| > > | Compiler | Auto unrolled | Fixed batch | Gap | > > |----------------|---------------|-------------|------| > > | Clang6.0.0 | 13.1M | 13.5M | 0.4M | > > |----------------|---------------|-------------|------| > > | GCC 8.3.0 | 13.9M | 14.4M | 0.5M | > > |----------------|---------------|-------------|------| > > | GCC 7.4.0 | 12.6M | 13.5M | 0.9M | > > |----------------|---------------|-------------|------| > > | ICC 19.0.4.243 | 11.0M | 12.3M | 1.3M | > > |----------------|---------------|-------------|------| > > > > Thanks, > > Marvin > > > Did you verify that the compiler was actually unrolling the loops? You ma= y > need to put __attribute__((optimize("unroll-loops"))) in the function > definition. Thanks for note, Bruce. I only checked GCC compiled binaries, loop have bee= n unrolled. Will double check clang and ICC compiling result. Regards, Marvin >=20 > /Bruce