From: Jason Wang
To: "Liu, Yong", "Bie, Tiwei", "maxime.coquelin@redhat.com", "dev@dpdk.org"
Date: Thu, 11 Jul 2019 17:54:45 +0800
Message-ID: <25f7d7e5-899b-ca04-30c0-c84ce1fd4210@redhat.com>
In-Reply-To: <86228AFD5BCD8E4EBFD2B90117B5E81E63335C81@SHSMSX103.ccr.corp.intel.com>
References: <20190708171320.38802-1-yong.liu@intel.com>
 <20190708171320.38802-3-yong.liu@intel.com>
 <86228AFD5BCD8E4EBFD2B90117B5E81E63334CE6@SHSMSX103.ccr.corp.intel.com>
 <435a2d7c-4751-95e7-73d7-9e519a3a893a@redhat.com>
 <86228AFD5BCD8E4EBFD2B90117B5E81E63335C81@SHSMSX103.ccr.corp.intel.com>
Subject: Re: [dpdk-dev] [RFC PATCH 02/13] add vhost packed ring fast
 enqueue function
List-Id: DPDK patches and discussions

On 2019/7/11 5:49 PM, Liu, Yong wrote:
>
>> -----Original Message-----
>> From: Jason Wang [mailto:jasowang@redhat.com]
>> Sent: Thursday, July 11, 2019 12:11 PM
>> To: Liu, Yong; Bie, Tiwei; maxime.coquelin@redhat.com; dev@dpdk.org
>> Subject: Re: [dpdk-dev] [RFC PATCH 02/13] add vhost packed ring fast
>> enqueue function
>>
>>
>> On 2019/7/10 3:30 PM, Liu, Yong wrote:
>>>> -----Original Message-----
>>>> From: Jason Wang [mailto:jasowang@redhat.com]
>>>> Sent: Wednesday, July 10, 2019 12:28 PM
>>>> To: Liu, Yong; Bie, Tiwei; maxime.coquelin@redhat.com; dev@dpdk.org
>>>> Subject: Re: [dpdk-dev] [RFC PATCH 02/13] add vhost packed ring fast
>>>> enqueue function
>>>>
>>>>
>>>> On 2019/7/9 1:13 AM, Marvin Liu wrote:
>>>>> The fast enqueue function first checks whether descriptors are
>>>>> cache aligned. It checks its prerequisites at the beginning.
>>>>> It does not support chained mbufs; the normal function handles
>>>>> those.
>>>>>
>>>>> Signed-off-by: Marvin Liu
>>>> Any reason for not letting the compiler unroll the loops?
>>>>
>>> Hi Jason,
>>> I'm not sure how much the compiler can help with unrolling loops, as it
>>> can't know how many loop iterations there will be in one call.
>>> After disabling the unroll-loop optimization with "-fno-unroll-loops",
>>> the size of the virtio_dev_rx_packed function remained the same.
>>> So it looks like the gcc unroll-loop optimization does not help here.
>>
>> I meant something like "pragma GCC unroll N" just before the loop you
>> want unrolled.
>>
>> Thanks
>>
> Hi Jason,
> I just tried with gcc 8.3.0 and the master code; there is only a 0.1 Mpps performance gain with "#pragma GCC unroll".
> I think this compiler pragma is not helpful in a big loop that contains so many functions.
>
> Thanks,
> Marvin

Yes, it probably needs some tricks, e.g. breaking the big loop into small ones. What I want to do here is unroll the loop based on PACKED_DESC_PER_CACHELINE instead of a hard-coded 4.

Thanks

>>> And the fast enqueue function not only unrolls the loop, it also checks
>>> cache alignment, which can help performance in another way.
>>> Regards,
>>> Marvin
>>>
>>>> Thanks
>>>>
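
The suggestion in the reply above (break the big loop into small ones and unroll by PACKED_DESC_PER_CACHELINE rather than a hard-coded 4) could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the patch: the 64-byte cache line, the 16-byte descriptor size, struct demo_desc, the UNROLL_HINT macro and both helper functions are hypothetical. The pragma string is built with _Pragma and two-level stringification so that the unroll count is derived from the macro, and the hint compiles to nothing on compilers other than gcc 8 or later.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration; not taken from the patch. */
#define CACHE_LINE_SIZE 64
#define PACKED_DESC_SIZE 16
#define PACKED_DESC_PER_CACHELINE (CACHE_LINE_SIZE / PACKED_DESC_SIZE)

/*
 * Hypothetical helper: emit "#pragma GCC unroll n" with n taken from a
 * macro. The argument is expanded before it is stringified, so the
 * pragma sees "(64 / 16)". Expands to nothing on other compilers.
 */
#if defined(__GNUC__) && !defined(__clang__) && (__GNUC__ >= 8)
#define DO_PRAGMA(x) _Pragma(#x)
#define UNROLL_HINT(n) DO_PRAGMA(GCC unroll n)
#else
#define UNROLL_HINT(n)
#endif

/* Hypothetical 16-byte descriptor, so four fit in one cache line. */
struct demo_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t id;
	uint16_t flags;
};

/* Small loop 1: check that every descriptor in one batch is usable. */
static int batch_is_usable(const struct demo_desc *descs)
{
	int ok = 1;
	int i;

	assert(descs != NULL);
	UNROLL_HINT(PACKED_DESC_PER_CACHELINE)
	for (i = 0; i < PACKED_DESC_PER_CACHELINE; i++)
		ok &= (descs[i].len != 0);
	return ok;
}

/* Small loop 2: sum the buffer lengths of one batch. */
static uint32_t batch_total_len(const struct demo_desc *descs)
{
	uint32_t total = 0;
	int i;

	assert(descs != NULL);
	UNROLL_HINT(PACKED_DESC_PER_CACHELINE)
	for (i = 0; i < PACKED_DESC_PER_CACHELINE; i++)
		total += descs[i].len;
	return total;
}
```

A caller would run batch_is_usable() on a cache-line-aligned group of descriptors and only then call batch_total_len(), falling back to a per-descriptor path otherwise. Each loop has a small body and a compile-time trip count, which is the kind of loop an unroll hint is more likely to help than one large loop containing many function calls.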