From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Fri, 1 Sep 2017 18:33:23 +0800
From: Tiwei Bie
To: Maxime Coquelin
Cc: dev@dpdk.org, yliu@fridaylinux.org, Zhihong Wang, Zhiyong Yang,
 Santosh Shukla, Jerin Jacob, hemant.agrawal@nxp.com
Message-ID: <20170901103322.GA10109@debian-ZGViaWFuCg>
References: <20170824021939.21306-1-tiwei.bie@intel.com>
 <8697fb77-a1d6-c3de-2bc4-2a9956fbad36@redhat.com>
In-Reply-To: <8697fb77-a1d6-c3de-2bc4-2a9956fbad36@redhat.com>
Subject: Re: [dpdk-dev] [PATCH] vhost: adaptively batch small guest memory copies
List-Id: DPDK patches and discussions

On Fri, Sep 01, 2017 at 11:45:42AM +0200, Maxime Coquelin wrote:
> On 08/24/2017 04:19 AM, Tiwei Bie wrote:
> > This patch adaptively batches the small guest memory copies.
> > By batching the small copies, the efficiency of executing the
> > memory LOAD instructions can be improved greatly, because the
> > memory LOAD latency can be effectively hidden by the pipeline.
> > We saw great performance boosts in the small-packet PVP test.
> >
> > This patch improves the performance for small packets, and has
> > distinguished the packets by size. So although the performance
> > for big packets doesn't change, it makes it relatively easy to
> > do some special optimizations for the big packets too.
>
> Do you mean that if we batched unconditionally, whatever the size,
> we would see a performance drop for larger (> 256) packets?
>

Yeah, you are right.

> The other question is about indirect descriptors. My understanding of the
> patch is that the number of batched copies is limited to the queue size.
> In theory, we could have more than that with indirect descriptors (the
> first indirect desc for the vnet header, the second one for the packet).
>
> So in the worst case, we would have the first small copies being
> batched, but not the last ones if there are more than the queue size.
> So, I think it works, but I'd like your confirmation.
>

Yeah, you are right. If the number of small copies is larger than the
queue size, the last ones won't be batched any more.

> >
> > Signed-off-by: Tiwei Bie
> > Signed-off-by: Zhihong Wang
> > Signed-off-by: Zhiyong Yang
> > ---
> > This optimization depends on the CPU's internal pipeline design,
> > so further tests (e.g. on ARM) from the community are appreciated.
>
> Agree, I think it is important to have it tested on ARM platforms at
> least, to ensure it doesn't introduce a regression.
>
> Adding Santosh, Jerin & Hemant in Cc, who might know who could do the
> test.
>

Thank you very much! :-)

Best regards,
Tiwei Bie