From: Maxime Coquelin
To: Zhihong Wang, dev@dpdk.org
Cc: yuanhan.liu@linux.intel.com
Date: Mon, 22 Aug 2016 10:11:13 +0200
Subject: Re: [dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
In-Reply-To: <1471585430-125925-1-git-send-email-zhihong.wang@intel.com>
References: <1471319402-112998-1-git-send-email-zhihong.wang@intel.com>
 <1471585430-125925-1-git-send-email-zhihong.wang@intel.com>

Hi Zhihong,

On 08/19/2016 07:43 AM, Zhihong Wang wrote:
> This patch set optimizes the vhost enqueue function.
>
> It implements the vhost logic from scratch in a single function designed
> for high performance and good maintainability, and improves CPU
> efficiency significantly by optimizing cache access, which means:
>
> * For fast frontends (e.g. the DPDK virtio PMD), higher performance
>   (maximum throughput) can be achieved.
>
> * For slow frontends (e.g. kernel virtio-net), better scalability can be
>   achieved: each vhost core can support more connections, since it takes
>   fewer cycles to handle each single frontend.
>
> The main optimization techniques are:
>
> 1. Reorder code to reduce CPU pipeline stall cycles.
>
> 2. Batch the used ring updates for better efficiency.
>
> 3. Prefetch descriptors to hide cache latency.
>
> 4. Remove a useless volatile attribute to allow compiler optimization.

Thanks for these details, they are helpful to understand where the perf
gain comes from.
I would suggest adding this information as comments in the code where/if
it makes sense. If it is more of a general comment, at least add it to
the commit message of the patch introducing it.
Indeed, adding it to the cover letter is fine, but the information is
lost as soon as the series is applied.

You don't mention any figures, so I set up a benchmark on my side to
evaluate your series. It indeed shows an interesting performance gain.

My setup consists of one host running a guest.
The guest generates as many 64-byte packets as possible using
pktgen-dpdk.
The host forwards received packets back to the guest using testpmd on a
vhost PMD interface.
The guest's vCPUs are pinned to physical CPUs.
I tested with and without your v1 patch, and with the rx-mergeable
feature turned on and off. Results are the average of 8 runs of 60
seconds each:

Rx-mergeable ON                                 :  7.72 Mpps
Rx-mergeable ON  + "vhost: optimize enqueue" v1 :  9.19 Mpps
Rx-mergeable OFF                                : 10.52 Mpps
Rx-mergeable OFF + "vhost: optimize enqueue" v1 : 10.60 Mpps

Regards,
Maxime
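
---

Editor's note: to make points 2 (batched used-ring update) and 3
(descriptor prefetch) from the cover letter a bit more concrete, here is
a minimal, self-contained sketch of that pattern. All structure and
function names below are simplified placeholders, not the actual code
from this series; only rte_prefetch0() and rte_smp_wmb() are real DPDK
APIs. The key point is that the guest-visible used index is published
once per burst, behind a single write barrier, instead of once per
packet.

    /*
     * Illustrative sketch only -- simplified structures, not the DPDK
     * vhost code from the patch set.
     */
    #include <stdint.h>
    #include <rte_prefetch.h>   /* rte_prefetch0() */
    #include <rte_atomic.h>     /* rte_smp_wmb() */

    struct desc_s {             /* simplified descriptor */
        uint64_t addr;
        uint32_t len;
        uint16_t flags;
        uint16_t next;
    };

    struct used_elem_s {        /* simplified used-ring element */
        uint32_t id;
        uint32_t len;
    };

    struct vq_s {               /* simplified virtqueue view */
        struct desc_s *desc;
        struct used_elem_s *used_ring;
        uint16_t *used_idx;     /* guest-visible used index */
        uint16_t size;          /* ring size, power of two */
    };

    /* Copy one packet into the buffer described by desc[idx]; stubbed. */
    static uint32_t copy_pkt_to_desc(struct vq_s *vq, uint16_t idx, void *pkt)
    {
        (void)vq; (void)idx; (void)pkt;
        return 64; /* pretend we wrote 64 bytes */
    }

    /*
     * Enqueue 'count' packets starting at avail index 'start'.
     * Used entries are filled in the loop, but the shared used index is
     * bumped only once at the end (batched update).
     */
    static uint16_t
    enqueue_burst(struct vq_s *vq, uint16_t start, void **pkts, uint16_t count)
    {
        uint16_t i;
        uint16_t used = *vq->used_idx;

        for (i = 0; i < count; i++) {
            uint16_t desc_idx  = (uint16_t)((start + i) & (vq->size - 1));
            uint16_t used_slot = (uint16_t)((used + i) & (vq->size - 1));

            /* Prefetch the next descriptor so its cache line is
             * (hopefully) warm by the next iteration. */
            if (i + 1 < count) {
                uint16_t next_idx =
                    (uint16_t)((start + i + 1) & (vq->size - 1));
                rte_prefetch0(&vq->desc[next_idx]);
            }

            vq->used_ring[used_slot].id  = desc_idx;
            vq->used_ring[used_slot].len =
                copy_pkt_to_desc(vq, desc_idx, pkts[i]);
        }

        /* Make the used-ring entries visible before publishing the index. */
        rte_smp_wmb();
        *vq->used_idx = (uint16_t)(used + count);

        return count;
    }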