From: Marvin Liu
To: tiwei.bie@intel.com, maxime.coquelin@redhat.com, dev@dpdk.org
Cc: Marvin Liu
Date: Tue, 9 Jul 2019 01:13:07 +0800
Message-Id: <20190708171320.38802-1-yong.liu@intel.com>
Subject: [dpdk-dev] [RFC] vhost packed ring performance optimization

The packed ring has a more compact ring format and thus can significantly
reduce the number of cache misses, which leads to better performance. This
has already been proven in the virtio user driver: on a normal E5 Xeon CPU,
single core performance can rise by 12%.

http://mails.dpdk.org/archives/dev/2018-April/095470.html

However, vhost performance with the packed ring decreased. Analysis showed
that most of the extra cost came from calculating each descriptor's flags,
which depend on the ring wrap counter. Moreover, both frontend and backend
need to write the same descriptors, which causes cache contention. In
particular, during vhost enqueue the virtio packed ring refill function may
write to the same cache line that vhost is writing. This extra cache cost
neutralizes the benefit of reducing cache misses.

To optimize vhost packed ring performance, the vhost enqueue and dequeue
functions are divided into fast and normal parts. The fast path uses the
following methods (illustrative sketches follow below):

Unroll the burst loop into more pieces.
Handle the descriptors in one cache line simultaneously.
Pre-check whether I/O space can be copied directly into mbuf space, and
vice versa.
Pre-check whether descriptor mapping is successful.
Use separate vhost descriptor update functions for enqueue and dequeue.
Buffer as many dequeued used descriptors as possible.
Update enqueue used descriptors cache line by cache line.
Cache the memory region structure for fast address conversion.
Define macros that pre-calculate packed descriptor flags.

Indirect and merged packets are handled in the normal path, as they are
most likely large packets and most of their cost is in memory copy.

With all of these methods in place, single core vhost PvP performance with
64B packets on a Xeon 8180 improves by 35%, and loopback performance
measured by the virtio user pmd improves by over 45%.
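For reference, the packed descriptor defined by the virtio 1.1 spec is 16
bytes, which is what makes the cache-line batching above possible: four
descriptors fit exactly in one 64-byte line. A minimal sketch of that
arithmetic; the batch macro name and the fixed 64-byte constant are
illustrative here, not taken from the series:

#include <stdint.h>

/* Packed ring descriptor layout as defined in the virtio 1.1 spec.
 * Each descriptor is 16 bytes, so a 64-byte cache line holds exactly
 * four of them.
 */
struct vring_packed_desc {
	uint64_t addr;   /* guest-physical address of the buffer */
	uint32_t len;    /* buffer length */
	uint16_t id;     /* buffer id */
	uint16_t flags;  /* AVAIL/USED/WRITE/... bits */
};

/* Hypothetical batch size for the fast path: process every descriptor
 * that shares one cache line in a single unrolled iteration.
 */
#define CACHE_LINE_SIZE 64
#define PACKED_DESC_PER_CACHELINE \
	(CACHE_LINE_SIZE / sizeof(struct vring_packed_desc))

_Static_assert(sizeof(struct vring_packed_desc) == 16,
	       "packed descriptor must be 16 bytes");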
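The last item, pre-calculated descriptor flags, avoids recomputing the flag
bits per descriptor from the wrap counter. A minimal sketch, assuming the
standard virtio 1.1 flag bits; the macro and function names are hypothetical
and not necessarily those used in the patches:

#include <stdint.h>

/* Descriptor flag bits from the virtio 1.1 spec. */
#define VRING_DESC_F_WRITE	(1 << 1)
#define VRING_DESC_F_AVAIL	(1 << 7)
#define VRING_DESC_F_USED	(1 << 15)

/* A used descriptor has both AVAIL and USED matching the device's wrap
 * counter; for enqueue (RX) the WRITE bit is also set because the device
 * wrote data into the buffer. Pre-computing both variants turns the flag
 * calculation into a simple selection on the wrap counter.
 * (Hypothetical names, for illustration only.)
 */
#define RX_USED_FLAGS_WRAP1 \
	(VRING_DESC_F_AVAIL | VRING_DESC_F_USED | VRING_DESC_F_WRITE)
#define RX_USED_FLAGS_WRAP0 \
	(VRING_DESC_F_WRITE)

static inline uint16_t
rx_used_flags(int used_wrap_counter)
{
	return used_wrap_counter ? RX_USED_FLAGS_WRAP1 : RX_USED_FLAGS_WRAP0;
}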
Marvin Liu (13):
  add vhost normal enqueue function
  add vhost packed ring fast enqueue function
  add vhost packed ring normal dequeue function
  add vhost packed ring fast dequeue function
  add enqueue shadow used descs update and flush functions
  add vhost fast enqueue flush function
  add vhost dequeue shadow descs update function
  add vhost fast dequeue flush function
  replace vhost enqueue packed ring function
  add vhost fast zero copy dequeue packed ring function
  replace vhost dequeue packed ring function
  support inorder in vhost dequeue path
  remove useless vhost functions

 lib/librte_vhost/vhost.h      |   16 +
 lib/librte_vhost/virtio_net.c | 1048 +++++++++++++++++++++++++++------
 2 files changed, 883 insertions(+), 181 deletions(-)

--
2.17.1