From: Marvin Liu
To: tiwei.bie@intel.com, maxime.coquelin@redhat.com, dev@dpdk.org
Cc: Marvin Liu
Date: Fri, 6 Sep 2019 00:14:07 +0800
Message-Id: <20190905161421.55981-1-yong.liu@intel.com>
X-Mailer: git-send-email 2.17.1
Subject: [dpdk-dev] [PATCH v1 00/14] vhost packed ring performance optimization

The packed ring has a more compact ring format and can thus significantly reduce the number of cache misses, which leads to better performance. This has already been proven in the virtio user driver: on a normal E5 Xeon CPU, single-core performance can rise by 12%.

http://mails.dpdk.org/archives/dev/2018-April/095470.html

However, vhost performance with the packed ring decreased. Analysis showed that most of the extra cost came from calculating each descriptor's flags, which depend on the ring wrap counter. Moreover, both frontend and backend need to write the same descriptors, which causes cache contention. In particular, the virtio packed ring refill function may write the same cache line that the vhost enqueue function is writing. This extra cache cost reduces the benefit gained from fewer cache misses.

To optimize vhost packed ring performance, the vhost enqueue and dequeue functions are split into a fast path and a normal path. Several methods are applied in the fast path:

Unroll the burst loop into more pieces.
Handle the descriptors in one cache line simultaneously (a rough sketch follows below).
Pre-check whether the I/O space can be copied directly into mbuf space, and vice versa.
Pre-check whether the descriptor mapping is successful.
Distinguish the vhost descriptor update functions for enqueue and dequeue.
Buffer as many dequeue used descriptors as possible.
Update enqueue used descriptors one cache line at a time.
Cache the memory region structure for fast address conversion.
Disable software prefetch if hardware can do better.

With all these methods applied, single-core vhost PvP performance with 64B packets on a Xeon 8180 can improve by 40%.
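A rough, illustrative sketch of the cache-line batching idea follows (not code from this series; the names struct ring_state, burst_available and burst_mark_used are made up for illustration). A packed descriptor is 16 bytes, so one 64-byte cache line holds four of them. The fast path first checks that a full batch of four descriptors is available and large enough for the packet, then flips the used flags of all four together, so the flag writes dirty a single cache line instead of one line per packet.

#include <stdbool.h>
#include <stdint.h>

#define DESC_F_AVAIL  (1u << 7)   /* VRING_DESC_F_AVAIL */
#define DESC_F_USED   (1u << 15)  /* VRING_DESC_F_USED */
#define BURST_SIZE    4           /* 64B cache line / 16B packed descriptor */

struct packed_desc {              /* same layout as a packed vring descriptor */
	uint64_t addr;
	uint32_t len;
	uint16_t id;
	uint16_t flags;
};

struct ring_state {
	struct packed_desc *desc;
	uint16_t size;            /* ring size, a multiple of BURST_SIZE */
	uint16_t avail_idx;       /* next descriptor to consume */
	bool wrap;                /* avail wrap counter */
};

/*
 * Prerequisite check: are BURST_SIZE descriptors, all sitting in one
 * cache line, available for the current wrap counter and large enough
 * for the packet?  Only then is the fast path taken.
 */
static bool
burst_available(const struct ring_state *r, uint32_t pkt_len)
{
	uint16_t i;

	if (r->avail_idx + BURST_SIZE > r->size)   /* would cross ring end */
		return false;

	for (i = 0; i < BURST_SIZE; i++) {
		const struct packed_desc *d = &r->desc[r->avail_idx + i];
		bool avail = !!(d->flags & DESC_F_AVAIL);
		bool used  = !!(d->flags & DESC_F_USED);

		if (avail != r->wrap || used == r->wrap || d->len < pkt_len)
			return false;
	}
	return true;
}

/*
 * Mark a whole batch used at once: the flags of all four descriptors
 * live in the same cache line, so only one line is dirtied instead of
 * one write-back per packet.
 */
static void
burst_mark_used(struct ring_state *r)
{
	uint16_t flags = r->wrap ? (DESC_F_AVAIL | DESC_F_USED) : 0;
	uint16_t i;

	for (i = 0; i < BURST_SIZE; i++)
		r->desc[r->avail_idx + i].flags = flags;

	r->avail_idx += BURST_SIZE;
	if (r->avail_idx >= r->size) {
		r->avail_idx -= r->size;
		r->wrap = !r->wrap;
	}
}

In the real datapath the fast path would also copy the packet data and maintain the shadow used ring; the sketch only shows the batch prerequisite check and the per-cache-line flag update, which is where most of the saving described above comes from.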
Marvin Liu (14):
  vhost: add single packet enqueue function
  vhost: add burst enqueue function for packed ring
  vhost: add single packet dequeue function
  vhost: add burst dequeue function
  vhost: rename flush shadow used ring functions
  vhost: flush vhost enqueue shadow ring by burst
  vhost: add flush function for burst enqueue
  vhost: buffer vhost dequeue shadow ring
  vhost: split enqueue and dequeue flush functions
  vhost: optimize Rx function of packed ring
  vhost: add burst and single zero dequeue functions
  vhost: optimize Tx function of packed ring
  vhost: cache address translation result
  vhost: check whether disable software pre-fetch

 lib/librte_vhost/Makefile     |    6 +
 lib/librte_vhost/rte_vhost.h  |   27 +
 lib/librte_vhost/vhost.h      |   13 +
 lib/librte_vhost/virtio_net.c | 1094 +++++++++++++++++++++++++++------
 4 files changed, 944 insertions(+), 196 deletions(-)

-- 
2.17.1