From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mga01.intel.com (mga01.intel.com [192.55.52.88])
	by dpdk.org (Postfix) with ESMTP id E69DC68CA;
	Fri, 14 Oct 2016 11:33:48 +0200 (CEST)
Received: from orsmga002.jf.intel.com ([10.7.209.21])
	by fmsmga101.fm.intel.com with ESMTP; 14 Oct 2016 02:33:47 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.31,344,1473145200"; d="scan'208";a="1064724634"
Received: from yliu-dev.sh.intel.com ([10.239.67.162])
	by orsmga002.jf.intel.com with ESMTP; 14 Oct 2016 02:33:46 -0700
From: Yuanhan Liu
To: dev@dpdk.org
Cc: Maxime Coquelin, Zhihong Wang, Yuanhan Liu
Date: Fri, 14 Oct 2016 17:34:33 +0800
Message-Id: <1476437678-7102-3-git-send-email-yuanhan.liu@linux.intel.com>
X-Mailer: git-send-email 1.9.0
In-Reply-To: <1476437678-7102-1-git-send-email-yuanhan.liu@linux.intel.com>
References: <1474336817-22683-1-git-send-email-zhihong.wang@intel.com>
	<1476437678-7102-1-git-send-email-yuanhan.liu@linux.intel.com>
Subject: [dpdk-dev] [PATCH v7 2/7] vhost: optimize cache access
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK
X-List-Received-Date: Fri, 14 Oct 2016 09:33:50 -0000

From: Zhihong Wang

This patch reorders the code to delay the virtio header write, improving
cache access efficiency for cases where the mrg_rxbuf feature is turned
on. CPU pipeline stall cycles can be significantly reduced.

The virtio header write and the mbuf data copy are both remote store
operations, which take a long time to finish. It's a good idea to put
them together to remove bubbles in between, to let as many remote store
instructions as possible go into the store buffer at the same time to
hide latency, and to let the H/W prefetcher go to work as early as
possible.

On a Haswell machine, about 100 cycles can be saved per packet by this
patch alone.
Taking 64B packet traffic as an example, this means about a 60%
efficiency improvement for the enqueue operation.

Signed-off-by: Zhihong Wang
Signed-off-by: Yuanhan Liu
---
 lib/librte_vhost/virtio_net.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 812e5d3..d4fc62a 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -390,6 +390,8 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	uint32_t desc_offset, desc_avail;
 	uint32_t cpy_len;
 	uint16_t desc_idx, used_idx;
+	uint64_t hdr_addr, hdr_phys_addr;
+	struct rte_mbuf *hdr_mbuf;
 
 	if (unlikely(m == NULL))
 		return 0;
@@ -401,17 +403,15 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	if (buf_vec[vec_idx].buf_len < dev->vhost_hlen || !desc_addr)
 		return 0;
 
-	rte_prefetch0((void *)(uintptr_t)desc_addr);
+	hdr_mbuf = m;
+	hdr_addr = desc_addr;
+	hdr_phys_addr = buf_vec[vec_idx].buf_addr;
+	rte_prefetch0((void *)(uintptr_t)hdr_addr);
 
 	virtio_hdr.num_buffers = end_idx - start_idx;
 	LOG_DEBUG(VHOST_DATA, "(%d) RX: num merge buffers %d\n",
 		dev->vid, virtio_hdr.num_buffers);
 
-	virtio_enqueue_offload(m, &virtio_hdr.hdr);
-	copy_virtio_net_hdr(dev, desc_addr, virtio_hdr);
-	vhost_log_write(dev, buf_vec[vec_idx].buf_addr, dev->vhost_hlen);
-	PRINT_PACKET(dev, (uintptr_t)desc_addr, dev->vhost_hlen, 0);
-
 	desc_avail = buf_vec[vec_idx].buf_len - dev->vhost_hlen;
 	desc_offset = dev->vhost_hlen;
 	desc_chain_head = buf_vec[vec_idx].desc_idx;
@@ -456,6 +456,16 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			mbuf_avail = rte_pktmbuf_data_len(m);
 		}
 
+		if (hdr_addr) {
+			virtio_enqueue_offload(hdr_mbuf, &virtio_hdr.hdr);
+			copy_virtio_net_hdr(dev, hdr_addr, virtio_hdr);
+			vhost_log_write(dev, hdr_phys_addr, dev->vhost_hlen);
+			PRINT_PACKET(dev, (uintptr_t)hdr_addr,
+				     dev->vhost_hlen, 0);
+
+			hdr_addr = 0;
+		}
+
 		cpy_len = RTE_MIN(desc_avail, mbuf_avail);
 		rte_memcpy((void *)((uintptr_t)(desc_addr + desc_offset)),
 			rte_pktmbuf_mtod_offset(m, void *, mbuf_offset),
-- 
1.9.0
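
The reordering described in the commit message can be sketched in
isolation roughly as follows. This is a simplified, standalone
illustration and not the DPDK code: toy_hdr, toy_seg and toy_enqueue are
hypothetical names, a single flat buffer stands in for the descriptor
chain, and plain memcpy stands in for copy_virtio_net_hdr and rte_memcpy.
The point it tries to show is only the ordering: the header address is
remembered up front, and the header store is issued right before the
first payload store instead of well ahead of it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the virtio-net header; not the real layout. */
struct toy_hdr {
	uint16_t num_buffers;
};

/* Hypothetical stand-in for one mbuf segment. */
struct toy_seg {
	const uint8_t *data;
	size_t len;
};

/*
 * Copy the segments into `desc`, reserving room for the header at the
 * front. The header write is deferred until just before the first payload
 * copy, so the stores to the descriptor memory are issued back to back.
 * Returns the number of bytes used in `desc`, or 0 if nothing was written.
 */
static size_t
toy_enqueue(uint8_t *desc, size_t desc_len, const struct toy_seg *segs,
	    size_t nb_segs, uint16_t num_buffers)
{
	struct toy_hdr hdr = { .num_buffers = num_buffers };
	uint8_t *hdr_addr = desc;	/* non-NULL: header still pending */
	size_t offset = sizeof(hdr);
	size_t i;

	assert(desc != NULL && desc_len >= sizeof(hdr));

	for (i = 0; i < nb_segs; i++) {
		/* This toy has one buffer only: stop when it is full. */
		if (segs[i].len > desc_len - offset)
			break;

		/* Deferred header write, done once, next to the data copy. */
		if (hdr_addr != NULL) {
			memcpy(hdr_addr, &hdr, sizeof(hdr));
			hdr_addr = NULL;
		}

		memcpy(desc + offset, segs[i].data, segs[i].len);
		offset += segs[i].len;
	}

	/* If no segment was copied, the header was never written either. */
	return hdr_addr == NULL ? offset : 0;
}
```

Clearing hdr_addr after the write plays the same role as "hdr_addr = 0"
in the patch: it turns the saved address into a one-shot flag, so later
loop iterations skip the header path.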