From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mga02.intel.com (mga02.intel.com [134.134.136.20])
	by dpdk.org (Postfix) with ESMTP id C18E68E7D
	for ; Thu, 3 Dec 2015 07:02:48 +0100 (CET)
Received: from orsmga003.jf.intel.com ([10.7.209.27])
	by orsmga101.jf.intel.com with ESMTP; 02 Dec 2015 22:02:50 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.20,376,1444719600"; d="scan'208";a="699276003"
Received: from yliu-dev.sh.intel.com ([10.239.66.49])
	by orsmga003.jf.intel.com with ESMTP; 02 Dec 2015 22:02:47 -0800
From: Yuanhan Liu
To: dev@dpdk.org
Date: Thu, 3 Dec 2015 14:06:12 +0800
Message-Id: <1449122773-25510-5-git-send-email-yuanhan.liu@linux.intel.com>
X-Mailer: git-send-email 1.9.0
In-Reply-To: <1449122773-25510-1-git-send-email-yuanhan.liu@linux.intel.com>
References: <1449122773-25510-1-git-send-email-yuanhan.liu@linux.intel.com>
Cc: "Michael S. Tsirkin", Victor Kaplansky
Subject: [dpdk-dev] [PATCH 4/5] vhost: do not use rte_memcpy for virtio_hdr copy
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK
X-List-Received-Date: Thu, 03 Dec 2015 06:02:49 -0000

First of all, rte_memcpy() is mostly useful for copying big packets by
leveraging advanced hardware instructions like AVX. But for the virtio
net hdr, which is 12 bytes at most, invoking rte_memcpy() will not
introduce any performance boost.

And, to my surprise, rte_memcpy() is huge. Since rte_memcpy() is
inlined, it takes more space every time we call it at a different
place. Replacing the two rte_memcpy() calls with direct copies saves
nearly 12K bytes of code size!

    # without this patch
    $ size /path/to/vhost_rxtx.o
       text    data     bss     dec     hex filename
      36171       0       0   36171    8d4b /path/to/vhost_rxtx.o

    # with this patch
    $ size /path/to/vhost_rxtx.o
       text    data     bss     dec     hex filename
      24179       0       0   24179    5e73 /path/to/vhost_rxtx.o

Signed-off-by: Yuanhan Liu
---
 lib/librte_vhost/vhost_rxtx.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 7464b6b..1e0a24e 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -70,14 +70,17 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	uint32_t cpy_len;
 	struct vring_desc *desc;
 	uint64_t desc_addr;
-	struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0};
+	struct virtio_net_hdr_mrg_rxbuf hdr = {{0, 0, 0, 0, 0, 0}, 0};

 	desc = &vq->desc[desc_idx];
 	desc_addr = gpa_to_vva(dev, desc->addr);
 	rte_prefetch0((void *)(uintptr_t)desc_addr);

-	rte_memcpy((void *)(uintptr_t)desc_addr,
-		   (const void *)&virtio_hdr, vq->vhost_hlen);
+	if (vq->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf)) {
+		*(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr;
+	} else {
+		*(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr;
+	}
 	PRINT_PACKET(dev, (uintptr_t)desc_addr, vq->vhost_hlen, 0);

 	desc_offset = vq->vhost_hlen;
@@ -340,7 +343,7 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			    uint16_t res_start_idx, uint16_t res_end_idx,
 			    struct rte_mbuf *pkt)
 {
-	struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0};
+	struct virtio_net_hdr_mrg_rxbuf hdr = {{0, 0, 0, 0, 0, 0}, 0};
 	uint32_t vec_idx = 0;
 	uint16_t cur_idx = res_start_idx;
 	uint64_t desc_addr;
@@ -361,13 +364,16 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq,

 	rte_prefetch0((void *)(uintptr_t)desc_addr);

-	virtio_hdr.num_buffers = res_end_idx - res_start_idx;
+	hdr.num_buffers = res_end_idx - res_start_idx;
 	LOG_DEBUG(VHOST_DATA, "(%"PRIu64") RX: Num merge buffers %d\n",
 		dev->device_fh, virtio_hdr.num_buffers);

-	rte_memcpy((void *)(uintptr_t)desc_addr,
-		   (const void *)&virtio_hdr, vq->vhost_hlen);
+	if (vq->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf)) {
+		*(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr;
+	} else {
+		*(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr;
+	}
 	PRINT_PACKET(dev, (uintptr_t)desc_addr, vq->vhost_hlen, 0);

 	desc_avail = vq->buf_vec[vec_idx].buf_len - vq->vhost_hlen;
-- 
1.9.0
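
For anyone who wants to try the idea outside of DPDK, the following is a
minimal standalone sketch of the technique used in the patch: picking one
of two fixed-size struct assignments based on the header length, instead
of calling a generic copy routine with a runtime length. The struct
definitions are simplified stand-ins that I am assuming for illustration
(10 and 12 bytes), and copy_hdr() and copy_hdr_memcpy() are hypothetical
names that do not appear in the patch. Plain memcpy() stands in for
rte_memcpy() here, so this shows the shape of the code only and says
nothing about the code-size numbers above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the basic virtio net header (assumed layout). */
struct net_hdr {
	uint8_t  flags;
	uint8_t  gso_type;
	uint16_t hdr_len;
	uint16_t gso_size;
	uint16_t csum_start;
	uint16_t csum_offset;
};

/* Stand-in for the mergeable-rx-buffer variant: basic header + count. */
struct net_hdr_mrg_rxbuf {
	struct net_hdr hdr;
	uint16_t num_buffers;
};

/* The sketch relies on these sizes; fail the build if they differ. */
_Static_assert(sizeof(struct net_hdr) == 10, "unexpected net_hdr size");
_Static_assert(sizeof(struct net_hdr_mrg_rxbuf) == 12,
	       "unexpected net_hdr_mrg_rxbuf size");

/*
 * Hypothetical helper: copy the header with a struct assignment.
 * Each branch copies a size known at compile time, so the compiler can
 * emit a few plain stores. dst must be suitably aligned for the structs.
 */
static inline void
copy_hdr(void *dst, size_t hlen, struct net_hdr_mrg_rxbuf hdr)
{
	assert(hlen == sizeof(struct net_hdr_mrg_rxbuf) ||
	       hlen == sizeof(struct net_hdr));

	if (hlen == sizeof(struct net_hdr_mrg_rxbuf))
		*(struct net_hdr_mrg_rxbuf *)dst = hdr;
	else
		*(struct net_hdr *)dst = hdr.hdr;
}

/* Baseline for comparison: generic copy with a runtime length. */
static inline void
copy_hdr_memcpy(void *dst, size_t hlen, struct net_hdr_mrg_rxbuf hdr)
{
	memcpy(dst, &hdr, hlen);
}
```

Both helpers write the same bytes for the two valid lengths; the only
difference is that copy_hdr() never reaches a general-purpose copy
routine. Comparing the `size` output of an object file built with one
helper against the other is a quick way to check what a given compiler
and copy routine actually generate.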