From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) by dpdk.org (Postfix) with ESMTP id 330D5475D for ; Thu, 29 Sep 2016 23:23:40 +0200 (CEST) Received: from int-mx14.intmail.prod.int.phx2.redhat.com (int-mx14.intmail.prod.int.phx2.redhat.com [10.5.11.27]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 5240012638; Thu, 29 Sep 2016 21:23:39 +0000 (UTC) Received: from [10.36.5.134] (vpn1-5-134.ams2.redhat.com [10.36.5.134]) by int-mx14.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id u8TLNaHU011105 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 29 Sep 2016 17:23:37 -0400 To: "Michael S. Tsirkin" References: <1474872056-24665-1-git-send-email-yuanhan.liu@linux.intel.com> <1474872056-24665-2-git-send-email-yuanhan.liu@linux.intel.com> <20160926221112-mutt-send-email-mst@kernel.org> <20160927031158.GA25823@yliu-dev.sh.intel.com> <20160927224935-mutt-send-email-mst@kernel.org> <20160928022848.GE1597@yliu-dev.sh.intel.com> <20160929205047-mutt-send-email-mst@kernel.org> <2889e609-f750-a4e1-66f8-768bb07a2339@redhat.com> <20160929231252-mutt-send-email-mst@kernel.org> Cc: Yuanhan Liu , Stephen Hemminger , dev@dpdk.org, qemu-devel@nongnu.org From: Maxime Coquelin Message-ID: <05d62750-303c-4b9b-a5cb-9db8552f0ab2@redhat.com> Date: Thu, 29 Sep 2016 23:23:35 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <20160929231252-mutt-send-email-mst@kernel.org> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.68 on 10.5.11.27 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.25]); Thu, 29 Sep 2016 21:23:39 +0000 (UTC) Subject: Re: [dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Sep 2016 21:23:40 -0000 On 09/29/2016 10:21 PM, Michael S. Tsirkin wrote: > On Thu, Sep 29, 2016 at 10:05:22PM +0200, Maxime Coquelin wrote: >> >> >> On 09/29/2016 07:57 PM, Michael S. Tsirkin wrote: >>> On Thu, Sep 29, 2016 at 05:30:53PM +0200, Maxime Coquelin wrote: >> ... >>>> >>>> Before enabling anything by default, we should first optimize the 1 slot >>>> case. Indeed, micro-benchmark using testpmd in txonly[0] shows ~17% >>>> perf regression for 64 bytes case: >>>> - 2 descs per packet: 11.6Mpps >>>> - 1 desc per packet: 9.6Mpps >>>> >>>> This is due to the virtio header clearing in virtqueue_enqueue_xmit(). >>>> Removing it, we get better results than with 2 descs (1.20Mpps). >>>> Since the Virtio PMD doesn't support offloads, I wonder whether we can >>>> just drop the memset? >>> >>> What will happen? Will the header be uninitialized? >> Yes.. >> I didn't look closely at the spec, but just looked at DPDK's and Linux >> vhost implementations. IIUC, the header is just skipped in the two >> implementations. > > In linux guest skbs are initialized AFAIK. See virtio_net_hdr_from_skb > first thing it does is > memset(hdr, 0, sizeof(*hdr)); I meant in vhost-net linux implementation, the header is just skipped: static void handle_tx(struct vhost_net *net) { ... /* Skip header. TODO: support TSO. */ len = iov_length(vq->iov, out); iov_iter_init(&msg.msg_iter, WRITE, vq->iov, out, len); iov_iter_advance(&msg.msg_iter, hdr_size); And the same is done is done in DPDK: static inline int __attribute__((always_inline)) copy_desc_to_mbuf(struct virtio_net *dev, struct vring_desc *descs, uint16_t max_desc, struct rte_mbuf *m, uint16_t desc_idx, struct rte_mempool *mbuf_pool) { ... /* * A virtio driver normally uses at least 2 desc buffers * for Tx: the first for storing the header, and others * for storing the data. */ if (likely((desc->len == dev->vhost_hlen) && (desc->flags & VRING_DESC_F_NEXT) != 0)) { desc = &descs[desc->next]; if (unlikely(desc->flags & VRING_DESC_F_INDIRECT)) return -1; desc_addr = gpa_to_vva(dev, desc->addr); if (unlikely(!desc_addr)) return -1; rte_prefetch0((void *)(uintptr_t)desc_addr); desc_offset = 0; desc_avail = desc->len; nr_desc += 1; PRINT_PACKET(dev, (uintptr_t)desc_addr, desc->len, 0); } else { desc_avail = desc->len - dev->vhost_hlen; desc_offset = dev->vhost_hlen; } > > > >>> >>> The spec says: >>> The driver can send a completely checksummed packet. In this case, flags >>> will be zero, and gso_type >>> will be VIRTIO_NET_HDR_GSO_NONE. >>> >>> and >>> The driver MUST set num_buffers to zero. >>> If VIRTIO_NET_F_CSUM is not negotiated, the driver MUST set flags to >>> zero and SHOULD supply a fully >>> checksummed packet to the device. >>> >>> and >>> If none of the VIRTIO_NET_F_HOST_TSO4, TSO6 or UFO options have been >>> negotiated, the driver MUST >>> set gso_type to VIRTIO_NET_HDR_GSO_NONE. >>> >>> so doing this unconditionally would be a spec violation, but if you see >>> value in this, we can add a feature bit. >> Right it would be a spec violation, so it should be done conditionally. >> If a feature bit is to be added, what about VIRTIO_NET_F_NO_TX_HEADER? >> It would imply VIRTIO_NET_F_CSUM not set, and no GSO features set. >> If negotiated, we wouldn't need to prepend a header. > > Yes but two points. > > 1. why is this memset expensive? Is the test completely skipping looking > at the packet otherwise? Yes. > > 2. As long as we are doing this, see > Alignment vs. Networking > ======================== > in Documentation/unaligned-memory-access.txt Thanks, I'll have a look tomorrow. Maxime