From: Yuanhan Liu <yuanhan.liu@linux.intel.com>
To: "Xie, Huawei" <huawei.xie@intel.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
"dev@dpdk.org" <dev@dpdk.org>,
Victor Kaplansky <vkaplans@redhat.com>
Subject: Re: [dpdk-dev] [PATCH v2 1/7] vhost: refactor rte_vhost_dequeue_burst
Date: Mon, 7 Mar 2016 10:44:06 +0800
Message-ID: <20160307024406.GX14300@yliu-dev.sh.intel.com>
In-Reply-To: <C37D651A908B024F974696C65296B57B4C636CBD@SHSMSX101.ccr.corp.intel.com>
On Mon, Mar 07, 2016 at 02:19:54AM +0000, Xie, Huawei wrote:
> On 3/4/2016 10:19 AM, Yuanhan Liu wrote:
> > On Thu, Mar 03, 2016 at 04:21:19PM +0000, Xie, Huawei wrote:
> >> On 2/18/2016 9:48 PM, Yuanhan Liu wrote:
> >>> The current rte_vhost_dequeue_burst() implementation is a bit messy
> >>> and its logic is twisted. You can also see repeated code here and there:
> >>> it invokes rte_pktmbuf_alloc() three times at three different places!
> >>>
> >>> However, rte_vhost_dequeue_burst() actually does a simple job: copy
> >>> the packet data from vring desc to mbuf. What's tricky here is:
> >>>
> >>> - desc buff could be chained (by the desc->next field), so you need
> >>>   to fetch the next one once the current one is wholly drained.
> >>>
> >>> - One mbuf may not be big enough to hold all desc buff, hence you
> >>>   need to chain the mbuf as well, by the mbuf->next field.
> >>>
> >>> Even so, the logic can be simple. Here is the pseudo code.
> >>>
> >>> 	while (this_desc_is_not_drained_totally || has_next_desc) {
> >>> 		if (this_desc_has_drained_totally) {
> >>> 			this_desc = next_desc();
> >>> 		}
> >>>
> >>> 		if (mbuf_has_no_room) {
> >>> 			mbuf = allocate_a_new_mbuf();
> >>> 		}
> >>>
> >>> 		COPY(mbuf, desc);
> >>> 	}
> >>>
> >>> And this is how I refactored rte_vhost_dequeue_burst.
> >>>
> >>> Note that the old code has special handling for skipping the virtio
> >>> header. However, that can be done simply by adjusting the desc_avail
> >>> and desc_offset variables:
> >>>
> >>> 	desc_avail  = desc->len - vq->vhost_hlen;
> >>> 	desc_offset = vq->vhost_hlen;
> >>>
> >>> This refactor makes the code much more readable (IMO), and it also
> >>> reduces the binary code size (by nearly 2K).
> >>>
> >>> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> >>> ---
> >>>
> >>> v2: - fix potential NULL dereference bug of var "prev" and "head"
> >>> ---
> >>> lib/librte_vhost/vhost_rxtx.c | 297 +++++++++++++++++-------------------------
> >>> 1 file changed, 116 insertions(+), 181 deletions(-)
> >>>
> >>> diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
> >>> index 5e7e5b1..d5cd0fa 100644
> >>> --- a/lib/librte_vhost/vhost_rxtx.c
> >>> +++ b/lib/librte_vhost/vhost_rxtx.c
> >>> @@ -702,21 +702,104 @@ vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m)
> >>> }
> >>> }
> >>>
> >>> +static inline struct rte_mbuf *
> >>> +copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
> >>> +		  uint16_t desc_idx, struct rte_mempool *mbuf_pool)
> >>> +{
> >>> +	struct vring_desc *desc;
> >>> +	uint64_t desc_addr;
> >>> +	uint32_t desc_avail, desc_offset;
> >>> +	uint32_t mbuf_avail, mbuf_offset;
> >>> +	uint32_t cpy_len;
> >>> +	struct rte_mbuf *head = NULL;
> >>> +	struct rte_mbuf *cur = NULL, *prev = NULL;
> >>> +	struct virtio_net_hdr *hdr;
> >>> +
> >>> +	desc = &vq->desc[desc_idx];
> >>> +	desc_addr = gpa_to_vva(dev, desc->addr);
> >>> +	rte_prefetch0((void *)(uintptr_t)desc_addr);
> >>> +
> >>> +	/* Retrieve virtio net header */
> >>> +	hdr = (struct virtio_net_hdr *)((uintptr_t)desc_addr);
> >>> +	desc_avail = desc->len - vq->vhost_hlen;
> >> There is a serious bug here: desc->len - vq->vhost_hlen could underflow.
> >> A VM could easily create this case. Let us fix it here.
> > Nope, this issue has been there since the beginning, and this patch
> > is a refactor: we should not bring any functional changes. Therefore,
> > we should not fix it here.
>
> No, I don't mean fixing it in exactly this patch, but in this series.
>
> Besides, from a refactoring perspective, we could actually make things
> much simpler and more readable still. Both the desc chains and the mbuf
> could be converted into iovecs, and then both dequeue (copy_desc_to_mbuf)
> and enqueue (copy_mbuf_to_desc) could use the commonly used iovec copying
> algorithms. As the data path is performance oriented, let us stop here.
Agreed. I had the same performance concern about further simplification,
so I didn't go further.
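
For illustration only, the iovec-style copy mentioned above could look
roughly like the sketch below. It is not code from this series and not a
DPDK API: iov_copy() is a hypothetical helper, and it assumes the desc
chain and the mbuf chain have already been flattened into struct iovec
arrays. Building those arrays is exactly the extra work that raises the
performance concern on the data path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Hypothetical helper (an assumption for illustration, not a DPDK API):
 * copy bytes from the src iovec array into the dst iovec array until
 * either side is exhausted. Returns the number of bytes copied.
 */
static size_t
iov_copy(const struct iovec *dst, size_t dst_cnt,
	 const struct iovec *src, size_t src_cnt)
{
	size_t di = 0, si = 0;		/* current segment on each side */
	size_t doff = 0, soff = 0;	/* offset inside that segment */
	size_t total = 0;

	assert(dst != NULL && src != NULL);

	while (di < dst_cnt && si < src_cnt) {
		size_t davail = dst[di].iov_len - doff;
		size_t savail = src[si].iov_len - soff;
		size_t len = davail < savail ? davail : savail;

		if (len > 0) {
			memcpy((char *)dst[di].iov_base + doff,
			       (const char *)src[si].iov_base + soff, len);
			total += len;
			doff += len;
			soff += len;
		}

		/* Destination segment is full: move to the next one. */
		if (doff == dst[di].iov_len) {
			di++;
			doff = 0;
		}
		/* Source segment is drained: move to the next one. */
		if (soff == src[si].iov_len) {
			si++;
			soff = 0;
		}
	}

	return total;
}
```

The loop has the same shape as the pseudo code in the commit log: advance
the source when it is drained, advance the destination when it has no
room, and copy the smaller of the two remaining lengths in between.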
> >
> > And actually, it's been fixed in the 6th patch in this series:
>
> Will check that.
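
To make the concern concrete: if both operands are treated as unsigned
32-bit values (an assumption for this sketch), a guest-supplied desc->len
smaller than the header length makes the subtraction wrap around to a
huge value instead of going negative. The helper below is hypothetical
and is not the actual check from patch 6/7; it only shows the kind of
test that avoids the wrap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the code from patch 6/7): compute how many
 * payload bytes remain in the first descriptor after the virtio header.
 * Returns 0 on success, -1 if the descriptor is shorter than the header.
 */
static int
desc_payload_len(uint32_t desc_len, uint32_t hdr_len, uint32_t *avail)
{
	assert(avail != NULL);

	/* Without this test, desc_len - hdr_len would wrap around. */
	if (desc_len < hdr_len)
		return -1;

	*avail = desc_len - hdr_len;
	return 0;
}
```

For example, with a 12-byte header and a 4-byte descriptor, the unchecked
32-bit subtraction 4 - 12 yields 4294967288, which a copy loop would then
treat as the number of bytes available.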
Do you have any comments on the other patches? I'm considering sending
a new version soon, maybe tomorrow.
--yliu