* [dpdk-dev] [PATCH 0/5 for 2.3] vhost rxtx refactor @ 2015-12-03 6:06 Yuanhan Liu 2015-12-03 6:06 ` [dpdk-dev] [PATCH 1/5] vhost: refactor rte_vhost_dequeue_burst Yuanhan Liu ` (6 more replies) 0 siblings, 7 replies; 84+ messages in thread From: Yuanhan Liu @ 2015-12-03 6:06 UTC (permalink / raw) To: dev; +Cc: Michael S. Tsirkin, Victor Kaplansky Vhost rxtx code is derived from the vhost-switch example, which is very likely the messiest code in DPDK. Unluckily, the move also brought over its bad traits: twisted logic and poor comments. When I first joined this team, I was quite scared off by the long and messy vhost rxtx code. While adding vhost-user live migration support, I had to make a few changes to it; I then ventured to look at it again, to understand it better and, in the meantime, to see whether I could refactor it. And here you go. The first 3 patches refactor the 3 major functions in vhost_rxtx.c, respectively. They simplify the code logic, making it more readable; at the same time, they reduce the code size, since a lot of duplicated code is removed. Patch 4 gets rid of rte_memcpy for the virtio_hdr copy, which saves nearly 12K bytes of code size! By now the code size has been greatly reduced: 39348 vs 24179 bytes. Patch 5 removes the "unlikely" for VIRTIO_NET_F_MRG_RXBUF detection. Note that the code could be further simplified or reduced. However, considering that this is a first try and that it is the *key* data path, I guess it's okay not to be radical and to stop here for now. Another note is that we should add more security checks on the rxtx side. That could be a standalone task for v2.3; this series is about refactoring, hence I leave it for future enhancement. 
--- Yuanhan Liu (5): vhost: refactor rte_vhost_dequeue_burst vhost: refactor virtio_dev_rx vhost: refactor virtio_dev_merge_rx vhost: do not use rte_memcpy for virtio_hdr copy vhost: don't use unlikely for VIRTIO_NET_F_MRG_RXBUF detection lib/librte_vhost/vhost_rxtx.c | 959 ++++++++++++++++++------------------------ 1 file changed, 407 insertions(+), 552 deletions(-) -- 1.9.0 ^ permalink raw reply [flat|nested] 84+ messages in thread
* [dpdk-dev] [PATCH 1/5] vhost: refactor rte_vhost_dequeue_burst 2015-12-03 6:06 [dpdk-dev] [PATCH 0/5 for 2.3] vhost rxtx refactor Yuanhan Liu @ 2015-12-03 6:06 ` Yuanhan Liu 2015-12-03 7:02 ` Stephen Hemminger ` (3 more replies) 2015-12-03 6:06 ` [dpdk-dev] [PATCH 2/5] vhost: refactor virtio_dev_rx Yuanhan Liu ` (5 subsequent siblings) 6 siblings, 4 replies; 84+ messages in thread From: Yuanhan Liu @ 2015-12-03 6:06 UTC (permalink / raw) To: dev; +Cc: Michael S. Tsirkin, Victor Kaplansky The current rte_vhost_dequeue_burst() implementation is a bit messy and its logic twisted. And you can see repeated code here and there: it invokes rte_pktmbuf_alloc() at three different places! However, rte_vhost_dequeue_burst() actually does a simple job: copy the packet data from the vring desc to an mbuf. What's tricky here is: - the desc buff could be chained (by the desc->next field), so that you need to fetch the next one when the current one is wholly drained. - One mbuf may not be big enough to hold all the desc buff data, hence you need to chain mbufs as well, by the mbuf->next field. Even so, the logic can be simple. Here is the pseudo code. while (this_desc_is_not_drained_totally || has_next_desc) { if (this_desc_has_drained_totally) { this_desc = next_desc(); } if (mbuf_has_no_room) { mbuf = allocate_a_new_mbuf(); } COPY(mbuf, desc); } And this is how I refactored it in this patch. Note that the old code did special handling to skip the virtio header. 
However, that could be simply done by adjusting desc_avail and desc_offset var: /* Discard virtio header */ desc_avail = desc->len - vq->vhost_hlen; desc_offset = vq->vhost_hlen; This refactor makes the code much more readable (IMO), yet it reduces code size (nearly 2K): # without this patch $ size /path/to/vhost_rxtx.o text data bss dec hex filename 39348 0 0 39348 99b4 /path/to/vhost_rxtx.o # with this patch $ size /path/to/vhost_rxtx.o text data bss dec hex filename 37435 0 0 37435 923b /path/to/vhost_rxtx.o Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> --- lib/librte_vhost/vhost_rxtx.c | 287 +++++++++++++++++------------------------- 1 file changed, 113 insertions(+), 174 deletions(-) diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c index 9322ce6..b4c6cab 100644 --- a/lib/librte_vhost/vhost_rxtx.c +++ b/lib/librte_vhost/vhost_rxtx.c @@ -43,6 +43,18 @@ #define MAX_PKT_BURST 32 +#define COPY(dst, src) do { \ + cpy_len = RTE_MIN(desc_avail, mbuf_avail); \ + rte_memcpy((void *)(uintptr_t)(dst), \ + (const void *)(uintptr_t)(src), cpy_len); \ + \ + mbuf_avail -= cpy_len; \ + mbuf_offset += cpy_len; \ + desc_avail -= cpy_len; \ + desc_offset += cpy_len; \ +} while(0) + + static bool is_valid_virt_queue_idx(uint32_t idx, int is_tx, uint32_t qp_nb) { @@ -568,19 +580,89 @@ rte_vhost_enqueue_burst(struct virtio_net *dev, uint16_t queue_id, return virtio_dev_rx(dev, queue_id, pkts, count); } +static inline struct rte_mbuf * +copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq, + uint16_t desc_idx, struct rte_mempool *mbuf_pool) +{ + struct vring_desc *desc; + uint64_t desc_addr; + uint32_t desc_avail, desc_offset; + uint32_t mbuf_avail, mbuf_offset; + uint32_t cpy_len; + + struct rte_mbuf *head = NULL; + struct rte_mbuf *cur = NULL, *prev = NULL; + + desc = &vq->desc[desc_idx]; + desc_addr = gpa_to_vva(dev, desc->addr); + rte_prefetch0((void *)(uintptr_t)desc_addr); + + /* Discard virtio header */ + desc_avail = 
desc->len - vq->vhost_hlen; + desc_offset = vq->vhost_hlen; + + mbuf_avail = 0; + mbuf_offset = 0; + while (desc_avail || (desc->flags & VRING_DESC_F_NEXT) != 0) { + /* This desc reachs to its end, get the next one */ + if (desc_avail == 0) { + desc = &vq->desc[desc->next]; + + desc_addr = gpa_to_vva(dev, desc->addr); + rte_prefetch0((void *)(uintptr_t)desc_addr); + + desc_offset = 0; + desc_avail = desc->len; + + PRINT_PACKET(dev, (uintptr_t)desc_addr, desc->len, 0); + } + + /* + * This mbuf reachs to its end, get a new one + * to hold more data. + */ + if (mbuf_avail == 0) { + cur = rte_pktmbuf_alloc(mbuf_pool); + if (unlikely(!cur)) { + RTE_LOG(ERR, VHOST_DATA, "Failed to " + "allocate memory for mbuf.\n"); + rte_pktmbuf_free(head); + return NULL; + } + if (!head) { + head = cur; + } else { + prev->next = cur; + prev->data_len = mbuf_offset; + head->nb_segs += 1; + } + head->pkt_len += mbuf_offset; + prev = cur; + + mbuf_offset = 0; + mbuf_avail = cur->buf_len - RTE_PKTMBUF_HEADROOM; + } + + COPY(rte_pktmbuf_mtod_offset(cur, uint64_t, mbuf_offset), + desc_addr + desc_offset); + } + prev->data_len = mbuf_offset; + head->pkt_len += mbuf_offset; + + return head; +} + uint16_t rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count) { - struct rte_mbuf *m, *prev; struct vhost_virtqueue *vq; - struct vring_desc *desc; - uint64_t vb_addr = 0; - uint32_t head[MAX_PKT_BURST]; + uint32_t desc_indexes[MAX_PKT_BURST]; uint32_t used_idx; uint32_t i; - uint16_t free_entries, entry_success = 0; + uint16_t free_entries; uint16_t avail_idx; + struct rte_mbuf *m; if (unlikely(!is_valid_virt_queue_idx(queue_id, 1, dev->virt_qp_nb))) { RTE_LOG(ERR, VHOST_DATA, @@ -594,192 +676,49 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, return 0; avail_idx = *((volatile uint16_t *)&vq->avail->idx); - - /* If there are no available buffers then return. 
*/ - if (vq->last_used_idx == avail_idx) + free_entries = avail_idx - vq->last_used_idx; + if (free_entries == 0) return 0; - LOG_DEBUG(VHOST_DATA, "%s (%"PRIu64")\n", __func__, - dev->device_fh); + LOG_DEBUG(VHOST_DATA, "%s (%"PRIu64")\n", __func__, dev->device_fh); - /* Prefetch available ring to retrieve head indexes. */ - rte_prefetch0(&vq->avail->ring[vq->last_used_idx & (vq->size - 1)]); + used_idx = vq->last_used_idx & (vq->size -1); - /*get the number of free entries in the ring*/ - free_entries = (avail_idx - vq->last_used_idx); + /* Prefetch available ring to retrieve head indexes. */ + rte_prefetch0(&vq->avail->ring[used_idx]); - free_entries = RTE_MIN(free_entries, count); - /* Limit to MAX_PKT_BURST. */ - free_entries = RTE_MIN(free_entries, MAX_PKT_BURST); + count = RTE_MIN(count, MAX_PKT_BURST); + count = RTE_MIN(count, free_entries); + LOG_DEBUG(VHOST_DATA, "(%"PRIu64") about to dequene %u buffers\n", + dev->device_fh, count); - LOG_DEBUG(VHOST_DATA, "(%"PRIu64") Buffers available %d\n", - dev->device_fh, free_entries); /* Retrieve all of the head indexes first to avoid caching issues. */ - for (i = 0; i < free_entries; i++) - head[i] = vq->avail->ring[(vq->last_used_idx + i) & (vq->size - 1)]; + for (i = 0; i < count; i++) { + desc_indexes[i] = vq->avail->ring[(vq->last_used_idx + i) & + (vq->size - 1)]; + } /* Prefetch descriptor index. 
*/ - rte_prefetch0(&vq->desc[head[entry_success]]); + rte_prefetch0(&vq->desc[desc_indexes[0]]); rte_prefetch0(&vq->used->ring[vq->last_used_idx & (vq->size - 1)]); - while (entry_success < free_entries) { - uint32_t vb_avail, vb_offset; - uint32_t seg_avail, seg_offset; - uint32_t cpy_len; - uint32_t seg_num = 0; - struct rte_mbuf *cur; - uint8_t alloc_err = 0; - - desc = &vq->desc[head[entry_success]]; - - /* Discard first buffer as it is the virtio header */ - if (desc->flags & VRING_DESC_F_NEXT) { - desc = &vq->desc[desc->next]; - vb_offset = 0; - vb_avail = desc->len; - } else { - vb_offset = vq->vhost_hlen; - vb_avail = desc->len - vb_offset; - } - - /* Buffer address translation. */ - vb_addr = gpa_to_vva(dev, desc->addr); - /* Prefetch buffer address. */ - rte_prefetch0((void *)(uintptr_t)vb_addr); - - used_idx = vq->last_used_idx & (vq->size - 1); - - if (entry_success < (free_entries - 1)) { - /* Prefetch descriptor index. */ - rte_prefetch0(&vq->desc[head[entry_success+1]]); - rte_prefetch0(&vq->used->ring[(used_idx + 1) & (vq->size - 1)]); - } - - /* Update used index buffer information. */ - vq->used->ring[used_idx].id = head[entry_success]; - vq->used->ring[used_idx].len = 0; - - /* Allocate an mbuf and populate the structure. */ - m = rte_pktmbuf_alloc(mbuf_pool); - if (unlikely(m == NULL)) { - RTE_LOG(ERR, VHOST_DATA, - "Failed to allocate memory for mbuf.\n"); - break; - } - seg_offset = 0; - seg_avail = m->buf_len - RTE_PKTMBUF_HEADROOM; - cpy_len = RTE_MIN(vb_avail, seg_avail); - - PRINT_PACKET(dev, (uintptr_t)vb_addr, desc->len, 0); - - seg_num++; - cur = m; - prev = m; - while (cpy_len != 0) { - rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *, seg_offset), - (void *)((uintptr_t)(vb_addr + vb_offset)), - cpy_len); - - seg_offset += cpy_len; - vb_offset += cpy_len; - vb_avail -= cpy_len; - seg_avail -= cpy_len; - - if (vb_avail != 0) { - /* - * The segment reachs to its end, - * while the virtio buffer in TX vring has - * more data to be copied. 
- */ - cur->data_len = seg_offset; - m->pkt_len += seg_offset; - /* Allocate mbuf and populate the structure. */ - cur = rte_pktmbuf_alloc(mbuf_pool); - if (unlikely(cur == NULL)) { - RTE_LOG(ERR, VHOST_DATA, "Failed to " - "allocate memory for mbuf.\n"); - rte_pktmbuf_free(m); - alloc_err = 1; - break; - } - - seg_num++; - prev->next = cur; - prev = cur; - seg_offset = 0; - seg_avail = cur->buf_len - RTE_PKTMBUF_HEADROOM; - } else { - if (desc->flags & VRING_DESC_F_NEXT) { - /* - * There are more virtio buffers in - * same vring entry need to be copied. - */ - if (seg_avail == 0) { - /* - * The current segment hasn't - * room to accomodate more - * data. - */ - cur->data_len = seg_offset; - m->pkt_len += seg_offset; - /* - * Allocate an mbuf and - * populate the structure. - */ - cur = rte_pktmbuf_alloc(mbuf_pool); - if (unlikely(cur == NULL)) { - RTE_LOG(ERR, - VHOST_DATA, - "Failed to " - "allocate memory " - "for mbuf\n"); - rte_pktmbuf_free(m); - alloc_err = 1; - break; - } - seg_num++; - prev->next = cur; - prev = cur; - seg_offset = 0; - seg_avail = cur->buf_len - RTE_PKTMBUF_HEADROOM; - } - - desc = &vq->desc[desc->next]; - - /* Buffer address translation. */ - vb_addr = gpa_to_vva(dev, desc->addr); - /* Prefetch buffer address. */ - rte_prefetch0((void *)(uintptr_t)vb_addr); - vb_offset = 0; - vb_avail = desc->len; - - PRINT_PACKET(dev, (uintptr_t)vb_addr, - desc->len, 0); - } else { - /* The whole packet completes. 
*/ - cur->data_len = seg_offset; - m->pkt_len += seg_offset; - vb_avail = 0; - } - } - - cpy_len = RTE_MIN(vb_avail, seg_avail); - } - - if (unlikely(alloc_err == 1)) + for (i = 0; i < count; i++) { + m = copy_desc_to_mbuf(dev, vq, desc_indexes[i], mbuf_pool); + if (m == NULL) break; + pkts[i] = m; - m->nb_segs = seg_num; - - pkts[entry_success] = m; - vq->last_used_idx++; - entry_success++; + used_idx = vq->last_used_idx++ & (vq->size - 1); + vq->used->ring[used_idx].id = desc_indexes[i]; + vq->used->ring[used_idx].len = 0; } rte_compiler_barrier(); - vq->used->idx += entry_success; + vq->used->idx += i; + /* Kick guest if required. */ if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)) eventfd_write(vq->callfd, (eventfd_t)1); - return entry_success; + + return i; } -- 1.9.0 ^ permalink raw reply [flat|nested] 84+ messages in thread
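To make the pseudo code from the commit message above concrete, here is a compilable sketch of the two-cursor copy loop. This is an editorial illustration only: `struct desc` and `struct seg` are simplified stand-ins invented for this sketch, not DPDK's real `vring_desc` and `rte_mbuf`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for vring descriptors and mbuf segments;
 * the field names here are assumptions, not DPDK's structures. */
struct desc { const uint8_t *buf; uint32_t len; struct desc *next; };
struct seg  { uint8_t data[8];    uint32_t len; };

/*
 * Drain a (possibly chained) descriptor into a chain of fixed-size
 * segments, following the loop from the commit message: fetch the
 * next desc when the current one is drained, move to the next
 * segment when the current one is full, and copy the minimum of
 * what both sides allow. Returns the total number of bytes copied.
 */
static uint32_t
copy_desc_chain(const struct desc *d, struct seg *segs, int nsegs)
{
    uint32_t desc_avail = d->len, desc_off = 0;
    uint32_t seg_avail = 0, seg_off = 0;
    uint32_t total = 0;
    int s = -1;

    while (desc_avail || d->next) {
        if (desc_avail == 0) {          /* desc drained: fetch next */
            d = d->next;
            desc_avail = d->len;
            desc_off = 0;
        }
        if (seg_avail == 0) {           /* segment full: get a new one */
            if (++s == nsegs)
                return total;           /* ran out of segments */
            seg_avail = sizeof(segs[s].data);
            seg_off = 0;
        }

        uint32_t cpy = desc_avail < seg_avail ? desc_avail : seg_avail;
        memcpy(segs[s].data + seg_off, d->buf + desc_off, cpy);

        desc_avail -= cpy; desc_off += cpy;
        seg_avail  -= cpy; seg_off  += cpy;
        segs[s].len = seg_off;
        total += cpy;
    }
    return total;
}
```

With descriptors of 5 and 6 bytes draining into 8-byte segments, the loop fills the first segment completely and leaves a 3-byte tail in the second, mirroring how a packet larger than one mbuf spills into `mbuf->next`.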
* Re: [dpdk-dev] [PATCH 1/5] vhost: refactor rte_vhost_dequeue_burst 2015-12-03 6:06 ` [dpdk-dev] [PATCH 1/5] vhost: refactor rte_vhost_dequeue_burst Yuanhan Liu @ 2015-12-03 7:02 ` Stephen Hemminger 2015-12-03 7:25 ` Yuanhan Liu 2015-12-03 7:03 ` Stephen Hemminger ` (2 subsequent siblings) 3 siblings, 1 reply; 84+ messages in thread From: Stephen Hemminger @ 2015-12-03 7:02 UTC (permalink / raw) To: Yuanhan Liu; +Cc: dev, Victor Kaplansky, Michael S. Tsirkin On Thu, 3 Dec 2015 14:06:09 +0800 Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote: > +#define COPY(dst, src) do { \ > + cpy_len = RTE_MIN(desc_avail, mbuf_avail); \ > + rte_memcpy((void *)(uintptr_t)(dst), \ > + (const void *)(uintptr_t)(src), cpy_len); \ > + \ > + mbuf_avail -= cpy_len; \ > + mbuf_offset += cpy_len; \ > + desc_avail -= cpy_len; \ > + desc_offset += cpy_len; \ > +} while(0) > + I see lots of issues here. All those void * casts are unnecessary, C casts arguments already. rte_memcpy is slower for constant size values than memcpy(). This macro violates the rule that there should be no hidden variables in a macro. I.e. you are assuming cpy_len, desc_avail, and mbuf_avail are defined in all code using the macro. Why use an un-typed macro when an inline function would be just as fast and give type safety? ^ permalink raw reply [flat|nested] 84+ messages in thread
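For comparison, the typed inline helper Stephen is asking for could look like the following. This is an editorial sketch with invented names (`struct copy_cursors`, `copy_step`), not the actual DPDK code: it bundles the macro's hidden variables into an explicit state struct, so the compiler checks types and the data flow is visible at the call site.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The four cursors the COPY() macro silently updates, made explicit.
 * Names are illustrative only; this is not DPDK's implementation. */
struct copy_cursors {
    uint32_t src_avail, src_off;    /* e.g. desc_avail / desc_offset */
    uint32_t dst_avail, dst_off;    /* e.g. mbuf_avail / mbuf_offset */
};

/* Copy as much as both sides allow, then advance all four cursors.
 * Returns the number of bytes copied. Plain memcpy is used, per
 * Stephen's remark about rte_memcpy. */
static inline uint32_t
copy_step(uint8_t *dst, const uint8_t *src, struct copy_cursors *c)
{
    uint32_t cpy_len = c->src_avail < c->dst_avail
                     ? c->src_avail : c->dst_avail;

    memcpy(dst + c->dst_off, src + c->src_off, cpy_len);

    c->src_avail -= cpy_len;
    c->src_off   += cpy_len;
    c->dst_avail -= cpy_len;
    c->dst_off   += cpy_len;
    return cpy_len;
}
```

The caller still owns the cursor struct, so the loop in `copy_desc_to_mbuf` would reset `dst_avail`/`dst_off` on each new mbuf segment just as the macro version does with its bare variables.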
* Re: [dpdk-dev] [PATCH 1/5] vhost: refactor rte_vhost_dequeue_burst 2015-12-03 7:02 ` Stephen Hemminger @ 2015-12-03 7:25 ` Yuanhan Liu 0 siblings, 0 replies; 84+ messages in thread From: Yuanhan Liu @ 2015-12-03 7:25 UTC (permalink / raw) To: Stephen Hemminger; +Cc: dev, Victor Kaplansky, Michael S. Tsirkin On Wed, Dec 02, 2015 at 11:02:44PM -0800, Stephen Hemminger wrote: > On Thu, 3 Dec 2015 14:06:09 +0800 > Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote: > > > +#define COPY(dst, src) do { \ > > + cpy_len = RTE_MIN(desc_avail, mbuf_avail); \ > > + rte_memcpy((void *)(uintptr_t)(dst), \ > > + (const void *)(uintptr_t)(src), cpy_len); \ > > + \ > > + mbuf_avail -= cpy_len; \ > > + mbuf_offset += cpy_len; \ > > + desc_avail -= cpy_len; \ > > + desc_offset += cpy_len; \ > > +} while(0) > > + > > I see lots of issues here. > > All those void * casts are unnecessary, C casts arguments already. Hi Stephen, Without the casts, it will not compile, as dst is actually of type uint64_t. > rte_memcpy is slower for constant size values than memcpy() Sorry, what does that have to do with this patch? > This macro violates the rule that there should be no hidden variables > in a macro. I.e. you are assuming cpy_len, desc_avail, and mbuf_avail > are defined in all code using the macro. Yes, I'm aware of that. And I agree that it's not good usage _in general_. But I'm thinking that it's okay to use it here, as the three major functions have quite similar logic. And if my memory serves me right, there is also quite a lot of code like that in the Linux kernel. > Why use an un-typed macro when an inline function would be just > as fast and give type safety? It references too many variables, up to 6, and 4 of those vars need to be updated. Therefore, making it a function would make the caller long and ugly. That's why I was thinking of using a macro to remove a few lines of repeated code. 
So, if a hidden-var macro is forbidden here, I guess I would just unfold those lines of code rather than introduce a helper function. --yliu ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH 1/5] vhost: refactor rte_vhost_dequeue_burst 2015-12-03 6:06 ` [dpdk-dev] [PATCH 1/5] vhost: refactor rte_vhost_dequeue_burst Yuanhan Liu 2015-12-03 7:02 ` Stephen Hemminger @ 2015-12-03 7:03 ` Stephen Hemminger 2015-12-12 6:55 ` Rich Lane 2016-01-26 10:30 ` Xie, Huawei 3 siblings, 0 replies; 84+ messages in thread From: Stephen Hemminger @ 2015-12-03 7:03 UTC (permalink / raw) To: Yuanhan Liu; +Cc: dev, Victor Kaplansky, Michael S. Tsirkin On Thu, 3 Dec 2015 14:06:09 +0800 Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote: > + rte_prefetch0((void *)(uintptr_t)desc_addr); Another unnecessary set of casts. ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH 1/5] vhost: refactor rte_vhost_dequeue_burst 2015-12-03 6:06 ` [dpdk-dev] [PATCH 1/5] vhost: refactor rte_vhost_dequeue_burst Yuanhan Liu 2015-12-03 7:02 ` Stephen Hemminger 2015-12-03 7:03 ` Stephen Hemminger @ 2015-12-12 6:55 ` Rich Lane 2015-12-14 1:55 ` Yuanhan Liu 2016-01-26 10:30 ` Xie, Huawei 3 siblings, 1 reply; 84+ messages in thread From: Rich Lane @ 2015-12-12 6:55 UTC (permalink / raw) To: Yuanhan Liu; +Cc: dev, Victor Kaplansky, Michael S. Tsirkin On Wed, Dec 2, 2015 at 10:06 PM, Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote: > > +static inline struct rte_mbuf * > +copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq, > + uint16_t desc_idx, struct rte_mempool *mbuf_pool) > +{ > ... > + > + desc = &vq->desc[desc_idx]; > + desc_addr = gpa_to_vva(dev, desc->addr); > + rte_prefetch0((void *)(uintptr_t)desc_addr); > + > + /* Discard virtio header */ > + desc_avail = desc->len - vq->vhost_hlen; If desc->len is zero (happens all the time when restarting DPDK apps in the guest) then desc_avail will be huge. > + desc_offset = vq->vhost_hlen; > + > + mbuf_avail = 0; > + mbuf_offset = 0; > + while (desc_avail || (desc->flags & VRING_DESC_F_NEXT) != 0) { + /* This desc reachs to its end, get the next one */ > + if (desc_avail == 0) { > + desc = &vq->desc[desc->next]; > Need to check desc->next against vq->size. There should be a limit on the number of descriptors in a chain to prevent an infinite loop. uint16_t > rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, > struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t > count) > { > ... > avail_idx = *((volatile uint16_t *)&vq->avail->idx); > - > - /* If there are no available buffers then return. */ > - if (vq->last_used_idx == avail_idx) > + free_entries = avail_idx - vq->last_used_idx; > + if (free_entries == 0) > return 0; > A common problem is that avail_idx goes backwards when the guest zeroes its virtqueues. 
This function could check for free_entries > vq->size and reset vq->last_used_idx. + LOG_DEBUG(VHOST_DATA, "(%"PRIu64") about to dequene %u buffers\n", > + dev->device_fh, count); > Typo at "dequene". ^ permalink raw reply [flat|nested] 84+ messages in thread
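The two hardening points Rich raises above can be sketched in isolation: bounding `desc->next` (and capping chain length) so a corrupt ring cannot loop forever, and tolerating an avail index that went backwards after a guest reset. These are editorial illustrations with invented names (`mini_desc`, `chain_len`, `safe_free_entries`), not the eventual DPDK fix.

```c
#include <assert.h>
#include <stdint.h>

#define F_NEXT 0x1                  /* stand-in for VRING_DESC_F_NEXT */

struct mini_desc { uint16_t next; uint16_t flags; };

/* Walk a descriptor chain and return its length, or -1 when an index
 * escapes the ring or the chain is longer than the ring size — which
 * can only happen with a loop or corrupt descriptors. */
static int
chain_len(const struct mini_desc *ring, uint16_t size, uint16_t idx)
{
    int len = 1;

    while (ring[idx].flags & F_NEXT) {
        idx = ring[idx].next;
        if (idx >= size || ++len > size)
            return -1;
    }
    return len;
}

/* If the guest zeroes its virtqueues, avail_idx can jump backwards and
 * the unsigned subtraction yields a huge value; treat any gap larger
 * than the ring size as "nothing available". */
static uint16_t
safe_free_entries(uint16_t avail_idx, uint16_t last_used_idx, uint16_t size)
{
    uint16_t free_entries = (uint16_t)(avail_idx - last_used_idx);

    return free_entries > size ? 0 : free_entries;
}
```

Both guards cost one comparison per step, so they are cheap enough for the hot path that the thread is otherwise careful about.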
* Re: [dpdk-dev] [PATCH 1/5] vhost: refactor rte_vhost_dequeue_burst 2015-12-12 6:55 ` Rich Lane @ 2015-12-14 1:55 ` Yuanhan Liu 0 siblings, 0 replies; 84+ messages in thread From: Yuanhan Liu @ 2015-12-14 1:55 UTC (permalink / raw) To: Rich Lane; +Cc: dev, Victor Kaplansky, Michael S. Tsirkin On Fri, Dec 11, 2015 at 10:55:48PM -0800, Rich Lane wrote: > On Wed, Dec 2, 2015 at 10:06 PM, Yuanhan Liu <yuanhan.liu@linux.intel.com> > wrote: > > +static inline struct rte_mbuf * > +copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq, > + uint16_t desc_idx, struct rte_mempool *mbuf_pool) > +{ > > ... > > + > + desc = &vq->desc[desc_idx]; > + desc_addr = gpa_to_vva(dev, desc->addr); > + rte_prefetch0((void *)(uintptr_t)desc_addr); > + > + /* Discard virtio header */ > + desc_avail = desc->len - vq->vhost_hlen; > > > If desc->len is zero (happens all the time when restarting DPDK apps in the > guest) then desc_avail will be huge. I'm aware of it; I have already noted that in the cover letter. This patch set is just a code refactor. Things like that will be done in a later patch set. (The thing is that Huawei is very cagey about making any changes to the vhost rxtx code, as it's the hot data path. So, I will not make any further changes based on this refactor, unless it's applied). > > > + desc_offset = vq->vhost_hlen; > + > + mbuf_avail = 0; > + mbuf_offset = 0; > + while (desc_avail || (desc->flags & VRING_DESC_F_NEXT) != 0) { > > + /* This desc reachs to its end, get the next one */ > + if (desc_avail == 0) { > + desc = &vq->desc[desc->next]; > > > Need to check desc->next against vq->size. Thanks for the reminder. > > There should be a limit on the number of descriptors in a chain to prevent an > infinite loop. > > > uint16_t > rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, > struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t > count) > { > ... 
> avail_idx = *((volatile uint16_t *)&vq->avail->idx); > - > - /* If there are no available buffers then return. */ > - if (vq->last_used_idx == avail_idx) > + free_entries = avail_idx - vq->last_used_idx; > + if (free_entries == 0) > return 0; > > > A common problem is that avail_idx goes backwards when the guest zeroes its > virtqueues. This function could check for free_entries > vq->size and reset > vq->last_used_idx. Thanks, but ditto, will consider it in another patch set. > > > + LOG_DEBUG(VHOST_DATA, "(%"PRIu64") about to dequene %u buffers\n", > + dev->device_fh, count); > > > Typo at "dequene". Oops; thanks. --yliu ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH 1/5] vhost: refactor rte_vhost_dequeue_burst 2015-12-03 6:06 ` [dpdk-dev] [PATCH 1/5] vhost: refactor rte_vhost_dequeue_burst Yuanhan Liu ` (2 preceding siblings ...) 2015-12-12 6:55 ` Rich Lane @ 2016-01-26 10:30 ` Xie, Huawei 2016-01-27 3:26 ` Yuanhan Liu 3 siblings, 1 reply; 84+ messages in thread From: Xie, Huawei @ 2016-01-26 10:30 UTC (permalink / raw) To: Yuanhan Liu, dev; +Cc: Victor Kaplansky, Michael S. Tsirkin On 12/3/2015 2:03 PM, Yuanhan Liu wrote: > Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> > --- > lib/librte_vhost/vhost_rxtx.c | 287 +++++++++++++++++------------------------- > 1 file changed, 113 insertions(+), 174 deletions(-) I'd prefer to unroll copy_mbuf_to_desc and your COPY macro. It prevents us from processing descriptors in a burst way in the future. ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH 1/5] vhost: refactor rte_vhost_dequeue_burst 2016-01-26 10:30 ` Xie, Huawei @ 2016-01-27 3:26 ` Yuanhan Liu 2016-01-27 6:12 ` Xie, Huawei 0 siblings, 1 reply; 84+ messages in thread From: Yuanhan Liu @ 2016-01-27 3:26 UTC (permalink / raw) To: Xie, Huawei; +Cc: dev, Victor Kaplansky, Michael S. Tsirkin On Tue, Jan 26, 2016 at 10:30:12AM +0000, Xie, Huawei wrote: > On 12/3/2015 2:03 PM, Yuanhan Liu wrote: > > Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> > > --- > > lib/librte_vhost/vhost_rxtx.c | 287 +++++++++++++++++------------------------- > > 1 file changed, 113 insertions(+), 174 deletions(-) > > Prefer to unroll copy_mbuf_to_desc and your COPY macro. It prevents us I'm okay with unrolling the COPY macro. But for copy_mbuf_to_desc, I'd prefer not to do that, unless there is a good reason. > processing descriptors in a burst way in future. So, do you have a plan? --yliu ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH 1/5] vhost: refactor rte_vhost_dequeue_burst 2016-01-27 3:26 ` Yuanhan Liu @ 2016-01-27 6:12 ` Xie, Huawei 2016-01-27 6:16 ` Yuanhan Liu 0 siblings, 1 reply; 84+ messages in thread From: Xie, Huawei @ 2016-01-27 6:12 UTC (permalink / raw) To: Yuanhan Liu; +Cc: Michael S. Tsirkin, dev, Victor Kaplansky On 1/27/2016 11:26 AM, Yuanhan Liu wrote: > On Tue, Jan 26, 2016 at 10:30:12AM +0000, Xie, Huawei wrote: >> On 12/3/2015 2:03 PM, Yuanhan Liu wrote: >>> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> >>> --- >>> lib/librte_vhost/vhost_rxtx.c | 287 +++++++++++++++++------------------------- >>> 1 file changed, 113 insertions(+), 174 deletions(-) >> Prefer to unroll copy_mbuf_to_desc and your COPY macro. It prevents us > I'm okay to unroll COPY macro. But for copy_mbuf_to_desc, I prefer not > to do that, unless it has a good reason. > >> processing descriptors in a burst way in future. > So, do you have a plan? I think it is OK. If we need to unroll in the future, we can do that then. I am open to this; it's just my preference. I understand that wrapping makes the code more readable. > > --yliu > ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH 1/5] vhost: refactor rte_vhost_dequeue_burst 2016-01-27 6:12 ` Xie, Huawei @ 2016-01-27 6:16 ` Yuanhan Liu 0 siblings, 0 replies; 84+ messages in thread From: Yuanhan Liu @ 2016-01-27 6:16 UTC (permalink / raw) To: Xie, Huawei; +Cc: Michael S. Tsirkin, dev, Victor Kaplansky On Wed, Jan 27, 2016 at 06:12:22AM +0000, Xie, Huawei wrote: > On 1/27/2016 11:26 AM, Yuanhan Liu wrote: > > On Tue, Jan 26, 2016 at 10:30:12AM +0000, Xie, Huawei wrote: > >> On 12/3/2015 2:03 PM, Yuanhan Liu wrote: > >>> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> > >>> --- > >>> lib/librte_vhost/vhost_rxtx.c | 287 +++++++++++++++++------------------------- > >>> 1 file changed, 113 insertions(+), 174 deletions(-) > >> Prefer to unroll copy_mbuf_to_desc and your COPY macro. It prevents us > > I'm okay to unroll COPY macro. But for copy_mbuf_to_desc, I prefer not > > to do that, unless it has a good reason. > > > >> processing descriptors in a burst way in future. > > So, do you have a plan? > > I think it is OK. If we need unroll in future, we could do that then. I > am open to this. Just my preference. I understand that wrapping makes > code more readable. Okay, let's consider it then: unroll would be easy after all. --yliu ^ permalink raw reply [flat|nested] 84+ messages in thread
* [dpdk-dev] [PATCH 2/5] vhost: refactor virtio_dev_rx 2015-12-03 6:06 [dpdk-dev] [PATCH 0/5 for 2.3] vhost rxtx refactor Yuanhan Liu 2015-12-03 6:06 ` [dpdk-dev] [PATCH 1/5] vhost: refactor rte_vhost_dequeue_burst Yuanhan Liu @ 2015-12-03 6:06 ` Yuanhan Liu 2015-12-11 20:42 ` Rich Lane 2015-12-03 6:06 ` [dpdk-dev] [PATCH 3/5] vhost: refactor virtio_dev_merge_rx Yuanhan Liu ` (4 subsequent siblings) 6 siblings, 1 reply; 84+ messages in thread From: Yuanhan Liu @ 2015-12-03 6:06 UTC (permalink / raw) To: dev; +Cc: Michael S. Tsirkin, Victor Kaplansky This is a simple refactor, as there isn't any twisted logic in old code. Here I just broke the code and introduced two helper functions, reserve_avail_buf() and copy_mbuf_to_desc() to make the code more readable. It saves nearly 1K bytes of code size: # without this patch $ size /path/to/vhost_rxtx.o text data bss dec hex filename 37435 0 0 37435 923b /path/to/vhost_rxtx.o # with this patch $ size /path/to/vhost_rxtx.o text data bss dec hex filename 36539 0 0 36539 8ebb /path/to/vhost_rxtx.o Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> --- lib/librte_vhost/vhost_rxtx.c | 275 ++++++++++++++++++++---------------------- 1 file changed, 129 insertions(+), 146 deletions(-) diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c index b4c6cab..cb29459 100644 --- a/lib/librte_vhost/vhost_rxtx.c +++ b/lib/librte_vhost/vhost_rxtx.c @@ -61,6 +61,107 @@ is_valid_virt_queue_idx(uint32_t idx, int is_tx, uint32_t qp_nb) return (is_tx ^ (idx & 1)) == 0 && idx < qp_nb * VIRTIO_QNUM; } +static inline int __attribute__((always_inline)) +copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq, + struct rte_mbuf *m, uint16_t desc_idx, uint32_t *copied) +{ + uint32_t desc_avail, desc_offset; + uint32_t mbuf_avail, mbuf_offset; + uint32_t cpy_len; + struct vring_desc *desc; + uint64_t desc_addr; + struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0}; + + desc = &vq->desc[desc_idx]; 
+ desc_addr = gpa_to_vva(dev, desc->addr); + rte_prefetch0((void *)(uintptr_t)desc_addr); + + rte_memcpy((void *)(uintptr_t)desc_addr, + (const void *)&virtio_hdr, vq->vhost_hlen); + PRINT_PACKET(dev, (uintptr_t)desc_addr, vq->vhost_hlen, 0); + + desc_offset = vq->vhost_hlen; + desc_avail = desc->len - vq->vhost_hlen; + + mbuf_avail = rte_pktmbuf_data_len(m); + mbuf_offset = 0; + while (1) { + /* done with current mbuf, fetch next */ + if (mbuf_avail == 0) { + m = m->next; + if (m == NULL) + break; + + mbuf_offset = 0; + mbuf_avail = rte_pktmbuf_data_len(m); + } + + /* done with current desc buf, fetch next */ + if (desc_avail == 0) { + if ((desc->flags & VRING_DESC_F_NEXT) == 0) { + /* Room in vring buffer is not enough */ + return -1; + } + + desc = &vq->desc[desc->next]; + desc_addr = gpa_to_vva(dev, desc->addr); + desc_offset = 0; + desc_avail = desc->len; + } + + COPY(desc_addr + desc_offset, + rte_pktmbuf_mtod_offset(m, uint64_t, mbuf_offset)); + PRINT_PACKET(dev, (uintptr_t)(desc_addr + desc_offset), + cpy_len, 0); + } + *copied = rte_pktmbuf_pkt_len(m); + + return 0; +} + +/* + * As many data cores may want access to available buffers + * they need to be reserved. + */ +static inline uint32_t +reserve_avail_buf(struct vhost_virtqueue *vq, uint32_t count, + uint16_t *start, uint16_t *end) +{ + uint16_t res_start_idx; + uint16_t res_end_idx; + uint16_t avail_idx; + uint16_t free_entries; + int success; + + count = RTE_MIN(count, (uint32_t)MAX_PKT_BURST); + +again: + res_start_idx = vq->last_used_idx_res; + avail_idx = *((volatile uint16_t *)&vq->avail->idx); + + free_entries = (avail_idx - res_start_idx); + count = RTE_MIN(count, free_entries); + if (count == 0) + return 0; + + res_end_idx = res_start_idx + count; + + /* + * update vq->last_used_idx_res atomically; try again if failed. + * + * TODO: Allow to disable cmpset if no concurrency in application. 
+ */ + success = rte_atomic16_cmpset(&vq->last_used_idx_res, + res_start_idx, res_end_idx); + if (!success) + goto again; + + *start = res_start_idx; + *end = res_end_idx; + + return count; +} + /** * This function adds buffers to the virtio devices RX virtqueue. Buffers can * be received from the physical port or from another virtio device. A packet @@ -70,21 +171,12 @@ is_valid_virt_queue_idx(uint32_t idx, int is_tx, uint32_t qp_nb) */ static inline uint32_t __attribute__((always_inline)) virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id, - struct rte_mbuf **pkts, uint32_t count) + struct rte_mbuf **pkts, uint32_t count) { struct vhost_virtqueue *vq; - struct vring_desc *desc; - struct rte_mbuf *buff; - /* The virtio_hdr is initialised to 0. */ - struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0}; - uint64_t buff_addr = 0; - uint64_t buff_hdr_addr = 0; - uint32_t head[MAX_PKT_BURST]; - uint32_t head_idx, packet_success = 0; - uint16_t avail_idx, res_cur_idx; - uint16_t res_base_idx, res_end_idx; - uint16_t free_entries; - uint8_t success = 0; + uint16_t res_start_idx, res_end_idx; + uint16_t desc_indexes[MAX_PKT_BURST]; + uint32_t i; LOG_DEBUG(VHOST_DATA, "(%"PRIu64") virtio_dev_rx()\n", dev->device_fh); if (unlikely(!is_valid_virt_queue_idx(queue_id, 0, dev->virt_qp_nb))) { @@ -98,152 +190,43 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id, if (unlikely(vq->enabled == 0)) return 0; - count = (count > MAX_PKT_BURST) ? MAX_PKT_BURST : count; - - /* - * As many data cores may want access to available buffers, - * they need to be reserved. - */ - do { - res_base_idx = vq->last_used_idx_res; - avail_idx = *((volatile uint16_t *)&vq->avail->idx); - - free_entries = (avail_idx - res_base_idx); - /*check that we have enough buffers*/ - if (unlikely(count > free_entries)) - count = free_entries; - - if (count == 0) - return 0; - - res_end_idx = res_base_idx + count; - /* vq->last_used_idx_res is atomically updated. 
*/ - /* TODO: Allow to disable cmpset if no concurrency in application. */ - success = rte_atomic16_cmpset(&vq->last_used_idx_res, - res_base_idx, res_end_idx); - } while (unlikely(success == 0)); - res_cur_idx = res_base_idx; - LOG_DEBUG(VHOST_DATA, "(%"PRIu64") Current Index %d| End Index %d\n", - dev->device_fh, res_cur_idx, res_end_idx); - - /* Prefetch available ring to retrieve indexes. */ - rte_prefetch0(&vq->avail->ring[res_cur_idx & (vq->size - 1)]); - - /* Retrieve all of the head indexes first to avoid caching issues. */ - for (head_idx = 0; head_idx < count; head_idx++) - head[head_idx] = vq->avail->ring[(res_cur_idx + head_idx) & - (vq->size - 1)]; - - /*Prefetch descriptor index. */ - rte_prefetch0(&vq->desc[head[packet_success]]); - - while (res_cur_idx != res_end_idx) { - uint32_t offset = 0, vb_offset = 0; - uint32_t pkt_len, len_to_cpy, data_len, total_copied = 0; - uint8_t hdr = 0, uncompleted_pkt = 0; + count = reserve_avail_buf(vq, count, &res_start_idx, &res_end_idx); + if (count == 0) + return 0; - /* Get descriptor from available ring */ - desc = &vq->desc[head[packet_success]]; + LOG_DEBUG(VHOST_DATA, + "(%"PRIu64") res_start_idx %d| res_end_idx Index %d\n", + dev->device_fh, res_start_idx, res_end_idx); - buff = pkts[packet_success]; + /* Retrieve all of the desc indexes first to avoid caching issues. */ + rte_prefetch0(&vq->avail->ring[res_start_idx & (vq->size - 1)]); + for (i = 0; i < count; i++) + desc_indexes[i] = vq->avail->ring[(res_start_idx + i) & (vq->size - 1)]; - /* Convert from gpa to vva (guest physical addr -> vhost virtual addr) */ - buff_addr = gpa_to_vva(dev, desc->addr); - /* Prefetch buffer address. 
*/ - rte_prefetch0((void *)(uintptr_t)buff_addr); + rte_prefetch0(&vq->desc[desc_indexes[0]]); + for (i = 0; i < count; i++) { + uint16_t desc_idx = desc_indexes[i]; + uint16_t used_idx = (res_start_idx + i) & (vq->size - 1); + uint32_t copied; + int err; - /* Copy virtio_hdr to packet and increment buffer address */ - buff_hdr_addr = buff_addr; + err = copy_mbuf_to_desc(dev, vq, pkts[i], desc_idx, &copied); - /* - * If the descriptors are chained the header and data are - * placed in separate buffers. - */ - if ((desc->flags & VRING_DESC_F_NEXT) && - (desc->len == vq->vhost_hlen)) { - desc = &vq->desc[desc->next]; - /* Buffer address translation. */ - buff_addr = gpa_to_vva(dev, desc->addr); + vq->used->ring[used_idx].id = desc_idx; + if (unlikely(err)) { + vq->used->ring[used_idx].len = vq->vhost_hlen; } else { - vb_offset += vq->vhost_hlen; - hdr = 1; - } - - pkt_len = rte_pktmbuf_pkt_len(buff); - data_len = rte_pktmbuf_data_len(buff); - len_to_cpy = RTE_MIN(data_len, - hdr ? desc->len - vq->vhost_hlen : desc->len); - while (total_copied < pkt_len) { - /* Copy mbuf data to buffer */ - rte_memcpy((void *)(uintptr_t)(buff_addr + vb_offset), - rte_pktmbuf_mtod_offset(buff, const void *, offset), - len_to_cpy); - PRINT_PACKET(dev, (uintptr_t)(buff_addr + vb_offset), - len_to_cpy, 0); - - offset += len_to_cpy; - vb_offset += len_to_cpy; - total_copied += len_to_cpy; - - /* The whole packet completes */ - if (total_copied == pkt_len) - break; - - /* The current segment completes */ - if (offset == data_len) { - buff = buff->next; - offset = 0; - data_len = rte_pktmbuf_data_len(buff); - } - - /* The current vring descriptor done */ - if (vb_offset == desc->len) { - if (desc->flags & VRING_DESC_F_NEXT) { - desc = &vq->desc[desc->next]; - buff_addr = gpa_to_vva(dev, desc->addr); - vb_offset = 0; - } else { - /* Room in vring buffer is not enough */ - uncompleted_pkt = 1; - break; - } - } - len_to_cpy = RTE_MIN(data_len - offset, desc->len - vb_offset); + 
vq->used->ring[used_idx].len = copied + vq->vhost_hlen; } - /* Update used ring with desc information */ - vq->used->ring[res_cur_idx & (vq->size - 1)].id = - head[packet_success]; - - /* Drop the packet if it is uncompleted */ - if (unlikely(uncompleted_pkt == 1)) - vq->used->ring[res_cur_idx & (vq->size - 1)].len = - vq->vhost_hlen; - else - vq->used->ring[res_cur_idx & (vq->size - 1)].len = - pkt_len + vq->vhost_hlen; - - res_cur_idx++; - packet_success++; - - if (unlikely(uncompleted_pkt == 1)) - continue; - - rte_memcpy((void *)(uintptr_t)buff_hdr_addr, - (const void *)&virtio_hdr, vq->vhost_hlen); - - PRINT_PACKET(dev, (uintptr_t)buff_hdr_addr, vq->vhost_hlen, 1); - - if (res_cur_idx < res_end_idx) { - /* Prefetch descriptor index. */ - rte_prefetch0(&vq->desc[head[packet_success]]); - } + if (i + 1 < count) + rte_prefetch0(&vq->desc[desc_indexes[i+1]]); } rte_compiler_barrier(); /* Wait until it's our turn to add our buffer to the used ring. */ - while (unlikely(vq->last_used_idx != res_base_idx)) + while (unlikely(vq->last_used_idx != res_start_idx)) rte_pause(); *(volatile uint16_t *)&vq->used->idx += count; -- 1.9.0 ^ permalink raw reply [flat|nested] 84+ messages in thread
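The reservation scheme introduced by reserve_avail_buf() can be shown outside of DPDK with C11 atomics. The sketch below is hypothetical (plain integers instead of the vring, `reserve` instead of the real function name); it only illustrates the compare-and-swap retry loop that lets multiple cores carve out disjoint index ranges:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Shared reservation cursor, analogous to vq->last_used_idx_res. */
static _Atomic uint16_t last_used_idx_res;

/*
 * Reserve up to `count` entries between the reservation cursor and
 * `avail_idx`. Returns the number reserved and writes the start index.
 * Multiple cores may race here, hence the CAS retry loop.
 */
static uint16_t
reserve(uint16_t avail_idx, uint16_t count, uint16_t *start)
{
    uint16_t res_start, res_end, free_entries;

    do {
        res_start = atomic_load(&last_used_idx_res);
        /* Ring index math relies on uint16_t wrap-around. */
        free_entries = (uint16_t)(avail_idx - res_start);
        if (count > free_entries)
            count = free_entries;
        if (count == 0)
            return 0;
        res_end = (uint16_t)(res_start + count);
        /* Retry if another core moved the cursor in the meantime. */
    } while (!atomic_compare_exchange_weak(&last_used_idx_res,
                                           &res_start, res_end));

    *start = res_start;
    return count;
}
```

On success, the caller owns the half-open range [*start, *start + count) of the used ring and can fill it without further synchronization.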
* Re: [dpdk-dev] [PATCH 2/5] vhost: refactor virtio_dev_rx 2015-12-03 6:06 ` [dpdk-dev] [PATCH 2/5] vhost: refactor virtio_dev_rx Yuanhan Liu @ 2015-12-11 20:42 ` Rich Lane 2015-12-14 1:47 ` Yuanhan Liu 0 siblings, 1 reply; 84+ messages in thread From: Rich Lane @ 2015-12-11 20:42 UTC (permalink / raw) To: Yuanhan Liu; +Cc: dev, Victor Kaplansky, Michael S. Tsirkin On Wed, Dec 2, 2015 at 10:06 PM, Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote: > +static inline int __attribute__((always_inline)) > +copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq, > + struct rte_mbuf *m, uint16_t desc_idx, uint32_t *copied) > +{ > ... > + while (1) { > + /* done with current mbuf, fetch next */ > + if (mbuf_avail == 0) { > + m = m->next; > + if (m == NULL) > + break; > + > + mbuf_offset = 0; > + mbuf_avail = rte_pktmbuf_data_len(m); > + } > + > + /* done with current desc buf, fetch next */ > + if (desc_avail == 0) { > + if ((desc->flags & VRING_DESC_F_NEXT) == 0) { > + /* Room in vring buffer is not enough */ > + return -1; > + } > + > + desc = &vq->desc[desc->next]; > + desc_addr = gpa_to_vva(dev, desc->addr); > + desc_offset = 0; > + desc_avail = desc->len; > + } > + > + COPY(desc_addr + desc_offset, > + rte_pktmbuf_mtod_offset(m, uint64_t, mbuf_offset)); > + PRINT_PACKET(dev, (uintptr_t)(desc_addr + desc_offset), > + cpy_len, 0); > + } > + *copied = rte_pktmbuf_pkt_len(m); > AFAICT m will always be NULL at this point so the call to rte_pktmbuf_len will segfault. ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH 2/5] vhost: refactor virtio_dev_rx 2015-12-11 20:42 ` Rich Lane @ 2015-12-14 1:47 ` Yuanhan Liu 2016-01-21 13:50 ` Jérôme Jutteau 0 siblings, 1 reply; 84+ messages in thread From: Yuanhan Liu @ 2015-12-14 1:47 UTC (permalink / raw) To: Rich Lane; +Cc: dev, Victor Kaplansky, Michael S. Tsirkin On Fri, Dec 11, 2015 at 12:42:33PM -0800, Rich Lane wrote: > On Wed, Dec 2, 2015 at 10:06 PM, Yuanhan Liu <yuanhan.liu@linux.intel.com> > wrote: > > +static inline int __attribute__((always_inline)) > +copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq, > + struct rte_mbuf *m, uint16_t desc_idx, uint32_t *copied) > +{ > ... > + while (1) { > + /* done with current mbuf, fetch next */ > + if (mbuf_avail == 0) { > + m = m->next; > + if (m == NULL) > + break; > + > + mbuf_offset = 0; > + mbuf_avail = rte_pktmbuf_data_len(m); > + } > + > + /* done with current desc buf, fetch next */ > + if (desc_avail == 0) { > + if ((desc->flags & VRING_DESC_F_NEXT) == 0) { > + /* Room in vring buffer is not enough */ > + return -1; > + } > + > + desc = &vq->desc[desc->next]; > + desc_addr = gpa_to_vva(dev, desc->addr); > + desc_offset = 0; > + desc_avail = desc->len; > + } > + > + COPY(desc_addr + desc_offset, > + rte_pktmbuf_mtod_offset(m, uint64_t, mbuf_offset)); > + PRINT_PACKET(dev, (uintptr_t)(desc_addr + desc_offset), > + cpy_len, 0); > + } > + *copied = rte_pktmbuf_pkt_len(m); > > > AFAICT m will always be NULL at this point so the call to rte_pktmbuf_len will > segfault. Right, I should move it in the beginning of this function. --yliu ^ permalink raw reply [flat|nested] 84+ messages in thread
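The fix agreed on above is to capture the packet length before the segment walk consumes `m`, since after the loop the cursor is NULL. A minimal sketch with a hypothetical chained-buffer type (not the real rte_mbuf) shows the corrected shape:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical mbuf-like segment chain, standing in for rte_mbuf. */
struct seg {
    uint32_t data_len;
    struct seg *next;
};

/*
 * Walk the chain, accumulating segment lengths as they are consumed.
 * The total must be gathered before or during the walk: once the loop
 * ends the cursor is NULL, so dereferencing it afterwards -- as the
 * original patch did via rte_pktmbuf_pkt_len(m) -- would crash.
 */
static int
consume_chain(struct seg *m, uint32_t *copied)
{
    uint32_t pkt_len = 0;

    while (m != NULL) {
        pkt_len += m->data_len;   /* "copy" this segment */
        m = m->next;              /* fetch next; NULL at chain end */
    }

    /* Correct: report the length accumulated up front, not *m. */
    *copied = pkt_len;
    return 0;
}
```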
* Re: [dpdk-dev] [PATCH 2/5] vhost: refactor virtio_dev_rx 2015-12-14 1:47 ` Yuanhan Liu @ 2016-01-21 13:50 ` Jérôme Jutteau 2016-01-27 3:27 ` Yuanhan Liu 0 siblings, 1 reply; 84+ messages in thread From: Jérôme Jutteau @ 2016-01-21 13:50 UTC (permalink / raw) To: Yuanhan Liu; +Cc: dev, Victor Kaplansky, Michael S. Tsirkin Hi Yuanhan, 2015-12-14 2:47 GMT+01:00 Yuanhan Liu <yuanhan.liu@linux.intel.com>: > Right, I should move it in the beginning of this function. Any news about this refactoring ? -- Jérôme Jutteau, Tel : 0826.206.307 (poste 304) ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH 2/5] vhost: refactor virtio_dev_rx 2016-01-21 13:50 ` Jérôme Jutteau @ 2016-01-27 3:27 ` Yuanhan Liu 0 siblings, 0 replies; 84+ messages in thread From: Yuanhan Liu @ 2016-01-27 3:27 UTC (permalink / raw) To: Jérôme Jutteau; +Cc: dev, Victor Kaplansky, Michael S. Tsirkin On Thu, Jan 21, 2016 at 02:50:01PM +0100, Jérôme Jutteau wrote: > Hi Yuanhan, > > 2015-12-14 2:47 GMT+01:00 Yuanhan Liu <yuanhan.liu@linux.intel.com>: > > Right, I should move it in the beginning of this function. > > Any news about this refactoring ? Hi Jérôme, Thanks for showing interest in this patch set; I was waiting for Huawei's comments. Fortunately, he has now started making them. --yliu ^ permalink raw reply [flat|nested] 84+ messages in thread
* [dpdk-dev] [PATCH 3/5] vhost: refactor virtio_dev_merge_rx 2015-12-03 6:06 [dpdk-dev] [PATCH 0/5 for 2.3] vhost rxtx refactor Yuanhan Liu 2015-12-03 6:06 ` [dpdk-dev] [PATCH 1/5] vhost: refactor rte_vhost_dequeue_burst Yuanhan Liu 2015-12-03 6:06 ` [dpdk-dev] [PATCH 2/5] vhost: refactor virtio_dev_rx Yuanhan Liu @ 2015-12-03 6:06 ` Yuanhan Liu 2015-12-03 6:06 ` [dpdk-dev] [PATCH 4/5] vhost: do not use rte_memcpy for virtio_hdr copy Yuanhan Liu ` (3 subsequent siblings) 6 siblings, 0 replies; 84+ messages in thread From: Yuanhan Liu @ 2015-12-03 6:06 UTC (permalink / raw) To: dev; +Cc: Michael S. Tsirkin, Victor Kaplansky Current virtio_dev_merge_rx looks just like the old rte_vhost_dequeue_burst: twisted logic, with the same code block repeated in quite many places. However, the logic of virtio_dev_merge_rx is quite similar to virtio_dev_rx. The big difference is that the mergeable one can allocate more than one available entry to hold the data. Fetching all available entries into vec_buf at once makes the difference a bit bigger, then. Anyway, it could be simpler, just like what we did for virtio_dev_rx(). 
The difference is that we need to update used ring properly, as there could be more than one entries: while (1) { if (this_desc_has_no_room) { this_desc = fetch_next_from_vec_buf(); if (it is the last of a desc chain) { update_used_ring(); } } if (this_mbuf_has_drained_totally) { this_mbuf = fetch_next_mbuf(); if (this_mbuf == NULL) break; } COPY(this_desc, this_mbuf); } It reduces quite many lines of code, but it just saves a few bytes of code size (which is expected, btw): # without this patch $ size /path/to/vhost_rxtx.o text data bss dec hex filename 36539 0 0 36539 8ebb /path/to/vhost_rxtx.o # with this patch $ size /path/to/vhost_rxtx.o text data bss dec hex filename 36171 0 0 36171 8d4b /path/to/vhost_rxtx.o Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> --- lib/librte_vhost/vhost_rxtx.c | 389 +++++++++++++++++------------------------- 1 file changed, 158 insertions(+), 231 deletions(-) diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c index cb29459..7464b6b 100644 --- a/lib/librte_vhost/vhost_rxtx.c +++ b/lib/librte_vhost/vhost_rxtx.c @@ -241,235 +241,194 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id, return count; } +static inline int +fill_vec_buf(struct vhost_virtqueue *vq, uint32_t avail_idx, + uint32_t *allocated, uint32_t *vec_idx) +{ + uint16_t idx = vq->avail->ring[avail_idx & (vq->size - 1)]; + uint32_t vec_id = *vec_idx; + uint32_t len = *allocated; + + while (1) { + if (vec_id >= BUF_VECTOR_MAX) + return -1; + + len += vq->desc[idx].len; + vq->buf_vec[vec_id].buf_addr = vq->desc[idx].addr; + vq->buf_vec[vec_id].buf_len = vq->desc[idx].len; + vq->buf_vec[vec_id].desc_idx = idx; + vec_id++; + + if ((vq->desc[idx].flags & VRING_DESC_F_NEXT) == 0) + break; + + idx = vq->desc[idx].next; + } + + *allocated = len; + *vec_idx = vec_id; + + return 0; +} + +/* + * As many data cores may want to access available buffers concurrently, + * they need to be reserved. 
+ * + * Returns -1 on fail, 0 on success + */ +static inline int +reserve_avail_buf_mergeable(struct vhost_virtqueue *vq, uint32_t size, + uint16_t *start, uint16_t *end) +{ + uint16_t res_start_idx; + uint16_t res_cur_idx; + uint16_t avail_idx; + uint32_t allocated; + uint32_t vec_idx; + uint16_t tries; + +again: + res_start_idx = vq->last_used_idx_res; + res_cur_idx = res_start_idx; + + allocated = 0; + vec_idx = 0; + tries = 0; + while (1) { + avail_idx = *((volatile uint16_t *)&vq->avail->idx); + if (unlikely(res_cur_idx == avail_idx)) { + LOG_DEBUG(VHOST_DATA, "(%"PRIu64") Failed " + "to get enough desc from vring\n", + dev->device_fh); + return -1; + } + + if (fill_vec_buf(vq, res_cur_idx, &allocated, &vec_idx) < 0) + return -1; + + res_cur_idx++; + tries++; + + if (allocated >= size) + break; + + /* + * if we tried all available ring items, and still + * can't get enough buf, it means something abnormal + * happened. + */ + if (tries >= vq->size) + return -1; + } + + /* + * update vq->last_used_idx_res atomically. + * retry again if failed. + */ + if (rte_atomic16_cmpset(&vq->last_used_idx_res, + res_start_idx, res_cur_idx) == 0) + goto again; + + *start = res_start_idx; + *end = res_cur_idx; + return 0; +} + static inline uint32_t __attribute__((always_inline)) -copy_from_mbuf_to_vring(struct virtio_net *dev, uint32_t queue_id, - uint16_t res_base_idx, uint16_t res_end_idx, - struct rte_mbuf *pkt) +copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq, + uint16_t res_start_idx, uint16_t res_end_idx, + struct rte_mbuf *pkt) { + struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0}; uint32_t vec_idx = 0; - uint32_t entry_success = 0; - struct vhost_virtqueue *vq; - /* The virtio_hdr is initialised to 0. 
*/ - struct virtio_net_hdr_mrg_rxbuf virtio_hdr = { - {0, 0, 0, 0, 0, 0}, 0}; - uint16_t cur_idx = res_base_idx; - uint64_t vb_addr = 0; - uint64_t vb_hdr_addr = 0; - uint32_t seg_offset = 0; - uint32_t vb_offset = 0; - uint32_t seg_avail; - uint32_t vb_avail; - uint32_t cpy_len, entry_len; + uint16_t cur_idx = res_start_idx; + uint64_t desc_addr; + uint32_t mbuf_offset, mbuf_avail; + uint32_t desc_offset, desc_avail; + uint32_t cpy_len; + uint16_t desc_idx, used_idx; + uint32_t nr_used = 0; if (pkt == NULL) return 0; - LOG_DEBUG(VHOST_DATA, "(%"PRIu64") Current Index %d| " - "End Index %d\n", + LOG_DEBUG(VHOST_DATA, + "(%"PRIu64") Current Index %d| End Index %d\n", dev->device_fh, cur_idx, res_end_idx); - /* - * Convert from gpa to vva - * (guest physical addr -> vhost virtual addr) - */ - vq = dev->virtqueue[queue_id]; + desc_addr = gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr); - vb_addr = gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr); - vb_hdr_addr = vb_addr; - - /* Prefetch buffer address. 
*/ - rte_prefetch0((void *)(uintptr_t)vb_addr); + rte_prefetch0((void *)(uintptr_t)desc_addr); - virtio_hdr.num_buffers = res_end_idx - res_base_idx; + virtio_hdr.num_buffers = res_end_idx - res_start_idx; LOG_DEBUG(VHOST_DATA, "(%"PRIu64") RX: Num merge buffers %d\n", dev->device_fh, virtio_hdr.num_buffers); - rte_memcpy((void *)(uintptr_t)vb_hdr_addr, + rte_memcpy((void *)(uintptr_t)desc_addr, (const void *)&virtio_hdr, vq->vhost_hlen); + PRINT_PACKET(dev, (uintptr_t)desc_addr, vq->vhost_hlen, 0); - PRINT_PACKET(dev, (uintptr_t)vb_hdr_addr, vq->vhost_hlen, 1); + desc_avail = vq->buf_vec[vec_idx].buf_len - vq->vhost_hlen; + desc_offset = vq->vhost_hlen; - seg_avail = rte_pktmbuf_data_len(pkt); - vb_offset = vq->vhost_hlen; - vb_avail = vq->buf_vec[vec_idx].buf_len - vq->vhost_hlen; + mbuf_avail = rte_pktmbuf_data_len(pkt); + mbuf_offset = 0; + while (1) { + /* done with current desc buf, get the next one */ + if (desc_avail == 0) { + desc_idx = vq->buf_vec[vec_idx].desc_idx; - entry_len = vq->vhost_hlen; + if ((vq->desc[desc_idx].flags & VRING_DESC_F_NEXT) == 0) { + /* Update used ring with desc information */ + used_idx = cur_idx++ & (vq->size - 1); + vq->used->ring[used_idx].id = desc_idx; + vq->used->ring[used_idx].len = desc_offset; - if (vb_avail == 0) { - uint32_t desc_idx = - vq->buf_vec[vec_idx].desc_idx; + nr_used++; + } - if ((vq->desc[desc_idx].flags - & VRING_DESC_F_NEXT) == 0) { - /* Update used ring with desc information */ - vq->used->ring[cur_idx & (vq->size - 1)].id - = vq->buf_vec[vec_idx].desc_idx; - vq->used->ring[cur_idx & (vq->size - 1)].len - = entry_len; + vec_idx++; + desc_addr = gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr); - entry_len = 0; - cur_idx++; - entry_success++; + /* Prefetch buffer address. */ + rte_prefetch0((void *)(uintptr_t)desc_addr); + desc_offset = 0; + desc_avail = vq->buf_vec[vec_idx].buf_len; } - vec_idx++; - vb_addr = gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr); - - /* Prefetch buffer address. 
*/ - rte_prefetch0((void *)(uintptr_t)vb_addr); - vb_offset = 0; - vb_avail = vq->buf_vec[vec_idx].buf_len; - } - - cpy_len = RTE_MIN(vb_avail, seg_avail); - - while (cpy_len > 0) { - /* Copy mbuf data to vring buffer */ - rte_memcpy((void *)(uintptr_t)(vb_addr + vb_offset), - rte_pktmbuf_mtod_offset(pkt, const void *, seg_offset), - cpy_len); - - PRINT_PACKET(dev, - (uintptr_t)(vb_addr + vb_offset), - cpy_len, 0); - - seg_offset += cpy_len; - vb_offset += cpy_len; - seg_avail -= cpy_len; - vb_avail -= cpy_len; - entry_len += cpy_len; - - if (seg_avail != 0) { - /* - * The virtio buffer in this vring - * entry reach to its end. - * But the segment doesn't complete. - */ - if ((vq->desc[vq->buf_vec[vec_idx].desc_idx].flags & - VRING_DESC_F_NEXT) == 0) { - /* Update used ring with desc information */ - vq->used->ring[cur_idx & (vq->size - 1)].id - = vq->buf_vec[vec_idx].desc_idx; - vq->used->ring[cur_idx & (vq->size - 1)].len - = entry_len; - entry_len = 0; - cur_idx++; - entry_success++; - } - - vec_idx++; - vb_addr = gpa_to_vva(dev, - vq->buf_vec[vec_idx].buf_addr); - vb_offset = 0; - vb_avail = vq->buf_vec[vec_idx].buf_len; - cpy_len = RTE_MIN(vb_avail, seg_avail); - } else { - /* - * This current segment complete, need continue to - * check if the whole packet complete or not. - */ + /* done with current mbuf, get the next one */ + if (mbuf_avail == 0) { pkt = pkt->next; - if (pkt != NULL) { - /* - * There are more segments. - */ - if (vb_avail == 0) { - /* - * This current buffer from vring is - * used up, need fetch next buffer - * from buf_vec. 
- */ - uint32_t desc_idx = - vq->buf_vec[vec_idx].desc_idx; - - if ((vq->desc[desc_idx].flags & - VRING_DESC_F_NEXT) == 0) { - uint16_t wrapped_idx = - cur_idx & (vq->size - 1); - /* - * Update used ring with the - * descriptor information - */ - vq->used->ring[wrapped_idx].id - = desc_idx; - vq->used->ring[wrapped_idx].len - = entry_len; - entry_success++; - entry_len = 0; - cur_idx++; - } - - /* Get next buffer from buf_vec. */ - vec_idx++; - vb_addr = gpa_to_vva(dev, - vq->buf_vec[vec_idx].buf_addr); - vb_avail = - vq->buf_vec[vec_idx].buf_len; - vb_offset = 0; - } - - seg_offset = 0; - seg_avail = rte_pktmbuf_data_len(pkt); - cpy_len = RTE_MIN(vb_avail, seg_avail); - } else { - /* - * This whole packet completes. - */ - /* Update used ring with desc information */ - vq->used->ring[cur_idx & (vq->size - 1)].id - = vq->buf_vec[vec_idx].desc_idx; - vq->used->ring[cur_idx & (vq->size - 1)].len - = entry_len; - entry_success++; + if (!pkt) break; - } - } - } - - return entry_success; -} -static inline void __attribute__((always_inline)) -update_secure_len(struct vhost_virtqueue *vq, uint32_t id, - uint32_t *secure_len, uint32_t *vec_idx) -{ - uint16_t wrapped_idx = id & (vq->size - 1); - uint32_t idx = vq->avail->ring[wrapped_idx]; - uint8_t next_desc; - uint32_t len = *secure_len; - uint32_t vec_id = *vec_idx; + mbuf_offset = 0; + mbuf_avail = rte_pktmbuf_data_len(pkt); + } - do { - next_desc = 0; - len += vq->desc[idx].len; - vq->buf_vec[vec_id].buf_addr = vq->desc[idx].addr; - vq->buf_vec[vec_id].buf_len = vq->desc[idx].len; - vq->buf_vec[vec_id].desc_idx = idx; - vec_id++; + COPY(desc_addr + desc_offset, + rte_pktmbuf_mtod_offset(pkt, uint64_t, mbuf_offset)); + PRINT_PACKET(dev, (uintptr_t)(desc_addr + desc_offset), + cpy_len, 0); + } - if (vq->desc[idx].flags & VRING_DESC_F_NEXT) { - idx = vq->desc[idx].next; - next_desc = 1; - } - } while (next_desc); + used_idx = cur_idx & (vq->size - 1); + vq->used->ring[used_idx].id = vq->buf_vec[vec_idx].desc_idx; + 
vq->used->ring[used_idx].len = desc_offset; + nr_used++; - *secure_len = len; - *vec_idx = vec_id; + return nr_used; } -/* - * This function works for mergeable RX. - */ static inline uint32_t __attribute__((always_inline)) virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id, struct rte_mbuf **pkts, uint32_t count) { struct vhost_virtqueue *vq; - uint32_t pkt_idx = 0, entry_success = 0; - uint16_t avail_idx; - uint16_t res_base_idx, res_cur_idx; - uint8_t success = 0; + uint32_t pkt_idx = 0, nr_used = 0; + uint16_t start, end; LOG_DEBUG(VHOST_DATA, "(%"PRIu64") virtio_dev_merge_rx()\n", dev->device_fh); @@ -485,62 +444,30 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id, return 0; count = RTE_MIN((uint32_t)MAX_PKT_BURST, count); - if (count == 0) return 0; for (pkt_idx = 0; pkt_idx < count; pkt_idx++) { uint32_t pkt_len = pkts[pkt_idx]->pkt_len + vq->vhost_hlen; - do { - /* - * As many data cores may want access to available - * buffers, they need to be reserved. - */ - uint32_t secure_len = 0; - uint32_t vec_idx = 0; - - res_base_idx = vq->last_used_idx_res; - res_cur_idx = res_base_idx; - - do { - avail_idx = *((volatile uint16_t *)&vq->avail->idx); - if (unlikely(res_cur_idx == avail_idx)) { - LOG_DEBUG(VHOST_DATA, - "(%"PRIu64") Failed " - "to get enough desc from " - "vring\n", - dev->device_fh); - goto merge_rx_exit; - } else { - update_secure_len(vq, res_cur_idx, &secure_len, &vec_idx); - res_cur_idx++; - } - } while (pkt_len > secure_len); - - /* vq->last_used_idx_res is atomically updated. 
*/ - success = rte_atomic16_cmpset(&vq->last_used_idx_res, - res_base_idx, - res_cur_idx); - } while (success == 0); - - entry_success = copy_from_mbuf_to_vring(dev, queue_id, - res_base_idx, res_cur_idx, pkts[pkt_idx]); + if (reserve_avail_buf_mergeable(vq, pkt_len, &start, &end) < 0) + break; + nr_used = copy_mbuf_to_desc_mergeable(dev, vq, start, end, + pkts[pkt_idx]); rte_compiler_barrier(); /* * Wait until it's our turn to add our buffer * to the used ring. */ - while (unlikely(vq->last_used_idx != res_base_idx)) + while (unlikely(vq->last_used_idx != start)) rte_pause(); - *(volatile uint16_t *)&vq->used->idx += entry_success; - vq->last_used_idx = res_cur_idx; + *(volatile uint16_t *)&vq->used->idx += nr_used; + vq->last_used_idx = end; } -merge_rx_exit: if (likely(pkt_idx)) { /* flush used->idx update before we read avail->flags. */ rte_mb(); -- 1.9.0 ^ permalink raw reply [flat|nested] 84+ messages in thread
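The chain walk in fill_vec_buf() can be illustrated in isolation. The following is a hypothetical sketch (small fixed constants, simplified struct fields, no guest-address translation): it gathers one descriptor chain into a flat vector, stopping when the NEXT flag is clear or the vector would overflow, just as the patch's helper does:

```c
#include <stdint.h>

#define VRING_DESC_F_NEXT 1
#define BUF_VECTOR_MAX    4   /* kept small for illustration */

struct desc    { uint32_t len; uint16_t flags; uint16_t next; };
struct buf_vec { uint32_t buf_len; uint16_t desc_idx; };

/*
 * Append the descriptor chain starting at `idx` to `vec`.
 * Returns -1 if the vector overflows, 0 on success;
 * *allocated accumulates the total buffer space gathered.
 */
static int
fill_vec(const struct desc *table, uint16_t idx,
         struct buf_vec *vec, uint32_t *vec_idx, uint32_t *allocated)
{
    uint32_t vec_id = *vec_idx;
    uint32_t len = *allocated;

    while (1) {
        if (vec_id >= BUF_VECTOR_MAX)
            return -1;

        len += table[idx].len;
        vec[vec_id].buf_len = table[idx].len;
        vec[vec_id].desc_idx = idx;
        vec_id++;

        if ((table[idx].flags & VRING_DESC_F_NEXT) == 0)
            break;
        idx = table[idx].next;
    }

    *allocated = len;
    *vec_idx = vec_id;
    return 0;
}
```

reserve_avail_buf_mergeable() then calls this repeatedly, one avail entry at a time, until `allocated` covers the packet or the ring is exhausted.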
* [dpdk-dev] [PATCH 4/5] vhost: do not use rte_memcpy for virtio_hdr copy 2015-12-03 6:06 [dpdk-dev] [PATCH 0/5 for 2.3] vhost rxtx refactor Yuanhan Liu ` (2 preceding siblings ...) 2015-12-03 6:06 ` [dpdk-dev] [PATCH 3/5] vhost: refactor virtio_dev_merge_rx Yuanhan Liu @ 2015-12-03 6:06 ` Yuanhan Liu 2016-01-27 2:46 ` Xie, Huawei 2015-12-03 6:06 ` [dpdk-dev] [PATCH 5/5] vhost: don't use unlikely for VIRTIO_NET_F_MRG_RXBUF detection Yuanhan Liu ` (2 subsequent siblings) 6 siblings, 1 reply; 84+ messages in thread From: Yuanhan Liu @ 2015-12-03 6:06 UTC (permalink / raw) To: dev; +Cc: Michael S. Tsirkin, Victor Kaplansky First of all, rte_memcpy() is mostly useful for copying big packets by leveraging advanced hardware instructions like AVX. But for the virtio net hdr, which is 12 bytes at most, invoking rte_memcpy() will not introduce any performance boost. And, to my surprise, rte_memcpy() is huge. Since rte_memcpy() is inlined, it takes more space every time we call it at a different place. Replacing the two rte_memcpy() calls with direct copies saves nearly 12K bytes of code size! 
# without this patch $ size /path/to/vhost_rxtx.o text data bss dec hex filename 36171 0 0 36171 8d4b /path/to/vhost_rxtx.o # with this patch $ size /path/to/vhost_rxtx.o text data bss dec hex filename 24179 0 0 24179 5e73 /path/to/vhost_rxtx.o Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> --- lib/librte_vhost/vhost_rxtx.c | 20 +++++++++++++------- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c index 7464b6b..1e0a24e 100644 --- a/lib/librte_vhost/vhost_rxtx.c +++ b/lib/librte_vhost/vhost_rxtx.c @@ -70,14 +70,17 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq, uint32_t cpy_len; struct vring_desc *desc; uint64_t desc_addr; - struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0}; + struct virtio_net_hdr_mrg_rxbuf hdr = {{0, 0, 0, 0, 0, 0}, 0}; desc = &vq->desc[desc_idx]; desc_addr = gpa_to_vva(dev, desc->addr); rte_prefetch0((void *)(uintptr_t)desc_addr); - rte_memcpy((void *)(uintptr_t)desc_addr, - (const void *)&virtio_hdr, vq->vhost_hlen); + if (vq->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf)) { + *(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr; + } else { + *(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr; + } PRINT_PACKET(dev, (uintptr_t)desc_addr, vq->vhost_hlen, 0); desc_offset = vq->vhost_hlen; @@ -340,7 +343,7 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq, uint16_t res_start_idx, uint16_t res_end_idx, struct rte_mbuf *pkt) { - struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0}; + struct virtio_net_hdr_mrg_rxbuf hdr = {{0, 0, 0, 0, 0, 0}, 0}; uint32_t vec_idx = 0; uint16_t cur_idx = res_start_idx; uint64_t desc_addr; @@ -361,13 +364,16 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq, rte_prefetch0((void *)(uintptr_t)desc_addr); - virtio_hdr.num_buffers = res_end_idx - res_start_idx; + hdr.num_buffers = res_end_idx - 
res_start_idx; LOG_DEBUG(VHOST_DATA, "(%"PRIu64") RX: Num merge buffers %d\n", dev->device_fh, virtio_hdr.num_buffers); - rte_memcpy((void *)(uintptr_t)desc_addr, - (const void *)&virtio_hdr, vq->vhost_hlen); + if (vq->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf)) { + *(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr; + } else { + *(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr; + } PRINT_PACKET(dev, (uintptr_t)desc_addr, vq->vhost_hlen, 0); desc_avail = vq->buf_vec[vec_idx].buf_len - vq->vhost_hlen; -- 1.9.0 ^ permalink raw reply [flat|nested] 84+ messages in thread
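The point of the patch — a plain struct assignment suffices for a small fixed-size header, letting the compiler emit a couple of stores instead of an inlined rte_memcpy() body — can be sketched with hypothetical header types that only mirror the virtio layout (they are not the real virtio_net_hdr definitions):

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical mirrors of virtio_net_hdr / _mrg_rxbuf (layout only). */
struct net_hdr {
    uint8_t  flags;
    uint8_t  gso_type;
    uint16_t hdr_len;
    uint16_t gso_size;
    uint16_t csum_start;
    uint16_t csum_offset;
};
struct net_hdr_mrg {
    struct net_hdr hdr;
    uint16_t num_buffers;
};

/*
 * Write a zeroed header into the guest buffer. A struct assignment
 * compiles to direct stores; no memcpy machinery is pulled in, which
 * is what shrinks the code size in the patch.
 */
static void
write_hdr(void *dst, size_t hlen, uint16_t num_buffers)
{
    struct net_hdr_mrg hdr = { {0, 0, 0, 0, 0, 0}, 0 };

    if (hlen == sizeof(struct net_hdr_mrg)) {
        hdr.num_buffers = num_buffers;
        *(struct net_hdr_mrg *)dst = hdr;   /* mergeable: 12 bytes */
    } else {
        *(struct net_hdr *)dst = hdr.hdr;   /* legacy: 10 bytes */
    }
}
```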
* Re: [dpdk-dev] [PATCH 4/5] vhost: do not use rte_memcpy for virtio_hdr copy 2015-12-03 6:06 ` [dpdk-dev] [PATCH 4/5] vhost: do not use rte_memcpy for virtio_hdr copy Yuanhan Liu @ 2016-01-27 2:46 ` Xie, Huawei 2016-01-27 3:22 ` Yuanhan Liu 0 siblings, 1 reply; 84+ messages in thread From: Xie, Huawei @ 2016-01-27 2:46 UTC (permalink / raw) To: Yuanhan Liu, dev; +Cc: Victor Kaplansky, Michael S. Tsirkin On 12/3/2015 2:03 PM, Yuanhan Liu wrote: > + if (vq->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf)) { > + *(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr; > + } else { > + *(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr; > + } Thanks! We might simplify this further. Just reset the first two fields flags and gso_type. ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH 4/5] vhost: do not use rte_memcpy for virtio_hdr copy 2016-01-27 2:46 ` Xie, Huawei @ 2016-01-27 3:22 ` Yuanhan Liu 2016-01-27 5:56 ` Xie, Huawei 0 siblings, 1 reply; 84+ messages in thread From: Yuanhan Liu @ 2016-01-27 3:22 UTC (permalink / raw) To: Xie, Huawei; +Cc: dev, Victor Kaplansky, Michael S. Tsirkin On Wed, Jan 27, 2016 at 02:46:39AM +0000, Xie, Huawei wrote: > On 12/3/2015 2:03 PM, Yuanhan Liu wrote: > > + if (vq->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf)) { > > + *(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr; > > + } else { > > + *(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr; > > + } > > Thanks! > We might simplify this further. Just reset the first two fields flags > and gso_type. What's this "simplification" for? Don't even to say that we will add TSO support, which modifies few more files, such as csum_start: reseting the first two fields only is wrong here. --yliu ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH 4/5] vhost: do not use rte_memcpy for virtio_hdr copy 2016-01-27 3:22 ` Yuanhan Liu @ 2016-01-27 5:56 ` Xie, Huawei 2016-01-27 6:02 ` Yuanhan Liu 0 siblings, 1 reply; 84+ messages in thread From: Xie, Huawei @ 2016-01-27 5:56 UTC (permalink / raw) To: Yuanhan Liu; +Cc: dev, Victor Kaplansky, Michael S. Tsirkin On 1/27/2016 11:22 AM, Yuanhan Liu wrote: > On Wed, Jan 27, 2016 at 02:46:39AM +0000, Xie, Huawei wrote: >> On 12/3/2015 2:03 PM, Yuanhan Liu wrote: >>> + if (vq->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf)) { >>> + *(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr; >>> + } else { >>> + *(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr; >>> + } >> Thanks! >> We might simplify this further. Just reset the first two fields flags >> and gso_type. > What's this "simplification" for? Don't even to say that we will add > TSO support, which modifies few more files, such as csum_start: reseting > the first two fields only is wrong here. I know TSO before commenting, but at least in this implementation and this specific patch, i guess zeroing two fields are enough. What is wrong resetting only two fields? > > --yliu > ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH 4/5] vhost: do not use rte_memcpy for virtio_hdr copy 2016-01-27 5:56 ` Xie, Huawei @ 2016-01-27 6:02 ` Yuanhan Liu 2016-01-27 6:16 ` Xie, Huawei 0 siblings, 1 reply; 84+ messages in thread From: Yuanhan Liu @ 2016-01-27 6:02 UTC (permalink / raw) To: Xie, Huawei; +Cc: dev, Victor Kaplansky, Michael S. Tsirkin On Wed, Jan 27, 2016 at 05:56:56AM +0000, Xie, Huawei wrote: > On 1/27/2016 11:22 AM, Yuanhan Liu wrote: > > On Wed, Jan 27, 2016 at 02:46:39AM +0000, Xie, Huawei wrote: > >> On 12/3/2015 2:03 PM, Yuanhan Liu wrote: > >>> + if (vq->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf)) { > >>> + *(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr; > >>> + } else { > >>> + *(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr; > >>> + } > >> Thanks! > >> We might simplify this further. Just reset the first two fields flags > >> and gso_type. > > What's this "simplification" for? Don't even to say that we will add > > TSO support, which modifies few more files, such as csum_start: reseting > > the first two fields only is wrong here. > > I know TSO before commenting, but at least in this implementation and > this specific patch, i guess zeroing two fields are enough. > > What is wrong resetting only two fields? I then have to ask "What is the benifit of resetting only two fields"? If doing so, we have to change it back for TSO. My proposal requires no extra change when adding TSO support. --yliu ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH 4/5] vhost: do not use rte_memcpy for virtio_hdr copy 2016-01-27 6:02 ` Yuanhan Liu @ 2016-01-27 6:16 ` Xie, Huawei 2016-01-27 6:35 ` Yuanhan Liu 0 siblings, 1 reply; 84+ messages in thread From: Xie, Huawei @ 2016-01-27 6:16 UTC (permalink / raw) To: Yuanhan Liu; +Cc: dev, Victor Kaplansky, Michael S. Tsirkin On 1/27/2016 2:02 PM, Yuanhan Liu wrote: > On Wed, Jan 27, 2016 at 05:56:56AM +0000, Xie, Huawei wrote: >> On 1/27/2016 11:22 AM, Yuanhan Liu wrote: >>> On Wed, Jan 27, 2016 at 02:46:39AM +0000, Xie, Huawei wrote: >>>> On 12/3/2015 2:03 PM, Yuanhan Liu wrote: >>>>> + if (vq->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf)) { >>>>> + *(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr; >>>>> + } else { >>>>> + *(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr; >>>>> + } >>>> Thanks! >>>> We might simplify this further. Just reset the first two fields flags >>>> and gso_type. >>> What's this "simplification" for? Don't even to say that we will add >>> TSO support, which modifies few more files, such as csum_start: reseting >>> the first two fields only is wrong here. >> I know TSO before commenting, but at least in this implementation and >> this specific patch, i guess zeroing two fields are enough. >> >> What is wrong resetting only two fields? > I then have to ask "What is the benifit of resetting only two fields"? > If doing so, we have to change it back for TSO. My proposal requires no > extra change when adding TSO support. ? Benefit is we save four unnecessary stores. > > --yliu > ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH 4/5] vhost: do not use rte_memcpy for virtio_hdr copy 2016-01-27 6:16 ` Xie, Huawei @ 2016-01-27 6:35 ` Yuanhan Liu 0 siblings, 0 replies; 84+ messages in thread From: Yuanhan Liu @ 2016-01-27 6:35 UTC (permalink / raw) To: Xie, Huawei; +Cc: dev, Victor Kaplansky, Michael S. Tsirkin On Wed, Jan 27, 2016 at 06:16:37AM +0000, Xie, Huawei wrote: > On 1/27/2016 2:02 PM, Yuanhan Liu wrote: > > On Wed, Jan 27, 2016 at 05:56:56AM +0000, Xie, Huawei wrote: > >> On 1/27/2016 11:22 AM, Yuanhan Liu wrote: > >>> On Wed, Jan 27, 2016 at 02:46:39AM +0000, Xie, Huawei wrote: > >>>> On 12/3/2015 2:03 PM, Yuanhan Liu wrote: > >>>>> + if (vq->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf)) { > >>>>> + *(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr; > >>>>> + } else { > >>>>> + *(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr; > >>>>> + } > >>>> Thanks! > >>>> We might simplify this further. Just reset the first two fields flags > >>>> and gso_type. > >>> What's this "simplification" for? Don't even to say that we will add > >>> TSO support, which modifies few more files, such as csum_start: reseting > >>> the first two fields only is wrong here. > >> I know TSO before commenting, but at least in this implementation and > >> this specific patch, i guess zeroing two fields are enough. > >> > >> What is wrong resetting only two fields? > > I then have to ask "What is the benifit of resetting only two fields"? > > If doing so, we have to change it back for TSO. My proposal requires no > > extra change when adding TSO support. > > ? Benefit is we save four unnecessary stores. Hmm..., the hdr size is 12 bytes at most. I mean, does it really matter, copying 3 bytes, or copying 12 bytes in a row? --yliu ^ permalink raw reply [flat|nested] 84+ messages in thread
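[Editor's note] The trade-off debated in this sub-thread, one full-struct store versus zeroing only `flags` and `gso_type`, can be sketched outside DPDK with a stand-in header struct. The struct layout below follows the virtio spec (10 bytes; the mergeable-rx variant appends a 16-bit num_buffers, giving the 12 bytes Yuanhan mentions), but the helper names are illustrative only, not the patch's actual code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Layout as in the virtio spec's struct virtio_net_hdr (10 bytes). */
struct virtio_net_hdr {
	uint8_t  flags;
	uint8_t  gso_type;
	uint16_t hdr_len;
	uint16_t gso_size;
	uint16_t csum_start;
	uint16_t csum_offset;
};

/* Yuanhan's approach: store one fully initialized header in a single
 * struct assignment. When TSO support later fills csum_start and
 * friends, this code needs no change. */
static void write_full_hdr(struct virtio_net_hdr *dst,
			   const struct virtio_net_hdr *src)
{
	*dst = *src;
}

/* Huawei's proposal: touch only the two fields the non-offload path
 * cares about. Fewer stores, but any stale csum_start/csum_offset
 * bytes in the ring survive untouched. */
static void reset_two_fields(struct virtio_net_hdr *dst)
{
	dst->flags = 0;
	dst->gso_type = 0;	/* VIRTIO_NET_HDR_GSO_NONE */
}
```

The difference is observable: after `reset_two_fields()`, bytes beyond the first two keep whatever value they had, which is exactly why Yuanhan argues the two-field reset would have to be undone once TSO lands.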
* [dpdk-dev] [PATCH 5/5] vhost: don't use unlikely for VIRTIO_NET_F_MRG_RXBUF detection 2015-12-03 6:06 [dpdk-dev] [PATCH 0/5 for 2.3] vhost rxtx refactor Yuanhan Liu ` (3 preceding siblings ...) 2015-12-03 6:06 ` [dpdk-dev] [PATCH 4/5] vhost: do not use rte_memcpy for virtio_hdr copy Yuanhan Liu @ 2015-12-03 6:06 ` Yuanhan Liu 2016-02-17 22:50 ` [dpdk-dev] [PATCH 0/5 for 2.3] vhost rxtx refactor Thomas Monjalon 2016-02-18 13:49 ` [dpdk-dev] [PATCH v2 0/7] " Yuanhan Liu 6 siblings, 0 replies; 84+ messages in thread From: Yuanhan Liu @ 2015-12-03 6:06 UTC (permalink / raw) To: dev; +Cc: Michael S. Tsirkin, Victor Kaplansky VIRTIO_NET_F_MRG_RXBUF is a default feature supported by vhost. Adding unlikely for VIRTIO_NET_F_MRG_RXBUF detection doesn't make sense to me at all. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> --- lib/librte_vhost/vhost_rxtx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c index 1e0a24e..7d96cd4 100644 --- a/lib/librte_vhost/vhost_rxtx.c +++ b/lib/librte_vhost/vhost_rxtx.c @@ -490,7 +490,7 @@ uint16_t rte_vhost_enqueue_burst(struct virtio_net *dev, uint16_t queue_id, struct rte_mbuf **pkts, uint16_t count) { - if (unlikely(dev->features & (1 << VIRTIO_NET_F_MRG_RXBUF))) + if (dev->features & (1 << VIRTIO_NET_F_MRG_RXBUF)) return virtio_dev_merge_rx(dev, queue_id, pkts, count); else return virtio_dev_rx(dev, queue_id, pkts, count); -- 1.9.0 ^ permalink raw reply [flat|nested] 84+ messages in thread
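[Editor's note] For context on what this patch removes: DPDK's `unlikely()` is a thin wrapper over GCC's `__builtin_expect` branch hint. A minimal stand-alone sketch (the macro forms are simplified from `rte_branch_prediction.h`; the feature test mirrors the hunk above):

```c
#include <assert.h>

/* Simplified forms of DPDK's rte_branch_prediction.h macros. They only
 * hint the compiler's branch layout; the computed value is unchanged. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

#define VIRTIO_NET_F_MRG_RXBUF 15

/* Wrapping this test in unlikely() tells the compiler the mergeable-rx
 * path is cold, which is wrong when guests negotiate the feature by
 * default; that is the rationale of the patch. */
static int has_mrg_rxbuf(unsigned long features)
{
	return (features & (1UL << VIRTIO_NET_F_MRG_RXBUF)) != 0;
}
```

Since the hint never changes results, dropping it is purely a code-layout (and honesty) fix, not a functional one.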
* Re: [dpdk-dev] [PATCH 0/5 for 2.3] vhost rxtx refactor 2015-12-03 6:06 [dpdk-dev] [PATCH 0/5 for 2.3] vhost rxtx refactor Yuanhan Liu ` (4 preceding siblings ...) 2015-12-03 6:06 ` [dpdk-dev] [PATCH 5/5] vhost: don't use unlikely for VIRTIO_NET_F_MRG_RXBUF detection Yuanhan Liu @ 2016-02-17 22:50 ` Thomas Monjalon 2016-02-18 4:09 ` Yuanhan Liu 2016-02-18 13:49 ` [dpdk-dev] [PATCH v2 0/7] " Yuanhan Liu 6 siblings, 1 reply; 84+ messages in thread From: Thomas Monjalon @ 2016-02-17 22:50 UTC (permalink / raw) To: Yuanhan Liu; +Cc: dev, Victor Kaplansky, Michael S. Tsirkin Hi Yuanhan, 2015-12-03 14:06, Yuanhan Liu: > Vhost rxtx code is derived from vhost-switch example, which is very > likely the most messy code in DPDK. Unluckily, the move also brings > over the bad merits: twisted logic, bad comments. > > When I joined this team firstly, I was quite scared off by the messy > and long vhost rxtx code. While adding the vhost-user live migration > support, that I have to make fews changes to it, I then ventured to > look at it again, to understand it better, in the meantime, to see if > I can refactor it. > > And, here you go. There were several comments and a typo detected. What is the status of this patchset, please? ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH 0/5 for 2.3] vhost rxtx refactor 2016-02-17 22:50 ` [dpdk-dev] [PATCH 0/5 for 2.3] vhost rxtx refactor Thomas Monjalon @ 2016-02-18 4:09 ` Yuanhan Liu 0 siblings, 0 replies; 84+ messages in thread From: Yuanhan Liu @ 2016-02-18 4:09 UTC (permalink / raw) To: Thomas Monjalon; +Cc: dev, Victor Kaplansky, Michael S. Tsirkin On Wed, Feb 17, 2016 at 11:50:22PM +0100, Thomas Monjalon wrote: > Hi Yuanhan, > > 2015-12-03 14:06, Yuanhan Liu: > > Vhost rxtx code is derived from vhost-switch example, which is very > > likely the most messy code in DPDK. Unluckily, the move also brings > > over the bad merits: twisted logic, bad comments. > > > > When I joined this team firstly, I was quite scared off by the messy > > and long vhost rxtx code. While adding the vhost-user live migration > > support, that I have to make fews changes to it, I then ventured to > > look at it again, to understand it better, in the meantime, to see if > > I can refactor it. > > > > And, here you go. > > There were several comments and a typo detected. > Please what is the status of this patchset? Hi Thomas, It was delayed; I will address those comments and send out a new version soon. --yliu ^ permalink raw reply [flat|nested] 84+ messages in thread
* [dpdk-dev] [PATCH v2 0/7] vhost rxtx refactor 2015-12-03 6:06 [dpdk-dev] [PATCH 0/5 for 2.3] vhost rxtx refactor Yuanhan Liu ` (5 preceding siblings ...) 2016-02-17 22:50 ` [dpdk-dev] [PATCH 0/5 for 2.3] vhost rxtx refactor Thomas Monjalon @ 2016-02-18 13:49 ` Yuanhan Liu 2016-02-18 13:49 ` [dpdk-dev] [PATCH v2 1/7] vhost: refactor rte_vhost_dequeue_burst Yuanhan Liu ` (8 more replies) 6 siblings, 9 replies; 84+ messages in thread From: Yuanhan Liu @ 2016-02-18 13:49 UTC (permalink / raw) To: dev; +Cc: Michael S. Tsirkin, Victor Kaplansky Here is a patchset for refactoring the vhost rxtx code, mainly for improving readability. The first 3 patches refactor the 3 major functions in vhost_rxtx.c, respectively. They simplify the code logic, making it more readable. On the other hand, they reduce binary code size, because a lot of duplicate code is removed. Patch 4 gets rid of the rte_memcpy for the virtio_hdr copy, which saves nearly 12K bytes of binary code size! By now, the code size has been greatly reduced: 39k vs 24k. Patch 5 removes "unlikely" for VIRTIO_NET_F_MRG_RXBUF detection. Patches 6 and 7 add sanity checks for two desc fields, to make vhost robust and protect it from malicious guests or abnormal use cases. These are key data path changes, and I have done some testing to make sure they don't break anything. However, more testing is welcome! --- Yuanhan Liu (7): vhost: refactor rte_vhost_dequeue_burst vhost: refactor virtio_dev_rx vhost: refactor virtio_dev_merge_rx vhost: do not use rte_memcpy for virtio_hdr copy vhost: don't use unlikely for VIRTIO_NET_F_MRG_RXBUF detection vhost: do sanity check for desc->len vhost: do sanity check for desc->next lib/librte_vhost/vhost_rxtx.c | 1000 ++++++++++++++++++----------------------- 1 file changed, 442 insertions(+), 558 deletions(-) -- 1.9.0 ^ permalink raw reply [flat|nested] 84+ messages in thread
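[Editor's note] A sketch of the kind of checks patches 6 and 7 describe for `desc->len` and `desc->next`. The struct fields follow the virtio spec's vring descriptor, but the helper names and return conventions here are stand-ins, not the actual patch code:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative vring descriptor, fields as in the virtio spec. */
struct vring_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/* desc->len must cover at least the virtio header; otherwise the
 * unsigned subtraction desc->len - vhost_hlen wraps around to a huge
 * value and the copy loop reads far past the buffer. */
static int desc_len_is_sane(const struct vring_desc *desc,
			    uint32_t vhost_hlen)
{
	return desc->len >= vhost_hlen;
}

/* desc->next must stay inside the ring; otherwise a malicious guest
 * could steer vhost past the end of the descriptor table. */
static int desc_next_is_sane(const struct vring_desc *desc,
			     uint16_t vq_size)
{
	return desc->next < vq_size;
}
```

Both checks are cheap compares on the data path, which is presumably why the cover letter treats them as acceptable hardening rather than a performance concern.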
* [dpdk-dev] [PATCH v2 1/7] vhost: refactor rte_vhost_dequeue_burst 2016-02-18 13:49 ` [dpdk-dev] [PATCH v2 0/7] " Yuanhan Liu @ 2016-02-18 13:49 ` Yuanhan Liu 2016-03-03 16:21 ` Xie, Huawei ` (4 more replies) 2016-02-18 13:49 ` [dpdk-dev] [PATCH v2 2/7] vhost: refactor virtio_dev_rx Yuanhan Liu ` (7 subsequent siblings) 8 siblings, 5 replies; 84+ messages in thread From: Yuanhan Liu @ 2016-02-18 13:49 UTC (permalink / raw) To: dev; +Cc: Michael S. Tsirkin, Victor Kaplansky The current rte_vhost_dequeue_burst() implementation is a bit messy and its logic is twisted. And you can see repeated code here and there: it invokes rte_pktmbuf_alloc() three times, at three different places! However, rte_vhost_dequeue_burst() actually does a simple job: copy the packet data from vring desc to mbuf. What's tricky here is: - the desc buffer could be chained (by the desc->next field), so that you need to fetch the next one if the current one is wholly drained. - One mbuf may not be big enough to hold all the desc buffers, hence you need to chain the mbufs as well, by the mbuf->next field. Even so, the logic can be simple. Here is the pseudo code. while (this_desc_is_not_drained_totally || has_next_desc) { if (this_desc_has_drained_totally) { this_desc = next_desc(); } if (mbuf_has_no_room) { mbuf = allocate_a_new_mbuf(); } COPY(mbuf, desc); } And this is how I refactored rte_vhost_dequeue_burst. Note that the old code does special handling to skip the virtio header. However, that can simply be done by adjusting the desc_avail and desc_offset vars: desc_avail = desc->len - vq->vhost_hlen; desc_offset = vq->vhost_hlen; This refactor makes the code much more readable (IMO), yet it reduces binary code size (nearly 2K). 
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> --- v2: - fix potential NULL dereference bug of var "prev" and "head" --- lib/librte_vhost/vhost_rxtx.c | 297 +++++++++++++++++------------------------- 1 file changed, 116 insertions(+), 181 deletions(-) diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c index 5e7e5b1..d5cd0fa 100644 --- a/lib/librte_vhost/vhost_rxtx.c +++ b/lib/librte_vhost/vhost_rxtx.c @@ -702,21 +702,104 @@ vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m) } } +static inline struct rte_mbuf * +copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq, + uint16_t desc_idx, struct rte_mempool *mbuf_pool) +{ + struct vring_desc *desc; + uint64_t desc_addr; + uint32_t desc_avail, desc_offset; + uint32_t mbuf_avail, mbuf_offset; + uint32_t cpy_len; + struct rte_mbuf *head = NULL; + struct rte_mbuf *cur = NULL, *prev = NULL; + struct virtio_net_hdr *hdr; + + desc = &vq->desc[desc_idx]; + desc_addr = gpa_to_vva(dev, desc->addr); + rte_prefetch0((void *)(uintptr_t)desc_addr); + + /* Retrieve virtio net header */ + hdr = (struct virtio_net_hdr *)((uintptr_t)desc_addr); + desc_avail = desc->len - vq->vhost_hlen; + desc_offset = vq->vhost_hlen; + + mbuf_avail = 0; + mbuf_offset = 0; + while (desc_avail || (desc->flags & VRING_DESC_F_NEXT) != 0) { + /* This desc reachs to its end, get the next one */ + if (desc_avail == 0) { + desc = &vq->desc[desc->next]; + + desc_addr = gpa_to_vva(dev, desc->addr); + rte_prefetch0((void *)(uintptr_t)desc_addr); + + desc_offset = 0; + desc_avail = desc->len; + + PRINT_PACKET(dev, (uintptr_t)desc_addr, desc->len, 0); + } + + /* + * This mbuf reachs to its end, get a new one + * to hold more data. 
+ */ + if (mbuf_avail == 0) { + cur = rte_pktmbuf_alloc(mbuf_pool); + if (unlikely(!cur)) { + RTE_LOG(ERR, VHOST_DATA, "Failed to " + "allocate memory for mbuf.\n"); + if (head) + rte_pktmbuf_free(head); + return NULL; + } + if (!head) { + head = cur; + } else { + prev->next = cur; + prev->data_len = mbuf_offset; + head->nb_segs += 1; + } + head->pkt_len += mbuf_offset; + prev = cur; + + mbuf_offset = 0; + mbuf_avail = cur->buf_len - RTE_PKTMBUF_HEADROOM; + } + + cpy_len = RTE_MIN(desc_avail, mbuf_avail); + rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *, mbuf_offset), + (void *)((uintptr_t)(desc_addr + desc_offset)), + cpy_len); + + mbuf_avail -= cpy_len; + mbuf_offset += cpy_len; + desc_avail -= cpy_len; + desc_offset += cpy_len; + } + + if (prev) { + prev->data_len = mbuf_offset; + head->pkt_len += mbuf_offset; + + if (hdr->flags != 0 || hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) + vhost_dequeue_offload(hdr, head); + } + + return head; +} + uint16_t rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count) { - struct rte_mbuf *m, *prev; struct vhost_virtqueue *vq; - struct vring_desc *desc; - uint64_t vb_addr = 0; - uint64_t vb_net_hdr_addr = 0; - uint32_t head[MAX_PKT_BURST]; + uint32_t desc_indexes[MAX_PKT_BURST]; uint32_t used_idx; uint32_t i; - uint16_t free_entries, entry_success = 0; + uint16_t free_entries; uint16_t avail_idx; - struct virtio_net_hdr *hdr = NULL; + struct rte_mbuf *m; if (unlikely(!is_valid_virt_queue_idx(queue_id, 1, dev->virt_qp_nb))) { RTE_LOG(ERR, VHOST_DATA, @@ -730,197 +813,49 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, return 0; avail_idx = *((volatile uint16_t *)&vq->avail->idx); - - /* If there are no available buffers then return. 
*/ - if (vq->last_used_idx == avail_idx) + free_entries = avail_idx - vq->last_used_idx; + if (free_entries == 0) return 0; - LOG_DEBUG(VHOST_DATA, "%s (%"PRIu64")\n", __func__, - dev->device_fh); + LOG_DEBUG(VHOST_DATA, "%s (%"PRIu64")\n", __func__, dev->device_fh); - /* Prefetch available ring to retrieve head indexes. */ - rte_prefetch0(&vq->avail->ring[vq->last_used_idx & (vq->size - 1)]); + used_idx = vq->last_used_idx & (vq->size -1); - /*get the number of free entries in the ring*/ - free_entries = (avail_idx - vq->last_used_idx); + /* Prefetch available ring to retrieve head indexes. */ + rte_prefetch0(&vq->avail->ring[used_idx]); - free_entries = RTE_MIN(free_entries, count); - /* Limit to MAX_PKT_BURST. */ - free_entries = RTE_MIN(free_entries, MAX_PKT_BURST); + count = RTE_MIN(count, MAX_PKT_BURST); + count = RTE_MIN(count, free_entries); + LOG_DEBUG(VHOST_DATA, "(%"PRIu64") about to dequeue %u buffers\n", + dev->device_fh, count); - LOG_DEBUG(VHOST_DATA, "(%"PRIu64") Buffers available %d\n", - dev->device_fh, free_entries); /* Retrieve all of the head indexes first to avoid caching issues. */ - for (i = 0; i < free_entries; i++) - head[i] = vq->avail->ring[(vq->last_used_idx + i) & (vq->size - 1)]; + for (i = 0; i < count; i++) { + desc_indexes[i] = vq->avail->ring[(vq->last_used_idx + i) & + (vq->size - 1)]; + } /* Prefetch descriptor index. 
*/ - rte_prefetch0(&vq->desc[head[entry_success]]); + rte_prefetch0(&vq->desc[desc_indexes[0]]); rte_prefetch0(&vq->used->ring[vq->last_used_idx & (vq->size - 1)]); - while (entry_success < free_entries) { - uint32_t vb_avail, vb_offset; - uint32_t seg_avail, seg_offset; - uint32_t cpy_len; - uint32_t seg_num = 0; - struct rte_mbuf *cur; - uint8_t alloc_err = 0; - - desc = &vq->desc[head[entry_success]]; - - vb_net_hdr_addr = gpa_to_vva(dev, desc->addr); - hdr = (struct virtio_net_hdr *)((uintptr_t)vb_net_hdr_addr); - - /* Discard first buffer as it is the virtio header */ - if (desc->flags & VRING_DESC_F_NEXT) { - desc = &vq->desc[desc->next]; - vb_offset = 0; - vb_avail = desc->len; - } else { - vb_offset = vq->vhost_hlen; - vb_avail = desc->len - vb_offset; - } - - /* Buffer address translation. */ - vb_addr = gpa_to_vva(dev, desc->addr); - /* Prefetch buffer address. */ - rte_prefetch0((void *)(uintptr_t)vb_addr); - - used_idx = vq->last_used_idx & (vq->size - 1); - - if (entry_success < (free_entries - 1)) { - /* Prefetch descriptor index. */ - rte_prefetch0(&vq->desc[head[entry_success+1]]); - rte_prefetch0(&vq->used->ring[(used_idx + 1) & (vq->size - 1)]); - } - - /* Update used index buffer information. */ - vq->used->ring[used_idx].id = head[entry_success]; - vq->used->ring[used_idx].len = 0; - - /* Allocate an mbuf and populate the structure. 
*/ - m = rte_pktmbuf_alloc(mbuf_pool); - if (unlikely(m == NULL)) { - RTE_LOG(ERR, VHOST_DATA, - "Failed to allocate memory for mbuf.\n"); - break; - } - seg_offset = 0; - seg_avail = m->buf_len - RTE_PKTMBUF_HEADROOM; - cpy_len = RTE_MIN(vb_avail, seg_avail); - - PRINT_PACKET(dev, (uintptr_t)vb_addr, desc->len, 0); - - seg_num++; - cur = m; - prev = m; - while (cpy_len != 0) { - rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *, seg_offset), - (void *)((uintptr_t)(vb_addr + vb_offset)), - cpy_len); - - seg_offset += cpy_len; - vb_offset += cpy_len; - vb_avail -= cpy_len; - seg_avail -= cpy_len; - - if (vb_avail != 0) { - /* - * The segment reachs to its end, - * while the virtio buffer in TX vring has - * more data to be copied. - */ - cur->data_len = seg_offset; - m->pkt_len += seg_offset; - /* Allocate mbuf and populate the structure. */ - cur = rte_pktmbuf_alloc(mbuf_pool); - if (unlikely(cur == NULL)) { - RTE_LOG(ERR, VHOST_DATA, "Failed to " - "allocate memory for mbuf.\n"); - rte_pktmbuf_free(m); - alloc_err = 1; - break; - } - - seg_num++; - prev->next = cur; - prev = cur; - seg_offset = 0; - seg_avail = cur->buf_len - RTE_PKTMBUF_HEADROOM; - } else { - if (desc->flags & VRING_DESC_F_NEXT) { - /* - * There are more virtio buffers in - * same vring entry need to be copied. - */ - if (seg_avail == 0) { - /* - * The current segment hasn't - * room to accomodate more - * data. - */ - cur->data_len = seg_offset; - m->pkt_len += seg_offset; - /* - * Allocate an mbuf and - * populate the structure. - */ - cur = rte_pktmbuf_alloc(mbuf_pool); - if (unlikely(cur == NULL)) { - RTE_LOG(ERR, - VHOST_DATA, - "Failed to " - "allocate memory " - "for mbuf\n"); - rte_pktmbuf_free(m); - alloc_err = 1; - break; - } - seg_num++; - prev->next = cur; - prev = cur; - seg_offset = 0; - seg_avail = cur->buf_len - RTE_PKTMBUF_HEADROOM; - } - - desc = &vq->desc[desc->next]; - - /* Buffer address translation. */ - vb_addr = gpa_to_vva(dev, desc->addr); - /* Prefetch buffer address. 
*/ - rte_prefetch0((void *)(uintptr_t)vb_addr); - vb_offset = 0; - vb_avail = desc->len; - - PRINT_PACKET(dev, (uintptr_t)vb_addr, - desc->len, 0); - } else { - /* The whole packet completes. */ - cur->data_len = seg_offset; - m->pkt_len += seg_offset; - vb_avail = 0; - } - } - - cpy_len = RTE_MIN(vb_avail, seg_avail); - } - - if (unlikely(alloc_err == 1)) + for (i = 0; i < count; i++) { + m = copy_desc_to_mbuf(dev, vq, desc_indexes[i], mbuf_pool); + if (m == NULL) break; + pkts[i] = m; - m->nb_segs = seg_num; - if ((hdr->flags != 0) || (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE)) - vhost_dequeue_offload(hdr, m); - - pkts[entry_success] = m; - vq->last_used_idx++; - entry_success++; + used_idx = vq->last_used_idx++ & (vq->size - 1); + vq->used->ring[used_idx].id = desc_indexes[i]; + vq->used->ring[used_idx].len = 0; } rte_compiler_barrier(); - vq->used->idx += entry_success; + vq->used->idx += i; + /* Kick guest if required. */ if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)) eventfd_write(vq->callfd, (eventfd_t)1); - return entry_success; + + return i; } -- 1.9.0 ^ permalink raw reply [flat|nested] 84+ messages in thread
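[Editor's note] The two-cursor draining loop from the commit message above can be modeled outside vhost with plain buffers. In the sketch below, flat arrays stand in for the desc chain and a fixed-size 2-D array stands in for the mbuf chain; the segment size, names, and preallocated output are all illustrative, and only the refill-whichever-side-runs-dry logic mirrors copy_desc_to_mbuf():

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SEG_CAP 8	/* toy stand-in for an mbuf's data room */

/* Toy model of the dequeue copy loop: "descs" is a chain of source
 * segments, "out" a chain of fixed-size destination segments. One
 * availability counter per side; whichever side runs dry is refilled,
 * and copies always move min(desc_avail, out_avail) bytes. */
static size_t copy_chain(const char *const *descs, const size_t *desc_lens,
			 size_t n_desc, char out[][SEG_CAP], size_t max_segs)
{
	size_t di = 0, oi = 0;
	size_t desc_avail = n_desc ? desc_lens[0] : 0, desc_off = 0;
	size_t out_avail = SEG_CAP, out_off = 0;

	while (desc_avail || di + 1 < n_desc) {
		if (desc_avail == 0) {		/* source segment drained */
			di++;
			desc_off = 0;
			desc_avail = desc_lens[di];
		}
		if (out_avail == 0) {		/* destination segment full */
			if (++oi == max_segs)
				break;		/* out of destination room */
			out_off = 0;
			out_avail = SEG_CAP;
		}
		size_t n = desc_avail < out_avail ? desc_avail : out_avail;
		memcpy(out[oi] + out_off, descs[di] + desc_off, n);
		desc_off += n;	desc_avail -= n;
		out_off += n;	out_avail -= n;
	}
	return oi * SEG_CAP + out_off;	/* total bytes copied */
}
```

In the real function the roles of `out[oi]` refills are played by rte_pktmbuf_alloc() plus mbuf chaining, and the "has next desc" test is `desc->flags & VRING_DESC_F_NEXT`, but the control flow is the same single loop instead of the three nested alloc/copy sites of the old code.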
* Re: [dpdk-dev] [PATCH v2 1/7] vhost: refactor rte_vhost_dequeue_burst 2016-02-18 13:49 ` [dpdk-dev] [PATCH v2 1/7] vhost: refactor rte_vhost_dequeue_burst Yuanhan Liu @ 2016-03-03 16:21 ` Xie, Huawei 2016-03-04 2:21 ` Yuanhan Liu 2016-03-03 16:30 ` Xie, Huawei ` (3 subsequent siblings) 4 siblings, 1 reply; 84+ messages in thread From: Xie, Huawei @ 2016-03-03 16:21 UTC (permalink / raw) To: Yuanhan Liu, dev; +Cc: Victor Kaplansky, Michael S. Tsirkin On 2/18/2016 9:48 PM, Yuanhan Liu wrote: > The current rte_vhost_dequeue_burst() implementation is a bit messy > and logic twisted. And you could see repeat code here and there: it > invokes rte_pktmbuf_alloc() three times at three different places! > > However, rte_vhost_dequeue_burst() acutally does a simple job: copy > the packet data from vring desc to mbuf. What's tricky here is: > > - desc buff could be chained (by desc->next field), so that you need > fetch next one if current is wholly drained. > > - One mbuf could not be big enough to hold all desc buff, hence you > need to chain the mbuf as well, by the mbuf->next field. > > Even though, the logic could be simple. Here is the pseudo code. > > while (this_desc_is_not_drained_totally || has_next_desc) { > if (this_desc_has_drained_totally) { > this_desc = next_desc(); > } > > if (mbuf_has_no_room) { > mbuf = allocate_a_new_mbuf(); > } > > COPY(mbuf, desc); > } > > And this is how I refactored rte_vhost_dequeue_burst. > > Note that the old patch does a special handling for skipping virtio > header. However, that could be simply done by adjusting desc_avail > and desc_offset var: > > desc_avail = desc->len - vq->vhost_hlen; > desc_offset = vq->vhost_hlen; > > This refactor makes the code much more readable (IMO), yet it reduces > binary code size (nearly 2K). 
> > Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> > --- > > v2: - fix potential NULL dereference bug of var "prev" and "head" > --- > lib/librte_vhost/vhost_rxtx.c | 297 +++++++++++++++++------------------------- > 1 file changed, 116 insertions(+), 181 deletions(-) > > diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c > index 5e7e5b1..d5cd0fa 100644 > --- a/lib/librte_vhost/vhost_rxtx.c > +++ b/lib/librte_vhost/vhost_rxtx.c > @@ -702,21 +702,104 @@ vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m) > } > } > > +static inline struct rte_mbuf * > +copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq, > + uint16_t desc_idx, struct rte_mempool *mbuf_pool) > +{ > + struct vring_desc *desc; > + uint64_t desc_addr; > + uint32_t desc_avail, desc_offset; > + uint32_t mbuf_avail, mbuf_offset; > + uint32_t cpy_len; > + struct rte_mbuf *head = NULL; > + struct rte_mbuf *cur = NULL, *prev = NULL; > + struct virtio_net_hdr *hdr; > + > + desc = &vq->desc[desc_idx]; > + desc_addr = gpa_to_vva(dev, desc->addr); > + rte_prefetch0((void *)(uintptr_t)desc_addr); > + > + /* Retrieve virtio net header */ > + hdr = (struct virtio_net_hdr *)((uintptr_t)desc_addr); > + desc_avail = desc->len - vq->vhost_hlen; There is a serious bug here, desc->len - vq->vhost_hlen could overflow. A VM could easily create this case. Let us fix it here. > + desc_offset = vq->vhost_hlen; > + > + mbuf_avail = 0; > + mbuf_offset = 0; > + while (desc_avail || (desc->flags & VRING_DESC_F_NEXT) != 0) { > + /* This desc reachs to its end, get the next one */ > + if (desc_avail == 0) { > + desc = &vq->desc[desc->next]; > + > + desc_addr = gpa_to_vva(dev, desc->addr); > + rte_prefetch0((void *)(uintptr_t)desc_addr); > + > + desc_offset = 0; > + desc_avail = desc->len; > + > + PRINT_PACKET(dev, (uintptr_t)desc_addr, desc->len, 0); > + } > + > + /* > + * This mbuf reachs to its end, get a new one > + * to hold more data. 
> + */ > + if (mbuf_avail == 0) { > + cur = rte_pktmbuf_alloc(mbuf_pool); > + if (unlikely(!cur)) { > + RTE_LOG(ERR, VHOST_DATA, "Failed to " > + "allocate memory for mbuf.\n"); > + if (head) > + rte_pktmbuf_free(head); > + return NULL; > + } > + if (!head) { > + head = cur; > + } else { > + prev->next = cur; > + prev->data_len = mbuf_offset; > + head->nb_segs += 1; > + } > + head->pkt_len += mbuf_offset; > + prev = cur; > + > + mbuf_offset = 0; > + mbuf_avail = cur->buf_len - RTE_PKTMBUF_HEADROOM; > + } > + > + cpy_len = RTE_MIN(desc_avail, mbuf_avail); > + rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *, mbuf_offset), > + (void *)((uintptr_t)(desc_addr + desc_offset)), > + cpy_len); > + > + mbuf_avail -= cpy_len; > + mbuf_offset += cpy_len; > + desc_avail -= cpy_len; > + desc_offset += cpy_len; > + } > + > + if (prev) { > + prev->data_len = mbuf_offset; > + head->pkt_len += mbuf_offset; > + > + if (hdr->flags != 0 || hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) > + vhost_dequeue_offload(hdr, head); > + } > + > + return head; > +} > + > uint16_t > rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, > struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count) > { > - struct rte_mbuf *m, *prev; > struct vhost_virtqueue *vq; > - struct vring_desc *desc; > - uint64_t vb_addr = 0; > - uint64_t vb_net_hdr_addr = 0; > - uint32_t head[MAX_PKT_BURST]; > + uint32_t desc_indexes[MAX_PKT_BURST]; > uint32_t used_idx; > uint32_t i; > - uint16_t free_entries, entry_success = 0; > + uint16_t free_entries; > uint16_t avail_idx; > - struct virtio_net_hdr *hdr = NULL; > + struct rte_mbuf *m; > > if (unlikely(!is_valid_virt_queue_idx(queue_id, 1, dev->virt_qp_nb))) { > RTE_LOG(ERR, VHOST_DATA, > @@ -730,197 +813,49 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, > return 0; > > avail_idx = *((volatile uint16_t *)&vq->avail->idx); > - > - /* If there are no available buffers then return. 
*/ > - if (vq->last_used_idx == avail_idx) > + free_entries = avail_idx - vq->last_used_idx; > + if (free_entries == 0) > return 0; > > - LOG_DEBUG(VHOST_DATA, "%s (%"PRIu64")\n", __func__, > - dev->device_fh); > + LOG_DEBUG(VHOST_DATA, "%s (%"PRIu64")\n", __func__, dev->device_fh); > > - /* Prefetch available ring to retrieve head indexes. */ > - rte_prefetch0(&vq->avail->ring[vq->last_used_idx & (vq->size - 1)]); > + used_idx = vq->last_used_idx & (vq->size -1); > > - /*get the number of free entries in the ring*/ > - free_entries = (avail_idx - vq->last_used_idx); > + /* Prefetch available ring to retrieve head indexes. */ > + rte_prefetch0(&vq->avail->ring[used_idx]); > > - free_entries = RTE_MIN(free_entries, count); > - /* Limit to MAX_PKT_BURST. */ > - free_entries = RTE_MIN(free_entries, MAX_PKT_BURST); > + count = RTE_MIN(count, MAX_PKT_BURST); > + count = RTE_MIN(count, free_entries); > + LOG_DEBUG(VHOST_DATA, "(%"PRIu64") about to dequeue %u buffers\n", > + dev->device_fh, count); > > - LOG_DEBUG(VHOST_DATA, "(%"PRIu64") Buffers available %d\n", > - dev->device_fh, free_entries); > /* Retrieve all of the head indexes first to avoid caching issues. */ > - for (i = 0; i < free_entries; i++) > - head[i] = vq->avail->ring[(vq->last_used_idx + i) & (vq->size - 1)]; > + for (i = 0; i < count; i++) { > + desc_indexes[i] = vq->avail->ring[(vq->last_used_idx + i) & > + (vq->size - 1)]; > + } > > /* Prefetch descriptor index. 
*/ > - rte_prefetch0(&vq->desc[head[entry_success]]); > + rte_prefetch0(&vq->desc[desc_indexes[0]]); > rte_prefetch0(&vq->used->ring[vq->last_used_idx & (vq->size - 1)]); > > - while (entry_success < free_entries) { > - uint32_t vb_avail, vb_offset; > - uint32_t seg_avail, seg_offset; > - uint32_t cpy_len; > - uint32_t seg_num = 0; > - struct rte_mbuf *cur; > - uint8_t alloc_err = 0; > - > - desc = &vq->desc[head[entry_success]]; > - > - vb_net_hdr_addr = gpa_to_vva(dev, desc->addr); > - hdr = (struct virtio_net_hdr *)((uintptr_t)vb_net_hdr_addr); > - > - /* Discard first buffer as it is the virtio header */ > - if (desc->flags & VRING_DESC_F_NEXT) { > - desc = &vq->desc[desc->next]; > - vb_offset = 0; > - vb_avail = desc->len; > - } else { > - vb_offset = vq->vhost_hlen; > - vb_avail = desc->len - vb_offset; > - } > - > - /* Buffer address translation. */ > - vb_addr = gpa_to_vva(dev, desc->addr); > - /* Prefetch buffer address. */ > - rte_prefetch0((void *)(uintptr_t)vb_addr); > - > - used_idx = vq->last_used_idx & (vq->size - 1); > - > - if (entry_success < (free_entries - 1)) { > - /* Prefetch descriptor index. */ > - rte_prefetch0(&vq->desc[head[entry_success+1]]); > - rte_prefetch0(&vq->used->ring[(used_idx + 1) & (vq->size - 1)]); > - } > - > - /* Update used index buffer information. */ > - vq->used->ring[used_idx].id = head[entry_success]; > - vq->used->ring[used_idx].len = 0; > - > - /* Allocate an mbuf and populate the structure. 
*/ > - m = rte_pktmbuf_alloc(mbuf_pool); > - if (unlikely(m == NULL)) { > - RTE_LOG(ERR, VHOST_DATA, > - "Failed to allocate memory for mbuf.\n"); > - break; > - } > - seg_offset = 0; > - seg_avail = m->buf_len - RTE_PKTMBUF_HEADROOM; > - cpy_len = RTE_MIN(vb_avail, seg_avail); > - > - PRINT_PACKET(dev, (uintptr_t)vb_addr, desc->len, 0); > - > - seg_num++; > - cur = m; > - prev = m; > - while (cpy_len != 0) { > - rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *, seg_offset), > - (void *)((uintptr_t)(vb_addr + vb_offset)), > - cpy_len); > - > - seg_offset += cpy_len; > - vb_offset += cpy_len; > - vb_avail -= cpy_len; > - seg_avail -= cpy_len; > - > - if (vb_avail != 0) { > - /* > - * The segment reachs to its end, > - * while the virtio buffer in TX vring has > - * more data to be copied. > - */ > - cur->data_len = seg_offset; > - m->pkt_len += seg_offset; > - /* Allocate mbuf and populate the structure. */ > - cur = rte_pktmbuf_alloc(mbuf_pool); > - if (unlikely(cur == NULL)) { > - RTE_LOG(ERR, VHOST_DATA, "Failed to " > - "allocate memory for mbuf.\n"); > - rte_pktmbuf_free(m); > - alloc_err = 1; > - break; > - } > - > - seg_num++; > - prev->next = cur; > - prev = cur; > - seg_offset = 0; > - seg_avail = cur->buf_len - RTE_PKTMBUF_HEADROOM; > - } else { > - if (desc->flags & VRING_DESC_F_NEXT) { > - /* > - * There are more virtio buffers in > - * same vring entry need to be copied. > - */ > - if (seg_avail == 0) { > - /* > - * The current segment hasn't > - * room to accomodate more > - * data. > - */ > - cur->data_len = seg_offset; > - m->pkt_len += seg_offset; > - /* > - * Allocate an mbuf and > - * populate the structure. 
> - */ > - cur = rte_pktmbuf_alloc(mbuf_pool); > - if (unlikely(cur == NULL)) { > - RTE_LOG(ERR, > - VHOST_DATA, > - "Failed to " > - "allocate memory " > - "for mbuf\n"); > - rte_pktmbuf_free(m); > - alloc_err = 1; > - break; > - } > - seg_num++; > - prev->next = cur; > - prev = cur; > - seg_offset = 0; > - seg_avail = cur->buf_len - RTE_PKTMBUF_HEADROOM; > - } > - > - desc = &vq->desc[desc->next]; > - > - /* Buffer address translation. */ > - vb_addr = gpa_to_vva(dev, desc->addr); > - /* Prefetch buffer address. */ > - rte_prefetch0((void *)(uintptr_t)vb_addr); > - vb_offset = 0; > - vb_avail = desc->len; > - > - PRINT_PACKET(dev, (uintptr_t)vb_addr, > - desc->len, 0); > - } else { > - /* The whole packet completes. */ > - cur->data_len = seg_offset; > - m->pkt_len += seg_offset; > - vb_avail = 0; > - } > - } > - > - cpy_len = RTE_MIN(vb_avail, seg_avail); > - } > - > - if (unlikely(alloc_err == 1)) > + for (i = 0; i < count; i++) { > + m = copy_desc_to_mbuf(dev, vq, desc_indexes[i], mbuf_pool); > + if (m == NULL) > break; > + pkts[i] = m; > > - m->nb_segs = seg_num; > - if ((hdr->flags != 0) || (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE)) > - vhost_dequeue_offload(hdr, m); > - > - pkts[entry_success] = m; > - vq->last_used_idx++; > - entry_success++; > + used_idx = vq->last_used_idx++ & (vq->size - 1); > + vq->used->ring[used_idx].id = desc_indexes[i]; > + vq->used->ring[used_idx].len = 0; > } > > rte_compiler_barrier(); > - vq->used->idx += entry_success; > + vq->used->idx += i; > + > /* Kick guest if required. */ > if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)) > eventfd_write(vq->callfd, (eventfd_t)1); > - return entry_success; > + > + return i; > } ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH v2 1/7] vhost: refactor rte_vhost_dequeue_burst 2016-03-03 16:21 ` Xie, Huawei @ 2016-03-04 2:21 ` Yuanhan Liu 2016-03-07 2:19 ` Xie, Huawei 0 siblings, 1 reply; 84+ messages in thread From: Yuanhan Liu @ 2016-03-04 2:21 UTC (permalink / raw) To: Xie, Huawei; +Cc: Michael S. Tsirkin, dev, Victor Kaplansky On Thu, Mar 03, 2016 at 04:21:19PM +0000, Xie, Huawei wrote: > On 2/18/2016 9:48 PM, Yuanhan Liu wrote: > > The current rte_vhost_dequeue_burst() implementation is a bit messy > > and logic twisted. And you could see repeat code here and there: it > > invokes rte_pktmbuf_alloc() three times at three different places! > > > > However, rte_vhost_dequeue_burst() acutally does a simple job: copy > > the packet data from vring desc to mbuf. What's tricky here is: > > > > - desc buff could be chained (by desc->next field), so that you need > > fetch next one if current is wholly drained. > > > > - One mbuf could not be big enough to hold all desc buff, hence you > > need to chain the mbuf as well, by the mbuf->next field. > > > > Even though, the logic could be simple. Here is the pseudo code. > > > > while (this_desc_is_not_drained_totally || has_next_desc) { > > if (this_desc_has_drained_totally) { > > this_desc = next_desc(); > > } > > > > if (mbuf_has_no_room) { > > mbuf = allocate_a_new_mbuf(); > > } > > > > COPY(mbuf, desc); > > } > > > > And this is how I refactored rte_vhost_dequeue_burst. > > > > Note that the old patch does a special handling for skipping virtio > > header. However, that could be simply done by adjusting desc_avail > > and desc_offset var: > > > > desc_avail = desc->len - vq->vhost_hlen; > > desc_offset = vq->vhost_hlen; > > > > This refactor makes the code much more readable (IMO), yet it reduces > > binary code size (nearly 2K). 
> > > > Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> > > --- > > > > v2: - fix potential NULL dereference bug of var "prev" and "head" > > --- > > lib/librte_vhost/vhost_rxtx.c | 297 +++++++++++++++++------------------------- > > 1 file changed, 116 insertions(+), 181 deletions(-) > > > > diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c > > index 5e7e5b1..d5cd0fa 100644 > > --- a/lib/librte_vhost/vhost_rxtx.c > > +++ b/lib/librte_vhost/vhost_rxtx.c > > @@ -702,21 +702,104 @@ vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m) > > } > > } > > > > +static inline struct rte_mbuf * > > +copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq, > > + uint16_t desc_idx, struct rte_mempool *mbuf_pool) > > +{ > > + struct vring_desc *desc; > > + uint64_t desc_addr; > > + uint32_t desc_avail, desc_offset; > > + uint32_t mbuf_avail, mbuf_offset; > > + uint32_t cpy_len; > > + struct rte_mbuf *head = NULL; > > + struct rte_mbuf *cur = NULL, *prev = NULL; > > + struct virtio_net_hdr *hdr; > > + > > + desc = &vq->desc[desc_idx]; > > + desc_addr = gpa_to_vva(dev, desc->addr); > > + rte_prefetch0((void *)(uintptr_t)desc_addr); > > + > > + /* Retrieve virtio net header */ > > + hdr = (struct virtio_net_hdr *)((uintptr_t)desc_addr); > > + desc_avail = desc->len - vq->vhost_hlen; > > There is a serious bug here, desc->len - vq->vhost_len could overflow. > VM could easily create this case. Let us fix it here. Nope, this issue has been there since the beginning, and this patch is a refactor: we should not bring any functional changes. Therefore, we should not fix it here. And actually, it's been fixed in the 6th patch in this series: [PATCH v2 6/7] vhost: do sanity check for desc->len --yliu ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH v2 1/7] vhost: refactor rte_vhost_dequeue_burst 2016-03-04 2:21 ` Yuanhan Liu @ 2016-03-07 2:19 ` Xie, Huawei 2016-03-07 2:44 ` Yuanhan Liu 0 siblings, 1 reply; 84+ messages in thread From: Xie, Huawei @ 2016-03-07 2:19 UTC (permalink / raw) To: Yuanhan Liu; +Cc: Michael S. Tsirkin, dev, Victor Kaplansky On 3/4/2016 10:19 AM, Yuanhan Liu wrote: > On Thu, Mar 03, 2016 at 04:21:19PM +0000, Xie, Huawei wrote: >> On 2/18/2016 9:48 PM, Yuanhan Liu wrote: >>> The current rte_vhost_dequeue_burst() implementation is a bit messy >>> and logic twisted. And you could see repeat code here and there: it >>> invokes rte_pktmbuf_alloc() three times at three different places! >>> >>> However, rte_vhost_dequeue_burst() acutally does a simple job: copy >>> the packet data from vring desc to mbuf. What's tricky here is: >>> >>> - desc buff could be chained (by desc->next field), so that you need >>> fetch next one if current is wholly drained. >>> >>> - One mbuf could not be big enough to hold all desc buff, hence you >>> need to chain the mbuf as well, by the mbuf->next field. >>> >>> Even though, the logic could be simple. Here is the pseudo code. >>> >>> while (this_desc_is_not_drained_totally || has_next_desc) { >>> if (this_desc_has_drained_totally) { >>> this_desc = next_desc(); >>> } >>> >>> if (mbuf_has_no_room) { >>> mbuf = allocate_a_new_mbuf(); >>> } >>> >>> COPY(mbuf, desc); >>> } >>> >>> And this is how I refactored rte_vhost_dequeue_burst. >>> >>> Note that the old patch does a special handling for skipping virtio >>> header. However, that could be simply done by adjusting desc_avail >>> and desc_offset var: >>> >>> desc_avail = desc->len - vq->vhost_hlen; >>> desc_offset = vq->vhost_hlen; >>> >>> This refactor makes the code much more readable (IMO), yet it reduces >>> binary code size (nearly 2K). 
>>> >>> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> >>> --- >>> >>> v2: - fix potential NULL dereference bug of var "prev" and "head" >>> --- >>> lib/librte_vhost/vhost_rxtx.c | 297 +++++++++++++++++------------------------- >>> 1 file changed, 116 insertions(+), 181 deletions(-) >>> >>> diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c >>> index 5e7e5b1..d5cd0fa 100644 >>> --- a/lib/librte_vhost/vhost_rxtx.c >>> +++ b/lib/librte_vhost/vhost_rxtx.c >>> @@ -702,21 +702,104 @@ vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m) >>> } >>> } >>> >>> +static inline struct rte_mbuf * >>> +copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq, >>> + uint16_t desc_idx, struct rte_mempool *mbuf_pool) >>> +{ >>> + struct vring_desc *desc; >>> + uint64_t desc_addr; >>> + uint32_t desc_avail, desc_offset; >>> + uint32_t mbuf_avail, mbuf_offset; >>> + uint32_t cpy_len; >>> + struct rte_mbuf *head = NULL; >>> + struct rte_mbuf *cur = NULL, *prev = NULL; >>> + struct virtio_net_hdr *hdr; >>> + >>> + desc = &vq->desc[desc_idx]; >>> + desc_addr = gpa_to_vva(dev, desc->addr); >>> + rte_prefetch0((void *)(uintptr_t)desc_addr); >>> + >>> + /* Retrieve virtio net header */ >>> + hdr = (struct virtio_net_hdr *)((uintptr_t)desc_addr); >>> + desc_avail = desc->len - vq->vhost_hlen; >> There is a serious bug here, desc->len - vq->vhost_len could overflow. >> VM could easily create this case. Let us fix it here. > Nope, this issue has been there since the beginning, and this patch > is a refactor: we should not bring any functional changes. Therefore, > we should not fix it here. No, I don't mean exactly fixing in this patch but in this series. Besides, from refactoring's perspective, actually we could make things further much simpler and more readable. 
Both the desc chains and mbuf could be converted into iovec, then both dequeue(copy_desc_to_mbuf) and enqueue(copy_mbuf_to_desc) could use the commonly used iovec copying algorithms. As the data path is performance oriented, let us stop here. > > And actually, it's been fixed in the 6th patch in this series: Will check that. > > [PATCH v2 6/7] vhost: do sanity check for desc->len > > --yliu > ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH v2 1/7] vhost: refactor rte_vhost_dequeue_burst 2016-03-07 2:19 ` Xie, Huawei @ 2016-03-07 2:44 ` Yuanhan Liu 0 siblings, 0 replies; 84+ messages in thread From: Yuanhan Liu @ 2016-03-07 2:44 UTC (permalink / raw) To: Xie, Huawei; +Cc: Michael S. Tsirkin, dev, Victor Kaplansky On Mon, Mar 07, 2016 at 02:19:54AM +0000, Xie, Huawei wrote: > On 3/4/2016 10:19 AM, Yuanhan Liu wrote: > > On Thu, Mar 03, 2016 at 04:21:19PM +0000, Xie, Huawei wrote: > >> On 2/18/2016 9:48 PM, Yuanhan Liu wrote: > >>> The current rte_vhost_dequeue_burst() implementation is a bit messy > >>> and logic twisted. And you could see repeat code here and there: it > >>> invokes rte_pktmbuf_alloc() three times at three different places! > >>> > >>> However, rte_vhost_dequeue_burst() acutally does a simple job: copy > >>> the packet data from vring desc to mbuf. What's tricky here is: > >>> > >>> - desc buff could be chained (by desc->next field), so that you need > >>> fetch next one if current is wholly drained. > >>> > >>> - One mbuf could not be big enough to hold all desc buff, hence you > >>> need to chain the mbuf as well, by the mbuf->next field. > >>> > >>> Even though, the logic could be simple. Here is the pseudo code. > >>> > >>> while (this_desc_is_not_drained_totally || has_next_desc) { > >>> if (this_desc_has_drained_totally) { > >>> this_desc = next_desc(); > >>> } > >>> > >>> if (mbuf_has_no_room) { > >>> mbuf = allocate_a_new_mbuf(); > >>> } > >>> > >>> COPY(mbuf, desc); > >>> } > >>> > >>> And this is how I refactored rte_vhost_dequeue_burst. > >>> > >>> Note that the old patch does a special handling for skipping virtio > >>> header. However, that could be simply done by adjusting desc_avail > >>> and desc_offset var: > >>> > >>> desc_avail = desc->len - vq->vhost_hlen; > >>> desc_offset = vq->vhost_hlen; > >>> > >>> This refactor makes the code much more readable (IMO), yet it reduces > >>> binary code size (nearly 2K). 
> >>> > >>> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> > >>> --- > >>> > >>> v2: - fix potential NULL dereference bug of var "prev" and "head" > >>> --- > >>> lib/librte_vhost/vhost_rxtx.c | 297 +++++++++++++++++------------------------- > >>> 1 file changed, 116 insertions(+), 181 deletions(-) > >>> > >>> diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c > >>> index 5e7e5b1..d5cd0fa 100644 > >>> --- a/lib/librte_vhost/vhost_rxtx.c > >>> +++ b/lib/librte_vhost/vhost_rxtx.c > >>> @@ -702,21 +702,104 @@ vhost_dequeue_offload(struct virtio_net_hdr *hdr, struct rte_mbuf *m) > >>> } > >>> } > >>> > >>> +static inline struct rte_mbuf * > >>> +copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq, > >>> + uint16_t desc_idx, struct rte_mempool *mbuf_pool) > >>> +{ > >>> + struct vring_desc *desc; > >>> + uint64_t desc_addr; > >>> + uint32_t desc_avail, desc_offset; > >>> + uint32_t mbuf_avail, mbuf_offset; > >>> + uint32_t cpy_len; > >>> + struct rte_mbuf *head = NULL; > >>> + struct rte_mbuf *cur = NULL, *prev = NULL; > >>> + struct virtio_net_hdr *hdr; > >>> + > >>> + desc = &vq->desc[desc_idx]; > >>> + desc_addr = gpa_to_vva(dev, desc->addr); > >>> + rte_prefetch0((void *)(uintptr_t)desc_addr); > >>> + > >>> + /* Retrieve virtio net header */ > >>> + hdr = (struct virtio_net_hdr *)((uintptr_t)desc_addr); > >>> + desc_avail = desc->len - vq->vhost_hlen; > >> There is a serious bug here, desc->len - vq->vhost_len could overflow. > >> VM could easily create this case. Let us fix it here. > > Nope, this issue has been there since the beginning, and this patch > > is a refactor: we should not bring any functional changes. Therefore, > > we should not fix it here. > > No, I don't mean exactly fixing in this patch but in this series. > > Besides, from refactoring's perspective, actually we could make things > further much simpler and more readable. 
Both the desc chains and mbuf > could be converted into iovec, then both dequeue(copy_desc_to_mbuf) and > enqueue(copy_mbuf_to_desc) could use the commonly used iovec copying > algorithms As data path are performance oriented, let us stop here. Agreed, I had the same performance concern about further simplification, therefore I didn't go further. > > > > And actually, it's been fixed in the 6th patch in this series: > > Will check that. Do you have other comments on the other patches? I'm considering sending a new version soon, say maybe tomorrow. --yliu ^ permalink raw reply [flat|nested] 84+ messages in thread
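The iovec approach Huawei suggests — flattening both the desc chain and the mbuf chain into `{base, len}` arrays, then copying with one generic routine — can be sketched as below. All names here (`struct iov`, `iov_copy`) are hypothetical; this is the algorithmic idea, not a DPDK implementation, and it omits the gpa-to-vva translation a real flattening pass would need.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct iov {
	void	*base;
	size_t	 len;
};

/* Generic scatter-gather copy: advance through the destination and
 * source segment arrays in lockstep, copying min(remaining dst,
 * remaining src) each step.  Both dequeue and enqueue could share
 * this once their chains are flattened into iov arrays. */
static size_t
iov_copy(struct iov *dst, int dcnt, const struct iov *src, int scnt)
{
	int di = 0, si = 0;
	size_t doff = 0, soff = 0, total = 0;

	while (di < dcnt && si < scnt) {
		size_t n = dst[di].len - doff;
		size_t m = src[si].len - soff;
		size_t cpy = n < m ? n : m;

		memcpy((char *)dst[di].base + doff,
		       (const char *)src[si].base + soff, cpy);
		doff += cpy; soff += cpy; total += cpy;

		if (doff == dst[di].len) { di++; doff = 0; }
		if (soff == src[si].len) { si++; soff = 0; }
	}
	return total;
}
```

The trade-off both participants land on is that building the iov arrays adds a pass over the chains, which matters on a performance-critical data path.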
* Re: [dpdk-dev] [PATCH v2 1/7] vhost: refactor rte_vhost_dequeue_burst 2016-02-18 13:49 ` [dpdk-dev] [PATCH v2 1/7] vhost: refactor rte_vhost_dequeue_burst Yuanhan Liu 2016-03-03 16:21 ` Xie, Huawei @ 2016-03-03 16:30 ` Xie, Huawei 2016-03-04 2:17 ` Yuanhan Liu 2016-03-03 17:19 ` Xie, Huawei ` (2 subsequent siblings) 4 siblings, 1 reply; 84+ messages in thread From: Xie, Huawei @ 2016-03-03 16:30 UTC (permalink / raw) To: Yuanhan Liu, dev; +Cc: Victor Kaplansky, Michael S. Tsirkin On 2/18/2016 9:48 PM, Yuanhan Liu wrote: > + mbuf_avail = 0; > + mbuf_offset = 0; > + while (desc_avail || (desc->flags & VRING_DESC_F_NEXT) != 0) { > + /* This desc reachs to its end, get the next one */ > + if (desc_avail == 0) { > + desc = &vq->desc[desc->next]; > + > + desc_addr = gpa_to_vva(dev, desc->addr); > + rte_prefetch0((void *)(uintptr_t)desc_addr); > + > + desc_offset = 0; > + desc_avail = desc->len; > + > + PRINT_PACKET(dev, (uintptr_t)desc_addr, desc->len, 0); > + } > + > + /* > + * This mbuf reachs to its end, get a new one > + * to hold more data. > + */ > + if (mbuf_avail == 0) { > + cur = rte_pktmbuf_alloc(mbuf_pool); > + if (unlikely(!cur)) { > + RTE_LOG(ERR, VHOST_DATA, "Failed to " > + "allocate memory for mbuf.\n"); > + if (head) > + rte_pktmbuf_free(head); > + return NULL; > + } We could always allocate the head mbuf before the loop, then we save the following branch and make the code more streamlined. It reminds me that this change prevents the possibility of mbuf bulk allocation, one solution is we pass the head mbuf from an additional parameter. Btw, put unlikely before the check of mbuf_avail and checks elsewhere. 
> + if (!head) { > + head = cur; > + } else { > + prev->next = cur; > + prev->data_len = mbuf_offset; > + head->nb_segs += 1; > + } > + head->pkt_len += mbuf_offset; > + prev = cur; > + > + mbuf_offset = 0; > + mbuf_avail = cur->buf_len - RTE_PKTMBUF_HEADROOM; > + } > + > + cpy_len = RTE_MIN(desc_avail, mbuf_avail); > + rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *, mbuf_offset), > + (void *)((uintptr_t)(desc_addr + desc_offset)), > + cpy_len); > + > + mbuf_avail -= cpy_len; > + mbuf_offset += cpy_len; > + desc_avail -= cpy_len; > + desc_offset += cpy_len; > + } > + ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH v2 1/7] vhost: refactor rte_vhost_dequeue_burst 2016-03-03 16:30 ` Xie, Huawei @ 2016-03-04 2:17 ` Yuanhan Liu 2016-03-07 2:32 ` Xie, Huawei 0 siblings, 1 reply; 84+ messages in thread From: Yuanhan Liu @ 2016-03-04 2:17 UTC (permalink / raw) To: Xie, Huawei; +Cc: Michael S. Tsirkin, dev, Victor Kaplansky On Thu, Mar 03, 2016 at 04:30:42PM +0000, Xie, Huawei wrote: > On 2/18/2016 9:48 PM, Yuanhan Liu wrote: > > + mbuf_avail = 0; > > + mbuf_offset = 0; > > + while (desc_avail || (desc->flags & VRING_DESC_F_NEXT) != 0) { > > + /* This desc reachs to its end, get the next one */ > > + if (desc_avail == 0) { > > + desc = &vq->desc[desc->next]; > > + > > + desc_addr = gpa_to_vva(dev, desc->addr); > > + rte_prefetch0((void *)(uintptr_t)desc_addr); > > + > > + desc_offset = 0; > > + desc_avail = desc->len; > > + > > + PRINT_PACKET(dev, (uintptr_t)desc_addr, desc->len, 0); > > + } > > + > > + /* > > + * This mbuf reachs to its end, get a new one > > + * to hold more data. > > + */ > > + if (mbuf_avail == 0) { > > + cur = rte_pktmbuf_alloc(mbuf_pool); > > + if (unlikely(!cur)) { > > + RTE_LOG(ERR, VHOST_DATA, "Failed to " > > + "allocate memory for mbuf.\n"); > > + if (head) > > + rte_pktmbuf_free(head); > > + return NULL; > > + } > > We could always allocate the head mbuf before the loop, then we save the > following branch and make the code more streamlined. > It reminds me that this change prevents the possibility of mbuf bulk > allocation, one solution is we pass the head mbuf from an additional > parameter. Yep, that's also something I have thought of. > Btw, put unlikely before the check of mbuf_avail and checks elsewhere. I don't think so. It would benifit for the small packets. What if, however, when TSO or jumbo frame is enabled that we have big packets? --yliu ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH v2 1/7] vhost: refactor rte_vhost_dequeue_burst 2016-03-04 2:17 ` Yuanhan Liu @ 2016-03-07 2:32 ` Xie, Huawei 2016-03-07 2:48 ` Yuanhan Liu 0 siblings, 1 reply; 84+ messages in thread From: Xie, Huawei @ 2016-03-07 2:32 UTC (permalink / raw) To: Yuanhan Liu; +Cc: Michael S. Tsirkin, dev, Victor Kaplansky On 3/4/2016 10:15 AM, Yuanhan Liu wrote: > On Thu, Mar 03, 2016 at 04:30:42PM +0000, Xie, Huawei wrote: >> On 2/18/2016 9:48 PM, Yuanhan Liu wrote: >>> + mbuf_avail = 0; >>> + mbuf_offset = 0; >>> + while (desc_avail || (desc->flags & VRING_DESC_F_NEXT) != 0) { >>> + /* This desc reachs to its end, get the next one */ >>> + if (desc_avail == 0) { >>> + desc = &vq->desc[desc->next]; >>> + >>> + desc_addr = gpa_to_vva(dev, desc->addr); >>> + rte_prefetch0((void *)(uintptr_t)desc_addr); >>> + >>> + desc_offset = 0; >>> + desc_avail = desc->len; >>> + >>> + PRINT_PACKET(dev, (uintptr_t)desc_addr, desc->len, 0); >>> + } >>> + >>> + /* >>> + * This mbuf reachs to its end, get a new one >>> + * to hold more data. >>> + */ >>> + if (mbuf_avail == 0) { >>> + cur = rte_pktmbuf_alloc(mbuf_pool); >>> + if (unlikely(!cur)) { >>> + RTE_LOG(ERR, VHOST_DATA, "Failed to " >>> + "allocate memory for mbuf.\n"); >>> + if (head) >>> + rte_pktmbuf_free(head); >>> + return NULL; >>> + } >> We could always allocate the head mbuf before the loop, then we save the >> following branch and make the code more streamlined. >> It reminds me that this change prevents the possibility of mbuf bulk >> allocation, one solution is we pass the head mbuf from an additional >> parameter. > Yep, that's also something I have thought of. > >> Btw, put unlikely before the check of mbuf_avail and checks elsewhere. > I don't think so. It would benifit for the small packets. What if, > however, when TSO or jumbo frame is enabled that we have big packets? Prefer to favor the path that packet could fit in one mbuf. 
Btw, not specific to this patch: for the return "m = copy_desc_to_mbuf(dev, vq, desc_indexes[i], mbuf_pool)", the failure case is unlikely to happen, so add unlikely to the check if (m == NULL) there. Please check all branches elsewhere. > --yliu > ^ permalink raw reply [flat|nested] 84+ messages in thread
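The `unlikely()` annotation being debated here is DPDK's wrapper around GCC's `__builtin_expect`: it does not change the value of the condition, only biases the compiler's basic-block layout toward the expected path — which is exactly why the two disagree about which path (one-mbuf packets vs. TSO/jumbo multi-segment packets) should be treated as hot. A minimal standalone sketch, assuming GCC/Clang:

```c
#include <assert.h>

/* What DPDK's rte_branch_prediction.h macros expand to (simplified):
 * the hint tells the compiler the condition is expected to be false,
 * so the error path is laid out off the fall-through path. */
#define unlikely(x) __builtin_expect(!!(x), 0)

static int
checked_div(int a, int b)
{
	if (unlikely(b == 0))	/* error path, hinted as cold */
		return -1;
	return a / b;
}
```

Since the hint is purely a code-layout bias, annotating a branch that is actually taken often (e.g. `mbuf_avail == 0` on every segment of a jumbo frame) can pessimize the hot path — the substance of Yuanhan's objection above.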
* Re: [dpdk-dev] [PATCH v2 1/7] vhost: refactor rte_vhost_dequeue_burst 2016-03-07 2:32 ` Xie, Huawei @ 2016-03-07 2:48 ` Yuanhan Liu 2016-03-07 2:59 ` Xie, Huawei 0 siblings, 1 reply; 84+ messages in thread From: Yuanhan Liu @ 2016-03-07 2:48 UTC (permalink / raw) To: Xie, Huawei; +Cc: Michael S. Tsirkin, dev, Victor Kaplansky On Mon, Mar 07, 2016 at 02:32:46AM +0000, Xie, Huawei wrote: > On 3/4/2016 10:15 AM, Yuanhan Liu wrote: > > On Thu, Mar 03, 2016 at 04:30:42PM +0000, Xie, Huawei wrote: > >> On 2/18/2016 9:48 PM, Yuanhan Liu wrote: > >>> + mbuf_avail = 0; > >>> + mbuf_offset = 0; > >>> + while (desc_avail || (desc->flags & VRING_DESC_F_NEXT) != 0) { > >>> + /* This desc reachs to its end, get the next one */ > >>> + if (desc_avail == 0) { > >>> + desc = &vq->desc[desc->next]; > >>> + > >>> + desc_addr = gpa_to_vva(dev, desc->addr); > >>> + rte_prefetch0((void *)(uintptr_t)desc_addr); > >>> + > >>> + desc_offset = 0; > >>> + desc_avail = desc->len; > >>> + > >>> + PRINT_PACKET(dev, (uintptr_t)desc_addr, desc->len, 0); > >>> + } > >>> + > >>> + /* > >>> + * This mbuf reachs to its end, get a new one > >>> + * to hold more data. > >>> + */ > >>> + if (mbuf_avail == 0) { > >>> + cur = rte_pktmbuf_alloc(mbuf_pool); > >>> + if (unlikely(!cur)) { > >>> + RTE_LOG(ERR, VHOST_DATA, "Failed to " > >>> + "allocate memory for mbuf.\n"); > >>> + if (head) > >>> + rte_pktmbuf_free(head); > >>> + return NULL; > >>> + } > >> We could always allocate the head mbuf before the loop, then we save the > >> following branch and make the code more streamlined. > >> It reminds me that this change prevents the possibility of mbuf bulk > >> allocation, one solution is we pass the head mbuf from an additional > >> parameter. > > Yep, that's also something I have thought of. > > > >> Btw, put unlikely before the check of mbuf_avail and checks elsewhere. > > I don't think so. It would benifit for the small packets. 
What if, > > however, when TSO or jumbo frame is enabled that we have big packets? > > Prefer to favor the path that packet could fit in one mbuf. Hmmm, why? While I know that TSO and mergeable buf is disable by default in our vhost example vhost-switch, they are enabled by default in real life. > Btw, not specially to this, return "m = copy_desc_to_mbuf(dev, vq, > desc_indexes[i], mbuf_pool)", failure case is unlikely to happen, so add > unlikely for the check if (m == NULL) there. Please check all branches > elsewhere. Thanks for the remind, will have a detail check. --yliu ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH v2 1/7] vhost: refactor rte_vhost_dequeue_burst 2016-03-07 2:48 ` Yuanhan Liu @ 2016-03-07 2:59 ` Xie, Huawei 2016-03-07 6:14 ` Yuanhan Liu 0 siblings, 1 reply; 84+ messages in thread From: Xie, Huawei @ 2016-03-07 2:59 UTC (permalink / raw) To: Yuanhan Liu; +Cc: Michael S. Tsirkin, dev, Victor Kaplansky On 3/7/2016 10:47 AM, Yuanhan Liu wrote: > On Mon, Mar 07, 2016 at 02:32:46AM +0000, Xie, Huawei wrote: >> On 3/4/2016 10:15 AM, Yuanhan Liu wrote: >>> On Thu, Mar 03, 2016 at 04:30:42PM +0000, Xie, Huawei wrote: >>>> On 2/18/2016 9:48 PM, Yuanhan Liu wrote: >>>>> + mbuf_avail = 0; >>>>> + mbuf_offset = 0; >>>>> + while (desc_avail || (desc->flags & VRING_DESC_F_NEXT) != 0) { >>>>> + /* This desc reachs to its end, get the next one */ >>>>> + if (desc_avail == 0) { >>>>> + desc = &vq->desc[desc->next]; >>>>> + >>>>> + desc_addr = gpa_to_vva(dev, desc->addr); >>>>> + rte_prefetch0((void *)(uintptr_t)desc_addr); >>>>> + >>>>> + desc_offset = 0; >>>>> + desc_avail = desc->len; >>>>> + >>>>> + PRINT_PACKET(dev, (uintptr_t)desc_addr, desc->len, 0); >>>>> + } >>>>> + >>>>> + /* >>>>> + * This mbuf reachs to its end, get a new one >>>>> + * to hold more data. >>>>> + */ >>>>> + if (mbuf_avail == 0) { >>>>> + cur = rte_pktmbuf_alloc(mbuf_pool); >>>>> + if (unlikely(!cur)) { >>>>> + RTE_LOG(ERR, VHOST_DATA, "Failed to " >>>>> + "allocate memory for mbuf.\n"); >>>>> + if (head) >>>>> + rte_pktmbuf_free(head); >>>>> + return NULL; >>>>> + } >>>> We could always allocate the head mbuf before the loop, then we save the >>>> following branch and make the code more streamlined. >>>> It reminds me that this change prevents the possibility of mbuf bulk >>>> allocation, one solution is we pass the head mbuf from an additional >>>> parameter. >>> Yep, that's also something I have thought of. >>> >>>> Btw, put unlikely before the check of mbuf_avail and checks elsewhere. >>> I don't think so. It would benifit for the small packets. 
What if, >>> however, when TSO or jumbo frame is enabled that we have big packets? >> Prefer to favor the path that packet could fit in one mbuf. > Hmmm, why? While I know that TSO and mergeable buf is disable by default > in our vhost example vhost-switch, they are enabled by default in real > life. mergable is only meaningful in RX path. If you mean big packets in TX path, i personally favor the path that packet fits in one mbuf. >> Btw, not specially to this, return "m = copy_desc_to_mbuf(dev, vq, >> desc_indexes[i], mbuf_pool)", failure case is unlikely to happen, so add >> unlikely for the check if (m == NULL) there. Please check all branches >> elsewhere. > Thanks for the remind, will have a detail check. > > --yliu > ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH v2 1/7] vhost: refactor rte_vhost_dequeue_burst 2016-03-07 2:59 ` Xie, Huawei @ 2016-03-07 6:14 ` Yuanhan Liu 0 siblings, 0 replies; 84+ messages in thread From: Yuanhan Liu @ 2016-03-07 6:14 UTC (permalink / raw) To: Xie, Huawei; +Cc: Michael S. Tsirkin, dev, Victor Kaplansky On Mon, Mar 07, 2016 at 02:59:55AM +0000, Xie, Huawei wrote: > On 3/7/2016 10:47 AM, Yuanhan Liu wrote: > > On Mon, Mar 07, 2016 at 02:32:46AM +0000, Xie, Huawei wrote: > >> On 3/4/2016 10:15 AM, Yuanhan Liu wrote: > >>> On Thu, Mar 03, 2016 at 04:30:42PM +0000, Xie, Huawei wrote: > >>>> On 2/18/2016 9:48 PM, Yuanhan Liu wrote: > >>>> We could always allocate the head mbuf before the loop, then we save the > >>>> following branch and make the code more streamlined. > >>>> It reminds me that this change prevents the possibility of mbuf bulk > >>>> allocation, one solution is we pass the head mbuf from an additional > >>>> parameter. > >>> Yep, that's also something I have thought of. > >>> > >>>> Btw, put unlikely before the check of mbuf_avail and checks elsewhere. > >>> I don't think so. It would benifit for the small packets. What if, > >>> however, when TSO or jumbo frame is enabled that we have big packets? > >> Prefer to favor the path that packet could fit in one mbuf. > > Hmmm, why? While I know that TSO and mergeable buf is disable by default > > in our vhost example vhost-switch, they are enabled by default in real > > life. > > mergable is only meaningful in RX path. > If you mean big packets in TX path, Sorry, and yes, I meant that. > i personally favor the path that > packet fits in one mbuf. Sorry, that will not convince me. --yliu ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH v2 1/7] vhost: refactor rte_vhost_dequeue_burst 2016-02-18 13:49 ` [dpdk-dev] [PATCH v2 1/7] vhost: refactor rte_vhost_dequeue_burst Yuanhan Liu 2016-03-03 16:21 ` Xie, Huawei 2016-03-03 16:30 ` Xie, Huawei @ 2016-03-03 17:19 ` Xie, Huawei 2016-03-04 2:11 ` Yuanhan Liu 2016-03-03 17:40 ` Xie, Huawei 2016-03-07 3:03 ` Xie, Huawei 4 siblings, 1 reply; 84+ messages in thread From: Xie, Huawei @ 2016-03-03 17:19 UTC (permalink / raw) To: Yuanhan Liu, dev; +Cc: Michael S. Tsirkin, Victor Kaplansky On 2/18/2016 9:48 PM, Yuanhan Liu wrote: > [...] CCed changchun, the author for the chained handling of desc and mbuf. The change makes the code more readable, but i think the following commit message is simple and enough. > > while (this_desc_is_not_drained_totally || has_next_desc) { > if (this_desc_has_drained_totally) { > this_desc = next_desc(); > } > > if (mbuf_has_no_room) { > mbuf = allocate_a_new_mbuf(); > } > > COPY(mbuf, desc); > } > > [...] > > This refactor makes the code much more readable (IMO), yet it reduces > binary code size (nearly 2K). I guess the reduced binary code size comes from reduced inline calls to mbuf allocation. ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH v2 1/7] vhost: refactor rte_vhost_dequeue_burst 2016-03-03 17:19 ` Xie, Huawei @ 2016-03-04 2:11 ` Yuanhan Liu 2016-03-07 2:55 ` Xie, Huawei 0 siblings, 1 reply; 84+ messages in thread From: Yuanhan Liu @ 2016-03-04 2:11 UTC (permalink / raw) To: Xie, Huawei; +Cc: Michael S. Tsirkin, dev, Victor Kaplansky On Thu, Mar 03, 2016 at 05:19:42PM +0000, Xie, Huawei wrote: > On 2/18/2016 9:48 PM, Yuanhan Liu wrote: > > [...] > CCed changchun, the author for the chained handling of desc and mbuf. > The change makes the code more readable, but i think the following > commit message is simple and enough. Hmm.., my commit log tells a full story: - What is the issue? (messy/logic twisted code) - What the code does? (And what are the challenges: few tricky places) - What's the proposed solution to fix it. (the below pseudo code) And you suggest me to get rid of the first 2 items and leave 3rd item (a solution) only? --yliu > > > > while (this_desc_is_not_drained_totally || has_next_desc) { > > if (this_desc_has_drained_totally) { > > this_desc = next_desc(); > > } > > > > if (mbuf_has_no_room) { > > mbuf = allocate_a_new_mbuf(); > > } > > > > COPY(mbuf, desc); > > } > > > > [...] > > > > This refactor makes the code much more readable (IMO), yet it reduces > > binary code size (nearly 2K). > I guess the reduced binary code size comes from reduced inline calls to > mbuf allocation. > ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH v2 1/7] vhost: refactor rte_vhost_dequeue_burst 2016-03-04 2:11 ` Yuanhan Liu @ 2016-03-07 2:55 ` Xie, Huawei 0 siblings, 0 replies; 84+ messages in thread From: Xie, Huawei @ 2016-03-07 2:55 UTC (permalink / raw) To: Yuanhan Liu; +Cc: Michael S. Tsirkin, dev, Victor Kaplansky On 3/4/2016 10:10 AM, Yuanhan Liu wrote: > On Thu, Mar 03, 2016 at 05:19:42PM +0000, Xie, Huawei wrote: >> On 2/18/2016 9:48 PM, Yuanhan Liu wrote: >>> [...] >> CCed changchun, the author for the chained handling of desc and mbuf. >> The change makes the code more readable, but i think the following >> commit message is simple and enough. > Hmm.., my commit log tells a full story: > > - What is the issue? (messy/logic twisted code) > > - What the code does? (And what are the challenges: few tricky places) > > - What's the proposed solution to fix it. (the below pseudo code) > > And you suggest me to get rid of the first 2 items and leave 3rd item > (a solution) only? The following are simple and enough with just one additional statement for the repeated mbuf allocation or your twisted. Other commit messages are overly duplicated. Just my personal opinion. Up to you. To this special case, for example, we could make both mbuf and vring_desc chains into iovec, then use commonly used iovec copy algorithms for both dequeue and enqueue, which further makes the code much simpler and more readable. For this change, one or two sentences are clear to me. > --yliu > >>> while (this_desc_is_not_drained_totally || has_next_desc) { >>> if (this_desc_has_drained_totally) { >>> this_desc = next_desc(); >>> } >>> >>> if (mbuf_has_no_room) { >>> mbuf = allocate_a_new_mbuf(); >>> } >>> >>> COPY(mbuf, desc); >>> } >>> >>> [...] >>> >>> This refactor makes the code much more readable (IMO), yet it reduces >>> binary code size (nearly 2K). >> I guess the reduced binary code size comes from reduced inline calls to >> mbuf allocation. 
>> ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH v2 1/7] vhost: refactor rte_vhost_dequeue_burst 2016-02-18 13:49 ` [dpdk-dev] [PATCH v2 1/7] vhost: refactor rte_vhost_dequeue_burst Yuanhan Liu ` (2 preceding siblings ...) 2016-03-03 17:19 ` Xie, Huawei @ 2016-03-03 17:40 ` Xie, Huawei 2016-03-04 2:32 ` Yuanhan Liu 2016-03-07 3:03 ` Xie, Huawei 4 siblings, 1 reply; 84+ messages in thread From: Xie, Huawei @ 2016-03-03 17:40 UTC (permalink / raw) To: Yuanhan Liu, dev; +Cc: Victor Kaplansky, Michael S. Tsirkin On 2/18/2016 9:48 PM, Yuanhan Liu wrote: > The current rte_vhost_dequeue_burst() implementation is a bit messy [...] > + > uint16_t > rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, > struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count) > { > - struct rte_mbuf *m, *prev; > struct vhost_virtqueue *vq; > - struct vring_desc *desc; > - uint64_t vb_addr = 0; > - uint64_t vb_net_hdr_addr = 0; > - uint32_t head[MAX_PKT_BURST]; > + uint32_t desc_indexes[MAX_PKT_BURST]; indices > uint32_t used_idx; > uint32_t i; > - uint16_t free_entries, entry_success = 0; > + uint16_t free_entries; > uint16_t avail_idx; > - struct virtio_net_hdr *hdr = NULL; > + struct rte_mbuf *m; > > if (unlikely(!is_valid_virt_queue_idx(queue_id, 1, dev->virt_qp_nb))) { > RTE_LOG(ERR, VHOST_DATA, > @@ -730,197 +813,49 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, > return 0; > > avail_idx = *((volatile uint16_t *)&vq->avail->idx); > - > - /* If there are no available buffers then return. */ > - if (vq->last_used_idx == avail_idx) > + free_entries = avail_idx - vq->last_used_idx; > + if (free_entries == 0) > return 0; > > - LOG_DEBUG(VHOST_DATA, "%s (%"PRIu64")\n", __func__, > - dev->device_fh); > + LOG_DEBUG(VHOST_DATA, "%s (%"PRIu64")\n", __func__, dev->device_fh); > > - /* Prefetch available ring to retrieve head indexes. 
*/ > - rte_prefetch0(&vq->avail->ring[vq->last_used_idx & (vq->size - 1)]); > + used_idx = vq->last_used_idx & (vq->size -1); > > - /*get the number of free entries in the ring*/ > - free_entries = (avail_idx - vq->last_used_idx); > + /* Prefetch available ring to retrieve head indexes. */ > + rte_prefetch0(&vq->avail->ring[used_idx]); > > - free_entries = RTE_MIN(free_entries, count); > - /* Limit to MAX_PKT_BURST. */ > - free_entries = RTE_MIN(free_entries, MAX_PKT_BURST); > + count = RTE_MIN(count, MAX_PKT_BURST); > + count = RTE_MIN(count, free_entries); > + LOG_DEBUG(VHOST_DATA, "(%"PRIu64") about to dequeue %u buffers\n", > + dev->device_fh, count); > > - LOG_DEBUG(VHOST_DATA, "(%"PRIu64") Buffers available %d\n", > - dev->device_fh, free_entries); > /* Retrieve all of the head indexes first to avoid caching issues. */ > - for (i = 0; i < free_entries; i++) > - head[i] = vq->avail->ring[(vq->last_used_idx + i) & (vq->size - 1)]; > + for (i = 0; i < count; i++) { > + desc_indexes[i] = vq->avail->ring[(vq->last_used_idx + i) & > + (vq->size - 1)]; > + } > > /* Prefetch descriptor index. */ > - rte_prefetch0(&vq->desc[head[entry_success]]); > + rte_prefetch0(&vq->desc[desc_indexes[0]]); > rte_prefetch0(&vq->used->ring[vq->last_used_idx & (vq->size - 1)]); > > - while (entry_success < free_entries) { > - uint32_t vb_avail, vb_offset; > - uint32_t seg_avail, seg_offset; > - uint32_t cpy_len; > - uint32_t seg_num = 0; > - struct rte_mbuf *cur; > - uint8_t alloc_err = 0; > - > - desc = &vq->desc[head[entry_success]]; > - > - vb_net_hdr_addr = gpa_to_vva(dev, desc->addr); > - hdr = (struct virtio_net_hdr *)((uintptr_t)vb_net_hdr_addr); > - > - /* Discard first buffer as it is the virtio header */ > - if (desc->flags & VRING_DESC_F_NEXT) { > - desc = &vq->desc[desc->next]; > - vb_offset = 0; > - vb_avail = desc->len; > - } else { > - vb_offset = vq->vhost_hlen; > - vb_avail = desc->len - vb_offset; > - } > - > - /* Buffer address translation. 
*/ > - vb_addr = gpa_to_vva(dev, desc->addr); > - /* Prefetch buffer address. */ > - rte_prefetch0((void *)(uintptr_t)vb_addr); > - > - used_idx = vq->last_used_idx & (vq->size - 1); > - > - if (entry_success < (free_entries - 1)) { > - /* Prefetch descriptor index. */ > - rte_prefetch0(&vq->desc[head[entry_success+1]]); > - rte_prefetch0(&vq->used->ring[(used_idx + 1) & (vq->size - 1)]); > - } Why is this prefetch silently dropped in the patch? > - > - /* Update used index buffer information. */ > - vq->used->ring[used_idx].id = head[entry_success]; > - vq->used->ring[used_idx].len = 0; > - > - /* Allocate an mbuf and populate the structure. */ > - m = rte_pktmbuf_alloc(mbuf_pool); > - if (unlikely(m == NULL)) { > - RTE_LOG(ERR, VHOST_DATA, > - "Failed to allocate memory for mbuf.\n"); > - break; > - } > - seg_offset = 0; > - seg_avail = m->buf_len - RTE_PKTMBUF_HEADROOM; > - cpy_len = RTE_MIN(vb_avail, seg_avail); > - > - PRINT_PACKET(dev, (uintptr_t)vb_addr, desc->len, 0); > - > - seg_num++; > - cur = m; > - prev = m; > - while (cpy_len != 0) { > - rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *, seg_offset), > - (void *)((uintptr_t)(vb_addr + vb_offset)), > - cpy_len); > - > - seg_offset += cpy_len; > - vb_offset += cpy_len; > - vb_avail -= cpy_len; > - seg_avail -= cpy_len; > - > - if (vb_avail != 0) { > - /* > - * The segment reachs to its end, > - * while the virtio buffer in TX vring has > - * more data to be copied. > - */ > - cur->data_len = seg_offset; > - m->pkt_len += seg_offset; > - /* Allocate mbuf and populate the structure. 
*/ > - cur = rte_pktmbuf_alloc(mbuf_pool); > - if (unlikely(cur == NULL)) { > - RTE_LOG(ERR, VHOST_DATA, "Failed to " > - "allocate memory for mbuf.\n"); > - rte_pktmbuf_free(m); > - alloc_err = 1; > - break; > - } > - > - seg_num++; > - prev->next = cur; > - prev = cur; > - seg_offset = 0; > - seg_avail = cur->buf_len - RTE_PKTMBUF_HEADROOM; > - } else { > - if (desc->flags & VRING_DESC_F_NEXT) { > - /* > - * There are more virtio buffers in > - * same vring entry need to be copied. > - */ > - if (seg_avail == 0) { > - /* > - * The current segment hasn't > - * room to accomodate more > - * data. > - */ > - cur->data_len = seg_offset; > - m->pkt_len += seg_offset; > - /* > - * Allocate an mbuf and > - * populate the structure. > - */ > - cur = rte_pktmbuf_alloc(mbuf_pool); > - if (unlikely(cur == NULL)) { > - RTE_LOG(ERR, > - VHOST_DATA, > - "Failed to " > - "allocate memory " > - "for mbuf\n"); > - rte_pktmbuf_free(m); > - alloc_err = 1; > - break; > - } > - seg_num++; > - prev->next = cur; > - prev = cur; > - seg_offset = 0; > - seg_avail = cur->buf_len - RTE_PKTMBUF_HEADROOM; > - } > - > - desc = &vq->desc[desc->next]; > - > - /* Buffer address translation. */ > - vb_addr = gpa_to_vva(dev, desc->addr); > - /* Prefetch buffer address. */ > - rte_prefetch0((void *)(uintptr_t)vb_addr); > - vb_offset = 0; > - vb_avail = desc->len; > - > - PRINT_PACKET(dev, (uintptr_t)vb_addr, > - desc->len, 0); > - } else { > - /* The whole packet completes. 
*/ > - cur->data_len = seg_offset; > - m->pkt_len += seg_offset; > - vb_avail = 0; > - } > - } > - > - cpy_len = RTE_MIN(vb_avail, seg_avail); > - } > - > - if (unlikely(alloc_err == 1)) > + for (i = 0; i < count; i++) { > + m = copy_desc_to_mbuf(dev, vq, desc_indexes[i], mbuf_pool); > + if (m == NULL) add unlikely for every case not possible to happen > break; > + pkts[i] = m; > > - m->nb_segs = seg_num; > - if ((hdr->flags != 0) || (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE)) > - vhost_dequeue_offload(hdr, m); > - > - pkts[entry_success] = m; > - vq->last_used_idx++; > - entry_success++; > + used_idx = vq->last_used_idx++ & (vq->size - 1); > + vq->used->ring[used_idx].id = desc_indexes[i]; > + vq->used->ring[used_idx].len = 0; What is the correct value for ring[used_idx].len, the packet length or 0? > } > > rte_compiler_barrier(); > - vq->used->idx += entry_success; > + vq->used->idx += i; > + > /* Kick guest if required. */ > if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)) > eventfd_write(vq->callfd, (eventfd_t)1); > - return entry_success; > + > + return i; > } ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH v2 1/7] vhost: refactor rte_vhost_dequeue_burst 2016-03-03 17:40 ` Xie, Huawei @ 2016-03-04 2:32 ` Yuanhan Liu 2016-03-07 3:02 ` Xie, Huawei 0 siblings, 1 reply; 84+ messages in thread From: Yuanhan Liu @ 2016-03-04 2:32 UTC (permalink / raw) To: Xie, Huawei; +Cc: Michael S. Tsirkin, dev, Victor Kaplansky On Thu, Mar 03, 2016 at 05:40:14PM +0000, Xie, Huawei wrote: > On 2/18/2016 9:48 PM, Yuanhan Liu wrote: > > The current rte_vhost_dequeue_burst() implementation is a bit messy > [...] > > + > > uint16_t > > rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, > > struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count) > > { > > - struct rte_mbuf *m, *prev; > > struct vhost_virtqueue *vq; > > - struct vring_desc *desc; > > - uint64_t vb_addr = 0; > > - uint64_t vb_net_hdr_addr = 0; > > - uint32_t head[MAX_PKT_BURST]; > > + uint32_t desc_indexes[MAX_PKT_BURST]; > > indices http://dictionary.reference.com/browse/index index noun, plural indexes, indices > > > > uint32_t used_idx; > > uint32_t i; > > - uint16_t free_entries, entry_success = 0; > > + uint16_t free_entries; > > uint16_t avail_idx; > > - struct virtio_net_hdr *hdr = NULL; > > + struct rte_mbuf *m; > > > > if (unlikely(!is_valid_virt_queue_idx(queue_id, 1, dev->virt_qp_nb))) { > > RTE_LOG(ERR, VHOST_DATA, > > @@ -730,197 +813,49 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, > > return 0; > > > > - if (entry_success < (free_entries - 1)) { > > - /* Prefetch descriptor index. */ > > - rte_prefetch0(&vq->desc[head[entry_success+1]]); > > - rte_prefetch0(&vq->used->ring[(used_idx + 1) & (vq->size - 1)]); > > - } > > Why is this prefetch silently dropped in the patch? Oops, good catching. Will fix it. Thanks. 
> > break; > > + pkts[i] = m; > > > > - m->nb_segs = seg_num; > > - if ((hdr->flags != 0) || (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE)) > > - vhost_dequeue_offload(hdr, m); > > - > > - pkts[entry_success] = m; > > - vq->last_used_idx++; > > - entry_success++; > > + used_idx = vq->last_used_idx++ & (vq->size - 1); > > + vq->used->ring[used_idx].id = desc_indexes[i]; > > + vq->used->ring[used_idx].len = 0; > > What is the correct value for ring[used_idx].len, the packet length or 0? Good question. I didn't notice that before. Sounds buggy to me. However, that's from the old code. Will check it. --yliu ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH v2 1/7] vhost: refactor rte_vhost_dequeue_burst 2016-03-04 2:32 ` Yuanhan Liu @ 2016-03-07 3:02 ` Xie, Huawei 0 siblings, 0 replies; 84+ messages in thread From: Xie, Huawei @ 2016-03-07 3:02 UTC (permalink / raw) To: Yuanhan Liu; +Cc: Michael S. Tsirkin, dev, Victor Kaplansky On 3/4/2016 10:30 AM, Yuanhan Liu wrote: > On Thu, Mar 03, 2016 at 05:40:14PM +0000, Xie, Huawei wrote: >> On 2/18/2016 9:48 PM, Yuanhan Liu wrote: >>> The current rte_vhost_dequeue_burst() implementation is a bit messy >> [...] >>> + >>> uint16_t >>> rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, >>> struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count) >>> { >>> - struct rte_mbuf *m, *prev; >>> struct vhost_virtqueue *vq; >>> - struct vring_desc *desc; >>> - uint64_t vb_addr = 0; >>> - uint64_t vb_net_hdr_addr = 0; >>> - uint32_t head[MAX_PKT_BURST]; >>> + uint32_t desc_indexes[MAX_PKT_BURST]; >> indices > http://dictionary.reference.com/browse/index > > index > noun, plural indexes, indices ok, i see both two are used. >> >>> uint32_t used_idx; >>> uint32_t i; >>> - uint16_t free_entries, entry_success = 0; >>> + uint16_t free_entries; >>> uint16_t avail_idx; >>> - struct virtio_net_hdr *hdr = NULL; >>> + struct rte_mbuf *m; >>> >>> if (unlikely(!is_valid_virt_queue_idx(queue_id, 1, dev->virt_qp_nb))) { >>> RTE_LOG(ERR, VHOST_DATA, >>> @@ -730,197 +813,49 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, >>> return 0; >>> >>> - if (entry_success < (free_entries - 1)) { >>> - /* Prefetch descriptor index. */ >>> - rte_prefetch0(&vq->desc[head[entry_success+1]]); >>> - rte_prefetch0(&vq->used->ring[(used_idx + 1) & (vq->size - 1)]); >>> - } >> Why is this prefetch silently dropped in the patch? > Oops, good catching. Will fix it. Thanks. 
> > >>> break; >>> + pkts[i] = m; >>> >>> - m->nb_segs = seg_num; >>> - if ((hdr->flags != 0) || (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE)) >>> - vhost_dequeue_offload(hdr, m); >>> - >>> - pkts[entry_success] = m; >>> - vq->last_used_idx++; >>> - entry_success++; >>> + used_idx = vq->last_used_idx++ & (vq->size - 1); >>> + vq->used->ring[used_idx].id = desc_indexes[i]; >>> + vq->used->ring[used_idx].len = 0; >> What is the correct value for ring[used_idx].len, the packet length or 0? > Good question. I didn't notice that before. Sounds buggy to me. However, > that's from the old code. Will check it. Yes, i knew it is in old code also. Thanks. > --yliu > ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH v2 1/7] vhost: refactor rte_vhost_dequeue_burst 2016-02-18 13:49 ` [dpdk-dev] [PATCH v2 1/7] vhost: refactor rte_vhost_dequeue_burst Yuanhan Liu ` (3 preceding siblings ...) 2016-03-03 17:40 ` Xie, Huawei @ 2016-03-07 3:03 ` Xie, Huawei 4 siblings, 0 replies; 84+ messages in thread From: Xie, Huawei @ 2016-03-07 3:03 UTC (permalink / raw) To: Yuanhan Liu, dev; +Cc: Victor Kaplansky, Michael S. Tsirkin On 2/18/2016 9:48 PM, Yuanhan Liu wrote: > + mbuf_avail = 0; > + mbuf_offset = 0; One coding-style nit: move these initializations to the definitions. ^ permalink raw reply [flat|nested] 84+ messages in thread
* [dpdk-dev] [PATCH v2 2/7] vhost: refactor virtio_dev_rx 2016-02-18 13:49 ` [dpdk-dev] [PATCH v2 0/7] " Yuanhan Liu 2016-02-18 13:49 ` [dpdk-dev] [PATCH v2 1/7] vhost: refactor rte_vhost_dequeue_burst Yuanhan Liu @ 2016-02-18 13:49 ` Yuanhan Liu 2016-03-07 3:34 ` Xie, Huawei 2016-02-18 13:49 ` [dpdk-dev] [PATCH v2 3/7] vhost: refactor virtio_dev_merge_rx Yuanhan Liu ` (6 subsequent siblings) 8 siblings, 1 reply; 84+ messages in thread From: Yuanhan Liu @ 2016-02-18 13:49 UTC (permalink / raw) To: dev; +Cc: Michael S. Tsirkin, Victor Kaplansky This is a simple refactor, as there isn't any twisted logic in old code. Here I just broke the code and introduced two helper functions, reserve_avail_buf() and copy_mbuf_to_desc() to make the code more readable. Also, it saves nearly 1K bytes of binary code size. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> --- v2: - fix NULL dereference bug found by Rich. --- lib/librte_vhost/vhost_rxtx.c | 286 ++++++++++++++++++++---------------------- 1 file changed, 137 insertions(+), 149 deletions(-) diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c index d5cd0fa..d3775ad 100644 --- a/lib/librte_vhost/vhost_rxtx.c +++ b/lib/librte_vhost/vhost_rxtx.c @@ -92,6 +92,115 @@ virtio_enqueue_offload(struct rte_mbuf *m_buf, struct virtio_net_hdr *net_hdr) return; } +static inline int __attribute__((always_inline)) +copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq, + struct rte_mbuf *m, uint16_t desc_idx, uint32_t *copied) +{ + uint32_t desc_avail, desc_offset; + uint32_t mbuf_avail, mbuf_offset; + uint32_t cpy_len; + struct vring_desc *desc; + uint64_t desc_addr; + struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0}; + + desc = &vq->desc[desc_idx]; + desc_addr = gpa_to_vva(dev, desc->addr); + rte_prefetch0((void *)(uintptr_t)desc_addr); + + virtio_enqueue_offload(m, &virtio_hdr.hdr); + rte_memcpy((void *)(uintptr_t)desc_addr, + (const void *)&virtio_hdr, vq->vhost_hlen); + 
PRINT_PACKET(dev, (uintptr_t)desc_addr, vq->vhost_hlen, 0); + + desc_offset = vq->vhost_hlen; + desc_avail = desc->len - vq->vhost_hlen; + + *copied = rte_pktmbuf_pkt_len(m); + mbuf_avail = rte_pktmbuf_data_len(m); + mbuf_offset = 0; + while (1) { + /* done with current mbuf, fetch next */ + if (mbuf_avail == 0) { + m = m->next; + if (m == NULL) + break; + + mbuf_offset = 0; + mbuf_avail = rte_pktmbuf_data_len(m); + } + + /* done with current desc buf, fetch next */ + if (desc_avail == 0) { + if ((desc->flags & VRING_DESC_F_NEXT) == 0) { + /* Room in vring buffer is not enough */ + return -1; + } + + desc = &vq->desc[desc->next]; + desc_addr = gpa_to_vva(dev, desc->addr); + desc_offset = 0; + desc_avail = desc->len; + } + + cpy_len = RTE_MIN(desc_avail, mbuf_avail); + rte_memcpy((void *)((uintptr_t)(desc_addr + desc_offset)), + rte_pktmbuf_mtod_offset(m, void *, mbuf_offset), + cpy_len); + PRINT_PACKET(dev, (uintptr_t)(desc_addr + desc_offset), + cpy_len, 0); + + mbuf_avail -= cpy_len; + mbuf_offset += cpy_len; + desc_avail -= cpy_len; + desc_offset += cpy_len; + } + + return 0; +} + +/* + * As many data cores may want to access available buffers + * they need to be reserved. + */ +static inline uint32_t +reserve_avail_buf(struct vhost_virtqueue *vq, uint32_t count, + uint16_t *start, uint16_t *end) +{ + uint16_t res_start_idx; + uint16_t res_end_idx; + uint16_t avail_idx; + uint16_t free_entries; + int success; + + count = RTE_MIN(count, (uint32_t)MAX_PKT_BURST); + +again: + res_start_idx = vq->last_used_idx_res; + avail_idx = *((volatile uint16_t *)&vq->avail->idx); + + free_entries = (avail_idx - res_start_idx); + count = RTE_MIN(count, free_entries); + if (count == 0) + return 0; + + res_end_idx = res_start_idx + count; + + /* + * update vq->last_used_idx_res atomically; try again if failed. + * + * TODO: Allow to disable cmpset if no concurrency in application. 
+ */ + success = rte_atomic16_cmpset(&vq->last_used_idx_res, + res_start_idx, res_end_idx); + if (!success) + goto again; + + *start = res_start_idx; + *end = res_end_idx; + + return count; +} + /** * This function adds buffers to the virtio devices RX virtqueue. Buffers can * be received from the physical port or from another virtio device. A packet @@ -101,21 +210,12 @@ virtio_enqueue_offload(struct rte_mbuf *m_buf, struct virtio_net_hdr *net_hdr) */ static inline uint32_t __attribute__((always_inline)) virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id, - struct rte_mbuf **pkts, uint32_t count) + struct rte_mbuf **pkts, uint32_t count) { struct vhost_virtqueue *vq; - struct vring_desc *desc; - struct rte_mbuf *buff, *first_buff; - /* The virtio_hdr is initialised to 0. */ - struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0}; - uint64_t buff_addr = 0; - uint64_t buff_hdr_addr = 0; - uint32_t head[MAX_PKT_BURST]; - uint32_t head_idx, packet_success = 0; - uint16_t avail_idx, res_cur_idx; - uint16_t res_base_idx, res_end_idx; - uint16_t free_entries; - uint8_t success = 0; + uint16_t res_start_idx, res_end_idx; + uint16_t desc_indexes[MAX_PKT_BURST]; + uint32_t i; LOG_DEBUG(VHOST_DATA, "(%"PRIu64") virtio_dev_rx()\n", dev->device_fh); if (unlikely(!is_valid_virt_queue_idx(queue_id, 0, dev->virt_qp_nb))) { @@ -129,155 +229,43 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id, if (unlikely(vq->enabled == 0)) return 0; - count = (count > MAX_PKT_BURST) ? MAX_PKT_BURST : count; - - /* - * As many data cores may want access to available buffers, - * they need to be reserved. 
- */ - do { - res_base_idx = vq->last_used_idx_res; - avail_idx = *((volatile uint16_t *)&vq->avail->idx); - - free_entries = (avail_idx - res_base_idx); - /*check that we have enough buffers*/ - if (unlikely(count > free_entries)) - count = free_entries; - - if (count == 0) - return 0; - - res_end_idx = res_base_idx + count; - /* vq->last_used_idx_res is atomically updated. */ - /* TODO: Allow to disable cmpset if no concurrency in application. */ - success = rte_atomic16_cmpset(&vq->last_used_idx_res, - res_base_idx, res_end_idx); - } while (unlikely(success == 0)); - res_cur_idx = res_base_idx; - LOG_DEBUG(VHOST_DATA, "(%"PRIu64") Current Index %d| End Index %d\n", - dev->device_fh, res_cur_idx, res_end_idx); - - /* Prefetch available ring to retrieve indexes. */ - rte_prefetch0(&vq->avail->ring[res_cur_idx & (vq->size - 1)]); - - /* Retrieve all of the head indexes first to avoid caching issues. */ - for (head_idx = 0; head_idx < count; head_idx++) - head[head_idx] = vq->avail->ring[(res_cur_idx + head_idx) & - (vq->size - 1)]; - - /*Prefetch descriptor index. */ - rte_prefetch0(&vq->desc[head[packet_success]]); - - while (res_cur_idx != res_end_idx) { - uint32_t offset = 0, vb_offset = 0; - uint32_t pkt_len, len_to_cpy, data_len, total_copied = 0; - uint8_t hdr = 0, uncompleted_pkt = 0; + count = reserve_avail_buf(vq, count, &res_start_idx, &res_end_idx); + if (count == 0) + return 0; - /* Get descriptor from available ring */ - desc = &vq->desc[head[packet_success]]; + LOG_DEBUG(VHOST_DATA, + "(%"PRIu64") res_start_idx %d| res_end_idx Index %d\n", + dev->device_fh, res_start_idx, res_end_idx); - buff = pkts[packet_success]; - first_buff = buff; + /* Retrieve all of the desc indexes first to avoid caching issues. 
*/ + rte_prefetch0(&vq->avail->ring[res_start_idx & (vq->size - 1)]); + for (i = 0; i < count; i++) + desc_indexes[i] = vq->avail->ring[(res_start_idx + i) & (vq->size - 1)]; - /* Convert from gpa to vva (guest physical addr -> vhost virtual addr) */ - buff_addr = gpa_to_vva(dev, desc->addr); - /* Prefetch buffer address. */ - rte_prefetch0((void *)(uintptr_t)buff_addr); + rte_prefetch0(&vq->desc[desc_indexes[0]]); + for (i = 0; i < count; i++) { + uint16_t desc_idx = desc_indexes[i]; + uint16_t used_idx = (res_start_idx + i) & (vq->size - 1); + uint32_t copied; + int err; - /* Copy virtio_hdr to packet and increment buffer address */ - buff_hdr_addr = buff_addr; + err = copy_mbuf_to_desc(dev, vq, pkts[i], desc_idx, &copied); - /* - * If the descriptors are chained the header and data are - * placed in separate buffers. - */ - if ((desc->flags & VRING_DESC_F_NEXT) && - (desc->len == vq->vhost_hlen)) { - desc = &vq->desc[desc->next]; - /* Buffer address translation. */ - buff_addr = gpa_to_vva(dev, desc->addr); + vq->used->ring[used_idx].id = desc_idx; + if (unlikely(err)) { + vq->used->ring[used_idx].len = vq->vhost_hlen; } else { - vb_offset += vq->vhost_hlen; - hdr = 1; - } - - pkt_len = rte_pktmbuf_pkt_len(buff); - data_len = rte_pktmbuf_data_len(buff); - len_to_cpy = RTE_MIN(data_len, - hdr ? 
desc->len - vq->vhost_hlen : desc->len); - while (total_copied < pkt_len) { - /* Copy mbuf data to buffer */ - rte_memcpy((void *)(uintptr_t)(buff_addr + vb_offset), - rte_pktmbuf_mtod_offset(buff, const void *, offset), - len_to_cpy); - PRINT_PACKET(dev, (uintptr_t)(buff_addr + vb_offset), - len_to_cpy, 0); - - offset += len_to_cpy; - vb_offset += len_to_cpy; - total_copied += len_to_cpy; - - /* The whole packet completes */ - if (total_copied == pkt_len) - break; - - /* The current segment completes */ - if (offset == data_len) { - buff = buff->next; - offset = 0; - data_len = rte_pktmbuf_data_len(buff); - } - - /* The current vring descriptor done */ - if (vb_offset == desc->len) { - if (desc->flags & VRING_DESC_F_NEXT) { - desc = &vq->desc[desc->next]; - buff_addr = gpa_to_vva(dev, desc->addr); - vb_offset = 0; - } else { - /* Room in vring buffer is not enough */ - uncompleted_pkt = 1; - break; - } - } - len_to_cpy = RTE_MIN(data_len - offset, desc->len - vb_offset); + vq->used->ring[used_idx].len = copied + vq->vhost_hlen; } - /* Update used ring with desc information */ - vq->used->ring[res_cur_idx & (vq->size - 1)].id = - head[packet_success]; - - /* Drop the packet if it is uncompleted */ - if (unlikely(uncompleted_pkt == 1)) - vq->used->ring[res_cur_idx & (vq->size - 1)].len = - vq->vhost_hlen; - else - vq->used->ring[res_cur_idx & (vq->size - 1)].len = - pkt_len + vq->vhost_hlen; - - res_cur_idx++; - packet_success++; - - if (unlikely(uncompleted_pkt == 1)) - continue; - - virtio_enqueue_offload(first_buff, &virtio_hdr.hdr); - - rte_memcpy((void *)(uintptr_t)buff_hdr_addr, - (const void *)&virtio_hdr, vq->vhost_hlen); - - PRINT_PACKET(dev, (uintptr_t)buff_hdr_addr, vq->vhost_hlen, 1); - - if (res_cur_idx < res_end_idx) { - /* Prefetch descriptor index. 
*/ - rte_prefetch0(&vq->desc[head[packet_success]]); - } + if (i + 1 < count) + rte_prefetch0(&vq->desc[desc_indexes[i+1]]); } rte_compiler_barrier(); /* Wait until it's our turn to add our buffer to the used ring. */ - while (unlikely(vq->last_used_idx != res_base_idx)) + while (unlikely(vq->last_used_idx != res_start_idx)) rte_pause(); *(volatile uint16_t *)&vq->used->idx += count; -- 1.9.0 ^ permalink raw reply [flat|nested] 84+ messages in thread
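The `reserve_avail_buf()` pattern in the patch above, where multiple data cores race to claim a window of ring entries via compare-and-set and retry on failure, can be sketched in isolation like this. The sketch uses GCC's `__atomic` builtins in place of `rte_atomic16_cmpset`, and all names are illustrative:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Reserve up to `count` slots from a shared free-running index.
 * On success, *start/*end describe the claimed window; returns the
 * number of slots actually reserved (clipped to what is free).
 * Mirrors the cmpset-retry loop in reserve_avail_buf().
 */
static uint16_t
reserve(uint16_t *reserved_idx, uint16_t avail_idx,
	uint16_t count, uint16_t *start, uint16_t *end)
{
	uint16_t old, free_entries;

	do {
		old = *reserved_idx;
		free_entries = (uint16_t)(avail_idx - old);
		if (count > free_entries)
			count = free_entries;
		if (count == 0)
			return 0;
		/* claim [old, old + count); retry if another core won */
	} while (!__atomic_compare_exchange_n(reserved_idx, &old,
			(uint16_t)(old + count), 0,
			__ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST));

	*start = old;
	*end = (uint16_t)(old + count);
	return count;
}
```

The CAS makes the reservation atomic across cores without a lock; the TODO in the patch notes that the cmpset could be disabled when the application guarantees a single enqueue thread per queue.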
* Re: [dpdk-dev] [PATCH v2 2/7] vhost: refactor virtio_dev_rx 2016-02-18 13:49 ` [dpdk-dev] [PATCH v2 2/7] vhost: refactor virtio_dev_rx Yuanhan Liu @ 2016-03-07 3:34 ` Xie, Huawei 2016-03-08 12:27 ` Yuanhan Liu 0 siblings, 1 reply; 84+ messages in thread From: Xie, Huawei @ 2016-03-07 3:34 UTC (permalink / raw) To: Yuanhan Liu, dev; +Cc: Victor Kaplansky, Michael S. Tsirkin On 2/18/2016 9:48 PM, Yuanhan Liu wrote: > + while (1) { > + /* done with current mbuf, fetch next */ > + if (mbuf_avail == 0) { > + m = m->next; > + if (m == NULL) > + break; > + > + mbuf_offset = 0; > + mbuf_avail = rte_pktmbuf_data_len(m); > + } > + You could use while (mbuf_avail || m->next) to align with the style of copy_desc_to_mbuf. ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH v2 2/7] vhost: refactor virtio_dev_rx 2016-03-07 3:34 ` Xie, Huawei @ 2016-03-08 12:27 ` Yuanhan Liu 0 siblings, 0 replies; 84+ messages in thread From: Yuanhan Liu @ 2016-03-08 12:27 UTC (permalink / raw) To: Xie, Huawei; +Cc: Michael S. Tsirkin, dev, Victor Kaplansky On Mon, Mar 07, 2016 at 03:34:53AM +0000, Xie, Huawei wrote: > On 2/18/2016 9:48 PM, Yuanhan Liu wrote: > > + while (1) { > > + /* done with current mbuf, fetch next */ > > + if (mbuf_avail == 0) { > > + m = m->next; > > + if (m == NULL) > > + break; > > + > > + mbuf_offset = 0; > > + mbuf_avail = rte_pktmbuf_data_len(m); > > + } > > + > > You could use while (mbuf_avail || m->next) to align with the style of > coyp_desc_to_mbuf? Good suggestion, will do that. Thanks. --yliu ^ permalink raw reply [flat|nested] 84+ messages in thread
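The loop shape Xie suggests, testing `mbuf_avail || m->next` in the condition rather than breaking out mid-body, can be illustrated with a simplified segment-walking loop. The `struct seg` type below is a toy stand-in for a chained rte_mbuf, not the real structure:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a chained rte_mbuf segment. */
struct seg {
	const char *data;
	size_t len;
	struct seg *next;
};

/*
 * Total bytes across a chain, walking segments with the suggested
 * `while (avail || m->next)` condition: the loop exits naturally
 * when the current segment is drained and no segment follows.
 */
static size_t
chain_len(struct seg *m)
{
	size_t avail = m->len;
	size_t total = 0;

	while (avail != 0 || m->next != NULL) {
		/* done with current segment, fetch next */
		if (avail == 0) {
			m = m->next;
			avail = m->len;
			continue;
		}
		total += avail;	/* in the real code: copy `avail` bytes */
		avail = 0;
	}
	return total;
}
```

The benefit over `while (1)` plus a mid-loop `break` is that the termination condition is visible in one place at the top of the loop, which is the consistency with copy_desc_to_mbuf that the review asks for.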
* [dpdk-dev] [PATCH v2 3/7] vhost: refactor virtio_dev_merge_rx 2016-02-18 13:49 ` [dpdk-dev] [PATCH v2 0/7] " Yuanhan Liu 2016-02-18 13:49 ` [dpdk-dev] [PATCH v2 1/7] vhost: refactor rte_vhost_dequeue_burst Yuanhan Liu 2016-02-18 13:49 ` [dpdk-dev] [PATCH v2 2/7] vhost: refactor virtio_dev_rx Yuanhan Liu @ 2016-02-18 13:49 ` Yuanhan Liu 2016-03-07 6:22 ` Xie, Huawei 2016-03-07 7:52 ` Xie, Huawei 2016-02-18 13:49 ` [dpdk-dev] [PATCH v2 4/7] vhost: do not use rte_memcpy for virtio_hdr copy Yuanhan Liu ` (5 subsequent siblings) 8 siblings, 2 replies; 84+ messages in thread From: Yuanhan Liu @ 2016-02-18 13:49 UTC (permalink / raw) To: dev; +Cc: Michael S. Tsirkin, Victor Kaplansky The current virtio_dev_merge_rx() implementation looks just like the old rte_vhost_dequeue_burst(): full of twisted logic, with the same code block repeated in quite a few different places. However, the logic of virtio_dev_merge_rx() is quite similar to virtio_dev_rx(). The big difference is that the mergeable path may allocate more than one available entry to hold the data; fetching all available entries into vec_buf at once widens that difference a bit. Still, it can be made simpler, just as we did for virtio_dev_rx(). One difference is that we need to update the used ring properly. The pseudo code looks like below: while (1) { if (this_desc_has_no_room) { this_desc = fetch_next_from_vec_buf(); if (it is the last of a desc chain) { update_used_ring(); } } if (this_mbuf_has_drained_totally) { this_mbuf = fetch_next_mbuf(); if (this_mbuf == NULL) break; } COPY(this_desc, this_mbuf); } This patch removes quite a few lines of code, making it much more readable. 
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> --- lib/librte_vhost/vhost_rxtx.c | 390 ++++++++++++++++++------------------------ 1 file changed, 163 insertions(+), 227 deletions(-) diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c index d3775ad..3909584 100644 --- a/lib/librte_vhost/vhost_rxtx.c +++ b/lib/librte_vhost/vhost_rxtx.c @@ -280,237 +280,200 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id, return count; } -static inline uint32_t __attribute__((always_inline)) -copy_from_mbuf_to_vring(struct virtio_net *dev, uint32_t queue_id, - uint16_t res_base_idx, uint16_t res_end_idx, - struct rte_mbuf *pkt) +static inline int +fill_vec_buf(struct vhost_virtqueue *vq, uint32_t avail_idx, + uint32_t *allocated, uint32_t *vec_idx) { - uint32_t vec_idx = 0; - uint32_t entry_success = 0; - struct vhost_virtqueue *vq; - /* The virtio_hdr is initialised to 0. */ - struct virtio_net_hdr_mrg_rxbuf virtio_hdr = { - {0, 0, 0, 0, 0, 0}, 0}; - uint16_t cur_idx = res_base_idx; - uint64_t vb_addr = 0; - uint64_t vb_hdr_addr = 0; - uint32_t seg_offset = 0; - uint32_t vb_offset = 0; - uint32_t seg_avail; - uint32_t vb_avail; - uint32_t cpy_len, entry_len; - - if (pkt == NULL) - return 0; + uint16_t idx = vq->avail->ring[avail_idx & (vq->size - 1)]; + uint32_t vec_id = *vec_idx; + uint32_t len = *allocated; - LOG_DEBUG(VHOST_DATA, "(%"PRIu64") Current Index %d| " - "End Index %d\n", - dev->device_fh, cur_idx, res_end_idx); + while (1) { + if (vec_id >= BUF_VECTOR_MAX) + return -1; - /* - * Convert from gpa to vva - * (guest physical addr -> vhost virtual addr) - */ - vq = dev->virtqueue[queue_id]; + len += vq->desc[idx].len; + vq->buf_vec[vec_id].buf_addr = vq->desc[idx].addr; + vq->buf_vec[vec_id].buf_len = vq->desc[idx].len; + vq->buf_vec[vec_id].desc_idx = idx; + vec_id++; - vb_addr = gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr); - vb_hdr_addr = vb_addr; + if ((vq->desc[idx].flags & VRING_DESC_F_NEXT) == 0) + break; - /* Prefetch 
buffer address. */ - rte_prefetch0((void *)(uintptr_t)vb_addr); + idx = vq->desc[idx].next; + } - virtio_hdr.num_buffers = res_end_idx - res_base_idx; + *allocated = len; + *vec_idx = vec_id; - LOG_DEBUG(VHOST_DATA, "(%"PRIu64") RX: Num merge buffers %d\n", - dev->device_fh, virtio_hdr.num_buffers); + return 0; +} - virtio_enqueue_offload(pkt, &virtio_hdr.hdr); +/* + * As many data cores may want to access available buffers concurrently, + * they need to be reserved. + * + * Returns -1 on fail, 0 on success + */ +static inline int +reserve_avail_buf_mergeable(struct vhost_virtqueue *vq, uint32_t size, + uint16_t *start, uint16_t *end) +{ + uint16_t res_start_idx; + uint16_t res_cur_idx; + uint16_t avail_idx; + uint32_t allocated; + uint32_t vec_idx; + uint16_t tries; - rte_memcpy((void *)(uintptr_t)vb_hdr_addr, - (const void *)&virtio_hdr, vq->vhost_hlen); +again: + res_start_idx = vq->last_used_idx_res; + res_cur_idx = res_start_idx; - PRINT_PACKET(dev, (uintptr_t)vb_hdr_addr, vq->vhost_hlen, 1); + allocated = 0; + vec_idx = 0; + tries = 0; + while (1) { + avail_idx = *((volatile uint16_t *)&vq->avail->idx); + if (unlikely(res_cur_idx == avail_idx)) { + LOG_DEBUG(VHOST_DATA, "(%"PRIu64") Failed " + "to get enough desc from vring\n", + dev->device_fh); + return -1; + } - seg_avail = rte_pktmbuf_data_len(pkt); - vb_offset = vq->vhost_hlen; - vb_avail = vq->buf_vec[vec_idx].buf_len - vq->vhost_hlen; + if (fill_vec_buf(vq, res_cur_idx, &allocated, &vec_idx) < 0) + return -1; - entry_len = vq->vhost_hlen; + res_cur_idx++; + tries++; - if (vb_avail == 0) { - uint32_t desc_idx = - vq->buf_vec[vec_idx].desc_idx; + if (allocated >= size) + break; - if ((vq->desc[desc_idx].flags - & VRING_DESC_F_NEXT) == 0) { - /* Update used ring with desc information */ - vq->used->ring[cur_idx & (vq->size - 1)].id - = vq->buf_vec[vec_idx].desc_idx; - vq->used->ring[cur_idx & (vq->size - 1)].len - = entry_len; + /* + * if we tried all available ring items, and still + * can't get enough 
buf, it means something abnormal + * happened. + */ + if (tries >= vq->size) + return -1; + } - entry_len = 0; - cur_idx++; - entry_success++; - } + /* + * update vq->last_used_idx_res atomically. + * retry again if failed. + */ + if (rte_atomic16_cmpset(&vq->last_used_idx_res, + res_start_idx, res_cur_idx) == 0) + goto again; - vec_idx++; - vb_addr = gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr); + *start = res_start_idx; + *end = res_cur_idx; + return 0; +} - /* Prefetch buffer address. */ - rte_prefetch0((void *)(uintptr_t)vb_addr); - vb_offset = 0; - vb_avail = vq->buf_vec[vec_idx].buf_len; - } +static inline uint32_t __attribute__((always_inline)) +copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq, + uint16_t res_start_idx, uint16_t res_end_idx, + struct rte_mbuf *m) +{ + struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0}; + uint32_t vec_idx = 0; + uint16_t cur_idx = res_start_idx; + uint64_t desc_addr; + uint32_t mbuf_offset, mbuf_avail; + uint32_t desc_offset, desc_avail; + uint32_t cpy_len; + uint16_t desc_idx, used_idx; + uint32_t nr_used = 0; - cpy_len = RTE_MIN(vb_avail, seg_avail); + if (m == NULL) + return 0; - while (cpy_len > 0) { - /* Copy mbuf data to vring buffer */ - rte_memcpy((void *)(uintptr_t)(vb_addr + vb_offset), - rte_pktmbuf_mtod_offset(pkt, const void *, seg_offset), - cpy_len); + LOG_DEBUG(VHOST_DATA, + "(%"PRIu64") Current Index %d| End Index %d\n", + dev->device_fh, cur_idx, res_end_idx); - PRINT_PACKET(dev, - (uintptr_t)(vb_addr + vb_offset), - cpy_len, 0); + desc_addr = gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr); + rte_prefetch0((void *)(uintptr_t)desc_addr); + + virtio_hdr.num_buffers = res_end_idx - res_start_idx; + LOG_DEBUG(VHOST_DATA, "(%"PRIu64") RX: Num merge buffers %d\n", + dev->device_fh, virtio_hdr.num_buffers); - seg_offset += cpy_len; - vb_offset += cpy_len; - seg_avail -= cpy_len; - vb_avail -= cpy_len; - entry_len += cpy_len; - - if (seg_avail != 0) { - /* - * The 
virtio buffer in this vring - * entry reach to its end. - * But the segment doesn't complete. - */ - if ((vq->desc[vq->buf_vec[vec_idx].desc_idx].flags & - VRING_DESC_F_NEXT) == 0) { + virtio_enqueue_offload(m, &virtio_hdr.hdr); + rte_memcpy((void *)(uintptr_t)desc_addr, + (const void *)&virtio_hdr, vq->vhost_hlen); + PRINT_PACKET(dev, (uintptr_t)desc_addr, vq->vhost_hlen, 0); + + desc_avail = vq->buf_vec[vec_idx].buf_len - vq->vhost_hlen; + desc_offset = vq->vhost_hlen; + + mbuf_avail = rte_pktmbuf_data_len(m); + mbuf_offset = 0; + while (1) { + /* done with current desc buf, get the next one */ + if (desc_avail == 0) { + desc_idx = vq->buf_vec[vec_idx].desc_idx; + + if ((vq->desc[desc_idx].flags & VRING_DESC_F_NEXT) == 0) { /* Update used ring with desc information */ - vq->used->ring[cur_idx & (vq->size - 1)].id - = vq->buf_vec[vec_idx].desc_idx; - vq->used->ring[cur_idx & (vq->size - 1)].len - = entry_len; - entry_len = 0; - cur_idx++; - entry_success++; + used_idx = cur_idx++ & (vq->size - 1); + vq->used->ring[used_idx].id = desc_idx; + vq->used->ring[used_idx].len = desc_offset; + + nr_used++; } vec_idx++; - vb_addr = gpa_to_vva(dev, - vq->buf_vec[vec_idx].buf_addr); - vb_offset = 0; - vb_avail = vq->buf_vec[vec_idx].buf_len; - cpy_len = RTE_MIN(vb_avail, seg_avail); - } else { - /* - * This current segment complete, need continue to - * check if the whole packet complete or not. - */ - pkt = pkt->next; - if (pkt != NULL) { - /* - * There are more segments. - */ - if (vb_avail == 0) { - /* - * This current buffer from vring is - * used up, need fetch next buffer - * from buf_vec. 
- */ - uint32_t desc_idx = - vq->buf_vec[vec_idx].desc_idx; - - if ((vq->desc[desc_idx].flags & - VRING_DESC_F_NEXT) == 0) { - uint16_t wrapped_idx = - cur_idx & (vq->size - 1); - /* - * Update used ring with the - * descriptor information - */ - vq->used->ring[wrapped_idx].id - = desc_idx; - vq->used->ring[wrapped_idx].len - = entry_len; - entry_success++; - entry_len = 0; - cur_idx++; - } - - /* Get next buffer from buf_vec. */ - vec_idx++; - vb_addr = gpa_to_vva(dev, - vq->buf_vec[vec_idx].buf_addr); - vb_avail = - vq->buf_vec[vec_idx].buf_len; - vb_offset = 0; - } - - seg_offset = 0; - seg_avail = rte_pktmbuf_data_len(pkt); - cpy_len = RTE_MIN(vb_avail, seg_avail); - } else { - /* - * This whole packet completes. - */ - /* Update used ring with desc information */ - vq->used->ring[cur_idx & (vq->size - 1)].id - = vq->buf_vec[vec_idx].desc_idx; - vq->used->ring[cur_idx & (vq->size - 1)].len - = entry_len; - entry_success++; - break; - } + desc_addr = gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr); + + /* Prefetch buffer address. 
*/ + rte_prefetch0((void *)(uintptr_t)desc_addr); + desc_offset = 0; + desc_avail = vq->buf_vec[vec_idx].buf_len; } - } - return entry_success; -} + /* done with current mbuf, get the next one */ + if (mbuf_avail == 0) { + m = m->next; + if (!m) + break; -static inline void __attribute__((always_inline)) -update_secure_len(struct vhost_virtqueue *vq, uint32_t id, - uint32_t *secure_len, uint32_t *vec_idx) -{ - uint16_t wrapped_idx = id & (vq->size - 1); - uint32_t idx = vq->avail->ring[wrapped_idx]; - uint8_t next_desc; - uint32_t len = *secure_len; - uint32_t vec_id = *vec_idx; + mbuf_offset = 0; + mbuf_avail = rte_pktmbuf_data_len(m); + } - do { - next_desc = 0; - len += vq->desc[idx].len; - vq->buf_vec[vec_id].buf_addr = vq->desc[idx].addr; - vq->buf_vec[vec_id].buf_len = vq->desc[idx].len; - vq->buf_vec[vec_id].desc_idx = idx; - vec_id++; + cpy_len = RTE_MIN(desc_avail, mbuf_avail); + rte_memcpy((void *)((uintptr_t)(desc_addr + desc_offset)), + rte_pktmbuf_mtod_offset(m, void *, mbuf_offset), + cpy_len); + PRINT_PACKET(dev, (uintptr_t)(desc_addr + desc_offset), + cpy_len, 0); - if (vq->desc[idx].flags & VRING_DESC_F_NEXT) { - idx = vq->desc[idx].next; - next_desc = 1; - } - } while (next_desc); + mbuf_avail -= cpy_len; + mbuf_offset += cpy_len; + desc_avail -= cpy_len; + desc_offset += cpy_len; + } + + used_idx = cur_idx & (vq->size - 1); + vq->used->ring[used_idx].id = vq->buf_vec[vec_idx].desc_idx; + vq->used->ring[used_idx].len = desc_offset; + nr_used++; - *secure_len = len; - *vec_idx = vec_id; + return nr_used; } -/* - * This function works for mergeable RX. 
- */ static inline uint32_t __attribute__((always_inline)) virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id, struct rte_mbuf **pkts, uint32_t count) { struct vhost_virtqueue *vq; - uint32_t pkt_idx = 0, entry_success = 0; - uint16_t avail_idx; - uint16_t res_base_idx, res_cur_idx; - uint8_t success = 0; + uint32_t pkt_idx = 0, nr_used = 0; + uint16_t start, end; LOG_DEBUG(VHOST_DATA, "(%"PRIu64") virtio_dev_merge_rx()\n", dev->device_fh); @@ -526,57 +489,30 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id, return 0; count = RTE_MIN((uint32_t)MAX_PKT_BURST, count); - if (count == 0) return 0; for (pkt_idx = 0; pkt_idx < count; pkt_idx++) { uint32_t pkt_len = pkts[pkt_idx]->pkt_len + vq->vhost_hlen; - do { - /* - * As many data cores may want access to available - * buffers, they need to be reserved. - */ - uint32_t secure_len = 0; - uint32_t vec_idx = 0; - - res_base_idx = vq->last_used_idx_res; - res_cur_idx = res_base_idx; - - do { - avail_idx = *((volatile uint16_t *)&vq->avail->idx); - if (unlikely(res_cur_idx == avail_idx)) - goto merge_rx_exit; - - update_secure_len(vq, res_cur_idx, - &secure_len, &vec_idx); - res_cur_idx++; - } while (pkt_len > secure_len); - - /* vq->last_used_idx_res is atomically updated. */ - success = rte_atomic16_cmpset(&vq->last_used_idx_res, - res_base_idx, - res_cur_idx); - } while (success == 0); - - entry_success = copy_from_mbuf_to_vring(dev, queue_id, - res_base_idx, res_cur_idx, pkts[pkt_idx]); + if (reserve_avail_buf_mergeable(vq, pkt_len, &start, &end) < 0) + break; + nr_used = copy_mbuf_to_desc_mergeable(dev, vq, start, end, + pkts[pkt_idx]); rte_compiler_barrier(); /* * Wait until it's our turn to add our buffer * to the used ring. 
*/ - while (unlikely(vq->last_used_idx != res_base_idx)) + while (unlikely(vq->last_used_idx != start)) rte_pause(); - *(volatile uint16_t *)&vq->used->idx += entry_success; - vq->last_used_idx = res_cur_idx; + *(volatile uint16_t *)&vq->used->idx += nr_used; + vq->last_used_idx = end; } -merge_rx_exit: if (likely(pkt_idx)) { /* flush used->idx update before we read avail->flags. */ rte_mb(); -- 1.9.0 ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH v2 3/7] vhost: refactor virtio_dev_merge_rx 2016-02-18 13:49 ` [dpdk-dev] [PATCH v2 3/7] vhost: refactor virtio_dev_merge_rx Yuanhan Liu @ 2016-03-07 6:22 ` Xie, Huawei 2016-03-07 6:36 ` Yuanhan Liu 2016-03-07 7:52 ` Xie, Huawei 1 sibling, 1 reply; 84+ messages in thread From: Xie, Huawei @ 2016-03-07 6:22 UTC (permalink / raw) To: Yuanhan Liu, dev; +Cc: Victor Kaplansky, Michael S. Tsirkin On 2/18/2016 9:48 PM, Yuanhan Liu wrote: > + uint16_t idx = vq->avail->ring[avail_idx & (vq->size - 1)]; > + uint32_t vec_id = *vec_idx; > + uint32_t len = *allocated; > There is bug not using volatile to retrieve the avail idx. ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH v2 3/7] vhost: refactor virtio_dev_merge_rx 2016-03-07 6:22 ` Xie, Huawei @ 2016-03-07 6:36 ` Yuanhan Liu 2016-03-07 6:38 ` Xie, Huawei 0 siblings, 1 reply; 84+ messages in thread From: Yuanhan Liu @ 2016-03-07 6:36 UTC (permalink / raw) To: Xie, Huawei; +Cc: Michael S. Tsirkin, dev, Victor Kaplansky On Mon, Mar 07, 2016 at 06:22:25AM +0000, Xie, Huawei wrote: > On 2/18/2016 9:48 PM, Yuanhan Liu wrote: > > + uint16_t idx = vq->avail->ring[avail_idx & (vq->size - 1)]; > > + uint32_t vec_id = *vec_idx; > > + uint32_t len = *allocated; > > > There is bug not using volatile to retrieve the avail idx. avail_idx? This is actually from "vq->last_used_idx_res". --yliu ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH v2 3/7] vhost: refactor virtio_dev_merge_rx 2016-03-07 6:36 ` Yuanhan Liu @ 2016-03-07 6:38 ` Xie, Huawei 2016-03-07 6:51 ` Yuanhan Liu 0 siblings, 1 reply; 84+ messages in thread From: Xie, Huawei @ 2016-03-07 6:38 UTC (permalink / raw) To: Yuanhan Liu; +Cc: Michael S. Tsirkin, dev, Victor Kaplansky On 3/7/2016 2:35 PM, Yuanhan Liu wrote: > On Mon, Mar 07, 2016 at 06:22:25AM +0000, Xie, Huawei wrote: >> On 2/18/2016 9:48 PM, Yuanhan Liu wrote: >>> + uint16_t idx = vq->avail->ring[avail_idx & (vq->size - 1)]; >>> + uint32_t vec_id = *vec_idx; >>> + uint32_t len = *allocated; >>> >> There is bug not using volatile to retrieve the avail idx. > avail_idx? This is actually from "vq->last_used_idx_res". uint16_t idx = vq->avail->ring[avail_idx & (vq->size - 1)] the idx retrieved from avail->ring. > > --yliu > ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH v2 3/7] vhost: refactor virtio_dev_merge_rx 2016-03-07 6:38 ` Xie, Huawei @ 2016-03-07 6:51 ` Yuanhan Liu 2016-03-07 7:03 ` Xie, Huawei 0 siblings, 1 reply; 84+ messages in thread From: Yuanhan Liu @ 2016-03-07 6:51 UTC (permalink / raw) To: Xie, Huawei; +Cc: Michael S. Tsirkin, dev, Victor Kaplansky On Mon, Mar 07, 2016 at 06:38:42AM +0000, Xie, Huawei wrote: > On 3/7/2016 2:35 PM, Yuanhan Liu wrote: > > On Mon, Mar 07, 2016 at 06:22:25AM +0000, Xie, Huawei wrote: > >> On 2/18/2016 9:48 PM, Yuanhan Liu wrote: > >>> + uint16_t idx = vq->avail->ring[avail_idx & (vq->size - 1)]; > >>> + uint32_t vec_id = *vec_idx; > >>> + uint32_t len = *allocated; > >>> > >> There is bug not using volatile to retrieve the avail idx. > > avail_idx? This is actually from "vq->last_used_idx_res". > > uint16_t idx = vq->avail->ring[avail_idx & (vq->size - 1)] > > the idx retrieved from avail->ring. Hmm.. I saw quite many similar lines of code retrieving an index from avail->ring, but none of them actually use "volatile". So, a bug? --yliu ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH v2 3/7] vhost: refactor virtio_dev_merge_rx 2016-03-07 6:51 ` Yuanhan Liu @ 2016-03-07 7:03 ` Xie, Huawei 2016-03-07 7:16 ` Xie, Huawei 0 siblings, 1 reply; 84+ messages in thread From: Xie, Huawei @ 2016-03-07 7:03 UTC (permalink / raw) To: Yuanhan Liu; +Cc: Michael S. Tsirkin, dev, Victor Kaplansky On 3/7/2016 2:49 PM, Yuanhan Liu wrote: > On Mon, Mar 07, 2016 at 06:38:42AM +0000, Xie, Huawei wrote: >> On 3/7/2016 2:35 PM, Yuanhan Liu wrote: >>> On Mon, Mar 07, 2016 at 06:22:25AM +0000, Xie, Huawei wrote: >>>> On 2/18/2016 9:48 PM, Yuanhan Liu wrote: >>>>> + uint16_t idx = vq->avail->ring[avail_idx & (vq->size - 1)]; >>>>> + uint32_t vec_id = *vec_idx; >>>>> + uint32_t len = *allocated; >>>>> >>>> There is bug not using volatile to retrieve the avail idx. >>> avail_idx? This is actually from "vq->last_used_idx_res". >> uint16_t idx = vq->avail->ring[avail_idx & (vq->size - 1)] >> >> the idx retrieved from avail->ring. > Hmm.. I saw quite many similar lines of code retrieving an index from > avail->ring, but none of them acutally use "volatile". So, a bug? Others are not. This function is inline, and is in one translation unit with its caller. > > --yliu > ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH v2 3/7] vhost: refactor virtio_dev_merge_rx 2016-03-07 7:03 ` Xie, Huawei @ 2016-03-07 7:16 ` Xie, Huawei 2016-03-07 8:20 ` Yuanhan Liu 0 siblings, 1 reply; 84+ messages in thread From: Xie, Huawei @ 2016-03-07 7:16 UTC (permalink / raw) To: Yuanhan Liu; +Cc: dev, Victor Kaplansky, Michael S. Tsirkin On 3/7/2016 3:04 PM, Xie, Huawei wrote: > On 3/7/2016 2:49 PM, Yuanhan Liu wrote: >> On Mon, Mar 07, 2016 at 06:38:42AM +0000, Xie, Huawei wrote: >>> On 3/7/2016 2:35 PM, Yuanhan Liu wrote: >>>> On Mon, Mar 07, 2016 at 06:22:25AM +0000, Xie, Huawei wrote: >>>>> On 2/18/2016 9:48 PM, Yuanhan Liu wrote: >>>>>> + uint16_t idx = vq->avail->ring[avail_idx & (vq->size - 1)]; >>>>>> + uint32_t vec_id = *vec_idx; >>>>>> + uint32_t len = *allocated; >>>>>> >>>>> There is bug not using volatile to retrieve the avail idx. >>>> avail_idx? This is actually from "vq->last_used_idx_res". >>> uint16_t idx = vq->avail->ring[avail_idx & (vq->size - 1)] >>> >>> the idx retrieved from avail->ring. >> Hmm.. I saw quite many similar lines of code retrieving an index from >> avail->ring, but none of them acutally use "volatile". So, a bug? > Others are not. This function is inline, and is in one translation unit > with its caller. Oh, my fault. For the avail idx, we should take care on whether using volatile. >> --yliu >> > ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH v2 3/7] vhost: refactor virtio_dev_merge_rx 2016-03-07 7:16 ` Xie, Huawei @ 2016-03-07 8:20 ` Yuanhan Liu 0 siblings, 0 replies; 84+ messages in thread From: Yuanhan Liu @ 2016-03-07 8:20 UTC (permalink / raw) To: Xie, Huawei; +Cc: dev, Victor Kaplansky, Michael S. Tsirkin On Mon, Mar 07, 2016 at 07:16:39AM +0000, Xie, Huawei wrote: > On 3/7/2016 3:04 PM, Xie, Huawei wrote: > > On 3/7/2016 2:49 PM, Yuanhan Liu wrote: > >> On Mon, Mar 07, 2016 at 06:38:42AM +0000, Xie, Huawei wrote: > >>> On 3/7/2016 2:35 PM, Yuanhan Liu wrote: > >>>> On Mon, Mar 07, 2016 at 06:22:25AM +0000, Xie, Huawei wrote: > >>>>> On 2/18/2016 9:48 PM, Yuanhan Liu wrote: > >>>>>> + uint16_t idx = vq->avail->ring[avail_idx & (vq->size - 1)]; > >>>>>> + uint32_t vec_id = *vec_idx; > >>>>>> + uint32_t len = *allocated; > >>>>>> > >>>>> There is bug not using volatile to retrieve the avail idx. > >>>> avail_idx? This is actually from "vq->last_used_idx_res". > >>> uint16_t idx = vq->avail->ring[avail_idx & (vq->size - 1)] > >>> > >>> the idx retrieved from avail->ring. > >> Hmm.. I saw quite many similar lines of code retrieving an index from > >> avail->ring, but none of them acutally use "volatile". So, a bug? > > Others are not. This function is inline, and is in one translation unit > > with its caller. > > Oh, my fault. For the avail idx, we should take care on whether using > volatile. I will keep it as it is. If there are any issues with it, let's fix it in another patch, but not in this refactor patch. --yliu ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH v2 3/7] vhost: refactor virtio_dev_merge_rx 2016-02-18 13:49 ` [dpdk-dev] [PATCH v2 3/7] vhost: refactor virtio_dev_merge_rx Yuanhan Liu 2016-03-07 6:22 ` Xie, Huawei @ 2016-03-07 7:52 ` Xie, Huawei 2016-03-07 8:38 ` Yuanhan Liu 1 sibling, 1 reply; 84+ messages in thread From: Xie, Huawei @ 2016-03-07 7:52 UTC (permalink / raw) To: Yuanhan Liu, dev; +Cc: Victor Kaplansky, Michael S. Tsirkin On 2/18/2016 9:48 PM, Yuanhan Liu wrote: > Current virtio_dev_merge_rx() implementation just looks like the > old rte_vhost_dequeue_burst(), full of twisted logic, that you > can see same code block in quite many different places. > > However, the logic of virtio_dev_merge_rx() is quite similar to > virtio_dev_rx(). The big difference is that the mergeable one > could allocate more than one available entries to hold the data. > Fetching all available entries to vec_buf at once makes the [...] > - } > +static inline uint32_t __attribute__((always_inline)) > +copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq, > + uint16_t res_start_idx, uint16_t res_end_idx, > + struct rte_mbuf *m) > +{ > + struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0}; > + uint32_t vec_idx = 0; > + uint16_t cur_idx = res_start_idx; > + uint64_t desc_addr; > + uint32_t mbuf_offset, mbuf_avail; > + uint32_t desc_offset, desc_avail; > + uint32_t cpy_len; > + uint16_t desc_idx, used_idx; > + uint32_t nr_used = 0; > > - cpy_len = RTE_MIN(vb_avail, seg_avail); > + if (m == NULL) > + return 0; Is this inherited from old code? Let us remove the unnecessary check. Caller ensures it is not NULL. 
> > - while (cpy_len > 0) { > - /* Copy mbuf data to vring buffer */ > - rte_memcpy((void *)(uintptr_t)(vb_addr + vb_offset), > - rte_pktmbuf_mtod_offset(pkt, const void *, seg_offset), > - cpy_len); > + LOG_DEBUG(VHOST_DATA, > + "(%"PRIu64") Current Index %d| End Index %d\n", > + dev->device_fh, cur_idx, res_end_idx); > > - PRINT_PACKET(dev, > - (uintptr_t)(vb_addr + vb_offset), > - cpy_len, 0); > + desc_addr = gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr); > + rte_prefetch0((void *)(uintptr_t)desc_addr); > + > + virtio_hdr.num_buffers = res_end_idx - res_start_idx; > + LOG_DEBUG(VHOST_DATA, "(%"PRIu64") RX: Num merge buffers %d\n", > + dev->device_fh, virtio_hdr.num_buffers); > > - seg_offset += cpy_len; > - vb_offset += cpy_len; > - seg_avail -= cpy_len; > - vb_avail -= cpy_len; > - entry_len += cpy_len; > - > - if (seg_avail != 0) { > - /* > - * The virtio buffer in this vring > - * entry reach to its end. > - * But the segment doesn't complete. > - */ > - if ((vq->desc[vq->buf_vec[vec_idx].desc_idx].flags & > - VRING_DESC_F_NEXT) == 0) { > + virtio_enqueue_offload(m, &virtio_hdr.hdr); > + rte_memcpy((void *)(uintptr_t)desc_addr, > + (const void *)&virtio_hdr, vq->vhost_hlen); > + PRINT_PACKET(dev, (uintptr_t)desc_addr, vq->vhost_hlen, 0); > + > + desc_avail = vq->buf_vec[vec_idx].buf_len - vq->vhost_hlen; > + desc_offset = vq->vhost_hlen; As we know we are in merge-able path, use sizeof(virtio_net_hdr) to save one load for the header len. > + > + mbuf_avail = rte_pktmbuf_data_len(m); > + mbuf_offset = 0; > + while (1) { > + /* done with current desc buf, get the next one */ > + [...] > + if (reserve_avail_buf_mergeable(vq, pkt_len, &start, &end) < 0) > + break; > > + nr_used = copy_mbuf_to_desc_mergeable(dev, vq, start, end, > + pkts[pkt_idx]); In which case couldn't we get nr_used from start and end? > rte_compiler_barrier(); > > /* > * Wait until it's our turn to add our buffer > * to the used ring. 
> */ > - while (unlikely(vq->last_used_idx != res_base_idx)) > + while (unlikely(vq->last_used_idx != start)) > rte_pause(); > > - *(volatile uint16_t *)&vq->used->idx += entry_success; > - vq->last_used_idx = res_cur_idx; > + *(volatile uint16_t *)&vq->used->idx += nr_used; > + vq->last_used_idx = end; > } > > -merge_rx_exit: > if (likely(pkt_idx)) { > /* flush used->idx update before we read avail->flags. */ > rte_mb(); ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH v2 3/7] vhost: refactor virtio_dev_merge_rx 2016-03-07 7:52 ` Xie, Huawei @ 2016-03-07 8:38 ` Yuanhan Liu 2016-03-07 9:27 ` Xie, Huawei 0 siblings, 1 reply; 84+ messages in thread From: Yuanhan Liu @ 2016-03-07 8:38 UTC (permalink / raw) To: Xie, Huawei; +Cc: Michael S. Tsirkin, dev, Victor Kaplansky On Mon, Mar 07, 2016 at 07:52:22AM +0000, Xie, Huawei wrote: > On 2/18/2016 9:48 PM, Yuanhan Liu wrote: > > Current virtio_dev_merge_rx() implementation just looks like the > > old rte_vhost_dequeue_burst(), full of twisted logic, that you > > can see same code block in quite many different places. > > > > However, the logic of virtio_dev_merge_rx() is quite similar to > > virtio_dev_rx(). The big difference is that the mergeable one > > could allocate more than one available entries to hold the data. > > Fetching all available entries to vec_buf at once makes the > [...] > > - } > > +static inline uint32_t __attribute__((always_inline)) > > +copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq, > > + uint16_t res_start_idx, uint16_t res_end_idx, > > + struct rte_mbuf *m) > > +{ > > + struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0}; > > + uint32_t vec_idx = 0; > > + uint16_t cur_idx = res_start_idx; > > + uint64_t desc_addr; > > + uint32_t mbuf_offset, mbuf_avail; > > + uint32_t desc_offset, desc_avail; > > + uint32_t cpy_len; > > + uint16_t desc_idx, used_idx; > > + uint32_t nr_used = 0; > > > > - cpy_len = RTE_MIN(vb_avail, seg_avail); > > + if (m == NULL) > > + return 0; > > Is this inherited from old code? Yes. > Let us remove the unnecessary check. > Caller ensures it is not NULL. ... > > + desc_avail = vq->buf_vec[vec_idx].buf_len - vq->vhost_hlen; > > + desc_offset = vq->vhost_hlen; > > As we know we are in merge-able path, use sizeof(virtio_net_hdr) to save > one load for the header len. Please, it's a refactor patch series. 
You have mentioned quite many trivial issues here and there, which I don't care too much and I don't think they would matter somehow. In addition, they are actually from the old code. > > > + > > + mbuf_avail = rte_pktmbuf_data_len(m); > > + mbuf_offset = 0; > > + while (1) { > > + /* done with current desc buf, get the next one */ > > + > [...] > > + if (reserve_avail_buf_mergeable(vq, pkt_len, &start, &end) < 0) > > + break; > > > > + nr_used = copy_mbuf_to_desc_mergeable(dev, vq, start, end, > > + pkts[pkt_idx]); > > In which case couldn't we get nr_used from start and end? When pkts[pkt_idx] is NULL, though you suggest to remove it, the check is here. --yliu ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH v2 3/7] vhost: refactor virtio_dev_merge_rx 2016-03-07 8:38 ` Yuanhan Liu @ 2016-03-07 9:27 ` Xie, Huawei 0 siblings, 0 replies; 84+ messages in thread From: Xie, Huawei @ 2016-03-07 9:27 UTC (permalink / raw) To: Yuanhan Liu; +Cc: Michael S. Tsirkin, dev, Victor Kaplansky On 3/7/2016 4:36 PM, Yuanhan Liu wrote: > On Mon, Mar 07, 2016 at 07:52:22AM +0000, Xie, Huawei wrote: >> On 2/18/2016 9:48 PM, Yuanhan Liu wrote: >>> Current virtio_dev_merge_rx() implementation just looks like the >>> old rte_vhost_dequeue_burst(), full of twisted logic, that you >>> can see same code block in quite many different places. >>> >>> However, the logic of virtio_dev_merge_rx() is quite similar to >>> virtio_dev_rx(). The big difference is that the mergeable one >>> could allocate more than one available entries to hold the data. >>> Fetching all available entries to vec_buf at once makes the >> [...] >>> - } >>> +static inline uint32_t __attribute__((always_inline)) >>> +copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq, >>> + uint16_t res_start_idx, uint16_t res_end_idx, >>> + struct rte_mbuf *m) >>> +{ >>> + struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0}; >>> + uint32_t vec_idx = 0; >>> + uint16_t cur_idx = res_start_idx; >>> + uint64_t desc_addr; >>> + uint32_t mbuf_offset, mbuf_avail; >>> + uint32_t desc_offset, desc_avail; >>> + uint32_t cpy_len; >>> + uint16_t desc_idx, used_idx; >>> + uint32_t nr_used = 0; >>> >>> - cpy_len = RTE_MIN(vb_avail, seg_avail); >>> + if (m == NULL) >>> + return 0; >> Is this inherited from old code? > Yes. > >> Let us remove the unnecessary check. >> Caller ensures it is not NULL. > ... >>> + desc_avail = vq->buf_vec[vec_idx].buf_len - vq->vhost_hlen; >>> + desc_offset = vq->vhost_hlen; >> As we know we are in merge-able path, use sizeof(virtio_net_hdr) to save >> one load for the header len. > Please, it's a refactor patch series. 
You have mentioned quite many > trivial issues here and there, which I don't care too much and I don't > think they would matter somehow. In addition, they are actually from > the old code. For normal code, it would be better to use vq->vhost_hlen, for example, for future compatibility. For DPDK, though, we avoid wasting cycles wherever possible, especially as vhost is the centralized bottleneck. As for the m == NULL check, it should be removed: it not only wastes branch prediction resources but also causes confusion about the nr_used returned from copy_mbuf_to_desc_mergeable. It is OK if you don't want to fix this in this patchset. >>> + >>> + mbuf_avail = rte_pktmbuf_data_len(m); >>> + mbuf_offset = 0; >>> + while (1) { >>> + /* done with current desc buf, get the next one */ >>> + >> [...] >>> + if (reserve_avail_buf_mergeable(vq, pkt_len, &start, &end) < 0) >>> + break; >>> >>> + nr_used = copy_mbuf_to_desc_mergeable(dev, vq, start, end, >>> + pkts[pkt_idx]); >> In which case couldn't we get nr_used from start and end? > When pkts[pkt_idx] is NULL, though you suggest to remove it, the check > is here. > > --yliu > ^ permalink raw reply [flat|nested] 84+ messages in thread
* [dpdk-dev] [PATCH v2 4/7] vhost: do not use rte_memcpy for virtio_hdr copy 2016-02-18 13:49 ` [dpdk-dev] [PATCH v2 0/7] " Yuanhan Liu ` (2 preceding siblings ...) 2016-02-18 13:49 ` [dpdk-dev] [PATCH v2 3/7] vhost: refactor virtio_dev_merge_rx Yuanhan Liu @ 2016-02-18 13:49 ` Yuanhan Liu 2016-03-07 1:20 ` Xie, Huawei 2016-03-07 4:20 ` Stephen Hemminger 2016-02-18 13:49 ` [dpdk-dev] [PATCH v2 5/7] vhost: don't use unlikely for VIRTIO_NET_F_MRG_RXBUF detection Yuanhan Liu ` (4 subsequent siblings) 8 siblings, 2 replies; 84+ messages in thread From: Yuanhan Liu @ 2016-02-18 13:49 UTC (permalink / raw) To: dev; +Cc: Michael S. Tsirkin, Victor Kaplansky First of all, rte_memcpy() is mostly useful for copying big packets by leveraging advanced hardware instructions like AVX. But for the virtio net hdr, which is 12 bytes at most, invoking rte_memcpy() will not introduce any performance boost. And, to my surprise, rte_memcpy() is VERY huge. Since rte_memcpy() is inlined, it increases the binary code size linearly every time we call it at a different place. Replacing the two rte_memcpy() with direct copy saves nearly 12K bytes of code size! 
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> --- lib/librte_vhost/vhost_rxtx.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c index 3909584..97690c3 100644 --- a/lib/librte_vhost/vhost_rxtx.c +++ b/lib/librte_vhost/vhost_rxtx.c @@ -92,6 +92,17 @@ virtio_enqueue_offload(struct rte_mbuf *m_buf, struct virtio_net_hdr *net_hdr) return; } +static inline void +copy_virtio_net_hdr(struct vhost_virtqueue *vq, uint64_t desc_addr, + struct virtio_net_hdr_mrg_rxbuf hdr) +{ + if (vq->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf)) { + *(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr; + } else { + *(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr; + } +} + static inline int __attribute__((always_inline)) copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq, struct rte_mbuf *m, uint16_t desc_idx, uint32_t *copied) @@ -108,8 +119,7 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq, rte_prefetch0((void *)(uintptr_t)desc_addr); virtio_enqueue_offload(m, &virtio_hdr.hdr); - rte_memcpy((void *)(uintptr_t)desc_addr, - (const void *)&virtio_hdr, vq->vhost_hlen); + copy_virtio_net_hdr(vq, desc_addr, virtio_hdr); PRINT_PACKET(dev, (uintptr_t)desc_addr, vq->vhost_hlen, 0); desc_offset = vq->vhost_hlen; @@ -404,8 +414,7 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq, dev->device_fh, virtio_hdr.num_buffers); virtio_enqueue_offload(m, &virtio_hdr.hdr); - rte_memcpy((void *)(uintptr_t)desc_addr, - (const void *)&virtio_hdr, vq->vhost_hlen); + copy_virtio_net_hdr(vq, desc_addr, virtio_hdr); PRINT_PACKET(dev, (uintptr_t)desc_addr, vq->vhost_hlen, 0); desc_avail = vq->buf_vec[vec_idx].buf_len - vq->vhost_hlen; -- 1.9.0 ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH v2 4/7] vhost: do not use rte_memcpy for virtio_hdr copy 2016-02-18 13:49 ` [dpdk-dev] [PATCH v2 4/7] vhost: do not use rte_memcpy for virtio_hdr copy Yuanhan Liu @ 2016-03-07 1:20 ` Xie, Huawei 2016-03-07 4:20 ` Stephen Hemminger 1 sibling, 0 replies; 84+ messages in thread From: Xie, Huawei @ 2016-03-07 1:20 UTC (permalink / raw) To: Yuanhan Liu, dev; +Cc: Victor Kaplansky, Michael S. Tsirkin On 2/18/2016 9:48 PM, Yuanhan Liu wrote: > Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: Huawei Xie <huawei.xie@intel.com> ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH v2 4/7] vhost: do not use rte_memcpy for virtio_hdr copy 2016-02-18 13:49 ` [dpdk-dev] [PATCH v2 4/7] vhost: do not use rte_memcpy for virtio_hdr copy Yuanhan Liu 2016-03-07 1:20 ` Xie, Huawei @ 2016-03-07 4:20 ` Stephen Hemminger 2016-03-07 5:24 ` Xie, Huawei 2016-03-07 6:21 ` Yuanhan Liu 1 sibling, 2 replies; 84+ messages in thread From: Stephen Hemminger @ 2016-03-07 4:20 UTC (permalink / raw) To: Yuanhan Liu; +Cc: dev, Victor Kaplansky, Michael S. Tsirkin On Thu, 18 Feb 2016 21:49:09 +0800 Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote: > +static inline void > +copy_virtio_net_hdr(struct vhost_virtqueue *vq, uint64_t desc_addr, > + struct virtio_net_hdr_mrg_rxbuf hdr) > +{ > + if (vq->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf)) { > + *(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr; > + } else { > + *(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr; > + } > +} > + Don't use {} around single statements. Since you are doing all this casting, why not just use regular old memcpy which will be inlined by Gcc into same instructions anyway. And since are always casting the desc_addr, why not pass a type that doesn't need the additional cast (like void *) ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH v2 4/7] vhost: do not use rte_memcpy for virtio_hdr copy 2016-03-07 4:20 ` Stephen Hemminger @ 2016-03-07 5:24 ` Xie, Huawei 2016-03-07 6:21 ` Yuanhan Liu 1 sibling, 0 replies; 84+ messages in thread From: Xie, Huawei @ 2016-03-07 5:24 UTC (permalink / raw) To: Stephen Hemminger, Yuanhan Liu; +Cc: dev, Victor Kaplansky, Michael S. Tsirkin On 3/7/2016 12:20 PM, Stephen Hemminger wrote: > On Thu, 18 Feb 2016 21:49:09 +0800 > Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote: > >> +static inline void >> +copy_virtio_net_hdr(struct vhost_virtqueue *vq, uint64_t desc_addr, >> + struct virtio_net_hdr_mrg_rxbuf hdr) >> +{ >> + if (vq->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf)) { >> + *(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr; >> + } else { >> + *(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr; >> + } >> +} >> + > Don't use {} around single statements. There are other cs issues, like used_idx = vq->last_used_idx & (vq->size -1); ^ space needed Please run checkpatch against your patch. > Since you are doing all this casting, why not just use regular old memcpy > which will be inlined by Gcc into same instructions anyway. > And since are always casting the desc_addr, why not pass a type that > doesn't need the additional cast (like void *) > ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH v2 4/7] vhost: do not use rte_memcpy for virtio_hdr copy 2016-03-07 4:20 ` Stephen Hemminger 2016-03-07 5:24 ` Xie, Huawei @ 2016-03-07 6:21 ` Yuanhan Liu 1 sibling, 0 replies; 84+ messages in thread From: Yuanhan Liu @ 2016-03-07 6:21 UTC (permalink / raw) To: Stephen Hemminger; +Cc: dev, Victor Kaplansky, Michael S. Tsirkin On Sun, Mar 06, 2016 at 08:20:00PM -0800, Stephen Hemminger wrote: > On Thu, 18 Feb 2016 21:49:09 +0800 > Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote: > > > +static inline void > > +copy_virtio_net_hdr(struct vhost_virtqueue *vq, uint64_t desc_addr, > > + struct virtio_net_hdr_mrg_rxbuf hdr) > > +{ > > + if (vq->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf)) { > > + *(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr; > > + } else { > > + *(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr; > > + } > > +} > > + > > Don't use {} around single statements. Oh, I was thinking that it's a personal preference. Okay, I will remove them. > Since you are doing all this casting, why not just use regular old memcpy > which will be inlined by Gcc into same instructions anyway. I thought there are some (tiny) differences: memcpy() is not an inlined function. And I was thinking it generates some slightly more complicated instructions. > And since are always casting the desc_addr, why not pass a type that > doesn't need the additional cast (like void *) You have to cast it from "uint64_t" to "void *" as well while call it. So, that makes no difference. --yliu ^ permalink raw reply [flat|nested] 84+ messages in thread
* [dpdk-dev] [PATCH v2 5/7] vhost: don't use unlikely for VIRTIO_NET_F_MRG_RXBUF detection 2016-02-18 13:49 ` [dpdk-dev] [PATCH v2 0/7] " Yuanhan Liu ` (3 preceding siblings ...) 2016-02-18 13:49 ` [dpdk-dev] [PATCH v2 4/7] vhost: do not use rte_memcpy for virtio_hdr copy Yuanhan Liu @ 2016-02-18 13:49 ` Yuanhan Liu 2016-02-18 13:49 ` [dpdk-dev] [PATCH v2 6/7] vhost: do sanity check for desc->len Yuanhan Liu ` (3 subsequent siblings) 8 siblings, 0 replies; 84+ messages in thread From: Yuanhan Liu @ 2016-02-18 13:49 UTC (permalink / raw) To: dev; +Cc: Michael S. Tsirkin, Victor Kaplansky VIRTIO_NET_F_MRG_RXBUF is a default feature supported by vhost. Adding unlikely for VIRTIO_NET_F_MRG_RXBUF detection doesn't make sense to me at all. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> --- lib/librte_vhost/vhost_rxtx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c index 97690c3..04af9b3 100644 --- a/lib/librte_vhost/vhost_rxtx.c +++ b/lib/librte_vhost/vhost_rxtx.c @@ -538,7 +538,7 @@ uint16_t rte_vhost_enqueue_burst(struct virtio_net *dev, uint16_t queue_id, struct rte_mbuf **pkts, uint16_t count) { - if (unlikely(dev->features & (1 << VIRTIO_NET_F_MRG_RXBUF))) + if (dev->features & (1 << VIRTIO_NET_F_MRG_RXBUF)) return virtio_dev_merge_rx(dev, queue_id, pkts, count); else return virtio_dev_rx(dev, queue_id, pkts, count); -- 1.9.0 ^ permalink raw reply [flat|nested] 84+ messages in thread
* [dpdk-dev] [PATCH v2 6/7] vhost: do sanity check for desc->len 2016-02-18 13:49 ` [dpdk-dev] [PATCH v2 0/7] " Yuanhan Liu ` (4 preceding siblings ...) 2016-02-18 13:49 ` [dpdk-dev] [PATCH v2 5/7] vhost: don't use unlikely for VIRTIO_NET_F_MRG_RXBUF detection Yuanhan Liu @ 2016-02-18 13:49 ` Yuanhan Liu 2016-02-18 13:49 ` [dpdk-dev] [PATCH v2 7/7] vhost: do sanity check for desc->next Yuanhan Liu ` (2 subsequent siblings) 8 siblings, 0 replies; 84+ messages in thread From: Yuanhan Liu @ 2016-02-18 13:49 UTC (permalink / raw) To: dev; +Cc: Michael S. Tsirkin, Victor Kaplansky We need to make sure that desc->len is at least as big as the virtio net header, otherwise unexpected behaviour might happen, as "desc_avail" would become a huge number in the following code: desc_avail = desc->len - vq->vhost_hlen; For the dequeue code path, it will try to allocate enough mbufs to hold that much desc buf data, which ends up consuming all mbufs, leading to no free mbuf being available. Therefore, you might see an error message: Failed to allocate memory for mbuf. Also, for both the dequeue and enqueue code paths, while copying data from/to the desc buf, the huge "desc_avail" would result in accessing memory that does not belong to the desc buf, which could lead to some potential memory access errors. A malicious guest could easily forge such a malformed vring desc buf. Restarting an interrupted DPDK application inside the guest would also trigger this issue every time, as all huge pages are reset to 0 during DPDK re-init, leading to desc->len being 0. Therefore, this patch does a sanity check for desc->len, to make vhost robust. 
Reported-by: Rich Lane <rich.lane@bigswitch.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> --- lib/librte_vhost/vhost_rxtx.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c index 04af9b3..c2adcd9 100644 --- a/lib/librte_vhost/vhost_rxtx.c +++ b/lib/librte_vhost/vhost_rxtx.c @@ -115,6 +115,9 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq, struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0}; desc = &vq->desc[desc_idx]; + if (unlikely(desc->len < vq->vhost_hlen)) + return -1; + desc_addr = gpa_to_vva(dev, desc->addr); rte_prefetch0((void *)(uintptr_t)desc_addr); @@ -406,6 +409,9 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq, "(%"PRIu64") Current Index %d| End Index %d\n", dev->device_fh, cur_idx, res_end_idx); + if (vq->buf_vec[vec_idx].buf_len < vq->vhost_hlen) + return -1; + desc_addr = gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr); rte_prefetch0((void *)(uintptr_t)desc_addr); @@ -649,6 +655,9 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq, struct virtio_net_hdr *hdr; desc = &vq->desc[desc_idx]; + if (unlikely(desc->len < vq->vhost_hlen)) + return NULL; + desc_addr = gpa_to_vva(dev, desc->addr); rte_prefetch0((void *)(uintptr_t)desc_addr); -- 1.9.0 ^ permalink raw reply [flat|nested] 84+ messages in thread
* [dpdk-dev] [PATCH v2 7/7] vhost: do sanity check for desc->next 2016-02-18 13:49 ` [dpdk-dev] [PATCH v2 0/7] " Yuanhan Liu ` (5 preceding siblings ...) 2016-02-18 13:49 ` [dpdk-dev] [PATCH v2 6/7] vhost: do sanity check for desc->len Yuanhan Liu @ 2016-02-18 13:49 ` Yuanhan Liu 2016-03-07 3:10 ` Xie, Huawei 2016-02-29 16:06 ` [dpdk-dev] [PATCH v2 0/7] vhost rxtx refactor Thomas Monjalon 2016-03-10 4:32 ` [dpdk-dev] [PATCH v3 0/8] vhost rxtx refactor and fixes Yuanhan Liu 8 siblings, 1 reply; 84+ messages in thread From: Yuanhan Liu @ 2016-02-18 13:49 UTC (permalink / raw) To: dev; +Cc: Michael S. Tsirkin, Victor Kaplansky A malicious guest could easily forge an illegal vring desc buf. To make vhost robust, we need to make sure that desc->next never goes beyond the vq->desc[] array. Suggested-by: Rich Lane <rich.lane@bigswitch.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> --- lib/librte_vhost/vhost_rxtx.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c index c2adcd9..b0c0c94 100644 --- a/lib/librte_vhost/vhost_rxtx.c +++ b/lib/librte_vhost/vhost_rxtx.c @@ -148,6 +148,8 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq, /* Room in vring buffer is not enough */ return -1; } + if (unlikely(desc->next >= vq->size)) + return -1; desc = &vq->desc[desc->next]; desc_addr = gpa_to_vva(dev, desc->addr); @@ -302,7 +304,7 @@ fill_vec_buf(struct vhost_virtqueue *vq, uint32_t avail_idx, uint32_t len = *allocated; while (1) { - if (vec_id >= BUF_VECTOR_MAX) + if (unlikely(vec_id >= BUF_VECTOR_MAX || idx >= vq->size)) return -1; len += vq->desc[idx].len; @@ -671,6 +673,8 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq, while (desc_avail || (desc->flags & VRING_DESC_F_NEXT) != 0) { /* This desc reachs to its end, get the next one */ if (desc_avail == 0) { + if (unlikely(desc->next >= vq->size)) + goto fail; desc = 
&vq->desc[desc->next]; desc_addr = gpa_to_vva(dev, desc->addr); @@ -691,9 +695,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq, if (unlikely(!cur)) { RTE_LOG(ERR, VHOST_DATA, "Failed to " "allocate memory for mbuf.\n"); - if (head) - rte_pktmbuf_free(head); - return NULL; + goto fail; } if (!head) { head = cur; @@ -729,6 +731,11 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq, } return head; + +fail: + if (head) + rte_pktmbuf_free(head); + return NULL; } uint16_t -- 1.9.0 ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH v2 7/7] vhost: do sanity check for desc->next 2016-02-18 13:49 ` [dpdk-dev] [PATCH v2 7/7] vhost: do sanity check for desc->next Yuanhan Liu @ 2016-03-07 3:10 ` Xie, Huawei 2016-03-07 6:57 ` Yuanhan Liu 0 siblings, 1 reply; 84+ messages in thread From: Xie, Huawei @ 2016-03-07 3:10 UTC (permalink / raw) To: Yuanhan Liu, dev; +Cc: Victor Kaplansky, Michael S. Tsirkin On 2/18/2016 9:48 PM, Yuanhan Liu wrote: > + if (unlikely(desc->next >= vq->size)) > + goto fail; Desc chains could still be forged into a loop; vhost would then spin in that dead loop until it exhausts all mbuf memory. ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH v2 7/7] vhost: do sanity check for desc->next 2016-03-07 3:10 ` Xie, Huawei @ 2016-03-07 6:57 ` Yuanhan Liu 0 siblings, 0 replies; 84+ messages in thread From: Yuanhan Liu @ 2016-03-07 6:57 UTC (permalink / raw) To: Xie, Huawei; +Cc: Michael S. Tsirkin, dev, Victor Kaplansky On Mon, Mar 07, 2016 at 03:10:43AM +0000, Xie, Huawei wrote: > On 2/18/2016 9:48 PM, Yuanhan Liu wrote: > > + if (unlikely(desc->next >= vq->size)) > > + goto fail; > > desc chains could be forged into a loop then vhost runs the dead loop > until it exhaust all mbuf memory. Good point. Any elegant solution to avoid that? --yliu ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/7] vhost rxtx refactor 2016-02-18 13:49 ` [dpdk-dev] [PATCH v2 0/7] " Yuanhan Liu ` (6 preceding siblings ...) 2016-02-18 13:49 ` [dpdk-dev] [PATCH v2 7/7] vhost: do sanity check for desc->next Yuanhan Liu @ 2016-02-29 16:06 ` Thomas Monjalon 2016-03-01 6:01 ` Yuanhan Liu 2016-03-10 4:32 ` [dpdk-dev] [PATCH v3 0/8] vhost rxtx refactor and fixes Yuanhan Liu 8 siblings, 1 reply; 84+ messages in thread From: Thomas Monjalon @ 2016-02-29 16:06 UTC (permalink / raw) To: Yuanhan Liu; +Cc: dev Hi Yuanhan 2016-02-18 21:49, Yuanhan Liu: > Here is a patchset for refactoring vhost rxtx code, mainly for > improving readability. This series needs to be rebased. And maybe you could also check the series about numa_realloc. Thanks ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH v2 0/7] vhost rxtx refactor 2016-02-29 16:06 ` [dpdk-dev] [PATCH v2 0/7] vhost rxtx refactor Thomas Monjalon @ 2016-03-01 6:01 ` Yuanhan Liu 0 siblings, 0 replies; 84+ messages in thread From: Yuanhan Liu @ 2016-03-01 6:01 UTC (permalink / raw) To: Thomas Monjalon; +Cc: dev On Mon, Feb 29, 2016 at 05:06:27PM +0100, Thomas Monjalon wrote: > Hi Yuanhan > > 2016-02-18 21:49, Yuanhan Liu: > > Here is a patchset for refactoring vhost rxtx code, mainly for > > improving readability. > > This series requires to be rebased. > > And maybe you could check also the series about numa_realloc. Hi Thomas, Sure, I will. And since you are considering merging it, I will do more tests, especially on this patchset (it touches the vhost-user core). Thus, I may send the new version out a bit late, say, next week. --yliu ^ permalink raw reply [flat|nested] 84+ messages in thread
* [dpdk-dev] [PATCH v3 0/8] vhost rxtx refactor and fixes 2016-02-18 13:49 ` [dpdk-dev] [PATCH v2 0/7] " Yuanhan Liu ` (7 preceding siblings ...) 2016-02-29 16:06 ` [dpdk-dev] [PATCH v2 0/7] vhost rxtx refactor Thomas Monjalon @ 2016-03-10 4:32 ` Yuanhan Liu 2016-03-10 4:32 ` [dpdk-dev] [PATCH v3 1/8] vhost: refactor rte_vhost_dequeue_burst Yuanhan Liu ` (8 more replies) 8 siblings, 9 replies; 84+ messages in thread From: Yuanhan Liu @ 2016-03-10 4:32 UTC (permalink / raw) To: dev v3: - quite a few minor changes, including using likely/unlikely where possible. - Added a new patch 8 to avoid desc dead loop chains The first 3 patches refactor the 3 major functions in vhost_rxtx.c. They simplify the code logic, making it more readable. OTOH, they reduce the binary code size, because a lot of duplicate code is removed and some huge inline functions are diminished. Patch 4 gets rid of the rte_memcpy for the virtio_hdr copy, which saves nearly 12K bytes of binary code size! Patch 5 removes "unlikely" for VIRTIO_NET_F_MRG_RXBUF detection. Patches 6, 7 and 8 add sanity checks on two desc fields, to make vhost robust and protected from malicious guests or abnormal use cases. --- Yuanhan Liu (8): vhost: refactor rte_vhost_dequeue_burst vhost: refactor virtio_dev_rx vhost: refactor virtio_dev_merge_rx vhost: do not use rte_memcpy for virtio_hdr copy vhost: don't use unlikely for VIRTIO_NET_F_MRG_RXBUF detection vhost: do sanity check for desc->len vhost: do sanity check for desc->next against with vq->size vhost: avoid dead loop chain. lib/librte_vhost/vhost_rxtx.c | 1027 ++++++++++++++++++----------------- 1 file changed, 453 insertions(+), 574 deletions(-) -- 1.9.0 ^ permalink raw reply [flat|nested] 84+ messages in thread
* [dpdk-dev] [PATCH v3 1/8] vhost: refactor rte_vhost_dequeue_burst 2016-03-10 4:32 ` [dpdk-dev] [PATCH v3 0/8] vhost rxtx refactor and fixes Yuanhan Liu @ 2016-03-10 4:32 ` Yuanhan Liu 2016-03-10 4:32 ` [dpdk-dev] [PATCH v3 2/8] vhost: refactor virtio_dev_rx Yuanhan Liu ` (7 subsequent siblings) 8 siblings, 0 replies; 84+ messages in thread From: Yuanhan Liu @ 2016-03-10 4:32 UTC (permalink / raw) To: dev The current rte_vhost_dequeue_burst() implementation is a bit messy and its logic twisted; you can see repeated code here and there. However, rte_vhost_dequeue_burst() actually does a simple job: copy the packet data from the vring desc to an mbuf. What's tricky here is: - a desc buf could be chained (by the desc->next field), so you need to fetch the next one when the current one is wholly drained. - one mbuf might not be big enough to hold all the desc buf data, hence you need to chain the mbufs as well, by the mbuf->next field. The simplified code looks like the following: while (this_desc_is_not_drained_totally || has_next_desc) { if (this_desc_has_drained_totally) { this_desc = next_desc(); } if (mbuf_has_no_room) { mbuf = allocate_a_new_mbuf(); } COPY(mbuf, desc); } Note that the old code did special handling to skip the virtio header. However, that can simply be done by adjusting the desc_avail and desc_offset vars: desc_avail = desc->len - vq->vhost_hlen; desc_offset = vq->vhost_hlen; This refactor makes the code much more readable (IMO), yet it reduces binary code size. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> --- v2: - fix potential NULL dereference bugs of vars "prev" and "head" v3: - add back missing prefetches reported by Huawei - Passing head mbuf as an arg, instead of allocating it at copy_desc_to_mbuf(). 
--- lib/librte_vhost/vhost_rxtx.c | 301 +++++++++++++++++------------------------- 1 file changed, 121 insertions(+), 180 deletions(-) diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c index 9d23eb1..e12e9ba 100644 --- a/lib/librte_vhost/vhost_rxtx.c +++ b/lib/librte_vhost/vhost_rxtx.c @@ -801,21 +801,97 @@ make_rarp_packet(struct rte_mbuf *rarp_mbuf, const struct ether_addr *mac) return 0; } +static inline int __attribute__((always_inline)) +copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq, + struct rte_mbuf *m, uint16_t desc_idx, + struct rte_mempool *mbuf_pool) +{ + struct vring_desc *desc; + uint64_t desc_addr; + uint32_t desc_avail, desc_offset; + uint32_t mbuf_avail, mbuf_offset; + uint32_t cpy_len; + struct rte_mbuf *cur = m, *prev = m; + struct virtio_net_hdr *hdr; + + desc = &vq->desc[desc_idx]; + desc_addr = gpa_to_vva(dev, desc->addr); + rte_prefetch0((void *)(uintptr_t)desc_addr); + + /* Retrieve virtio net header */ + hdr = (struct virtio_net_hdr *)((uintptr_t)desc_addr); + desc_avail = desc->len - vq->vhost_hlen; + desc_offset = vq->vhost_hlen; + + mbuf_offset = 0; + mbuf_avail = m->buf_len - RTE_PKTMBUF_HEADROOM; + while (desc_avail != 0 || (desc->flags & VRING_DESC_F_NEXT) != 0) { + /* This desc reaches to its end, get the next one */ + if (desc_avail == 0) { + desc = &vq->desc[desc->next]; + + desc_addr = gpa_to_vva(dev, desc->addr); + rte_prefetch0((void *)(uintptr_t)desc_addr); + + desc_offset = 0; + desc_avail = desc->len; + + PRINT_PACKET(dev, (uintptr_t)desc_addr, desc->len, 0); + } + + /* + * This mbuf reaches to its end, get a new one + * to hold more data. 
+ */ + if (mbuf_avail == 0) { + cur = rte_pktmbuf_alloc(mbuf_pool); + if (unlikely(cur == NULL)) { + RTE_LOG(ERR, VHOST_DATA, "Failed to " + "allocate memory for mbuf.\n"); + return -1; + } + + prev->next = cur; + prev->data_len = mbuf_offset; + m->nb_segs += 1; + m->pkt_len += mbuf_offset; + prev = cur; + + mbuf_offset = 0; + mbuf_avail = cur->buf_len - RTE_PKTMBUF_HEADROOM; + } + + cpy_len = RTE_MIN(desc_avail, mbuf_avail); + rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *, mbuf_offset), + (void *)((uintptr_t)(desc_addr + desc_offset)), + cpy_len); + + mbuf_avail -= cpy_len; + mbuf_offset += cpy_len; + desc_avail -= cpy_len; + desc_offset += cpy_len; + } + + prev->data_len = mbuf_offset; + m->pkt_len += mbuf_offset; + + if (hdr->flags != 0 || hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) + vhost_dequeue_offload(hdr, m); + + return 0; +} + uint16_t rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count) { - struct rte_mbuf *m, *prev, *rarp_mbuf = NULL; + struct rte_mbuf *rarp_mbuf = NULL; struct vhost_virtqueue *vq; - struct vring_desc *desc; - uint64_t vb_addr = 0; - uint64_t vb_net_hdr_addr = 0; - uint32_t head[MAX_PKT_BURST]; + uint32_t desc_indexes[MAX_PKT_BURST]; uint32_t used_idx; - uint32_t i; - uint16_t free_entries, entry_success = 0; + uint32_t i = 0; + uint16_t free_entries; uint16_t avail_idx; - struct virtio_net_hdr *hdr = NULL; if (unlikely(!is_valid_virt_queue_idx(queue_id, 1, dev->virt_qp_nb))) { RTE_LOG(ERR, VHOST_DATA, @@ -852,198 +928,63 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id, } avail_idx = *((volatile uint16_t *)&vq->avail->idx); - - /* If there are no available buffers then return. 
*/ - if (vq->last_used_idx == avail_idx) + free_entries = avail_idx - vq->last_used_idx; + if (free_entries == 0) goto out; - LOG_DEBUG(VHOST_DATA, "%s (%"PRIu64")\n", __func__, - dev->device_fh); + LOG_DEBUG(VHOST_DATA, "%s (%"PRIu64")\n", __func__, dev->device_fh); /* Prefetch available ring to retrieve head indexes. */ - rte_prefetch0(&vq->avail->ring[vq->last_used_idx & (vq->size - 1)]); + used_idx = vq->last_used_idx & (vq->size - 1); + rte_prefetch0(&vq->avail->ring[used_idx]); - /*get the number of free entries in the ring*/ - free_entries = (avail_idx - vq->last_used_idx); + count = RTE_MIN(count, MAX_PKT_BURST); + count = RTE_MIN(count, free_entries); + LOG_DEBUG(VHOST_DATA, "(%"PRIu64") about to dequeue %u buffers\n", + dev->device_fh, count); - free_entries = RTE_MIN(free_entries, count); - /* Limit to MAX_PKT_BURST. */ - free_entries = RTE_MIN(free_entries, MAX_PKT_BURST); - - LOG_DEBUG(VHOST_DATA, "(%"PRIu64") Buffers available %d\n", - dev->device_fh, free_entries); /* Retrieve all of the head indexes first to avoid caching issues. */ - for (i = 0; i < free_entries; i++) - head[i] = vq->avail->ring[(vq->last_used_idx + i) & (vq->size - 1)]; + for (i = 0; i < count; i++) { + desc_indexes[i] = vq->avail->ring[(vq->last_used_idx + i) & + (vq->size - 1)]; + } /* Prefetch descriptor index. 
*/ - rte_prefetch0(&vq->desc[head[entry_success]]); + rte_prefetch0(&vq->desc[desc_indexes[0]]); rte_prefetch0(&vq->used->ring[vq->last_used_idx & (vq->size - 1)]); - while (entry_success < free_entries) { - uint32_t vb_avail, vb_offset; - uint32_t seg_avail, seg_offset; - uint32_t cpy_len; - uint32_t seg_num = 0; - struct rte_mbuf *cur; - uint8_t alloc_err = 0; - - desc = &vq->desc[head[entry_success]]; - - vb_net_hdr_addr = gpa_to_vva(dev, desc->addr); - hdr = (struct virtio_net_hdr *)((uintptr_t)vb_net_hdr_addr); - - /* Discard first buffer as it is the virtio header */ - if (desc->flags & VRING_DESC_F_NEXT) { - desc = &vq->desc[desc->next]; - vb_offset = 0; - vb_avail = desc->len; - } else { - vb_offset = vq->vhost_hlen; - vb_avail = desc->len - vb_offset; - } - - /* Buffer address translation. */ - vb_addr = gpa_to_vva(dev, desc->addr); - /* Prefetch buffer address. */ - rte_prefetch0((void *)(uintptr_t)vb_addr); - - used_idx = vq->last_used_idx & (vq->size - 1); + for (i = 0; i < count; i++) { + int err; - if (entry_success < (free_entries - 1)) { - /* Prefetch descriptor index. */ - rte_prefetch0(&vq->desc[head[entry_success+1]]); - rte_prefetch0(&vq->used->ring[(used_idx + 1) & (vq->size - 1)]); + if (likely(i + 1 < count)) { + rte_prefetch0(&vq->desc[desc_indexes[i + 1]]); + rte_prefetch0(&vq->used->ring[(used_idx + 1) & + (vq->size - 1)]); } - /* Update used index buffer information. */ - vq->used->ring[used_idx].id = head[entry_success]; - vq->used->ring[used_idx].len = 0; - vhost_log_used_vring(dev, vq, - offsetof(struct vring_used, ring[used_idx]), - sizeof(vq->used->ring[used_idx])); - - /* Allocate an mbuf and populate the structure. 
*/ - m = rte_pktmbuf_alloc(mbuf_pool); - if (unlikely(m == NULL)) { + pkts[i] = rte_pktmbuf_alloc(mbuf_pool); + if (unlikely(pkts[i] == NULL)) { RTE_LOG(ERR, VHOST_DATA, "Failed to allocate memory for mbuf.\n"); break; } - seg_offset = 0; - seg_avail = m->buf_len - RTE_PKTMBUF_HEADROOM; - cpy_len = RTE_MIN(vb_avail, seg_avail); - - PRINT_PACKET(dev, (uintptr_t)vb_addr, desc->len, 0); - - seg_num++; - cur = m; - prev = m; - while (cpy_len != 0) { - rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *, seg_offset), - (void *)((uintptr_t)(vb_addr + vb_offset)), - cpy_len); - - seg_offset += cpy_len; - vb_offset += cpy_len; - vb_avail -= cpy_len; - seg_avail -= cpy_len; - - if (vb_avail != 0) { - /* - * The segment reachs to its end, - * while the virtio buffer in TX vring has - * more data to be copied. - */ - cur->data_len = seg_offset; - m->pkt_len += seg_offset; - /* Allocate mbuf and populate the structure. */ - cur = rte_pktmbuf_alloc(mbuf_pool); - if (unlikely(cur == NULL)) { - RTE_LOG(ERR, VHOST_DATA, "Failed to " - "allocate memory for mbuf.\n"); - rte_pktmbuf_free(m); - alloc_err = 1; - break; - } - - seg_num++; - prev->next = cur; - prev = cur; - seg_offset = 0; - seg_avail = cur->buf_len - RTE_PKTMBUF_HEADROOM; - } else { - if (desc->flags & VRING_DESC_F_NEXT) { - /* - * There are more virtio buffers in - * same vring entry need to be copied. - */ - if (seg_avail == 0) { - /* - * The current segment hasn't - * room to accomodate more - * data. - */ - cur->data_len = seg_offset; - m->pkt_len += seg_offset; - /* - * Allocate an mbuf and - * populate the structure. - */ - cur = rte_pktmbuf_alloc(mbuf_pool); - if (unlikely(cur == NULL)) { - RTE_LOG(ERR, - VHOST_DATA, - "Failed to " - "allocate memory " - "for mbuf\n"); - rte_pktmbuf_free(m); - alloc_err = 1; - break; - } - seg_num++; - prev->next = cur; - prev = cur; - seg_offset = 0; - seg_avail = cur->buf_len - RTE_PKTMBUF_HEADROOM; - } - - desc = &vq->desc[desc->next]; - - /* Buffer address translation. 
*/ - vb_addr = gpa_to_vva(dev, desc->addr); - /* Prefetch buffer address. */ - rte_prefetch0((void *)(uintptr_t)vb_addr); - vb_offset = 0; - vb_avail = desc->len; - - PRINT_PACKET(dev, (uintptr_t)vb_addr, - desc->len, 0); - } else { - /* The whole packet completes. */ - cur->data_len = seg_offset; - m->pkt_len += seg_offset; - vb_avail = 0; - } - } - - cpy_len = RTE_MIN(vb_avail, seg_avail); - } - - if (unlikely(alloc_err == 1)) + err = copy_desc_to_mbuf(dev, vq, pkts[i], desc_indexes[i], + mbuf_pool); + if (unlikely(err)) { + rte_pktmbuf_free(pkts[i]); break; + } - m->nb_segs = seg_num; - if ((hdr->flags != 0) || (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE)) - vhost_dequeue_offload(hdr, m); - - pkts[entry_success] = m; - vq->last_used_idx++; - entry_success++; + used_idx = vq->last_used_idx++ & (vq->size - 1); + vq->used->ring[used_idx].id = desc_indexes[i]; + vq->used->ring[used_idx].len = 0; + vhost_log_used_vring(dev, vq, + offsetof(struct vring_used, ring[used_idx]), + sizeof(vq->used->ring[used_idx])); } rte_compiler_barrier(); - vq->used->idx += entry_success; + vq->used->idx += i; vhost_log_used_vring(dev, vq, offsetof(struct vring_used, idx), sizeof(vq->used->idx)); @@ -1057,10 +998,10 @@ out: * Inject it to the head of "pkts" array, so that switch's mac * learning table will get updated first. */ - memmove(&pkts[1], pkts, entry_success * sizeof(m)); + memmove(&pkts[1], pkts, i * sizeof(struct rte_mbuf *)); pkts[0] = rarp_mbuf; - entry_success += 1; + i += 1; } - return entry_success; + return i; } -- 1.9.0 ^ permalink raw reply [flat|nested] 84+ messages in thread
* [dpdk-dev] [PATCH v3 2/8] vhost: refactor virtio_dev_rx 2016-03-10 4:32 ` [dpdk-dev] [PATCH v3 0/8] vhost rxtx refactor and fixes Yuanhan Liu 2016-03-10 4:32 ` [dpdk-dev] [PATCH v3 1/8] vhost: refactor rte_vhost_dequeue_burst Yuanhan Liu @ 2016-03-10 4:32 ` Yuanhan Liu 2016-03-10 4:32 ` [dpdk-dev] [PATCH v3 3/8] vhost: refactor virtio_dev_merge_rx Yuanhan Liu ` (6 subsequent siblings) 8 siblings, 0 replies; 84+ messages in thread From: Yuanhan Liu @ 2016-03-10 4:32 UTC (permalink / raw) To: dev This is a simple refactor, as there isn't any twisted logic in the old code. Here I just broke the code up and introduced two helper functions, reserve_avail_buf() and copy_mbuf_to_desc(), to make the code more readable. Also, it saves nearly 1K bytes of binary code size. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> --- v2: - fix NULL dereference bug found by Rich. v3: - use while (mbuf_avail || m->next) to align with the style of copy_desc_to_mbuf() -- suggested by Huawei - use likely/unlikely when possible --- lib/librte_vhost/vhost_rxtx.c | 296 ++++++++++++++++++++---------------------- 1 file changed, 141 insertions(+), 155 deletions(-) diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c index e12e9ba..0df0612 100644 --- a/lib/librte_vhost/vhost_rxtx.c +++ b/lib/librte_vhost/vhost_rxtx.c @@ -129,6 +129,115 @@ virtio_enqueue_offload(struct rte_mbuf *m_buf, struct virtio_net_hdr *net_hdr) return; } +static inline int __attribute__((always_inline)) +copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq, + struct rte_mbuf *m, uint16_t desc_idx, uint32_t *copied) +{ + uint32_t desc_avail, desc_offset; + uint32_t mbuf_avail, mbuf_offset; + uint32_t cpy_len; + struct vring_desc *desc; + uint64_t desc_addr; + struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0}; + + desc = &vq->desc[desc_idx]; + desc_addr = gpa_to_vva(dev, desc->addr); + rte_prefetch0((void *)(uintptr_t)desc_addr); + + virtio_enqueue_offload(m, 
&virtio_hdr.hdr); + rte_memcpy((void *)(uintptr_t)desc_addr, + (const void *)&virtio_hdr, vq->vhost_hlen); + vhost_log_write(dev, desc->addr, vq->vhost_hlen); + PRINT_PACKET(dev, (uintptr_t)desc_addr, vq->vhost_hlen, 0); + + desc_offset = vq->vhost_hlen; + desc_avail = desc->len - vq->vhost_hlen; + + *copied = rte_pktmbuf_pkt_len(m); + mbuf_avail = rte_pktmbuf_data_len(m); + mbuf_offset = 0; + while (mbuf_avail != 0 || m->next != NULL) { + /* done with current mbuf, fetch next */ + if (mbuf_avail == 0) { + m = m->next; + + mbuf_offset = 0; + mbuf_avail = rte_pktmbuf_data_len(m); + } + + /* done with current desc buf, fetch next */ + if (desc_avail == 0) { + if ((desc->flags & VRING_DESC_F_NEXT) == 0) { + /* Room in vring buffer is not enough */ + return -1; + } + + desc = &vq->desc[desc->next]; + desc_addr = gpa_to_vva(dev, desc->addr); + desc_offset = 0; + desc_avail = desc->len; + } + + cpy_len = RTE_MIN(desc_avail, mbuf_avail); + rte_memcpy((void *)((uintptr_t)(desc_addr + desc_offset)), + rte_pktmbuf_mtod_offset(m, void *, mbuf_offset), + cpy_len); + vhost_log_write(dev, desc->addr + desc_offset, cpy_len); + PRINT_PACKET(dev, (uintptr_t)(desc_addr + desc_offset), + cpy_len, 0); + + mbuf_avail -= cpy_len; + mbuf_offset += cpy_len; + desc_avail -= cpy_len; + desc_offset += cpy_len; + } + + return 0; +} + +/* + * As many data cores may want to access available buffers + * they need to be reserved. 
+ */ +static inline uint32_t +reserve_avail_buf(struct vhost_virtqueue *vq, uint32_t count, + uint16_t *start, uint16_t *end) +{ + uint16_t res_start_idx; + uint16_t res_end_idx; + uint16_t avail_idx; + uint16_t free_entries; + int success; + + count = RTE_MIN(count, (uint32_t)MAX_PKT_BURST); + +again: + res_start_idx = vq->last_used_idx_res; + avail_idx = *((volatile uint16_t *)&vq->avail->idx); + + free_entries = avail_idx - res_start_idx; + count = RTE_MIN(count, free_entries); + if (count == 0) + return 0; + + res_end_idx = res_start_idx + count; + + /* + * update vq->last_used_idx_res atomically; try again if failed. + * + * TODO: Allow to disable cmpset if no concurrency in application. + */ + success = rte_atomic16_cmpset(&vq->last_used_idx_res, + res_start_idx, res_end_idx); + if (unlikely(!success)) + goto again; + + *start = res_start_idx; + *end = res_end_idx; + + return count; +} + /** * This function adds buffers to the virtio devices RX virtqueue. Buffers can * be received from the physical port or from another virtio device. A packet @@ -138,21 +247,12 @@ virtio_enqueue_offload(struct rte_mbuf *m_buf, struct virtio_net_hdr *net_hdr) */ static inline uint32_t __attribute__((always_inline)) virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id, - struct rte_mbuf **pkts, uint32_t count) + struct rte_mbuf **pkts, uint32_t count) { struct vhost_virtqueue *vq; - struct vring_desc *desc, *hdr_desc; - struct rte_mbuf *buff, *first_buff; - /* The virtio_hdr is initialised to 0. 
*/ - struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0}; - uint64_t buff_addr = 0; - uint64_t buff_hdr_addr = 0; - uint32_t head[MAX_PKT_BURST]; - uint32_t head_idx, packet_success = 0; - uint16_t avail_idx, res_cur_idx; - uint16_t res_base_idx, res_end_idx; - uint16_t free_entries; - uint8_t success = 0; + uint16_t res_start_idx, res_end_idx; + uint16_t desc_indexes[MAX_PKT_BURST]; + uint32_t i; LOG_DEBUG(VHOST_DATA, "(%"PRIu64") virtio_dev_rx()\n", dev->device_fh); if (unlikely(!is_valid_virt_queue_idx(queue_id, 0, dev->virt_qp_nb))) { @@ -166,161 +266,47 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id, if (unlikely(vq->enabled == 0)) return 0; - count = (count > MAX_PKT_BURST) ? MAX_PKT_BURST : count; - - /* - * As many data cores may want access to available buffers, - * they need to be reserved. - */ - do { - res_base_idx = vq->last_used_idx_res; - avail_idx = *((volatile uint16_t *)&vq->avail->idx); - - free_entries = (avail_idx - res_base_idx); - /*check that we have enough buffers*/ - if (unlikely(count > free_entries)) - count = free_entries; - - if (count == 0) - return 0; - - res_end_idx = res_base_idx + count; - /* vq->last_used_idx_res is atomically updated. */ - /* TODO: Allow to disable cmpset if no concurrency in application. */ - success = rte_atomic16_cmpset(&vq->last_used_idx_res, - res_base_idx, res_end_idx); - } while (unlikely(success == 0)); - res_cur_idx = res_base_idx; - LOG_DEBUG(VHOST_DATA, "(%"PRIu64") Current Index %d| End Index %d\n", - dev->device_fh, res_cur_idx, res_end_idx); - - /* Prefetch available ring to retrieve indexes. */ - rte_prefetch0(&vq->avail->ring[res_cur_idx & (vq->size - 1)]); - - /* Retrieve all of the head indexes first to avoid caching issues. */ - for (head_idx = 0; head_idx < count; head_idx++) - head[head_idx] = vq->avail->ring[(res_cur_idx + head_idx) & - (vq->size - 1)]; - - /*Prefetch descriptor index. 
*/ - rte_prefetch0(&vq->desc[head[packet_success]]); - - while (res_cur_idx != res_end_idx) { - uint32_t offset = 0, vb_offset = 0; - uint32_t pkt_len, len_to_cpy, data_len, total_copied = 0; - uint8_t hdr = 0, uncompleted_pkt = 0; - uint16_t idx; - - /* Get descriptor from available ring */ - desc = &vq->desc[head[packet_success]]; - - buff = pkts[packet_success]; - first_buff = buff; - - /* Convert from gpa to vva (guest physical addr -> vhost virtual addr) */ - buff_addr = gpa_to_vva(dev, desc->addr); - /* Prefetch buffer address. */ - rte_prefetch0((void *)(uintptr_t)buff_addr); - - /* Copy virtio_hdr to packet and increment buffer address */ - buff_hdr_addr = buff_addr; - hdr_desc = desc; - - /* - * If the descriptors are chained the header and data are - * placed in separate buffers. - */ - if ((desc->flags & VRING_DESC_F_NEXT) && - (desc->len == vq->vhost_hlen)) { - desc = &vq->desc[desc->next]; - /* Buffer address translation. */ - buff_addr = gpa_to_vva(dev, desc->addr); - } else { - vb_offset += vq->vhost_hlen; - hdr = 1; - } + count = reserve_avail_buf(vq, count, &res_start_idx, &res_end_idx); + if (count == 0) + return 0; - pkt_len = rte_pktmbuf_pkt_len(buff); - data_len = rte_pktmbuf_data_len(buff); - len_to_cpy = RTE_MIN(data_len, - hdr ? 
desc->len - vq->vhost_hlen : desc->len); - while (total_copied < pkt_len) { - /* Copy mbuf data to buffer */ - rte_memcpy((void *)(uintptr_t)(buff_addr + vb_offset), - rte_pktmbuf_mtod_offset(buff, const void *, offset), - len_to_cpy); - vhost_log_write(dev, desc->addr + vb_offset, len_to_cpy); - PRINT_PACKET(dev, (uintptr_t)(buff_addr + vb_offset), - len_to_cpy, 0); - - offset += len_to_cpy; - vb_offset += len_to_cpy; - total_copied += len_to_cpy; - - /* The whole packet completes */ - if (total_copied == pkt_len) - break; + LOG_DEBUG(VHOST_DATA, + "(%"PRIu64") res_start_idx %d| res_end_idx Index %d\n", + dev->device_fh, res_start_idx, res_end_idx); - /* The current segment completes */ - if (offset == data_len) { - buff = buff->next; - offset = 0; - data_len = rte_pktmbuf_data_len(buff); - } + /* Retrieve all of the desc indexes first to avoid caching issues. */ + rte_prefetch0(&vq->avail->ring[res_start_idx & (vq->size - 1)]); + for (i = 0; i < count; i++) { + desc_indexes[i] = vq->avail->ring[(res_start_idx + i) & + (vq->size - 1)]; + } - /* The current vring descriptor done */ - if (vb_offset == desc->len) { - if (desc->flags & VRING_DESC_F_NEXT) { - desc = &vq->desc[desc->next]; - buff_addr = gpa_to_vva(dev, desc->addr); - vb_offset = 0; - } else { - /* Room in vring buffer is not enough */ - uncompleted_pkt = 1; - break; - } - } - len_to_cpy = RTE_MIN(data_len - offset, desc->len - vb_offset); - } + rte_prefetch0(&vq->desc[desc_indexes[0]]); + for (i = 0; i < count; i++) { + uint16_t desc_idx = desc_indexes[i]; + uint16_t used_idx = (res_start_idx + i) & (vq->size - 1); + uint32_t copied; + int err; - /* Update used ring with desc information */ - idx = res_cur_idx & (vq->size - 1); - vq->used->ring[idx].id = head[packet_success]; + err = copy_mbuf_to_desc(dev, vq, pkts[i], desc_idx, &copied); - /* Drop the packet if it is uncompleted */ - if (unlikely(uncompleted_pkt == 1)) - vq->used->ring[idx].len = vq->vhost_hlen; + vq->used->ring[used_idx].id = 
desc_idx; + if (unlikely(err)) + vq->used->ring[used_idx].len = vq->vhost_hlen; else - vq->used->ring[idx].len = pkt_len + vq->vhost_hlen; - + vq->used->ring[used_idx].len = copied + vq->vhost_hlen; vhost_log_used_vring(dev, vq, - offsetof(struct vring_used, ring[idx]), - sizeof(vq->used->ring[idx])); + offsetof(struct vring_used, ring[used_idx]), + sizeof(vq->used->ring[used_idx])); - res_cur_idx++; - packet_success++; - - if (unlikely(uncompleted_pkt == 1)) - continue; - - virtio_enqueue_offload(first_buff, &virtio_hdr.hdr); - - rte_memcpy((void *)(uintptr_t)buff_hdr_addr, - (const void *)&virtio_hdr, vq->vhost_hlen); - vhost_log_write(dev, hdr_desc->addr, vq->vhost_hlen); - - PRINT_PACKET(dev, (uintptr_t)buff_hdr_addr, vq->vhost_hlen, 1); - - if (res_cur_idx < res_end_idx) { - /* Prefetch descriptor index. */ - rte_prefetch0(&vq->desc[head[packet_success]]); - } + if (i + 1 < count) + rte_prefetch0(&vq->desc[desc_indexes[i+1]]); } rte_compiler_barrier(); /* Wait until it's our turn to add our buffer to the used ring. */ - while (unlikely(vq->last_used_idx != res_base_idx)) + while (unlikely(vq->last_used_idx != res_start_idx)) rte_pause(); *(volatile uint16_t *)&vq->used->idx += count; -- 1.9.0 ^ permalink raw reply [flat|nested] 84+ messages in thread
* [dpdk-dev] [PATCH v3 3/8] vhost: refactor virtio_dev_merge_rx 2016-03-10 4:32 ` [dpdk-dev] [PATCH v3 0/8] vhost rxtx refactor and fixes Yuanhan Liu 2016-03-10 4:32 ` [dpdk-dev] [PATCH v3 1/8] vhost: refactor rte_vhost_dequeue_burst Yuanhan Liu 2016-03-10 4:32 ` [dpdk-dev] [PATCH v3 2/8] vhost: refactor virtio_dev_rx Yuanhan Liu @ 2016-03-10 4:32 ` Yuanhan Liu 2016-03-11 16:18 ` Thomas Monjalon 2016-03-10 4:32 ` [dpdk-dev] [PATCH v3 4/8] vhost: do not use rte_memcpy for virtio_hdr copy Yuanhan Liu ` (5 subsequent siblings) 8 siblings, 1 reply; 84+ messages in thread From: Yuanhan Liu @ 2016-03-10 4:32 UTC (permalink / raw) To: dev The current virtio_dev_merge_rx() implementation just looks like the old rte_vhost_dequeue_burst(): full of twisted logic, with the same code block appearing in quite a few different places. However, the logic of virtio_dev_merge_rx() is quite similar to virtio_dev_rx(). The big difference is that the mergeable one could allocate more than one available entry to hold the data. Fetching all available entries into vec_buf at once then makes the difference a bit bigger. The refactored code looks like below: while (mbuf_has_not_drained_totally || mbuf_has_next) { if (this_desc_has_no_room) { this_desc = fetch_next_from_vec_buf(); if (it is the last of a desc chain) update_used_ring(); } if (this_mbuf_has_drained_totally) mbuf = fetch_next_mbuf(); COPY(this_desc, this_mbuf); } This patch removes quite a few lines of code, therefore making it much more readable. 
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> --- lib/librte_vhost/vhost_rxtx.c | 404 +++++++++++++++++------------------------- 1 file changed, 166 insertions(+), 238 deletions(-) diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c index 0df0612..9be3593 100644 --- a/lib/librte_vhost/vhost_rxtx.c +++ b/lib/librte_vhost/vhost_rxtx.c @@ -324,251 +324,204 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id, return count; } -static inline uint32_t __attribute__((always_inline)) -copy_from_mbuf_to_vring(struct virtio_net *dev, uint32_t queue_id, - uint16_t res_base_idx, uint16_t res_end_idx, - struct rte_mbuf *pkt) +static inline int +fill_vec_buf(struct vhost_virtqueue *vq, uint32_t avail_idx, + uint32_t *allocated, uint32_t *vec_idx) { - uint32_t vec_idx = 0; - uint32_t entry_success = 0; - struct vhost_virtqueue *vq; - /* The virtio_hdr is initialised to 0. */ - struct virtio_net_hdr_mrg_rxbuf virtio_hdr = { - {0, 0, 0, 0, 0, 0}, 0}; - uint16_t cur_idx = res_base_idx; - uint64_t vb_addr = 0; - uint64_t vb_hdr_addr = 0; - uint32_t seg_offset = 0; - uint32_t vb_offset = 0; - uint32_t seg_avail; - uint32_t vb_avail; - uint32_t cpy_len, entry_len; - uint16_t idx; - - if (pkt == NULL) - return 0; + uint16_t idx = vq->avail->ring[avail_idx & (vq->size - 1)]; + uint32_t vec_id = *vec_idx; + uint32_t len = *allocated; - LOG_DEBUG(VHOST_DATA, "(%"PRIu64") Current Index %d| " - "End Index %d\n", - dev->device_fh, cur_idx, res_end_idx); + while (1) { + if (vec_id >= BUF_VECTOR_MAX) + return -1; - /* - * Convert from gpa to vva - * (guest physical addr -> vhost virtual addr) - */ - vq = dev->virtqueue[queue_id]; + len += vq->desc[idx].len; + vq->buf_vec[vec_id].buf_addr = vq->desc[idx].addr; + vq->buf_vec[vec_id].buf_len = vq->desc[idx].len; + vq->buf_vec[vec_id].desc_idx = idx; + vec_id++; - vb_addr = gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr); - vb_hdr_addr = vb_addr; + if ((vq->desc[idx].flags & VRING_DESC_F_NEXT) == 0) + break; - 
/* Prefetch buffer address. */ - rte_prefetch0((void *)(uintptr_t)vb_addr); + idx = vq->desc[idx].next; + } - virtio_hdr.num_buffers = res_end_idx - res_base_idx; + *allocated = len; + *vec_idx = vec_id; - LOG_DEBUG(VHOST_DATA, "(%"PRIu64") RX: Num merge buffers %d\n", - dev->device_fh, virtio_hdr.num_buffers); + return 0; +} - virtio_enqueue_offload(pkt, &virtio_hdr.hdr); +/* + * As many data cores may want to access available buffers concurrently, + * they need to be reserved. + * + * Returns -1 on fail, 0 on success + */ +static inline int +reserve_avail_buf_mergeable(struct vhost_virtqueue *vq, uint32_t size, + uint16_t *start, uint16_t *end) +{ + uint16_t res_start_idx; + uint16_t res_cur_idx; + uint16_t avail_idx; + uint32_t allocated; + uint32_t vec_idx; + uint16_t tries; - rte_memcpy((void *)(uintptr_t)vb_hdr_addr, - (const void *)&virtio_hdr, vq->vhost_hlen); - vhost_log_write(dev, vq->buf_vec[vec_idx].buf_addr, vq->vhost_hlen); +again: + res_start_idx = vq->last_used_idx_res; + res_cur_idx = res_start_idx; + + allocated = 0; + vec_idx = 0; + tries = 0; + while (1) { + avail_idx = *((volatile uint16_t *)&vq->avail->idx); + if (unlikely(res_cur_idx == avail_idx)) { + LOG_DEBUG(VHOST_DATA, "(%"PRIu64") Failed " + "to get enough desc from vring\n", + dev->device_fh); + return -1; + } - PRINT_PACKET(dev, (uintptr_t)vb_hdr_addr, vq->vhost_hlen, 1); + if (fill_vec_buf(vq, res_cur_idx, &allocated, &vec_idx) < 0) + return -1; - seg_avail = rte_pktmbuf_data_len(pkt); - vb_offset = vq->vhost_hlen; - vb_avail = vq->buf_vec[vec_idx].buf_len - vq->vhost_hlen; + res_cur_idx++; + tries++; - entry_len = vq->vhost_hlen; + if (allocated >= size) + break; - if (vb_avail == 0) { - uint32_t desc_idx = vq->buf_vec[vec_idx].desc_idx; + /* + * if we tried all available ring items, and still + * can't get enough buf, it means something abnormal + * happened. 
+ */ + if (tries >= vq->size) + return -1; + } - if ((vq->desc[desc_idx].flags & VRING_DESC_F_NEXT) == 0) { - idx = cur_idx & (vq->size - 1); + /* + * update vq->last_used_idx_res atomically. + * retry again if failed. + */ + if (rte_atomic16_cmpset(&vq->last_used_idx_res, + res_start_idx, res_cur_idx) == 0) + goto again; - /* Update used ring with desc information */ - vq->used->ring[idx].id = vq->buf_vec[vec_idx].desc_idx; - vq->used->ring[idx].len = entry_len; + *start = res_start_idx; + *end = res_cur_idx; + return 0; +} - vhost_log_used_vring(dev, vq, - offsetof(struct vring_used, ring[idx]), - sizeof(vq->used->ring[idx])); +static inline uint32_t __attribute__((always_inline)) +copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq, + uint16_t res_start_idx, uint16_t res_end_idx, + struct rte_mbuf *m) +{ + struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0}; + uint32_t vec_idx = 0; + uint16_t cur_idx = res_start_idx; + uint64_t desc_addr; + uint32_t mbuf_offset, mbuf_avail; + uint32_t desc_offset, desc_avail; + uint32_t cpy_len; + uint16_t desc_idx, used_idx; - entry_len = 0; - cur_idx++; - entry_success++; - } + if (unlikely(m == NULL)) + return 0; - vec_idx++; - vb_addr = gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr); + LOG_DEBUG(VHOST_DATA, + "(%"PRIu64") Current Index %d| End Index %d\n", + dev->device_fh, cur_idx, res_end_idx); - /* Prefetch buffer address. 
*/ - rte_prefetch0((void *)(uintptr_t)vb_addr); - vb_offset = 0; - vb_avail = vq->buf_vec[vec_idx].buf_len; - } + desc_addr = gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr); + rte_prefetch0((void *)(uintptr_t)desc_addr); - cpy_len = RTE_MIN(vb_avail, seg_avail); + virtio_hdr.num_buffers = res_end_idx - res_start_idx; + LOG_DEBUG(VHOST_DATA, "(%"PRIu64") RX: Num merge buffers %d\n", + dev->device_fh, virtio_hdr.num_buffers); - while (cpy_len > 0) { - /* Copy mbuf data to vring buffer */ - rte_memcpy((void *)(uintptr_t)(vb_addr + vb_offset), - rte_pktmbuf_mtod_offset(pkt, const void *, seg_offset), - cpy_len); - vhost_log_write(dev, vq->buf_vec[vec_idx].buf_addr + vb_offset, - cpy_len); + virtio_enqueue_offload(m, &virtio_hdr.hdr); + rte_memcpy((void *)(uintptr_t)desc_addr, + (const void *)&virtio_hdr, vq->vhost_hlen); + vhost_log_write(dev, vq->buf_vec[vec_idx].buf_addr, vq->vhost_hlen); + PRINT_PACKET(dev, (uintptr_t)desc_addr, vq->vhost_hlen, 0); - PRINT_PACKET(dev, - (uintptr_t)(vb_addr + vb_offset), - cpy_len, 0); + desc_avail = vq->buf_vec[vec_idx].buf_len - vq->vhost_hlen; + desc_offset = vq->vhost_hlen; - seg_offset += cpy_len; - vb_offset += cpy_len; - seg_avail -= cpy_len; - vb_avail -= cpy_len; - entry_len += cpy_len; - - if (seg_avail != 0) { - /* - * The virtio buffer in this vring - * entry reach to its end. - * But the segment doesn't complete. 
- */ - if ((vq->desc[vq->buf_vec[vec_idx].desc_idx].flags & - VRING_DESC_F_NEXT) == 0) { + mbuf_avail = rte_pktmbuf_data_len(m); + mbuf_offset = 0; + while (mbuf_avail != 0 || m->next != NULL) { + /* done with current desc buf, get the next one */ + if (desc_avail == 0) { + desc_idx = vq->buf_vec[vec_idx].desc_idx; + + if (!(vq->desc[desc_idx].flags & VRING_DESC_F_NEXT)) { /* Update used ring with desc information */ - idx = cur_idx & (vq->size - 1); - vq->used->ring[idx].id - = vq->buf_vec[vec_idx].desc_idx; - vq->used->ring[idx].len = entry_len; + used_idx = cur_idx++ & (vq->size - 1); + vq->used->ring[used_idx].id = desc_idx; + vq->used->ring[used_idx].len = desc_offset; vhost_log_used_vring(dev, vq, - offsetof(struct vring_used, ring[idx]), - sizeof(vq->used->ring[idx])); - entry_len = 0; - cur_idx++; - entry_success++; + offsetof(struct vring_used, + ring[used_idx]), + sizeof(vq->used->ring[used_idx])); } vec_idx++; - vb_addr = gpa_to_vva(dev, - vq->buf_vec[vec_idx].buf_addr); - vb_offset = 0; - vb_avail = vq->buf_vec[vec_idx].buf_len; - cpy_len = RTE_MIN(vb_avail, seg_avail); - } else { - /* - * This current segment complete, need continue to - * check if the whole packet complete or not. - */ - pkt = pkt->next; - if (pkt != NULL) { - /* - * There are more segments. - */ - if (vb_avail == 0) { - /* - * This current buffer from vring is - * used up, need fetch next buffer - * from buf_vec. - */ - uint32_t desc_idx = - vq->buf_vec[vec_idx].desc_idx; - - if ((vq->desc[desc_idx].flags & - VRING_DESC_F_NEXT) == 0) { - idx = cur_idx & (vq->size - 1); - /* - * Update used ring with the - * descriptor information - */ - vq->used->ring[idx].id - = desc_idx; - vq->used->ring[idx].len - = entry_len; - vhost_log_used_vring(dev, vq, - offsetof(struct vring_used, ring[idx]), - sizeof(vq->used->ring[idx])); - entry_success++; - entry_len = 0; - cur_idx++; - } - - /* Get next buffer from buf_vec. 
*/ - vec_idx++; - vb_addr = gpa_to_vva(dev, - vq->buf_vec[vec_idx].buf_addr); - vb_avail = - vq->buf_vec[vec_idx].buf_len; - vb_offset = 0; - } - - seg_offset = 0; - seg_avail = rte_pktmbuf_data_len(pkt); - cpy_len = RTE_MIN(vb_avail, seg_avail); - } else { - /* - * This whole packet completes. - */ - /* Update used ring with desc information */ - idx = cur_idx & (vq->size - 1); - vq->used->ring[idx].id - = vq->buf_vec[vec_idx].desc_idx; - vq->used->ring[idx].len = entry_len; - vhost_log_used_vring(dev, vq, - offsetof(struct vring_used, ring[idx]), - sizeof(vq->used->ring[idx])); - entry_success++; - break; - } + desc_addr = gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr); + + /* Prefetch buffer address. */ + rte_prefetch0((void *)(uintptr_t)desc_addr); + desc_offset = 0; + desc_avail = vq->buf_vec[vec_idx].buf_len; } - } - return entry_success; -} + /* done with current mbuf, get the next one */ + if (mbuf_avail == 0) { + m = m->next; -static inline void __attribute__((always_inline)) -update_secure_len(struct vhost_virtqueue *vq, uint32_t id, - uint32_t *secure_len, uint32_t *vec_idx) -{ - uint16_t wrapped_idx = id & (vq->size - 1); - uint32_t idx = vq->avail->ring[wrapped_idx]; - uint8_t next_desc; - uint32_t len = *secure_len; - uint32_t vec_id = *vec_idx; + mbuf_offset = 0; + mbuf_avail = rte_pktmbuf_data_len(m); + } - do { - next_desc = 0; - len += vq->desc[idx].len; - vq->buf_vec[vec_id].buf_addr = vq->desc[idx].addr; - vq->buf_vec[vec_id].buf_len = vq->desc[idx].len; - vq->buf_vec[vec_id].desc_idx = idx; - vec_id++; + cpy_len = RTE_MIN(desc_avail, mbuf_avail); + rte_memcpy((void *)((uintptr_t)(desc_addr + desc_offset)), + rte_pktmbuf_mtod_offset(m, void *, mbuf_offset), + cpy_len); + vhost_log_write(dev, vq->buf_vec[vec_idx].buf_addr + desc_offset, + cpy_len); + PRINT_PACKET(dev, (uintptr_t)(desc_addr + desc_offset), + cpy_len, 0); - if (vq->desc[idx].flags & VRING_DESC_F_NEXT) { - idx = vq->desc[idx].next; - next_desc = 1; - } - } while (next_desc); + 
mbuf_avail -= cpy_len; + mbuf_offset += cpy_len; + desc_avail -= cpy_len; + desc_offset += cpy_len; + } + + used_idx = cur_idx & (vq->size - 1); + vq->used->ring[used_idx].id = vq->buf_vec[vec_idx].desc_idx; + vq->used->ring[used_idx].len = desc_offset; + vhost_log_used_vring(dev, vq, + offsetof(struct vring_used, ring[used_idx]), + sizeof(vq->used->ring[used_idx])); - *secure_len = len; - *vec_idx = vec_id; + return res_end_idx - res_start_idx; } -/* - * This function works for mergeable RX. - */ static inline uint32_t __attribute__((always_inline)) virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id, struct rte_mbuf **pkts, uint32_t count) { struct vhost_virtqueue *vq; - uint32_t pkt_idx = 0, entry_success = 0; - uint16_t avail_idx; - uint16_t res_base_idx, res_cur_idx; - uint8_t success = 0; + uint32_t pkt_idx = 0, nr_used = 0; + uint16_t start, end; LOG_DEBUG(VHOST_DATA, "(%"PRIu64") virtio_dev_merge_rx()\n", dev->device_fh); @@ -584,57 +537,32 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id, return 0; count = RTE_MIN((uint32_t)MAX_PKT_BURST, count); - if (count == 0) return 0; for (pkt_idx = 0; pkt_idx < count; pkt_idx++) { uint32_t pkt_len = pkts[pkt_idx]->pkt_len + vq->vhost_hlen; - do { - /* - * As many data cores may want access to available - * buffers, they need to be reserved. - */ - uint32_t secure_len = 0; - uint32_t vec_idx = 0; - - res_base_idx = vq->last_used_idx_res; - res_cur_idx = res_base_idx; - - do { - avail_idx = *((volatile uint16_t *)&vq->avail->idx); - if (unlikely(res_cur_idx == avail_idx)) - goto merge_rx_exit; - - update_secure_len(vq, res_cur_idx, - &secure_len, &vec_idx); - res_cur_idx++; - } while (pkt_len > secure_len); - - /* vq->last_used_idx_res is atomically updated. 
*/ - success = rte_atomic16_cmpset(&vq->last_used_idx_res, - res_base_idx, - res_cur_idx); - } while (success == 0); - - entry_success = copy_from_mbuf_to_vring(dev, queue_id, - res_base_idx, res_cur_idx, pkts[pkt_idx]); + if (reserve_avail_buf_mergeable(vq, pkt_len, &start, &end) < 0) + break; + nr_used = copy_mbuf_to_desc_mergeable(dev, vq, start, end, + pkts[pkt_idx]); rte_compiler_barrier(); /* * Wait until it's our turn to add our buffer * to the used ring. */ - while (unlikely(vq->last_used_idx != res_base_idx)) + while (unlikely(vq->last_used_idx != start)) rte_pause(); - *(volatile uint16_t *)&vq->used->idx += entry_success; - vq->last_used_idx = res_cur_idx; + *(volatile uint16_t *)&vq->used->idx += nr_used; + vhost_log_used_vring(dev, vq, offsetof(struct vring_used, idx), + sizeof(vq->used->idx)); + vq->last_used_idx = end; } -merge_rx_exit: if (likely(pkt_idx)) { /* flush used->idx update before we read avail->flags. */ rte_mb(); -- 1.9.0 ^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [dpdk-dev] [PATCH v3 3/8] vhost: refactor virtio_dev_merge_rx 2016-03-10 4:32 ` [dpdk-dev] [PATCH v3 3/8] vhost: refactor virtio_dev_merge_rx Yuanhan Liu @ 2016-03-11 16:18 ` Thomas Monjalon 2016-03-14 7:35 ` [dpdk-dev] [PATCH v4 " Yuanhan Liu 0 siblings, 1 reply; 84+ messages in thread From: Thomas Monjalon @ 2016-03-11 16:18 UTC (permalink / raw) To: Yuanhan Liu; +Cc: dev This patch does not compile: lib/librte_vhost/vhost_rxtx.c:386:5: error: ‘dev’ undeclared ^ permalink raw reply [flat|nested] 84+ messages in thread
* [dpdk-dev] [PATCH v4 3/8] vhost: refactor virtio_dev_merge_rx 2016-03-11 16:18 ` Thomas Monjalon @ 2016-03-14 7:35 ` Yuanhan Liu 0 siblings, 0 replies; 84+ messages in thread From: Yuanhan Liu @ 2016-03-14 7:35 UTC (permalink / raw) To: dev; +Cc: huawei.xie, Thomas Monjalon, Yuanhan Liu The current virtio_dev_merge_rx() implementation looks just like the old rte_vhost_dequeue_burst(): full of twisted logic, with the same code block repeated in quite a few different places. The logic of virtio_dev_merge_rx() is, however, quite similar to virtio_dev_rx(). The big difference is that the mergeable path may consume more than one available entry to hold the data; fetching all available entries into vec_buf at once makes that difference a bit bigger. The refactored code looks like below:

    while (mbuf_has_not_drained_totally || mbuf_has_next) {
        if (this_desc_has_no_room) {
            this_desc = fetch_next_from_vec_buf();
            if (it is the last of a desc chain)
                update_used_ring();
        }
        if (this_mbuf_has_drained_totally)
            mbuf = fetch_next_mbuf();
        COPY(this_desc, this_mbuf);
    }

This patch removes quite a few lines of code, making it much more readable.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> --- v4: fix build error when DEBUG is enabled --- lib/librte_vhost/vhost_rxtx.c | 406 +++++++++++++++++------------------------- 1 file changed, 168 insertions(+), 238 deletions(-) diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c index 0df0612..3ebecee 100644 --- a/lib/librte_vhost/vhost_rxtx.c +++ b/lib/librte_vhost/vhost_rxtx.c @@ -324,251 +324,201 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id, return count; } -static inline uint32_t __attribute__((always_inline)) -copy_from_mbuf_to_vring(struct virtio_net *dev, uint32_t queue_id, - uint16_t res_base_idx, uint16_t res_end_idx, - struct rte_mbuf *pkt) +static inline int +fill_vec_buf(struct vhost_virtqueue *vq, uint32_t avail_idx, + uint32_t *allocated, uint32_t *vec_idx) { - uint32_t vec_idx = 0; - uint32_t entry_success = 0; - struct vhost_virtqueue *vq; - /* The virtio_hdr is initialised to 0. */ - struct virtio_net_hdr_mrg_rxbuf virtio_hdr = { - {0, 0, 0, 0, 0, 0}, 0}; - uint16_t cur_idx = res_base_idx; - uint64_t vb_addr = 0; - uint64_t vb_hdr_addr = 0; - uint32_t seg_offset = 0; - uint32_t vb_offset = 0; - uint32_t seg_avail; - uint32_t vb_avail; - uint32_t cpy_len, entry_len; - uint16_t idx; - - if (pkt == NULL) - return 0; + uint16_t idx = vq->avail->ring[avail_idx & (vq->size - 1)]; + uint32_t vec_id = *vec_idx; + uint32_t len = *allocated; - LOG_DEBUG(VHOST_DATA, "(%"PRIu64") Current Index %d| " - "End Index %d\n", - dev->device_fh, cur_idx, res_end_idx); + while (1) { + if (vec_id >= BUF_VECTOR_MAX) + return -1; - /* - * Convert from gpa to vva - * (guest physical addr -> vhost virtual addr) - */ - vq = dev->virtqueue[queue_id]; + len += vq->desc[idx].len; + vq->buf_vec[vec_id].buf_addr = vq->desc[idx].addr; + vq->buf_vec[vec_id].buf_len = vq->desc[idx].len; + vq->buf_vec[vec_id].desc_idx = idx; + vec_id++; - vb_addr = gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr); - vb_hdr_addr = vb_addr; + if 
((vq->desc[idx].flags & VRING_DESC_F_NEXT) == 0) + break; - /* Prefetch buffer address. */ - rte_prefetch0((void *)(uintptr_t)vb_addr); + idx = vq->desc[idx].next; + } - virtio_hdr.num_buffers = res_end_idx - res_base_idx; + *allocated = len; + *vec_idx = vec_id; - LOG_DEBUG(VHOST_DATA, "(%"PRIu64") RX: Num merge buffers %d\n", - dev->device_fh, virtio_hdr.num_buffers); + return 0; +} - virtio_enqueue_offload(pkt, &virtio_hdr.hdr); +/* + * As many data cores may want to access available buffers concurrently, + * they need to be reserved. + * + * Returns -1 on fail, 0 on success + */ +static inline int +reserve_avail_buf_mergeable(struct vhost_virtqueue *vq, uint32_t size, + uint16_t *start, uint16_t *end) +{ + uint16_t res_start_idx; + uint16_t res_cur_idx; + uint16_t avail_idx; + uint32_t allocated; + uint32_t vec_idx; + uint16_t tries; - rte_memcpy((void *)(uintptr_t)vb_hdr_addr, - (const void *)&virtio_hdr, vq->vhost_hlen); - vhost_log_write(dev, vq->buf_vec[vec_idx].buf_addr, vq->vhost_hlen); +again: + res_start_idx = vq->last_used_idx_res; + res_cur_idx = res_start_idx; - PRINT_PACKET(dev, (uintptr_t)vb_hdr_addr, vq->vhost_hlen, 1); + allocated = 0; + vec_idx = 0; + tries = 0; + while (1) { + avail_idx = *((volatile uint16_t *)&vq->avail->idx); + if (unlikely(res_cur_idx == avail_idx)) + return -1; - seg_avail = rte_pktmbuf_data_len(pkt); - vb_offset = vq->vhost_hlen; - vb_avail = vq->buf_vec[vec_idx].buf_len - vq->vhost_hlen; + if (unlikely(fill_vec_buf(vq, res_cur_idx, &allocated, + &vec_idx) < 0)) + return -1; - entry_len = vq->vhost_hlen; + res_cur_idx++; + tries++; - if (vb_avail == 0) { - uint32_t desc_idx = vq->buf_vec[vec_idx].desc_idx; + if (allocated >= size) + break; - if ((vq->desc[desc_idx].flags & VRING_DESC_F_NEXT) == 0) { - idx = cur_idx & (vq->size - 1); + /* + * if we tried all available ring items, and still + * can't get enough buf, it means something abnormal + * happened. 
+ */ + if (unlikely(tries >= vq->size)) + return -1; + } - /* Update used ring with desc information */ - vq->used->ring[idx].id = vq->buf_vec[vec_idx].desc_idx; - vq->used->ring[idx].len = entry_len; + /* + * update vq->last_used_idx_res atomically. + * retry again if failed. + */ + if (rte_atomic16_cmpset(&vq->last_used_idx_res, + res_start_idx, res_cur_idx) == 0) + goto again; - vhost_log_used_vring(dev, vq, - offsetof(struct vring_used, ring[idx]), - sizeof(vq->used->ring[idx])); + *start = res_start_idx; + *end = res_cur_idx; + return 0; +} - entry_len = 0; - cur_idx++; - entry_success++; - } +static inline uint32_t __attribute__((always_inline)) +copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq, + uint16_t res_start_idx, uint16_t res_end_idx, + struct rte_mbuf *m) +{ + struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0}; + uint32_t vec_idx = 0; + uint16_t cur_idx = res_start_idx; + uint64_t desc_addr; + uint32_t mbuf_offset, mbuf_avail; + uint32_t desc_offset, desc_avail; + uint32_t cpy_len; + uint16_t desc_idx, used_idx; - vec_idx++; - vb_addr = gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr); + if (unlikely(m == NULL)) + return 0; - /* Prefetch buffer address. 
*/ - rte_prefetch0((void *)(uintptr_t)vb_addr); - vb_offset = 0; - vb_avail = vq->buf_vec[vec_idx].buf_len; - } + LOG_DEBUG(VHOST_DATA, + "(%"PRIu64") Current Index %d| End Index %d\n", + dev->device_fh, cur_idx, res_end_idx); - cpy_len = RTE_MIN(vb_avail, seg_avail); + desc_addr = gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr); + rte_prefetch0((void *)(uintptr_t)desc_addr); - while (cpy_len > 0) { - /* Copy mbuf data to vring buffer */ - rte_memcpy((void *)(uintptr_t)(vb_addr + vb_offset), - rte_pktmbuf_mtod_offset(pkt, const void *, seg_offset), - cpy_len); - vhost_log_write(dev, vq->buf_vec[vec_idx].buf_addr + vb_offset, - cpy_len); + virtio_hdr.num_buffers = res_end_idx - res_start_idx; + LOG_DEBUG(VHOST_DATA, "(%"PRIu64") RX: Num merge buffers %d\n", + dev->device_fh, virtio_hdr.num_buffers); - PRINT_PACKET(dev, - (uintptr_t)(vb_addr + vb_offset), - cpy_len, 0); + virtio_enqueue_offload(m, &virtio_hdr.hdr); + rte_memcpy((void *)(uintptr_t)desc_addr, + (const void *)&virtio_hdr, vq->vhost_hlen); + vhost_log_write(dev, vq->buf_vec[vec_idx].buf_addr, vq->vhost_hlen); + PRINT_PACKET(dev, (uintptr_t)desc_addr, vq->vhost_hlen, 0); + + desc_avail = vq->buf_vec[vec_idx].buf_len - vq->vhost_hlen; + desc_offset = vq->vhost_hlen; - seg_offset += cpy_len; - vb_offset += cpy_len; - seg_avail -= cpy_len; - vb_avail -= cpy_len; - entry_len += cpy_len; - - if (seg_avail != 0) { - /* - * The virtio buffer in this vring - * entry reach to its end. - * But the segment doesn't complete. 
- */ - if ((vq->desc[vq->buf_vec[vec_idx].desc_idx].flags & - VRING_DESC_F_NEXT) == 0) { + mbuf_avail = rte_pktmbuf_data_len(m); + mbuf_offset = 0; + while (mbuf_avail != 0 || m->next != NULL) { + /* done with current desc buf, get the next one */ + if (desc_avail == 0) { + desc_idx = vq->buf_vec[vec_idx].desc_idx; + + if (!(vq->desc[desc_idx].flags & VRING_DESC_F_NEXT)) { /* Update used ring with desc information */ - idx = cur_idx & (vq->size - 1); - vq->used->ring[idx].id - = vq->buf_vec[vec_idx].desc_idx; - vq->used->ring[idx].len = entry_len; + used_idx = cur_idx++ & (vq->size - 1); + vq->used->ring[used_idx].id = desc_idx; + vq->used->ring[used_idx].len = desc_offset; vhost_log_used_vring(dev, vq, - offsetof(struct vring_used, ring[idx]), - sizeof(vq->used->ring[idx])); - entry_len = 0; - cur_idx++; - entry_success++; + offsetof(struct vring_used, + ring[used_idx]), + sizeof(vq->used->ring[used_idx])); } vec_idx++; - vb_addr = gpa_to_vva(dev, - vq->buf_vec[vec_idx].buf_addr); - vb_offset = 0; - vb_avail = vq->buf_vec[vec_idx].buf_len; - cpy_len = RTE_MIN(vb_avail, seg_avail); - } else { - /* - * This current segment complete, need continue to - * check if the whole packet complete or not. - */ - pkt = pkt->next; - if (pkt != NULL) { - /* - * There are more segments. - */ - if (vb_avail == 0) { - /* - * This current buffer from vring is - * used up, need fetch next buffer - * from buf_vec. - */ - uint32_t desc_idx = - vq->buf_vec[vec_idx].desc_idx; - - if ((vq->desc[desc_idx].flags & - VRING_DESC_F_NEXT) == 0) { - idx = cur_idx & (vq->size - 1); - /* - * Update used ring with the - * descriptor information - */ - vq->used->ring[idx].id - = desc_idx; - vq->used->ring[idx].len - = entry_len; - vhost_log_used_vring(dev, vq, - offsetof(struct vring_used, ring[idx]), - sizeof(vq->used->ring[idx])); - entry_success++; - entry_len = 0; - cur_idx++; - } - - /* Get next buffer from buf_vec. 
*/ - vec_idx++; - vb_addr = gpa_to_vva(dev, - vq->buf_vec[vec_idx].buf_addr); - vb_avail = - vq->buf_vec[vec_idx].buf_len; - vb_offset = 0; - } - - seg_offset = 0; - seg_avail = rte_pktmbuf_data_len(pkt); - cpy_len = RTE_MIN(vb_avail, seg_avail); - } else { - /* - * This whole packet completes. - */ - /* Update used ring with desc information */ - idx = cur_idx & (vq->size - 1); - vq->used->ring[idx].id - = vq->buf_vec[vec_idx].desc_idx; - vq->used->ring[idx].len = entry_len; - vhost_log_used_vring(dev, vq, - offsetof(struct vring_used, ring[idx]), - sizeof(vq->used->ring[idx])); - entry_success++; - break; - } + desc_addr = gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr); + + /* Prefetch buffer address. */ + rte_prefetch0((void *)(uintptr_t)desc_addr); + desc_offset = 0; + desc_avail = vq->buf_vec[vec_idx].buf_len; } - } - return entry_success; -} + /* done with current mbuf, get the next one */ + if (mbuf_avail == 0) { + m = m->next; -static inline void __attribute__((always_inline)) -update_secure_len(struct vhost_virtqueue *vq, uint32_t id, - uint32_t *secure_len, uint32_t *vec_idx) -{ - uint16_t wrapped_idx = id & (vq->size - 1); - uint32_t idx = vq->avail->ring[wrapped_idx]; - uint8_t next_desc; - uint32_t len = *secure_len; - uint32_t vec_id = *vec_idx; + mbuf_offset = 0; + mbuf_avail = rte_pktmbuf_data_len(m); + } - do { - next_desc = 0; - len += vq->desc[idx].len; - vq->buf_vec[vec_id].buf_addr = vq->desc[idx].addr; - vq->buf_vec[vec_id].buf_len = vq->desc[idx].len; - vq->buf_vec[vec_id].desc_idx = idx; - vec_id++; + cpy_len = RTE_MIN(desc_avail, mbuf_avail); + rte_memcpy((void *)((uintptr_t)(desc_addr + desc_offset)), + rte_pktmbuf_mtod_offset(m, void *, mbuf_offset), + cpy_len); + vhost_log_write(dev, vq->buf_vec[vec_idx].buf_addr + desc_offset, + cpy_len); + PRINT_PACKET(dev, (uintptr_t)(desc_addr + desc_offset), + cpy_len, 0); - if (vq->desc[idx].flags & VRING_DESC_F_NEXT) { - idx = vq->desc[idx].next; - next_desc = 1; - } - } while (next_desc); + 
mbuf_avail -= cpy_len; + mbuf_offset += cpy_len; + desc_avail -= cpy_len; + desc_offset += cpy_len; + } - *secure_len = len; - *vec_idx = vec_id; + used_idx = cur_idx & (vq->size - 1); + vq->used->ring[used_idx].id = vq->buf_vec[vec_idx].desc_idx; + vq->used->ring[used_idx].len = desc_offset; + vhost_log_used_vring(dev, vq, + offsetof(struct vring_used, ring[used_idx]), + sizeof(vq->used->ring[used_idx])); + + return res_end_idx - res_start_idx; } -/* - * This function works for mergeable RX. - */ static inline uint32_t __attribute__((always_inline)) virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id, struct rte_mbuf **pkts, uint32_t count) { struct vhost_virtqueue *vq; - uint32_t pkt_idx = 0, entry_success = 0; - uint16_t avail_idx; - uint16_t res_base_idx, res_cur_idx; - uint8_t success = 0; + uint32_t pkt_idx = 0, nr_used = 0; + uint16_t start, end; LOG_DEBUG(VHOST_DATA, "(%"PRIu64") virtio_dev_merge_rx()\n", dev->device_fh); @@ -584,57 +534,37 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id, return 0; count = RTE_MIN((uint32_t)MAX_PKT_BURST, count); - if (count == 0) return 0; for (pkt_idx = 0; pkt_idx < count; pkt_idx++) { uint32_t pkt_len = pkts[pkt_idx]->pkt_len + vq->vhost_hlen; - do { - /* - * As many data cores may want access to available - * buffers, they need to be reserved. - */ - uint32_t secure_len = 0; - uint32_t vec_idx = 0; - - res_base_idx = vq->last_used_idx_res; - res_cur_idx = res_base_idx; - - do { - avail_idx = *((volatile uint16_t *)&vq->avail->idx); - if (unlikely(res_cur_idx == avail_idx)) - goto merge_rx_exit; - - update_secure_len(vq, res_cur_idx, - &secure_len, &vec_idx); - res_cur_idx++; - } while (pkt_len > secure_len); - - /* vq->last_used_idx_res is atomically updated. 
*/ - success = rte_atomic16_cmpset(&vq->last_used_idx_res, - res_base_idx, - res_cur_idx); - } while (success == 0); - - entry_success = copy_from_mbuf_to_vring(dev, queue_id, - res_base_idx, res_cur_idx, pkts[pkt_idx]); + if (unlikely(reserve_avail_buf_mergeable(vq, pkt_len, + &start, &end) < 0)) { + LOG_DEBUG(VHOST_DATA, + "(%" PRIu64 ") Failed to get enough desc from vring\n", + dev->device_fh); + break; + } + nr_used = copy_mbuf_to_desc_mergeable(dev, vq, start, end, + pkts[pkt_idx]); rte_compiler_barrier(); /* * Wait until it's our turn to add our buffer * to the used ring. */ - while (unlikely(vq->last_used_idx != res_base_idx)) + while (unlikely(vq->last_used_idx != start)) rte_pause(); - *(volatile uint16_t *)&vq->used->idx += entry_success; - vq->last_used_idx = res_cur_idx; + *(volatile uint16_t *)&vq->used->idx += nr_used; + vhost_log_used_vring(dev, vq, offsetof(struct vring_used, idx), + sizeof(vq->used->idx)); + vq->last_used_idx = end; } -merge_rx_exit: if (likely(pkt_idx)) { /* flush used->idx update before we read avail->flags. */ rte_mb(); -- 1.9.0 ^ permalink raw reply [flat|nested] 84+ messages in thread
* [dpdk-dev] [PATCH v3 4/8] vhost: do not use rte_memcpy for virtio_hdr copy 2016-03-10 4:32 ` [dpdk-dev] [PATCH v3 0/8] vhost rxtx refactor and fixes Yuanhan Liu ` (2 preceding siblings ...) 2016-03-10 4:32 ` [dpdk-dev] [PATCH v3 3/8] vhost: refactor virtio_dev_merge_rx Yuanhan Liu @ 2016-03-10 4:32 ` Yuanhan Liu 2016-03-10 4:32 ` [dpdk-dev] [PATCH v3 5/8] vhost: don't use unlikely for VIRTIO_NET_F_MRG_RXBUF detection Yuanhan Liu ` (4 subsequent siblings) 8 siblings, 0 replies; 84+ messages in thread From: Yuanhan Liu @ 2016-03-10 4:32 UTC (permalink / raw) To: dev First of all, rte_memcpy() is mostly useful for copying big packets by leveraging advanced hardware instructions like AVX. But for the virtio net hdr, which is 12 bytes at most, invoking rte_memcpy() will not introduce any performance boost. And, to my surprise, rte_memcpy() is VERY huge: since rte_memcpy() is inlined, it increases the binary code size linearly every time we call it in a different place. Replacing the two rte_memcpy() calls with a direct assignment saves nearly 12K bytes of code size!
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> --- lib/librte_vhost/vhost_rxtx.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c index 9be3593..bafcb52 100644 --- a/lib/librte_vhost/vhost_rxtx.c +++ b/lib/librte_vhost/vhost_rxtx.c @@ -129,6 +129,16 @@ virtio_enqueue_offload(struct rte_mbuf *m_buf, struct virtio_net_hdr *net_hdr) return; } +static inline void +copy_virtio_net_hdr(struct vhost_virtqueue *vq, uint64_t desc_addr, + struct virtio_net_hdr_mrg_rxbuf hdr) +{ + if (vq->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf)) + *(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr; + else + *(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr; +} + static inline int __attribute__((always_inline)) copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq, struct rte_mbuf *m, uint16_t desc_idx, uint32_t *copied) @@ -145,8 +155,7 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq, rte_prefetch0((void *)(uintptr_t)desc_addr); virtio_enqueue_offload(m, &virtio_hdr.hdr); - rte_memcpy((void *)(uintptr_t)desc_addr, - (const void *)&virtio_hdr, vq->vhost_hlen); + copy_virtio_net_hdr(vq, desc_addr, virtio_hdr); vhost_log_write(dev, desc->addr, vq->vhost_hlen); PRINT_PACKET(dev, (uintptr_t)desc_addr, vq->vhost_hlen, 0); @@ -447,8 +456,7 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq, dev->device_fh, virtio_hdr.num_buffers); virtio_enqueue_offload(m, &virtio_hdr.hdr); - rte_memcpy((void *)(uintptr_t)desc_addr, - (const void *)&virtio_hdr, vq->vhost_hlen); + copy_virtio_net_hdr(vq, desc_addr, virtio_hdr); vhost_log_write(dev, vq->buf_vec[vec_idx].buf_addr, vq->vhost_hlen); PRINT_PACKET(dev, (uintptr_t)desc_addr, vq->vhost_hlen, 0); -- 1.9.0 ^ permalink raw reply [flat|nested] 84+ messages in thread
* [dpdk-dev] [PATCH v3 5/8] vhost: don't use unlikely for VIRTIO_NET_F_MRG_RXBUF detection 2016-03-10 4:32 ` [dpdk-dev] [PATCH v3 0/8] vhost rxtx refactor and fixes Yuanhan Liu ` (3 preceding siblings ...) 2016-03-10 4:32 ` [dpdk-dev] [PATCH v3 4/8] vhost: do not use rte_memcpy for virtio_hdr copy Yuanhan Liu @ 2016-03-10 4:32 ` Yuanhan Liu 2016-03-10 4:32 ` [dpdk-dev] [PATCH v3 6/8] vhost: do sanity check for desc->len Yuanhan Liu ` (3 subsequent siblings) 8 siblings, 0 replies; 84+ messages in thread From: Yuanhan Liu @ 2016-03-10 4:32 UTC (permalink / raw) To: dev VIRTIO_NET_F_MRG_RXBUF is a default feature supported by vhost. Adding unlikely for VIRTIO_NET_F_MRG_RXBUF detection doesn't make sense to me at all. Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> --- lib/librte_vhost/vhost_rxtx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c index bafcb52..50f449f 100644 --- a/lib/librte_vhost/vhost_rxtx.c +++ b/lib/librte_vhost/vhost_rxtx.c @@ -587,7 +587,7 @@ uint16_t rte_vhost_enqueue_burst(struct virtio_net *dev, uint16_t queue_id, struct rte_mbuf **pkts, uint16_t count) { - if (unlikely(dev->features & (1 << VIRTIO_NET_F_MRG_RXBUF))) + if (dev->features & (1 << VIRTIO_NET_F_MRG_RXBUF)) return virtio_dev_merge_rx(dev, queue_id, pkts, count); else return virtio_dev_rx(dev, queue_id, pkts, count); -- 1.9.0 ^ permalink raw reply [flat|nested] 84+ messages in thread
* [dpdk-dev] [PATCH v3 6/8] vhost: do sanity check for desc->len

From: Yuanhan Liu @ 2016-03-10  4:32 UTC
To: dev

We need to make sure that desc->len is bigger than the size of the
virtio net header; otherwise, "desc_avail" becomes a huge number in the
following code, since the subtraction wraps around in unsigned
arithmetic:

	desc_avail = desc->len - vq->vhost_hlen;

On the dequeue path, vhost then tries to allocate enough mbufs to hold
that much descriptor data, which ends up consuming every mbuf until
none are free. You may therefore see the error message:

	Failed to allocate memory for mbuf.

On both the dequeue and enqueue paths, copying data from/to the desc
buf with such a huge "desc_avail" would also access memory outside the
desc buf, a potential memory safety violation.

A malicious guest could easily forge such a malformed vring desc buf.
Restarting an interrupted DPDK application inside the guest triggers it
as well: all huge pages are reset to 0 during DPDK re-init, leaving
desc->len at 0.

Therefore, this patch adds a sanity check on desc->len, to make vhost
robust.
Reported-by: Rich Lane <rich.lane@bigswitch.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
---
 lib/librte_vhost/vhost_rxtx.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 50f449f..86e4d1a 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -151,6 +151,9 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0};

 	desc = &vq->desc[desc_idx];
+	if (unlikely(desc->len < vq->vhost_hlen))
+		return -1;
+
 	desc_addr = gpa_to_vva(dev, desc->addr);
 	rte_prefetch0((void *)(uintptr_t)desc_addr);

@@ -448,6 +451,9 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		"(%"PRIu64") Current Index %d| End Index %d\n",
 		dev->device_fh, cur_idx, res_end_idx);

+	if (vq->buf_vec[vec_idx].buf_len < vq->vhost_hlen)
+		return -1;
+
 	desc_addr = gpa_to_vva(dev, vq->buf_vec[vec_idx].buf_addr);
 	rte_prefetch0((void *)(uintptr_t)desc_addr);

@@ -737,6 +743,9 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	struct virtio_net_hdr *hdr;

 	desc = &vq->desc[desc_idx];
+	if (unlikely(desc->len < vq->vhost_hlen))
+		return -1;
+
 	desc_addr = gpa_to_vva(dev, desc->addr);
 	rte_prefetch0((void *)(uintptr_t)desc_addr);
--
1.9.0
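The underflow this patch guards against is plain unsigned wraparound, which a minimal sketch can demonstrate (the helper names are illustrative, not DPDK APIs):

```c
#include <assert.h>
#include <stdint.h>

/* Without the sanity check, a forged desc->len smaller than the virtio
 * net header size wraps around: e.g. (uint32_t)0 - 12 == 4294967284,
 * so the copy loop believes nearly 4GB of payload is available. */
static uint32_t desc_avail_unchecked(uint32_t desc_len, uint32_t vhost_hlen)
{
	return desc_len - vhost_hlen;	/* unsigned arithmetic wraps */
}

/* With the check added by this patch, the malformed descriptor is
 * rejected before the subtraction can wrap. */
static int desc_avail_checked(uint32_t desc_len, uint32_t vhost_hlen,
			      uint32_t *avail)
{
	if (desc_len < vhost_hlen)
		return -1;		/* malformed desc: bail out early */
	*avail = desc_len - vhost_hlen;
	return 0;
}
```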
* [dpdk-dev] [PATCH v3 7/8] vhost: do sanity check for desc->next against vq->size

From: Yuanhan Liu @ 2016-03-10  4:32 UTC
To: dev

A malicious guest may easily forge illegal vring desc bufs. To make
vhost robust, we need to make sure that desc->next never goes beyond
the vq->desc[] array.

Suggested-by: Rich Lane <rich.lane@bigswitch.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
---
 lib/librte_vhost/vhost_rxtx.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 86e4d1a..43db6c7 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -183,6 +183,8 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			/* Room in vring buffer is not enough */
 			return -1;
 		}
+		if (unlikely(desc->next >= vq->size))
+			return -1;

 		desc = &vq->desc[desc->next];
 		desc_addr = gpa_to_vva(dev, desc->addr);
@@ -345,7 +347,7 @@ fill_vec_buf(struct vhost_virtqueue *vq, uint32_t avail_idx,
 	uint32_t len = *allocated;

 	while (1) {
-		if (vec_id >= BUF_VECTOR_MAX)
+		if (unlikely(vec_id >= BUF_VECTOR_MAX || idx >= vq->size))
 			return -1;

 		len += vq->desc[idx].len;
@@ -759,6 +761,8 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	while (desc_avail != 0 || (desc->flags & VRING_DESC_F_NEXT) != 0) {
 		/* This desc reaches to its end, get the next one */
 		if (desc_avail == 0) {
+			if (unlikely(desc->next >= vq->size))
+				return -1;
 			desc = &vq->desc[desc->next];
 			desc_addr = gpa_to_vva(dev, desc->addr);
--
1.9.0
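The bounds check can be illustrated with a toy descriptor-chain walk. The `struct desc` layout below is a simplified stand-in for the vring descriptor, and `walk_chain` is an illustrative helper, not DPDK code:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for a vring descriptor. */
struct desc { uint32_t len; uint16_t next; uint16_t flags; };
#define F_NEXT 0x1	/* stand-in for VRING_DESC_F_NEXT */

/* Walk a descriptor chain, rejecting any 'next' index that falls
 * outside the ring, as the patch does with 'desc->next >= vq->size'.
 * Returns the number of descriptors visited, or -1 on a forged
 * out-of-bounds link. */
static int walk_chain(const struct desc *ring, uint16_t size, uint16_t idx)
{
	int n = 0;

	for (;;) {
		n++;
		if (!(ring[idx].flags & F_NEXT))
			return n;	/* end of chain */
		if (ring[idx].next >= size)
			return -1;	/* forged index: reject */
		idx = ring[idx].next;
	}
}
```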
* [dpdk-dev] [PATCH v3 8/8] vhost: avoid dead loop chain

From: Yuanhan Liu @ 2016-03-10  4:32 UTC
To: dev

If a malicious guest forges a looping desc chain, vhost could keep
copying the desc buf to mbufs forever, exhausting all mbufs. Add a
counter, nr_desc, to bound the walk and avoid such a case.

Suggested-by: Huawei Xie <huawei.xie@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
---
 lib/librte_vhost/vhost_rxtx.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 43db6c7..73fab7e 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -743,6 +743,8 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	uint32_t cpy_len;
 	struct rte_mbuf *cur = m, *prev = m;
 	struct virtio_net_hdr *hdr;
+	/* A counter to avoid desc dead loop chain */
+	uint32_t nr_desc = 1;

 	desc = &vq->desc[desc_idx];
 	if (unlikely(desc->len < vq->vhost_hlen))
@@ -761,7 +763,8 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	while (desc_avail != 0 || (desc->flags & VRING_DESC_F_NEXT) != 0) {
 		/* This desc reaches to its end, get the next one */
 		if (desc_avail == 0) {
-			if (unlikely(desc->next >= vq->size))
+			if (unlikely(desc->next >= vq->size ||
+				     ++nr_desc >= vq->size))
 				return -1;

 			desc = &vq->desc[desc->next];
--
1.9.0
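Note that the bounds check from patch 7 alone does not terminate a chain like 0 -> 1 -> 0: every link is in range, yet the walk loops forever. Bounding the walk at the ring size (the nr_desc counter) guarantees termination, because a legal chain can never contain more descriptors than the ring holds. A toy sketch (same simplified, illustrative descriptor layout as above, not DPDK code):

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for a vring descriptor. */
struct desc { uint32_t len; uint16_t next; uint16_t flags; };
#define F_NEXT 0x1	/* stand-in for VRING_DESC_F_NEXT */

/* Like walk_chain, but additionally bounded by a descriptor counter,
 * mirroring the 'nr_desc' guard in this patch. Returns the number of
 * descriptors visited, or -1 on an out-of-bounds or looping chain. */
static int walk_chain_safe(const struct desc *ring, uint16_t size,
			   uint16_t idx)
{
	uint32_t nr_desc = 1;
	int n = 0;

	for (;;) {
		n++;
		if (!(ring[idx].flags & F_NEXT))
			return n;	/* end of chain */
		if (ring[idx].next >= size || ++nr_desc > size)
			return -1;	/* out of bounds, or a dead loop */
		idx = ring[idx].next;
	}
}
```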
* Re: [dpdk-dev] [PATCH v3 0/8] vhost rxtx refactor and fixes

From: Thomas Monjalon @ 2016-03-14 23:09 UTC
To: Yuanhan Liu; +Cc: dev, huawei.xie, Rich Lane

2016-03-10 12:32, Yuanhan Liu:
> v3: - quite a few minor changes, including using likely/unlikely
>       when possible.
>
>     - Added a new patch 8 to avoid desc dead loop chain
>
> The first 3 patches refactor 3 major functions at vhost_rxtx.c.
> It simplifies the code logic, making it more readable. OTOH, it
> reduces binary code size, due to a lot of duplicate code being
> removed, as well as some huge inline functions being diminished.
>
> Patch 4 gets rid of the rte_memcpy for virtio_hdr copy, which
> nearly saves 12K bytes of binary code size!
>
> Patch 5 removes "unlikely" for VIRTIO_NET_F_MRG_RXBUF detection.
>
> Patches 6, 7 and 8 do some sanity checks on two desc fields, to
> make vhost robust and protected from malicious guests or abnormal
> use cases.
>
> ---
> Yuanhan Liu (8):
>   vhost: refactor rte_vhost_dequeue_burst
>   vhost: refactor virtio_dev_rx
>   vhost: refactor virtio_dev_merge_rx
>   vhost: do not use rte_memcpy for virtio_hdr copy
>   vhost: don't use unlikely for VIRTIO_NET_F_MRG_RXBUF detection
>   vhost: do sanity check for desc->len
>   vhost: do sanity check for desc->next against vq->size
>   vhost: avoid dead loop chain

Applied with 3/8 v4, thanks.