* [dpdk-dev] [PATCH v2] vhost: Add indirect descriptors support to the TX path
@ 2016-09-23 7:16 Maxime Coquelin
2016-09-23 7:29 ` Yuanhan Liu
2016-09-23 18:24 ` Stephen Hemminger
0 siblings, 2 replies; 9+ messages in thread
From: Maxime Coquelin @ 2016-09-23 7:16 UTC
To: yuanhan.liu, huawei.xie, dev; +Cc: vkaplans, mst, stephen
Indirect descriptors are usually supported by virtio-net devices,
allowing a larger number of requests to be dispatched.
When the virtio device sends a packet using indirect descriptors,
only one slot is used in the ring, even for large packets.
The main effect is to improve the 0% packet loss benchmark.
A PVP benchmark using Moongen (64 bytes) on the TE, and testpmd
(fwd io for host, macswap for VM) on DUT shows a +50% gain for
zero loss.
On the downside, a micro-benchmark using testpmd txonly in the VM and
rxonly on the host shows a loss between 1 and 4%. Depending on
the needs, the feature can be disabled at VM boot time by passing the
indirect_desc=off argument to the vhost-user device in QEMU.
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
Changes since v1:
-----------------
- Enrich commit message with figures
- Rebased on top of dpdk-next-virtio's master
- Add feature check to ensure we don't receive an indirect desc
if not supported by the virtio driver
lib/librte_vhost/vhost.c | 3 ++-
lib/librte_vhost/virtio_net.c | 38 +++++++++++++++++++++++++++++---------
2 files changed, 31 insertions(+), 10 deletions(-)
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 46095c3..30bb0ce 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -65,7 +65,8 @@
(1ULL << VIRTIO_NET_F_CSUM) | \
(1ULL << VIRTIO_NET_F_GUEST_CSUM) | \
(1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
- (1ULL << VIRTIO_NET_F_GUEST_TSO6))
+ (1ULL << VIRTIO_NET_F_GUEST_TSO6) | \
+ (1ULL << VIRTIO_RING_F_INDIRECT_DESC))
uint64_t VHOST_FEATURES = VHOST_SUPPORTED_FEATURES;
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 8a151af..0f7dd81 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -679,8 +679,8 @@ make_rarp_packet(struct rte_mbuf *rarp_mbuf, const struct ether_addr *mac)
}
static inline int __attribute__((always_inline))
-copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
- struct rte_mbuf *m, uint16_t desc_idx,
+copy_desc_to_mbuf(struct virtio_net *dev, struct vring_desc *descs,
+ uint16_t max_desc, struct rte_mbuf *m, uint16_t desc_idx,
struct rte_mempool *mbuf_pool)
{
struct vring_desc *desc;
@@ -693,7 +693,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
/* A counter to avoid desc dead loop chain */
uint32_t nr_desc = 1;
- desc = &vq->desc[desc_idx];
+ desc = &descs[desc_idx];
if (unlikely(desc->len < dev->vhost_hlen))
return -1;
@@ -711,7 +711,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
*/
if (likely((desc->len == dev->vhost_hlen) &&
(desc->flags & VRING_DESC_F_NEXT) != 0)) {
- desc = &vq->desc[desc->next];
+ desc = &descs[desc->next];
desc_addr = gpa_to_vva(dev, desc->addr);
if (unlikely(!desc_addr))
@@ -747,10 +747,10 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
if ((desc->flags & VRING_DESC_F_NEXT) == 0)
break;
- if (unlikely(desc->next >= vq->size ||
- ++nr_desc > vq->size))
+ if (unlikely(desc->next >= max_desc ||
+ ++nr_desc > max_desc))
return -1;
- desc = &vq->desc[desc->next];
+ desc = &descs[desc->next];
desc_addr = gpa_to_vva(dev, desc->addr);
if (unlikely(!desc_addr))
@@ -878,19 +878,39 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
/* Prefetch descriptor index. */
rte_prefetch0(&vq->desc[desc_indexes[0]]);
for (i = 0; i < count; i++) {
+ struct vring_desc *desc;
+ uint16_t sz, idx;
int err;
if (likely(i + 1 < count))
rte_prefetch0(&vq->desc[desc_indexes[i + 1]]);
+ if (vq->desc[desc_indexes[i]].flags & VRING_DESC_F_INDIRECT) {
+ if (unlikely(!(dev->features &
+ (1ULL << VIRTIO_RING_F_INDIRECT_DESC)))) {
+ RTE_LOG(ERR, VHOST_DATA,
+ "Indirect desc but feature not negotiated.\n");
+ break;
+ }
+
+ desc = (struct vring_desc *)gpa_to_vva(dev,
+ vq->desc[desc_indexes[i]].addr);
+ rte_prefetch0(desc);
+ sz = vq->desc[desc_indexes[i]].len / sizeof(*desc);
+ idx = 0;
+ } else {
+ desc = vq->desc;
+ sz = vq->size;
+ idx = desc_indexes[i];
+ }
+
pkts[i] = rte_pktmbuf_alloc(mbuf_pool);
if (unlikely(pkts[i] == NULL)) {
RTE_LOG(ERR, VHOST_DATA,
"Failed to allocate memory for mbuf.\n");
break;
}
- err = copy_desc_to_mbuf(dev, vq, pkts[i], desc_indexes[i],
- mbuf_pool);
+ err = copy_desc_to_mbuf(dev, desc, sz, pkts[i], idx, mbuf_pool);
if (unlikely(err)) {
rte_pktmbuf_free(pkts[i]);
break;
--
2.7.4
* Re: [dpdk-dev] [PATCH v2] vhost: Add indirect descriptors support to the TX path
2016-09-23 7:16 [dpdk-dev] [PATCH v2] vhost: Add indirect descriptors support to the TX path Maxime Coquelin
@ 2016-09-23 7:29 ` Yuanhan Liu
2016-09-23 7:33 ` Maxime Coquelin
2016-09-23 18:24 ` Stephen Hemminger
1 sibling, 1 reply; 9+ messages in thread
From: Yuanhan Liu @ 2016-09-23 7:29 UTC
To: Maxime Coquelin; +Cc: huawei.xie, dev, vkaplans, mst, stephen
On Fri, Sep 23, 2016 at 09:16:49AM +0200, Maxime Coquelin wrote:
> + if (vq->desc[desc_indexes[i]].flags & VRING_DESC_F_INDIRECT) {
> + if (unlikely(!(dev->features &
> + (1ULL << VIRTIO_RING_F_INDIRECT_DESC)))) {
> + RTE_LOG(ERR, VHOST_DATA,
> + "Indirect desc but feature not negotiated.\n");
> + break;
> + }
I thought the agreement we reached before was to follow the Linux kernel:
check nested indirect only?
> +
> + desc = (struct vring_desc *)gpa_to_vva(dev,
> + vq->desc[desc_indexes[i]].addr);
I think we should check the desc addr here. Otherwise we may crash here
if a malicious guest fills some bad addresses.
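A minimal sketch of the kind of check meant here, not the actual fix. The names guest_region, translate() and get_indirect_table() are hypothetical; translate() only stands in for gpa_to_vva() and, like it, returns 0 for an unmapped address. The sketch also assumes a single mapped region and rejects a table whose length is not a whole number of descriptors or that runs past the end of the region:

```c
#include <stddef.h>
#include <stdint.h>

#define VRING_DESC_F_INDIRECT 4	/* flag value from the virtio spec */

/* Same layout as the split-ring descriptor. */
struct vring_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/* Hypothetical single guest memory region (illustration only). */
struct guest_region {
	uint64_t gpa;	/* guest physical start */
	uint64_t size;	/* region size in bytes */
	uint64_t hva;	/* host virtual start */
};

/* Stand-in for gpa_to_vva(): 0 means "not mapped". The whole
 * [gpa, gpa + len) range must fit inside the region. */
static uint64_t
translate(const struct guest_region *r, uint64_t gpa, uint64_t len)
{
	if (gpa < r->gpa || len > r->size)
		return 0;
	if (gpa - r->gpa > r->size - len)
		return 0;
	return r->hva + (gpa - r->gpa);
}

/* Return the indirect table, or NULL if the guest supplied a bad
 * address or length. *nr receives the number of entries. */
static struct vring_desc *
get_indirect_table(const struct guest_region *r,
		   const struct vring_desc *d, uint32_t *nr)
{
	uint64_t vva;

	if (d->len == 0 || d->len % sizeof(struct vring_desc) != 0)
		return NULL;

	vva = translate(r, d->addr, d->len);
	if (vva == 0)
		return NULL;

	*nr = d->len / sizeof(struct vring_desc);
	return (struct vring_desc *)(uintptr_t)vva;
}
```

In the dequeue loop, a NULL result would simply break out of the burst before any mbuf is allocated.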
--yliu
* Re: [dpdk-dev] [PATCH v2] vhost: Add indirect descriptors support to the TX path
2016-09-23 7:29 ` Yuanhan Liu
@ 2016-09-23 7:33 ` Maxime Coquelin
0 siblings, 0 replies; 9+ messages in thread
From: Maxime Coquelin @ 2016-09-23 7:33 UTC
To: Yuanhan Liu; +Cc: huawei.xie, dev, vkaplans, mst, stephen
On 09/23/2016 09:29 AM, Yuanhan Liu wrote:
> On Fri, Sep 23, 2016 at 09:16:49AM +0200, Maxime Coquelin wrote:
>> + if (vq->desc[desc_indexes[i]].flags & VRING_DESC_F_INDIRECT) {
>> + if (unlikely(!(dev->features &
>> + (1ULL << VIRTIO_RING_F_INDIRECT_DESC)))) {
>> + RTE_LOG(ERR, VHOST_DATA,
>> + "Indirect desc but feature not negotiated.\n");
>> + break;
>> + }
>
> I thought the agreement we reached before was to follow the Linux kernel:
> check nested indirect only?
Right... I did the opposite..
Fixing this right now.
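For reference, a rough sketch of what "check nested indirect only" could look like, under the assumption that the kernel-style rule is: no feature-bit test on the data path, but any descriptor inside an indirect table that itself carries VRING_DESC_F_INDIRECT is an error. walk_indirect_chain() is a hypothetical helper for illustration, not code from the patch:

```c
#include <stdint.h>

#define VRING_DESC_F_NEXT     1	/* flag values from the virtio spec */
#define VRING_DESC_F_INDIRECT 4

struct vring_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/* Walk the chain starting at entry 0 of an indirect table of nr
 * entries. Returns the chain length, or -1 on a nested indirect
 * descriptor, an out-of-range next index, or a looping chain. */
static int
walk_indirect_chain(const struct vring_desc *table, uint32_t nr)
{
	uint32_t idx = 0, seen = 0;

	if (nr == 0)
		return -1;

	for (;;) {
		const struct vring_desc *d = &table[idx];

		/* Nested indirect tables are not allowed. */
		if (d->flags & VRING_DESC_F_INDIRECT)
			return -1;
		/* More hops than entries means the chain loops. */
		if (++seen > nr)
			return -1;
		if (!(d->flags & VRING_DESC_F_NEXT))
			return (int)seen;
		if (d->next >= nr)
			return -1;
		idx = d->next;
	}
}
```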
>
>> +
>> + desc = (struct vring_desc *)gpa_to_vva(dev,
>> + vq->desc[desc_indexes[i]].addr);
>
> I think we should check the desc addr here. Otherwise we may crash here
> if a malicious guest fills some bad addresses.
Good point!
Thanks,
Maxime
>
> --yliu
>
* Re: [dpdk-dev] [PATCH v2] vhost: Add indirect descriptors support to the TX path
2016-09-23 7:16 [dpdk-dev] [PATCH v2] vhost: Add indirect descriptors support to the TX path Maxime Coquelin
2016-09-23 7:29 ` Yuanhan Liu
@ 2016-09-23 18:24 ` Stephen Hemminger
2016-09-23 18:31 ` Michael S. Tsirkin
1 sibling, 1 reply; 9+ messages in thread
From: Stephen Hemminger @ 2016-09-23 18:24 UTC
To: Maxime Coquelin; +Cc: yuanhan.liu, huawei.xie, dev, vkaplans, mst
On Fri, 23 Sep 2016 09:16:49 +0200
Maxime Coquelin <maxime.coquelin@redhat.com> wrote:
> Indirect descriptors are usually supported by virtio-net devices,
> allowing to dispatch a larger number of requests.
>
> When the virtio device sends a packet using indirect descriptors,
> only one slot is used in the ring, even for large packets.
>
> The main effect is to improve the 0% packet loss benchmark.
> A PVP benchmark using Moongen (64 bytes) on the TE, and testpmd
> (fwd io for host, macswap for VM) on DUT shows a +50% gain for
> zero loss.
>
> On the downside, micro-benchmark using testpmd txonly in VM and
> rxonly on host shows a loss between 1 and 4%. But depending on
> the needs, feature can be disabled at VM boot time by passing
> indirect_desc=off argument to vhost-user device in Qemu.
>
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
What about supporting VIRTIO_F_ANY_LAYOUT?
* Re: [dpdk-dev] [PATCH v2] vhost: Add indirect descriptors support to the TX path
2016-09-23 18:24 ` Stephen Hemminger
@ 2016-09-23 18:31 ` Michael S. Tsirkin
2016-09-23 20:28 ` Stephen Hemminger
0 siblings, 1 reply; 9+ messages in thread
From: Michael S. Tsirkin @ 2016-09-23 18:31 UTC
To: Stephen Hemminger; +Cc: Maxime Coquelin, yuanhan.liu, huawei.xie, dev, vkaplans
On Fri, Sep 23, 2016 at 11:24:16AM -0700, Stephen Hemminger wrote:
> On Fri, 23 Sep 2016 09:16:49 +0200
> Maxime Coquelin <maxime.coquelin@redhat.com> wrote:
>
> > Indirect descriptors are usually supported by virtio-net devices,
> > allowing to dispatch a larger number of requests.
> >
> > When the virtio device sends a packet using indirect descriptors,
> > only one slot is used in the ring, even for large packets.
> >
> > The main effect is to improve the 0% packet loss benchmark.
> > A PVP benchmark using Moongen (64 bytes) on the TE, and testpmd
> > (fwd io for host, macswap for VM) on DUT shows a +50% gain for
> > zero loss.
> >
> > On the downside, micro-benchmark using testpmd txonly in VM and
> > rxonly on host shows a loss between 1 and 4%. But depending on
> > the needs, feature can be disabled at VM boot time by passing
> > indirect_desc=off argument to vhost-user device in Qemu.
> >
> > Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>
> What about supporting VIRTIO_F_ANY_LAYOUT?
I thought it's already supported.
That's required by virtio 1 and dpdk claims support for that.
--
MST
* Re: [dpdk-dev] [PATCH v2] vhost: Add indirect descriptors support to the TX path
2016-09-23 18:31 ` Michael S. Tsirkin
@ 2016-09-23 20:28 ` Stephen Hemminger
2016-09-25 1:02 ` Michael S. Tsirkin
0 siblings, 1 reply; 9+ messages in thread
From: Stephen Hemminger @ 2016-09-23 20:28 UTC
To: Michael S. Tsirkin
Cc: Maxime Coquelin, yuanhan.liu, huawei.xie, dev, vkaplans
On Fri, 23 Sep 2016 21:31:27 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Fri, Sep 23, 2016 at 11:24:16AM -0700, Stephen Hemminger wrote:
> > On Fri, 23 Sep 2016 09:16:49 +0200
> > Maxime Coquelin <maxime.coquelin@redhat.com> wrote:
> >
> > > Indirect descriptors are usually supported by virtio-net devices,
> > > allowing to dispatch a larger number of requests.
> > >
> > > When the virtio device sends a packet using indirect descriptors,
> > > only one slot is used in the ring, even for large packets.
> > >
> > > The main effect is to improve the 0% packet loss benchmark.
> > > A PVP benchmark using Moongen (64 bytes) on the TE, and testpmd
> > > (fwd io for host, macswap for VM) on DUT shows a +50% gain for
> > > zero loss.
> > >
> > > On the downside, micro-benchmark using testpmd txonly in VM and
> > > rxonly on host shows a loss between 1 and 4%. But depending on
> > > the needs, feature can be disabled at VM boot time by passing
> > > indirect_desc=off argument to vhost-user device in Qemu.
> > >
> > > Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> >
> > What about supporting VIRTIO_F_ANY_LAYOUT?
>
> I thought it's already supported.
> That's required by virtio 1 and dpdk claims support for that.
>
I don't see the flag set in the DPDK vhost driver feature bits
(at least in the source).
/* Features supported by this lib. */
#define VHOST_SUPPORTED_FEATURES ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | \
(1ULL << VIRTIO_NET_F_CTRL_VQ) | \
(1ULL << VIRTIO_NET_F_CTRL_RX) | \
(1ULL << VIRTIO_NET_F_GUEST_ANNOUNCE) | \
(VHOST_SUPPORTS_MQ) | \
(1ULL << VIRTIO_F_VERSION_1) | \
(1ULL << VHOST_F_LOG_ALL) | \
(1ULL << VHOST_USER_F_PROTOCOL_FEATURES) | \
(1ULL << VIRTIO_NET_F_HOST_TSO4) | \
(1ULL << VIRTIO_NET_F_HOST_TSO6) | \
(1ULL << VIRTIO_NET_F_CSUM) | \
(1ULL << VIRTIO_NET_F_GUEST_CSUM) | \
(1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
(1ULL << VIRTIO_NET_F_GUEST_TSO6))
* Re: [dpdk-dev] [PATCH v2] vhost: Add indirect descriptors support to the TX path
2016-09-23 20:28 ` Stephen Hemminger
@ 2016-09-25 1:02 ` Michael S. Tsirkin
2016-09-25 1:50 ` Stephen Hemminger
0 siblings, 1 reply; 9+ messages in thread
From: Michael S. Tsirkin @ 2016-09-25 1:02 UTC
To: Stephen Hemminger; +Cc: Maxime Coquelin, yuanhan.liu, huawei.xie, dev, vkaplans
On Fri, Sep 23, 2016 at 01:28:05PM -0700, Stephen Hemminger wrote:
> On Fri, 23 Sep 2016 21:31:27 +0300
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>
> > On Fri, Sep 23, 2016 at 11:24:16AM -0700, Stephen Hemminger wrote:
> > > On Fri, 23 Sep 2016 09:16:49 +0200
> > > Maxime Coquelin <maxime.coquelin@redhat.com> wrote:
> > >
> > > > Indirect descriptors are usually supported by virtio-net devices,
> > > > allowing to dispatch a larger number of requests.
> > > >
> > > > When the virtio device sends a packet using indirect descriptors,
> > > > only one slot is used in the ring, even for large packets.
> > > >
> > > > The main effect is to improve the 0% packet loss benchmark.
> > > > A PVP benchmark using Moongen (64 bytes) on the TE, and testpmd
> > > > (fwd io for host, macswap for VM) on DUT shows a +50% gain for
> > > > zero loss.
> > > >
> > > > On the downside, micro-benchmark using testpmd txonly in VM and
> > > > rxonly on host shows a loss between 1 and 4%. But depending on
> > > > the needs, feature can be disabled at VM boot time by passing
> > > > indirect_desc=off argument to vhost-user device in Qemu.
> > > >
> > > > Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> > >
> > > What about supporting VIRTIO_F_ANY_LAYOUT?
> >
> > I thought it's already supported.
> > That's required by virtio 1 and dpdk claims support for that.
> >
>
> I don't see the flag set in the DPDK vhost driver feature bits
> (at least in the source).
>
> /* Features supported by this lib. */
> #define VHOST_SUPPORTED_FEATURES ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | \
> (1ULL << VIRTIO_NET_F_CTRL_VQ) | \
> (1ULL << VIRTIO_NET_F_CTRL_RX) | \
> (1ULL << VIRTIO_NET_F_GUEST_ANNOUNCE) | \
> (VHOST_SUPPORTS_MQ) | \
> (1ULL << VIRTIO_F_VERSION_1) | \
> (1ULL << VHOST_F_LOG_ALL) | \
> (1ULL << VHOST_USER_F_PROTOCOL_FEATURES) | \
> (1ULL << VIRTIO_NET_F_HOST_TSO4) | \
> (1ULL << VIRTIO_NET_F_HOST_TSO6) | \
> (1ULL << VIRTIO_NET_F_CSUM) | \
> (1ULL << VIRTIO_NET_F_GUEST_CSUM) | \
> (1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
> (1ULL << VIRTIO_NET_F_GUEST_TSO6))
I see. It's implied by VERSION_1 in fact.
In other words if VERSION_1 is negotiated then bit 27
isn't ANY_LAYOUT, it's in fact reserved.
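To spell that out, a small sketch of how a backend might decide whether it can rely on arbitrary descriptor layouts from a negotiated feature word. any_layout_allowed() is a made-up helper; the bit numbers (27 and 32) are the ones from the virtio spec:

```c
#include <stdint.h>

#define VIRTIO_F_ANY_LAYOUT 27	/* only meaningful for legacy devices */
#define VIRTIO_F_VERSION_1  32

/* Returns 1 if the driver may use any descriptor layout. */
static int
any_layout_allowed(uint64_t negotiated)
{
	/* With VERSION_1, any layout is implied and bit 27 is
	 * reserved, so it must not be consulted. */
	if (negotiated & (1ULL << VIRTIO_F_VERSION_1))
		return 1;

	/* Legacy: the driver must have accepted ANY_LAYOUT. */
	return (negotiated & (1ULL << VIRTIO_F_ANY_LAYOUT)) != 0;
}
```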
--
MST
* Re: [dpdk-dev] [PATCH v2] vhost: Add indirect descriptors support to the TX path
2016-09-25 1:02 ` Michael S. Tsirkin
@ 2016-09-25 1:50 ` Stephen Hemminger
2016-09-25 1:53 ` Michael S. Tsirkin
0 siblings, 1 reply; 9+ messages in thread
From: Stephen Hemminger @ 2016-09-25 1:50 UTC
To: Michael S. Tsirkin
Cc: Maxime Coquelin, yuanhan.liu, huawei.xie, dev, vkaplans
On Sun, 25 Sep 2016 04:02:28 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:
> I see. It's implied by VERSION_1 in fact.
> In other words if VERSION_1 is negotiated then bit 27
> isn't ANY_LAYOUT, it's in fact reserved.
But what if the guest isn't using Version 1? Legacy distros certainly
won't have it enabled.
* Re: [dpdk-dev] [PATCH v2] vhost: Add indirect descriptors support to the TX path
2016-09-25 1:50 ` Stephen Hemminger
@ 2016-09-25 1:53 ` Michael S. Tsirkin
0 siblings, 0 replies; 9+ messages in thread
From: Michael S. Tsirkin @ 2016-09-25 1:53 UTC
To: Stephen Hemminger; +Cc: Maxime Coquelin, yuanhan.liu, huawei.xie, dev, vkaplans
On Sat, Sep 24, 2016 at 06:50:07PM -0700, Stephen Hemminger wrote:
> On Sun, 25 Sep 2016 04:02:28 +0300
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>
> > I see. It's implied by VERSION_1 in fact.
> > In other words if VERSION_1 is negotiated then bit 27
> > isn't ANY_LAYOUT, it's in fact reserved.
>
>
> But what if the guest isn't using Version 1? Legacy distros certainly
> won't have it enabled.
We probably can just expose ANY_LAYOUT to guest if backend
has VERSION_1.
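Something along these lines, purely as an illustration of the idea. features_to_offer() is a hypothetical helper and not an existing vhost function; it assumes the offered set is derived from the backend's supported feature word:

```c
#include <stdint.h>

#define VIRTIO_F_ANY_LAYOUT 27	/* bit numbers from the virtio spec */
#define VIRTIO_F_VERSION_1  32

/* Compute the feature word offered to the guest. A backend that
 * handles VERSION_1 already copes with any descriptor layout, so
 * the legacy ANY_LAYOUT bit can be offered as well. */
static uint64_t
features_to_offer(uint64_t backend_features)
{
	uint64_t offered = backend_features;

	if (backend_features & (1ULL << VIRTIO_F_VERSION_1))
		offered |= 1ULL << VIRTIO_F_ANY_LAYOUT;

	return offered;
}
```

A legacy guest that does not negotiate VERSION_1 would then still see ANY_LAYOUT in the offered bits.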