DPDK patches and discussions
* [dpdk-dev] [PATCH 0/2] enables vhost/virtio any layout feature
@ 2016-09-26  6:40 Yuanhan Liu
  2016-09-26  6:40 ` [dpdk-dev] [PATCH 1/2] vhost: enable " Yuanhan Liu
                   ` (2 more replies)
  0 siblings, 3 replies; 45+ messages in thread
From: Yuanhan Liu @ 2016-09-26  6:40 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Maxime Coquelin, Michael S. Tsirkin, Yuanhan Liu

The feature is actually supported by both the virtio PMD and the vhost
lib; we just haven't enabled it yet. This patchset simply enables it.

---
Yuanhan Liu (2):
  vhost: enable any layout feature
  net/virtio: enable any layout feature

 drivers/net/virtio/virtio_ethdev.h | 1 +
 lib/librte_vhost/vhost.c           | 1 +
 lib/librte_vhost/vhost.h           | 3 +++
 3 files changed, 5 insertions(+)

-- 
1.9.0

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [dpdk-dev] [PATCH 1/2] vhost: enable any layout feature
  2016-09-26  6:40 [dpdk-dev] [PATCH 0/2] enables vhost/virtio any layout feature Yuanhan Liu
@ 2016-09-26  6:40 ` Yuanhan Liu
  2016-09-26 18:01   ` Stephen Hemminger
  2016-09-26  6:40 ` [dpdk-dev] [PATCH 2/2] net/virtio: " Yuanhan Liu
  2016-11-10 15:18 ` [dpdk-dev] [PATCH 0/2] enables vhost/virtio " Michael S. Tsirkin
  2 siblings, 1 reply; 45+ messages in thread
From: Yuanhan Liu @ 2016-09-26  6:40 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Maxime Coquelin, Michael S. Tsirkin, Yuanhan Liu

The VIRTIO_F_ANY_LAYOUT feature allows the virtio-net header and packet
data to share a single vring desc when possible. Without it, it is
assumed they will always take two descs.

DPDK vhost removed this assumption in commit bc7f87a2c19f ("vhost:
refactor dequeueing"), meaning we actually support this feature. But
it is not enabled yet.

Thus, this patch enables it.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
---
 lib/librte_vhost/vhost.c | 1 +
 lib/librte_vhost/vhost.h | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 46095c3..1d8d941 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -57,6 +57,7 @@
 				(1ULL << VIRTIO_NET_F_CTRL_RX) | \
 				(1ULL << VIRTIO_NET_F_GUEST_ANNOUNCE) | \
 				(VHOST_SUPPORTS_MQ)            | \
+				(1ULL << VIRTIO_F_ANY_LAYOUT)  | \
 				(1ULL << VIRTIO_F_VERSION_1)   | \
 				(1ULL << VHOST_F_LOG_ALL)      | \
 				(1ULL << VHOST_USER_F_PROTOCOL_FEATURES) | \
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index c2dfc3c..5ee6c6c 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -92,6 +92,9 @@ struct vhost_virtqueue {
  #define VIRTIO_NET_F_GUEST_ANNOUNCE 21
 #endif
 
+#ifndef VIRTIO_F_ANY_LAYOUT
+ #define VIRTIO_F_ANY_LAYOUT 27
+#endif
 
 /*
  * Make an extra wrapper for VIRTIO_NET_F_MQ and
-- 
1.9.0


* [dpdk-dev] [PATCH 2/2] net/virtio: enable any layout feature
  2016-09-26  6:40 [dpdk-dev] [PATCH 0/2] enables vhost/virtio any layout feature Yuanhan Liu
  2016-09-26  6:40 ` [dpdk-dev] [PATCH 1/2] vhost: enable " Yuanhan Liu
@ 2016-09-26  6:40 ` Yuanhan Liu
  2016-09-26 18:04   ` Stephen Hemminger
  2016-09-29 18:00   ` Michael S. Tsirkin
  2016-11-10 15:18 ` [dpdk-dev] [PATCH 0/2] enables vhost/virtio " Michael S. Tsirkin
  2 siblings, 2 replies; 45+ messages in thread
From: Yuanhan Liu @ 2016-09-26  6:40 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Maxime Coquelin, Michael S. Tsirkin, Yuanhan Liu

The feature VIRTIO_F_ANY_LAYOUT was actually supported by commit
dd856dfcb9e7 ("virtio: use any layout on Tx"), but it was not enabled.
This patch enables it.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
---
 drivers/net/virtio/virtio_ethdev.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
index 04d626b..3e31e6a 100644
--- a/drivers/net/virtio/virtio_ethdev.h
+++ b/drivers/net/virtio/virtio_ethdev.h
@@ -64,6 +64,7 @@
 	 1u << VIRTIO_NET_F_CTRL_VLAN	  |	\
 	 1u << VIRTIO_NET_F_MRG_RXBUF	  |	\
 	 1u << VIRTIO_RING_F_INDIRECT_DESC |    \
+	 1u << VIRTIO_F_ANY_LAYOUT |    \
 	 1ULL << VIRTIO_F_VERSION_1)
 
 /*
-- 
1.9.0


* Re: [dpdk-dev] [PATCH 1/2] vhost: enable any layout feature
  2016-09-26  6:40 ` [dpdk-dev] [PATCH 1/2] vhost: enable " Yuanhan Liu
@ 2016-09-26 18:01   ` Stephen Hemminger
  2016-09-26 19:24     ` Michael S. Tsirkin
  0 siblings, 1 reply; 45+ messages in thread
From: Stephen Hemminger @ 2016-09-26 18:01 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: Michael S. Tsirkin, Maxime Coquelin, dev

I assume that if using Version 1, the bit will be ignored


* Re: [dpdk-dev] [PATCH 2/2] net/virtio: enable any layout feature
  2016-09-26  6:40 ` [dpdk-dev] [PATCH 2/2] net/virtio: " Yuanhan Liu
@ 2016-09-26 18:04   ` Stephen Hemminger
  2016-09-29 18:00   ` Michael S. Tsirkin
  1 sibling, 0 replies; 45+ messages in thread
From: Stephen Hemminger @ 2016-09-26 18:04 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: Michael S. Tsirkin, Maxime Coquelin, dev

Acked-by: Stephen Hemminger <stephen@networkplumber.org>


* Re: [dpdk-dev] [PATCH 1/2] vhost: enable any layout feature
  2016-09-26 18:01   ` Stephen Hemminger
@ 2016-09-26 19:24     ` Michael S. Tsirkin
  2016-09-27  3:11       ` Yuanhan Liu
  0 siblings, 1 reply; 45+ messages in thread
From: Michael S. Tsirkin @ 2016-09-26 19:24 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Yuanhan Liu, Maxime Coquelin, dev, qemu-devel

On Mon, Sep 26, 2016 at 11:01:58AM -0700, Stephen Hemminger wrote:
> I assume that if using Version 1 that the bit will be ignored
> 

Therein lies a problem. If dpdk tweaks flags, updating it
will break guest migration.

One way is to require that users specify all flags fully when
creating the virtio net device.  QEMU could verify that all required
flags are set, and fail init if not.

This has other advantages, e.g. it adds ability to
init device without waiting for dpdk to connect.

However, enabling each new feature would now require
management work. How about dpdk ships the list
of supported features instead?
Management tools could read them on source and destination
and select features supported on both sides.

-- 
MST


* Re: [dpdk-dev] [PATCH 1/2] vhost: enable any layout feature
  2016-09-26 19:24     ` Michael S. Tsirkin
@ 2016-09-27  3:11       ` Yuanhan Liu
  2016-09-27 19:48         ` Stephen Hemminger
  2016-09-27 19:56         ` Michael S. Tsirkin
  0 siblings, 2 replies; 45+ messages in thread
From: Yuanhan Liu @ 2016-09-27  3:11 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Stephen Hemminger, Maxime Coquelin, dev, qemu-devel

On Mon, Sep 26, 2016 at 10:24:55PM +0300, Michael S. Tsirkin wrote:
> On Mon, Sep 26, 2016 at 11:01:58AM -0700, Stephen Hemminger wrote:
> > I assume that if using Version 1 that the bit will be ignored

Yes, but I will just quote what you said: what if the guest virtio
device is a legacy device? I also gave my reasons in another email for
why I consistently set this flag:

  - we have to return all features we support to the guest.
  
    We don't know whether the guest is a modern or a legacy device. That means
    we should claim we support both: VERSION_1 and ANY_LAYOUT.
  
    Assume guest is a legacy device and we just set VERSION_1 (the current
    case), ANY_LAYOUT will never be negotiated.
  
  - I'm following the way the Linux kernel does: it also sets both features.
  
  Maybe, we could unset ANY_LAYOUT when VERSION_1 is _negotiated_?

The unset-after-negotiation I proposed turns out not to work: the
feature is already negotiated, and unsetting it only on the vhost side
doesn't change anything. Besides, it may break migration, as Michael
stated below.

> Therein lies a problem. If dpdk tweaks flags, updating it
> will break guest migration.
> 
> One way is to require that users specify all flags fully when
> creating the virtio net device. 

Like how? By a new command line option? And the user has to type
all those features?

> QEMU could verify that all required
> flags are set, and fail init if not.
> 
> This has other advantages, e.g. it adds ability to
> init device without waiting for dpdk to connect.
> 
> However, enabling each new feature would now require
> management work. How about dpdk ships the list
> of supported features instead?
> Management tools could read them on source and destination
> and select features supported on both sides.

That means the management tool would somehow have a dependency on the
DPDK project, to which I have no objection at all. But is that
a good idea?

BTW, I'm not quite sure I followed your idea. I mean, how is it supposed
to fix the ANY_LAYOUT issue here? How will this flag be set for a
legacy device?

	--yliu


* Re: [dpdk-dev] [PATCH 1/2] vhost: enable any layout feature
  2016-09-27  3:11       ` Yuanhan Liu
@ 2016-09-27 19:48         ` Stephen Hemminger
  2016-09-27 19:56         ` Michael S. Tsirkin
  1 sibling, 0 replies; 45+ messages in thread
From: Stephen Hemminger @ 2016-09-27 19:48 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: Michael S. Tsirkin, Maxime Coquelin, dev, qemu-devel

On Tue, 27 Sep 2016 11:11:58 +0800
Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote:

> On Mon, Sep 26, 2016 at 10:24:55PM +0300, Michael S. Tsirkin wrote:
> > On Mon, Sep 26, 2016 at 11:01:58AM -0700, Stephen Hemminger wrote:  
> > > I assume that if using Version 1 that the bit will be ignored  
> 
> Yes, but I will just quote what you just said: what if the guest
> virtio device is a legacy device? I also gave my reasons in another
> email why I consistently set this flag:
> 
>   - we have to return all features we support to the guest.
>   
>     We don't know the guest is a modern or legacy device. That means
>     we should claim we support both: VERSION_1 and ANY_LAYOUT.
>   
>     Assume guest is a legacy device and we just set VERSION_1 (the current
>     case), ANY_LAYOUT will never be negotiated.
>   
>   - I'm following the way Linux kernel takes: it also set both features.

Agreed, just do what the Linux kernel does.


* Re: [dpdk-dev] [PATCH 1/2] vhost: enable any layout feature
  2016-09-27  3:11       ` Yuanhan Liu
  2016-09-27 19:48         ` Stephen Hemminger
@ 2016-09-27 19:56         ` Michael S. Tsirkin
  2016-09-28  2:28           ` Yuanhan Liu
  1 sibling, 1 reply; 45+ messages in thread
From: Michael S. Tsirkin @ 2016-09-27 19:56 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: Stephen Hemminger, Maxime Coquelin, dev, qemu-devel

On Tue, Sep 27, 2016 at 11:11:58AM +0800, Yuanhan Liu wrote:
> On Mon, Sep 26, 2016 at 10:24:55PM +0300, Michael S. Tsirkin wrote:
> > On Mon, Sep 26, 2016 at 11:01:58AM -0700, Stephen Hemminger wrote:
> > > I assume that if using Version 1 that the bit will be ignored
> 
> Yes, but I will just quote what you just said: what if the guest
> virtio device is a legacy device? I also gave my reasons in another
> email why I consistently set this flag:
> 
>   - we have to return all features we support to the guest.
>   
>     We don't know the guest is a modern or legacy device. That means
>     we should claim we support both: VERSION_1 and ANY_LAYOUT.
>   
>     Assume guest is a legacy device and we just set VERSION_1 (the current
>     case), ANY_LAYOUT will never be negotiated.
>   
>   - I'm following the way Linux kernel takes: it also set both features.
>   
>   Maybe, we could unset ANY_LAYOUT when VERSION_1 is _negotiated_?
> 
> The unset after negotiation I proposed turned out it won't work: the
> feature is already negotiated; unsetting it only in vhost side doesn't
> change anything. Besides, it may break the migration as Michael stated
> below.

I think the reverse. Teach vhost user that for future machine types
only VERSION_1 implies ANY_LAYOUT.


> > Therein lies a problem. If dpdk tweaks flags, updating it
> > will break guest migration.
> > 
> > One way is to require that users specify all flags fully when
> > creating the virtio net device. 
> 
> Like how? By a new command line option? And user has to type
> all those features?

Make libvirt do this.  Users normally use management tools; those that
don't likely don't migrate VMs.

> > QEMU could verify that all required
> > flags are set, and fail init if not.
> > 
> > This has other advantages, e.g. it adds ability to
> > init device without waiting for dpdk to connect.
> > 
> > However, enabling each new feature would now require
> > management work. How about dpdk ships the list
> > of supported features instead?
> > Management tools could read them on source and destination
> > and select features supported on both sides.
> 
> That means the management tool would somehow has a dependency on
> DPDK project, which I have no objection at all. But, is that
> a good idea?

It already starts the bridge somehow, does it not?

> BTW, I'm not quite sure I followed your idea. I mean, how it supposed
> to fix the ANY_LAYOUT issue here? How this flag will be set for
> legacy device?
> 
> 	--yliu

For ANY_LAYOUT, I think we should just set it in qemu,
but only for new machine types. This addresses migration
concerns.

But there will be more new features in the future and
it is necessary to think how we will enable them without
breaking migration.

-- 
MST


* Re: [dpdk-dev] [PATCH 1/2] vhost: enable any layout feature
  2016-09-27 19:56         ` Michael S. Tsirkin
@ 2016-09-28  2:28           ` Yuanhan Liu
  2016-09-29 15:30             ` [dpdk-dev] [Qemu-devel] " Maxime Coquelin
  2016-10-09 23:20             ` [dpdk-dev] " Michael S. Tsirkin
  0 siblings, 2 replies; 45+ messages in thread
From: Yuanhan Liu @ 2016-09-28  2:28 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Stephen Hemminger, Maxime Coquelin, dev, qemu-devel

On Tue, Sep 27, 2016 at 10:56:40PM +0300, Michael S. Tsirkin wrote:
> On Tue, Sep 27, 2016 at 11:11:58AM +0800, Yuanhan Liu wrote:
> > On Mon, Sep 26, 2016 at 10:24:55PM +0300, Michael S. Tsirkin wrote:
> > > On Mon, Sep 26, 2016 at 11:01:58AM -0700, Stephen Hemminger wrote:
> > > > I assume that if using Version 1 that the bit will be ignored
> > 
> > Yes, but I will just quote what you just said: what if the guest
> > virtio device is a legacy device? I also gave my reasons in another
> > email why I consistently set this flag:
> > 
> >   - we have to return all features we support to the guest.
> >   
> >     We don't know the guest is a modern or legacy device. That means
> >     we should claim we support both: VERSION_1 and ANY_LAYOUT.
> >   
> >     Assume guest is a legacy device and we just set VERSION_1 (the current
> >     case), ANY_LAYOUT will never be negotiated.
> >   
> >   - I'm following the way Linux kernel takes: it also set both features.
> >   
> >   Maybe, we could unset ANY_LAYOUT when VERSION_1 is _negotiated_?
> > 
> > The unset after negotiation I proposed turned out it won't work: the
> > feature is already negotiated; unsetting it only in vhost side doesn't
> > change anything. Besides, it may break the migration as Michael stated
> > below.
> 
> I think the reverse. Teach vhost user that for future machine types
> only VERSION_1 implies ANY_LAYOUT.
> 
> 
> > > Therein lies a problem. If dpdk tweaks flags, updating it
> > > will break guest migration.
> > > 
> > > One way is to require that users specify all flags fully when
> > > creating the virtio net device. 
> > 
> > Like how? By a new command line option? And user has to type
> > all those features?
> 
> Make libvirt do this.  users use management normally. those that don't
> likely don't migrate VMs.

Fair enough.

> 
> > > QEMU could verify that all required
> > > flags are set, and fail init if not.
> > > 
> > > This has other advantages, e.g. it adds ability to
> > > init device without waiting for dpdk to connect.

Will the feature negotiation between DPDK and QEMU still exist
in your proposal?

> > > 
> > > However, enabling each new feature would now require
> > > management work. How about dpdk ships the list
> > > of supported features instead?
> > > Management tools could read them on source and destination
> > > and select features supported on both sides.
> > 
> > That means the management tool would somehow has a dependency on
> > DPDK project, which I have no objection at all. But, is that
> > a good idea?
> 
> It already starts the bridge somehow, does it not?

Indeed. I was first thinking about reading the DPDK source files to
determine the DPDK-supported feature list, but that binding is too
tight. I later realized you could ask DPDK to provide a binary that
dumps the list, or something like that.

> 
> > BTW, I'm not quite sure I followed your idea. I mean, how it supposed
> > to fix the ANY_LAYOUT issue here? How this flag will be set for
> > legacy device?
> > 
> > 	--yliu
> 
> For ANY_LAYOUT, I think we should just set it in qemu,
> but only for new machine types.

What do you mean by "new machine types"? Virtio device with newer
virtio-spec version?

> This addresses migration
> concerns.

To make sure I followed you, do you mean the migration issue from
an older "dpdk + qemu" combo to a newer "dpdk + qemu" combo (in which
more new features might be shipped)?

Besides that, your proposal looks like a big piece of work to
accomplish. Are you okay with making it simple first: set it
consistently, like the Linux kernel does? This would at least make
ANY_LAYOUT actually be enabled for legacy devices (which are also the
default and the most widely used so far).

	--yliu

> 
> But there will be more new features in the future and
> it is necessary to think how we will enable them without
> breaking migration.
> 
> -- 
> MST


* Re: [dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
  2016-09-28  2:28           ` Yuanhan Liu
@ 2016-09-29 15:30             ` Maxime Coquelin
  2016-09-29 17:57               ` Michael S. Tsirkin
  2016-10-09 23:20             ` [dpdk-dev] " Michael S. Tsirkin
  1 sibling, 1 reply; 45+ messages in thread
From: Maxime Coquelin @ 2016-09-29 15:30 UTC (permalink / raw)
  To: Yuanhan Liu, Michael S. Tsirkin; +Cc: Stephen Hemminger, dev, qemu-devel



On 09/28/2016 04:28 AM, Yuanhan Liu wrote:
> On Tue, Sep 27, 2016 at 10:56:40PM +0300, Michael S. Tsirkin wrote:
>> On Tue, Sep 27, 2016 at 11:11:58AM +0800, Yuanhan Liu wrote:
>>> On Mon, Sep 26, 2016 at 10:24:55PM +0300, Michael S. Tsirkin wrote:
>>>> On Mon, Sep 26, 2016 at 11:01:58AM -0700, Stephen Hemminger wrote:
>>>>> I assume that if using Version 1 that the bit will be ignored
>>>
>>> Yes, but I will just quote what you just said: what if the guest
>>> virtio device is a legacy device? I also gave my reasons in another
>>> email why I consistently set this flag:
>>>
>>>   - we have to return all features we support to the guest.
>>>
>>>     We don't know the guest is a modern or legacy device. That means
>>>     we should claim we support both: VERSION_1 and ANY_LAYOUT.
>>>
>>>     Assume guest is a legacy device and we just set VERSION_1 (the current
>>>     case), ANY_LAYOUT will never be negotiated.
>>>
>>>   - I'm following the way Linux kernel takes: it also set both features.
>>>
>>>   Maybe, we could unset ANY_LAYOUT when VERSION_1 is _negotiated_?
>>>
>>> The unset after negotiation I proposed turned out it won't work: the
>>> feature is already negotiated; unsetting it only in vhost side doesn't
>>> change anything. Besides, it may break the migration as Michael stated
>>> below.
>>
>> I think the reverse. Teach vhost user that for future machine types
>> only VERSION_1 implies ANY_LAYOUT.
>>
>>
>>>> Therein lies a problem. If dpdk tweaks flags, updating it
>>>> will break guest migration.
>>>>
>>>> One way is to require that users specify all flags fully when
>>>> creating the virtio net device.
>>>
>>> Like how? By a new command line option? And user has to type
>>> all those features?
>>
>> Make libvirt do this.  users use management normally. those that don't
>> likely don't migrate VMs.
>
> Fair enough.
>
>>
>>>> QEMU could verify that all required
>>>> flags are set, and fail init if not.
>>>>
>>>> This has other advantages, e.g. it adds ability to
>>>> init device without waiting for dpdk to connect.
>
> Will the feature negotiation between DPDK and QEMU still exist
> in your proposal?
>
>>>>
>>>> However, enabling each new feature would now require
>>>> management work. How about dpdk ships the list
>>>> of supported features instead?
>>>> Management tools could read them on source and destination
>>>> and select features supported on both sides.
>>>
>>> That means the management tool would somehow has a dependency on
>>> DPDK project, which I have no objection at all. But, is that
>>> a good idea?
>>
>> It already starts the bridge somehow, does it not?
>
> Indeed. I was firstly thinking about reading the dpdk source file
> to determine the DPDK supported feature list, with which the bind
> is too tight. I later realized you may ask DPDK to provide a binary
> to dump the list, or something like that.
>
>>
>>> BTW, I'm not quite sure I followed your idea. I mean, how it supposed
>>> to fix the ANY_LAYOUT issue here? How this flag will be set for
>>> legacy device?
>>>
>>> 	--yliu
>>
>> For ANY_LAYOUT, I think we should just set it in qemu,
>> but only for new machine types.
>
> What do you mean by "new machine types"? Virtio device with newer
> virtio-spec version?
>
>> This addresses migration
>> concerns.
>
> To make sure I followed you, do you mean the migration issue from
> an older "dpdk + qemu" combo to a newer "dpdk + qemu" combo (that
> more new features might be shipped)?
>
> Besides that, your proposal looks like a big work to accomplish.
> Are you okay to make it simple first: set it consistently like
> what Linux kernel does? This would at least make the ANY_LAYOUT
> actually be enabled for legacy device (which is also the default
> one that's widely used so far).

Before enabling anything by default, we should first optimize the 1-slot
case. Indeed, a micro-benchmark using testpmd in txonly mode[0] shows a
~17% perf regression for the 64-byte case:
  - 2 descs per packet: 11.6Mpps
  - 1 desc per packet: 9.6Mpps

This is due to the virtio header clearing in virtqueue_enqueue_xmit().
Removing it, we get better results than with 2 descs (12.0Mpps).
Since the Virtio PMD doesn't support offloads, I wonder whether we can
just drop the memset?

  -- Maxime
[0]: For testing, you'll need these patches, else only first packets
will use a single slot:
  - http://dpdk.org/dev/patchwork/patch/16222/
  - http://dpdk.org/dev/patchwork/patch/16223/


* Re: [dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
  2016-09-29 15:30             ` [dpdk-dev] [Qemu-devel] " Maxime Coquelin
@ 2016-09-29 17:57               ` Michael S. Tsirkin
  2016-09-29 20:05                 ` Maxime Coquelin
  0 siblings, 1 reply; 45+ messages in thread
From: Michael S. Tsirkin @ 2016-09-29 17:57 UTC (permalink / raw)
  To: Maxime Coquelin; +Cc: Yuanhan Liu, Stephen Hemminger, dev, qemu-devel

On Thu, Sep 29, 2016 at 05:30:53PM +0200, Maxime Coquelin wrote:
> 
> 
> On 09/28/2016 04:28 AM, Yuanhan Liu wrote:
> > On Tue, Sep 27, 2016 at 10:56:40PM +0300, Michael S. Tsirkin wrote:
> > > On Tue, Sep 27, 2016 at 11:11:58AM +0800, Yuanhan Liu wrote:
> > > > On Mon, Sep 26, 2016 at 10:24:55PM +0300, Michael S. Tsirkin wrote:
> > > > > On Mon, Sep 26, 2016 at 11:01:58AM -0700, Stephen Hemminger wrote:
> > > > > > I assume that if using Version 1 that the bit will be ignored
> > > > 
> > > > Yes, but I will just quote what you just said: what if the guest
> > > > virtio device is a legacy device? I also gave my reasons in another
> > > > email why I consistently set this flag:
> > > > 
> > > >   - we have to return all features we support to the guest.
> > > > 
> > > >     We don't know the guest is a modern or legacy device. That means
> > > >     we should claim we support both: VERSION_1 and ANY_LAYOUT.
> > > > 
> > > >     Assume guest is a legacy device and we just set VERSION_1 (the current
> > > >     case), ANY_LAYOUT will never be negotiated.
> > > > 
> > > >   - I'm following the way Linux kernel takes: it also set both features.
> > > > 
> > > >   Maybe, we could unset ANY_LAYOUT when VERSION_1 is _negotiated_?
> > > > 
> > > > The unset after negotiation I proposed turned out it won't work: the
> > > > feature is already negotiated; unsetting it only in vhost side doesn't
> > > > change anything. Besides, it may break the migration as Michael stated
> > > > below.
> > > 
> > > I think the reverse. Teach vhost user that for future machine types
> > > only VERSION_1 implies ANY_LAYOUT.
> > > 
> > > 
> > > > > Therein lies a problem. If dpdk tweaks flags, updating it
> > > > > will break guest migration.
> > > > > 
> > > > > One way is to require that users specify all flags fully when
> > > > > creating the virtio net device.
> > > > 
> > > > Like how? By a new command line option? And user has to type
> > > > all those features?
> > > 
> > > Make libvirt do this.  users use management normally. those that don't
> > > likely don't migrate VMs.
> > 
> > Fair enough.
> > 
> > > 
> > > > > QEMU could verify that all required
> > > > > flags are set, and fail init if not.
> > > > > 
> > > > > This has other advantages, e.g. it adds ability to
> > > > > init device without waiting for dpdk to connect.
> > 
> > Will the feature negotiation between DPDK and QEMU still exist
> > in your proposal?
> > 
> > > > > 
> > > > > However, enabling each new feature would now require
> > > > > management work. How about dpdk ships the list
> > > > > of supported features instead?
> > > > > Management tools could read them on source and destination
> > > > > and select features supported on both sides.
> > > > 
> > > > That means the management tool would somehow has a dependency on
> > > > DPDK project, which I have no objection at all. But, is that
> > > > a good idea?
> > > 
> > > It already starts the bridge somehow, does it not?
> > 
> > Indeed. I was firstly thinking about reading the dpdk source file
> > to determine the DPDK supported feature list, with which the bind
> > is too tight. I later realized you may ask DPDK to provide a binary
> > to dump the list, or something like that.
> > 
> > > 
> > > > BTW, I'm not quite sure I followed your idea. I mean, how it supposed
> > > > to fix the ANY_LAYOUT issue here? How this flag will be set for
> > > > legacy device?
> > > > 
> > > > 	--yliu
> > > 
> > > For ANY_LAYOUT, I think we should just set it in qemu,
> > > but only for new machine types.
> > 
> > What do you mean by "new machine types"? Virtio device with newer
> > virtio-spec version?
> > 
> > > This addresses migration
> > > concerns.
> > 
> > To make sure I followed you, do you mean the migration issue from
> > an older "dpdk + qemu" combo to a newer "dpdk + qemu" combo (that
> > more new features might be shipped)?
> > 
> > Besides that, your proposal looks like a big work to accomplish.
> > Are you okay to make it simple first: set it consistently like
> > what Linux kernel does? This would at least make the ANY_LAYOUT
> > actually be enabled for legacy device (which is also the default
> > one that's widely used so far).
> 
> Before enabling anything by default, we should first optimize the 1 slot
> case. Indeed, micro-benchmark using testpmd in txonly[0] shows ~17%
> perf regression for 64 bytes case:
>  - 2 descs per packet: 11.6Mpps
>  - 1 desc per packet: 9.6Mpps
> 
> This is due to the virtio header clearing in virtqueue_enqueue_xmit().
> Removing it, we get better results than with 2 descs (12.0Mpps).
> Since the Virtio PMD doesn't support offloads, I wonder whether we can
> just drop the memset?

What will happen? Will the header be uninitialized?

The spec says:
	The driver can send a completely checksummed packet. In this case, flags
	will be zero, and gso_type
	will be VIRTIO_NET_HDR_GSO_NONE.

and
	The driver MUST set num_buffers to zero.
	If VIRTIO_NET_F_CSUM is not negotiated, the driver MUST set flags to
	zero and SHOULD supply a fully
	checksummed packet to the device.

and
	If none of the VIRTIO_NET_F_HOST_TSO4, TSO6 or UFO options have been
	negotiated, the driver MUST
	set gso_type to VIRTIO_NET_HDR_GSO_NONE.

so doing this unconditionally would be a spec violation, but if you see
value in this, we can add a feature bit.



>  -- Maxime
> [0]: For testing, you'll need these patches, else only first packets
> will use a single slot:
>  - http://dpdk.org/dev/patchwork/patch/16222/
>  - http://dpdk.org/dev/patchwork/patch/16223/


* Re: [dpdk-dev] [PATCH 2/2] net/virtio: enable any layout feature
  2016-09-26  6:40 ` [dpdk-dev] [PATCH 2/2] net/virtio: " Yuanhan Liu
  2016-09-26 18:04   ` Stephen Hemminger
@ 2016-09-29 18:00   ` Michael S. Tsirkin
  2016-09-29 18:01     ` Michael S. Tsirkin
  1 sibling, 1 reply; 45+ messages in thread
From: Michael S. Tsirkin @ 2016-09-29 18:00 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, Stephen Hemminger, Maxime Coquelin

On Mon, Sep 26, 2016 at 02:40:56PM +0800, Yuanhan Liu wrote:
> The feature VIRTIO_F_ANY_LAYOUT was actually supported by commit
> dd856dfcb9e7 ("virtio: use any layout on Tx"). But it's not enabled.
> Here this patch enables it.
> 
> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

Well this will break cross-version migration if just done
unconditionally.
Let's add a protocol feature for this bugfix?

> ---
>  drivers/net/virtio/virtio_ethdev.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
> index 04d626b..3e31e6a 100644
> --- a/drivers/net/virtio/virtio_ethdev.h
> +++ b/drivers/net/virtio/virtio_ethdev.h
> @@ -64,6 +64,7 @@
>  	 1u << VIRTIO_NET_F_CTRL_VLAN	  |	\
>  	 1u << VIRTIO_NET_F_MRG_RXBUF	  |	\
>  	 1u << VIRTIO_RING_F_INDIRECT_DESC |    \
> +	 1u << VIRTIO_F_ANY_LAYOUT |    \
>  	 1ULL << VIRTIO_F_VERSION_1)
>  
>  /*
> -- 
> 1.9.0


* Re: [dpdk-dev] [PATCH 2/2] net/virtio: enable any layout feature
  2016-09-29 18:00   ` Michael S. Tsirkin
@ 2016-09-29 18:01     ` Michael S. Tsirkin
  0 siblings, 0 replies; 45+ messages in thread
From: Michael S. Tsirkin @ 2016-09-29 18:01 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, Stephen Hemminger, Maxime Coquelin

On Thu, Sep 29, 2016 at 09:00:47PM +0300, Michael S. Tsirkin wrote:
> On Mon, Sep 26, 2016 at 02:40:56PM +0800, Yuanhan Liu wrote:
> > The feature VIRTIO_F_ANY_LAYOUT was actually supported by commit
> > dd856dfcb9e7 ("virtio: use any layout on Tx"). But it's not enabled.
> > Here this patch enables it.
> > 
> > Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> 
> Well this will break cross-version migration if just done
> unconditionally.
> Let's add a protocol feature for this bugfix?

Hmm OTOH the bug is in dpdk not qemu ... this needs some thought.

> > ---
> >  drivers/net/virtio/virtio_ethdev.h | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/net/virtio/virtio_ethdev.h b/drivers/net/virtio/virtio_ethdev.h
> > index 04d626b..3e31e6a 100644
> > --- a/drivers/net/virtio/virtio_ethdev.h
> > +++ b/drivers/net/virtio/virtio_ethdev.h
> > @@ -64,6 +64,7 @@
> >  	 1u << VIRTIO_NET_F_CTRL_VLAN	  |	\
> >  	 1u << VIRTIO_NET_F_MRG_RXBUF	  |	\
> >  	 1u << VIRTIO_RING_F_INDIRECT_DESC |    \
> > +	 1u << VIRTIO_F_ANY_LAYOUT |    \
> >  	 1ULL << VIRTIO_F_VERSION_1)
> >  
> >  /*
> > -- 
> > 1.9.0

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
  2016-09-29 17:57               ` Michael S. Tsirkin
@ 2016-09-29 20:05                 ` Maxime Coquelin
  2016-09-29 20:21                   ` Michael S. Tsirkin
  2016-10-10  3:50                   ` Yuanhan Liu
  0 siblings, 2 replies; 45+ messages in thread
From: Maxime Coquelin @ 2016-09-29 20:05 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Yuanhan Liu, Stephen Hemminger, dev, qemu-devel



On 09/29/2016 07:57 PM, Michael S. Tsirkin wrote:
> On Thu, Sep 29, 2016 at 05:30:53PM +0200, Maxime Coquelin wrote:
...
>>
>> Before enabling anything by default, we should first optimize the 1 slot
>> case. Indeed, micro-benchmark using testpmd in txonly[0] shows ~17%
>> perf regression for 64 bytes case:
>>  - 2 descs per packet: 11.6Mpps
>>  - 1 desc per packet: 9.6Mpps
>>
>> This is due to the virtio header clearing in virtqueue_enqueue_xmit().
>> Removing it, we get better results than with 2 descs (12.0Mpps).
>> Since the Virtio PMD doesn't support offloads, I wonder whether we can
>> just drop the memset?
>
> What will happen? Will the header be uninitialized?
Yes..
I didn't look closely at the spec, but just looked at DPDK's and Linux
vhost implementations. IIUC, the header is just skipped in the two
implementations.
>
> The spec says:
> 	The driver can send a completely checksummed packet. In this case, flags
> 	will be zero, and gso_type
> 	will be VIRTIO_NET_HDR_GSO_NONE.
>
> and
> 	The driver MUST set num_buffers to zero.
> 	If VIRTIO_NET_F_CSUM is not negotiated, the driver MUST set flags to
> 	zero and SHOULD supply a fully
> 	checksummed packet to the device.
>
> and
> 	If none of the VIRTIO_NET_F_HOST_TSO4, TSO6 or UFO options have been
> 	negotiated, the driver MUST
> 	set gso_type to VIRTIO_NET_HDR_GSO_NONE.
>
> so doing this unconditionally would be a spec violation, but if you see
> value in this, we can add a feature bit.
Right it would be a spec violation, so it should be done conditionally.
If a feature bit is to be added, what about VIRTIO_NET_F_NO_TX_HEADER?
It would imply VIRTIO_NET_F_CSUM not set, and no GSO features set.
If negotiated, we wouldn't need to prepend a header.

 From the micro-benchmark results, we can expect +10% compared to
indirect descriptors, and +5% compared to using 2 descs in the
virtqueue.
Also, it should have the same benefits as indirect descriptors for 0%
pkt loss (as we can fill 2x more packets in the virtqueue).

What do you think?

Thanks,
Maxime

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
  2016-09-29 20:05                 ` Maxime Coquelin
@ 2016-09-29 20:21                   ` Michael S. Tsirkin
  2016-09-29 21:23                     ` Maxime Coquelin
                                       ` (2 more replies)
  2016-10-10  3:50                   ` Yuanhan Liu
  1 sibling, 3 replies; 45+ messages in thread
From: Michael S. Tsirkin @ 2016-09-29 20:21 UTC (permalink / raw)
  To: Maxime Coquelin; +Cc: Yuanhan Liu, Stephen Hemminger, dev, qemu-devel

On Thu, Sep 29, 2016 at 10:05:22PM +0200, Maxime Coquelin wrote:
> 
> 
> On 09/29/2016 07:57 PM, Michael S. Tsirkin wrote:
> > On Thu, Sep 29, 2016 at 05:30:53PM +0200, Maxime Coquelin wrote:
> ...
> > > 
> > > Before enabling anything by default, we should first optimize the 1 slot
> > > case. Indeed, micro-benchmark using testpmd in txonly[0] shows ~17%
> > > perf regression for 64 bytes case:
> > >  - 2 descs per packet: 11.6Mpps
> > >  - 1 desc per packet: 9.6Mpps
> > > 
> > > This is due to the virtio header clearing in virtqueue_enqueue_xmit().
> > > Removing it, we get better results than with 2 descs (12.0Mpps).
> > > Since the Virtio PMD doesn't support offloads, I wonder whether we can
> > > just drop the memset?
> > 
> > What will happen? Will the header be uninitialized?
> Yes..
> I didn't look closely at the spec, but just looked at DPDK's and Linux
> vhost implementations. IIUC, the header is just skipped in the two
> implementations.

In linux guest skbs are initialized AFAIK. See virtio_net_hdr_from_skb
first thing it does is
        memset(hdr, 0, sizeof(*hdr));
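For reference, the structure being cleared is tiny. Below is a minimal sketch of the Tx header and the guest-side clearing; field names follow the virtio spec, but the struct is a simplified stand-in, not the Linux kernel definition:

```c
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the 10-byte virtio-net Tx header; the
 * mergeable-rxbuf variant appends a 16-bit num_buffers field. */
struct virtio_net_hdr_sketch {
    uint8_t  flags;       /* 0 == fully checksummed packet */
    uint8_t  gso_type;    /* 0 == VIRTIO_NET_HDR_GSO_NONE */
    uint16_t hdr_len;
    uint16_t gso_size;
    uint16_t csum_start;
    uint16_t csum_offset;
};

/* What the Linux guest's virtio_net_hdr_from_skb() does first:
 * zeroing the whole header already encodes "no offloads requested". */
static void clear_tx_hdr(struct virtio_net_hdr_sketch *hdr)
{
    memset(hdr, 0, sizeof(*hdr));
}
```

Since an all-zero header means "no csum, no GSO", the cleared header is spec-compliant without touching any field individually.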



> > 
> > The spec says:
> > 	The driver can send a completely checksummed packet. In this case, flags
> > 	will be zero, and gso_type
> > 	will be VIRTIO_NET_HDR_GSO_NONE.
> > 
> > and
> > 	The driver MUST set num_buffers to zero.
> > 	If VIRTIO_NET_F_CSUM is not negotiated, the driver MUST set flags to
> > 	zero and SHOULD supply a fully
> > 	checksummed packet to the device.
> > 
> > and
> > 	If none of the VIRTIO_NET_F_HOST_TSO4, TSO6 or UFO options have been
> > 	negotiated, the driver MUST
> > 	set gso_type to VIRTIO_NET_HDR_GSO_NONE.
> > 
> > so doing this unconditionally would be a spec violation, but if you see
> > value in this, we can add a feature bit.
> Right it would be a spec violation, so it should be done conditionally.
> If a feature bit is to be added, what about VIRTIO_NET_F_NO_TX_HEADER?
> It would imply VIRTIO_NET_F_CSUM not set, and no GSO features set.
> If negotiated, we wouldn't need to prepend a header.

Yes but two points.

1. why is this memset expensive? Is the test completely skipping looking
   at the packet otherwise?

2. As long as we are doing this, see
	Alignment vs. Networking
	========================
in Documentation/unaligned-memory-access.txt


> From the micro-benchmark results, we can expect +10% compared to
> indirect descriptors, and +5% compared to using 2 descs in the
> virtqueue.
> Also, it should have the same benefits as indirect descriptors for 0%
> pkt loss (as we can fill 2x more packets in the virtqueue).
> 
> What do you think?
> 
> Thanks,
> Maxime

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
  2016-09-29 20:21                   ` Michael S. Tsirkin
@ 2016-09-29 21:23                     ` Maxime Coquelin
  2016-09-30 12:05                       ` Maxime Coquelin
  2016-10-03 14:20                     ` Maxime Coquelin
  2016-10-10  3:37                     ` Yuanhan Liu
  2 siblings, 1 reply; 45+ messages in thread
From: Maxime Coquelin @ 2016-09-29 21:23 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Yuanhan Liu, Stephen Hemminger, dev, qemu-devel



On 09/29/2016 10:21 PM, Michael S. Tsirkin wrote:
> On Thu, Sep 29, 2016 at 10:05:22PM +0200, Maxime Coquelin wrote:
>>
>>
>> On 09/29/2016 07:57 PM, Michael S. Tsirkin wrote:
>>> On Thu, Sep 29, 2016 at 05:30:53PM +0200, Maxime Coquelin wrote:
>> ...
>>>>
>>>> Before enabling anything by default, we should first optimize the 1 slot
>>>> case. Indeed, micro-benchmark using testpmd in txonly[0] shows ~17%
>>>> perf regression for 64 bytes case:
>>>>  - 2 descs per packet: 11.6Mpps
>>>>  - 1 desc per packet: 9.6Mpps
>>>>
>>>> This is due to the virtio header clearing in virtqueue_enqueue_xmit().
>>>> Removing it, we get better results than with 2 descs (12.0Mpps).
>>>> Since the Virtio PMD doesn't support offloads, I wonder whether we can
>>>> just drop the memset?
>>>
>>> What will happen? Will the header be uninitialized?
>> Yes..
>> I didn't look closely at the spec, but just looked at DPDK's and Linux
>> vhost implementations. IIUC, the header is just skipped in the two
>> implementations.
>
> In linux guest skbs are initialized AFAIK. See virtio_net_hdr_from_skb
> first thing it does is
>         memset(hdr, 0, sizeof(*hdr));

I meant in vhost-net linux implementation, the header is just skipped:

static void handle_tx(struct vhost_net *net)
{
...
		/* Skip header. TODO: support TSO. */
		len = iov_length(vq->iov, out);
		iov_iter_init(&msg.msg_iter, WRITE, vq->iov, out, len);
		iov_iter_advance(&msg.msg_iter, hdr_size);

And the same is done in DPDK:

static inline int __attribute__((always_inline))
copy_desc_to_mbuf(struct virtio_net *dev, struct vring_desc *descs,
		  uint16_t max_desc, struct rte_mbuf *m, uint16_t desc_idx,
		  struct rte_mempool *mbuf_pool)
{
...
	/*
	 * A virtio driver normally uses at least 2 desc buffers
	 * for Tx: the first for storing the header, and others
	 * for storing the data.
	 */
	if (likely((desc->len == dev->vhost_hlen) &&
		   (desc->flags & VRING_DESC_F_NEXT) != 0)) {
		desc = &descs[desc->next];
		if (unlikely(desc->flags & VRING_DESC_F_INDIRECT))
			return -1;

		desc_addr = gpa_to_vva(dev, desc->addr);
		if (unlikely(!desc_addr))
			return -1;

		rte_prefetch0((void *)(uintptr_t)desc_addr);

		desc_offset = 0;
		desc_avail  = desc->len;
		nr_desc    += 1;

		PRINT_PACKET(dev, (uintptr_t)desc_addr, desc->len, 0);
	} else {
		desc_avail  = desc->len - dev->vhost_hlen;
		desc_offset = dev->vhost_hlen;
	}
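The branch above reduces to a small per-descriptor decision: with ANY_LAYOUT the header may share its descriptor with packet data, so the device picks the data offset per chain head. A sketch of that logic with simplified stand-in types (not the actual DPDK definitions):

```c
#include <stdint.h>

#define VRING_DESC_F_NEXT 1  /* "buffer continues in next descriptor" */

/* Simplified vring descriptor, mirroring only the fields used here. */
struct desc_sketch {
    uint32_t len;
    uint16_t flags;
};

/* Mirrors the decision in copy_desc_to_mbuf(): if the first desc holds
 * exactly the virtio-net header and chains onward, packet data starts
 * at offset 0 of the *next* desc (classic 2-desc layout); otherwise
 * (any-layout case) data follows the header inside the same desc. */
static uint32_t data_offset(const struct desc_sketch *d, uint32_t vhost_hlen,
                            int *data_in_next_desc)
{
    if (d->len == vhost_hlen && (d->flags & VRING_DESC_F_NEXT)) {
        *data_in_next_desc = 1;   /* header-only first descriptor */
        return 0;
    }
    *data_in_next_desc = 0;       /* header and data share the descriptor */
    return vhost_hlen;
}
```

Because the check is driven by `desc->len`, the same dequeue path handles both layouts without a negotiated-feature lookup per packet.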
>
>
>
>>>
>>> The spec says:
>>> 	The driver can send a completely checksummed packet. In this case, flags
>>> 	will be zero, and gso_type
>>> 	will be VIRTIO_NET_HDR_GSO_NONE.
>>>
>>> and
>>> 	The driver MUST set num_buffers to zero.
>>> 	If VIRTIO_NET_F_CSUM is not negotiated, the driver MUST set flags to
>>> 	zero and SHOULD supply a fully
>>> 	checksummed packet to the device.
>>>
>>> and
>>> 	If none of the VIRTIO_NET_F_HOST_TSO4, TSO6 or UFO options have been
>>> 	negotiated, the driver MUST
>>> 	set gso_type to VIRTIO_NET_HDR_GSO_NONE.
>>>
>>> so doing this unconditionally would be a spec violation, but if you see
>>> value in this, we can add a feature bit.
>> Right it would be a spec violation, so it should be done conditionally.
>> If a feature bit is to be added, what about VIRTIO_NET_F_NO_TX_HEADER?
>> It would imply VIRTIO_NET_F_CSUM not set, and no GSO features set.
>> If negotiated, we wouldn't need to prepend a header.
>
> Yes but two points.
>
> 1. why is this memset expensive? Is the test completely skipping looking
>    at the packet otherwise?
Yes.
>
> 2. As long as we are doing this, see
> 	Alignment vs. Networking
> 	========================
> in Documentation/unaligned-memory-access.txt
Thanks, I'll have a look tomorrow.

Maxime

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
  2016-09-29 21:23                     ` Maxime Coquelin
@ 2016-09-30 12:05                       ` Maxime Coquelin
  2016-09-30 19:16                         ` Michael S. Tsirkin
  0 siblings, 1 reply; 45+ messages in thread
From: Maxime Coquelin @ 2016-09-30 12:05 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Yuanhan Liu, Stephen Hemminger, dev, qemu-devel



On 09/29/2016 11:23 PM, Maxime Coquelin wrote:
>
>
> On 09/29/2016 10:21 PM, Michael S. Tsirkin wrote:
>> On Thu, Sep 29, 2016 at 10:05:22PM +0200, Maxime Coquelin wrote:
>>>
>>>
>>> On 09/29/2016 07:57 PM, Michael S. Tsirkin wrote:
>>>> On Thu, Sep 29, 2016 at 05:30:53PM +0200, Maxime Coquelin wrote:
>>> ...
>>>>>
>>>>> Before enabling anything by default, we should first optimize the 1
>>>>> slot
>>>>> case. Indeed, micro-benchmark using testpmd in txonly[0] shows ~17%
>>>>> perf regression for 64 bytes case:
>>>>>  - 2 descs per packet: 11.6Mpps
>>>>>  - 1 desc per packet: 9.6Mpps
>>>>>
>>>>> This is due to the virtio header clearing in virtqueue_enqueue_xmit().
>>>>> Removing it, we get better results than with 2 descs (12.0Mpps).
>>>>> Since the Virtio PMD doesn't support offloads, I wonder whether we can
>>>>> just drop the memset?
>>>>
>>>> What will happen? Will the header be uninitialized?
>>> Yes..
>>> I didn't look closely at the spec, but just looked at DPDK's and Linux
>>> vhost implementations. IIUC, the header is just skipped in the two
>>> implementations.
>>
>> In linux guest skbs are initialized AFAIK. See virtio_net_hdr_from_skb
>> first thing it does is
>>         memset(hdr, 0, sizeof(*hdr));
>
> I meant in vhost-net linux implementation, the header is just skipped:
>
> static void handle_tx(struct vhost_net *net)
> {
> ...
>         /* Skip header. TODO: support TSO. */
>         len = iov_length(vq->iov, out);
>         iov_iter_init(&msg.msg_iter, WRITE, vq->iov, out, len);
>         iov_iter_advance(&msg.msg_iter, hdr_size);
>
> And the same is done in DPDK:
>
> static inline int __attribute__((always_inline))
> copy_desc_to_mbuf(struct virtio_net *dev, struct vring_desc *descs,
>           uint16_t max_desc, struct rte_mbuf *m, uint16_t desc_idx,
>           struct rte_mempool *mbuf_pool)
> {
> ...
>     /*
>      * A virtio driver normally uses at least 2 desc buffers
>      * for Tx: the first for storing the header, and others
>      * for storing the data.
>      */
>     if (likely((desc->len == dev->vhost_hlen) &&
>            (desc->flags & VRING_DESC_F_NEXT) != 0)) {
>         desc = &descs[desc->next];
>         if (unlikely(desc->flags & VRING_DESC_F_INDIRECT))
>             return -1;
>
>         desc_addr = gpa_to_vva(dev, desc->addr);
>         if (unlikely(!desc_addr))
>             return -1;
>
>         rte_prefetch0((void *)(uintptr_t)desc_addr);
>
>         desc_offset = 0;
>         desc_avail  = desc->len;
>         nr_desc    += 1;
>
>         PRINT_PACKET(dev, (uintptr_t)desc_addr, desc->len, 0);
>     } else {
>         desc_avail  = desc->len - dev->vhost_hlen;
>         desc_offset = dev->vhost_hlen;
>     }

Actually, the header is parsed in the DPDK vhost implementation.
But as the Virtio PMD provides a zeroed header, we could parse
the header only if VIRTIO_NET_F_NO_TX_HEADER is not negotiated.
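On the device side, that policy could look roughly like this. Note VIRTIO_NET_F_NO_TX_HEADER is only a bit suggested in this thread, not a spec-defined constant, so the value used here is a placeholder:

```c
#include <stdint.h>

/* Hypothetical feature bit from this discussion -- NOT part of the
 * virtio spec; 63 is a placeholder value, not an assigned number. */
#define VIRTIO_NET_F_NO_TX_HEADER_SKETCH 63

/* Sketch of the proposed device-side policy: only parse the Tx header
 * when the (hypothetical) no-header feature was NOT negotiated. */
static int must_parse_tx_hdr(uint64_t negotiated_features)
{
    return !(negotiated_features &
             (1ULL << VIRTIO_NET_F_NO_TX_HEADER_SKETCH));
}
```

If the bit is negotiated, the device can skip both the header copy and the parse, which is where the expected gain comes from.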

>>
>>
>>
>>>>
>>>> The spec says:
>>>>     The driver can send a completely checksummed packet. In this
>>>> case, flags
>>>>     will be zero, and gso_type
>>>>     will be VIRTIO_NET_HDR_GSO_NONE.
>>>>
>>>> and
>>>>     The driver MUST set num_buffers to zero.
>>>>     If VIRTIO_NET_F_CSUM is not negotiated, the driver MUST set
>>>> flags to
>>>>     zero and SHOULD supply a fully
>>>>     checksummed packet to the device.
>>>>
>>>> and
>>>>     If none of the VIRTIO_NET_F_HOST_TSO4, TSO6 or UFO options have
>>>> been
>>>>     negotiated, the driver MUST
>>>>     set gso_type to VIRTIO_NET_HDR_GSO_NONE.
>>>>
>>>> so doing this unconditionally would be a spec violation, but if you see
>>>> value in this, we can add a feature bit.
>>> Right it would be a spec violation, so it should be done conditionally.
>>> If a feature bit is to be added, what about VIRTIO_NET_F_NO_TX_HEADER?
>>> It would imply VIRTIO_NET_F_CSUM not set, and no GSO features set.
>>> If negotiated, we wouldn't need to prepend a header.
>>
>> Yes but two points.
>>
>> 1. why is this memset expensive? Is the test completely skipping looking
>>    at the packet otherwise?
> Yes.
>>
>> 2. As long as we are doing this, see
>>     Alignment vs. Networking
>>     ========================
>> in Documentation/unaligned-memory-access.txt
> Thanks, I'll have a look tomorrow.

I did a rough prototype which removes Tx headers unconditionally, to
see what gain we could expect. I expect the results to be a little lower
in a full no-header implementation, as some more checks will have
to be done.

For PVP use-case with 0.05% acceptable packets loss:
  - Original (with headers): 9.43Mpps
  - Indirect descs: 9.36 Mpps
  - Prototype (no headers): 10.65Mpps

For PVP use-case with 0% acceptable packets loss:
  - Original (with headers): 5.23Mpps
  - Indirect descs: 7.13 Mpps
  - Prototype (no headers): 7.92Mpps

Maxime

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
  2016-09-30 12:05                       ` Maxime Coquelin
@ 2016-09-30 19:16                         ` Michael S. Tsirkin
  2016-10-10  4:05                           ` Yuanhan Liu
  0 siblings, 1 reply; 45+ messages in thread
From: Michael S. Tsirkin @ 2016-09-30 19:16 UTC (permalink / raw)
  To: Maxime Coquelin; +Cc: Yuanhan Liu, Stephen Hemminger, dev, qemu-devel

On Fri, Sep 30, 2016 at 02:05:10PM +0200, Maxime Coquelin wrote:
> 
> 
> On 09/29/2016 11:23 PM, Maxime Coquelin wrote:
> > 
> > 
> > On 09/29/2016 10:21 PM, Michael S. Tsirkin wrote:
> > > On Thu, Sep 29, 2016 at 10:05:22PM +0200, Maxime Coquelin wrote:
> > > > 
> > > > 
> > > > On 09/29/2016 07:57 PM, Michael S. Tsirkin wrote:
> > > > > On Thu, Sep 29, 2016 at 05:30:53PM +0200, Maxime Coquelin wrote:
> > > > ...
> > > > > > 
> > > > > > Before enabling anything by default, we should first optimize the 1
> > > > > > slot
> > > > > > case. Indeed, micro-benchmark using testpmd in txonly[0] shows ~17%
> > > > > > perf regression for 64 bytes case:
> > > > > >  - 2 descs per packet: 11.6Mpps
> > > > > >  - 1 desc per packet: 9.6Mpps
> > > > > > 
> > > > > > This is due to the virtio header clearing in virtqueue_enqueue_xmit().
> > > > > > Removing it, we get better results than with 2 descs (12.0Mpps).
> > > > > > Since the Virtio PMD doesn't support offloads, I wonder whether we can
> > > > > > just drop the memset?
> > > > > 
> > > > > What will happen? Will the header be uninitialized?
> > > > Yes..
> > > > I didn't look closely at the spec, but just looked at DPDK's and Linux
> > > > vhost implementations. IIUC, the header is just skipped in the two
> > > > implementations.
> > > 
> > > In linux guest skbs are initialized AFAIK. See virtio_net_hdr_from_skb
> > > first thing it does is
> > >         memset(hdr, 0, sizeof(*hdr));
> > 
> > I meant in vhost-net linux implementation, the header is just skipped:
> > 
> > static void handle_tx(struct vhost_net *net)
> > {
> > ...
> >         /* Skip header. TODO: support TSO. */
> >         len = iov_length(vq->iov, out);
> >         iov_iter_init(&msg.msg_iter, WRITE, vq->iov, out, len);
> >         iov_iter_advance(&msg.msg_iter, hdr_size);
> > 
> > And the same is done in DPDK:
> > 
> > static inline int __attribute__((always_inline))
> > copy_desc_to_mbuf(struct virtio_net *dev, struct vring_desc *descs,
> >           uint16_t max_desc, struct rte_mbuf *m, uint16_t desc_idx,
> >           struct rte_mempool *mbuf_pool)
> > {
> > ...
> >     /*
> >      * A virtio driver normally uses at least 2 desc buffers
> >      * for Tx: the first for storing the header, and others
> >      * for storing the data.
> >      */
> >     if (likely((desc->len == dev->vhost_hlen) &&
> >            (desc->flags & VRING_DESC_F_NEXT) != 0)) {
> >         desc = &descs[desc->next];
> >         if (unlikely(desc->flags & VRING_DESC_F_INDIRECT))
> >             return -1;
> > 
> >         desc_addr = gpa_to_vva(dev, desc->addr);
> >         if (unlikely(!desc_addr))
> >             return -1;
> > 
> >         rte_prefetch0((void *)(uintptr_t)desc_addr);
> > 
> >         desc_offset = 0;
> >         desc_avail  = desc->len;
> >         nr_desc    += 1;
> > 
> >         PRINT_PACKET(dev, (uintptr_t)desc_addr, desc->len, 0);
> >     } else {
> >         desc_avail  = desc->len - dev->vhost_hlen;
> >         desc_offset = dev->vhost_hlen;
> >     }
> 
> Actually, the header is parsed in the DPDK vhost implementation.
> But as the Virtio PMD provides a zeroed header, we could parse
> the header only if VIRTIO_NET_F_NO_TX_HEADER is not negotiated.

The host can always skip the header parse if it wants to.
It didn't seem worth it to add branches there, but
if I'm wrong, by all means code it up.


> > > 
> > > 
> > > 
> > > > > 
> > > > > The spec says:
> > > > >     The driver can send a completely checksummed packet. In this
> > > > > case, flags
> > > > >     will be zero, and gso_type
> > > > >     will be VIRTIO_NET_HDR_GSO_NONE.
> > > > > 
> > > > > and
> > > > >     The driver MUST set num_buffers to zero.
> > > > >     If VIRTIO_NET_F_CSUM is not negotiated, the driver MUST set
> > > > > flags to
> > > > >     zero and SHOULD supply a fully
> > > > >     checksummed packet to the device.
> > > > > 
> > > > > and
> > > > >     If none of the VIRTIO_NET_F_HOST_TSO4, TSO6 or UFO options have
> > > > > been
> > > > >     negotiated, the driver MUST
> > > > >     set gso_type to VIRTIO_NET_HDR_GSO_NONE.
> > > > > 
> > > > > so doing this unconditionally would be a spec violation, but if you see
> > > > > value in this, we can add a feature bit.
> > > > Right it would be a spec violation, so it should be done conditionally.
> > > > If a feature bit is to be added, what about VIRTIO_NET_F_NO_TX_HEADER?
> > > > It would imply VIRTIO_NET_F_CSUM not set, and no GSO features set.
> > > > If negotiated, we wouldn't need to prepend a header.
> > > 
> > > Yes but two points.
> > > 
> > > 1. why is this memset expensive? Is the test completely skipping looking
> > >    at the packet otherwise?
> > Yes.
> > > 
> > > 2. As long as we are doing this, see
> > >     Alignment vs. Networking
> > >     ========================
> > > in Documentation/unaligned-memory-access.txt
> > Thanks, I'll have a look tomorrow.
> 
> I did a rough prototype which removes Tx headers unconditionally, to
> see what gain we could expect. I expect the results to be a little lower
> in a full no-header implementation, as some more checks will have
> to be done.
> 
> For PVP use-case with 0.05% acceptable packets loss:
>  - Original (with headers): 9.43Mpps
>  - Indirect descs: 9.36 Mpps
>  - Prototype (no headers): 10.65Mpps
> 
> For PVP use-case with 0% acceptable packets loss:
>  - Original (with headers): 5.23Mpps
>  - Indirect descs: 7.13 Mpps
>  - Prototype (no headers): 7.92Mpps
> 
> Maxime

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
  2016-09-29 20:21                   ` Michael S. Tsirkin
  2016-09-29 21:23                     ` Maxime Coquelin
@ 2016-10-03 14:20                     ` Maxime Coquelin
  2016-10-10  3:37                     ` Yuanhan Liu
  2 siblings, 0 replies; 45+ messages in thread
From: Maxime Coquelin @ 2016-10-03 14:20 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Yuanhan Liu, Stephen Hemminger, dev, qemu-devel



On 09/29/2016 10:21 PM, Michael S. Tsirkin wrote:
> On Thu, Sep 29, 2016 at 10:05:22PM +0200, Maxime Coquelin wrote:
>> >
>> >
>> > On 09/29/2016 07:57 PM, Michael S. Tsirkin wrote:
>>> > > On Thu, Sep 29, 2016 at 05:30:53PM +0200, Maxime Coquelin wrote:
>> > ...
>>>> > > >
>>>> > > > Before enabling anything by default, we should first optimize the 1 slot
>>>> > > > case. Indeed, micro-benchmark using testpmd in txonly[0] shows ~17%
>>>> > > > perf regression for 64 bytes case:
>>>> > > >  - 2 descs per packet: 11.6Mpps
>>>> > > >  - 1 desc per packet: 9.6Mpps
>>>> > > >
>>>> > > > This is due to the virtio header clearing in virtqueue_enqueue_xmit().
>>>> > > > Removing it, we get better results than with 2 descs (12.0Mpps).
>>>> > > > Since the Virtio PMD doesn't support offloads, I wonder whether we can
>>>> > > > just drop the memset?
>>> > >
>>> > > What will happen? Will the header be uninitialized?
>> > Yes..
>> > I didn't look closely at the spec, but just looked at DPDK's and Linux
>> > vhost implementations. IIUC, the header is just skipped in the two
>> > implementations.
> In linux guest skbs are initialized AFAIK. See virtio_net_hdr_from_skb
> first thing it does is
>         memset(hdr, 0, sizeof(*hdr));
>
>
>
>>> > >
>>> > > The spec says:
>>> > > 	The driver can send a completely checksummed packet. In this case, flags
>>> > > 	will be zero, and gso_type
>>> > > 	will be VIRTIO_NET_HDR_GSO_NONE.
>>> > >
>>> > > and
>>> > > 	The driver MUST set num_buffers to zero.
>>> > > 	If VIRTIO_NET_F_CSUM is not negotiated, the driver MUST set flags to
>>> > > 	zero and SHOULD supply a fully
>>> > > 	checksummed packet to the device.
>>> > >
>>> > > and
>>> > > 	If none of the VIRTIO_NET_F_HOST_TSO4, TSO6 or UFO options have been
>>> > > 	negotiated, the driver MUST
>>> > > 	set gso_type to VIRTIO_NET_HDR_GSO_NONE.
>>> > >
>>> > > so doing this unconditionally would be a spec violation, but if you see
>>> > > value in this, we can add a feature bit.
>> > Right it would be a spec violation, so it should be done conditionally.
>> > If a feature bit is to be added, what about VIRTIO_NET_F_NO_TX_HEADER?
>> > It would imply VIRTIO_NET_F_CSUM not set, and no GSO features set.
>> > If negotiated, we wouldn't need to prepend a header.
> Yes but two points.
>
> 1. why is this memset expensive? Is the test completely skipping looking
>    at the packet otherwise?
>
> 2. As long as we are doing this, see
> 	Alignment vs. Networking
> 	========================
> in Documentation/unaligned-memory-access.txt

This change will not have an impact on the IP header alignment,
as its offset in the mbuf will not change.

Regards,
Maxime

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [dpdk-dev] [PATCH 1/2] vhost: enable any layout feature
  2016-09-28  2:28           ` Yuanhan Liu
  2016-09-29 15:30             ` [dpdk-dev] [Qemu-devel] " Maxime Coquelin
@ 2016-10-09 23:20             ` Michael S. Tsirkin
  2016-10-10  3:03               ` Yuanhan Liu
  1 sibling, 1 reply; 45+ messages in thread
From: Michael S. Tsirkin @ 2016-10-09 23:20 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: Stephen Hemminger, Maxime Coquelin, dev, qemu-devel

On Wed, Sep 28, 2016 at 10:28:48AM +0800, Yuanhan Liu wrote:
> On Tue, Sep 27, 2016 at 10:56:40PM +0300, Michael S. Tsirkin wrote:
> > On Tue, Sep 27, 2016 at 11:11:58AM +0800, Yuanhan Liu wrote:
> > > On Mon, Sep 26, 2016 at 10:24:55PM +0300, Michael S. Tsirkin wrote:
> > > > On Mon, Sep 26, 2016 at 11:01:58AM -0700, Stephen Hemminger wrote:
> > > > > I assume that if using Version 1 that the bit will be ignored
> > > 
> > > Yes, but I will just quote what you just said: what if the guest
> > > virtio device is a legacy device? I also gave my reasons in another
> > > email why I consistently set this flag:
> > > 
> > >   - we have to return all features we support to the guest.
> > >   
> > >     We don't know the guest is a modern or legacy device. That means
> > >     we should claim we support both: VERSION_1 and ANY_LAYOUT.
> > >   
> > >     Assume guest is a legacy device and we just set VERSION_1 (the current
> > >     case), ANY_LAYOUT will never be negotiated.
> > >   
> > >   - I'm following what the Linux kernel does: it also sets both features.
> > >   
> > >   Maybe, we could unset ANY_LAYOUT when VERSION_1 is _negotiated_?
> > > 
> > > The unset after negotiation I proposed turned out it won't work: the
> > > feature is already negotiated; unsetting it only in vhost side doesn't
> > > change anything. Besides, it may break the migration as Michael stated
> > > below.
> > 
> > I think the reverse. Teach vhost user that for future machine types
> > only VERSION_1 implies ANY_LAYOUT.

So I guess at this point, we can teach vhost-user in qemu
that version 1 implies any_layout but only for machine types
qemu 2.8 and up. It sets a bad precedent but oh well.

-- 
MST

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [dpdk-dev] [PATCH 1/2] vhost: enable any layout feature
  2016-10-09 23:20             ` [dpdk-dev] " Michael S. Tsirkin
@ 2016-10-10  3:03               ` Yuanhan Liu
  2016-10-10  3:04                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 45+ messages in thread
From: Yuanhan Liu @ 2016-10-10  3:03 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Stephen Hemminger, Maxime Coquelin, dev, qemu-devel

On Mon, Oct 10, 2016 at 02:20:22AM +0300, Michael S. Tsirkin wrote:
> On Wed, Sep 28, 2016 at 10:28:48AM +0800, Yuanhan Liu wrote:
> > On Tue, Sep 27, 2016 at 10:56:40PM +0300, Michael S. Tsirkin wrote:
> > > On Tue, Sep 27, 2016 at 11:11:58AM +0800, Yuanhan Liu wrote:
> > > > On Mon, Sep 26, 2016 at 10:24:55PM +0300, Michael S. Tsirkin wrote:
> > > > > On Mon, Sep 26, 2016 at 11:01:58AM -0700, Stephen Hemminger wrote:
> > > > > > I assume that if using Version 1 that the bit will be ignored
> > > > 
> > > > Yes, but I will just quote what you just said: what if the guest
> > > > virtio device is a legacy device? I also gave my reasons in another
> > > > email why I consistently set this flag:
> > > > 
> > > >   - we have to return all features we support to the guest.
> > > >   
> > > >     We don't know the guest is a modern or legacy device. That means
> > > >     we should claim we support both: VERSION_1 and ANY_LAYOUT.
> > > >   
> > > >     Assume guest is a legacy device and we just set VERSION_1 (the current
> > > >     case), ANY_LAYOUT will never be negotiated.
> > > >   
> > > >   - I'm following what the Linux kernel does: it also sets both features.
> > > >   
> > > >   Maybe, we could unset ANY_LAYOUT when VERSION_1 is _negotiated_?
> > > > 
> > > > The unset after negotiation I proposed turned out it won't work: the
> > > > feature is already negotiated; unsetting it only in vhost side doesn't
> > > > change anything. Besides, it may break the migration as Michael stated
> > > > below.
> > > 
> > > I think the reverse. Teach vhost user that for future machine types
> > > only VERSION_1 implies ANY_LAYOUT.
> 
> So I guess at this point, we can teach vhost-user in qemu
> that version 1 implies any_layout but only for machine types
> qemu 2.8 and up. It sets a bad precedent but oh well.

It should work.

	--yliu

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [dpdk-dev] [PATCH 1/2] vhost: enable any layout feature
  2016-10-10  3:03               ` Yuanhan Liu
@ 2016-10-10  3:04                 ` Michael S. Tsirkin
  2016-10-10  3:10                   ` Yuanhan Liu
  0 siblings, 1 reply; 45+ messages in thread
From: Michael S. Tsirkin @ 2016-10-10  3:04 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: Stephen Hemminger, Maxime Coquelin, dev, qemu-devel

On Mon, Oct 10, 2016 at 11:03:33AM +0800, Yuanhan Liu wrote:
> On Mon, Oct 10, 2016 at 02:20:22AM +0300, Michael S. Tsirkin wrote:
> > On Wed, Sep 28, 2016 at 10:28:48AM +0800, Yuanhan Liu wrote:
> > > On Tue, Sep 27, 2016 at 10:56:40PM +0300, Michael S. Tsirkin wrote:
> > > > On Tue, Sep 27, 2016 at 11:11:58AM +0800, Yuanhan Liu wrote:
> > > > > On Mon, Sep 26, 2016 at 10:24:55PM +0300, Michael S. Tsirkin wrote:
> > > > > > On Mon, Sep 26, 2016 at 11:01:58AM -0700, Stephen Hemminger wrote:
> > > > > > > I assume that if using Version 1 that the bit will be ignored
> > > > > 
> > > > > Yes, but I will just quote what you just said: what if the guest
> > > > > virtio device is a legacy device? I also gave my reasons in another
> > > > > email why I consistently set this flag:
> > > > > 
> > > > >   - we have to return all features we support to the guest.
> > > > >   
> > > > >     We don't know the guest is a modern or legacy device. That means
> > > > >     we should claim we support both: VERSION_1 and ANY_LAYOUT.
> > > > >   
> > > > >     Assume guest is a legacy device and we just set VERSION_1 (the current
> > > > >     case), ANY_LAYOUT will never be negotiated.
> > > > >   
> > > > >   - I'm following what the Linux kernel does: it also sets both features.
> > > > >   
> > > > >   Maybe, we could unset ANY_LAYOUT when VERSION_1 is _negotiated_?
> > > > > 
> > > > > The unset after negotiation I proposed turned out it won't work: the
> > > > > feature is already negotiated; unsetting it only in vhost side doesn't
> > > > > change anything. Besides, it may break the migration as Michael stated
> > > > > below.
> > > > 
> > > > I think the reverse. Teach vhost user that for future machine types
> > > > only VERSION_1 implies ANY_LAYOUT.
> > 
> > So I guess at this point, we can teach vhost-user in qemu
> > that version 1 implies any_layout but only for machine types
> > qemu 2.8 and up. It sets a bad precedent but oh well.
> 
> It should work.
> 
> 	--yliu

Cool. Want to post a patch?

-- 
MST

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [dpdk-dev] [PATCH 1/2] vhost: enable any layout feature
  2016-10-10  3:04                 ` Michael S. Tsirkin
@ 2016-10-10  3:10                   ` Yuanhan Liu
  0 siblings, 0 replies; 45+ messages in thread
From: Yuanhan Liu @ 2016-10-10  3:10 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Stephen Hemminger, Maxime Coquelin, dev, qemu-devel

On Mon, Oct 10, 2016 at 06:04:32AM +0300, Michael S. Tsirkin wrote:
> > > So I guess at this point, we can teach vhost-user in qemu
> > > that version 1 implies any_layout but only for machine types
> > > qemu 2.8 and up. It sets a bad precedent but oh well.
> > 
> > It should work.
> > 
> > 	--yliu
> 
> Cool. Want to post a patch?

I could give it a try this week (well, it's very unlikely though).
If not, it will be postponed for a while: I am traveling next week.

	--yliu


* Re: [dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
  2016-09-29 20:21                   ` Michael S. Tsirkin
  2016-09-29 21:23                     ` Maxime Coquelin
  2016-10-03 14:20                     ` Maxime Coquelin
@ 2016-10-10  3:37                     ` Yuanhan Liu
  2016-10-10  3:46                       ` Michael S. Tsirkin
  2 siblings, 1 reply; 45+ messages in thread
From: Yuanhan Liu @ 2016-10-10  3:37 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Maxime Coquelin, Stephen Hemminger, dev, qemu-devel, Wang, Zhihong

On Thu, Sep 29, 2016 at 11:21:48PM +0300, Michael S. Tsirkin wrote:
> On Thu, Sep 29, 2016 at 10:05:22PM +0200, Maxime Coquelin wrote:
> > 
> > 
> > On 09/29/2016 07:57 PM, Michael S. Tsirkin wrote:
> Yes but two points.
> 
> 1. why is this memset expensive?

I don't have the exact answer, but just some rough thoughts:

It's an external clib function: there is a call stack and the
IP register will bounce back and forth. BTW, it's kind of
overkill to use that for resetting a 14-byte structure.

Some trick like
    *(struct virtio_net_hdr *)hdr = (struct virtio_net_hdr){ 0 };

Or even 
    hdr->xxx = 0;
    hdr->yyy = 0;

should behave better.

There was an example: the vhost enqueue optimization patchset from
Zhihong [0] uses memset, and it introduced more than a 15% drop (IIRC)
on my Ivybridge server: it has no such issue on his server though.

[0]: http://dpdk.org/ml/archives/dev/2016-August/045272.html

	--yliu

> Is the test completely skipping looking
>    at the packet otherwise?
> 
> 2. As long as we are doing this, see
> 	Alignment vs. Networking
> 	========================
> in Documentation/unaligned-memory-access.txt
> 
> 
> > From the micro-benchmarks results, we can expect +10% compared to
> > indirect descriptors, and + 5% compared to using 2 descs in the
> > virtqueue.
> > Also, it should have the same benefits as indirect descriptors for 0%
> > pkt loss (as we can fill 2x more packets in the virtqueue).
> > 
> > What do you think?
> > 
> > Thanks,
> > Maxime


* Re: [dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
  2016-10-10  3:37                     ` Yuanhan Liu
@ 2016-10-10  3:46                       ` Michael S. Tsirkin
  2016-10-10  3:59                         ` Yuanhan Liu
  0 siblings, 1 reply; 45+ messages in thread
From: Michael S. Tsirkin @ 2016-10-10  3:46 UTC (permalink / raw)
  To: Yuanhan Liu
  Cc: Maxime Coquelin, Stephen Hemminger, dev, qemu-devel, Wang, Zhihong

On Mon, Oct 10, 2016 at 11:37:44AM +0800, Yuanhan Liu wrote:
> On Thu, Sep 29, 2016 at 11:21:48PM +0300, Michael S. Tsirkin wrote:
> > On Thu, Sep 29, 2016 at 10:05:22PM +0200, Maxime Coquelin wrote:
> > > 
> > > 
> > > On 09/29/2016 07:57 PM, Michael S. Tsirkin wrote:
> > Yes but two points.
> > 
> > 1. why is this memset expensive?
> 
> I don't have the exact answer, but just some rough thoughts:
> 
> It's an external clib function: there is a call stack and the
> IP register will bounce back and forth.

for memset 0?  gcc 5.3.1 on fedora happily inlines it.

> BTW, it's kind of
> overkill to use that for resetting a 14-byte structure.
> 
> Some trick like
>     *(struct virtio_net_hdr *)hdr = {0, };
> 
> Or even 
>     hdr->xxx = 0;
>     hdr->yyy = 0;
> 
> should behave better.
> 
> There was an example: the vhost enqueue optimization patchset from
> Zhihong [0] uses memset, and it introduces more than 15% drop (IIRC)
> on my Ivybridge server: it has no such issue on his server though.
> 
> [0]: http://dpdk.org/ml/archives/dev/2016-August/045272.html
> 
> 	--yliu

I'd say that's weird. what's your config? any chance you
are using an old compiler?


> > Is the test completely skipping looking
> >    at the packet otherwise?
> > 
> > 2. As long as we are doing this, see
> > 	Alignment vs. Networking
> > 	========================
> > in Documentation/unaligned-memory-access.txt
> > 
> > 
> > > From the micro-benchmarks results, we can expect +10% compared to
> > > indirect descriptors, and + 5% compared to using 2 descs in the
> > > virtqueue.
> > > Also, it should have the same benefits as indirect descriptors for 0%
> > > pkt loss (as we can fill 2x more packets in the virtqueue).
> > > 
> > > What do you think?
> > > 
> > > Thanks,
> > > Maxime


* Re: [dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
  2016-09-29 20:05                 ` Maxime Coquelin
  2016-09-29 20:21                   ` Michael S. Tsirkin
@ 2016-10-10  3:50                   ` Yuanhan Liu
  1 sibling, 0 replies; 45+ messages in thread
From: Yuanhan Liu @ 2016-10-10  3:50 UTC (permalink / raw)
  To: Maxime Coquelin; +Cc: Michael S. Tsirkin, Stephen Hemminger, dev, qemu-devel

On Thu, Sep 29, 2016 at 10:05:22PM +0200, Maxime Coquelin wrote:
> >so doing this unconditionally would be a spec violation, but if you see
> >value in this, we can add a feature bit.
> Right it would be a spec violation, so it should be done conditionally.
> If a feature bit is to be added, what about VIRTIO_NET_F_NO_TX_HEADER?
> It would imply VIRTIO_NET_F_CSUM not set, and no GSO features set.
> If negotiated, we wouldn't need to prepend a header.

If we could skip the Tx header, I think we could also skip the Rx header,
in the case when mrg-rx is also turned off?

> From the micro-benchmarks results, we can expect +10% compared to
> indirect descriptors, and + 5% compared to using 2 descs in the
> virtqueue.
> Also, it should have the same benefits as indirect descriptors for 0%
> pkt loss (as we can fill 2x more packets in the virtqueue).
> 
> What do you think?

I would vote for this. It should yield maximum performance for the case
where it's guaranteed that the packet size will always fit in a typical
MTU (1500).

	--yliu


* Re: [dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
  2016-10-10  3:46                       ` Michael S. Tsirkin
@ 2016-10-10  3:59                         ` Yuanhan Liu
  2016-10-10  4:16                           ` Wang, Zhihong
  0 siblings, 1 reply; 45+ messages in thread
From: Yuanhan Liu @ 2016-10-10  3:59 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Maxime Coquelin, Stephen Hemminger, dev, qemu-devel, Wang, Zhihong

On Mon, Oct 10, 2016 at 06:46:44AM +0300, Michael S. Tsirkin wrote:
> On Mon, Oct 10, 2016 at 11:37:44AM +0800, Yuanhan Liu wrote:
> > On Thu, Sep 29, 2016 at 11:21:48PM +0300, Michael S. Tsirkin wrote:
> > > On Thu, Sep 29, 2016 at 10:05:22PM +0200, Maxime Coquelin wrote:
> > > > 
> > > > 
> > > > On 09/29/2016 07:57 PM, Michael S. Tsirkin wrote:
> > > Yes but two points.
> > > 
> > > 1. why is this memset expensive?
> > 
> > I don't have the exact answer, but just some rough thoughts:
> > 
> > It's an external clib function: there is a call stack and the
> > IP register will bounce back and forth.
> 
> for memset 0?  gcc 5.3.1 on fedora happily inlines it.

Good to know!

> > overkill to use that for resetting a 14-byte structure.
> > 
> > Some trick like
> >     *(struct virtio_net_hdr *)hdr = {0, };
> > 
> > Or even 
> >     hdr->xxx = 0;
> >     hdr->yyy = 0;
> > 
> > should behave better.
> > 
> > There was an example: the vhost enqueue optimization patchset from
> > Zhihong [0] uses memset, and it introduces more than 15% drop (IIRC)
> > on my Ivybridge server: it has no such issue on his server though.
> > 
> > [0]: http://dpdk.org/ml/archives/dev/2016-August/045272.html
> > 
> > 	--yliu
> 
> I'd say that's weird. what's your config? any chance you
> are using an old compiler?

Not really, it's gcc 5.3.1. Maybe Zhihong could explain more. IIRC,
he said the memset is not well optimized for Ivybridge server.

	--yliu


* Re: [dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
  2016-09-30 19:16                         ` Michael S. Tsirkin
@ 2016-10-10  4:05                           ` Yuanhan Liu
  2016-10-10  4:17                             ` Michael S. Tsirkin
  0 siblings, 1 reply; 45+ messages in thread
From: Yuanhan Liu @ 2016-10-10  4:05 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Maxime Coquelin, Stephen Hemminger, dev, qemu-devel

On Fri, Sep 30, 2016 at 10:16:43PM +0300, Michael S. Tsirkin wrote:
> > > And the same is done in DPDK:
> > > 
> > > static inline int __attribute__((always_inline))
> > > copy_desc_to_mbuf(struct virtio_net *dev, struct vring_desc *descs,
> > >           uint16_t max_desc, struct rte_mbuf *m, uint16_t desc_idx,
> > >           struct rte_mempool *mbuf_pool)
> > > {
> > > ...
> > >     /*
> > >      * A virtio driver normally uses at least 2 desc buffers
> > >      * for Tx: the first for storing the header, and others
> > >      * for storing the data.
> > >      */
> > >     if (likely((desc->len == dev->vhost_hlen) &&
> > >            (desc->flags & VRING_DESC_F_NEXT) != 0)) {
> > >         desc = &descs[desc->next];
> > >         if (unlikely(desc->flags & VRING_DESC_F_INDIRECT))
> > >             return -1;
> > > 
> > >         desc_addr = gpa_to_vva(dev, desc->addr);
> > >         if (unlikely(!desc_addr))
> > >             return -1;
> > > 
> > >         rte_prefetch0((void *)(uintptr_t)desc_addr);
> > > 
> > >         desc_offset = 0;
> > >         desc_avail  = desc->len;
> > >         nr_desc    += 1;
> > > 
> > >         PRINT_PACKET(dev, (uintptr_t)desc_addr, desc->len, 0);
> > >     } else {
> > >         desc_avail  = desc->len - dev->vhost_hlen;
> > >         desc_offset = dev->vhost_hlen;
> > >     }
> > 
> > Actually, the header is parsed in DPDK vhost implementation.
> > But as Virtio PMD provides a zero'ed header, we could just parse
> > the header only if VIRTIO_NET_F_NO_TX_HEADER is not negotiated.
> 
> host can always skip the header parse if it wants to.
> It didn't seem worth it to add branches there but
> if I'm wrong, by all means code it up.

It's added by the following commit, which yields about a 10% performance
boost for the PVP case (with 64B packet size).

At that time, a packet always used 2 descs. Since indirect desc is
enabled (by default) now, that assumption no longer holds. What's
worse, it might even slow things down a bit. That should also be
part of the reason why performance is slightly worse than before.

	--yliu

commit 1d41d77cf81c448c1b09e1e859bfd300e2054a98
Author: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Date:   Mon May 2 17:46:17 2016 -0700

    vhost: optimize dequeue for small packets

    A virtio driver normally uses at least 2 desc buffers for Tx: the
    first for storing the header, and the others for storing the data.

    Therefore, we could fetch the first data desc buf before the main
    loop, and do the copy first before the check of "are we done yet?".
    This could save one check for small packets that just have one data
    desc buffer and need one mbuf to store it.

    Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
    Acked-by: Huawei Xie <huawei.xie@intel.com>
    Tested-by: Rich Lane <rich.lane@bigswitch.com>


* Re: [dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
  2016-10-10  3:59                         ` Yuanhan Liu
@ 2016-10-10  4:16                           ` Wang, Zhihong
  2016-10-10  4:24                             ` Michael S. Tsirkin
  2016-10-10  4:39                             ` Michael S. Tsirkin
  0 siblings, 2 replies; 45+ messages in thread
From: Wang, Zhihong @ 2016-10-10  4:16 UTC (permalink / raw)
  To: Yuanhan Liu, Michael S. Tsirkin
  Cc: Maxime Coquelin, Stephen Hemminger, dev, qemu-devel



> -----Original Message-----
> From: Yuanhan Liu [mailto:yuanhan.liu@linux.intel.com]
> Sent: Monday, October 10, 2016 11:59 AM
> To: Michael S. Tsirkin <mst@redhat.com>
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>; Stephen Hemminger
> <stephen@networkplumber.org>; dev@dpdk.org; qemu-
> devel@nongnu.org; Wang, Zhihong <zhihong.wang@intel.com>
> Subject: Re: [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
> 
> On Mon, Oct 10, 2016 at 06:46:44AM +0300, Michael S. Tsirkin wrote:
> > On Mon, Oct 10, 2016 at 11:37:44AM +0800, Yuanhan Liu wrote:
> > > On Thu, Sep 29, 2016 at 11:21:48PM +0300, Michael S. Tsirkin wrote:
> > > > On Thu, Sep 29, 2016 at 10:05:22PM +0200, Maxime Coquelin wrote:
> > > > >
> > > > >
> > > > > On 09/29/2016 07:57 PM, Michael S. Tsirkin wrote:
> > > > Yes but two points.
> > > >
> > > > 1. why is this memset expensive?
> > >
> > > I don't have the exact answer, but just some rough thoughts:
> > >
> > > It's an external clib function: there is a call stack and the
> > > IP register will bounce back and forth.
> >
> > for memset 0?  gcc 5.3.1 on fedora happily inlines it.
> 
> Good to know!
> 
> > > overkill to use that for resetting a 14-byte structure.
> > >
> > > Some trick like
> > >     *(struct virtio_net_hdr *)hdr = {0, };
> > >
> > > Or even
> > >     hdr->xxx = 0;
> > >     hdr->yyy = 0;
> > >
> > > should behave better.
> > >
> > > There was an example: the vhost enqueue optimization patchset from
> > > Zhihong [0] uses memset, and it introduces more than 15% drop (IIRC)
> > > on my Ivybridge server: it has no such issue on his server though.
> > >
> > > [0]: http://dpdk.org/ml/archives/dev/2016-August/045272.html
> > >
> > > 	--yliu
> >
> > I'd say that's weird. what's your config? any chance you
> > are using an old compiler?
> 
> Not really, it's gcc 5.3.1. Maybe Zhihong could explain more. IIRC,
> he said the memset is not well optimized for Ivybridge server.

The dst is remote in that case. It's fine on Haswell but has a complication
on Ivy Bridge which (wasn't supposed to but) causes a serious frontend issue.

I don't think gcc inlined it there. I'm using fc24 gcc 6.1.1.

> 
> 	--yliu


* Re: [dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
  2016-10-10  4:05                           ` Yuanhan Liu
@ 2016-10-10  4:17                             ` Michael S. Tsirkin
  2016-10-10  4:22                               ` Yuanhan Liu
  0 siblings, 1 reply; 45+ messages in thread
From: Michael S. Tsirkin @ 2016-10-10  4:17 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: Maxime Coquelin, Stephen Hemminger, dev, qemu-devel

On Mon, Oct 10, 2016 at 12:05:31PM +0800, Yuanhan Liu wrote:
> On Fri, Sep 30, 2016 at 10:16:43PM +0300, Michael S. Tsirkin wrote:
> > > > And the same is done in DPDK:
> > > > 
> > > > static inline int __attribute__((always_inline))
> > > > copy_desc_to_mbuf(struct virtio_net *dev, struct vring_desc *descs,
> > > >           uint16_t max_desc, struct rte_mbuf *m, uint16_t desc_idx,
> > > >           struct rte_mempool *mbuf_pool)
> > > > {
> > > > ...
> > > >     /*
> > > >      * A virtio driver normally uses at least 2 desc buffers
> > > >      * for Tx: the first for storing the header, and others
> > > >      * for storing the data.
> > > >      */
> > > >     if (likely((desc->len == dev->vhost_hlen) &&
> > > >            (desc->flags & VRING_DESC_F_NEXT) != 0)) {
> > > >         desc = &descs[desc->next];
> > > >         if (unlikely(desc->flags & VRING_DESC_F_INDIRECT))
> > > >             return -1;
> > > > 
> > > >         desc_addr = gpa_to_vva(dev, desc->addr);
> > > >         if (unlikely(!desc_addr))
> > > >             return -1;
> > > > 
> > > >         rte_prefetch0((void *)(uintptr_t)desc_addr);
> > > > 
> > > >         desc_offset = 0;
> > > >         desc_avail  = desc->len;
> > > >         nr_desc    += 1;
> > > > 
> > > >         PRINT_PACKET(dev, (uintptr_t)desc_addr, desc->len, 0);
> > > >     } else {
> > > >         desc_avail  = desc->len - dev->vhost_hlen;
> > > >         desc_offset = dev->vhost_hlen;
> > > >     }
> > > 
> > > Actually, the header is parsed in DPDK vhost implementation.
> > > But as Virtio PMD provides a zero'ed header, we could just parse
> > > the header only if VIRTIO_NET_F_NO_TX_HEADER is not negotiated.
> > 
> > host can always skip the header parse if it wants to.
> > It didn't seem worth it to add branches there but
> > if I'm wrong, by all means code it up.
> 
> It's added by the following commit, which yields about a 10% performance
> boost for the PVP case (with 64B packet size).
> 
> At that time, a packet always used 2 descs. Since indirect desc is
> enabled (by default) now, that assumption no longer holds. What's
> worse, it might even slow things down a bit. That should also be
> part of the reason why performance is slightly worse than before.
> 
> 	--yliu

I'm not sure I get what you are saying

> commit 1d41d77cf81c448c1b09e1e859bfd300e2054a98
> Author: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> Date:   Mon May 2 17:46:17 2016 -0700
> 
>     vhost: optimize dequeue for small packets
> 
>     A virtio driver normally uses at least 2 desc buffers for Tx: the
>     first for storing the header, and the others for storing the data.
> 
>     Therefore, we could fetch the first data desc buf before the main
>     loop, and do the copy first before the check of "are we done yet?".
>     This could save one check for small packets that just have one data
>     desc buffer and need one mbuf to store it.
> 
>     Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
>     Acked-by: Huawei Xie <huawei.xie@intel.com>
>     Tested-by: Rich Lane <rich.lane@bigswitch.com>

This fast-paths the 2-descriptors format but it's not active
for indirect descriptors. Is this what you mean?
Should be a simple matter to apply this optimization for indirect.


* Re: [dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
  2016-10-10  4:17                             ` Michael S. Tsirkin
@ 2016-10-10  4:22                               ` Yuanhan Liu
  2016-10-10  4:25                                 ` Michael S. Tsirkin
  2016-10-10 12:40                                 ` Maxime Coquelin
  0 siblings, 2 replies; 45+ messages in thread
From: Yuanhan Liu @ 2016-10-10  4:22 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Maxime Coquelin, Stephen Hemminger, dev, qemu-devel

On Mon, Oct 10, 2016 at 07:17:06AM +0300, Michael S. Tsirkin wrote:
> On Mon, Oct 10, 2016 at 12:05:31PM +0800, Yuanhan Liu wrote:
> > On Fri, Sep 30, 2016 at 10:16:43PM +0300, Michael S. Tsirkin wrote:
> > > > > And the same is done in DPDK:
> > > > > 
> > > > > static inline int __attribute__((always_inline))
> > > > > copy_desc_to_mbuf(struct virtio_net *dev, struct vring_desc *descs,
> > > > >           uint16_t max_desc, struct rte_mbuf *m, uint16_t desc_idx,
> > > > >           struct rte_mempool *mbuf_pool)
> > > > > {
> > > > > ...
> > > > >     /*
> > > > >      * A virtio driver normally uses at least 2 desc buffers
> > > > >      * for Tx: the first for storing the header, and others
> > > > >      * for storing the data.
> > > > >      */
> > > > >     if (likely((desc->len == dev->vhost_hlen) &&
> > > > >            (desc->flags & VRING_DESC_F_NEXT) != 0)) {
> > > > >         desc = &descs[desc->next];
> > > > >         if (unlikely(desc->flags & VRING_DESC_F_INDIRECT))
> > > > >             return -1;
> > > > > 
> > > > >         desc_addr = gpa_to_vva(dev, desc->addr);
> > > > >         if (unlikely(!desc_addr))
> > > > >             return -1;
> > > > > 
> > > > >         rte_prefetch0((void *)(uintptr_t)desc_addr);
> > > > > 
> > > > >         desc_offset = 0;
> > > > >         desc_avail  = desc->len;
> > > > >         nr_desc    += 1;
> > > > > 
> > > > >         PRINT_PACKET(dev, (uintptr_t)desc_addr, desc->len, 0);
> > > > >     } else {
> > > > >         desc_avail  = desc->len - dev->vhost_hlen;
> > > > >         desc_offset = dev->vhost_hlen;
> > > > >     }
> > > > 
> > > > Actually, the header is parsed in DPDK vhost implementation.
> > > > But as Virtio PMD provides a zero'ed header, we could just parse
> > > > the header only if VIRTIO_NET_F_NO_TX_HEADER is not negotiated.
> > > 
> > > host can always skip the header parse if it wants to.
> > > It didn't seem worth it to add branches there but
> > > if I'm wrong, by all means code it up.
> > 
> > It's added by the following commit, which yields about a 10% performance
> > boost for the PVP case (with 64B packet size).
> > 
> > At that time, a packet always used 2 descs. Since indirect desc is
> > enabled (by default) now, that assumption no longer holds. What's
> > worse, it might even slow things down a bit. That should also be
> > part of the reason why performance is slightly worse than before.
> > 
> > 	--yliu
> 
> I'm not sure I get what you are saying
> 
> > commit 1d41d77cf81c448c1b09e1e859bfd300e2054a98
> > Author: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> > Date:   Mon May 2 17:46:17 2016 -0700
> > 
> >     vhost: optimize dequeue for small packets
> > 
> >     A virtio driver normally uses at least 2 desc buffers for Tx: the
> >     first for storing the header, and the others for storing the data.
> > 
> >     Therefore, we could fetch the first data desc buf before the main
> >     loop, and do the copy first before the check of "are we done yet?".
> >     This could save one check for small packets that just have one data
> >     desc buffer and need one mbuf to store it.
> > 
> >     Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> >     Acked-by: Huawei Xie <huawei.xie@intel.com>
> >     Tested-by: Rich Lane <rich.lane@bigswitch.com>
> 
> This fast-paths the 2-descriptors format but it's not active
> for indirect descriptors. Is this what you mean?

Yes. It's also not active when ANY_LAYOUT is actually turned on.

> Should be a simple matter to apply this optimization for indirect.

Might be.

	--yliu


* Re: [dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
  2016-10-10  4:16                           ` Wang, Zhihong
@ 2016-10-10  4:24                             ` Michael S. Tsirkin
  2016-10-10  4:39                             ` Michael S. Tsirkin
  1 sibling, 0 replies; 45+ messages in thread
From: Michael S. Tsirkin @ 2016-10-10  4:24 UTC (permalink / raw)
  To: Wang, Zhihong
  Cc: Yuanhan Liu, Maxime Coquelin, Stephen Hemminger, dev, qemu-devel

On Mon, Oct 10, 2016 at 04:16:19AM +0000, Wang, Zhihong wrote:
> 
> 
> > -----Original Message-----
> > From: Yuanhan Liu [mailto:yuanhan.liu@linux.intel.com]
> > Sent: Monday, October 10, 2016 11:59 AM
> > To: Michael S. Tsirkin <mst@redhat.com>
> > Cc: Maxime Coquelin <maxime.coquelin@redhat.com>; Stephen Hemminger
> > <stephen@networkplumber.org>; dev@dpdk.org; qemu-
> > devel@nongnu.org; Wang, Zhihong <zhihong.wang@intel.com>
> > Subject: Re: [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
> > 
> > On Mon, Oct 10, 2016 at 06:46:44AM +0300, Michael S. Tsirkin wrote:
> > > On Mon, Oct 10, 2016 at 11:37:44AM +0800, Yuanhan Liu wrote:
> > > > On Thu, Sep 29, 2016 at 11:21:48PM +0300, Michael S. Tsirkin wrote:
> > > > > On Thu, Sep 29, 2016 at 10:05:22PM +0200, Maxime Coquelin wrote:
> > > > > >
> > > > > >
> > > > > > On 09/29/2016 07:57 PM, Michael S. Tsirkin wrote:
> > > > > Yes but two points.
> > > > >
> > > > > 1. why is this memset expensive?
> > > >
> > > > I don't have the exact answer, but just some rough thoughts:
> > > >
> > > > It's an external clib function: there is a call stack and the
> > > > IP register will bounce back and forth.
> > >
> > > for memset 0?  gcc 5.3.1 on fedora happily inlines it.
> > 
> > Good to know!
> > 
> > > > overkill to use that for resetting a 14-byte structure.
> > > >
> > > > Some trick like
> > > >     *(struct virtio_net_hdr *)hdr = {0, };
> > > >
> > > > Or even
> > > >     hdr->xxx = 0;
> > > >     hdr->yyy = 0;
> > > >
> > > > should behave better.
> > > >
> > > > There was an example: the vhost enqueue optimization patchset from
> > > > Zhihong [0] uses memset, and it introduces more than 15% drop (IIRC)
> > > > on my Ivybridge server: it has no such issue on his server though.
> > > >
> > > > [0]: http://dpdk.org/ml/archives/dev/2016-August/045272.html
> > > >
> > > > 	--yliu
> > >
> > > I'd say that's weird. what's your config? any chance you
> > > are using an old compiler?
> > 
> > Not really, it's gcc 5.3.1. Maybe Zhihong could explain more. IIRC,
> > he said the memset is not well optimized for Ivybridge server.
> 
> The dst is remote in that case. It's fine on Haswell but has a complication
> on Ivy Bridge which (wasn't supposed to but) causes a serious frontend issue.
> 
> I don't think gcc inlined it there. I'm using fc24 gcc 6.1.1.

I just wrote some test code and compiled with gcc -O2,
it did get inlined.

It's probably this:
        uint16_t head_size = vq->hw->vtnet_hdr_size;
...
                memset(hdr, 0, head_size);
IOW head_size is not known to the compiler.

Try sticking a bool there instead of vtnet_hdr_size, and
	 memset(hdr, 0, bigheader ? 10 : 12);


> > 
> > 	--yliu


* Re: [dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
  2016-10-10  4:22                               ` Yuanhan Liu
@ 2016-10-10  4:25                                 ` Michael S. Tsirkin
  2016-10-10 12:40                                 ` Maxime Coquelin
  1 sibling, 0 replies; 45+ messages in thread
From: Michael S. Tsirkin @ 2016-10-10  4:25 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: Maxime Coquelin, Stephen Hemminger, dev, qemu-devel

On Mon, Oct 10, 2016 at 12:22:09PM +0800, Yuanhan Liu wrote:
> On Mon, Oct 10, 2016 at 07:17:06AM +0300, Michael S. Tsirkin wrote:
> > On Mon, Oct 10, 2016 at 12:05:31PM +0800, Yuanhan Liu wrote:
> > > On Fri, Sep 30, 2016 at 10:16:43PM +0300, Michael S. Tsirkin wrote:
> > > > > > And the same is done in DPDK:
> > > > > > 
> > > > > > static inline int __attribute__((always_inline))
> > > > > > copy_desc_to_mbuf(struct virtio_net *dev, struct vring_desc *descs,
> > > > > >           uint16_t max_desc, struct rte_mbuf *m, uint16_t desc_idx,
> > > > > >           struct rte_mempool *mbuf_pool)
> > > > > > {
> > > > > > ...
> > > > > >     /*
> > > > > >      * A virtio driver normally uses at least 2 desc buffers
> > > > > >      * for Tx: the first for storing the header, and others
> > > > > >      * for storing the data.
> > > > > >      */
> > > > > >     if (likely((desc->len == dev->vhost_hlen) &&
> > > > > >            (desc->flags & VRING_DESC_F_NEXT) != 0)) {
> > > > > >         desc = &descs[desc->next];
> > > > > >         if (unlikely(desc->flags & VRING_DESC_F_INDIRECT))
> > > > > >             return -1;
> > > > > > 
> > > > > >         desc_addr = gpa_to_vva(dev, desc->addr);
> > > > > >         if (unlikely(!desc_addr))
> > > > > >             return -1;
> > > > > > 
> > > > > >         rte_prefetch0((void *)(uintptr_t)desc_addr);
> > > > > > 
> > > > > >         desc_offset = 0;
> > > > > >         desc_avail  = desc->len;
> > > > > >         nr_desc    += 1;
> > > > > > 
> > > > > >         PRINT_PACKET(dev, (uintptr_t)desc_addr, desc->len, 0);
> > > > > >     } else {
> > > > > >         desc_avail  = desc->len - dev->vhost_hlen;
> > > > > >         desc_offset = dev->vhost_hlen;
> > > > > >     }
> > > > > 
> > > > > Actually, the header is parsed in DPDK vhost implementation.
> > > > > But as Virtio PMD provides a zero'ed header, we could just parse
> > > > > the header only if VIRTIO_NET_F_NO_TX_HEADER is not negotiated.
> > > > 
> > > > host can always skip the header parse if it wants to.
> > > > It didn't seem worth it to add branches there but
> > > > if I'm wrong, by all means code it up.
> > > 
> > > It's added by the following commit, which yields about a 10% performance
> > > boost for the PVP case (with 64B packet size).
> > > 
> > > At that time, a packet always used 2 descs. Since indirect desc is
> > > enabled (by default) now, that assumption no longer holds. What's
> > > worse, it might even slow things down a bit. That should also be
> > > part of the reason why performance is slightly worse than before.
> > > 
> > > 	--yliu
> > 
> > I'm not sure I get what you are saying
> > 
> > > commit 1d41d77cf81c448c1b09e1e859bfd300e2054a98
> > > Author: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> > > Date:   Mon May 2 17:46:17 2016 -0700
> > > 
> > >     vhost: optimize dequeue for small packets
> > > 
> > >     A virtio driver normally uses at least 2 desc buffers for Tx: the
> > >     first for storing the header, and the others for storing the data.
> > > 
> > >     Therefore, we could fetch the first data desc buf before the main
> > >     loop, and do the copy first before the check of "are we done yet?".
> > >     This could save one check for small packets that just have one data
> > >     desc buffer and need one mbuf to store it.
> > > 
> > >     Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> > >     Acked-by: Huawei Xie <huawei.xie@intel.com>
> > >     Tested-by: Rich Lane <rich.lane@bigswitch.com>
> > 
> > This fast-paths the 2-descriptors format but it's not active
> > for indirect descriptors. Is this what you mean?
> 
> Yes. It's also not active when ANY_LAYOUT is actually turned on.

It's not needed there though - you only use 1 desc, no point in
fetching the next one.


> > Should be a simple matter to apply this optimization for indirect.
> 
> Might be.
> 
> 	--yliu


* Re: [dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
  2016-10-10  4:16                           ` Wang, Zhihong
  2016-10-10  4:24                             ` Michael S. Tsirkin
@ 2016-10-10  4:39                             ` Michael S. Tsirkin
  2016-10-11  6:57                               ` Yuanhan Liu
  1 sibling, 1 reply; 45+ messages in thread
From: Michael S. Tsirkin @ 2016-10-10  4:39 UTC (permalink / raw)
  To: Wang, Zhihong
  Cc: Yuanhan Liu, Maxime Coquelin, Stephen Hemminger, dev, qemu-devel

On Mon, Oct 10, 2016 at 04:16:19AM +0000, Wang, Zhihong wrote:
> 
> 
> > -----Original Message-----
> > From: Yuanhan Liu [mailto:yuanhan.liu@linux.intel.com]
> > Sent: Monday, October 10, 2016 11:59 AM
> > To: Michael S. Tsirkin <mst@redhat.com>
> > Cc: Maxime Coquelin <maxime.coquelin@redhat.com>; Stephen Hemminger
> > <stephen@networkplumber.org>; dev@dpdk.org; qemu-
> > devel@nongnu.org; Wang, Zhihong <zhihong.wang@intel.com>
> > Subject: Re: [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
> > 
> > On Mon, Oct 10, 2016 at 06:46:44AM +0300, Michael S. Tsirkin wrote:
> > > On Mon, Oct 10, 2016 at 11:37:44AM +0800, Yuanhan Liu wrote:
> > > > On Thu, Sep 29, 2016 at 11:21:48PM +0300, Michael S. Tsirkin wrote:
> > > > > On Thu, Sep 29, 2016 at 10:05:22PM +0200, Maxime Coquelin wrote:
> > > > > >
> > > > > >
> > > > > > On 09/29/2016 07:57 PM, Michael S. Tsirkin wrote:
> > > > > Yes but two points.
> > > > >
> > > > > 1. why is this memset expensive?
> > > >
> > > > I don't have the exact answer, but just some rough thoughts:
> > > >
> > > > It's an external clib function: there is a call stack and the
> > > > IP register will bounce back and forth.
> > >
> > > for memset 0?  gcc 5.3.1 on fedora happily inlines it.
> > 
> > Good to know!
> > 
> > > > overkill to use that for resetting 14 bytes structure.
> > > >
> > > > Some trick like
> > > >     *(struct virtio_net_hdr *)hdr = {0, };
> > > >
> > > > Or even
> > > >     hdr->xxx = 0;
> > > >     hdr->yyy = 0;
> > > >
> > > > should behaviour better.
> > > >
> > > > There was an example: the vhost enqueue optmization patchset from
> > > > Zhihong [0] uses memset, and it introduces more than 15% drop (IIRC)
> > > > on my Ivybridge server: it has no such issue on his server though.
> > > >
> > > > [0]: http://dpdk.org/ml/archives/dev/2016-August/045272.html
> > > >
> > > > 	--yliu
> > >
> > > I'd say that's weird. what's your config? any chance you
> > > are using an old compiler?
> > 
> > Not really, it's gcc 5.3.1. Maybe Zhihong could explain more. IIRC,
> > he said the memset is not well optimized for Ivybridge server.
> 
> The dst is remote in that case. It's fine on Haswell but has complication
> in Ivy Bridge which (wasn't supposed to but) causes serious frontend issue.
> 
> I don't think gcc inlined it there. I'm using fc24 gcc 6.1.1.


So try something like this then:

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index dd7693f..7a3f88e 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -292,6 +292,16 @@ vtpci_with_feature(struct virtio_hw *hw, uint64_t bit)
 	return (hw->guest_features & (1ULL << bit)) != 0;
 }
 
+static inline int
+vtnet_hdr_size(struct virtio_hw *hw)
+{
+	if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF) ||
+	    vtpci_with_feature(hw, VIRTIO_F_VERSION_1))
+		return sizeof(struct virtio_net_hdr_mrg_rxbuf);
+	else
+		return sizeof(struct virtio_net_hdr);
+}
+
 /*
  * Function declaration from virtio_pci.c
  */
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index a27208e..21a45e1 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -216,7 +216,7 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
 	struct vring_desc *start_dp;
 	uint16_t seg_num = cookie->nb_segs;
 	uint16_t head_idx, idx;
-	uint16_t head_size = vq->hw->vtnet_hdr_size;
+	uint16_t head_size = vtnet_hdr_size(vq->hw);
 	unsigned long offs;
 
 	head_idx = vq->vq_desc_head_idx;

Generally pointer chasing in vq->hw->vtnet_hdr_size can't be good
for performance. Move fields used on data path into vq
and use from there to avoid indirections?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
  2016-10-10  4:22                               ` Yuanhan Liu
  2016-10-10  4:25                                 ` Michael S. Tsirkin
@ 2016-10-10 12:40                                 ` Maxime Coquelin
  2016-10-10 14:42                                   ` Yuanhan Liu
  1 sibling, 1 reply; 45+ messages in thread
From: Maxime Coquelin @ 2016-10-10 12:40 UTC (permalink / raw)
  To: Yuanhan Liu, Michael S. Tsirkin; +Cc: Stephen Hemminger, dev, qemu-devel



On 10/10/2016 06:22 AM, Yuanhan Liu wrote:
> On Mon, Oct 10, 2016 at 07:17:06AM +0300, Michael S. Tsirkin wrote:
>> On Mon, Oct 10, 2016 at 12:05:31PM +0800, Yuanhan Liu wrote:
>>> On Fri, Sep 30, 2016 at 10:16:43PM +0300, Michael S. Tsirkin wrote:
>>>>>> And the same is done is done in DPDK:
>>>>>>
>>>>>> static inline int __attribute__((always_inline))
>>>>>> copy_desc_to_mbuf(struct virtio_net *dev, struct vring_desc *descs,
>>>>>>           uint16_t max_desc, struct rte_mbuf *m, uint16_t desc_idx,
>>>>>>           struct rte_mempool *mbuf_pool)
>>>>>> {
>>>>>> ...
>>>>>>     /*
>>>>>>      * A virtio driver normally uses at least 2 desc buffers
>>>>>>      * for Tx: the first for storing the header, and others
>>>>>>      * for storing the data.
>>>>>>      */
>>>>>>     if (likely((desc->len == dev->vhost_hlen) &&
>>>>>>            (desc->flags & VRING_DESC_F_NEXT) != 0)) {
>>>>>>         desc = &descs[desc->next];
>>>>>>         if (unlikely(desc->flags & VRING_DESC_F_INDIRECT))
>>>>>>             return -1;
>>>>>>
>>>>>>         desc_addr = gpa_to_vva(dev, desc->addr);
>>>>>>         if (unlikely(!desc_addr))
>>>>>>             return -1;
>>>>>>
>>>>>>         rte_prefetch0((void *)(uintptr_t)desc_addr);
>>>>>>
>>>>>>         desc_offset = 0;
>>>>>>         desc_avail  = desc->len;
>>>>>>         nr_desc    += 1;
>>>>>>
>>>>>>         PRINT_PACKET(dev, (uintptr_t)desc_addr, desc->len, 0);
>>>>>>     } else {
>>>>>>         desc_avail  = desc->len - dev->vhost_hlen;
>>>>>>         desc_offset = dev->vhost_hlen;
>>>>>>     }
>>>>>
>>>>> Actually, the header is parsed in DPDK vhost implementation.
>>>>> But as Virtio PMD provides a zero'ed header, we could just parse
>>>>> the header only if VIRTIO_NET_F_NO_TX_HEADER is not negotiated.
>>>>
>>>> host can always skip the header parse if it wants to.
>>>> It didn't seem worth it to add branches there but
>>>> if I'm wrong, by all means code it up.
>>>
>>> It's added by following commit, which yields about 10% performance
>>> boosts for PVP case (with 64B packet size).
>>>
>>> At that time, a packet always use 2 descs. Since indirect desc is
>>> enabled (by default) now, the assumption is not true then. What's
>>> worse, it might even slow things a bit down. That should also be
>>> part of the reason why performance is slightly worse than before.
>>>
>>> 	--yliu
>>
>> I'm not sure I get what you are saying
>>
>>> commit 1d41d77cf81c448c1b09e1e859bfd300e2054a98
>>> Author: Yuanhan Liu <yuanhan.liu@linux.intel.com>
>>> Date:   Mon May 2 17:46:17 2016 -0700
>>>
>>>     vhost: optimize dequeue for small packets
>>>
>>>     A virtio driver normally uses at least 2 desc buffers for Tx: the
>>>     first for storing the header, and the others for storing the data.
>>>
>>>     Therefore, we could fetch the first data desc buf before the main
>>>     loop, and do the copy first before the check of "are we done yet?".
>>>     This could save one check for small packets that just have one data
>>>     desc buffer and need one mbuf to store it.
>>>
>>>     Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
>>>     Acked-by: Huawei Xie <huawei.xie@intel.com>
>>>     Tested-by: Rich Lane <rich.lane@bigswitch.com>
>>
>> This fast-paths the 2-descriptors format but it's not active
>> for indirect descriptors. Is this what you mean?
>
> Yes. It's also not active when ANY_LAYOUT is actually turned on.
>> Should be a simple matter to apply this optimization for indirect.
>
> Might be.

If I understand the code correctly, indirect descs also benefit from 
this optimization, or am I missing something?

Maxime

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
  2016-10-10 12:40                                 ` Maxime Coquelin
@ 2016-10-10 14:42                                   ` Yuanhan Liu
  2016-10-10 14:54                                     ` Maxime Coquelin
  0 siblings, 1 reply; 45+ messages in thread
From: Yuanhan Liu @ 2016-10-10 14:42 UTC (permalink / raw)
  To: Maxime Coquelin; +Cc: Michael S. Tsirkin, Stephen Hemminger, dev, qemu-devel

On Mon, Oct 10, 2016 at 02:40:44PM +0200, Maxime Coquelin wrote:
> >>>At that time, a packet always use 2 descs. Since indirect desc is
> >>>enabled (by default) now, the assumption is not true then. What's
> >>>worse, it might even slow things a bit down. That should also be
> >>>part of the reason why performance is slightly worse than before.
> >>>
> >>>	--yliu
> >>
> >>I'm not sure I get what you are saying
> >>
> >>>commit 1d41d77cf81c448c1b09e1e859bfd300e2054a98
> >>>Author: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> >>>Date:   Mon May 2 17:46:17 2016 -0700
> >>>
> >>>    vhost: optimize dequeue for small packets
> >>>
> >>>    A virtio driver normally uses at least 2 desc buffers for Tx: the
> >>>    first for storing the header, and the others for storing the data.
> >>>
> >>>    Therefore, we could fetch the first data desc buf before the main
> >>>    loop, and do the copy first before the check of "are we done yet?".
> >>>    This could save one check for small packets that just have one data
> >>>    desc buffer and need one mbuf to store it.
> >>>
> >>>    Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> >>>    Acked-by: Huawei Xie <huawei.xie@intel.com>
> >>>    Tested-by: Rich Lane <rich.lane@bigswitch.com>
> >>
> >>This fast-paths the 2-descriptors format but it's not active
> >>for indirect descriptors. Is this what you mean?
> >
> >Yes. It's also not active when ANY_LAYOUT is actually turned on.
> >>Should be a simple matter to apply this optimization for indirect.
> >
> >Might be.
> 
> If I understand the code correctly, indirect descs also benefit from this
> optimization, or am I missing something?

Aha..., you are right!

	--yliu

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
  2016-10-10 14:42                                   ` Yuanhan Liu
@ 2016-10-10 14:54                                     ` Maxime Coquelin
  2016-10-11  6:04                                       ` Yuanhan Liu
  0 siblings, 1 reply; 45+ messages in thread
From: Maxime Coquelin @ 2016-10-10 14:54 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: Michael S. Tsirkin, Stephen Hemminger, dev, qemu-devel



On 10/10/2016 04:42 PM, Yuanhan Liu wrote:
> On Mon, Oct 10, 2016 at 02:40:44PM +0200, Maxime Coquelin wrote:
>>>>> At that time, a packet always use 2 descs. Since indirect desc is
>>>>> enabled (by default) now, the assumption is not true then. What's
>>>>> worse, it might even slow things a bit down. That should also be
>>>>> part of the reason why performance is slightly worse than before.
>>>>>
>>>>> 	--yliu
>>>>
>>>> I'm not sure I get what you are saying
>>>>
>>>>> commit 1d41d77cf81c448c1b09e1e859bfd300e2054a98
>>>>> Author: Yuanhan Liu <yuanhan.liu@linux.intel.com>
>>>>> Date:   Mon May 2 17:46:17 2016 -0700
>>>>>
>>>>>    vhost: optimize dequeue for small packets
>>>>>
>>>>>    A virtio driver normally uses at least 2 desc buffers for Tx: the
>>>>>    first for storing the header, and the others for storing the data.
>>>>>
>>>>>    Therefore, we could fetch the first data desc buf before the main
>>>>>    loop, and do the copy first before the check of "are we done yet?".
>>>>>    This could save one check for small packets that just have one data
>>>>>    desc buffer and need one mbuf to store it.
>>>>>
>>>>>    Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
>>>>>    Acked-by: Huawei Xie <huawei.xie@intel.com>
>>>>>    Tested-by: Rich Lane <rich.lane@bigswitch.com>
>>>>
>>>> This fast-paths the 2-descriptors format but it's not active
>>>> for indirect descriptors. Is this what you mean?
>>>
>>> Yes. It's also not active when ANY_LAYOUT is actually turned on.
>>>> Should be a simple matter to apply this optimization for indirect.
>>>
>>> Might be.
>>
>> If I understand the code correctly, indirect descs also benefit from this
>> optimization, or am I missing something?
>
> Aha..., you are right!

The interesting thing is that the patch I sent on Thursday, which removes
header access when no offload has been negotiated[0], seems to reduce
almost to zero the performance drop seen with indirect descriptors enabled.
I see this with 64-byte packets using testpmd on both ends.

When I did the patch, I would have expected the same gain with both
modes, whereas I measured +1% for direct and +4% for indirect.

Maxime
[0]: http://dpdk.org/dev/patchwork/patch/16423/

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
  2016-10-10 14:54                                     ` Maxime Coquelin
@ 2016-10-11  6:04                                       ` Yuanhan Liu
  2016-10-11  6:39                                         ` Maxime Coquelin
  0 siblings, 1 reply; 45+ messages in thread
From: Yuanhan Liu @ 2016-10-11  6:04 UTC (permalink / raw)
  To: Maxime Coquelin; +Cc: Michael S. Tsirkin, Stephen Hemminger, dev, qemu-devel

On Mon, Oct 10, 2016 at 04:54:39PM +0200, Maxime Coquelin wrote:
> 
> 
> On 10/10/2016 04:42 PM, Yuanhan Liu wrote:
> >On Mon, Oct 10, 2016 at 02:40:44PM +0200, Maxime Coquelin wrote:
> >>>>>At that time, a packet always use 2 descs. Since indirect desc is
> >>>>>enabled (by default) now, the assumption is not true then. What's
> >>>>>worse, it might even slow things a bit down. That should also be
> >>>>>part of the reason why performance is slightly worse than before.
> >>>>>
> >>>>>	--yliu
> >>>>
> >>>>I'm not sure I get what you are saying
> >>>>
> >>>>>commit 1d41d77cf81c448c1b09e1e859bfd300e2054a98
> >>>>>Author: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> >>>>>Date:   Mon May 2 17:46:17 2016 -0700
> >>>>>
> >>>>>   vhost: optimize dequeue for small packets
> >>>>>
> >>>>>   A virtio driver normally uses at least 2 desc buffers for Tx: the
> >>>>>   first for storing the header, and the others for storing the data.
> >>>>>
> >>>>>   Therefore, we could fetch the first data desc buf before the main
> >>>>>   loop, and do the copy first before the check of "are we done yet?".
> >>>>>   This could save one check for small packets that just have one data
> >>>>>   desc buffer and need one mbuf to store it.
> >>>>>
> >>>>>   Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> >>>>>   Acked-by: Huawei Xie <huawei.xie@intel.com>
> >>>>>   Tested-by: Rich Lane <rich.lane@bigswitch.com>
> >>>>
> >>>>This fast-paths the 2-descriptors format but it's not active
> >>>>for indirect descriptors. Is this what you mean?
> >>>
> >>>Yes. It's also not active when ANY_LAYOUT is actually turned on.
> >>>>Should be a simple matter to apply this optimization for indirect.
> >>>
> >>>Might be.
> >>
> >>If I understand the code correctly, indirect descs also benefit from this
> >>optimization, or am I missing something?
> >
> >Aha..., you are right!
> 
> The interesting thing is that the patch I send on Thursday that removes
> header access when no offload has been negotiated[0] seems to reduce
> almost to zero the performance seen with indirect descriptors enabled.

Didn't follow that.

> I see this with 64 bytes packets using testpmd on both ends.
> 
> When I did the patch, I would have expected the same gain with both
> modes, whereas I measured +1% for direct and +4% for indirect.

IIRC, I did a test before (removing those offload code pieces), and the
performance was basically the same before and after. Well, there
might be some small difference, say 1% as you said. But the result has
never been steady.

Anyway, I think your patch is good to have: I just didn't see v2.

	--yliu

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
  2016-10-11  6:04                                       ` Yuanhan Liu
@ 2016-10-11  6:39                                         ` Maxime Coquelin
  2016-10-11  6:49                                           ` Yuanhan Liu
  0 siblings, 1 reply; 45+ messages in thread
From: Maxime Coquelin @ 2016-10-11  6:39 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: Michael S. Tsirkin, Stephen Hemminger, dev, qemu-devel



On 10/11/2016 08:04 AM, Yuanhan Liu wrote:
> On Mon, Oct 10, 2016 at 04:54:39PM +0200, Maxime Coquelin wrote:
>>
>>
>> On 10/10/2016 04:42 PM, Yuanhan Liu wrote:
>>> On Mon, Oct 10, 2016 at 02:40:44PM +0200, Maxime Coquelin wrote:
>>>>>>> At that time, a packet always use 2 descs. Since indirect desc is
>>>>>>> enabled (by default) now, the assumption is not true then. What's
>>>>>>> worse, it might even slow things a bit down. That should also be
>>>>>>> part of the reason why performance is slightly worse than before.
>>>>>>>
>>>>>>> 	--yliu
>>>>>>
>>>>>> I'm not sure I get what you are saying
>>>>>>
>>>>>>> commit 1d41d77cf81c448c1b09e1e859bfd300e2054a98
>>>>>>> Author: Yuanhan Liu <yuanhan.liu@linux.intel.com>
>>>>>>> Date:   Mon May 2 17:46:17 2016 -0700
>>>>>>>
>>>>>>>   vhost: optimize dequeue for small packets
>>>>>>>
>>>>>>>   A virtio driver normally uses at least 2 desc buffers for Tx: the
>>>>>>>   first for storing the header, and the others for storing the data.
>>>>>>>
>>>>>>>   Therefore, we could fetch the first data desc buf before the main
>>>>>>>   loop, and do the copy first before the check of "are we done yet?".
>>>>>>>   This could save one check for small packets that just have one data
>>>>>>>   desc buffer and need one mbuf to store it.
>>>>>>>
>>>>>>>   Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
>>>>>>>   Acked-by: Huawei Xie <huawei.xie@intel.com>
>>>>>>>   Tested-by: Rich Lane <rich.lane@bigswitch.com>
>>>>>>
>>>>>> This fast-paths the 2-descriptors format but it's not active
>>>>>> for indirect descriptors. Is this what you mean?
>>>>>
>>>>> Yes. It's also not active when ANY_LAYOUT is actually turned on.
>>>>>> Should be a simple matter to apply this optimization for indirect.
>>>>>
>>>>> Might be.
>>>>
>>>> If I understand the code correctly, indirect descs also benefit from this
>>>> optimization, or am I missing something?
>>>
>>> Aha..., you are right!
>>
>> The interesting thing is that the patch I send on Thursday that removes
>> header access when no offload has been negotiated[0] seems to reduce
>> almost to zero the performance seen with indirect descriptors enabled.
>
> Didn't follow that.
>
>> I see this with 64 bytes packets using testpmd on both ends.
>>
>> When I did the patch, I would have expected the same gain with both
>> modes, whereas I measured +1% for direct and +4% for indirect.
>
> IIRC, I did a test before (remove those offload code piece), and the
> performance was basically the same before and after that. Well, there
> might be some small difference, say 1% as you said. But the result has
> never been steady.
>
> Anyway, I think your patch is good to have: I just didn't see v2.

I waited to gather some comments/feedback before sending the v2.
I'll send it today or tomorrow.

Thanks,
Maxime

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
  2016-10-11  6:39                                         ` Maxime Coquelin
@ 2016-10-11  6:49                                           ` Yuanhan Liu
  0 siblings, 0 replies; 45+ messages in thread
From: Yuanhan Liu @ 2016-10-11  6:49 UTC (permalink / raw)
  To: Maxime Coquelin; +Cc: Michael S. Tsirkin, Stephen Hemminger, dev, qemu-devel

On Tue, Oct 11, 2016 at 08:39:54AM +0200, Maxime Coquelin wrote:
> 
> 
> On 10/11/2016 08:04 AM, Yuanhan Liu wrote:
> >On Mon, Oct 10, 2016 at 04:54:39PM +0200, Maxime Coquelin wrote:
> >>
> >>
> >>On 10/10/2016 04:42 PM, Yuanhan Liu wrote:
> >>>On Mon, Oct 10, 2016 at 02:40:44PM +0200, Maxime Coquelin wrote:
> >>>>>>>At that time, a packet always use 2 descs. Since indirect desc is
> >>>>>>>enabled (by default) now, the assumption is not true then. What's
> >>>>>>>worse, it might even slow things a bit down. That should also be
> >>>>>>>part of the reason why performance is slightly worse than before.
> >>>>>>>
> >>>>>>>	--yliu
> >>>>>>
> >>>>>>I'm not sure I get what you are saying
> >>>>>>
> >>>>>>>commit 1d41d77cf81c448c1b09e1e859bfd300e2054a98
> >>>>>>>Author: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> >>>>>>>Date:   Mon May 2 17:46:17 2016 -0700
> >>>>>>>
> >>>>>>>  vhost: optimize dequeue for small packets
> >>>>>>>
> >>>>>>>  A virtio driver normally uses at least 2 desc buffers for Tx: the
> >>>>>>>  first for storing the header, and the others for storing the data.
> >>>>>>>
> >>>>>>>  Therefore, we could fetch the first data desc buf before the main
> >>>>>>>  loop, and do the copy first before the check of "are we done yet?".
> >>>>>>>  This could save one check for small packets that just have one data
> >>>>>>>  desc buffer and need one mbuf to store it.
> >>>>>>>
> >>>>>>>  Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> >>>>>>>  Acked-by: Huawei Xie <huawei.xie@intel.com>
> >>>>>>>  Tested-by: Rich Lane <rich.lane@bigswitch.com>
> >>>>>>
> >>>>>>This fast-paths the 2-descriptors format but it's not active
> >>>>>>for indirect descriptors. Is this what you mean?
> >>>>>
> >>>>>Yes. It's also not active when ANY_LAYOUT is actually turned on.
> >>>>>>Should be a simple matter to apply this optimization for indirect.
> >>>>>
> >>>>>Might be.
> >>>>
> >>>>If I understand the code correctly, indirect descs also benefit from this
> >>>>optimization, or am I missing something?
> >>>
> >>>Aha..., you are right!
> >>
> >>The interesting thing is that the patch I send on Thursday that removes
> >>header access when no offload has been negotiated[0] seems to reduce
> >>almost to zero the performance seen with indirect descriptors enabled.
> >
> >Didn't follow that.
> >
> >>I see this with 64 bytes packets using testpmd on both ends.
> >>
> >>When I did the patch, I would have expected the same gain with both
> >>modes, whereas I measured +1% for direct and +4% for indirect.
> >
> >IIRC, I did a test before (remove those offload code piece), and the
> >performance was basically the same before and after that. Well, there
> >might be some small difference, say 1% as you said. But the result has
> >never been steady.
> >
> >Anyway, I think your patch is good to have: I just didn't see v2.
> 
> I waited to gather some comments/feedback before sending the v2.
> I'll send it today or tomorrow.

Interesting, so it's a deadlock then: I haven't looked at the code
carefully since you said there would be a v2, as I'm waiting for it.
However, you are waiting for my review. :)

Anyway, I will take time to look at it shortly.

	--yliu

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
  2016-10-10  4:39                             ` Michael S. Tsirkin
@ 2016-10-11  6:57                               ` Yuanhan Liu
  2016-10-12  3:21                                 ` Yuanhan Liu
  0 siblings, 1 reply; 45+ messages in thread
From: Yuanhan Liu @ 2016-10-11  6:57 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Wang, Zhihong, Maxime Coquelin, Stephen Hemminger, dev, qemu-devel

On Mon, Oct 10, 2016 at 07:39:59AM +0300, Michael S. Tsirkin wrote:
> > > > > > 1. why is this memset expensive?
> > > > >
> > > > > I don't have the exact answer, but just some rough thoughts:
> > > > >
> > > > > It's an external clib function: there is a call stack and the
> > > > > IP register will bounch back and forth.
> > > >
> > > > for memset 0?  gcc 5.3.1 on fedora happily inlines it.
> > > 
> > > Good to know!
> > > 
> > > > > overkill to use that for resetting 14 bytes structure.
> > > > >
> > > > > Some trick like
> > > > >     *(struct virtio_net_hdr *)hdr = {0, };
> > > > >
> > > > > Or even
> > > > >     hdr->xxx = 0;
> > > > >     hdr->yyy = 0;
> > > > >
> > > > > should behaviour better.
> > > > >
> > > > > There was an example: the vhost enqueue optmization patchset from
> > > > > Zhihong [0] uses memset, and it introduces more than 15% drop (IIRC)
> > > > > on my Ivybridge server: it has no such issue on his server though.
> > > > >
> > > > > [0]: http://dpdk.org/ml/archives/dev/2016-August/045272.html
> > > > >
> > > > > 	--yliu
> > > >
> > > > I'd say that's weird. what's your config? any chance you
> > > > are using an old compiler?
> > > 
> > > Not really, it's gcc 5.3.1. Maybe Zhihong could explain more. IIRC,
> > > he said the memset is not well optimized for Ivybridge server.
> > 
> > The dst is remote in that case. It's fine on Haswell but has complication
> > in Ivy Bridge which (wasn't supposed to but) causes serious frontend issue.
> > 
> > I don't think gcc inlined it there. I'm using fc24 gcc 6.1.1.
> 
> 
> So try something like this then:

Yes, I saw memset is inlined when this diff is applied.

So, mind sending a formal patch? You might want to try a build first
at least: it doesn't build as is.

 
> Generally pointer chasing in vq->hw->vtnet_hdr_size can't be good
> for performance. Move fields used on data path into vq
> and use from there to avoid indirections?

Good suggestion!

	--yliu

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
  2016-10-11  6:57                               ` Yuanhan Liu
@ 2016-10-12  3:21                                 ` Yuanhan Liu
       [not found]                                   ` <F5DF4F0E3AFEF648ADC1C3C33AD4DBF16C2409EB@SHSMSX101.ccr.corp.intel.com>
  0 siblings, 1 reply; 45+ messages in thread
From: Yuanhan Liu @ 2016-10-12  3:21 UTC (permalink / raw)
  To: Michael S. Tsirkin, Thomas Monjalon
  Cc: Wang, Zhihong, Maxime Coquelin, Stephen Hemminger, dev, qemu-devel

On Tue, Oct 11, 2016 at 02:57:49PM +0800, Yuanhan Liu wrote:
> > > > > > There was an example: the vhost enqueue optmization patchset from
> > > > > > Zhihong [0] uses memset, and it introduces more than 15% drop (IIRC)

Though it doesn't matter now, I have verified it yesterday (with and
without memset): the drop could be up to 30+%.

This is to let you know that it could behave badly if memset is not
inlined.

> > > > > > on my Ivybridge server: it has no such issue on his server though.
> > > > > >
> > > > > > [0]: http://dpdk.org/ml/archives/dev/2016-August/045272.html
> > > > > >
> > > > > > 	--yliu
> > > > >
> > > > > I'd say that's weird. what's your config? any chance you
> > > > > are using an old compiler?
> > > > 
> > > > Not really, it's gcc 5.3.1. Maybe Zhihong could explain more. IIRC,
> > > > he said the memset is not well optimized for Ivybridge server.
> > > 
> > > The dst is remote in that case. It's fine on Haswell but has complication
> > > in Ivy Bridge which (wasn't supposed to but) causes serious frontend issue.
> > > 
> > > I don't think gcc inlined it there. I'm using fc24 gcc 6.1.1.
> > 
> > 
> > So try something like this then:
> 
> Yes, I saw memset is inlined when this diff is applied.

I have another concern though: it's a trick that could let gcc do the
inline, and I am not quite sure whether that's true with other compilers
(i.e. clang, icc, or even older gcc).

For this case, I think I still prefer some trick like
    *(struct ..*) = {0, }

Or even, we could introduce rte_memset(). IIRC, that has been
proposed somewhere before?

	--yliu

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout feature
       [not found]                                   ` <F5DF4F0E3AFEF648ADC1C3C33AD4DBF16C2409EB@SHSMSX101.ccr.corp.intel.com>
@ 2016-10-13  2:52                                     ` Yang, Zhiyong
  0 siblings, 0 replies; 45+ messages in thread
From: Yang, Zhiyong @ 2016-10-13  2:52 UTC (permalink / raw)
  To: Liu, Yuanhan
  Cc: mst, thomas.monjalon, Wang, Zhihong, stephen, dev, qemu-devel,
	maxime.coquelin

Hi, Yuanhan:

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Yuanhan Liu
> Sent: Wednesday, October 12, 2016 11:22 AM
> To: Michael S. Tsirkin <mst@redhat.com>; Thomas Monjalon
> <thomas.monjalon@6wind.com>
> Cc: Wang, Zhihong <zhihong.wang@intel.com>; Maxime Coquelin
> <maxime.coquelin@redhat.com>; Stephen Hemminger
> <stephen@networkplumber.org>; dev@dpdk.org; qemu-
> devel@nongnu.org
> Subject: Re: [dpdk-dev] [Qemu-devel] [PATCH 1/2] vhost: enable any layout
> feature
> 
> On Tue, Oct 11, 2016 at 02:57:49PM +0800, Yuanhan Liu wrote:
> > > > > > > There was an example: the vhost enqueue optmization patchset
> > > > > > > from Zhihong [0] uses memset, and it introduces more than
> > > > > > > 15% drop (IIRC)
> 
> Though it doesn't matter now, but I have verified it yesterday (with and
> wihtout memset), the drop could be up to 30+%.
> 
> This is to let you know that it could behaviour badly if memset is not inlined.
> 
> > > > > > > on my Ivybridge server: it has no such issue on his server though.
> > > > > > >
> > > > > > > [0]: http://dpdk.org/ml/archives/dev/2016-August/045272.html
> > > > > > >
> > > > > > > 	--yliu
> > > > > >
> > > > > > I'd say that's weird. what's your config? any chance you are
> > > > > > using an old compiler?
> > > > >
> > > > > Not really, it's gcc 5.3.1. Maybe Zhihong could explain more.
> > > > > IIRC, he said the memset is not well optimized for Ivybridge server.
> > > >
> > > > The dst is remote in that case. It's fine on Haswell but has
> > > > complication in Ivy Bridge which (wasn't supposed to but) causes
> serious frontend issue.
> > > >
> > > > I don't think gcc inlined it there. I'm using fc24 gcc 6.1.1.
> > >
> > >
> > > So try something like this then:
> >
> > Yes, I saw memset is inlined when this diff is applied.
> 
> I have another concern though: It's a trick could let gcc do the inline, I am not
> quite sure whether that's ture with other compilers (i.e. clang, icc, or even,
> older gcc).
> 
> For this case, I think I still prefer some trick like
>     *(struct ..*) = {0, }
> 
> Or even, we may could introduce rte_memset(). IIRC, that has been
> proposed somehow before?
> 

I'm trying to introduce rte_memset and have a prototype. It has
gotten some performance enhancement for small sizes; I'm optimizing it further.

--Zhiyong

> 	--yliu

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [dpdk-dev] [PATCH 0/2] enables vhost/virtio any layout feature
  2016-09-26  6:40 [dpdk-dev] [PATCH 0/2] enables vhost/virtio any layout feature Yuanhan Liu
  2016-09-26  6:40 ` [dpdk-dev] [PATCH 1/2] vhost: enable " Yuanhan Liu
  2016-09-26  6:40 ` [dpdk-dev] [PATCH 2/2] net/virtio: " Yuanhan Liu
@ 2016-11-10 15:18 ` Michael S. Tsirkin
  2 siblings, 0 replies; 45+ messages in thread
From: Michael S. Tsirkin @ 2016-11-10 15:18 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, Stephen Hemminger, Maxime Coquelin

On Mon, Sep 26, 2016 at 02:40:54PM +0800, Yuanhan Liu wrote:
> The feature is actually supported both in virtio PMD and vhost lib.
> We just haven't enabled it yet. This patchset simply enables it.

Any input on handling versioning? Do people prefer to handle it
completely at the backend, or through libvirt?

> ---
> Yuanhan Liu (2):
>   vhost: enable any layout feature
>   net/virtio: enable any layout feature
> 
>  drivers/net/virtio/virtio_ethdev.h | 1 +
>  lib/librte_vhost/vhost.c           | 1 +
>  lib/librte_vhost/vhost.h           | 3 +++
>  3 files changed, 5 insertions(+)
> 
> -- 
> 1.9.0

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2016-11-10 15:18 UTC | newest]

Thread overview: 45+ messages
2016-09-26  6:40 [dpdk-dev] [PATCH 0/2] enables vhost/virtio any layout feature Yuanhan Liu
2016-09-26  6:40 ` [dpdk-dev] [PATCH 1/2] vhost: enable " Yuanhan Liu
2016-09-26 18:01   ` Stephen Hemminger
2016-09-26 19:24     ` Michael S. Tsirkin
2016-09-27  3:11       ` Yuanhan Liu
2016-09-27 19:48         ` Stephen Hemminger
2016-09-27 19:56         ` Michael S. Tsirkin
2016-09-28  2:28           ` Yuanhan Liu
2016-09-29 15:30             ` [dpdk-dev] [Qemu-devel] " Maxime Coquelin
2016-09-29 17:57               ` Michael S. Tsirkin
2016-09-29 20:05                 ` Maxime Coquelin
2016-09-29 20:21                   ` Michael S. Tsirkin
2016-09-29 21:23                     ` Maxime Coquelin
2016-09-30 12:05                       ` Maxime Coquelin
2016-09-30 19:16                         ` Michael S. Tsirkin
2016-10-10  4:05                           ` Yuanhan Liu
2016-10-10  4:17                             ` Michael S. Tsirkin
2016-10-10  4:22                               ` Yuanhan Liu
2016-10-10  4:25                                 ` Michael S. Tsirkin
2016-10-10 12:40                                 ` Maxime Coquelin
2016-10-10 14:42                                   ` Yuanhan Liu
2016-10-10 14:54                                     ` Maxime Coquelin
2016-10-11  6:04                                       ` Yuanhan Liu
2016-10-11  6:39                                         ` Maxime Coquelin
2016-10-11  6:49                                           ` Yuanhan Liu
2016-10-03 14:20                     ` Maxime Coquelin
2016-10-10  3:37                     ` Yuanhan Liu
2016-10-10  3:46                       ` Michael S. Tsirkin
2016-10-10  3:59                         ` Yuanhan Liu
2016-10-10  4:16                           ` Wang, Zhihong
2016-10-10  4:24                             ` Michael S. Tsirkin
2016-10-10  4:39                             ` Michael S. Tsirkin
2016-10-11  6:57                               ` Yuanhan Liu
2016-10-12  3:21                                 ` Yuanhan Liu
     [not found]                                   ` <F5DF4F0E3AFEF648ADC1C3C33AD4DBF16C2409EB@SHSMSX101.ccr.corp.intel.com>
2016-10-13  2:52                                     ` Yang, Zhiyong
2016-10-10  3:50                   ` Yuanhan Liu
2016-10-09 23:20             ` [dpdk-dev] " Michael S. Tsirkin
2016-10-10  3:03               ` Yuanhan Liu
2016-10-10  3:04                 ` Michael S. Tsirkin
2016-10-10  3:10                   ` Yuanhan Liu
2016-09-26  6:40 ` [dpdk-dev] [PATCH 2/2] net/virtio: " Yuanhan Liu
2016-09-26 18:04   ` Stephen Hemminger
2016-09-29 18:00   ` Michael S. Tsirkin
2016-09-29 18:01     ` Michael S. Tsirkin
2016-11-10 15:18 ` [dpdk-dev] [PATCH 0/2] enables vhost/virtio " Michael S. Tsirkin
