patches for DPDK stable branches
 help / color / mirror / Atom feed
* [dpdk-stable] [PATCH 1/2] net/virtio: fix performance regression due to TSO enabling
       [not found] <1484108832-19907-1-git-send-email-yuanhan.liu@linux.intel.com>
@ 2017-01-11  4:27 ` Yuanhan Liu
  2017-01-11  7:59   ` Maxime Coquelin
  2017-01-11  4:27 ` [dpdk-stable] [PATCH 2/2] net/virtio: optimize header reset on any layout Yuanhan Liu
  1 sibling, 1 reply; 6+ messages in thread
From: Yuanhan Liu @ 2017-01-11  4:27 UTC (permalink / raw)
  To: dev
  Cc: Tan Jianfeng, Wang Zhihong, Yuanhan Liu, Olivier Matz,
	Maxime Coquelin, Michael S. Tsirkin, stable

TSO is now enabled, but it's not actually being used by default in a
simple L2 forward mode. In such case, we have to zero the virtio net
headers, to inform the vhost backend that no offload is being used:

    hdr->csum_start = 0;
    hdr->csum_offset = 0;
    hdr->flags = 0;

    hdr->gso_type = 0;
    hdr->gso_size = 0;
    hdr->hdr_len = 0;

Such writes could be very costly; it introduces severe cache issues:
The above operations introduce cache write for each packet, which
stalls the read operation from the vhost backend.

The fact that virtio net header is initiated to zero in PMD driver
init stage means that these costly writes are unnecessary and could
be avoided:

    if (hdr->csum_start != 0)
        hdr->csum_start = 0;

And that's what the macro ASSIGN_UNLESS_EQUAL does. With this, the
performance drop introduced by TSO enabling is recovered: it could
be up to 20% in micro benchmarking.

Fixes: 58169a9c8153 ("net/virtio: support Tx checksum offload")
Fixes: 696573046e9e ("net/virtio: support TSO")

Cc: Olivier Matz <olivier.matz@6wind.com>
Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: stable@dpdk.org
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
---
 drivers/net/virtio/virtio_rxtx.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 1e5a6b9..8ec2f1a 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -258,6 +258,12 @@
 		vtpci_with_feature(hw, VIRTIO_NET_F_HOST_TSO6);
 }
 
+/* avoid write operation when necessary, to lessen cache issues */
+#define ASSIGN_UNLESS_EQUAL(var, val) do {	\
+	if ((var) != (val))			\
+		(var) = (val);			\
+} while (0)
+
 static inline void
 virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
 		       uint16_t needed, int use_indirect, int can_push)
@@ -337,9 +343,9 @@
 			break;
 
 		default:
-			hdr->csum_start = 0;
-			hdr->csum_offset = 0;
-			hdr->flags = 0;
+			ASSIGN_UNLESS_EQUAL(hdr->csum_start, 0);
+			ASSIGN_UNLESS_EQUAL(hdr->csum_offset, 0);
+			ASSIGN_UNLESS_EQUAL(hdr->flags, 0);
 			break;
 		}
 
@@ -355,9 +361,9 @@
 				cookie->l3_len +
 				cookie->l4_len;
 		} else {
-			hdr->gso_type = 0;
-			hdr->gso_size = 0;
-			hdr->hdr_len = 0;
+			ASSIGN_UNLESS_EQUAL(hdr->gso_type, 0);
+			ASSIGN_UNLESS_EQUAL(hdr->gso_size, 0);
+			ASSIGN_UNLESS_EQUAL(hdr->hdr_len, 0);
 		}
 	}
 
-- 
1.9.0

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [dpdk-stable] [PATCH 2/2] net/virtio: optimize header reset on any layout
       [not found] <1484108832-19907-1-git-send-email-yuanhan.liu@linux.intel.com>
  2017-01-11  4:27 ` [dpdk-stable] [PATCH 1/2] net/virtio: fix performance regression due to TSO enabling Yuanhan Liu
@ 2017-01-11  4:27 ` Yuanhan Liu
  2017-01-11  8:01   ` Maxime Coquelin
  1 sibling, 1 reply; 6+ messages in thread
From: Yuanhan Liu @ 2017-01-11  4:27 UTC (permalink / raw)
  To: dev
  Cc: Tan Jianfeng, Wang Zhihong, Yuanhan Liu, Maxime Coquelin,
	Michael S. Tsirkin, stable

When any layout is used, the header is stored in the head room of mbuf.
mbuf is allocated and filled by user, means there is no gurateen the
header is all zero for non TSO case. Therefore, we have to do the reset
by ourself:

    memest(hdr, 0, head_size);

The memset has two impacts on performance:

- memset could not be inlined, which is a bit costly.
- more importantly, it touches the mbuf, which could introduce severe
  cache issues as described by former patch.

Similiary, we could do the same trick: reset just when necessary, when
the corresponding field is already 0, which is likely true for a simple
l2 forward case. It could boost the performance up to 20+% in micro
benchmarking.

Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: stable@dpdk.org
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
---
 drivers/net/virtio/virtio_rxtx.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 8ec2f1a..5ca3a88 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -292,8 +292,14 @@
 		hdr = (struct virtio_net_hdr *)
 			rte_pktmbuf_prepend(cookie, head_size);
 		/* if offload disabled, it is not zeroed below, do it now */
-		if (offload == 0)
-			memset(hdr, 0, head_size);
+		if (offload == 0) {
+			ASSIGN_UNLESS_EQUAL(hdr->csum_start, 0);
+			ASSIGN_UNLESS_EQUAL(hdr->csum_offset, 0);
+			ASSIGN_UNLESS_EQUAL(hdr->flags, 0);
+			ASSIGN_UNLESS_EQUAL(hdr->gso_type, 0);
+			ASSIGN_UNLESS_EQUAL(hdr->gso_size, 0);
+			ASSIGN_UNLESS_EQUAL(hdr->hdr_len, 0);
+		}
 	} else if (use_indirect) {
 		/* setup tx ring slot to point to indirect
 		 * descriptor list stored in reserved region.
-- 
1.9.0

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [dpdk-stable] [PATCH 1/2] net/virtio: fix performance regression due to TSO enabling
  2017-01-11  4:27 ` [dpdk-stable] [PATCH 1/2] net/virtio: fix performance regression due to TSO enabling Yuanhan Liu
@ 2017-01-11  7:59   ` Maxime Coquelin
  2017-01-11  8:08     ` Yuanhan Liu
  2017-01-11  8:22     ` Olivier MATZ
  0 siblings, 2 replies; 6+ messages in thread
From: Maxime Coquelin @ 2017-01-11  7:59 UTC (permalink / raw)
  To: Yuanhan Liu, dev
  Cc: Tan Jianfeng, Wang Zhihong, Olivier Matz, Michael S. Tsirkin, stable



On 01/11/2017 05:27 AM, Yuanhan Liu wrote:
> TSO is now enabled, but it's not actually being used by default in a
> simple L2 forward mode. In such case, we have to zero the virtio net
> headers, to inform the vhost backend that no offload is being used:
>
>     hdr->csum_start = 0;
>     hdr->csum_offset = 0;
>     hdr->flags = 0;
>
>     hdr->gso_type = 0;
>     hdr->gso_size = 0;
>     hdr->hdr_len = 0;
>
> Such writes could be very costly; it introduces severe cache issues:
> The above operations introduce cache write for each packet, which
> stalls the read operation from the vhost backend.
>
> The fact that virtio net header is initiated to zero in PMD driver
> init stage means that these costly writes are unnecessary and could
> be avoided:
>
>     if (hdr->csum_start != 0)
>         hdr->csum_start = 0;
>
> And that's what the macro ASSIGN_UNLESS_EQUAL does. With this, the
> performance drop introduced by TSO enabling is recovered: it could
> be up to 20% in micro benchmarking.
Very nice!

>
> Fixes: 58169a9c8153 ("net/virtio: support Tx checksum offload")
> Fixes: 696573046e9e ("net/virtio: support TSO")
>
> Cc: Olivier Matz <olivier.matz@6wind.com>
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: stable@dpdk.org
> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> ---
>  drivers/net/virtio/virtio_rxtx.c | 18 ++++++++++++------
>  1 file changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
> index 1e5a6b9..8ec2f1a 100644
> --- a/drivers/net/virtio/virtio_rxtx.c
> +++ b/drivers/net/virtio/virtio_rxtx.c
> @@ -258,6 +258,12 @@
>  		vtpci_with_feature(hw, VIRTIO_NET_F_HOST_TSO6);
>  }
>
> +/* avoid write operation when necessary, to lessen cache issues */
> +#define ASSIGN_UNLESS_EQUAL(var, val) do {	\
> +	if ((var) != (val))			\
> +		(var) = (val);			\
> +} while (0)
As it is intended to go in -stable, I think this is fine to have it
only in the driver, but for v17.02, maybe we should have another patch 
on top that declares it somewhere so that other libs and drivers can
make use of it?

> +
>  static inline void
>  virtqueue_enqueue_xmit(struct virtnet_tx *txvq, struct rte_mbuf *cookie,
>  		       uint16_t needed, int use_indirect, int can_push)
> @@ -337,9 +343,9 @@
>  			break;
>
>  		default:
> -			hdr->csum_start = 0;
> -			hdr->csum_offset = 0;
> -			hdr->flags = 0;
> +			ASSIGN_UNLESS_EQUAL(hdr->csum_start, 0);
> +			ASSIGN_UNLESS_EQUAL(hdr->csum_offset, 0);
> +			ASSIGN_UNLESS_EQUAL(hdr->flags, 0);
>  			break;
>  		}
>
> @@ -355,9 +361,9 @@
>  				cookie->l3_len +
>  				cookie->l4_len;
>  		} else {
> -			hdr->gso_type = 0;
> -			hdr->gso_size = 0;
> -			hdr->hdr_len = 0;
> +			ASSIGN_UNLESS_EQUAL(hdr->gso_type, 0);
> +			ASSIGN_UNLESS_EQUAL(hdr->gso_size, 0);
> +			ASSIGN_UNLESS_EQUAL(hdr->hdr_len, 0);
>  		}
>  	}
>
>

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks!
Maxime

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [dpdk-stable] [PATCH 2/2] net/virtio: optimize header reset on any layout
  2017-01-11  4:27 ` [dpdk-stable] [PATCH 2/2] net/virtio: optimize header reset on any layout Yuanhan Liu
@ 2017-01-11  8:01   ` Maxime Coquelin
  0 siblings, 0 replies; 6+ messages in thread
From: Maxime Coquelin @ 2017-01-11  8:01 UTC (permalink / raw)
  To: Yuanhan Liu, dev; +Cc: Tan Jianfeng, Wang Zhihong, Michael S. Tsirkin, stable



On 01/11/2017 05:27 AM, Yuanhan Liu wrote:
> When any layout is used, the header is stored in the head room of mbuf.
> mbuf is allocated and filled by user, means there is no gurateen the
> header is all zero for non TSO case. Therefore, we have to do the reset
> by ourself:
>
>     memest(hdr, 0, head_size);
>
> The memset has two impacts on performance:
>
> - memset could not be inlined, which is a bit costly.
> - more importantly, it touches the mbuf, which could introduce severe
>   cache issues as described by former patch.
>
> Similiary, we could do the same trick: reset just when necessary, when
> the corresponding field is already 0, which is likely true for a simple
> l2 forward case. It could boost the performance up to 20+% in micro
> benchmarking.
>
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: stable@dpdk.org
> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> ---
>  drivers/net/virtio/virtio_rxtx.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
> index 8ec2f1a..5ca3a88 100644
> --- a/drivers/net/virtio/virtio_rxtx.c
> +++ b/drivers/net/virtio/virtio_rxtx.c
> @@ -292,8 +292,14 @@
>  		hdr = (struct virtio_net_hdr *)
>  			rte_pktmbuf_prepend(cookie, head_size);
>  		/* if offload disabled, it is not zeroed below, do it now */
> -		if (offload == 0)
> -			memset(hdr, 0, head_size);
> +		if (offload == 0) {
> +			ASSIGN_UNLESS_EQUAL(hdr->csum_start, 0);
> +			ASSIGN_UNLESS_EQUAL(hdr->csum_offset, 0);
> +			ASSIGN_UNLESS_EQUAL(hdr->flags, 0);
> +			ASSIGN_UNLESS_EQUAL(hdr->gso_type, 0);
> +			ASSIGN_UNLESS_EQUAL(hdr->gso_size, 0);
> +			ASSIGN_UNLESS_EQUAL(hdr->hdr_len, 0);
> +		}
>  	} else if (use_indirect) {
>  		/* setup tx ring slot to point to indirect
>  		 * descriptor list stored in reserved region.
>

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks!
Maxime

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [dpdk-stable] [PATCH 1/2] net/virtio: fix performance regression due to TSO enabling
  2017-01-11  7:59   ` Maxime Coquelin
@ 2017-01-11  8:08     ` Yuanhan Liu
  2017-01-11  8:22     ` Olivier MATZ
  1 sibling, 0 replies; 6+ messages in thread
From: Yuanhan Liu @ 2017-01-11  8:08 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: dev, Tan Jianfeng, Wang Zhihong, Olivier Matz,
	Michael S. Tsirkin, stable, Thomas Monjalon

On Wed, Jan 11, 2017 at 08:59:28AM +0100, Maxime Coquelin wrote:
> >+/* avoid write operation when necessary, to lessen cache issues */
> >+#define ASSIGN_UNLESS_EQUAL(var, val) do {	\
> >+	if ((var) != (val))			\
> >+		(var) = (val);			\
> >+} while (0)
> As it is intended to go in -stable, I think this is fine to have it
> only in the driver, but for v17.02, maybe we should have another patch on
> top that declares it somewhere so that other libs and drivers can
> make use of it?

Yes, if it has any other usages :)


	--yliu

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [dpdk-stable] [PATCH 1/2] net/virtio: fix performance regression due to TSO enabling
  2017-01-11  7:59   ` Maxime Coquelin
  2017-01-11  8:08     ` Yuanhan Liu
@ 2017-01-11  8:22     ` Olivier MATZ
  1 sibling, 0 replies; 6+ messages in thread
From: Olivier MATZ @ 2017-01-11  8:22 UTC (permalink / raw)
  To: Maxime Coquelin
  Cc: Yuanhan Liu, dev, Tan Jianfeng, Wang Zhihong, Olivier Matz,
	Michael S. Tsirkin, stable

Hi Yuanhan,

Thanks for the fix.

On Wed, 11 Jan 2017 08:59:28 +0100, Maxime Coquelin
<maxime.coquelin@redhat.com> wrote:
> On 01/11/2017 05:27 AM, Yuanhan Liu wrote:
> > TSO is now enabled, but it's not actually being used by default in a
> > simple L2 forward mode. In such case, we have to zero the virtio net
> > headers, to inform the vhost backend that no offload is being used:
> >
> >     hdr->csum_start = 0;
> >     hdr->csum_offset = 0;
> >     hdr->flags = 0;
> >
> >     hdr->gso_type = 0;
> >     hdr->gso_size = 0;
> >     hdr->hdr_len = 0;
> >
> > Such writes could be very costly; it introduces severe cache issues:
> > The above operations introduce cache write for each packet, which
> > stalls the read operation from the vhost backend.
> >
> > The fact that virtio net header is initiated to zero in PMD driver
> > init stage means that these costly writes are unnecessary and could
> > be avoided:
> >
> >     if (hdr->csum_start != 0)
> >         hdr->csum_start = 0;
> >
> > And that's what the macro ASSIGN_UNLESS_EQUAL does. With this, the
> > performance drop introduced by TSO enabling is recovered: it could
> > be up to 20% in micro benchmarking.  
> Very nice!

Yep, I'm also impressed by the result.
I would have think this is something already done by the hardware and
transparent to the software.

> 
> >
> > Fixes: 58169a9c8153 ("net/virtio: support Tx checksum offload")
> > Fixes: 696573046e9e ("net/virtio: support TSO")
> >
> > Cc: Olivier Matz <olivier.matz@6wind.com>
> > Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> > Cc: Michael S. Tsirkin <mst@redhat.com>
> > Cc: stable@dpdk.org
> > Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
>
> [...]
>
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Reviewed-by: Olivier Matz <olivier.matz@6wind.com>

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-01-11  8:23 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1484108832-19907-1-git-send-email-yuanhan.liu@linux.intel.com>
2017-01-11  4:27 ` [dpdk-stable] [PATCH 1/2] net/virtio: fix performance regression due to TSO enabling Yuanhan Liu
2017-01-11  7:59   ` Maxime Coquelin
2017-01-11  8:08     ` Yuanhan Liu
2017-01-11  8:22     ` Olivier MATZ
2017-01-11  4:27 ` [dpdk-stable] [PATCH 2/2] net/virtio: optimize header reset on any layout Yuanhan Liu
2017-01-11  8:01   ` Maxime Coquelin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).