DPDK patches and discussions
* [dpdk-dev] [RFC PATCH 20.02] mbuf: hint PMD not to inline packet
@ 2019-10-17  7:27 Shahaf Shuler
  2019-10-17  8:16 ` Jerin Jacob
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Shahaf Shuler @ 2019-10-17  7:27 UTC (permalink / raw)
  To: dev, Thomas Monjalon, olivier.matz
  Cc: wwasko, spotluri, Asaf Penso, Slava Ovsiienko

Some PMDs inline the mbuf data buffer directly into the device
descriptor. This saves the overhead of the PCI transaction headers
involved when the device DMA-reads through the buffer pointer. For some
devices it is essential in order to reach the peak BW.

However, there are cases where such inlining is inefficient. For example,
when the data buffer resides in another device's memory (like a GPU or
storage device), an attempt to inline such a buffer will result in high
PCI overhead for reading and copying the data from the remote device.

To support a mixed traffic pattern (some buffers from local DRAM, some
buffers from other devices) with high BW, a hint flag is introduced in
the mbuf.
The application will hint the PMD whether or not it should try to inline
the given mbuf data buffer. The PMD should make a best effort to act
upon this request.

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
---
 lib/librte_mbuf/rte_mbuf.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 98225ec80b..5934532b7f 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -203,6 +203,15 @@ extern "C" {
 /* add new TX flags here */
 
 /**
+ * Hint to the PMD not to inline the mbuf data buffer to the device,
+ * but rather let the device use its DMA engine to fetch the data
+ * with the provided pointer.
+ *
+ * This flag is only a hint. The PMD should enforce it as best effort.
+ */
+#define PKT_TX_DONT_INLINE_HINT (1ULL << 39)
+
+/**
  * Indicate that the metadata field in the mbuf is in use.
  */
 #define PKT_TX_METADATA	(1ULL << 40)
-- 
2.12.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 20.02] mbuf: hint PMD not to inline packet
  2019-10-17  7:27 [dpdk-dev] [RFC PATCH 20.02] mbuf: hint PMD not to inline packet Shahaf Shuler
@ 2019-10-17  8:16 ` Jerin Jacob
  2019-10-17 10:59   ` Shahaf Shuler
  2019-10-17 15:14 ` Stephen Hemminger
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Jerin Jacob @ 2019-10-17  8:16 UTC (permalink / raw)
  To: Shahaf Shuler
  Cc: dev, Thomas Monjalon, olivier.matz, wwasko, spotluri, Asaf Penso,
	Slava Ovsiienko

On Thu, Oct 17, 2019 at 12:57 PM Shahaf Shuler <shahafs@mellanox.com> wrote:
>
> Some PMDs inline the mbuf data buffer directly to device. This is in
> order to save the overhead of the PCI headers involved when the device
> DMA read the buffer pointer. For some devices it is essential in order
> to reach the pick BW.
>
> However, there are cases where such inlining is in-efficient. For example
> when the data buffer resides on other device memory (like GPU or storage
> device). attempt to inline such buffer will result in high PCI overhead
> for reading and copying the data from the remote device.

Some questions to understand the use case:
# Is this a use case where the CPU, local DRAM, NW card and GPU memory
are connected on a coherent bus?
# Assuming the CPU needs to touch the buffer prior to Tx, will it
be useful in that case?
# How does the application know the data buffer is in GPU memory, in
order to use this flag efficiently?
# Just a random thought: does it help if we create two different
mempools, one from local DRAM
and one from GPU memory, so that the application can work transparently?





>
> To support a mixed traffic pattern (some buffers from local DRAM, some
> buffers from other devices) with high BW, a hint flag is introduced in
> the mbuf.
> Application will hint the PMD whether or not it should try to inline the
> given mbuf data buffer. PMD should do best effort to act upon this
> request.
>
> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> ---
>  lib/librte_mbuf/rte_mbuf.h | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index 98225ec80b..5934532b7f 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -203,6 +203,15 @@ extern "C" {
>  /* add new TX flags here */
>
>  /**
> + * Hint to PMD to not inline the mbuf data buffer to device
> + * rather let the device use its DMA engine to fetch the data with the
> + * provided pointer.
> + *
> + * This flag is a only a hint. PMD should enforce it as best effort.
> + */
> +#define PKT_TX_DONT_INLINE_HINT (1ULL << 39)
> +
> +/**
>   * Indicate that the metadata field in the mbuf is in use.
>   */
>  #define PKT_TX_METADATA        (1ULL << 40)
> --
> 2.12.0
>


* Re: [dpdk-dev] [RFC PATCH 20.02] mbuf: hint PMD not to inline packet
  2019-10-17  8:16 ` Jerin Jacob
@ 2019-10-17 10:59   ` Shahaf Shuler
  2019-10-17 17:18     ` Jerin Jacob
  0 siblings, 1 reply; 13+ messages in thread
From: Shahaf Shuler @ 2019-10-17 10:59 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, Thomas Monjalon, olivier.matz, wwasko, spotluri, Asaf Penso,
	Slava Ovsiienko

Thursday, October 17, 2019 11:17 AM, Jerin Jacob:
> Subject: Re: [dpdk-dev] [RFC PATCH 20.02] mbuf: hint PMD not to inline
> packet
> 
> On Thu, Oct 17, 2019 at 12:57 PM Shahaf Shuler <shahafs@mellanox.com>
> wrote:
> >
> > Some PMDs inline the mbuf data buffer directly to device. This is in
> > order to save the overhead of the PCI headers involved when the device
> > DMA read the buffer pointer. For some devices it is essential in order
> > to reach the pick BW.
> >
> > However, there are cases where such inlining is in-efficient. For
> > example when the data buffer resides on other device memory (like GPU
> > or storage device). attempt to inline such buffer will result in high
> > PCI overhead for reading and copying the data from the remote device.
> 
> Some questions to understand the use case 
> # Is this use case where CPU, local DRAM, NW card and GPU memory connected on the coherent bus

Yes. For example, one can allocate GPU memory and map it to the GPU BAR, making it accessible from the host CPU through LD/ST.

> # Assuming the CPU needs to touch the buffer prior to Tx, In that case, it will
> be useful?

If the CPU needs to modify the data then no; it will be more efficient to copy the data to the CPU and then send it.
However, there are use cases where the data is DMA'd with zero copy to the GPU (for example), the GPU performs the processing on the data, and then the CPU sends the mbuf (without touching the data).

> # How the application knows, The data buffer is in GPU memory in order to
> use this flag efficiently?

Because it made it happen. For example, it attached the mbuf external buffer from the other device's memory.

> # Just an random thought, Does it help, if we create two different mempools
> one from local DRAM and one from GPU memory so that the application can
> work transparently.

But you will still need to teach the PMD which pool it can inline and which it cannot.
IMO it is more generic to have it per mbuf. Moreover, the application has this info.

> 
> 
> 
> 
> 
> >
> > To support a mixed traffic pattern (some buffers from local DRAM, some
> > buffers from other devices) with high BW, a hint flag is introduced in
> > the mbuf.
> > Application will hint the PMD whether or not it should try to inline
> > the given mbuf data buffer. PMD should do best effort to act upon this
> > request.
> >
> > Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> > ---
> >  lib/librte_mbuf/rte_mbuf.h | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > index 98225ec80b..5934532b7f 100644
> > --- a/lib/librte_mbuf/rte_mbuf.h
> > +++ b/lib/librte_mbuf/rte_mbuf.h
> > @@ -203,6 +203,15 @@ extern "C" {
> >  /* add new TX flags here */
> >
> >  /**
> > + * Hint to PMD to not inline the mbuf data buffer to device
> > + * rather let the device use its DMA engine to fetch the data with
> > +the
> > + * provided pointer.
> > + *
> > + * This flag is a only a hint. PMD should enforce it as best effort.
> > + */
> > +#define PKT_TX_DONT_INLINE_HINT (1ULL << 39)
> > +
> > +/**
> >   * Indicate that the metadata field in the mbuf is in use.
> >   */
> >  #define PKT_TX_METADATA        (1ULL << 40)
> > --
> > 2.12.0
> >


* Re: [dpdk-dev] [RFC PATCH 20.02] mbuf: hint PMD not to inline packet
  2019-10-17  7:27 [dpdk-dev] [RFC PATCH 20.02] mbuf: hint PMD not to inline packet Shahaf Shuler
  2019-10-17  8:16 ` Jerin Jacob
@ 2019-10-17 15:14 ` Stephen Hemminger
  2019-10-22  6:29   ` Shahaf Shuler
  2019-12-11 17:01 ` [dpdk-dev] [RFC v2] mlx5/net: " Viacheslav Ovsiienko
  2020-01-14  7:57 ` [dpdk-dev] [PATCH] net/mlx5: update Tx datapath to support no inline hint Viacheslav Ovsiienko
  3 siblings, 1 reply; 13+ messages in thread
From: Stephen Hemminger @ 2019-10-17 15:14 UTC (permalink / raw)
  To: Shahaf Shuler
  Cc: dev, Thomas Monjalon, olivier.matz, wwasko, spotluri, Asaf Penso,
	Slava Ovsiienko

On Thu, 17 Oct 2019 07:27:34 +0000
Shahaf Shuler <shahafs@mellanox.com> wrote:

> Some PMDs inline the mbuf data buffer directly to device. This is in
> order to save the overhead of the PCI headers involved when the device
> DMA read the buffer pointer. For some devices it is essential in order
> to reach the pick BW.
> 
> However, there are cases where such inlining is in-efficient. For example
> when the data buffer resides on other device memory (like GPU or storage
> device). attempt to inline such buffer will result in high PCI overhead
> for reading and copying the data from the remote device.
> 
> To support a mixed traffic pattern (some buffers from local DRAM, some
> buffers from other devices) with high BW, a hint flag is introduced in
> the mbuf.
> Application will hint the PMD whether or not it should try to inline the
> given mbuf data buffer. PMD should do best effort to act upon this
> request.
> 
> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>

This kind of optimization is hard, and pushing the problem to the application
to decide seems like the wrong step. Can the driver just infer this
already because some mbufs are external?


* Re: [dpdk-dev] [RFC PATCH 20.02] mbuf: hint PMD not to inline packet
  2019-10-17 10:59   ` Shahaf Shuler
@ 2019-10-17 17:18     ` Jerin Jacob
  2019-10-22  6:26       ` Shahaf Shuler
  0 siblings, 1 reply; 13+ messages in thread
From: Jerin Jacob @ 2019-10-17 17:18 UTC (permalink / raw)
  To: Shahaf Shuler
  Cc: dev, Thomas Monjalon, olivier.matz, wwasko, spotluri, Asaf Penso,
	Slava Ovsiienko

On Thu, Oct 17, 2019 at 4:30 PM Shahaf Shuler <shahafs@mellanox.com> wrote:
>
> Thursday, October 17, 2019 11:17 AM, Jerin Jacob:
> > Subject: Re: [dpdk-dev] [RFC PATCH 20.02] mbuf: hint PMD not to inline
> > packet
> >
> > On Thu, Oct 17, 2019 at 12:57 PM Shahaf Shuler <shahafs@mellanox.com>
> > wrote:
> > >
> > > Some PMDs inline the mbuf data buffer directly to device. This is in
> > > order to save the overhead of the PCI headers involved when the device
> > > DMA read the buffer pointer. For some devices it is essential in order
> > > to reach the pick BW.
> > >
> > > However, there are cases where such inlining is in-efficient. For
> > > example when the data buffer resides on other device memory (like GPU
> > > or storage device). attempt to inline such buffer will result in high
> > > PCI overhead for reading and copying the data from the remote device.
> >
> > Some questions to understand the use case
> > # Is this use case where CPU, local DRAM, NW card and GPU memory connected on the coherent bus
>
> Yes. For example one can allocate GPU memory and map it to the GPU bar, make it accessible from the host CPU through LD/ST.
>
> > # Assuming the CPU needs to touch the buffer prior to Tx, In that case, it will
> > be useful?
>
> If the CPU needs to modify the data then no. it will be more efficient to copy the data to CPU and then send it.
> However there are use cases where the data is DMA w/ zero copy to the GPU (for example) , GPU perform the processing on the data, and then CPU send the mbuf (w/o touching the data).

OK. If I understand it correctly, it is for offloading the
network/compute functions from the NW card and/or CPU to the GPU.

>
> > # How the application knows, The data buffer is in GPU memory in order to
> > use this flag efficiently?
>
> Because it made it happen. For example it attached the mbuf external buffer from the other device memory.
>
> > # Just an random thought, Does it help, if we create two different mempools
> > one from local DRAM and one from GPU memory so that the application can
> > work transparently.
>
> But you will still need to teach the PMD which pool it can inline and which cannot.
> IMO it is more generic to have it per mbuf. Moreover, application has this info.

IMO, we cannot use the PKT_TX_DONT_INLINE_HINT flag for generic applications;
the application usage will be tightly coupled with the platform and the
capabilities of the GPU or host CPU etc.

I think pushing this logic to the application is a bad idea. But if you
are writing some custom application
and you need per-packet-level control, then this flag may be the only way.


>
> >
> >
> >
> >
> >
> > >
> > > To support a mixed traffic pattern (some buffers from local DRAM, some
> > > buffers from other devices) with high BW, a hint flag is introduced in
> > > the mbuf.
> > > Application will hint the PMD whether or not it should try to inline
> > > the given mbuf data buffer. PMD should do best effort to act upon this
> > > request.
> > >
> > > Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> > > ---
> > >  lib/librte_mbuf/rte_mbuf.h | 9 +++++++++
> > >  1 file changed, 9 insertions(+)
> > >
> > > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > > index 98225ec80b..5934532b7f 100644
> > > --- a/lib/librte_mbuf/rte_mbuf.h
> > > +++ b/lib/librte_mbuf/rte_mbuf.h
> > > @@ -203,6 +203,15 @@ extern "C" {
> > >  /* add new TX flags here */
> > >
> > >  /**
> > > + * Hint to PMD to not inline the mbuf data buffer to device
> > > + * rather let the device use its DMA engine to fetch the data with
> > > +the
> > > + * provided pointer.
> > > + *
> > > + * This flag is a only a hint. PMD should enforce it as best effort.
> > > + */
> > > +#define PKT_TX_DONT_INLINE_HINT (1ULL << 39)
> > > +
> > > +/**
> > >   * Indicate that the metadata field in the mbuf is in use.
> > >   */
> > >  #define PKT_TX_METADATA        (1ULL << 40)
> > > --
> > > 2.12.0
> > >


* Re: [dpdk-dev] [RFC PATCH 20.02] mbuf: hint PMD not to inline packet
  2019-10-17 17:18     ` Jerin Jacob
@ 2019-10-22  6:26       ` Shahaf Shuler
  2019-10-22 15:17         ` Jerin Jacob
  0 siblings, 1 reply; 13+ messages in thread
From: Shahaf Shuler @ 2019-10-22  6:26 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, Thomas Monjalon, olivier.matz, wwasko, spotluri, Asaf Penso,
	Slava Ovsiienko

Thursday, October 17, 2019 8:19 PM, Jerin Jacob:
> Subject: Re: [dpdk-dev] [RFC PATCH 20.02] mbuf: hint PMD not to inline
> packet
> 
> On Thu, Oct 17, 2019 at 4:30 PM Shahaf Shuler <shahafs@mellanox.com>
> wrote:
> >
> > Thursday, October 17, 2019 11:17 AM, Jerin Jacob:
> > > Subject: Re: [dpdk-dev] [RFC PATCH 20.02] mbuf: hint PMD not to
> > > inline packet
> > >
> > > On Thu, Oct 17, 2019 at 12:57 PM Shahaf Shuler
> > > <shahafs@mellanox.com>
> > > wrote:
> > > >
> > > > Some PMDs inline the mbuf data buffer directly to device. This is
> > > > in order to save the overhead of the PCI headers involved when the
> > > > device DMA read the buffer pointer. For some devices it is
> > > > essential in order to reach the pick BW.
> > > >
> > > > However, there are cases where such inlining is in-efficient. For
> > > > example when the data buffer resides on other device memory (like
> > > > GPU or storage device). attempt to inline such buffer will result
> > > > in high PCI overhead for reading and copying the data from the remote
> device.
> > >
> > > Some questions to understand the use case # Is this use case where
> > > CPU, local DRAM, NW card and GPU memory connected on the coherent
> > > bus
> >
> > Yes. For example one can allocate GPU memory and map it to the GPU bar,
> make it accessible from the host CPU through LD/ST.
> >
> > > # Assuming the CPU needs to touch the buffer prior to Tx, In that
> > > case, it will be useful?
> >
> > If the CPU needs to modify the data then no. it will be more efficient to
> copy the data to CPU and then send it.
> > However there are use cases where the data is DMA w/ zero copy to the
> GPU (for example) , GPU perform the processing on the data, and then CPU
> send the mbuf (w/o touching the data).
> 
> OK. If I understanding it correctly it is for offloading the Network/Compute
> functions to GPU from NW card and/or CPU.

Mostly the compute. The networking in this model is expected to be done by the CPU. 
Note that this is only one use case. 

> 
> >
> > > # How the application knows, The data buffer is in GPU memory in
> > > order to use this flag efficiently?
> >
> > Because it made it happen. For example it attached the mbuf external
> buffer from the other device memory.
> >
> > > # Just an random thought, Does it help, if we create two different
> > > mempools one from local DRAM and one from GPU memory so that the
> > > application can work transparently.
> >
> > But you will still need to teach the PMD which pool it can inline and which
> cannot.
> > IMO it is more generic to have it per mbuf. Moreover, application has this
> info.
> 
> IMO, we can not use PKT_TX_DONT_INLINE_HINT flag for generic
> applications, The application usage will be tightly coupled with the platform
> and capabilities of GPU or Host CPU etc.
> 
> I think, pushing this logic to the application is bad idea. But if you are writing
> some custom application and the per packet-level you need to control then
> this flag may be the only way.

Yes. This flag is for custom applications that do unique acceleration (by doing zero copy for compute/compression/encryption accelerators) on specific platforms. 
Such an application is fully aware of the platform and the location where the data resides, hence it is very simple for it to know how to set this flag. 

Note, this flag is 0 by default - meaning no hint, and a generic application works the same as today.

> 
> 
> >
> > >
> > >
> > >
> > >
> > >
> > > >
> > > > To support a mixed traffic pattern (some buffers from local DRAM,
> > > > some buffers from other devices) with high BW, a hint flag is
> > > > introduced in the mbuf.
> > > > Application will hint the PMD whether or not it should try to
> > > > inline the given mbuf data buffer. PMD should do best effort to
> > > > act upon this request.
> > > >
> > > > Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> > > > ---
> > > >  lib/librte_mbuf/rte_mbuf.h | 9 +++++++++
> > > >  1 file changed, 9 insertions(+)
> > > >
> > > > diff --git a/lib/librte_mbuf/rte_mbuf.h
> > > > b/lib/librte_mbuf/rte_mbuf.h index 98225ec80b..5934532b7f 100644
> > > > --- a/lib/librte_mbuf/rte_mbuf.h
> > > > +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > @@ -203,6 +203,15 @@ extern "C" {
> > > >  /* add new TX flags here */
> > > >
> > > >  /**
> > > > + * Hint to PMD to not inline the mbuf data buffer to device
> > > > + * rather let the device use its DMA engine to fetch the data
> > > > +with the
> > > > + * provided pointer.
> > > > + *
> > > > + * This flag is a only a hint. PMD should enforce it as best effort.
> > > > + */
> > > > +#define PKT_TX_DONT_INLINE_HINT (1ULL << 39)
> > > > +
> > > > +/**
> > > >   * Indicate that the metadata field in the mbuf is in use.
> > > >   */
> > > >  #define PKT_TX_METADATA        (1ULL << 40)
> > > > --
> > > > 2.12.0
> > > >


* Re: [dpdk-dev] [RFC PATCH 20.02] mbuf: hint PMD not to inline packet
  2019-10-17 15:14 ` Stephen Hemminger
@ 2019-10-22  6:29   ` Shahaf Shuler
  0 siblings, 0 replies; 13+ messages in thread
From: Shahaf Shuler @ 2019-10-22  6:29 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: dev, Thomas Monjalon, olivier.matz, wwasko, spotluri, Asaf Penso,
	Slava Ovsiienko

Thursday, October 17, 2019 6:15 PM, Stephen Hemminger:
> Subject: Re: [dpdk-dev] [RFC PATCH 20.02] mbuf: hint PMD not to inline
> packet
> 
> On Thu, 17 Oct 2019 07:27:34 +0000
> Shahaf Shuler <shahafs@mellanox.com> wrote:
> 
> > Some PMDs inline the mbuf data buffer directly to device. This is in
> > order to save the overhead of the PCI headers involved when the device
> > DMA read the buffer pointer. For some devices it is essential in order
> > to reach the pick BW.
> >
> > However, there are cases where such inlining is in-efficient. For
> > example when the data buffer resides on other device memory (like GPU
> > or storage device). attempt to inline such buffer will result in high
> > PCI overhead for reading and copying the data from the remote device.
> >
> > To support a mixed traffic pattern (some buffers from local DRAM, some
> > buffers from other devices) with high BW, a hint flag is introduced in
> > the mbuf.
> > Application will hint the PMD whether or not it should try to inline
> > the given mbuf data buffer. PMD should do best effort to act upon this
> > request.
> >
> > Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> 
> This kind of optimization is hard, and pushing the problem to the application
> to decide seems like the wrong step.

See my comments to Jerin on the other thread. This optimization is for custom applications that do unique acceleration, using look-aside accelerators for compute while utilizing network-device zero copy. 

> Can the driver just infer this already
> because some mbuf's are external?

Having an mbuf with an external buffer does not necessarily mean the buffer location is on another PCI device. 
Making optimizations based on such heuristics may lead to unexpected behavior.



* Re: [dpdk-dev] [RFC PATCH 20.02] mbuf: hint PMD not to inline packet
  2019-10-22  6:26       ` Shahaf Shuler
@ 2019-10-22 15:17         ` Jerin Jacob
  2019-10-23 11:24           ` Shahaf Shuler
  0 siblings, 1 reply; 13+ messages in thread
From: Jerin Jacob @ 2019-10-22 15:17 UTC (permalink / raw)
  To: Shahaf Shuler
  Cc: dev, Thomas Monjalon, olivier.matz, wwasko, spotluri, Asaf Penso,
	Slava Ovsiienko

On Tue, Oct 22, 2019 at 11:56 AM Shahaf Shuler <shahafs@mellanox.com> wrote:
>
> Thursday, October 17, 2019 8:19 PM, Jerin Jacob:
> > Subject: Re: [dpdk-dev] [RFC PATCH 20.02] mbuf: hint PMD not to inline
> > packet
> >
> > On Thu, Oct 17, 2019 at 4:30 PM Shahaf Shuler <shahafs@mellanox.com>
> > wrote:
> > >
> > > Thursday, October 17, 2019 11:17 AM, Jerin Jacob:
> > > > Subject: Re: [dpdk-dev] [RFC PATCH 20.02] mbuf: hint PMD not to
> > > > inline packet
> > > >
> > > > On Thu, Oct 17, 2019 at 12:57 PM Shahaf Shuler
> > > > <shahafs@mellanox.com>
> > > > wrote:
> > > > >
> > > > > Some PMDs inline the mbuf data buffer directly to device. This is
> > > > > in order to save the overhead of the PCI headers involved when the
> > > > > device DMA read the buffer pointer. For some devices it is
> > > > > essential in order to reach the pick BW.
> > > > >
> > > > > However, there are cases where such inlining is in-efficient. For
> > > > > example when the data buffer resides on other device memory (like
> > > > > GPU or storage device). attempt to inline such buffer will result
> > > > > in high PCI overhead for reading and copying the data from the remote
> > device.
> > > >
> > > > Some questions to understand the use case # Is this use case where
> > > > CPU, local DRAM, NW card and GPU memory connected on the coherent
> > > > bus
> > >
> > > Yes. For example one can allocate GPU memory and map it to the GPU bar,
> > make it accessible from the host CPU through LD/ST.
> > >
> > > > # Assuming the CPU needs to touch the buffer prior to Tx, In that
> > > > case, it will be useful?
> > >
> > > If the CPU needs to modify the data then no. it will be more efficient to
> > copy the data to CPU and then send it.
> > > However there are use cases where the data is DMA w/ zero copy to the
> > GPU (for example) , GPU perform the processing on the data, and then CPU
> > send the mbuf (w/o touching the data).
> >
> > OK. If I understanding it correctly it is for offloading the Network/Compute
> > functions to GPU from NW card and/or CPU.
>
> Mostly the compute. The networking on this model is expected to be done by the CPU.
> Note this is only one use case.
>
> >
> > >
> > > > # How the application knows, The data buffer is in GPU memory in
> > > > order to use this flag efficiently?
> > >
> > > Because it made it happen. For example it attached the mbuf external
> > buffer from the other device memory.
> > >
> > > > # Just an random thought, Does it help, if we create two different
> > > > mempools one from local DRAM and one from GPU memory so that the
> > > > application can work transparently.
> > >
> > > But you will still need to teach the PMD which pool it can inline and which
> > cannot.
> > > IMO it is more generic to have it per mbuf. Moreover, application has this
> > info.
> >
> > IMO, we can not use PKT_TX_DONT_INLINE_HINT flag for generic
> > applications, The application usage will be tightly coupled with the platform
> > and capabilities of GPU or Host CPU etc.
> >
> > I think, pushing this logic to the application is bad idea. But if you are writing
> > some custom application and the per packet-level you need to control then
> > this flag may be the only way.
>
> Yes. This flag is for custom application who do unique acceleration (by doing Zero copy for compute/compression/encryption accelerators) on specific platforms.
> Such application is fully aware to the platform and the location where the data resides hence it is very simple for it to know how to set this flag.

# If it is per packet, it will be an implicit requirement to add it to the mbuf.

If so,
# Does it make sense to add it through a dynamic mbuf flag? Maybe it is
not worth it for a single bit.

Since we have only 17 bits (40 - 23) remaining for Rx and Tx, and it is
a custom application requirement,
how about adding a PKT_PMD_CUSTOM1 flag so that a similar requirement by
other PMDs
can leverage the same bit for such custom applications? (We have a
similar use case for a smart NIC - it does not make much sense for
generic applications, but it is needed per packet.)

>
> Note, This flag is 0 by default - meaning no hint and generic application works same as today.






>
> >
> >
> > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > >
> > > > > To support a mixed traffic pattern (some buffers from local DRAM,
> > > > > some buffers from other devices) with high BW, a hint flag is
> > > > > introduced in the mbuf.
> > > > > Application will hint the PMD whether or not it should try to
> > > > > inline the given mbuf data buffer. PMD should do best effort to
> > > > > act upon this request.
> > > > >
> > > > > Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> > > > > ---
> > > > >  lib/librte_mbuf/rte_mbuf.h | 9 +++++++++
> > > > >  1 file changed, 9 insertions(+)
> > > > >
> > > > > diff --git a/lib/librte_mbuf/rte_mbuf.h
> > > > > b/lib/librte_mbuf/rte_mbuf.h index 98225ec80b..5934532b7f 100644
> > > > > --- a/lib/librte_mbuf/rte_mbuf.h
> > > > > +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > > @@ -203,6 +203,15 @@ extern "C" {
> > > > >  /* add new TX flags here */
> > > > >
> > > > >  /**
> > > > > + * Hint to PMD to not inline the mbuf data buffer to device
> > > > > + * rather let the device use its DMA engine to fetch the data
> > > > > +with the
> > > > > + * provided pointer.
> > > > > + *
> > > > > + * This flag is a only a hint. PMD should enforce it as best effort.
> > > > > + */
> > > > > +#define PKT_TX_DONT_INLINE_HINT (1ULL << 39)
> > > > > +
> > > > > +/**
> > > > >   * Indicate that the metadata field in the mbuf is in use.
> > > > >   */
> > > > >  #define PKT_TX_METADATA        (1ULL << 40)
> > > > > --
> > > > > 2.12.0
> > > > >


* Re: [dpdk-dev] [RFC PATCH 20.02] mbuf: hint PMD not to inline packet
  2019-10-22 15:17         ` Jerin Jacob
@ 2019-10-23 11:24           ` Shahaf Shuler
  2019-10-25 11:17             ` Jerin Jacob
  0 siblings, 1 reply; 13+ messages in thread
From: Shahaf Shuler @ 2019-10-23 11:24 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, Thomas Monjalon, olivier.matz, wwasko, spotluri, Asaf Penso,
	Slava Ovsiienko

Tuesday, October 22, 2019 6:17 PM, Jerin Jacob:
> <viacheslavo@mellanox.com>
> Subject: Re: [dpdk-dev] [RFC PATCH 20.02] mbuf: hint PMD not to inline
> packet

[...]

> > > I think, pushing this logic to the application is bad idea. But if
> > > you are writing some custom application and the per packet-level you
> > > need to control then this flag may be the only way.
> >
> > Yes. This flag is for custom application who do unique acceleration (by doing
> Zero copy for compute/compression/encryption accelerators) on specific
> platforms.
> > Such application is fully aware to the platform and the location where the
> data resides hence it is very simple for it to know how to set this flag.
> 
> # if it is per packet, it will be an implicit requirement to add it mbuf.
> 
> If so,
> # Does it makes sense to add through dynamic mbuf? Maybe it is not worth it
> for a single bit.

You mean:
1. expose a PMD cap for it
2. the application enables it in the dev offloads
3. the PMD registers a bit in the dynamic mbuf flags (rte_mbuf_dynflag_register)
4. the application registers the same flag to get the bit offset

It can be OK, if the community doesn't see common use for such a flag. 


> 
> Since we have only 17 bits (40 - 23) remaining for Rx and Tx and it is custom
> application requirement, how about adding PKT_PMD_CUSTOM1 flags so
> that similar requirement by other PMDs can leverage the same bit for such
> custom applications.(We have a similar use case for smart NIC (not so make
> much sense for generic
> applications)  but needed for per packet)
> 
> >
> > Note, This flag is 0 by default - meaning no hint and generic application
> works same as today.
> 
> > [...]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [dpdk-dev] [RFC PATCH 20.02] mbuf: hint PMD not to inline packet
  2019-10-23 11:24           ` Shahaf Shuler
@ 2019-10-25 11:17             ` Jerin Jacob
  0 siblings, 0 replies; 13+ messages in thread
From: Jerin Jacob @ 2019-10-25 11:17 UTC (permalink / raw)
  To: Shahaf Shuler
  Cc: dev, Thomas Monjalon, olivier.matz, wwasko, spotluri, Asaf Penso,
	Slava Ovsiienko

On Wed, Oct 23, 2019 at 4:54 PM Shahaf Shuler <shahafs@mellanox.com> wrote:
>
> Tuesday, October 22, 2019 6:17 PM, Jerin Jacob:
> > <viacheslavo@mellanox.com>
> > Subject: Re: [dpdk-dev] [RFC PATCH 20.02] mbuf: hint PMD not to inline
> > packet
>
> [...]
>
> > > > I think pushing this logic to the application is a bad idea. But if
> > > > you are writing some custom application and you need per-packet
> > > > control, then this flag may be the only way.
> > >
> > > Yes. This flag is for custom applications that do unique acceleration
> > > (by doing zero copy for compute/compression/encryption accelerators) on
> > > specific platforms.
> > > Such an application is fully aware of the platform and the location
> > > where the data resides, hence it is very simple for it to know how to
> > > set this flag.
> >
> > # If it is per packet, it will be an implicit requirement to add it to
> > the mbuf.
> >
> > If so,
> > # Does it make sense to add it through a dynamic mbuf flag? Maybe it is
> > not worth it for a single bit.
>
> You mean:
> 1. expose a PMD capability for it
> 2. application enables it in dev offloads
> 3. PMD registers a bitfield in the dynamic mbuf flags (rte_mbuf_dynflag_register)
> 4. application registers the same flag to get the bit offset
>
> It can be OK if the community doesn't see common use for such a flag.

Any scheme based on dynamic mbuf flags should be fine.

> > [...]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [dpdk-dev] [RFC v2] mlx5/net: hint PMD not to inline packet
  2019-10-17  7:27 [dpdk-dev] [RFC PATCH 20.02] mbuf: hint PMD not to inline packet Shahaf Shuler
  2019-10-17  8:16 ` Jerin Jacob
  2019-10-17 15:14 ` Stephen Hemminger
@ 2019-12-11 17:01 ` " Viacheslav Ovsiienko
  2019-12-27  8:59   ` Olivier Matz
  2020-01-14  7:57 ` [dpdk-dev] [PATCH] net/mlx5: update Tx datapath to support no inline hint Viacheslav Ovsiienko
  3 siblings, 1 reply; 13+ messages in thread
From: Viacheslav Ovsiienko @ 2019-12-11 17:01 UTC (permalink / raw)
  To: dev; +Cc: shahafs, matan, rasland, thomas, orika

Some PMDs inline the mbuf data buffer directly into the device transmit
descriptor. This is done to save the overhead of the PCI transactions imposed
when the device DMA reads the data by buffer pointer. For some devices it is
essential in order to provide the full bandwidth.

However, there are cases where such inlining is inefficient. For example, when
the data buffer resides in the memory of another device (like a GPU or storage
device), an attempt to inline such a buffer will result in high PCI overhead
for reading and copying the data from the remote device to the host memory.

To support a mixed traffic pattern (some buffers from local host memory, some
buffers from other devices) with high bandwidth, a hint flag is introduced in
the mbuf.

The application hints the PMD whether or not it should try to inline the given
mbuf data buffer. The PMD should make a best effort to act upon this request.

The hint flag RTE_NET_MLX5_DYNFLAG_NO_INLINE_NAME is supposed to be dynamic,
registered by the application with rte_mbuf_dynflag_register(). This flag is
purely vendor specific and is declared in the PMD-specific header
rte_pmd_mlx5.h, which is intended to be used by specific applications.

To query the specific flags supported at runtime, a private routine is
introduced:

int rte_pmd_mlx5_get_dyn_flag_names(uint16_t port,
                                    char *names[],
                                    uint16_t n);

It returns the array of specific flags currently supported (for the present
hardware and configuration).

The "not inline hint" feature operating flow is the following one:
- application start
- probe the devices, ports are created
- query the port capabilities
- if port supporting the feature is found
  - register dynamic flag RTE_NET_MLX5_DYNFLAG_NO_INLINE_NAME
- application starts the ports
- on dev_start() PMD checks whether the feature flag is registered and
  enables the feature support in datapath
- application might set this flag in ol_flags field of mbuf in the packets
  being sent and PMD will handle ones appropriately.
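The flow above can be sketched as a self-contained model. This is only an
illustration of why a dynamic flag works here: the registry hands the same bit
offset to every caller using the same name, so the PMD and the application
agree on the bit without a compile-time constant. The names dynflag_register(),
mark_no_inline() and the string "mlx5_no_inline_hint" are hypothetical stand-ins;
real code would use rte_mbuf_dynflag_register() and the name behind
RTE_NET_MLX5_DYNFLAG_NO_INLINE_NAME, and ol_flags lives in struct rte_mbuf.

```c
#include <stdint.h>
#include <string.h>

#define MAX_DYNFLAGS 64

/* Model of the dynamic flag registry; real DPDK keeps this state in
 * shared memory behind rte_mbuf_dynflag_register(). */
static const char *dynflag_names[MAX_DYNFLAGS];
static int dynflag_next = 41;   /* lower bits modeled as taken by static PKT_* flags */

/* Returns the bit offset for the named flag. The PMD (on dev_start) and
 * the application both call this and receive the same offset. */
static int
dynflag_register(const char *name)
{
	for (int i = 0; i < MAX_DYNFLAGS; i++)
		if (dynflag_names[i] && strcmp(dynflag_names[i], name) == 0)
			return i;   /* already registered: shared offset */
	if (dynflag_next >= MAX_DYNFLAGS)
		return -1;          /* no free bit left */
	dynflag_names[dynflag_next] = name;
	return dynflag_next++;
}

/* Application side: mark one packet with the no-inline hint. */
static void
mark_no_inline(uint64_t *ol_flags, int offset)
{
	*ol_flags |= 1ULL << offset;
}
```

Because the bit is allocated only when some application asks for it, generic
applications pay nothing and the static flag space is untouched.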

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>

---
v1: https://patches.dpdk.org/patch/61348/


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [dpdk-dev] [RFC v2] mlx5/net: hint PMD not to inline packet
  2019-12-11 17:01 ` [dpdk-dev] [RFC v2] mlx5/net: " Viacheslav Ovsiienko
@ 2019-12-27  8:59   ` Olivier Matz
  0 siblings, 0 replies; 13+ messages in thread
From: Olivier Matz @ 2019-12-27  8:59 UTC (permalink / raw)
  To: Viacheslav Ovsiienko; +Cc: dev, shahafs, matan, rasland, thomas, orika

Hi Viacheslav,

On Wed, Dec 11, 2019 at 05:01:33PM +0000, Viacheslav Ovsiienko wrote:
> [...]
> 
> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> 
> ---
> v1: https://patches.dpdk.org/patch/61348/
> 

It looks like the patch itself is missing.

I think a dynamic flag is a good solution for this problem: the PMD can
expose a PMD-specific hint to the application without impacting the way it
works today.


Olivier

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [dpdk-dev] [PATCH] net/mlx5: update Tx datapath to support no inline hint
  2019-10-17  7:27 [dpdk-dev] [RFC PATCH 20.02] mbuf: hint PMD not to inline packet Shahaf Shuler
                   ` (2 preceding siblings ...)
  2019-12-11 17:01 ` [dpdk-dev] [RFC v2] mlx5/net: " Viacheslav Ovsiienko
@ 2020-01-14  7:57 ` Viacheslav Ovsiienko
  3 siblings, 0 replies; 13+ messages in thread
From: Viacheslav Ovsiienko @ 2020-01-14  7:57 UTC (permalink / raw)
  To: dev; +Cc: matan, rasland, orika

This patch adds support for a dynamic flag that hints the transmit
datapath not to copy data into the descriptors. This flag is useful
when the data is located in the memory of another (non-NIC) physical
device and copying it to host memory is undesirable.

The hint flag is per mbuf and also applies to multi-segment packets.

This hint flag might be partially ignored if:

- the hardware requires a minimal data header to be inlined into the
  descriptor; this depends on the hardware type and its configuration.
  In this case the PMD copies the minimal required number of bytes into
  the descriptor, ignoring the no-inline hint flag; the rest of the data
  is not copied.

- VLAN tag insertion offload is requested and the hardware does not
  support this option. In this case the VLAN tag is inserted by software
  and at least 18 bytes are copied into the descriptor.
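The partial-ignore rule above can be modeled with a tiny helper. This is a
simplified sketch, not the driver code: `must` stands for the hardware-required
minimum (txq->inlen_mode, or the 18 bytes needed for software VLAN insertion),
`len` for the requested inline length, and the function answers how many bytes
end up inlined.

```c
#include <stddef.h>

/* Model: number of bytes the PMD actually inlines for one packet.
 * Without the hint it inlines the full requested length; with the hint
 * it still inlines the required minimum, but nothing beyond it.
 * It can never inline more than the len bytes available. */
static size_t
inline_copy_len(size_t len, size_t must, int no_inline_hint)
{
	if (!no_inline_hint)
		return len;                 /* no hint: inline everything requested */
	return must < len ? must : len;     /* hint: only the required minimum */
}
```

So with the hint set, a 1500-byte packet on hardware with inlen_mode of 18
still gets 18 bytes copied into the descriptor, and the remaining data is
fetched by DMA through the buffer pointer.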

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
---
RFC: http://patches.dpdk.org/patch/61348/

NOTE: This patch should be applied after the series:
"net/mlx5: add PMD dynf" http://patches.dpdk.org/patch/64542/

 drivers/net/mlx5/mlx5_rxtx.c | 104 ++++++++++++++++++++++++++++++++++++-------
 1 file changed, 88 insertions(+), 16 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index aa6aa22..6cb5a2b 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -127,6 +127,7 @@ enum mlx5_txcmp_code {
 uint8_t mlx5_swp_types_table[1 << 10] __rte_cache_aligned;
 
 uint64_t rte_net_mlx5_dynf_inline_mask;
+#define PKT_TX_DYNF_NOINLINE rte_net_mlx5_dynf_inline_mask
 
 /**
  * Build a table to translate Rx completion flags to packet type.
@@ -2527,21 +2528,30 @@ enum mlx5_txcmp_code {
  *   Pointer to burst routine local context.
  * @param len
  *   Length of data to be copied.
+ * @param must
+ *   Length of data to be copied ignoring no inline hint.
  * @param olx
  *   Configured Tx offloads mask. It is fully defined at
  *   compile time and may be used for optimization.
+ *
+ * @return
+ *   Number of actually copied data bytes. This is always greater than or
+ *   equal to the must parameter and might be less than len if the no
+ *   inline hint flag is encountered.
  */
-static __rte_always_inline void
+static __rte_always_inline unsigned int
 mlx5_tx_mseg_memcpy(uint8_t *pdst,
 		    struct mlx5_txq_local *restrict loc,
 		    unsigned int len,
+		    unsigned int must,
 		    unsigned int olx __rte_unused)
 {
 	struct rte_mbuf *mbuf;
-	unsigned int part, dlen;
+	unsigned int part, dlen, copy = 0;
 	uint8_t *psrc;
 
 	assert(len);
+	assert(must <= len);
 	do {
 		/* Allow zero length packets, must check first. */
 		dlen = rte_pktmbuf_data_len(loc->mbuf);
@@ -2554,6 +2564,25 @@ enum mlx5_txcmp_code {
 			assert(loc->mbuf_nseg > 1);
 			assert(loc->mbuf);
 			--loc->mbuf_nseg;
+			if (loc->mbuf->ol_flags & PKT_TX_DYNF_NOINLINE) {
+				unsigned int diff;
+
+				if (copy >= must) {
+					/*
+					 * We already copied the minimal
+					 * requested amount of data.
+					 */
+					return copy;
+				}
+				diff = must - copy;
+				if (diff <= rte_pktmbuf_data_len(loc->mbuf)) {
+					/*
+					 * Copy only the minimal required
+					 * part of the data buffer.
+					 */
+					len = diff;
+				}
+			}
 			continue;
 		}
 		dlen -= loc->mbuf_off;
@@ -2561,6 +2590,7 @@ enum mlx5_txcmp_code {
 					       loc->mbuf_off);
 		part = RTE_MIN(len, dlen);
 		rte_memcpy(pdst, psrc, part);
+		copy += part;
 		loc->mbuf_off += part;
 		len -= part;
 		if (!len) {
@@ -2574,7 +2604,7 @@ enum mlx5_txcmp_code {
 				assert(loc->mbuf_nseg >= 1);
 				--loc->mbuf_nseg;
 			}
-			return;
+			return copy;
 		}
 		pdst += part;
 	} while (true);
@@ -2619,7 +2649,7 @@ enum mlx5_txcmp_code {
 	struct mlx5_wqe_eseg *restrict es = &wqe->eseg;
 	uint32_t csum;
 	uint8_t *pdst;
-	unsigned int part;
+	unsigned int part, tlen = 0;
 
 	/*
 	 * Calculate and set check sum flags first, uint32_t field
@@ -2652,17 +2682,18 @@ enum mlx5_txcmp_code {
 				 2 * RTE_ETHER_ADDR_LEN),
 		      "invalid Ethernet Segment data size");
 	assert(inlen >= MLX5_ESEG_MIN_INLINE_SIZE);
-	es->inline_hdr_sz = rte_cpu_to_be_16(inlen);
 	pdst = (uint8_t *)&es->inline_data;
 	if (MLX5_TXOFF_CONFIG(VLAN) && vlan) {
 		/* Implement VLAN tag insertion as part inline data. */
-		mlx5_tx_mseg_memcpy(pdst, loc, 2 * RTE_ETHER_ADDR_LEN, olx);
+		mlx5_tx_mseg_memcpy(pdst, loc,
+				    2 * RTE_ETHER_ADDR_LEN,
+				    2 * RTE_ETHER_ADDR_LEN, olx);
 		pdst += 2 * RTE_ETHER_ADDR_LEN;
 		*(unaligned_uint32_t *)pdst = rte_cpu_to_be_32
 						((RTE_ETHER_TYPE_VLAN << 16) |
 						 loc->mbuf->vlan_tci);
 		pdst += sizeof(struct rte_vlan_hdr);
-		inlen -= 2 * RTE_ETHER_ADDR_LEN + sizeof(struct rte_vlan_hdr);
+		tlen += 2 * RTE_ETHER_ADDR_LEN + sizeof(struct rte_vlan_hdr);
 	}
 	assert(pdst < (uint8_t *)txq->wqes_end);
 	/*
@@ -2670,18 +2701,26 @@ enum mlx5_txcmp_code {
 	 * Here we should be aware of WQE ring buffer wraparound only.
 	 */
 	part = (uint8_t *)txq->wqes_end - pdst;
-	part = RTE_MIN(part, inlen);
+	part = RTE_MIN(part, inlen - tlen);
 	assert(part);
 	do {
-		mlx5_tx_mseg_memcpy(pdst, loc, part, olx);
-		inlen -= part;
-		if (likely(!inlen)) {
-			pdst += part;
+		unsigned int copy;
+
+		/*
+		 * Copying may be interrupted inside the routine
+		 * if it runs into the no inline hint flag.
+		 */
+		copy = tlen >= txq->inlen_mode ? 0 : (txq->inlen_mode - tlen);
+		copy = mlx5_tx_mseg_memcpy(pdst, loc, part, copy, olx);
+		tlen += copy;
+		if (likely(inlen <= tlen) || copy < part) {
+			es->inline_hdr_sz = rte_cpu_to_be_16(tlen);
+			pdst += copy;
 			pdst = RTE_PTR_ALIGN(pdst, MLX5_WSEG_SIZE);
 			return (struct mlx5_wqe_dseg *)pdst;
 		}
 		pdst = (uint8_t *)txq->wqes;
-		part = inlen;
+		part = inlen - tlen;
 	} while (true);
 }
 
@@ -3280,7 +3319,8 @@ enum mlx5_txcmp_code {
 	if (inlen <= MLX5_ESEG_MIN_INLINE_SIZE)
 		return MLX5_TXCMP_CODE_ERROR;
 	assert(txq->inlen_send >= MLX5_ESEG_MIN_INLINE_SIZE);
-	if (inlen > txq->inlen_send) {
+	if (inlen > txq->inlen_send ||
+	    loc->mbuf->ol_flags & PKT_TX_DYNF_NOINLINE) {
 		struct rte_mbuf *mbuf;
 		unsigned int nxlen;
 		uintptr_t start;
@@ -3295,7 +3335,8 @@ enum mlx5_txcmp_code {
 			assert(txq->inlen_mode <= txq->inlen_send);
 			inlen = txq->inlen_mode;
 		} else {
-			if (!vlan || txq->vlan_en) {
+			if (loc->mbuf->ol_flags & PKT_TX_DYNF_NOINLINE ||
+			    !vlan || txq->vlan_en) {
 				/*
 				 * VLAN insertion will be done inside by HW.
 				 * It is not utmost effective - VLAN flag is
@@ -4106,7 +4147,8 @@ enum mlx5_txcmp_code {
 				return MLX5_TXCMP_CODE_ERROR;
 			}
 			/* Inline or not inline - that's the Question. */
-			if (dlen > txq->inlen_empw)
+			if (dlen > txq->inlen_empw ||
+			    loc->mbuf->ol_flags & PKT_TX_DYNF_NOINLINE)
 				goto pointer_empw;
 			/* Inline entire packet, optional VLAN insertion. */
 			tlen = sizeof(dseg->bcount) + dlen;
@@ -4302,6 +4344,33 @@ enum mlx5_txcmp_code {
 				/* Check against minimal length. */
 				if (inlen <= MLX5_ESEG_MIN_INLINE_SIZE)
 					return MLX5_TXCMP_CODE_ERROR;
+				if (unlikely(loc->mbuf->ol_flags &
+					      PKT_TX_DYNF_NOINLINE)) {
+					/*
+					 * The hint flag not to inline packet
+					 * data is set. Check whether we can
+					 * follow the hint.
+					 */
+					if ((!MLX5_TXOFF_CONFIG(EMPW) &&
+					      txq->inlen_mode) ||
+					    (MLX5_TXOFF_CONFIG(MPW) &&
+					     txq->inlen_mode)) {
+						/*
+						 * The hardware requires the
+						 * minimal inline data header.
+						 */
+						goto single_min_inline;
+					}
+					if (MLX5_TXOFF_CONFIG(VLAN) &&
+					    vlan && !txq->vlan_en) {
+						/*
+						 * We must insert VLAN tag
+						 * by software means.
+						 */
+						goto single_part_inline;
+					}
+					goto single_no_inline;
+				}
 				/*
 				 * Completely inlined packet data WQE:
 				 * - Control Segment, SEND opcode
@@ -4351,6 +4420,7 @@ enum mlx5_txcmp_code {
 				 * We should check the free space in
 				 * WQE ring buffer to inline partially.
 				 */
+single_min_inline:
 				assert(txq->inlen_send >= txq->inlen_mode);
 				assert(inlen > txq->inlen_mode);
 				assert(txq->inlen_mode >=
@@ -4418,6 +4488,7 @@ enum mlx5_txcmp_code {
 				 * We also get here if VLAN insertion is not
 				 * supported by HW, the inline is enabled.
 				 */
+single_part_inline:
 				wqe = txq->wqes + (txq->wqe_ci & txq->wqe_m);
 				loc->wqe_last = wqe;
 				mlx5_tx_cseg_init(txq, loc, wqe, 4,
@@ -4458,6 +4529,7 @@ enum mlx5_txcmp_code {
 			 * - Ethernet Segment, optional VLAN, no inline
 			 * - Data Segment, pointer type
 			 */
+single_no_inline:
 			wqe = txq->wqes + (txq->wqe_ci & txq->wqe_m);
 			loc->wqe_last = wqe;
 			mlx5_tx_cseg_init(txq, loc, wqe, 3,
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, back to index

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-17  7:27 [dpdk-dev] [RFC PATCH 20.02] mbuf: hint PMD not to inline packet Shahaf Shuler
2019-10-17  8:16 ` Jerin Jacob
2019-10-17 10:59   ` Shahaf Shuler
2019-10-17 17:18     ` Jerin Jacob
2019-10-22  6:26       ` Shahaf Shuler
2019-10-22 15:17         ` Jerin Jacob
2019-10-23 11:24           ` Shahaf Shuler
2019-10-25 11:17             ` Jerin Jacob
2019-10-17 15:14 ` Stephen Hemminger
2019-10-22  6:29   ` Shahaf Shuler
2019-12-11 17:01 ` [dpdk-dev] [RFC v2] mlx5/net: " Viacheslav Ovsiienko
2019-12-27  8:59   ` Olivier Matz
2020-01-14  7:57 ` [dpdk-dev] [PATCH] net/mlx5: update Tx datapath to support no inline hint Viacheslav Ovsiienko


Archives are clonable:
	git clone --mirror http://inbox.dpdk.org/dev/0 dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dev dev/ http://inbox.dpdk.org/dev \
		dev@dpdk.org
	public-inbox-index dev


Newsgroup available over NNTP:
	nntp://inbox.dpdk.org/inbox.dpdk.dev


AGPL code for this site: git clone https://public-inbox.org/ public-inbox