DPDK patches and discussions
 help / color / mirror / Atom feed
From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: "Liu, Jijiang" <jijiang.liu@intel.com>,
	"Olivier Matz (olivier.matz@6wind.com)" <olivier.matz@6wind.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH 1/3] mbuf:add two TX offload flags and change three fields
Date: Thu, 27 Nov 2014 14:56:25 +0000	[thread overview]
Message-ID: <2601191342CEEE43887BDE71AB977258213BADB8@IRSMSX105.ger.corp.intel.com> (raw)
In-Reply-To: <1ED644BD7E0A5F4091CF203DAFB8E4CC01D9EEA0@SHSMSX101.ccr.corp.intel.com>



> 
> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> Sent: Thursday, November 27, 2014 6:00 PM
> To: Liu, Jijiang; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 1/3] mbuf:add two TX offload flags and change three fields
> 
> Hi Jijiang,
> 
> Please see some comments below.
> 
> On 11/27/2014 09:18 AM, Jijiang Liu wrote:
> > In place of removing the PKT_TX_VXLAN_CKSUM, we introduce 2 new flags: PKT_TX_OUT_IP_CKSUM, PKT_TX_UDP_TUNNEL_PKT,
> and a new field: l4_tun_len.
> > Replace the inner_l2_len and the inner_l3_len field with the outer_l2_len and outer_l3_len field.
> >
> > PKT_TX_OUT_IP_CKSUM: is not used for non-tunnelling packet;hardware outer checksum for tunnelling packet.
> > PKT_TX_UDP_TUNNEL_PKT: is used to tell PMD that the transmit packet is a UDP tunneling packet.
> > l4_tun_len: for VXLAN packet, it should be udp header length plus VXLAN header length.
> >
> > Signed-off-by: Jijiang Liu <jijiang.liu@intel.com>
> > ---
> >   lib/librte_mbuf/rte_mbuf.c |    2 +-
> >   lib/librte_mbuf/rte_mbuf.h |   23 ++++++++++++++---------
> >   2 files changed, 15 insertions(+), 10 deletions(-)
> >
> > diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> > index 87c2963..e89c310 100644
> > --- a/lib/librte_mbuf/rte_mbuf.c
> > +++ b/lib/librte_mbuf/rte_mbuf.c
> > @@ -240,7 +240,7 @@ const char *rte_get_tx_ol_flag_name(uint64_t mask)
> >   	case PKT_TX_SCTP_CKSUM: return "PKT_TX_SCTP_CKSUM";
> >   	case PKT_TX_UDP_CKSUM: return "PKT_TX_UDP_CKSUM";
> >   	case PKT_TX_IEEE1588_TMST: return "PKT_TX_IEEE1588_TMST";
> > -	case PKT_TX_VXLAN_CKSUM: return "PKT_TX_VXLAN_CKSUM";
> > +	case PKT_TX_UDP_TUNNEL_PKT: return "PKT_TX_UDP_TUNNEL_PKT";
> >   	case PKT_TX_TCP_SEG: return "PKT_TX_TCP_SEG";
> >   	default: return NULL;
> 
> As I said as a reply to the cover letter, I suggest to use PKT_TX_OUT_UDP_CKSUM instead of PKT_TX_UDP_TUNNEL_PKT.

HW don't support outer L4 checksum offload.
But to calculate inner checksums correctly, it needs a hint from SW about L4 Tunneling Type.
Currently the following values are recognised by HW:

L4 Tunneling Type (Teredo / GRE header / VXLAN header) indication:
00b - No UDP / GRE tunneling (field must be set to zero if EIPT equals to zero)
01b - UDP tunneling header (any UDP tunneling, VXLAN and Geneve).
10b - GRE tunneling header
Else - reserved

You can check yourself:
http://www.intel.com/content/www/us/en/embedded/products/networking/xl710-10-40-controller-datasheet.html
Sections 8.4.2.2.1 and 8.4.4.2

> 
> Also, the PKT_TX_OUT_IP_CKSUM case is missing here.
> 
> > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > index 367fc56..48cd8e1 100644
> > --- a/lib/librte_mbuf/rte_mbuf.h
> > +++ b/lib/librte_mbuf/rte_mbuf.h
> > @@ -99,10 +99,9 @@ extern "C" {
> >   #define PKT_RX_TUNNEL_IPV6_HDR (1ULL << 12) /**< RX tunnel packet with IPv6 header. */
> >   #define PKT_RX_FDIR_ID       (1ULL << 13) /**< FD id reported if FDIR match. */
> >   #define PKT_RX_FDIR_FLX      (1ULL << 14) /**< Flexible bytes reported if FDIR match. */
> > -/* add new RX flags here */
> >
> 
> We should probably not remove this line.
> 
> 
> >   /* add new TX flags here */
> > -#define PKT_TX_VXLAN_CKSUM   (1ULL << 50) /**< TX checksum of VXLAN computed by NIC */
> > +#define PKT_TX_UDP_TUNNEL_PKT (1ULL << 50) /**< TX packet is an UDP
> > +tunneling packet */
> >   #define PKT_TX_IEEE1588_TMST (1ULL << 51) /**< TX IEEE1588 packet to
> > timestamp. */
> >
> >   /**
> > @@ -125,13 +124,20 @@ extern "C" {
> >   #define PKT_TX_IP_CKSUM      (1ULL << 54) /**< IP cksum of TX pkt. computed by NIC. */
> >   #define PKT_TX_IPV4_CSUM     PKT_TX_IP_CKSUM /**< Alias of PKT_TX_IP_CKSUM. */
> >
> > +#define PKT_TX_VLAN_PKT      (1ULL << 55) /**< TX packet is a 802.1q VLAN packet. */
> > +
> >   /** Tell the NIC it's an IPv4 packet. Required for L4 checksum offload or TSO. */
> > -#define PKT_TX_IPV4          PKT_RX_IPV4_HDR
> > +#define PKT_TX_IPV4          (1ULL << 56)
> >
> >   /** Tell the NIC it's an IPv6 packet. Required for L4 checksum offload or TSO. */
> > -#define PKT_TX_IPV6          PKT_RX_IPV6_HDR
> > +#define PKT_TX_IPV6          (1ULL << 57)
> 
> The description in comment does not match the description in the cover letter.
> 
> Also, I think replacing PKT_RX_IPV[46]_HDR by the value may go in another commit.
> 
> 
> > -#define PKT_TX_VLAN_PKT      (1ULL << 55) /**< TX packet is a 802.1q VLAN packet. */
> > +/** Outer IP cksum of TX pkt. computed by NIC for tunneling packet */
> > +#define PKT_TX_OUTER_IP_CKSUM   (1ULL << 58)
> > +#define PKT_TX_OUTER_IPV4_CSUM  PKT_TX_OUTER_IP_CKSUM /**< Alias of
> > +PKT_TX_OUTER_IP_CKSUM. */
> 
> Why do we need an alias?
> 
> By the way, I think the alias of PKT_TX_IP_CKSUM is also uneeded and can be removed. But it's not the topic of your series.
> 
> Also, the name PKT_TX_OUTER_IP_CKSUM does not match the name in the cover letter and commit logs.
> 
> 
> > +
> > +/** Tell the NIC it's an outer IPv6 packet for tunneling packet.*/
> > +#define PKT_TX_OUTER_IPV6    (1ULL << 59)
> >
> 
> This flag is not in the cover letter or commit log. What is its purpose?


My bad, forgot that for outer IP, will also need to specify it's type.
So same story here as for inner IP.
So in total, we might need 3 flags for outer IP:

/* Tells HW that outer IP is IPV4 and checksum for it should be calculated by HW. */
PKT_TX_OUTER_IP_CKSUM

/* Tells HW that outer IP is IPV4 and checksum for it should not be calculated by HW. */
PKT_TX_OUTER_IPV4

/* Tells HW that outer IP is IPV6. */
PKT_TX_OUTER_IPV6

> 
> 
> >   /**
> >    * TCP segmentation offload. To enable this offload feature for a @@
> > -266,10 +272,9 @@ struct rte_mbuf {
> >   			uint64_t tso_segsz:16; /**< TCP TSO segment size */
> >
> >   			/* fields for TX offloading of tunnels */
> > -			uint64_t inner_l3_len:9; /**< inner L3 (IP) Hdr Length. */
> > -			uint64_t inner_l2_len:7; /**< inner L2 (MAC) Hdr Length. */
> > -
> > -			/* uint64_t unused:8; */
> > +			uint64_t outer_l3_len:9; /**< outer L3 (IP) Hdr Length. */
> > +			uint64_t outer_l2_len:7; /**< outer L2 (MAC) Hdr Length. */
> > +			uint64_t l4_tun_len:8; /**< L4 tunnelling header length */
> >   		};
> >   	};
> >   } __rte_cache_aligned;
> >
> 
> About l4_tun_len, I have another comment I forgot to add in the cover letter. Can we remove it and include its length in outer_l2_len
> instead? For instance, replace:
> 
>       mb->l2_len =  eth_hdr_in;
>       mb->l3_len = ipv4_hdr_in;
>       mb->outer_l2_len = eth_hdr_out;
>       mb->outer_l3_len = ipv4_hdr_out;
>       mb->l4tun_len = vxlan_hdr;
>       mb->ol_flags |= PKT_TX_OUT_IP_CKSUM  | PKT_TX_UDP_TUNNEL |
>         PKT_TX_IP_CKSUM |  PKT_TX_TCP_CKSUM;
> 
> by:
> 
>       mb->l2_len =  eth_hdr_in;
>       mb->l3_len = ipv4_hdr_in;
>       mb->outer_l2_len = eth_hdr_out + vxlan_hdr;
>       mb->outer_l3_len = ipv4_hdr_out;
>       mb->ol_flags |= PKT_TX_OUT_IP_CKSUM  | PKT_TX_UDP_TUNNEL |
>         PKT_TX_IP_CKSUM |  PKT_TX_TCP_CKSUM;
> 
> I think it won't bother the driver, and it's coherent with case B.2 of your cover letter.

You probably meant:
mb->l2_len =  eth_hdr_in + vxlan_hdr;
?
Yes, I think it could be done that way too.
Though I still prefer to keep l4tun_len - it makes things a bit cleaner (at least to me). 
After all  we do have space for it in mbuf's tx_offload.
Konstantin

> 
> Regards,
> Olivier

  parent reply	other threads:[~2014-11-27 14:56 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-11-27  8:18 [dpdk-dev] [PATCH 0/3] i40e VXLAN TX checksum rework Jijiang Liu
2014-11-27  8:18 ` [dpdk-dev] [PATCH 1/3] mbuf:add two TX offload flags and change three fields Jijiang Liu
2014-11-27 10:00   ` Olivier MATZ
2014-11-27 13:14     ` Liu, Jijiang
2014-11-28  9:17       ` Olivier MATZ
     [not found]     ` <1ED644BD7E0A5F4091CF203DAFB8E4CC01D9EEA0@SHSMSX101.ccr.corp.intel.com>
2014-11-27 14:56       ` Ananyev, Konstantin [this message]
2014-11-27 17:01         ` Ananyev, Konstantin
2014-11-28 10:45           ` Olivier MATZ
2014-11-28 11:16             ` Ananyev, Konstantin
2014-11-30 14:50             ` Ananyev, Konstantin
2014-12-01  2:30               ` Liu, Jijiang
2014-12-01  9:52                 ` Olivier MATZ
2014-12-01 11:58                   ` Ananyev, Konstantin
2014-12-01 12:28                     ` Olivier MATZ
2014-12-01 13:07                       ` Liu, Jijiang
2014-12-01 14:31                         ` Ananyev, Konstantin
2014-11-27  8:18 ` [dpdk-dev] [PATCH 2/3] i40e:PMD change for VXLAN TX checksum Jijiang Liu
2014-11-27  8:18 ` [dpdk-dev] [PATCH 3/3] testpmd:rework csum forward engine Jijiang Liu
2014-11-27 10:23   ` Olivier MATZ
2014-11-27  8:50 ` [dpdk-dev] [PATCH 0/3] i40e VXLAN TX checksum rework Liu, Jijiang
2014-11-27  9:44 ` Olivier MATZ
2014-11-27 10:12   ` Olivier MATZ
2014-11-27 12:06     ` Liu, Jijiang
2014-11-27 12:07   ` Liu, Jijiang
2014-11-27 15:29   ` Ananyev, Konstantin
2014-11-27 16:31     ` Liu, Jijiang
2014-12-03  8:02       ` Liu, Jijiang
2014-11-28  9:26     ` Olivier MATZ

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2601191342CEEE43887BDE71AB977258213BADB8@IRSMSX105.ger.corp.intel.com \
    --to=konstantin.ananyev@intel.com \
    --cc=dev@dpdk.org \
    --cc=jijiang.liu@intel.com \
    --cc=olivier.matz@6wind.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).