DPDK patches and discussions
From: "Liu, Jijiang" <jijiang.liu@intel.com>
To: Olivier MATZ <olivier.matz@6wind.com>
Cc: dev <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH v8 10/10] app/testpmd:test VxLAN Tx checksum offload
Date: Fri, 21 Nov 2014 05:40:08 +0000
Message-ID: <1ED644BD7E0A5F4091CF203DAFB8E4CC01D9C6C2@SHSMSX101.ccr.corp.intel.com> (raw)
In-Reply-To: <546E1887.1020800@6wind.com>



> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> Sent: Friday, November 21, 2014 12:36 AM
> To: Liu, Jijiang
> Cc: Thomas Monjalon; dev
> Subject: Re: [dpdk-dev] [PATCH v8 10/10] app/testpmd:test VxLAN Tx checksum
> offload
> 
> Hi Jijiang,
> 
> On 11/20/2014 08:28 AM, Liu, Jijiang wrote:
> >> The original behavior (without your vxlan patches), which still works
> >> today, is to select inner or outer using the m->l2_len field:
> >>
> >>    - checksum outer IP + UDP
> >>      m->l2_len=14 m->l3_len=20
> >>      flags=PKT_TX_IP_CKSUM PKT_TX_UDP_CKSUM
> >>
> >>    - checksum inner IP + UDP
> >>      m->l2_len=64 m->l3_len=20
> >>      flags=PKT_TX_IP_CKSUM PKT_TX_UDP_CKSUM
> >>      of course, the packet is valid only if the outer IP checksum is
> >>      already correct and outer UDP checksum is 0
> >>
> >> If i40e does not act like this, it does not follow the previous API.
> >
> > No,  i40e follows this.
Here I meant the case where a packet with the format outer IP / outer UDP / VxLAN / Ether / inner IP / inner UDP / data is not recognized as a VXLAN packet: if that works on 10G today, it also works on 40G.
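
For reference, the l2_len=64 in the quoted example above is just the sum of every header in front of the inner IP header. A minimal sketch of that arithmetic, assuming IPv4 without options and untagged Ethernet (the enum, struct and function names are made up for illustration and are not part of the mbuf API):

```c
#include <stdint.h>

/* Header sizes in bytes. IPv4 without options and untagged Ethernet
 * are assumptions of this sketch. */
enum {
    SK_ETHER_HDR = 14,
    SK_IPV4_HDR  = 20,
    SK_UDP_HDR   = 8,
    SK_VXLAN_HDR = 8
};

/* Hypothetical holder for the two lengths discussed in the thread. */
struct sk_lens {
    uint16_t l2_len;
    uint16_t l3_len;
};

/* Lengths used to checksum the outer IP + UDP headers. */
static struct sk_lens sk_lens_outer(void)
{
    struct sk_lens l = { SK_ETHER_HDR, SK_IPV4_HDR };
    return l;
}

/* Lengths used to checksum the inner IP + UDP headers with the legacy
 * fields only: l2_len covers all outer headers plus the inner Ethernet
 * header. */
static struct sk_lens sk_lens_inner(void)
{
    struct sk_lens l = {
        SK_ETHER_HDR + SK_IPV4_HDR + SK_UDP_HDR + SK_VXLAN_HDR + SK_ETHER_HDR,
        SK_IPV4_HDR
    };
    return l;
}
```

With these sizes sk_lens_outer() gives l2_len=14 and sk_lens_inner() gives l2_len=64, matching the two cases in the quoted text.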
 
> OK. This is assumption (A):
> To calculate the inner IP + UDP checksum, you don't need VXLAN flag.
> You just acked it.

Once the VXLAN UDP destination port is configured on i40e, the VXLAN packet can be recognized.
In that case, TX checksum offload will not work if you still set m->l2_len=64 without the PKT_TX_VXLAN_CKSUM flag,
because the driver needs PKT_TX_VXLAN_CKSUM (or PKT_TX_TUNNEL_CKSUM) to know that it must set the related tunneling registers.
Do you expect us to use the L2 length to check whether the packet is tunneled? If so, I don't think that makes sense.

As to tunneling parameters, for example:

L4TUNT, the L4 Tunneling Type parameter

9:10   L4TUNT   L4 Tunneling Type (Teredo / GRE header / VXLAN header) indication:
    00b  - No UDP / GRE tunneling (field must be set to zero if EIPT equals to zero)
    01b  - UDP tunneling header (Any UDP tunneling, VxLAN and Geneve)
    10b  - GRE tunneling header
    Else - reserved

L4TUNLEN, the L4 Tunneling Length parameter

12:18  L4TUNLEN L4 Tunneling Length (Teredo / GRE header / VXLAN header) defined in Words
       (field must be set to zero if L4TUNT equals to zero).
    • For standard Teredo headers with no additional header payload it should be set to 4,
      which equals to 8 bytes. If the tunneling header includes proprietary content it
      should be included as well.
    • For IP in GRE it should be set to the length of the GRE header.
    • For MAC in GRE or MAC in UDP it should be set to the length of the GRE or UDP headers
      plus the inner MAC up to including its last Ethertype.
    If the L4TUNT is cleared, this field is ignored and must be set to zero.
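
To make the two fields concrete, here is a rough sketch of how a driver could pack them. It is not the i40e driver code. It assumes the bit ranges above are positions inside one 32-bit word, that a "word" is 2 bytes (which the Teredo example implies: 4 words equal 8 bytes), and that for MAC in UDP the length covers UDP + VXLAN + inner Ethernet header. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions copied from the field description; treating them as
 * offsets inside a single 32-bit word is an assumption of this sketch. */
#define SK_L4TUNT_SHIFT   9
#define SK_L4TUNT_MASK    0x3u
#define SK_L4TUNLEN_SHIFT 12
#define SK_L4TUNLEN_MASK  0x7Fu

enum sk_l4tunt {
    SK_L4TUNT_NONE = 0, /* 00b: no UDP / GRE tunneling */
    SK_L4TUNT_UDP  = 1, /* 01b: UDP tunneling header */
    SK_L4TUNT_GRE  = 2  /* 10b: GRE tunneling header */
};

/* Build the tunneling bits from a tunnel type and a tunnel header length
 * given in bytes. The length is converted to 2-byte words. */
static uint32_t sk_tunnel_bits(enum sk_l4tunt type, unsigned tun_hdr_bytes)
{
    unsigned words = tun_hdr_bytes / 2;

    /* L4TUNLEN must be zero when L4TUNT is zero. */
    if (type == SK_L4TUNT_NONE)
        return 0;

    assert(tun_hdr_bytes % 2 == 0);     /* length is expressed in words */
    assert(words <= SK_L4TUNLEN_MASK);  /* must fit in the 7-bit field */

    return (((uint32_t)type & SK_L4TUNT_MASK) << SK_L4TUNT_SHIFT) |
           (((uint32_t)words & SK_L4TUNLEN_MASK) << SK_L4TUNLEN_SHIFT);
}
```

For a VXLAN packet under these assumptions the length is 8 + 8 + 14 = 30 bytes, that is 15 words, so sk_tunnel_bits(SK_L4TUNT_UDP, 30) yields 0xF200. None of this can be derived from m->l2_len alone, which is why the driver needs a flag saying the packet is tunneled.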

Olivier, Thomas,
I don't know whether you have the Intel 40G datasheet. We all know you are focusing on generic concepts and programming, and I also think that is very important.
But I think that if you read the Intel 40G datasheet, you will probably understand more easily what these 40G patches are for and what we are talking about. This is my personal opinion.


> >>> 2. only set inner L3/L4 header TX checksum
> >>>      tx_checksum set 0x30 0
> >>>    In this case, the PKT_TX_VXLAN_CKSUM flag is set, so driver think
> >>> it is VXLAN packet, and we don't need to change outer ones because
> >>> we don't set outer flags here (PKT_TX_IPV4_CSUM, PKT_TX_UDP_CKSUM).
> 
> Assumption (B):
> To calculate the inner IP + UDP checksum (this is what you wrote "only set inner
> L3/L4 header TX checksum"), you say you set the VXLAN flag.
> This is the opposite of (A).
> 
> >> As explained above, there is no need to set the PKT_TX_VXLAN_CKSUM if
> >> you only want to set the inner L3/L4 checksum.
> >> This was already working like this
> >> before your patches, as long as l2_len and l3_len are set properly in
> >> the mbuf (l2_len should include the outer headers).
> >
> > Does VXLAN TX checksum offload or ordinary L2 packet TX checksum offload
> > work?
> > Have you ever tested it on a NIC that supports VXLAN?
> 
> You don't answer the question: which between (A) or (B) is correct.
> 
> I'm sorry I don't understand your question above.
> 
> I have done no test on i40e, because I don't have access to this hardware.
> 
> > The PKT_TX_VXLAN_CKSUM flag meaning just tell driver this is encapsulation
> > packet, so driver should set TX checksum offload for the packet using outer
> > l2/l3 len, inner l2/l3 len and tunneling header length.
> >
> > If you don't like this flag name, I can change it for PKT_TX_TUNNEL_CKSUM,
> > which have more generic meaning.
> 
> The problem is not only the name. After tens of mails, I'm still not able to
> understand the VxLAN checksum API.
> 
> I wanted to rework the csum forward engine code, because it is not
> understandable today. I wanted to clarify the API. But sorry I think I'll give
> up now.
> 
> >> Moreover, PKT_TX_IPV4_CSUM, PKT_TX_UDP_CKSUM, ... are not "outer flags".
> >> They are hardware checksum flags, and before your vxlan patch, they
> >> concerned the headers referenced by m->l2_len and m->l3_len.
> >
> > Actually, the key point of debate is that you still think the l2_len field
> > and the l3_len field in mbuf are inner part in the case of tunneling, right?
> > If yes, let me explain what I thought.
> 
> This is not the only key point of debate. The very first key point is that the VxLAN
> checksum offload API is not documented and I'm not able to rework the csum
> code to use it.
> 
> > As you know, NIC itself is not responsible for packet decapsulation /
> > encapsulation at all. It sends and receives the whole packet, not only for
> > inner part in the case of tunneling. The translation from receive descriptor
> > to mbuf structure is also for the whole packet. And these fields defined in
> > mbuf structure are also for the whole packet, no matter it is tunneling or
> > non-tunneling.
> >
> > 1) We assume that a NIC can't recognize VXLAN packet, when a packet with
> > the format outer IP / outer UDP / VxLAN / Ether / inner IP / inner UDP / data
> > is received,
> >   do you think whether l2 header and l3 header length of this packet is outer
> > or inner, according to my understanding, I think it is outer, and m->l2_len
> > and m->l3_len is also outer. Do you agree?
> 
> The l2_len and l3_len are never set up by any driver on rx side. Your example does
> not apply.
> 
> These fields are set by the application (a network stack for instance) to indicate to
> the driver and hardware where to find the l3 and l4 headers whose checksum
> need to be calculated.
> 
> The l2_len and l3_len does not refer to inner or outer header. It refers to the
> header that has to be checksum'd in hardware when the flag is set. It can be
> inner or outer. At least, it was the case before the adding of VxLAN offload
> feature.
> 
> 
> > 2) We also assume that a NIC can recognize VXLAN packet, but there is no
> > difference between 1) and 2) on data in mbuf before patching my VXLAN patch,
> > so I also think m->l2_len and m->l3_len is outer. Do you agree?
> > After patching my VXLAN, the inner_l2_len and inner_l3_len were used to stand
> > for inner header part.
> 
> Your argumentation would make sense if l2_len and l3_len were filled by a NIC in
> RX functions. But that's not the case. Today, these fields are only used in TX when
> a checksum flag is also set. And I think that a flag should always refer to the same
> length fields.
> 
> But I'm not the one who decides this, I'm just trying to help to define an API that
> makes sense.
> 
> 
> >> With your vxlan patch, it changed without being documented. These
> >> flags use either (m->l2_len, m->l3_len) or (m->inner_l2_len,
> >> m->inner_l3_len), which is not a good idea in my opinion.
> >
> > The PKT_RX_IPV4_HDR  definition is listed below,
> > #define PKT_RX_IPV4_HDR      (1ULL << 5)  /**< RX packet with IPv4 header. */
> > I don't think it just stand for inner IP TX checksum offload, I just extend
> > its usage scope in the case of tunneling.
> 
> If you reread my mail, I was not talking about PKT_RX_IPV4_HDR but about
> PKT_TX_IPV4_CSUM, PKT_TX_UDP_CKSUM, (etc...) which are TX flags.
> I think my previous mail was clear enough:
> 
>    They are hardware checksum flags, and before your vxlan patch, they
>    concerned the headers referenced by m->l2_len and m->l3_len.
> 
>    With your vxlan patch, it changed without being documented. These
>    flags use either (m->l2_len, m->l3_len) or (m->inner_l2_len,
>    m->inner_l3_len), which is not a good idea in my opinion.
> 
> 
> >>> 3. set outer L3/L4 TX checksum and inner L3&L4 TX checksum
> >>> tx_checksum set 0x31(0x33) 0 in this case, the PKT_TX_VXLAN_CKSUM
> >>> flag is set, driver think it is VXLAN packet, and we need to change
> >>> outer and inner as both outer and inner flags are set.
> >>
> >> Here you are talking about test pmd flags. You do not describe the mbuf API:
> >> PKT_TX_* flags and lengths values that need to be set (l2_len,
> >> l3_len, ...) and to what they refer to.
> >>
> >> I think if you want to explain the vxlan checksum offload mbuf API,
> >> you should not talk about the testpmd flags as:
> >> - they don't match the mbuf flags
> >> - they have undocumented (or incoherent) behavior in the csumonly
> >>    forward engine
> >>
> >> After several exchanges about this vxlan API.
> >> Unfortunately, it is still vague and obscure to me.
> >
> > As to tunneling packet TX checksum offload, please don't think it is only an
> > inner or outer part.
> > You should regard it as whole part.
> 
> So what?
> 
> >> So here is a proposition of API documentation that looks
> >> understandable. Maybe it is easier to change the code to match this API:
> >>
> > Ok, thanks.
> >
> >>
> >> PKT_TX_IP_CKSUM flag enables hardware computation of IP cksum. To use it:
> >> - fill l2_len and l3_len in mbuf
> >> - set the flag PKT_TX_IP_CKSUM
> >> - set the ip checksum to 0 in IP header
> >> See (1) and (2).
> >>
> >> One value among PKT_TX_L4_NO_CKSUM, PKT_TX_UDP_CKSUM,
> >> PKT_TX_TCP_CKSUM and PKT_TX_SCTP_CKSUM can be assigned to the bits of
> >> PKT_TX_L4_MASK. These flags are used to offload the L4 checksum in
> >> hardware.
> >> The user requires to:
> >> - fill l2_len and l3_len in mbuf
> >> - set the flags PKT_TX_TCP_CKSUM, PKT_TX_SCTP_CKSUM or
> >>    PKT_TX_UDP_CKSUM
> >> - calculate the pseudo header checksum and set it in the L4
> >>    header (only for TCP or UDP). See rte_ipv4_phdr_cksum() and
> >>    rte_ipv6_phdr_cksum().  For SCTP, set the crc field to 0.
> >> See (1) and (2).
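
To illustrate the "calculate the pseudo header checksum" step, here is a standalone sketch for IPv4. It is not rte_ipv4_phdr_cksum(); it only shows the arithmetic of the standard TCP/UDP pseudo header (source address, destination address, protocol, L4 length). It takes addresses and length in host byte order and returns the folded, uncomplemented 16-bit sum in host order. Whether a given NIC expects exactly this form is device-specific, so treat that as an assumption. The function names are hypothetical.

```c
#include <stdint.h>

/* Fold a 32-bit one's-complement accumulator down to 16 bits. */
static uint16_t sk_fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)sum;
}

/* One's-complement sum over the IPv4 pseudo header.
 * src, dst: IPv4 addresses in host byte order
 * proto:    L4 protocol number (17 for UDP, 6 for TCP)
 * l4_len:   L4 header plus payload length in bytes
 * The result is not complemented and is in host byte order; it would
 * still have to be stored into the L4 header in network byte order. */
static uint16_t sk_ipv4_phdr_sum(uint32_t src, uint32_t dst,
                                 uint8_t proto, uint16_t l4_len)
{
    uint32_t sum = 0;

    sum += src >> 16;
    sum += src & 0xFFFFu;
    sum += dst >> 16;
    sum += dst & 0xFFFFu;
    sum += proto;
    sum += l4_len;

    return sk_fold16(sum);
}
```

For example, 192.168.0.1 to 192.168.0.2, UDP, 16 bytes of L4 data gives 0x8175.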
> >>
> >> The PKT_TX_TCP_SEG flag can be set to enable TCP segmentation offload
> >> for a packet to be transmitted on hardware supporting
> >> TSO:
> >> - set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag
> >>    implies PKT_TX_TCP_CKSUM)
> >> - if it's IPv4, set the PKT_TX_IP_CKSUM flag and write the IP
> >>    checksum to 0 in the packet
> >> - fill the mbuf offload information: l2_len, l3_len, l4_len,
> >>    tso_segsz
> >> - calculate the pseudo header checksum without taking ip_len in
> >>    account, and set it in the TCP header. Refer to
> >>    rte_ipv4_phdr_cksum() and rte_ipv6_phdr_cksum() that can be
> >>    used as helpers.
> >> See (1) and (2).
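
The TSO steps can be sketched the same way. The struct below is a hypothetical stand-in for the mbuf offload fields, and the flag bit values are placeholders, not the real PKT_TX_* values. The pseudo-header sum leaves the length out, as the steps above ask, and hard-codes the TCP protocol number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder flag bits for this sketch only; the real PKT_TX_* values
 * are defined by the mbuf library and are not reproduced here. */
#define SK_TX_IP_CKSUM (1u << 0)
#define SK_TX_TCP_SEG  (1u << 1)

/* Hypothetical stand-in for the TX offload fields of an mbuf. */
struct sk_tx_meta {
    uint32_t ol_flags;
    uint16_t l2_len;
    uint16_t l3_len;
    uint16_t l4_len;
    uint16_t tso_segsz;
};

/* IPv4 pseudo-header sum without the length: only source, destination
 * and the TCP protocol number (6) are added. Host byte order in and out,
 * result not complemented. */
static uint16_t sk_ipv4_phdr_sum_tso(uint32_t src, uint32_t dst)
{
    uint32_t sum = (src >> 16) + (src & 0xFFFFu) +
                   (dst >> 16) + (dst & 0xFFFFu) + 6u;

    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Record a TSO request: set the segmentation flag, add the IP checksum
 * flag for IPv4, and fill the header lengths and segment size. */
static void sk_request_tso(struct sk_tx_meta *m, int is_ipv4,
                           uint16_t l2_len, uint16_t l3_len,
                           uint16_t l4_len, uint16_t segsz)
{
    assert(m != NULL);
    assert(segsz != 0);

    m->ol_flags |= SK_TX_TCP_SEG;
    if (is_ipv4)
        m->ol_flags |= SK_TX_IP_CKSUM;
    m->l2_len = l2_len;
    m->l3_len = l3_len;
    m->l4_len = l4_len;
    m->tso_segsz = segsz;
}
```

The caller would still have to write 0 into the IPv4 checksum field and store the pseudo-header sum into the TCP header in network byte order; that part touches packet data and is left out here.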
> >>
> >> (1) In case the packet is an encapsulated packet, the m->l2_len
> >>      field can include all the outer tunnel headers. These headers
> >>      will remain unmodified by the hardware.
> >>
> >> (2) If outer_l2_len and outer_l3_len are not 0, the beginning of
> >>      the inner headers is relative to outer_l2_len + outer_l3_len.
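
Notes (1) and (2) describe two ways to reach the same inner headers. A small sketch of how I read them, with a hypothetical struct and with the assumption that in case (2) l2_len counts everything between the end of the outer L3 header and the start of the inner L3 header (UDP + VXLAN + inner Ethernet):

```c
#include <stdint.h>

/* Hypothetical stand-in for the length fields named in the proposal. */
struct sk_tx_lens {
    uint16_t outer_l2_len;
    uint16_t outer_l3_len;
    uint16_t l2_len;
    uint16_t l3_len;
};

/* Byte offset of the L3 header that the (inner) checksum flags refer to.
 * With outer_* left at 0 this is simply l2_len, as in note (1); with
 * outer_* filled in, l2_len is relative to the end of the outer L3
 * header, as in note (2). */
static unsigned sk_l3_offset(const struct sk_tx_lens *m)
{
    return (unsigned)m->outer_l2_len + m->outer_l3_len + m->l2_len;
}

/* Byte offset of the L4 header that follows it. */
static unsigned sk_l4_offset(const struct sk_tx_lens *m)
{
    return sk_l3_offset(m) + m->l3_len;
}
```

For the VXLAN example in this thread, { 0, 0, 64, 20 } and { 14, 20, 30, 20 } both put the inner IP header at byte 64 and the inner UDP header at byte 84.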
> >>
> >>
> >> [To replace the PKT_TX_VXLAN_CKSUM, we introduce 2 new flags]
> >>
> >> PKT_TX_OUTER_IP_CKSUM flag enables hardware computation of IP cksum
> >> in outer headers. To use it:
> >> - fill outer_l2_len and outer_l3_len in mbuf
> >> - set the flag PKT_TX_OUTER_IP_CKSUM
> >> - set the ip checksum to 0 in outer IP header
> >>
> >> One value among PKT_TX_OUTER_L4_NO_CKSUM, PKT_TX_OUTER_UDP_CKSUM,
> >> PKT_TX_OUTER_TCP_CKSUM and PKT_TX_OUTER_SCTP_CKSUM can be assigned to
> >> the bits of PKT_TX_L4_MASK.
> >> These flags are used to offload the outer L4 checksum in hardware.
> >> The user requires to:
> >> - fill outer_l2_len and outer_l3_len in mbuf
> >> - set the flags PKT_TX_OUTER_TCP_CKSUM, PKT_TX_OUTER_SCTP_CKSUM or
> >>    PKT_TX_OUTER_UDP_CKSUM
> >> - calculate the pseudo header checksum and set it in the outer L4
> >>    header (only for TCP or UDP). See rte_ipv4_phdr_cksum() and
> >>    rte_ipv6_phdr_cksum().  For SCTP, set the crc field to 0.
> >
> > Good. You provide a common approach.
> >
> > Actually, I have another common approach:
> > 1. Change PKT_TX_VXLAN_CKSUM to PKT_TX_TUNNEL_CKSUM
> > 2. Add field "uint8_t tun_header_len" (tunneling header length, for example,
> >    GRE header) into mbuf structure.
> > After above change, the API can supports other tunnels.
> 
> No. I don't want that you explain me what should be modified in the current API,
> as I don't understand it. I need (the community
> needs?) a full definition of the API, like I just did in my previous mail.
> 
> I think my description was clear. Please, do the same effort to describe the vxlan
> API from the beginning to the end, and how it changes (or not) the legacy
> checksum API.
> 
> >> This proposition has several advantages:
> >> - it is documented :)
> >> - the API is straightforward: inner and outer work in the same
> >>    manner.
> >> - the API already supports other tunnels (IPIP, GRE, STT, ...)
> >> - adding m->outer_* fields allows to keep the same semantic for
> >>    the existing flags. Indeed, it does not map linux skb, but this
> >>    is not an argument. Moreover, linux does not seem to support
> >>    hardware tx checksum of outer+inner headers.
> >
> > Just as I have mentioned in the previous email, Linux have already supported
> > hardware tx checksum of outer+inner headers for i40e.
> 
> Yes you are right. But I think that's not the point here.
> 
> Regards,
> Olivier
