DPDK patches and discussions
From: Olivier MATZ <olivier.matz@6wind.com>
To: "Liu, Jijiang" <jijiang.liu@intel.com>,
	 Thomas Monjalon <thomas.monjalon@6wind.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH v8 10/10] app/testpmd:test VxLAN Tx checksum offload
Date: Mon, 17 Nov 2014 12:21:36 +0100
Message-ID: <5469DA40.7050107@6wind.com>
In-Reply-To: <1ED644BD7E0A5F4091CF203DAFB8E4CC01D9BAC0@SHSMSX101.ccr.corp.intel.com>

Hi Jijiang,

On 11/17/2014 07:52 AM, Liu, Jijiang wrote:
> Anyway, I explain the checksum mechanism here again.
> 
> In my VXLAN patch set, the TX checksum offload logic for a VXLAN packet is as follows:
> 
> 1. only set outer L3/L4 header TX checksum
>     tx_checksum set 0x3 (or 0x1) 0
>   In this case, the PKT_TX_VXLAN_CKSUM flag is not set because we don't set the inner flags (PKT_TX_IPV4_CSUM, PKT_TX_UDP_CKSUM), so we don't need to change the inner ones; the driver treats it as an ordinary packet,
> and the HW does the outer L3/L4 checksum offload.

Let's take an example with the following packet:
Ether / IP / UDP / VxLAN / Ether / IP / UDP / data

The original behavior (without your vxlan patches), which still
works today, is to select inner or outer using the m->l2_len field:

  - checksum outer IP + UDP
    m->l2_len=14 m->l3_len=20
    flags=PKT_TX_IP_CKSUM PKT_TX_UDP_CKSUM

  - checksum inner IP + UDP
    m->l2_len=64 m->l3_len=20
    flags=PKT_TX_IP_CKSUM PKT_TX_UDP_CKSUM
    Of course, the packet is valid only if the outer IP checksum is
    already correct and the outer UDP checksum is 0.

If i40e does not act like this, it does not follow the previous API.
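A minimal C sketch of the length selection in the example above, for Ether / IP / UDP / VxLAN / Ether / IP / UDP / data. It assumes no VLAN tags and no IP options; struct tx_lens and the helper names are hypothetical stand-ins for the l2_len/l3_len fields of struct rte_mbuf, not DPDK code.

```c
#include <assert.h>
#include <stdint.h>

/* Header sizes for the example packet (no VLAN tags, no IP options). */
#define ETHER_HDR_LEN 14
#define IPV4_HDR_LEN  20
#define UDP_HDR_LEN    8
#define VXLAN_HDR_LEN  8

/* Hypothetical stand-in for the two rte_mbuf fields discussed here. */
struct tx_lens {
	uint16_t l2_len; /* bytes before the IP header to checksum */
	uint16_t l3_len; /* length of that IP header */
};

/* Select the outer headers: l2_len covers only the outer Ethernet header. */
static struct tx_lens lens_for_outer(void)
{
	struct tx_lens l = { ETHER_HDR_LEN, IPV4_HDR_LEN };
	return l;
}

/* Select the inner headers: l2_len swallows every outer header
 * (Ether + IP + UDP + VxLAN) plus the inner Ethernet header. */
static struct tx_lens lens_for_inner(void)
{
	struct tx_lens l;

	l.l2_len = ETHER_HDR_LEN + IPV4_HDR_LEN + UDP_HDR_LEN +
		   VXLAN_HDR_LEN + ETHER_HDR_LEN;
	l.l3_len = IPV4_HDR_LEN;
	return l;
}
```

The same two hardware flags are used in both cases; only the lengths decide which headers are checksummed.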

> 2. only set inner L3/L4 header TX checksum
>     tx_checksum set 0x30 0
>   In this case, the PKT_TX_VXLAN_CKSUM flag is set, so the driver treats it as a VXLAN packet, and we don't need to change the outer ones because we don't set the outer flags here (PKT_TX_IPV4_CSUM, PKT_TX_UDP_CKSUM).

As explained above, there is no need to set the PKT_TX_VXLAN_CKSUM if
you only want to set the inner L3/L4 checksum. This was already working
like this before your patches, as long as l2_len and l3_len are set
properly in the mbuf (l2_len should include the outer headers).

Moreover, PKT_TX_IPV4_CSUM, PKT_TX_UDP_CKSUM, ... are not "outer flags".
They are hardware checksum flags, and before your vxlan patch, they
applied to the headers referenced by m->l2_len and m->l3_len.

With your vxlan patch, this changed without being documented: these
flags now use either (m->l2_len, m->l3_len) or (m->inner_l2_len,
m->inner_l3_len), which is not a good idea in my opinion.

> 3. set outer L3/L4 TX checksum and inner L3/L4 TX checksum
>     tx_checksum set 0x31 (or 0x33) 0
>   In this case, the PKT_TX_VXLAN_CKSUM flag is set, the driver treats it as a VXLAN packet, and we need to change both outer and inner as both outer and inner flags are set.

Here you are talking about testpmd flags. You do not describe the
mbuf API: the PKT_TX_* flags and length values that need to be set
(l2_len, l3_len, ...) and what they refer to.

I think if you want to explain the vxlan checksum offload mbuf API,
you should not talk about the testpmd flags as:
- they don't match the mbuf flags
- they have undocumented (or incoherent) behavior in the csumonly
  forward engine

After several exchanges about this vxlan API, it is unfortunately
still vague and obscure to me.

So here is a proposition of API documentation that looks
understandable. Maybe it is easier to change the code to match this API:



PKT_TX_IP_CKSUM flag enables hardware computation of IP cksum. To
use it:
- fill l2_len and l3_len in mbuf
- set the flag PKT_TX_IP_CKSUM
- set the IP checksum to 0 in the IP header
See (1) and (2).
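The three steps above can be sketched as follows. struct sketch_mbuf and SKETCH_TX_IP_CKSUM are hypothetical simplifications (the real flag is PKT_TX_IP_CKSUM in rte_mbuf.h, with a different value); the sketch assumes an IPv4 header whose checksum sits at byte offset 10, as laid out in RFC 791.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag value only; the real constant lives in rte_mbuf.h. */
#define SKETCH_TX_IP_CKSUM (1ULL << 0)

/* Offset of the checksum field inside an IPv4 header. */
#define IPV4_CKSUM_OFF 10

/* Hypothetical, heavily simplified mbuf. */
struct sketch_mbuf {
	uint8_t *data;      /* start of the Ethernet frame */
	size_t data_len;
	uint16_t l2_len;
	uint16_t l3_len;
	uint64_t ol_flags;
};

static void request_ip_cksum(struct sketch_mbuf *m, uint16_t l2_len,
			     uint16_t l3_len)
{
	uint8_t *ip;

	assert((size_t)l2_len + l3_len <= m->data_len);

	/* step 1: fill l2_len and l3_len */
	m->l2_len = l2_len;
	m->l3_len = l3_len;

	/* step 2: request hardware IP checksum */
	m->ol_flags |= SKETCH_TX_IP_CKSUM;

	/* step 3: zero the checksum field of the selected IP header */
	ip = m->data + l2_len;
	ip[IPV4_CKSUM_OFF] = 0;
	ip[IPV4_CKSUM_OFF + 1] = 0;
}
```

Passing l2_len = 14 selects the outer IP header of the VxLAN example, and l2_len = 64 selects the inner one, per note (1).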

One value among PKT_TX_L4_NO_CKSUM, PKT_TX_UDP_CKSUM,
PKT_TX_TCP_CKSUM and PKT_TX_SCTP_CKSUM can be assigned to the bits
of PKT_TX_L4_MASK. These flags are used to offload the L4 checksum in
hardware.
The user is required to:
- fill l2_len and l3_len in mbuf
- set the flags PKT_TX_TCP_CKSUM, PKT_TX_SCTP_CKSUM or
  PKT_TX_UDP_CKSUM
- calculate the pseudo header checksum and set it in the L4
  header (only for TCP or UDP). See rte_ipv4_phdr_cksum() and
  rte_ipv6_phdr_cksum().  For SCTP, set the crc field to 0.
See (1) and (2).
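For TCP or UDP, the value seeded into the L4 checksum field is the folded, non-inverted one's-complement sum of the IPv4 pseudo header (source, destination, protocol, L4 length). The sketch below is a hypothetical, simplified stand-in for rte_ipv4_phdr_cksum(): it takes host-order addresses and returns a host-order value, whereas the DPDK helper works directly on the IPv4 header in network byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Fold a 32-bit one's-complement accumulator into 16 bits. */
static uint16_t cksum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Non-inverted IPv4 pseudo-header sum. All inputs are host order; the
 * result is host order and must be converted to network byte order
 * (e.g. with htons()) before being written into the L4 header. */
static uint16_t ipv4_phdr_sum(uint32_t src, uint32_t dst, uint8_t proto,
			      uint16_t l4_len)
{
	uint32_t sum = 0;

	sum += src >> 16;
	sum += src & 0xffff;
	sum += dst >> 16;
	sum += dst & 0xffff;
	sum += proto;
	sum += l4_len;
	return cksum_fold(sum);
}
```

The hardware then adds the L4 header and payload to this seed and writes the final inverted checksum.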

The PKT_TX_TCP_SEG flag can be set to enable TCP segmentation
offload for a packet to be transmitted on hardware supporting
TSO:
- set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag
  implies PKT_TX_TCP_CKSUM)
- if it's IPv4, set the PKT_TX_IP_CKSUM flag and set the IP
  checksum to 0 in the packet
- fill the mbuf offload information: l2_len, l3_len, l4_len,
  tso_segsz
- calculate the pseudo header checksum without taking ip_len into
  account, and set it in the TCP header. Refer to
  rte_ipv4_phdr_cksum() and rte_ipv6_phdr_cksum(), which can be
  used as helpers.
See (1) and (2).
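A sketch of the TSO preparation for IPv4, again on a hypothetical simplified mbuf. The pseudo-header sum deliberately omits the length, as the text requires, since the hardware accounts for each segment's own length. The SKETCH_* flag values are illustrative only, and the caller is still assumed to zero the IP checksum and store the seed in the TCP header.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag values only; the real constants live in rte_mbuf.h. */
#define SKETCH_TX_IP_CKSUM (1ULL << 0)
#define SKETCH_TX_TCP_SEG  (1ULL << 1)

/* Hypothetical subset of the mbuf offload fields used for TSO. */
struct sketch_tso_mbuf {
	uint16_t l2_len, l3_len, l4_len, tso_segsz;
	uint64_t ol_flags;
};

/* TSO pseudo-header sum: src, dst and protocol only (no length).
 * Host order in and out. */
static uint16_t ipv4_phdr_sum_tso(uint32_t src, uint32_t dst)
{
	uint32_t sum = (src >> 16) + (src & 0xffff) +
		       (dst >> 16) + (dst & 0xffff) + 6; /* IPPROTO_TCP */

	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

static void request_tso_ipv4(struct sketch_tso_mbuf *m, uint16_t l2_len,
			     uint16_t l3_len, uint16_t l4_len, uint16_t mss)
{
	assert(mss > 0);
	/* TCP_SEG implies TCP checksum offload; IPv4 also needs IP_CKSUM. */
	m->ol_flags |= SKETCH_TX_TCP_SEG | SKETCH_TX_IP_CKSUM;
	m->l2_len = l2_len;
	m->l3_len = l3_len;
	m->l4_len = l4_len;
	m->tso_segsz = mss;
}
```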

(1) In case the packet is an encapsulated packet, the m->l2_len
    field can include all the outer tunnel headers. These headers
    will remain unmodified by the hardware.

(2) If outer_l2_len and outer_l3_len are not 0, the beginning of
    the inner headers is relative to outer_l2_len + outer_l3_len.
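Notes (1) and (2) describe two ways of locating the headers that l2_len and l3_len refer to. The hypothetical helpers below compute those offsets under the proposed semantics; with outer_l2_len and outer_l3_len left at 0 they reduce to the behavior of note (1). The struct is an illustrative stand-in, not struct rte_mbuf.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the proposed mbuf length fields. */
struct sketch_tunnel_lens {
	uint16_t outer_l2_len; /* 0 when the outer_* fields are unused */
	uint16_t outer_l3_len;
	uint16_t l2_len;
	uint16_t l3_len;
};

/* Offset of the IP header described by l2_len/l3_len: per note (2),
 * it is counted from outer_l2_len + outer_l3_len. */
static uint32_t inner_l3_offset(const struct sketch_tunnel_lens *l)
{
	return (uint32_t)l->outer_l2_len + l->outer_l3_len + l->l2_len;
}

/* Offset of the L4 header that follows it. */
static uint32_t inner_l4_offset(const struct sketch_tunnel_lens *l)
{
	return inner_l3_offset(l) + l->l3_len;
}
```

For the VxLAN example, note (2) gives l2_len = 8 (outer UDP) + 8 (VxLAN) + 14 (inner Ethernet) = 30, which lands on the same inner IP header as l2_len = 64 under note (1).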


[To replace the PKT_TX_VXLAN_CKSUM, we introduce 2 new flags]

PKT_TX_OUTER_IP_CKSUM flag enables hardware computation of IP cksum
in outer headers. To use it:
- fill outer_l2_len and outer_l3_len in mbuf
- set the flag PKT_TX_OUTER_IP_CKSUM
- set the IP checksum to 0 in the outer IP header
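When PKT_TX_OUTER_IP_CKSUM is set, the hardware fills in the standard IPv4 header checksum. A software reference such as the sketch below (RFC 1071 one's-complement sum with the checksum field treated as 0) can help when checking a driver's output; it assumes a plain IPv4 header passed as raw bytes, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference IPv4 header checksum: one's complement of the one's-complement
 * sum of the header's 16-bit big-endian words, skipping the checksum field
 * at offset 10 (i.e. treating it as 0). Returns a host-order value. */
static uint16_t ipv4_hdr_cksum(const uint8_t *hdr, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(len >= 20 && len % 4 == 0);
	for (i = 0; i < len; i += 2) {
		if (i == 10)
			continue; /* checksum field counts as 0 */
		sum += (uint32_t)(hdr[i] << 8 | hdr[i + 1]);
	}
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```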

One value among PKT_TX_OUTER_L4_NO_CKSUM, PKT_TX_OUTER_UDP_CKSUM,
PKT_TX_OUTER_TCP_CKSUM and PKT_TX_OUTER_SCTP_CKSUM can be assigned
to the bits of a dedicated outer L4 mask (distinct from PKT_TX_L4_MASK,
so that inner and outer L4 requests can be combined). These flags are
used to offload the outer L4 checksum in hardware.
The user is required to:
- fill outer_l2_len and outer_l3_len in mbuf
- set the flags PKT_TX_OUTER_TCP_CKSUM, PKT_TX_OUTER_SCTP_CKSUM or
  PKT_TX_OUTER_UDP_CKSUM
- calculate the pseudo header checksum and set it in the outer L4
  header (only for TCP or UDP). See rte_ipv4_phdr_cksum() and
  rte_ipv6_phdr_cksum().  For SCTP, set the crc field to 0.
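Putting both halves together for the VxLAN example, a request for outer IP, outer UDP, inner IP and inner UDP checksums might look like the sketch below. It assumes the outer L4 values get their own mask, separate from PKT_TX_L4_MASK, so that inner and outer UDP requests can coexist in ol_flags; every SK_* name, the struct and all bit values are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit layout for the proposed flags; values are illustrative. */
#define SK_TX_IP_CKSUM        (1ULL << 0)
#define SK_TX_L4_MASK         (3ULL << 1)
#define SK_TX_UDP_CKSUM       (3ULL << 1) /* a value inside SK_TX_L4_MASK */
#define SK_TX_OUTER_IP_CKSUM  (1ULL << 3)
#define SK_TX_OUTER_L4_MASK   (3ULL << 4)
#define SK_TX_OUTER_UDP_CKSUM (3ULL << 4) /* a value inside SK_TX_OUTER_L4_MASK */

struct sk_mbuf {
	uint16_t outer_l2_len, outer_l3_len, l2_len, l3_len;
	uint64_t ol_flags;
};

/* Ether / IP / UDP / VxLAN / Ether / IP / UDP / data, no options or VLANs.
 * The caller must still zero both IP checksums and seed both UDP checksum
 * fields with their pseudo-header sums. */
static void vxlan_request_all_cksums(struct sk_mbuf *m)
{
	m->outer_l2_len = 14;   /* outer Ethernet */
	m->outer_l3_len = 20;   /* outer IPv4 */
	m->l2_len = 8 + 8 + 14; /* outer UDP + VxLAN + inner Ethernet */
	m->l3_len = 20;         /* inner IPv4 */
	m->ol_flags = SK_TX_OUTER_IP_CKSUM | SK_TX_OUTER_UDP_CKSUM |
		      SK_TX_IP_CKSUM | SK_TX_UDP_CKSUM;
}
```

Inner and outer requests use the same pattern (lengths plus flags), which is the symmetry the proposal argues for.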


This proposition has several advantages:
- it is documented :)
- the API is straightforward: inner and outer work in the same
  manner.
- the API already supports other tunnels (IPIP, GRE, STT, ...)
- adding the m->outer_* fields makes it possible to keep the same
  semantics for the existing flags. Admittedly, it does not map to
  the Linux skb, but that is not an argument against it. Moreover,
  Linux does not seem to support hardware TX checksum of outer+inner
  headers.


Regards,
Olivier
