DPDK patches and discussions
 help / color / mirror / Atom feed
From: Olivier MATZ <olivier.matz@6wind.com>
To: "Liu, Jijiang" <jijiang.liu@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH v8 10/10] app/testpmd:test VxLAN Tx checksum offload
Date: Wed, 05 Nov 2014 11:28:02 +0100
Message-ID: <5459FBB2.1040408@6wind.com> (raw)
In-Reply-To: <1ED644BD7E0A5F4091CF203DAFB8E4CC01D8510E@SHSMSX101.ccr.corp.intel.com>

Hi Jijiang,

Thank you for your answer. Please find some comments below.

On 11/05/2014 07:02 AM, Liu, Jijiang wrote:
>> First, the code checks if the mbuf has the flag PKT_RX_TUNNEL_IPV4_HDR.
>> What is the meaning of this flag? It was added by [3], but there is no description
>> in comments or in the commit log explaining in which case this flag is set by the
>> driver. The name supposes that this flag is set when the received packet is an IPv4
>> tunnel, but the commit log talks about vxlan.
>
> The flag PKT_RX_TUNNEL_IPV4_HDR can be used for all tunneling packet types with outer IPV4 header.
> For example:
> IPv4 --> GRE/Teredo/VXLAN --> MAC --> IPv4:
> MAC, IPV4, GRENAT, MAC, IPV4, SCTP, PAY4
> MAC, IPV4, GRENAT, MAC, IPV6, UDP, PAY4
> MAC, IPV4, GRENAT, MAC, IPV6, UDP, PAY4
> These tunneling packet formats have a common point that is outer IPv4 header here.
>
> Only VXLAN tunneling packet is supported in DPDK for i40e now, so  the commit log talks about VXLAN .

Is it possible to have a more formal definition? For instance, is the
following definition below correct?

  "the PKT_RX_TUNNEL_IPV4_HDR flag CAN be set by a driver if the packet
   contains a tunneling protocol inside an IPv4 header".

If the definition above is correct, I don't see how this flag can help
an application to run faster. There is already a flag telling if there
is a valid IPv4 header (PKT_RX_IPV4_HDR). As the PKT_RX_TUNNEL_IPV4_HDR
flag does not tell what is ip->proto, the work done by an application
to dissect a packet would be exactly the same with or without this flag.

Please, can you give an example showing in which conditions this flag
can help an application?


>> What is the meaning of this flag?
>> Is it enough to checksum outer L3, inner L3, and inner L4 as specified in commit
>> log? If yes, why are the other flags PKT_TX_IPV4_CSUM, PKT_TX_UDP_CKSUM,
>> (...) added in the mbuf later?
>> In my comprehension, these flags are needed in addition to
>> PKT_TX_VXLAN_CKSUM to do the checksum of the inner headers.
>
> Yes, these flags(PKT_TX_IPV4_CSUM, PKT_TX_UDP_CKSUM)  are needed by HW offload of non-tunneling and tunneling  packet.

OK, so I understand that when PKT_TX_VXLAN_CKSUM is set, if the
driver supports it, it will process IP and UDP checksum of outer
header, using l2_len and l3_len.


>> In general, I need to understand how to use all the new offloading stuff. In [5],
>> new fields were added in the mbuf structure (inner_l3_len and inner_l2_len). I'm
>> not sure I understand which fields have to be filled. Below is my understanding,
>> can you please check that it is correct?
>>
>> A- To send a Ether/IP/TCP packet and process IP and TCP TX
>>      checksum in the NIC:
>>
>>     - set l2_len = 14, l3_len = 20,
>>       ol_flags = PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM,
>>       write all checksums to 0 in the packet
>     IP checksum is 0, but tcp checksum is not 0.
>     tcp_hdr->cksum = get_ipv4_psd_sum(ipv4_hdr);

OK, right. I forgot it but indeed it's in csumonly.c


>> B- To send a Ether/IP/UDP/vxlan/Ether/IP/TCP packet and
>>      process inner IP and inner TCP TX checksum in the NIC:
>>
>>     - set l2_len = 14+20+8+8+14, l3_len = 20,
>>       ol_flags = PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM,
>>       write all checksums to 0 in the packet
>
> No, l2_len is outer L2 length, its value also is 14.

If you set l2_len to 14, how could the driver guess the offset of
the inner IP header? I'm pretty sure it should work with 14+20+8+8+14.

>> C- To send a Ether/IP/UDP/vxlan/Ether/IP/TCP packet and
>>      process outer IP and UDP, plus inner IP and inner TCP TX
>>      checksum in the NIC:
>>
>>     - set l2_len = 14, l3_len = 20, inner_l2_len = 14,
>>       inner_l3_len = 20,
> Yes
>>       ol_flags = PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM |
>> PKT_TX_VXLAN_CKSUM,
> Yes
>>       write all checksums to 0 in the packet
>
>     Outer and inner IP checksum is 0, but tcp checksum is not 0.
>     tcp_hdr->cksum = get_ipv4_psd_sum(ipv4_hdr);

What about outer udp checksum? Why shouldn't we set it to the
phdr checksum too?


>> D- To send a Ether/IP/UDP/vxlan/Ether/IP/TCP packet and
>>      process outer IP and UDP TX checksum in the NIC:
>>
>>     - set l2_len = 14, l3_len = 20,
>>       ol_flags = PKT_TX_IP_CKSUM | PKT_TX_UDP_CKSUM,
>
> Yes
>>       write all checksums to 0 in the packet
> Outer  IP checksum is 0, but tcp checksum is not 0.
>     tcp_hdr->cksum = get_ipv4_psd_sum(ipv4_hdr);
>
>> First, can you confirm this? I think it is very important to document it, as this is a
>> public API that can be used by DPDK users.
>
> These should be included in documents.

Another thing is surprising me.

- if PKT_TX_VXLAN_CKSUM is not set (legacy use case), then the
   driver use l2_len and l3_len to offload inner IP/UDP/TCP checksums.

- if PKT_TX_VXLAN_CKSUM is set, then the driver has to use
   inner_l{23}_len instead of l{23}_len for the same operation.

Adding PKT_TX_VXLAN_CKSUM changes the semantic of l2_len and l3_len.
To fix this, I suggest to remove the new fields inner_l{23}_len then
add outer_l{23}_len instead. Therefore, the semantic of l2_len and
l3_len would not change, and a driver would always use the same field
for a specific offload.

For my TSO development, I will follow the current semantic.


>> To validate my modifications, I will try to reproduce the test plan described in [6].
>> The test report contains useful information but I'm not sure to understand the
>> following:
>>
>>      Enable outer IP,UDP,TCP,SCTP and inner IP,UDP checksum offload
>>      when inner L4 protocal is UDP.
>>         testpmd>tx_checksum set 0 0xf3
>
> "tx_checksum set 0 0xf3" should be "tx_checksum set 0 0x33", but the tester use 0xFX (that is to say, enable all inner TX flag)in order to write test script easily.

To me, there is a strange thing:

- the mbuf flags only allows to checksum l3 (ip), l4 (tcp/udp/sctp),
   and vxlan.
- the user api in command line talks about inner and outer l3 + l4
   layers + vxlan.

Therefore many combinations are impossible to do or are non-sense.
Examples:

- outer IP + outer TCP + inner IP  -> no sense, there is no supported
   tunnel protocol above TCP
- outer IP + inner IP -> impossible to set up in mbuf flags
- outer IP: what is the meaning? same than inner IP? or does it
   means that the application has to parse the packet to find a
   tunnel? In this case, which tunnels are supported by the application?
- ...

Anyway, I will try to continue to use the current logic and document
it as much as I can.


>>      Enable outer IP,UDP,TCP,SCTP and inner IP,TCP,SCTP checksum offload
>>      when inner L4 protocal is TCP or SCTP.
>>         testpmd>tx_checksum set 0 0xfd
>>
>> Can you give details about the signification of the bits used in the tx_checksum
>> command? For instance, there is only one flag to enable tx checksum in the mbuf,
>> so I don't understand how it's possible to do 0x44 = (inner tcp + outer tcp).
>
> These bits meanings/help information  were provided in the patch
> http://dpdk.org/ml/archives/dev/2014-October/007156.html

Sorry, but the commit provides very few information about how to use
these flags (although it was already the case before your commit). I
will try to document it from what I understand from the code.


>> Today, there are hard-coded values for flags: 4 bits (0-3) for legacy checksums, 4
>> bits for inner checksums (4-7), and one bit (bit 11?!) for vlan header. These bits
>> are checked without defines at several places in the code. Moreover, there are
>> some places in code where the testpmd port flags and the mbuf flags are mixed
>
>> up. Example in macswap.c : mb->ol_flags = txp->tx_ol_flags;
> I thought this is an issue.
>
>> I suggest to remove all hardcoded values except in
>> cmd_tx_cksum_set_parsed() and replace them by the flags definitions from
>> rte_mbuf.h. As a result, the new possible arguments of
>> cmd_tx_cksum_set_parsed() will map the mbuf flags:
>>     - a flag to enable ip tx cksum
>>     - a flag to enable udp tx cksum
>>     - a flag to enable tcp tx cksum
>>     - a flag to enable sctp tx cksum
>>     - a flag to enable vxlan tx cksum
>> If vxlan tx cksum is set, therefore the other considered checksum will concern the
>> inner headers. Do you think it matches your needs?
>
> As to HW TX checksum offload, do you have special requirement for implementing TSO?

Yes. TSO implies TX TCP and IP checksum offload.

My first requirement is to understand what is currently implemented, and
the second one is to do build an API that is easily understandable and
usable by someone else.


>> By the way, I was looking at the mbuf structure, and I see that a new packet_type
>> field was added. There is no explanation about what this field should contain and
>> how we shall use it (in commit log or in comments). Can you please explain it?
>
> +	/**
> +	 * The packet type, which is used to indicate ordinary packet and also
> +	 * tunneled packet format, i.e. each number is represented a type of
> +	 * packet.
> +	 */
> +	uint16_t packet_type
> Regarding adding a packet_type in mbuf, we ever had a lot of discussions as follows:
>
> http://dpdk.org/ml/archives/dev/2014-October/007027.html
> http://dpdk.org/ml/archives/dev/2014-September/005240.html
> http://dpdk.org/ml/archives/dev/2014-September/005241.html
> http://dpdk.org/ml/archives/dev/2014-September/005274.html

Sure, there was a lot of discussions... and I still don't understand how
I can use it in my application. The reason is simple: I don't know
what kind of data is stored in this field (no #define in rte_mbuf.h).
This generic field is filled by reading a very specific register of i40e
driver. So:

- I cannot code an application that use it
- I cannot code a pmd that would fill this field

Conclusion: this field is absolutely useless for anyone that has not
read the datasheets of i40e... which is probably the case for 95% of
DPDK users.


Regards,
Olivier

  reply	other threads:[~2014-11-05 10:18 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-10-27  2:13 [dpdk-dev] [PATCH v8 00/10] Support VxLAN on Fortville Jijiang Liu
2014-10-27  2:13 ` [dpdk-dev] [PATCH v8 01/10] librte_mbuf:the rte_mbuf structure changes Jijiang Liu
2014-10-27  2:13 ` [dpdk-dev] [PATCH v8 02/10] librte_ether:add the basic data structures of VxLAN Jijiang Liu
2014-10-27  2:13 ` [dpdk-dev] [PATCH v8 03/10] librte_ether:add VxLAN packet identification API Jijiang Liu
2014-10-27  2:13 ` [dpdk-dev] [PATCH v8 04/10] i40e:support VxLAN packet identification in i40e Jijiang Liu
2014-10-27  2:13 ` [dpdk-dev] [PATCH v8 05/10] app/test-pmd:test VxLAN packet identification Jijiang Liu
2014-10-27  2:13 ` [dpdk-dev] [PATCH v8 06/10] librte_ether:add data structures of VxLAN filter Jijiang Liu
2014-10-27  2:13 ` [dpdk-dev] [PATCH v8 07/10] i40e:implement the API of VxLAN filter in librte_pmd_i40e Jijiang Liu
2014-10-27  2:13 ` [dpdk-dev] [PATCH v8 08/10] app/testpmd:test VxLAN packet filter Jijiang Liu
2014-10-27  2:13 ` [dpdk-dev] [PATCH v8 09/10] i40e:support VxLAN Tx checksum offload Jijiang Liu
2014-10-27  2:13 ` [dpdk-dev] [PATCH v8 10/10] app/testpmd:test " Jijiang Liu
2014-11-04  8:19   ` Olivier MATZ
2014-11-05  6:02     ` Liu, Jijiang
2014-11-05 10:28       ` Olivier MATZ [this message]
2014-11-06 11:24         ` Liu, Jijiang
2014-11-06 13:08           ` Olivier MATZ
2014-11-06 14:27             ` Liu, Jijiang
2014-11-07  0:43         ` Yong Wang
2014-11-07 17:16           ` Olivier MATZ
2014-11-10 11:39             ` Ananyev, Konstantin
2014-11-10 15:57               ` Olivier MATZ
2014-11-12  9:55                 ` Ananyev, Konstantin
2014-11-12 13:05                   ` Olivier MATZ
2014-11-12 13:40                     ` Thomas Monjalon
2014-11-12 23:14                       ` Ananyev, Konstantin
2014-11-12 14:39                     ` Ananyev, Konstantin
2014-11-12 14:56                       ` Olivier MATZ
     [not found]             ` <D0868B54.24DBB%yongwang@vmware.com>
2014-11-11  0:07               ` [dpdk-dev] FW: " Yong Wang
2014-11-10  6:03         ` [dpdk-dev] " Liu, Jijiang
2014-11-10 16:17           ` Olivier MATZ
     [not found]             ` <1ED644BD7E0A5F4091CF203DAFB8E4CC01D8F7A7@SHSMSX101.ccr.corp.intel.com>
2014-11-12 17:26               ` Thomas Monjalon
2014-11-13  5:35                 ` Liu, Jijiang
2014-11-13  5:39                   ` Liu, Jijiang
2014-11-13  6:51                 ` Liu, Jijiang
2014-11-13  9:10                   ` Thomas Monjalon
2014-11-14  8:15                     ` Liu, Jijiang
2014-11-14  9:09                       ` Olivier MATZ
2014-11-17  6:52                         ` Liu, Jijiang
2014-11-17 11:21                           ` Olivier MATZ
2014-11-20  7:28                             ` Liu, Jijiang
2014-11-20 16:36                               ` Olivier MATZ
2014-11-21  5:40                                 ` Liu, Jijiang
2014-10-27  2:20 ` [dpdk-dev] [PATCH v8 00/10] Support VxLAN on Fortville Liu, Yong
2014-10-27  2:41 ` Zhang, Helin
2014-10-27 13:46   ` Thomas Monjalon
2014-10-27 14:34     ` Liu, Jijiang
2014-10-27 15:15       ` Thomas Monjalon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5459FBB2.1040408@6wind.com \
    --to=olivier.matz@6wind.com \
    --cc=dev@dpdk.org \
    --cc=jijiang.liu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

DPDK patches and discussions

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://inbox.dpdk.org/dev/0 dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dev dev/ https://inbox.dpdk.org/dev \
		dev@dpdk.org
	public-inbox-index dev

Example config snippet for mirrors.
Newsgroup available over NNTP:
	nntp://inbox.dpdk.org/inbox.dpdk.dev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git