From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-f174.google.com (mail-wi0-f174.google.com [209.85.212.174]) by dpdk.org (Postfix) with ESMTP id 726DC5944 for ; Wed, 5 Nov 2014 11:18:41 +0100 (CET) Received: by mail-wi0-f174.google.com with SMTP id d1so11940831wiv.7 for ; Wed, 05 Nov 2014 02:28:03 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:message-id:date:from:user-agent:mime-version:to :cc:subject:references:in-reply-to:content-type :content-transfer-encoding; bh=o2qcfSAEq0ZMaYog8dOxzcwHw5dnBRrwCiJK6rT/j+A=; b=eG1MirK6XX/J8bKEWKZPD71Ejf7Wy+z7fKu2pql5uA+5Gibxv/WO9IPeZa3UL8S2R3 GfNxStlABiKM4Aigq1/9Ki8kA7/MqigM/J2JyEClhyyYBjNCmShHmF7MA4LA385NlIv6 gJoNDdPPj1nOZhrkA4YY5Dn875vmmw1m9hmzxB1UHsDNUFILphlaez404/LWSQhPucxr g79NsMn76WJjWGPjKoa9jKvgvuzI+NJ+PYERuc9zc1Lc6EEFmW6yeMhuXnFJ0ndJWfsr AgLzADxCs0iW15SU322znyDhRMhG0REwt6n7Zv/8GAMa/yyRYXR/UV0K/YrsFD42kQbo jIoA== X-Gm-Message-State: ALoCoQm+X1GbmA3IGOjIsbKYgVoY7M21hMs+OJebhZwIEKgMlqzNv26OkkPqfOadRpvuL4aEFyES X-Received: by 10.180.207.77 with SMTP id lu13mr4867335wic.12.1415183283818; Wed, 05 Nov 2014 02:28:03 -0800 (PST) Received: from [10.16.0.195] (guy78-3-82-239-227-177.fbx.proxad.net. [82.239.227.177]) by mx.google.com with ESMTPSA id v10sm4041504wiy.21.2014.11.05.02.28.03 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Wed, 05 Nov 2014 02:28:03 -0800 (PST) Message-ID: <5459FBB2.1040408@6wind.com> Date: Wed, 05 Nov 2014 11:28:02 +0100 From: Olivier MATZ User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Icedove/24.5.0 MIME-Version: 1.0 To: "Liu, Jijiang" References: <1414376006-31402-1-git-send-email-jijiang.liu@intel.com> <1414376006-31402-11-git-send-email-jijiang.liu@intel.com> <54588BF7.309@6wind.com> <1ED644BD7E0A5F4091CF203DAFB8E4CC01D8510E@SHSMSX101.ccr.corp.intel.com> In-Reply-To: <1ED644BD7E0A5F4091CF203DAFB8E4CC01D8510E@SHSMSX101.ccr.corp.intel.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: "dev@dpdk.org" Subject: Re: [dpdk-dev] [PATCH v8 10/10] app/testpmd:test VxLAN Tx checksum offload X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 05 Nov 2014 10:18:41 -0000 Hi Jijiang, Thank you for your answer. Please find some comments below. On 11/05/2014 07:02 AM, Liu, Jijiang wrote: >> First, the code checks if the mbuf has the flag PKT_RX_TUNNEL_IPV4_HDR. >> What is the meaning of this flag? It was added by [3], but there is no description >> in comments or in the commit log explaining in which case this flag is set by the >> driver. The name supposes that this flag is set when the received packet is an IPv4 >> tunnel, but the commit log talks about vxlan. > > The flag PKT_RX_TUNNEL_IPV4_HDR can be used for all tunneling packet types with outer IPV4 header. > For example: > IPv4 --> GRE/Teredo/VXLAN --> MAC --> IPv4: > MAC, IPV4, GRENAT, MAC, IPV4, SCTP, PAY4 > MAC, IPV4, GRENAT, MAC, IPV6, UDP, PAY4 > MAC, IPV4, GRENAT, MAC, IPV6, UDP, PAY4 > These tunneling packet formats have a common point that is outer IPv4 header here. > > Only VXLAN tunneling packet is supported in DPDK for i40e now, so the commit log talks about VXLAN . Is it possible to have a more formal definition? For instance, is the following definition below correct? "the PKT_RX_TUNNEL_IPV4_HDR flag CAN be set by a driver if the packet contains a tunneling protocol inside an IPv4 header". If the definition above is correct, I don't see how this flag can help an application to run faster. There is already a flag telling if there is a valid IPv4 header (PKT_RX_IPV4_HDR). As the PKT_RX_TUNNEL_IPV4_HDR flag does not tell what is ip->proto, the work done by an application to dissect a packet would be exactly the same with or without this flag. Please, can you give an example showing in which conditions this flag can help an application? >> What is the meaning of this flag? >> Is it enough to checksum outer L3, inner L3, and inner L4 as specified in commit >> log? If yes, why are the other flags PKT_TX_IPV4_CSUM, PKT_TX_UDP_CKSUM, >> (...) added in the mbuf later? >> In my comprehension, these flags are needed in addition to >> PKT_TX_VXLAN_CKSUM to do the checksum of the inner headers. > > Yes, these flags(PKT_TX_IPV4_CSUM, PKT_TX_UDP_CKSUM) are needed by HW offload of non-tunneling and tunneling packet. OK, so I understand that when PKT_TX_VXLAN_CKSUM is set, if the driver supports it, it will process IP and UDP checksum of outer header, using l2_len and l3_len. >> In general, I need to understand how to use all the new offloading stuff. In [5], >> new fields were added in the mbuf structure (inner_l3_len and inner_l2_len). I'm >> not sure I understand which fields have to be filled. Below is my understanding, >> can you please check that it is correct? >> >> A- To send a Ether/IP/TCP packet and process IP and TCP TX >> checksum in the NIC: >> >> - set l2_len = 14, l3_len = 20, >> ol_flags = PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM, >> write all checksums to 0 in the packet > IP checksum is 0, but tcp checksum is not 0. > tcp_hdr->cksum = get_ipv4_psd_sum(ipv4_hdr); OK, right. I forgot it but indeed it's in csumonly.c >> B- To send a Ether/IP/UDP/vxlan/Ether/IP/TCP packet and >> process inner IP and inner TCP TX checksum in the NIC: >> >> - set l2_len = 14+20+8+8+14, l3_len = 20, >> ol_flags = PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM, >> write all checksums to 0 in the packet > > No, l2_len is outer L2 length, its value also is 14. If you set l2_len to 14, how could the driver guess the offset of the inner IP header? I'm pretty sure it should work with 14+20+8+8+14. >> C- To send a Ether/IP/UDP/vxlan/Ether/IP/TCP packet and >> process outer IP and UDP, plus inner IP and inner TCP TX >> checksum in the NIC: >> >> - set l2_len = 14, l3_len = 20, inner_l2_len = 14, >> inner_l3_len = 20, > Yes >> ol_flags = PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM | >> PKT_TX_VXLAN_CKSUM, > Yes >> write all checksums to 0 in the packet > > Outer and inner IP checksum is 0, but tcp checksum is not 0. > tcp_hdr->cksum = get_ipv4_psd_sum(ipv4_hdr); What about outer udp checksum? Why shouldn't we set it to the phdr checksum too? >> D- To send a Ether/IP/UDP/vxlan/Ether/IP/TCP packet and >> process outer IP and UDP TX checksum in the NIC: >> >> - set l2_len = 14, l3_len = 20, >> ol_flags = PKT_TX_IP_CKSUM | PKT_TX_UDP_CKSUM, > > Yes >> write all checksums to 0 in the packet > Outer IP checksum is 0, but tcp checksum is not 0. > tcp_hdr->cksum = get_ipv4_psd_sum(ipv4_hdr); > >> First, can you confirm this? I think it is very important to document it, as this is a >> public API that can be used by DPDK users. > > These should be included in documents. Another thing is surprising me. - if PKT_TX_VXLAN_CKSUM is not set (legacy use case), then the driver use l2_len and l3_len to offload inner IP/UDP/TCP checksums. - if PKT_TX_VXLAN_CKSUM is set, then the driver has to use inner_l{23}_len instead of l{23}_len for the same operation. Adding PKT_TX_VXLAN_CKSUM changes the semantic of l2_len and l3_len. To fix this, I suggest to remove the new fields inner_l{23}_len then add outer_l{23}_len instead. Therefore, the semantic of l2_len and l3_len would not change, and a driver would always use the same field for a specific offload. For my TSO development, I will follow the current semantic. >> To validate my modifications, I will try to reproduce the test plan described in [6]. >> The test report contains useful information but I'm not sure to understand the >> following: >> >> Enable outer IP,UDP,TCP,SCTP and inner IP,UDP checksum offload >> when inner L4 protocal is UDP. >> testpmd>tx_checksum set 0 0xf3 > > "tx_checksum set 0 0xf3" should be "tx_checksum set 0 0x33", but the tester use 0xFX (that is to say, enable all inner TX flag)in order to write test script easily. To me, there is a strange thing: - the mbuf flags only allows to checksum l3 (ip), l4 (tcp/udp/sctp), and vxlan. - the user api in command line talks about inner and outer l3 + l4 layers + vxlan. Therefore many combinations are impossible to do or are non-sense. Examples: - outer IP + outer TCP + inner IP -> no sense, there is no supported tunnel protocol above TCP - outer IP + inner IP -> impossible to set up in mbuf flags - outer IP: what is the meaning? same than inner IP? or does it means that the application has to parse the packet to find a tunnel? In this case, which tunnels are supported by the application? - ... Anyway, I will try to continue to use the current logic and document it as much as I can. >> Enable outer IP,UDP,TCP,SCTP and inner IP,TCP,SCTP checksum offload >> when inner L4 protocal is TCP or SCTP. >> testpmd>tx_checksum set 0 0xfd >> >> Can you give details about the signification of the bits used in the tx_checksum >> command? For instance, there is only one flag to enable tx checksum in the mbuf, >> so I don't understand how it's possible to do 0x44 = (inner tcp + outer tcp). > > These bits meanings/help information were provided in the patch > http://dpdk.org/ml/archives/dev/2014-October/007156.html Sorry, but the commit provides very few information about how to use these flags (although it was already the case before your commit). I will try to document it from what I understand from the code. >> Today, there are hard-coded values for flags: 4 bits (0-3) for legacy checksums, 4 >> bits for inner checksums (4-7), and one bit (bit 11?!) for vlan header. These bits >> are checked without defines at several places in the code. Moreover, there are >> some places in code where the testpmd port flags and the mbuf flags are mixed > >> up. Example in macswap.c : mb->ol_flags = txp->tx_ol_flags; > I thought this is an issue. > >> I suggest to remove all hardcoded values except in >> cmd_tx_cksum_set_parsed() and replace them by the flags definitions from >> rte_mbuf.h. As a result, the new possible arguments of >> cmd_tx_cksum_set_parsed() will map the mbuf flags: >> - a flag to enable ip tx cksum >> - a flag to enable udp tx cksum >> - a flag to enable tcp tx cksum >> - a flag to enable sctp tx cksum >> - a flag to enable vxlan tx cksum >> If vxlan tx cksum is set, therefore the other considered checksum will concern the >> inner headers. Do you think it matches your needs? > > As to HW TX checksum offload, do you have special requirement for implementing TSO? Yes. TSO implies TX TCP and IP checksum offload. My first requirement is to understand what is currently implemented, and the second one is to do build an API that is easily understandable and usable by someone else. >> By the way, I was looking at the mbuf structure, and I see that a new packet_type >> field was added. There is no explanation about what this field should contain and >> how we shall use it (in commit log or in comments). Can you please explain it? > > + /** > + * The packet type, which is used to indicate ordinary packet and also > + * tunneled packet format, i.e. each number is represented a type of > + * packet. > + */ > + uint16_t packet_type > Regarding adding a packet_type in mbuf, we ever had a lot of discussions as follows: > > http://dpdk.org/ml/archives/dev/2014-October/007027.html > http://dpdk.org/ml/archives/dev/2014-September/005240.html > http://dpdk.org/ml/archives/dev/2014-September/005241.html > http://dpdk.org/ml/archives/dev/2014-September/005274.html Sure, there was a lot of discussions... and I still don't understand how I can use it in my application. The reason is simple: I don't know what kind of data is stored in this field (no #define in rte_mbuf.h). This generic field is filled by reading a very specific register of i40e driver. So: - I cannot code an application that use it - I cannot code a pmd that would fill this field Conclusion: this field is absolutely useless for anyone that has not read the datasheets of i40e... which is probably the case for 95% of DPDK users. Regards, Olivier