From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail.droids-corp.org (zoll.droids-corp.org [94.23.50.67])
 by dpdk.org (Postfix) with ESMTP id 0447E5A6F
 for ; Mon, 19 Jan 2015 15:38:54 +0100 (CET)
Received: from was59-1-82-226-113-214.fbx.proxad.net ([82.226.113.214]
 helo=[192.168.0.10])
 by mail.droids-corp.org with esmtpsa (TLS1.2:DHE_RSA_AES_128_CBC_SHA1:128)
 (Exim 4.80) (envelope-from )
 id 1YDDX7-0005Hx-T7; Mon, 19 Jan 2015 15:42:30 +0100
Message-ID: <54BD16F1.6050409@6wind.com>
Date: Mon, 19 Jan 2015 15:38:41 +0100
From: Olivier MATZ
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Icedove/31.3.0
MIME-Version: 1.0
To: "Ananyev, Konstantin" , "Liu, Jijiang"
References: <1418173403-30202-1-git-send-email-jijiang.liu@intel.com>
 <1ED644BD7E0A5F4091CF203DAFB8E4CC01DA7699@SHSMSX101.ccr.corp.intel.com>
 <2601191342CEEE43887BDE71AB977258213D337B@irsmsx105.ger.corp.intel.com>
 <1ED644BD7E0A5F4091CF203DAFB8E4CC01DA789E@SHSMSX101.ccr.corp.intel.com>
 <2601191342CEEE43887BDE71AB977258213D34AE@irsmsx105.ger.corp.intel.com>
 <1ED644BD7E0A5F4091CF203DAFB8E4CC01DA7CC5@SHSMSX101.ccr.corp.intel.com>
 <2601191342CEEE43887BDE71AB977258213D3897@irsmsx105.ger.corp.intel.com>
 <54AFB13E.2080200@6wind.com>
 <1ED644BD7E0A5F4091CF203DAFB8E4CC01DA85A1@SHSMSX101.ccr.corp.intel.com>
 <54B3B35A.5030803@6wind.com>
 <1ED644BD7E0A5F4091CF203DAFB8E4CC01DA8E36@SHSMSX101.ccr.corp.intel.com>
 <54B4EB92.40209@6wind.com>
 <1ED644BD7E0A5F4091CF203DAFB8E4CC01DB0789@SHSMSX101.ccr.corp.intel.com>
 <2601191342CEEE43887BDE71AB977258213D4FCF@irsmsx105.ger.corp.intel.com>
 <54B94A18.5030700@6wind.com>
 <2601191342CEEE43887BDE71AB977258213DCD25@irsmsx105.ger.corp.intel.com>
In-Reply-To: <2601191342CEEE43887BDE71AB977258213DCD25@irsmsx105.ger.corp.intel.com>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Cc: "dev@dpdk.org"
Subject: Re: [dpdk-dev] [PATCH v3 0/3] enhance TX checksum command and csum forwarding engine
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 19 Jan 2015 14:38:54 -0000

Hi Konstantin,

On 01/19/2015 02:04 PM, Ananyev, Konstantin wrote:
>> case 2) calculate checksum of out_ip and out_udp
>>
>>   mb->l2_len = len(out_eth)
>>   mb->l3_len = len(out_ip)
>>   mb->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM | PKT_TX_UDP_CKSUM
>>   set out_ip checksum to 0 in the packet
>>   set out_udp checksum to pseudo header using rte_ipv4_phdr_cksum()
>>
>>   supported on hardware advertising DEV_TX_OFFLOAD_IPV4_CKSUM and
>>   DEV_TX_OFFLOAD_UDP_CKSUM
>>
>> *Problem 1*: The comment above PKT_TX_IPV4 says "Packet is IPv4
>> without requiring IP checksum offload" [2], and the help of L4
>> checksum and TSO says that it is required to set the PKT_TX_IPV4
>> flag [3]. This is not coherent.
>
> So what is the problem?
> Comments in rte_mbuf.h are not coherent?

No, they're not coherent.

>> We are back on the debate about the meaning of PKT_TX_IPV4 vs
>> PKT_TX_IP_CKSUM from [4]. This incoherency in the comments was
>> introduced by patch [5]. The question is "when should an application
>> set this flag? for any IP packet that does not require IP checksum?".
>
> Yes, if it is an IPv4 packet and the application requires TX offload for L4 checksum or TSO,
> but doesn't want HW offload for IPv4 checksum calculation.
>
>> This would break many applications.
>
> Which ones?
> As far as I know, so far nothing is broken.

The problem today is that it's not obvious for a developer to know
when an application should set the PKT_TX_IPV4 flag. Reading the
comments, one could think that an application has to set it for any
transmitted IP packet, even for packets that do not require TX offload.
Requiring this in the API would break many applications. The comment
should at least say that this flag is *only* required when asking for
L4 checksum.
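To make the ambiguity concrete: a PMD can program the same descriptor bits under either reading of the flags (PKT_TX_IPV4 and PKT_TX_IP_CKSUM mutually exclusive, or PKT_TX_IPV4 set on every IPv4 packet that uses an offload) if it treats the packet as IPv4 whenever either flag is present. Below is a minimal sketch of such a check. It is not taken from any existing PMD, and the names TX_IPV4, TX_IP_CKSUM, TX_UDP_CKSUM and their bit values are hypothetical stand-ins, not the real rte_mbuf.h constants.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for this sketch only; NOT the rte_mbuf.h values. */
#define TX_IPV4      (1ULL << 0)
#define TX_IP_CKSUM  (1ULL << 1)
#define TX_UDP_CKSUM (1ULL << 2)

/* What a PMD would need to program into a TX descriptor for the L3 header. */
struct l3_tx_setup {
    bool is_ipv4;          /* tell the NIC that the L3 header is IPv4 */
    bool offload_ip_cksum; /* ask the NIC to compute the IPv4 header checksum */
};

/*
 * Accepts both readings of the flags:
 *  - exclusive: TX_IPV4 alone for IPv4 without IP checksum offload,
 *    TX_IP_CKSUM alone for IPv4 with IP checksum offload;
 *  - non-exclusive: TX_IPV4 on every IPv4 packet using an offload,
 *    plus TX_IP_CKSUM when the IP checksum is offloaded as well.
 */
static struct l3_tx_setup tx_l3_setup(uint64_t ol_flags)
{
    struct l3_tx_setup s;

    s.is_ipv4 = (ol_flags & (TX_IPV4 | TX_IP_CKSUM)) != 0;
    s.offload_ip_cksum = (ol_flags & TX_IP_CKSUM) != 0;
    return s;
}
```

With this check, case 2 written as TX_IP_CKSUM | TX_UDP_CKSUM (exclusive reading) or as TX_IPV4 | TX_IP_CKSUM | TX_UDP_CKSUM (non-exclusive reading) yields the same settings, which is why the choice mostly matters for documentation rather than for drivers.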
As TSO implies IP checksum, it means PKT_TX_IPV4 should not be set for
TSO, but PKT_TX_IP_CKSUM instead.

>> I think a good definition would
>> be:
>>
>>   Packet is IPv4. This flag must be set when using any offload
>>   feature (TSO, L3 or L4 checksum) to tell the NIC that the packet
>>   is an IPv4 packet.
>>
>> That's why I added PKT_TX_IPV4 in the examples.
>
> I suppose we discussed it several times: both ways are possible.
> From PMD perspective - treating PKT_TX_IPV4 and PKT_TX_IP_CKSUM
> as mutually exclusive seems a bit more plausible.
> From the upper layer - my understanding is that it doesn't really matter.
> I thought we had an agreement about it in 1.8, no?

Indeed, this was already discussed, but there was a lot of pressure to
push something for 1.8.0, even if it was not perfect. The fog around the
comments shows that the API was not clearly defined for 1.8.0. Reading
the API comments, it is impossible to understand when the PKT_TX_IPV4
or PKT_TX_IP_CKSUM flags must be set. I would go further: the only
place where the comments bring valuable information (L4 checksum and
TSO) describes the case where PKT_TX_IPV4 and PKT_TX_IP_CKSUM are not
exclusive... So I will fix that in my upcoming patch series.

Just for information, I'm pretty sure that treating PKT_TX_IPV4 and
PKT_TX_IP_CKSUM as non-exclusive flags would not require any change
anywhere in the PMDs (even in i40e). On the contrary, making them
exclusive would require changing the ixgbe TSO code, because we check
PKT_TX_IPV4 there.

>> *Problem 3*: without using the word "fortville", it is difficult
>> to understand the goal of the flag PKT_TX_UDP_TUNNEL_PKT. Indeed,
>> once PKT_TX_OUTER_IPV4/6 is set, it looks obvious that it's a
>> tunnel packet. I suggest to remove the PKT_TX_UDP_TUNNEL_PKT
>> flag. In linux, the driver doesn't care about the tunnel type,
>> it always set I40E_TXD_CTX_UDP_TUNNELING for all encapsulations [6].
>
> It might be obvious that it is a tunnel packet if PKT_TX_OUTER_* is set,
> but it is not obvious what type of tunnelling it would be.
> FVL HW supports HW TX offloads for different types of tunnelling and
> requires that SW provide information about the tunnelling type.
> From i40e datasheet:
> L4TUNT L4 Tunneling Type (Teredo / GRE header / VXLAN header) indication:
> 00b - No UDP / GRE tunneling (field must be set to zero if EIPT equals to zero)
> 01b - UDP tunneling header (any UDP tunneling, VXLAN and Geneve).
> 10b - GRE tunneling header
> As we do plan to support tunnelling types other than UDP, I suppose we'll need to keep
> the PKT_TX_UDP_TUNNEL_PKT flag.

As I've said: in Linux, the driver doesn't care about the tunnel type;
it always sets I40E_TXD_CTX_UDP_TUNNELING for all encapsulations.
However, I suppose the Linux driver is able to process the hardware
outer checksum even for other encapsulations (GRE, IPIP).

And does it mean that IPIP tunnels are not supported by i40e? I can't
believe it. If that is the case, how can an application on top of DPDK
know which tunnel types are supported by the underlying port?

What the datasheet does not explain, from what I've read, is: "what is
done differently for this packet between setting the register to GRE
(10b) or UDP (01b)?"

>> case 7) calculate checksum of out_ip, out_udp, in_ip and in_tcp
>>
>>   mb->outer_l2_len = len(out_eth)
>>   mb->outer_l3_len = len(out_ip)
>>   mb->l2_len = len(out_udp + vxlan + in_eth)
>>   mb->l3_len = len(in_ip)
>>   mb->ol_flags |= PKT_TX_OUTER_IPV4 | PKT_TX_OUTER_IP_CKSUM | \
>>       PKT_TX_OUTER_UDP_CKSUM | PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM;
>>   set out_ip checksum to 0 in the packet
>>   set out_udp checksum to pseudo header using rte_ipv4_phdr_cksum()
>>   set in_ip checksum to 0 in the packet
>>   set in_tcp checksum to pseudo header using rte_ipv4_phdr_cksum()
>>
>> We need to add the flag PKT_TX_OUTER_UDP_CKSUM.
>
> We can, though right now we don't have a HW that is able to do that.
> Why need to do it now?

No, I agree we should not add it now. I just want to be sure we have a
consensus that it will work like this the day we have such drivers.

>> I think the following cases should be *forbidden by the API*:
>>
>> case 9) calculate checksum of in_ip and in_tcp (was case B.1 in [1])
>>
>>   mb->outer_l2_len = len(out_eth)
>>   mb->outer_l3_len = len(out_ip)
>>   mb->l2_len = len(out_udp + vxlan + in_eth)
>>   mb->l3_len = len(out_ip)
>>   mb->ol_flags |= PKT_TX_IPV4 | PKT_TX_UDP_TUNNEL_PKT | \
>>       PKT_TX_IP_CKSUM | PKT_TX_UDP_CKSUM;
>>   set out_ip checksum to 0 in the packet
>>   set out_udp checksum to pseudo header using rte_ipv4_phdr_cksum()
>>
>> If we remove the flag PKT_TX_UDP_TUNNEL_PKT, this cannot be
>> supported, but there is no reason to support it as there is
>> already one way to do the same.
>>
>> I think the driver should not even look at mb->outer_l2_len
>> and mb->outer_l3_len if no flag PKT_TX_OUTER_* is set.
>
> Why should it be forbidden?
> I admit it might be a bit slower than case 4),
> but I think it is an absolutely legal way to set up HW offloads for inner L3/L4.
> As I said we need a PKT_TX_UDP_TUNNEL_PKT anyway, so I suppose
> PKT_TX_*_TUNNEL_PKT should indicate whether it is a tunnel packet or not.
> PKT_TX_OUTER_* flags indicate whether outer cksum offload is required or not.

I don't understand. In terms of hardware, the result is exactly the
same as case 4). Why should we have 2 different ways of doing the same
thing? This is really confusing for an API. Moreover, you said it
yourself: it is slower than case 4).

It also seems easier to understand from an API point of view: the PMD
uses mb->outer_lX_len if and only if a PKT_TX_OUTER_* flag is present.

>> case 10) calculate a checksum using only outer_lX fields
>>
>> The outer_lX fields or PKT_TX_OUTER_* flags can only be used
>> if an inner checksum is enabled.
>> So it's not possible to do
>> the following:
>>
>>   mb->outer_l2_len = len(out_eth)
>>   mb->outer_l3_len = len(out_ip)
>>   mb->ol_flags |= PKT_TX_OUTER_IPV4 | PKT_TX_OUTER_IP_CKSUM
>>   set out_ip checksum to 0 in the packet
>
> Ok, I think no one plans to use it anyway.
>
> Konstantin

Thanks, Konstantin, for taking the time to reply and to make progress
on this.

Regards,
Olivier
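The two rules proposed in this thread can be sketched together: the PMD reads mb->outer_l2_len and mb->outer_l3_len if and only if a PKT_TX_OUTER_* flag is present, and outer flags are only valid together with an inner checksum request (case 10). This is a minimal illustration, not driver code. It assumes a hypothetical struct tx_meta standing in for the rte_mbuf TX offload fields, and hypothetical TX_* bit values; TX_OUTER_UDP_CKSUM stands for the PKT_TX_OUTER_UDP_CKSUM flag discussed for case 7, which current hardware does not support.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for this sketch only; NOT the rte_mbuf.h values. */
#define TX_IPV4            (1ULL << 0)
#define TX_IP_CKSUM        (1ULL << 1)
#define TX_TCP_CKSUM       (1ULL << 2)
#define TX_UDP_CKSUM       (1ULL << 3)
#define TX_OUTER_IPV4      (1ULL << 4)
#define TX_OUTER_IP_CKSUM  (1ULL << 5)
#define TX_OUTER_UDP_CKSUM (1ULL << 6) /* proposed for case 7, not implemented */

#define TX_INNER_CKSUM_ANY (TX_IP_CKSUM | TX_TCP_CKSUM | TX_UDP_CKSUM)
#define TX_OUTER_ANY (TX_OUTER_IPV4 | TX_OUTER_IP_CKSUM | TX_OUTER_UDP_CKSUM)

/* Hypothetical stand-in for the rte_mbuf TX offload fields. */
struct tx_meta {
    uint64_t ol_flags;
    uint16_t outer_l2_len;
    uint16_t outer_l3_len;
    uint16_t l2_len;
    uint16_t l3_len;
};

/* Case 10: outer flags without any inner checksum request are rejected. */
static bool tx_flags_valid(uint64_t ol_flags)
{
    if ((ol_flags & TX_OUTER_ANY) && !(ol_flags & TX_INNER_CKSUM_ANY))
        return false;
    return true;
}

/*
 * Offset of the (inner) L3 header from the start of the frame. The outer
 * lengths are read only when a TX_OUTER_* flag is set, so stale values
 * left in a non-tunnel mbuf cannot change the result.
 */
static unsigned tx_l3_offset(const struct tx_meta *m)
{
    unsigned off = m->l2_len;

    if (m->ol_flags & TX_OUTER_ANY)
        off += m->outer_l2_len + m->outer_l3_len;
    return off;
}

/* Offset of the (inner) L4 header. */
static unsigned tx_l4_offset(const struct tx_meta *m)
{
    return tx_l3_offset(m) + m->l3_len;
}
```

For case 7 with VXLAN over IPv4 (14-byte Ethernet headers, 20-byte IPv4 headers without options, 8-byte UDP and 8-byte VXLAN headers), l2_len is 30 and the sketch places the inner IP header at offset 64 and the inner TCP header at 84. For case 2, any leftover outer lengths in the mbuf are simply ignored.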