From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-f176.google.com (mail-wi0-f176.google.com [209.85.212.176]) by dpdk.org (Postfix) with ESMTP id 50B8A5A29 for ; Tue, 20 Jan 2015 13:39:14 +0100 (CET) Received: by mail-wi0-f176.google.com with SMTP id em10so5128420wid.3 for ; Tue, 20 Jan 2015 04:39:14 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:message-id:date:from:user-agent:mime-version:to :cc:subject:references:in-reply-to:content-type :content-transfer-encoding; bh=1Smw3RvYCNDKaev8OeJEVy5/cJ+iLKld93SioemCqQg=; b=f9w6WNy9Rl8Vp+xJAIGtPFFE4pYfq0wlRbK0zOUi8m4y8xlixJWOJIfdBVMdoz0q5g tcjS6+VwtRZ7slQ2nBwiEpenilCTrelDuayvxc6fYrg9RhRP++D/ZXAWe06EDmuzjFNU JE+b+nw+GOwOYv/lsWGWsPCUxIa3lt/3w/zcZeTK1JW6C84kJj8w3YVe5Y2EwTZFgMEF i2EVYGcqgSm3Z0D7SnFb78sWiSVNJIhzYnMo35vy+ZZK0W6FHumXbI16i2rRwNqg2qdM 45NcOXf6OD4zt3zFr5Vlo6fW+2TMIESCoXP9YQ3xk4dMlSd48UpZXhXZs22zivCk5L4M PlYQ== X-Gm-Message-State: ALoCoQkx6dDfvkTIimHfQcG4rXGpkrK+F8dNyoC6oEmUSbtE7ZmPOkW+wjwNAZZcxw/LIMykLyBE X-Received: by 10.194.20.137 with SMTP id n9mr13662190wje.114.1421757554110; Tue, 20 Jan 2015 04:39:14 -0800 (PST) Received: from [10.16.0.195] (guy78-3-82-239-227-177.fbx.proxad.net. [82.239.227.177]) by mx.google.com with ESMTPSA id dp8sm2856629wib.20.2015.01.20.04.39.12 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 20 Jan 2015 04:39:13 -0800 (PST) Message-ID: <54BE4C70.7050406@6wind.com> Date: Tue, 20 Jan 2015 13:39:12 +0100 From: Olivier MATZ User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Icedove/31.2.0 MIME-Version: 1.0 To: "Ananyev, Konstantin" , "Liu, Jijiang" References: <1418173403-30202-1-git-send-email-jijiang.liu@intel.com> <1ED644BD7E0A5F4091CF203DAFB8E4CC01DA789E@SHSMSX101.ccr.corp.intel.com> <2601191342CEEE43887BDE71AB977258213D34AE@irsmsx105.ger.corp.intel.com> <1ED644BD7E0A5F4091CF203DAFB8E4CC01DA7CC5@SHSMSX101.ccr.corp.intel.com> <2601191342CEEE43887BDE71AB977258213D3897@irsmsx105.ger.corp.intel.com> <54AFB13E.2080200@6wind.com> <1ED644BD7E0A5F4091CF203DAFB8E4CC01DA85A1@SHSMSX101.ccr.corp.intel.com> <54B3B35A.5030803@6wind.com> <1ED644BD7E0A5F4091CF203DAFB8E4CC01DA8E36@SHSMSX101.ccr.corp.intel.com> <54B4EB92.40209@6wind.com> <1ED644BD7E0A5F4091CF203DAFB8E4CC01DB0789@SHSMSX101.ccr.corp.intel.com> <2601191342CEEE43887BDE71AB977258213D4FCF@irsmsx105.ger.corp.intel.com> <54B94A18.5030700@6wind.com> <2601191342CEEE43887BDE71AB977258213DCD25@irsmsx105.ger.corp.intel.com> <54BD16F1.6050409@6wind.com> <2601191342CEEE43887BDE71AB977258213DDF46@irsmsx105.ger.corp.intel.com> In-Reply-To: <2601191342CEEE43887BDE71AB977258213DDF46@irsmsx105.ger.corp.intel.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Cc: "dev@dpdk.org" Subject: Re: [dpdk-dev] [PATCH v3 0/3] enhance TX checksum command and csum forwarding engine X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 20 Jan 2015 12:39:14 -0000 Hi, On 01/20/2015 02:12 AM, Ananyev, Konstantin wrote: >>>> I think a good definition would >>>> be: >>>> >>>> Packet is IPv4. This flag must be set when using any offload >>>> feature (TSO, L3 or L4 checksum) to tell the NIC that the packet >>>> is an IPv4 packet. >>>> >>>> That's why I added PKT_TX_IPV4 in the examples. >>> >>> I suppose we discussed it several times: both ways are possible. >>> From PMD perspective - treating PKT_TX_IPV4 and PKT_TX_IP_CSUM >>> As mutually exclusive seems a bit more plausible. >>> From the upper layer - my understanding, that it is doesn't really matter. >>> I thought we had an agreement about it in 1.8, no? >> >> Indeed, this was already discussed, but there was a lot of pressure >> for 1.8.0 to push something, even not perfect. The fog around comments >> shows that the API was not very clearly defined for 1.8.0. If you read >> the comments of the API, it is impossible to understand when the >> PKT_TX_IPV4 or PKT_TX_IP_CSUM flags must be set. I would even say >> more: the only place where the comments bring a valuable information >> (L4 checksum and TSO) describe the case where PKT_TX_IPV4 and >> PKT_TX_IP_CSUM are not exclusive... >> >> So I will fix that in my coming patch series. Just for information, >> I'm pretty sure that having PKT_TX_IPV4 and PKT_TX_IP_CSUM as not >> exclusive flag would not require any change anywhere in the PMDs (even >> in i40e). > > Right now - no. > Though as I said from PMD perspective having them exclusive is a bit preferable. > Again, I don't see any big difference from upper layer code. Sure, it does not make a big difference in terms of code. But in terms of API, the naming of the flag is coherent to what it is used for. And it's easier to find a simple definition, like: * Packet is IPv4. This flag must be set when using any offload feature * (TSO, L3 or L4 checksum) to tell the NIC that the packet is an IPv4 * packet. >> On the contrary, making them exclusive would require to >> change the ixgbe TSO code because we check. > > Hmm, so you are saying there is a bug somewhere in ixbe_rxtx.c? > What particular place you are talking about? Sorry, I spoke too fast. In TSO code, we check PKT_TX_IP_CKSUM (and not PKT_TX_IPV4 as I thought), so it would work for both methods without patching the code. In this case, it means that both approach would not require to modify the code. >>>> *Problem 3*: without using the word "fortville", it is difficult >>>> to understand the goal of the flag PKT_TX_UDP_TUNNEL_PKT. Indeed, >>>> once PKT_TX_OUTER_IPV4/6 is set, it looks obvious that it's a >>>> tunnel packet. I suggest to remove the PKT_TX_UDP_TUNNEL_PKT >>>> flag. In linux, the driver doesn't care about the tunnel type, >>>> it always set I40E_TXD_CTX_UDP_TUNNELING for all encapsulations [6]. >>> >>> It might be obvious that it is a tunnel packet from PKT_TX_OUTER_* is set, >>> but it is not obvious what type of tunnelling it would be. >>> FVL HW supports HW TX offloads for different type of tunnelling and >>> requires that SW provide information about tunnelling type. >>> From i40e datasheet: >>> L4TUNT L4 Tunneling Type (Teredo / GRE header / VXLAN header) indication: >>> 00b - No UDP / GRE tunneling (field must be set to zero if EIPT equals to zero) >>> 01b - UDP tunneling header (any UDP tunneling, VXLAN and Geneve). >>> 10b - GRE tunneling header >>> As we do plan to support other than UDP tunnelling types, I suppose we'll need to keep >>> PKT_TX_UDP_TUNNEL_PKT flag. >> >> As I've said: in linux, the driver doesn't care about the tunnel type, >> it always set I40E_TXD_CTX_UDP_TUNNELING for all encapsulations. > > Ok, and why it should be our problem? > We have a lot of things done in a different manner then linux/freebsd kernel drivers, > Why now it became a problem? If linux doesn't need an equivalent flag for doing the same thing, it probably means we don't need it either. In a performance-oriented software like dpdk, having a flag that we don't know what the hardware does with, that is not needed in other drivers of the same harware, that makes the API harder to understand could be a problem. Another argument: if we can remove this flag, it would make the testpmd commands reworkd proposed by Jijiang much more easy to understand: only a new "csum parse-tunnel on|off" would be required, and it can be explained in a few words. I'll try to do some tests on a fortville NIC if I can find one. I'm curious to see if we can transmit any encapsulation packet (ip in ip, ip in gre, eth in gre, eth in vxlan, or even a proprietary tunnel). We should avoid the need to specify the tunnel type in the OUTER checksum API if we can, else it would limit us to specific supported protocols. >>>> I think the following cases should be *forbidden by the API*: >>>> >>>> case 9) calculate checksum of in_ip and in_tcp (was case B.1 in [1]) >>>> >>>> mb->outer_l2_len = len(out_eth) >>>> mb->outer_l3_len = len(out_ip) >>>> mb->l2_len = len(out_udp + vxlan + in_eth) >>>> mb->l3_len = len(out_ip) >>>> mb->ol_flags |= PKT_TX_IPV4 | PKT_TX_UDP_TUNNEL_PKT | \ >>>> PKT_TX_IP_CSUM | PKT_TX_UDP_CKSUM; >>>> set out_ip checksum to 0 in the packet >>>> set out_udp checksum to pseudo header using rte_ipv4_phdr_cksum() >>>> >>>> If we remove the flag PKT_TX_UDP_TUNNEL_PKT, this cannot be >>>> supported, but there is no reason to support it as there is >>>> already one way to do the same. >>>> >>>> I think the driver should not even look at mb->outer_l2_len >>>> and mb->outer_l3_len if no flag PKT_TX_OUTER_* is set. >>> >>> Why it should be forbidden? >>> I admit it might be a bit slower than case 4), >>> but I think absolutely legal way to setup HW offloads for inner L3/L4. >>> As I said we need a PKT_TX_UDP_TUNNEL_PKT anyway, so I suppose >>> PKT_TX_*_TUNNEL_PKT should be an indication is it a tunnel packet or not. >>> PKT_TX_OUTER_* flags indicate does outer cksum offload is required or not. >> >> I don't understand. The result in terms of hardware is exactly the >> same than case 4). Why should we have 2 different ways for doing the >> same thing? > > If HW supports that capability, why should we forbid it? > Let user to choose himself what way to use. > FVL spec lists it as a valid approach. It is not a hardware feature. Case 4) and case 9) would fill the hardware registers exactly the same. To me, it's just an API question. > As one of possible use-cases: HW VLAN tags insertion for both inner and outer packets. > FVL can do that, though as I know our PMD doesn't implement it yet. > For that, we'll need to specify at least: > outer_l2_len, outer_l3_len, l2_len. > While PKT_TX_OUTER_* might stay cleared. If a VLAN flag has to be inserted in outer header, a new flag PKT_TX_OUTER_INSERT_VLAN would be added. So my specification would still be correct: The driver should look at mb->outer_lX_len only if a PKT_TX_OUTER_* flag is present. >> This is really confusing for an API. Moreover, you said >> it: it is slower that case 4). > > I don't know would be slower then 4) or not for sure. > That's my guess, based on the fact that for 9) we need to fill 2 descriptors, while fro 4) - only 1. > Though I didn't measure the difference. > That's actually one more reason why to allow and support it - > so people can make sure that on FVL both ways work as expected and measure the difference. > > Konstantin Regards, Olivier