From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-we0-f177.google.com (mail-we0-f177.google.com [74.125.82.177]) by dpdk.org (Postfix) with ESMTP id 618EA5A76 for ; Tue, 20 Jan 2015 19:15:52 +0100 (CET) Received: by mail-we0-f177.google.com with SMTP id l61so17846792wev.8 for ; Tue, 20 Jan 2015 10:15:52 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:message-id:date:from:user-agent:mime-version:to :cc:subject:references:in-reply-to:content-type :content-transfer-encoding; bh=rIw+xxYEqGIlk5i7rW6+7n/G2XDKWwQ5ViwLAM9rVpQ=; b=KJYkaGYpws8Fsk49Hkc64azn4MvGcyID/v+0DS3XOxVy3Aeq3OM4SJxtPrUHD1bsQG n0+K1fa5pnY/PA/xONUs0XUp/z3PSiZzxv99RaO1/rvf5tiF24JfGxcVL77K70WZQYnQ pDJXocAAXzTQG1at7du+kD9a7vwUtCbubAUC+Hfgfo8xqdMHQ2aeXA5GLg9Bw4nzV2s5 blJjh6AmAqkw8fGIjLIoOqtJ6up3CiXh6f1bxJdoTmG3ozllBvGWjyYazjtExh0DPur0 9LdQj8yC3BoQy2sYpJNm9vuqdP1ROlSRcEPFSCBhW0+n/31PlL3aoqupPRhXUywyhmfY Y9jA== X-Gm-Message-State: ALoCoQm5MddFwLlIFrS0XDrau8eWl3f8QqszhHaEnEKpjZVIS9jG4XeK5ix5EQw8PUKA8kDcKVbI X-Received: by 10.180.87.42 with SMTP id u10mr21635471wiz.51.1421777752207; Tue, 20 Jan 2015 10:15:52 -0800 (PST) Received: from [10.16.0.195] (guy78-3-82-239-227-177.fbx.proxad.net. [82.239.227.177]) by mx.google.com with ESMTPSA id a14sm3976885wib.22.2015.01.20.10.15.51 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 20 Jan 2015 10:15:51 -0800 (PST) Message-ID: <54BE9B56.7050108@6wind.com> Date: Tue, 20 Jan 2015 19:15:50 +0100 From: Olivier MATZ User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Icedove/31.2.0 MIME-Version: 1.0 To: "Ananyev, Konstantin" , "Liu, Jijiang" References: <1418173403-30202-1-git-send-email-jijiang.liu@intel.com> <2601191342CEEE43887BDE71AB977258213D34AE@irsmsx105.ger.corp.intel.com> <1ED644BD7E0A5F4091CF203DAFB8E4CC01DA7CC5@SHSMSX101.ccr.corp.intel.com> <2601191342CEEE43887BDE71AB977258213D3897@irsmsx105.ger.corp.intel.com> <54AFB13E.2080200@6wind.com> <1ED644BD7E0A5F4091CF203DAFB8E4CC01DA85A1@SHSMSX101.ccr.corp.intel.com> <54B3B35A.5030803@6wind.com> <1ED644BD7E0A5F4091CF203DAFB8E4CC01DA8E36@SHSMSX101.ccr.corp.intel.com> <54B4EB92.40209@6wind.com> <1ED644BD7E0A5F4091CF203DAFB8E4CC01DB0789@SHSMSX101.ccr.corp.intel.com> <2601191342CEEE43887BDE71AB977258213D4FCF@irsmsx105.ger.corp.intel.com> <54B94A18.5030700@6wind.com> <2601191342CEEE43887BDE71AB977258213DCD25@irsmsx105.ger.corp.intel.com> <54BD16F1.6050409@6wind.com> <2601191342CEEE43887BDE71AB977258213DDF46@irsmsx105.ger.corp.intel.com> <54BE4C70.7050406@6wind.com> <2601191342CEEE43887BDE71AB977258213DE5FB@irsmsx105.ger.corp.intel.com> In-Reply-To: <2601191342CEEE43887BDE71AB977258213DE5FB@irsmsx105.ger.corp.intel.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Cc: "dev@dpdk.org" Subject: Re: [dpdk-dev] [PATCH v3 0/3] enhance TX checksum command and csum forwarding engine X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 20 Jan 2015 18:15:52 -0000 Hi Konstantin, On 01/20/2015 06:23 PM, Ananyev, Konstantin wrote: >> Sure, it does not make a big difference in terms of code. But >> in terms of API, the naming of the flag is coherent to what it is >> used for. And it's easier to find a simple definition, like: >> >> * Packet is IPv4. This flag must be set when using any offload feature >> * (TSO, L3 or L4 checksum) to tell the NIC that the packet is an IPv4 >> * packet. > > Ok, and what's wrong with: > "Packet is IPv4. This flag must be set when using any offload feature > (TSO, L3 or L4 checksum) to tell the NIC that the packet is an IPv4 > packet and no HW offload for IPv4 header checksum calculation is required" > ? I honestly find the first one simpler. Again, I understand both are possible, but I think choosing the most trivial one is the right way for an API. >>> Ok, and why it should be our problem? >>> We have a lot of things done in a different manner then linux/freebsd kernel drivers, >>> Why now it became a problem? >> >> If linux doesn't need an equivalent flag for doing the same thing, >> it probably means we don't need it either. > > Probably yes .... Or probably not. > Why do we need to guess what was the intention of guys who wrote that part of linux driver? Because the dpdk looks very similar to that part of linux driver. > BTW, the macro for GRE is here: > find lib/librte_pmd_i40e/i40e -type f | xargs grep TUN | grep TXD > lib/librte_pmd_i40e/i40e/i40e_type.h:#define I40E_TXD_CTX_UDP_TUNNELING (0x1ULL << I40E_TXD_CTX_QW0_NATT_SHIFT) > lib/librte_pmd_i40e/i40e/i40e_type.h:#define I40E_TXD_CTX_GRE_TUNNELING (0x2ULL << I40E_TXD_CTX_QW0_NATT_SHIFT) > > Though it not used (yet?) by some reason. > >> >> In a performance-oriented software like dpdk, having a flag that we >> don't know what the hardware does with, that is not needed in other >> drivers of the same harware, that makes the API harder to understand >> could be a problem. > > Here is a HW spec, that says what values have to be setup for L4TUNT. > Yes, I am not sure why they need to distinguish between VXLAN/GRE tunnelling. > Though, I suppose that wouldn't eliminate the requirement. > But for same, there is no good explanation why FVL HW need to know that it is IPv4 or IPv6 packet, > in the case when only L4 checksum offload is required (IIPT field). > Niantic, as I remember, is able to work ok without that requirement. > Though, we still have to set it up. > >> Another argument: if we can remove this flag, it would make the >> testpmd commands reworkd proposed by Jijiang much more easy to >> understand: only a new "csum parse-tunnel on|off" would be required, >> and it can be explained in a few words. > > Well, from my point - testpmd commands that Jijiang proposed are perfectly clear and understandable. > Another thing, as I remember, our primary concern should be public API, no testpmd. OK let's talk about testpmd later. >> We should avoid the need to specify the tunnel type in the OUTER >> checksum API if we can, else it would limit us to specific >> supported protocols. > > From the FVL spec it is required by HW, it is not what we introducing on our own. > Spec stays explicitly that L4TUNT (L4 tunneling type) has to be setup for tunnelling packets. > Again from the spec, there are 3 different values it can take. > If you have an idea how to pass that information to PMD without using flags, sure we can consider it. > >> >>>>>> I think the following cases should be *forbidden by the API*: >>>>>> >>>>>> case 9) calculate checksum of in_ip and in_tcp (was case B.1 in [1]) >>>>>> >>>>>> mb->outer_l2_len = len(out_eth) >>>>>> mb->outer_l3_len = len(out_ip) >>>>>> mb->l2_len = len(out_udp + vxlan + in_eth) >>>>>> mb->l3_len = len(out_ip) >>>>>> mb->ol_flags |= PKT_TX_IPV4 | PKT_TX_UDP_TUNNEL_PKT | \ >>>>>> PKT_TX_IP_CSUM | PKT_TX_UDP_CKSUM; >>>>>> set out_ip checksum to 0 in the packet >>>>>> set out_udp checksum to pseudo header using rte_ipv4_phdr_cksum() >>>>>> >>>>>> If we remove the flag PKT_TX_UDP_TUNNEL_PKT, this cannot be >>>>>> supported, but there is no reason to support it as there is >>>>>> already one way to do the same. >>>>>> >>>>>> I think the driver should not even look at mb->outer_l2_len >>>>>> and mb->outer_l3_len if no flag PKT_TX_OUTER_* is set. >>>>> >>>>> Why it should be forbidden? >>>>> I admit it might be a bit slower than case 4), >>>>> but I think absolutely legal way to setup HW offloads for inner L3/L4. >>>>> As I said we need a PKT_TX_UDP_TUNNEL_PKT anyway, so I suppose >>>>> PKT_TX_*_TUNNEL_PKT should be an indication is it a tunnel packet or not. >>>>> PKT_TX_OUTER_* flags indicate does outer cksum offload is required or not. >>>> >>>> I don't understand. The result in terms of hardware is exactly the >>>> same than case 4). Why should we have 2 different ways for doing the >>>> same thing? >>> >>> If HW supports that capability, why should we forbid it? >>> Let user to choose himself what way to use. >>> FVL spec lists it as a valid approach. >> >> It is not a hardware feature. > > It is. > >> Case 4) and case 9) would fill the hardware registers exactly the same. > > No, they wouldn't. > Please read corresponding section of FVL spec and i40e_rxtx.c > For case 4) we only need to setup TDD (TX data descriptor) with the following values: > IIPT, IPLEN, L4T, L4LEN > For case 9) we need to setup both TDD and TCD (TX context descriptor) with the following values: > TDD: IIPT, IPLEN, L4T, L4LEN > TCD: EIPT, EIPLEN, L4TUNT, L4TUNLEN > >> To me, it's just an API question. > > No, it is not. > > I still don't understand why you are so eager to 'forbid' it. > Yes we support it for FVL, but no one forces you to use it. Well, how would you describe this 2 ways of doing the same thing in the offload API? Would you talk about the i40e registers? It's not because i40e has 2 ways to do the same operation that the DPDK should do the same. How will you explain to a user how to choose between these 2 cases? Having to support these 2 different cases for the same thing will complexify all future drivers that will not work the same way than i40e. >>> As one of possible use-cases: HW VLAN tags insertion for both inner and outer packets. >>> FVL can do that, though as I know our PMD doesn't implement it yet. >>> For that, we'll need to specify at least: >>> outer_l2_len, outer_l3_len, l2_len. >>> While PKT_TX_OUTER_* might stay cleared. >> >> If a VLAN flag has to be inserted in outer header, a new flag >> PKT_TX_OUTER_INSERT_VLAN would be added. So my specification >> would still be correct: >> >> The driver should look at mb->outer_lX_len only if a >> PKT_TX_OUTER_* flag is present. >> > > Introducing PKT_TX_OUTER_INSERT_VLAN is ok. > Though I still think we'll need TX_*_TUNNEL flags and no need to 'forbid' case 9). > BTW, as I can see linux i40e driver for tunnelling packets uses case 9), not case 4), right? I need to check this. Regards, Olivier