DPDK patches and discussions
 help / color / mirror / Atom feed
From: Olivier MATZ <olivier.matz@6wind.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
	 "Liu, Jijiang" <jijiang.liu@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH v3 0/3] enhance TX checksum command and csum forwarding engine
Date: Tue, 20 Jan 2015 19:15:50 +0100
Message-ID: <54BE9B56.7050108@6wind.com> (raw)
In-Reply-To: <2601191342CEEE43887BDE71AB977258213DE5FB@irsmsx105.ger.corp.intel.com>

Hi Konstantin,

On 01/20/2015 06:23 PM, Ananyev, Konstantin wrote:
>> Sure, it does not make a big difference in terms of code. But
>> in terms of API, the naming of the flag is coherent to what it is
>> used for. And it's easier to find a simple definition, like:
>>
>>    * Packet is IPv4. This flag must be set when using any offload feature
>>    * (TSO, L3 or L4 checksum) to tell the NIC that the packet is an IPv4
>>    * packet.
>
> Ok, and what's wrong with:
> "Packet is IPv4. This flag must be set when using any offload feature
> (TSO, L3 or L4 checksum) to tell the NIC that the packet is an IPv4
> packet and no HW offload for IPv4 header checksum calculation is required"
> ?

I honestly find the first one simpler.

Again, I understand both are possible, but I think choosing the
most trivial one is the right way for an API.


>>> Ok, and why it should be our problem?
>>> We have a lot of things done in a different manner then linux/freebsd kernel drivers,
>>> Why now it became a problem?
>>
>> If linux doesn't need an equivalent flag for doing the same thing,
>> it probably means we don't need it either.
>
> Probably yes .... Or probably not.
> Why do we need to guess what was the intention of guys who wrote that part of linux driver?

Because the dpdk looks very similar to that part of linux driver.

> BTW, the macro for GRE is here:
> find lib/librte_pmd_i40e/i40e -type f | xargs grep TUN | grep TXD
> lib/librte_pmd_i40e/i40e/i40e_type.h:#define I40E_TXD_CTX_UDP_TUNNELING (0x1ULL << I40E_TXD_CTX_QW0_NATT_SHIFT)
> lib/librte_pmd_i40e/i40e/i40e_type.h:#define I40E_TXD_CTX_GRE_TUNNELING (0x2ULL << I40E_TXD_CTX_QW0_NATT_SHIFT)
>
> Though it not used (yet?) by some reason.
>
>>
>> In a performance-oriented software like dpdk, having a flag that we
>> don't know what the hardware does with, that is not needed in other
>> drivers of the same harware, that makes the API harder to understand
>> could be a problem.
>
> Here is a HW spec, that says what values have to be setup for L4TUNT.
> Yes, I am not sure why they need to distinguish between VXLAN/GRE tunnelling.
> Though, I suppose that wouldn't eliminate the requirement.
> But for same, there is no good explanation why FVL HW need to know that it is IPv4 or IPv6 packet,
> in the case when only L4 checksum offload is required (IIPT field).
> Niantic, as I remember, is able to work ok without that requirement.
> Though, we still have to set it up.
>
>> Another argument: if we can remove this flag, it would make the
>> testpmd commands reworkd proposed by Jijiang much more easy to
>> understand: only a new "csum parse-tunnel on|off" would be required,
>> and it can be explained in a few words.
>
> Well, from my point - testpmd commands that Jijiang proposed are perfectly clear and understandable.
> Another thing, as I remember, our primary concern should be public API, no testpmd.

OK let's talk about testpmd later.

>> We should avoid the need to specify the tunnel type in the OUTER
>> checksum API if we can, else it would limit us to specific
>> supported protocols.
>
>  From the FVL spec it is required by HW, it is not what we introducing on our own.
> Spec stays explicitly that L4TUNT (L4 tunneling type) has to be setup for tunnelling packets.
> Again from the spec, there are 3 different values it can take.
> If you have an idea how to pass that information to  PMD without using flags, sure we can consider it.
>
>>
>>>>>> I think the following cases should be *forbidden by the API*:
>>>>>>
>>>>>> case 9) calculate checksum of in_ip and in_tcp  (was case B.1 in [1])
>>>>>>
>>>>>>      mb->outer_l2_len = len(out_eth)
>>>>>>      mb->outer_l3_len = len(out_ip)
>>>>>>      mb->l2_len = len(out_udp + vxlan + in_eth)
>>>>>>      mb->l3_len = len(out_ip)
>>>>>>      mb->ol_flags |= PKT_TX_IPV4 | PKT_TX_UDP_TUNNEL_PKT | \
>>>>>>        PKT_TX_IP_CSUM | PKT_TX_UDP_CKSUM;
>>>>>>      set out_ip checksum to 0 in the packet
>>>>>>      set out_udp checksum to pseudo header using rte_ipv4_phdr_cksum()
>>>>>>
>>>>>>      If we remove the flag PKT_TX_UDP_TUNNEL_PKT, this cannot be
>>>>>>      supported, but there is no reason to support it as there is
>>>>>>      already one way to do the same.
>>>>>>
>>>>>>      I think the driver should not even look at mb->outer_l2_len
>>>>>>      and mb->outer_l3_len if no flag PKT_TX_OUTER_* is set.
>>>>>
>>>>> Why it should be forbidden?
>>>>> I admit it might be a bit slower than case 4),
>>>>> but I think absolutely legal way to setup HW offloads for inner L3/L4.
>>>>> As I said we need a PKT_TX_UDP_TUNNEL_PKT anyway, so I suppose
>>>>> PKT_TX_*_TUNNEL_PKT should be an indication is it a tunnel packet or not.
>>>>> PKT_TX_OUTER_* flags indicate does outer cksum offload is required or not.
>>>>
>>>> I don't understand. The result in terms of hardware is exactly the
>>>> same than case 4). Why should we have 2 different ways for doing the
>>>> same thing?
>>>
>>> If HW supports that capability, why should we forbid it?
>>> Let user to choose himself what way to use.
>>> FVL spec lists it as a valid approach.
>>
>> It is not a hardware feature.
>
> It is.
>
>> Case 4) and case 9) would fill the hardware registers exactly the same.
>
> No, they wouldn't.
> Please read corresponding section of FVL spec and i40e_rxtx.c
> For case 4) we only need to setup TDD (TX data descriptor) with the following values:
> IIPT, IPLEN, L4T, L4LEN
> For case 9) we need to setup both TDD and TCD (TX context descriptor) with the following values:
> TDD: IIPT, IPLEN, L4T, L4LEN
> TCD: EIPT, EIPLEN,  L4TUNT, L4TUNLEN
>
>> To me, it's just an API question.
>
> No, it is not.
>
> I still don't understand why you are so eager to 'forbid' it.
> Yes we support it for FVL, but no one forces you to use it.

Well, how would you describe this 2 ways of doing the same thing
in the offload API? Would you talk about the i40e registers? It's not
because i40e has 2 ways to do the same operation that the DPDK should
do the same.

How will you explain to a user how to choose between these 2 cases?

Having to support these 2 different cases for the same thing will
complexify all future drivers that will not work the same way than i40e.

>>> As one of possible use-cases:  HW VLAN tags insertion for both inner and outer packets.
>>> FVL can do that, though as I know our PMD doesn't implement it yet.
>>> For that, we'll need to specify at least:
>>> outer_l2_len, outer_l3_len, l2_len.
>>> While PKT_TX_OUTER_* might stay cleared.
>>
>> If a VLAN flag has to be inserted in outer header, a new flag
>> PKT_TX_OUTER_INSERT_VLAN would be added. So my specification
>> would still be correct:
>>
>>     The driver should look at mb->outer_lX_len only if a
>>     PKT_TX_OUTER_* flag is present.
>>
>
> Introducing PKT_TX_OUTER_INSERT_VLAN is ok.
> Though I still think we'll need TX_*_TUNNEL flags and no need to 'forbid' case 9).
> BTW, as I can see linux i40e driver for tunnelling packets uses case 9), not case 4), right?

I need to check this.

Regards,
Olivier

  reply	other threads:[~2015-01-20 18:15 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-12-10  1:03 Jijiang Liu
2014-12-10  1:03 ` [dpdk-dev] [PATCH v3 1/3] librte_ether:add outer IP offload capability flag Jijiang Liu
2014-12-11 10:33   ` Olivier MATZ
2014-12-10  1:03 ` [dpdk-dev] [PATCH v3 2/3] i40e:support outer IPv4 checksum capability Jijiang Liu
2014-12-11 10:34   ` Olivier MATZ
2014-12-10  1:03 ` [dpdk-dev] [PATCH v3 3/3] app/testpmd:change tx_checksum command and csum forwarding engine Jijiang Liu
2014-12-11 10:52   ` Olivier MATZ
2014-12-12  4:06     ` Liu, Jijiang
2014-12-11 10:17 ` [dpdk-dev] [PATCH v3 0/3] enhance TX checksum " Olivier MATZ
2014-12-12  3:48   ` Liu, Jijiang
2014-12-12 16:33     ` Olivier MATZ
2015-01-07  2:03       ` Liu, Jijiang
2015-01-07  9:59         ` Ananyev, Konstantin
2015-01-07 11:39           ` Liu, Jijiang
2015-01-07 12:07             ` Ananyev, Konstantin
2015-01-08  8:51               ` Liu, Jijiang
2015-01-08 10:54                 ` Ananyev, Konstantin
2015-01-09 10:45                   ` Olivier MATZ
2015-01-12  3:41                     ` Liu, Jijiang
2015-01-12 11:43                       ` Olivier MATZ
2015-01-13  3:04                         ` Liu, Jijiang
2015-01-13  9:55                           ` Olivier MATZ
2015-01-14  3:01                             ` Liu, Jijiang
2015-01-15 13:31                               ` Ananyev, Konstantin
2015-01-16 17:27                                 ` Olivier MATZ
2015-01-19 13:04                                   ` Ananyev, Konstantin
2015-01-19 14:38                                     ` Olivier MATZ
2015-01-20  1:12                                       ` Ananyev, Konstantin
2015-01-20 12:39                                         ` Olivier MATZ
2015-01-20 15:18                                           ` Thomas Monjalon
2015-01-20 17:10                                             ` Stephen Hemminger
2015-01-20 17:23                                           ` Ananyev, Konstantin
2015-01-20 18:15                                             ` Olivier MATZ [this message]
2015-01-21  3:12                                               ` Liu, Jijiang
2015-01-21 15:25                                                 ` Olivier MATZ
2015-01-21 16:28                                                   ` Ananyev, Konstantin
2015-01-21 17:13                                                     ` Olivier MATZ
2015-01-26  4:13                                                   ` Ananyev, Konstantin
2015-01-26  6:02                                                     ` Liu, Jijiang
2015-01-26 14:07                                                       ` Olivier MATZ
2015-01-26 14:15                                                         ` Ananyev, Konstantin
2015-01-27  8:34                                                           ` Olivier MATZ
2015-01-27 15:26                                                             ` Ananyev, Konstantin
2015-01-21 19:44                                                 ` Stephen Hemminger
2015-01-22  1:40                                                   ` Liu, Jijiang
2015-01-21  8:01                                               ` Liu, Jijiang
2015-01-21  9:10                                                 ` Olivier MATZ
2015-01-21 11:52                                                   ` Ananyev, Konstantin
2015-01-07 13:06 ` Qiu, Michael

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54BE9B56.7050108@6wind.com \
    --to=olivier.matz@6wind.com \
    --cc=dev@dpdk.org \
    --cc=jijiang.liu@intel.com \
    --cc=konstantin.ananyev@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

DPDK patches and discussions

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://inbox.dpdk.org/dev/0 dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dev dev/ https://inbox.dpdk.org/dev \
		dev@dpdk.org
	public-inbox-index dev

Example config snippet for mirrors.
Newsgroup available over NNTP:
	nntp://inbox.dpdk.org/inbox.dpdk.dev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git