DPDK patches and discussions
 help / color / mirror / Atom feed
From: Olivier MATZ <olivier.matz@6wind.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
	 "Liu, Jijiang" <jijiang.liu@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH v3 0/3] enhance TX checksum command and csum forwarding engine
Date: Tue, 20 Jan 2015 13:39:12 +0100	[thread overview]
Message-ID: <54BE4C70.7050406@6wind.com> (raw)
In-Reply-To: <2601191342CEEE43887BDE71AB977258213DDF46@irsmsx105.ger.corp.intel.com>

Hi,

On 01/20/2015 02:12 AM, Ananyev, Konstantin wrote:
>>>> I think a good definition would
>>>>     be:
>>>>
>>>>     Packet is IPv4. This flag must be set when using any offload
>>>>     feature (TSO, L3 or L4 checksum) to tell the NIC that the packet
>>>>     is an IPv4 packet.
>>>>
>>>>     That's why I added PKT_TX_IPV4 in the examples.
>>>
>>> I suppose we discussed it several times: both ways are possible.
>>>  From PMD perspective - treating PKT_TX_IPV4 and PKT_TX_IP_CSUM
>>> As mutually exclusive seems a bit more plausible.
>>>  From the upper layer - my understanding, that it is doesn't really matter.
>>> I thought we had an agreement about it in 1.8, no?
>>
>> Indeed, this was already discussed, but there was a lot of pressure
>> for 1.8.0 to push something, even not perfect. The fog around comments
>> shows that the API was not very clearly defined for 1.8.0. If you read
>> the comments of the API, it is impossible to understand when the
>> PKT_TX_IPV4 or PKT_TX_IP_CSUM flags must be set. I would even say
>> more: the only place where the comments bring a valuable information
>> (L4 checksum and TSO) describe the case where PKT_TX_IPV4 and
>> PKT_TX_IP_CSUM are not exclusive...
>>
>> So I will fix that in my coming patch series. Just for information,
>> I'm pretty sure that having PKT_TX_IPV4 and PKT_TX_IP_CSUM as not
>> exclusive flag would not require any change anywhere in the PMDs (even
>> in i40e).
>
> Right now - no.
> Though as I said from PMD perspective having them exclusive is a bit preferable.
> Again, I don't see any big difference from upper layer code.

Sure, it does not make a big difference in terms of code. But
in terms of API, the naming of the flag is coherent to what it is
used for. And it's easier to find a simple definition, like:

  * Packet is IPv4. This flag must be set when using any offload feature
  * (TSO, L3 or L4 checksum) to tell the NIC that the packet is an IPv4
  * packet.

>> On the contrary, making them exclusive would require to
>> change the ixgbe TSO code because we check.
>
> Hmm, so you are saying there is a bug somewhere  in ixbe_rxtx.c?
> What particular place you are talking about?

Sorry, I spoke too fast. In TSO code, we check PKT_TX_IP_CKSUM (and not
PKT_TX_IPV4 as I thought), so it would work for both methods without
patching the code.

In this case, it means that both approach would not require to
modify the code.

>>>>     *Problem 3*: without using the word "fortville", it is difficult
>>>>     to understand the goal of the flag PKT_TX_UDP_TUNNEL_PKT. Indeed,
>>>>     once PKT_TX_OUTER_IPV4/6 is set, it looks obvious that it's a
>>>>     tunnel packet. I suggest to remove the PKT_TX_UDP_TUNNEL_PKT
>>>>     flag. In linux, the driver doesn't care about the tunnel type,
>>>>     it always set I40E_TXD_CTX_UDP_TUNNELING for all encapsulations [6].
>>>
>>> It might be obvious that it is a tunnel packet from PKT_TX_OUTER_* is set,
>>> but it is not obvious what type of tunnelling it would be.
>>> FVL HW supports HW TX offloads for different type of tunnelling and
>>> requires that SW provide information about tunnelling type.
>>>  From i40e datasheet:
>>> L4TUNT L4 Tunneling Type (Teredo / GRE header / VXLAN header) indication:
>>> 00b - No UDP / GRE tunneling (field must be set to zero if EIPT equals to zero)
>>> 01b - UDP tunneling header (any UDP tunneling, VXLAN and Geneve).
>>> 10b - GRE tunneling header
>>> As we do plan to support other than UDP tunnelling types, I suppose we'll need to keep
>>> PKT_TX_UDP_TUNNEL_PKT flag.
>>
>> As I've said: in linux, the driver doesn't care about the tunnel type,
>> it always set I40E_TXD_CTX_UDP_TUNNELING for all encapsulations.
>
> Ok, and why it should be our problem?
> We have a lot of things done in a different manner then linux/freebsd kernel drivers,
> Why now it became a problem?

If linux doesn't need an equivalent flag for doing the same thing,
it probably means we don't need it either.

In a performance-oriented software like dpdk, having a flag that we
don't know what the hardware does with, that is not needed in other
drivers of the same harware, that makes the API harder to understand
could be a problem.

Another argument: if we can remove this flag, it would make the
testpmd commands reworkd proposed by Jijiang much more easy to
understand: only a new "csum parse-tunnel on|off" would be required,
and it can be explained in a few words.

I'll try to do some tests on a fortville NIC if I can find one. I'm
curious to see if we can transmit any encapsulation packet (ip in ip,
ip in gre, eth in gre, eth in vxlan, or even a proprietary tunnel).

We should avoid the need to specify the tunnel type in the OUTER
checksum API if we can, else it would limit us to specific
supported protocols.

>>>> I think the following cases should be *forbidden by the API*:
>>>>
>>>> case 9) calculate checksum of in_ip and in_tcp  (was case B.1 in [1])
>>>>
>>>>     mb->outer_l2_len = len(out_eth)
>>>>     mb->outer_l3_len = len(out_ip)
>>>>     mb->l2_len = len(out_udp + vxlan + in_eth)
>>>>     mb->l3_len = len(out_ip)
>>>>     mb->ol_flags |= PKT_TX_IPV4 | PKT_TX_UDP_TUNNEL_PKT | \
>>>>       PKT_TX_IP_CSUM | PKT_TX_UDP_CKSUM;
>>>>     set out_ip checksum to 0 in the packet
>>>>     set out_udp checksum to pseudo header using rte_ipv4_phdr_cksum()
>>>>
>>>>     If we remove the flag PKT_TX_UDP_TUNNEL_PKT, this cannot be
>>>>     supported, but there is no reason to support it as there is
>>>>     already one way to do the same.
>>>>
>>>>     I think the driver should not even look at mb->outer_l2_len
>>>>     and mb->outer_l3_len if no flag PKT_TX_OUTER_* is set.
>>>
>>> Why it should be forbidden?
>>> I admit it might be a bit slower than case 4),
>>> but I think absolutely legal way to setup HW offloads for inner L3/L4.
>>> As I said we need a PKT_TX_UDP_TUNNEL_PKT anyway, so I suppose
>>> PKT_TX_*_TUNNEL_PKT should be an indication is it a tunnel packet or not.
>>> PKT_TX_OUTER_* flags indicate does outer cksum offload is required or not.
>>
>> I don't understand. The result in terms of hardware is exactly the
>> same than case 4). Why should we have 2 different ways for doing the
>> same thing?
>
> If HW supports that capability, why should we forbid it?
> Let user to choose himself what way to use.
> FVL spec lists it as a valid approach.

It is not a hardware feature. Case 4) and case 9) would fill the
hardware registers exactly the same. To me, it's just an API question.

> As one of possible use-cases:  HW VLAN tags insertion for both inner and outer packets.
> FVL can do that, though as I know our PMD doesn't implement it yet.
> For that, we'll need to specify at least:
> outer_l2_len, outer_l3_len, l2_len.
> While PKT_TX_OUTER_* might stay cleared.

If a VLAN flag has to be inserted in outer header, a new flag
PKT_TX_OUTER_INSERT_VLAN would be added. So my specification
would still be correct:

   The driver should look at mb->outer_lX_len only if a
   PKT_TX_OUTER_* flag is present.

>> This is really confusing for an API. Moreover, you said
>> it: it is slower that case 4).
>
> I don't know would be slower then 4) or not for sure.
> That's my guess, based on the fact that for 9) we need to fill 2 descriptors, while fro 4) - only 1.
> Though I didn't measure the difference.
> That's actually one more reason why to allow and support it -
> so people can make sure that on FVL both ways work as expected and measure the difference.
>
> Konstantin


Regards,
Olivier

  reply	other threads:[~2015-01-20 12:39 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-12-10  1:03 Jijiang Liu
2014-12-10  1:03 ` [dpdk-dev] [PATCH v3 1/3] librte_ether:add outer IP offload capability flag Jijiang Liu
2014-12-11 10:33   ` Olivier MATZ
2014-12-10  1:03 ` [dpdk-dev] [PATCH v3 2/3] i40e:support outer IPv4 checksum capability Jijiang Liu
2014-12-11 10:34   ` Olivier MATZ
2014-12-10  1:03 ` [dpdk-dev] [PATCH v3 3/3] app/testpmd:change tx_checksum command and csum forwarding engine Jijiang Liu
2014-12-11 10:52   ` Olivier MATZ
2014-12-12  4:06     ` Liu, Jijiang
2014-12-11 10:17 ` [dpdk-dev] [PATCH v3 0/3] enhance TX checksum " Olivier MATZ
2014-12-12  3:48   ` Liu, Jijiang
2014-12-12 16:33     ` Olivier MATZ
2015-01-07  2:03       ` Liu, Jijiang
2015-01-07  9:59         ` Ananyev, Konstantin
2015-01-07 11:39           ` Liu, Jijiang
2015-01-07 12:07             ` Ananyev, Konstantin
2015-01-08  8:51               ` Liu, Jijiang
2015-01-08 10:54                 ` Ananyev, Konstantin
2015-01-09 10:45                   ` Olivier MATZ
2015-01-12  3:41                     ` Liu, Jijiang
2015-01-12 11:43                       ` Olivier MATZ
2015-01-13  3:04                         ` Liu, Jijiang
2015-01-13  9:55                           ` Olivier MATZ
2015-01-14  3:01                             ` Liu, Jijiang
2015-01-15 13:31                               ` Ananyev, Konstantin
2015-01-16 17:27                                 ` Olivier MATZ
2015-01-19 13:04                                   ` Ananyev, Konstantin
2015-01-19 14:38                                     ` Olivier MATZ
2015-01-20  1:12                                       ` Ananyev, Konstantin
2015-01-20 12:39                                         ` Olivier MATZ [this message]
2015-01-20 15:18                                           ` Thomas Monjalon
2015-01-20 17:10                                             ` Stephen Hemminger
2015-01-20 17:23                                           ` Ananyev, Konstantin
2015-01-20 18:15                                             ` Olivier MATZ
2015-01-21  3:12                                               ` Liu, Jijiang
2015-01-21 15:25                                                 ` Olivier MATZ
2015-01-21 16:28                                                   ` Ananyev, Konstantin
2015-01-21 17:13                                                     ` Olivier MATZ
2015-01-26  4:13                                                   ` Ananyev, Konstantin
2015-01-26  6:02                                                     ` Liu, Jijiang
2015-01-26 14:07                                                       ` Olivier MATZ
2015-01-26 14:15                                                         ` Ananyev, Konstantin
2015-01-27  8:34                                                           ` Olivier MATZ
2015-01-27 15:26                                                             ` Ananyev, Konstantin
2015-01-21 19:44                                                 ` Stephen Hemminger
2015-01-22  1:40                                                   ` Liu, Jijiang
2015-01-21  8:01                                               ` Liu, Jijiang
2015-01-21  9:10                                                 ` Olivier MATZ
2015-01-21 11:52                                                   ` Ananyev, Konstantin
2015-01-07 13:06 ` Qiu, Michael

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54BE4C70.7050406@6wind.com \
    --to=olivier.matz@6wind.com \
    --cc=dev@dpdk.org \
    --cc=jijiang.liu@intel.com \
    --cc=konstantin.ananyev@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).