DPDK patches and discussions
 help / color / mirror / Atom feed
From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: Olivier MATZ <olivier.matz@6wind.com>,
	"Liu, Jijiang" <jijiang.liu@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH v3 0/3] enhance TX checksum command and csum forwarding engine
Date: Tue, 20 Jan 2015 01:12:29 +0000
Message-ID: <2601191342CEEE43887BDE71AB977258213DDF46@irsmsx105.ger.corp.intel.com> (raw)
In-Reply-To: <54BD16F1.6050409@6wind.com>



> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> Sent: Monday, January 19, 2015 2:39 PM
> To: Ananyev, Konstantin; Liu, Jijiang
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 0/3] enhance TX checksum command and csum forwarding engine
> 
> Hi Konstantin,
> 
> On 01/19/2015 02:04 PM, Ananyev, Konstantin wrote:
> >> case 2) calculate checksum of out_ip and out_udp
> >>
> >>    mb->l2_len = len(out_eth)
> >>    mb->l3_len = len(out_ip)
> >>    mb->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CSUM | PKT_TX_UDP_CKSUM
> >>    set out_ip checksum to 0 in the packet
> >>    set out_udp checksum to pseudo header using rte_ipv4_phdr_cksum()
> >>
> >>    supported on hardware advertising DEV_TX_OFFLOAD_IPV4_CKSUM and
> >>    DEV_TX_OFFLOAD_UDP_CKSUM
> >>
> >>    *Problem 1*: The comment above PKT_TX_IPV4 says "Packet is IPv4
> >>    without requiring IP checksum offload" [2], and the help of L4
> >>    checksum and TSO says that it is required to set the PKT_TX_IPV4
> >>    flag [3]. This is not coherent.
> >
> > So what is the problem?
> > Comments in rte_mbuf.h are not coherent?
> 
> No there're not coherent

Ok, if the problem is just comments - let's fix it.

> 
> >>    We are back on the debate about the meaning of PKT_TX_IPV4 vs
> >>    PKT_TX_IP_CSUM from [4]. This incoherency in comments are introduced
> >>    by patch [5]. The question is "when an application should set
> >>    this flag? for any IP packet that does not require IP checksum?".
> >
> > Yes, if it is an IPv4 packet and application required TX offload for L4 checksum or TSO,
> > but doesn't want HW offload ofr IPV4 checksum calculation.
> >
> >>    This would break many applications.
> >
> > Which ones?
> > As I know, so far nothing is broken.
> 
> The problem today is that it's not obvious for a developper to
> know when an application should set the PKT_TX_IPV4 flag. From the
> comments, we could think that an application has to set it for any
> transmitted IP packet, even for packets that do not require tx
> offload. Asking to do this in the API would break many applications.
> 
> The comment should at least say that this flag is *only* required
> when asking for L4 checksum. As TSO implies IP checksum, it means the
> PKT_TX_IPV4 should not be set, but PKT_TX_IP_CSUM instead.

Ok, so the problem is in comments again?
If so, sure let's update them to make things clear.

> 
> >> I think a good definition would
> >>    be:
> >>
> >>    Packet is IPv4. This flag must be set when using any offload
> >>    feature (TSO, L3 or L4 checksum) to tell the NIC that the packet
> >>    is an IPv4 packet.
> >>
> >>    That's why I added PKT_TX_IPV4 in the examples.
> >
> > I suppose we discussed it several times: both ways are possible.
> > From PMD perspective - treating PKT_TX_IPV4 and PKT_TX_IP_CSUM
> > As mutually exclusive seems a bit more plausible.
> > From the upper layer - my understanding, that it is doesn't really matter.
> > I thought we had an agreement about it in 1.8, no?
> 
> Indeed, this was already discussed, but there was a lot of pressure
> for 1.8.0 to push something, even not perfect. The fog around comments
> shows that the API was not very clearly defined for 1.8.0. If you read
> the comments of the API, it is impossible to understand when the
> PKT_TX_IPV4 or PKT_TX_IP_CSUM flags must be set. I would even say
> more: the only place where the comments bring a valuable information
> (L4 checksum and TSO) describe the case where PKT_TX_IPV4 and
> PKT_TX_IP_CSUM are not exclusive...
> 
> So I will fix that in my coming patch series. Just for information,
> I'm pretty sure that having PKT_TX_IPV4 and PKT_TX_IP_CSUM as not
> exclusive flag would not require any change anywhere in the PMDs (even
> in i40e).

Right now - no.
Though as I said from PMD perspective having them exclusive is a bit preferable.
Again, I don't see any big difference from upper layer code.

> On the contrary, making them exclusive would require to
> change the ixgbe TSO code because we check PKT_TX_IPV4.

Hmm, so you are saying there is a bug somewhere  in ixbe_rxtx.c?
What particular place you are talking about?

> 
> >>    *Problem 3*: without using the word "fortville", it is difficult
> >>    to understand the goal of the flag PKT_TX_UDP_TUNNEL_PKT. Indeed,
> >>    once PKT_TX_OUTER_IPV4/6 is set, it looks obvious that it's a
> >>    tunnel packet. I suggest to remove the PKT_TX_UDP_TUNNEL_PKT
> >>    flag. In linux, the driver doesn't care about the tunnel type,
> >>    it always set I40E_TXD_CTX_UDP_TUNNELING for all encapsulations [6].
> >
> > It might be obvious that it is a tunnel packet from PKT_TX_OUTER_* is set,
> > but it is not obvious what type of tunnelling it would be.
> > FVL HW supports HW TX offloads for different type of tunnelling and
> > requires that SW provide information about tunnelling type.
> > From i40e datasheet:
> > L4TUNT L4 Tunneling Type (Teredo / GRE header / VXLAN header) indication:
> > 00b - No UDP / GRE tunneling (field must be set to zero if EIPT equals to zero)
> > 01b - UDP tunneling header (any UDP tunneling, VXLAN and Geneve).
> > 10b - GRE tunneling header
> > As we do plan to support other than UDP tunnelling types, I suppose we'll need to keep
> > PKT_TX_UDP_TUNNEL_PKT flag.
> 
> As I've said: in linux, the driver doesn't care about the tunnel type,
> it always set I40E_TXD_CTX_UDP_TUNNELING for all encapsulations.

Ok, and why it should be our problem?
We have a lot of things done in a different manner then linux/freebsd kernel drivers,
Why now it became a problem?

> However I suppose that linux driver is able to process the hw outer
> checksum even for other encapsulations (gre, ipip). 

No idea, never tried.

>And, does it mean
> that ipip tunnels are not supported by i40e? I can't believe it.

IPIP tunnels are supported.
>From the FVL spec, for IPIP L4TUNT and L4TUN should be set to 0.
You can read all that yourself:
http://www.intel.com/content/www/us/en/embedded/products/networking/xl710-10-40-controller-datasheet.html
Sections 8.4.2.2.1 and 8.4.4
No need to do a reverse engineering from linux i40e source code. 

> If it's the case... how an application on top of DPDK can know which
> tunnel types are supported by the underlying port?
> From what I've read, what the datasheet does not explain is:
> "what is done differently for this packet between setting the register
> to GRE (10b) or UDP (01b)?"

That's a good question. 
I don't know why HW requires that information, just follow the spec here.

> 
> >> case 7) calculate checksum of out_ip, out_udp, in_ip and in_tcp
> >>
> >>    mb->outer_l2_len = len(out_eth)
> >>    mb->outer_l3_len = len(out_ip)
> >>    mb->l2_len = len(out_udp + vxlan + in_eth)
> >>    mb->l3_len = len(in_ip)
> >>    mb->ol_flags |= PKT_TX_OUTER_IPV4 | PKT_TX_OUTER_IP_CKSUM | \
> >>      PKT_TX_OUTER_UDP_CKSUM | PKT_TX_IP_CKSUM |  PKT_TX_TCP_CKSUM;
> >>    set out_ip checksum to 0 in the packet
> >>    set out_udp checksum to pseudo header using rte_ipv4_phdr_cksum()
> >>    set in_ip checksum to 0 in the packet
> >>    set in_tcp checksum to pseudo header using rte_ipv4_phdr_cksum()
> >>
> >>   We need to add the flag PKT_TX_OUTER_UDP_CKSUM.
> >
> > We can, though right now we don't have a HW that is able to do that.
> > Why need to do it now?
> 
> No, I agree we should not add it now. I just want to be sure we have
> a consensus that it will work like this the day we'll have such drivers.

Ok.

> 
> >> I think the following cases should be *forbidden by the API*:
> >>
> >> case 9) calculate checksum of in_ip and in_tcp  (was case B.1 in [1])
> >>
> >>    mb->outer_l2_len = len(out_eth)
> >>    mb->outer_l3_len = len(out_ip)
> >>    mb->l2_len = len(out_udp + vxlan + in_eth)
> >>    mb->l3_len = len(out_ip)
> >>    mb->ol_flags |= PKT_TX_IPV4 | PKT_TX_UDP_TUNNEL_PKT | \
> >>      PKT_TX_IP_CSUM | PKT_TX_UDP_CKSUM;
> >>    set out_ip checksum to 0 in the packet
> >>    set out_udp checksum to pseudo header using rte_ipv4_phdr_cksum()
> >>
> >>    If we remove the flag PKT_TX_UDP_TUNNEL_PKT, this cannot be
> >>    supported, but there is no reason to support it as there is
> >>    already one way to do the same.
> >>
> >>    I think the driver should not even look at mb->outer_l2_len
> >>    and mb->outer_l3_len if no flag PKT_TX_OUTER_* is set.
> >
> > Why it should be forbidden?
> > I admit it might be a bit slower than case 4),
> > but I think absolutely legal way to setup HW offloads for inner L3/L4.
> > As I said we need a PKT_TX_UDP_TUNNEL_PKT anyway, so I suppose
> > PKT_TX_*_TUNNEL_PKT should be an indication is it a tunnel packet or not.
> > PKT_TX_OUTER_* flags indicate does outer cksum offload is required or not.
> 
> I don't understand. The result in terms of hardware is exactly the
> same than case 4). Why should we have 2 different ways for doing the
> same thing?

If HW supports that capability, why should we forbid it? 
Let user to choose himself what way to use.
FVL spec lists it as a valid approach.   
As one of possible use-cases:  HW VLAN tags insertion for both inner and outer packets.
FVL can do that, though as I know our PMD doesn't implement it yet.
For that, we'll need to specify at least:
outer_l2_len, outer_l3_len, l2_len.
While PKT_TX_OUTER_* might stay cleared.

> This is really confusing for an API. Moreover, you said
> it: it is slower that case 4).

I don't know would be slower then 4) or not for sure.
That's my guess, based on the fact that for 9) we need to fill 2 descriptors, while fro 4) - only 1.
Though I didn't measure the difference.
That's actually one more reason why to allow and support it -
so people can make sure that on FVL both ways work as expected and measure the difference.

Konstantin

> 
> It also seems easier to understand from an API point of view: the
> PMD uses mb->outer_lX_len if and only if a PKT_TX_OUTER_* flag
> is present.
> 
> >> case 10) calculate a checksum using only outer_lX fields
> >>
> >>    The outer_lX fields or PKT_TX_OUTER_* flags can only be used
> >>    if a inner checksum is enabled. So it's not possible to do
> >>    the following:
> >>
> >>    mb->outer_l2_len = len(out_eth)
> >>    mb->outer_l3_len = len(out_ip)
> >>    mb->ol_flags |= PKT_TX_OUTER_IPV4 | PKT_TX_OUTER_IP_CSUM
> >>    set out_ip checksum to 0 in the packet
> >
> > Ok, I think no one plans to use it anyway.
> >
> > Konstantin
> 
> Thanks Konstantin for taking the time to reply and progress on this.
> 
> Regards,
> Olivier

  reply	other threads:[~2015-01-20  1:12 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-12-10  1:03 Jijiang Liu
2014-12-10  1:03 ` [dpdk-dev] [PATCH v3 1/3] librte_ether:add outer IP offload capability flag Jijiang Liu
2014-12-11 10:33   ` Olivier MATZ
2014-12-10  1:03 ` [dpdk-dev] [PATCH v3 2/3] i40e:support outer IPv4 checksum capability Jijiang Liu
2014-12-11 10:34   ` Olivier MATZ
2014-12-10  1:03 ` [dpdk-dev] [PATCH v3 3/3] app/testpmd:change tx_checksum command and csum forwarding engine Jijiang Liu
2014-12-11 10:52   ` Olivier MATZ
2014-12-12  4:06     ` Liu, Jijiang
2014-12-11 10:17 ` [dpdk-dev] [PATCH v3 0/3] enhance TX checksum " Olivier MATZ
2014-12-12  3:48   ` Liu, Jijiang
2014-12-12 16:33     ` Olivier MATZ
2015-01-07  2:03       ` Liu, Jijiang
2015-01-07  9:59         ` Ananyev, Konstantin
2015-01-07 11:39           ` Liu, Jijiang
2015-01-07 12:07             ` Ananyev, Konstantin
2015-01-08  8:51               ` Liu, Jijiang
2015-01-08 10:54                 ` Ananyev, Konstantin
2015-01-09 10:45                   ` Olivier MATZ
2015-01-12  3:41                     ` Liu, Jijiang
2015-01-12 11:43                       ` Olivier MATZ
2015-01-13  3:04                         ` Liu, Jijiang
2015-01-13  9:55                           ` Olivier MATZ
2015-01-14  3:01                             ` Liu, Jijiang
2015-01-15 13:31                               ` Ananyev, Konstantin
2015-01-16 17:27                                 ` Olivier MATZ
2015-01-19 13:04                                   ` Ananyev, Konstantin
2015-01-19 14:38                                     ` Olivier MATZ
2015-01-20  1:12                                       ` Ananyev, Konstantin [this message]
2015-01-20 12:39                                         ` Olivier MATZ
2015-01-20 15:18                                           ` Thomas Monjalon
2015-01-20 17:10                                             ` Stephen Hemminger
2015-01-20 17:23                                           ` Ananyev, Konstantin
2015-01-20 18:15                                             ` Olivier MATZ
2015-01-21  3:12                                               ` Liu, Jijiang
2015-01-21 15:25                                                 ` Olivier MATZ
2015-01-21 16:28                                                   ` Ananyev, Konstantin
2015-01-21 17:13                                                     ` Olivier MATZ
2015-01-26  4:13                                                   ` Ananyev, Konstantin
2015-01-26  6:02                                                     ` Liu, Jijiang
2015-01-26 14:07                                                       ` Olivier MATZ
2015-01-26 14:15                                                         ` Ananyev, Konstantin
2015-01-27  8:34                                                           ` Olivier MATZ
2015-01-27 15:26                                                             ` Ananyev, Konstantin
2015-01-21 19:44                                                 ` Stephen Hemminger
2015-01-22  1:40                                                   ` Liu, Jijiang
2015-01-21  8:01                                               ` Liu, Jijiang
2015-01-21  9:10                                                 ` Olivier MATZ
2015-01-21 11:52                                                   ` Ananyev, Konstantin
2015-01-07 13:06 ` Qiu, Michael

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2601191342CEEE43887BDE71AB977258213DDF46@irsmsx105.ger.corp.intel.com \
    --to=konstantin.ananyev@intel.com \
    --cc=dev@dpdk.org \
    --cc=jijiang.liu@intel.com \
    --cc=olivier.matz@6wind.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

DPDK patches and discussions

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://inbox.dpdk.org/dev/0 dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dev dev/ https://inbox.dpdk.org/dev \
		dev@dpdk.org
	public-inbox-index dev

Example config snippet for mirrors.
Newsgroup available over NNTP:
	nntp://inbox.dpdk.org/inbox.dpdk.dev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git