DPDK patches and discussions
From: Olivier MATZ <olivier.matz@6wind.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
	 "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH RFC 11/11] ixgbe/mbuf: add TSO support
Date: Fri, 16 May 2014 14:11:37 +0200
Message-ID: <53760079.7090106@6wind.com>
In-Reply-To: <2601191342CEEE43887BDE71AB9772580EFA6B2A@IRSMSX105.ger.corp.intel.com>

Hi Konstantin,

On 05/15/2014 06:30 PM, Ananyev, Konstantin wrote:
> With the current DPDK implementation the upper code would still be different for TCP checksum (without segmentation) and TCP segmentation:
> different flags in mbuf, with TSO you need to setup l4_len and mss fields inside mbuf, with just checksum - you don't.

You are right on this point.

> Plus, as I said, it is a bit confusing to calculate PSD csum once in the stack and then re-calculate in PMD.
> Again - unnecessary slowdown.
> So why not just have get_ipv4_psd_sum() and get_ipv4_psd_tso_sum() inside testpmd/csumonly.c and call them accordingly?

Yes, recalculating the pseudo-header checksum without the ip_len
is a slowdown. However, this slowdown should be weighed against the
whole operation. When you do TSO, you are generally transmitting
a large TCP packet (several KB), and the cost of the TCP stack is
probably much higher than the cost of fixing the checksum. But that's
not the main argument: my idea was to choose the API that minimizes
the slowdown in most cases.
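
To make the two pseudo-header variants concrete, here is a minimal
sketch in plain C. The helper names (psd_sum, psd_tso_sum,
psd_strip_len) and the host-byte-order arguments are assumptions made
for this example; they are not the functions from the patch or from
testpmd. It returns the folded, non-complemented one's-complement sum.

```c
#include <assert.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into 16 bits with end-around carry. */
static uint16_t fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

/* IPv4 pseudo-header sum including the L4 length (plain checksum offload).
 * Addresses are given in host byte order to keep the sketch portable. */
static uint16_t psd_sum(uint32_t src, uint32_t dst, uint8_t proto,
                        uint16_t l4_len)
{
    uint32_t sum = (src >> 16) + (src & 0xffffu) +
                   (dst >> 16) + (dst & 0xffffu) +
                   proto + l4_len;
    return fold16(sum);
}

/* Same sum with the length left out, as discussed for TSO. */
static uint16_t psd_tso_sum(uint32_t src, uint32_t dst, uint8_t proto)
{
    return psd_sum(src, dst, proto, 0);
}

/* "Fixing" an existing sum: one's-complement subtraction of the length,
 * i.e. adding its bitwise complement and folding again. */
static uint16_t psd_strip_len(uint16_t full_sum, uint16_t l4_len)
{
    /* A folded sum over a non-zero IP address is never 0. */
    assert(full_sum != 0);
    return fold16((uint32_t)full_sum + (~(uint32_t)l4_len & 0xffffu));
}
```

In this sketch the fixup is one addition and a fold per packet, and for
a non-zero source address psd_strip_len(psd_sum(s, d, p, len), len)
gives the same value as psd_tso_sum(s, d, p).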

Let's take the case of a future vnic pmd driver supporting an emulation
of TSO. In this case, the calculation of the pseudo header is also an
unnecessary slowdown.

Also, some other hardware I've seen doesn't need a different
pseudo-header checksum when doing TSO.

Last argument: Linux works the same way as what I've implemented.
In the Linux ixgbe driver [1], at line 6461, there is a call to
csum_tcpudp_magic() which recomputes the checksum without
the ip_len.

On the other hand, it's true that today ixgbe is the only hardware
supporting TSO in DPDK. The pragmatic approach could be to choose the
API that gives the best performance with what we have (the ixgbe
driver). I'm ok with this approach if we agree to reconsider the API
(and maybe modify it) when another PMD supporting TSO is
implemented.

> About potential future problem with NICs that implement TX checksum/segmentation offloads in a different manner - yeh that's true...
> I think at the final point all that logic could be hidden inside some function at rte_ethdev level, something like:  rte_eth_dev_prep_tx(portid, mbuf[], num).

I don't see the real difference between:

   rte_eth_dev_prep_tx(portid, mbuf[], num)
   rte_eth_dev_tx(portid, mbuf[], num)

and:

   rte_eth_dev_tx(portid, mbuf[], num) /* the tx does the cksum job */

And the second is faster because there is only one pointer dereference.
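
As a rough sketch of that comparison (all toy_* names, the packet
structure and the ops table are hypothetical stand-ins, not the
rte_ethdev API), the two call sequences could look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_TX_TSO 0x1u

/* Hypothetical stand-in for an mbuf carrying TX offload metadata. */
struct toy_pkt {
    uint32_t flags;
    uint16_t l4_len;
    uint16_t psd_csum;   /* pseudo-header sum written by the stack */
};

/* Hypothetical per-device ops table; each call is one indirect jump. */
struct toy_dev_ops {
    uint16_t (*prep_tx)(struct toy_pkt **pkts, uint16_t n);
    uint16_t (*tx)(struct toy_pkt **pkts, uint16_t n);
};

/* Device-specific fixup: drop the L4 length from the sum for TSO. */
static void toy_fix_csum(struct toy_pkt *p)
{
    if (p->flags & TOY_TX_TSO) {
        uint32_t sum = (uint32_t)p->psd_csum +
                       (~(uint32_t)p->l4_len & 0xffffu);
        while (sum >> 16)
            sum = (sum & 0xffffu) + (sum >> 16);
        p->psd_csum = (uint16_t)sum;
    }
}

static uint16_t toy_prep(struct toy_pkt **pkts, uint16_t n)
{
    for (uint16_t i = 0; i < n; i++)
        toy_fix_csum(pkts[i]);
    return n;
}

/* Pretend to queue the packets; a real driver would fill descriptors. */
static uint16_t toy_tx_raw(struct toy_pkt **pkts, uint16_t n)
{
    (void)pkts;
    return n;
}

/* TX routine that does the checksum job itself. */
static uint16_t toy_tx_fixing(struct toy_pkt **pkts, uint16_t n)
{
    for (uint16_t i = 0; i < n; i++)
        toy_fix_csum(pkts[i]);
    return n;
}

static const struct toy_dev_ops two_step_ops = { toy_prep, toy_tx_raw };
static const struct toy_dev_ops one_step_ops = { NULL, toy_tx_fixing };

/* prep + tx: two indirect calls per burst. */
static uint16_t send_two_step(const struct toy_dev_ops *ops,
                              struct toy_pkt **pkts, uint16_t n)
{
    assert(ops->prep_tx != NULL && ops->tx != NULL);
    n = ops->prep_tx(pkts, n);
    return ops->tx(pkts, n);
}

/* tx only: one indirect call per burst. */
static uint16_t send_one_step(const struct toy_dev_ops *ops,
                              struct toy_pkt **pkts, uint16_t n)
{
    assert(ops->tx != NULL);
    return ops->tx(pkts, n);
}
```

Both paths leave the packets in the same state; the only difference in
this sketch is the extra indirect call made by the two-step variant.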

> So,  based on mbuf TX offload flags and device type, it would do necessary modifications inside the packet.
> But that's future discussion, I suppose.

To me, it's not an option for the network stack to fill the mbuf
differently depending on the device type. For instance, when doing
ethernet bonding or bridging, the stack may not know which physical
device will be used in the end. So the API to enable TSO on a packet
has to be the same whatever the device.

If fixing the checksum in the PMD is an unnecessary slowdown, forcing
the network stack to check what has to be filled in the mbuf depending
on the device type also has a cost.

> For now, I still think we need to keep pseudo checksum calculations out of PMD code.

To me, there are 2 options:

1/ Update patch to calculate the pseudo header without the ip_len when
    doing TSO. In this case the API is mapped on ixgbe behavior,
    probably providing the best performance today. If another PMD comes
    in the future, this API may change to something more generic.

2/ Try to define a generic API today, accepting that the first driver
    that supports TSO is a bit slower, but reducing the risks of changing
    the API for TSO in the future.

I'm fine with both options.

Regards,
Olivier

[1] http://lxr.free-electrons.com/source/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c?v=3.14#L6434
