From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Ananyev, Konstantin"
To: Olivier MATZ, "dev@dpdk.org"
Date: Fri, 16 May 2014 17:01:13 +0000
Message-ID: <2601191342CEEE43887BDE71AB9772580EFA6ED5@IRSMSX105.ger.corp.intel.com>
In-Reply-To: <53760079.7090106@6wind.com>
Subject: Re: [dpdk-dev] [PATCH RFC 11/11] ixgbe/mbuf: add TSO support
Hi Olivier,

> Yes, recalculating the pseudo-header checksum without the ip_len
> is a slow down. This slow down should however be compared to the
> operation in progress. When you do TSO, you are generally transmitting
> a large TCP packet (several KB), and the cost of the TCP stack is
> probably much higher than fixing the checksum.

You can't always predict the context in which the PMD TX routine will be
called. Consider the scenario: one core does IO over several ports, while a
few other cores do upper-layer processing of the packets. In that case,
pseudo-header checksum (re)calculation inside the PMD TX function will slow
down not only that particular packet flow, but RX/TX over all ports managed
by the given core.

That's why I think that

  rte_eth_dev_prep_tx(portid, mbuf[], num)
  rte_eth_dev_tx(portid, mbuf[], num)

might have an advantage over

  rte_eth_dev_tx(portid, mbuf[], num)  /* the tx does the cksum job */

It gives us the freedom to choose: run prep_tx() either in the same
execution context as the actual tx(), or in a different one. Though yes, it
comes at a price: an extra function call, with all the corresponding
drawbacks.

Anyway, right now we could probably argue for a while about how a generic
TX HW offload API should look. So, from your options list:

> 1/ Update patch to calculate the pseudo header without the ip_len when
>    doing TSO. In this case the API is mapped on ixgbe behavior,
>    probably providing the best performance today. If another PMD comes
>    in the future, this API may change to something more generic.
> 2/ Try to define a generic API today, accepting that the first driver
>    that supports TSO is a bit slower, but reducing the risks of changing
>    the API for TSO in the future.
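As a purely illustrative sketch of the prep/tx split above: the struct
layout, function names and the "prepared" flag here are stand-ins I made up
for the example, not real DPDK definitions. The point is only the pipeline
shape: a worker core can run the prep stage, keeping the IO core's TX loop
lean.

```c
#include <stdint.h>

/* Hypothetical, minimal mbuf stand-in for the sketch. */
struct mbuf {
    uint16_t psd_csum;  /* pseudo-header checksum seed */
    int      prepared;  /* set once TX prep work is done */
};

/* Stage 1: per-packet fix-ups (e.g. the pseudo-header checksum) that a
 * worker core can run, away from the IO core's fast path. */
static uint16_t
prep_tx(uint8_t port, struct mbuf *pkts[], uint16_t n)
{
    (void)port;
    for (uint16_t i = 0; i < n; i++) {
        pkts[i]->psd_csum = 0;   /* placeholder for the real csum */
        pkts[i]->prepared = 1;
    }
    return n;
}

/* Stage 2: the fast TX path, which assumes packets are already prepared. */
static uint16_t
do_tx(uint8_t port, struct mbuf *pkts[], uint16_t n)
{
    (void)port;
    uint16_t sent = 0;
    for (uint16_t i = 0; i < n; i++)
        if (pkts[i]->prepared)
            sent++;   /* a real PMD would post a TX descriptor here */
    return sent;
}
```

The design choice being debated is exactly whether this two-stage shape is
worth the extra call, versus folding stage 1 into the TX routine itself.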
If #1 means moving the pseudo-header checksum calculation out of the PMD
code, then my vote would be for it.

Konstantin

-----Original Message-----
From: Olivier MATZ [mailto:olivier.matz@6wind.com]
Sent: Friday, May 16, 2014 1:12 PM
To: Ananyev, Konstantin; dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH RFC 11/11] ixgbe/mbuf: add TSO support

Hi Konstantin,

On 05/15/2014 06:30 PM, Ananyev, Konstantin wrote:
> With the current DPDK implementation the upper code would still be
> different for TCP checksum (without segmentation) and TCP segmentation:
> different flags in mbuf; with TSO you need to set up the l4_len and mss
> fields inside the mbuf, with just checksum you don't.

You are right on this point.

> Plus, as I said, it is a bit confusing to calculate the PSD csum once in
> the stack and then re-calculate it in the PMD. Again - unnecessary
> slowdown.
> So why not just have get_ipv4_psd_sum() and get_ipv4_psd_tso_sum() inside
> testpmd/csumonly.c and call them accordingly?

Yes, recalculating the pseudo-header checksum without the ip_len is a slow
down. This slow down should however be compared to the operation in
progress. When you do TSO, you are generally transmitting a large TCP
packet (several KB), and the cost of the TCP stack is probably much higher
than fixing the checksum.

But that's not the main argument: my idea was to choose the proper API that
will reduce the slow down for most cases. Let's take the case of a future
vnic pmd driver supporting an emulation of TSO. In this case, the
calculation of the pseudo header is also an unnecessary slowdown. Also,
some other hardware I've seen doesn't need a different pseudo-header
checksum when doing TSO.

Last argument: Linux works the same way as what I've implemented. See the
Linux ixgbe driver [1] at line 6461: there is a call to csum_tcpudp_magic()
which reprocesses the checksum without the ip_len.

On the other hand, it's true that today ixgbe is the only hardware
supporting TSO in DPDK.
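To make the two variants being discussed concrete, here is a hedged sketch
of an IPv4 pseudo-header checksum seed; `ipv4_psd_sum` and its signature
are illustrative inventions, not the real testpmd/csumonly.c helpers, and
addresses are taken in host byte order for readability. The only difference
between the plain-offload and TSO variants is whether the L4 length enters
the sum (for TSO the NIC rewrites ip_len per segment, so the length is
left out).

```c
#include <stdint.h>

/* Fold a 32-bit partial sum into 16 bits (ones'-complement addition). */
static uint16_t
fold_sum(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/*
 * IPv4 pseudo-header checksum seed, as written into the TCP checksum
 * field before handing the packet to hardware. include_len selects the
 * plain-checksum-offload variant (L4 length included) versus the TSO
 * variant (length omitted).
 */
static uint16_t
ipv4_psd_sum(uint32_t src, uint32_t dst, uint8_t proto,
             uint16_t l4_len, int include_len)
{
    uint32_t sum;

    sum  = (src >> 16) + (src & 0xffff);   /* source address */
    sum += (dst >> 16) + (dst & 0xffff);   /* destination address */
    sum += proto;                          /* L4 protocol number */
    if (include_len)
        sum += l4_len;                     /* TCP segment length */
    return fold_sum(sum);
}
```

Note the two seeds differ by exactly the length term, which is why one
variant can be derived from the other - the crux of the "recalculate in the
PMD or not" debate.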
The pragmatic approach could be to choose the API that gives the best
performance with what we have (the ixgbe driver). I'm OK with this approach
if we accept to reconsider the API (and maybe modify it) when another PMD
supporting TSO is implemented.

> About potential future problems with NICs that implement TX
> checksum/segmentation offloads in a different manner - yeah, that's
> true...
> I think at the final point all that logic could be hidden inside some
> function at the rte_ethdev level, something like:
> rte_eth_dev_prep_tx(portid, mbuf[], num).

I don't see the real difference between:

  rte_eth_dev_prep_tx(portid, mbuf[], num)
  rte_eth_dev_tx(portid, mbuf[], num)

and:

  rte_eth_dev_tx(portid, mbuf[], num)  /* the tx does the cksum job */

And the second is faster because there is only one pointer dereference.

> So, based on the mbuf TX offload flags and the device type, it would do
> the necessary modifications inside the packet.
> But that's a future discussion, I suppose.

To me, it's not an option for the network stack to fill the mbuf
differently depending on the device type. For instance, when doing ethernet
bonding or bridging, the stack may not know which physical device will be
used at the end. So the API to enable TSO on a packet has to be the same
whatever the device.

If fixing the checksum in the PMD is an unnecessary slowdown, forcing the
network stack to check what has to be filled in the mbuf depending on the
device type also has a cost.

> For now, I still think we need to keep pseudo-header checksum
> calculations out of the PMD code.

To me, there are 2 options:

1/ Update the patch to calculate the pseudo header without the ip_len when
   doing TSO. In this case the API is mapped on ixgbe behavior, probably
   providing the best performance today. If another PMD comes in the
   future, this API may change to something more generic.
2/ Try to define a generic API today, accepting that the first driver that
   supports TSO is a bit slower, but reducing the risks of changing the API
   for TSO in the future.

I'm fine with both options.

Regards,
Olivier

[1] http://lxr.free-electrons.com/source/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c?v=3.14#L6434