From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <konstantin.ananyev@intel.com>
Received: from mga06.intel.com (mga06.intel.com [134.134.136.31])
 by dpdk.org (Postfix) with ESMTP id 474105582
 for <dev@dpdk.org>; Wed, 30 Nov 2016 11:54:55 +0100 (CET)
Received: from orsmga001.jf.intel.com ([10.7.209.18])
 by orsmga104.jf.intel.com with ESMTP; 30 Nov 2016 02:54:55 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.31,573,1473145200"; d="scan'208";a="1066095576"
Received: from irsmsx153.ger.corp.intel.com ([163.33.192.75])
 by orsmga001.jf.intel.com with ESMTP; 30 Nov 2016 02:54:51 -0800
Received: from irsmsx105.ger.corp.intel.com ([169.254.7.43]) by
 IRSMSX153.ger.corp.intel.com ([169.254.9.203]) with mapi id 14.03.0248.002;
 Wed, 30 Nov 2016 10:54:51 +0000
From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: Adrien Mazarguil <adrien.mazarguil@6wind.com>, Thomas Monjalon
 <thomas.monjalon@6wind.com>
CC: "dev@dpdk.org" <dev@dpdk.org>, Rahul Lakkireddy
 <rahul.lakkireddy@chelsio.com>, Stephen Hurd <stephen.hurd@broadcom.com>,
 "Jan Medala" <jan@semihalf.com>, Jakub Palider <jpa@semihalf.com>, John Daley
 <johndale@cisco.com>, Alejandro Lucero <alejandro.lucero@netronome.com>,
 Harish Patil <harish.patil@qlogic.com>, Rasesh Mody <rasesh.mody@qlogic.com>, 
 Jerin Jacob <jerin.jacob@caviumnetworks.com>, Yuanhan Liu
 <yuanhan.liu@linux.intel.com>, Yong Wang <yongwang@vmware.com>, "Kulasek,
 TomaszX" <tomaszx.kulasek@intel.com>, "olivier.matz@6wind.com"
 <olivier.matz@6wind.com>
Thread-Topic: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
Thread-Index: AQHSRbBDKacvBhx2eUCvrHI7LRX1L6DuQp0AgALr7oCAACwIoA==
Date: Wed, 30 Nov 2016 10:54:50 +0000
Message-ID: <2601191342CEEE43887BDE71AB9772583F0E20C8@irsmsx105.ger.corp.intel.com>
References: <1477486575-25148-1-git-send-email-tomaszx.kulasek@intel.com>
 <1479922585-8640-1-git-send-email-tomaszx.kulasek@intel.com>
 <8317180.L80Qf11uiu@xps13> <20161130074003.GD10340@6wind.com>
In-Reply-To: <20161130074003.GD10340@6wind.com>
Accept-Language: en-IE, en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [163.33.239.180]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Wed, 30 Nov 2016 10:54:56 -0000

Hi Adrien,

>=20
> On Mon, Nov 28, 2016 at 12:03:06PM +0100, Thomas Monjalon wrote:
> > We need attention of every PMD developers on this thread.
>=20
> I've been following this thread from the beginning while working on rte_f=
low
> and wanted to see where it was headed before replying. (I know, v11 was
> submitted about 1 month ago but still.)
>=20
> > Reminder of what Konstantin suggested:
> > "
> > - if the PMD supports TX offloads AND
> > - if to be able use any of these offloads the upper layer SW would have=
 to:
> >     * modify the contents of the packet OR
> >     * obey HW specific restrictions
> > then it is a PMD developer responsibility to provide tx_prep() that wou=
ld implement
> > expected modifications of the packet contents and restriction checks.
> > Otherwise, tx_prep() implementation is not required and can be safely s=
et to NULL.
> > "
> >
> > I copy/paste also my previous conclusion:
> >
> > Before txprep, there is only one API: the application must prepare the
> > packets checksum itself (get_psd_sum in testpmd).
> > With txprep, the application have 2 choices: keep doing the job itself
> > or call txprep which calls a PMD-specific function.
>=20
> Something is definitely needed here, and only PMDs can provide it. I thin=
k
> applications should not have to clear checksum fields or initialize them =
to
> some magic value, same goes for any other offload or hardware limitation
> that needs to be worked around.
>=20
> tx_prep() is one possible answer to this issue, however as mentioned in t=
he
> original patch it can be very expensive if exposed by the PMD.
>=20
> Another issue I'm more concerned about is the way limitations are managed
> (struct rte_eth_desc_lim). While not officially tied to tx_prep(), this
> structure contains new fields that are only relevant to a few devices, an=
d I
> fear it will keep growing with each new hardware quirk to manage, breakin=
g
> ABIs in the process.

Well, if some new HW capability/limitation would arise and we'd like to sup=
port
it in DPDK, then yes we probably would need to think how to incorporate it =
here.
Do you have anything particular in mind here?

>=20
> What are applications supposed to do, check each of them regardless befor=
e
> attempting to send a burst?
>=20
> I understand tx_prep() automates this process, however I'm wondering why
> isn't the TX burst function doing that itself. Using nb_mtu_seg_max as an
> example, tx_prep() has an extra check in case of TSO that the TX burst
> function does not perform. This ends up being much more expensive to
> applications due to the additional loop doing redundant testing on each
> mbuf.
>=20
> If, say as a performance improvement, we decided to leave the validation
> part to the TX burst function; what remains in tx_prep() is basically hea=
vy
> "preparation" requiring mbuf changes (i.e. erasing checksums, for now).
>=20
> Following the same logic, why can't such a thing be made part of the TX
> burst function as well (through a direct call to rte_phdr_cksum_fix()
> whenever necessary). From an application standpoint, what are the advanta=
ges
> of having to:
>=20
>  if (tx_prep()) // iterate and update mbufs as needed
>      tx_burst(); // iterate and send
>=20
> Compared to:
>=20
>  tx_burst(); // iterate, update as needed and send

I think that was discussed extensively quite a lot previously here:
As Thomas already replied - main motivation is to allow user
to execute them on different stages of packet TX pipeline,
and probably on different cores.
I think that provides better flexibility to the user to when/where
do these preparations and hopefully would lead to better performance.

Though, if you or any other PMD developer/maintainer would prefer
for particular PMD to combine both functionalities into tx_burst() and
keep tx_prep() as NOP - this is still possible too. =20

>=20
> Note that PMDs could still provide different TX callbacks depending on th=
e
> set of enabled offloads so performance is not unnecessarily impacted.
>=20
> In my opinion the second approach is both faster to applications and more
> friendly from a usability perspective, am I missing something obvious?
>=20
> > The question is: does non-Intel drivers need a checksum preparation for=
 TSO?
> > Will it behave well if txprep does nothing in these drivers?
> >
> > When looking at the code, most of drivers handle the TSO flags.
> > But it is hard to know whether they rely on the pseudo checksum or not.
> >
> > git grep -l 'PKT_TX_UDP_CKSUM\|PKT_TX_TCP_CKSUM\|PKT_TX_TCP_SEG' driver=
s/net/
> >
> > drivers/net/bnxt/bnxt_txr.c
> > drivers/net/cxgbe/sge.c
> > drivers/net/e1000/em_rxtx.c
> > drivers/net/e1000/igb_rxtx.c
> > drivers/net/ena/ena_ethdev.c
> > drivers/net/enic/enic_rxtx.c
> > drivers/net/fm10k/fm10k_rxtx.c
> > drivers/net/i40e/i40e_rxtx.c
> > drivers/net/ixgbe/ixgbe_rxtx.c
> > drivers/net/mlx4/mlx4.c
> > drivers/net/mlx5/mlx5_rxtx.c
> > drivers/net/nfp/nfp_net.c
> > drivers/net/qede/qede_rxtx.c
> > drivers/net/thunderx/nicvf_rxtx.c
> > drivers/net/virtio/virtio_rxtx.c
> > drivers/net/vmxnet3/vmxnet3_rxtx.c
> >
> > Please, we need a comment for each driver saying
> > "it is OK, we do not need any checksum preparation for TSO"
> > or
> > "yes we have to implement tx_prepare or TSO will not work in this mode"
>=20
> For both mlx4 and mlx5 then,
> "it is OK, we do not need any checksum preparation for TSO".
>=20
> Actually I do not think we'll ever need tx_prep() unless we add our own
> quirks to struct rte_eth_desc_lim (and friends) which are currently quiet=
ly
> handled by TX burst functions.

Ok, so MLX PMD is not affected by these changes and tx_prep for MLX can be =
safely
set to NULL, correct?

Thanks
Konstantin

>=20
> --
> Adrien Mazarguil
> 6WIND