From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <konstantin.ananyev@intel.com>
Received: from mga05.intel.com (mga05.intel.com [192.55.52.43])
 by dpdk.org (Postfix) with ESMTP id 5B67F58CB
 for <dev@dpdk.org>; Fri,  2 Dec 2016 02:01:01 +0100 (CET)
Received: from fmsmga004.fm.intel.com ([10.253.24.48])
 by fmsmga105.fm.intel.com with ESMTP; 01 Dec 2016 17:01:00 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.33,284,1477983600"; d="scan'208";a="198101324"
Received: from irsmsx110.ger.corp.intel.com ([163.33.3.25])
 by fmsmga004.fm.intel.com with ESMTP; 01 Dec 2016 17:00:57 -0800
Received: from irsmsx105.ger.corp.intel.com ([169.254.7.43]) by
 irsmsx110.ger.corp.intel.com ([163.33.3.25]) with mapi id 14.03.0248.002;
 Fri, 2 Dec 2016 01:00:56 +0000
From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: Adrien Mazarguil <adrien.mazarguil@6wind.com>
CC: Thomas Monjalon <thomas.monjalon@6wind.com>, "dev@dpdk.org"
 <dev@dpdk.org>, Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>, Stephen Hurd
 <stephen.hurd@broadcom.com>, Jan Medala <jan@semihalf.com>, Jakub Palider
 <jpa@semihalf.com>, John Daley <johndale@cisco.com>, Alejandro Lucero
 <alejandro.lucero@netronome.com>, Harish Patil <harish.patil@qlogic.com>,
 Rasesh Mody <rasesh.mody@qlogic.com>, Jerin Jacob
 <jerin.jacob@caviumnetworks.com>, Yuanhan Liu <yuanhan.liu@linux.intel.com>,
 Yong Wang <yongwang@vmware.com>, "Kulasek, TomaszX"
 <tomaszx.kulasek@intel.com>, "olivier.matz@6wind.com"
 <olivier.matz@6wind.com>
Thread-Topic: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
Thread-Index: AQHSRbBDKacvBhx2eUCvrHI7LRX1L6DuQp0AgALr7oCAACwIoIABX2MAgAEdxrA=
Date: Fri, 2 Dec 2016 01:00:55 +0000
Message-ID: <2601191342CEEE43887BDE71AB9772583F0E2A9A@irsmsx105.ger.corp.intel.com>
References: <1477486575-25148-1-git-send-email-tomaszx.kulasek@intel.com>
 <1479922585-8640-1-git-send-email-tomaszx.kulasek@intel.com>
 <8317180.L80Qf11uiu@xps13> <20161130074003.GD10340@6wind.com>
 <2601191342CEEE43887BDE71AB9772583F0E20C8@irsmsx105.ger.corp.intel.com>
 <20161201071518.GG10340@6wind.com>
In-Reply-To: <20161201071518.GG10340@6wind.com>
Accept-Language: en-IE, en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [163.33.239.181]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Fri, 02 Dec 2016 01:01:02 -0000


Hi Adrien,

>=20
> Hi Konstantin,
>=20
> On Wed, Nov 30, 2016 at 10:54:50AM +0000, Ananyev, Konstantin wrote:
> [...]
> > > Something is definitely needed here, and only PMDs can provide it. I =
think
> > > applications should not have to clear checksum fields or initialize t=
hem to
> > > some magic value, same goes for any other offload or hardware limitat=
ion
> > > that needs to be worked around.
> > >
> > > tx_prep() is one possible answer to this issue, however as mentioned =
in the
> > > original patch it can be very expensive if exposed by the PMD.
> > >
> > > Another issue I'm more concerned about is the way limitations are man=
aged
> > > (struct rte_eth_desc_lim). While not officially tied to tx_prep(), th=
is
> > > structure contains new fields that are only relevant to a few devices=
, and I
> > > fear it will keep growing with each new hardware quirk to manage, bre=
aking
> > > ABIs in the process.
> >
> > Well, if some new HW capability/limitation would arise and we'd like to=
 support
> > it in DPDK, then yes we probably would need to think how to incorporate=
 it here.
> > Do you have anything particular in mind here?
>=20
> Nothing in particular, so for the sake of the argument, let's suppose tha=
t I
> would like to add a field to expose some limitation that only applies to =
my
> PMD during TX but looks generic enough to make sense, e.g. maximum packet
> size when VLAN tagging is requested.

Hmm, I didn't hear about such limitations so far, but if it is real case -
sure, feel free to submit the patch.  =20

> PMDs are free to set that field to some
> special value (say, 0) if they do not care.
>=20
> Since that field exists however, conscious applications should check its
> value for each packet that needs to be transmitted. This extra code cause=
s a
> slowdown just by sitting in the data path. Since it is not the only field=
 in
> that structure, the performance impact can be significant.
>=20
> Even though this code is inside applications, it remains unfair to PMDs f=
or
> which these tests are irrelevant. This problem is identified and addresse=
d
> by tx_prepare().

I suppose the question is why do we need:
uint16_t nb_seg_max;
uint16_t nb_mtu_seg_max;
as we now have tx_prepare(), right?

For two reasons:
1. Some people might feel that tx_prepare() is not good (smart/fast) enough
for them and would prefer to do necessary preparations for TX offloads them=
selves.
2. Even if people do use tx_prepare() they still should take this informati=
on into accout.
As an example ixgbe can't TX packets with then 40 segments.
Obviously ixbge_tx_prep() performs that check and returns an error.
But it wouldn't try to merge/reallocate mbufs for you.
User still has to do it himself, or just prevent creating such long chains =
somehow.

>=20
> Thanks to tx_prepare(), these checks are moved back into PMDs where they
> belong. PMDs that do not need them do not have to provide support for
> tx_prepare() and do not suffer any performance impact as result;
> applications only have to make sure tx_prepare() is always called at some
> point before tx_burst().
>=20
> Once you reach this stage, you've effectively made tx_prepare() mandatory
> before tx_burst(). If some bug occurs, then perhaps you forgot to call
> tx_prepare(), you just need to add it. The total cost for doing TX is
> therefore tx_prepare() + tx_burst().
>=20
> I'm perhaps a bit pessimistic mind you, but I do not think tx_prepare() w=
ill
> remain optional for long. Sure, PMDs that do not implement it do not care=
,
> I'm focusing on applications, for which the performance impact of calling
> tx_prepare() followed by tx_burst() is higher than a single tx_burst()
> performing all the necessary preparation at once.
>=20
> [...]
> > > Following the same logic, why can't such a thing be made part of the =
TX
> > > burst function as well (through a direct call to rte_phdr_cksum_fix()
> > > whenever necessary). From an application standpoint, what are the adv=
antages
> > > of having to:
> > >
> > >  if (tx_prep()) // iterate and update mbufs as needed
> > >      tx_burst(); // iterate and send
> > >
> > > Compared to:
> > >
> > >  tx_burst(); // iterate, update as needed and send
> >
> > I think that was discussed extensively quite a lot previously here:
> > As Thomas already replied - main motivation is to allow user
> > to execute them on different stages of packet TX pipeline,
> > and probably on different cores.
> > I think that provides better flexibility to the user to when/where
> > do these preparations and hopefully would lead to better performance.
>=20
> And I agree, I think this use case is valid but does not warrant such a h=
igh
> penalty when your application does not need that much flexibility. Simple
> (yet conscious) applications need the highest performance. Complex ones a=
s
> you described already suffer quite a bit from IPCs and won't mind a coupl=
e
> of extra CPU cycles right?

It would mean an extra cache-miss for every packet, so I think performance =
hit
would be quite significant.=20
About the 'simple' case when tx_prep() and tx_burst() are called on the sam=
e core,
Why do you believe that:
tx_prep(); tx_burst(); would be much slower than tx_burst() {tx_prep(), ...=
}?
tx_prep() itself is quite expensive, let say for Intel HW it includes:
- read mbuf fileds (2 cache-lines),
- read packet header (1/2 cache-lines)
- calculate pseudo-header csum
 - update packet header=20
Comparing to that price of extra function call seems neglectable
(if we TX packets in bursts of course).=20

>=20
> Yes they will, therefore we need a method that satisfies both cases.
>=20
> As a possible solution, a special mbuf flag could be added to each mbuf
> having gone through tx_prepare(). That way, tx_burst() could skip some
> checks and things it would otherwise have done.

That's an interesting idea, but it has one drawback:
As I understand, it means that from now on if user doing preparations on hi=
s own,
he had to setup this flag, otherwise tx_burst() would do extra unnecessary =
work.
So any existing applications that using TX offloads and do preparation by t=
hemselves
would have to be modified to avoid performance loss.

>=20
> Another possibility, telling the PMD first that you always intend to use
> tx_prepare() and getting a simpler/faster tx_burst() callback as a result=
.

That what we have right now (at least for Intel HW): =20
it is a user responsibility to do the necessary preparations/checks before =
calling tx_burst(). =20
With tx_prepare() we just remove from user the headache to implement tx_pre=
pare() on his own.
Now he can use a 'proper' PMD provided function.
My vote still would be for that model.

Konstantin