From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <konstantin.ananyev@intel.com>
Received: from mga04.intel.com (mga04.intel.com [192.55.52.120])
 by dpdk.org (Postfix) with ESMTP id D6EDB2BE1
 for <dev@dpdk.org>; Thu, 28 Jul 2016 16:21:04 +0200 (CEST)
Received: from fmsmga001.fm.intel.com ([10.253.24.23])
 by fmsmga104.fm.intel.com with ESMTP; 28 Jul 2016 07:21:04 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.28,434,1464678000"; d="scan'208";a="1015540506"
Received: from irsmsx101.ger.corp.intel.com ([163.33.3.153])
 by fmsmga001.fm.intel.com with ESMTP; 28 Jul 2016 07:21:02 -0700
Received: from irsmsx105.ger.corp.intel.com ([169.254.7.102]) by
 IRSMSX101.ger.corp.intel.com ([169.254.1.183]) with mapi id 14.03.0248.002;
 Thu, 28 Jul 2016 15:21:02 +0100
From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: Olivier MATZ <olivier.matz@6wind.com>, Jerin Jacob
 <jerin.jacob@caviumnetworks.com>
CC: Thomas Monjalon <thomas.monjalon@6wind.com>, "dev@dpdk.org" <dev@dpdk.org>
Thread-Topic: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev
 structure
Thread-Index: AQHR6C4vhCk5+MDYj0GuIonGmNNwxqAsvetQgABMTACAAI6goIAAD0eAgAAetsCAAAhmAIAAE1hQ
Date: Thu, 28 Jul 2016 14:21:01 +0000
Message-ID: <2601191342CEEE43887BDE71AB97725836B8ABEB@irsmsx105.ger.corp.intel.com>
References: <1469024691-58750-1-git-send-email-tomaszx.kulasek@intel.com>
 <1469114659-66063-1-git-send-email-tomaszx.kulasek@intel.com>
 <2601191342CEEE43887BDE71AB97725836B80AD8@irsmsx105.ger.corp.intel.com>
 <2146153.nVzdynOqdk@xps13> <20160727171043.GA22116@localhost.localdomain>
 <2601191342CEEE43887BDE71AB97725836B8884E@irsmsx105.ger.corp.intel.com>
 <20160727174133.GA22895@localhost.localdomain>
 <2601191342CEEE43887BDE71AB97725836B88894@irsmsx105.ger.corp.intel.com>
 <20160728021345.GA3617@localhost.localdomain>
 <2601191342CEEE43887BDE71AB97725836B8AA48@irsmsx105.ger.corp.intel.com>
 <20160728113853.GA14755@localhost.localdomain>
 <2601191342CEEE43887BDE71AB97725836B8AB57@irsmsx105.ger.corp.intel.com>
 <579A0F9E.6090200@6wind.com>
In-Reply-To: <579A0F9E.6090200@6wind.com>
Accept-Language: en-IE, en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [163.33.239.180]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for rte_eth_dev
 structure
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Thu, 28 Jul 2016 14:21:05 -0000

Hi Olivier,

>=20
> Hi,
>=20
> Jumping into this thread, it looks it's the last pending patch remaining =
for the release.
>=20
> For reference, the idea of tx_prep() was mentionned some time ago in http=
://dpdk.org/ml/archives/dev/2014-May/002504.html
>=20
> Few comments below.
>=20
> On 07/28/2016 03:01 PM, Ananyev, Konstantin wrote:
> > Right now to make HW TX offloads to work user is required to do particu=
lar actions:
> > 1. set mbuf.ol_flags properly.
> > 2. setup mbuf.tx_offload fields properly.
> > 3. update L3/L4 header fields in a particular way.
> >
> > We move #3 into tx_prep(), to hide that complexity from the user simpli=
fy things for him.
> > Though if he still prefers to do #3  by himself - that's ok too.
>=20
> I think moving #3 out of the application is a good idea. Today, for TSO, =
the offload dpdk API requires to set a specific pseudo header
> checksum (which does not include the ip len, as expected by Intel drivers=
), and set the IP checksum to 0.
>=20
> In our app, the network stack sets the TCP checksum to the standard pseud=
o header checksum, and before sending the mbuf:
> - packets are split in sw if the driver does not support tso
> - the tcp csum is patched to match dpdk api if the driver supports tso
>=20
> In the patchs I've recently sent adding tso support for virtio-pmd, it co=
nforms to the dpdk API (phdr csum without ip len), so the tx function
> need to patch the mbuf data inside the driver, which is something what we=
 want to avoid, for some good reasons explained by Konstantin.

Yep, that would be another good use-case for tx_prep() I suppose.

>=20
> So, I think having a tx_prep would also be the occasion to standardize a =
bit the dpdk offload api, and let the driver-specific stuff in tx_prep().
>=20
> Konstantin, any opinion about this?

Yes, that sounds like a good thing to me.

>=20
> >>> Another main purpose of tx_prep(): for multi-segment packets is to
> >>> check that number of segments doesn't exceed  HW limit.
> >>> Again right now users have to do that on their own.
>=20
> If calling tx_prep() is optional, does it mean that this check may be don=
e twice? (once in tx_prep() and once in tx_burst())

I meant 'optional' in a way, that if user doesn't want to use tx_prep() and
do step #3 from the above on his own (what happens now) that is still ok.
But I think step #3 (modify packet's data) still needs to be done before tx=
_burst()  is called for the packets.

>=20
> What would be the advantage of doing this check in tx_prep() instead of k=
eeping it in tx_burst(), as it does not touch the mbuf data?
>=20
> >>> 3.  Having it a s separate function would allow user control when/whe=
re
> >>>        to call it, let say only for some packets, or probably call tx=
_prep()
> >>>        on one core, and do actual tx_burst() for these packets on the=
 other.
>=20
> Yes, from what I remember, the pipeline model was the main reason why we =
do not just modify the packet in tx_burst(). Right?

Yes.

>=20
> >>>>> If you refer as lost cycles here something like:
> >>>>> RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_prep, -ENOTSUP); then
> >>>>> yes.
> >>>>> Though comparing to actual work need to be done for most HW TX
> >>>>> offloads, I think it is neglectable.
> >>>>
> >>>> Not sure.
>=20
> I think doing this test on a per-bulk basis should not impact performance=
.
>=20
> > To be honest, I don't understand what is your concern here.
> > That proposed change doesn't break any existing functionality, doesn't
> > introduce any new requirements to the existing API, and wouldn't
> > introduce any performance regression for existing apps.
> > It is a an extension, and user is free not to use it, if it doesn't fit=
 his needs.
> >  From other side there are users who are interested in that
> > functionality, and they do have use-cases for  it.
>=20
> In my opinion, using tx_prep() will implicitly become mandatory as soon a=
s the application want to do offload. An application that won't use
> it will have to prepare the mbuf, and this preparation will depend on the=
 device, which is not acceptable inside an application.

Yes, I also hope that most apps that do use TX offloads will start to use i=
t,
as I think it will be much more convenient way, then what we have right now=
.
I just liked to emphasis that user wouldn't be forced to.
Konstantin

>=20
>=20
> So, to conclude, the api change notification looks good to me, even if th=
ere is still some room to discuss the implementation details.
>=20
>=20
> Acked-by: Olivier Matz <olivier.matz@6wind.com>