From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <konstantin.ananyev@intel.com>
Received: from mga07.intel.com (mga07.intel.com [134.134.136.100])
 by dpdk.org (Postfix) with ESMTP id 0DDC31BBE
 for <dev@dpdk.org>; Wed,  8 Mar 2017 12:13:45 +0100 (CET)
Received: from orsmga004.jf.intel.com ([10.7.209.38])
 by orsmga105.jf.intel.com with ESMTP; 08 Mar 2017 03:13:44 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.36,262,1486454400"; d="scan'208";a="65424939"
Received: from irsmsx103.ger.corp.intel.com ([163.33.3.157])
 by orsmga004.jf.intel.com with ESMTP; 08 Mar 2017 03:13:41 -0800
Received: from irsmsx109.ger.corp.intel.com ([169.254.13.44]) by
 IRSMSX103.ger.corp.intel.com ([169.254.3.107]) with mapi id 14.03.0248.002;
 Wed, 8 Mar 2017 11:11:25 +0000
From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: Olivier Matz <olivier.matz@6wind.com>
CC: Jan Blunck <jblunck@infradead.org>, "Richardson, Bruce"
 <bruce.richardson@intel.com>, "dev@dpdk.org" <dev@dpdk.org>
Thread-Topic: [dpdk-dev] [RFC 0/8] mbuf: structure reorganization
Thread-Index: AQHSdlXOvL8VQlbyOU+/T85mDKip26FcYNGAgA9pHoCAAD0PgIABJAiAgAAukACAAArJAIAEZh4AgAGZsQCAAGmrAIAAB0CAgAAHcgCAAANakIAAIcgAgAARDkCABE1sgIAF9CCAgAABN2CAAAaegIAADv0QgAAJP4CAAAuoYIAAD6MAgACKxICAAuH6gIAJARjw
Date: Wed, 8 Mar 2017 11:11:23 +0000
Message-ID: <2601191342CEEE43887BDE71AB9772583FACBF10@IRSMSX109.ger.corp.intel.com>
References: <1485271173-13408-1-git-send-email-olivier.matz@6wind.com>
 <CALe+Z03meh2od13-pfnFh0SpmCqxgKLD5MG2MF5Bj9Q8EtS=Hw@mail.gmail.com>
 <20170221163808.GA213576@bricha3-MOBL3.ger.corp.intel.com>
 <CALe+Z01pVFdEckOUabXTnh1q-xEOmJajTagEB1hvqYZazrG7iA@mail.gmail.com>
 <2601191342CEEE43887BDE71AB9772583F11B4CC@irsmsx105.ger.corp.intel.com>
 <CALe+Z01ozmTdWwxcc7mG+NhSV16K4+-Pe5uDWASzBBs5oMyh1g@mail.gmail.com>
 <2601191342CEEE43887BDE71AB9772583F11B633@irsmsx105.ger.corp.intel.com>
 <20170224150053.279e718d@platinum>
 <CALe+Z01+K8Odpz3oqk672qsKnjVAXif0TJCwsuPhbcwX+Z11Sg@mail.gmail.com>
 <2601191342CEEE43887BDE71AB9772583F11E992@irsmsx105.ger.corp.intel.com>
 <20170228102359.5d601797@platinum>
 <2601191342CEEE43887BDE71AB9772583F11EA11@irsmsx105.ger.corp.intel.com>
 <20170228115043.3f78ce52@platinum>
 <2601191342CEEE43887BDE71AB9772583F11EA96@irsmsx105.ger.corp.intel.com>
 <20170228132825.37586902@platinum>
 <2601191342CEEE43887BDE71AB9772583F11EE7A@irsmsx105.ger.corp.intel.com>
 <20170302174623.268592a7@platinum>
In-Reply-To: <20170302174623.268592a7@platinum>
Accept-Language: en-IE, en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [163.33.239.180]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: Re: [dpdk-dev] [RFC 0/8] mbuf: structure reorganization
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Wed, 08 Mar 2017 11:13:46 -0000


Hi Olivier,

>
> Hi Konstantin,
>
> On Tue, 28 Feb 2017 22:53:55 +0000, "Ananyev, Konstantin" <konstantin.ananyev@intel.com> wrote:
> > > > Another thing that doesn't look very convenient to me here -
> > > > We can have 2 different values of timestamp (both normalized and not)
> > > > and there is no clear way for the application to know which one is in
> > > > use right now. So each app writer would have to come-up with his own
> > > > solution.
> > >
> > > It depends:
> > > - the solution you describe is to have the application storing the
> > >   normalized value in its private metadata.
> > > - another solution would be to store the normalized value in
> > >   m->timestamp. In this case, we would need a flag to tell if the
> > >   timestamp value is normalized.
> >
> > My first thought also was about second flag to specify was timestamp
> > already normalized or not.
> > Though I still in doubt - is it all really worth it: extra ol_flag, new function in eth_dev API.
> > My feeling that we trying to overcomplicate things.
>
> I don't see what is so complicated. The idea is just to let the
> application do the normalization if it is required.

I meant that 2 ol_flags and a special function just to properly handle one
mbuf field seems too much.
Though on second thought, 2 ol_flags might not be a bad idea:
it gives the PMD writer the freedom to choose whether to provide a normalized
or raw value on return from rx_burst().

>
> If the time is normalized in nanosecond in the PMD, we would still
> need to normalized the time reference (the 0). And for that we'd need
> a call to a synchronization code as well.
>
>
>
> > > The problem pointed out by Jan is that doing the timestamp
> > > normalization may take some CPU cycles, even if a small part of packets
> > > requires it.
> >
> > I understand that point, but from what I've seen with real example:
> > http://dpdk.org/ml/archives/dev/2016-October/048810.html
> > the amount of calculations at RX is pretty small.
> > I don't think it would affect performance in a noticeable way
> > (though I don't have any numbers here to prove it).
>
> I think we can consider by default that adding code in the data path
> impacts performance.
>
>
> > From other side, if user doesn't want a timestamp he can always disable
> > that feature and save cycles, right?
> >
> > BTW, you and Jan both mention that not every packet would need a timestamp.
> > Instead we need sort of a timestamp for the group of packets?
>
> I think that for many applications the timestamp should be as precise
> as possible for each packet.
>
>
> > Is that really the only foreseen usage model?
>
> No, but it could be one.
>
>
> > If so, then why not to have a special function that would extract 'latest' timestamp
> > from the dev?
> > Or even have tx_burst_extra() that would return a latest timestamp (extra parameter or so).
> > Then there is no need to put timestamp into mbuf at all.
>
> Doing that will give a poor precision for the timestamp.
>
>
> > > > > Applications that
> > > > > are doing this are responsible of what they change.
> > > > >
> > > > >
> > > > > > 3. In theory with eth_dev_detach() - mbuf->port value might be
> > > > > > not valid at the point when application would decide to do
> > > > > > normalization.
> > > > > >
> > > > > > So to me all that approach with delayed normalization seems
> > > > > > unnecessary overcomplicated. Original one suggested by Olivier,
> > > > > > when normalization is done in PMD at RX look much cleaner and
> > > > > > more manageable.
> > > > >
> > > > > Detaching a device requires a synchronization between control and
> > > > > data plane, and not only for this use case.
> > > >
> > > > Of course it does.
> > > > But right now it is possible to do:
> > > >
> > > > eth_rx_burst(port=0, ..., &mbuf, 1);
> > > > eth_dev_detach(port=0, ...);
> > > > ...
> > > > /*process previously received mbuf */
> > > >
> > > > With what you are proposing it would be not always possible any more.
> > >
> > > With your example, it does not work even without the timestamp feature,
> > > since the mbuf input port would reference an invalid port.
> > > This port is usually used in the application to do a lookup for a port structure,
> > > so it is expected that the entry is valid. It would be even worse if you
> > > do a detach + attach.
> >
> > I am not talking about the mbuf->port value usage.
> > Right now user can access/interpret  all metadata fields set by PMD RX routines
> > (vlan, rss hash, ol_flags, ptype, etc.) without need to accessing the device data or
> > calling device functions.
> > With that change it wouldn't be the case anymore.
>
> That's the same for some other functions. If in my application I want
> to call eth_rx_queue_count(m->port), I will have the same problem.

Yes, but here you are trying to get extra information about the device/queue
based on the port value stored inside the mbuf.
I am talking about information that is already stored inside the particular
mbuf itself.
About m->port itself - as I said before, my preference would be to remove it
altogether (partly because of that implication - we can't guarantee that the
m->port information stays valid throughout the whole mbuf lifetime).
But that's probably a subject for another discussion.

>
> I think we also have something quite similar in examples/ptpclient:
>
>   rte_eth_rx_burst(portid, 0, &m, 1);
>   ...
>   parse_ptp_frames(portid, m);
>   ...
>   ptp_data.portid = portid;
>   ...
>   rte_eth_timesync_read_tx_timestamp(ptp_data->portid, ...)
>
>
> So, really, I think it's an application issue: when the app deletes
> a port, it should ask itself if there are remaining references to
> it (m->port).

Hmm, and where in the example above do you see a reference to m->port?
As far as I can see, what the code above does is:
  - it takes the portid value from a global variable, not from m->port
  - it saves that portid (not m->port) in the global variable ptp_data.portid
  - later, inside the same function, it uses that value to call rte_ethdev
    functions (via parse_fup or parse_drsp).

So I am not sure how it relates to the topic we are discussing.

Anyway, to summarize how the proposal looks right now:

1. The m->timestamp value after rx_burst() could be in either raw or normalized
format.
2. The validity of m->timestamp and its format should be determined by 2 ol_flags
(something like: RX_TIMESTAMP, RX_TIMESTAMP_NORM).
3. The PMD is free to choose which timestamp value to return (raw/normalized).
4. The PMD can provide an optional routine inside dev_ops:
uint64_t dev_ops->timestamp_normalise(uint64_t timestamps);
5. If the user wants to use that function, it would be his responsibility to map
the mbuf to the port it was received from.
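
To make sure we mean the same thing, here is a minimal sketch of how points 1-5
might look from the application side. Everything in it is an assumption made
for illustration only: the flag bit values, the stand-in struct mbuf and
struct dev_ops (these are not the real rte_mbuf / eth_dev_ops), the helper
app_timestamp_ns(), and the example callback with its clock rate and start
offset.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag names (from point 2) with made-up bit positions. */
#define RX_TIMESTAMP      (1ULL << 0) /* m->timestamp holds a valid value */
#define RX_TIMESTAMP_NORM (1ULL << 1) /* m->timestamp is already normalized */

/* Stand-in for the few mbuf fields used here, not the real rte_mbuf. */
struct mbuf {
	uint64_t ol_flags;
	uint64_t timestamp;
	uint16_t port;
};

/* Stand-in for a per-device ops table; the callback is optional (point 4). */
struct dev_ops {
	uint64_t (*timestamp_normalise)(uint64_t timestamp);
};

/*
 * Example PMD callback: start + factor(hw_timestamp).
 * The 500 MHz clock (2 ns per tick) and the 1000 ns time reference
 * are assumptions chosen to keep the arithmetic easy to follow.
 */
static uint64_t
example_normalise(uint64_t hw_ticks)
{
	const uint64_t start_ns = 1000;
	const uint64_t ns_per_tick = 2;

	return start_ns + hw_ticks * ns_per_tick;
}

/*
 * Application-side helper. The caller has already mapped the mbuf to the
 * ops of the port it was received from (point 5) and passes them in.
 * Returns 0 and stores the normalized value in *ns, or -1 if the mbuf has
 * no timestamp or a raw one that cannot be normalized.
 */
static int
app_timestamp_ns(const struct mbuf *m, const struct dev_ops *ops,
		uint64_t *ns)
{
	if ((m->ol_flags & RX_TIMESTAMP) == 0)
		return -1;              /* no timestamp at all */

	if (m->ol_flags & RX_TIMESTAMP_NORM) {
		*ns = m->timestamp;     /* PMD already normalized it */
		return 0;
	}

	if (ops == NULL || ops->timestamp_normalise == NULL)
		return -1;              /* raw value, no way to convert */

	*ns = ops->timestamp_normalise(m->timestamp);
	return 0;
}
```

Note that the last branch is where the dependency on the device shows up: a
raw timestamp is only usable while the ops of the receiving port are still
reachable.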

Is that correct?

Thanks
Konstantin


>
> > > So, I think it is already the responsibility of the application to do
> > > the sync (flush retrieved packets before detaching a port).
> >
> > The packets are not in RX or TX queue of detaching device any more.
> > I received a packet, after that I expect to have all its data and metadata inside mbuf.
> > So I can store mbufs somewhere and process them much later.
> > Or might be I would like to pass it to the secondary process for logging/analyzing, etc.
>
> Yes, but that's still an app problem for me.
>
>
> > > > >In the first solution, the normalization is
> > > > > partial: unit is nanosecond, but the time reference is different.
> > > >
> > > > Not sure I get you here...
> > >
> > > In the first solution I described, each PMD had to convert its unit
> > > into nanosecond. This is easy because we assume the PMD knows the
> > > value of its clock. But to get a fully normalized value, it also has to
> > > use the same time reference, so we would also need to manage an offset
> > > (we need a new API to give this value to the PMD).
> >
> > Yes, I suppose we do need an start timestamp and sort of factor() to convert
> > HW value, something like:
> >
> > mbuf->timestamp = rxq->start_timestamp  + factor(hw_timestamp);
> >
> > Right?
> > Why passing start_timestamp at the configure() phase will be a problem?
> >
> > >
> > > I have another fear related to hardware clocks: if clocks are not
> > > synchronized between PMDs, the simple operation "t * ratio - offset"
> > > won't work. That's why I think we could delegate this job in a specific
> > > library that would manage this.
> >
> > But then that library would need to account all PMDs inside the system,
> > and be aware about each HW clock skew, etc.
> > Again, doesn't sound like a simple task to me.
>
> Exactly, that's also why I want to let the specialists take care of
> it. Having non-normalized timestamps now allow to do the job later
> when required, while allowing basic usages as required by metrics
> libraries and mlx pmd.
>
>
>
> Olivier