From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <adrien.mazarguil@6wind.com>
Received: from mail-wm0-f53.google.com (mail-wm0-f53.google.com [74.125.82.53])
 by dpdk.org (Postfix) with ESMTP id 159CC5686
 for <dev@dpdk.org>; Tue, 25 Oct 2016 13:04:52 +0200 (CEST)
Received: by mail-wm0-f53.google.com with SMTP id d128so20448567wmf.1
 for <dev@dpdk.org>; Tue, 25 Oct 2016 04:04:52 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=6wind-com.20150623.gappssmtp.com; s=20150623;
 h=date:from:to:cc:subject:message-id:references:mime-version
 :content-disposition:content-transfer-encoding:in-reply-to;
 bh=+yCjMkQf/UbSLxetriH9QeYLHbRpQIlwxjiUdEPTL+U=;
 b=HnBkht/2w6u2shOzWhTO0mTpQ4MH0NcGinB1XSYrfewO+MhN2PB8D1FQlVrvzTo/J/
 MfCSkeIybWx1AATroGumsW2n7P6soSH7y5ssXrCm1m4ViRyXIHqiXZ2psiKZr7TB29Y6
 rV9C0eLk7ht9639ubf5y4q/wb1JGwLntvmSCxA1tjZpEb4/BPOixtUquKW3TW/WvoDW4
 ysxMvwWUG8jqdzojkDklOuxAaDyCmklSVqd6lqP280Pxs88KjM+Fm+ZaWAX+Z66szmwa
 r/CRCKDEiIaJvvbZL1uxpGI0Ux/GtthAOEl/V8zKsK5mtoCw1GUu11XA7T8A2o/SVBfV
 jmeA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:date:from:to:cc:subject:message-id:references
 :mime-version:content-disposition:content-transfer-encoding
 :in-reply-to;
 bh=+yCjMkQf/UbSLxetriH9QeYLHbRpQIlwxjiUdEPTL+U=;
 b=b+7IQwLza2XEOdDwKR90eacjWqzrcpvF8QPR8XZJeoCqgrh7hMclYGmRFT8XnsvUmy
 D4BO+XmlLBYyeH1+QoL+8BzxY1+W9dgSZxPJ4mmrJUJbPfsWepzE/B4d04NzqPbf1M8I
 SZVC9SajVCGvJNao+Vc3SR4/3bVkhxuhqYY5FRS3LHmVhrTejTP1cLnzG49MiERq2wCm
 cc7qTTqrLpJ59fvkcX5lnwAO8ii/AShZk+/irWOmUYHu3G+vR6TaHgwBJshke3kz9Hbb
 63bJOcpAiCs4tKnaZSSId/WoUq8FN9xAdLyWXmPhgXNYAw1eIsEDDzHhFYqtXj+v9WJF
 7ImQ==
X-Gm-Message-State: ABUngveGSFVkK+BETfGtFtRe4Df1IGQyHo58MuZPEAXeSWGjwhDPF9M3zzo+SNCC34pi0/SC
X-Received: by 10.28.208.71 with SMTP id h68mr2491080wmg.14.1477393491802;
 Tue, 25 Oct 2016 04:04:51 -0700 (PDT)
Received: from 6wind.com (guy78-3-82-239-227-177.fbx.proxad.net.
 [82.239.227.177])
 by smtp.gmail.com with ESMTPSA id wh3sm24415078wjb.49.2016.10.25.04.04.50
 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Tue, 25 Oct 2016 04:04:51 -0700 (PDT)
Date: Tue, 25 Oct 2016 13:04:44 +0200
From: Adrien Mazarguil <adrien.mazarguil@6wind.com>
To: Morten =?utf-8?Q?Br=C3=B8rup?= <mb@smartsharesystems.com>
Cc: Bruce Richardson <bruce.richardson@intel.com>,
 "Wiles, Keith" <keith.wiles@intel.com>, dev@dpdk.org,
 Olivier Matz <olivier.matz@6wind.com>, Oleg Kuporosov <olegk@mellanox.com>
Message-ID: <20161025110444.GK5733@6wind.com>
References: <98CBD80474FA8B44BF855DF32C47DC359EA8B1@smartserver.smartshare.dk>
 <7910CF2F-7087-4307-A9AC-DE0287104185@intel.com>
 <20161024162538.GA34988@bricha3-MOBL3.ger.corp.intel.com>
 <20161025093915.GJ5733@6wind.com>
 <98CBD80474FA8B44BF855DF32C47DC359EA8B7@smartserver.smartshare.dk>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC359EA8B7@smartserver.smartshare.dk>
Subject: Re: [dpdk-dev] mbuf changes
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Tue, 25 Oct 2016 11:04:52 -0000

On Tue, Oct 25, 2016 at 12:11:04PM +0200, Morten Brørup wrote:
> Comments inline.
> 
> Med venlig hilsen / kind regards
> - Morten Brørup
> 
> 
> > -----Original Message-----
> > From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
> > Sent: Tuesday, October 25, 2016 11:39 AM
> > To: Bruce Richardson
> > Cc: Wiles, Keith; Morten Brørup; dev@dpdk.org; Olivier Matz; Oleg
> > Kuporosov
> > Subject: Re: [dpdk-dev] mbuf changes
> > 
> > On Mon, Oct 24, 2016 at 05:25:38PM +0100, Bruce Richardson wrote:
> > > On Mon, Oct 24, 2016 at 04:11:33PM +0000, Wiles, Keith wrote:
> > [...]
> > > > > On Oct 24, 2016, at 10:49 AM, Morten Brørup
> > <mb@smartsharesystems.com> wrote:
> > [...]
> > > > > 5.
> > > > >
> > > > > And here’s something new to think about:
> > > > >
> > > > > m->next already reveals if there are more segments to a packet.
> > Which purpose does m->nb_segs serve that is not already covered by m-
> > >next?
> > >
> > > It is duplicate info, but nb_segs can be used to check the validity
> > of
> > > the next pointer without having to read the second mbuf cacheline.
> > >
> > > Whether it's worth having is something I'm happy enough to discuss,
> > > though.
> > 
> > Although slower in some cases than a full blown "next packet" pointer,
> > nb_segs can also be conveniently abused to link several packets and
> > their segments in the same list without wasting space.
> 
> I don’t understand that; can you please elaborate? Are you abusing m->nb_segs as an index into an array in your application? If that is the case, and it is endorsed by the community, we should get rid of m->nb_segs and add a member for application specific use instead. 

Well, that's just an idea; I'm not aware of any application doing this.
However, the ability to link several packets along with their segments seems
useful to me (e.g. for buffering packets). Here's a diagram:

 .-----------.   .-----------.   .-----------.   .-----------.   .------
 | pkt 0     |   | seg 1     |   | seg 2     |   | pkt 1     |   | pkt 2
 |      next --->|      next --->|      next --->|      next --->| ...
 | nb_segs 3 |   | nb_segs 1 |   | nb_segs 1 |   | nb_segs 1 |   |
 `-----------'   `-----------'   `-----------'   `-----------'   `------
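
To make the diagram concrete, here is a rough sketch of how such a list could
be walked. It assumes a hypothetical "struct mbuf" holding only the two fields
discussed here (not the real rte_mbuf layout), and that nb_segs on a packet
head counts that packet's segments, head included. The helper names are made
up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two rte_mbuf fields discussed here. */
struct mbuf {
	struct mbuf *next; /* next segment, or next packet after the last one */
	uint16_t nb_segs;  /* on a packet head: segments in this packet */
};

/* Return the head of the packet following pkt, or NULL at end of list. */
static struct mbuf *
next_packet(struct mbuf *pkt)
{
	/* Treat an invalid count of 0 as 1 so the walk always advances. */
	uint16_t n = pkt->nb_segs ? pkt->nb_segs : 1;

	/* Skip over every segment that belongs to the current packet. */
	while (pkt != NULL && n-- != 0)
		pkt = pkt->next;
	return pkt;
}

/* Count packets (not segments) in a list built as in the diagram. */
static unsigned int
count_packets(struct mbuf *head)
{
	unsigned int count = 0;

	for (; head != NULL; head = next_packet(head))
		++count;
	return count;
}
```

The cost is what makes this slower than a dedicated "next packet" pointer:
reaching the following packet means touching every segment of the current one.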

> > > One other point I'll mention is that we need to have a discussion on
> > > how/where to add in a timestamp value into the mbuf. Personally, I
> > > think it can be in a union with the sequence number value, but I also
> > > suspect that 32-bits of a timestamp is not going to be enough for
> > many.
> > >
> > > Thoughts?
> > 
> > If we consider that timestamp representation should use nanosecond
> > granularity, a 32-bit value may likely wrap around too quickly to be
> > useful. We can also assume that applications requesting timestamps may
> > care more about latency than throughput, Oleg found that using the
> > second cache line for this purpose had a noticeable impact [1].
> > 
> >  [1] http://dpdk.org/ml/archives/dev/2016-October/049237.html
> 
> I agree with Oleg about the latency vs. throughput importance for such applications.
> 
> If you need high resolution timestamps, consider them to be generated by the NIC RX driver, possibly by the hardware itself (http://w3new.napatech.com/features/time-precision/hardware-time-stamp), so the timestamp belongs in the first cache line. And I am proposing that it should have the highest possible accuracy, which makes the value hardware dependent.
> 
> Furthermore, I am arguing that we leave it up to the application to keep track of the slowly moving bits (i.e. counting whole seconds, hours and calendar date) out of band, so we don't use precious space in the mbuf. The application doesn't need the NIC RX driver's fast path to capture which date (or even which second) a packet was received. Yes, it adds complexity to the application, but we can't set aside 64 bit for a generic timestamp. Or as a weird tradeoff: Put the fast moving 32 bit in the first cache line and the slow moving 32 bit in the second cache line, as a placeholder for the application to fill out if needed. Yes, it means that the application needs to check the time and update its variable holding the slow moving time once every second or so; but that should be doable without significant effort.

That's a good point. However, without a 64-bit value, the elapsed time between
two arbitrary mbufs cannot be measured reliably because there is not enough
context. One way or another, the low-resolution (slow-moving) part is needed
as well.
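
A small sketch of the problem, assuming nanosecond units (2^32 ns is roughly
4.29 seconds) and hypothetical helper names: differences computed from
truncated 32-bit timestamps are only correct modulo 2^32, so any interval
longer than one wrap period is silently misreported.

```c
#include <stdint.h>

/* Elapsed nanoseconds as seen through truncated 32-bit timestamps.
 * Unsigned subtraction wraps, so the result is the true value mod 2^32. */
static uint32_t
elapsed32(uint32_t start, uint32_t end)
{
	return end - start;
}

/* Elapsed nanoseconds from full 64-bit timestamps. */
static uint64_t
elapsed64(uint64_t start, uint64_t end)
{
	return end - start;
}
```

For two mbufs received 10 seconds apart, elapsed64() yields 10000000000 while
elapsed32() on the truncated values yields 1410065408 (about 1.41 s), with no
way to tell from the 32-bit values alone that two wraps happened in between.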

Obviously, latency-sensitive applications are unlikely to perform lengthy
buffering and thus need this, but I'm not sure about all the possible
use cases. Considering that many NICs expose 64-bit timestamps, I suggest we
do not truncate them.

I'm not a fan of the weird tradeoff either: PMDs will be tempted to fill the
extra 32 bits whenever they can, negating the performance benefit of keeping
the timestamp in the first cache line.

-- 
Adrien Mazarguil
6WIND