DPDK patches and discussions
 help / color / mirror / Atom feed
From: Bruce Richardson <bruce.richardson@intel.com>
To: "Morten Brørup" <mb@smartsharesystems.com>
Cc: Adrien Mazarguil <adrien.mazarguil@6wind.com>,
	"Wiles, Keith" <keith.wiles@intel.com>,
	dev@dpdk.org, Olivier Matz <olivier.matz@6wind.com>,
	Oleg Kuporosov <olegk@mellanox.com>
Subject: Re: [dpdk-dev] mbuf changes
Date: Tue, 25 Oct 2016 13:45:08 +0100	[thread overview]
Message-ID: <20161025124508.GA59784@bricha3-MOBL3.ger.corp.intel.com> (raw)
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC359EA8BB@smartserver.smartshare.dk>

On Tue, Oct 25, 2016 at 02:33:55PM +0200, Morten Brørup wrote:
> Comments at the end.
> 
> Med venlig hilsen / kind regards
> - Morten Brørup
> 
> > -----Original Message-----
> > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > Sent: Tuesday, October 25, 2016 2:20 PM
> > To: Morten Brørup
> > Cc: Adrien Mazarguil; Wiles, Keith; dev@dpdk.org; Olivier Matz; Oleg
> > Kuporosov
> > Subject: Re: [dpdk-dev] mbuf changes
> > 
> > On Tue, Oct 25, 2016 at 02:16:29PM +0200, Morten Brørup wrote:
> > > Comments inline.
> > >
> > > > -----Original Message-----
> > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce
> > > > Richardson
> > > > Sent: Tuesday, October 25, 2016 1:14 PM
> > > > To: Adrien Mazarguil
> > > > Cc: Morten Brørup; Wiles, Keith; dev@dpdk.org; Olivier Matz; Oleg
> > > > Kuporosov
> > > > Subject: Re: [dpdk-dev] mbuf changes
> > > >
> > > > On Tue, Oct 25, 2016 at 01:04:44PM +0200, Adrien Mazarguil wrote:
> > > > > On Tue, Oct 25, 2016 at 12:11:04PM +0200, Morten Brørup wrote:
> > > > > > Comments inline.
> > > > > >
> > > > > > Med venlig hilsen / kind regards
> > > > > > - Morten Brørup
> > > > > >
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
> > > > > > > Sent: Tuesday, October 25, 2016 11:39 AM
> > > > > > > To: Bruce Richardson
> > > > > > > Cc: Wiles, Keith; Morten Brørup; dev@dpdk.org; Olivier Matz;
> > > > > > > Oleg Kuporosov
> > > > > > > Subject: Re: [dpdk-dev] mbuf changes
> > > > > > >
> > > > > > > On Mon, Oct 24, 2016 at 05:25:38PM +0100, Bruce Richardson
> > wrote:
> > > > > > > > On Mon, Oct 24, 2016 at 04:11:33PM +0000, Wiles, Keith
> > wrote:
> > > > > > > [...]
> > > > > > > > > > On Oct 24, 2016, at 10:49 AM, Morten Brørup
> > > > > > > <mb@smartsharesystems.com> wrote:
> > > > > > > [...]
> > >
> > > > > > > > One other point I'll mention is that we need to have a
> > > > > > > > discussion on how/where to add in a timestamp value into
> > the
> > > > > > > > mbuf. Personally, I think it can be in a union with the
> > > > sequence
> > > > > > > > number value, but I also suspect that 32-bits of a
> > timestamp
> > > > > > > > is not going to be enough for
> > > > > > > many.
> > > > > > > >
> > > > > > > > Thoughts?
> > > > > > >
> > > > > > > If we consider that timestamp representation should use
> > > > nanosecond
> > > > > > > granularity, a 32-bit value may likely wrap around too
> > quickly
> > > > > > > to be useful. We can also assume that applications requesting
> > > > > > > timestamps may care more about latency than throughput, Oleg
> > > > found
> > > > > > > that using the second cache line for this purpose had a
> > > > noticeable impact [1].
> > > > > > >
> > > > > > >  [1] http://dpdk.org/ml/archives/dev/2016-October/049237.html
> > > > > >
> > > > > > I agree with Oleg about the latency vs. throughput importance
> > > > > > for
> > > > such applications.
> > > > > >
> > > > > > If you need high resolution timestamps, consider them to be
> > > > generated by the NIC RX driver, possibly by the hardware itself
> > > > (http://w3new.napatech.com/features/time-precision/hardware-time-
> > > > stamp), so the timestamp belongs in the first cache line. And I am
> > > > proposing that it should have the highest possible accuracy, which
> > > > makes the value hardware dependent.
> > > > > >
> > > > > > Furthermore, I am arguing that we leave it up to the
> > application
> > > > > > to
> > > > keep track of the slowly moving bits (i.e. counting whole seconds,
> > > > hours and calendar date) out of band, so we don't use precious
> > space
> > > > in the mbuf. The application doesn't need the NIC RX driver's fast
> > > > path to capture which date (or even which second) a packet was
> > > > received. Yes, it adds complexity to the application, but we can't
> > > > set aside 64 bit for a generic timestamp. Or as a weird tradeoff:
> > > > Put the fast moving 32 bit in the first cache line and the slow
> > > > moving 32 bit in the second cache line, as a placeholder for the
> > application to fill out if needed.
> > > > Yes, it means that the application needs to check the time and
> > > > update its variable holding the slow moving time once every second
> > > > or so; but that should be doable without significant effort.
> > > > >
> > > > > That's a good point, however without a 64 bit value, elapsed time
> > > > > between two arbitrary mbufs cannot be measured reliably due to
> > not
> > > > > enough context, one way or another the low resolution value is
> > > > > also
> > > > needed.
> > > > >
> > > > > Obviously latency-sensitive applications are unlikely to perform
> > > > > lengthy buffering and require this but I'm not sure about all the
> > > > > possible use-cases. Considering many NICs expose 64 bit
> > timestaps,
> > > > > I suggest we do not truncate them.
> > > > >
> > > > > I'm not a fan of the weird tradeoff either, PMDs will be tempted
> > > > > to fill the extra 32 bits whenever they can and negate the
> > > > > performance improvement of the first cache line.
> > > >
> > > > I would tend to agree, and I don't really see any convenient way to
> > > > avoid putting in a 64-bit field for the timestamp in cache-line 0.
> > > > If we are ok with having this overlap/partially overlap with
> > > > sequence number, it will use up an extra 4B of storage in that
> > cacheline.
> > >
> > > I agree about the lack of convenience! And Adrien certainly has a
> > point about PMD temptations.
> > >
> > > However, I still don't think that a NICs ability to date-stamp a
> > packet is sufficient reason to put a date-stamp in cache line 0 of the
> > mbuf. Storing only the fast moving 32 bit in cache line 0 seems like a
> > good compromise to me.
> > >
> > > Maybe you can find just one more byte, so it can hold 17 minutes with
> > > nanosecond resolution. (I'm joking!)
> > >
> > > Please don't sacrifice the sequence number for the seconds/hours/days
> > part a timestamp. Maybe it could be configurable to use a 32 bit or 64
> > bit timestamp.
> > >
> > Do you see both timestamp and sequence numbers being used together? I
> > would have thought that apps would either use one or the other?
> > However, your suggestion is workable in any case, to allow the sequence
> > number to overlap just the high 32 bits of the timestamp, rather than
> > the low.
> 
> In our case, I can foresee sequence numbers used for packet processing and timestamps for timing analysis (and possibly for packet capturing, when being used). For timing analysis, we don’t need long durations, e.g. 4 seconds with 32 bit nanosecond resolution suffices. And for packet capturing we are perfectly capable of adding the slowly moving 32 bit of the timestamp to our output data stream without fetching it from the mbuf.
> 

For the 32-bit timestamp case, it might be useful to have a right-shift
value passed in to the ethdev driver. If we assume a NIC with nanosecond
resolution, (or TSC value with resolution of that order of magnitude),
then the app can choose to have 1 ns resolution with 4 second
wraparound, or alternatively 4ns resolution with 16 second wraparound,
or even microsecond resolution with wrap around of over an hour.
The cost is obviously just a shift op in the driver code per packet -
hopefully with multiple packets done at a time using vector operations.

/Bruce

  reply	other threads:[~2016-10-25 12:45 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-10-24 15:49 Morten Brørup
2016-10-24 16:11 ` Wiles, Keith
2016-10-24 16:25   ` Bruce Richardson
2016-10-24 21:47     ` Morten Brørup
2016-10-25  8:53       ` Bruce Richardson
2016-10-25 10:02         ` Ananyev, Konstantin
2016-10-25 10:22           ` Morten Brørup
2016-10-25 13:00             ` Olivier Matz
2016-10-25 13:04               ` Ramia, Kannan Babu
2016-10-25 13:24                 ` Thomas Monjalon
2016-10-25 14:32                   ` Ramia, Kannan Babu
2016-10-25 14:54                     ` Thomas Monjalon
2016-10-25 13:15               ` Bruce Richardson
2016-10-25 13:20               ` Thomas Monjalon
2016-10-25  9:39     ` Adrien Mazarguil
2016-10-25 10:11       ` Morten Brørup
2016-10-25 11:04         ` Adrien Mazarguil
2016-10-25 11:13           ` Bruce Richardson
2016-10-25 12:16             ` Morten Brørup
2016-10-25 12:20               ` Bruce Richardson
2016-10-25 12:33                 ` Morten Brørup
2016-10-25 12:45                   ` Bruce Richardson [this message]
2016-10-25 12:48                     ` Olivier Matz
2016-10-25 13:13                       ` Morten Brørup
2016-10-25 13:38                       ` Ananyev, Konstantin
2016-10-25 14:25                         ` Morten Brørup
2016-10-25 14:45                           ` Olivier Matz
2016-10-28 13:34                       ` Pattan, Reshma
2016-10-28 14:11                         ` Morten Brørup
2016-10-28 15:25                           ` Pattan, Reshma
2016-10-28 16:50                           ` Adrien Mazarguil
2016-10-28 17:00                             ` Richardson, Bruce
2016-10-28 20:27                               ` Morten Brørup
2016-10-31 10:55                               ` Morten Brørup
2016-10-31 16:05                       ` Morten Brørup
2016-10-25 13:48               ` Adrien Mazarguil
2016-10-25 13:58                 ` Ananyev, Konstantin
2016-10-25 11:54     ` Shreyansh Jain
2016-10-25 12:05       ` Bruce Richardson
2016-10-26  8:28         ` Alejandro Lucero
2016-10-26  9:01           ` Morten Brørup
2016-11-09 11:42           ` Alejandro Lucero
2016-11-10  9:19             ` [dpdk-dev] Fwd: " Alejandro Lucero
2016-10-25 12:49       ` [dpdk-dev] " Morten Brørup
2016-10-25 13:14 ` Olivier Matz
2016-10-25 13:18   ` Morten Brørup

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161025124508.GA59784@bricha3-MOBL3.ger.corp.intel.com \
    --to=bruce.richardson@intel.com \
    --cc=adrien.mazarguil@6wind.com \
    --cc=dev@dpdk.org \
    --cc=keith.wiles@intel.com \
    --cc=mb@smartsharesystems.com \
    --cc=olegk@mellanox.com \
    --cc=olivier.matz@6wind.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).