DPDK patches and discussions
 help / color / mirror / Atom feed
From: "Morten Brørup" <mb@smartsharesystems.com>
To: "Olivier Matz" <olivier.matz@6wind.com>,
	"Bruce Richardson" <bruce.richardson@intel.com>
Cc: "Adrien Mazarguil" <adrien.mazarguil@6wind.com>,
	"Wiles, Keith" <keith.wiles@intel.com>, <dev@dpdk.org>,
	"Oleg Kuporosov" <olegk@mellanox.com>
Subject: Re: [dpdk-dev] mbuf changes
Date: Tue, 25 Oct 2016 15:13:27 +0200	[thread overview]
Message-ID: <98CBD80474FA8B44BF855DF32C47DC359EA8BD@smartserver.smartshare.dk> (raw)
In-Reply-To: <4427b23b-e9c0-4e43-9c28-fb7e56da745c@6wind.com>

Comments inline.

Med venlig hilsen / kind regards
- Morten Brørup


> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> Sent: Tuesday, October 25, 2016 2:49 PM
> To: Bruce Richardson; Morten Brørup
> Cc: Adrien Mazarguil; Wiles, Keith; dev@dpdk.org; Oleg Kuporosov
> Subject: Re: [dpdk-dev] mbuf changes
> 
> 
> 
> On 10/25/2016 02:45 PM, Bruce Richardson wrote:
> > On Tue, Oct 25, 2016 at 02:33:55PM +0200, Morten Brørup wrote:
> >> Comments at the end.
> >>
> >> Med venlig hilsen / kind regards
> >> - Morten Brørup
> >>
> >>> -----Original Message-----
> >>> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> >>> Sent: Tuesday, October 25, 2016 2:20 PM
> >>> To: Morten Brørup
> >>> Cc: Adrien Mazarguil; Wiles, Keith; dev@dpdk.org; Olivier Matz;
> Oleg
> >>> Kuporosov
> >>> Subject: Re: [dpdk-dev] mbuf changes
> >>>
> >>> On Tue, Oct 25, 2016 at 02:16:29PM +0200, Morten Brørup wrote:
> >>>> Comments inline.
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce
> >>>>> Richardson
> >>>>> Sent: Tuesday, October 25, 2016 1:14 PM
> >>>>> To: Adrien Mazarguil
> >>>>> Cc: Morten Brørup; Wiles, Keith; dev@dpdk.org; Olivier Matz; Oleg
> >>>>> Kuporosov
> >>>>> Subject: Re: [dpdk-dev] mbuf changes
> >>>>>
> >>>>> On Tue, Oct 25, 2016 at 01:04:44PM +0200, Adrien Mazarguil wrote:
> >>>>>> On Tue, Oct 25, 2016 at 12:11:04PM +0200, Morten Brørup wrote:
> >>>>>>> Comments inline.
> >>>>>>>
> >>>>>>> Med venlig hilsen / kind regards
> >>>>>>> - Morten Brørup
> >>>>>>>
> >>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
> >>>>>>>> Sent: Tuesday, October 25, 2016 11:39 AM
> >>>>>>>> To: Bruce Richardson
> >>>>>>>> Cc: Wiles, Keith; Morten Brørup; dev@dpdk.org; Olivier Matz;
> >>>>>>>> Oleg Kuporosov
> >>>>>>>> Subject: Re: [dpdk-dev] mbuf changes
> >>>>>>>>
> >>>>>>>> On Mon, Oct 24, 2016 at 05:25:38PM +0100, Bruce Richardson
> >>> wrote:
> >>>>>>>>> On Mon, Oct 24, 2016 at 04:11:33PM +0000, Wiles, Keith
> >>> wrote:
> >>>>>>>> [...]
> >>>>>>>>>>> On Oct 24, 2016, at 10:49 AM, Morten Brørup
> >>>>>>>> <mb@smartsharesystems.com> wrote:
> >>>>>>>> [...]
> >>>>
> >>>>>>>>> One other point I'll mention is that we need to have a
> >>>>>>>>> discussion on how/where to add in a timestamp value into
> >>> the
> >>>>>>>>> mbuf. Personally, I think it can be in a union with the
> >>>>> sequence
> >>>>>>>>> number value, but I also suspect that 32-bits of a
> >>> timestamp
> >>>>>>>>> is not going to be enough for
> >>>>>>>> many.
> >>>>>>>>>
> >>>>>>>>> Thoughts?
> >>>>>>>>
> >>>>>>>> If we consider that timestamp representation should use
> >>>>> nanosecond
> >>>>>>>> granularity, a 32-bit value may likely wrap around too
> >>> quickly
> >>>>>>>> to be useful. We can also assume that applications requesting
> >>>>>>>> timestamps may care more about latency than throughput, Oleg
> >>>>> found
> >>>>>>>> that using the second cache line for this purpose had a
> >>>>> noticeable impact [1].
> >>>>>>>>
> >>>>>>>>  [1] http://dpdk.org/ml/archives/dev/2016-October/049237.html
> >>>>>>>
> >>>>>>> I agree with Oleg about the latency vs. throughput importance
> >>>>>>> for
> >>>>> such applications.
> >>>>>>>
> >>>>>>> If you need high resolution timestamps, consider them to be
> >>>>> generated by the NIC RX driver, possibly by the hardware itself
> >>>>> (http://w3new.napatech.com/features/time-precision/hardware-time-
> >>>>> stamp), so the timestamp belongs in the first cache line. And I
> am
> >>>>> proposing that it should have the highest possible accuracy,
> which
> >>>>> makes the value hardware dependent.
> >>>>>>>
> >>>>>>> Furthermore, I am arguing that we leave it up to the
> >>> application
> >>>>>>> to
> >>>>> keep track of the slowly moving bits (i.e. counting whole
> seconds,
> >>>>> hours and calendar date) out of band, so we don't use precious
> >>> space
> >>>>> in the mbuf. The application doesn't need the NIC RX driver's
> fast
> >>>>> path to capture which date (or even which second) a packet was
> >>>>> received. Yes, it adds complexity to the application, but we
> can't
> >>>>> set aside 64 bit for a generic timestamp. Or as a weird tradeoff:
> >>>>> Put the fast moving 32 bit in the first cache line and the slow
> >>>>> moving 32 bit in the second cache line, as a placeholder for the
> >>> application to fill out if needed.
> >>>>> Yes, it means that the application needs to check the time and
> >>>>> update its variable holding the slow moving time once every
> second
> >>>>> or so; but that should be doable without significant effort.
> >>>>>>
> >>>>>> That's a good point, however without a 64 bit value, elapsed
> time
> >>>>>> between two arbitrary mbufs cannot be measured reliably due to
> >>> not
> >>>>>> enough context, one way or another the low resolution value is
> >>>>>> also
> >>>>> needed.
> >>>>>>
> >>>>>> Obviously latency-sensitive applications are unlikely to perform
> >>>>>> lengthy buffering and require this but I'm not sure about all
> the
> >>>>>> possible use-cases. Considering many NICs expose 64 bit
> >>> timestaps,
> >>>>>> I suggest we do not truncate them.
> >>>>>>
> >>>>>> I'm not a fan of the weird tradeoff either, PMDs will be tempted
> >>>>>> to fill the extra 32 bits whenever they can and negate the
> >>>>>> performance improvement of the first cache line.
> >>>>>
> >>>>> I would tend to agree, and I don't really see any convenient way
> >>>>> to avoid putting in a 64-bit field for the timestamp in cache-
> line 0.
> >>>>> If we are ok with having this overlap/partially overlap with
> >>>>> sequence number, it will use up an extra 4B of storage in that
> >>> cacheline.
> >>>>
> >>>> I agree about the lack of convenience! And Adrien certainly has a
> >>> point about PMD temptations.
> >>>>
> >>>> However, I still don't think that a NICs ability to date-stamp a
> >>> packet is sufficient reason to put a date-stamp in cache line 0 of
> >>> the mbuf. Storing only the fast moving 32 bit in cache line 0 seems
> >>> like a good compromise to me.
> >>>>
> >>>> Maybe you can find just one more byte, so it can hold 17 minutes
> >>>> with nanosecond resolution. (I'm joking!)
> >>>>
> >>>> Please don't sacrifice the sequence number for the
> >>>> seconds/hours/days
> >>> part a timestamp. Maybe it could be configurable to use a 32 bit or
> >>> 64 bit timestamp.
> >>>>
> >>> Do you see both timestamp and sequence numbers being used together?
> >>> I would have thought that apps would either use one or the other?
> >>> However, your suggestion is workable in any case, to allow the
> >>> sequence number to overlap just the high 32 bits of the timestamp,
> >>> rather than the low.
> >>
> >> In our case, I can foresee sequence numbers used for packet
> processing and timestamps for timing analysis (and possibly for packet
> capturing, when being used). For timing analysis, we don’t need long
> durations, e.g. 4 seconds with 32 bit nanosecond resolution suffices.
> And for packet capturing we are perfectly capable of adding the slowly
> moving 32 bit of the timestamp to our output data stream without
> fetching it from the mbuf.
> >>
> 
> We should keep in mind that today we have the seqn field but it is not
> used by any PMD. In case it is implemented, would it be a per-queue
> sequence number? Is it useful from an application view?
> 
> This field is only used by the librte_reorder library, and in my
> opinion, we should consider moving it in the second cache line since it
> is not filled by the PMD.

Our DPDK project is not yet sufficiently progressed to be able to provide quality feedback about this.

> > For the 32-bit timestamp case, it might be useful to have a
> > right-shift value passed in to the ethdev driver. If we assume a NIC
> > with nanosecond resolution, (or TSC value with resolution of that
> > order of magnitude), then the app can choose to have 1 ns resolution
> > with 4 second wraparound, or alternatively 4ns resolution with 16
> > second wraparound, or even microsecond resolution with wrap around of
> over an hour.
> > The cost is obviously just a shift op in the driver code per packet -
> > hopefully with multiple packets done at a time using vector
> operations.
> 
> 
> About the timestamp, we can manage to find 64 bits in the first cache
> line, without sacrifying any field we have today. The question is more
> for the fields we may want to add later.
> 
> To answer to the question of the size of the timestamp, the first
> question is to know what is the precision required for the applications
> using it?

In our application, microsecond resolution probably suffices, although we do use nanoseconds today. And a wrapping counter is fine for us, so we don’t need the slow moving bits.

For dedicated packet capture applications, please refer to e.g. Napatech's reasoning: http://w3new.napatech.com/features/time-precision/hardware-time-stamp

> I don't quite like the idea of splitting the timestamp in the 2 cache
> lines, I think it would not be easy to use.
> 
> 
> Olivier


  reply	other threads:[~2016-10-25 13:13 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-10-24 15:49 Morten Brørup
2016-10-24 16:11 ` Wiles, Keith
2016-10-24 16:25   ` Bruce Richardson
2016-10-24 21:47     ` Morten Brørup
2016-10-25  8:53       ` Bruce Richardson
2016-10-25 10:02         ` Ananyev, Konstantin
2016-10-25 10:22           ` Morten Brørup
2016-10-25 13:00             ` Olivier Matz
2016-10-25 13:04               ` Ramia, Kannan Babu
2016-10-25 13:24                 ` Thomas Monjalon
2016-10-25 14:32                   ` Ramia, Kannan Babu
2016-10-25 14:54                     ` Thomas Monjalon
2016-10-25 13:15               ` Bruce Richardson
2016-10-25 13:20               ` Thomas Monjalon
2016-10-25  9:39     ` Adrien Mazarguil
2016-10-25 10:11       ` Morten Brørup
2016-10-25 11:04         ` Adrien Mazarguil
2016-10-25 11:13           ` Bruce Richardson
2016-10-25 12:16             ` Morten Brørup
2016-10-25 12:20               ` Bruce Richardson
2016-10-25 12:33                 ` Morten Brørup
2016-10-25 12:45                   ` Bruce Richardson
2016-10-25 12:48                     ` Olivier Matz
2016-10-25 13:13                       ` Morten Brørup [this message]
2016-10-25 13:38                       ` Ananyev, Konstantin
2016-10-25 14:25                         ` Morten Brørup
2016-10-25 14:45                           ` Olivier Matz
2016-10-28 13:34                       ` Pattan, Reshma
2016-10-28 14:11                         ` Morten Brørup
2016-10-28 15:25                           ` Pattan, Reshma
2016-10-28 16:50                           ` Adrien Mazarguil
2016-10-28 17:00                             ` Richardson, Bruce
2016-10-28 20:27                               ` Morten Brørup
2016-10-31 10:55                               ` Morten Brørup
2016-10-31 16:05                       ` Morten Brørup
2016-10-25 13:48               ` Adrien Mazarguil
2016-10-25 13:58                 ` Ananyev, Konstantin
2016-10-25 11:54     ` Shreyansh Jain
2016-10-25 12:05       ` Bruce Richardson
2016-10-26  8:28         ` Alejandro Lucero
2016-10-26  9:01           ` Morten Brørup
2016-11-09 11:42           ` Alejandro Lucero
2016-11-10  9:19             ` [dpdk-dev] Fwd: " Alejandro Lucero
2016-10-25 12:49       ` [dpdk-dev] " Morten Brørup
2016-10-25 13:14 ` Olivier Matz
2016-10-25 13:18   ` Morten Brørup

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=98CBD80474FA8B44BF855DF32C47DC359EA8BD@smartserver.smartshare.dk \
    --to=mb@smartsharesystems.com \
    --cc=adrien.mazarguil@6wind.com \
    --cc=bruce.richardson@intel.com \
    --cc=dev@dpdk.org \
    --cc=keith.wiles@intel.com \
    --cc=olegk@mellanox.com \
    --cc=olivier.matz@6wind.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).