DPDK patches and discussions
 help / color / mirror / Atom feed
From: "Pattan, Reshma" <reshma.pattan@intel.com>
To: Olivier Matz <olivier.matz@6wind.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
	"Morten Brørup" <mb@smartsharesystems.com>
Subject: Re: [dpdk-dev] mbuf changes
Date: Fri, 28 Oct 2016 13:34:54 +0000	[thread overview]
Message-ID: <3AEA2BF9852C6F48A459DA490692831F010BF015@IRSMSX109.ger.corp.intel.com> (raw)
In-Reply-To: <4427b23b-e9c0-4e43-9c28-fb7e56da745c@6wind.com>

Hi Olivier,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> Sent: Tuesday, October 25, 2016 1:49 PM
> To: Richardson, Bruce <bruce.richardson@intel.com>; Morten Brørup
> <mb@smartsharesystems.com>
> Cc: Adrien Mazarguil <adrien.mazarguil@6wind.com>; Wiles, Keith
> <keith.wiles@intel.com>; dev@dpdk.org; Oleg Kuporosov
> <olegk@mellanox.com>
> Subject: Re: [dpdk-dev] mbuf changes
> 
> 
> 
> On 10/25/2016 02:45 PM, Bruce Richardson wrote:
> > On Tue, Oct 25, 2016 at 02:33:55PM +0200, Morten Brørup wrote:
> >> Comments at the end.
> >>
> >> Med venlig hilsen / kind regards
> >> - Morten Brørup
> >>
> >>> -----Original Message-----
> >>> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> >>> Sent: Tuesday, October 25, 2016 2:20 PM
> >>> To: Morten Brørup
> >>> Cc: Adrien Mazarguil; Wiles, Keith; dev@dpdk.org; Olivier Matz; Oleg
> >>> Kuporosov
> >>> Subject: Re: [dpdk-dev] mbuf changes
> >>>
> >>> On Tue, Oct 25, 2016 at 02:16:29PM +0200, Morten Brørup wrote:
> >>>> Comments inline.
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce
> >>>>> Richardson
> >>>>> Sent: Tuesday, October 25, 2016 1:14 PM
> >>>>> To: Adrien Mazarguil
> >>>>> Cc: Morten Brørup; Wiles, Keith; dev@dpdk.org; Olivier Matz; Oleg
> >>>>> Kuporosov
> >>>>> Subject: Re: [dpdk-dev] mbuf changes
> >>>>>
> >>>>> On Tue, Oct 25, 2016 at 01:04:44PM +0200, Adrien Mazarguil wrote:
> >>>>>> On Tue, Oct 25, 2016 at 12:11:04PM +0200, Morten Brørup wrote:
> >>>>>>> Comments inline.
> >>>>>>>
> >>>>>>> Med venlig hilsen / kind regards
> >>>>>>> - Morten Brørup
> >>>>>>>
> >>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
> >>>>>>>> Sent: Tuesday, October 25, 2016 11:39 AM
> >>>>>>>> To: Bruce Richardson
> >>>>>>>> Cc: Wiles, Keith; Morten Brørup; dev@dpdk.org; Olivier Matz;
> >>>>>>>> Oleg Kuporosov
> >>>>>>>> Subject: Re: [dpdk-dev] mbuf changes
> >>>>>>>>
> >>>>>>>> On Mon, Oct 24, 2016 at 05:25:38PM +0100, Bruce Richardson
> >>> wrote:
> >>>>>>>>> On Mon, Oct 24, 2016 at 04:11:33PM +0000, Wiles, Keith
> >>> wrote:
> >>>>>>>> [...]
> >>>>>>>>>>> On Oct 24, 2016, at 10:49 AM, Morten Brørup
> >>>>>>>> <mb@smartsharesystems.com> wrote:
> >>>>>>>> [...]
> >>>>
> >>>>>>>>> One other point I'll mention is that we need to have a
> >>>>>>>>> discussion on how/where to add in a timestamp value into
> >>> the
> >>>>>>>>> mbuf. Personally, I think it can be in a union with the
> >>>>> sequence
> >>>>>>>>> number value, but I also suspect that 32-bits of a
> >>> timestamp
> >>>>>>>>> is not going to be enough for
> >>>>>>>> many.
> >>>>>>>>>
> >>>>>>>>> Thoughts?
> >>>>>>>>
> >>>>>>>> If we consider that timestamp representation should use
> >>>>> nanosecond
> >>>>>>>> granularity, a 32-bit value may likely wrap around too
> >>> quickly
> >>>>>>>> to be useful. We can also assume that applications requesting
> >>>>>>>> timestamps may care more about latency than throughput, Oleg
> >>>>> found
> >>>>>>>> that using the second cache line for this purpose had a
> >>>>> noticeable impact [1].
> >>>>>>>>
> >>>>>>>>  [1] http://dpdk.org/ml/archives/dev/2016-October/049237.html
> >>>>>>>
> >>>>>>> I agree with Oleg about the latency vs. throughput importance
> >>>>>>> for
> >>>>> such applications.
> >>>>>>>
> >>>>>>> If you need high resolution timestamps, consider them to be
> >>>>> generated by the NIC RX driver, possibly by the hardware itself
> >>>>> (http://w3new.napatech.com/features/time-precision/hardware-time-
> >>>>> stamp), so the timestamp belongs in the first cache line. And I am
> >>>>> proposing that it should have the highest possible accuracy, which
> >>>>> makes the value hardware dependent.
> >>>>>>>
> >>>>>>> Furthermore, I am arguing that we leave it up to the
> >>> application
> >>>>>>> to
> >>>>> keep track of the slowly moving bits (i.e. counting whole seconds,
> >>>>> hours and calendar date) out of band, so we don't use precious
> >>> space
> >>>>> in the mbuf. The application doesn't need the NIC RX driver's fast
> >>>>> path to capture which date (or even which second) a packet was
> >>>>> received. Yes, it adds complexity to the application, but we can't
> >>>>> set aside 64 bit for a generic timestamp. Or as a weird tradeoff:
> >>>>> Put the fast moving 32 bit in the first cache line and the slow
> >>>>> moving 32 bit in the second cache line, as a placeholder for the
> >>> application to fill out if needed.
> >>>>> Yes, it means that the application needs to check the time and
> >>>>> update its variable holding the slow moving time once every second
> >>>>> or so; but that should be doable without significant effort.
> >>>>>>
> >>>>>> That's a good point, however without a 64 bit value, elapsed time
> >>>>>> between two arbitrary mbufs cannot be measured reliably due to
> >>> not
> >>>>>> enough context, one way or another the low resolution value is
> >>>>>> also
> >>>>> needed.
> >>>>>>
> >>>>>> Obviously latency-sensitive applications are unlikely to perform
> >>>>>> lengthy buffering and require this but I'm not sure about all the
> >>>>>> possible use-cases. Considering many NICs expose 64 bit
> >>> timestaps,
> >>>>>> I suggest we do not truncate them.
> >>>>>>
> >>>>>> I'm not a fan of the weird tradeoff either, PMDs will be tempted
> >>>>>> to fill the extra 32 bits whenever they can and negate the
> >>>>>> performance improvement of the first cache line.
> >>>>>
> >>>>> I would tend to agree, and I don't really see any convenient way
> >>>>> to avoid putting in a 64-bit field for the timestamp in cache-line 0.
> >>>>> If we are ok with having this overlap/partially overlap with
> >>>>> sequence number, it will use up an extra 4B of storage in that
> >>> cacheline.
> >>>>
> >>>> I agree about the lack of convenience! And Adrien certainly has a
> >>> point about PMD temptations.
> >>>>
> >>>> However, I still don't think that a NICs ability to date-stamp a
> >>> packet is sufficient reason to put a date-stamp in cache line 0 of
> >>> the mbuf. Storing only the fast moving 32 bit in cache line 0 seems
> >>> like a good compromise to me.
> >>>>
> >>>> Maybe you can find just one more byte, so it can hold 17 minutes
> >>>> with nanosecond resolution. (I'm joking!)
> >>>>
> >>>> Please don't sacrifice the sequence number for the
> >>>> seconds/hours/days
> >>> part a timestamp. Maybe it could be configurable to use a 32 bit or
> >>> 64 bit timestamp.
> >>>>
> >>> Do you see both timestamp and sequence numbers being used together?
> >>> I would have thought that apps would either use one or the other?
> >>> However, your suggestion is workable in any case, to allow the
> >>> sequence number to overlap just the high 32 bits of the timestamp,
> >>> rather than the low.
> >>
> >> In our case, I can foresee sequence numbers used for packet processing and
> timestamps for timing analysis (and possibly for packet capturing, when being
> used). For timing analysis, we don’t need long durations, e.g. 4 seconds with 32
> bit nanosecond resolution suffices. And for packet capturing we are perfectly
> capable of adding the slowly moving 32 bit of the timestamp to our output data
> stream without fetching it from the mbuf.
> >>
> 
> We should keep in mind that today we have the seqn field but it is not used by
> any PMD. In case it is implemented, would it be a per-queue sequence number?
> Is it useful from an application view?
> 
> This field is only used by the librte_reorder library, and in my opinion, we should
> consider moving it in the second cache line since it is not filled by the PMD.
> 
> 
> > For the 32-bit timestamp case, it might be useful to have a
> > right-shift value passed in to the ethdev driver. If we assume a NIC
> > with nanosecond resolution, (or TSC value with resolution of that
> > order of magnitude), then the app can choose to have 1 ns resolution
> > with 4 second wraparound, or alternatively 4ns resolution with 16
> > second wraparound, or even microsecond resolution with wrap around of over
> an hour.
> > The cost is obviously just a shift op in the driver code per packet -
> > hopefully with multiple packets done at a time using vector operations.
> 
> 
> About the timestamp, we can manage to find 64 bits in the first cache line,
> without sacrifying any field we have today. The question is more for the fields
> we may want to add later.
> 
> To answer to the question of the size of the timestamp, the first question is to
> know what is the precision required for the applications using it?

As part of the 17.02 latency calculation feature, the requirement is to report latency in nanoseconds. So, +1 on keeping timestamp as 64bits. 
Since the feature is planned for 17.02, can we finalize on the timestamp position in the mbuf struct? 
Based on the decision, I am planning to make the change and send ABI notice if needed.

Reshma

> 
> I don't quite like the idea of splitting the timestamp in the 2 cache lines, I think it
> would not be easy to use.
> 
> 
> Olivier

  parent reply	other threads:[~2016-10-28 13:34 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-10-24 15:49 Morten Brørup
2016-10-24 16:11 ` Wiles, Keith
2016-10-24 16:25   ` Bruce Richardson
2016-10-24 21:47     ` Morten Brørup
2016-10-25  8:53       ` Bruce Richardson
2016-10-25 10:02         ` Ananyev, Konstantin
2016-10-25 10:22           ` Morten Brørup
2016-10-25 13:00             ` Olivier Matz
2016-10-25 13:04               ` Ramia, Kannan Babu
2016-10-25 13:24                 ` Thomas Monjalon
2016-10-25 14:32                   ` Ramia, Kannan Babu
2016-10-25 14:54                     ` Thomas Monjalon
2016-10-25 13:15               ` Bruce Richardson
2016-10-25 13:20               ` Thomas Monjalon
2016-10-25  9:39     ` Adrien Mazarguil
2016-10-25 10:11       ` Morten Brørup
2016-10-25 11:04         ` Adrien Mazarguil
2016-10-25 11:13           ` Bruce Richardson
2016-10-25 12:16             ` Morten Brørup
2016-10-25 12:20               ` Bruce Richardson
2016-10-25 12:33                 ` Morten Brørup
2016-10-25 12:45                   ` Bruce Richardson
2016-10-25 12:48                     ` Olivier Matz
2016-10-25 13:13                       ` Morten Brørup
2016-10-25 13:38                       ` Ananyev, Konstantin
2016-10-25 14:25                         ` Morten Brørup
2016-10-25 14:45                           ` Olivier Matz
2016-10-28 13:34                       ` Pattan, Reshma [this message]
2016-10-28 14:11                         ` Morten Brørup
2016-10-28 15:25                           ` Pattan, Reshma
2016-10-28 16:50                           ` Adrien Mazarguil
2016-10-28 17:00                             ` Richardson, Bruce
2016-10-28 20:27                               ` Morten Brørup
2016-10-31 10:55                               ` Morten Brørup
2016-10-31 16:05                       ` Morten Brørup
2016-10-25 13:48               ` Adrien Mazarguil
2016-10-25 13:58                 ` Ananyev, Konstantin
2016-10-25 11:54     ` Shreyansh Jain
2016-10-25 12:05       ` Bruce Richardson
2016-10-26  8:28         ` Alejandro Lucero
2016-10-26  9:01           ` Morten Brørup
2016-11-09 11:42           ` Alejandro Lucero
2016-11-10  9:19             ` [dpdk-dev] Fwd: " Alejandro Lucero
2016-10-25 12:49       ` [dpdk-dev] " Morten Brørup
2016-10-25 13:14 ` Olivier Matz
2016-10-25 13:18   ` Morten Brørup

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3AEA2BF9852C6F48A459DA490692831F010BF015@IRSMSX109.ger.corp.intel.com \
    --to=reshma.pattan@intel.com \
    --cc=dev@dpdk.org \
    --cc=mb@smartsharesystems.com \
    --cc=olivier.matz@6wind.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).