DPDK patches and discussions
 help / color / mirror / Atom feed
From: Olivier Matz <olivier.matz@6wind.com>
To: "Morten Brørup" <mb@smartsharesystems.com>
Cc: "Wiles, Keith" <keith.wiles@intel.com>, dev <dev@dpdk.org>,
	Thomas Monjalon <thomas@monjalon.net>,
	"Wang, Haiyue" <haiyue.wang@intel.com>,
	Stephen Hemminger <stephen@networkplumber.org>,
	Andrew Rybchenko <arybchenko@solarflare.com>,
	Jerin Jacob Kollanukkaran <jerinj@marvell.com>,
	bruce.richardson@intel.com
Subject: Re: [dpdk-dev] [PATCH] mbuf: support dynamic fields and flags
Date: Mon, 23 Sep 2019 11:41:26 +0200	[thread overview]
Message-ID: <20190923094126.4gfjclvu2vwagr3p@platinum> (raw)
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35B42AB4@smartserver.smartshare.dk>

Hi Morten,

On Mon, Sep 23, 2019 at 10:56:01AM +0200, Morten Brørup wrote:
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Wiles, Keith
> > Sent: Saturday, September 21, 2019 10:29 AM
> > 
> > > On Sep 18, 2019, at 6:54 PM, Olivier Matz <olivier.matz@6wind.com>
> > wrote:
> > >
> > > Many features require to store data inside the mbuf. As the room in
> > mbuf
> > > structure is limited, it is not possible to have a field for each
> > > feature. Also, changing fields in the mbuf structure can break the
> > API
> > > or ABI.
> > >
> > > This commit addresses these issues, by enabling the dynamic
> > registration
> > > of fields or flags:
> > >
> > > - a dynamic field is a named area in the rte_mbuf structure, with a
> > >  given size (>= 1 byte) and alignment constraint.
> > > - a dynamic flag is a named bit in the rte_mbuf structure.
> > >
> > > The typical use case is a PMD that registers space for an offload
> > > feature, when the application requests to enable this feature.  As
> > > the space in mbuf is limited, the space should only be reserved if it
> > > is going to be used (i.e when the application explicitly asks for
> > it).
> > >
> > > The registration can be done at any moment, but it is not possible
> > > to unregister fields or flags for now.
> > >
> > > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > > Acked-by: Thomas Monjalon <thomas@monjalon.net>
> > > —
> > >
> > 
> > The idea of registration for space in the mbuf I am not a big fan. I
> > did like Konstantin’s suggestion of having the compiler help with
> > optimizing the code, but with a slight difference. Maybe I
> > misunderstand, but now with this design you have to pass the offsets to
> > different parts of the application or place in global memory or have
> > each section request the offsets. It seems great if the application is
> > one big application or an appliance model application having control of
> > the whole design not so good for service chains like designs where
> > different parts of the whole application is design by different teams.
> > 
> > Konstantin’s suggest if I understand it was to use structures to allow
> > the compiler to optimize the access to the mbuf and I like that idea,
> > but with one change we add a field in the mbuf to define the mbuf
> > structure type.
> > 
> > Say 0 is the standard rte_mbuf type then type 1 could be the IPSec
> > offset type mbuf, type 2 could be something else, … The type 0 looks
> > just like the mbuf we have today with maybe the optional fields set to
> > reserved or some type of filler variables to reserve the holes in the
> > structure. Then type 1 is the IPSec mbuf and in the reserved sections
> > of the mbuf contain the IPSec related data with the standard mbuf
> > fields still matching the type 0 version.
> > 
> > This allows the mbuf to be used by the developer and the compiler now
> > knows exactly where the fields are located in the structure and does
> > not have to deal with any of the macros and offsets and registration
> > suggested here. Just cast the mbuf pointer into the new type mbuf
> > structure. We just have to make sure the code that needs to use a given
> > mbuf type has access to the structure definitions.
> > 
> > If the mbufs it going to be translated from one type mbuf to another
> > mbuf type, we just have to define that type and then cast the mbuf
> > pointer to that structure. When an mbuf is received from IPSec PMD then
> > the application needs to forward that mbuf to the next stage it can
> > reset the type to 0 or to another type filling in the reserved fields
> > to be used by the next stage in the pipeline.
> > 
> > The mbuf now contains the type and every point in the application can
> > look at the type to determine how that mbuf is defined. I am sure there
> > are some holes here, but I think it is a better solution then using all
> > of these macros, offset values and registration APIs.
> > 
> > 
> > Regards,
> > Keith
> 
>
> First of all, I applaud the idea of cleaning up the mbuf structure and
> removing the fields only rarely used and/or for special use cases only, as
> mentioned in the presentation, e.g. timestamp, timesync and seqn. It is great
> seeing serious effort put into improving the very core of DPDK!
>
> However, after some hallway discussions at DPDK Userspace and further thinking
> about the details, I can see two additional risks by introducing dynamic
> mbufs, which I would like to share for your consideration:
>
> 1. It may prevent us from adding future solutions not yet considered.
>
> If we were to introduce new functions for more granular handling of mbufs,
> similar to some of the Linux kernel's skbuff handling functions, how should
> such functions handle the dynamic fields? And how is the rte_pktmbuf_clone()
> function supposed to handle the dynamic fields? Some fields may need to be
> copied as-is, some may need to be initialized to zero or some other value, and
> so on. It is apparently not a problem now; but dynamic mbufs may prevent us
> from adding some of such functions in the future.
>
> I admit that I can only imagine the issue on an abstract level, so I can't
> give you a concrete example. Perhaps some of the more experienced Linux
> developers can provide one or debunk my concern. (Stephen: In relation to
> packet capturing we were discussing the reference counter not being respected
> by some applications, and the possible need for more granular mbuf handling.)

For now, the clone copies the fields. If we introduce a copy function, I
think it should do the same. Right now, I cannot find a use case where
the field should be set to another value, but yes, this could be a
limitation.

> 2. In the long run, we might end up adding more fields than we remove.
>
> Dynamic mbufs makes it easier to add specialized fields, which is great. But
> when the barrier to adding specialized fields is reduced, more PMDs and
> libraries may add their own unique fields, rather than going through a
> discussion and consensus for adding them to the fixed mbuf structure or
> finding some other solution. And if a multitude of PMDs each need a
> specialized field, PMDs might end up adding variants of a field, rather than
> reaching consensus for a common standard. It will be much easier to just add
> your own specialized mbuf field rather than having to go through the
> standardization process in the DPDK community.
>
> I will use the timestamp field as a theoretic example: The packet timestamp
> measurement unit differs between vendors, so with dynamic mbufs one vendor's
> PMD might create a timestamp_ns field, counting in nanoseconds, and another
> vendor's PMD might create a timestamp_clocks field, counting in clock
> counts. With the fixed mbuf, this triggered a public discussion, and a
> compromise for the field was reached.

In the mbuf structure, there is not a lot of room to describe some of
the fields, especially the ones that are inside a structure of union of
structure of union of union ;)

The definition of a dynamic field is in a dedicated structure, so there
is more room to describe it. For public dynamic fields, we have to be as
strict as for static fields, because it will be a public API. It has to
be correctly defined. About your timestamp example, we should not allow
2 different timestamp formats in the public API.

For private fields (Lib only, App only, PMD only), it's less critical
because it is not public, even if it's always good to have a clear
description.


> Although I am worried about dynamic mbufs, I don't have a better
> suggestion. And perhaps I just worry too much.

Feedback is always good to have, thanks.

Olivier

  reply	other threads:[~2019-09-23  9:41 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-10  9:29 [dpdk-dev] [RFC] " Olivier Matz
2019-07-10 17:14 ` Wang, Haiyue
2019-07-11  7:26   ` Olivier Matz
2019-07-11  8:04     ` Wang, Haiyue
2019-07-11  8:20       ` Olivier Matz
2019-07-11  8:34         ` Wang, Haiyue
2019-07-11 15:31     ` Stephen Hemminger
2019-07-12  9:18       ` Olivier Matz
2019-07-10 17:49 ` Stephen Hemminger
2019-07-10 18:12   ` Wiles, Keith
2019-07-11  7:53     ` Olivier Matz
2019-07-11 14:37       ` Wiles, Keith
2019-07-12  9:06         ` Olivier Matz
2019-07-11  7:36   ` Olivier Matz
2019-07-12 12:23     ` Jerin Jacob Kollanukkaran
2019-07-16  9:39       ` Olivier Matz
2019-07-16 14:43         ` Stephen Hemminger
2019-07-11  9:24 ` Thomas Monjalon
2019-07-12 14:54 ` Andrew Rybchenko
2019-07-16  9:49   ` Olivier Matz
2019-07-16 11:31     ` [dpdk-dev] ***Spam*** " Andrew Rybchenko
2019-09-18 16:54 ` [dpdk-dev] [PATCH] " Olivier Matz
2019-09-21  4:54   ` Wang, Haiyue
2019-09-23  8:31     ` Olivier Matz
2019-09-23 11:01       ` Wang, Haiyue
2019-09-21  8:28   ` Wiles, Keith
2019-09-23  8:56     ` Morten Brørup
2019-09-23  9:41       ` Olivier Matz [this message]
2019-09-23  9:13     ` Olivier Matz
2019-09-23 15:14       ` Wiles, Keith
2019-09-23 16:16         ` Olivier Matz
2019-09-23 17:14           ` Wiles, Keith
2019-09-23 16:09       ` Wiles, Keith
2019-10-01 10:49   ` Ananyev, Konstantin
2019-10-17  7:54     ` Olivier Matz
2019-10-17 11:58       ` Ananyev, Konstantin
2019-10-17 12:58         ` Olivier Matz
2019-10-17 14:42 ` [dpdk-dev] [PATCH v2] " Olivier Matz
2019-10-18  2:47   ` Wang, Haiyue
2019-10-18  7:53     ` Olivier Matz
2019-10-18  8:28       ` Wang, Haiyue
2019-10-18  9:47         ` Olivier Matz
2019-10-18 11:24           ` Wang, Haiyue
2019-10-22 22:51   ` Ananyev, Konstantin
2019-10-23  3:16     ` Wang, Haiyue
2019-10-23 10:21       ` Olivier Matz
2019-10-23 15:00         ` Stephen Hemminger
2019-10-23 15:12           ` Wang, Haiyue
2019-10-23 10:19     ` Olivier Matz
2019-10-23 11:45       ` Olivier Matz
2019-10-23 11:49         ` Ananyev, Konstantin
2019-10-23 12:00   ` Shahaf Shuler
2019-10-23 13:33     ` Olivier Matz
2019-10-24  4:54       ` Shahaf Shuler
2019-10-24  7:07         ` Olivier Matz
2019-10-24  7:38   ` Slava Ovsiienko
2019-10-24  7:56     ` Olivier Matz
2019-10-24  8:13 ` [dpdk-dev] [PATCH v3] " Olivier Matz
2019-10-24 15:30   ` Stephen Hemminger
2019-10-24 15:44     ` Thomas Monjalon
2019-10-24 17:07       ` Stephen Hemminger
2019-10-24 16:40   ` Thomas Monjalon
2019-10-26 12:39 ` [dpdk-dev] [PATCH v4] " Olivier Matz
2019-10-26 17:04   ` Thomas Monjalon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190923094126.4gfjclvu2vwagr3p@platinum \
    --to=olivier.matz@6wind.com \
    --cc=arybchenko@solarflare.com \
    --cc=bruce.richardson@intel.com \
    --cc=dev@dpdk.org \
    --cc=haiyue.wang@intel.com \
    --cc=jerinj@marvell.com \
    --cc=keith.wiles@intel.com \
    --cc=mb@smartsharesystems.com \
    --cc=stephen@networkplumber.org \
    --cc=thomas@monjalon.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).