DPDK patches and discussions
From: "Dumitrescu, Cristian" <cristian.dumitrescu@intel.com>
To: Ivan Boule <ivan.boule@6wind.com>, "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH 04/29] mbuf: added offset of packet meta-data in the packet buffer just after mbuf
Date: Thu, 29 May 2014 22:37:52 +0000	[thread overview]
Message-ID: <3EB4FA525960D640B5BDFFD6A3D891261B1BF90D@IRSMSX102.ger.corp.intel.com> (raw)
In-Reply-To: <5385D091.1010205@6wind.com>

There is a tricky typo below (leave of -> live off), correcting ...

-----Original Message-----
From: Dumitrescu, Cristian 
Sent: Thursday, May 29, 2014 11:28 PM
To: 'Ivan Boule'; dev@dpdk.org
Subject: RE: [dpdk-dev] [PATCH 04/29] mbuf: added offset of packet meta-data in the packet buffer just after mbuf

Hi Ivan,

I hate to disagree with you :), but no, we do not break the scatter-gather feature. We actually implemented the Packet Framework IPv4 fragmentation and reassembly ports to validate exactly this.

Maybe the confusion comes from the RTE_MBUF_TO_BADDR macro. This macro only works (and is only used) for direct mbufs, so a more accurate name for it would be RTE_MBUF_DIRECT_TO_BADDR, as it cannot work (and is not used) for indirect mbufs.

I am describing the rationale behind the meta-data design below; sorry for the long description that reads like a design document ...

Quick recap:
- mbuf->buf_addr points to the first address where a byte of data for the current segment _can_ be placed
- direct mbuf: the packet descriptor (mbuf) sits in the same mempool buffer with the packet itself, so mbuf->buf_addr = mbuf + 1;
- indirect mbuf: the packet descriptor is located in a different mempool buffer than the packet itself, so mbuf->buf_addr != mbuf + 1;
- Regardless of the mbuf type, the first byte of data is not necessarily located at mbuf->buf_addr: a headroom (configurable, CONFIG_RTE_PKTMBUF_HEADROOM = 128 bytes by default) is reserved at the start of the data buffer (mbuf->buf_addr) for prepending packet headers ( ... or, why not, meta-data!). The location of the first real data byte is mbuf->pkt.data, which is variable, as opposed to mbuf->buf_addr, which does not change. (These relationships are sketched in code right after this list.)
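
To make the recap concrete, here is a minimal sketch of these relationships in code. This assumes the 1.7-era mbuf layout, where the packet fields live under mbuf->pkt; it is illustration only, not production code:

#include <rte_mbuf.h>

/* Sketch only: the recap above expressed as pointer arithmetic. */
static void mbuf_layout_recap(struct rte_mbuf *m)
{
	/* Direct mbuf: descriptor and packet share one mempool buffer,
	 * so the data buffer starts right after the mbuf itself. */
	int is_direct = (m->buf_addr == (void *)(m + 1));

	/* The first real data byte is at m->pkt.data, which moves
	 * around inside the buffer; m->buf_addr never changes. */
	void *first_byte = m->pkt.data;

	/* Headroom currently available for prepending headers
	 * (RTE_PKTMBUF_HEADROOM = 128 bytes by default). */
	size_t headroom = (char *)m->pkt.data - (char *)m->buf_addr;

	(void)is_direct; (void)first_byte; (void)headroom;
}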

Packet meta-data rationale:
- I think we both agree that we need to be able to store packet meta-data in the packet buffer. The only regions of the packet buffer where meta-data can be stored are the headroom and the tailroom, which are both managed by the application and both grow and shrink during the packet lifetime.

- To me, packet-metadata is part of the packet descriptor: mbuf is the mandatory & standard part of the packet descriptor, while meta-data is the optional & non-standard (per application) part of the packet descriptor. Therefore, it makes sense to put the meta-data immediately after mbuf, but this is not mandatory.

- The zero-size field called metadata is really just an offset: it points to the first buffer location where meta-data _can_ be placed. The reason for having this field is to provide a standard way to place meta-data in the packet buffer, rather than hide it in the Packet Framework libraries and potentially conflict with other mbuf changes in the future. It flags that meta-data must be taken into consideration whenever the mbuf is changed in the future.
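
For illustration, with the macros from the patch (quoted at the bottom of this thread), an application would access its meta-data along these lines; the offsets and names here are hypothetical, chosen by the application:

/* Hypothetical application-chosen offsets into the meta-data area. */
#define APP_META_FLOW_ID_OFS  0   /* uint32_t */
#define APP_META_COLOR_OFS    4   /* uint32_t */

static void set_flow_metadata(struct rte_mbuf *m,
			      uint32_t flow_id, uint32_t color)
{
	RTE_MBUF_METADATA_UINT32(m, APP_META_FLOW_ID_OFS) = flow_id;
	RTE_MBUF_METADATA_UINT32(m, APP_META_COLOR_OFS) = color;
}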

- For direct mbufs, the same buffer space (the headroom) is shared between packet data (header prepending) and meta-data, similar to how the stack and the heap manage the same memory. Since this space is managed by the application, it is the application's responsibility to make sure the packet bytes and the meta-data do not step on one another. This problem is not at all new, nor introduced by the meta-data field: even currently, the application has to dimension the headroom correctly, so that even in the worst-case scenario (which is application-specific) the packet bytes do not run into the mbuf fields, right?
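
As a sketch of what that dimensioning contract could look like at compile time (the constants are hypothetical, and _Static_assert assumes a C11 compiler):

/* Hypothetical application budget: worst-case prepended headers plus
 * meta-data must fit in the configured headroom. */
#define APP_MAX_PREPEND_BYTES  64
#define APP_METADATA_BYTES     32

_Static_assert(APP_MAX_PREPEND_BYTES + APP_METADATA_BYTES
	       <= RTE_PKTMBUF_HEADROOM,
	       "headroom too small for worst-case headers + meta-data");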

- For indirect mbufs, the packet meta-data is associated with the indirect mbuf rather than with the direct mbuf it attaches to, as the direct mbuf describes a different packet with different meta-data. So it makes sense for the meta-data of the indirect mbuf to be stored in the same buffer as the indirect mbuf, rather than in the buffer of the attached direct mbuf. This means the buffer used to store the indirect mbuf must be sized appropriately: the mbuf size (currently 64 bytes) plus the maximum size of the additional meta-data. This is illustrated in the code of the IPv4 fragmentation port, where for every child packet (output fragment) we copy the parent meta-data into the child buffer, which holds an indirect mbuf attached to the direct mbuf of the input jumbo frame, plus the additional meta-data.
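
A sketch of that per-fragment copy, assuming a fixed application meta-data size (APP_METADATA_BYTES is the same hypothetical constant as above, and the helper name is made up):

#include <string.h>

/* Copy the parent packet's meta-data into the buffer holding the
 * child's indirect mbuf, so every fragment carries its own copy. */
static void copy_parent_metadata(struct rte_mbuf *child_indirect,
				 struct rte_mbuf *parent)
{
	memcpy(RTE_MBUF_METADATA_UINT8_PTR(child_indirect, 0),
	       RTE_MBUF_METADATA_UINT8_PTR(parent, 0),
	       APP_METADATA_BYTES);
}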

- We are also considering having a user_data field in the mbuf itself, pointing to meta-data in the same buffer or in any other buffer. The Packet Framework works functionally with both approaches, but there is a performance problem with the mbuf->user_data approach that we are not addressing in the 1.7 release timeframe: it creates a data dependency, because in order to find the location of the meta-data we must first read the mbuf and only then read the meta-data through mbuf->user_data. The cost of the additional memory access is high, as the data dependency prevents efficient prefetching, so (at least for the time being) we need a compile-time known meta-data offset. The application can fill in the meta-data on packet RX, and then all the other CPU cores processing the packet can live off the meta-data for a long time without touching the mbuf or the packet (efficiency). For now, if somebody needs yet more meta-data in yet another buffer, they can add (for their application) a user_data pointer as part of their application meta-data (instead of in the standard mbuf).
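
The difference in memory-access pattern, sketched below; note that mbuf->user_data and struct app_meta are hypothetical, since the field does not exist yet:

/* Hypothetical: neither mbuf->user_data nor struct app_meta exists. */
struct app_meta { uint32_t flow_id; };

static uint32_t read_via_pointer(struct rte_mbuf *m)
{
	/* Two dependent loads: the mbuf cache line must arrive before
	 * the meta-data address is even known, so the second load
	 * cannot be prefetched ahead of time. */
	struct app_meta *md = (struct app_meta *)m->user_data; /* load 1 */
	return md->flow_id;                         /* load 2, dependent */
}

static uint32_t read_via_offset(struct rte_mbuf *m)
{
	/* The meta-data address is plain arithmetic on the mbuf
	 * pointer, so it can be prefetched early, together with
	 * (or even instead of) the mbuf itself. */
	rte_prefetch0(RTE_MBUF_METADATA_UINT32_PTR(m, 0));
	return RTE_MBUF_METADATA_UINT32(m, 0);
}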

- As said, mbuf->metadata points to the first location where meta-data _can_ be placed; if somebody needs a different offset, they can build it on top of mbuf->metadata (e.g. by having a reserved field in their struct app_pkt_metadata). We have demonstrated the use of meta-data in the examples/ip_pipeline sample app (see struct app_pkt_metadata in "main.h").
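
For reference, an application-defined struct along those lines might look like this; the fields shown are illustrative only, not the exact contents of main.h:

/* Illustrative only -- see struct app_pkt_metadata in
 * examples/ip_pipeline/main.h for the real definition. */
struct app_pkt_metadata {
	uint32_t signature;    /* e.g. flow hash for table lookup */
	uint8_t reserved[28];  /* explicit offset reservation     */
	uint32_t flow_id;
	/* ... further application-specific fields ... */
};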

Please let me know if this makes sense.

Regards,
Cristian
 
PS: This is where a white board would help a lot ...


-----Original Message-----
From: Ivan Boule [mailto:ivan.boule@6wind.com] 
Sent: Wednesday, May 28, 2014 1:03 PM
To: Dumitrescu, Cristian; dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH 04/29] mbuf: added offset of packet meta-data in the packet buffer just after mbuf

Hi Cristian,

Currently, the DPDK framework does not make any assumptions about the
actual layout of an mbuf.
More precisely, the DPDK does not impose any constraint on the actual
location of additional metadata, if any, or on the actual location and
size of its associated payload data buffer.
This is coherent with the fact that mbuf pools are not created by the
DPDK itself, but by network applications, which are free to choose
whatever packet mbuf layout fits their particular needs.

There is one exception to this basic DPDK rule: the mbuf cloning feature 
available through the RTE_MBUF_SCATTER_GATHER configuration option 
assumes that the payload data buffer of the mbuf immediately follows the 
rte_mbuf data structure (see the macros RTE_MBUF_FROM_BADDR, 
RTE_MBUF_TO_BADDR, RTE_MBUF_INDIRECT, and RTE_MBUF_DIRECT in the file 
lib/librte_mbuf/rte_mbuf.h).
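
For readers without the source at hand, those macros are defined (approximately; quoted from the 1.7-era tree from memory) as:

/* lib/librte_mbuf/rte_mbuf.h -- the buffer address is assumed to
 * start immediately after the rte_mbuf structure: */
#define RTE_MBUF_FROM_BADDR(ba)  (((struct rte_mbuf *)(ba)) - 1)
#define RTE_MBUF_TO_BADDR(mb)    (((char *)(mb)) + sizeof(struct rte_mbuf))
#define RTE_MBUF_INDIRECT(mb)    (RTE_MBUF_FROM_BADDR((mb)->buf_addr) != (mb))
#define RTE_MBUF_DIRECT(mb)      (RTE_MBUF_FROM_BADDR((mb)->buf_addr) == (mb))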

The cloning feature therefore prevents building packet mbufs with their
metadata located immediately after the rte_mbuf data structure, which is
exactly what your patch introduces.

At the very least, a comment clearly warning the user of this
incompatibility might be worth adding to both the code and your patch log.

Regards,
Ivan

On 05/27/2014 07:09 PM, Cristian Dumitrescu wrote:
> Added zero-size field (offset in data structure) to specify the beginning of packet meta-data in the packet buffer just after the mbuf.
>
> The size of the packet meta-data is application specific and the packet meta-data is managed by the application.
>
> The packet meta-data should always be accessed through the provided macros.
>
> This is used by the Packet Framework libraries (port, table, pipeline).
>
> There is absolutely no performance impact due to this mbuf field, as it does not take any space in the mbuf structure (zero-size field).
>
> Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
> ---
>   lib/librte_mbuf/rte_mbuf.h |   17 +++++++++++++++++
>   1 files changed, 17 insertions(+), 0 deletions(-)
>
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index 4a9ab41..bf09618 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -201,8 +201,25 @@ struct rte_mbuf {
>   		struct rte_ctrlmbuf ctrl;
>   		struct rte_pktmbuf pkt;
>   	};
> +	
> +	union {
> +		uint8_t metadata[0];
> +		uint16_t metadata16[0];
> +		uint32_t metadata32[0];
> +		uint64_t metadata64[0];
> +	};
>   } __rte_cache_aligned;
>
> +#define RTE_MBUF_METADATA_UINT8(mbuf, offset)       (mbuf->metadata[offset])
> +#define RTE_MBUF_METADATA_UINT16(mbuf, offset)      (mbuf->metadata16[offset/sizeof(uint16_t)])
> +#define RTE_MBUF_METADATA_UINT32(mbuf, offset)      (mbuf->metadata32[offset/sizeof(uint32_t)])
> +#define RTE_MBUF_METADATA_UINT64(mbuf, offset)      (mbuf->metadata64[offset/sizeof(uint64_t)])
> +
> +#define RTE_MBUF_METADATA_UINT8_PTR(mbuf, offset)   (&mbuf->metadata[offset])
> +#define RTE_MBUF_METADATA_UINT16_PTR(mbuf, offset)  (&mbuf->metadata16[offset/sizeof(uint16_t)])
> +#define RTE_MBUF_METADATA_UINT32_PTR(mbuf, offset)  (&mbuf->metadata32[offset/sizeof(uint32_t)])
> +#define RTE_MBUF_METADATA_UINT64_PTR(mbuf, offset)  (&mbuf->metadata64[offset/sizeof(uint64_t)])
> +
>   /**
>    * Given the buf_addr returns the pointer to corresponding mbuf.
>    */
>


-- 
Ivan Boule
6WIND Development Engineer

Thread overview: 38+ messages
2014-05-27 17:09 [dpdk-dev] [PATCH 00/29] Packet Framework Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 01/29] librte_lpm: rule_is_present Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 02/29] hexdump: fixed minor build issue Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 03/29] log: added log IDs for Packet Framework libraries Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 04/29] mbuf: added offset of packet meta-data in the packet buffer just after mbuf Cristian Dumitrescu
2014-05-28 12:03   ` Ivan Boule
2014-05-29 22:28     ` Dumitrescu, Cristian
2014-06-02 12:24       ` Ivan Boule
2014-06-05 13:22         ` Dumitrescu, Cristian
2014-05-29 22:37     ` Dumitrescu, Cristian [this message]
2014-05-27 17:09 ` [dpdk-dev] [PATCH 05/29] Packet Framework librte_port: Port API Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 06/29] Packet Framework librte_port: ethdev ports Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 07/29] Packet Framework librte_port: ring ports Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 08/29] Packet Framework librte_port: IPv4 frag port Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 09/29] Packet Framework librte_port: IPv4 reassembly Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 10/29] Packet Framework librte_port: hierarchical scheduler port Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 11/29] Packet Framework librte_port: Source/Sink ports Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 12/29] Packet Framework librte_port: Makefile Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 13/29] Packet Framework librte_table: Table API Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 14/29] Packet Framework librte_table: LPM IPv4 table Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 15/29] Packet Framework librte_table: LPM IPv6 table Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 16/29] Packet Framework librte_table: ACL table Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 17/29] Packet Framework librte_table: Hash tables Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 18/29] Packet Framework librte_table: array table Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 19/29] Packet Framework librte_table: Stub table Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 20/29] Packet Framework librte_table: Makefile Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 21/29] Packet Framework librte_pipeline: Pipeline Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 22/29] Packet Framework librte_pipeline: Makefile Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 23/29] librte_cfgfile: interpret config files Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 24/29] librte_cfgfile: Makefile Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 25/29] Packet Framework: build infrastructure Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 26/29] Packet Framework performance application Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 27/29] Packet Framework IPv4 pipeline sample app Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 28/29] app/Makefile: enable app/test-pipeline Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 29/29] Packet Framework unit tests Cristian Dumitrescu
2014-05-27 19:47 ` [dpdk-dev] [PATCH 00/29] Packet Framework Neil Horman
2014-05-29 20:06   ` Dumitrescu, Cristian
2014-06-04 14:16 ` Cao, Waterman
