DPDK patches and discussions
From: Ivan Boule <ivan.boule@6wind.com>
To: "Dumitrescu, Cristian" <cristian.dumitrescu@intel.com>,
	 "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH 04/29] mbuf: added offset of packet meta-data in the packet buffer just after mbuf
Date: Mon, 02 Jun 2014 14:24:06 +0200	[thread overview]
Message-ID: <538C6CE6.1000900@6wind.com> (raw)
In-Reply-To: <3EB4FA525960D640B5BDFFD6A3D891261B1BF8EC@IRSMSX102.ger.corp.intel.com>

Hi Cristian,

I agree with you: the natural way to store application metadata in
mbufs is to put it right after the rte_mbuf data structure.
This can be implemented simply as follows:

struct metadata {
     ...
};

struct app_mbuf {
     struct rte_mbuf mbuf;
     struct metadata mtdt;
};

With such a representation, the application initializes the "buf_addr" 
field of each mbuf pointed to by a (struct app_mbuf *)amb pointer as:

     amb->mbuf.buf_addr = (char *)amb + sizeof(struct app_mbuf);

But such a natural programming approach breaks the assumptions of the 
macros RTE_MBUF_FROM_BADDR, RTE_MBUF_TO_BADDR, RTE_MBUF_INDIRECT, and 
RTE_MBUF_DIRECT.

For instance, in the function __rte_pktmbuf_prefree_seg(m), after the line
     struct rte_mbuf *md = RTE_MBUF_FROM_BADDR(m->buf_addr);
"md" is always different from "m", so "m" is systematically (and most of 
the time wrongly) considered to be an indirect mbuf.

In the same way, the operations done by the function rte_pktmbuf_detach(m)
     void *buf = RTE_MBUF_TO_BADDR(m);
     m->buf_addr = buf;
do not reset buf_addr to its correct default value.

To summarize, adding metadata after the rte_mbuf data structure is
incompatible with the packet cloning feature behind the [wrongly named]
RTE_MBUF_SCATTER_GATHER configuration option.

By the way, you suggest using the headroom to also store packet metadata.
But then, how can an application store both types of information 
in a given mbuf at the same time, when the actual length of the network 
headers in an mbuf is variable, as it depends on the protocol path 
followed by the packet through a networking stack (tunnels, etc.)?

Regards,
Ivan


On 05/30/2014 12:28 AM, Dumitrescu, Cristian wrote:
> Hi Ivan,
>
> I hate to disagree with you :), but no, we do not break the scatter-gather feature. We actually implemented the Packet Framework IPv4 fragmentation and reassembly ports to validate exactly this.
>
> Maybe the confusion comes from the RTE_MBUF_TO_BADDR macro. This macro only works (and it is only used) for direct mbufs, so probably the correct name for this macro should be RTE_MBUF_DIRECT_TO_BADDR, as it cannot work (and it is not used) for indirect mbufs.
>
> I am describing the rationale behind meta-data design below, sorry for the long description that looks like a design document ...
>
> Quick recap:
> - mbuf->buf_addr points to the first address where a byte of data for the current segment _can_ be placed
> - direct mbuf: the packet descriptor (mbuf) sits in the same mempool buffer with the packet itself, so mbuf->buf_addr = mbuf + 1;
> - indirect mbuf: the packet descriptor is located in a different mempool buffer than the packet itself, so mbuf->buf_addr != mbuf + 1;
> - Regardless of the mbuf type, it is not necessarily true that the first byte of data is located at mbuf->buf_addr, as we save a headroom (configurable, by default initially of CONFIG_RTE_PKTMBUF_HEADROOM = 128 bytes) at the start of the data buffer (mbuf->buf_addr) for prepending packet headers ( ... or, why not, meta-data!).  The location of the first real data byte is mbuf->pkt.data, which is variable, as opposed to mbuf->buf_addr, which does not change.
>
> Packet meta-data rationale:
> - I think we both agree that we need to be able to store packet meta-data in the packet buffer. The packet buffer regions where meta-data could be stored can only be in the headroom or in the tailroom, which are both managed by the application and both go up and down during the packet lifetime.
>
> - To me, packet-metadata is part of the packet descriptor: mbuf is the mandatory & standard part of the packet descriptor, while meta-data is the optional & non-standard (per application) part of the packet descriptor. Therefore, it makes sense to put the meta-data immediately after mbuf, but this is not mandatory.
>
> - The zero-size field called meta-data is really just an offset: it points to the first buffer location where meta-data _can_ be placed. The reason for having this field is to provide a standard way to place meta-data in the packet buffer rather than hide it in the Packet Framework libraries and potentially conflict with other mbuf changes in the future. It flags that meta-data should be taken into consideration when any mbuf change is done in the future.
>
> - For direct mbufs, the same buffer space (the headroom) is shared between packet data (header prepending) and meta-data similar to how the stack and the heap manage the same memory. Of course, since it is managed by the application, it is the responsibility of the application to make sure the packet bytes and the meta-data do not step on one another, but this problem is not at all new, nor introduced by the meta-data field: even currently, the application has to make sure the headroom is dimensioned correctly, so that even in the worst case scenario (application-specific), the packet bytes do not run into the mbuf fields, right?
>
> - For indirect mbufs, the packet meta-data is associated with the indirect mbuf rather than the direct mbuf (the one the indirect mbuf attaches to, as the direct mbuf contains a different packet that has different meta-data), so it makes sense that meta-data of the indirect mbuf is stored in the same buffer as the indirect mbuf (rather than the buffer of the attached direct mbuf). So this means that the buffer size used to store the indirect mbuf is sized appropriately (mbuf size, currently 64 bytes, plus the max size for additional meta-data). This is illustrated in the code of the IPv4 fragmentation port, where for every child packet (output fragment) we copy the parent meta-data in the child buffer (which stores an indirect buffer attached to the direct buffer of the input jumbo, plus now additional meta-data).
>
> - We are also considering having a user_data field in the mbuf itself to point to meta-data in the same buffer or any other buffer. The Packet Framework functionally works with both approaches, but there is a performance problem associated with the mbuf->user_data approach that we are not addressing for this 1.7 release timeframe. The issue is the data dependency that is created: in order to find out the location of the meta-data, we need to first read the mbuf and then read the meta-data from mbuf->user_data. The cost of the additional memory access is high, as the data dependency prevents efficient prefetch, therefore (at least for the time being) we need to use a compile-time known meta-data offset. The application can fill the meta-data on packet RX, and then all additional CPU cores processing the packet can live off the meta-data for a long time with no need to access the mbuf or the packet (efficiency). For the time being, if somebody needs yet more meta-data in yet another buffer, they can add (for their application) a user_data pointer as part of their application meta-data (instead of the standard mbuf).
>
> - As said, the mbuf->metadata points to the first location where meta-data _can_ be placed, if somebody needs a different offset, they can add it on top of the mbuf->metadata (e.g. by having a reserved field in their struct app_pkt_metadata). We have demonstrated the use of meta-data in the examples/ip_pipeline sample app (see struct app_pkt_metadata in "main.h").
>
> Please let me know if this makes sense?
>
> Regards,
> Cristian
>
> PS: This is where a white board would help a lot ...
>
>
> -----Original Message-----
> From: Ivan Boule [mailto:ivan.boule@6wind.com]
> Sent: Wednesday, May 28, 2014 1:03 PM
> To: Dumitrescu, Cristian; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 04/29] mbuf: added offset of packet meta-data in the packet buffer just after mbuf
>
> Hi Cristian,
>
> Currently, the DPDK framework does not make any assumption on the actual
> layout of a mbuf.
> More precisely, the DPDK does not impose any constraint on the actual
> location of additional metadata, if any, or on the actual location and
> size of its associated payload data buffer.
> This is coherent with the fact that mbuf pools are not created by the
> DPDK itself, but by network applications that are free to choose
> whatever packet mbuf layout that fits their particular needs.
>
> There is one exception to this basic DPDK rule: the mbuf cloning feature
> available through the RTE_MBUF_SCATTER_GATHER configuration option
> assumes that the payload data buffer of the mbuf immediately follows the
> rte_mbuf data structure (see the macros RTE_MBUF_FROM_BADDR,
> RTE_MBUF_TO_BADDR, RTE_MBUF_INDIRECT, and RTE_MBUF_DIRECT in the file
> lib/librte_mbuf/rte_mbuf.h).
>
> The cloning feature prevents building packet mbufs with their metadata
> located immediately after the rte_mbuf data structure, which is exactly
> what your patch introduces.
>
> At least, a comment that clearly warns the user of this incompatibility
> might be worth adding into both the code and your patch log.
>
> Regards,
> Ivan
>
> On 05/27/2014 07:09 PM, Cristian Dumitrescu wrote:
>> Added zero-size field (offset in data structure) to specify the beginning of packet meta-data in the packet buffer just after the mbuf.
>>
>> The size of the packet meta-data is application specific and the packet meta-data is managed by the application.
>>
>> The packet meta-data should always be accessed through the provided macros.
>>
>> This is used by the Packet Framework libraries (port, table, pipeline).
>>
>> There is absolutely no performance impact due to this mbuf field, as it does not take any space in the mbuf structure (zero-size field).
>>
>> Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
>> ---
>>    lib/librte_mbuf/rte_mbuf.h |   17 +++++++++++++++++
>>    1 files changed, 17 insertions(+), 0 deletions(-)
>>
>> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
>> index 4a9ab41..bf09618 100644
>> --- a/lib/librte_mbuf/rte_mbuf.h
>> +++ b/lib/librte_mbuf/rte_mbuf.h
>> @@ -201,8 +201,25 @@ struct rte_mbuf {
>>    		struct rte_ctrlmbuf ctrl;
>>    		struct rte_pktmbuf pkt;
>>    	};
>> +	
>> +	union {
>> +		uint8_t metadata[0];
>> +		uint16_t metadata16[0];
>> +		uint32_t metadata32[0];
>> +		uint64_t metadata64[0];
>> +	};
>>    } __rte_cache_aligned;
>>
>> +#define RTE_MBUF_METADATA_UINT8(mbuf, offset)       (mbuf->metadata[offset])
>> +#define RTE_MBUF_METADATA_UINT16(mbuf, offset)      (mbuf->metadata16[offset/sizeof(uint16_t)])
>> +#define RTE_MBUF_METADATA_UINT32(mbuf, offset)      (mbuf->metadata32[offset/sizeof(uint32_t)])
>> +#define RTE_MBUF_METADATA_UINT64(mbuf, offset)      (mbuf->metadata64[offset/sizeof(uint64_t)])
>> +
>> +#define RTE_MBUF_METADATA_UINT8_PTR(mbuf, offset)   (&mbuf->metadata[offset])
>> +#define RTE_MBUF_METADATA_UINT16_PTR(mbuf, offset)  (&mbuf->metadata16[offset/sizeof(uint16_t)])
>> +#define RTE_MBUF_METADATA_UINT32_PTR(mbuf, offset)  (&mbuf->metadata32[offset/sizeof(uint32_t)])
>> +#define RTE_MBUF_METADATA_UINT64_PTR(mbuf, offset)  (&mbuf->metadata64[offset/sizeof(uint64_t)])
>> +
>>    /**
>>     * Given the buf_addr returns the pointer to corresponding mbuf.
>>     */
>>
>
>


-- 
Ivan Boule
6WIND Development Engineer

Thread overview: 38+ messages
2014-05-27 17:09 [dpdk-dev] [PATCH 00/29] Packet Framework Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 01/29] librte_lpm: rule_is_present Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 02/29] hexdump: fixed minor build issue Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 03/29] log: added log IDs for Packet Framework libraries Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 04/29] mbuf: added offset of packet meta-data in the packet buffer just after mbuf Cristian Dumitrescu
2014-05-28 12:03   ` Ivan Boule
2014-05-29 22:28     ` Dumitrescu, Cristian
2014-06-02 12:24       ` Ivan Boule [this message]
2014-06-05 13:22         ` Dumitrescu, Cristian
2014-05-29 22:37     ` Dumitrescu, Cristian
2014-05-27 17:09 ` [dpdk-dev] [PATCH 05/29] Packet Framework librte_port: Port API Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 06/29] Packet Framework librte_port: ethdev ports Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 07/29] Packet Framework librte_port: ring ports Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 08/29] Packet Framework librte_port: IPv4 frag port Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 09/29] Packet Framework librte_port: IPv4 reassembly Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 10/29] Packet Framework librte_port: hierarchical scheduler port Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 11/29] Packet Framework librte_port: Source/Sink ports Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 12/29] Packet Framework librte_port: Makefile Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 13/29] Packet Framework librte_table: Table API Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 14/29] Packet Framework librte_table: LPM IPv4 table Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 15/29] Packet Framework librte_table: LPM IPv6 table Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 16/29] Packet Framework librte_table: ACL table Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 17/29] Packet Framework librte_table: Hash tables Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 18/29] Packet Framework librte_table: array table Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 19/29] Packet Framework librte_table: Stub table Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 20/29] Packet Framework librte_table: Makefile Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 21/29] Packet Framework librte_pipeline: Pipeline Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 22/29] Packet Framework librte_pipeline: Makefile Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 23/29] librte_cfgfile: interpret config files Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 24/29] librte_cfgfile: Makefile Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 25/29] Packet Framework: build infrastructure Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 26/29] Packet Framework performance application Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 27/29] Packet Framework IPv4 pipeline sample app Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 28/29] app/Makefile: enable app/test-pipeline Cristian Dumitrescu
2014-05-27 17:09 ` [dpdk-dev] [PATCH 29/29] Packet Framework unit tests Cristian Dumitrescu
2014-05-27 19:47 ` [dpdk-dev] [PATCH 00/29] Packet Framework Neil Horman
2014-05-29 20:06   ` Dumitrescu, Cristian
2014-06-04 14:16 ` Cao, Waterman
