From: "Zhang, Helin" <helin.zhang@intel.com>
To: Ivan Boule <ivan.boule@6wind.com>,
Olivier MATZ <olivier.matz@6wind.com>,
"Richardson, Bruce" <bruce.richardson@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] Making space in mbuf data-structure
Date: Tue, 8 Jul 2014 07:37:48 +0000 [thread overview]
Message-ID: <F35DEAC7BCE34641BA9FAC6BCA4A12E70A74B448@SHSMSX104.ccr.corp.intel.com> (raw)
In-Reply-To: <53BB9AE6.4030503@6wind.com>
> -----Original Message-----
> From: Ivan Boule [mailto:ivan.boule@6wind.com]
> Sent: Tuesday, July 8, 2014 3:17 PM
> To: Zhang, Helin; Olivier MATZ; Richardson, Bruce
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] Making space in mbuf data-structure
>
> On 07/08/2014 09:04 AM, Zhang, Helin wrote:
> >
> >
> >> -----Original Message-----
> >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
> >> Sent: Monday, July 7, 2014 6:19 PM
> >> To: Richardson, Bruce; dev@dpdk.org
> >> Subject: Re: [dpdk-dev] Making space in mbuf data-structure
> >>
> >> Hello Bruce,
> >>
> >> Thank you for reviving this discussion now that 1.7 is released.
> >>
> >> First, I would like to reference my previous patch series, which
> >> reworks the mbuf to gain 9 bytes [1]. The v1 of the patch was
> >> discussed at [2].
> >>
> >> Now, let's list what I would find useful to have in this mbuf rework:
> >>
> >> - larger size for ol_flags: this is at least needed for TSO, but as
> >>   the field is completely full today, I expect other features to
> >>   need more flag space as well.
> >> - add other offload fields: l4_len and mss, required for TSO
> >> - remove ctrl_mbuf: it could be replaced by a packet mbuf, which
> >>   would simplify the mbuf structure and save room in it.
> >> - a second vlan tag: I suppose this could be useful in use-cases
> >>   where vlans are stacked.
> >> - splitting out fields that are superimposed, so that 2 features
> >>   can be used at the same time
> >>
> >> On the other hand, I'm not convinced by this:
> >>
> >> - new filters in the i40e driver: I don't think the mbuf is the
> >>   right place for driver-specific flags. If a feature is brought
> >>   by a new driver requiring a flag in the mbuf, we should take care
> >>   that the flag is not bound to this particular driver and matches
> >>   the same feature in another driver.
> >> - sequence number: I'm not sure I understand the use-case; maybe
> >>   this could stay in mbuf metadata in the reordering module.
> >>
> >>> Firstly, we believe that there is no possible way we can ever
> >>> fit all the fields we need into a 64-byte mbuf, and so we
> >>> need to start looking at a 128-byte mbuf instead.
> >>
> >> The TSO patches show that it is possible to keep a 64-byte mbuf (of
> >> course, it depends on what we want to add in the mbuf). I'm not
> >> fundamentally against having a 128-byte mbuf. But:
> >>
> >> - it should not be a reason for just adding things and not reworking
> >> things that could be enhanced
> >> - it should not be a reason for not optimizing the current mbuf
> >> structure
> >> - if we can do the same with a 64-byte mbuf, we need to carefully
> >>   compare the solutions, as fetching a second cache line is not
> >>   free in all situations. The 64-byte solution I'm proposing
> >>   in [1] may cost a bit more in CPU cycles but avoids an additional
> >>   cache prefetch (or miss). In some situations (I'm thinking about
> >>   use-cases where we are memory-bound, e.g. an application
> >>   processing a lot of data), it is better to lose a few CPU cycles.
> >>
> >>> First off the blocks is to look at moving the mempool pointer into
> >>> the second cache line [...] Beyond this change, I'm also
> >>> investigating potentially moving the "next"
> >>> pointer to the second cache line, but it's looking harder to move
> >>> without serious impact
> >>
> >> I think we can easily find DPDK applications that would use the
> >> "next" field of the mbuf on the rx side, as it is the standard way
> >> of chaining packets: for instance, IP reassembly, TCP/UDP socket
> >> queues, or any other protocol that needs a reassembly queue. This is
> >> at least what we do in the 6WINDGate fast path stack, and I suppose
> >> other network stack implementations do something similar, so we
> >> should probably avoid moving this field to the 2nd cache line.
> >>
> >> One more issue I foresee: with slower CPUs like Atom, having 2
> >> cache lines will add more cost than on Xeon. I wonder whether it
> >> makes sense to have a compile-time option to select either limited
> >> features with one cache line or full features with two cache lines.
> >> I don't know if it's a good idea because it would make the code
> >> more complex, but we could consider it. I think we don't target
> >> binary compatibility today?
> >>
> >> From a functional point of view, we could check that my TSO patch
> >> can be adapted to your proposal so we can challenge and merge both
> >> approaches.
> >>
> >> As this change would impact the core of DPDK, I think it would be
> >> interesting to list some representative use-cases in order to
> >> evaluate the cost of each solution. This will also help for future
> >> modifications, and could be included in a sort of non-regression test?
> >>
> >> Regards,
> >> Olivier
> >>
> >> [1] http://dpdk.org/ml/archives/dev/2014-May/002537.html
> >> [2] http://dpdk.org/ml/archives/dev/2014-May/002322.html
> >
> > Hi Olivier
> >
> > I am trying to convince you about the new "filter status" field.
> > It holds the matched Flow Director filter ID, and might be reused for
> > the HASH signature if the packet matches a hash filter, or for other
> > purposes. It is quite useful for Flow Director, and it is not a flag.
> > I guess non-Intel NICs should have a similar feature as well.
> >
> By construction, since a packet cannot match more than 1 filter with
> an associated identifier, this is typically the kind of field that
> should be put in a union with the standard 32-bit RSS id.
>
> Regards,
> Ivan
>
> > Regards,
> > Helin
> >
>
>
> --
> Ivan Boule
> 6WIND Development Engineer
Hi Ivan
Yes, you are right. The 32-bit 'filter status' can be union-ed with the RSS ID.
But it still needs another 32 bits. Why?
Because i40e can report a 64-bit 'flexible bytes' value, or report both 32-bit 'flexible bytes' and a 32-bit 'matched FD filter ID' at the same time.
Regards,
Helin
Thread overview: 9+ messages
2014-07-03 23:38 Richardson, Bruce
2014-07-07 10:19 ` Olivier MATZ
2014-07-08 7:04 ` Zhang, Helin
2014-07-08 7:16 ` Ivan Boule
2014-07-08 7:37 ` Zhang, Helin [this message]
2014-07-15 9:07 ` Ananyev, Konstantin
2014-07-16 12:32 ` Olivier MATZ
2014-07-15 9:31 ` Ananyev, Konstantin
2014-07-15 18:07 ` Richardson, Bruce