From: "Zhang, Helin"
To: Ivan Boule, Olivier MATZ, "Richardson, Bruce"
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] Making space in mbuf data-structure
Thread-Topic: [dpdk-dev] Making space in mbuf data-structure
Date: Tue, 8 Jul 2014 07:37:48 +0000
In-Reply-To: <53BB9AE6.4030503@6wind.com>
References: <59AF69C657FD0841A61C55336867B5B02CF13E32@IRSMSX103.ger.corp.intel.com> <53BA7422.9080706@6wind.com> <53BB9AE6.4030503@6wind.com>
List-Id: patches and discussions about DPDK
> -----Original Message-----
> From: Ivan Boule [mailto:ivan.boule@6wind.com]
> Sent: Tuesday, July 8, 2014 3:17 PM
> To: Zhang, Helin; Olivier MATZ; Richardson, Bruce
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] Making space in mbuf data-structure
>
> On 07/08/2014 09:04 AM, Zhang, Helin wrote:
> >
> >> -----Original Message-----
> >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
> >> Sent: Monday, July 7, 2014 6:19 PM
> >> To: Richardson, Bruce; dev@dpdk.org
> >> Subject: Re: [dpdk-dev] Making space in mbuf data-structure
> >>
> >> Hello Bruce,
> >>
> >> Thank you for reviving this discussion now that 1.7 is released.
> >>
> >> First, I would like to reference my previous patch series that first
> >> reworks the mbuf to gain 9 bytes [1]. The v1 of the patch was
> >> discussed at [2].
> >>
> >> Now, let's list what I would find useful to have in this mbuf rework:
> >>
> >> - a larger ol_flags field: this is at least needed for TSO, but as
> >>   the field is completely full today, I expect other features to
> >>   need it too.
> >> - additional offload fields: l4_len and mss, required for TSO.
> >> - removal of ctrl_mbuf: it could be replaced by a packet mbuf. This
> >>   would simplify the mbuf structure and save room in the mbuf.
> >> - a second vlan tag: I suppose this could be useful in some
> >>   use-cases where vlans are stacked.
> >> - splitting out superimposed fields, if two features can be used
> >>   at the same time.
> >>
> >> On the other hand, I'm not convinced by this:
> >>
> >> - new filters in the i40e driver: I don't think the mbuf is the
> >>   right place for driver-specific flags. If a feature is brought
> >>   by a new driver requiring a flag in the mbuf, we should take care
> >>   that the flag is not bound to that particular driver and would
> >>   match the same feature in another driver.
> >> - sequence number: I'm not sure I understand the use-case; maybe
> >>   this could stay in mbuf metadata in the reordering module.
> >>
> >>> Firstly, we believe that there is no possible way that we can ever
> >>> fit all the fields we need into a 64-byte mbuf, and so we need to
> >>> start looking at a 128-byte mbuf instead.
> >>
> >> The TSO patches show that it is possible to keep a 64-byte mbuf (of
> >> course, it depends on what we want to add in the mbuf). I'm not
> >> fundamentally against having a 128-byte mbuf. But:
> >>
> >> - it should not be a reason for just adding things instead of
> >>   reworking things that could be enhanced;
> >> - it should not be a reason for not optimizing the current mbuf
> >>   structure;
> >> - if we can do the same with a 64-byte mbuf, we need to carefully
> >>   compare the solutions, as fetching a second cache line is not
> >>   costless in all situations. The 64-byte solution I'm proposing
> >>   in [1] may cost a bit more in CPU cycles but avoids an additional
> >>   cache prefetch (or miss). In some situations (use-cases where we
> >>   are memory-bound, e.g. an application processing a lot of data),
> >>   it is better to lose a few CPU cycles.
> >>
> >>> First off the blocks is to look at moving the mempool pointer into
> >>> the second cache line [...] Beyond this change, I'm also
> >>> investigating potentially moving the "next" pointer to the second
> >>> cache line, but it's looking harder to move without serious impact.
> >>
> >> I think we can easily find DPDK applications that would use the
> >> "next" field of the mbuf on the rx side, as it is the standard way
> >> of chaining packets: for instance, IP reassembly, TCP/UDP socket
> >> queues, or any other protocol that needs a reassembly queue.
> >> This is at least what we do in the 6WINDGate fast path stack, and I
> >> suppose other network stack implementations do something similar,
> >> so we should probably avoid moving this field to the 2nd cache line.
> >>
> >> One more issue I foresee: with slower CPUs like Atom, having two
> >> cache lines will add more cost than on Xeon. I'm wondering if it
> >> makes sense to have a compile-time option to select either limited
> >> features with one cache line, or full features with two cache
> >> lines. I don't know if it's a good idea because it would make the
> >> code more complex, but we could consider it. I think we don't
> >> target binary compatibility today?
> >>
> >> From a functional point of view, we could check that my TSO patch
> >> can be adapted to your proposal so we can challenge and merge both
> >> approaches.
> >>
> >> As this change would impact the core of DPDK, I think it would be
> >> interesting to list some representative use-cases in order to
> >> evaluate the cost of each solution. This will also help for future
> >> modifications, and could be included in a sort of non-regression
> >> test.
> >>
> >> Regards,
> >> Olivier
> >>
> >> [1] http://dpdk.org/ml/archives/dev/2014-May/002537.html
> >> [2] http://dpdk.org/ml/archives/dev/2014-May/002322.html
> >
> > Hi Olivier,
> >
> > I am trying to convince you on the new "filter status" field.
> > It reports the matched Flow Director filter ID, and might be reused
> > for the hash signature if the packet matches a hash filter, or for
> > other purposes. It is quite useful for Flow Director, and it is not
> > a flag. I guess there should be a similar feature even in non-Intel
> > NICs.
>
> By construction, since a packet cannot match more than one filter with
> an associated identifier, this is typically the kind of field that
> should be put in a union with the standard 32-bit RSS id.
>
> Regards,
> Ivan
>
> > Regards,
> > Helin
>
> --
> Ivan Boule
> 6WIND Development Engineer

Hi Ivan,

Yes, you are right.
The 32-bit 'filter status' can indeed be union-ed with the RSS ID, but it
still needs another 32 bits. Why? i40e can report 64 bits of 'flexible
bytes', or report both 32 bits of 'flexible bytes' and a 32-bit 'matched
FD filter ID' at the same time.

Regards,
Helin
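To make the layout being debated concrete, here is a minimal sketch of
the two words involved. All names (pkt_rx_meta, w0, w1, etc.) are
invented for illustration and this is not the actual rte_mbuf layout:
the first 32-bit word overlaps the RSS hash with half of the flexible
bytes, as Ivan suggests, since a packet matches at most one filter;
the second 32-bit word is the extra space Helin describes, holding
either the matched FD filter ID or the other half of the 64-bit
flexible bytes.

```c
#include <stdint.h>

/* Hypothetical sketch, not the real rte_mbuf definition. */
struct pkt_rx_meta {
    union {
        uint32_t rss_hash;       /* RSS hash result */
        uint32_t flex_bytes_lo;  /* low 32 bits of flexible bytes */
    } w0;                        /* shared by construction: a packet
                                  * matches at most one such filter */
    union {
        uint32_t fdir_id;        /* matched Flow Director filter ID */
        uint32_t flex_bytes_hi;  /* high 32 bits of flexible bytes */
    } w1;                        /* the additional 32 bits i40e needs */
};
```

Either way the report costs 8 bytes in total, which is the real point
of contention in this thread: whether those bytes fit in the first
64-byte cache line or push other fields into a second one.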