From: "Zhang, Helin"
To: Ivan Boule, Olivier MATZ, "Richardson, Bruce"
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] Making space in mbuf data-structure
Thread-Topic: [dpdk-dev] Making space in mbuf data-structure
Date: Tue, 8 Jul 2014 07:37:48 +0000
In-Reply-To: <53BB9AE6.4030503@6wind.com>
References: <59AF69C657FD0841A61C55336867B5B02CF13E32@IRSMSX103.ger.corp.intel.com> <53BA7422.9080706@6wind.com> <53BB9AE6.4030503@6wind.com>
List-Id: patches and discussions about DPDK
> -----Original Message-----
> From: Ivan Boule [mailto:ivan.boule@6wind.com]
> Sent: Tuesday, July 8, 2014 3:17 PM
> To: Zhang, Helin; Olivier MATZ; Richardson, Bruce
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] Making space in mbuf data-structure
>
> On 07/08/2014 09:04 AM, Zhang, Helin wrote:
> >
> >> -----Original Message-----
> >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
> >> Sent: Monday, July 7, 2014 6:19 PM
> >> To: Richardson, Bruce; dev@dpdk.org
> >> Subject: Re: [dpdk-dev] Making space in mbuf data-structure
> >>
> >> Hello Bruce,
> >>
> >> Thank you for reviving this discussion now that 1.7 is released.
> >>
> >> First, I would like to reference my previous patch series that first
> >> reworks the mbuf to gain 9 bytes [1]. The v1 of the patch was
> >> discussed at [2].
> >>
> >> Now, let's list what I would find useful to have in this mbuf rework:
> >>
> >> - a larger ol_flags field: this is at least needed for TSO, but as
> >>   the field is completely full today, I expect other features to
> >>   need it too.
> >> - additional offload fields: l4_len and mss, required for TSO.
> >> - removal of ctrl_mbuf: it could be replaced by a packet mbuf. This
> >>   would simplify the mbuf structure and save room in the mbuf.
> >> - a second vlan tag: I suppose this could be useful in some
> >>   use-cases where vlans are stacked.
> >> - splitting out superimposed fields, if two features can be used
> >>   at the same time.
> >>
> >> On the other hand, I'm not convinced by this:
> >>
> >> - new filters in the i40e driver: I don't think the mbuf is the
> >>   right place for driver-specific flags. If a feature is brought
> >>   by a new driver requiring a flag in the mbuf, we should take care
> >>   that the flag is not bound to that particular driver and would
> >>   match the same feature in another driver.
> >> - sequence number: I'm not sure I understand the use-case; maybe
> >>   this could stay in mbuf metadata in the reordering module.
> >>
> >>> Firstly, we believe that there is no possible way that we can ever
> >>> fit all the fields we need into a 64-byte mbuf, and so we need to
> >>> start looking at a 128-byte mbuf instead.
> >>
> >> The TSO patches show that it is possible to keep a 64-byte mbuf (of
> >> course, it depends on what we want to add in the mbuf). I'm not
> >> fundamentally against having a 128-byte mbuf. But:
> >>
> >> - it should not be a reason for just adding things instead of
> >>   reworking things that could be enhanced;
> >> - it should not be a reason for not optimizing the current mbuf
> >>   structure;
> >> - if we can do the same with a 64-byte mbuf, we need to carefully
> >>   compare the solutions, as fetching a second cache line is not
> >>   costless in all situations. The 64-byte solution I'm proposing
> >>   in [1] may cost a bit more in CPU cycles but avoids an additional
> >>   cache prefetch (or miss). In some situations (use-cases where we
> >>   are memory-bound, e.g. an application processing a lot of data),
> >>   it is better to lose a few CPU cycles.
> >>
> >>> First off the blocks is to look at moving the mempool pointer into
> >>> the second cache line [...] Beyond this change, I'm also
> >>> investigating potentially moving the "next" pointer to the second
> >>> cache line, but it's looking harder to move without serious impact.
> >>
> >> I think we can easily find DPDK applications that would use the
> >> "next" field of the mbuf on the rx side, as it is the standard way
> >> of chaining packets: for instance, IP reassembly, TCP/UDP socket
> >> queues, or any other protocol that needs a reassembly queue.
> >> This is at least what we do in the 6WINDGate fast path stack, and I
> >> suppose other network stack implementations do something similar,
> >> so we should probably avoid moving this field to the 2nd cache line.
> >>
> >> One more issue I foresee: with slower CPUs like Atom, having two
> >> cache lines will add more cost than on Xeon. I'm wondering if it
> >> makes sense to have a compile-time option to select either limited
> >> features with one cache line, or full features with two cache
> >> lines. I don't know if it's a good idea because it would make the
> >> code more complex, but we could consider it. I think we don't
> >> target binary compatibility today?
> >>
> >> From a functional point of view, we could check that my TSO patch
> >> can be adapted to your proposal so we can challenge and merge both
> >> approaches.
> >>
> >> As this change would impact the core of DPDK, I think it would be
> >> interesting to list some representative use-cases in order to
> >> evaluate the cost of each solution. This will also help for future
> >> modifications, and could be included in a sort of non-regression
> >> test.
> >>
> >> Regards,
> >> Olivier
> >>
> >> [1] http://dpdk.org/ml/archives/dev/2014-May/002537.html
> >> [2] http://dpdk.org/ml/archives/dev/2014-May/002322.html
> >
> > Hi Olivier,
> >
> > I am trying to convince you on the new "filter status" field.
> > It reports the matched Flow Director filter ID, and might be reused
> > for the hash signature if the packet matches a hash filter, or for
> > other purposes. It is quite useful for Flow Director, and it is not
> > a flag. I guess there should be a similar feature even in non-Intel
> > NICs.
>
> By construction, since a packet cannot match more than one filter with
> an associated identifier, this is typically the kind of field that
> should be put in a union with the standard 32-bit RSS id.
>
> Regards,
> Ivan
>
> > Regards,
> > Helin
>
> --
> Ivan Boule
> 6WIND Development Engineer

Hi Ivan,

Yes, you are right.
The 32-bit 'filter status' can indeed be union-ed with the RSS ID, but it
still needs another 32 bits. Why? i40e can report 64 bits of 'flexible
bytes', or report both 32 bits of 'flexible bytes' and a 32-bit 'matched
FD filter ID' at the same time.

Regards,
Helin
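To make the layout being debated concrete, here is a minimal sketch of
the two words involved. All names (pkt_rx_meta, w0, w1, etc.) are
invented for illustration and this is not the actual rte_mbuf layout:
the first 32-bit word overlaps the RSS hash with half of the flexible
bytes, as Ivan suggests, since a packet matches at most one filter;
the second 32-bit word is the extra space Helin describes, holding
either the matched FD filter ID or the other half of the 64-bit
flexible bytes.

```c
#include <stdint.h>

/* Hypothetical sketch, not the real rte_mbuf definition. */
struct pkt_rx_meta {
    union {
        uint32_t rss_hash;       /* RSS hash result */
        uint32_t flex_bytes_lo;  /* low 32 bits of flexible bytes */
    } w0;                        /* shared by construction: a packet
                                  * matches at most one such filter */
    union {
        uint32_t fdir_id;        /* matched Flow Director filter ID */
        uint32_t flex_bytes_hi;  /* high 32 bits of flexible bytes */
    } w1;                        /* the additional 32 bits i40e needs */
};
```

Either way the report costs 8 bytes in total, which is the real point
of contention in this thread: whether those bytes fit in the first
64-byte cache line or push other fields into a second one.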