From: "Zhang, Helin"
To: Olivier MATZ, "Richardson, Bruce"
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] Making space in mbuf data-structure
Date: Tue, 8 Jul 2014 07:04:36 +0000

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
> Sent: Monday, July 7, 2014 6:19 PM
> To: Richardson, Bruce; dev@dpdk.org
> Subject: Re: [dpdk-dev] Making space in mbuf data-structure
>
> Hello Bruce,
>
> Thank you for reviving this discussion now that 1.7 is released.
>
> First, I would like to reference my previous patch series, whose first
> patch reworks the mbuf to gain 9 bytes [1]. The v1 of the series was
> discussed at [2].
>
> Now, let's list what I would find useful to have in this mbuf rework:
>
> - a larger ol_flags field: this is needed at least for TSO, and since
>   the field is completely full today, I expect other features to need
>   more room as well.
> - additional offload fields: l4_len and mss, required for TSO.
> - removing ctrl_mbuf: it could be replaced by a packet mbuf. This would
>   simplify the mbuf structure and, moreover, save room in the mbuf.
> - a second vlan tag: I suppose this could be useful in use-cases where
>   vlans are stacked.
> - splitting out fields that are currently superimposed, wherever two
>   features could be used at the same time.
>
> On the other hand, I'm not convinced by:
>
> - new filters in the i40e driver: I don't think the mbuf is the right
>   place for driver-specific flags. If a feature brought by a new driver
>   requires a flag in the mbuf, we should take care that the flag is not
>   bound to that particular driver and would match the same feature in
>   another driver.
> - the sequence number: I'm not sure I understand the use-case; maybe it
>   could stay in mbuf metadata in the reordering module.
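To make the wish-list above concrete, here is a minimal sketch of the
additions Olivier describes: ol_flags widened to 64 bits plus the
l4_len and mss fields needed for TSO. Everything other than those
three fields is illustrative and does not reproduce the real struct
rte_mbuf of that era.

    #include <stdint.h>

    /*
     * Sketch only: ol_flags widened from 16 to 64 bits, plus the two
     * new offload fields TSO requires. Field names and sizes other
     * than ol_flags, l4_len and mss are assumptions for illustration.
     */
    struct mbuf_rework_sketch {
        void     *buf_addr;  /* virtual address of the segment buffer */
        uint16_t  buf_len;   /* length of the segment buffer */
        uint16_t  data_off;  /* offset of packet data in the buffer */

        uint64_t  ol_flags;  /* offload flags, widened from 16 bits */

        /* Offload metadata; l4_len and mss are the new TSO fields. */
        uint16_t  l2_len;    /* L2 header length */
        uint16_t  l3_len;    /* L3 header length */
        uint16_t  l4_len;    /* L4 (TCP) header length, for TSO */
        uint16_t  mss;       /* maximum segment size, for TSO */
    };

Widening ol_flags alone consumes the 9 bytes the series in [1] frees
up, which is why the thread keeps returning to whether 64 bytes can
still hold everything.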
> > Firstly, we believe that there is no possible way that we can ever
> > fit all the fields we need into a 64-byte mbuf, and so we need to
> > start looking at a 128-byte mbuf instead.
>
> The TSO patches show that it is possible to keep a 64-byte mbuf (of
> course, it depends on what we want to add to the mbuf). I'm not
> fundamentally against having a 128-byte mbuf. But:
>
> - it should not be a reason for just adding things instead of
>   reworking things that could be enhanced;
> - it should not be a reason for not optimizing the current mbuf
>   structure;
> - if we can do the same with a 64-byte mbuf, we need to carefully
>   compare the solutions, as fetching a second cache line is not free
>   in all situations. The 64-byte solution I'm proposing in [1] may
>   cost a bit more in CPU cycles but avoids an additional cache
>   prefetch (or miss). In some situations, e.g. memory-bound
>   applications processing a lot of data, it is better to lose a few
>   CPU cycles.
>
> > First off the blocks is to look at moving the mempool pointer into
> > the second cache line [...] Beyond this change, I'm also
> > investigating potentially moving the "next" pointer to the second
> > cache line, but it's looking harder to move without serious impact.
>
> I think we can easily find DPDK applications that use the "next" field
> of the mbuf on the rx side, as it is the standard way of chaining
> packets: IP reassembly, TCP/UDP socket queues, or any other protocol
> that needs a reassembly queue. This is at least what we do in the
> 6WINDGate fast path stack, and I suppose other network stack
> implementations do something similar, so we should probably avoid
> moving this field to the second cache line.
>
> One more issue I foresee: on slower CPUs like Atom, touching two cache
> lines will cost more than it does on Xeon. I'm wondering whether it
> makes sense to have a compile-time option to select either limited
> features with one cache line or full features with two cache lines. I
> don't know if it's a good idea because it would make the code more
> complex, but we could consider it. I think we don't target binary
> compatibility today?
>
> From a functional point of view, we could check that my TSO patch can
> be adapted to your proposal, so we can challenge and merge both
> approaches.
>
> As this change would impact the core of DPDK, I think it would be
> interesting to list some representative use-cases in order to evaluate
> the cost of each solution. This would also help for future
> modifications, and could be included in a sort of non-regression test?
>
> Regards,
> Olivier
>
> [1] http://dpdk.org/ml/archives/dev/2014-May/002537.html
> [2] http://dpdk.org/ml/archives/dev/2014-May/002322.html

Hi Olivier,

I am trying to convince you about the new "filter status" field. It
holds the matched Flow Director filter ID, and it might be reused for
the hash signature if the packet matches a hash filter, or for other
results. It is quite useful for Flow Director, and it is a value, not
a flag. I guess non-Intel NICs have a similar feature as well.

Regards,
Helin
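One way to reconcile Helin's "filter status" with Olivier's objection
to driver-specific flags is a small union whose interpretation is
selected by a generic ol_flags bit (not shown), so the field stays
feature-oriented rather than driver-oriented. This is a sketch under
that assumption; the names are made up for illustration and are not
an actual DPDK definition.

    #include <stdint.h>

    /*
     * Illustrative only: one dword that reports either the matched
     * Flow Director filter ID or the hash signature, depending on
     * which filter the NIC reports as matched for this packet.
     */
    union filter_status_sketch {
        uint32_t rss_hash;  /* hash signature, if a hash filter matched */
        uint32_t fdir_id;   /* Flow Director filter ID, if FDIR matched */
    };

Because any NIC with flow-steering hardware could fill this in, the
field would not be bound to the i40e driver, which is the concern
Olivier raised about per-driver flags at the top of the thread.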