Date: Wed, 17 Jun 2015 15:06:48 +0100
From: Bruce Richardson
To: "Damjan Marion (damarion)"
Cc: "dev@dpdk.org"
Subject: Re: [dpdk-dev] rte_mbuf.next in 2nd cacheline
Message-ID: <20150617140648.GC8208@bricha3-MOBL3>
In-Reply-To: <68EBE73B-D251-4297-BFE2-E2D2A3AEFD33@cisco.com>
Organization: Intel Shannon Ltd.

On Wed, Jun 17, 2015 at 01:55:57PM +0000, Damjan Marion (damarion) wrote:
>
> > On 15 Jun 2015, at 16:12, Bruce Richardson wrote:
> >
> > The next pointers always start out as NULL when the mbuf pool is created. The
> > only time it is set to non-NULL is when we have chained mbufs.
> > If we never have any chained mbufs, we never need to touch the next field,
> > or even read it - since we have the num-segments count in the first cache
> > line. If we do have a multi-segment mbuf, it's likely to be a big packet,
> > so we have more processing time available and we can then take the hit of
> > setting the next pointer.
>
> There are applications which are not using rx offload, but they deal with chained mbufs.
> Why they are less important than ones using rx offload? This is something people
> should be able to configure on build time.

It's not that they are less important, it's that their packet processing
cycle count budget is going to be greater. A packet which is 64 bytes or
128 bytes in size can make use of a number of RX offloads to reduce its
processing time. However, a 64/128 byte packet is not going to be split across
multiple buffers [unless we are dealing with a very unusual setup!].

To handle 64 byte packets at 40G line rate, one has about 50 cycles per packet
on a core running at 3GHz [3000000000 cycles / 59.5 Mpps].
If we assume that we are dealing with fairly small buffers here, and that any
packet larger than 1k is chained, we still have 626 cycles per packet on a
3GHz core to work with for that 1k packet. Given that "normal" DPDK buffers
are 2k in size, we have over a thousand cycles per packet for any packet that
is split.

In summary, packets spread across multiple buffers are large packets, so they
have larger per-packet cycle budgets and can absorb the cost of touching a
second cache line in the mbuf much better than a 64-byte packet can.
Therefore, we optimize for the 64B packet case.

Hope this clarifies things a bit.

Regards,
/Bruce
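
For anyone who wants to check the cycle budgets quoted above, here is a rough
sketch of the arithmetic. It assumes 20 bytes of per-frame wire overhead
(8 bytes preamble/SFD plus 12 bytes inter-frame gap) on top of the packet size,
which is what reproduces the 59.5 Mpps figure. The function names and the
overhead constant are mine, for illustration only, and the model assumes a
single core handles the full line rate.

```python
# Assumption: each frame costs an extra 20 bytes on the wire
# (8 bytes preamble/SFD + 12 bytes inter-frame gap).
WIRE_OVERHEAD_BYTES = 20


def packets_per_second(frame_bytes, link_bps=40e9):
    """Maximum packet rate for a given frame size at line rate."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


def cycles_per_packet(frame_bytes, link_bps=40e9, core_hz=3e9):
    """Cycle budget per packet if one core must keep up with line rate."""
    return core_hz / packets_per_second(frame_bytes, link_bps)


if __name__ == "__main__":
    # 64B and 128B are the small-packet cases; 1k and 2k are the
    # buffer sizes discussed for chained mbufs.
    for size in (64, 128, 1024, 2048):
        mpps = packets_per_second(size) / 1e6
        budget = cycles_per_packet(size)
        print(f"{size:5d} bytes: {mpps:6.2f} Mpps, {budget:7.1f} cycles/packet")
```

Under those assumptions the budget grows almost linearly with packet size,
which is the whole argument: the 64 byte case is the one with no cycles to
spare.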