From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by dpdk.org (Postfix) with ESMTP id 06193C30A for ; Mon, 15 Jun 2015 17:29:04 +0200 (CEST) Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga102.jf.intel.com with ESMTP; 15 Jun 2015 08:28:58 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.13,618,1427785200"; d="scan'208";a="747016060" Received: from irsmsx102.ger.corp.intel.com ([163.33.3.155]) by orsmga002.jf.intel.com with ESMTP; 15 Jun 2015 08:28:57 -0700 Received: from irsmsx155.ger.corp.intel.com (163.33.192.3) by IRSMSX102.ger.corp.intel.com (163.33.3.155) with Microsoft SMTP Server (TLS) id 14.3.224.2; Mon, 15 Jun 2015 16:28:56 +0100 Received: from irsmsx105.ger.corp.intel.com ([169.254.7.73]) by irsmsx155.ger.corp.intel.com ([169.254.14.49]) with mapi id 14.03.0224.002; Mon, 15 Jun 2015 16:28:55 +0100 From: "Ananyev, Konstantin" To: "Richardson, Bruce" , Olivier MATZ Thread-Topic: [dpdk-dev] rte_mbuf.next in 2nd cacheline Thread-Index: AQHQo8caRiJzxo5OE0esWvWQH/eOLp2tg74AgAAGpYCAABHI0P//9BKAgAACMwCAAAUGAIAAFgVg///3iYCAAAE0AIAAEVcw Date: Mon, 15 Jun 2015 15:28:55 +0000 Message-ID: <2601191342CEEE43887BDE71AB97725836A0A8FB@irsmsx105.ger.corp.intel.com> References: <87110795-201A-4A1E-A4CC-A778AA7C8218@cisco.com> <557ED116.7040508@6wind.com> <20150615134409.GA7500@bricha3-MOBL3> <2601191342CEEE43887BDE71AB97725836A0A838@irsmsx105.ger.corp.intel.com> <557EDB91.9010503@6wind.com> <20150615141258.GA580@bricha3-MOBL3> <557EE1A0.609@6wind.com> <2601191342CEEE43887BDE71AB97725836A0A8A8@irsmsx105.ger.corp.intel.com> <557EECFF.3090402@6wind.com> <20150615152346.GC580@bricha3-MOBL3> In-Reply-To: <20150615152346.GC580@bricha3-MOBL3> Accept-Language: en-IE, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [163.33.239.182] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Cc: "dev@dpdk.org" , "Damjan Marion \(damarion\)" Subject: Re: [dpdk-dev] rte_mbuf.next in 2nd cacheline X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Jun 2015 15:29:05 -0000 > -----Original Message----- > From: Richardson, Bruce > Sent: Monday, June 15, 2015 4:24 PM > To: Olivier MATZ > Cc: Ananyev, Konstantin; dev@dpdk.org; Damjan Marion (damarion) > Subject: Re: [dpdk-dev] rte_mbuf.next in 2nd cacheline >=20 > On Mon, Jun 15, 2015 at 05:19:27PM +0200, Olivier MATZ wrote: > > > > > > On 06/15/2015 04:52 PM, Ananyev, Konstantin wrote: > > > Hi Olivier, > > > > > >> -----Original Message----- > > >> From: Olivier MATZ [mailto:olivier.matz@6wind.com] > > >> Sent: Monday, June 15, 2015 3:31 PM > > >> To: Richardson, Bruce > > >> Cc: Ananyev, Konstantin; dev@dpdk.org; Damjan Marion (damarion) > > >> Subject: Re: [dpdk-dev] rte_mbuf.next in 2nd cacheline > > >> > > >> > > >> > > >> On 06/15/2015 04:12 PM, Bruce Richardson wrote: > > >>> On Mon, Jun 15, 2015 at 04:05:05PM +0200, Olivier MATZ wrote: > > >>>> Hi, > > >>>> > > >>>> On 06/15/2015 03:54 PM, Ananyev, Konstantin wrote: > > >>>>> > > >>>>> > > >>>>>> -----Original Message----- > > >>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richa= rdson > > >>>>>> Sent: Monday, June 15, 2015 2:44 PM > > >>>>>> To: Olivier MATZ > > >>>>>> Cc: dev@dpdk.org; Damjan Marion (damarion) > > >>>>>> Subject: Re: [dpdk-dev] rte_mbuf.next in 2nd cacheline > > >>>>>> > > >>>>>> On Mon, Jun 15, 2015 at 03:20:22PM +0200, Olivier MATZ wrote: > > >>>>>>> Hi Damjan, > > >>>>>>> > > >>>>>>> On 06/10/2015 11:47 PM, Damjan Marion (damarion) wrote: > > >>>>>>>> > > >>>>>>>> Hi, > > >>>>>>>> > > >>>>>>>> We noticed 7% performance improvement by simply moving rte_mbu= f.next field to the 1st cache line. > > >>>>>>>> > > >>>>>>>> Currently, it falls under /* second cache line - fields only u= sed in slow path or on TX */ > > >>>>>>>> but it is actually used at several places in rx fast path. (e.= g.: i40e_rx_alloc_bufs() is setting that field to NULL). > > >>>>>>>> > > >>>>>>>> Is there anything we can do here (stop using next field, or mo= ve it to 1st cache line)? > > >>>>>>> > > >>>>>>> Agree, this is also something I noticed, see: > > >>>>>>> http://dpdk.org/ml/archives/dev/2015-February/014400.html > > >>>>>>> > > >>>>>>> I did not have the time to do performance testing, but it's som= ething > > >>>>>>> I'd like to do as soon as I can. I don't see any obvious reason= not to > > >>>>>>> do it. > > >>>>>>> > > >>>>>>> It seems we currently just have enough room to do it (8 bytes a= re > > >>>>>>> remaining in the first cache line when compiled in 64 bits). > > >>>>>> > > >>>>>> This, to me, is the obvious reason not to do it! It prevents us = from taking in > > >>>>>> any other offload fields in the RX fast-path into the mbuf. > > >>>>>> > > >>>>>> That being said, I can see why we might want to look to move it = - but from the > > >>>>>> work done in the ixgbe driver, I'd be hopeful we can get as much= performance with > > >>>>>> it on the second cache line for most cases, through judicious us= e of prefetching, > > >>>>>> or otherwise. > > >>>>>> > > >>>>>> It took a lot of work and investigation to get free space in the= mbuf - especially > > >>>>>> in cache line 0, and I'd like to avoid just filling the cache li= ne up again as > > >>>>>> long as we possibly can! > > >>>>> > > >>>>> Yep, agree with Bruce here. > > >>>>> Plus, with packet_type going to be 4B and vlan_tci_outer, > > >>>>> we just don't have 8 free bytes at the first cache line any more. > > >>>> > > >>>> I don't understand why m->next would not be a better candidate tha= n > > >>>> rx offload fields to be in the first cache line. For instance, m->= next > > >>>> is mandatory and must be initialized when allocating a mbuf (to be > > >>>> compared with m->seqn for instance, which is also in the first cac= he > > >>>> line). So if we want to do some room in the first cache line, I > > >>>> think we can. > > >>>> > > >>>> To me, the only reason for not doing it now is because we don't > > >>>> have a full performance test report (several use-cases, drivers, .= ..) > > >>>> that shows it's better. > > >>>> > > >>> Because the "next" field is not mandatory to be set on initializati= on. It can > > >>> instead be set only when needed, and cleared on free if it is used. > > >>> > > >>> The next pointers always start out as NULL when the mbuf pool is cr= eated. The > > >>> only time it is set to non-NULL is when we have chained mbufs. If w= e never have > > >>> any chained mbufs, we never need to touch the next field, or even r= ead it - since > > >>> we have the num-segments count in the first cache line. If we do ha= ve a multi-segment > > >>> mbuf, it's likely to be a big packet, so we have more processing ti= me available > > >>> and we can then take the hit of setting the next pointer. Whenever = we go to > > >>> free that mbuf for that packet, the code to do the freeing obviousl= y needs to > > >>> read the next pointer so as to free all the buffers in the chain, a= nd so it can > > >>> also reset the next pointer to NULL when doing so. > > >>> > > >>> In this way, we can ensure that the next pointer on cache line 1 is= not a problem > > >>> in our fast path. > > >> > > >> This is a good idea, but looking at the drivers, it seems that today > > >> they all set m->next to NULL in the rx function. What you are sugges= ting > > >> is to remove all of them, and document somewhere that all mbufs in a > > >> pool are supposed to have their m->next set to NULL, correct? > > >> > > >> I think what you are describing could also apply to reference counte= r > > >> (set to 1 by default), right? > > > > > > We probably can reset next to NULL at __rte_pktmbuf_prefree_seg(), > > > at the same time we do reset refcnt to 0. > > > Is that what you suggesting? > > > > Yes, I can give it a try. > > > > > Why would we need to change that function? The main free_seg function (wh= ich > is called from rte_pktmbuf_free() function) already sets the next pointer= to > NULL? Is there some edge case in the code now where we are missing settin= g > the next pointer to NULL on free? ixgbe_tx_free_bufs() at drivers/net/ixgbe/ixgbe_rxtx_vec.c >=20 > Also, any code that is not using the regular free functions i.e. mbuf_fre= e or > free_seg - anything not starting with "-" :-) - is responsible itself for > ensuring that it frees things correctly. If it doesn't set next to NULL w= hen it > needs to - it needs to be fixed there, rather than changing the prefree_s= eg > function. Why do think it would be a problem? It seems like a good idea, tohide it inside __rte_pktmbuf_prefree_seg(), So users don't have to set it to NULL manually each time. Konstantin >=20 > /Bruce >=20 > > > > > > > > Konstantin > > > > > >> > > >> > > >> Olivier > > >> > > >> > > >>> > > >>> /Bruce > > >>>