From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by dpdk.org (Postfix) with ESMTP id 6606BC30A for ; Mon, 15 Jun 2015 17:39:48 +0200 (CEST) Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga101.jf.intel.com with ESMTP; 15 Jun 2015 08:39:47 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.13,618,1427785200"; d="scan'208";a="711303217" Received: from bricha3-mobl3.ger.corp.intel.com ([10.243.20.22]) by orsmga001.jf.intel.com with SMTP; 15 Jun 2015 08:39:44 -0700 Received: by (sSMTP sendmail emulation); Mon, 15 Jun 2015 16:39:43 +0025 Date: Mon, 15 Jun 2015 16:39:43 +0100 From: Bruce Richardson To: "Ananyev, Konstantin" , # Message-ID: <20150615153943.GD580@bricha3-MOBL3> References: <557ED116.7040508@6wind.com> <20150615134409.GA7500@bricha3-MOBL3> <2601191342CEEE43887BDE71AB97725836A0A838@irsmsx105.ger.corp.intel.com> <557EDB91.9010503@6wind.com> <20150615141258.GA580@bricha3-MOBL3> <557EE1A0.609@6wind.com> <2601191342CEEE43887BDE71AB97725836A0A8A8@irsmsx105.ger.corp.intel.com> <557EECFF.3090402@6wind.com> <20150615152346.GC580@bricha3-MOBL3> <2601191342CEEE43887BDE71AB97725836A0A8FB@irsmsx105.ger.corp.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <2601191342CEEE43887BDE71AB97725836A0A8FB@irsmsx105.ger.corp.intel.com> Organization: Intel Shannon Ltd. User-Agent: Mutt/1.5.23 (2014-03-12) Cc: "dev@dpdk.org" , "Damjan Marion \(damarion\)" Subject: Re: [dpdk-dev] rte_mbuf.next in 2nd cacheline X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Jun 2015 15:39:49 -0000 On Mon, Jun 15, 2015 at 04:28:55PM +0100, Ananyev, Konstantin wrote: > > > > -----Original Message----- > > From: Richardson, Bruce > > Sent: Monday, June 15, 2015 4:24 PM > > To: Olivier MATZ > > Cc: Ananyev, Konstantin; dev@dpdk.org; Damjan Marion (damarion) > > Subject: Re: [dpdk-dev] rte_mbuf.next in 2nd cacheline > > > > On Mon, Jun 15, 2015 at 05:19:27PM +0200, Olivier MATZ wrote: > > > > > > > > > On 06/15/2015 04:52 PM, Ananyev, Konstantin wrote: > > > > Hi Olivier, > > > > > > > >> -----Original Message----- > > > >> From: Olivier MATZ [mailto:olivier.matz@6wind.com] > > > >> Sent: Monday, June 15, 2015 3:31 PM > > > >> To: Richardson, Bruce > > > >> Cc: Ananyev, Konstantin; dev@dpdk.org; Damjan Marion (damarion) > > > >> Subject: Re: [dpdk-dev] rte_mbuf.next in 2nd cacheline > > > >> > > > >> > > > >> > > > >> On 06/15/2015 04:12 PM, Bruce Richardson wrote: > > > >>> On Mon, Jun 15, 2015 at 04:05:05PM +0200, Olivier MATZ wrote: > > > >>>> Hi, > > > >>>> > > > >>>> On 06/15/2015 03:54 PM, Ananyev, Konstantin wrote: > > > >>>>> > > > >>>>> > > > >>>>>> -----Original Message----- > > > >>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson > > > >>>>>> Sent: Monday, June 15, 2015 2:44 PM > > > >>>>>> To: Olivier MATZ > > > >>>>>> Cc: dev@dpdk.org; Damjan Marion (damarion) > > > >>>>>> Subject: Re: [dpdk-dev] rte_mbuf.next in 2nd cacheline > > > >>>>>> > > > >>>>>> On Mon, Jun 15, 2015 at 03:20:22PM +0200, Olivier MATZ wrote: > > > >>>>>>> Hi Damjan, > > > >>>>>>> > > > >>>>>>> On 06/10/2015 11:47 PM, Damjan Marion (damarion) wrote: > > > >>>>>>>> > > > >>>>>>>> Hi, > > > >>>>>>>> > > > >>>>>>>> We noticed 7% performance improvement by simply moving rte_mbuf.next field to the 1st cache line. > > > >>>>>>>> > > > >>>>>>>> Currently, it falls under /* second cache line - fields only used in slow path or on TX */ > > > >>>>>>>> but it is actually used at several places in rx fast path. (e.g.: i40e_rx_alloc_bufs() is setting that field to NULL). > > > >>>>>>>> > > > >>>>>>>> Is there anything we can do here (stop using next field, or move it to 1st cache line)? > > > >>>>>>> > > > >>>>>>> Agree, this is also something I noticed, see: > > > >>>>>>> http://dpdk.org/ml/archives/dev/2015-February/014400.html > > > >>>>>>> > > > >>>>>>> I did not have the time to do performance testing, but it's something > > > >>>>>>> I'd like to do as soon as I can. I don't see any obvious reason not to > > > >>>>>>> do it. > > > >>>>>>> > > > >>>>>>> It seems we currently just have enough room to do it (8 bytes are > > > >>>>>>> remaining in the first cache line when compiled in 64 bits). > > > >>>>>> > > > >>>>>> This, to me, is the obvious reason not to do it! It prevents us from taking in > > > >>>>>> any other offload fields in the RX fast-path into the mbuf. > > > >>>>>> > > > >>>>>> That being said, I can see why we might want to look to move it - but from the > > > >>>>>> work done in the ixgbe driver, I'd be hopeful we can get as much performance with > > > >>>>>> it on the second cache line for most cases, through judicious use of prefetching, > > > >>>>>> or otherwise. > > > >>>>>> > > > >>>>>> It took a lot of work and investigation to get free space in the mbuf - especially > > > >>>>>> in cache line 0, and I'd like to avoid just filling the cache line up again as > > > >>>>>> long as we possibly can! > > > >>>>> > > > >>>>> Yep, agree with Bruce here. > > > >>>>> Plus, with packet_type going to be 4B and vlan_tci_outer, > > > >>>>> we just don't have 8 free bytes at the first cache line any more. > > > >>>> > > > >>>> I don't understand why m->next would not be a better candidate than > > > >>>> rx offload fields to be in the first cache line. For instance, m->next > > > >>>> is mandatory and must be initialized when allocating a mbuf (to be > > > >>>> compared with m->seqn for instance, which is also in the first cache > > > >>>> line). So if we want to do some room in the first cache line, I > > > >>>> think we can. > > > >>>> > > > >>>> To me, the only reason for not doing it now is because we don't > > > >>>> have a full performance test report (several use-cases, drivers, ...) > > > >>>> that shows it's better. > > > >>>> > > > >>> Because the "next" field is not mandatory to be set on initialization. It can > > > >>> instead be set only when needed, and cleared on free if it is used. > > > >>> > > > >>> The next pointers always start out as NULL when the mbuf pool is created. The > > > >>> only time it is set to non-NULL is when we have chained mbufs. If we never have > > > >>> any chained mbufs, we never need to touch the next field, or even read it - since > > > >>> we have the num-segments count in the first cache line. If we do have a multi-segment > > > >>> mbuf, it's likely to be a big packet, so we have more processing time available > > > >>> and we can then take the hit of setting the next pointer. Whenever we go to > > > >>> free that mbuf for that packet, the code to do the freeing obviously needs to > > > >>> read the next pointer so as to free all the buffers in the chain, and so it can > > > >>> also reset the next pointer to NULL when doing so. > > > >>> > > > >>> In this way, we can ensure that the next pointer on cache line 1 is not a problem > > > >>> in our fast path. > > > >> > > > >> This is a good idea, but looking at the drivers, it seems that today > > > >> they all set m->next to NULL in the rx function. What you are suggesting > > > >> is to remove all of them, and document somewhere that all mbufs in a > > > >> pool are supposed to have their m->next set to NULL, correct? > > > >> > > > >> I think what you are describing could also apply to reference counter > > > >> (set to 1 by default), right? > > > > > > > > We probably can reset next to NULL at __rte_pktmbuf_prefree_seg(), > > > > at the same time we do reset refcnt to 0. > > > > Is that what you suggesting? > > > > > > Yes, I can give it a try. > > > > > > > > Why would we need to change that function? The main free_seg function (which > > is called from rte_pktmbuf_free() function) already sets the next pointer to > > NULL? Is there some edge case in the code now where we are missing setting > > the next pointer to NULL on free? > > ixgbe_tx_free_bufs() at drivers/net/ixgbe/ixgbe_rxtx_vec.c > Not a gap - the vector TX functions cannot deal with scattered packets generally, not just freeing them :-) > > > > Also, any code that is not using the regular free functions i.e. mbuf_free or > > free_seg - anything not starting with "-" :-) - is responsible itself for > > ensuring that it frees things correctly. If it doesn't set next to NULL when it > > needs to - it needs to be fixed there, rather than changing the prefree_seg > > function. > > Why do think it would be a problem? > It seems like a good idea, tohide it inside __rte_pktmbuf_prefree_seg(), > So users don't have to set it to NULL manually each time. > I'm just nervous of slowing things down. Adding an extra instruction here applies to every single packet - it's not just per burst. That's why I'd like to be sure there is a definite problem before adding in more instructions here. /Bruce > Konstantin > > > > > /Bruce > > > > > > > > > > > > > Konstantin > > > > > > > >> > > > >> > > > >> Olivier > > > >> > > > >> > > > >>> > > > >>> /Bruce > > > >>>