From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by dpdk.org (Postfix) with ESMTP id 88D9C5A5E for ; Mon, 15 Jun 2015 17:59:58 +0200 (CEST) Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga102.jf.intel.com with ESMTP; 15 Jun 2015 08:59:57 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.13,618,1427785200"; d="scan'208";a="508606673" Received: from irsmsx151.ger.corp.intel.com ([163.33.192.59]) by FMSMGA003.fm.intel.com with ESMTP; 15 Jun 2015 08:59:55 -0700 Received: from irsmsx105.ger.corp.intel.com ([169.254.7.73]) by IRSMSX151.ger.corp.intel.com ([169.254.4.102]) with mapi id 14.03.0224.002; Mon, 15 Jun 2015 16:59:55 +0100 From: "Ananyev, Konstantin" To: "Richardson, Bruce" Thread-Topic: [dpdk-dev] rte_mbuf.next in 2nd cacheline Thread-Index: AQHQo8caRiJzxo5OE0esWvWQH/eOLp2tg74AgAAGpYCAABHI0P//9BKAgAACMwCAAAUGAIAAFgVg///3iYCAAAE0AIAAEVcw///zHoAAAn6f8A== Date: Mon, 15 Jun 2015 15:59:55 +0000 Message-ID: <2601191342CEEE43887BDE71AB97725836A0A91C@irsmsx105.ger.corp.intel.com> References: <557ED116.7040508@6wind.com> <20150615134409.GA7500@bricha3-MOBL3> <2601191342CEEE43887BDE71AB97725836A0A838@irsmsx105.ger.corp.intel.com> <557EDB91.9010503@6wind.com> <20150615141258.GA580@bricha3-MOBL3> <557EE1A0.609@6wind.com> <2601191342CEEE43887BDE71AB97725836A0A8A8@irsmsx105.ger.corp.intel.com> <557EECFF.3090402@6wind.com> <20150615152346.GC580@bricha3-MOBL3> <2601191342CEEE43887BDE71AB97725836A0A8FB@irsmsx105.ger.corp.intel.com> <20150615153943.GD580@bricha3-MOBL3> In-Reply-To: <20150615153943.GD580@bricha3-MOBL3> Accept-Language: en-IE, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [163.33.239.182] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Cc: "dev@dpdk.org" , "Damjan Marion \(damarion\)" Subject: Re: [dpdk-dev] rte_mbuf.next in 2nd cacheline X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Jun 2015 15:59:59 -0000 > -----Original Message----- > From: Richardson, Bruce > Sent: Monday, June 15, 2015 4:40 PM > To: Ananyev, Konstantin; # > Cc: Olivier MATZ; dev@dpdk.org; Damjan Marion (damarion) > Subject: Re: [dpdk-dev] rte_mbuf.next in 2nd cacheline >=20 > On Mon, Jun 15, 2015 at 04:28:55PM +0100, Ananyev, Konstantin wrote: > > > > > > > -----Original Message----- > > > From: Richardson, Bruce > > > Sent: Monday, June 15, 2015 4:24 PM > > > To: Olivier MATZ > > > Cc: Ananyev, Konstantin; dev@dpdk.org; Damjan Marion (damarion) > > > Subject: Re: [dpdk-dev] rte_mbuf.next in 2nd cacheline > > > > > > On Mon, Jun 15, 2015 at 05:19:27PM +0200, Olivier MATZ wrote: > > > > > > > > > > > > On 06/15/2015 04:52 PM, Ananyev, Konstantin wrote: > > > > > Hi Olivier, > > > > > > > > > >> -----Original Message----- > > > > >> From: Olivier MATZ [mailto:olivier.matz@6wind.com] > > > > >> Sent: Monday, June 15, 2015 3:31 PM > > > > >> To: Richardson, Bruce > > > > >> Cc: Ananyev, Konstantin; dev@dpdk.org; Damjan Marion (damarion) > > > > >> Subject: Re: [dpdk-dev] rte_mbuf.next in 2nd cacheline > > > > >> > > > > >> > > > > >> > > > > >> On 06/15/2015 04:12 PM, Bruce Richardson wrote: > > > > >>> On Mon, Jun 15, 2015 at 04:05:05PM +0200, Olivier MATZ wrote: > > > > >>>> Hi, > > > > >>>> > > > > >>>> On 06/15/2015 03:54 PM, Ananyev, Konstantin wrote: > > > > >>>>> > > > > >>>>> > > > > >>>>>> -----Original Message----- > > > > >>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce R= ichardson > > > > >>>>>> Sent: Monday, June 15, 2015 2:44 PM > > > > >>>>>> To: Olivier MATZ > > > > >>>>>> Cc: dev@dpdk.org; Damjan Marion (damarion) > > > > >>>>>> Subject: Re: [dpdk-dev] rte_mbuf.next in 2nd cacheline > > > > >>>>>> > > > > >>>>>> On Mon, Jun 15, 2015 at 03:20:22PM +0200, Olivier MATZ wrote= : > > > > >>>>>>> Hi Damjan, > > > > >>>>>>> > > > > >>>>>>> On 06/10/2015 11:47 PM, Damjan Marion (damarion) wrote: > > > > >>>>>>>> > > > > >>>>>>>> Hi, > > > > >>>>>>>> > > > > >>>>>>>> We noticed 7% performance improvement by simply moving rte= _mbuf.next field to the 1st cache line. > > > > >>>>>>>> > > > > >>>>>>>> Currently, it falls under /* second cache line - fields on= ly used in slow path or on TX */ > > > > >>>>>>>> but it is actually used at several places in rx fast path.= (e.g.: i40e_rx_alloc_bufs() is setting that field to NULL). > > > > >>>>>>>> > > > > >>>>>>>> Is there anything we can do here (stop using next field, o= r move it to 1st cache line)? > > > > >>>>>>> > > > > >>>>>>> Agree, this is also something I noticed, see: > > > > >>>>>>> http://dpdk.org/ml/archives/dev/2015-February/014400.html > > > > >>>>>>> > > > > >>>>>>> I did not have the time to do performance testing, but it's= something > > > > >>>>>>> I'd like to do as soon as I can. I don't see any obvious re= ason not to > > > > >>>>>>> do it. > > > > >>>>>>> > > > > >>>>>>> It seems we currently just have enough room to do it (8 byt= es are > > > > >>>>>>> remaining in the first cache line when compiled in 64 bits)= . > > > > >>>>>> > > > > >>>>>> This, to me, is the obvious reason not to do it! It prevents= us from taking in > > > > >>>>>> any other offload fields in the RX fast-path into the mbuf. > > > > >>>>>> > > > > >>>>>> That being said, I can see why we might want to look to move= it - but from the > > > > >>>>>> work done in the ixgbe driver, I'd be hopeful we can get as = much performance with > > > > >>>>>> it on the second cache line for most cases, through judiciou= s use of prefetching, > > > > >>>>>> or otherwise. > > > > >>>>>> > > > > >>>>>> It took a lot of work and investigation to get free space in= the mbuf - especially > > > > >>>>>> in cache line 0, and I'd like to avoid just filling the cach= e line up again as > > > > >>>>>> long as we possibly can! > > > > >>>>> > > > > >>>>> Yep, agree with Bruce here. > > > > >>>>> Plus, with packet_type going to be 4B and vlan_tci_outer, > > > > >>>>> we just don't have 8 free bytes at the first cache line any m= ore. > > > > >>>> > > > > >>>> I don't understand why m->next would not be a better candidate= than > > > > >>>> rx offload fields to be in the first cache line. For instance,= m->next > > > > >>>> is mandatory and must be initialized when allocating a mbuf (t= o be > > > > >>>> compared with m->seqn for instance, which is also in the first= cache > > > > >>>> line). So if we want to do some room in the first cache line, = I > > > > >>>> think we can. > > > > >>>> > > > > >>>> To me, the only reason for not doing it now is because we don'= t > > > > >>>> have a full performance test report (several use-cases, driver= s, ...) > > > > >>>> that shows it's better. > > > > >>>> > > > > >>> Because the "next" field is not mandatory to be set on initiali= zation. It can > > > > >>> instead be set only when needed, and cleared on free if it is u= sed. > > > > >>> > > > > >>> The next pointers always start out as NULL when the mbuf pool i= s created. The > > > > >>> only time it is set to non-NULL is when we have chained mbufs. = If we never have > > > > >>> any chained mbufs, we never need to touch the next field, or ev= en read it - since > > > > >>> we have the num-segments count in the first cache line. If we d= o have a multi-segment > > > > >>> mbuf, it's likely to be a big packet, so we have more processin= g time available > > > > >>> and we can then take the hit of setting the next pointer. Whene= ver we go to > > > > >>> free that mbuf for that packet, the code to do the freeing obvi= ously needs to > > > > >>> read the next pointer so as to free all the buffers in the chai= n, and so it can > > > > >>> also reset the next pointer to NULL when doing so. > > > > >>> > > > > >>> In this way, we can ensure that the next pointer on cache line = 1 is not a problem > > > > >>> in our fast path. > > > > >> > > > > >> This is a good idea, but looking at the drivers, it seems that t= oday > > > > >> they all set m->next to NULL in the rx function. What you are su= ggesting > > > > >> is to remove all of them, and document somewhere that all mbufs = in a > > > > >> pool are supposed to have their m->next set to NULL, correct? > > > > >> > > > > >> I think what you are describing could also apply to reference co= unter > > > > >> (set to 1 by default), right? > > > > > > > > > > We probably can reset next to NULL at __rte_pktmbuf_prefree_seg()= , > > > > > at the same time we do reset refcnt to 0. > > > > > Is that what you suggesting? > > > > > > > > Yes, I can give it a try. > > > > > > > > > > > Why would we need to change that function? The main free_seg function= (which > > > is called from rte_pktmbuf_free() function) already sets the next poi= nter to > > > NULL? Is there some edge case in the code now where we are missing se= tting > > > the next pointer to NULL on free? > > > > ixgbe_tx_free_bufs() at drivers/net/ixgbe/ixgbe_rxtx_vec.c > > >=20 > Not a gap - the vector TX functions cannot deal with scattered packets ge= nerally, > not just freeing them :-) True :) >=20 > > > > > > Also, any code that is not using the regular free functions i.e. mbuf= _free or > > > free_seg - anything not starting with "-" :-) - is responsible itself= for > > > ensuring that it frees things correctly. If it doesn't set next to NU= LL when it > > > needs to - it needs to be fixed there, rather than changing the prefr= ee_seg > > > function. > > > > Why do think it would be a problem? > > It seems like a good idea, tohide it inside __rte_pktmbuf_prefree_seg()= , > > So users don't have to set it to NULL manually each time. > > >=20 > I'm just nervous of slowing things down. Adding an extra instruction here= applies > to every single packet - it's not just per burst. That's why I'd like to = be > sure there is a definite problem before adding in more instructions here. As I can see, vector TX is the only one that calls __rte_pktmbuf_prefree_se= g() directly. All others use rte_pktmbuf_free_seg(), that does ' m->next =3D NULL' anyway= . For vector TX - yes, need to verify that it would not introduce a slowdown. Konstantin >=20 > /Bruce >=20 > > Konstantin > > > > > > > > /Bruce > > > > > > > > > > > > > > > > > Konstantin > > > > > > > > > >> > > > > >> > > > > >> Olivier > > > > >> > > > > >> > > > > >>> > > > > >>> /Bruce > > > > >>>