DPDK patches and discussions
 help / color / mirror / Atom feed
From: Bruce Richardson <bruce.richardson@intel.com>
To: Olivier MATZ <olivier.matz@6wind.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
	"Damjan Marion \(damarion\)" <damarion@cisco.com>
Subject: Re: [dpdk-dev] rte_mbuf.next in 2nd cacheline
Date: Mon, 15 Jun 2015 16:23:46 +0100	[thread overview]
Message-ID: <20150615152346.GC580@bricha3-MOBL3> (raw)
In-Reply-To: <557EECFF.3090402@6wind.com>

On Mon, Jun 15, 2015 at 05:19:27PM +0200, Olivier MATZ wrote:
> 
> 
> On 06/15/2015 04:52 PM, Ananyev, Konstantin wrote:
> > Hi Olivier,
> > 
> >> -----Original Message-----
> >> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> >> Sent: Monday, June 15, 2015 3:31 PM
> >> To: Richardson, Bruce
> >> Cc: Ananyev, Konstantin; dev@dpdk.org; Damjan Marion (damarion)
> >> Subject: Re: [dpdk-dev] rte_mbuf.next in 2nd cacheline
> >>
> >>
> >>
> >> On 06/15/2015 04:12 PM, Bruce Richardson wrote:
> >>> On Mon, Jun 15, 2015 at 04:05:05PM +0200, Olivier MATZ wrote:
> >>>> Hi,
> >>>>
> >>>> On 06/15/2015 03:54 PM, Ananyev, Konstantin wrote:
> >>>>>
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
> >>>>>> Sent: Monday, June 15, 2015 2:44 PM
> >>>>>> To: Olivier MATZ
> >>>>>> Cc: dev@dpdk.org; Damjan Marion (damarion)
> >>>>>> Subject: Re: [dpdk-dev] rte_mbuf.next in 2nd cacheline
> >>>>>>
> >>>>>> On Mon, Jun 15, 2015 at 03:20:22PM +0200, Olivier MATZ wrote:
> >>>>>>> Hi Damjan,
> >>>>>>>
> >>>>>>> On 06/10/2015 11:47 PM, Damjan Marion (damarion) wrote:
> >>>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> We noticed 7% performance improvement by simply moving rte_mbuf.next field to the 1st cache line.
> >>>>>>>>
> >>>>>>>> Currently, it falls under /* second cache line - fields only used in slow path or on TX */
> >>>>>>>> but it is actually used at several places in rx fast path. (e.g.: i40e_rx_alloc_bufs() is setting that field to NULL).
> >>>>>>>>
> >>>>>>>> Is there anything we can do here (stop using next field, or move it to 1st cache line)?
> >>>>>>>
> >>>>>>> Agree, this is also something I noticed, see:
> >>>>>>> http://dpdk.org/ml/archives/dev/2015-February/014400.html
> >>>>>>>
> >>>>>>> I did not have the time to do performance testing, but it's something
> >>>>>>> I'd like to do as soon as I can. I don't see any obvious reason not to
> >>>>>>> do it.
> >>>>>>>
> >>>>>>> It seems we currently just have enough room to do it (8 bytes are
> >>>>>>> remaining in the first cache line when compiled in 64 bits).
> >>>>>>
> >>>>>> This, to me, is the obvious reason not to do it! It prevents us from taking in
> >>>>>> any other offload fields in the RX fast-path into the mbuf.
> >>>>>>
> >>>>>> That being said, I can see why we might want to look to move it - but from the
> >>>>>> work done in the ixgbe driver, I'd be hopeful we can get as much performance with
> >>>>>> it on the second cache line for most cases, through judicious use of prefetching,
> >>>>>> or otherwise.
> >>>>>>
> >>>>>> It took a lot of work and investigation to get free space in the mbuf - especially
> >>>>>> in cache line 0, and I'd like to avoid just filling the cache line up again as
> >>>>>> long as we possibly can!
> >>>>>
> >>>>> Yep, agree with Bruce here.
> >>>>> Plus, with packet_type going to be 4B and vlan_tci_outer,
> >>>>> we just don't have 8 free bytes at the first cache line any more.
> >>>>
> >>>> I don't understand why m->next would not be a better candidate than
> >>>> rx offload fields to be in the first cache line. For instance, m->next
> >>>> is mandatory and must be initialized when allocating a mbuf (to be
> >>>> compared with m->seqn for instance, which is also in the first cache
> >>>> line). So if we want to do some room in the first cache line, I
> >>>> think we can.
> >>>>
> >>>> To me, the only reason for not doing it now is because we don't
> >>>> have a full performance test report (several use-cases, drivers, ...)
> >>>> that shows it's better.
> >>>>
> >>> Because the "next" field is not mandatory to be set on initialization. It can
> >>> instead be set only when needed, and cleared on free if it is used.
> >>>
> >>> The next pointers always start out as NULL when the mbuf pool is created. The
> >>> only time it is set to non-NULL is when we have chained mbufs. If we never have
> >>> any chained mbufs, we never need to touch the next field, or even read it - since
> >>> we have the num-segments count in the first cache line. If we do have a multi-segment
> >>> mbuf, it's likely to be a big packet, so we have more processing time available
> >>> and we can then take the hit of setting the next pointer. Whenever we go to
> >>> free that mbuf for that packet, the code to do the freeing obviously needs to
> >>> read the next pointer so as to free all the buffers in the chain, and so it can
> >>> also reset the next pointer to NULL when doing so.
> >>>
> >>> In this way, we can ensure that the next pointer on cache line 1 is not a problem
> >>> in our fast path.
> >>
> >> This is a good idea, but looking at the drivers, it seems that today
> >> they all set m->next to NULL in the rx function. What you are suggesting
> >> is to remove all of them, and document somewhere that all mbufs in a
> >> pool are supposed to have their m->next set to NULL, correct?
> >>
> >> I think what you are describing could also apply to reference counter
> >> (set to 1 by default), right?
> > 
> > We probably can reset next to NULL at __rte_pktmbuf_prefree_seg(),
> > at the same time we do reset refcnt to 0.
> > Is that what you suggesting?  
> 
> Yes, I can give it a try.
> 
>
Why would we need to change that function? The main free_seg function (which
is called from rte_pktmbuf_free() function) already sets the next pointer to
NULL? Is there some edge case in the code now where we are missing setting
the next pointer to NULL on free? 

Also, any code that is not using the regular free functions i.e. mbuf_free or
free_seg - anything not starting with "-" :-) - is responsible itself for
ensuring that it frees things correctly. If it doesn't set next to NULL when it
needs to - it needs to be fixed there, rather than changing the prefree_seg
function.

/Bruce

> 
> > 
> > Konstantin
> > 
> >>
> >>
> >> Olivier
> >>
> >>
> >>>
> >>> /Bruce
> >>>

  reply	other threads:[~2015-06-15 15:24 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-10 21:47 Damjan Marion (damarion)
2015-06-15 13:20 ` Olivier MATZ
2015-06-15 13:44   ` Bruce Richardson
2015-06-15 13:54     ` Ananyev, Konstantin
2015-06-15 14:05       ` Olivier MATZ
2015-06-15 14:12         ` Bruce Richardson
2015-06-15 14:30           ` Olivier MATZ
2015-06-15 14:46             ` Bruce Richardson
2015-06-15 14:52             ` Ananyev, Konstantin
2015-06-15 15:19               ` Olivier MATZ
2015-06-15 15:23                 ` Bruce Richardson [this message]
2015-06-15 15:28                   ` Ananyev, Konstantin
2015-06-15 15:39                     ` Bruce Richardson
2015-06-15 15:59                       ` Ananyev, Konstantin
2015-06-15 16:02                         ` Bruce Richardson
2015-06-15 16:10                           ` Ananyev, Konstantin
2015-06-15 16:23                             ` Bruce Richardson
2015-06-15 18:34                               ` Ananyev, Konstantin
2015-06-15 20:47                                 ` Stephen Hemminger
2015-06-16  8:20                                   ` Bruce Richardson
2015-06-17 13:55           ` Damjan Marion (damarion)
2015-06-17 14:04             ` Thomas Monjalon
2015-06-17 14:06             ` Bruce Richardson
2015-06-17 14:23               ` Damjan Marion (damarion)
2015-06-17 16:32                 ` Thomas Monjalon
     [not found]               ` <0DE313B5-C9F0-4879-9D92-838ED088202C@cisco.com>
     [not found]                 ` <27EA8870B328F74E88180827A0F816396BD43720@xmb-aln-x10.cisco.com>
     [not found]                   ` <59AF69C657FD0841A61C55336867B5B0345592CD@IRSMSX103.ger.corp.intel.com>
     [not found]                     ` <1FD9B82B8BF2CF418D9A1000154491D97450B186@ORSMSX102.amr.corp.intel.com>
     [not found]                       ` <27EA8870B328F74E88180827A0F816396BD43891@xmb-aln-x10.cisco.com>
     [not found]                         ` <2601191342CEEE43887BDE71AB97725836A1237C@irsmsx105.ger.corp.intel.com>
2015-06-17 18:50                           ` Ananyev, Konstantin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150615152346.GC580@bricha3-MOBL3 \
    --to=bruce.richardson@intel.com \
    --cc=damarion@cisco.com \
    --cc=dev@dpdk.org \
    --cc=olivier.matz@6wind.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).