From: Bruce Richardson
To: Morten Brørup
Cc: Olivier Matz, "Ananyev, Konstantin", dev@dpdk.org
Date: Tue, 21 Feb 2017 14:28:46 +0000
Subject: Re: [dpdk-dev] [RFC 0/8] mbuf: structure reorganization
Organization: Intel Research and Development Ireland Ltd.

On Tue, Feb 21, 2017 at 03:20:23PM +0100, Morten Brørup wrote:
> Comments at the end.
>
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Matz
> > Sent: Thursday, February 16, 2017 5:14 PM
> > To: Bruce Richardson
> > Cc: Ananyev, Konstantin; dev@dpdk.org
> > Subject: Re: [dpdk-dev] [RFC 0/8] mbuf: structure reorganization
> >
> > On Thu, 16 Feb 2017 15:46:19 +0000, Bruce Richardson wrote:
> > > On Thu, Feb 16, 2017 at 02:48:07PM +0100, Olivier Matz wrote:
> > > > Hi Konstantin,
> > > >
> > > > Thanks for the feedback. Comments inline.
> > > >
> > > > On Mon, 6 Feb 2017 18:41:27 +0000, "Ananyev, Konstantin" wrote:
> > > > > Hi Olivier,
> > > > > Looks good in general; some comments from me below.
> > > > > Thanks
> > > > > Konstantin
> > > > >
> > > > > > The main changes are:
> > > > > > - reorder the structure to increase vector performance on
> > > > > >   some non-IA platforms.
> > > > > > - add a 64-bit timestamp field in the 1st cache line
> > > > >
> > > > > I wonder why it deserves to be in the first cache line.
> > > > > How does it differ from seqn below (pure SW stuff right now)?
> > > >
> > > > In case the timestamp is set from a NIC value, it is set in the
> > > > Rx path. That's why I think it deserves to be located in the 1st
> > > > cache line.
> > > >
> > > > As you said, the seqn is pure SW stuff right now: it is set in a
> > > > lib, not in a PMD Rx path.
> > > >
> > > > > > - m->next, m->nb_segs, and m->refcnt are always initialized
> > > > > >   for mbufs in the pool, avoiding the need to set m->next
> > > > > >   (located in the 2nd cache line) in the Rx path for
> > > > > >   mono-segment packets.
> > > > > > - change port and nb_segs to 16 bits
> > > > >
> > > > > Not that I am completely against it, but changing nb_segs to
> > > > > 16 bits seems like overkill to me. I think we can keep an
> > > > > extra 8 bits for something more useful in the future.
> > > >
> > > > In my case, I use the m->next field to chain more than 256
> > > > segments for L4 socket buffers. This also updates nb_segs, which
> > > > can overflow. It's not a big issue, since at the end nb_segs is
> > > > decremented for each segment. On the other hand, if I enable
> > > > some sanity checks on mbufs, they complain because the number of
> > > > segments is not equal to nb_segs.
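> > > >
> > > > To illustrate, with the current 8-bit field a chain of more than
> > > > 255 segments silently wraps the count. A standalone sketch
> > > > (plain C, not DPDK code, just the same field width):
> > > >
> > > >   #include <stdint.h>
> > > >   #include <stdio.h>
> > > >
> > > >   int main(void)
> > > >   {
> > > >       uint8_t nb_segs = 1;  /* same width as today's m->nb_segs */
> > > >
> > > >       /* chain 300 further segments, as an L4 socket buffer might */
> > > >       for (int i = 0; i < 300; i++)
> > > >           nb_segs++;
> > > >
> > > >       printf("nb_segs = %u\n", nb_segs);  /* prints 45, not 301 */
> > > >       return 0;
> > > >   }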
> > > >
> > > > There is also another use case with fragmentation, as discussed
> > > > recently: http://dpdk.org/dev/patchwork/patch/19819/
> > > >
> > > > Of course, dealing with a long mbuf list is not that efficient,
> > > > but the application can maintain another structure to accelerate
> > > > access to the middle/end of the list.
> > > >
> > > > Finally, we have other ideas to get an additional 8 bits if
> > > > required in the future, so I don't think it's really a problem.
> > > >
> > > > > > - move seqn in the 2nd cache line
> > > > > >
> > > > > > Things discussed but not done in the patchset:
> > > > > > - move refcnt and nb_segs to the 2nd cache line: many
> > > > > >   drivers set them in the Rx path, so it could introduce a
> > > > > >   performance regression, or
> > > > >
> > > > > I wonder whether refcnt alone could be moved into the 2nd
> > > > > cache line. As I understand it, thanks to the other change
> > > > > (above), m->refcnt will already be initialized, so the Rx code
> > > > > doesn't need to touch it. Though yes, it would still require
> > > > > changes in all PMDs.
> > > >
> > > > Yes, I agree, some fields could be moved to the 2nd cache line
> > > > once all PMDs stop writing them in the Rx path. I propose to
> > > > issue some guidelines to PMD maintainers at the same time the
> > > > patchset is pushed. Then we can consider changing it in a future
> > > > version, in case we need more room in the 1st mbuf cache line.
> > >
> > > If we are changing things, we should really do all that now,
> > > rather than storing up future breaks to mbuf. Worst case, we
> > > should plan for it immediately after the release where we make
> > > these changes: have two releases that break mbuf immediately after
> > > each other, both flagged as such, but keep it stable thereafter. I
> > > don't like having technical debt on mbuf just after we supposedly
> > > "fix" it.
> >
> > I think there is no need to do this change now. And I don't feel
> > good about the idea of a patchset that updates all the PMDs to
> > remove access to a field because it moved to the 2nd cache line
> > (especially thinking about vector PMDs).
> >
> > That's why I think the plan could be:
> > - push an updated version of this patchset quickly
> > - advertise to PMD maintainers: "you don't need to set m->next,
> >   m->refcnt, and m->nb_segs in the Rx path, please update your
> >   drivers"
> > - later, if we need more room in the 1st cache line of the mbuf, we
> >   can move refcnt and nb_segs, probably without impacting
> >   performance.
> >
> > Olivier
>
> I suppose you mean that PMDs don't need to /initialize/ m->next,
> m->refcnt and m->nb_segs.
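>
> As I read it, the contract would be roughly the following; a sketch
> with a hypothetical rx_fill() helper, only to illustrate the idea,
> not how any real driver is written:
>
>   #include <rte_mbuf.h>
>
>   static inline void
>   rx_fill(struct rte_mbuf *m, uint16_t len)
>   {
>       /* m->next == NULL, m->nb_segs == 1 and m->refcnt == 1 already
>        * hold for any mbuf taken from the pool, so the mono-segment
>        * Rx path never needs to touch the 2nd cache line. */
>       m->data_len = len;
>       m->pkt_len = len;
>   }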
>
> Forgive my ignorance, and this is wild speculation, but: would a PMD
> not need to set m->next and m->nb_segs if it receives a jumbogram
> larger than an mbuf packet buffer? And if this is a realistic use
> case, these fields actually do belong in the 1st cache line. PMD
> developers, please chime in.

Yes, it would. However, this is not really fast-path processing. If we
assume a 2GHz CPU, then for 64-byte packets a core has 34 cycles to
process each packet to achieve 40G line rate. For a packet of size 2k -
the normal size it would need to hit to overflow a buffer, unless you
are using small buffers - the core has 827 cycles per packet.
Therefore, in the latter case, with big packets, the core can afford
the hit of accessing the second cache line.

/Bruce
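
P.S. The back-of-envelope arithmetic behind those numbers, assuming the
usual 20 bytes of per-packet preamble + inter-frame gap on the wire:

  40 Gbps / ((64 + 20) B * 8)   = ~59.5 Mpps => 2 GHz / 59.5 Mpps = ~34 cycles
  40 Gbps / ((2048 + 20) B * 8) = ~2.42 Mpps => 2 GHz / 2.42 Mpps = ~827 cycles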