From: "Ananyev, Konstantin"
To: "Richardson, Bruce", Jerin Jacob
CC: "dev@dpdk.org", "thomas.monjalon@6wind.com", "viktorin@rehivetech.com", "jianbo.liu@linaro.org"
Date: Thu, 19 May 2016 12:18:57 +0000
Subject: Re: [dpdk-dev] [PATCH] mbuf: make rearm_data address naturally aligned
Message-ID: <2601191342CEEE43887BDE71AB97725836B5AB67@irsmsx105.ger.corp.intel.com>
In-Reply-To: <20160519085047.GA17500@bricha3-MOBL3>

Hi everyone,

> On Thu, May 19, 2016 at 12:20:16AM +0530, Jerin Jacob wrote:
> > On Wed, May 18, 2016 at 05:43:00PM +0100, Bruce Richardson wrote:
> > > On Wed, May 18, 2016 at 07:27:43PM +0530, Jerin Jacob wrote:
> > > > To avoid multiple stores on fast path, Ethernet drivers
> > > > aggregate the writes to data_off, refcnt, nb_segs and port
> > > > to an uint64_t data and write the data in one shot
> > > > with uint64_t* at &mbuf->rearm_data address.
> > > >
> > > > Some of the non-IA platforms have store operation overhead
> > > > if the store address is not naturally aligned.This patch
> > > > fixes the performance issue on those targets.
> > > >
> > > > Signed-off-by: Jerin Jacob
> > > > ---
> > > >
> > > > Tested this patch on IA and non-IA(ThunderX) platforms.
> > > > This patch shows 400Kpps/core improvement on ThunderX + ixgbe + vector environment.
> > > > and this patch does not have any overhead on IA platform.
> > > >
> > > > Have tried an another similar approach by replacing "buf_len" with "pad"
> > > > (in this patch context),
> > > > Since it has additional overhead on read and then mask to keep "buf_len" intact,
> > > > not much improvement is not shown.
> > > > ref: http://dpdk.org/ml/archives/dev/2016-May/038914.html
> > > >
> > > > ---
> > > While this will work and from your tests doesn't seem to have a performance
> > > impact, I'm not sure I particularly like it. It's extending out the end of
> > > cacheline0 of the mbuf by 16 bytes, though I suppose it's not technically using
> > > up any more space of it.
> >
> > Extending by 2 bytes. Right ?. Yes, I guess, Now we using only 56 out of 64 bytes
> > in the first 64-byte cache line.
> >
> > >
> > > What I'm wondering about though, is do we have any usecases where we need a
> > > variable buf_len for packets for RX. These mbufs come directly from a mempool,
> > > which is generally understood to be a set of fixed-sized buffers.
> > > I realise that
> > > this change was made in the past after some discussion, but one of the key points
> > > there [at least to my reading] was that - even though nobody actually made a
> > > concrete case where they had variable-sized buffers - having support for them
> > > made no performance difference.

I was going to point to vhost zcp support, but as Thomas pointed out,
that functionality was removed from dpdk.org recently.
So I am not aware whether such a case exists right now in the 'real world' or not.
Though I still think the RX function should leave the buf_len field intact.

> > >
> > > The latter part of that has now changed, and supporting variable-sized mbufs
> > > from an mbuf pool has a perf impact. Do we definitely need that functionality,
> > > because the easiest fix here is just to move the rxrearm marker back above
> > > mbuf_len as it was originally in releases like 1.8?
> >
> > And initialize the buf_len with mp->elt_size - sizeof(struct rte_mbuf).
> > Right?
> >
> > I don't have a strong opinion on this, I can do this if there is no
> > objection on this. Let me know.
> >
> > However, I do see in future, "buf_len" may belong at the end of the first 64 byte
> > cache line as currently "port" is defined as uint8_t, IMO, that is less.
> > We may need to increase that uint16_t. The reason why I think that
> > because, Currently in ThunderX HW, we do have 128VFs per socket for
> > built-in NIC, So, the two node configuration and one external PCIe NW card
> > configuration can easily go beyond 256 ports.

I wonder whether anyone really uses the mbuf port field.
My thought was: could we drop it completely?
Actually, after discussing it with Bruce offline, an interesting idea came out:
if we drop port and make mbuf_prefree() reset nb_segs=1, then
we can reduce RX rearm_data to 4B.
So with that layout:

struct rte_mbuf {

        MARKER cacheline0;

        void *buf_addr;
        phys_addr_t buf_physaddr;

        uint16_t buf_len;
        uint8_t nb_segs;
        uint8_t reserved_1byte; /* former port */

        MARKER32 rearm_data;
        uint16_t data_off;
        uint16_t refcnt;

        uint64_t ol_flags;
        ...

we can keep buf_len in its place and avoid the 2B gap,
while making rearm_data 4B long and 4B aligned.

Another similar alternative is to make mbuf_prefree() set refcnt=1
(as it updates it anyway). Then we can remove refcnt from the RX rearm_data,
and again make rearm_data 4B long and 4B aligned:

struct rte_mbuf {

        MARKER cacheline0;

        void *buf_addr;
        phys_addr_t buf_physaddr;

        uint16_t buf_len;
        uint16_t refcnt;

        MARKER32 rearm_data;
        uint16_t data_off;
        uint8_t nb_segs;
        uint8_t port;

        uint64_t ol_flags;
        ...

As an additional plus, __rte_mbuf_raw_alloc() wouldn't need to modify mbuf contents at all,
which probably is a good thing.
As a drawback, we'll have free mbufs in the pool with refcnt==1, which probably
reduces the debuggability of the mbuf code.

Konstantin

> >
> Ok, good point. If you think it's needed, and if we are changing the mbuf
> structure, it might be a good time to extend that field while you are at it, save
> a second ABI break later on.
>
> /Bruce
>
> > >
> > > Regards,
> > > /Bruce
> > >
> > > Ref: http://dpdk.org/ml/archives/dev/2014-December/009432.html
> > >
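To make the two layouts above easier to check, here is a minimal standalone sketch, not DPDK code. The names mbuf_a, mbuf_b, rearm_a() and rearm_b() are hypothetical, the MARKER fields are left out (the rearm area is taken to start at data_off), and phys_addr_t is assumed to be 64 bits wide. It only verifies at compile time that the 4B rearm area is 4B aligned and sits directly between buf_len's group and ol_flags, and shows the rearm as one 4-byte copy.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>
#include <string.h>   /* memcpy */

typedef uint64_t phys_addr_t;   /* assumption: 64-bit physical addresses */

/* Variant A (hypothetical name): port dropped, nb_segs reset on free,
 * so the rearm area is data_off + refcnt. */
struct mbuf_a {
    void *buf_addr;
    phys_addr_t buf_physaddr;
    uint16_t buf_len;
    uint8_t nb_segs;
    uint8_t reserved_1byte;     /* former port */
    uint16_t data_off;          /* rearm area starts here */
    uint16_t refcnt;
    uint64_t ol_flags;
};

/* Variant B (hypothetical name): refcnt set on free,
 * so the rearm area is data_off + nb_segs + port. */
struct mbuf_b {
    void *buf_addr;
    phys_addr_t buf_physaddr;
    uint16_t buf_len;
    uint16_t refcnt;
    uint16_t data_off;          /* rearm area starts here */
    uint8_t nb_segs;
    uint8_t port;
    uint64_t ol_flags;
};

#define REARM_OFF_A offsetof(struct mbuf_a, data_off)
#define REARM_OFF_B offsetof(struct mbuf_b, data_off)

/* The rearm area is naturally aligned for a 4-byte store ... */
static_assert(REARM_OFF_A % 4 == 0, "variant A rearm area not 4B aligned");
static_assert(REARM_OFF_B % 4 == 0, "variant B rearm area not 4B aligned");
/* ... and exactly 4 bytes long, with no gap before ol_flags. */
static_assert(offsetof(struct mbuf_a, ol_flags) == REARM_OFF_A + 4, "gap in A");
static_assert(offsetof(struct mbuf_b, ol_flags) == REARM_OFF_B + 4, "gap in B");

/* Rearm variant A with a single 4-byte copy; buf_len is not touched. */
static inline void rearm_a(struct mbuf_a *m, uint16_t data_off, uint16_t refcnt)
{
    struct { uint16_t data_off; uint16_t refcnt; } v = { data_off, refcnt };

    static_assert(sizeof(v) == 4, "rearm word must be 4 bytes");
    memcpy((char *)m + REARM_OFF_A, &v, sizeof(v));
}

/* Rearm variant B with a single 4-byte copy; buf_len and refcnt are not touched. */
static inline void rearm_b(struct mbuf_b *m, uint16_t data_off,
                           uint8_t nb_segs, uint8_t port)
{
    struct { uint16_t data_off; uint8_t nb_segs; uint8_t port; } v =
        { data_off, nb_segs, port };

    static_assert(sizeof(v) == 4, "rearm word must be 4 bytes");
    memcpy((char *)m + REARM_OFF_B, &v, sizeof(v));
}
```

A compiler will normally turn the fixed-size memcpy() into one 32-bit store; a real driver would more likely precompute the 4-byte value once per queue and store it directly, but that part is outside what this sketch tries to show.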