From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Thread-Topic: [dpdk-dev] [PATCH 1/2] mbuf: fix performance/cache resource
 issue with 128-byte cache line targets
Date: Wed, 9 Dec 2015 13:44:44 +0000
Message-ID: <2601191342CEEE43887BDE71AB97725836AD1EE0@irsmsx105.ger.corp.intel.com>
References: <1449417564-29600-1-git-send-email-jerin.jacob@caviumnetworks.com>
 <1449417564-29600-2-git-send-email-jerin.jacob@caviumnetworks.com>
 <2601191342CEEE43887BDE71AB97725836AD15BE@irsmsx105.ger.corp.intel.com>
 <20151208124527.GA18192@localhost.localdomain>
 <2601191342CEEE43887BDE71AB97725836AD1BDB@irsmsx105.ger.corp.intel.com>
 <20151208174922.GA1868@localhost.localdomain>
In-Reply-To: <20151208174922.GA1868@localhost.localdomain>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH 1/2] mbuf: fix performance/cache resource
 issue with 128-byte cache line targets
List-Id: patches and discussions about DPDK <dev.dpdk.org>


Hi Jerin,

> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> Sent: Tuesday, December 08, 2015 5:49 PM
> To: Ananyev, Konstantin
> Cc: dev@dpdk.org; thomas.monjalon@6wind.com; Richardson, Bruce; olivier.matz@6wind.com; Dumitrescu, Cristian
> Subject: Re: [dpdk-dev] [PATCH 1/2] mbuf: fix performance/cache resource issue with 128-byte cache line targets
>
> On Tue, Dec 08, 2015 at 04:07:46PM +0000, Ananyev, Konstantin wrote:
> > >
> > > Hi Konstantin,
> > >
> > > > Hi Jerin,
> > > >
> > > > > -----Original Message-----
> > > > > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > > > > Sent: Sunday, December 06, 2015 3:59 PM
> > > > > To: dev@dpdk.org
> > > > > Cc: thomas.monjalon@6wind.com; Richardson, Bruce; olivier.matz@6wind.com; Dumitrescu, Cristian; Ananyev, Konstantin; Jerin Jacob
> > > > > Subject: [dpdk-dev] [PATCH 1/2] mbuf: fix performance/cache resource issue with 128-byte cache line targets
> > > > >
> > > > > No need to split mbuf structure to two cache lines for 128-byte cache line
> > > > > size targets as it can fit on a single 128-byte cache line.
> > > > >
> > > > > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > > > > ---
> > > > >  app/test/test_mbuf.c                                          | 4 ++++
> > > > >  lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h | 4 ++++
> > > > >  lib/librte_mbuf/rte_mbuf.h                                    | 2 ++
> > > > >  3 files changed, 10 insertions(+)
> > > > >
> > > > > diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
> > > > > index b32bef6..5e21075 100644
> > > > > --- a/app/test/test_mbuf.c
> > > > > +++ b/app/test/test_mbuf.c
> > > > > @@ -930,7 +930,11 @@ test_failing_mbuf_sanity_check(void)
> > > > >  static int
> > > > >  test_mbuf(void)
> > > > >  {
> > > > > +#if RTE_CACHE_LINE_SIZE == 64
> > > > >  	RTE_BUILD_BUG_ON(sizeof(struct rte_mbuf) != RTE_CACHE_LINE_SIZE * 2);
> > > > > +#elif RTE_CACHE_LINE_SIZE == 128
> > > > > +	RTE_BUILD_BUG_ON(sizeof(struct rte_mbuf) != RTE_CACHE_LINE_SIZE);
> > > > > +#endif
> > > > >
> > > > >  	/* create pktmbuf pool if it does not exist */
> > > > >  	if (pktmbuf_pool == NULL) {
> > > > > diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
> > > > > index bd1cc09..e724af7 100644
> > > > > --- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
> > > > > +++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
> > > > > @@ -121,8 +121,12 @@ struct rte_kni_mbuf {
> > > > >  	uint32_t pkt_len;       /**< Total pkt len: sum of all segment data_len. */
> > > > >  	uint16_t data_len;      /**< Amount of data in segment buffer. */
> > > > >
> > > > > +#if RTE_CACHE_LINE_SIZE == 64
> > > > >  	/* fields on second cache line */
> > > > >  	char pad3[8] __attribute__((__aligned__(RTE_CACHE_LINE_SIZE)));
> > > > > +#elif RTE_CACHE_LINE_SIZE == 128
> > > > > +	char pad3[24];
> > > > > +#endif
> > > > >  	void *pool;
> > > > >  	void *next;
> > > > >  };
> > > > > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > > > > index f234ac9..0bf55e0 100644
> > > > > --- a/lib/librte_mbuf/rte_mbuf.h
> > > > > +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > > @@ -813,8 +813,10 @@ struct rte_mbuf {
> > > > >
> > > > >  	uint16_t vlan_tci_outer;  /**< Outer VLAN Tag Control Identifier (CPU order) */
> > > > >
> > > > > +#if RTE_CACHE_LINE_SIZE == 64
> > > > >  	/* second cache line - fields only used in slow path or on TX */
> > > > >  	MARKER cacheline1 __rte_cache_aligned;
> > > > > +#endif
> > > >
> > > > I suppose you'll need to keep the same space reserved for the first 64B even on systems with a 128B cache line.
> > > > Otherwise we can end up with a different mbuf format for systems with a 128B cache line.
> > >
> > > Just to understand: is there any issue with the mbuf format being different
> > > across systems? I think we are not sending the mbuf over the wire
> > > or sharing it with a different system, etc., right?
> >
> > No, we don't have to support that.
> > At least I am not aware of such cases.
> >
> > >
> > > Yes, I do understand the KNI dependency with mbuf.
> >
> > Are you asking what will be broken (except KNI) if the mbuf layouts for IA and ARM were different?
> > Probably nothing right now, except vector RX/TX.
> > But those are not supported on ARM anyway, and if someone implements them in the future, the
> > implementation might be completely different from the IA one.
> > It just seems wrong to me to have a different mbuf layout for each architecture.
>
> It's not architecture-specific; it's machine- and PMD-specific.
> Typical ARM machines have a 64-byte CL, but ThunderX and Power8 have a
> 128-byte CL.

Ok, didn't know that.
Thanks for clarification.

>
> It's also PMD-specific. There are some NICs which can write the fields the
> application is interested in before the packet in DDR; typically one CL is
> dedicated to that.
> So there is an overhead to emulate the standard mbuf layout (which the
> application shouldn't care about), i.e.:
> - reserve space for the generic mbuf layout
> - reserve space for the HW mbuf write
> - on packet receive, copy the content from the HW mbuf space to the generic
>   mbuf layout (extra space and additional cache misses for each packet, bad :-( )

Each NIC model has its own format of HW descriptors.
That's what each PMD has to do: read the information from the HW-specific layout,
interpret it and fill the mbuf.
I suppose that's the price all of us have to pay.
Otherwise a DPDK app would be able to work with only one
NIC model, and you would have to rebuild your app each time the underlying HW
changes.
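A minimal sketch of the per-PMD translation described above. Everything in it is hypothetical: the two descriptor layouts (nic_a_rx_desc, nic_b_rx_desc), the pkt_meta struct standing in for a small subset of mbuf fields, the META_F_RSS_HASH flag and all bit positions are invented for illustration and do not match any real NIC or the real rte_mbuf API. Single-segment packets are assumed, so pkt_len equals data_len.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical generic per-packet metadata: a tiny subset of what an mbuf
 * carries. The application only ever reads these fields. */
struct pkt_meta {
	uint32_t pkt_len;
	uint16_t data_len;
	uint32_t rss_hash;
	uint64_t ol_flags;
};

#define META_F_RSS_HASH (1ULL << 0) /* hypothetical: rss_hash is valid */

/* Hypothetical write-back descriptor of "NIC A": length and status packed. */
struct nic_a_rx_desc {
	uint32_t rss;
	uint32_t len_status; /* bits 0..15: length, bit 16: RSS valid */
};

/* Hypothetical descriptor of "NIC B": different field order and widths. */
struct nic_b_rx_desc {
	uint16_t flags;  /* bit 3: RSS valid */
	uint16_t length;
	uint32_t hash;
};

/* "PMD A": read its own HW layout, interpret it, fill the generic struct. */
static void nic_a_fill(const struct nic_a_rx_desc *d, struct pkt_meta *m)
{
	uint16_t len = (uint16_t)(d->len_status & 0xffffu);

	memset(m, 0, sizeof(*m));
	m->pkt_len = len;
	m->data_len = len;
	if (d->len_status & (1u << 16)) {
		m->rss_hash = d->rss;
		m->ol_flags |= META_F_RSS_HASH;
	}
}

/* "PMD B": same job, different source layout. */
static void nic_b_fill(const struct nic_b_rx_desc *d, struct pkt_meta *m)
{
	memset(m, 0, sizeof(*m));
	m->pkt_len = d->length;
	m->data_len = d->length;
	if (d->flags & (1u << 3)) {
		m->rss_hash = d->hash;
		m->ol_flags |= META_F_RSS_HASH;
	}
}
```

Application code only looks at pkt_meta, so it runs unchanged on either NIC; the cost is the per-packet copy inside each fill function, which is the overhead raised above for NICs that already write metadata in their own format.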

>
> So, it's critical to abstract the mbuf to support such HW-capable NICs.
> The application should be interested in the fields of the mbuf, not the
> actual layout. Maybe we can take this up with the external mempool manager.
>
> >
> > >
> > > > Another thing: now we have __rte_cache_aligned all over the place, and I don't know whether doubling
> > > > the sizes of all these structures is a good idea.
> > >
> > > I thought so. The only concern I have is: what if the struct is split at 64
> > > bytes and one cache line is shared between two cores/two different structs
> > > that have different types of operations (most likely)? One does extensive
> > > writes and the other one reads. The writes make the line dirty and start
> > > evicting it, and the reading core is going to suffer. Any thoughts?
> > >
> > > If it's a tradeoff between the amount of memory and performance, I think it
> > > makes sense to favour performance in the data plane. Hence the split option
> > > may not be useful, right?
> >
> > I understand that in most cases you would like to have your data structures CL-aligned,
> > to avoid performance penalties.
> > I just suggest having RTE_CACHE_MIN_LINE_SIZE (==64) for the few cases where it might be plausible.
> > As an example:
> > struct rte_mbuf {
> > 	...
> > 	MARKER cacheline1 __rte_cache_min_aligned;
> > 	...
> > } __rte_cache_aligned;
>
> I agree (in the last email as well). I will send the next revision based on this.

Ok.

> But for the KNI mbuf definition and the bitmap change we need some #define,
> so I have proposed a scheme in the last email (see below)[1]. Any thoughts?

Probably I am missing something, but with RTE_CACHE_MIN_LINE_SIZE,
and keeping the mbuf layout intact, why do we need:
#if RTE_CACHE_LINE_SIZE == 64\
for_64;
#elif RTE_CACHE_LINE_SIZE == 128\
for_128;\
#endif
for rte_mbuf.h and friends at all?
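A side note on the macro quoted above: it cannot compile as written, because preprocessing directives such as #if are not allowed inside a macro's replacement list. If a per-size selection were ever really needed, the usual approach is to choose the macro definition itself under #if. A minimal sketch; the name CACHELINE_SELECT is hypothetical, and RTE_CACHE_LINE_SIZE defaults to 128 here only so the snippet builds standalone.

```c
#include <assert.h>

#ifndef RTE_CACHE_LINE_SIZE
#define RTE_CACHE_LINE_SIZE 128 /* assumption for a standalone build */
#endif

/* #if cannot appear inside a #define, so pick the definition up front. */
#if RTE_CACHE_LINE_SIZE == 64
#define CACHELINE_SELECT(for_64, for_128) for_64
#elif RTE_CACHE_LINE_SIZE == 128
#define CACHELINE_SELECT(for_64, for_128) for_128
#else
#error "unsupported RTE_CACHE_LINE_SIZE"
#endif

/* Example use: how many cache lines a 128-byte structure spans. */
enum { MBUF_CACHE_LINES = CACHELINE_SELECT(2, 1) };

static_assert(MBUF_CACHE_LINES * RTE_CACHE_LINE_SIZE == 128,
	      "a 128-byte struct must fill a whole number of cache lines");
```

Building with -DRTE_CACHE_LINE_SIZE=64 flips the selection to 2. With RTE_CACHE_MIN_LINE_SIZE keeping the mbuf layout fixed, rte_mbuf.h itself should not need such a switch.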

Inside kni_common.h, we can change:
- char pad3[8] __attribute__((__aligned__(RTE_CACHE_LINE_SIZE)));
+ char pad3[8] __attribute__((__aligned__(RTE_CACHE_MIN_LINE_SIZE)));
To keep it in sync with rte_mbuf.h
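A sketch of why aligning pad3 to RTE_CACHE_MIN_LINE_SIZE keeps the KNI mirror in sync. mini_mbuf and mini_kni_mbuf are hypothetical cut-down stand-ins for struct rte_mbuf and struct rte_kni_mbuf; the aligned first member of the second half plays the role of the cacheline1 marker; RTE_CACHE_MIN_LINE_SIZE is the macro proposed in this thread, not an existing one; GCC/Clang __attribute__ syntax is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifndef RTE_CACHE_LINE_SIZE
#define RTE_CACHE_LINE_SIZE 128 /* assumption: a 128-byte CL target */
#endif
#define RTE_CACHE_MIN_LINE_SIZE 64 /* proposed macro */

/* Hypothetical cut-down mbuf. */
struct mini_mbuf {
	void *buf_addr;
	uint64_t ol_flags;
	/* second 64B half: aligned to the minimum CL size, not the real one */
	uint64_t udata64 __attribute__((__aligned__(RTE_CACHE_MIN_LINE_SIZE)));
	void *pool;
	void *next;
} __attribute__((__aligned__(RTE_CACHE_LINE_SIZE)));

/* Hypothetical KNI-side mirror in the style of struct rte_kni_mbuf. */
struct mini_kni_mbuf {
	void *buf_addr __attribute__((__aligned__(RTE_CACHE_LINE_SIZE)));
	/* fields on the second 64B half */
	char pad3[8] __attribute__((__aligned__(RTE_CACHE_MIN_LINE_SIZE)));
	void *pool;
	void *next;
};

/* The mirror is only correct if the fields it touches sit at the same offsets. */
static_assert(offsetof(struct mini_kni_mbuf, pool) == offsetof(struct mini_mbuf, pool),
	      "KNI pool offset out of sync");
static_assert(offsetof(struct mini_kni_mbuf, next) == offsetof(struct mini_mbuf, next),
	      "KNI next offset out of sync");
```

Had pad3 stayed aligned to RTE_CACHE_LINE_SIZE, on a 128-byte target it would start at offset 128 instead of 64 and both static_asserts would fail.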

Inside test_mbuf.c:
- RTE_BUILD_BUG_ON(sizeof(struct rte_mbuf) != RTE_CACHE_LINE_SIZE * 2);
+ RTE_BUILD_BUG_ON(sizeof(struct rte_mbuf) != RTE_CACHE_MIN_LINE_SIZE * 2);
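The changed check makes the size test independent of the real cache line size. A standalone sketch using C11 static_assert/alignof instead of DPDK's RTE_BUILD_BUG_ON; mbuf_like is a hypothetical stand-in for struct rte_mbuf, and RTE_CACHE_MIN_LINE_SIZE is the proposed (not yet existing) macro.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#ifndef RTE_CACHE_LINE_SIZE
#define RTE_CACHE_LINE_SIZE 128 /* try -DRTE_CACHE_LINE_SIZE=64 as well */
#endif
#define RTE_CACHE_MIN_LINE_SIZE 64 /* proposed macro */

/* Hypothetical stand-in: two 64-byte halves, the second starting on a
 * 64-byte boundary; the whole struct aligned to the real CL size. */
struct mbuf_like {
	uint8_t first_half[64];
	uint8_t second_half[64] __attribute__((__aligned__(RTE_CACHE_MIN_LINE_SIZE)));
} __attribute__((__aligned__(RTE_CACHE_LINE_SIZE)));

/* Same size on 64B and 128B CL targets ... */
static_assert(sizeof(struct mbuf_like) == RTE_CACHE_MIN_LINE_SIZE * 2,
	      "mbuf size must not depend on the cache line size");
/* ... while the alignment still follows the real cache line size. */
static_assert(alignof(struct mbuf_like) == RTE_CACHE_LINE_SIZE,
	      "mbuf alignment follows RTE_CACHE_LINE_SIZE");
```

Both checks pass with either -DRTE_CACHE_LINE_SIZE=64 or =128: the struct is 128 bytes in both cases and only its alignment changes, matching the "same size and layout, different alignment" goal.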

For rte_bitmap.h and similar stuff: if we have
CONFIG_RTE_CACHE_LINE_SIZE_LOG2 defined in the config file,
and make RTE_CACHE_LINE_SIZE derived from it,
wouldn't that fix such problems?
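A sketch of how deriving RTE_CACHE_LINE_SIZE from a log2 config value lets bitmap-style code compute its shift and mask constants arithmetically instead of through #if RTE_CACHE_LINE_SIZE == ... chains. CONFIG_RTE_CACHE_LINE_SIZE_LOG2 is the option proposed above (defaulting to 7, i.e. 128 bytes, here); RTE_CACHE_LINE_SIZE_LOG2, the BITMAP_* names and the helper functions are hypothetical and not the actual rte_bitmap.h API.

```c
#include <assert.h>

/* Proposed config option; 7 => 128-byte cache line (assumed default). */
#ifndef CONFIG_RTE_CACHE_LINE_SIZE_LOG2
#define CONFIG_RTE_CACHE_LINE_SIZE_LOG2 7
#endif

/* Hypothetical: the line size is derived from the log2, never set alone. */
#define RTE_CACHE_LINE_SIZE_LOG2 CONFIG_RTE_CACHE_LINE_SIZE_LOG2
#define RTE_CACHE_LINE_SIZE (1 << RTE_CACHE_LINE_SIZE_LOG2)

/* Bits per cache line = bytes * 8, so its log2 is simply LOG2 + 3. */
#define BITMAP_CL_BIT_SIZE_LOG2 (RTE_CACHE_LINE_SIZE_LOG2 + 3)
#define BITMAP_CL_BIT_SIZE (1 << BITMAP_CL_BIT_SIZE_LOG2)
#define BITMAP_CL_BIT_MASK (BITMAP_CL_BIT_SIZE - 1)

/* Which cache line of the bitmap holds bit `pos`. */
static inline unsigned int bitmap_cl_index(unsigned int pos)
{
	return pos >> BITMAP_CL_BIT_SIZE_LOG2;
}

/* Bit offset of `pos` inside its cache line. */
static inline unsigned int bitmap_cl_offset(unsigned int pos)
{
	return pos & BITMAP_CL_BIT_MASK;
}

static_assert(BITMAP_CL_BIT_SIZE == RTE_CACHE_LINE_SIZE * 8,
	      "derived bitmap constants agree with the line size");
```

Setting the option to 6 yields a 64-byte line and 512-bit bitmap lines with no other source change.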

Konstantin


>
> >
> > So we would have an mbuf with the same size and layout, but different alignment for IA and ARM.
> >
> > Another example where RTE_CACHE_MIN_LINE_SIZE could be used:
> > struct rte_eth_(rxq|txq)_info.
> > There is no real need to have them 128B-aligned for ARM.
> > The main reason they were defined as '__rte_cache_aligned' was
> > just to reserve some space for future expansion.
>
> makes sense
>
> >
> > Konstantin
> >
> > >
> > >
> > > > Again, #if RTE_CACHE_LINE_SIZE == 64 ... all over the place looks a bit clumsy.
> > > > I wonder whether we can keep __rte_cache_aligned/RTE_CACHE_LINE_SIZE architecture-specific,
> > >
> > > I think it's architecture-specific now.
> > >
> > > > but introduce RTE_CACHE_MIN_LINE_SIZE (==64)/__rte_cache_min_aligned and use it for the mbuf
> > > > (and maybe other places).
> > >
> > > Yes, it will help in this specific mbuf case, and it makes sense
> > > if the mbuf is going to stay within 2 x 64B CLs.
> > >
> > > AND/OR
> > >
> > > Can we introduce something like the below to reduce the clutter in
> > > other places? The macro name is just a placeholder; I'm trying to share the idea.
> > >
> > > #define rte_cacheline_diff(for_64, for_128)\
> > > do {\
> > > #if RTE_CACHE_LINE_SIZE == 64\
> > > for_64;
> > > #elif RTE_CACHE_LINE_SIZE == 128\
> > > for_128;\
> > > #endif
> > >
> > > OR
> > >
> > > Typedef struct rte_mbuf to a new abstract type and define it for 64-byte and
> > > 128-byte cache lines
>
> [1] list of proposals.
>
> > >
> > > Jerin
> > >
> > > > Konstantin
> > > >
> > > >