From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from dpdk.org (dpdk.org [92.243.14.124])
	by inbox.dpdk.org (Postfix) with ESMTP id C0610A04F3;
	Thu,  9 Jan 2020 01:49:05 +0100 (CET)
Received: from [92.243.14.124] (localhost [127.0.0.1])
	by dpdk.org (Postfix) with ESMTP id 137EB1DB11;
	Thu,  9 Jan 2020 01:49:05 +0100 (CET)
Received: from mga07.intel.com (mga07.intel.com [134.134.136.100])
 by dpdk.org (Postfix) with ESMTP id C52941DB10
 for <dev@dpdk.org>; Thu,  9 Jan 2020 01:49:02 +0100 (CET)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from orsmga004.jf.intel.com ([10.7.209.38])
 by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 08 Jan 2020 16:49:01 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.69,412,1571727600"; d="scan'208";a="371159436"
Received: from orsmsx108.amr.corp.intel.com ([10.22.240.6])
 by orsmga004.jf.intel.com with ESMTP; 08 Jan 2020 16:49:01 -0800
Received: from orsmsx159.amr.corp.intel.com (10.22.240.24) by
 ORSMSX108.amr.corp.intel.com (10.22.240.6) with Microsoft SMTP Server (TLS)
 id 14.3.439.0; Wed, 8 Jan 2020 16:49:01 -0800
Received: from ORSEDG001.ED.cps.intel.com (10.7.248.4) by
 ORSMSX159.amr.corp.intel.com (10.22.240.24) with Microsoft SMTP Server (TLS)
 id 14.3.439.0; Wed, 8 Jan 2020 16:49:00 -0800
Received: from NAM10-BN7-obe.outbound.protection.outlook.com (104.47.70.106)
 by edgegateway.intel.com (134.134.137.100) with Microsoft SMTP Server (TLS)
 id 14.3.439.0; Wed, 8 Jan 2020 16:49:00 -0800
ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none;
 b=PsrYbkdPOkgVf/auLroI6Rj1PAN8/9VLFICKi6FH7w0L1fxpsDWlSIWb1sGdoIBjGtIRx4uONrdaLeRCJRrutt/xMqZ1Onz0sm5ehwnpB/SiLdcIyOcMgAPwWhpv+GRgxvT5+EhuicXbomxmhKiPJtWCCObpLHct84FGjJ5ABkQ4kUJ63MXWTDDHnESoizAEYGKkxLJscSw7g7j0Vl6SxGQ3n+U+Ea6Ed70j6YjXtn57bv2Agj0pXVm3UV4McJxwWJD6iTNVHr3q6to6iyGoW9dV+wSzX+wGY746zrdHxVKkeywwDloLLQYSrKX9+RXbAwi50VLTGeFuUXMwh1lIcg==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; 
 s=arcselector9901;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=byYN6yDKTtYg8NC45KfNNdkPa8l9CagNMS+ZYwAO7Z4=;
 b=Y4nXis4iA7L+Wl/WyN4nPA1rcMgrOBZSVMrxaQ71tpph+GELBl4KNId37ffm8bq62JQSok9oJoxM3aj+N4NNnKyfg+2S/SEajN+JMQnJnqx7RKLIvXLrWn7i6KkncjeQNZ4Hu+T3esSyQ5PobjJrd7Au11y1ADXICpTm2rTiNaFu2FBDtFJRbSCcwNkOnx2tyKPozABMPMCN05wH44FqoE+Cq2nDYujldXjlV+Pq/zFpY3oK61yKNPRRgzxBfCYUt6XUJrbgkhb2Pqn1MhNIuEy9w8T76LlT1EQetNvSm/6Wv+l2UH6OMwsqEMSB2bLhuil1zI5MaeFO6MOPokGhAw==
ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass
 smtp.mailfrom=intel.com; dmarc=pass action=none header.from=intel.com;
 dkim=pass header.d=intel.com; arc=none
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=intel.onmicrosoft.com; 
 s=selector2-intel-onmicrosoft-com;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=byYN6yDKTtYg8NC45KfNNdkPa8l9CagNMS+ZYwAO7Z4=;
 b=bCTONIcz8dA0Dr9NAqfHfsRD5oQZPF4DiFOeVXrsc9VdyqHlLpRF4wvzGo2ts3UYVb7UFs6MHs6jhbelxqRmBDqLYFhThx28GYiElg3WgZskr1eGtC3YbC0v3kXcqnaVVIwanQMChgCSIByXhkZbZxGIKHvimk/ituKg3eEM7lM=
Received: from SN6PR11MB2558.namprd11.prod.outlook.com (52.135.94.19) by
 SN6PR11MB2735.namprd11.prod.outlook.com (52.135.95.138) with Microsoft SMTP
 Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 15.20.2623.8; Thu, 9 Jan 2020 00:48:58 +0000
Received: from SN6PR11MB2558.namprd11.prod.outlook.com
 ([fe80::4d86:362a:13c3:8386]) by SN6PR11MB2558.namprd11.prod.outlook.com
 ([fe80::4d86:362a:13c3:8386%7]) with mapi id 15.20.2623.008; Thu, 9 Jan 2020
 00:48:57 +0000
From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
 "olivier.matz@6wind.com" <olivier.matz@6wind.com>, "sthemmin@microsoft.com"
 <sthemmin@microsoft.com>, "jerinj@marvell.com" <jerinj@marvell.com>,
 "Richardson, Bruce" <bruce.richardson@intel.com>, "david.marchand@redhat.com"
 <david.marchand@redhat.com>, "pbhagavatula@marvell.com"
 <pbhagavatula@marvell.com>
CC: "dev@dpdk.org" <dev@dpdk.org>, Dharmik Thakkar <Dharmik.Thakkar@arm.com>, 
 Ruifeng Wang <Ruifeng.Wang@arm.com>, Gavin Hu <Gavin.Hu@arm.com>, nd
 <nd@arm.com>, nd <nd@arm.com>
Thread-Topic: [PATCH v7 02/17] lib/ring: apis to support configurable element
 size
Thread-Index: AQHVtvB22uIJex5+gEyRiFOc8yhYEqfXpo/wgAckGwCAAAcRgIAAR9qggABU2wCAAAOEAIAA9tKAgAA+MMCAAOURgIAAEbOg
Date: Thu, 9 Jan 2020 00:48:57 +0000
Message-ID: <SN6PR11MB25586CB1A5A7BCDD50D1BCE79A390@SN6PR11MB2558.namprd11.prod.outlook.com>
References: <20190906190510.11146-1-honnappa.nagarahalli@arm.com>
 <20191220044524.32910-1-honnappa.nagarahalli@arm.com>
 <20191220044524.32910-3-honnappa.nagarahalli@arm.com>
 <SN6PR11MB2558EFF8BE8444196FCD2B9B9A200@SN6PR11MB2558.namprd11.prod.outlook.com>
 <VE1PR08MB51496F6ED2F1837B38EBFBAB983F0@VE1PR08MB5149.eurprd08.prod.outlook.com>
 <VE1PR08MB51499889334E94B0DA0872DA983F0@VE1PR08MB5149.eurprd08.prod.outlook.com>
 <SN6PR11MB255865F7B666618C89AB1D699A3F0@SN6PR11MB2558.namprd11.prod.outlook.com>
 <VE1PR08MB514939C2147EC94CA7A5625B983F0@VE1PR08MB5149.eurprd08.prod.outlook.com>
 <SN6PR11MB2558255EA6128C655F7428769A3F0@SN6PR11MB2558.namprd11.prod.outlook.com>
 <AM0PR08MB51380D18DCB22BEF0FA348A8983E0@AM0PR08MB5138.eurprd08.prod.outlook.com>
 <SN6PR11MB2558806861F0DE439863544F9A3E0@SN6PR11MB2558.namprd11.prod.outlook.com>
 <VE1PR08MB514964E8E611BAE63EF90C99983E0@VE1PR08MB5149.eurprd08.prod.outlook.com>
In-Reply-To: <VE1PR08MB514964E8E611BAE63EF90C99983E0@VE1PR08MB5149.eurprd08.prod.outlook.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-titus-metadata-40: eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiMTU3ZDBkZTMtMDNjOC00NTA5LWExZTQtN2FhNTZjMWY4ODk5IiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX05UIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE3LjEwLjE4MDQuNDkiLCJUcnVzdGVkTGFiZWxIYXNoIjoia0RnenN3U3lLVmQ0R0xXMHVjSFNCRk8rRk1zQVlvWldReW9rcUtMQTYzOFlhbFMrR2NGSlVZYjFPR284UjZIaCJ9
dlp-product: dlpe-windows
dlp-reaction: no-action
dlp-version: 11.2.0.6
x-ctpclassification: CTP_NT
authentication-results: spf=none (sender IP is )
 smtp.mailfrom=konstantin.ananyev@intel.com; 
x-originating-ip: [192.198.151.162]
x-ms-publictraffictype: Email
x-ms-office365-filtering-correlation-id: a2ffd90a-ba1e-4807-90e1-08d7949db5f6
x-ms-traffictypediagnostic: SN6PR11MB2735:
x-ld-processed: 46c98d88-e344-4ed4-8496-4ed7712e255d,ExtAddr
x-ms-exchange-transport-forked: True
x-microsoft-antispam-prvs: <SN6PR11MB273583BBA2B308C5BF6E49A19A390@SN6PR11MB2735.namprd11.prod.outlook.com>
x-ms-oob-tlc-oobclassifiers: OLM:10000;
x-forefront-prvs: 02778BF158
x-forefront-antispam-report: SFV:NSPM;
 SFS:(10019020)(396003)(136003)(39860400002)(346002)(376002)(366004)(189003)(199004)(5660300002)(76116006)(81156014)(81166006)(9686003)(86362001)(2906002)(8676002)(4326008)(7416002)(8936002)(52536014)(33656002)(55016002)(54906003)(64756008)(6506007)(478600001)(66476007)(66556008)(110136005)(71200400001)(316002)(7696005)(66446008)(186003)(66946007)(26005);
 DIR:OUT; SFP:1102; SCL:1; SRVR:SN6PR11MB2735;
 H:SN6PR11MB2558.namprd11.prod.outlook.com; FPR:; SPF:None; LANG:en;
 PTR:InfoNoRecords; MX:1; A:1; 
x-ms-exchange-senderadcheck: 1
x-microsoft-antispam: BCL:0;
x-microsoft-antispam-message-info: 7sXzYo3p1Aj4Vme1gwxxsdz1jQ3ehT0qnDFvXbUamxpeNMf66/guTvrnrzwvq3gH4AZPsoq4CNYN4DD1gGErv1VHz6TvvV9qiTDK18f5yJ1c9/0SAmXzYWlYxR4r9QWSJo+aCxQOsSbkCTck2j+HcwS3yWd2u0DyYYyWTybWKQ7i7CVNboUJcLp8X7SV4fzrLkIsAI9EaCuwQfbAX9k4xK57MPLtTVkf4UfvK5EoltW8H9Z0MFFIdIT/3i33+Zwpxq9ewrr9Snnfz8EAB+grre+OGM52E6mPnlpV1OeZF4ew7tSDZ+PG7HonkIevb5iuag3QqD9UNh3HzAdroRKpONze6zEtgDr4EzcH07LZLfklNydBJSGVftdktIEmVxHC79/rebFQ1euh9fQO1uICMUSqGXfearzGwmqDiuKVgbwLkOn9LfFf7N6WZPo+/CDV
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-MS-Exchange-CrossTenant-Network-Message-Id: a2ffd90a-ba1e-4807-90e1-08d7949db5f6
X-MS-Exchange-CrossTenant-originalarrivaltime: 09 Jan 2020 00:48:57.8216 (UTC)
X-MS-Exchange-CrossTenant-fromentityheader: Hosted
X-MS-Exchange-CrossTenant-id: 46c98d88-e344-4ed4-8496-4ed7712e255d
X-MS-Exchange-CrossTenant-mailboxtype: HOSTED
X-MS-Exchange-CrossTenant-userprincipalname: NgjSwLESkWbv426cekMrYeAPWkLVJt3NY3zS3FRdh9fyLzPjJgf5kl8g8gzl7a15w3LBu5Cn1O9YJe0aFUKZ9qMQNomkqNmLMJRrbJlDURc=
X-MS-Exchange-Transport-CrossTenantHeadersStamped: SN6PR11MB2735
X-OriginatorOrg: intel.com
Subject: Re: [dpdk-dev] [PATCH v7 02/17] lib/ring: apis to support
 configurable element size
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>



> <snip>
> > > > > > > > > > +
> > > > > > > > > > +static __rte_always_inline void
> > > > > > > > > > +enqueue_elems_128(struct rte_ring *r, uint32_t prod_head,
> > > > > > > > > > +		const void *obj_table, uint32_t n)
> > > > > > > > > > +{
> > > > > > > > > > +	unsigned int i;
> > > > > > > > > > +	const uint32_t size = r->size;
> > > > > > > > > > +	uint32_t idx = prod_head & r->mask;
> > > > > > > > > > +	__uint128_t *ring = (__uint128_t *)&r[1];
> > > > > > > > > > +	const __uint128_t *obj = (const __uint128_t *)obj_table;
> > > > > > > > > > +	if (likely(idx + n < size)) {
> > > > > > > > > > +		for (i = 0; i < (n & ~0x1); i += 2, idx += 2) {
> > > > > > > > > > +			ring[idx] = obj[i];
> > > > > > > > > > +			ring[idx + 1] = obj[i + 1];
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > AFAIK, that implies 16B aligned obj_table...
> > > > > > > > > Would it always be the case?
> > > > > > > > I am not sure from the compiler perspective.
> > > > > > > > At least on Arm architecture, unaligned access (address that
> > > > > > > > is accessed is not aligned to the size of the data element
> > > > > > > > being accessed) will result in faults or require additional
> > > > > > > > cycles. So, aligning on 16B should be fine.
> > > > > > > Further, I would be changing this to use 'rte_int128_t' as
> > > > > > > '__uint128_t' is not defined on 32b systems.
> > > > > >
> > > > > > What I am trying to say: with this code we imply new requirement
> > > > > > for elems
> > > > > The only existing use case in DPDK for 16B is the event ring. The
> > > > > event ring already does similar kind of copy (using 'struct rte_event').
> > > > > So, there is no change in expectations for event ring.
> > > > > For future code, I think this expectation should be fine since it
> > > > > allows for optimal code.
> > > > >
> > > > > > in the ring: when sizeof(elem)==16 its alignment also has to be
> > > > > > at least 16.
> > > > > > Which from my perspective is not ideal.
> > > > > Any reasoning?
> > > >
> > > > New implicit requirement and inconsistency.
> > > > Code like that:
> > > >
> > > > struct ring_elem {uint64_t a, b;};
> > > > ....
> > > > struct ring_elem elem;
> > > > rte_ring_dequeue_elem(ring, &elem, sizeof(elem));
> > > >
> > > > might cause a crash.
> > > The alignment here is 8B. Assuming that instructions generated will
> > > require 16B alignment, it will result in a crash, if configured to
> > > generate exception.
> > > But, these instructions are not atomic instructions. At least on
> > > aarch64, unaligned access will not result in an exception for
> > > non-atomic loads/stores. I believe it is the same behavior for x86 as well.
> >
> > On IA, there are 2 types of 16B load/store instructions: aligned and
> > unaligned.
> > Aligned are a bit faster, but will cause an exception if used on a non-16B
> > aligned address.
> > As you are using uint128_t *, the compiler will assume that both src and
> > dst are 16B aligned and might generate code with aligned instructions.
> Ok, looking at a few articles, I read that if the address is aligned, the
> unaligned instructions do not incur the penalty. Is this understanding
> correct?

Yes, from my experience the difference is negligible.

>
> I see 2 solutions here:
> 1) We can switch this copy to use uint32_t pointer. It would still allow
> the compiler to generate (unaligned) instructions for up to 256b
> load/store. The 2 multiplications (to normalize the index and the size of
> copy) can use shifts. This should make it safer. If one wants
> performance, they can align the obj table to 16B (the ring itself is
> already aligned on the cache line boundary).

Sounds good to me.
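
To make option 1 concrete, here is a minimal sketch of a 16B element copy done through a uint32_t pointer, with the index and the copy length normalized by shifts. It is only an illustration, not the code from the patch: struct demo_ring, demo_ring_init(), demo_enqueue_elems_16() and the DEMO_* constants are hypothetical names, and the ring is a simplified stand-in for struct rte_ring.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_SLOTS      8u  /* ring capacity in elements, power of two */
#define DEMO_ESIZE_SHFT 2u  /* log2(words per element): 16B = 4 x uint32_t */

/* Hypothetical stand-in for the ring; not the real struct rte_ring. */
struct demo_ring {
	uint32_t mask;                                  /* DEMO_SLOTS - 1 */
	uint32_t words[DEMO_SLOTS << DEMO_ESIZE_SHFT];  /* element storage */
};

static void
demo_ring_init(struct demo_ring *r)
{
	memset(r, 0, sizeof(*r));
	r->mask = DEMO_SLOTS - 1;
}

/*
 * Copy n 16B elements from obj_table into the ring, starting at the slot
 * selected by prod_head. The caller must ensure n <= DEMO_SLOTS.
 * obj_table only has to be 4B aligned, because every access goes through
 * a uint32_t pointer.
 */
static void
demo_enqueue_elems_16(struct demo_ring *r, uint32_t prod_head,
		const void *obj_table, uint32_t n)
{
	/* Ring size, start index and copy length, all in 32-bit words. */
	const uint32_t size = DEMO_SLOTS << DEMO_ESIZE_SHFT;
	uint32_t idx = (prod_head & r->mask) << DEMO_ESIZE_SHFT;
	const uint32_t nwords = n << DEMO_ESIZE_SHFT;
	const uint32_t *obj = (const uint32_t *)obj_table;
	uint32_t i;

	for (i = 0; i < nwords; i++) {
		r->words[idx] = obj[i];
		if (++idx == size)  /* wrap around at the end of the ring */
			idx = 0;
	}
}
```

The word-by-word loop is kept simple on purpose; an unrolled version would have the same alignment properties, since the compiler can only assume 4B alignment for the source and may still pick wider unaligned loads/stores where it sees fit.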

>
> 2) Considering that performance is paramount, we could document that the
> obj table needs to be aligned on 16B boundary. This would
> affect event dev (if we go ahead with replacing the event ring
> implementation) significantly.

I don't think the perf difference would be significant enough to justify
such a constraint.
I am in favor of #1.

> Note that we have to do the same thing for 64b elements as well.

I don't mind having one unified copy procedure that always uses 32-bit
elems, but AFAIK, on IA there is no such limitation for 64-bit loads/stores.
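
For reference, the mismatch discussed in this thread (a 16B element whose natural alignment is smaller than 16B) can be checked with a small sketch like the one below. It assumes a C11 compiler; demo_is_aligned() is a hypothetical helper, and struct ring_elem is the example type quoted in this thread.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Example element type: 16 bytes in size, but made of 8-byte members. */
struct ring_elem {
	uint64_t a, b;
};

/* Hypothetical helper: true if p is a multiple of align (a power of two). */
static inline bool
demo_is_aligned(const void *p, size_t align)
{
	return ((uintptr_t)p & (align - 1)) == 0;
}
```

On common ABIs alignof(struct ring_elem) is at most 8 while sizeof(struct ring_elem) is 16, so an object of this type on the stack is not guaranteed to pass demo_is_aligned(&elem, 16).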


>
> >
> > >
> > > > While exactly the same code with:
> > > >
> > > > struct ring_elem {uint64_t a, b, c;};
> > > > OR
> > > > struct ring_elem {uint64_t a, b, c, d;};
> > > >
> > > > will work ok.
> > > The alignment for these structures is still 8B. Are you saying this
> > > will work because these will be copied using pointer to uint32_t
> > > (whose alignment is 4B)?
> >
> > Yes, as we are doing uint32_t copies, the compiler can't assume the data
> > will be 16B aligned and will use unaligned instructions.
> >
> > >
> > > >
> > > > >
> > > > > > Note that for elem sizes > 16 (24, 32), there is no such constraint.
> > > > > The rest of them need to be aligned on 4B boundary. However, this
> > > > > should not affect the existing code.
> > > > > The code for 8B and 16B is kept as is to ensure the performance is
> > > > > not affected for the existing code.
> > > <snip>