From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from dpdk.org (dpdk.org [92.243.14.124])
	by inbox.dpdk.org (Postfix) with ESMTP id A015DA3168
	for <public@inbox.dpdk.org>; Thu, 17 Oct 2019 06:46:47 +0200 (CEST)
Received: from [92.243.14.124] (localhost [127.0.0.1])
	by dpdk.org (Postfix) with ESMTP id 764101DFEC;
	Thu, 17 Oct 2019 06:46:47 +0200 (CEST)
Received: from EUR02-VE1-obe.outbound.protection.outlook.com
 (mail-eopbgr20059.outbound.protection.outlook.com [40.107.2.59])
 by dpdk.org (Postfix) with ESMTP id 741741DFE7
 for <dev@dpdk.org>; Thu, 17 Oct 2019 06:46:45 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com; 
 s=selector2-armh-onmicrosoft-com;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=ZCd/xh6tZWkMrDNE5fSA9ZAZsks22emj/swjenMc3u0=;
 b=dEjUyeYETZkO79D9VOKnWE6pQJsfR++nEjUweIDLeMflKZuKESk2U0Aswn3CewXuuT+XhReQvkuS8davqFqdmg47N5jrLWSHDp3cFUiMwNSCgrnAJrIFp6edO/Z/xKVEo7rVbcyuPmVD5SdefBOgNGncqV8T8nxdSao7T82DZWM=
Received: from HE1PR0802CA0020.eurprd08.prod.outlook.com (2603:10a6:3:bd::30)
 by HE1PR0802MB2619.eurprd08.prod.outlook.com (2603:10a6:3:d9::14)
 with Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.2347.21; Thu, 17 Oct
 2019 04:46:42 +0000
Received: from DB5EUR03FT040.eop-EUR03.prod.protection.outlook.com
 (2a01:111:f400:7e0a::203) by HE1PR0802CA0020.outlook.office365.com
 (2603:10a6:3:bd::30) with Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.20.2347.16 via Frontend
 Transport; Thu, 17 Oct 2019 04:46:41 +0000
Authentication-Results: spf=temperror (sender IP is 63.35.35.123)
 smtp.mailfrom=arm.com; dpdk.org; dkim=pass (signature was verified)
 header.d=armh.onmicrosoft.com;dpdk.org; dmarc=none action=none
 header.from=arm.com;
Received-SPF: TempError (protection.outlook.com: error in processing during
 lookup of arm.com: DNS Timeout)
Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by
 DB5EUR03FT040.mail.protection.outlook.com (10.152.20.243) with
 Microsoft SMTP
 Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id
 15.20.2305.15 via Frontend Transport; Thu, 17 Oct 2019 04:46:40 +0000
Received: ("Tessian outbound 6481c7fa5a3c:v33");
 Thu, 17 Oct 2019 04:46:34 +0000
X-CR-MTA-TID: 64aa7808
Received: from 07760c03f8c0.2 (ip-172-16-0-2.eu-west-1.compute.internal
 [104.47.6.57]) by 64aa7808-outbound-1.mta.getcheckrecipient.com id
 CC6D29B1-B5FA-42AC-8541-E45841D51BF5.1; 
 Thu, 17 Oct 2019 04:46:29 +0000
Received: from EUR02-VE1-obe.outbound.protection.outlook.com
 (mail-ve1eur02lp2057.outbound.protection.outlook.com [104.47.6.57])
 by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id 07760c03f8c0.2
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384);
 Thu, 17 Oct 2019 04:46:29 +0000
ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none;
 b=NkARcNumHxo+PH0bgsQjcQkaA5ybIWT2X0wVW4F1j2E7MCIcSKgHliQrcy+6zHR6m/PqXzAOJyYdGnOMT+vNyqZAObyHGCa0iSPvJLo43uN6eAjwF62Uf4CaYwkobP9r3xJaeEYPQGwTmTrOMRHU+St9hZCal1nZKDB0fF+qvx+KjYjNnXflmp9qZIMFBw5T2ApK5gkpbL9gb/WJWHz2WNvPwroN/Y+beyUOV/Ulg2nK78+HbwoIXjr5r48w5Yxm4JngNAO//pzl/Himnj0iuUY0CltyqLqtp6puh4ZCBcXtdCmWWxvxlI6adVhSkgGyf00IQBKapjYEPBDtJDTZyw==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; 
 s=arcselector9901;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=ZCd/xh6tZWkMrDNE5fSA9ZAZsks22emj/swjenMc3u0=;
 b=gufUiR2mjuPDAhqiCBkTT8vLAD9SGipoFJCNCZhiVsldEFQ5eaxTwblFjD5ypzWrcd0+ckOMANevIuKqu+vl1H2Nbnce/fS/SnLwxmoloypp1f+nhlHRigOi4My2KpXTpnZdFDUi4Dec5rmkeJBuShSTGOnIt3eK6LTZaCag94UGS5tLmYzvO03jw0e2YWdYa7Ssm+GUc21WGkPN4oFcfBO79Wh4SHgtNJK+pCbI0Kd/z5pYK6gngKeoKrJ3kp5q9byk0BnvHAlhMbCZzcPC2MU99sirNf+/H8v7KEEA9j2p/E6bTXGPqwxCXbr926Hg4ZrlZQwMu60mDH5j4VfUyw==
ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass
 smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass
 header.d=arm.com; arc=none
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com; 
 s=selector2-armh-onmicrosoft-com;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=ZCd/xh6tZWkMrDNE5fSA9ZAZsks22emj/swjenMc3u0=;
 b=dEjUyeYETZkO79D9VOKnWE6pQJsfR++nEjUweIDLeMflKZuKESk2U0Aswn3CewXuuT+XhReQvkuS8davqFqdmg47N5jrLWSHDp3cFUiMwNSCgrnAJrIFp6edO/Z/xKVEo7rVbcyuPmVD5SdefBOgNGncqV8T8nxdSao7T82DZWM=
Received: from VE1PR08MB5149.eurprd08.prod.outlook.com (20.179.30.27) by
 VE1PR08MB4703.eurprd08.prod.outlook.com (10.255.27.11) with Microsoft SMTP
 Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 15.20.2347.21; Thu, 17 Oct 2019 04:46:27 +0000
Received: from VE1PR08MB5149.eurprd08.prod.outlook.com
 ([fe80::8c82:8d9c:c78d:22a6]) by VE1PR08MB5149.eurprd08.prod.outlook.com
 ([fe80::8c82:8d9c:c78d:22a6%7]) with mapi id 15.20.2347.023; Thu, 17 Oct 2019
 04:46:27 +0000
From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
 "olivier.matz@6wind.com" <olivier.matz@6wind.com>, "sthemmin@microsoft.com"
 <sthemmin@microsoft.com>, "jerinj@marvell.com" <jerinj@marvell.com>,
 "Richardson, Bruce" <bruce.richardson@intel.com>, "david.marchand@redhat.com"
 <david.marchand@redhat.com>, "pbhagavatula@marvell.com"
 <pbhagavatula@marvell.com>
CC: "dev@dpdk.org" <dev@dpdk.org>, Dharmik Thakkar <Dharmik.Thakkar@arm.com>, 
 "Ruifeng Wang (Arm Technology China)" <Ruifeng.Wang@arm.com>,
 "Gavin Hu (Arm
 Technology China)" <Gavin.Hu@arm.com>, "stephen@networkplumber.org"
 <stephen@networkplumber.org>, Honnappa Nagarahalli
 <Honnappa.Nagarahalli@arm.com>, nd <nd@arm.com>, nd <nd@arm.com>
Thread-Topic: [PATCH v4 1/2] lib/ring: apis to support configurable element
 size
Thread-Index: AQHVgGkjycLlQ66roUqbf1hqE7rJk6dajggAgAACWdCAAOYoAIACzYSQ
Date: Thu, 17 Oct 2019 04:46:27 +0000
Message-ID: <VE1PR08MB5149D51FA4EDB55D6DEFA129986D0@VE1PR08MB5149.eurprd08.prod.outlook.com>
References: <20190906190510.11146-1-honnappa.nagarahalli@arm.com>
 <20191009024709.38144-1-honnappa.nagarahalli@arm.com>
 <20191009024709.38144-2-honnappa.nagarahalli@arm.com>
 <VE1PR08MB5149D57CAA77B51392E5423898970@VE1PR08MB5149.eurprd08.prod.outlook.com>
 <2601191342CEEE43887BDE71AB97725801A8C68545@IRSMSX104.ger.corp.intel.com>
 <VE1PR08MB5149CD175CEB6B455C99F88D98900@VE1PR08MB5149.eurprd08.prod.outlook.com>
 <2601191342CEEE43887BDE71AB97725801A8C68A99@IRSMSX104.ger.corp.intel.com>
In-Reply-To: <2601191342CEEE43887BDE71AB97725801A8C68A99@IRSMSX104.ger.corp.intel.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-ts-tracking-id: f0dd9922-2920-4a4d-ba4e-18c79e53c88e.0
x-checkrecipientchecked: true
Authentication-Results-Original: spf=none (sender IP is )
 smtp.mailfrom=Honnappa.Nagarahalli@arm.com; 
x-originating-ip: [217.140.111.135]
x-ms-publictraffictype: Email
X-MS-Office365-Filtering-Correlation-Id: bcb55fc0-5db0-463f-0f82-08d752bd005e
X-MS-Office365-Filtering-HT: Tenant
X-MS-TrafficTypeDiagnostic: VE1PR08MB4703:|VE1PR08MB4703:|HE1PR0802MB2619:
x-ld-processed: f34e5979-57d9-4aaa-ad4d-b122a662184d,ExtAddr
x-ms-exchange-transport-forked: True
X-Microsoft-Antispam-PRVS: <HE1PR0802MB2619E5EFB45DD33556ED5FFA986D0@HE1PR0802MB2619.eurprd08.prod.outlook.com>
x-checkrecipientrouted: true
x-ms-oob-tlc-oobclassifiers: OLM:4502;OLM:4502;
x-forefront-prvs: 01930B2BA8
X-Forefront-Antispam-Report-Untrusted: SFV:NSPM;
 SFS:(10009020)(4636009)(396003)(366004)(376002)(346002)(39860400002)(136003)(199004)(189003)(51234002)(64756008)(25786009)(6506007)(76176011)(9686003)(33656002)(99286004)(102836004)(2201001)(81156014)(81166006)(8676002)(305945005)(71190400001)(8936002)(7696005)(71200400001)(7736002)(86362001)(2501003)(14444005)(256004)(229853002)(66446008)(66476007)(76116006)(66946007)(66556008)(11346002)(478600001)(446003)(186003)(26005)(55016002)(6246003)(54906003)(476003)(110136005)(486006)(74316002)(6116002)(4326008)(3846002)(52536014)(2906002)(66066001)(5660300002)(316002)(1511001)(6436002)(14454004)(30864003)(579004);
 DIR:OUT; SFP:1101; SCL:1; SRVR:VE1PR08MB4703;
 H:VE1PR08MB5149.eurprd08.prod.outlook.com; FPR:; SPF:None; LANG:en;
 PTR:InfoNoRecords; A:1; MX:1; 
received-spf: None (protection.outlook.com: arm.com does not designate
 permitted sender hosts)
X-MS-Exchange-SenderADCheck: 1
X-Microsoft-Antispam-Untrusted: BCL:0;
X-Microsoft-Antispam-Message-Info-Original: P2OG1FlsL3X3HoAhLg7meOTco6n9aDDBAe3MGJreHpydO4VC7fyNM/YaxGi7kC2bMdQ70ymKPOWW/QeZoKAjE+S7BYTMIXdGjEsoGzdMwNKShlubbYnE4n5Jqn8Cw5pT57Br77ZgNn3zfWq/uQBt+kwt0HO9VfzXgCi0VHNjJYkFUQsoixzJMuTOq8NtBvAycCEfKM+Di+abooXzujfVMMhIYDdQOeJcW2ChdJiyn5f0tyw7JnOe/S7j3YoDsJkxFiNsuWUGbs85nBBuijsQOA6sx5qy1gxFH+l09Uh4wxofPOMp55GXUQPLJ2LRJJ0Lq92frV6hygp3E5uxae8bOVROz8Yjs+tbkGt4WyeWPv03vas5r+pi3sewIXV1tGLKFb94iku3cQ5MeZD6PCxClbqUDm91CB45sFchzeVZiqc=
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-MS-Exchange-Transport-CrossTenantHeadersStamped: VE1PR08MB4703
Original-Authentication-Results: spf=none (sender IP is )
 smtp.mailfrom=Honnappa.Nagarahalli@arm.com; 
X-EOPAttributedMessage: 0
X-MS-Exchange-Transport-CrossTenantHeadersStripped: DB5EUR03FT040.eop-EUR03.prod.protection.outlook.com
X-Forefront-Antispam-Report: CIP:63.35.35.123; IPV:CAL; SCL:-1; CTRY:IE;
 EFV:NLI; SFV:NSPM;
 SFS:(10009020)(4636009)(376002)(136003)(346002)(39860400002)(396003)(51234002)(189003)(199004)(2906002)(6116002)(305945005)(2201001)(26826003)(25786009)(74316002)(478600001)(7736002)(86362001)(14444005)(316002)(33656002)(23726003)(2501003)(3846002)(70586007)(14454004)(70206006)(76130400001)(229853002)(22756006)(4326008)(81166006)(126002)(76176011)(81156014)(8746002)(9686003)(97756001)(8936002)(102836004)(46406003)(52536014)(11346002)(99286004)(476003)(6506007)(356004)(63350400001)(50466002)(110136005)(8676002)(446003)(7696005)(186003)(47776003)(66066001)(30864003)(5660300002)(486006)(26005)(1511001)(6246003)(336012)(55016002)(54906003);
 DIR:OUT; SFP:1101; SCL:1; SRVR:HE1PR0802MB2619;
 H:64aa7808-outbound-1.mta.getcheckrecipient.com; FPR:; SPF:TempError; LANG:en;
 PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com; MX:1; A:1; 
X-MS-Office365-Filtering-Correlation-Id-Prvs: f7ce9b34-52d5-40e0-999d-08d752bcf887
NoDisclaimer: True
X-Forefront-PRVS: 01930B2BA8
X-Microsoft-Antispam: BCL:0;
X-Microsoft-Antispam-Message-Info: Z3BcGj7CXBGYXjv0HI5/sftvU6xz3M9XB8IvrhC9LpIK2BvrM0HcqCZiILguWGI4z0NzN8Cah/FKJuQMIMgvDqLj6oD7Ri28DONasFtna16JWmCI1Y4xFRviaDNS03i3iyVlvHR0h5Kuo9tlorYGNKK9iuvQa+3WZRvW8PpgNVfJtj/1JDq5th4KmUw2eEKfvKQWmYM2j+pqK6eMRBmL6/fb3vlcjFzccGMLbpjRf+R6CKsxkrs0KQZpsluUoiO1yzl32S5KK5U7SSr16uKXXUBxXxNsmG4GjJyQ+0RqV7lNd33P+wPH1KNkFLwxFEkEe5JZ36ReP/4n0Wi56D+T5tD8ytanWjFnT9TJ8xE/09epdVlq0jn1MUayWOpcS+DthMuH7p44kfDexzS2vZCNfq0EoX3BPdiFQsMOt2dOBoU=
X-OriginatorOrg: arm.com
X-MS-Exchange-CrossTenant-OriginalArrivalTime: 17 Oct 2019 04:46:40.4004 (UTC)
X-MS-Exchange-CrossTenant-Network-Message-Id: bcb55fc0-5db0-463f-0f82-08d752bd005e
X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d
X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[63.35.35.123];
 Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com]
X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem
X-MS-Exchange-Transport-CrossTenantHeadersStamped: HE1PR0802MB2619
Subject: Re: [dpdk-dev] [PATCH v4 1/2] lib/ring: apis to support
 configurable element size
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

<snip>

> Hi Honnappa,
>
> > > > >
> > > > > Current APIs assume ring elements to be pointers. However, in
> > > > > many use cases, the size can be different. Add new APIs to
> > > > > support configurable ring element sizes.
> > > > >
> > > > > Signed-off-by: Honnappa Nagarahalli
> > > > > <honnappa.nagarahalli@arm.com>
> > > > > Reviewed-by: Dharmik Thakkar <dharmik.thakkar@arm.com>
> > > > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > > ---
> > > > >  lib/librte_ring/Makefile             |   3 +-
> > > > >  lib/librte_ring/meson.build          |   3 +
> > > > >  lib/librte_ring/rte_ring.c           |  45 +-
> > > > >  lib/librte_ring/rte_ring.h           |   1 +
> > > > >  lib/librte_ring/rte_ring_elem.h      | 946 +++++++++++++++++++++++++++
> > > > >  lib/librte_ring/rte_ring_version.map |   2 +
> > > > >  6 files changed, 991 insertions(+), 9 deletions(-)
> > > > >  create mode 100644 lib/librte_ring/rte_ring_elem.h
> > > > >
> > > > > diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
> > > > > index 21a36770d..515a967bb 100644
> > > > > --- a/lib/librte_ring/Makefile
> > > > > +++ b/lib/librte_ring/Makefile

<snip>

> > > > > +
> > > > > +# rte_ring_create_elem and rte_ring_get_memsize_elem are experimental
> > > > > +allow_experimental_apis = true
> > > > > diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
> > > > > index d9b308036..6fed3648b 100644
> > > > > --- a/lib/librte_ring/rte_ring.c
> > > > > +++ b/lib/librte_ring/rte_ring.c
> > > > > @@ -33,6 +33,7 @@
> > > > >  #include <rte_tailq.h>
> > > > >
> > > > >  #include "rte_ring.h"
> > > > > +#include "rte_ring_elem.h"
> > > > >

<snip>

> > > > > diff --git a/lib/librte_ring/rte_ring_elem.h b/lib/librte_ring/rte_ring_elem.h
> > > > > new file mode 100644
> > > > > index 000000000..860f059ad
> > > > > --- /dev/null
> > > > > +++ b/lib/librte_ring/rte_ring_elem.h
> > > > > @@ -0,0 +1,946 @@
> > > > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > > > + *
> > > > > + * Copyright (c) 2019 Arm Limited
> > > > > + * Copyright (c) 2010-2017 Intel Corporation
> > > > > + * Copyright (c) 2007-2009 Kip Macy kmacy@freebsd.org
> > > > > + * All rights reserved.
> > > > > + * Derived from FreeBSD's bufring.h
> > > > > + * Used as BSD-3 Licensed with permission from Kip Macy.
> > > > > + */
> > > > > +
> > > > > +#ifndef _RTE_RING_ELEM_H_
> > > > > +#define _RTE_RING_ELEM_H_
> > > > > +

<snip>

> > > > > +
> > > > > +/* the actual enqueue of pointers on the ring.
> > > > > + * Placed here since identical code needed in both
> > > > > + * single and multi producer enqueue functions.
> > > > > + */
> > > > > +#define ENQUEUE_PTRS_ELEM(r, ring_start, prod_head, obj_table, esize, n) do { \
> > > > > +	if (esize == 4) \
> > > > > +		ENQUEUE_PTRS_32(r, ring_start, prod_head, obj_table, n); \
> > > > > +	else if (esize == 8) \
> > > > > +		ENQUEUE_PTRS_64(r, ring_start, prod_head, obj_table, n); \
> > > > > +	else if (esize == 16) \
> > > > > +		ENQUEUE_PTRS_128(r, ring_start, prod_head, obj_table, n); \
> > > > > +} while (0)
> > > > > +
> > > > > +#define ENQUEUE_PTRS_32(r, ring_start, prod_head, obj_table, n) do { \
> > > > > +	unsigned int i; \
> > > > > +	const uint32_t size = (r)->size; \
> > > > > +	uint32_t idx = prod_head & (r)->mask; \
> > > > > +	uint32_t *ring = (uint32_t *)ring_start; \
> > > > > +	uint32_t *obj = (uint32_t *)obj_table; \
> > > > > +	if (likely(idx + n < size)) { \
> > > > > +		for (i = 0; i < (n & ((~(unsigned)0x7))); i += 8, idx += 8) { \
> > > > > +			ring[idx] = obj[i]; \
> > > > > +			ring[idx + 1] = obj[i + 1]; \
> > > > > +			ring[idx + 2] = obj[i + 2]; \
> > > > > +			ring[idx + 3] = obj[i + 3]; \
> > > > > +			ring[idx + 4] = obj[i + 4]; \
> > > > > +			ring[idx + 5] = obj[i + 5]; \
> > > > > +			ring[idx + 6] = obj[i + 6]; \
> > > > > +			ring[idx + 7] = obj[i + 7]; \
> > > > > +		} \
> > > > > +		switch (n & 0x7) { \
> > > > > +		case 7: \
> > > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > > +		case 6: \
> > > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > > +		case 5: \
> > > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > > +		case 4: \
> > > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > > +		case 3: \
> > > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > > +		case 2: \
> > > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > > +		case 1: \
> > > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > > +		} \
> > > > > +	} else { \
> > > > > +		for (i = 0; idx < size; i++, idx++)\
> > > > > +			ring[idx] = obj[i]; \
> > > > > +		for (idx = 0; i < n; i++, idx++) \
> > > > > +			ring[idx] = obj[i]; \
> > > > > +	} \
> > > > > +} while (0)
> > > > > +
> > > > > +#define ENQUEUE_PTRS_64(r, ring_start, prod_head, obj_table, n) do { \
> > > > > +	unsigned int i; \
> > > > > +	const uint32_t size = (r)->size; \
> > > > > +	uint32_t idx = prod_head & (r)->mask; \
> > > > > +	uint64_t *ring = (uint64_t *)ring_start; \
> > > > > +	uint64_t *obj = (uint64_t *)obj_table; \
> > > > > +	if (likely(idx + n < size)) { \
> > > > > +		for (i = 0; i < (n & ((~(unsigned)0x3))); i += 4, idx += 4) { \
> > > > > +			ring[idx] = obj[i]; \
> > > > > +			ring[idx + 1] = obj[i + 1]; \
> > > > > +			ring[idx + 2] = obj[i + 2]; \
> > > > > +			ring[idx + 3] = obj[i + 3]; \
> > > > > +		} \
> > > > > +		switch (n & 0x3) { \
> > > > > +		case 3: \
> > > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > > +		case 2: \
> > > > > +			ring[idx++] = obj[i++]; /* fallthrough */ \
> > > > > +		case 1: \
> > > > > +			ring[idx++] = obj[i++]; \
> > > > > +		} \
> > > > > +	} else { \
> > > > > +		for (i = 0; idx < size; i++, idx++)\
> > > > > +			ring[idx] = obj[i]; \
> > > > > +		for (idx = 0; i < n; i++, idx++) \
> > > > > +			ring[idx] = obj[i]; \
> > > > > +	} \
> > > > > +} while (0)
> > > > > +
> > > > > +#define ENQUEUE_PTRS_128(r, ring_start, prod_head, obj_table, n) do { \
> > > > > +	unsigned int i; \
> > > > > +	const uint32_t size = (r)->size; \
> > > > > +	uint32_t idx = prod_head & (r)->mask; \
> > > > > +	__uint128_t *ring = (__uint128_t *)ring_start; \
> > > > > +	__uint128_t *obj = (__uint128_t *)obj_table; \
> > > > > +	if (likely(idx + n < size)) { \
> > > > > +		for (i = 0; i < (n & ~(unsigned)0x1); i += 2, idx += 2) { \
> > > > > +			ring[idx] = obj[i]; \
> > > > > +			ring[idx + 1] = obj[i + 1]; \
> > > > > +		} \
> > > > > +		switch (n & 0x1) { \
> > > > > +		case 1: \
> > > > > +			ring[idx++] = obj[i++]; \
> > > > > +		} \
> > > > > +	} else { \
> > > > > +		for (i = 0; idx < size; i++, idx++)\
> > > > > +			ring[idx] = obj[i]; \
> > > > > +		for (idx = 0; i < n; i++, idx++) \
> > > > > +			ring[idx] = obj[i]; \
> > > > > +	} \
> > > > > +} while (0)
> > > > > +
> > > > > +/* the actual copy of pointers on the ring to obj_table.
> > > > > + * Placed here since identical code needed in both
> > > > > + * single and multi consumer dequeue functions.
> > > > > + */
> > > > > +#define DEQUEUE_PTRS_ELEM(r, ring_start, cons_head, obj_table, esize, n) do { \
> > > > > +	if (esize == 4) \
> > > > > +		DEQUEUE_PTRS_32(r, ring_start, cons_head, obj_table, n); \
> > > > > +	else if (esize == 8) \
> > > > > +		DEQUEUE_PTRS_64(r, ring_start, cons_head, obj_table, n); \
> > > > > +	else if (esize == 16) \
> > > > > +		DEQUEUE_PTRS_128(r, ring_start, cons_head, obj_table, n); \
> > > > > +} while (0)
> > > > > +
> > > > > +#define DEQUEUE_PTRS_32(r, ring_start, cons_head, obj_table, n) do { \
> > > > > +	unsigned int i; \
> > > > > +	uint32_t idx = cons_head & (r)->mask; \
> > > > > +	const uint32_t size = (r)->size; \
> > > > > +	uint32_t *ring = (uint32_t *)ring_start; \
> > > > > +	uint32_t *obj = (uint32_t *)obj_table; \
> > > > > +	if (likely(idx + n < size)) { \
> > > > > +		for (i = 0; i < (n & (~(unsigned)0x7)); i += 8, idx += 8) {\
> > > > > +			obj[i] = ring[idx]; \
> > > > > +			obj[i + 1] = ring[idx + 1]; \
> > > > > +			obj[i + 2] = ring[idx + 2]; \
> > > > > +			obj[i + 3] = ring[idx + 3]; \
> > > > > +			obj[i + 4] = ring[idx + 4]; \
> > > > > +			obj[i + 5] = ring[idx + 5]; \
> > > > > +			obj[i + 6] = ring[idx + 6]; \
> > > > > +			obj[i + 7] = ring[idx + 7]; \
> > > > > +		} \
> > > > > +		switch (n & 0x7) { \
> > > > > +		case 7: \
> > > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > > +		case 6: \
> > > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > > +		case 5: \
> > > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > > +		case 4: \
> > > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > > +		case 3: \
> > > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > > +		case 2: \
> > > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > > +		case 1: \
> > > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > > +		} \
> > > > > +	} else { \
> > > > > +		for (i = 0; idx < size; i++, idx++) \
> > > > > +			obj[i] = ring[idx]; \
> > > > > +		for (idx = 0; i < n; i++, idx++) \
> > > > > +			obj[i] = ring[idx]; \
> > > > > +	} \
> > > > > +} while (0)
> > > > > +
> > > > > +#define DEQUEUE_PTRS_64(r, ring_start, cons_head, obj_table, n) do { \
> > > > > +	unsigned int i; \
> > > > > +	uint32_t idx = cons_head & (r)->mask; \
> > > > > +	const uint32_t size = (r)->size; \
> > > > > +	uint64_t *ring = (uint64_t *)ring_start; \
> > > > > +	uint64_t *obj = (uint64_t *)obj_table; \
> > > > > +	if (likely(idx + n < size)) { \
> > > > > +		for (i = 0; i < (n & (~(unsigned)0x3)); i += 4, idx += 4) {\
> > > > > +			obj[i] = ring[idx]; \
> > > > > +			obj[i + 1] = ring[idx + 1]; \
> > > > > +			obj[i + 2] = ring[idx + 2]; \
> > > > > +			obj[i + 3] = ring[idx + 3]; \
> > > > > +		} \
> > > > > +		switch (n & 0x3) { \
> > > > > +		case 3: \
> > > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > > +		case 2: \
> > > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > > +		case 1: \
> > > > > +			obj[i++] = ring[idx++]; \
> > > > > +		} \
> > > > > +	} else { \
> > > > > +		for (i = 0; idx < size; i++, idx++) \
> > > > > +			obj[i] = ring[idx]; \
> > > > > +		for (idx = 0; i < n; i++, idx++) \
> > > > > +			obj[i] = ring[idx]; \
> > > > > +	} \
> > > > > +} while (0)
> > > > > +
> > > > > +#define DEQUEUE_PTRS_128(r, ring_start, cons_head, obj_table, n) do { \
> > > > > +	unsigned int i; \
> > > > > +	uint32_t idx = cons_head & (r)->mask; \
> > > > > +	const uint32_t size = (r)->size; \
> > > > > +	__uint128_t *ring = (__uint128_t *)ring_start; \
> > > > > +	__uint128_t *obj = (__uint128_t *)obj_table; \
> > > > > +	if (likely(idx + n < size)) { \
> > > > > +		for (i = 0; i < (n & ~(unsigned)0x1); i += 2, idx += 2) { \
> > > > > +			obj[i] = ring[idx]; \
> > > > > +			obj[i + 1] = ring[idx + 1]; \
> > > > > +		} \
> > > > > +		switch (n & 0x1) { \
> > > > > +		case 1: \
> > > > > +			obj[i++] = ring[idx++]; /* fallthrough */ \
> > > > > +		} \
> > > > > +	} else { \
> > > > > +		for (i = 0; idx < size; i++, idx++) \
> > > > > +			obj[i] = ring[idx]; \
> > > > > +		for (idx = 0; i < n; i++, idx++) \
> > > > > +			obj[i] = ring[idx]; \
> > > > > +	} \
> > > > > +} while (0)
> > > > > +
> > > > > +/* Between load and load, there might be CPU reordering in weak
> > > > > + * memory models (powerpc/arm).
> > > > > + * There are 2 choices for the users:
> > > > > + * 1. use rmb() memory barrier
> > > > > + * 2. use one-direction load_acquire/store_release barrier, defined by
> > > > > + * CONFIG_RTE_USE_C11_MEM_MODEL=y
> > > > > + * It depends on performance test results.
> > > > > + * By default, move common functions to rte_ring_generic.h
> > > > > + */
> > > > > +#ifdef RTE_USE_C11_MEM_MODEL
> > > > > +#include "rte_ring_c11_mem.h"
> > > > > +#else
> > > > > +#include "rte_ring_generic.h"
> > > > > +#endif
> > > > > +
> > > > > +/**
> > > > > + * @internal Enqueue several objects on the ring
> > > > > + *
> > > > > + * @param r
> > > > > + *   A pointer to the ring structure.
> > > > > + * @param obj_table
> > > > > + *   A pointer to a table of objects.
> > > > > + * @param esize
> > > > > + *   The size of ring element, in bytes. It must be a multiple of 4.
> > > > > + *   Currently, sizes 4, 8 and 16 are supported. This should be the same
> > > > > + *   as passed while creating the ring, otherwise the results are undefined.
> > > > > + * @param n
> > > > > + *   The number of objects to add in the ring from the obj_table.
> > > > > + * @param behavior
> > > > > + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to a ring
> > > > > + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to a ring
> > > > > + * @param is_sp
> > > > > + *   Indicates whether to use single producer or multi-producer head update
> > > > > + * @param free_space
> > > > > + *   Returns the amount of space after the enqueue operation has finished
> > > > > + * @return
> > > > > + *   Actual number of objects enqueued.
> > > > > + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > > > > + */
> > > > > +static __rte_always_inline unsigned int
> > > > > +__rte_ring_do_enqueue_elem(struct rte_ring *r, void * const obj_table,
> > > > > +		unsigned int esize, unsigned int n,
> > > > > +		enum rte_ring_queue_behavior behavior, unsigned int is_sp,
> > > > > +		unsigned int *free_space)
> > >
> > >
> > > I like the idea of adding esize as an argument to the public API, so
> > > the compiler can do its job optimizing calls with a constant esize.
> > > Though I am not very happy with the rest of the implementation:
> > > 1. It doesn't really provide configurable elem size - only 4/8/16B
> > > elems are supported.
> > Agree. I was thinking other sizes can be added as needed.
> > However, I am wondering if we should just provide for 4B and then the
> > users can use bulk operations to construct whatever they need?
>
> I suppose it could be plan B... if there is no agreement on the generic case.
> And for 4B elems, I guess you do have a particular use-case?
Yes

>
> > It would mean extra work for the users.
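Plan B in rough form: a ring that only moves 4B words can still carry bigger elements if every element-unit quantity is rescaled into 32-bit words. With scale = esize / 4, the slot index, the element count and the ring size are each multiplied by scale, and a single 32-bit wrap-around copy does the rest. Because size * scale is a multiple of scale, the wrap point always falls on an element boundary. This is a sketch under stated assumptions: esize is a multiple of 4, size is a power of two, and the caller has already reserved n free slots (as __rte_ring_move_prod_head() does in the patch). enqueue_words_32() and enqueue_elems_scaled() are hypothetical names, not DPDK APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Wrap-around copy of nwords 32-bit words into a ring of ring_words words,
 * starting at word index idx. Same shape as ENQUEUE_PTRS_32, without the
 * 8x unrolling. */
static void
enqueue_words_32(uint32_t *ring, uint32_t ring_words, uint32_t idx,
		 const uint32_t *obj, uint32_t nwords)
{
	uint32_t i = 0;

	if (idx + nwords < ring_words) {
		for (; i < nwords; i++, idx++)
			ring[idx] = obj[i];
	} else {
		for (; idx < ring_words; i++, idx++)	/* up to the end */
			ring[idx] = obj[i];
		for (idx = 0; i < nwords; i++, idx++)	/* rest from slot 0 */
			ring[idx] = obj[i];
	}
}

/* Hypothetical helper: enqueue n elements of esize bytes (a multiple of 4)
 * into a ring of 'size' element slots (power of 2, mask = size - 1) by
 * rescaling index, count and ring size into 32-bit words. */
static void
enqueue_elems_scaled(uint32_t *ring, uint32_t size, uint32_t mask,
		     uint32_t prod_head, const void *obj_table,
		     uint32_t n, uint32_t esize)
{
	const uint32_t scale = esize / sizeof(uint32_t);

	assert(esize % sizeof(uint32_t) == 0);
	enqueue_words_32(ring, size * scale, (prod_head & mask) * scale,
			 obj_table, n * scale);
}
```

The trade-off is that the copy loop no longer knows the element size, so it cannot use wider loads for 8B or 16B elements unless the compiler recovers that from a constant esize.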
> >
> > > 2. A lot of code duplication with these 3 copies of ENQUEUE/DEQUEUE
> > > macros.
> > >
> > > Looking at ENQUEUE/DEQUEUE macros, I can see that main loop always
> > > does 32B copy per iteration.
> > Yes, I tried to keep it the same as the existing one (originally, I
> > guess the intention was to allow for 256b vector instructions to be
> > generated)
> >
> > > So I wonder: can we make a generic function that would do a 32B copy per
> > > iteration in the main loop, and copy the tail in 4B chunks?
> > > That would avoid copy duplication and would allow the user to have any
> > > elem size (multiple of 4B) they want.
> > > Something like that (note didn't test it, just a rough idea):
> > >
> > > static inline void
> > > copy_elems(uint32_t du32[], const uint32_t su32[], uint32_t num,
> > >                 uint32_t esize)
> > > {
> > >         uint32_t i, sz;
> > >
> > >         sz = (num * esize) / sizeof(uint32_t);
> > If 'num' is a compile time constant, 'sz' will be a compile time constant.
> > Otherwise, this will result in a multiplication operation.
>
> Not always.
> If esize is a compile-time constant, then for esize as a power of 2 (4, 8, 16, ...),
> it would be just one shift.
> For other constant values it could be a 'mul' or, in many cases, just 2 shifts
> plus an 'add' (if the compiler is smart enough).
> I.e., let's say for a 24B elem it would be either num * 6 or (num << 2) + (num << 1).
With num * 15 it has to be (num << 3) + (num << 2) + (num << 1) + num
Not sure if the compiler will do this.

> I suppose for non-power of 2 elems it might be ok to get such a small perf hit.
Agreed; it should be OK not to focus on that right now.
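The shift/add forms discussed above can be checked directly. For a 24B element, sz = num * 6 = (num << 2) + (num << 1). For a 60B element, sz = num * 15, which is (num << 3) + (num << 2) + (num << 1) + num, or equivalently (num << 4) - num. All three agree with the plain multiply in uint32_t arithmetic, overflow included. Whether gcc or clang actually emits one of these sequences or a single mul for a given target and -O level is not something this sketch shows; that needs a look at the generated assembly. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* sz for a 24B element (6 words per element): num * 6 */
static inline uint32_t sz_24b(uint32_t num)
{
	return (num << 2) + (num << 1);				/* 4n + 2n */
}

/* sz for a 60B element (15 words per element): num * 15, via adds ... */
static inline uint32_t sz_60b_adds(uint32_t num)
{
	return (num << 3) + (num << 2) + (num << 1) + num;	/* 8n+4n+2n+n */
}

/* ... or via one shift and one subtract. */
static inline uint32_t sz_60b_sub(uint32_t num)
{
	return (num << 4) - num;				/* 16n - n */
}
```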

>
> > I have tried to avoid the multiplication operation and use shift and mask
> > operations (just like the rest of the ring code does).
> >
> > >
> > >         for (i = 0; i < (sz & ~7); i += 8)
> > >                 memcpy(du32 + i, su32 + i, 8 * sizeof(uint32_t));
> > I had used memcpy to start with (for the entire copy operation), but
> > the performance is not the same for 64b elements when compared with the
> > existing ring APIs (higher in some cases and lower in others).
>
> I remember that from one of your previous mails; that's why here I suggest
> using memcpy() with a fixed size in a loop.
> That way, for each iteration the compiler will replace memcpy() with instructions
> to copy 32B in a way it thinks is optimal (same as for the original macro, I think).
I tried this. On x86 (Xeon(R) Gold 6132 CPU @ 2.60GHz), the results are as follows. The numbers in brackets are with the code on master.
gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0

RTE>>ring_perf_elem_autotest
### Testing single element and burst enq/deq ###
SP/SC single enq/dequeue: 5
MP/MC single enq/dequeue: 40 (35)
SP/SC burst enq/dequeue (size: 8): 2
MP/MC burst enq/dequeue (size: 8): 6
SP/SC burst enq/dequeue (size: 32): 1 (2)
MP/MC burst enq/dequeue (size: 32): 2

### Testing empty dequeue ###
SC empty dequeue: 2.11
MC empty dequeue: 1.41 (2.11)

### Testing using a single lcore ###
SP/SC bulk enq/dequeue (size: 8): 2.15 (2.86)
MP/MC bulk enq/dequeue (size: 8): 6.35 (6.91)
SP/SC bulk enq/dequeue (size: 32): 1.35 (2.06)
MP/MC bulk enq/dequeue (size: 32): 2.38 (2.95)

### Testing using two physical cores ###
SP/SC bulk enq/dequeue (size: 8): 73.81 (15.33)
MP/MC bulk enq/dequeue (size: 8): 75.10 (71.27)
SP/SC bulk enq/dequeue (size: 32): 21.14 (9.58)
MP/MC bulk enq/dequeue (size: 32): 25.74 (20.91)

### Testing using two NUMA nodes ###
SP/SC bulk enq/dequeue (size: 8): 164.32 (50.66)
MP/MC bulk enq/dequeue (size: 8): 176.02 (173.43)
SP/SC bulk enq/dequeue (size: 32): 50.78 (23)
MP/MC bulk enq/dequeue (size: 32): 63.17 (46.74)

On one of the Arm platforms:
MP/MC bulk enq/dequeue (size: 32): 0.37 (0.33) (~12% hit, the rest are ok)

On another Arm platform, all numbers are the same or slightly better.

I can post the patch with this change if you want to run some benchmarks on your platform.
I have not used the exact code you suggested; instead, I used the same logic in a single macro with memcpy.
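For reference, the suggestion quoted in this thread (a fixed-size 32B memcpy() in the main loop, 4B tail, split copy on wrap-around) can be written as a self-contained function. This is an illustrative sketch, not the single-macro version benchmarked above, which is not included in this mail. struct toy_ring, copy_elems_32b() and enqueue_elems_32b() are hypothetical stand-ins. Slot offsets are computed as idx * esize bytes, and the tail uses a plain loop instead of the unrolled switch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the two struct rte_ring fields used here. */
struct toy_ring {
	uint32_t size;	/* number of element slots, power of 2 */
	uint32_t mask;	/* size - 1 */
};

/* Copy num elements of esize bytes each (esize a multiple of 4). The main
 * loop copies 32B per iteration with a fixed-size memcpy() that the
 * compiler is free to inline; the tail copies the remaining 0..7 words. */
static inline void
copy_elems_32b(uint32_t *du32, const uint32_t *su32, uint32_t num,
	       uint32_t esize)
{
	const uint32_t sz = num * (esize / sizeof(uint32_t));
	uint32_t i;

	assert(esize % sizeof(uint32_t) == 0);
	for (i = 0; i < (sz & ~7u); i += 8)
		memcpy(du32 + i, su32 + i, 8 * sizeof(uint32_t));
	for (; i < sz; i++)
		du32[i] = su32[i];
}

/* Enqueue num elements starting at producer position prod_head. Slot idx
 * starts idx * esize bytes into the ring; a copy that would run past the
 * last slot is split in two and continues from slot 0. */
static inline void
enqueue_elems_32b(const struct toy_ring *r, void *ring_start,
		  uint32_t prod_head, const void *obj_table,
		  uint32_t num, uint32_t esize)
{
	const uint32_t idx = prod_head & r->mask;
	uint8_t *ring = ring_start;
	const uint8_t *obj = obj_table;

	if (idx + num < r->size) {
		copy_elems_32b((uint32_t *)(ring + (size_t)idx * esize),
			       (const uint32_t *)obj, num, esize);
	} else {
		const uint32_t n = r->size - idx;	/* slots before the end */

		copy_elems_32b((uint32_t *)(ring + (size_t)idx * esize),
			       (const uint32_t *)obj, n, esize);
		copy_elems_32b((uint32_t *)ring,
			       (const uint32_t *)(obj + (size_t)n * esize),
			       num - n, esize);
	}
}
```

Because esize is a parameter of a static inline function, a call site that passes a constant esize lets the compiler fold num * (esize / 4) at compile time. That is the property the esize argument in the public API is meant to exploit.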

>
> >
> > IMO, we have to keep the performance for 64b and 128b elements the same as
> > what we get with the existing ring and event-ring APIs. That would allow us
> > to replace them with these new APIs. I suggest that we keep the macros in
> > this patch for 64b and 128b.
>
> I still think we can probably achieve that without duplicating macros, while
> still supporting arbitrary elem sizes.
> See above.
>
> > For the rest of the sizes, we could put a for loop around the 32b macro (this
> > would allow for all sizes as well).
> >
> > >
> > >         switch (sz & 7) {
> > >         case 7: du32[sz - 7] = su32[sz - 7]; /* fallthrough */
> > >         case 6: du32[sz - 6] = su32[sz - 6]; /* fallthrough */
> > >         case 5: du32[sz - 5] = su32[sz - 5]; /* fallthrough */
> > >         case 4: du32[sz - 4] = su32[sz - 4]; /* fallthrough */
> > >         case 3: du32[sz - 3] = su32[sz - 3]; /* fallthrough */
> > >         case 2: du32[sz - 2] = su32[sz - 2]; /* fallthrough */
> > >         case 1: du32[sz - 1] = su32[sz - 1]; /* fallthrough */
> > >         }
> > > }
> > >
> > > static inline void
> > > enqueue_elems(struct rte_ring *r, void *ring_start, uint32_t prod_head,
> > >                 void *obj_table, uint32_t num, uint32_t esize) {
> > >         uint32_t idx, n;
> > >         uint32_t *du32;
> > >
> > >         const uint32_t size = r->size;
> > >
> > >         idx = prod_head & (r)->mask;
> > >
> > >         du32 = (uint32_t *)((uint8_t *)ring_start + idx * esize);
> > >
> > >         if (idx + num < size)
> > >                 copy_elems(du32, obj_table, num, esize);
> > >         else {
> > >                 n = size - idx;
> > >                 copy_elems(du32, obj_table, n, esize);
> > >                 copy_elems(ring_start,
> > >                         (uint32_t *)((uint8_t *)obj_table + n * esize),
> > >                         num - n, esize);
> > >         }
> > > }
> > >
> > > And then, in that function, instead of ENQUEUE_PTRS_ELEM(), just:
> > >
> > > enqueue_elems(r, &r[1], prod_head, obj_table, n, esize);
> > >
> > >
> > > > > +{
> > > > > +	uint32_t prod_head, prod_next;
> > > > > +	uint32_t free_entries;
> > > > > +
> > > > > +	n = __rte_ring_move_prod_head(r, is_sp, n, behavior,
> > > > > +			&prod_head, &prod_next, &free_entries);
> > > > > +	if (n == 0)
> > > > > +		goto end;
> > > > > +
> > > > > +	ENQUEUE_PTRS_ELEM(r, &r[1], prod_head, obj_table, esize, n);
> > > > > +
> > > > > +	update_tail(&r->prod, prod_head, prod_next, is_sp, 1);
> > > > > +end:
> > > > > +	if (free_space != NULL)
> > > > > +		*free_space = free_entries - n;
> > > > > +	return n;
> > > > > +}
> > > > > +