From: "Ananyev, Konstantin"
To: Honnappa Nagarahalli, Olivier Matz
CC: "sthemmin@microsoft.com", "jerinj@marvell.com", "Richardson, Bruce", "david.marchand@redhat.com", "pbhagavatula@marvell.com", "Wang, Yipeng1", "dev@dpdk.org", Dharmik Thakkar, Ruifeng Wang, Gavin Hu, nd, David Christensen, nd
Date: Sat, 18 Jan 2020 12:32:23 +0000
References: <20190906190510.11146-1-honnappa.nagarahalli@arm.com> <20200116052511.8557-1-honnappa.nagarahalli@arm.com> <20200116052511.8557-3-honnappa.nagarahalli@arm.com> <20200117163417.GY22738@platinum>
Subject: Re: [dpdk-dev] [PATCH v9 2/6] lib/ring: apis to support configurable element size

> > On Wed, Jan 15, 2020 at 11:25:07PM -0600, Honnappa Nagarahalli wrote:
> > > Current APIs assume ring elements to be pointers. However, in many use
> > > cases, the size can be different. Add new APIs to support configurable
> > > ring element sizes.
> > >
> > > Signed-off-by: Honnappa Nagarahalli
> > > Reviewed-by: Dharmik Thakkar
> > > Reviewed-by: Gavin Hu
> > > Reviewed-by: Ruifeng Wang
> > > ---
> > >  lib/librte_ring/Makefile             |    3 +-
> > >  lib/librte_ring/meson.build          |    4 +
> > >  lib/librte_ring/rte_ring.c           |   41 +-
> > >  lib/librte_ring/rte_ring.h           |    1 +
> > >  lib/librte_ring/rte_ring_elem.h      | 1003 ++++++++++++++++++++++++++
> > >  lib/librte_ring/rte_ring_version.map |    2 +
> > >  6 files changed, 1045 insertions(+), 9 deletions(-)
> > >  create mode 100644 lib/librte_ring/rte_ring_elem.h
> > >
> >
> > [...]
> >
> > > +static __rte_always_inline void
> > > +enqueue_elems_32(struct rte_ring *r, const uint32_t size, uint32_t idx,
> > > +		const void *obj_table, uint32_t n)
> > > +{
> > > +	unsigned int i;
> > > +	uint32_t *ring = (uint32_t *)&r[1];
> > > +	const uint32_t *obj = (const uint32_t *)obj_table;
> > > +	if (likely(idx + n < size)) {
> > > +		for (i = 0; i < (n & ~0x7); i += 8, idx += 8) {
> > > +			ring[idx] = obj[i];
> > > +			ring[idx + 1] = obj[i + 1];
> > > +			ring[idx + 2] = obj[i + 2];
> > > +			ring[idx + 3] = obj[i + 3];
> > > +			ring[idx + 4] = obj[i + 4];
> > > +			ring[idx + 5] = obj[i + 5];
> > > +			ring[idx + 6] = obj[i + 6];
> > > +			ring[idx + 7] = obj[i + 7];
> > > +		}
> > > +		switch (n & 0x7) {
> > > +		case 7:
> > > +			ring[idx++] = obj[i++]; /* fallthrough */
> > > +		case 6:
> > > +			ring[idx++] = obj[i++]; /* fallthrough */
> > > +		case 5:
> > > +			ring[idx++] = obj[i++]; /* fallthrough */
> > > +		case 4:
> > > +			ring[idx++] = obj[i++]; /* fallthrough */
> > > +		case 3:
> > > +			ring[idx++] = obj[i++]; /* fallthrough */
> > > +		case 2:
> > > +			ring[idx++] = obj[i++]; /* fallthrough */
> > > +		case 1:
> > > +			ring[idx++] = obj[i++]; /* fallthrough */
> > > +		}
> > > +	} else {
> > > +		for (i = 0; idx < size; i++, idx++)
> > > +			ring[idx] = obj[i];
> > > +		/* Start at the beginning */
> > > +		for (idx = 0; i < n; i++, idx++)
> > > +			ring[idx] = obj[i];
> > > +	}
> > > +}
> > > +
> > > +static __rte_always_inline void
> > > +enqueue_elems_64(struct rte_ring *r, uint32_t prod_head,
> > > +		const void *obj_table, uint32_t n)
> > > +{
> > > +	unsigned int i;
> > > +	const uint32_t size = r->size;
> > > +	uint32_t idx = prod_head & r->mask;
> > > +	uint64_t *ring = (uint64_t *)&r[1];
> > > +	const uint64_t *obj = (const uint64_t *)obj_table;
> > > +	if (likely(idx + n < size)) {
> > > +		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
> > > +			ring[idx] = obj[i];
> > > +			ring[idx + 1] = obj[i + 1];
> > > +			ring[idx + 2] = obj[i + 2];
> > > +			ring[idx + 3] = obj[i + 3];
> > > +		}
> > > +		switch (n & 0x3) {
> > > +		case 3:
> > > +			ring[idx++] = obj[i++]; /* fallthrough */
> > > +		case 2:
> > > +			ring[idx++] = obj[i++]; /* fallthrough */
> > > +		case 1:
> > > +			ring[idx++] = obj[i++];
> > > +		}
> > > +	} else {
> > > +		for (i = 0; idx < size; i++, idx++)
> > > +			ring[idx] = obj[i];
> > > +		/* Start at the beginning */
> > > +		for (idx = 0; i < n; i++, idx++)
> > > +			ring[idx] = obj[i];
> > > +	}
> > > +}
> > > +
> > > +static __rte_always_inline void
> > > +enqueue_elems_128(struct rte_ring *r, uint32_t prod_head,
> > > +		const void *obj_table, uint32_t n)
> > > +{
> > > +	unsigned int i;
> > > +	const uint32_t size = r->size;
> > > +	uint32_t idx = prod_head & r->mask;
> > > +	rte_int128_t *ring = (rte_int128_t *)&r[1];
> > > +	const rte_int128_t *obj = (const rte_int128_t *)obj_table;
> > > +	if (likely(idx + n < size)) {
> > > +		for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
> > > +			memcpy((void *)(ring + idx),
> > > +				(const void *)(obj + i), 32);
> > > +		switch (n & 0x1) {
> > > +		case 1:
> > > +			memcpy((void *)(ring + idx),
> > > +				(const void *)(obj + i), 16);
> > > +		}
> > > +	} else {
> > > +		for (i = 0; idx < size; i++, idx++)
> > > +			memcpy((void *)(ring + idx),
> > > +				(const void *)(obj + i), 16);
> > > +		/* Start at the beginning */
> > > +		for (idx = 0; i < n; i++, idx++)
> > > +			memcpy((void *)(ring + idx),
> > > +				(const void *)(obj + i), 16);
> > > +	}
> > > +}
> > > +
> > > +/* the actual enqueue of elements on the ring.
> > > + * Placed here since identical code needed in both
> > > + * single and multi producer enqueue functions.
> > > + */
> > > +static __rte_always_inline void
> > > +enqueue_elems(struct rte_ring *r, uint32_t prod_head, const void *obj_table,
> > > +		uint32_t esize, uint32_t num)
> > > +{
> > > +	/* 8B and 16B copies implemented individually to retain
> > > +	 * the current performance.
> > > +	 */
> > > +	if (esize == 8)
> > > +		enqueue_elems_64(r, prod_head, obj_table, num);
> > > +	else if (esize == 16)
> > > +		enqueue_elems_128(r, prod_head, obj_table, num);
> > > +	else {
> > > +		uint32_t idx, scale, nr_idx, nr_num, nr_size;
> > > +
> > > +		/* Normalize to uint32_t */
> > > +		scale = esize / sizeof(uint32_t);
> > > +		nr_num = num * scale;
> > > +		idx = prod_head & r->mask;
> > > +		nr_idx = idx * scale;
> > > +		nr_size = r->size * scale;
> > > +		enqueue_elems_32(r, nr_size, nr_idx, obj_table, nr_num);
> > > +	}
> > > +}
> >
> > Following Konstantin's comment on v7, enqueue_elems_128() was modified to
> > ensure it won't crash if the object is unaligned. Are we sure that this same
> > problem cannot also occur with 64b copies on all supported architectures? (I
> > mean 64b access that is only aligned on 32b)
> Konstantin mentioned that the 64b load/store instructions on x86 can handle unaligned access.

Yep, I think we are ok here for IA and IA-32.

> On aarch64, the load/store (non-atomic,
> which will be used in this case) can handle unaligned access.
>
> + David Christensen to comment for PPC

If we are in doubt here, it is probably worth adding new test case(s) to the UT?

>
> >
> > Out of curiosity, would it make a big perf difference to only use
> > enqueue_elems_32()?
> Yes, this was having a significant impact on 128b elements. I did not try on 64b elements.
> I will run the perf test with 32b copy for 64b element size and get back.
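
For the UT idea above, here is a rough standalone sketch of the kind of check such a test case could make for 8B elements whose source table is only 32b aligned. It does not use the rte_ring API: copy_elems_64(), test_unaligned_64(), RING_SIZE, NB_ELEMS and the buffer layout are hypothetical names for illustration only, and memcpy() stands in for the direct 64b stores so that the sketch itself stays portable. A real test case would rather pass such a 32b-aligned obj_table to the 8B element enqueue/dequeue path and compare the results in the same way.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 8u               /* hypothetical ring size (power of two) */
#define RING_MASK (RING_SIZE - 1u)
#define NB_ELEMS  5u               /* enough elements to force a wrap-around */

/* Hypothetical stand-in for the 8B copy path: copies n 64b elements into
 * the ring starting at head, wrapping at RING_SIZE. memcpy() is used so
 * that obj_table may have any alignment.
 */
static void
copy_elems_64(uint64_t *ring, uint32_t head, const void *obj_table, uint32_t n)
{
	const uint8_t *src = obj_table;
	uint32_t idx = head & RING_MASK;
	uint32_t i;

	for (i = 0; i < n; i++) {
		memcpy(&ring[idx], src + (size_t)i * sizeof(uint64_t),
			sizeof(uint64_t));
		idx = (idx + 1) & RING_MASK;
	}
}

/* Returns 0 if all elements arrive intact, -1 otherwise. */
static int
test_unaligned_64(void)
{
	/* 8B-aligned storage; starting 4 bytes in gives a 32b-aligned,
	 * but not 64b-aligned, source table.
	 */
	static _Alignas(8) uint8_t buf[4 + NB_ELEMS * sizeof(uint64_t)];
	uint8_t *src = buf + 4;
	uint64_t ring[RING_SIZE] = {0};
	const uint32_t head = 6;   /* slots 6, 7, then wrap to 0, 1, 2 */
	uint32_t i;

	for (i = 0; i < NB_ELEMS; i++) {
		uint64_t v = 0x1122334455667700ULL + i;
		memcpy(src + (size_t)i * sizeof(v), &v, sizeof(v));
	}

	copy_elems_64(ring, head, src, NB_ELEMS);

	for (i = 0; i < NB_ELEMS; i++)
		if (ring[(head + i) & RING_MASK] != 0x1122334455667700ULL + i)
			return -1;
	return 0;
}
```

Calling test_unaligned_64() from the existing ring test suite and expecting 0 would be enough; the head value of 6 is chosen only so that the copy crosses the end of the ring.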