From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 70F1BA04F0; Mon, 13 Jan 2020 12:54:14 +0100 (CET) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 91F181D15D; Mon, 13 Jan 2020 12:54:13 +0100 (CET) Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by dpdk.org (Postfix) with ESMTP id 9237F1C436 for ; Mon, 13 Jan 2020 12:54:11 +0100 (CET) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga107.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 13 Jan 2020 03:54:10 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.69,428,1571727600"; d="scan'208";a="422788462" Received: from fmsmsx104.amr.corp.intel.com ([10.18.124.202]) by fmsmga005.fm.intel.com with ESMTP; 13 Jan 2020 03:54:10 -0800 Received: from fmsmsx605.amr.corp.intel.com (10.18.126.85) by fmsmsx104.amr.corp.intel.com (10.18.124.202) with Microsoft SMTP Server (TLS) id 14.3.439.0; Mon, 13 Jan 2020 03:54:10 -0800 Received: from fmsmsx605.amr.corp.intel.com (10.18.126.85) by fmsmsx605.amr.corp.intel.com (10.18.126.85) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.1713.5; Mon, 13 Jan 2020 03:54:09 -0800 Received: from FMSEDG002.ED.cps.intel.com (10.1.192.134) by fmsmsx605.amr.corp.intel.com (10.18.126.85) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256) id 15.1.1713.5 via Frontend Transport; Mon, 13 Jan 2020 03:54:09 -0800 Received: from NAM10-BN7-obe.outbound.protection.outlook.com (104.47.70.105) by edgegateway.intel.com (192.55.55.69) with Microsoft SMTP Server (TLS) id 14.3.439.0; Mon, 13 Jan 2020 03:54:04 -0800 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=nE/ZIXyRDbbz47x3sVu6JjXxTUBil47MU/1am24tBIQZbK1Mgop+5Xj3ljd26IlerBZRTZ8SXEb0rn+ulCr66kjnq/DiOB9Ioyhmj94XhLS8+K2kQ6TFI4EXs+z7QYRotfz0Rft+kUAtaJLG6KrRW3qNNBc95fheEAictbvjlOapRmb74hh/H6Nb1OnfqyCiJAdIsEvKmOZDXxLWNE2UhSem3INdR0zOxn/if+0PvGoJ14DG9y3lsNBr00f++T2z0ioEjkAR3vFYyZ5bHcJAztJ6ClJEcTX0JhRm1FvkKRdebmRh2KePUjH7Z8FZviLNRx0BTuk/SlTo93ULAXukow== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=yHhqD5qfdO546FPAM9KqycTVxJ+eYCK/H5P3c8wEcL0=; b=KV2QoFJcmlIRi2er9CEmuSuZWeo4XXsCnJIsV6HgsSK2htWUuNm1H896Pgm+k4Nc7FFSeuqk+y2VF2nd7TbijLm2qVOohp/oG3TvpDuky/1zt/0UgryuaAnQbXLIBZTMiDlRN5pleA77MUoDHnwSn944S37XLer+pExqkN+XglxpMWvmXUJy79daMuJLWbM4IQIdWtmmFvUMy976q5yL4kpcS4ReE9jqhJZ1d7G1kMV5U1wCRqcKyC4z8ClQu8NFDCxgKjpZ4jObLOu4irBIajn+Fsexk8E7aMxD2ofG2qlbjHv/2P34OdHd4cmaP2r3BGbvuZTOIpRJYIaM0gRjQg== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=intel.com; dmarc=pass action=none header.from=intel.com; dkim=pass header.d=intel.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=intel.onmicrosoft.com; s=selector2-intel-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=yHhqD5qfdO546FPAM9KqycTVxJ+eYCK/H5P3c8wEcL0=; b=YxpLcnJJbYg80PM/vIkasW6jLkGgeZCMq+hW8p76F5j5bJMAfIko9Jyz7hkU7tJdbkaqB/sIfx1CTF4fnA7gxSYgIeSkol6pwK4OKf/muu62P7AGLg8CyN6iBvBoMGxjbZKksPYWsukHILmwaIAgYk3ySj1c9mVwyXPnSE1QZlM= Received: from SN6PR11MB2558.namprd11.prod.outlook.com (52.135.94.19) by SN6PR11MB3375.namprd11.prod.outlook.com (52.135.110.158) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.2623.9; Mon, 13 Jan 2020 11:53:57 +0000 Received: from SN6PR11MB2558.namprd11.prod.outlook.com ([fe80::4d86:362a:13c3:8386]) by SN6PR11MB2558.namprd11.prod.outlook.com ([fe80::4d86:362a:13c3:8386%7]) with mapi id 15.20.2623.015; Mon, 13 Jan 2020 11:53:57 +0000 From: "Ananyev, Konstantin" To: Honnappa Nagarahalli , "olivier.matz@6wind.com" , "sthemmin@microsoft.com" , "jerinj@marvell.com" , "Richardson, Bruce" , "david.marchand@redhat.com" , "pbhagavatula@marvell.com" CC: "dev@dpdk.org" , Dharmik Thakkar , Ruifeng Wang , Gavin Hu , nd , nd Thread-Topic: [PATCH v7 02/17] lib/ring: apis to support configurable element size Thread-Index: AQHVtvB22uIJex5+gEyRiFOc8yhYEqfXpo/wgAckGwCAAAcRgIAAR9qggABU2wCAAAOEAIAA9tKAgAA+MMCAAOURgIAAEbOggAEB6ICABgJsQA== Date: Mon, 13 Jan 2020 11:53:56 +0000 Message-ID: References: <20190906190510.11146-1-honnappa.nagarahalli@arm.com> <20191220044524.32910-1-honnappa.nagarahalli@arm.com> <20191220044524.32910-3-honnappa.nagarahalli@arm.com> In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-titus-metadata-40: eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiY2VlYjcwZDUtZDMwNy00YTNjLWE3NTctZDMyZDNmYWRjZDFlIiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX05UIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE3LjEwLjE4MDQuNDkiLCJUcnVzdGVkTGFiZWxIYXNoIjoiSVFRTG96cEpwZXRlcllXUWNUSU5IM0dEQjBnR21oV1NLMUhIa2tFY3dMNmxiVFwvRk42ZTljYmVnREVjbVBkZUQifQ== dlp-product: dlpe-windows dlp-reaction: no-action dlp-version: 11.2.0.6 x-ctpclassification: CTP_NT authentication-results: spf=none (sender IP is ) smtp.mailfrom=konstantin.ananyev@intel.com; x-originating-ip: [192.198.151.179] x-ms-publictraffictype: Email x-ms-office365-filtering-correlation-id: 9a4d204a-1250-4db9-a203-08d7981f45d6 x-ms-traffictypediagnostic: SN6PR11MB3375: x-ld-processed: 46c98d88-e344-4ed4-8496-4ed7712e255d,ExtAddr x-ms-exchange-transport-forked: True x-microsoft-antispam-prvs: x-ms-oob-tlc-oobclassifiers: OLM:10000; x-forefront-prvs: 028166BF91 x-forefront-antispam-report: SFV:NSPM; SFS:(10019020)(396003)(39860400002)(346002)(136003)(366004)(376002)(199004)(189003)(7416002)(71200400001)(478600001)(52536014)(4326008)(33656002)(110136005)(5660300002)(54906003)(186003)(26005)(2906002)(8936002)(86362001)(66476007)(316002)(66556008)(81156014)(81166006)(8676002)(7696005)(9686003)(55016002)(6506007)(66446008)(76116006)(64756008)(66946007); DIR:OUT; SFP:1102; SCL:1; SRVR:SN6PR11MB3375; H:SN6PR11MB2558.namprd11.prod.outlook.com; FPR:; SPF:None; LANG:en; PTR:InfoNoRecords; A:1; MX:1; x-ms-exchange-senderadcheck: 1 x-microsoft-antispam: BCL:0; x-microsoft-antispam-message-info: q5aAGcJ3z5iwbjqrQ8L4MY7iB9TQGS/hOOsfMpkxWyd6Fwxy5If1t1Sc6+fpS/ngU6jnOfShW9kt9oBKVdZd48x3m//abMQgGyUlmYIPAFdmGKKK50kvDsb1Ox7Lq1O+mhmG3343z48HtvRpK/sLxDvMIGCNJ/IzybsH9uwN9b2lMfnDv7Fi3Rz2Hstkyfkui8fk+sNKI1jbeGSZvOBFTOdHHguotqcvZy7Js44FmVpTMCJu9YptcqKTz4R9AjiLVnaqA/lGjNY2CBLpSXWEBbW81zgARmlsA4ppvQ/UgF142lvByI7IJJ50glaWGAtiRl/Q2djcpZWOQN2rXtf22Y3ZIGmsNBXGVOo10X61KN+dX3qzBhAaj+s/6oqZTbZtNw9MphH+aPPnLXdclLi5I60ubbAC+SkuVZ+drHbdOXeZm6edJ3iKf/XEmN7KUdJr Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-MS-Exchange-CrossTenant-Network-Message-Id: 9a4d204a-1250-4db9-a203-08d7981f45d6 X-MS-Exchange-CrossTenant-originalarrivaltime: 13 Jan 2020 11:53:57.3640 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 46c98d88-e344-4ed4-8496-4ed7712e255d X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-CrossTenant-userprincipalname: +hm5mWAXIGxGaSZWfYzedD/Jj4GkPV9MZo3dg7bc7w9HT2W34OyQX28v5yl53y2xfDJrCRxbHMYKAA3It+tij2KEgglsyxWQHl8zCBKHZRo= X-MS-Exchange-Transport-CrossTenantHeadersStamped: SN6PR11MB3375 X-OriginatorOrg: intel.com Subject: Re: [dpdk-dev] [PATCH v7 02/17] lib/ring: apis to support configurable element size X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" > > > > > > > > > > > > > + > > > > > > > > > > > > +static __rte_always_inline void > > > > > > > > > > > > +enqueue_elems_128(struct rte_ring *r, uint32_t > > > > > > > > > > > > +prod_head, const void *obj_table, uint32_t n) { > > > > > > > > > > > > +unsigned int i; const uint32_t size =3D > > > > > > > > > > > > +r->size; uint32_t idx =3D prod_head & r->mask; > > > > > > > > > > > > +r->__uint128_t > > > > > > > > > > > > +*ring =3D (__uint128_t *)&r[1]; const __uint128_t > > > > > > > > > > > > +*obj =3D (const __uint128_t *)obj_table; if > > > > > > > > > > > > +(likely(idx + n < > > > > > > > > > > > > +size)) { for (i =3D 0; i < (n & ~0x1); i +=3D 2, i= dx +=3D > > > > > > > > > > > > +2) { ring[idx] =3D obj[i]; ring[idx + 1] =3D obj[i= + > > > > > > > > > > > > +1]; > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > AFAIK, that implies 16B aligned obj_table... > > > > > > > > > > > Would it always be the case? > > > > > > > > > > I am not sure from the compiler perspective. > > > > > > > > > > At least on Arm architecture, unaligned access (address > > > > > > > > > > that is accessed is not aligned to the size of the data > > > > > > > > > > element being > > > > > > > > > > accessed) will result in faults or require additional c= ycles. > > > > > > > > > > So, aligning on > > > > > > > > 16B should be fine. > > > > > > > > > Further, I would be changing this to use 'rte_int128_t' a= s > > > > > > > > > '__uint128_t' is > > > > > > > > not defined on 32b systems. > > > > > > > > > > > > > > > > What I am trying to say: with this code we imply new > > > > > > > > requirement for elems > > > > > > > The only existing use case in DPDK for 16B is the event ring. > > > > > > > The event ring > > > > > > already does similar kind of copy (using 'struct rte_event'). > > > > > > > So, there is no change in expectations for event ring. > > > > > > > For future code, I think this expectation should be fine sinc= e > > > > > > > it allows for > > > > > > optimal code. > > > > > > > > > > > > > > > in the ring: when sizeof(elem)=3D=3D16 it's alignment also = has > > > > > > > > to be at least > > > > > > 16. > > > > > > > > Which from my perspective is not ideal. > > > > > > > Any reasoning? > > > > > > > > > > > > New implicit requirement and inconsistency. > > > > > > Code like that: > > > > > > > > > > > > struct ring_elem {uint64_t a, b;}; .... > > > > > > struct ring_elem elem; > > > > > > rte_ring_dequeue_elem(ring, &elem, sizeof(elem)); > > > > > > > > > > > > might cause a crash. > > > > > The alignment here is 8B. Assuming that instructions generated > > > > > will require 16B alignment, it will result in a crash, if > > > > > configured to generate > > > > exception. > > > > > But, these instructions are not atomic instructions. At least on > > > > > aarch64, unaligned access will not result in an exception for > > > > > non-atomic > > > > loads/stores. I believe it is the same behavior for x86 as well. > > > > > > > > On IA, there are 2 types of 16B load/store instructions: aligned an= d > > unaligned. > > > > Aligned are a bit faster, but will cause an exception if used on no= n > > > > 16B aligned address. > > > > As you using uint128_t * compiler will assume that both src and dst > > > > are 16B aligned and might generate code with aligned instructions. > > > Ok, looking at few articles, I read that if the address is aligned, > > > the unaligned instructions do not incur the penalty. Is this understa= nding > > correct? > > > > Yes, from my experience the difference is negligible. > > > > > > > > I see 2 solutions here: > > > 1) We can switch this copy to use uint32_t pointer. It would still > > > allow the compiler to generate (unaligned) instructions for up to 256= b > > > load/store. The 2 multiplications (to normalize the index and the siz= e of copy) > > can use shifts. This should make it safer. If one wants performance, th= ey can > > align the obj table to 16B (the ring itself is already aligned on the c= ache line > > boundary). > > > > Sounds good to me. > > > > > > > > 2) Considering that performance is paramount, we could document that > > > the obj table needs to be aligned on 16B boundary. This would affect = event > > dev (if we go ahead with replacing the event ring implementation) signi= ficantly. > > > > I don't think perf difference would be that significant to justify such= constraint. > > I am in favor of #1. > Ok, will go with this. > Is it ok if I squash the intermediate commits for test cases? I can keep = one commit for functional tests and another for performance tests. Yes, sounds like a good thing for me. Konstantin=20 >=20 > > > > > Note that we have to do the same thing for 64b elements as well. > > > > I don't mind to have one unified copy procedure, which would always use= 32bit > > elems, but AFAIK, on IA there is no such limitation for 64bit load/stor= es. > > > > > > > > > > > > > > > > > > > > > > While exactly the same code with: > > > > > > > > > > > > struct ring_elem {uint64_t a, b, c;}; OR struct ring_elem > > > > > > {uint64_t a, b, c, d;}; > > > > > > > > > > > > will work ok. > > > > > The alignment for these structures is still 8B. Are you saying > > > > > this will work because these will be copied using pointer to > > > > > uint32_t (whose > > > > alignment is 4B)? > > > > > > > > Yes, as we doing uint32_t copies, compiler can't assume the data > > > > will be 16B aligned and will use unaligned instructions. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Note that for elem sizes > 16 (24, 32), there is no such co= nstraint. > > > > > > > The rest of them need to be aligned on 4B boundary. However, > > > > > > > this should > > > > > > not affect the existing code. > > > > > > > The code for 8B and 16B is kept as is to ensure the > > > > > > > performance is not > > > > > > affected for the existing code. > > > > >