Date: Fri, 17 Jan 2020 17:34:17 +0100
From: Olivier Matz
To: Honnappa Nagarahalli
Cc: sthemmin@microsoft.com, jerinj@marvell.com, bruce.richardson@intel.com,
 david.marchand@redhat.com, pbhagavatula@marvell.com,
 konstantin.ananyev@intel.com, yipeng1.wang@intel.com, dev@dpdk.org,
 dharmik.thakkar@arm.com, ruifeng.wang@arm.com, gavin.hu@arm.com, nd@arm.com
Message-ID: <20200117163417.GY22738@platinum>
References: <20190906190510.11146-1-honnappa.nagarahalli@arm.com>
 <20200116052511.8557-1-honnappa.nagarahalli@arm.com>
 <20200116052511.8557-3-honnappa.nagarahalli@arm.com>
In-Reply-To: <20200116052511.8557-3-honnappa.nagarahalli@arm.com>
Subject: Re: [dpdk-dev] [PATCH v9 2/6] lib/ring: apis to support configurable element size

Hi Honnappa,

On Wed, Jan 15, 2020 at 11:25:07PM -0600, Honnappa Nagarahalli wrote:
> Current APIs assume ring elements to be pointers. However, in many
> use cases, the size can be different. Add new APIs to support
> configurable ring element sizes.
>
> Signed-off-by: Honnappa Nagarahalli
> Reviewed-by: Dharmik Thakkar
> Reviewed-by: Gavin Hu
> Reviewed-by: Ruifeng Wang
> ---
>  lib/librte_ring/Makefile             |    3 +-
>  lib/librte_ring/meson.build          |    4 +
>  lib/librte_ring/rte_ring.c           |   41 +-
>  lib/librte_ring/rte_ring.h           |    1 +
>  lib/librte_ring/rte_ring_elem.h      | 1003 ++++++++++++++++++++++++++
>  lib/librte_ring/rte_ring_version.map |    2 +
>  6 files changed, 1045 insertions(+), 9 deletions(-)
>  create mode 100644 lib/librte_ring/rte_ring_elem.h

[...]
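First, to check I understand the intent: with this patch, a ring of
fixed-size elements would be used roughly like below (my own sketch from
reading rte_ring_elem.h; the element type is hypothetical, and esize has
to be a multiple of 4 bytes):

#include <rte_ring_elem.h>

/* hypothetical 16B element type, only for illustration */
struct my_elem {
	uint64_t a;
	uint64_t b;
};

static int
example(void)
{
	struct rte_ring *r;
	struct my_elem burst[32];
	unsigned int n;

	/* the element size is given at create time and to each call */
	r = rte_ring_create_elem("elem_ring", sizeof(struct my_elem),
				 1024, SOCKET_ID_ANY, 0);
	if (r == NULL)
		return -1;

	n = rte_ring_enqueue_bulk_elem(r, burst, sizeof(struct my_elem),
				       32, NULL);
	n = rte_ring_dequeue_bulk_elem(r, burst, sizeof(struct my_elem),
				       32, NULL);

	return n == 32 ? 0 : -1;
}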
> +static __rte_always_inline void
> +enqueue_elems_32(struct rte_ring *r, const uint32_t size, uint32_t idx,
> +		const void *obj_table, uint32_t n)
> +{
> +	unsigned int i;
> +	uint32_t *ring = (uint32_t *)&r[1];
> +	const uint32_t *obj = (const uint32_t *)obj_table;
> +	if (likely(idx + n < size)) {
> +		for (i = 0; i < (n & ~0x7); i += 8, idx += 8) {
> +			ring[idx] = obj[i];
> +			ring[idx + 1] = obj[i + 1];
> +			ring[idx + 2] = obj[i + 2];
> +			ring[idx + 3] = obj[i + 3];
> +			ring[idx + 4] = obj[i + 4];
> +			ring[idx + 5] = obj[i + 5];
> +			ring[idx + 6] = obj[i + 6];
> +			ring[idx + 7] = obj[i + 7];
> +		}
> +		switch (n & 0x7) {
> +		case 7:
> +			ring[idx++] = obj[i++]; /* fallthrough */
> +		case 6:
> +			ring[idx++] = obj[i++]; /* fallthrough */
> +		case 5:
> +			ring[idx++] = obj[i++]; /* fallthrough */
> +		case 4:
> +			ring[idx++] = obj[i++]; /* fallthrough */
> +		case 3:
> +			ring[idx++] = obj[i++]; /* fallthrough */
> +		case 2:
> +			ring[idx++] = obj[i++]; /* fallthrough */
> +		case 1:
> +			ring[idx++] = obj[i++]; /* fallthrough */
> +		}
> +	} else {
> +		for (i = 0; idx < size; i++, idx++)
> +			ring[idx] = obj[i];
> +		/* Start at the beginning */
> +		for (idx = 0; i < n; i++, idx++)
> +			ring[idx] = obj[i];
> +	}
> +}
> +
> +static __rte_always_inline void
> +enqueue_elems_64(struct rte_ring *r, uint32_t prod_head,
> +		const void *obj_table, uint32_t n)
> +{
> +	unsigned int i;
> +	const uint32_t size = r->size;
> +	uint32_t idx = prod_head & r->mask;
> +	uint64_t *ring = (uint64_t *)&r[1];
> +	const uint64_t *obj = (const uint64_t *)obj_table;
> +	if (likely(idx + n < size)) {
> +		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {
> +			ring[idx] = obj[i];
> +			ring[idx + 1] = obj[i + 1];
> +			ring[idx + 2] = obj[i + 2];
> +			ring[idx + 3] = obj[i + 3];
> +		}
> +		switch (n & 0x3) {
> +		case 3:
> +			ring[idx++] = obj[i++]; /* fallthrough */
> +		case 2:
> +			ring[idx++] = obj[i++]; /* fallthrough */
> +		case 1:
> +			ring[idx++] = obj[i++];
> +		}
> +	} else {
> +		for (i = 0; idx < size; i++, idx++)
> +			ring[idx] = obj[i];
> +		/* Start at the beginning */
> +		for (idx = 0; i < n; i++, idx++)
> +			ring[idx] = obj[i];
> +	}
> +}
> +
> +static __rte_always_inline void
> +enqueue_elems_128(struct rte_ring *r, uint32_t prod_head,
> +		const void *obj_table, uint32_t n)
> +{
> +	unsigned int i;
> +	const uint32_t size = r->size;
> +	uint32_t idx = prod_head & r->mask;
> +	rte_int128_t *ring = (rte_int128_t *)&r[1];
> +	const rte_int128_t *obj = (const rte_int128_t *)obj_table;
> +	if (likely(idx + n < size)) {
> +		for (i = 0; i < (n & ~0x1); i += 2, idx += 2)
> +			memcpy((void *)(ring + idx),
> +				(const void *)(obj + i), 32);
> +		switch (n & 0x1) {
> +		case 1:
> +			memcpy((void *)(ring + idx),
> +				(const void *)(obj + i), 16);
> +		}
> +	} else {
> +		for (i = 0; idx < size; i++, idx++)
> +			memcpy((void *)(ring + idx),
> +				(const void *)(obj + i), 16);
> +		/* Start at the beginning */
> +		for (idx = 0; i < n; i++, idx++)
> +			memcpy((void *)(ring + idx),
> +				(const void *)(obj + i), 16);
> +	}
> +}
> +
> +/* the actual enqueue of elements on the ring.
> + * Placed here since identical code needed in both
> + * single and multi producer enqueue functions.
> + */
> +static __rte_always_inline void
> +enqueue_elems(struct rte_ring *r, uint32_t prod_head, const void *obj_table,
> +		uint32_t esize, uint32_t num)
> +{
> +	/* 8B and 16B copies implemented individually to retain
> +	 * the current performance.
> +	 */
> +	if (esize == 8)
> +		enqueue_elems_64(r, prod_head, obj_table, num);
> +	else if (esize == 16)
> +		enqueue_elems_128(r, prod_head, obj_table, num);
> +	else {
> +		uint32_t idx, scale, nr_idx, nr_num, nr_size;
> +
> +		/* Normalize to uint32_t */
> +		scale = esize / sizeof(uint32_t);
> +		nr_num = num * scale;
> +		idx = prod_head & r->mask;
> +		nr_idx = idx * scale;
> +		nr_size = r->size * scale;
> +		enqueue_elems_32(r, nr_size, nr_idx, obj_table, nr_num);
> +	}
> +}

Following Konstantin's comment on v7, enqueue_elems_128() was modified
to ensure it won't crash if the object is unaligned. Are we sure that
this same problem cannot also occur with 64b copies on all supported
architectures? (I mean a 64b access that is only aligned on 32b)

Out of curiosity, would it make a big perf difference to only use
enqueue_elems_32()?
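If there is any doubt, one option could be to rely on memcpy() for the
64b path too, something like this (an untested sketch with a
hypothetical name, not benchmarked):

/* sketch only: assumes <string.h> and the ring headers are included.
 * memcpy() makes no alignment assumption on obj_table, so the compiler
 * has to emit accesses that are safe for 32b-aligned input. */
static __rte_always_inline void
enqueue_elems_64_memcpy(struct rte_ring *r, uint32_t prod_head,
		const void *obj_table, uint32_t n)
{
	const uint32_t size = r->size;
	uint32_t idx = prod_head & r->mask;
	uint64_t *ring = (uint64_t *)&r[1];
	const char *obj = (const char *)obj_table;

	if (likely(idx + n < size)) {
		memcpy(ring + idx, obj, n * sizeof(uint64_t));
	} else {
		uint32_t first = size - idx;

		memcpy(ring + idx, obj, first * sizeof(uint64_t));
		/* wrap around to the beginning of the ring */
		memcpy(ring, obj + first * sizeof(uint64_t),
			(n - first) * sizeof(uint64_t));
	}
}

No idea how this compares performance-wise, though.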