From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 40D9C45CA6; Thu, 7 Nov 2024 18:34:41 +0100 (CET) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id ABCCC43297; Thu, 7 Nov 2024 18:34:36 +0100 (CET) Received: from frasgout.his.huawei.com (frasgout.his.huawei.com [185.176.79.56]) by mails.dpdk.org (Postfix) with ESMTP id 535CF4327C for ; Thu, 7 Nov 2024 18:34:35 +0100 (CET) Received: from mail.maildlp.com (unknown [172.18.186.31]) by frasgout.his.huawei.com (SkyGuard) with ESMTP id 4Xkq0Q1WZVz6K9Cb; Fri, 8 Nov 2024 01:32:54 +0800 (CST) Received: from frapeml500007.china.huawei.com (unknown [7.182.85.172]) by mail.maildlp.com (Postfix) with ESMTPS id EE736140453; Fri, 8 Nov 2024 01:34:34 +0800 (CST) Received: from localhost.localdomain (10.220.239.45) by frapeml500007.china.huawei.com (7.182.85.172) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.1.2507.39; Thu, 7 Nov 2024 18:34:34 +0100 From: Konstantin Ananyev To: CC: , , , , , , , , Subject: [PATCH v8 3/7] ring: make copying functions generic Date: Thu, 7 Nov 2024 13:24:25 -0500 Message-ID: <20241107182429.60406-4-konstantin.ananyev@huawei.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20241107182429.60406-1-konstantin.ananyev@huawei.com> References: <20241030212304.104180-1-konstantin.ananyev@huawei.com> <20241107182429.60406-1-konstantin.ananyev@huawei.com> MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 8bit X-Originating-IP: [10.220.239.45] X-ClientProxiedBy: frapeml500006.china.huawei.com (7.182.85.219) To frapeml500007.china.huawei.com (7.182.85.172) X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Note upfront: that change doesn't introduce any functional or performance changes. It is just a code-reordering for: - improve code modularity and re-usability - ability in future to re-use the same code to introduce new functionality There is no real need for enqueue_elems()/dequeue_elems() to get pointer to actual rte_ring structure, instead it is enough to pass a pointer to actual elements buffer inside the ring. In return, we'll get a copying functions that could be used for other queueing abstractions that do have circular ring buffer inside. Signed-off-by: Konstantin Ananyev Acked-by: Morten Brørup Acked-by: Stephen Hemminger --- lib/ring/rte_ring_elem_pvt.h | 115 ++++++++++++++++++++--------------- 1 file changed, 67 insertions(+), 48 deletions(-) diff --git a/lib/ring/rte_ring_elem_pvt.h b/lib/ring/rte_ring_elem_pvt.h index 3a83668a08..6eafae121f 100644 --- a/lib/ring/rte_ring_elem_pvt.h +++ b/lib/ring/rte_ring_elem_pvt.h @@ -17,12 +17,14 @@ #endif static __rte_always_inline void -__rte_ring_enqueue_elems_32(struct rte_ring *r, const uint32_t size, - uint32_t idx, const void *obj_table, uint32_t n) +__rte_ring_enqueue_elems_32(void *ring_table, const void *obj_table, + uint32_t size, uint32_t idx, uint32_t n) { unsigned int i; - uint32_t *ring = (uint32_t *)&r[1]; + + uint32_t *ring = (uint32_t *)ring_table; const uint32_t *obj = (const uint32_t *)obj_table; + if (likely(idx + n <= size)) { for (i = 0; i < (n & ~0x7); i += 8, idx += 8) { ring[idx] = obj[i]; @@ -60,14 +62,14 @@ __rte_ring_enqueue_elems_32(struct rte_ring *r, const uint32_t size, } static __rte_always_inline void -__rte_ring_enqueue_elems_64(struct rte_ring *r, uint32_t prod_head, - const void *obj_table, uint32_t n) +__rte_ring_enqueue_elems_64(void *ring_table, const void *obj_table, + uint32_t size, uint32_t idx, uint32_t n) { unsigned int i; - const uint32_t size = r->size; - uint32_t idx = prod_head & r->mask; - uint64_t *ring = (uint64_t *)&r[1]; + + uint64_t *ring = (uint64_t *)ring_table; const unaligned_uint64_t *obj = (const unaligned_uint64_t *)obj_table; + if (likely(idx + n <= size)) { for (i = 0; i < (n & ~0x3); i += 4, idx += 4) { ring[idx] = obj[i]; @@ -93,14 +95,14 @@ __rte_ring_enqueue_elems_64(struct rte_ring *r, uint32_t prod_head, } static __rte_always_inline void -__rte_ring_enqueue_elems_128(struct rte_ring *r, uint32_t prod_head, - const void *obj_table, uint32_t n) +__rte_ring_enqueue_elems_128(void *ring_table, const void *obj_table, + uint32_t size, uint32_t idx, uint32_t n) { unsigned int i; - const uint32_t size = r->size; - uint32_t idx = prod_head & r->mask; - rte_int128_t *ring = (rte_int128_t *)&r[1]; + + rte_int128_t *ring = (rte_int128_t *)ring_table; const rte_int128_t *obj = (const rte_int128_t *)obj_table; + if (likely(idx + n <= size)) { for (i = 0; i < (n & ~0x1); i += 2, idx += 2) memcpy((void *)(ring + idx), @@ -126,37 +128,47 @@ __rte_ring_enqueue_elems_128(struct rte_ring *r, uint32_t prod_head, * single and multi producer enqueue functions. */ static __rte_always_inline void -__rte_ring_enqueue_elems(struct rte_ring *r, uint32_t prod_head, - const void *obj_table, uint32_t esize, uint32_t num) +__rte_ring_do_enqueue_elems(void *ring_table, const void *obj_table, + uint32_t size, uint32_t idx, uint32_t esize, uint32_t num) { /* 8B and 16B copies implemented individually to retain * the current performance. */ if (esize == 8) - __rte_ring_enqueue_elems_64(r, prod_head, obj_table, num); + __rte_ring_enqueue_elems_64(ring_table, obj_table, size, + idx, num); else if (esize == 16) - __rte_ring_enqueue_elems_128(r, prod_head, obj_table, num); + __rte_ring_enqueue_elems_128(ring_table, obj_table, size, + idx, num); else { - uint32_t idx, scale, nr_idx, nr_num, nr_size; + uint32_t scale, nr_idx, nr_num, nr_size; /* Normalize to uint32_t */ scale = esize / sizeof(uint32_t); nr_num = num * scale; - idx = prod_head & r->mask; nr_idx = idx * scale; - nr_size = r->size * scale; - __rte_ring_enqueue_elems_32(r, nr_size, nr_idx, - obj_table, nr_num); + nr_size = size * scale; + __rte_ring_enqueue_elems_32(ring_table, obj_table, nr_size, + nr_idx, nr_num); } } static __rte_always_inline void -__rte_ring_dequeue_elems_32(struct rte_ring *r, const uint32_t size, - uint32_t idx, void *obj_table, uint32_t n) +__rte_ring_enqueue_elems(struct rte_ring *r, uint32_t prod_head, + const void *obj_table, uint32_t esize, uint32_t num) +{ + __rte_ring_do_enqueue_elems(&r[1], obj_table, r->size, + prod_head & r->mask, esize, num); +} + +static __rte_always_inline void +__rte_ring_dequeue_elems_32(void *obj_table, const void *ring_table, + uint32_t size, uint32_t idx, uint32_t n) { unsigned int i; - uint32_t *ring = (uint32_t *)&r[1]; uint32_t *obj = (uint32_t *)obj_table; + const uint32_t *ring = (const uint32_t *)ring_table; + if (likely(idx + n <= size)) { for (i = 0; i < (n & ~0x7); i += 8, idx += 8) { obj[i] = ring[idx]; @@ -194,14 +206,13 @@ __rte_ring_dequeue_elems_32(struct rte_ring *r, const uint32_t size, } static __rte_always_inline void -__rte_ring_dequeue_elems_64(struct rte_ring *r, uint32_t cons_head, - void *obj_table, uint32_t n) +__rte_ring_dequeue_elems_64(void *obj_table, const void *ring_table, + uint32_t size, uint32_t idx, uint32_t n) { unsigned int i; - const uint32_t size = r->size; - uint32_t idx = cons_head & r->mask; - uint64_t *ring = (uint64_t *)&r[1]; unaligned_uint64_t *obj = (unaligned_uint64_t *)obj_table; + const uint64_t *ring = (const uint64_t *)ring_table; + if (likely(idx + n <= size)) { for (i = 0; i < (n & ~0x3); i += 4, idx += 4) { obj[i] = ring[idx]; @@ -227,27 +238,26 @@ __rte_ring_dequeue_elems_64(struct rte_ring *r, uint32_t cons_head, } static __rte_always_inline void -__rte_ring_dequeue_elems_128(struct rte_ring *r, uint32_t cons_head, - void *obj_table, uint32_t n) +__rte_ring_dequeue_elems_128(void *obj_table, const void *ring_table, + uint32_t size, uint32_t idx, uint32_t n) { unsigned int i; - const uint32_t size = r->size; - uint32_t idx = cons_head & r->mask; - rte_int128_t *ring = (rte_int128_t *)&r[1]; rte_int128_t *obj = (rte_int128_t *)obj_table; + const rte_int128_t *ring = (const rte_int128_t *)ring_table; + if (likely(idx + n <= size)) { for (i = 0; i < (n & ~0x1); i += 2, idx += 2) - memcpy((void *)(obj + i), (void *)(ring + idx), 32); + memcpy((obj + i), (const void *)(ring + idx), 32); switch (n & 0x1) { case 1: - memcpy((void *)(obj + i), (void *)(ring + idx), 16); + memcpy((obj + i), (const void *)(ring + idx), 16); } } else { for (i = 0; idx < size; i++, idx++) - memcpy((void *)(obj + i), (void *)(ring + idx), 16); + memcpy((obj + i), (const void *)(ring + idx), 16); /* Start at the beginning */ for (idx = 0; i < n; i++, idx++) - memcpy((void *)(obj + i), (void *)(ring + idx), 16); + memcpy((obj + i), (const void *)(ring + idx), 16); } } @@ -256,30 +266,39 @@ __rte_ring_dequeue_elems_128(struct rte_ring *r, uint32_t cons_head, * single and multi producer enqueue functions. */ static __rte_always_inline void -__rte_ring_dequeue_elems(struct rte_ring *r, uint32_t cons_head, - void *obj_table, uint32_t esize, uint32_t num) +__rte_ring_do_dequeue_elems(void *obj_table, const void *ring_table, + uint32_t size, uint32_t idx, uint32_t esize, uint32_t num) { /* 8B and 16B copies implemented individually to retain * the current performance. */ if (esize == 8) - __rte_ring_dequeue_elems_64(r, cons_head, obj_table, num); + __rte_ring_dequeue_elems_64(obj_table, ring_table, size, + idx, num); else if (esize == 16) - __rte_ring_dequeue_elems_128(r, cons_head, obj_table, num); + __rte_ring_dequeue_elems_128(obj_table, ring_table, size, + idx, num); else { - uint32_t idx, scale, nr_idx, nr_num, nr_size; + uint32_t scale, nr_idx, nr_num, nr_size; /* Normalize to uint32_t */ scale = esize / sizeof(uint32_t); nr_num = num * scale; - idx = cons_head & r->mask; nr_idx = idx * scale; - nr_size = r->size * scale; - __rte_ring_dequeue_elems_32(r, nr_size, nr_idx, - obj_table, nr_num); + nr_size = size * scale; + __rte_ring_dequeue_elems_32(obj_table, ring_table, nr_size, + nr_idx, nr_num); } } +static __rte_always_inline void +__rte_ring_dequeue_elems(struct rte_ring *r, uint32_t cons_head, + void *obj_table, uint32_t esize, uint32_t num) +{ + __rte_ring_do_dequeue_elems(obj_table, &r[1], r->size, + cons_head & r->mask, esize, num); +} + /* Between load and load. there might be cpu reorder in weak model * (powerpc/arm). * There are 2 choices for the users -- 2.35.3