Date: Tue, 15 Oct 2024 08:59:55 -0700
From: Stephen Hemminger
To: Konstantin Ananyev
Cc: dev@dpdk.org, honnappa.nagarahalli@arm.com, jerinj@marvell.com,
 hemant.agrawal@nxp.com, bruce.richardson@intel.com, drc@linux.vnet.ibm.com,
 ruifeng.wang@arm.com, mb@smartsharesystems.com, eimear.morrissey@huawei.com,
 Konstantin Ananyev
Subject: Re: [PATCH v5 0/6] Stage-Ordered API and other extensions for ring library
Message-ID: <20241015085955.16540ecb@hermes.local>
In-Reply-To: <20241015130111.826-1-konstantin.v.ananyev@yandex.ru>
References: <20240917120946.1212-1-konstantin.v.ananyev@yandex.ru>
 <20241015130111.826-1-konstantin.v.ananyev@yandex.ru>
List-Id: DPDK patches and discussions

On Tue, 15 Oct 2024 14:01:05 +0100
Konstantin Ananyev wrote:

> From: Konstantin Ananyev
>
> NOTE UPFRONT: this version is still not ready for merging.
> Missing items:
> - ARM/PPC tests passing
> - PG update
>
> v4 -> v5
> - fix public API/doc comments from Jerin
> - update devtools/build-dict.sh (Stephen)
> - fix MSVC warnings
> - introduce new test-suite for meson (stress) with
>   ring_stress_autotest and soring_stress_autotest in it
> - enhance error report in tests
> - reorder some sync code in soring and add extra checks
>   (for better debuggability)
>
> v3 -> v4:
> - fix compilation/doxygen complaints (attempt #2)
> - updated release notes
>
> v2 -> v3:
> - fix compilation/doxygen complaints
> - dropped patch:
>   "examples/l3fwd: make ACL work in pipeline and eventdev modes": [2]
>   As was mentioned in the patch description it was way too big,
>   controversial and incomplete. If the community is ok to introduce
>   pipeline model into the l3fwd, then it is probably worth
>   a separate patch series.
>
> v1 -> v2:
> - rename 'elmst/objst' to 'meta' (Morten)
> - introduce new data-path APIs set: one with both meta{} and objs[],
>   second with just objs[] (Morten)
> - split data-path APIs into burst/bulk flavours (same as rte_ring)
> - added dump function for rte_soring and improved dump() for rte_ring.
> - dropped patch:
>   "ring: minimize reads of the counterpart cache-line"
>   - no performance gain observed
>   - actually it does change behavior of conventional rte_ring
>     enqueue/dequeue APIs -
>     it could return available/free less than actually exist in the ring.
>     As in some other libs we rely on that information - it will
>     introduce problems.
>
> The main aim of this series is to extend the ring library with
> new API that allows user to create/use Staged-Ordered-Ring (SORING)
> abstraction. In addition to that there are a few other patches that serve
> different purposes:
> - first two patches are just code reordering to de-duplicate
>   and generalize existing rte_ring code.
> - patch #3 extends rte_ring_dump() to correctly print head/tail metadata
>   for different sync modes.
> - next two patches introduce SORING API into the ring library and
>   provide UT for it.
>
> SORING overview
> ===============
> Staged-Ordered-Ring (SORING) provides a SW abstraction for 'ordered' queues
> with multiple processing 'stages'. It is based on conventional DPDK
> rte_ring, re-uses many of its concepts, and even a substantial part of
> its code.
> It can be viewed as an 'extension' of rte_ring functionality.
> In particular, main SORING properties:
> - circular ring buffer with fixed size objects
> - producer, consumer plus multiple processing stages in between.
> - allows object processing to be split into multiple stages.
> - objects remain in the same ring while moving from one stage to the other,
>   initial order is preserved, no extra copying needed.
> - preserves the ingress order of objects within the queue across multiple
>   stages
> - each stage (and producer/consumer) can be served by single and/or
>   multiple threads.
>
> - number of stages, size and number of objects in the ring are
>   configurable at ring initialization time.
>
> Data-path API provides four main operations:
> - enqueue/dequeue works in the same manner as for conventional rte_ring,
>   all rte_ring synchronization types are supported.
> - acquire/release - for each stage there is an acquire (start) and
>   release (finish) operation. After some objects are 'acquired' -
>   given thread can safely assume that it has exclusive ownership of
>   these objects until it invokes 'release' for them.
>   After 'release', objects can be 'acquired' by next stage and/or dequeued
>   by the consumer (in case of last stage).
>
> Expected use-case: applications that use a pipeline model
> (probably with multiple stages) for packet processing, when preserving
> incoming packet order is important.
>
> The concept of ‘ring with stages’ is similar to DPDK OPDL eventdev PMD [1],
> but the internals are different.
> In particular, SORING maintains internal array of 'states' for each element
> in the ring that is shared by all threads/processes that access the ring.
> That allows 'release' to avoid excessive waits on the tail value and helps
> to improve performance and scalability.
> In terms of performance, in our measurements rte_soring and
> conventional rte_ring provide nearly identical numbers.
> As an example, on our SUT: Intel ICX CPU @ 2.00GHz,
> l3fwd (--lookup=acl) in pipeline mode [2] both
> rte_ring and rte_soring reach ~20Mpps for single I/O lcore and same
> number of worker lcores.
>
> [1] https://www.dpdk.org/wp-content/uploads/sites/35/2018/06/DPDK-China2017-Ma-OPDL.pdf
> [2] https://patchwork.dpdk.org/project/dpdk/patch/20240906131348.804-7-konstantin.v.ananyev@yandex.ru/
>
> Eimear Morrissey (1):
>   ring: make dump function more verbose
>
> Konstantin Ananyev (5):
>   ring: common functions for 'move head' ops
>   ring: make copying functions generic
>   ring/soring: introduce Staged Ordered Ring
>   app/test: add unit tests for soring API
>   test: add stress test suite
>
>  .mailmap                               |   1 +
>  app/test/meson.build                   |   3 +
>  app/test/suites/meson.build            |  10 +
>  app/test/test.h                        |   1 +
>  app/test/test_ring_stress.c            |   2 +-
>  app/test/test_ring_stress_impl.h       |   1 +
>  app/test/test_soring.c                 | 442 +++++++++++++
>  app/test/test_soring_mt_stress.c       |  40 ++
>  app/test/test_soring_stress.c          |  48 ++
>  app/test/test_soring_stress.h          |  35 ++
>  app/test/test_soring_stress_impl.h     | 834 +++++++++++++++++++++++++
>  devtools/build-dict.sh                 |   1 +
>  doc/api/doxy-api-index.md              |   1 +
>  doc/guides/rel_notes/release_24_11.rst |   8 +
>  lib/ring/meson.build                   |   4 +-
>  lib/ring/rte_ring.c                    |  87 ++-
>  lib/ring/rte_ring.h                    |  15 +
>  lib/ring/rte_ring_c11_pvt.h            | 134 +---
>  lib/ring/rte_ring_elem_pvt.h           | 181 ++++--
>  lib/ring/rte_ring_generic_pvt.h        | 121 +---
>  lib/ring/rte_ring_hts_elem_pvt.h       |  85 +--
>  lib/ring/rte_ring_rts_elem_pvt.h       |  85 +--
>  lib/ring/rte_soring.c                  | 182 ++++++
>  lib/ring/rte_soring.h                  | 555 ++++++++++++++++
>  lib/ring/soring.c                      | 561 +++++++++++++++++
>  lib/ring/soring.h                      | 124 ++++
>  lib/ring/version.map                   |  26 +
>  27 files changed, 3190 insertions(+), 397 deletions(-)
>  create mode 100644 app/test/test_soring.c
>  create mode 100644 app/test/test_soring_mt_stress.c
>  create mode 100644 app/test/test_soring_stress.c
>  create mode 100644 app/test/test_soring_stress.h
>  create mode 100644 app/test/test_soring_stress_impl.h
>  create mode 100644 lib/ring/rte_soring.c
>  create mode 100644 lib/ring/rte_soring.h
>  create mode 100644 lib/ring/soring.c
>  create mode 100644 lib/ring/soring.h
>

And some build failures

####################################################################################
#### [Begin job log] "ubuntu-22.04-gcc-mini" at step Build and test
####################################################################################
../lib/eal/include/rte_bitops.h:1481:9: note: in expansion of macro ‘__RTE_BIT_OVERLOAD_SZ_4R’
 1481 |         __RTE_BIT_OVERLOAD_SZ_4R(family, fun, qualifier, 64, ret_type, arg1_type, arg1_name, \
      |         ^~~~~~~~~~~~~~~~~~~~~~~~
../lib/eal/include/rte_bitops.h:1497:1: note: in expansion of macro ‘__RTE_BIT_OVERLOAD_4R’
 1497 | __RTE_BIT_OVERLOAD_4R(atomic_, test_and_assign,, bool, unsigned int, nr, bool, value,
      | ^~~~~~~~~~~~~~~~~~~~~
../lib/eal/include/rte_bitops.h:1463:1: note: previous declaration ‘bool rte_bit_atomic_test_and_assign(uint32_t*, unsigned int, bool, int)’
 1463 | rte_bit_ ## family ## fun(qualifier uint ## size ## _t *addr, arg1_type arg1_name, \
      | ^~~~~~~~
../lib/eal/include/rte_bitops.h:1472:9: note: in expansion of macro ‘__RTE_BIT_OVERLOAD_V_4R’
 1472 |         __RTE_BIT_OVERLOAD_V_4R(family,, fun, qualifier, size, ret_type, arg1_type, arg1_name, \
      |         ^~~~~~~~~~~~~~~~~~~~~~~
../lib/eal/include/rte_bitops.h:1479:9: note: in expansion of macro ‘__RTE_BIT_OVERLOAD_SZ_4R’
 1479 |         __RTE_BIT_OVERLOAD_SZ_4R(family, fun, qualifier, 32, ret_type, arg1_type, arg1_name, \
      |         ^~~~~~~~~~~~~~~~~~~~~~~~
../lib/eal/include/rte_bitops.h:1497:1: note: in expansion of macro ‘__RTE_BIT_OVERLOAD_4R’
 1497 | __RTE_BIT_OVERLOAD_4R(atomic_, test_and_assign,, bool, unsigned int, nr, bool, value,
      | ^~~~~~~~~~~~~~~~~~~~~
[847/912] Compiling C++ object 'buildtools/chkincs/fe389a9@@chkincs-cpp@exe/meson-generated_rte_mbuf_dyn.cpp.o'.
[848/912] Compiling C++ object 'buildtools/chkincs/fe389a9@@chkincs-cpp@exe/meson-generated_rte_mempool.cpp.o'.
[849/912] Compiling C++ object 'buildtools/chkincs/fe389a9@@chkincs-cpp@exe/meson-generated_rte_mempool_trace_fp.cpp.o'.
[850/912] Compiling C++ object 'buildtools/chkincs/fe389a9@@chkincs-cpp@exe/meson-generated_rte_mbuf.cpp.o'.
[851/912] Compiling C object 'app/a172ced@@dpdk-test@exe/test_test_memcpy_perf.c.o'.
ninja: build stopped: subcommand failed.
##[error]Process completed with exit code 1.
####################################################################################
#### [End job log] "ubuntu-22.04-gcc-mini" at step Build and test
####################################################################################