From: bugzilla@dpdk.org
To: dev@dpdk.org
Subject: [DPDK/core Bug 1713] dispatcher_autotest hangs on build with disabled RTE_USE_C11_MEM_MODEL
Date: Tue, 27 May 2025 13:02:00 +0000

https://bugs.dpdk.org/show_bug.cgi?id=1713

          Bug ID: 1713
         Summary: dispatcher_autotest hangs on build with disabled
                  RTE_USE_C11_MEM_MODEL
         Product: DPDK
         Version: 25.03
        Hardware: ARM
              OS: All
          Status: UNCONFIRMED
        Severity: normal
        Priority: Normal
       Component: core
        Assignee: dev@dpdk.org
        Reporter: konstantin.v.ananyev@yandex.ru
Target Milestone: ---

CPU: Ampere(R) Altra(R) Max

Steps to reproduce:
1) Build DPDK with RTE_USE_C11_MEM_MODEL undefined, i.e.:
--- a/config/arm/meson.build
+++ b/config/arm/meson.build
@@ -33,7 +33,7 @@ implementer_generic = {
     'description': 'Generic armv8',
     'flags': [
         ['RTE_MACHINE', '"armv8a"'],
-        ['RTE_USE_C11_MEM_MODEL', true],
+        ['RTE_USE_C11_MEM_MODEL', false],
         ['RTE_MAX_LCORE', 256],
         ['RTE_MAX_NUMA_NODES', 4]
     ],

$ CFLAGS='-g -O1 -DRTE_SORING_DEBUG -DRTE_ENABLE_ASSERT' CC=gcc-13 meson \
    --prefix=${PWD}/aarch64-default-linuxapp-gcc13-rg-dbg-install --werror \
    -Dbuildtype=release -Dmachine=default -Ddisable_drivers=net/cnxk,net/xsc \
    -Dexamples=all aarch64-default-linuxapp-gcc13-rg-dbg

$ grep C11_MEM aarch64-default-linuxapp-gcc13-rg-dbg/rte_build_config.h
#undef RTE_USE_C11_MEM_MODEL

$ ninja -v -j 24 -C aarch64-default-linuxapp-gcc13-rg-dbg

2) Run dispatcher_autotest:
$ DPDK_TEST=dispatcher_autotest \
    ./aarch64-default-linuxapp-gcc13-rg-dbg/app/dpdk-test --no-huge -m 2048 \
    --no-pci --lcores=0-3

The test hangs forever in the very first test case.

Looking with gdb, we are stuck at:
...
#11 test_basic () at ../app/test/test_dispatcher.c:848
#12 0x000000000050d638 in unit_test_suite_runner (
    suite=suite@entry=0x1a23370 <test_suite>) at ../app/test/test.c:364
#13 0x0000000000571fa8 in test_dispatcher ()
    at ../app/test/test_dispatcher.c:1053
...
i.e. we are waiting for all events to complete, but that never happens,
while the error counter is already far larger than the number of completed
events:
(gdb) print test_app->completed_events
$1 = 71402
(gdb) print test_app->errors
$2 = 767639

Debugging a bit further, the error counter is incremented in:
./app/test/test_dispatcher.c:377
test_app_process_queue(uint8_t p_event_dev_id, uint8_t p_event_port_id,
        struct rte_event *in_events, uint16_t num,
        void *cb_data)
{
    ...

    for (i = 0; i < num; i++) {
        const struct rte_event *in_event = &in_events[i];
        struct rte_event *out_event = &out_events[i];
        uint64_t sn = in_event->u64;
        uint64_t expected_sn;

        if (in_event->queue_id != app_queue->queue_id) {
            test_app_queue_note_error(app);
            return;
        }

        expected_sn = app_queue->sn[in_event->flow_id]++;

        if (expected_sn != sn) {
            test_app_queue_note_error(app);
            return;


As I read the code, this means that dequeued events fail the sanity check:
either the queue id or the per-flow sequence number is not what the test
expects.
Note that with RTE_USE_C11_MEM_MODEL undefined we use a different (legacy)
implementation of __rte_ring_headtail_move_head().
Also note that we use an MP/SC rte_ring for events.
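
To make the sanity check quoted above easier to reason about, here is a
minimal standalone sketch of a per-flow sequence-number check. It is not the
test's code: struct flow_checker, flow_check() and NUM_FLOWS are hypothetical
names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_FLOWS 4u /* hypothetical flow count, for illustration only */

/* Hypothetical stand-in for the per-queue state kept by the test. */
struct flow_checker {
	uint64_t next_sn[NUM_FLOWS]; /* next expected sequence number per flow */
	uint64_t errors;             /* number of events that failed the check */
};

/*
 * Return true if 'sn' is the next expected sequence number for 'flow_id'.
 * As in the quoted loop, the expected counter advances on every call,
 * whether or not the check passes.
 */
static bool
flow_check(struct flow_checker *c, uint8_t flow_id, uint64_t sn)
{
	uint64_t expected;

	if (flow_id >= NUM_FLOWS) {
		c->errors++;
		return false;
	}

	expected = c->next_sn[flow_id]++;
	if (expected != sn) {
		c->errors++;
		return false;
	}
	return true;
}
```

An event whose payload is read before the producer has finished writing it
would carry a stale or garbage sequence number and fail this kind of check.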

In that implementation we do:

load(&d->head);
rmb();
load(&s->tail);
...
if (st)
    d->head = ...;
else
    rte_atomic32_cmpset(&d->head, ...);

For the MT case, cmpset generates a CASAL instruction, i.e. a CAS with
acquire-release memory ordering. For the ST case there is no memory-ordering
primitive at all after load(&s->tail): the only rmb() sits between the two
loads, so nothing orders the tail load against the loads that follow it.
So, as I understand it, what happens here is:
reads of the ring elements sometimes get reordered ahead of the read of the
prod.tail value, i.e. we read elements that are not yet completely written
into the ring.
What is really strange is that we never hit this issue before...
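
The ordering requirement described above can be illustrated with a small
C11 sketch. This is a toy single-producer/single-consumer ring, not the
rte_ring layout or API: struct toy_ring, toy_enqueue(), toy_dequeue() and
RING_SIZE are all hypothetical. The point is only that the consumer's load
of the producer tail needs acquire semantics (or an equivalent read barrier
after it) before any slot is read.

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8u /* hypothetical; must be a power of two */
#define RING_MASK (RING_SIZE - 1u)

/* Hypothetical toy ring, NOT the DPDK rte_ring layout. */
struct toy_ring {
	_Atomic uint32_t prod_tail; /* published by the single producer */
	_Atomic uint32_t cons_head; /* advanced by the single consumer */
	uint64_t slots[RING_SIZE];
};

/* Producer: fill the slot first, then publish it with a release store. */
static int
toy_enqueue(struct toy_ring *r, uint64_t v)
{
	uint32_t tail = atomic_load_explicit(&r->prod_tail,
			memory_order_relaxed);
	uint32_t head = atomic_load_explicit(&r->cons_head,
			memory_order_acquire);

	if (tail - head == RING_SIZE)
		return 0; /* full */

	r->slots[tail & RING_MASK] = v;
	atomic_store_explicit(&r->prod_tail, tail + 1, memory_order_release);
	return 1;
}

/*
 * Consumer: the acquire load of prod_tail keeps the slot read below from
 * being hoisted above it. With a plain (relaxed) load and no barrier, a
 * weakly ordered CPU may read the slot before the producer's write to it
 * is visible, which is the reordering suspected in this report.
 */
static int
toy_dequeue(struct toy_ring *r, uint64_t *v)
{
	uint32_t head = atomic_load_explicit(&r->cons_head,
			memory_order_relaxed);
	uint32_t tail = atomic_load_explicit(&r->prod_tail,
			memory_order_acquire);

	if (tail == head)
		return 0; /* empty */

	*v = r->slots[head & RING_MASK];
	/* Release so the producer reuses the slot only after it was read. */
	atomic_store_explicit(&r->cons_head, head + 1, memory_order_release);
	return 1;
}
```

In the legacy (non-C11) code path the same effect would have to come from an
explicit read barrier placed after the tail load.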

Anyway, the proposed fix is to put an rmb() after load(&s->tail) for the
ST case:
--- a/lib/ring/rte_ring_generic_pvt.h
+++ b/lib/ring/rte_ring_generic_pvt.h
@@ -106,6 +106,7 @@ __rte_ring_headtail_move_head(struct rte_ring_headtail *d,

                *new_head = *old_head + n;
                if (is_st) {
+                       rte_smp_rmb();
                        d->head = *new_head;
                        success = 1;
                } else

With it in place, dispatcher_autotest completes successfully on that machine.
          


You are receiving this mail because:
  • You are the assignee for the bug.