DPDK patches and discussions
* [DPDK/core Bug 1713] dispatcher_autotest hangs on build with disabled RTE_USE_C11_MEM_MODEL
@ 2025-05-27 13:02 bugzilla
From: bugzilla @ 2025-05-27 13:02 UTC (permalink / raw)
  To: dev


https://bugs.dpdk.org/show_bug.cgi?id=1713

            Bug ID: 1713
           Summary: dispatcher_autotest hangs on build with disabled
                    RTE_USE_C11_MEM_MODEL
           Product: DPDK
           Version: 25.03
          Hardware: ARM
                OS: All
            Status: UNCONFIRMED
          Severity: normal
          Priority: Normal
         Component: core
          Assignee: dev@dpdk.org
          Reporter: konstantin.v.ananyev@yandex.ru
  Target Milestone: ---

CPU: Ampere(R) Altra(R) Max

Steps to reproduce:
1) Build DPDK with RTE_USE_C11_MEM_MODEL undefined, i.e.:
--- a/config/arm/meson.build
+++ b/config/arm/meson.build
@@ -33,7 +33,7 @@ implementer_generic = {
     'description': 'Generic armv8',
     'flags': [
         ['RTE_MACHINE', '"armv8a"'],
-        ['RTE_USE_C11_MEM_MODEL', true],
+        ['RTE_USE_C11_MEM_MODEL', false],
         ['RTE_MAX_LCORE', 256],
         ['RTE_MAX_NUMA_NODES', 4]
     ],

$CFLAGS='-g -O1 -DRTE_SORING_DEBUG -DRTE_ENABLE_ASSERT' CC=gcc-13 meson
--prefix=${PWD}/aarch64-default-linuxapp-gcc13-rg-dbg-install --werror
-Dbuildtype=release -Dmachine=default -Ddisable_drivers=net/cnxk,net/xsc
-Dexamples=all aarch64-default-linuxapp-gcc13-rg-dbg

$grep C11_MEM aarch64-default-linuxapp-gcc13-rg-dbg/rte_build_config.h
#undef RTE_USE_C11_MEM_MODEL

$ninja -v -j 24 -C aarch64-default-linuxapp-gcc13-rg-dbg

2) Run dispatcher_autotest:
$DPDK_TEST=dispatcher_autotest
./aarch64-default-linuxapp-gcc13-rg-dbg/app/dpdk-test --no-huge -m 2048
--no-pci --lcores=0-3

The test hangs forever at the very first test case.

Looking with gdb, we get stuck at:
...
#11 test_basic () at ../app/test/test_dispatcher.c:848
#12 0x000000000050d638 in unit_test_suite_runner (
    suite=suite@entry=0x1a23370 <test_suite>) at ../app/test/test.c:364
#13 0x0000000000571fa8 in test_dispatcher ()
    at ../app/test/test_dispatcher.c:1053
...
i.e. we are waiting for all events to complete, but that never happens,
although there is a huge number of errors:
(gdb) print test_app->completed_events
$1 = 71402
(gdb) print test_app->errors
$2 = 767639

Debugging a bit further, the error counter is incremented at:
./app/test/test_dispatcher.c:377
test_app_process_queue(uint8_t p_event_dev_id, uint8_t p_event_port_id,
        struct rte_event *in_events, uint16_t num,
        void *cb_data)
{
    ...

    for (i = 0; i < num; i++) {
                const struct rte_event *in_event = &in_events[i];
                struct rte_event *out_event = &out_events[i];
                uint64_t sn = in_event->u64;
                uint64_t expected_sn;

                if (in_event->queue_id != app_queue->queue_id) {
                        test_app_queue_note_error(app);
                        return;
                }

                expected_sn = app_queue->sn[in_event->flow_id]++;

                if (expected_sn != sn) {
                        test_app_queue_note_error(app);
                        return;


As I read the code, this means that the dequeued events fail the sanity
check.
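
To make the failing check concrete, here is a minimal stand-alone sketch of a
per-flow sequence-number check of the kind shown in the excerpt above. It is
not the test's actual code: demo_queue_state, demo_check_event and
DEMO_NUM_FLOWS are hypothetical names, and the completed counter is only an
assumption about how successes are tallied.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NUM_FLOWS 4u /* hypothetical number of flows */

/* Hypothetical per-queue state, loosely modeled on the excerpt above. */
struct demo_queue_state {
	uint64_t sn[DEMO_NUM_FLOWS]; /* next expected sequence number per flow */
	uint64_t errors;             /* events that failed the check */
	uint64_t completed;          /* events that passed the check */
};

/*
 * Compare the sequence number carried by an event with the next one
 * expected for its flow. As in the excerpt, the expected counter is
 * advanced whether or not the check passes.
 */
static bool
demo_check_event(struct demo_queue_state *q, uint32_t flow_id, uint64_t sn)
{
	uint64_t expected_sn = q->sn[flow_id]++;

	if (expected_sn != sn) {
		q->errors++;
		return false;
	}
	q->completed++;
	return true;
}
```

An event whose payload is read before the producer has finished writing it
carries an arbitrary sequence number, so it fails this comparison and is never
counted as completed.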
Note that with RTE_USE_C11_MEM_MODEL undefined we use a different (legacy)
implementation of __rte_ring_headtail_move_head().
Also note that we use an MP/SC rte_ring for events.

In that implementation we do:

load(&d->head);
rmb();
load(&s->tail);
...
if (st)
     d->head = ...;
else
   rte_atomic32_cmpset(&d->head, ...);

While for the MT case cmpset generates a CASAL instruction (CAS with
acquire-release memory ordering), the ST case has no memory-ordering
primitive at all.
So, as I understand it, what happens here is this:
reads of the ring elements sometimes get reordered with the read of the
prod.tail value, i.e. we read elements that are not yet completely written
into the ring.
What is really strange is that we never hit this issue before...

Anyway, the proposed fix is to put an rmb() after load(&s->tail) for the ST case:
--- a/lib/ring/rte_ring_generic_pvt.h
+++ b/lib/ring/rte_ring_generic_pvt.h
@@ -106,6 +106,7 @@ __rte_ring_headtail_move_head(struct rte_ring_headtail *d,

                *new_head = *old_head + n;
                if (is_st) {
+                       rte_smp_rmb();
                        d->head = *new_head;
                        success = 1;
                } else

With it in place, dispatcher_autotest completes successfully on that machine.
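
For readers more used to C11 atomics, the ordering requirement can be sketched
with a small single-producer/single-consumer ring. This is only an
illustrative model, not the rte_ring code: demo_ring, demo_enqueue and
demo_dequeue are hypothetical names, and treating
atomic_thread_fence(memory_order_acquire) as the rough counterpart of
rte_smp_rmb() is an assumption made for the sketch.

```c
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_RING_SIZE 8u /* hypothetical size, must be a power of two */
#define DEMO_RING_MASK (DEMO_RING_SIZE - 1u)

/* Hypothetical ring; the layout is NOT that of struct rte_ring. */
struct demo_ring {
	_Atomic uint32_t prod_tail; /* published by the single producer */
	_Atomic uint32_t cons_tail; /* published by the single consumer */
	uint64_t slots[DEMO_RING_SIZE];
};

/* Single producer: fill the slot first, then publish it. */
static int
demo_enqueue(struct demo_ring *r, uint64_t v)
{
	uint32_t tail = atomic_load_explicit(&r->prod_tail, memory_order_relaxed);
	uint32_t ctail = atomic_load_explicit(&r->cons_tail, memory_order_acquire);

	if (tail - ctail == DEMO_RING_SIZE)
		return 0; /* ring is full */

	r->slots[tail & DEMO_RING_MASK] = v;
	/* Release: the slot write becomes visible before the new tail. */
	atomic_store_explicit(&r->prod_tail, tail + 1, memory_order_release);
	return 1;
}

/* Single consumer: read the producer tail, fence, then read the slot. */
static int
demo_dequeue(struct demo_ring *r, uint64_t *v)
{
	uint32_t head = atomic_load_explicit(&r->cons_tail, memory_order_relaxed);
	uint32_t ptail = atomic_load_explicit(&r->prod_tail, memory_order_relaxed);

	if (ptail == head)
		return 0; /* ring is empty */

	/*
	 * Without this fence the slot read below may be satisfied before
	 * the prod_tail read above, which is the reordering described in
	 * the report. Assumed here to play the role of rte_smp_rmb().
	 */
	atomic_thread_fence(memory_order_acquire);

	*v = r->slots[head & DEMO_RING_MASK];
	/* Release: the slot read finishes before the slot is handed back. */
	atomic_store_explicit(&r->cons_tail, head + 1, memory_order_release);
	return 1;
}
```

The relaxed load of prod_tail followed by an acquire fence pairs with the
producer's release store, so the consumer only reads slots whose contents are
already visible. A weakly ordered CPU is free to reorder the two loads if the
fence is dropped.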
