DPDK patches and discussions
From: bugzilla@dpdk.org
To: dev@dpdk.org
Subject: [DPDK/core Bug 1713] dispatcher_autotest hangs on build with disabled RTE_USE_C11_MEM_MODEL
Date: Tue, 27 May 2025 13:02:00 +0000
Message-ID: <bug-1713-3@http.bugs.dpdk.org/>

https://bugs.dpdk.org/show_bug.cgi?id=1713

            Bug ID: 1713
           Summary: dispatcher_autotest hangs on build with disabled
                    RTE_USE_C11_MEM_MODEL
           Product: DPDK
           Version: 25.03
          Hardware: ARM
                OS: All
            Status: UNCONFIRMED
          Severity: normal
          Priority: Normal
         Component: core
          Assignee: dev@dpdk.org
          Reporter: konstantin.v.ananyev@yandex.ru
  Target Milestone: ---

CPU: Ampere(R) Altra(R) Max

Steps to reproduce:
1) Build DPDK with RTE_USE_C11_MEM_MODEL undefined, i.e.:
--- a/config/arm/meson.build
+++ b/config/arm/meson.build
@@ -33,7 +33,7 @@ implementer_generic = {
     'description': 'Generic armv8',
     'flags': [
         ['RTE_MACHINE', '"armv8a"'],
-        ['RTE_USE_C11_MEM_MODEL', true],
+        ['RTE_USE_C11_MEM_MODEL', false],
         ['RTE_MAX_LCORE', 256],
         ['RTE_MAX_NUMA_NODES', 4]
     ],

$ CFLAGS='-g -O1 -DRTE_SORING_DEBUG -DRTE_ENABLE_ASSERT' CC=gcc-13 meson
--prefix=${PWD}/aarch64-default-linuxapp-gcc13-rg-dbg-install --werror
-Dbuildtype=release -Dmachine=default -Ddisable_drivers=net/cnxk,net/xsc
-Dexamples=all aarch64-default-linuxapp-gcc13-rg-dbg

$ grep C11_MEM aarch64-default-linuxapp-gcc13-rg-dbg/rte_build_config.h
#undef RTE_USE_C11_MEM_MODEL

$ ninja -v -j 24 -C aarch64-default-linuxapp-gcc13-rg-dbg

2) Run dispatcher_autotest:
$ DPDK_TEST=dispatcher_autotest
./aarch64-default-linuxapp-gcc13-rg-dbg/app/dpdk-test --no-huge -m 2048
--no-pci --lcores=0-3

The test hangs forever in the very first test case.

Looking with gdb, we are stuck at:
...
#11 test_basic () at ../app/test/test_dispatcher.c:848
#12 0x000000000050d638 in unit_test_suite_runner (
    suite=suite@entry=0x1a23370 <test_suite>) at ../app/test/test.c:364
#13 0x0000000000571fa8 in test_dispatcher ()
    at ../app/test/test_dispatcher.c:1053
...
i.e. we are waiting for all events to complete, which never happens,
and there is a huge number of errors:
(gdb) print test_app->completed_events
$1 = 71402
(gdb) print test_app->errors
$2 = 767639

Debugging a bit more, the error counter is incremented at:
./app/test/test_dispatcher.c:377
test_app_process_queue(uint8_t p_event_dev_id, uint8_t p_event_port_id,
        struct rte_event *in_events, uint16_t num,
        void *cb_data)
{
    ...

    for (i = 0; i < num; i++) {
                const struct rte_event *in_event = &in_events[i];
                struct rte_event *out_event = &out_events[i];
                uint64_t sn = in_event->u64;
                uint64_t expected_sn;

                if (in_event->queue_id != app_queue->queue_id) {
                        test_app_queue_note_error(app);
                        return;
                }

                expected_sn = app_queue->sn[in_event->flow_id]++;

                if (expected_sn != sn) {
                        test_app_queue_note_error(app);
                        return;


As far as I can read the code, this means that dequeued events fail the
sanity check.
Note that with RTE_USE_C11_MEM_MODEL undefined we use a different (legacy)
implementation of __rte_ring_headtail_move_head().
Also note that an MP/SC rte_ring is used for events.

In that implementation we do:

load(&d->head);
rmb();
load(&s->tail);
...
if (st)
    d->head = ...;
else
    rte_atomic32_cmpset(&d->head, ...);

While for the MT case cmpset generates a CASAL instruction (a CAS with
acquire-release memory ordering), for the ST case there is no memory-ordering
primitive after the load of s->tail.
So, as I understand it, this is what happens:
reads of ring elements sometimes get reordered with the read of the prod.tail
value, i.e. we read elements that are not yet completely written into the
ring.
What is really strange is that we never hit this issue before...

Anyway, the proposed fix is to put an rmb() after load(&s->tail) in the ST case:
--- a/lib/ring/rte_ring_generic_pvt.h
+++ b/lib/ring/rte_ring_generic_pvt.h
@@ -106,6 +106,7 @@ __rte_ring_headtail_move_head(struct rte_ring_headtail *d,

                *new_head = *old_head + n;
                if (is_st) {
+                       rte_smp_rmb();
                        d->head = *new_head;
                        success = 1;
                } else

With it in place, dispatcher_autotest completes successfully on that machine.
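
To illustrate the ordering requirement behind this fix, here is a minimal
sketch in portable C11 atomics. It is not DPDK code: struct toy_ring,
toy_enqueue() and toy_dequeue_sc() are hypothetical names, and the ring is
simplified to one producer and one consumer (the real case is MP/SC). The
acquire fence in the consumer is assumed to play a role comparable to
rte_smp_rmb() in the patch above: it keeps the slot read from being reordered
before the load of the producer tail.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define TOY_RING_SIZE 8u
static_assert((TOY_RING_SIZE & (TOY_RING_SIZE - 1u)) == 0,
              "ring size must be a power of two");

/* Hypothetical toy ring, for illustration only (not the rte_ring layout). */
struct toy_ring {
    _Atomic uint32_t prod_tail;   /* published by the producer */
    _Atomic uint32_t cons_tail;   /* published by the consumer */
    uint32_t cons_head;           /* private to the single consumer */
    uint64_t slots[TOY_RING_SIZE];
};

/* Single producer: fill the slot first, then publish it via prod_tail. */
static int toy_enqueue(struct toy_ring *r, uint64_t v)
{
    uint32_t tail = atomic_load_explicit(&r->prod_tail, memory_order_relaxed);
    uint32_t ctail = atomic_load_explicit(&r->cons_tail, memory_order_acquire);

    if (tail - ctail == TOY_RING_SIZE)
        return 0;                 /* ring is full */

    r->slots[tail & (TOY_RING_SIZE - 1u)] = v;
    /* Release: the slot write must be visible before the new tail is. */
    atomic_store_explicit(&r->prod_tail, tail + 1u, memory_order_release);
    return 1;
}

/* Single consumer: plain head update, so ordering needs an explicit fence. */
static int toy_dequeue_sc(struct toy_ring *r, uint64_t *v)
{
    uint32_t head = r->cons_head;
    uint32_t ptail = atomic_load_explicit(&r->prod_tail, memory_order_relaxed);

    if (ptail == head)
        return 0;                 /* ring is empty */

    /*
     * Without this fence the slot read below may be satisfied before the
     * prod_tail load, i.e. we could read an element that is not yet
     * completely written.
     */
    atomic_thread_fence(memory_order_acquire);

    r->cons_head = head + 1u;     /* ST case: no CAS, just a plain store */
    *v = r->slots[head & (TOY_RING_SIZE - 1u)];

    /* Release: the slot read is done before the slot is handed back. */
    atomic_store_explicit(&r->cons_tail, head + 1u, memory_order_release);
    return 1;
}
```

A single-threaded run only checks the FIFO behaviour of the sketch. The
reordering itself would need two threads on a weakly ordered CPU to show up,
as in the dispatcher_autotest run described above.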

