From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 8CB3945B82; Wed, 30 Oct 2024 21:31:27 +0100 (CET) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 65B9C432CA; Wed, 30 Oct 2024 21:31:27 +0100 (CET) Received: from mail-wm1-f51.google.com (mail-wm1-f51.google.com [209.85.128.51]) by mails.dpdk.org (Postfix) with ESMTP id 42AA442F2B for ; Wed, 30 Oct 2024 21:31:26 +0100 (CET) Received: by mail-wm1-f51.google.com with SMTP id 5b1f17b1804b1-4315baec69eso1814075e9.2 for ; Wed, 30 Oct 2024 13:31:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1730320285; x=1730925085; darn=dpdk.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=om457Hri1mvE/M6oqhYiBOPN9uhOyXZSwW0EDRWWIeA=; b=m65dJdq7tU8c4E3m2VRAwF4V//0jNkrxJgia3QgJ/WR28FTjskajsIR8eoc7/UeVs8 CuaNx1gtlx6kmU+a0hC/NAtG0/H+pdQxsn83gn23v+V5AN2L4tSPFhIYygiEmDa0pUU7 Yfqu4YZuO7lhPsMJh75HY+Q0jMQfZcOoIoqgHvfjHe6SXYoEBdHqElvZJMydhqgCeE3I ukWs95QHx8NbDogYERaHs43CzP+9MKfsAhP9iNI0MuwM93DlVuVqxafJepU3OtJhxONU GqscqtWh30BmH+lDAbdqopbapxdMBpO7G5LcIPRrsnZNzRhBroJyXeHDFq510BqAmK40 xf3g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1730320285; x=1730925085; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=om457Hri1mvE/M6oqhYiBOPN9uhOyXZSwW0EDRWWIeA=; b=U18fIXWkSdKNOoaRXDQggFq8YOFrFFRqZdz43y3k6KgUjL0EjEgzZBNZmb/vGx8LWl OZhIu1u4KVtFApbLMNALziktdPHPlKq1MKJYYHpjA9PdYHsJ4wvyGzM6Gb7pvoHNPQJk vYYwCs4bGF4e5oV9BObLB+WW9ELfHgMSPWbttrAO+Q0HVNDUoh7THqImR8S+EVa4KkKk zoT1QULxKgsH/3HUSSIyzFFBXkHDsMvL9ORAILyHM7+Kl7n0hsD0ySIETGD79yqpqLzs i/3OclmV1xDV8+8FrEfQUbiUgJEkqxkSt9Y/Yp8aVLjwKbp6qtXXh2ArZTC/7YKK+IYW CwFQ== X-Gm-Message-State: AOJu0Yy4GYVklnGiY03PJeNPRFaqThZW/RiXobnNtaYbV+MLrWBnzwO/ hdObsr1YYIM0BJRSM+QCWp60OzoMChpf13T49358st/HXG7sqJmrQC225SHv X-Google-Smtp-Source: AGHT+IF8A6v66T58uNDnnZoSkWDxS3daTL7IjQUsD7nqztgiUN+v2vOUU+Y0W89PZp8H/QF1tgn0PQ== X-Received: by 2002:a05:600c:4711:b0:431:586e:7e7 with SMTP id 5b1f17b1804b1-4319ac6e818mr157781245e9.1.1730320285378; Wed, 30 Oct 2024 13:31:25 -0700 (PDT) Received: from localhost ([2a01:4b00:d036:ae00:3528:8196:8470:920]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-381c10b7c38sm43845f8f.24.2024.10.30.13.31.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 30 Oct 2024 13:31:24 -0700 (PDT) From: luca.boccassi@gmail.com To: dev@dpdk.org Cc: david.marchand@redhat.com, roretzla@linux.microsoft.com Subject: [PATCH v2] Revert "eal/unix: fix thread creation" Date: Wed, 30 Oct 2024 20:30:59 +0000 Message-ID: <20241030203122.416198-1-luca.boccassi@gmail.com> X-Mailer: git-send-email 2.45.2 In-Reply-To: <20241030190945.409721-1-luca.boccassi@gmail.com> References: <20241030190945.409721-1-luca.boccassi@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org From: Luca Boccassi This commit introduced a regression on arm64, causing a deadlock. lcores_autotest gets stuck and never terminates: [ 1077s] EAL: Detected CPU lcores: 4 [ 1077s] EAL: Detected NUMA nodes: 1 [ 1077s] EAL: Detected shared linkage of DPDK [ 1077s] EAL: Multi-process socket /tmp/dpdk/rte/mp_socket [ 1077s] EAL: Selected IOVA mode 'VA' [ 1077s] APP: HPET is not enabled, using TSC as default timer [ 1077s] RTE>>lcores_autotest [ 1127s] DPDK:fast-tests / lcores_autotest time out (After 50.0 seconds) This is 100% reproducible when running the fast tests suite after a package build on OBS. Reverting it reliably fixes the issue. This reverts commit b28c6196b132d1f25cb8c1bf781520fc41556b3a. Signed-off-by: Luca Boccassi --- v2: add forgotten signed-off-by I have bisected this long standing issue and identified the commit that introduced it. If anybody can provide a different fix that would be better, but if it's not possible to find another solution, it would be good to revert it until it can be found, to resolve the regression. lib/eal/unix/rte_thread.c | 73 +++++++++++++++------------------------ 1 file changed, 28 insertions(+), 45 deletions(-) diff --git a/lib/eal/unix/rte_thread.c b/lib/eal/unix/rte_thread.c index 1b4c73f58e..9a39ba3bb3 100644 --- a/lib/eal/unix/rte_thread.c +++ b/lib/eal/unix/rte_thread.c @@ -5,7 +5,6 @@ #include #include -#include #include #include @@ -19,14 +18,9 @@ struct eal_tls_key { pthread_key_t thread_index; }; -struct thread_start_context { +struct thread_routine_ctx { rte_thread_func thread_func; - void *thread_args; - const rte_thread_attr_t *thread_attr; - pthread_mutex_t wrapper_mutex; - pthread_cond_t wrapper_cond; - int wrapper_ret; - bool wrapper_done; + void *routine_args; }; static int @@ -89,29 +83,13 @@ thread_map_os_priority_to_eal_priority(int policy, int os_pri, } static void * -thread_start_wrapper(void *arg) +thread_func_wrapper(void *arg) { - struct thread_start_context *ctx = (struct thread_start_context *)arg; - rte_thread_func thread_func = ctx->thread_func; - void *thread_args = ctx->thread_args; - int ret = 0; + struct thread_routine_ctx ctx = *(struct thread_routine_ctx *)arg; - if (ctx->thread_attr != NULL && CPU_COUNT(&ctx->thread_attr->cpuset) > 0) { - ret = rte_thread_set_affinity_by_id(rte_thread_self(), &ctx->thread_attr->cpuset); - if (ret != 0) - EAL_LOG(DEBUG, "rte_thread_set_affinity_by_id failed"); - } + free(arg); - pthread_mutex_lock(&ctx->wrapper_mutex); - ctx->wrapper_ret = ret; - ctx->wrapper_done = true; - pthread_cond_signal(&ctx->wrapper_cond); - pthread_mutex_unlock(&ctx->wrapper_mutex); - - if (ret != 0) - return NULL; - - return (void *)(uintptr_t)thread_func(thread_args); + return (void *)(uintptr_t)ctx.thread_func(ctx.routine_args); } int @@ -122,18 +100,20 @@ rte_thread_create(rte_thread_t *thread_id, int ret = 0; pthread_attr_t attr; pthread_attr_t *attrp = NULL; + struct thread_routine_ctx *ctx; struct sched_param param = { .sched_priority = 0, }; int policy = SCHED_OTHER; - struct thread_start_context ctx = { - .thread_func = thread_func, - .thread_args = args, - .thread_attr = thread_attr, - .wrapper_done = false, - .wrapper_mutex = PTHREAD_MUTEX_INITIALIZER, - .wrapper_cond = PTHREAD_COND_INITIALIZER, - }; + + ctx = calloc(1, sizeof(*ctx)); + if (ctx == NULL) { + RTE_LOG(DEBUG, EAL, "Insufficient memory for thread context allocations\n"); + ret = ENOMEM; + goto cleanup; + } + ctx->routine_args = args; + ctx->thread_func = thread_func; if (thread_attr != NULL) { ret = pthread_attr_init(&attr); @@ -155,6 +135,7 @@ rte_thread_create(rte_thread_t *thread_id, goto cleanup; } + if (thread_attr->priority == RTE_THREAD_PRIORITY_REALTIME_CRITICAL) { ret = ENOTSUP; @@ -179,22 +160,24 @@ rte_thread_create(rte_thread_t *thread_id, } ret = pthread_create((pthread_t *)&thread_id->opaque_id, attrp, - thread_start_wrapper, &ctx); + thread_func_wrapper, ctx); if (ret != 0) { EAL_LOG(DEBUG, "pthread_create failed"); goto cleanup; } - pthread_mutex_lock(&ctx.wrapper_mutex); - while (!ctx.wrapper_done) - pthread_cond_wait(&ctx.wrapper_cond, &ctx.wrapper_mutex); - ret = ctx.wrapper_ret; - pthread_mutex_unlock(&ctx.wrapper_mutex); - - if (ret != 0) - rte_thread_join(*thread_id, NULL); + if (thread_attr != NULL && CPU_COUNT(&thread_attr->cpuset) > 0) { + ret = rte_thread_set_affinity_by_id(*thread_id, + &thread_attr->cpuset); + if (ret != 0) { + EAL_LOG(DEBUG, "rte_thread_set_affinity_by_id failed"); + goto cleanup; + } + } + ctx = NULL; cleanup: + free(ctx); if (attrp != NULL) pthread_attr_destroy(&attr); -- 2.45.2