Date: Thu, 14 Apr 2022 13:28:34 -0700
From: Stephen Hemminger
To: Thomas Monjalon
Cc: anatoly.burakov@intel.com, stable@dpdk.org, dev@dpdk.org, david.marchand@redhat.com
Subject: Re: [PATCH] eal: fix data race in multi-process support
Message-ID: <20220414132834.5c073dad@hermes.local>
In-Reply-To: <9400637.ag9G3TJQzC@thomas>
References: <20211217181649.154972-1-stephen@networkplumber.org> <20211217182922.159503-1-stephen@networkplumber.org> <9400637.ag9G3TJQzC@thomas>

On Sun, 13 Feb 2022 12:39:59 +0100
Thomas Monjalon wrote:

> 17/12/2021 19:29, Stephen Hemminger:
> > If DPDK is built with thread sanitizer it reports a race
> > in setting of multiprocess file descriptor. The fix is to
> > use atomic operations when updating mp_fd.
>
> Please could explain more the condition of the race?
> Is it between init and cleanup of the same file descriptor?
> How atomic is helping here?
>
>
> >
> > Simple example:
> > $ dpdk-testpmd -l 1-3 --no-huge
> > ...
> > EAL: Error - exiting with code: 1
> >   Cause: Creation of mbuf pool for socket 0 failed: Cannot allocate memory
> > ==================
> > WARNING: ThreadSanitizer: data race (pid=83054)
> >   Write of size 4 at 0x55e3b7fce450 by main thread:
> >     #0 rte_mp_channel_cleanup (dpdk-testpmd+0x160d79c)
> >     #1 rte_eal_cleanup (dpdk-testpmd+0x1614fb5)
> >     #2 rte_exit (dpdk-testpmd+0x15ec97a)
> >     #3 mbuf_pool_create.cold (dpdk-testpmd+0x242e1a)
> >     #4 main (dpdk-testpmd+0x5ab05d)
> >
> >   Previous read of size 4 at 0x55e3b7fce450 by thread T2:
> >     #0 mp_handle (dpdk-testpmd+0x160c979)
> >     #1 ctrl_thread_init (dpdk-testpmd+0x15ff76e)
> >
> >   As if synchronized via sleep:
> >     #0 nanosleep ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:362 (libtsan.so.0+0x5cd8e)
> >     #1 get_tsc_freq (dpdk-testpmd+0x1622889)
> >     #2 set_tsc_freq (dpdk-testpmd+0x15ffb9c)
> >     #3 rte_eal_timer_init (dpdk-testpmd+0x1622a34)
> >     #4 rte_eal_init.cold (dpdk-testpmd+0x26b314)
> >     #5 main (dpdk-testpmd+0x5aab45)
> >
> >   Location is global 'mp_fd' of size 4 at 0x55e3b7fce450 (dpdk-testpmd+0x0000027c7450)
> >
> >   Thread T2 'rte_mp_handle' (tid=83057, running) created by main thread at:
> >     #0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:962 (libtsan.so.0+0x58ba2)
> >     #1 rte_ctrl_thread_create (dpdk-testpmd+0x15ff870)
> >     #2 rte_mp_channel_init.cold (dpdk-testpmd+0x269986)
> >     #3 rte_eal_init (dpdk-testpmd+0x1615b28)
> >     #4 main (dpdk-testpmd+0x5aab45)
> >
>

The issue is that two threads are sharing a global variable without
barriers or atomics. The variable mp_fd is set in control thread
rte_mp_channel_init/rte_mp_channel_cleanup but then read by the thread
that handles multiprocess (mp_handle). This sharing of global data
without a barrier or lock is unsafe/undefined, and can break on weakly
ordered CPUs like ARM.
Kind of surprised that we don't see the bug already, since the compiler
could decide that mp_fd in the function mp_handle() is invariant, not
test it, and have the thread run forever. This bug has been there since
the beginning of MP support in DPDK.
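
For illustration, here is a minimal sketch of the pattern using C11
atomics. This is not the patch itself: demo_fd and the demo_* functions
are made-up names standing in for mp_fd and its users, and the real code
may well use different primitives or memory orders. It only shows the
idea of an atomic store/exchange on the init/cleanup side and an atomic
load on every pass of the handler loop.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the shared descriptor (not the DPDK source). */
static atomic_int demo_fd = -1;

/* Init path: publish the descriptor for the handler thread. */
void demo_channel_init(int fd)
{
	atomic_store_explicit(&demo_fd, fd, memory_order_release);
}

/*
 * Cleanup path: atomically take the descriptor and mark the channel
 * closed. Returns the previous value so the caller can close() it once.
 */
int demo_channel_cleanup(void)
{
	return atomic_exchange_explicit(&demo_fd, -1, memory_order_acq_rel);
}

/*
 * Handler path: reload the descriptor on every iteration. The atomic
 * load marks the value as possibly changed by another thread, so it is
 * not treated as loop-invariant the way a plain int read could be.
 * Returns how many iterations ran before the channel was seen closed,
 * capped at max_iters (the cap exists only to keep this sketch bounded).
 */
int demo_handle_loop(int max_iters)
{
	int n = 0;

	while (n < max_iters &&
	       atomic_load_explicit(&demo_fd, memory_order_acquire) >= 0)
		n++;
	return n;
}
```

In a real program demo_handle_loop() would run in its own thread while
another thread calls demo_channel_init() and demo_channel_cleanup(); the
functions are kept single-threaded here so the sketch can be checked
without any threading library.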