Date: Thu, 30 Jun 2022 13:24:31 +0100
From: Bruce Richardson
To: zhichaox.zeng@intel.com
Cc: dev@dpdk.org, stable@dpdk.org, qiming.yang@intel.com,
 david.marchand@redhat.com, stephen@networkplumber.org,
 mb@smartsharesystems.com, Harman Kalra
Subject: Re: [PATCH v4] lib/eal: fix segfaults due to thread exit order
References:
 <20220530134738.488602-1-zhichaox.zeng@intel.com>
 <20220615060154.6905-1-zhichaox.zeng@intel.com>
In-Reply-To: <20220615060154.6905-1-zhichaox.zeng@intel.com>
List-Id: DPDK patches and discussions

On Wed, Jun 15, 2022 at 02:01:54PM +0800, zhichaox.zeng@intel.com wrote:
> From: Zhichao Zeng
>
> The eal-intr-thread is not closed before memory cleanup in the
> process of exiting. There is a small probability that when the
> eal-intr-thread is about to use some pointers, the memory were
> just cleaned, which cause the segment fault error caught by ASan.
>
> This patch close the eal-intr-thread before memory cleanup when
> exiting to avoid segment fault. And add some atomic operations
> to avoid executing rte_eal_cleanup in the child process spawned
> by fork() in some test cases, e.g. debug_autotest of dpdk-test.
>
> Cc: stable@dpdk.org
>

Hi, some comments inline below.

/Bruce

> ---
> v2:
> add the same API for FreeBSD
> ---
> v3:
> fix rte_eal_cleanup crash in debug_autotest
> ---
> v4:
> shorten the prompt message and optimize the commit log
>

Please put these updates below the cutline after the sign-offs, i.e.
immediately before the diffstat.

> Suggested-by: David Marchand
> Signed-off-by: Zhichao Zeng
> ---
>  lib/eal/common/eal_private.h     |  7 +++++++
>  lib/eal/freebsd/eal.c            | 21 ++++++++++++++++++++-
>  lib/eal/freebsd/eal_interrupts.c | 12 ++++++++++++
>  lib/eal/linux/eal.c              | 20 +++++++++++++++++++-
>  lib/eal/linux/eal_interrupts.c   | 12 ++++++++++++
>  5 files changed, 70 insertions(+), 2 deletions(-)
>
> diff --git a/lib/eal/common/eal_private.h b/lib/eal/common/eal_private.h
> index 44d14241f0..7adf41b7d7 100644
> --- a/lib/eal/common/eal_private.h
> +++ b/lib/eal/common/eal_private.h
> @@ -152,6 +152,13 @@ int rte_eal_tailqs_init(void);
>   */
>  int rte_eal_intr_init(void);
>
> +/**
> + * Destroy interrupt handling thread.
> + *
> + * This function is private to EAL.
> + */
> +void rte_eal_intr_destroy(void);
> +
>  /**
>   * Close the default log stream
>   *
> diff --git a/lib/eal/freebsd/eal.c b/lib/eal/freebsd/eal.c
> index a6b20960f2..4882f27abd 100644
> --- a/lib/eal/freebsd/eal.c
> +++ b/lib/eal/freebsd/eal.c
> @@ -72,6 +72,8 @@ struct lcore_config lcore_config[RTE_MAX_LCORE];
>  /* used by rte_rdtsc() */
>  int rte_cycles_vmware_tsc_map;
>
> +/* used to judge the running status of the eal */
> +static uint32_t run_once;
>

I don't like just moving this variable from the eal_init function. When in
eal_init the name "run_once" made sense as it tracked how often the EAL
init function was run. However, now as a global variable the name
"run_once" no longer makes sense.

Two suggestions:
1. Keep run_once in EAL init as-is, and use a different variable or value
   to indicate that DPDK is initialized for cleanup.
2. Move the variable as you have here, just rename it to a more meaningful
   name.

>  int
>  eal_clean_runtime_dir(void)
> @@ -574,12 +576,22 @@ static void rte_eal_init_alert(const char *msg)
>  	RTE_LOG(ERR, EAL, "%s\n", msg);
>  }
>
> +static void warn_parent(void)
> +{
> +	RTE_LOG(WARNING, EAL, "DPDK won't work in the child process\n");
> +}

I wonder if this contains enough information.
Can we identify briefly what parts will or won't work, or if we just want
to deny everything, can we give a brief reason why?

> +
> +static void scratch_child(void)
> +{
> +	/* Scratch run_once so that a call to rte_eal_cleanup won't crash... */
> +	__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
> +}
> +

I think the name of this function needs improvement. I'm not sure that
"scratch" is the best term to use. Something like "clear_eal_flag" is
probably better.

>  /* Launch threads, called at application init(). */
>  int
>  rte_eal_init(int argc, char **argv)
>  {
>  	int i, fctret, ret;
> -	static uint32_t run_once;
>  	uint32_t has_run = 0;
>  	char cpuset[RTE_CPU_AFFINITY_STR_LEN];
>  	char thread_name[RTE_MAX_THREAD_NAME_LEN];
> @@ -883,6 +895,8 @@ rte_eal_init(int argc, char **argv)
>
>  	eal_mcfg_complete();
>
> +	pthread_atfork(NULL, warn_parent, scratch_child);
> +
>  	return fctret;
>  }
>
> @@ -891,8 +905,13 @@ rte_eal_cleanup(void)
>  {
>  	struct internal_config *internal_conf =
>  		eal_get_internal_configuration();
> +
> +	if (__atomic_load_n(&run_once, __ATOMIC_RELAXED) == 0)
> +		return 0;
> +
>  	rte_service_finalize();
>  	rte_mp_channel_cleanup();
> +	rte_eal_intr_destroy();
>  	/* after this point, any DPDK pointers will become dangling */
>  	rte_eal_memory_detach();
>  	rte_eal_alarm_cleanup();