From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 705EAA00C4;
	Thu, 30 Jun 2022 14:24:40 +0200 (CEST)
Received: from [217.70.189.124] (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id 12D7940150;
	Thu, 30 Jun 2022 14:24:40 +0200 (CEST)
Received: from mga09.intel.com (mga09.intel.com [134.134.136.24])
 by mails.dpdk.org (Postfix) with ESMTP id 4F21B400EF;
 Thu, 30 Jun 2022 14:24:38 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
 d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
 t=1656591878; x=1688127878;
 h=date:from:to:cc:subject:message-id:references:
 mime-version:in-reply-to;
 bh=eB7Wrg+moP1Rpwx91icJ7xTYE6o70I2BaUK1A2zTWws=;
 b=XsaGjtdPp2EEl70XUc5JAxSNAShRehzdcYyW6DzQE5HutouLx/CT1nrA
 q1fmoAYnQJ8hwzJzVREi8ZKGlVOh/B7zwSQpnhv95tmMlA4t0AmdCBgRK
 OcqWneVz9gb/dv8vNfIaM7o+SsLexgvzIhzPWSp/BGZyJu5WgjMjsQQtk
 +RnNbbA+Rf8ExRcCBRKowm3n5CujXrC+DdcM/Yadsd8f+vohSeG83/PQt
 YcUs8h6+WkfVncSXZG158lfXzkqo4FdHl3WEamZpr+SUS5CW3jg640XKE
 lYIkgDEM7ihplxliNoFojmR6vRhnkvFWb54irpw0empERhFMh47eGaSWz g==;
X-IronPort-AV: E=McAfee;i="6400,9594,10393"; a="283066317"
X-IronPort-AV: E=Sophos;i="5.92,234,1650956400"; d="scan'208";a="283066317"
Received: from fmsmga004.fm.intel.com ([10.253.24.48])
 by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 30 Jun 2022 05:24:37 -0700
X-IronPort-AV: E=Sophos;i="5.92,234,1650956400"; d="scan'208";a="658978444"
Received: from bricha3-mobl.ger.corp.intel.com ([10.55.133.37])
 by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-SHA;
 30 Jun 2022 05:24:35 -0700
Date: Thu, 30 Jun 2022 13:24:31 +0100
From: Bruce Richardson <bruce.richardson@intel.com>
To: zhichaox.zeng@intel.com
Cc: dev@dpdk.org, stable@dpdk.org, qiming.yang@intel.com,
 david.marchand@redhat.com, stephen@networkplumber.org,
 mb@smartsharesystems.com, Harman Kalra <hkalra@marvell.com>
Subject: Re: [PATCH v4] lib/eal: fix segfaults due to thread exit order
Message-ID: <Yr2V/yvgbWB4l6xW@bricha3-MOBL.ger.corp.intel.com>
References: <20220530134738.488602-1-zhichaox.zeng@intel.com>
 <20220615060154.6905-1-zhichaox.zeng@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20220615060154.6905-1-zhichaox.zeng@intel.com>
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org

On Wed, Jun 15, 2022 at 02:01:54PM +0800, zhichaox.zeng@intel.com wrote:
> From: Zhichao Zeng <zhichaox.zeng@intel.com>
> 
> The eal-intr-thread is not closed before memory cleanup during
> exit. There is a small probability that the eal-intr-thread uses
> some pointers just after the memory has been cleaned up, which
> causes the segmentation fault caught by ASan.
> 
> This patch closes the eal-intr-thread before memory cleanup on
> exit to avoid the segmentation fault. It also adds some atomic
> operations to avoid executing rte_eal_cleanup in the child process
> spawned by fork() in some test cases, e.g. debug_autotest of dpdk-test.
> 
> Cc: stable@dpdk.org
> 

Hi,

Some comments inline below.

/Bruce

> ---
> v2:
> add the same API for FreeBSD
> ---
> v3:
> fix rte_eal_cleanup crash in debug_autotest
> ---
> v4:
> shorten the prompt message and optimize the commit log
> 

Please put these updates below the cutline after the sign-offs, i.e.
immediately before the diffstat.

> Suggested-by: David Marchand <david.marchand@redhat.com>
> Signed-off-by: Zhichao Zeng <zhichaox.zeng@intel.com>
> ---
>  lib/eal/common/eal_private.h     |  7 +++++++
>  lib/eal/freebsd/eal.c            | 21 ++++++++++++++++++++-
>  lib/eal/freebsd/eal_interrupts.c | 12 ++++++++++++
>  lib/eal/linux/eal.c              | 20 +++++++++++++++++++-
>  lib/eal/linux/eal_interrupts.c   | 12 ++++++++++++
>  5 files changed, 70 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/eal/common/eal_private.h b/lib/eal/common/eal_private.h
> index 44d14241f0..7adf41b7d7 100644
> --- a/lib/eal/common/eal_private.h
> +++ b/lib/eal/common/eal_private.h
> @@ -152,6 +152,13 @@ int rte_eal_tailqs_init(void);
>   */
>  int rte_eal_intr_init(void);
>  
> +/**
> + * Destroy interrupt handling thread.
> + *
> + * This function is private to EAL.
> + */
> +void rte_eal_intr_destroy(void);
> +
>  /**
>   * Close the default log stream
>   *
> diff --git a/lib/eal/freebsd/eal.c b/lib/eal/freebsd/eal.c
> index a6b20960f2..4882f27abd 100644
> --- a/lib/eal/freebsd/eal.c
> +++ b/lib/eal/freebsd/eal.c
> @@ -72,6 +72,8 @@ struct lcore_config lcore_config[RTE_MAX_LCORE];
>  /* used by rte_rdtsc() */
>  int rte_cycles_vmware_tsc_map;
>  
> +/* used to judge the running status of the eal */
> +static uint32_t run_once;
>  

I don't like just moving this variable out of the eal_init function. Inside
eal_init the name "run_once" made sense, as it tracked whether the EAL init
function had already been run. However, as a file-scope variable that is
also checked in cleanup, the name "run_once" no longer makes sense.

Two suggestions:
1. Keep run_once in EAL init as-is, and use a different variable or value
   to indicate that DPDK is initialized for cleanup.
2. Move the variable as you have here, just rename it to a more meaningful
   name.
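
As a rough, untested sketch of option 1: the names eal_initialized,
demo_init and demo_cleanup below are placeholders of mine, not existing EAL
symbols, and the bodies are stubs. The idea is that run_once stays local to
init with its current meaning, while a separately named flag tells cleanup
whether there is anything to tear down.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag: set once init completes, checked by cleanup. */
static uint32_t eal_initialized;

/* Stand-in for rte_eal_init(); run_once stays function-local. */
static int
demo_init(void)
{
	static uint32_t run_once;
	uint32_t has_run = 0;

	/* Reject a second call: only the first caller flips 0 -> 1. */
	if (!__atomic_compare_exchange_n(&run_once, &has_run, 1, 0,
			__ATOMIC_RELAXED, __ATOMIC_RELAXED))
		return -1;

	/* ... real initialisation would happen here ... */

	__atomic_store_n(&eal_initialized, 1, __ATOMIC_RELAXED);
	return 0;
}

/* Stand-in for rte_eal_cleanup(); returns 1 if teardown ran, 0 if skipped. */
static int
demo_cleanup(void)
{
	if (__atomic_load_n(&eal_initialized, __ATOMIC_RELAXED) == 0)
		return 0;

	/* ... real teardown would happen here ... */

	__atomic_store_n(&eal_initialized, 0, __ATOMIC_RELAXED);
	return 1;
}
```

With this split, the cleanup check reads as "is the EAL initialized?" rather
than "has init been run once?", which is the question cleanup actually cares
about.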


>  int
>  eal_clean_runtime_dir(void)
> @@ -574,12 +576,22 @@ static void rte_eal_init_alert(const char *msg)
>  	RTE_LOG(ERR, EAL, "%s\n", msg);
>  }
>  
> +static void warn_parent(void)
> +{
> +	RTE_LOG(WARNING, EAL, "DPDK won't work in the child process\n");
> +}

I wonder if this contains enough information. Can we identify briefly what
parts will or won't work, or if we just want to deny everything, can we
give a brief reason why?

> +
> +static void scratch_child(void)
> +{
> +	/* Scratch run_once so that a call to rte_eal_cleanup won't crash... */
> +	__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
> +}
> +

I think the name of this function needs improvement. I'm not sure that
"scratch" is the best term to use. Something like "clear_eal_flag" is
probably better.
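
To illustrate the naming, here is a small standalone sketch of the fork
handling, not a drop-in for the patch. The flag name eal_initialized and the
helpers demo_cleanup, demo_register_fork_handlers and cleanup_result_in_child
are made up for the example; only pthread_atfork(), fork() and waitpid() are
real APIs. The child handler clears the flag, so cleanup called in the child
returns early, while the parent's copy of the flag is untouched.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical flag: non-zero while the EAL is initialized in this process. */
static uint32_t eal_initialized;

/* Child-side fork handler: the child must not run EAL cleanup. */
static void
clear_eal_flag(void)
{
	__atomic_store_n(&eal_initialized, 0, __ATOMIC_RELAXED);
}

/* Register the handler; arguments are (prepare, parent, child). */
static int
demo_register_fork_handlers(void)
{
	return pthread_atfork(NULL, NULL, clear_eal_flag);
}

/* Stand-in for rte_eal_cleanup(); returns 1 if teardown ran, 0 if skipped. */
static int
demo_cleanup(void)
{
	if (__atomic_load_n(&eal_initialized, __ATOMIC_RELAXED) == 0)
		return 0;

	/* ... real teardown would happen here ... */

	__atomic_store_n(&eal_initialized, 0, __ATOMIC_RELAXED);
	return 1;
}

/* Fork, call demo_cleanup() in the child, and report the child's result. */
static int
cleanup_result_in_child(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(demo_cleanup());
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

A name like clear_eal_flag says what the handler does to the state, which
"scratch_child" does not.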

>  /* Launch threads, called at application init(). */
>  int
>  rte_eal_init(int argc, char **argv)
>  {
>  	int i, fctret, ret;
> -	static uint32_t run_once;
>  	uint32_t has_run = 0;
>  	char cpuset[RTE_CPU_AFFINITY_STR_LEN];
>  	char thread_name[RTE_MAX_THREAD_NAME_LEN];
> @@ -883,6 +895,8 @@ rte_eal_init(int argc, char **argv)
>  
>  	eal_mcfg_complete();
>  
> +	pthread_atfork(NULL, warn_parent, scratch_child);
> +
>  	return fctret;
>  }
>  
> @@ -891,8 +905,13 @@ rte_eal_cleanup(void)
>  {
>  	struct internal_config *internal_conf =
>  		eal_get_internal_configuration();
> +
> +	if (__atomic_load_n(&run_once, __ATOMIC_RELAXED) == 0)
> +		return 0;
> +
>  	rte_service_finalize();
>  	rte_mp_channel_cleanup();
> +	rte_eal_intr_destroy();
>  	/* after this point, any DPDK pointers will become dangling */
>  	rte_eal_memory_detach();
>  	rte_eal_alarm_cleanup();
<snip for brevity>