From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id E91C8A32A2 for ; Thu, 24 Oct 2019 18:07:12 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id C15BB1EB75; Thu, 24 Oct 2019 18:07:11 +0200 (CEST) Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by dpdk.org (Postfix) with ESMTP id 8A5631EB5A; Thu, 24 Oct 2019 18:07:08 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 24 Oct 2019 09:07:03 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.68,225,1569308400"; d="scan'208";a="210187208" Received: from aburakov-mobl1.ger.corp.intel.com (HELO [10.252.8.129]) ([10.252.8.129]) by fmsmga001.fm.intel.com with ESMTP; 24 Oct 2019 09:07:01 -0700 To: David Marchand Cc: dev , Bruce Richardson , Stephen Hemminger , dpdk stable References: <67a795e77bc9f5ac79ab78a878ae19abbead9f50.1563984454.git.anatoly.burakov@intel.com> From: "Burakov, Anatoly" Message-ID: <27c0cf41-2dea-9e0e-7c25-df76796ca2e6@intel.com> Date: Thu, 24 Oct 2019 17:07:00 +0100 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.9.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Subject: Re: [dpdk-dev] [PATCH v4] eal: fix proc type auto detection X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On 12-Aug-19 11:03 AM, David Marchand wrote: > On Wed, Jul 24, 2019 at 6:08 PM Anatoly Burakov > wrote: >> >> Currently, primary process holds an exclusive lock on the config >> file, thereby preventing other primaries from spinning up. However, >> when the primary dies, the lock is no longer being held, even though >> there might be other secondary processes still running. >> >> The fix is two-fold. First of all, downgrade the primary process's >> exclusive lock to a shared lock once we have it. Second of all, >> also take out shared locks on the config from the secondaries. We >> are using fcntl() locks, which get dropped when the file handle is >> closed, so also remove the closure of config file handle. >> >> Fixes: af75078fece3 ("first public release") >> Cc: stable@dpdk.org >> >> Signed-off-by: Anatoly Burakov >> --- >> Hi David, I've been investigating how to improve this patch, and i've hit a dead end. The problems here are two-fold. Using fcntl() and flock() locks together is not advisable, so both primary-secondary detection and rte_eal_primary_proc_alive() (as per Harry's point) have to use the same method for checking locks. Using flock() would work for this purpose. Unfortunately, on FreeBSD, converting exclusive lock into a shared lock involves unlocking first [1] (creating a race). On Linux it doesn't specifically say that it does that, but it does mention that it is not guaranteed to be atomic [2]. So, we can't use flock() here. It seems that fcntl() lock conversions are atomic, however fcntl() locks on both Linux and FreeBSD are implemented in a very stupid way in that /any/ closure of a file descriptor drops /all/ locks on that file. Meaning, the moment secondary does the check in primary_proc_alive() and closes the config file fd, the process-wide lock drops. Mind you, primary_proc_alive() is implemented using lockf() rather than fcntl(), which is an issue in itself, but on Linux, lockf() is implemented on top of fcntl() locks and thus suffers from the same issue. So, unless you have better ideas, i think this patch can be marked as rejected. [1] https://www.freebsd.org/cgi/man.cgi?query=flock&sektion=2 [2] https://linux.die.net/man/2/flock -- Thanks, Anatoly