From mboxrd@z Thu Jan  1 00:00:00 1970
From: Andrew Rybchenko
To: "Burakov, Anatoly"
Subject: Re: [dpdk-dev] [PATCH v3 2/2] mem: revert to using flock() and add per-segment lockfiles
Date: Mon, 30 Apr 2018 16:08:56 +0300
Message-ID: <278b60f9-1cee-fede-2af9-71fadc8dddb5@solarflare.com>
In-Reply-To: <40b42df8-cc22-e529-e85c-c41edc36a0ff@intel.com>
List-Id: DPDK patches and discussions

On 04/30/2018 02:31 PM, Burakov, Anatoly wrote:
> On 28-Apr-18 10:38 AM, Andrew Rybchenko wrote:
>> On 04/25/2018 01:36 PM, Anatoly Burakov wrote:
>>> The original implementation used flock() locks, but was later
>>> switched to using fcntl() locks for page locking, because fcntl()
>>> locks allow locking parts of a file. That is useful for
>>> single-file segments mode, where locking the entire file isn't as
>>> useful because we still need to grow and shrink it.
>>>
>>> However, according to fcntl()'s Ubuntu manpage [1], the semantics
>>> of fcntl() locks have a giant oversight:
>>>
>>>    This interface follows the completely stupid semantics of System
>>>    V and IEEE Std 1003.1-1988 (“POSIX.1”) that require that all
>>>    locks associated with a file for a given process are removed
>>>    when any file descriptor for that file is closed by that
>>>    process. This semantic means that applications must be aware of
>>>    any files that a subroutine library may access.
>>>
>>> Basically, closing *any* fd of a file with an fcntl() lock (which
>>> we do because we don't want to leak fds) drops the lock completely.
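
The fcntl() pitfall quoted above can be reproduced in a few lines. A
minimal sketch (hypothetical /tmp path, error handling omitted; an
illustration of the manpage semantics, not DPDK code):

    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/tmp/fcntl-demo", O_CREAT | O_RDWR, 0600);
        struct flock fl = {
            .l_type = F_WRLCK,
            .l_whence = SEEK_SET,
            .l_start = 0,
            .l_len = 0,              /* length 0 locks the whole file */
        };
        fcntl(fd, F_SETLK, &fl);     /* exclusive record lock via fd */

        /* Some unrelated code (e.g. a library) opens and closes the
         * same file... */
        int other = open("/tmp/fcntl-demo", O_RDONLY);
        close(other);                /* ...and this close() silently
                                      * releases the lock taken via fd. */
        return 0;
    }

flock() locks, by contrast, are tied to the open file description, so
closing an unrelated descriptor for the same file leaves them intact.
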
>>> So, in this commit, we will be reverting back to using flock()
>>> locks everywhere. However, that still leaves the problem of locking
>>> parts of a memseg list file in single-file segments mode, and we
>>> will be solving it by creating a separate lock file per page and
>>> tracking those with flock().
>>>
>>> We will also be removing all of this tailq business and replacing
>>> it with a simple array - saving a few bytes is not worth the extra
>>> hassle of dealing with pointers and potential memory allocation
>>> failures. Also, remove the tailq lock since it is not needed -
>>> these fd lists are per-process, and within a given process, it is
>>> always only one thread handling access to hugetlbfs.
>>>
>>> So, the first process to allocate a segment will create a lockfile
>>> and put a shared lock on it. When we're shrinking the page file, we
>>> will try to take out a write lock on that lockfile, which will fail
>>> if any other process is holding onto the lockfile as well. This
>>> way, we can know whether we can shrink the segment file. Also, if
>>> no other locks are found in the lock list for a given memseg list,
>>> the memseg list fd is automatically closed.
>>>
>>> One other thing to note is that, according to flock()'s Ubuntu
>>> manpage [2], upgrading a lock from shared to exclusive is
>>> implemented by dropping and reacquiring the lock, which is not
>>> atomic and would therefore create race conditions. So, before
>>> performing operations in hugetlbfs, we will take out a write lock
>>> on the hugetlbfs directory, so that only one process at a time can
>>> perform hugetlbfs operations.
>>>
>>> [1] http://manpages.ubuntu.com/manpages/artful/en/man2/fcntl.2freebsd.html
>>> [2] http://manpages.ubuntu.com/manpages/bionic/en/man2/flock.2.html
>>>
>>> Fixes: 66cc45e293ed ("mem: replace memseg with memseg lists")
>>> Fixes: 582bed1e1d1d ("mem: support mapping hugepages at runtime")
>>> Fixes: a5ff05d60fc5 ("mem: support unmapping pages at runtime")
>>> Fixes: 2a04139f66b4 ("eal: add single file segments option")
>>> Cc: anatoly.burakov@intel.com
>>>
>>> Signed-off-by: Anatoly Burakov
>>> Acked-by: Bruce Richardson
>>
>> We have a problem with the changeset if the EAL option -m or
>> --socket-mem is used: EAL initialization hangs just after
>> "EAL: Probing VFIO support...". strace points to flock(7, LOCK_EX
>> blocking. List of file descriptors:
>>
>> # ls /proc/25452/fd -l
>> total 0
>> lrwx------ 1 root root 64 Apr 28 10:34 0 -> /dev/pts/0
>> lrwx------ 1 root root 64 Apr 28 10:34 1 -> /dev/pts/0
>> lrwx------ 1 root root 64 Apr 28 10:32 2 -> /dev/pts/0
>> lrwx------ 1 root root 64 Apr 28 10:34 3 -> /run/.rte_config
>> lrwx------ 1 root root 64 Apr 28 10:34 4 -> socket:[154166]
>> lrwx------ 1 root root 64 Apr 28 10:34 5 -> socket:[154158]
>> lr-x------ 1 root root 64 Apr 28 10:34 6 -> /dev/hugepages
>> lr-x------ 1 root root 64 Apr 28 10:34 7 -> /dev/hugepages
>>
>> I guess the problem is that there are two fds open on
>> /dev/hugepages and it hangs on the second one.
>>
>> Ideas how to solve it?
>>
>> Andrew.
>
> Hi Andrew,
>
> Please try the following patch:
>
> http://dpdk.org/dev/patchwork/patch/39166/
>
> This should fix the issue.

Hi Anatoly,

yes, it fixes the issue.

Thanks,
Andrew.
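
For reference, the hang reported above is consistent with flock()'s
per-open-file-description semantics: locks obtained through two
independent open() calls on the same path conflict even within a
single process, so the second flock(7, LOCK_EX) blocks against a lock
the process itself already holds via fd 6. A minimal sketch of the
self-deadlock (hypothetical path; LOCK_NB makes it fail visibly
instead of hanging):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/file.h>

    int main(void)
    {
        /* Two independent open() calls create two open file
         * descriptions, so their flock() locks conflict even within
         * one process. */
        int fd1 = open("/tmp/flock-demo", O_CREAT | O_RDONLY, 0600);
        int fd2 = open("/tmp/flock-demo", O_RDONLY);

        flock(fd1, LOCK_EX);                  /* acquired */
        if (flock(fd2, LOCK_EX | LOCK_NB) == -1)
            perror("flock(fd2)");             /* EWOULDBLOCK: without
                                               * LOCK_NB this call would
                                               * block forever */
        return 0;
    }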