From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Rybchenko
To: Anatoly Burakov
Date: Sat, 28 Apr 2018 12:38:48 +0300
Message-ID: <925daa23-9265-2547-720b-4048ba7aa623@solarflare.com>
In-Reply-To: <143708d4c806a69b5f5bb94edb2c6f7714fa912b.1524652577.git.anatoly.burakov@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 2/2] mem: revert to using flock() and add per-segment lockfiles
List-Id: DPDK patches and discussions

On 04/25/2018 01:36
PM, Anatoly Burakov wrote:
> The original implementation used flock() locks, but was later
> switched to using fcntl() locks for page locking, because
> fcntl() locks allow locking parts of a file, which is useful
> for single-file segments mode, where locking the entire file
> isn't as useful because we still need to grow and shrink it.
>
> However, according to fcntl()'s Ubuntu manpage [1], semantics of
> fcntl() locks have a giant oversight:
>
>     This interface follows the completely stupid semantics of System
>     V and IEEE Std 1003.1-1988 ("POSIX.1") that require that all
>     locks associated with a file for a given process are removed
>     when any file descriptor for that file is closed by that process.
>     This semantic means that applications must be aware of any files
>     that a subroutine library may access.
>
> Basically, closing *any* fd with an fcntl() lock (which we do because
> we don't want to leak fd's) will drop the lock completely.
>
> So, in this commit, we will be reverting back to using flock() locks
> everywhere. However, that still leaves the problem of locking parts
> of a memseg list file in single file segments mode, and we will be
> solving it by creating a separate lock file per page, and tracking
> those with flock().
>
> We will also be removing all of this tailq business and replacing it
> with a simple array - saving a few bytes is not worth the extra
> hassle of dealing with pointers and potential memory allocation
> failures. Also, remove the tailq lock since it is not needed - these
> fd lists are per-process, and within a given process, it is always
> only one thread handling access to hugetlbfs.
>
> So, the first one to allocate a segment will create a lockfile, and
> put a shared lock on it. When we're shrinking the page file, we will
> try to take out a write lock on that lockfile, which would fail if
> any other process is holding onto the lockfile as well. This way, we
> can know if we can shrink the segment file.
> Also, if no other locks are found in the lock list for a given
> memseg list, the memseg list fd is automatically closed.
>
> One other thing to note is that, according to the flock() Ubuntu
> manpage [2], upgrading a lock from shared to exclusive is implemented
> by dropping and reacquiring the lock, which is not atomic and thus
> would have created race conditions. So, on attempting to perform
> operations in hugetlbfs, we will take out a write lock on the
> hugetlbfs directory, so that only one process can perform hugetlbfs
> operations concurrently.
>
> [1] http://manpages.ubuntu.com/manpages/artful/en/man2/fcntl.2freebsd.html
> [2] http://manpages.ubuntu.com/manpages/bionic/en/man2/flock.2.html
>
> Fixes: 66cc45e293ed ("mem: replace memseg with memseg lists")
> Fixes: 582bed1e1d1d ("mem: support mapping hugepages at runtime")
> Fixes: a5ff05d60fc5 ("mem: support unmapping pages at runtime")
> Fixes: 2a04139f66b4 ("eal: add single file segments option")
> Cc: anatoly.burakov@intel.com
>
> Signed-off-by: Anatoly Burakov
> Acked-by: Bruce Richardson

We have a problem with the changeset if the EAL option -m or
--socket-mem is used. EAL initialization hangs just after

  EAL: Probing VFIO support...

strace points to

  flock(7, LOCK_EX

List of file descriptors:

# ls /proc/25452/fd -l
total 0
lrwx------ 1 root root 64 Apr 28 10:34 0 -> /dev/pts/0
lrwx------ 1 root root 64 Apr 28 10:34 1 -> /dev/pts/0
lrwx------ 1 root root 64 Apr 28 10:32 2 -> /dev/pts/0
lrwx------ 1 root root 64 Apr 28 10:34 3 -> /run/.rte_config
lrwx------ 1 root root 64 Apr 28 10:34 4 -> socket:[154166]
lrwx------ 1 root root 64 Apr 28 10:34 5 -> socket:[154158]
lr-x------ 1 root root 64 Apr 28 10:34 6 -> /dev/hugepages
lr-x------ 1 root root 64 Apr 28 10:34 7 -> /dev/hugepages

I guess the problem is that there are two file descriptors open on
/dev/hugepages, and it hangs on the flock() of the second one. Ideas
how to solve it?

Andrew.