From: "Burakov, Anatoly"
To: Andrew Rybchenko, dev@dpdk.org
Cc: bruce.richardson@intel.com, thomas@monjalon.net
Subject: Re: [dpdk-dev] [PATCH v3 2/2] mem: revert to using flock() and add per-segment lockfiles
Date: Mon, 30 Apr 2018 09:29:29 +0100
Message-ID: <6b0e58bb-3658-9c00-db7d-bd93a8f88c5c@intel.com>
In-Reply-To: <925daa23-9265-2547-720b-4048ba7aa623@solarflare.com>
References: <143708d4c806a69b5f5bb94edb2c6f7714fa912b.1524652577.git.anatoly.burakov@intel.com> <925daa23-9265-2547-720b-4048ba7aa623@solarflare.com>

On 28-Apr-18 10:38 AM, Andrew Rybchenko wrote:
> On 04/25/2018 01:36 PM, Anatoly Burakov wrote:
>> The original implementation used flock() locks, but was later
>> switched to using fcntl() locks for page locking, because
>> fcntl() locks allow locking parts of a file, which is useful
>> for single-file segments mode, where locking the entire file
>> isn't as useful because we still need to grow and shrink it.
>>
>> However, according to fcntl()'s Ubuntu manpage [1], semantics of
>> fcntl() locks have a giant oversight:
>>
>>    This interface follows the completely stupid semantics of System
>>    V and IEEE Std 1003.1-1988 (“POSIX.1”) that require that all
>>    locks associated with a file for a given process are removed
>>    when any file descriptor for that file is closed by that process.
>>    This semantic means that applications must be aware of any files
>>    that a subroutine library may access.
>>
>> Basically, closing *any* fd with an fcntl() lock (which we do because
>> we don't want to leak fd's) will drop the lock completely.
>>
>> So, in this commit, we will be reverting back to using flock() locks
>> everywhere. However, that still leaves the problem of locking parts
>> of a memseg list file in single file segments mode, and we will be
>> solving it by creating separate lock files per each page, and
>> tracking those with flock().
>>
>> We will also be removing all of this tailq business and replacing it
>> with a simple array - saving a few bytes is not worth the extra
>> hassle of dealing with pointers and potential memory allocation
>> failures. Also, remove the tailq lock since it is not needed - these
>> fd lists are per-process, and within a given process, it is always
>> only one thread handling access to hugetlbfs.
>>
>> So, the first one to allocate a segment will create a lockfile, and put
>> a shared lock on it.
>> When we're shrinking the page file, we will be
>> trying to take out a write lock on that lockfile, which would fail if
>> any other process is holding onto the lockfile as well. This way, we
>> can know if we can shrink the segment file. Also, if no other locks
>> are found in the lock list for a given memseg list, the memseg list
>> fd is automatically closed.
>>
>> One other thing to note is, according to the flock() Ubuntu manpage [2],
>> upgrading the lock from shared to exclusive is implemented by dropping
>> and reacquiring the lock, which is not atomic and thus would have
>> created race conditions. So, on attempting to perform operations in
>> hugetlbfs, we will take out a write lock on the hugetlbfs directory, so
>> that only one process could perform hugetlbfs operations concurrently.
>>
>> [1] http://manpages.ubuntu.com/manpages/artful/en/man2/fcntl.2freebsd.html
>> [2] http://manpages.ubuntu.com/manpages/bionic/en/man2/flock.2.html
>>
>> Fixes: 66cc45e293ed ("mem: replace memseg with memseg lists")
>> Fixes: 582bed1e1d1d ("mem: support mapping hugepages at runtime")
>> Fixes: a5ff05d60fc5 ("mem: support unmapping pages at runtime")
>> Fixes: 2a04139f66b4 ("eal: add single file segments option")
>> Cc: anatoly.burakov@intel.com
>>
>> Signed-off-by: Anatoly Burakov
>> Acked-by: Bruce Richardson
>
> We have a problem with the changeset if the EAL option -m or --socket-mem
> is used.
> EAL initialization hangs just after EAL: Probing VFIO support...
> strace points to flock(7, LOCK_EX
> List of file descriptors:
> # ls /proc/25452/fd -l
> total 0
> lrwx------ 1 root root 64 Apr 28 10:34 0 -> /dev/pts/0
> lrwx------ 1 root root 64 Apr 28 10:34 1 -> /dev/pts/0
> lrwx------ 1 root root 64 Apr 28 10:32 2 -> /dev/pts/0
> lrwx------ 1 root root 64 Apr 28 10:34 3 -> /run/.rte_config
> lrwx------ 1 root root 64 Apr 28 10:34 4 -> socket:[154166]
> lrwx------ 1 root root 64 Apr 28 10:34 5 -> socket:[154158]
> lr-x------ 1 root root 64 Apr 28 10:34 6 -> /dev/hugepages
> lr-x------ 1 root root 64 Apr 28 10:34 7 -> /dev/hugepages
>
> I guess the problem is that there are two /dev/hugepages fds and
> it hangs on the second.
>
> Ideas how to solve it?
>
> Andrew.

Seeing similar reports from validation. I'm looking into it.

--
Thanks,
Anatoly
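
For illustration, a minimal standalone sketch of the fcntl() pitfall quoted in the commit message above: POSIX record locks are owned per process and file, so closing *any* descriptor to the file drops locks taken through another descriptor. The program and the file path are hypothetical demonstration material, not part of the patch.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* two independent descriptors to the same (hypothetical) file */
	int fd1 = open("/tmp/lock-demo", O_RDWR | O_CREAT, 0600);
	int fd2 = open("/tmp/lock-demo", O_RDWR);

	struct flock fl = {
		.l_type = F_WRLCK,
		.l_whence = SEEK_SET,
		.l_start = 0,
		.l_len = 0,	/* 0 means "lock the whole file" */
	};

	/* take an exclusive record lock through fd1 */
	if (fcntl(fd1, F_SETLK, &fl) < 0)
		perror("fcntl");

	/*
	 * Closing the *other* descriptor silently releases the lock taken
	 * via fd1 - the behaviour that breaks fd management in the EAL,
	 * since other processes can now lock the file.
	 */
	close(fd2);

	close(fd1);
	unlink("/tmp/lock-demo");
	return 0;
}
```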
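
And a rough sketch of the per-page lockfile scheme the commit message describes, with hypothetical helper names and paths rather than the actual EAL functions: mapping a page takes a shared flock() on that page's lockfile, and shrinking tests whether an exclusive lock can be taken without blocking.

```c
#include <sys/file.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* mapping a page: create its lockfile (if needed) and take a shared lock */
static int lock_page(const char *lockpath)
{
	int fd = open(lockpath, O_CREAT | O_RDWR, 0600);
	if (fd < 0)
		return -1;
	if (flock(fd, LOCK_SH) < 0) {
		close(fd);
		return -1;
	}
	return fd; /* keep the fd open for as long as the page is mapped */
}

/*
 * Shrinking the segment file: try to convert to an exclusive lock without
 * blocking. Success means no other process holds a shared lock, i.e. no
 * other process still uses the page. Note that flock() lock conversion is
 * not atomic, which is why the patch additionally write-locks the
 * hugetlbfs directory around such operations.
 */
static int can_remove_page(int lock_fd)
{
	return flock(lock_fd, LOCK_EX | LOCK_NB) == 0;
}

int main(void)
{
	/* hypothetical lockfile path for one page of one segment */
	int fd = lock_page("/tmp/seg0_page0.lock");
	if (fd < 0)
		return 1;
	printf("page can%s be removed\n", can_remove_page(fd) ? "" : "not");
	close(fd);
	return 0;
}
```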