From mboxrd@z Thu Jan 1 00:00:00 1970
From: Maxime Coquelin <maxime.coquelin@redhat.com>
To: "Burakov, Anatoly", Andrew Rybchenko, dev@dpdk.org
Cc: bruce.richardson@intel.com, thomas@monjalon.net
Date: Mon, 30 Apr 2018 13:51:29 +0200
Message-ID: <0b570083-adce-815a-cc23-9febf4fb3d9a@redhat.com>
In-Reply-To: <40b42df8-cc22-e529-e85c-c41edc36a0ff@intel.com>
References: <143708d4c806a69b5f5bb94edb2c6f7714fa912b.1524652577.git.anatoly.burakov@intel.com> <925daa23-9265-2547-720b-4048ba7aa623@solarflare.com> <40b42df8-cc22-e529-e85c-c41edc36a0ff@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 2/2] mem: revert to using flock() and add per-segment lockfiles

On 04/30/2018 01:31 PM, Burakov, Anatoly wrote:
> On 28-Apr-18 10:38 AM, Andrew Rybchenko wrote:
>> On 04/25/2018 01:36 PM, Anatoly Burakov wrote:
>>> The original implementation used flock() locks, but was later
>>> switched to using fcntl() locks for page locking, because fcntl()
>>> locks allow locking parts of a file, which is useful for
>>> single-file segments mode, where locking the entire file isn't as
>>> useful because we still need to grow and shrink it.
>>>
>>> However, according to fcntl()'s Ubuntu manpage [1], the semantics
>>> of fcntl() locks have a giant oversight:
>>>
>>>    This interface follows the completely stupid semantics of System
>>>    V and IEEE Std 1003.1-1988 (“POSIX.1”) that require that all
>>>    locks associated with a file for a given process are removed
>>>    when any file descriptor for that file is closed by that process.
>>>    This semantic means that applications must be aware of any files
>>>    that a subroutine library may access.
>>>
>>> Basically, closing *any* fd of a file with an fcntl() lock on it
>>> (which we do because we don't want to leak fds) drops the lock
>>> completely.
>>>
>>> So, in this commit, we revert to using flock() locks everywhere.
>>> However, that still leaves the problem of locking parts of a memseg
>>> list file in single-file segments mode, and we solve it by creating
>>> a separate lock file for each page and tracking those with flock().
>>>
>>> We also remove all of the tailq business and replace it with a
>>> simple array - saving a few bytes is not worth the extra hassle of
>>> dealing with pointers and potential memory allocation failures.
>>> Also, remove the tailq lock, since it is not needed - these fd
>>> lists are per-process, and within a given process there is always
>>> only one thread handling access to hugetlbfs.
>>>
>>> So, the first process to allocate a segment will create a lockfile
>>> and put a shared lock on it. When we are shrinking the page file,
>>> we try to take out a write lock on that lockfile, which fails if
>>> any other process is holding onto the lockfile as well. This way,
>>> we can know whether we can shrink the segment file. Also, if no
>>> other locks are found in the lock list for a given memseg list, the
>>> memseg list fd is automatically closed.
>>>
>>> One other thing to note is that, according to flock()'s Ubuntu
>>> manpage [2], upgrading a lock from shared to exclusive is
>>> implemented by dropping and reacquiring the lock, which is not
>>> atomic and thus would have created race conditions. So, before
>>> performing operations in hugetlbfs, we take out a write lock on the
>>> hugetlbfs directory, so that only one process can perform hugetlbfs
>>> operations concurrently.
>>>
>>> [1] http://manpages.ubuntu.com/manpages/artful/en/man2/fcntl.2freebsd.html
>>> [2] http://manpages.ubuntu.com/manpages/bionic/en/man2/flock.2.html
>>>
>>> Fixes: 66cc45e293ed ("mem: replace memseg with memseg lists")
>>> Fixes: 582bed1e1d1d ("mem: support mapping hugepages at runtime")
>>> Fixes: a5ff05d60fc5 ("mem: support unmapping pages at runtime")
>>> Fixes: 2a04139f66b4 ("eal: add single file segments option")
>>> Cc: anatoly.burakov@intel.com
>>>
>>> Signed-off-by: Anatoly Burakov
>>> Acked-by: Bruce Richardson
>>
>> We have a problem with the changeset if the EAL option -m or
>> --socket-mem is used: EAL initialization hangs just after
>> "EAL: Probing VFIO support..."
>> strace points to flock(7, LOCK_EX
>>
>> List of file descriptors:
>> # ls /proc/25452/fd -l
>> total 0
>> lrwx------ 1 root root 64 Apr 28 10:34 0 -> /dev/pts/0
>> lrwx------ 1 root root 64 Apr 28 10:34 1 -> /dev/pts/0
>> lrwx------ 1 root root 64 Apr 28 10:32 2 -> /dev/pts/0
>> lrwx------ 1 root root 64 Apr 28 10:34 3 -> /run/.rte_config
>> lrwx------ 1 root root 64 Apr 28 10:34 4 -> socket:[154166]
>> lrwx------ 1 root root 64 Apr 28 10:34 5 -> socket:[154158]
>> lr-x------ 1 root root 64 Apr 28 10:34 6 -> /dev/hugepages
>> lr-x------ 1 root root 64 Apr 28 10:34 7 -> /dev/hugepages
>>
>> I guess the problem is that there are two fds open on /dev/hugepages
>> and it hangs on the second one.
>>
>> Ideas how to solve it?
>>
>> Andrew.
>
> Hi Andrew,
>
> Please try the following patch:
>
> http://dpdk.org/dev/patchwork/patch/39166/
>
> This should fix the issue.

I hit the same regression on my test bench, and your patch fixes the
issue in my case:

Tested-by: Maxime Coquelin

Thanks,
Maxime