DPDK patches and discussions
 help / color / mirror / Atom feed
From: "Burakov, Anatoly" <anatoly.burakov@intel.com>
To: Andrew Rybchenko <arybchenko@solarflare.com>, dev@dpdk.org
Cc: bruce.richardson@intel.com, thomas@monjalon.net
Subject: Re: [dpdk-dev] [PATCH v3 2/2] mem: revert to using flock() and add per-segment lockfiles
Date: Mon, 30 Apr 2018 09:29:29 +0100	[thread overview]
Message-ID: <6b0e58bb-3658-9c00-db7d-bd93a8f88c5c@intel.com> (raw)
In-Reply-To: <925daa23-9265-2547-720b-4048ba7aa623@solarflare.com>

On 28-Apr-18 10:38 AM, Andrew Rybchenko wrote:
> On 04/25/2018 01:36 PM, Anatoly Burakov wrote:
>> The original implementation used flock() locks, but was later
>> switched to using fcntl() locks for page locking, because
>> fcntl() locks allow locking parts of a file, which is useful
>> for single-file segments mode, where locking the entire file
>> isn't as useful because we still need to grow and shrink it.
>>
>> However, according to fcntl()'s Ubuntu manpage [1], semantics of
>> fcntl() locks have a giant oversight:
>>
>>    This interface follows the completely stupid semantics of System
>>    V and IEEE Std 1003.1-1988 (“POSIX.1”) that require that all
>>    locks associated with a file for a given process are removed
>>    when any file descriptor for that file is closed by that process.
>>    This semantic means that applications must be aware of any files
>>    that a subroutine library may access.
>>
>> Basically, closing *any* fd with an fcntl() lock (which we do because
>> we don't want to leak fd's) will drop the lock completely.
>>
>> So, in this commit, we will be reverting back to using flock() locks
>> everywhere. However, that still leaves the problem of locking parts
>> of a memseg list file in single file segments mode, and we will be
>> solving it with creating separate lock files per each page, and
>> tracking those with flock().
>>
>> We will also be removing all of this tailq business and replacing it
>> with a simple array - saving a few bytes is not worth the extra
>> hassle of dealing with pointers and potential memory allocation
>> failures. Also, remove the tailq lock since it is not needed - these
>> fd lists are per-process, and within a given process, it is always
>> only one thread handling access to hugetlbfs.
>>
>> So, first one to allocate a segment will create a lockfile, and put
>> a shared lock on it. When we're shrinking the page file, we will be
>> trying to take out a write lock on that lockfile, which would fail if
>> any other process is holding onto the lockfile as well. This way, we
>> can know if we can shrink the segment file. Also, if no other locks
>> are found in the lock list for a given memseg list, the memseg list
>> fd is automatically closed.
>>
>> One other thing to note is, according to flock() Ubuntu manpage [2],
>> upgrading the lock from shared to exclusive is implemented by dropping
>> and reacquiring the lock, which is not atomic and thus would have
>> created race conditions. So, on attempting to perform operations in
>> hugetlbfs, we will take out a writelock on hugetlbfs directory, so
>> that only one process could perform hugetlbfs operations concurrently.
>>
>> [1] 
>> http://manpages.ubuntu.com/manpages/artful/en/man2/fcntl.2freebsd.html
>> [2] http://manpages.ubuntu.com/manpages/bionic/en/man2/flock.2.html
>>
>> Fixes: 66cc45e293ed ("mem: replace memseg with memseg lists")
>> Fixes: 582bed1e1d1d ("mem: support mapping hugepages at runtime")
>> Fixes: a5ff05d60fc5 ("mem: support unmapping pages at runtime")
>> Fixes: 2a04139f66b4 ("eal: add single file segments option")
>> Cc: anatoly.burakov@intel.com
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> 
> We have a problem with the changeset if EAL option -m or --socket-mem is 
> used.
> EAL initialization hangs just after EAL: Probing VFIO support...
> strace points to flock(7, LOCK_EX
> List of file descriptors:
> # ls /proc/25452/fd -l
> total 0
> lrwx------ 1 root root 64 Apr 28 10:34 0 -> /dev/pts/0
> lrwx------ 1 root root 64 Apr 28 10:34 1 -> /dev/pts/0
> lrwx------ 1 root root 64 Apr 28 10:32 2 -> /dev/pts/0
> lrwx------ 1 root root 64 Apr 28 10:34 3 -> /run/.rte_config
> lrwx------ 1 root root 64 Apr 28 10:34 4 -> socket:[154166]
> lrwx------ 1 root root 64 Apr 28 10:34 5 -> socket:[154158]
> lr-x------ 1 root root 64 Apr 28 10:34 6 -> /dev/hugepages
> lr-x------ 1 root root 64 Apr 28 10:34 7 -> /dev/hugepages
> 
> I guess the problem is that there are two /dev/hugepages and
> it hangs on the second.
> 
> Ideas how to solve it?
> 
> Andrew.
> 

Seeing similar reports from validation. I'm looking into it.

-- 
Thanks,
Anatoly

  reply	other threads:[~2018-04-30  8:29 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-19 12:26 [dpdk-dev] [PATCH 0/2] Fix file locking in EAL memory Anatoly Burakov
2018-04-19 12:26 ` [dpdk-dev] [PATCH 1/2] mem: add memalloc init stage Anatoly Burakov
2018-04-24 14:06   ` Bruce Richardson
2018-04-19 12:26 ` [dpdk-dev] [PATCH 2/2] mem: revert to using flock() and add per-segment lockfiles Anatoly Burakov
2018-04-24 13:57   ` Bruce Richardson
2018-04-24 14:07   ` Bruce Richardson
2018-04-24 15:54 ` [dpdk-dev] [PATCH v2 0/2] Fix file locking in EAL memory Anatoly Burakov
2018-04-24 16:32   ` Stephen Hemminger
2018-04-24 17:25     ` Burakov, Anatoly
2018-04-24 20:05       ` Thomas Monjalon
2018-04-24 20:34         ` Stephen Hemminger
2018-04-25 10:36   ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
2018-04-27 21:50     ` Thomas Monjalon
2018-04-25 10:36   ` [dpdk-dev] [PATCH v3 1/2] mem: add memalloc init stage Anatoly Burakov
2018-04-25 10:36   ` [dpdk-dev] [PATCH v3 2/2] mem: revert to using flock() and add per-segment lockfiles Anatoly Burakov
2018-04-28  9:38     ` Andrew Rybchenko
2018-04-30  8:29       ` Burakov, Anatoly [this message]
2018-04-30 11:31       ` Burakov, Anatoly
2018-04-30 11:51         ` Maxime Coquelin
2018-04-30 13:08         ` Andrew Rybchenko
2018-04-24 15:54 ` [dpdk-dev] [PATCH v2 1/2] mem: add memalloc init stage Anatoly Burakov
2018-04-24 15:54 ` [dpdk-dev] [PATCH v2 2/2] mem: revert to using flock() and add per-segment lockfiles Anatoly Burakov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6b0e58bb-3658-9c00-db7d-bd93a8f88c5c@intel.com \
    --to=anatoly.burakov@intel.com \
    --cc=arybchenko@solarflare.com \
    --cc=bruce.richardson@intel.com \
    --cc=dev@dpdk.org \
    --cc=thomas@monjalon.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).