From: Bruce Richardson <bruce.richardson@intel.com>
To: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH 2/2] mem: revert to using flock() and add per-segment lockfiles
Date: Tue, 24 Apr 2018 14:57:00 +0100
Message-ID: <20180424135700.GA135436@bricha3-MOBL.ger.corp.intel.com>
In-Reply-To: <aef8bd84eb81cf6c08cf6f27de2eb9a27930bfda.1524140413.git.anatoly.burakov@intel.com>
On Thu, Apr 19, 2018 at 01:26:29PM +0100, Anatoly Burakov wrote:
> The original implementation used flock() locks, but was later
> switched to using fcntl() locks for page locking, because fcntl()
> locks allow locking parts of a file, which is useful for
> single-file segments mode, where locking the entire file is
> impractical because we still need to grow and shrink it.
>
> However, according to the fcntl() manpage on Ubuntu [1], the
> semantics of fcntl() locks have a giant oversight:
>
> This interface follows the completely stupid semantics of System
> V and IEEE Std 1003.1-1988 (“POSIX.1”) that require that all
> locks associated with a file for a given process are removed
> when any file descriptor for that file is closed by that process.
> This semantic means that applications must be aware of any files
> that a subroutine library may access.
>
> Basically, closing *any* fd referring to a file (which we do,
> because we don't want to leak fds) will drop all of that process's
> fcntl() locks on the file.
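
For anyone following along, the pitfall is easy to reproduce. A minimal
standalone demo (my own illustration, not part of the patch) would look
something like this:

	#include <fcntl.h>
	#include <unistd.h>

	int main(void)
	{
		/* two independent fds referring to the same file */
		int fd1 = open("/tmp/locktest", O_CREAT | O_RDWR, 0600);
		int fd2 = open("/tmp/locktest", O_RDWR);
		struct flock fl = {
			.l_type = F_WRLCK,
			.l_whence = SEEK_SET,
			.l_start = 0,
			.l_len = 0, /* 0 means "lock the whole file" */
		};

		fcntl(fd1, F_SETLK, &fl); /* lock taken via fd1 */
		close(fd2); /* silently drops the lock taken via fd1! */
		/* from here on, other processes can lock this file,
		 * even though fd1 is still open */
		close(fd1);
		return 0;
	}
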
>
> So, in this commit, we revert to using flock() locks everywhere.
> However, that still leaves the problem of locking parts of a memseg
> list file in single-file segments mode, and we solve it by creating
> a separate lock file for each page and tracking those with flock().
>
> We also remove all of the tailq business and replace it with a
> simple array - saving a few bytes is not worth the extra hassle of
> dealing with pointers and potential memory allocation failures. The
> tailq lock is removed as well, since it is not needed: these fd
> lists are per-process, and within a given process only one thread
> ever handles access to hugetlbfs.
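
For reference, the array-based bookkeeping this describes presumably
looks something like the following (field names inferred from the code
further down; treat this as a sketch rather than the patch's exact
layout):

	struct lock_list {
		int *fds;  /* one fd per segment, -1 if no lockfile open */
		int len;   /* number of segments in this memseg list */
		int count; /* number of currently open lockfile fds */
	};
	static struct lock_list lock_fds[RTE_MAX_MEMSEG_LISTS];
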
>
> So, the first process to allocate a segment creates its lockfile
> and takes a shared lock on it. When shrinking the page file, we try
> to take an exclusive lock on that lockfile, which fails if any
> other process also holds a lock on it. This tells us whether it is
> safe to shrink the segment file. Also, if no other locks are found
> in the lock list for a given memseg list, the memseg list fd is
> automatically closed.
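
In other words, the shrink check reduces to a non-blocking attempt at
an exclusive flock(). Something along these lines (my illustration,
not the patch code):

	#include <sys/file.h>
	#include <errno.h>

	/* returns 1 if the segment can be shrunk, 0 if the page is
	 * still in use elsewhere, -1 on error (illustrative only) */
	static int can_shrink(int lock_fd)
	{
		if (flock(lock_fd, LOCK_EX | LOCK_NB) == 0)
			return 1; /* nobody else holds a shared lock */
		if (errno == EWOULDBLOCK)
			return 0; /* another process still uses the page */
		return -1;
	}
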
>
> One other thing to note: according to the flock() manpage on Ubuntu
> [2], upgrading a lock from shared to exclusive is implemented by
> dropping and reacquiring the lock, which is not atomic and would
> thus have created race conditions. So, before performing operations
> in hugetlbfs, we take an exclusive lock on the hugetlbfs directory,
> so that only one process can perform hugetlbfs operations at a time.
>
> [1] http://manpages.ubuntu.com/manpages/artful/en/man2/fcntl.2freebsd.html
> [2] http://manpages.ubuntu.com/manpages/bionic/en/man2/flock.2.html
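
The directory lock trick described above may be worth a comment in the
code, since it's not obvious that flock() works on a directory fd.
Roughly (illustrative fragment with a made-up hugedir_path variable;
needs <fcntl.h>, <sys/file.h> and <unistd.h>):

	int dir_fd = open(hugedir_path, O_RDONLY);

	if (dir_fd >= 0) {
		if (flock(dir_fd, LOCK_EX) == 0) {
			/* ... create/resize/unlink hugetlbfs files ... */
			flock(dir_fd, LOCK_UN);
		}
		close(dir_fd);
	}
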
>
> Fixes: 66cc45e293ed ("mem: replace memseg with memseg lists")
> Fixes: 582bed1e1d1d ("mem: support mapping hugepages at runtime")
> Fixes: a5ff05d60fc5 ("mem: support unmapping pages at runtime")
> Fixes: 2a04139f66b4 ("eal: add single file segments option")
> Cc: anatoly.burakov@intel.com
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
<snip>
> +
> +static int get_lockfile(int list_idx, int seg_idx)
Minor nit:
Not sure about this name, since it returns an fd rather than a filename or
filepath. How about get_lock_fd()?
> +{
> + char path[PATH_MAX] = {0};
> + int fd;
> +
> + if (list_idx < 0 || list_idx >= (int)RTE_DIM(lock_fds))
> + return -1;
> + if (seg_idx < 0 || seg_idx >= lock_fds[list_idx].len)
> + return -1;
> +
> + fd = lock_fds[list_idx].fds[seg_idx];
> + /* does this lock already exist? */
> + if (fd >= 0)
> + return fd;
> +
> + eal_get_hugefile_lock_path(path, sizeof(path), list_idx, seg_idx);
> +
> + fd = open(path, O_CREAT | O_RDWR, 0660);
> + if (fd < 0) {
> + RTE_LOG(ERR, EAL, "%s(): error creating lockfile '%s': %s\n",
> + __func__, path, strerror(errno));
> + return -1;
> + }
> + /* take out a read lock */
> + if (lock(fd, LOCK_SH) != 1) {
> + RTE_LOG(ERR, EAL, "%s(): failed to take out a readlock on '%s': %s\n",
> + __func__, path, strerror(errno));
> + close(fd);
> + return -1;
> + }
> + /* store it for future reference */
> + lock_fds[list_idx].fds[seg_idx] = fd;
> + lock_fds[list_idx].count++;
> + return fd;
> +}
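
Side note for other reviewers: the lock() helper used here is in the
snipped part of the patch. Judging by the "!= 1" checks, I assume it
returns 1 on success, 0 when a conflicting lock is held elsewhere, and
-1 on error - roughly along these lines (my reconstruction, not the
actual patch code; needs <sys/file.h> and <errno.h>):

	static int lock(int fd, int type)
	{
		int ret;

		/* flock() may be interrupted by a signal, so retry */
		do {
			ret = flock(fd, type | LOCK_NB);
		} while (ret != 0 && errno == EINTR);

		if (ret != 0 && errno == EWOULDBLOCK)
			return 0; /* conflicting lock held elsewhere */
		if (ret != 0)
			return -1; /* genuine error */
		return 1; /* lock acquired */
	}
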
> +
> +static int put_lockfile(int list_idx, int seg_idx)
This name doesn't really tell me much. I realise that "put" is the opposite
of "get", but if we get a file descriptor from the previous function, it's
not exactly something we need to "put back". Can you come up with a more
descriptive name here?
> +{
> + int fd, ret;
> +
> + if (list_idx < 0 || list_idx >= (int)RTE_DIM(lock_fds))
> + return -1;
> + if (seg_idx < 0 || seg_idx >= lock_fds[list_idx].len)
> + return -1;
> +
> + fd = lock_fds[list_idx].fds[seg_idx];
> +
> + /* upgrade lock to exclusive to see if we can remove the lockfile */
> + ret = lock(fd, LOCK_EX);
> + if (ret == 1) {
> + /* we've succeeded in taking exclusive lock, this lockfile may
> + * be removed.
> + */
> + char path[PATH_MAX] = {0};
> + eal_get_hugefile_lock_path(path, sizeof(path), list_idx,
> + seg_idx);
> + if (unlink(path)) {
> + RTE_LOG(ERR, EAL, "%s(): error removing lockfile '%s': %s\n",
> + __func__, path, strerror(errno));
> + }
> + }
> + /* we don't want to leak the fd, so even if we fail to lock, close fd
> + * and remove it from list anyway.
> + */
> + close(fd);
> + lock_fds[list_idx].fds[seg_idx] = -1;
> + lock_fds[list_idx].count--;
> +
> + if (ret < 0)
> + return -1;
> + return 0;
> +}
> +
<snip>
Thread overview: 22+ messages
2018-04-19 12:26 [dpdk-dev] [PATCH 0/2] Fix file locking in EAL memory Anatoly Burakov
2018-04-19 12:26 ` [dpdk-dev] [PATCH 1/2] mem: add memalloc init stage Anatoly Burakov
2018-04-24 14:06 ` Bruce Richardson
2018-04-19 12:26 ` [dpdk-dev] [PATCH 2/2] mem: revert to using flock() and add per-segment lockfiles Anatoly Burakov
2018-04-24 13:57 ` Bruce Richardson [this message]
2018-04-24 14:07 ` Bruce Richardson
2018-04-24 15:54 ` [dpdk-dev] [PATCH v2 0/2] Fix file locking in EAL memory Anatoly Burakov
2018-04-24 16:32 ` Stephen Hemminger
2018-04-24 17:25 ` Burakov, Anatoly
2018-04-24 20:05 ` Thomas Monjalon
2018-04-24 20:34 ` Stephen Hemminger
2018-04-25 10:36 ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
2018-04-27 21:50 ` Thomas Monjalon
2018-04-25 10:36 ` [dpdk-dev] [PATCH v3 1/2] mem: add memalloc init stage Anatoly Burakov
2018-04-25 10:36 ` [dpdk-dev] [PATCH v3 2/2] mem: revert to using flock() and add per-segment lockfiles Anatoly Burakov
2018-04-28 9:38 ` Andrew Rybchenko
2018-04-30 8:29 ` Burakov, Anatoly
2018-04-30 11:31 ` Burakov, Anatoly
2018-04-30 11:51 ` Maxime Coquelin
2018-04-30 13:08 ` Andrew Rybchenko
2018-04-24 15:54 ` [dpdk-dev] [PATCH v2 1/2] mem: add memalloc init stage Anatoly Burakov
2018-04-24 15:54 ` [dpdk-dev] [PATCH v2 2/2] mem: revert to using flock() and add per-segment lockfiles Anatoly Burakov