From: Bruce Richardson
To: Anatoly Burakov
Cc: dev@dpdk.org
Date: Tue, 24 Apr 2018 14:57:00 +0100
Subject: Re: [dpdk-dev] [PATCH 2/2] mem: revert to using flock() and add per-segment lockfiles
Message-ID: <20180424135700.GA135436@bricha3-MOBL.ger.corp.intel.com>
Organization: Intel Research and Development Ireland Ltd.

On Thu, Apr 19, 2018 at 01:26:29PM +0100, Anatoly Burakov wrote:
> The original implementation used flock() locks, but was later
> switched to using fcntl() locks for page locking, because
> fcntl() locks allow locking parts of a file, which is useful
> for single-file segments mode, where locking the entire file
> isn't as useful because we still need to grow and shrink it.
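For reference, the partial-file locking mentioned above can be sketched as follows, assuming a single segment file in which page N occupies bytes [N * page_sz, (N + 1) * page_sz). lock_page() and page_locked_elsewhere() are hypothetical helpers written for illustration, not DPDK functions. Because a process never conflicts with its own fcntl() locks, the check has to run in a forked child.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: place (or, with F_UNLCK, drop) a non-blocking fcntl()
 * record lock covering exactly one page of a single-file segment, assuming
 * page seg_idx lives at offset seg_idx * page_sz.
 * Returns 0 on success, -1 on error. */
static int lock_page(int fd, int seg_idx, off_t page_sz, short type)
{
	assert(seg_idx >= 0 && page_sz > 0);
	struct flock fl = {
		.l_type = type,
		.l_whence = SEEK_SET,
		.l_start = (off_t)seg_idx * page_sz,
		.l_len = page_sz,
	};
	return fcntl(fd, F_SETLK, &fl);
}

/* Hypothetical helper: a process never conflicts with its own fcntl() locks,
 * so the query runs in a forked child. Returns 1 if another process holds a
 * lock overlapping page seg_idx, 0 if the page is free, -1 on error. */
static int page_locked_elsewhere(const char *path, int seg_idx, off_t page_sz)
{
	pid_t pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		struct flock fl = {
			.l_type = F_WRLCK, /* conflicts with any other process's lock */
			.l_whence = SEEK_SET,
			.l_start = (off_t)seg_idx * page_sz,
			.l_len = page_sz,
		};
		int fd = open(path, O_RDWR);
		if (fd < 0 || fcntl(fd, F_GETLK, &fl) < 0)
			_exit(2);
		_exit(fl.l_type == F_UNLCK ? 0 : 1);
	}
	int status;
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
	    WEXITSTATUS(status) > 1)
		return -1;
	return WEXITSTATUS(status);
}
```

With a 2 MB page size, a read lock on page 1 is visible to the child for page 1 only; pages 0 and 2 stay free, so other processes can still lock them independently, which is what made fcntl() attractive for single-file segments in the first place.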
>
> However, according to the fcntl() Ubuntu manpage [1], the semantics of
> fcntl() locks have a giant oversight:
>
>   This interface follows the completely stupid semantics of System
>   V and IEEE Std 1003.1-1988 (“POSIX.1”) that require that all
>   locks associated with a file for a given process are removed
>   when any file descriptor for that file is closed by that process.
>   This semantic means that applications must be aware of any files
>   that a subroutine library may access.
>
> Basically, closing *any* fd that refers to a locked file (which we do,
> because we don't want to leak fds) drops all of the process's fcntl()
> locks on that file, not just the one taken through that fd.
>
> So, in this commit, we revert to using flock() locks everywhere.
> However, that still leaves the problem of locking parts of a memseg
> list file in single-file segments mode; we solve it by creating a
> separate lock file for each page and tracking those with flock().
>
> We also replace all of the tailq business with a simple array: saving
> a few bytes is not worth the extra hassle of dealing with pointers and
> potential memory allocation failures. The tailq lock is removed as
> well, since it is not needed: these fd lists are per-process, and
> within a given process, only one thread ever handles access to
> hugetlbfs.
>
> So, the first process to allocate a segment creates a lockfile and
> puts a shared lock on it. When shrinking the page file, we try to take
> out an exclusive lock on that lockfile, which fails if any other
> process is holding onto the lockfile as well. This way, we know
> whether the segment file can be shrunk. Also, if no other locks are
> found in the lock list for a given memseg list, the memseg list fd is
> closed automatically.
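The close() pitfall quoted above can be reproduced with a small sketch (Linux/POSIX assumed; the helper names are hypothetical and the path is whatever the caller passes). It takes a shared lock through one fd, optionally opens and closes a second, unrelated fd for the same file, and then asks a forked child whether the lock is still visible. With fcntl() the lock disappears; with flock(), which ties the lock to the open file description, it survives.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Ask a forked child whether another process holds a lock on path that
 * conflicts with an exclusive request; use_flock selects flock() or fcntl().
 * Returns 1 if locked, 0 if free, -1 on error. */
static int locked_elsewhere(const char *path, int use_flock)
{
	pid_t pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* a fresh open() gives the child its own open file description */
		int fd = open(path, O_RDWR);
		if (fd < 0)
			_exit(2);
		if (use_flock)
			_exit(flock(fd, LOCK_EX | LOCK_NB) == 0 ? 0 : 1);
		struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
		if (fcntl(fd, F_GETLK, &fl) < 0)
			_exit(2);
		_exit(fl.l_type == F_UNLCK ? 0 : 1);
	}
	int status;
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
	    WEXITSTATUS(status) > 1)
		return -1;
	return WEXITSTATUS(status);
}

/* Take a shared lock on path through fd_a, then (if close_other is set) open
 * and close a second, unrelated fd for the same file, the way library code
 * might. Returns 1 if the lock is still visible to other processes,
 * 0 if it is not, -1 on error. */
static int lock_held_after_close(const char *path, int use_flock,
				 int close_other)
{
	int fd_a = open(path, O_CREAT | O_RDWR, 0600);
	if (fd_a < 0)
		return -1;
	/* l_start = 0 and l_len = 0 cover the whole file */
	struct flock fl = { .l_type = F_RDLCK, .l_whence = SEEK_SET };
	int rc = use_flock ? flock(fd_a, LOCK_SH) : fcntl(fd_a, F_SETLK, &fl);
	if (rc < 0) {
		close(fd_a);
		return -1;
	}
	if (close_other) {
		int fd_b = open(path, O_RDONLY);
		/* with fcntl(), this close() drops every lock the process holds
		 * on path, including the one taken through fd_a */
		if (fd_b >= 0)
			close(fd_b);
	}
	int ret = locked_elsewhere(path, use_flock);
	close(fd_a);
	return ret;
}
```

Under these assumptions, lock_held_after_close(path, 0, 0) returns 1 but lock_held_after_close(path, 0, 1) returns 0, while the flock() variant returns 1 in both cases.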
>
> One other thing to note is that, according to the flock() Ubuntu
> manpage [2], upgrading a lock from shared to exclusive is implemented
> by dropping and reacquiring the lock, which is not atomic and thus
> would create race conditions. So, before performing operations in
> hugetlbfs, we take out an exclusive lock on the hugetlbfs directory,
> so that only one process at a time can perform hugetlbfs operations.
>
> [1] http://manpages.ubuntu.com/manpages/artful/en/man2/fcntl.2freebsd.html
> [2] http://manpages.ubuntu.com/manpages/bionic/en/man2/flock.2.html
>
> Fixes: 66cc45e293ed ("mem: replace memseg with memseg lists")
> Fixes: 582bed1e1d1d ("mem: support mapping hugepages at runtime")
> Fixes: a5ff05d60fc5 ("mem: support unmapping pages at runtime")
> Fixes: 2a04139f66b4 ("eal: add single file segments option")
> Cc: anatoly.burakov@intel.com
>
> Signed-off-by: Anatoly Burakov
> ---
> +
> +static int get_lockfile(int list_idx, int seg_idx)

Minor nit: I'm not sure about this name, since it returns an fd rather
than a filename or file path. How about get_lock_fd()?

> +{
> +	char path[PATH_MAX] = {0};
> +	int fd;
> +
> +	if (list_idx < 0 || list_idx >= (int)RTE_DIM(lock_fds))
> +		return -1;
> +	if (seg_idx < 0 || seg_idx >= lock_fds[list_idx].len)
> +		return -1;
> +
> +	fd = lock_fds[list_idx].fds[seg_idx];
> +	/* does this lock already exist? */
> +	if (fd >= 0)
> +		return fd;
> +
> +	eal_get_hugefile_lock_path(path, sizeof(path), list_idx, seg_idx);
> +
> +	fd = open(path, O_CREAT | O_RDWR, 0660);
> +	if (fd < 0) {
> +		RTE_LOG(ERR, EAL, "%s(): error creating lockfile '%s': %s\n",
> +			__func__, path, strerror(errno));
> +		return -1;
> +	}
> +	/* take out a read lock */
> +	if (lock(fd, LOCK_SH) != 1) {
> +		RTE_LOG(ERR, EAL, "%s(): failed to take out a readlock on '%s': %s\n",
> +			__func__, path, strerror(errno));
> +		close(fd);
> +		return -1;
> +	}
> +	/* store it for future reference */
> +	lock_fds[list_idx].fds[seg_idx] = fd;
> +	lock_fds[list_idx].count++;
> +	return fd;
> +}
> +
> +static int put_lockfile(int list_idx, int seg_idx)

This name doesn't really tell me much. I realise that "put" is the
opposite of "get", but if we get a file descriptor from the previous
function, it's not exactly something we need to "put back". Can you
come up with a more descriptive name here?

> +{
> +	int fd, ret;
> +
> +	if (list_idx < 0 || list_idx >= (int)RTE_DIM(lock_fds))
> +		return -1;
> +	if (seg_idx < 0 || seg_idx >= lock_fds[list_idx].len)
> +		return -1;
> +
> +	fd = lock_fds[list_idx].fds[seg_idx];
> +
> +	/* upgrade lock to exclusive to see if we can remove the lockfile */
> +	ret = lock(fd, LOCK_EX);
> +	if (ret == 1) {
> +		/* we've succeeded in taking exclusive lock, this lockfile may
> +		 * be removed.
> +		 */
> +		char path[PATH_MAX] = {0};
> +		eal_get_hugefile_lock_path(path, sizeof(path), list_idx,
> +				seg_idx);
> +		if (unlink(path)) {
> +			RTE_LOG(ERR, EAL, "%s(): error removing lockfile '%s': %s\n",
> +					__func__, path, strerror(errno));
> +		}
> +	}
> +	/* we don't want to leak the fd, so even if we fail to lock, close fd
> +	 * and remove it from list anyway.
> +	 */
> +	close(fd);
> +	lock_fds[list_idx].fds[seg_idx] = -1;
> +	lock_fds[list_idx].count--;
> +
> +	if (ret < 0)
> +		return -1;
> +	return 0;
> +}
> +
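The shared/exclusive scheme implemented by get_lockfile() and put_lockfile() can be reduced to the following sketch, assuming lockfile paths are passed in directly rather than derived from list and segment indices. try_flock(), take_shared_lock_fd() and release_lock_fd() are hypothetical names chosen for illustration, not the patch's functions; the tri-state return (1 = locked, 0 = would block, -1 = error) only mirrors the way the quoted code compares lock() against 1. Because flock() locks belong to open file descriptions, two independent open() calls in one process conflict just as two processes would, which makes the behaviour easy to exercise without fork().

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Non-blocking flock(): 1 if the lock was taken, 0 if another open file
 * description holds a conflicting lock, -1 on any other error. */
static int try_flock(int fd, int op)
{
	if (flock(fd, op | LOCK_NB) == 0)
		return 1;
	return errno == EWOULDBLOCK ? 0 : -1;
}

/* Open (creating if needed) a per-segment lockfile and hold a shared lock
 * on it for as long as the segment is in use. Returns the fd or -1. */
static int take_shared_lock_fd(const char *path)
{
	int fd = open(path, O_CREAT | O_RDWR, 0660);
	if (fd < 0)
		return -1;
	if (try_flock(fd, LOCK_SH) != 1) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Drop our interest in a segment. If an exclusive lock can be taken, nobody
 * else holds the lockfile, so it is unlinked. The fd is closed either way.
 * Returns 1 if the lockfile was removed, 0 if others still hold it, -1 on
 * error. flock() conversions are not atomic (the shared lock is released
 * before the exclusive one is attempted), so a failed upgrade may leave us
 * holding no lock at all; that is harmless here only because the fd is
 * closed immediately afterwards. */
static int release_lock_fd(int fd, const char *path)
{
	int ret = try_flock(fd, LOCK_EX);
	if (ret == 1 && unlink(path) < 0)
		ret = -1;
	close(fd);
	return ret;
}
```

Opening the same lockfile twice and releasing one fd leaves the file in place, since the other fd still holds a shared lock; releasing the last fd unlinks it. The sketch does nothing about another process opening the path between the exclusive check and unlink(); per the commit message, that window is meant to be covered by the exclusive lock on the hugetlbfs directory.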