* [dpdk-dev] [PATCH 1/8] fbarray: fix detach in noshconf mode
2018-08-23 16:59 [dpdk-dev] [PATCH 0/8] Improve running DPDK without hugetlbfs mounpoint Anatoly Burakov
@ 2018-08-23 16:59 ` Anatoly Burakov
2018-08-23 16:59 ` [dpdk-dev] [PATCH 2/8] eal: don't allow legacy mode with in-memory mode Anatoly Burakov
` (16 subsequent siblings)
17 siblings, 0 replies; 45+ messages in thread
From: Anatoly Burakov @ 2018-08-23 16:59 UTC (permalink / raw)
To: dev
Cc: tiwei.bie, ray.kinsella, zhihong.wang, maxime.coquelin,
kuralamudhan.ramakrishnan, stable
In noshconf mode, no shared files are created, but we're still trying
to unlink them, resulting in detach/destroy failure even though it
should have succeeded. Fix it by exiting early in noshconf mode.
Fixes: 3ee2cde248a7 ("fbarray: support --no-shconf mode")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
lib/librte_eal/common/eal_common_fbarray.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/lib/librte_eal/common/eal_common_fbarray.c b/lib/librte_eal/common/eal_common_fbarray.c
index 43caf3ced..ba6c4ae39 100644
--- a/lib/librte_eal/common/eal_common_fbarray.c
+++ b/lib/librte_eal/common/eal_common_fbarray.c
@@ -878,6 +878,10 @@ rte_fbarray_destroy(struct rte_fbarray *arr)
if (ret)
return ret;
+ /* with no shconf, there were never any files to begin with */
+ if (internal_config.no_shconf)
+ return 0;
+
/* try deleting the file */
eal_get_fbarray_path(path, sizeof(path), arr->name);
--
2.17.1
^ permalink raw reply [flat|nested] 45+ messages in thread
* [dpdk-dev] [PATCH 2/8] eal: don't allow legacy mode with in-memory mode
2018-08-23 16:59 [dpdk-dev] [PATCH 0/8] Improve running DPDK without hugetlbfs mounpoint Anatoly Burakov
2018-08-23 16:59 ` [dpdk-dev] [PATCH 1/8] fbarray: fix detach in noshconf mode Anatoly Burakov
@ 2018-08-23 16:59 ` Anatoly Burakov
2018-08-23 16:59 ` [dpdk-dev] [PATCH 3/8] mem: raise maximum fd limit unconditionally Anatoly Burakov
` (15 subsequent siblings)
17 siblings, 0 replies; 45+ messages in thread
From: Anatoly Burakov @ 2018-08-23 16:59 UTC (permalink / raw)
To: dev
Cc: tiwei.bie, ray.kinsella, zhihong.wang, maxime.coquelin,
kuralamudhan.ramakrishnan, stable
In-memory mode was never meant to support legacy mode, because we
cannot sort anonymous pages anyway.
Fixes: 72b49ff623c4 ("mem: support --in-memory mode")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
lib/librte_eal/common/eal_common_options.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index dd5f97402..873099acc 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -1390,6 +1390,12 @@ eal_check_common_options(struct internal_config *internal_cfg)
"--"OPT_HUGE_UNLINK"\n");
return -1;
}
+ if (internal_cfg->legacy_mem &&
+ internal_cfg->in_memory) {
+ RTE_LOG(ERR, EAL, "Option --"OPT_LEGACY_MEM" is not compatible "
+ "with --"OPT_IN_MEMORY"\n");
+ return -1;
+ }
return 0;
}
--
2.17.1
^ permalink raw reply [flat|nested] 45+ messages in thread
* [dpdk-dev] [PATCH 3/8] mem: raise maximum fd limit unconditionally
2018-08-23 16:59 [dpdk-dev] [PATCH 0/8] Improve running DPDK without hugetlbfs mounpoint Anatoly Burakov
2018-08-23 16:59 ` [dpdk-dev] [PATCH 1/8] fbarray: fix detach in noshconf mode Anatoly Burakov
2018-08-23 16:59 ` [dpdk-dev] [PATCH 2/8] eal: don't allow legacy mode with in-memory mode Anatoly Burakov
@ 2018-08-23 16:59 ` Anatoly Burakov
2018-08-23 16:59 ` [dpdk-dev] [PATCH 4/8] memalloc: rename lock list to fd list Anatoly Burakov
` (14 subsequent siblings)
17 siblings, 0 replies; 45+ messages in thread
From: Anatoly Burakov @ 2018-08-23 16:59 UTC (permalink / raw)
To: dev
Cc: tiwei.bie, ray.kinsella, zhihong.wang, maxime.coquelin,
kuralamudhan.ramakrishnan
Previously, when we allocated hugepages, we closed the fd's corresponding
to them after we've done our mappings. Since we did mmap(), we didn't
actually lose the reference, but file descriptors used for mmap() do not
count against the fd limit. Since we are going to store all of our fd's,
we will hit the fd limit much more often when using smaller page sizes.
Fix this to raise the fd limit to maximum unconditionally.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
lib/librte_eal/linuxapp/eal/eal_memory.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index dbf19499e..dfb537f59 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -17,6 +17,7 @@
#include <sys/stat.h>
#include <sys/queue.h>
#include <sys/file.h>
+#include <sys/resource.h>
#include <unistd.h>
#include <limits.h>
#include <sys/ioctl.h>
@@ -2204,6 +2205,25 @@ memseg_secondary_init(void)
int
rte_eal_memseg_init(void)
{
+ /* increase rlimit to maximum */
+ struct rlimit lim;
+
+ if (getrlimit(RLIMIT_NOFILE, &lim) == 0) {
+ /* set limit to maximum */
+ lim.rlim_cur = lim.rlim_max;
+
+ if (setrlimit(RLIMIT_NOFILE, &lim) < 0) {
+ RTE_LOG(DEBUG, EAL, "Setting maximum number of open files failed: %s\n",
+ strerror(errno));
+ } else {
+ RTE_LOG(DEBUG, EAL, "Setting maximum number of open files to %"
+ PRIu64 "\n",
+ (uint64_t)lim.rlim_cur);
+ }
+ } else {
+ RTE_LOG(ERR, EAL, "Cannot get current resource limits\n");
+ }
+
return rte_eal_process_type() == RTE_PROC_PRIMARY ?
#ifndef RTE_ARCH_64
memseg_primary_init_32() :
--
2.17.1
^ permalink raw reply [flat|nested] 45+ messages in thread
* [dpdk-dev] [PATCH 4/8] memalloc: rename lock list to fd list
2018-08-23 16:59 [dpdk-dev] [PATCH 0/8] Improve running DPDK without hugetlbfs mounpoint Anatoly Burakov
` (2 preceding siblings ...)
2018-08-23 16:59 ` [dpdk-dev] [PATCH 3/8] mem: raise maximum fd limit unconditionally Anatoly Burakov
@ 2018-08-23 16:59 ` Anatoly Burakov
2018-08-23 16:59 ` [dpdk-dev] [PATCH 5/8] memalloc: track page fd's in non-single file mode Anatoly Burakov
` (13 subsequent siblings)
17 siblings, 0 replies; 45+ messages in thread
From: Anatoly Burakov @ 2018-08-23 16:59 UTC (permalink / raw)
To: dev
Cc: tiwei.bie, ray.kinsella, zhihong.wang, maxime.coquelin,
kuralamudhan.ramakrishnan
Previously, we were only using lock lists to store per-page lock fd's
because we cannot use modern fcntl() file description locks to lock
parts of the page in single file segments mode.
Now, we will be using this list to store either lock fd's (along with
memseg list fd) in single file segments mode, or per-page fd's (and set
memseg list fd to -1), so rename the list accordingly.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 66 ++++++++++++----------
1 file changed, 37 insertions(+), 29 deletions(-)
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index aa95551a8..14bc5dce9 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -57,25 +57,33 @@ const int anonymous_hugepages_supported =
*/
static int fallocate_supported = -1; /* unknown */
-/* for single-file segments, we need some kind of mechanism to keep track of
+/*
+ * we have two modes - single file segments, and file-per-page mode.
+ *
+ * for single-file segments, we need some kind of mechanism to keep track of
* which hugepages can be freed back to the system, and which cannot. we cannot
* use flock() because they don't allow locking parts of a file, and we cannot
* use fcntl() due to issues with their semantics, so we will have to rely on a
- * bunch of lockfiles for each page.
+ * bunch of lockfiles for each page. so, we will use 'fds' array to keep track
+ * of per-page lockfiles. we will store the actual segment list fd in the
+ * 'memseg_list_fd' field.
+ *
+ * for file-per-page mode, each page will have its own fd, so 'memseg_list_fd'
+ * will be invalid (set to -1), and we'll use 'fds' to keep track of page fd's.
*
* we cannot know how many pages a system will have in advance, but we do know
* that they come in lists, and we know lengths of these lists. so, simply store
* a malloc'd array of fd's indexed by list and segment index.
*
* they will be initialized at startup, and filled as we allocate/deallocate
- * segments. also, use this to track memseg list proper fd.
+ * segments.
*/
static struct {
int *fds; /**< dynamically allocated array of segment lock fd's */
int memseg_list_fd; /**< memseg list fd */
int len; /**< total length of the array */
int count; /**< entries used in an array */
-} lock_fds[RTE_MAX_MEMSEG_LISTS];
+} fd_list[RTE_MAX_MEMSEG_LISTS];
/** local copy of a memory map, used to synchronize memory hotplug in MP */
static struct rte_memseg_list local_memsegs[RTE_MAX_MEMSEG_LISTS];
@@ -209,12 +217,12 @@ static int get_segment_lock_fd(int list_idx, int seg_idx)
char path[PATH_MAX] = {0};
int fd;
- if (list_idx < 0 || list_idx >= (int)RTE_DIM(lock_fds))
+ if (list_idx < 0 || list_idx >= (int)RTE_DIM(fd_list))
return -1;
- if (seg_idx < 0 || seg_idx >= lock_fds[list_idx].len)
+ if (seg_idx < 0 || seg_idx >= fd_list[list_idx].len)
return -1;
- fd = lock_fds[list_idx].fds[seg_idx];
+ fd = fd_list[list_idx].fds[seg_idx];
/* does this lock already exist? */
if (fd >= 0)
return fd;
@@ -236,8 +244,8 @@ static int get_segment_lock_fd(int list_idx, int seg_idx)
return -1;
}
/* store it for future reference */
- lock_fds[list_idx].fds[seg_idx] = fd;
- lock_fds[list_idx].count++;
+ fd_list[list_idx].fds[seg_idx] = fd;
+ fd_list[list_idx].count++;
return fd;
}
@@ -245,12 +253,12 @@ static int unlock_segment(int list_idx, int seg_idx)
{
int fd, ret;
- if (list_idx < 0 || list_idx >= (int)RTE_DIM(lock_fds))
+ if (list_idx < 0 || list_idx >= (int)RTE_DIM(fd_list))
return -1;
- if (seg_idx < 0 || seg_idx >= lock_fds[list_idx].len)
+ if (seg_idx < 0 || seg_idx >= fd_list[list_idx].len)
return -1;
- fd = lock_fds[list_idx].fds[seg_idx];
+ fd = fd_list[list_idx].fds[seg_idx];
/* upgrade lock to exclusive to see if we can remove the lockfile */
ret = lock(fd, LOCK_EX);
@@ -270,8 +278,8 @@ static int unlock_segment(int list_idx, int seg_idx)
* and remove it from list anyway.
*/
close(fd);
- lock_fds[list_idx].fds[seg_idx] = -1;
- lock_fds[list_idx].count--;
+ fd_list[list_idx].fds[seg_idx] = -1;
+ fd_list[list_idx].count--;
if (ret < 0)
return -1;
@@ -288,7 +296,7 @@ get_seg_fd(char *path, int buflen, struct hugepage_info *hi,
/* create a hugepage file path */
eal_get_hugefile_path(path, buflen, hi->hugedir, list_idx);
- fd = lock_fds[list_idx].memseg_list_fd;
+ fd = fd_list[list_idx].memseg_list_fd;
if (fd < 0) {
fd = open(path, O_CREAT | O_RDWR, 0600);
@@ -304,7 +312,7 @@ get_seg_fd(char *path, int buflen, struct hugepage_info *hi,
close(fd);
return -1;
}
- lock_fds[list_idx].memseg_list_fd = fd;
+ fd_list[list_idx].memseg_list_fd = fd;
}
} else {
/* create a hugepage file path */
@@ -410,9 +418,9 @@ resize_hugefile(int fd, char *path, int list_idx, int seg_idx,
* page file fd, so that one of the processes
* could then delete the file after shrinking.
*/
- if (ret < 1 && lock_fds[list_idx].count == 0) {
+ if (ret < 1 && fd_list[list_idx].count == 0) {
close(fd);
- lock_fds[list_idx].memseg_list_fd = -1;
+ fd_list[list_idx].memseg_list_fd = -1;
}
if (ret < 0) {
@@ -448,13 +456,13 @@ resize_hugefile(int fd, char *path, int list_idx, int seg_idx,
* more segments active in this segment list,
* and remove the file if there aren't.
*/
- if (lock_fds[list_idx].count == 0) {
+ if (fd_list[list_idx].count == 0) {
if (unlink(path))
RTE_LOG(ERR, EAL, "%s(): unlinking '%s' failed: %s\n",
__func__, path,
strerror(errno));
close(fd);
- lock_fds[list_idx].memseg_list_fd = -1;
+ fd_list[list_idx].memseg_list_fd = -1;
}
}
}
@@ -1319,7 +1327,7 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
}
static int
-secondary_lock_list_create_walk(const struct rte_memseg_list *msl,
+fd_list_create_walk(const struct rte_memseg_list *msl,
void *arg __rte_unused)
{
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
@@ -1330,20 +1338,20 @@ secondary_lock_list_create_walk(const struct rte_memseg_list *msl,
msl_idx = msl - mcfg->memsegs;
len = msl->memseg_arr.len;
- /* ensure we have space to store lock fd per each possible segment */
+ /* ensure we have space to store fd per each possible segment */
data = malloc(sizeof(int) * len);
if (data == NULL) {
- RTE_LOG(ERR, EAL, "Unable to allocate space for lock descriptors\n");
+ RTE_LOG(ERR, EAL, "Unable to allocate space for file descriptors\n");
return -1;
}
/* set all fd's as invalid */
for (i = 0; i < len; i++)
data[i] = -1;
- lock_fds[msl_idx].fds = data;
- lock_fds[msl_idx].len = len;
- lock_fds[msl_idx].count = 0;
- lock_fds[msl_idx].memseg_list_fd = -1;
+ fd_list[msl_idx].fds = data;
+ fd_list[msl_idx].len = len;
+ fd_list[msl_idx].count = 0;
+ fd_list[msl_idx].memseg_list_fd = -1;
return 0;
}
@@ -1355,9 +1363,9 @@ eal_memalloc_init(void)
if (rte_memseg_list_walk(secondary_msl_create_walk, NULL) < 0)
return -1;
- /* initialize all of the lock fd lists */
+ /* initialize all of the fd lists */
if (internal_config.single_file_segments)
- if (rte_memseg_list_walk(secondary_lock_list_create_walk, NULL))
+ if (rte_memseg_list_walk(fd_list_create_walk, NULL))
return -1;
return 0;
}
--
2.17.1
^ permalink raw reply [flat|nested] 45+ messages in thread
* [dpdk-dev] [PATCH 5/8] memalloc: track page fd's in non-single file mode
2018-08-23 16:59 [dpdk-dev] [PATCH 0/8] Improve running DPDK without hugetlbfs mounpoint Anatoly Burakov
` (3 preceding siblings ...)
2018-08-23 16:59 ` [dpdk-dev] [PATCH 4/8] memalloc: rename lock list to fd list Anatoly Burakov
@ 2018-08-23 16:59 ` Anatoly Burakov
2018-08-23 16:59 ` [dpdk-dev] [PATCH 6/8] memalloc: add EAL-internal API to get and set segment fd's Anatoly Burakov
` (12 subsequent siblings)
17 siblings, 0 replies; 45+ messages in thread
From: Anatoly Burakov @ 2018-08-23 16:59 UTC (permalink / raw)
To: dev
Cc: tiwei.bie, ray.kinsella, zhihong.wang, maxime.coquelin,
kuralamudhan.ramakrishnan
Previously, we were only tracking lock file fd's in single-file
segments mode, but did not track fd's in non-single file mode
because we didn't need to (mmap() call still kept the lock). Now
that we are going to expose these fd's to the world, we need to
have access to them, so track them even in non-single file
segments mode.
We don't need to close fd's after mmap() because we're still
tracking them in an fd list. Also, for anonymous hugepages mode,
fd will always be -1 so exit early on error.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 44 ++++++++++++----------
1 file changed, 25 insertions(+), 19 deletions(-)
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 14bc5dce9..7d536350e 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -318,18 +318,24 @@ get_seg_fd(char *path, int buflen, struct hugepage_info *hi,
/* create a hugepage file path */
eal_get_hugefile_path(path, buflen, hi->hugedir,
list_idx * RTE_MAX_MEMSEG_PER_LIST + seg_idx);
- fd = open(path, O_CREAT | O_RDWR, 0600);
+
+ fd = fd_list[list_idx].fds[seg_idx];
+
if (fd < 0) {
- RTE_LOG(DEBUG, EAL, "%s(): open failed: %s\n", __func__,
- strerror(errno));
- return -1;
- }
- /* take out a read lock */
- if (lock(fd, LOCK_SH) < 0) {
- RTE_LOG(ERR, EAL, "%s(): lock failed: %s\n",
- __func__, strerror(errno));
- close(fd);
- return -1;
+ fd = open(path, O_CREAT | O_RDWR, 0600);
+ if (fd < 0) {
+ RTE_LOG(DEBUG, EAL, "%s(): open failed: %s\n",
+ __func__, strerror(errno));
+ return -1;
+ }
+ /* take out a read lock */
+ if (lock(fd, LOCK_SH) < 0) {
+ RTE_LOG(ERR, EAL, "%s(): lock failed: %s\n",
+ __func__, strerror(errno));
+ close(fd);
+ return -1;
+ }
+ fd_list[list_idx].fds[seg_idx] = fd;
}
}
return fd;
@@ -601,10 +607,6 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
goto mapped;
}
#endif
- /* for non-single file segments that aren't in-memory, we can close fd
- * here */
- if (!internal_config.single_file_segments && !internal_config.in_memory)
- close(fd);
ms->addr = addr;
ms->hugepage_sz = alloc_sz;
@@ -634,7 +636,10 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
RTE_LOG(CRIT, EAL, "Can't mmap holes in our virtual address space\n");
}
resized:
- /* in-memory mode will never be single-file-segments mode */
+ /* some codepaths will return negative fd, so exit early */
+ if (fd < 0)
+ return -1;
+
if (internal_config.single_file_segments) {
resize_hugefile(fd, path, list_idx, seg_idx, map_offset,
alloc_sz, false);
@@ -646,6 +651,7 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
lock(fd, LOCK_EX) == 1)
unlink(path);
close(fd);
+ fd_list[list_idx].fds[seg_idx] = -1;
}
return -1;
}
@@ -700,6 +706,7 @@ free_seg(struct rte_memseg *ms, struct hugepage_info *hi,
}
/* closing fd will drop the lock */
close(fd);
+ fd_list[list_idx].fds[seg_idx] = -1;
}
memset(ms, 0, sizeof(*ms));
@@ -1364,8 +1371,7 @@ eal_memalloc_init(void)
return -1;
/* initialize all of the fd lists */
- if (internal_config.single_file_segments)
- if (rte_memseg_list_walk(fd_list_create_walk, NULL))
- return -1;
+ if (rte_memseg_list_walk(fd_list_create_walk, NULL))
+ return -1;
return 0;
}
--
2.17.1
^ permalink raw reply [flat|nested] 45+ messages in thread
* [dpdk-dev] [PATCH 6/8] memalloc: add EAL-internal API to get and set segment fd's
2018-08-23 16:59 [dpdk-dev] [PATCH 0/8] Improve running DPDK without hugetlbfs mounpoint Anatoly Burakov
` (4 preceding siblings ...)
2018-08-23 16:59 ` [dpdk-dev] [PATCH 5/8] memalloc: track page fd's in non-single file mode Anatoly Burakov
@ 2018-08-23 16:59 ` Anatoly Burakov
2018-08-23 16:59 ` [dpdk-dev] [PATCH 7/8] mem: add external API to retrieve page fd from EAL Anatoly Burakov
` (11 subsequent siblings)
17 siblings, 0 replies; 45+ messages in thread
From: Anatoly Burakov @ 2018-08-23 16:59 UTC (permalink / raw)
To: dev
Cc: Bruce Richardson, tiwei.bie, ray.kinsella, zhihong.wang,
maxime.coquelin, kuralamudhan.ramakrishnan
Enable setting and retrieving segment fd's internally.
For now, retrieving fd's will not be used anywhere until we
get an external API, but it will be useful for things like
virtio, where we wish to share segment fd's.
Setting segment fd's will not be available as a public API
at this time, but internally it is needed for legacy mode,
because we're not allocating our hugepages in memalloc in
legacy mode case, and we still need to store the fd.
Another user of get segment fd API is memseg info dump, to
show which pages use which fd's.
Not supported on FreeBSD.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
lib/librte_eal/bsdapp/eal/eal_memalloc.c | 12 +++++
lib/librte_eal/common/eal_common_memory.c | 8 +--
lib/librte_eal/common/eal_memalloc.h | 6 +++
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 60 +++++++++++++++++-----
lib/librte_eal/linuxapp/eal/eal_memory.c | 40 ++++++++++++---
5 files changed, 105 insertions(+), 21 deletions(-)
diff --git a/lib/librte_eal/bsdapp/eal/eal_memalloc.c b/lib/librte_eal/bsdapp/eal/eal_memalloc.c
index f7f07abd6..a5fb09f71 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memalloc.c
@@ -47,6 +47,18 @@ eal_memalloc_sync_with_primary(void)
return -1;
}
+int
+eal_memalloc_get_seg_fd(int list_idx, int seg_idx)
+{
+ return -1;
+}
+
+int
+eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd)
+{
+ return -1;
+}
+
int
eal_memalloc_init(void)
{
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index fbfb1b055..034c2026a 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -294,7 +294,7 @@ dump_memseg(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
void *arg)
{
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
- int msl_idx, ms_idx;
+ int msl_idx, ms_idx, fd;
FILE *f = arg;
msl_idx = msl - mcfg->memsegs;
@@ -305,10 +305,11 @@ dump_memseg(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
if (ms_idx < 0)
return -1;
+ fd = eal_memalloc_get_seg_fd(msl_idx, ms_idx);
fprintf(f, "Segment %i-%i: IOVA:0x%"PRIx64", len:%zu, "
"virt:%p, socket_id:%"PRId32", "
"hugepage_sz:%"PRIu64", nchannel:%"PRIx32", "
- "nrank:%"PRIx32"\n",
+ "nrank:%"PRIx32" fd:%i\n",
msl_idx, ms_idx,
ms->iova,
ms->len,
@@ -316,7 +317,8 @@ dump_memseg(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
ms->socket_id,
ms->hugepage_sz,
ms->nchannel,
- ms->nrank);
+ ms->nrank,
+ fd);
return 0;
}
diff --git a/lib/librte_eal/common/eal_memalloc.h b/lib/librte_eal/common/eal_memalloc.h
index 36bb1a027..a46c69c72 100644
--- a/lib/librte_eal/common/eal_memalloc.h
+++ b/lib/librte_eal/common/eal_memalloc.h
@@ -76,6 +76,12 @@ eal_memalloc_mem_alloc_validator_unregister(const char *name, int socket_id);
int
eal_memalloc_mem_alloc_validate(int socket_id, size_t new_len);
+int
+eal_memalloc_get_seg_fd(int list_idx, int seg_idx);
+
+int
+eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd);
+
int
eal_memalloc_init(void);
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 7d536350e..b820989e9 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1334,16 +1334,10 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
}
static int
-fd_list_create_walk(const struct rte_memseg_list *msl,
- void *arg __rte_unused)
+alloc_list(int list_idx, int len)
{
- struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
- unsigned int i, len;
- int msl_idx;
int *data;
-
- msl_idx = msl - mcfg->memsegs;
- len = msl->memseg_arr.len;
+ int i;
/* ensure we have space to store fd per each possible segment */
data = malloc(sizeof(int) * len);
@@ -1355,14 +1349,56 @@ fd_list_create_walk(const struct rte_memseg_list *msl,
for (i = 0; i < len; i++)
data[i] = -1;
- fd_list[msl_idx].fds = data;
- fd_list[msl_idx].len = len;
- fd_list[msl_idx].count = 0;
- fd_list[msl_idx].memseg_list_fd = -1;
+ fd_list[list_idx].fds = data;
+ fd_list[list_idx].len = len;
+ fd_list[list_idx].count = 0;
+ fd_list[list_idx].memseg_list_fd = -1;
return 0;
}
+static int
+fd_list_create_walk(const struct rte_memseg_list *msl,
+ void *arg __rte_unused)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ unsigned int len;
+ int msl_idx;
+
+ msl_idx = msl - mcfg->memsegs;
+ len = msl->memseg_arr.len;
+
+ return alloc_list(msl_idx, len);
+}
+
+int
+eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+
+ /* if list is not allocated, allocate it */
+ if (fd_list[list_idx].len == 0) {
+ int len = mcfg->memsegs[list_idx].memseg_arr.len;
+
+ if (alloc_list(list_idx, len) < 0)
+ return -1;
+ }
+ fd_list[list_idx].fds[seg_idx] = fd;
+
+ return 0;
+}
+
+int
+eal_memalloc_get_seg_fd(int list_idx, int seg_idx)
+{
+ if (internal_config.single_file_segments)
+ return fd_list[list_idx].memseg_list_fd;
+ /* list not initialized */
+ if (fd_list[list_idx].len == 0)
+ return -1;
+ return fd_list[list_idx].fds[seg_idx];
+}
+
int
eal_memalloc_init(void)
{
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index dfb537f59..ace14c877 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -772,7 +772,8 @@ remap_segment(struct hugepage_file *hugepages, int seg_start, int seg_end)
rte_fbarray_set_used(arr, ms_idx);
- close(fd);
+ /* store segment fd internally */
+ eal_memalloc_set_seg_fd(msl_idx, ms_idx, fd);
}
RTE_LOG(DEBUG, EAL, "Allocated %" PRIu64 "M on socket %i\n",
(seg_len * page_sz) >> 20, socket_id);
@@ -1771,6 +1772,7 @@ getFileSize(int fd)
static int
eal_legacy_hugepage_attach(void)
{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
struct hugepage_file *hp = NULL;
unsigned int num_hp = 0;
unsigned int i = 0;
@@ -1814,6 +1816,9 @@ eal_legacy_hugepage_attach(void)
struct hugepage_file *hf = &hp[i];
size_t map_sz = hf->size;
void *map_addr = hf->final_va;
+ int msl_idx, ms_idx;
+ struct rte_memseg_list *msl;
+ struct rte_memseg *ms;
/* if size is zero, no more pages left */
if (map_sz == 0)
@@ -1831,25 +1836,48 @@ eal_legacy_hugepage_attach(void)
if (map_addr == MAP_FAILED) {
RTE_LOG(ERR, EAL, "Could not map %s: %s\n",
hf->filepath, strerror(errno));
- close(fd);
- goto error;
+ goto fd_error;
}
/* set shared lock on the file. */
if (flock(fd, LOCK_SH) < 0) {
RTE_LOG(DEBUG, EAL, "%s(): Locking file failed: %s\n",
__func__, strerror(errno));
- close(fd);
- goto error;
+ goto fd_error;
}
- close(fd);
+ /* find segment data */
+ msl = rte_mem_virt2memseg_list(map_addr);
+ if (msl == NULL) {
+ RTE_LOG(DEBUG, EAL, "%s(): Cannot find memseg list\n",
+ __func__);
+ goto fd_error;
+ }
+ ms = rte_mem_virt2memseg(map_addr, msl);
+ if (ms == NULL) {
+ RTE_LOG(DEBUG, EAL, "%s(): Cannot find memseg\n",
+ __func__);
+ goto fd_error;
+ }
+
+ msl_idx = msl - mcfg->memsegs;
+ ms_idx = rte_fbarray_find_idx(&msl->memseg_arr, ms);
+ if (ms_idx < 0) {
+ RTE_LOG(DEBUG, EAL, "%s(): Cannot find memseg idx\n",
+ __func__);
+ goto fd_error;
+ }
+
+ /* store segment fd internally */
+ eal_memalloc_set_seg_fd(msl_idx, ms_idx, fd);
}
/* unmap the hugepage config file, since we are done using it */
munmap(hp, size);
close(fd_hugepage);
return 0;
+fd_error:
+ close(fd);
error:
/* map all segments into memory to make sure we get the addrs */
cur_seg = 0;
--
2.17.1
^ permalink raw reply [flat|nested] 45+ messages in thread
* [dpdk-dev] [PATCH 7/8] mem: add external API to retrieve page fd from EAL
2018-08-23 16:59 [dpdk-dev] [PATCH 0/8] Improve running DPDK without hugetlbfs mounpoint Anatoly Burakov
` (5 preceding siblings ...)
2018-08-23 16:59 ` [dpdk-dev] [PATCH 6/8] memalloc: add EAL-internal API to get and set segment fd's Anatoly Burakov
@ 2018-08-23 16:59 ` Anatoly Burakov
2018-08-23 16:59 ` [dpdk-dev] [PATCH 8/8] mem: support using memfd segments for in-memory mode Anatoly Burakov
` (10 subsequent siblings)
17 siblings, 0 replies; 45+ messages in thread
From: Anatoly Burakov @ 2018-08-23 16:59 UTC (permalink / raw)
To: dev
Cc: tiwei.bie, ray.kinsella, zhihong.wang, maxime.coquelin,
kuralamudhan.ramakrishnan
Now that we can retrieve page fd's internally, we can expose it
as an external API. This will add two flavors of API - thread-safe
and non-thread-safe.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
lib/librte_eal/common/eal_common_memory.c | 47 ++++++++++++++++++++++
lib/librte_eal/common/include/rte_memory.h | 44 ++++++++++++++++++++
lib/librte_eal/rte_eal_version.map | 2 +
3 files changed, 93 insertions(+)
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 034c2026a..cd3805b46 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -550,6 +550,53 @@ rte_memseg_list_walk(rte_memseg_list_walk_t func, void *arg)
return ret;
}
+int __rte_experimental
+rte_memseg_get_fd_thread_unsafe(const struct rte_memseg *ms)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ struct rte_memseg_list *msl;
+ struct rte_fbarray *arr;
+ int msl_idx, seg_idx, fd;
+
+ if (ms == NULL) {
+ rte_errno = EINVAL;
+ return -1;
+ }
+
+ msl = rte_mem_virt2memseg_list(ms->addr);
+ if (msl == NULL) {
+ rte_errno = EINVAL;
+ return -1;
+ }
+ arr = &msl->memseg_arr;
+
+ msl_idx = msl - mcfg->memsegs;
+ seg_idx = rte_fbarray_find_idx(arr, ms);
+
+ if (!rte_fbarray_is_used(arr, seg_idx)) {
+ rte_errno = ENOENT;
+ return -1;
+ }
+
+ fd = eal_memalloc_get_seg_fd(msl_idx, seg_idx);
+ if (fd < 0)
+ rte_errno = ENOENT;
+ return fd;
+}
+
+int __rte_experimental
+rte_memseg_get_fd(const struct rte_memseg *ms)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ int ret;
+
+ rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+ ret = rte_memseg_get_fd_thread_unsafe(ms);
+ rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+
+ return ret;
+}
+
/* init memory subsystem */
int
rte_eal_memory_init(void)
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index c4b7f4cff..4d075b1cc 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -317,6 +317,50 @@ rte_memseg_contig_walk_thread_unsafe(rte_memseg_contig_walk_t func, void *arg);
int __rte_experimental
rte_memseg_list_walk_thread_unsafe(rte_memseg_list_walk_t func, void *arg);
+/**
+ * Return file descriptor associated with a particular memseg (if available).
+ *
+ * @note This function read-locks the memory hotplug subsystem, and thus cannot
+ * be used within memory-related callback functions.
+ *
+ * @note This returns the internal file descriptor - no operations should ever
+ * be performed on it, and it should be treated as read-only for all
+ * intents and purposes.
+ *
+ * @param ms
+ * A pointer to memseg for which to get file descriptor.
+ *
+ * @return
+ * Valid file descriptor in case of success.
+ * -1 in case of error, with ``rte_errno`` set to the following values:
+ * - EINVAL - ``ms`` pointer was NULL or did not point to a valid memseg
+ * - ENOENT - ``ms`` pointed to an unused segment, or fd is not available
+ */
+int __rte_experimental
+rte_memseg_get_fd(const struct rte_memseg *ms);
+
+/**
+ * Return file descriptor associated with a particular memseg (if available).
+ *
+ * @note This function does not perform any locking, and is only safe to call
+ * from within memory-related callback functions.
+ *
+ * @note This returns the internal file descriptor - no operations should ever
+ * be performed on it, and it should be treated as read-only for all
+ * intents and purposes.
+ *
+ * @param ms
+ * A pointer to memseg for which to get file descriptor.
+ *
+ * @return
+ * Valid file descriptor in case of success.
+ * -1 in case of error, with ``rte_errno`` set to the following values:
+ * - EINVAL - ``ms`` pointer was NULL or did not point to a valid memseg
+ * - ENOENT - ``ms`` pointed to an unused segment, or fd is not available
+ */
+int __rte_experimental
+rte_memseg_get_fd_thread_unsafe(const struct rte_memseg *ms);
+
/**
* Dump the physical memory layout to a file.
*
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 344a43d32..e659e19d6 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -320,6 +320,8 @@ EXPERIMENTAL {
rte_mem_virt2memseg_list;
rte_memseg_contig_walk;
rte_memseg_contig_walk_thread_unsafe;
+ rte_memseg_get_fd;
+ rte_memseg_get_fd_thread_unsafe;
rte_memseg_list_walk;
rte_memseg_list_walk_thread_unsafe;
rte_memseg_walk;
--
2.17.1
^ permalink raw reply [flat|nested] 45+ messages in thread
* [dpdk-dev] [PATCH 8/8] mem: support using memfd segments for in-memory mode
2018-08-23 16:59 [dpdk-dev] [PATCH 0/8] Improve running DPDK without hugetlbfs mounpoint Anatoly Burakov
` (6 preceding siblings ...)
2018-08-23 16:59 ` [dpdk-dev] [PATCH 7/8] mem: add external API to retrieve page fd from EAL Anatoly Burakov
@ 2018-08-23 16:59 ` Anatoly Burakov
2018-08-24 4:39 ` Jerin Jacob
2018-09-04 15:01 ` [dpdk-dev] [PATCH v2 0/9] Improve running DPDK without hugetlbfs mounpoint Anatoly Burakov
` (9 subsequent siblings)
17 siblings, 1 reply; 45+ messages in thread
From: Anatoly Burakov @ 2018-08-23 16:59 UTC (permalink / raw)
To: dev
Cc: tiwei.bie, ray.kinsella, zhihong.wang, maxime.coquelin,
kuralamudhan.ramakrishnan
Enable using memfd-created segments if supported by the system.
This will allow having real fd's for pages but without hugetlbfs
mounts, which will enable in-memory mode to be used with virtio.
The implementation is mostly piggy-backing on existing real-fd
code, except that we no longer need to unlink any files or track
per-page locks in single-file segments mode, because in-memory
mode does not support secondary processes anyway.
We move some checks from EAL command-line parsing code to memalloc
because it is now possible to use single-file segments mode with
in-memory mode, but only if memfd is supported.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
lib/librte_eal/common/eal_common_options.c | 6 +-
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 265 ++++++++++++++++++---
2 files changed, 235 insertions(+), 36 deletions(-)
diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index 873099acc..ddd624110 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -1384,10 +1384,10 @@ eal_check_common_options(struct internal_config *internal_cfg)
" is only supported in non-legacy memory mode\n");
}
if (internal_cfg->single_file_segments &&
- internal_cfg->hugepage_unlink) {
+ internal_cfg->hugepage_unlink &&
+ !internal_cfg->in_memory) {
RTE_LOG(ERR, EAL, "Option --"OPT_SINGLE_FILE_SEGMENTS" is "
- "not compatible with neither --"OPT_IN_MEMORY" nor "
- "--"OPT_HUGE_UNLINK"\n");
+ "not compatible with --"OPT_HUGE_UNLINK"\n");
return -1;
}
if (internal_cfg->legacy_mem &&
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index b820989e9..50379136c 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -51,6 +51,23 @@ const int anonymous_hugepages_supported =
#define RTE_MAP_HUGE_SHIFT 26
#endif
+/*
+ * we don't actually care if memfd itself is supported - we only need to check
+ * if memfd supports hugetlbfs, as that already implies memfd support.
+ *
+ * also, this is not a constant, because while we may be *compiled* with memfd
+ * hugetlbfs support, we might not be *running* on a system that supports memfd
+ * and/or memfd with hugetlbfs, so we need to be able to adjust this flag at
+ * runtime, and fall back to anonymous memory.
+ */
+int memfd_create_supported =
+#ifdef MFD_HUGETLB
+#define MEMFD_SUPPORTED
+ 1;
+#else
+ 0;
+#endif
+
/*
* not all kernel version support fallocate on hugetlbfs, so fall back to
* ftruncate and disallow deallocation if fallocate is not supported.
@@ -190,6 +207,31 @@ get_file_size(int fd)
return st.st_size;
}
+static inline uint32_t
+bsf64(uint64_t v)
+{
+ return (uint32_t)__builtin_ctzll(v);
+}
+
+static inline uint32_t
+log2_u64(uint64_t v)
+{
+ if (v == 0)
+ return 0;
+ v = rte_align64pow2(v);
+ return bsf64(v);
+}
+
+static int
+pagesz_flags(uint64_t page_sz)
+{
+ /* as per mmap() manpage, all page sizes are log2 of page size
+ * shifted by MAP_HUGE_SHIFT
+ */
+ int log2 = log2_u64(page_sz);
+ return log2 << RTE_MAP_HUGE_SHIFT;
+}
+
/* returns 1 on successful lock, 0 on unsuccessful lock, -1 on error */
static int lock(int fd, int type)
{
@@ -286,12 +328,64 @@ static int unlock_segment(int list_idx, int seg_idx)
return 0;
}
+static int
+get_seg_memfd(struct hugepage_info *hi __rte_unused,
+ unsigned int list_idx __rte_unused,
+ unsigned int seg_idx __rte_unused)
+{
+#ifdef MEMFD_SUPPORTED
+ int fd;
+ char segname[250]; /* as per manpage, limit is 249 bytes plus null */
+
+ if (internal_config.single_file_segments) {
+ fd = fd_list[list_idx].memseg_list_fd;
+
+ if (fd < 0) {
+ int flags = MFD_HUGETLB | pagesz_flags(hi->hugepage_sz);
+
+ snprintf(segname, sizeof(segname), "seg_%i", list_idx);
+ fd = memfd_create(segname, flags);
+ if (fd < 0) {
+ RTE_LOG(DEBUG, EAL, "%s(): memfd create failed: %s\n",
+ __func__, strerror(errno));
+ return -1;
+ }
+ fd_list[list_idx].memseg_list_fd = fd;
+ }
+ } else {
+ fd = fd_list[list_idx].fds[seg_idx];
+
+ if (fd < 0) {
+ int flags = MFD_HUGETLB | pagesz_flags(hi->hugepage_sz);
+
+ snprintf(segname, sizeof(segname), "seg_%i-%i",
+ list_idx, seg_idx);
+ fd = memfd_create(segname, flags);
+ if (fd < 0) {
+ RTE_LOG(DEBUG, EAL, "%s(): memfd create failed: %s\n",
+ __func__, strerror(errno));
+ return -1;
+ }
+ fd_list[list_idx].fds[seg_idx] = fd;
+ }
+ }
+ return fd;
+#endif
+ return -1;
+}
+
static int
get_seg_fd(char *path, int buflen, struct hugepage_info *hi,
unsigned int list_idx, unsigned int seg_idx)
{
int fd;
+ /* for in-memory mode, we only make it here when we're sure we support
+ * memfd, and this is a special case.
+ */
+ if (internal_config.in_memory)
+ return get_seg_memfd(hi, list_idx, seg_idx);
+
if (internal_config.single_file_segments) {
/* create a hugepage file path */
eal_get_hugefile_path(path, buflen, hi->hugedir, list_idx);
@@ -346,6 +440,33 @@ resize_hugefile(int fd, char *path, int list_idx, int seg_idx,
uint64_t fa_offset, uint64_t page_sz, bool grow)
{
bool again = false;
+
+ /* in-memory mode is a special case, because we don't need to perform
+ * any locking, and we can be sure that fallocate() is supported.
+ */
+ if (internal_config.in_memory) {
+ int flags = grow ? 0 : FALLOC_FL_PUNCH_HOLE |
+ FALLOC_FL_KEEP_SIZE;
+ int ret;
+
+ /* grow or shrink the file */
+ ret = fallocate(fd, flags, fa_offset, page_sz);
+
+ if (ret < 0) {
+ RTE_LOG(DEBUG, EAL, "%s(): fallocate() failed: %s\n",
+ __func__,
+ strerror(errno));
+ return -1;
+ }
+ /* increase/decrease total segment count */
+ fd_list[list_idx].count += (grow ? 1 : -1);
+ if (!grow && fd_list[list_idx].count == 0) {
+ close(fd_list[list_idx].memseg_list_fd);
+ fd_list[list_idx].memseg_list_fd = -1;
+ }
+ return 0;
+ }
+
do {
if (fallocate_supported == 0) {
/* we cannot deallocate memory if fallocate() is not
@@ -495,26 +616,34 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
void *new_addr;
alloc_sz = hi->hugepage_sz;
- if (!internal_config.single_file_segments &&
- internal_config.in_memory &&
- anonymous_hugepages_supported) {
- int log2, flags;
-
- log2 = rte_log2_u32(alloc_sz);
- /* as per mmap() manpage, all page sizes are log2 of page size
- * shifted by MAP_HUGE_SHIFT
- */
- flags = (log2 << RTE_MAP_HUGE_SHIFT) | MAP_HUGETLB | MAP_FIXED |
+
+ /* these are checked at init, but code analyzers don't know that */
+ if (internal_config.in_memory && !anonymous_hugepages_supported) {
+ RTE_LOG(ERR, EAL, "Anonymous hugepages not supported, in-memory mode cannot allocate memory\n");
+ return -1;
+ }
+ if (internal_config.in_memory && !memfd_create_supported &&
+ internal_config.single_file_segments) {
+ RTE_LOG(ERR, EAL, "Single-file segments are not supported without memfd support\n");
+ return -1;
+ }
+
+ /* in-memory without memfd is a special case */
+ int mmap_flags;
+
+ if (internal_config.in_memory && !memfd_create_supported) {
+ int pagesz_flag, flags;
+
+ pagesz_flag = pagesz_flags(alloc_sz);
+ flags = pagesz_flag | MAP_HUGETLB | MAP_FIXED |
MAP_PRIVATE | MAP_ANONYMOUS;
fd = -1;
- va = mmap(addr, alloc_sz, PROT_READ | PROT_WRITE, flags, -1, 0);
+ mmap_flags = flags;
- /* single-file segments codepath will never be active because
- * in-memory mode is incompatible with it and it's stopped at
- * EAL initialization stage, however the compiler doesn't know
- * that and complains about map_offset being used uninitialized
- * on failure codepaths while having in-memory mode enabled. so,
- * assign a value here.
+ /* single-file segments codepath will never be active
+ * here because in-memory mode is incompatible with the
+ * fallback path, and it's stopped at EAL initialization
+ * stage.
*/
map_offset = 0;
} else {
@@ -538,7 +667,8 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
__func__, strerror(errno));
goto resized;
}
- if (internal_config.hugepage_unlink) {
+ if (internal_config.hugepage_unlink &&
+ !internal_config.in_memory) {
if (unlink(path)) {
RTE_LOG(DEBUG, EAL, "%s(): unlink() failed: %s\n",
__func__, strerror(errno));
@@ -546,16 +676,16 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
}
}
}
-
- /*
- * map the segment, and populate page tables, the kernel fills
- * this segment with zeros if it's a new page.
- */
- va = mmap(addr, alloc_sz, PROT_READ | PROT_WRITE,
- MAP_SHARED | MAP_POPULATE | MAP_FIXED, fd,
- map_offset);
+ mmap_flags = MAP_SHARED | MAP_POPULATE | MAP_FIXED;
}
+ /*
+ * map the segment, and populate page tables, the kernel fills
+ * this segment with zeros if it's a new page.
+ */
+ va = mmap(addr, alloc_sz, PROT_READ | PROT_WRITE, mmap_flags, fd,
+ map_offset);
+
if (va == MAP_FAILED) {
RTE_LOG(DEBUG, EAL, "%s(): mmap() failed: %s\n", __func__,
strerror(errno));
@@ -662,7 +792,8 @@ free_seg(struct rte_memseg *ms, struct hugepage_info *hi,
{
uint64_t map_offset;
char path[PATH_MAX];
- int fd, ret;
+ int fd, ret = 0;
+ bool exit_early;
/* erase page data */
memset(ms->addr, 0, ms->len);
@@ -674,8 +805,17 @@ free_seg(struct rte_memseg *ms, struct hugepage_info *hi,
return -1;
}
+ exit_early = false;
+
+ /* if we're using anonymous hugepages, nothing to be done */
+ if (internal_config.in_memory && !memfd_create_supported)
+ exit_early = true;
+
/* if we've already unlinked the page, nothing needs to be done */
- if (internal_config.hugepage_unlink) {
+ if (!internal_config.in_memory && internal_config.hugepage_unlink)
+ exit_early = true;
+
+ if (exit_early) {
memset(ms, 0, sizeof(*ms));
return 0;
}
@@ -698,11 +838,13 @@ free_seg(struct rte_memseg *ms, struct hugepage_info *hi,
/* if we're able to take out a write lock, we're the last one
* holding onto this page.
*/
- ret = lock(fd, LOCK_EX);
- if (ret >= 0) {
- /* no one else is using this page */
- if (ret == 1)
- unlink(path);
+ if (!internal_config.in_memory) {
+ ret = lock(fd, LOCK_EX);
+ if (ret >= 0) {
+ /* no one else is using this page */
+ if (ret == 1)
+ unlink(path);
+ }
}
/* closing fd will drop the lock */
close(fd);
@@ -1399,12 +1541,69 @@ eal_memalloc_get_seg_fd(int list_idx, int seg_idx)
return fd_list[list_idx].fds[seg_idx];
}
+static int
+test_memfd_create(void)
+{
+#ifdef MEMFD_SUPPORTED
+ unsigned int i;
+ for (i = 0; i < internal_config.num_hugepage_sizes; i++) {
+ uint64_t pagesz = internal_config.hugepage_info[i].hugepage_sz;
+ int pagesz_flag = pagesz_flags(pagesz);
+ int flags;
+
+ flags = pagesz_flag | MFD_HUGETLB;
+ int fd = memfd_create("test", flags);
+ if (fd < 0) {
+ /* we failed - let memalloc know this isn't working */
+ if (errno == EINVAL) {
+ memfd_create_supported = 0;
+ return 0; /* not supported */
+ }
+
+ /* we got other error - something's wrong */
+ return -1; /* error */
+ }
+ close(fd);
+ return 1; /* supported */
+ }
+#endif
+ return 0; /* not supported */
+}
+
int
eal_memalloc_init(void)
{
if (rte_eal_process_type() == RTE_PROC_SECONDARY)
if (rte_memseg_list_walk(secondary_msl_create_walk, NULL) < 0)
return -1;
+ if (rte_eal_process_type() == RTE_PROC_PRIMARY &&
+ internal_config.in_memory) {
+ int mfd_res = test_memfd_create();
+
+ if (mfd_res < 0) {
+ RTE_LOG(ERR, EAL, "Unable to check if memfd is supported\n");
+ return -1;
+ }
+ if (mfd_res == 1)
+ RTE_LOG(DEBUG, EAL, "Using memfd for anonymous memory\n");
+ else
+ RTE_LOG(INFO, EAL, "Using memfd is not supported, falling back to anonymous hugepages\n");
+
+ /* we only support single-file segments mode with in-memory mode
+ * if we support hugetlbfs with memfd_create. this code will
+ * test if we do.
+ */
+ if (internal_config.single_file_segments &&
+ mfd_res != 1) {
+ RTE_LOG(ERR, EAL, "Single-file segments mode cannot be used without memfd support\n");
+ return -1;
+ }
+ /* this cannot ever happen but better safe than sorry */
+ if (!anonymous_hugepages_supported) {
+ RTE_LOG(ERR, EAL, "Using anonymous memory is not supported\n");
+ return -1;
+ }
+ }
/* initialize all of the fd lists */
if (rte_memseg_list_walk(fd_list_create_walk, NULL))
--
2.17.1
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [dpdk-dev] [PATCH 8/8] mem: support using memfd segments for in-memory mode
2018-08-23 16:59 ` [dpdk-dev] [PATCH 8/8] mem: support using memfd segments for in-memory mode Anatoly Burakov
@ 2018-08-24 4:39 ` Jerin Jacob
2018-08-24 8:56 ` Burakov, Anatoly
0 siblings, 1 reply; 45+ messages in thread
From: Jerin Jacob @ 2018-08-24 4:39 UTC (permalink / raw)
To: Anatoly Burakov
Cc: dev, tiwei.bie, ray.kinsella, zhihong.wang, maxime.coquelin,
kuralamudhan.ramakrishnan
-----Original Message-----
> Date: Thu, 23 Aug 2018 17:59:55 +0100
> From: Anatoly Burakov <anatoly.burakov@intel.com>
> To: dev@dpdk.org
> CC: tiwei.bie@intel.com, ray.kinsella@intel.com, zhihong.wang@intel.com,
> maxime.coquelin@redhat.com, kuralamudhan.ramakrishnan@intel.com
> Subject: [dpdk-dev] [PATCH 8/8] mem: support using memfd segments for
> in-memory mode
> X-Mailer: git-send-email 1.7.0.7
>
>
> Enable using memfd-created segments if supported by the system.
>
> This will allow having real fd's for pages but without hugetlbfs
> mounts, which will enable in-memory mode to be used with virtio.
>
> The implementation is mostly piggy-backing on existing real-fd
> code, except that we no longer need to unlink any files or track
> per-page locks in single-file segments mode, because in-memory
> mode does not support secondary processes anyway.
>
> We move some checks from EAL command-line parsing code to memalloc
> because it is now possible to use single-file segments mode with
> in-memory mode, but only if memfd is supported.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
> lib/librte_eal/common/eal_common_options.c | 6 +-
> lib/librte_eal/linuxapp/eal/eal_memalloc.c | 265 ++++++++++++++++++---
> 2 files changed, 235 insertions(+), 36 deletions(-)
>
>
> +static inline uint32_t
> +bsf64(uint64_t v)
> +{
> + return (uint32_t)__builtin_ctzll(v);
> +}
> +
> +static inline uint32_t
> +log2_u64(uint64_t v)
> +{
> + if (v == 0)
> + return 0;
> + v = rte_align64pow2(v);
> + return bsf64(v);
> +}
> +
Can we move this to lib/librte_eal/common/include/rte_common.h?
It has already rte_log2_u32()
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [dpdk-dev] [PATCH 8/8] mem: support using memfd segments for in-memory mode
2018-08-24 4:39 ` Jerin Jacob
@ 2018-08-24 8:56 ` Burakov, Anatoly
0 siblings, 0 replies; 45+ messages in thread
From: Burakov, Anatoly @ 2018-08-24 8:56 UTC (permalink / raw)
To: Jerin Jacob
Cc: dev, tiwei.bie, ray.kinsella, zhihong.wang, maxime.coquelin,
kuralamudhan.ramakrishnan
On 24-Aug-18 5:39 AM, Jerin Jacob wrote:
> -----Original Message-----
>> Date: Thu, 23 Aug 2018 17:59:55 +0100
>> From: Anatoly Burakov <anatoly.burakov@intel.com>
>> To: dev@dpdk.org
>> CC: tiwei.bie@intel.com, ray.kinsella@intel.com, zhihong.wang@intel.com,
>> maxime.coquelin@redhat.com, kuralamudhan.ramakrishnan@intel.com
>> Subject: [dpdk-dev] [PATCH 8/8] mem: support using memfd segments for
>> in-memory mode
>> X-Mailer: git-send-email 1.7.0.7
>>
>>
>> Enable using memfd-created segments if supported by the system.
>>
>> This will allow having real fd's for pages but without hugetlbfs
>> mounts, which will enable in-memory mode to be used with virtio.
>>
>> The implementation is mostly piggy-backing on existing real-fd
>> code, except that we no longer need to unlink any files or track
>> per-page locks in single-file segments mode, because in-memory
>> mode does not support secondary processes anyway.
>>
>> We move some checks from EAL command-line parsing code to memalloc
>> because it is now possible to use single-file segments mode with
>> in-memory mode, but only if memfd is supported.
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>> ---
>> lib/librte_eal/common/eal_common_options.c | 6 +-
>> lib/librte_eal/linuxapp/eal/eal_memalloc.c | 265 ++++++++++++++++++---
>> 2 files changed, 235 insertions(+), 36 deletions(-)
>>
>>
>> +static inline uint32_t
>> +bsf64(uint64_t v)
>> +{
>> + return (uint32_t)__builtin_ctzll(v);
>> +}
>> +
>> +static inline uint32_t
>> +log2_u64(uint64_t v)
>> +{
>> + if (v == 0)
>> + return 0;
>> + v = rte_align64pow2(v);
>> + return bsf64(v);
>> +}
>> +
>
> Can we move this to lib/librte_eal/common/include/rte_common.h?
> It has already rte_log2_u32()
>
>
We can (and i will), unfortunately we cannot do it in this release, due
to there already being an rte_bsf64 in rte_bitmap.h, and
changing/renaming that API will require a deprecation notice. So, this
change will be for 19.02.
--
Thanks,
Anatoly
^ permalink raw reply [flat|nested] 45+ messages in thread
* [dpdk-dev] [PATCH v2 0/9] Improve running DPDK without hugetlbfs mounpoint
2018-08-23 16:59 [dpdk-dev] [PATCH 0/8] Improve running DPDK without hugetlbfs mounpoint Anatoly Burakov
` (7 preceding siblings ...)
2018-08-23 16:59 ` [dpdk-dev] [PATCH 8/8] mem: support using memfd segments for in-memory mode Anatoly Burakov
@ 2018-09-04 15:01 ` Anatoly Burakov
2018-09-04 15:15 ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
` (9 more replies)
2018-09-04 15:01 ` [dpdk-dev] [PATCH v2 1/9] fbarray: fix detach in noshconf mode Anatoly Burakov
` (8 subsequent siblings)
17 siblings, 10 replies; 45+ messages in thread
From: Anatoly Burakov @ 2018-09-04 15:01 UTC (permalink / raw)
To: dev
Cc: tiwei.bie, ray.kinsella, zhihong.wang, maxime.coquelin,
kuralamudhan.ramakrishnan
This patchset further improves DPDK support for running
without hugetlbfs mountpoints.
First of all, it enables using memfd-created hugepages in
in-memory mode. This way, instead of anonymous hugepages, we
can get proper fd's for each page (or for the entire segment,
if we're using single-file segments). Memfd will be used
automatically if support for it was compiled and is available
at runtime, however DPDK will fall back to using anonymous
hugepages if such support is not available.
The other thing this patchset does is exposing segment fd's
through an external API. There is a lot of ugliness in current
virtio/vhost code that deals with finding hugepage files
through procfs, while all virtio really needs are fd's
referring to the pages, and their offsets. Using this API,
virtio will be able to access segment fd's directly, without
the procfs magic.
As a bonus, because we enabled use of memfd (given that
sufficiently recent kernel version is used), once virtio
support for getting segment fd's using the new API is
implemented, virtio will also be able to work without having
hugetlbfs mountpoints.
Virtio support is not provided in this patchset, coordination
and implementation of it is up to virtio maintainers.
Once virtio support for this is in place, DPDK will have one
less barrier for adoption in container space.
v1->v2:
- Added a new API to retrieve segment offset into its fd
Anatoly Burakov (9):
fbarray: fix detach in noshconf mode
eal: don't allow legacy mode with in-memory mode
mem: raise maximum fd limit unconditionally
memalloc: rename lock list to fd list
memalloc: track page fd's in non-single file mode
memalloc: add EAL-internal API to get and set segment fd's
mem: add external API to retrieve page fd from EAL
mem: allow querying offset into segment fd
mem: support using memfd segments for in-memory mode
lib/librte_eal/bsdapp/eal/eal_memalloc.c | 19 +
lib/librte_eal/common/eal_common_fbarray.c | 4 +
lib/librte_eal/common/eal_common_memory.c | 107 ++++-
lib/librte_eal/common/eal_common_options.c | 12 +-
lib/librte_eal/common/eal_memalloc.h | 11 +
lib/librte_eal/common/include/rte_memory.h | 97 +++++
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 449 +++++++++++++++++----
lib/librte_eal/linuxapp/eal/eal_memory.c | 64 ++-
lib/librte_eal/rte_eal_version.map | 4 +
9 files changed, 669 insertions(+), 98 deletions(-)
--
2.17.1
^ permalink raw reply [flat|nested] 45+ messages in thread
* [dpdk-dev] [PATCH v3 0/9] Improve running DPDK without hugetlbfs mounpoint
2018-09-04 15:01 ` [dpdk-dev] [PATCH v2 0/9] Improve running DPDK without hugetlbfs mounpoint Anatoly Burakov
@ 2018-09-04 15:15 ` Anatoly Burakov
2018-09-19 13:04 ` Thomas Monjalon
2018-09-04 15:15 ` [dpdk-dev] [PATCH v3 1/9] fbarray: fix detach in noshconf mode Anatoly Burakov
` (8 subsequent siblings)
9 siblings, 1 reply; 45+ messages in thread
From: Anatoly Burakov @ 2018-09-04 15:15 UTC (permalink / raw)
To: dev
Cc: tiwei.bie, ray.kinsella, zhihong.wang, maxime.coquelin,
kuralamudhan.ramakrishnan
This patchset further improves DPDK support for running
without hugetlbfs mountpoints.
First of all, it enables using memfd-created hugepages in
in-memory mode. This way, instead of anonymous hugepages, we
can get proper fd's for each page (or for the entire segment,
if we're using single-file segments). Memfd will be used
automatically if support for it was compiled and is available
at runtime, however DPDK will fall back to using anonymous
hugepages if such support is not available.
The other thing this patchset does is exposing segment fd's
through an external API. There is a lot of ugliness in current
virtio/vhost code that deals with finding hugepage files
through procfs, while all virtio really needs are fd's
referring to the pages, and their offsets. Using this API,
virtio will be able to access segment fd's directly, without
the procfs magic.
As a bonus, because we enabled use of memfd (given that
sufficiently recent kernel version is used), once virtio
support for getting segment fd's using the new API is
implemented, virtio will also be able to work without having
hugetlbfs mountpoints.
Virtio support is not provided in this patchset, coordination
and implementation of it is up to virtio maintainers.
Once virtio support for this is in place, DPDK will have one
less barrier for adoption in container space.
v2->v3:
- Fixed single file segments mode for fd offsets
v1->v2:
- Added a new API to retrieve segment offset into its fd
Anatoly Burakov (9):
fbarray: fix detach in noshconf mode
eal: don't allow legacy mode with in-memory mode
mem: raise maximum fd limit unconditionally
memalloc: rename lock list to fd list
memalloc: track page fd's in non-single file mode
memalloc: add EAL-internal API to get and set segment fd's
mem: add external API to retrieve page fd from EAL
mem: allow querying offset into segment fd
mem: support using memfd segments for in-memory mode
lib/librte_eal/bsdapp/eal/eal_memalloc.c | 19 +
lib/librte_eal/common/eal_common_fbarray.c | 4 +
lib/librte_eal/common/eal_common_memory.c | 107 ++++-
lib/librte_eal/common/eal_common_options.c | 12 +-
lib/librte_eal/common/eal_memalloc.h | 11 +
lib/librte_eal/common/include/rte_memory.h | 97 +++++
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 452 +++++++++++++++++----
lib/librte_eal/linuxapp/eal/eal_memory.c | 64 ++-
lib/librte_eal/rte_eal_version.map | 4 +
9 files changed, 672 insertions(+), 98 deletions(-)
--
2.17.1
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [dpdk-dev] [PATCH v3 0/9] Improve running DPDK without hugetlbfs mounpoint
2018-09-04 15:15 ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
@ 2018-09-19 13:04 ` Thomas Monjalon
2018-09-19 13:55 ` Burakov, Anatoly
0 siblings, 1 reply; 45+ messages in thread
From: Thomas Monjalon @ 2018-09-19 13:04 UTC (permalink / raw)
To: Anatoly Burakov
Cc: dev, tiwei.bie, ray.kinsella, zhihong.wang, maxime.coquelin,
kuralamudhan.ramakrishnan
> Anatoly Burakov (9):
> fbarray: fix detach in noshconf mode
> eal: don't allow legacy mode with in-memory mode
> mem: raise maximum fd limit unconditionally
> memalloc: rename lock list to fd list
> memalloc: track page fd's in non-single file mode
> memalloc: add EAL-internal API to get and set segment fd's
> mem: add external API to retrieve page fd from EAL
> mem: allow querying offset into segment fd
> mem: support using memfd segments for in-memory mode
Applied, tha nks
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [dpdk-dev] [PATCH v3 0/9] Improve running DPDK without hugetlbfs mounpoint
2018-09-19 13:04 ` Thomas Monjalon
@ 2018-09-19 13:55 ` Burakov, Anatoly
2018-09-19 14:15 ` Thomas Monjalon
0 siblings, 1 reply; 45+ messages in thread
From: Burakov, Anatoly @ 2018-09-19 13:55 UTC (permalink / raw)
To: Thomas Monjalon
Cc: dev, tiwei.bie, ray.kinsella, zhihong.wang, maxime.coquelin,
kuralamudhan.ramakrishnan
On 19-Sep-18 2:04 PM, Thomas Monjalon wrote:
>> Anatoly Burakov (9):
>> fbarray: fix detach in noshconf mode
>> eal: don't allow legacy mode with in-memory mode
>> mem: raise maximum fd limit unconditionally
>> memalloc: rename lock list to fd list
>> memalloc: track page fd's in non-single file mode
>> memalloc: add EAL-internal API to get and set segment fd's
>> mem: add external API to retrieve page fd from EAL
>> mem: allow querying offset into segment fd
>> mem: support using memfd segments for in-memory mode
>
> Applied, tha nks
>
You're welc ome!
--
Thanks,
Anatoly
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [dpdk-dev] [PATCH v3 0/9] Improve running DPDK without hugetlbfs mounpoint
2018-09-19 13:55 ` Burakov, Anatoly
@ 2018-09-19 14:15 ` Thomas Monjalon
0 siblings, 0 replies; 45+ messages in thread
From: Thomas Monjalon @ 2018-09-19 14:15 UTC (permalink / raw)
To: Burakov, Anatoly
Cc: dev, tiwei.bie, ray.kinsella, zhihong.wang, maxime.coquelin,
kuralamudhan.ramakrishnan
19/09/2018 15:55, Burakov, Anatoly:
> On 19-Sep-18 2:04 PM, Thomas Monjalon wrote:
> >> Anatoly Burakov (9):
> >> fbarray: fix detach in noshconf mode
> >> eal: don't allow legacy mode with in-memory mode
> >> mem: raise maximum fd limit unconditionally
> >> memalloc: rename lock list to fd list
> >> memalloc: track page fd's in non-single file mode
> >> memalloc: add EAL-internal API to get and set segment fd's
> >> mem: add external API to retrieve page fd from EAL
> >> mem: allow querying offset into segment fd
> >> mem: support using memfd segments for in-memory mode
> >
> > Applied, tha nks
> >
>
> You're welc ome!
On a french keyboard, "tab" is the key next to "a", so I may forgive myself.
But looking at "c" and "o" layout, I think your fingers are way bigger than mines :)
^ permalink raw reply [flat|nested] 45+ messages in thread
* [dpdk-dev] [PATCH v3 1/9] fbarray: fix detach in noshconf mode
2018-09-04 15:01 ` [dpdk-dev] [PATCH v2 0/9] Improve running DPDK without hugetlbfs mounpoint Anatoly Burakov
2018-09-04 15:15 ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
@ 2018-09-04 15:15 ` Anatoly Burakov
2018-09-13 13:00 ` Maxime Coquelin
2018-09-04 15:15 ` [dpdk-dev] [PATCH v3 2/9] eal: don't allow legacy mode with in-memory mode Anatoly Burakov
` (7 subsequent siblings)
9 siblings, 1 reply; 45+ messages in thread
From: Anatoly Burakov @ 2018-09-04 15:15 UTC (permalink / raw)
To: dev
Cc: tiwei.bie, ray.kinsella, zhihong.wang, maxime.coquelin,
kuralamudhan.ramakrishnan, stable
In noshconf mode, no shared files are created, but we're still trying
to unlink them, resulting in detach/destroy failure even though it
should have succeeded. Fix it by exiting early in noshconf mode.
Fixes: 3ee2cde248a7 ("fbarray: support --no-shconf mode")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
lib/librte_eal/common/eal_common_fbarray.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/lib/librte_eal/common/eal_common_fbarray.c b/lib/librte_eal/common/eal_common_fbarray.c
index 43caf3ced..ba6c4ae39 100644
--- a/lib/librte_eal/common/eal_common_fbarray.c
+++ b/lib/librte_eal/common/eal_common_fbarray.c
@@ -878,6 +878,10 @@ rte_fbarray_destroy(struct rte_fbarray *arr)
if (ret)
return ret;
+ /* with no shconf, there were never any files to begin with */
+ if (internal_config.no_shconf)
+ return 0;
+
/* try deleting the file */
eal_get_fbarray_path(path, sizeof(path), arr->name);
--
2.17.1
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [dpdk-dev] [PATCH v3 1/9] fbarray: fix detach in noshconf mode
2018-09-04 15:15 ` [dpdk-dev] [PATCH v3 1/9] fbarray: fix detach in noshconf mode Anatoly Burakov
@ 2018-09-13 13:00 ` Maxime Coquelin
0 siblings, 0 replies; 45+ messages in thread
From: Maxime Coquelin @ 2018-09-13 13:00 UTC (permalink / raw)
To: Anatoly Burakov, dev
Cc: tiwei.bie, ray.kinsella, zhihong.wang, kuralamudhan.ramakrishnan, stable
On 09/04/2018 05:15 PM, Anatoly Burakov wrote:
> In noshconf mode, no shared files are created, but we're still trying
> to unlink them, resulting in detach/destroy failure even though it
> should have succeeded. Fix it by exiting early in noshconf mode.
>
> Fixes: 3ee2cde248a7 ("fbarray: support --no-shconf mode")
> Cc: stable@dpdk.org
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
> lib/librte_eal/common/eal_common_fbarray.c | 4 ++++
> 1 file changed, 4 insertions(+)
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
^ permalink raw reply [flat|nested] 45+ messages in thread
* [dpdk-dev] [PATCH v3 2/9] eal: don't allow legacy mode with in-memory mode
2018-09-04 15:01 ` [dpdk-dev] [PATCH v2 0/9] Improve running DPDK without hugetlbfs mounpoint Anatoly Burakov
2018-09-04 15:15 ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
2018-09-04 15:15 ` [dpdk-dev] [PATCH v3 1/9] fbarray: fix detach in noshconf mode Anatoly Burakov
@ 2018-09-04 15:15 ` Anatoly Burakov
2018-09-13 13:06 ` Maxime Coquelin
2018-09-04 15:15 ` [dpdk-dev] [PATCH v3 3/9] mem: raise maximum fd limit unconditionally Anatoly Burakov
` (6 subsequent siblings)
9 siblings, 1 reply; 45+ messages in thread
From: Anatoly Burakov @ 2018-09-04 15:15 UTC (permalink / raw)
To: dev
Cc: tiwei.bie, ray.kinsella, zhihong.wang, maxime.coquelin,
kuralamudhan.ramakrishnan, stable
In-memory mode was never meant to support legacy mode, because we
cannot sort anonymous pages anyway.
Fixes: 72b49ff623c4 ("mem: support --in-memory mode")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
lib/librte_eal/common/eal_common_options.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index dd5f97402..873099acc 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -1390,6 +1390,12 @@ eal_check_common_options(struct internal_config *internal_cfg)
"--"OPT_HUGE_UNLINK"\n");
return -1;
}
+ if (internal_cfg->legacy_mem &&
+ internal_cfg->in_memory) {
+ RTE_LOG(ERR, EAL, "Option --"OPT_LEGACY_MEM" is not compatible "
+ "with --"OPT_IN_MEMORY"\n");
+ return -1;
+ }
return 0;
}
--
2.17.1
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [dpdk-dev] [PATCH v3 2/9] eal: don't allow legacy mode with in-memory mode
2018-09-04 15:15 ` [dpdk-dev] [PATCH v3 2/9] eal: don't allow legacy mode with in-memory mode Anatoly Burakov
@ 2018-09-13 13:06 ` Maxime Coquelin
2018-09-17 9:49 ` Burakov, Anatoly
0 siblings, 1 reply; 45+ messages in thread
From: Maxime Coquelin @ 2018-09-13 13:06 UTC (permalink / raw)
To: Anatoly Burakov, dev
Cc: tiwei.bie, ray.kinsella, zhihong.wang, kuralamudhan.ramakrishnan, stable
On 09/04/2018 05:15 PM, Anatoly Burakov wrote:
> In-memory mode was never meant to support legacy mode, because we
> cannot sort anonymous pages anyway.
>
> Fixes: 72b49ff623c4 ("mem: support --in-memory mode")
> Cc: stable@dpdk.org
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
> lib/librte_eal/common/eal_common_options.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
> index dd5f97402..873099acc 100644
> --- a/lib/librte_eal/common/eal_common_options.c
> +++ b/lib/librte_eal/common/eal_common_options.c
> @@ -1390,6 +1390,12 @@ eal_check_common_options(struct internal_config *internal_cfg)
> "--"OPT_HUGE_UNLINK"\n");
> return -1;
> }
> + if (internal_cfg->legacy_mem &&
> + internal_cfg->in_memory) {
> + RTE_LOG(ERR, EAL, "Option --"OPT_LEGACY_MEM" is not compatible "
> + "with --"OPT_IN_MEMORY"\n");
This is a general comment, as it is consistent with the style of the
file. I generally prefer not splitting error strings into multiple lines
even if it is longer than 80 chars, because it makes grepping for the
error string more difficult.
> + return -1;
> + }
>
> return 0;
> }
>
Other than that:
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thanks,
Maxime
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [dpdk-dev] [PATCH v3 2/9] eal: don't allow legacy mode with in-memory mode
2018-09-13 13:06 ` Maxime Coquelin
@ 2018-09-17 9:49 ` Burakov, Anatoly
0 siblings, 0 replies; 45+ messages in thread
From: Burakov, Anatoly @ 2018-09-17 9:49 UTC (permalink / raw)
To: Maxime Coquelin, dev
Cc: tiwei.bie, ray.kinsella, zhihong.wang, kuralamudhan.ramakrishnan, stable
On 13-Sep-18 2:06 PM, Maxime Coquelin wrote:
>
>
> On 09/04/2018 05:15 PM, Anatoly Burakov wrote:
>> In-memory mode was never meant to support legacy mode, because we
>> cannot sort anonymous pages anyway.
>>
>> Fixes: 72b49ff623c4 ("mem: support --in-memory mode")
>> Cc: stable@dpdk.org
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>> ---
>> lib/librte_eal/common/eal_common_options.c | 6 ++++++
>> 1 file changed, 6 insertions(+)
>>
>> diff --git a/lib/librte_eal/common/eal_common_options.c
>> b/lib/librte_eal/common/eal_common_options.c
>> index dd5f97402..873099acc 100644
>> --- a/lib/librte_eal/common/eal_common_options.c
>> +++ b/lib/librte_eal/common/eal_common_options.c
>> @@ -1390,6 +1390,12 @@ eal_check_common_options(struct internal_config
>> *internal_cfg)
>> "--"OPT_HUGE_UNLINK"\n");
>> return -1;
>> }
>> + if (internal_cfg->legacy_mem &&
>> + internal_cfg->in_memory) {
>> + RTE_LOG(ERR, EAL, "Option --"OPT_LEGACY_MEM" is not compatible "
>> + "with --"OPT_IN_MEMORY"\n");
>
> This is a general comment, as it is consistent with the style of the
> file. I generally prefer not splitting error strings into multiple lines
> even if it is longer than 80 chars, because it makes grepping for the
> error string more difficult.
I agree in general, however in this particular case the string is
ungreppable (it is a word now!) anyway because it's split into a few pieces.
>
>> + return -1;
>> + }
>> return 0;
>> }
>>
>
> Other than that:
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>
> Thanks,
> Maxime
>
--
Thanks,
Anatoly
^ permalink raw reply [flat|nested] 45+ messages in thread
* [dpdk-dev] [PATCH v3 3/9] mem: raise maximum fd limit unconditionally
2018-09-04 15:01 ` [dpdk-dev] [PATCH v2 0/9] Improve running DPDK without hugetlbfs mounpoint Anatoly Burakov
` (2 preceding siblings ...)
2018-09-04 15:15 ` [dpdk-dev] [PATCH v3 2/9] eal: don't allow legacy mode with in-memory mode Anatoly Burakov
@ 2018-09-04 15:15 ` Anatoly Burakov
2018-09-13 13:12 ` Maxime Coquelin
2018-09-04 15:15 ` [dpdk-dev] [PATCH v3 4/9] memalloc: rename lock list to fd list Anatoly Burakov
` (5 subsequent siblings)
9 siblings, 1 reply; 45+ messages in thread
From: Anatoly Burakov @ 2018-09-04 15:15 UTC (permalink / raw)
To: dev
Cc: tiwei.bie, ray.kinsella, zhihong.wang, maxime.coquelin,
kuralamudhan.ramakrishnan
Previously, when we allocated hugepages, we closed the fd's corresponding
to them after we've done our mappings. Since we did mmap(), we didn't
actually lose the reference, but file descriptors used for mmap() do not
count against the fd limit. Since we are going to store all of our fd's,
we will hit the fd limit much more often when using smaller page sizes.
Fix this to raise the fd limit to maximum unconditionally.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
lib/librte_eal/linuxapp/eal/eal_memory.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index dbf19499e..dfb537f59 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -17,6 +17,7 @@
#include <sys/stat.h>
#include <sys/queue.h>
#include <sys/file.h>
+#include <sys/resource.h>
#include <unistd.h>
#include <limits.h>
#include <sys/ioctl.h>
@@ -2204,6 +2205,25 @@ memseg_secondary_init(void)
int
rte_eal_memseg_init(void)
{
+ /* increase rlimit to maximum */
+ struct rlimit lim;
+
+ if (getrlimit(RLIMIT_NOFILE, &lim) == 0) {
+ /* set limit to maximum */
+ lim.rlim_cur = lim.rlim_max;
+
+ if (setrlimit(RLIMIT_NOFILE, &lim) < 0) {
+ RTE_LOG(DEBUG, EAL, "Setting maximum number of open files failed: %s\n",
+ strerror(errno));
+ } else {
+ RTE_LOG(DEBUG, EAL, "Setting maximum number of open files to %"
+ PRIu64 "\n",
+ (uint64_t)lim.rlim_cur);
+ }
+ } else {
+ RTE_LOG(ERR, EAL, "Cannot get current resource limits\n");
+ }
+
return rte_eal_process_type() == RTE_PROC_PRIMARY ?
#ifndef RTE_ARCH_64
memseg_primary_init_32() :
--
2.17.1
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [dpdk-dev] [PATCH v3 3/9] mem: raise maximum fd limit unconditionally
2018-09-04 15:15 ` [dpdk-dev] [PATCH v3 3/9] mem: raise maximum fd limit unconditionally Anatoly Burakov
@ 2018-09-13 13:12 ` Maxime Coquelin
0 siblings, 0 replies; 45+ messages in thread
From: Maxime Coquelin @ 2018-09-13 13:12 UTC (permalink / raw)
To: Anatoly Burakov, dev
Cc: tiwei.bie, ray.kinsella, zhihong.wang, kuralamudhan.ramakrishnan
On 09/04/2018 05:15 PM, Anatoly Burakov wrote:
> Previously, when we allocated hugepages, we closed the fd's corresponding
> to them after we've done our mappings. Since we did mmap(), we didn't
> actually lose the reference, but file descriptors used for mmap() do not
> count against the fd limit. Since we are going to store all of our fd's,
> we will hit the fd limit much more often when using smaller page sizes.
>
> Fix this to raise the fd limit to maximum unconditionally.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
> lib/librte_eal/linuxapp/eal/eal_memory.c | 20 ++++++++++++++++++++
> 1 file changed, 20 insertions(+)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index dbf19499e..dfb537f59 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -17,6 +17,7 @@
> #include <sys/stat.h>
> #include <sys/queue.h>
> #include <sys/file.h>
> +#include <sys/resource.h>
> #include <unistd.h>
> #include <limits.h>
> #include <sys/ioctl.h>
> @@ -2204,6 +2205,25 @@ memseg_secondary_init(void)
> int
> rte_eal_memseg_init(void)
> {
> + /* increase rlimit to maximum */
> + struct rlimit lim;
> +
> + if (getrlimit(RLIMIT_NOFILE, &lim) == 0) {
> + /* set limit to maximum */
> + lim.rlim_cur = lim.rlim_max;
> +
> + if (setrlimit(RLIMIT_NOFILE, &lim) < 0) {
> + RTE_LOG(DEBUG, EAL, "Setting maximum number of open files failed: %s\n",
> + strerror(errno));
> + } else {
> + RTE_LOG(DEBUG, EAL, "Setting maximum number of open files to %"
> + PRIu64 "\n",
> + (uint64_t)lim.rlim_cur);
> + }
> + } else {
> + RTE_LOG(ERR, EAL, "Cannot get current resource limits\n");
> + }
> +
> return rte_eal_process_type() == RTE_PROC_PRIMARY ?
> #ifndef RTE_ARCH_64
> memseg_primary_init_32() :
>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
^ permalink raw reply [flat|nested] 45+ messages in thread
* [dpdk-dev] [PATCH v3 4/9] memalloc: rename lock list to fd list
2018-09-04 15:01 ` [dpdk-dev] [PATCH v2 0/9] Improve running DPDK without hugetlbfs mounpoint Anatoly Burakov
` (3 preceding siblings ...)
2018-09-04 15:15 ` [dpdk-dev] [PATCH v3 3/9] mem: raise maximum fd limit unconditionally Anatoly Burakov
@ 2018-09-04 15:15 ` Anatoly Burakov
2018-09-13 15:19 ` Maxime Coquelin
2018-09-04 15:15 ` [dpdk-dev] [PATCH v3 5/9] memalloc: track page fd's in non-single file mode Anatoly Burakov
` (4 subsequent siblings)
9 siblings, 1 reply; 45+ messages in thread
From: Anatoly Burakov @ 2018-09-04 15:15 UTC (permalink / raw)
To: dev
Cc: tiwei.bie, ray.kinsella, zhihong.wang, maxime.coquelin,
kuralamudhan.ramakrishnan
Previously, we were only using lock lists to store per-page lock fd's
because we cannot use modern fcntl() file description locks to lock
parts of the page in single file segments mode.
Now, we will be using this list to store either lock fd's (along with
memseg list fd) in single file segments mode, or per-page fd's (and set
memseg list fd to -1), so rename the list accordingly.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 66 ++++++++++++----------
1 file changed, 37 insertions(+), 29 deletions(-)
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index aa95551a8..14bc5dce9 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -57,25 +57,33 @@ const int anonymous_hugepages_supported =
*/
static int fallocate_supported = -1; /* unknown */
-/* for single-file segments, we need some kind of mechanism to keep track of
+/*
+ * we have two modes - single file segments, and file-per-page mode.
+ *
+ * for single-file segments, we need some kind of mechanism to keep track of
* which hugepages can be freed back to the system, and which cannot. we cannot
* use flock() because they don't allow locking parts of a file, and we cannot
* use fcntl() due to issues with their semantics, so we will have to rely on a
- * bunch of lockfiles for each page.
+ * bunch of lockfiles for each page. so, we will use 'fds' array to keep track
+ * of per-page lockfiles. we will store the actual segment list fd in the
+ * 'memseg_list_fd' field.
+ *
+ * for file-per-page mode, each page will have its own fd, so 'memseg_list_fd'
+ * will be invalid (set to -1), and we'll use 'fds' to keep track of page fd's.
*
* we cannot know how many pages a system will have in advance, but we do know
* that they come in lists, and we know lengths of these lists. so, simply store
* a malloc'd array of fd's indexed by list and segment index.
*
* they will be initialized at startup, and filled as we allocate/deallocate
- * segments. also, use this to track memseg list proper fd.
+ * segments.
*/
static struct {
int *fds; /**< dynamically allocated array of segment lock fd's */
int memseg_list_fd; /**< memseg list fd */
int len; /**< total length of the array */
int count; /**< entries used in an array */
-} lock_fds[RTE_MAX_MEMSEG_LISTS];
+} fd_list[RTE_MAX_MEMSEG_LISTS];
/** local copy of a memory map, used to synchronize memory hotplug in MP */
static struct rte_memseg_list local_memsegs[RTE_MAX_MEMSEG_LISTS];
@@ -209,12 +217,12 @@ static int get_segment_lock_fd(int list_idx, int seg_idx)
char path[PATH_MAX] = {0};
int fd;
- if (list_idx < 0 || list_idx >= (int)RTE_DIM(lock_fds))
+ if (list_idx < 0 || list_idx >= (int)RTE_DIM(fd_list))
return -1;
- if (seg_idx < 0 || seg_idx >= lock_fds[list_idx].len)
+ if (seg_idx < 0 || seg_idx >= fd_list[list_idx].len)
return -1;
- fd = lock_fds[list_idx].fds[seg_idx];
+ fd = fd_list[list_idx].fds[seg_idx];
/* does this lock already exist? */
if (fd >= 0)
return fd;
@@ -236,8 +244,8 @@ static int get_segment_lock_fd(int list_idx, int seg_idx)
return -1;
}
/* store it for future reference */
- lock_fds[list_idx].fds[seg_idx] = fd;
- lock_fds[list_idx].count++;
+ fd_list[list_idx].fds[seg_idx] = fd;
+ fd_list[list_idx].count++;
return fd;
}
@@ -245,12 +253,12 @@ static int unlock_segment(int list_idx, int seg_idx)
{
int fd, ret;
- if (list_idx < 0 || list_idx >= (int)RTE_DIM(lock_fds))
+ if (list_idx < 0 || list_idx >= (int)RTE_DIM(fd_list))
return -1;
- if (seg_idx < 0 || seg_idx >= lock_fds[list_idx].len)
+ if (seg_idx < 0 || seg_idx >= fd_list[list_idx].len)
return -1;
- fd = lock_fds[list_idx].fds[seg_idx];
+ fd = fd_list[list_idx].fds[seg_idx];
/* upgrade lock to exclusive to see if we can remove the lockfile */
ret = lock(fd, LOCK_EX);
@@ -270,8 +278,8 @@ static int unlock_segment(int list_idx, int seg_idx)
* and remove it from list anyway.
*/
close(fd);
- lock_fds[list_idx].fds[seg_idx] = -1;
- lock_fds[list_idx].count--;
+ fd_list[list_idx].fds[seg_idx] = -1;
+ fd_list[list_idx].count--;
if (ret < 0)
return -1;
@@ -288,7 +296,7 @@ get_seg_fd(char *path, int buflen, struct hugepage_info *hi,
/* create a hugepage file path */
eal_get_hugefile_path(path, buflen, hi->hugedir, list_idx);
- fd = lock_fds[list_idx].memseg_list_fd;
+ fd = fd_list[list_idx].memseg_list_fd;
if (fd < 0) {
fd = open(path, O_CREAT | O_RDWR, 0600);
@@ -304,7 +312,7 @@ get_seg_fd(char *path, int buflen, struct hugepage_info *hi,
close(fd);
return -1;
}
- lock_fds[list_idx].memseg_list_fd = fd;
+ fd_list[list_idx].memseg_list_fd = fd;
}
} else {
/* create a hugepage file path */
@@ -410,9 +418,9 @@ resize_hugefile(int fd, char *path, int list_idx, int seg_idx,
* page file fd, so that one of the processes
* could then delete the file after shrinking.
*/
- if (ret < 1 && lock_fds[list_idx].count == 0) {
+ if (ret < 1 && fd_list[list_idx].count == 0) {
close(fd);
- lock_fds[list_idx].memseg_list_fd = -1;
+ fd_list[list_idx].memseg_list_fd = -1;
}
if (ret < 0) {
@@ -448,13 +456,13 @@ resize_hugefile(int fd, char *path, int list_idx, int seg_idx,
* more segments active in this segment list,
* and remove the file if there aren't.
*/
- if (lock_fds[list_idx].count == 0) {
+ if (fd_list[list_idx].count == 0) {
if (unlink(path))
RTE_LOG(ERR, EAL, "%s(): unlinking '%s' failed: %s\n",
__func__, path,
strerror(errno));
close(fd);
- lock_fds[list_idx].memseg_list_fd = -1;
+ fd_list[list_idx].memseg_list_fd = -1;
}
}
}
@@ -1319,7 +1327,7 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
}
static int
-secondary_lock_list_create_walk(const struct rte_memseg_list *msl,
+fd_list_create_walk(const struct rte_memseg_list *msl,
void *arg __rte_unused)
{
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
@@ -1330,20 +1338,20 @@ secondary_lock_list_create_walk(const struct rte_memseg_list *msl,
msl_idx = msl - mcfg->memsegs;
len = msl->memseg_arr.len;
- /* ensure we have space to store lock fd per each possible segment */
+ /* ensure we have space to store fd per each possible segment */
data = malloc(sizeof(int) * len);
if (data == NULL) {
- RTE_LOG(ERR, EAL, "Unable to allocate space for lock descriptors\n");
+ RTE_LOG(ERR, EAL, "Unable to allocate space for file descriptors\n");
return -1;
}
/* set all fd's as invalid */
for (i = 0; i < len; i++)
data[i] = -1;
- lock_fds[msl_idx].fds = data;
- lock_fds[msl_idx].len = len;
- lock_fds[msl_idx].count = 0;
- lock_fds[msl_idx].memseg_list_fd = -1;
+ fd_list[msl_idx].fds = data;
+ fd_list[msl_idx].len = len;
+ fd_list[msl_idx].count = 0;
+ fd_list[msl_idx].memseg_list_fd = -1;
return 0;
}
@@ -1355,9 +1363,9 @@ eal_memalloc_init(void)
if (rte_memseg_list_walk(secondary_msl_create_walk, NULL) < 0)
return -1;
- /* initialize all of the lock fd lists */
+ /* initialize all of the fd lists */
if (internal_config.single_file_segments)
- if (rte_memseg_list_walk(secondary_lock_list_create_walk, NULL))
+ if (rte_memseg_list_walk(fd_list_create_walk, NULL))
return -1;
return 0;
}
--
2.17.1
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [dpdk-dev] [PATCH v3 4/9] memalloc: rename lock list to fd list
2018-09-04 15:15 ` [dpdk-dev] [PATCH v3 4/9] memalloc: rename lock list to fd list Anatoly Burakov
@ 2018-09-13 15:19 ` Maxime Coquelin
0 siblings, 0 replies; 45+ messages in thread
From: Maxime Coquelin @ 2018-09-13 15:19 UTC (permalink / raw)
To: Anatoly Burakov, dev
Cc: tiwei.bie, ray.kinsella, zhihong.wang, kuralamudhan.ramakrishnan
On 09/04/2018 05:15 PM, Anatoly Burakov wrote:
> Previously, we were only using lock lists to store per-page lock fd's
> because we cannot use modern fcntl() file description locks to lock
> parts of the page in single file segments mode.
>
> Now, we will be using this list to store either lock fd's (along with
> memseg list fd) in single file segments mode, or per-page fd's (and set
> memseg list fd to -1), so rename the list accordingly.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
> lib/librte_eal/linuxapp/eal/eal_memalloc.c | 66 ++++++++++++----------
> 1 file changed, 37 insertions(+), 29 deletions(-)
>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
^ permalink raw reply [flat|nested] 45+ messages in thread
* [dpdk-dev] [PATCH v3 5/9] memalloc: track page fd's in non-single file mode
2018-09-04 15:01 ` [dpdk-dev] [PATCH v2 0/9] Improve running DPDK without hugetlbfs mounpoint Anatoly Burakov
` (4 preceding siblings ...)
2018-09-04 15:15 ` [dpdk-dev] [PATCH v3 4/9] memalloc: rename lock list to fd list Anatoly Burakov
@ 2018-09-04 15:15 ` Anatoly Burakov
2018-09-13 15:56 ` Maxime Coquelin
2018-09-04 15:15 ` [dpdk-dev] [PATCH v3 6/9] memalloc: add EAL-internal API to get and set segment fd's Anatoly Burakov
` (3 subsequent siblings)
9 siblings, 1 reply; 45+ messages in thread
From: Anatoly Burakov @ 2018-09-04 15:15 UTC (permalink / raw)
To: dev
Cc: tiwei.bie, ray.kinsella, zhihong.wang, maxime.coquelin,
kuralamudhan.ramakrishnan
Previously, we were only tracking lock file fd's in single-file
segments mode, but did not track fd's in non-single file mode
because we didn't need to (mmap() call still kept the lock). Now
that we are going to expose these fd's to the world, we need to
have access to them, so track them even in non-single file
segments mode.
We don't need to close fd's after mmap() because we're still
tracking them in an fd list. Also, for anonymous hugepages mode,
fd will always be -1 so exit early on error.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 44 ++++++++++++----------
1 file changed, 25 insertions(+), 19 deletions(-)
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 14bc5dce9..7d536350e 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -318,18 +318,24 @@ get_seg_fd(char *path, int buflen, struct hugepage_info *hi,
/* create a hugepage file path */
eal_get_hugefile_path(path, buflen, hi->hugedir,
list_idx * RTE_MAX_MEMSEG_PER_LIST + seg_idx);
- fd = open(path, O_CREAT | O_RDWR, 0600);
+
+ fd = fd_list[list_idx].fds[seg_idx];
+
if (fd < 0) {
- RTE_LOG(DEBUG, EAL, "%s(): open failed: %s\n", __func__,
- strerror(errno));
- return -1;
- }
- /* take out a read lock */
- if (lock(fd, LOCK_SH) < 0) {
- RTE_LOG(ERR, EAL, "%s(): lock failed: %s\n",
- __func__, strerror(errno));
- close(fd);
- return -1;
+ fd = open(path, O_CREAT | O_RDWR, 0600);
+ if (fd < 0) {
+ RTE_LOG(DEBUG, EAL, "%s(): open failed: %s\n",
+ __func__, strerror(errno));
+ return -1;
+ }
+ /* take out a read lock */
+ if (lock(fd, LOCK_SH) < 0) {
+ RTE_LOG(ERR, EAL, "%s(): lock failed: %s\n",
+ __func__, strerror(errno));
+ close(fd);
+ return -1;
+ }
+ fd_list[list_idx].fds[seg_idx] = fd;
}
}
return fd;
@@ -601,10 +607,6 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
goto mapped;
}
#endif
- /* for non-single file segments that aren't in-memory, we can close fd
- * here */
- if (!internal_config.single_file_segments && !internal_config.in_memory)
- close(fd);
ms->addr = addr;
ms->hugepage_sz = alloc_sz;
@@ -634,7 +636,10 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
RTE_LOG(CRIT, EAL, "Can't mmap holes in our virtual address space\n");
}
resized:
- /* in-memory mode will never be single-file-segments mode */
+ /* some codepaths will return negative fd, so exit early */
+ if (fd < 0)
+ return -1;
+
if (internal_config.single_file_segments) {
resize_hugefile(fd, path, list_idx, seg_idx, map_offset,
alloc_sz, false);
@@ -646,6 +651,7 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
lock(fd, LOCK_EX) == 1)
unlink(path);
close(fd);
+ fd_list[list_idx].fds[seg_idx] = -1;
}
return -1;
}
@@ -700,6 +706,7 @@ free_seg(struct rte_memseg *ms, struct hugepage_info *hi,
}
/* closing fd will drop the lock */
close(fd);
+ fd_list[list_idx].fds[seg_idx] = -1;
}
memset(ms, 0, sizeof(*ms));
@@ -1364,8 +1371,7 @@ eal_memalloc_init(void)
return -1;
/* initialize all of the fd lists */
- if (internal_config.single_file_segments)
- if (rte_memseg_list_walk(fd_list_create_walk, NULL))
- return -1;
+ if (rte_memseg_list_walk(fd_list_create_walk, NULL))
+ return -1;
return 0;
}
--
2.17.1
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [dpdk-dev] [PATCH v3 5/9] memalloc: track page fd's in non-single file mode
2018-09-04 15:15 ` [dpdk-dev] [PATCH v3 5/9] memalloc: track page fd's in non-single file mode Anatoly Burakov
@ 2018-09-13 15:56 ` Maxime Coquelin
0 siblings, 0 replies; 45+ messages in thread
From: Maxime Coquelin @ 2018-09-13 15:56 UTC (permalink / raw)
To: Anatoly Burakov, dev
Cc: tiwei.bie, ray.kinsella, zhihong.wang, kuralamudhan.ramakrishnan
On 09/04/2018 05:15 PM, Anatoly Burakov wrote:
> Previously, we were only tracking lock file fd's in single-file
> segments mode, but did not track fd's in non-single file mode
> because we didn't need to (mmap() call still kept the lock). Now
> that we are going to expose these fd's to the world, we need to
> have access to them, so track them even in non-single file
> segments mode.
>
> We don't need to close fd's after mmap() because we're still
> tracking them in an fd list. Also, for anonymous hugepages mode,
> fd will always be -1 so exit early on error.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
> lib/librte_eal/linuxapp/eal/eal_memalloc.c | 44 ++++++++++++----------
> 1 file changed, 25 insertions(+), 19 deletions(-)
>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
^ permalink raw reply [flat|nested] 45+ messages in thread
* [dpdk-dev] [PATCH v3 6/9] memalloc: add EAL-internal API to get and set segment fd's
2018-09-04 15:01 ` [dpdk-dev] [PATCH v2 0/9] Improve running DPDK without hugetlbfs mounpoint Anatoly Burakov
` (5 preceding siblings ...)
2018-09-04 15:15 ` [dpdk-dev] [PATCH v3 5/9] memalloc: track page fd's in non-single file mode Anatoly Burakov
@ 2018-09-04 15:15 ` Anatoly Burakov
2018-09-14 7:54 ` Maxime Coquelin
2018-09-04 15:15 ` [dpdk-dev] [PATCH v3 7/9] mem: add external API to retrieve page fd from EAL Anatoly Burakov
` (2 subsequent siblings)
9 siblings, 1 reply; 45+ messages in thread
From: Anatoly Burakov @ 2018-09-04 15:15 UTC (permalink / raw)
To: dev
Cc: Bruce Richardson, tiwei.bie, ray.kinsella, zhihong.wang,
maxime.coquelin, kuralamudhan.ramakrishnan
Enable setting and retrieving segment fd's internally.
For now, retrieving fd's will not be used anywhere until we
get an external API, but it will be useful for things like
virtio, where we wish to share segment fd's.
Setting segment fd's will not be available as a public API
at this time, but internally it is needed for legacy mode,
because we're not allocating our hugepages in memalloc in
legacy mode case, and we still need to store the fd.
Another user of get segment fd API is memseg info dump, to
show which pages use which fd's.
Not supported on FreeBSD.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
lib/librte_eal/bsdapp/eal/eal_memalloc.c | 12 +++++
lib/librte_eal/common/eal_common_memory.c | 8 +--
lib/librte_eal/common/eal_memalloc.h | 6 +++
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 60 +++++++++++++++++-----
lib/librte_eal/linuxapp/eal/eal_memory.c | 44 +++++++++++++---
5 files changed, 109 insertions(+), 21 deletions(-)
diff --git a/lib/librte_eal/bsdapp/eal/eal_memalloc.c b/lib/librte_eal/bsdapp/eal/eal_memalloc.c
index f7f07abd6..a5fb09f71 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memalloc.c
@@ -47,6 +47,18 @@ eal_memalloc_sync_with_primary(void)
return -1;
}
+int
+eal_memalloc_get_seg_fd(int list_idx, int seg_idx)
+{
+ return -1;
+}
+
+int
+eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd)
+{
+ return -1;
+}
+
int
eal_memalloc_init(void)
{
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index fbfb1b055..034c2026a 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -294,7 +294,7 @@ dump_memseg(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
void *arg)
{
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
- int msl_idx, ms_idx;
+ int msl_idx, ms_idx, fd;
FILE *f = arg;
msl_idx = msl - mcfg->memsegs;
@@ -305,10 +305,11 @@ dump_memseg(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
if (ms_idx < 0)
return -1;
+ fd = eal_memalloc_get_seg_fd(msl_idx, ms_idx);
fprintf(f, "Segment %i-%i: IOVA:0x%"PRIx64", len:%zu, "
"virt:%p, socket_id:%"PRId32", "
"hugepage_sz:%"PRIu64", nchannel:%"PRIx32", "
- "nrank:%"PRIx32"\n",
+ "nrank:%"PRIx32" fd:%i\n",
msl_idx, ms_idx,
ms->iova,
ms->len,
@@ -316,7 +317,8 @@ dump_memseg(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
ms->socket_id,
ms->hugepage_sz,
ms->nchannel,
- ms->nrank);
+ ms->nrank,
+ fd);
return 0;
}
diff --git a/lib/librte_eal/common/eal_memalloc.h b/lib/librte_eal/common/eal_memalloc.h
index 36bb1a027..a46c69c72 100644
--- a/lib/librte_eal/common/eal_memalloc.h
+++ b/lib/librte_eal/common/eal_memalloc.h
@@ -76,6 +76,12 @@ eal_memalloc_mem_alloc_validator_unregister(const char *name, int socket_id);
int
eal_memalloc_mem_alloc_validate(int socket_id, size_t new_len);
+int
+eal_memalloc_get_seg_fd(int list_idx, int seg_idx);
+
+int
+eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd);
+
int
eal_memalloc_init(void);
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 7d536350e..b820989e9 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1334,16 +1334,10 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
}
static int
-fd_list_create_walk(const struct rte_memseg_list *msl,
- void *arg __rte_unused)
+alloc_list(int list_idx, int len)
{
- struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
- unsigned int i, len;
- int msl_idx;
int *data;
-
- msl_idx = msl - mcfg->memsegs;
- len = msl->memseg_arr.len;
+ int i;
/* ensure we have space to store fd per each possible segment */
data = malloc(sizeof(int) * len);
@@ -1355,14 +1349,56 @@ fd_list_create_walk(const struct rte_memseg_list *msl,
for (i = 0; i < len; i++)
data[i] = -1;
- fd_list[msl_idx].fds = data;
- fd_list[msl_idx].len = len;
- fd_list[msl_idx].count = 0;
- fd_list[msl_idx].memseg_list_fd = -1;
+ fd_list[list_idx].fds = data;
+ fd_list[list_idx].len = len;
+ fd_list[list_idx].count = 0;
+ fd_list[list_idx].memseg_list_fd = -1;
return 0;
}
+static int
+fd_list_create_walk(const struct rte_memseg_list *msl,
+ void *arg __rte_unused)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ unsigned int len;
+ int msl_idx;
+
+ msl_idx = msl - mcfg->memsegs;
+ len = msl->memseg_arr.len;
+
+ return alloc_list(msl_idx, len);
+}
+
+int
+eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+
+ /* if list is not allocated, allocate it */
+ if (fd_list[list_idx].len == 0) {
+ int len = mcfg->memsegs[list_idx].memseg_arr.len;
+
+ if (alloc_list(list_idx, len) < 0)
+ return -1;
+ }
+ fd_list[list_idx].fds[seg_idx] = fd;
+
+ return 0;
+}
+
+int
+eal_memalloc_get_seg_fd(int list_idx, int seg_idx)
+{
+ if (internal_config.single_file_segments)
+ return fd_list[list_idx].memseg_list_fd;
+ /* list not initialized */
+ if (fd_list[list_idx].len == 0)
+ return -1;
+ return fd_list[list_idx].fds[seg_idx];
+}
+
int
eal_memalloc_init(void)
{
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index dfb537f59..e3ac24815 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -772,7 +772,10 @@ remap_segment(struct hugepage_file *hugepages, int seg_start, int seg_end)
rte_fbarray_set_used(arr, ms_idx);
- close(fd);
+ /* store segment fd internally */
+ if (eal_memalloc_set_seg_fd(msl_idx, ms_idx, fd) < 0)
+ RTE_LOG(ERR, EAL, "Could not store segment fd: %s\n",
+ rte_strerror(rte_errno));
}
RTE_LOG(DEBUG, EAL, "Allocated %" PRIu64 "M on socket %i\n",
(seg_len * page_sz) >> 20, socket_id);
@@ -1771,6 +1774,7 @@ getFileSize(int fd)
static int
eal_legacy_hugepage_attach(void)
{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
struct hugepage_file *hp = NULL;
unsigned int num_hp = 0;
unsigned int i = 0;
@@ -1814,6 +1818,9 @@ eal_legacy_hugepage_attach(void)
struct hugepage_file *hf = &hp[i];
size_t map_sz = hf->size;
void *map_addr = hf->final_va;
+ int msl_idx, ms_idx;
+ struct rte_memseg_list *msl;
+ struct rte_memseg *ms;
/* if size is zero, no more pages left */
if (map_sz == 0)
@@ -1831,25 +1838,50 @@ eal_legacy_hugepage_attach(void)
if (map_addr == MAP_FAILED) {
RTE_LOG(ERR, EAL, "Could not map %s: %s\n",
hf->filepath, strerror(errno));
- close(fd);
- goto error;
+ goto fd_error;
}
/* set shared lock on the file. */
if (flock(fd, LOCK_SH) < 0) {
RTE_LOG(DEBUG, EAL, "%s(): Locking file failed: %s\n",
__func__, strerror(errno));
- close(fd);
- goto error;
+ goto fd_error;
}
- close(fd);
+ /* find segment data */
+ msl = rte_mem_virt2memseg_list(map_addr);
+ if (msl == NULL) {
+ RTE_LOG(DEBUG, EAL, "%s(): Cannot find memseg list\n",
+ __func__);
+ goto fd_error;
+ }
+ ms = rte_mem_virt2memseg(map_addr, msl);
+ if (ms == NULL) {
+ RTE_LOG(DEBUG, EAL, "%s(): Cannot find memseg\n",
+ __func__);
+ goto fd_error;
+ }
+
+ msl_idx = msl - mcfg->memsegs;
+ ms_idx = rte_fbarray_find_idx(&msl->memseg_arr, ms);
+ if (ms_idx < 0) {
+ RTE_LOG(DEBUG, EAL, "%s(): Cannot find memseg idx\n",
+ __func__);
+ goto fd_error;
+ }
+
+ /* store segment fd internally */
+ if (eal_memalloc_set_seg_fd(msl_idx, ms_idx, fd) < 0)
+ RTE_LOG(ERR, EAL, "Could not store segment fd: %s\n",
+ rte_strerror(rte_errno));
}
/* unmap the hugepage config file, since we are done using it */
munmap(hp, size);
close(fd_hugepage);
return 0;
+fd_error:
+ close(fd);
error:
/* map all segments into memory to make sure we get the addrs */
cur_seg = 0;
--
2.17.1
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [dpdk-dev] [PATCH v3 6/9] memalloc: add EAL-internal API to get and set segment fd's
2018-09-04 15:15 ` [dpdk-dev] [PATCH v3 6/9] memalloc: add EAL-internal API to get and set segment fd's Anatoly Burakov
@ 2018-09-14 7:54 ` Maxime Coquelin
2018-09-17 9:53 ` Burakov, Anatoly
0 siblings, 1 reply; 45+ messages in thread
From: Maxime Coquelin @ 2018-09-14 7:54 UTC (permalink / raw)
To: Anatoly Burakov, dev
Cc: Bruce Richardson, tiwei.bie, ray.kinsella, zhihong.wang,
kuralamudhan.ramakrishnan
On 09/04/2018 05:15 PM, Anatoly Burakov wrote:
> Enable setting and retrieving segment fd's internally.
>
> For now, retrieving fd's will not be used anywhere until we
> get an external API, but it will be useful for things like
> virtio, where we wish to share segment fd's.
>
> Setting segment fd's will not be available as a public API
> at this time, but internally it is needed for legacy mode,
> because we're not allocating our hugepages in memalloc in
> legacy mode case, and we still need to store the fd.
>
> Another user of get segment fd API is memseg info dump, to
> show which pages use which fd's.
>
> Not supported on FreeBSD.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
> lib/librte_eal/bsdapp/eal/eal_memalloc.c | 12 +++++
> lib/librte_eal/common/eal_common_memory.c | 8 +--
> lib/librte_eal/common/eal_memalloc.h | 6 +++
> lib/librte_eal/linuxapp/eal/eal_memalloc.c | 60 +++++++++++++++++-----
> lib/librte_eal/linuxapp/eal/eal_memory.c | 44 +++++++++++++---
> 5 files changed, 109 insertions(+), 21 deletions(-)
>
> diff --git a/lib/librte_eal/bsdapp/eal/eal_memalloc.c b/lib/librte_eal/bsdapp/eal/eal_memalloc.c
> index f7f07abd6..a5fb09f71 100644
> --- a/lib/librte_eal/bsdapp/eal/eal_memalloc.c
> +++ b/lib/librte_eal/bsdapp/eal/eal_memalloc.c
> @@ -47,6 +47,18 @@ eal_memalloc_sync_with_primary(void)
> return -1;
> }
>
> +int
> +eal_memalloc_get_seg_fd(int list_idx, int seg_idx)
> +{
> + return -1;
Why not returning -ENOTSUPP directly here? (see patch 7).
> +}
> +
> +int
> +eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd)
> +{
> + return -1;
Ditto.
> +}
> +
Other than that, the patch looks good to me.
Maxime
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [dpdk-dev] [PATCH v3 6/9] memalloc: add EAL-internal API to get and set segment fd's
2018-09-14 7:54 ` Maxime Coquelin
@ 2018-09-17 9:53 ` Burakov, Anatoly
0 siblings, 0 replies; 45+ messages in thread
From: Burakov, Anatoly @ 2018-09-17 9:53 UTC (permalink / raw)
To: Maxime Coquelin, dev
Cc: Bruce Richardson, tiwei.bie, ray.kinsella, zhihong.wang,
kuralamudhan.ramakrishnan
On 14-Sep-18 8:54 AM, Maxime Coquelin wrote:
>
>
> On 09/04/2018 05:15 PM, Anatoly Burakov wrote:
>> Enable setting and retrieving segment fd's internally.
>>
>> For now, retrieving fd's will not be used anywhere until we
>> get an external API, but it will be useful for things like
>> virtio, where we wish to share segment fd's.
>>
>> Setting segment fd's will not be available as a public API
>> at this time, but internally it is needed for legacy mode,
>> because we're not allocating our hugepages in memalloc in
>> legacy mode case, and we still need to store the fd.
>>
>> Another user of get segment fd API is memseg info dump, to
>> show which pages use which fd's.
>>
>> Not supported on FreeBSD.
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>> ---
>> lib/librte_eal/bsdapp/eal/eal_memalloc.c | 12 +++++
>> lib/librte_eal/common/eal_common_memory.c | 8 +--
>> lib/librte_eal/common/eal_memalloc.h | 6 +++
>> lib/librte_eal/linuxapp/eal/eal_memalloc.c | 60 +++++++++++++++++-----
>> lib/librte_eal/linuxapp/eal/eal_memory.c | 44 +++++++++++++---
>> 5 files changed, 109 insertions(+), 21 deletions(-)
>>
>> diff --git a/lib/librte_eal/bsdapp/eal/eal_memalloc.c
>> b/lib/librte_eal/bsdapp/eal/eal_memalloc.c
>> index f7f07abd6..a5fb09f71 100644
>> --- a/lib/librte_eal/bsdapp/eal/eal_memalloc.c
>> +++ b/lib/librte_eal/bsdapp/eal/eal_memalloc.c
>> @@ -47,6 +47,18 @@ eal_memalloc_sync_with_primary(void)
>> return -1;
>> }
>> +int
>> +eal_memalloc_get_seg_fd(int list_idx, int seg_idx)
>> +{
>> + return -1;
>
> Why not returning -ENOTSUPP directly here? (see patch 7).
>
>> +}
>> +
>> +int
>> +eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd)
>> +{
>> + return -1;
>
> Ditto.
>
>> +}
>> +
>
> Other than that, the patch looks good to me.
>
> Maxime
>
Mainly for consistency reasons. Returning errno values but not using
them looks weird to me, but i can change this to return those at the outset.
--
Thanks,
Anatoly
^ permalink raw reply [flat|nested] 45+ messages in thread
* [dpdk-dev] [PATCH v3 7/9] mem: add external API to retrieve page fd from EAL
2018-09-04 15:01 ` [dpdk-dev] [PATCH v2 0/9] Improve running DPDK without hugetlbfs mounpoint Anatoly Burakov
` (6 preceding siblings ...)
2018-09-04 15:15 ` [dpdk-dev] [PATCH v3 6/9] memalloc: add EAL-internal API to get and set segment fd's Anatoly Burakov
@ 2018-09-04 15:15 ` Anatoly Burakov
2018-09-14 8:00 ` Maxime Coquelin
2018-09-04 15:15 ` [dpdk-dev] [PATCH v3 8/9] mem: allow querying offset into segment fd Anatoly Burakov
2018-09-04 15:15 ` [dpdk-dev] [PATCH v3 9/9] mem: support using memfd segments for in-memory mode Anatoly Burakov
9 siblings, 1 reply; 45+ messages in thread
From: Anatoly Burakov @ 2018-09-04 15:15 UTC (permalink / raw)
To: dev
Cc: Bruce Richardson, tiwei.bie, ray.kinsella, zhihong.wang,
maxime.coquelin, kuralamudhan.ramakrishnan
Now that we can retrieve page fd's internally, we can expose it
as an external API. This will add two flavors of API - thread-safe
and non-thread-safe. Fix up internal API's to return values we need
without modifying rte_errno internally if called from within EAL.
We do not want calling code to accidentally close an internal fd, so
we make a duplicate of it before we return it to the user. Caller is
therefore responsible for closing this fd.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
lib/librte_eal/bsdapp/eal/eal_memalloc.c | 5 ++-
lib/librte_eal/common/eal_common_memory.c | 49 ++++++++++++++++++++++
lib/librte_eal/common/eal_memalloc.h | 2 +
lib/librte_eal/common/include/rte_memory.h | 48 +++++++++++++++++++++
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 21 ++++++----
lib/librte_eal/rte_eal_version.map | 2 +
6 files changed, 118 insertions(+), 9 deletions(-)
diff --git a/lib/librte_eal/bsdapp/eal/eal_memalloc.c b/lib/librte_eal/bsdapp/eal/eal_memalloc.c
index a5fb09f71..80e4c3d4f 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memalloc.c
@@ -4,6 +4,7 @@
#include <inttypes.h>
+#include <rte_errno.h>
#include <rte_log.h>
#include <rte_memory.h>
@@ -50,13 +51,13 @@ eal_memalloc_sync_with_primary(void)
int
eal_memalloc_get_seg_fd(int list_idx, int seg_idx)
{
- return -1;
+ return -ENOTSUP;
}
int
eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd)
{
- return -1;
+ return -ENOTSUP;
}
int
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 034c2026a..4a80deaf5 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -550,6 +550,55 @@ rte_memseg_list_walk(rte_memseg_list_walk_t func, void *arg)
return ret;
}
+int __rte_experimental
+rte_memseg_get_fd_thread_unsafe(const struct rte_memseg *ms)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ struct rte_memseg_list *msl;
+ struct rte_fbarray *arr;
+ int msl_idx, seg_idx, ret;
+
+ if (ms == NULL) {
+ rte_errno = EINVAL;
+ return -1;
+ }
+
+ msl = rte_mem_virt2memseg_list(ms->addr);
+ if (msl == NULL) {
+ rte_errno = EINVAL;
+ return -1;
+ }
+ arr = &msl->memseg_arr;
+
+ msl_idx = msl - mcfg->memsegs;
+ seg_idx = rte_fbarray_find_idx(arr, ms);
+
+ if (!rte_fbarray_is_used(arr, seg_idx)) {
+ rte_errno = ENOENT;
+ return -1;
+ }
+
+ ret = eal_memalloc_get_seg_fd(msl_idx, seg_idx);
+ if (ret < 0) {
+ rte_errno = -ret;
+ ret = -1;
+ }
+ return ret;
+}
+
+int __rte_experimental
+rte_memseg_get_fd(const struct rte_memseg *ms)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ int ret;
+
+ rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+ ret = rte_memseg_get_fd_thread_unsafe(ms);
+ rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+
+ return ret;
+}
+
/* init memory subsystem */
int
rte_eal_memory_init(void)
diff --git a/lib/librte_eal/common/eal_memalloc.h b/lib/librte_eal/common/eal_memalloc.h
index a46c69c72..70a214de4 100644
--- a/lib/librte_eal/common/eal_memalloc.h
+++ b/lib/librte_eal/common/eal_memalloc.h
@@ -76,9 +76,11 @@ eal_memalloc_mem_alloc_validator_unregister(const char *name, int socket_id);
int
eal_memalloc_mem_alloc_validate(int socket_id, size_t new_len);
+/* returns fd or -errno */
int
eal_memalloc_get_seg_fd(int list_idx, int seg_idx);
+/* returns 0 or -errno */
int
eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd);
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index c4b7f4cff..0d2a30056 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -317,6 +317,54 @@ rte_memseg_contig_walk_thread_unsafe(rte_memseg_contig_walk_t func, void *arg);
int __rte_experimental
rte_memseg_list_walk_thread_unsafe(rte_memseg_list_walk_t func, void *arg);
+/**
+ * Return file descriptor associated with a particular memseg (if available).
+ *
+ * @note This function read-locks the memory hotplug subsystem, and thus cannot
+ * be used within memory-related callback functions.
+ *
+ * @note This returns an internal file descriptor. Performing any operations on
+ * this file descriptor is inherently dangerous, so it should be treated
+ * as read-only for all intents and purposes.
+ *
+ * @param ms
+ * A pointer to memseg for which to get file descriptor.
+ *
+ * @return
+ * Valid file descriptor in case of success.
+ * -1 in case of error, with ``rte_errno`` set to the following values:
+ * - EINVAL - ``ms`` pointer was NULL or did not point to a valid memseg
+ * - ENODEV - ``ms`` fd is not available
+ * - ENOENT - ``ms`` is an unused segment
+ * - ENOTSUP - segment fd's are not supported
+ */
+int __rte_experimental
+rte_memseg_get_fd(const struct rte_memseg *ms);
+
+/**
+ * Return file descriptor associated with a particular memseg (if available).
+ *
+ * @note This function does not perform any locking, and is only safe to call
+ * from within memory-related callback functions.
+ *
+ * @note This returns an internal file descriptor. Performing any operations on
+ * this file descriptor is inherently dangerous, so it should be treated
+ * as read-only for all intents and purposes.
+ *
+ * @param ms
+ * A pointer to memseg for which to get file descriptor.
+ *
+ * @return
+ * Valid file descriptor in case of success.
+ * -1 in case of error, with ``rte_errno`` set to the following values:
+ * - EINVAL - ``ms`` pointer was NULL or did not point to a valid memseg
+ * - ENODEV - ``ms`` fd is not available
+ * - ENOENT - ``ms`` is an unused segment
+ * - ENOTSUP - segment fd's are not supported
+ */
+int __rte_experimental
+rte_memseg_get_fd_thread_unsafe(const struct rte_memseg *ms);
+
/**
* Dump the physical memory layout to a file.
*
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index b820989e9..21f842753 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -34,6 +34,7 @@
#include <rte_log.h>
#include <rte_eal_memconfig.h>
#include <rte_eal.h>
+#include <rte_errno.h>
#include <rte_memory.h>
#include <rte_spinlock.h>
@@ -1381,7 +1382,7 @@ eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd)
int len = mcfg->memsegs[list_idx].memseg_arr.len;
if (alloc_list(list_idx, len) < 0)
- return -1;
+ return -ENOMEM;
}
fd_list[list_idx].fds[seg_idx] = fd;
@@ -1391,12 +1392,18 @@ eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd)
int
eal_memalloc_get_seg_fd(int list_idx, int seg_idx)
{
- if (internal_config.single_file_segments)
- return fd_list[list_idx].memseg_list_fd;
- /* list not initialized */
- if (fd_list[list_idx].len == 0)
- return -1;
- return fd_list[list_idx].fds[seg_idx];
+ int fd;
+ if (internal_config.single_file_segments) {
+ fd = fd_list[list_idx].memseg_list_fd;
+ } else if (fd_list[list_idx].len == 0) {
+ /* list not initialized */
+ fd = -1;
+ } else {
+ fd = fd_list[list_idx].fds[seg_idx];
+ }
+ if (fd < 0)
+ return -ENODEV;
+ return fd;
}
int
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 344a43d32..e659e19d6 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -320,6 +320,8 @@ EXPERIMENTAL {
rte_mem_virt2memseg_list;
rte_memseg_contig_walk;
rte_memseg_contig_walk_thread_unsafe;
+ rte_memseg_get_fd;
+ rte_memseg_get_fd_thread_unsafe;
rte_memseg_list_walk;
rte_memseg_list_walk_thread_unsafe;
rte_memseg_walk;
--
2.17.1
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [dpdk-dev] [PATCH v3 7/9] mem: add external API to retrieve page fd from EAL
2018-09-04 15:15 ` [dpdk-dev] [PATCH v3 7/9] mem: add external API to retrieve page fd from EAL Anatoly Burakov
@ 2018-09-14 8:00 ` Maxime Coquelin
0 siblings, 0 replies; 45+ messages in thread
From: Maxime Coquelin @ 2018-09-14 8:00 UTC (permalink / raw)
To: Anatoly Burakov, dev
Cc: Bruce Richardson, tiwei.bie, ray.kinsella, zhihong.wang,
kuralamudhan.ramakrishnan
On 09/04/2018 05:15 PM, Anatoly Burakov wrote:
> Now that we can retrieve page fd's internally, we can expose it
> as an external API. This will add two flavors of API - thread-safe
> and non-thread-safe. Fix up internal API's to return values we need
> without modifying rte_errno internally if called from within EAL.
>
> We do not want calling code to accidentally close an internal fd, so
> we make a duplicate of it before we return it to the user. Caller is
> therefore responsible for closing this fd.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
> lib/librte_eal/bsdapp/eal/eal_memalloc.c | 5 ++-
> lib/librte_eal/common/eal_common_memory.c | 49 ++++++++++++++++++++++
> lib/librte_eal/common/eal_memalloc.h | 2 +
> lib/librte_eal/common/include/rte_memory.h | 48 +++++++++++++++++++++
> lib/librte_eal/linuxapp/eal/eal_memalloc.c | 21 ++++++----
> lib/librte_eal/rte_eal_version.map | 2 +
> 6 files changed, 118 insertions(+), 9 deletions(-)
>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
^ permalink raw reply [flat|nested] 45+ messages in thread
* [dpdk-dev] [PATCH v3 8/9] mem: allow querying offset into segment fd
2018-09-04 15:01 ` [dpdk-dev] [PATCH v2 0/9] Improve running DPDK without hugetlbfs mounpoint Anatoly Burakov
` (7 preceding siblings ...)
2018-09-04 15:15 ` [dpdk-dev] [PATCH v3 7/9] mem: add external API to retrieve page fd from EAL Anatoly Burakov
@ 2018-09-04 15:15 ` Anatoly Burakov
2018-09-14 7:57 ` Maxime Coquelin
2018-09-04 15:15 ` [dpdk-dev] [PATCH v3 9/9] mem: support using memfd segments for in-memory mode Anatoly Burakov
9 siblings, 1 reply; 45+ messages in thread
From: Anatoly Burakov @ 2018-09-04 15:15 UTC (permalink / raw)
To: dev
Cc: Bruce Richardson, tiwei.bie, ray.kinsella, zhihong.wang,
maxime.coquelin, kuralamudhan.ramakrishnan
In a few cases, user may need to query offset into fd for a
particular memory segment (for example, to selectively map
pages). This commit adds a new API to do that.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
Notes:
v3:
- Fix single file segments mode not working
lib/librte_eal/bsdapp/eal/eal_memalloc.c | 6 +++
lib/librte_eal/common/eal_common_memory.c | 50 ++++++++++++++++++++++
lib/librte_eal/common/eal_memalloc.h | 3 ++
lib/librte_eal/common/include/rte_memory.h | 49 +++++++++++++++++++++
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 24 +++++++++++
lib/librte_eal/rte_eal_version.map | 2 +
6 files changed, 134 insertions(+)
diff --git a/lib/librte_eal/bsdapp/eal/eal_memalloc.c b/lib/librte_eal/bsdapp/eal/eal_memalloc.c
index 80e4c3d4f..06afbcc99 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memalloc.c
@@ -60,6 +60,12 @@ eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd)
return -ENOTSUP;
}
+int
+eal_memalloc_get_seg_fd_offset(int list_idx, int seg_idx, size_t *offset)
+{
+ return -ENOTSUP;
+}
+
int
eal_memalloc_init(void)
{
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 4a80deaf5..0b69804ff 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -599,6 +599,56 @@ rte_memseg_get_fd(const struct rte_memseg *ms)
return ret;
}
+int __rte_experimental
+rte_memseg_get_fd_offset_thread_unsafe(const struct rte_memseg *ms,
+ size_t *offset)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ struct rte_memseg_list *msl;
+ struct rte_fbarray *arr;
+ int msl_idx, seg_idx, ret;
+
+ if (ms == NULL || offset == NULL) {
+ rte_errno = EINVAL;
+ return -1;
+ }
+
+ msl = rte_mem_virt2memseg_list(ms->addr);
+ if (msl == NULL) {
+ rte_errno = EINVAL;
+ return -1;
+ }
+ arr = &msl->memseg_arr;
+
+ msl_idx = msl - mcfg->memsegs;
+ seg_idx = rte_fbarray_find_idx(arr, ms);
+
+ if (!rte_fbarray_is_used(arr, seg_idx)) {
+ rte_errno = ENOENT;
+ return -1;
+ }
+
+ ret = eal_memalloc_get_seg_fd_offset(msl_idx, seg_idx, offset);
+ if (ret < 0) {
+ rte_errno = -ret;
+ ret = -1;
+ }
+ return ret;
+}
+
+int __rte_experimental
+rte_memseg_get_fd_offset(const struct rte_memseg *ms, size_t *offset)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ int ret;
+
+ rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+ ret = rte_memseg_get_fd_offset_thread_unsafe(ms, offset);
+ rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+
+ return ret;
+}
+
/* init memory subsystem */
int
rte_eal_memory_init(void)
diff --git a/lib/librte_eal/common/eal_memalloc.h b/lib/librte_eal/common/eal_memalloc.h
index 70a214de4..af917c2f9 100644
--- a/lib/librte_eal/common/eal_memalloc.h
+++ b/lib/librte_eal/common/eal_memalloc.h
@@ -84,6 +84,9 @@ eal_memalloc_get_seg_fd(int list_idx, int seg_idx);
int
eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd);
+int
+eal_memalloc_get_seg_fd_offset(int list_idx, int seg_idx, size_t *offset);
+
int
eal_memalloc_init(void);
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 0d2a30056..14bd277a4 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -365,6 +365,55 @@ rte_memseg_get_fd(const struct rte_memseg *ms);
int __rte_experimental
rte_memseg_get_fd_thread_unsafe(const struct rte_memseg *ms);
+/**
+ * Get offset into segment file descriptor associated with a particular memseg
+ * (if available).
+ *
+ * @note This function read-locks the memory hotplug subsystem, and thus cannot
+ * be used within memory-related callback functions.
+ *
+ * @param ms
+ * A pointer to memseg for which to get file descriptor.
+ * @param offset
+ * A pointer to offset value where the result will be stored.
+ *
+ * @return
+ * Valid file descriptor in case of success.
+ * -1 in case of error, with ``rte_errno`` set to the following values:
+ * - EINVAL - ``ms`` pointer was NULL or did not point to a valid memseg
+ * - EINVAL - ``offset`` pointer was NULL
+ * - ENODEV - ``ms`` fd is not available
+ * - ENOENT - ``ms`` is an unused segment
+ * - ENOTSUP - segment fd's are not supported
+ */
+int __rte_experimental
+rte_memseg_get_fd_offset(const struct rte_memseg *ms, size_t *offset);
+
+/**
+ * Get offset into segment file descriptor associated with a particular memseg
+ * (if available).
+ *
+ * @note This function does not perform any locking, and is only safe to call
+ * from within memory-related callback functions.
+ *
+ * @param ms
+ * A pointer to memseg for which to get file descriptor.
+ * @param offset
+ * A pointer to offset value where the result will be stored.
+ *
+ * @return
+ * Valid file descriptor in case of success.
+ * -1 in case of error, with ``rte_errno`` set to the following values:
+ * - EINVAL - ``ms`` pointer was NULL or did not point to a valid memseg
+ * - EINVAL - ``offset`` pointer was NULL
+ * - ENODEV - ``ms`` fd is not available
+ * - ENOENT - ``ms`` is an unused segment
+ * - ENOTSUP - segment fd's are not supported
+ */
+int __rte_experimental
+rte_memseg_get_fd_offset_thread_unsafe(const struct rte_memseg *ms,
+ size_t *offset);
+
/**
* Dump the physical memory layout to a file.
*
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 21f842753..6fc9dc0ca 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1406,6 +1406,30 @@ eal_memalloc_get_seg_fd(int list_idx, int seg_idx)
return fd;
}
+int
+eal_memalloc_get_seg_fd_offset(int list_idx, int seg_idx, size_t *offset)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+
+ /* fd_list not initialized? */
+ if (fd_list[list_idx].len == 0)
+ return -ENODEV;
+ if (internal_config.single_file_segments) {
+ size_t pgsz = mcfg->memsegs[list_idx].page_sz;
+
+ /* segment not active? */
+ if (fd_list[list_idx].memseg_list_fd < 0)
+ return -ENOENT;
+ *offset = pgsz * seg_idx;
+ } else {
+ /* segment not active? */
+ if (fd_list[list_idx].fds[seg_idx] < 0)
+ return -ENOENT;
+ *offset = 0;
+ }
+ return 0;
+}
+
int
eal_memalloc_init(void)
{
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index e659e19d6..ba159f7cd 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -321,7 +321,9 @@ EXPERIMENTAL {
rte_memseg_contig_walk;
rte_memseg_contig_walk_thread_unsafe;
rte_memseg_get_fd;
+ rte_memseg_get_fd_offset;
rte_memseg_get_fd_thread_unsafe;
+ rte_memseg_get_fd_offset_thread_unsafe;
rte_memseg_list_walk;
rte_memseg_list_walk_thread_unsafe;
rte_memseg_walk;
--
2.17.1
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [dpdk-dev] [PATCH v3 8/9] mem: allow querying offset into segment fd
2018-09-04 15:15 ` [dpdk-dev] [PATCH v3 8/9] mem: allow querying offset into segment fd Anatoly Burakov
@ 2018-09-14 7:57 ` Maxime Coquelin
0 siblings, 0 replies; 45+ messages in thread
From: Maxime Coquelin @ 2018-09-14 7:57 UTC (permalink / raw)
To: Anatoly Burakov, dev
Cc: Bruce Richardson, tiwei.bie, ray.kinsella, zhihong.wang,
kuralamudhan.ramakrishnan
On 09/04/2018 05:15 PM, Anatoly Burakov wrote:
> In a few cases, user may need to query offset into fd for a
> particular memory segment (for example, to selectively map
> pages). This commit adds a new API to do that.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>
> Notes:
> v3:
> - Fix single file segments mode not working
>
> lib/librte_eal/bsdapp/eal/eal_memalloc.c | 6 +++
> lib/librte_eal/common/eal_common_memory.c | 50 ++++++++++++++++++++++
> lib/librte_eal/common/eal_memalloc.h | 3 ++
> lib/librte_eal/common/include/rte_memory.h | 49 +++++++++++++++++++++
> lib/librte_eal/linuxapp/eal/eal_memalloc.c | 24 +++++++++++
> lib/librte_eal/rte_eal_version.map | 2 +
> 6 files changed, 134 insertions(+)
>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thanks,
Maxime
^ permalink raw reply [flat|nested] 45+ messages in thread
* [dpdk-dev] [PATCH v3 9/9] mem: support using memfd segments for in-memory mode
2018-09-04 15:01 ` [dpdk-dev] [PATCH v2 0/9] Improve running DPDK without hugetlbfs mounpoint Anatoly Burakov
` (8 preceding siblings ...)
2018-09-04 15:15 ` [dpdk-dev] [PATCH v3 8/9] mem: allow querying offset into segment fd Anatoly Burakov
@ 2018-09-04 15:15 ` Anatoly Burakov
2018-09-14 8:06 ` Maxime Coquelin
9 siblings, 1 reply; 45+ messages in thread
From: Anatoly Burakov @ 2018-09-04 15:15 UTC (permalink / raw)
To: dev
Cc: tiwei.bie, ray.kinsella, zhihong.wang, maxime.coquelin,
kuralamudhan.ramakrishnan
Enable using memfd-created segments if supported by the system.
This will allow having real fd's for pages but without hugetlbfs
mounts, which will enable in-memory mode to be used with virtio.
The implementation is mostly piggy-backing on existing real-fd
code, except that we no longer need to unlink any files or track
per-page locks in single-file segments mode, because in-memory
mode does not support secondary processes anyway.
We move some checks from EAL command-line parsing code to memalloc
because it is now possible to use single-file segments mode with
in-memory mode, but only if memfd is supported.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
lib/librte_eal/common/eal_common_options.c | 6 +-
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 265 ++++++++++++++++++---
2 files changed, 235 insertions(+), 36 deletions(-)
diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index 873099acc..ddd624110 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -1384,10 +1384,10 @@ eal_check_common_options(struct internal_config *internal_cfg)
" is only supported in non-legacy memory mode\n");
}
if (internal_cfg->single_file_segments &&
- internal_cfg->hugepage_unlink) {
+ internal_cfg->hugepage_unlink &&
+ !internal_cfg->in_memory) {
RTE_LOG(ERR, EAL, "Option --"OPT_SINGLE_FILE_SEGMENTS" is "
- "not compatible with neither --"OPT_IN_MEMORY" nor "
- "--"OPT_HUGE_UNLINK"\n");
+ "not compatible with --"OPT_HUGE_UNLINK"\n");
return -1;
}
if (internal_cfg->legacy_mem &&
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 6fc9dc0ca..b2e2a9599 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -52,6 +52,23 @@ const int anonymous_hugepages_supported =
#define RTE_MAP_HUGE_SHIFT 26
#endif
+/*
+ * we don't actually care if memfd itself is supported - we only need to check
+ * if memfd supports hugetlbfs, as that already implies memfd support.
+ *
+ * also, this is not a constant, because while we may be *compiled* with memfd
+ * hugetlbfs support, we might not be *running* on a system that supports memfd
+ * and/or memfd with hugetlbfs, so we need to be able to adjust this flag at
+ * runtime, and fall back to anonymous memory.
+ */
+int memfd_create_supported =
+#ifdef MFD_HUGETLB
+#define MEMFD_SUPPORTED
+ 1;
+#else
+ 0;
+#endif
+
/*
* not all kernel version support fallocate on hugetlbfs, so fall back to
* ftruncate and disallow deallocation if fallocate is not supported.
@@ -191,6 +208,31 @@ get_file_size(int fd)
return st.st_size;
}
+static inline uint32_t
+bsf64(uint64_t v)
+{
+ return (uint32_t)__builtin_ctzll(v);
+}
+
+static inline uint32_t
+log2_u64(uint64_t v)
+{
+ if (v == 0)
+ return 0;
+ v = rte_align64pow2(v);
+ return bsf64(v);
+}
+
+static int
+pagesz_flags(uint64_t page_sz)
+{
+ /* as per mmap() manpage, all page sizes are log2 of page size
+ * shifted by MAP_HUGE_SHIFT
+ */
+ int log2 = log2_u64(page_sz);
+ return log2 << RTE_MAP_HUGE_SHIFT;
+}
+
/* returns 1 on successful lock, 0 on unsuccessful lock, -1 on error */
static int lock(int fd, int type)
{
@@ -287,12 +329,64 @@ static int unlock_segment(int list_idx, int seg_idx)
return 0;
}
+static int
+get_seg_memfd(struct hugepage_info *hi __rte_unused,
+ unsigned int list_idx __rte_unused,
+ unsigned int seg_idx __rte_unused)
+{
+#ifdef MEMFD_SUPPORTED
+ int fd;
+ char segname[250]; /* as per manpage, limit is 249 bytes plus null */
+
+ if (internal_config.single_file_segments) {
+ fd = fd_list[list_idx].memseg_list_fd;
+
+ if (fd < 0) {
+ int flags = MFD_HUGETLB | pagesz_flags(hi->hugepage_sz);
+
+ snprintf(segname, sizeof(segname), "seg_%i", list_idx);
+ fd = memfd_create(segname, flags);
+ if (fd < 0) {
+ RTE_LOG(DEBUG, EAL, "%s(): memfd create failed: %s\n",
+ __func__, strerror(errno));
+ return -1;
+ }
+ fd_list[list_idx].memseg_list_fd = fd;
+ }
+ } else {
+ fd = fd_list[list_idx].fds[seg_idx];
+
+ if (fd < 0) {
+ int flags = MFD_HUGETLB | pagesz_flags(hi->hugepage_sz);
+
+ snprintf(segname, sizeof(segname), "seg_%i-%i",
+ list_idx, seg_idx);
+ fd = memfd_create(segname, flags);
+ if (fd < 0) {
+ RTE_LOG(DEBUG, EAL, "%s(): memfd create failed: %s\n",
+ __func__, strerror(errno));
+ return -1;
+ }
+ fd_list[list_idx].fds[seg_idx] = fd;
+ }
+ }
+ return fd;
+#endif
+ return -1;
+}
+
static int
get_seg_fd(char *path, int buflen, struct hugepage_info *hi,
unsigned int list_idx, unsigned int seg_idx)
{
int fd;
+ /* for in-memory mode, we only make it here when we're sure we support
+ * memfd, and this is a special case.
+ */
+ if (internal_config.in_memory)
+ return get_seg_memfd(hi, list_idx, seg_idx);
+
if (internal_config.single_file_segments) {
/* create a hugepage file path */
eal_get_hugefile_path(path, buflen, hi->hugedir, list_idx);
@@ -347,6 +441,33 @@ resize_hugefile(int fd, char *path, int list_idx, int seg_idx,
uint64_t fa_offset, uint64_t page_sz, bool grow)
{
bool again = false;
+
+ /* in-memory mode is a special case, because we don't need to perform
+ * any locking, and we can be sure that fallocate() is supported.
+ */
+ if (internal_config.in_memory) {
+ int flags = grow ? 0 : FALLOC_FL_PUNCH_HOLE |
+ FALLOC_FL_KEEP_SIZE;
+ int ret;
+
+ /* grow or shrink the file */
+ ret = fallocate(fd, flags, fa_offset, page_sz);
+
+ if (ret < 0) {
+ RTE_LOG(DEBUG, EAL, "%s(): fallocate() failed: %s\n",
+ __func__,
+ strerror(errno));
+ return -1;
+ }
+ /* increase/decrease total segment count */
+ fd_list[list_idx].count += (grow ? 1 : -1);
+ if (!grow && fd_list[list_idx].count == 0) {
+ close(fd_list[list_idx].memseg_list_fd);
+ fd_list[list_idx].memseg_list_fd = -1;
+ }
+ return 0;
+ }
+
do {
if (fallocate_supported == 0) {
/* we cannot deallocate memory if fallocate() is not
@@ -496,26 +617,34 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
void *new_addr;
alloc_sz = hi->hugepage_sz;
- if (!internal_config.single_file_segments &&
- internal_config.in_memory &&
- anonymous_hugepages_supported) {
- int log2, flags;
-
- log2 = rte_log2_u32(alloc_sz);
- /* as per mmap() manpage, all page sizes are log2 of page size
- * shifted by MAP_HUGE_SHIFT
- */
- flags = (log2 << RTE_MAP_HUGE_SHIFT) | MAP_HUGETLB | MAP_FIXED |
+
+ /* these are checked at init, but code analyzers don't know that */
+ if (internal_config.in_memory && !anonymous_hugepages_supported) {
+ RTE_LOG(ERR, EAL, "Anonymous hugepages not supported, in-memory mode cannot allocate memory\n");
+ return -1;
+ }
+ if (internal_config.in_memory && !memfd_create_supported &&
+ internal_config.single_file_segments) {
+ RTE_LOG(ERR, EAL, "Single-file segments are not supported without memfd support\n");
+ return -1;
+ }
+
+ /* in-memory without memfd is a special case */
+ int mmap_flags;
+
+ if (internal_config.in_memory && !memfd_create_supported) {
+ int pagesz_flag, flags;
+
+ pagesz_flag = pagesz_flags(alloc_sz);
+ flags = pagesz_flag | MAP_HUGETLB | MAP_FIXED |
MAP_PRIVATE | MAP_ANONYMOUS;
fd = -1;
- va = mmap(addr, alloc_sz, PROT_READ | PROT_WRITE, flags, -1, 0);
+ mmap_flags = flags;
- /* single-file segments codepath will never be active because
- * in-memory mode is incompatible with it and it's stopped at
- * EAL initialization stage, however the compiler doesn't know
- * that and complains about map_offset being used uninitialized
- * on failure codepaths while having in-memory mode enabled. so,
- * assign a value here.
+ /* single-file segments codepath will never be active
+ * here because in-memory mode is incompatible with the
+ * fallback path, and it's stopped at EAL initialization
+ * stage.
*/
map_offset = 0;
} else {
@@ -539,7 +668,8 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
__func__, strerror(errno));
goto resized;
}
- if (internal_config.hugepage_unlink) {
+ if (internal_config.hugepage_unlink &&
+ !internal_config.in_memory) {
if (unlink(path)) {
RTE_LOG(DEBUG, EAL, "%s(): unlink() failed: %s\n",
__func__, strerror(errno));
@@ -547,16 +677,16 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
}
}
}
-
- /*
- * map the segment, and populate page tables, the kernel fills
- * this segment with zeros if it's a new page.
- */
- va = mmap(addr, alloc_sz, PROT_READ | PROT_WRITE,
- MAP_SHARED | MAP_POPULATE | MAP_FIXED, fd,
- map_offset);
+ mmap_flags = MAP_SHARED | MAP_POPULATE | MAP_FIXED;
}
+ /*
+ * map the segment, and populate page tables, the kernel fills
+ * this segment with zeros if it's a new page.
+ */
+ va = mmap(addr, alloc_sz, PROT_READ | PROT_WRITE, mmap_flags, fd,
+ map_offset);
+
if (va == MAP_FAILED) {
RTE_LOG(DEBUG, EAL, "%s(): mmap() failed: %s\n", __func__,
strerror(errno));
@@ -663,7 +793,8 @@ free_seg(struct rte_memseg *ms, struct hugepage_info *hi,
{
uint64_t map_offset;
char path[PATH_MAX];
- int fd, ret;
+ int fd, ret = 0;
+ bool exit_early;
/* erase page data */
memset(ms->addr, 0, ms->len);
@@ -675,8 +806,17 @@ free_seg(struct rte_memseg *ms, struct hugepage_info *hi,
return -1;
}
+ exit_early = false;
+
+ /* if we're using anonymous hugepages, nothing to be done */
+ if (internal_config.in_memory && !memfd_create_supported)
+ exit_early = true;
+
/* if we've already unlinked the page, nothing needs to be done */
- if (internal_config.hugepage_unlink) {
+ if (!internal_config.in_memory && internal_config.hugepage_unlink)
+ exit_early = true;
+
+ if (exit_early) {
memset(ms, 0, sizeof(*ms));
return 0;
}
@@ -699,11 +839,13 @@ free_seg(struct rte_memseg *ms, struct hugepage_info *hi,
/* if we're able to take out a write lock, we're the last one
* holding onto this page.
*/
- ret = lock(fd, LOCK_EX);
- if (ret >= 0) {
- /* no one else is using this page */
- if (ret == 1)
- unlink(path);
+ if (!internal_config.in_memory) {
+ ret = lock(fd, LOCK_EX);
+ if (ret >= 0) {
+ /* no one else is using this page */
+ if (ret == 1)
+ unlink(path);
+ }
}
/* closing fd will drop the lock */
close(fd);
@@ -1406,6 +1548,35 @@ eal_memalloc_get_seg_fd(int list_idx, int seg_idx)
return fd;
}
+static int
+test_memfd_create(void)
+{
+#ifdef MEMFD_SUPPORTED
+ unsigned int i;
+ for (i = 0; i < internal_config.num_hugepage_sizes; i++) {
+ uint64_t pagesz = internal_config.hugepage_info[i].hugepage_sz;
+ int pagesz_flag = pagesz_flags(pagesz);
+ int flags;
+
+ flags = pagesz_flag | MFD_HUGETLB;
+ int fd = memfd_create("test", flags);
+ if (fd < 0) {
+ /* we failed - let memalloc know this isn't working */
+ if (errno == EINVAL) {
+ memfd_create_supported = 0;
+ return 0; /* not supported */
+ }
+
+ /* we got other error - something's wrong */
+ return -1; /* error */
+ }
+ close(fd);
+ return 1; /* supported */
+ }
+#endif
+ return 0; /* not supported */
+}
+
int
eal_memalloc_get_seg_fd_offset(int list_idx, int seg_idx, size_t *offset)
{
@@ -1436,6 +1607,34 @@ eal_memalloc_init(void)
if (rte_eal_process_type() == RTE_PROC_SECONDARY)
if (rte_memseg_list_walk(secondary_msl_create_walk, NULL) < 0)
return -1;
+ if (rte_eal_process_type() == RTE_PROC_PRIMARY &&
+ internal_config.in_memory) {
+ int mfd_res = test_memfd_create();
+
+ if (mfd_res < 0) {
+ RTE_LOG(ERR, EAL, "Unable to check if memfd is supported\n");
+ return -1;
+ }
+ if (mfd_res == 1)
+ RTE_LOG(DEBUG, EAL, "Using memfd for anonymous memory\n");
+ else
+ RTE_LOG(INFO, EAL, "Using memfd is not supported, falling back to anonymous hugepages\n");
+
+ /* we only support single-file segments mode with in-memory mode
+ * if we support hugetlbfs with memfd_create. this code will
+ * test if we do.
+ */
+ if (internal_config.single_file_segments &&
+ mfd_res != 1) {
+ RTE_LOG(ERR, EAL, "Single-file segments mode cannot be used without memfd support\n");
+ return -1;
+ }
+ /* this cannot ever happen but better safe than sorry */
+ if (!anonymous_hugepages_supported) {
+ RTE_LOG(ERR, EAL, "Using anonymous memory is not supported\n");
+ return -1;
+ }
+ }
/* initialize all of the fd lists */
if (rte_memseg_list_walk(fd_list_create_walk, NULL))
--
2.17.1
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [dpdk-dev] [PATCH v3 9/9] mem: support using memfd segments for in-memory mode
2018-09-04 15:15 ` [dpdk-dev] [PATCH v3 9/9] mem: support using memfd segments for in-memory mode Anatoly Burakov
@ 2018-09-14 8:06 ` Maxime Coquelin
0 siblings, 0 replies; 45+ messages in thread
From: Maxime Coquelin @ 2018-09-14 8:06 UTC (permalink / raw)
To: Anatoly Burakov, dev
Cc: tiwei.bie, ray.kinsella, zhihong.wang, kuralamudhan.ramakrishnan
On 09/04/2018 05:15 PM, Anatoly Burakov wrote:
> Enable using memfd-created segments if supported by the system.
>
> This will allow having real fd's for pages but without hugetlbfs
> mounts, which will enable in-memory mode to be used with virtio.
>
> The implementation is mostly piggy-backing on existing real-fd
> code, except that we no longer need to unlink any files or track
> per-page locks in single-file segments mode, because in-memory
> mode does not support secondary processes anyway.
>
> We move some checks from EAL command-line parsing code to memalloc
> because it is now possible to use single-file segments mode with
> in-memory mode, but only if memfd is supported.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
> lib/librte_eal/common/eal_common_options.c | 6 +-
> lib/librte_eal/linuxapp/eal/eal_memalloc.c | 265 ++++++++++++++++++---
> 2 files changed, 235 insertions(+), 36 deletions(-)
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thanks,
Maxime
^ permalink raw reply [flat|nested] 45+ messages in thread
* [dpdk-dev] [PATCH v2 1/9] fbarray: fix detach in noshconf mode
2018-08-23 16:59 [dpdk-dev] [PATCH 0/8] Improve running DPDK without hugetlbfs mounpoint Anatoly Burakov
` (8 preceding siblings ...)
2018-09-04 15:01 ` [dpdk-dev] [PATCH v2 0/9] Improve running DPDK without hugetlbfs mounpoint Anatoly Burakov
@ 2018-09-04 15:01 ` Anatoly Burakov
2018-09-04 15:01 ` [dpdk-dev] [PATCH v2 2/9] eal: don't allow legacy mode with in-memory mode Anatoly Burakov
` (7 subsequent siblings)
17 siblings, 0 replies; 45+ messages in thread
From: Anatoly Burakov @ 2018-09-04 15:01 UTC (permalink / raw)
To: dev
Cc: tiwei.bie, ray.kinsella, zhihong.wang, maxime.coquelin,
kuralamudhan.ramakrishnan, stable
In noshconf mode, no shared files are created, but we're still trying
to unlink them, resulting in detach/destroy failure even though it
should have succeeded. Fix it by exiting early in noshconf mode.
Fixes: 3ee2cde248a7 ("fbarray: support --no-shconf mode")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
lib/librte_eal/common/eal_common_fbarray.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/lib/librte_eal/common/eal_common_fbarray.c b/lib/librte_eal/common/eal_common_fbarray.c
index 43caf3ced..ba6c4ae39 100644
--- a/lib/librte_eal/common/eal_common_fbarray.c
+++ b/lib/librte_eal/common/eal_common_fbarray.c
@@ -878,6 +878,10 @@ rte_fbarray_destroy(struct rte_fbarray *arr)
if (ret)
return ret;
+ /* with no shconf, there were never any files to begin with */
+ if (internal_config.no_shconf)
+ return 0;
+
/* try deleting the file */
eal_get_fbarray_path(path, sizeof(path), arr->name);
--
2.17.1
^ permalink raw reply [flat|nested] 45+ messages in thread
* [dpdk-dev] [PATCH v2 2/9] eal: don't allow legacy mode with in-memory mode
2018-08-23 16:59 [dpdk-dev] [PATCH 0/8] Improve running DPDK without hugetlbfs mounpoint Anatoly Burakov
` (9 preceding siblings ...)
2018-09-04 15:01 ` [dpdk-dev] [PATCH v2 1/9] fbarray: fix detach in noshconf mode Anatoly Burakov
@ 2018-09-04 15:01 ` Anatoly Burakov
2018-09-04 15:01 ` [dpdk-dev] [PATCH v2 3/9] mem: raise maximum fd limit unconditionally Anatoly Burakov
` (6 subsequent siblings)
17 siblings, 0 replies; 45+ messages in thread
From: Anatoly Burakov @ 2018-09-04 15:01 UTC (permalink / raw)
To: dev
Cc: tiwei.bie, ray.kinsella, zhihong.wang, maxime.coquelin,
kuralamudhan.ramakrishnan, stable
In-memory mode was never meant to support legacy mode, because we
cannot sort anonymous pages anyway.
Fixes: 72b49ff623c4 ("mem: support --in-memory mode")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
lib/librte_eal/common/eal_common_options.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index dd5f97402..873099acc 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -1390,6 +1390,12 @@ eal_check_common_options(struct internal_config *internal_cfg)
"--"OPT_HUGE_UNLINK"\n");
return -1;
}
+ if (internal_cfg->legacy_mem &&
+ internal_cfg->in_memory) {
+ RTE_LOG(ERR, EAL, "Option --"OPT_LEGACY_MEM" is not compatible "
+ "with --"OPT_IN_MEMORY"\n");
+ return -1;
+ }
return 0;
}
--
2.17.1
^ permalink raw reply [flat|nested] 45+ messages in thread
* [dpdk-dev] [PATCH v2 3/9] mem: raise maximum fd limit unconditionally
2018-08-23 16:59 [dpdk-dev] [PATCH 0/8] Improve running DPDK without hugetlbfs mounpoint Anatoly Burakov
` (10 preceding siblings ...)
2018-09-04 15:01 ` [dpdk-dev] [PATCH v2 2/9] eal: don't allow legacy mode with in-memory mode Anatoly Burakov
@ 2018-09-04 15:01 ` Anatoly Burakov
2018-09-04 15:01 ` [dpdk-dev] [PATCH v2 4/9] memalloc: rename lock list to fd list Anatoly Burakov
` (5 subsequent siblings)
17 siblings, 0 replies; 45+ messages in thread
From: Anatoly Burakov @ 2018-09-04 15:01 UTC (permalink / raw)
To: dev
Cc: tiwei.bie, ray.kinsella, zhihong.wang, maxime.coquelin,
kuralamudhan.ramakrishnan
Previously, when we allocated hugepages, we closed the fd's corresponding
to them after we've done our mappings. Since we did mmap(), we didn't
actually lose the reference, but file descriptors used for mmap() do not
count against the fd limit. Since we are going to store all of our fd's,
we will hit the fd limit much more often when using smaller page sizes.
Fix this to raise the fd limit to maximum unconditionally.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
lib/librte_eal/linuxapp/eal/eal_memory.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index dbf19499e..dfb537f59 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -17,6 +17,7 @@
#include <sys/stat.h>
#include <sys/queue.h>
#include <sys/file.h>
+#include <sys/resource.h>
#include <unistd.h>
#include <limits.h>
#include <sys/ioctl.h>
@@ -2204,6 +2205,25 @@ memseg_secondary_init(void)
int
rte_eal_memseg_init(void)
{
+ /* increase rlimit to maximum */
+ struct rlimit lim;
+
+ if (getrlimit(RLIMIT_NOFILE, &lim) == 0) {
+ /* set limit to maximum */
+ lim.rlim_cur = lim.rlim_max;
+
+ if (setrlimit(RLIMIT_NOFILE, &lim) < 0) {
+ RTE_LOG(DEBUG, EAL, "Setting maximum number of open files failed: %s\n",
+ strerror(errno));
+ } else {
+ RTE_LOG(DEBUG, EAL, "Setting maximum number of open files to %"
+ PRIu64 "\n",
+ (uint64_t)lim.rlim_cur);
+ }
+ } else {
+ RTE_LOG(ERR, EAL, "Cannot get current resource limits\n");
+ }
+
return rte_eal_process_type() == RTE_PROC_PRIMARY ?
#ifndef RTE_ARCH_64
memseg_primary_init_32() :
--
2.17.1
^ permalink raw reply [flat|nested] 45+ messages in thread
* [dpdk-dev] [PATCH v2 4/9] memalloc: rename lock list to fd list
2018-08-23 16:59 [dpdk-dev] [PATCH 0/8] Improve running DPDK without hugetlbfs mounpoint Anatoly Burakov
` (11 preceding siblings ...)
2018-09-04 15:01 ` [dpdk-dev] [PATCH v2 3/9] mem: raise maximum fd limit unconditionally Anatoly Burakov
@ 2018-09-04 15:01 ` Anatoly Burakov
2018-09-04 15:01 ` [dpdk-dev] [PATCH v2 5/9] memalloc: track page fd's in non-single file mode Anatoly Burakov
` (4 subsequent siblings)
17 siblings, 0 replies; 45+ messages in thread
From: Anatoly Burakov @ 2018-09-04 15:01 UTC (permalink / raw)
To: dev
Cc: tiwei.bie, ray.kinsella, zhihong.wang, maxime.coquelin,
kuralamudhan.ramakrishnan
Previously, we were only using lock lists to store per-page lock fd's
because we cannot use modern fcntl() file description locks to lock
parts of the page in single file segments mode.
Now, we will be using this list to store either lock fd's (along with
memseg list fd) in single file segments mode, or per-page fd's (and set
memseg list fd to -1), so rename the list accordingly.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 66 ++++++++++++----------
1 file changed, 37 insertions(+), 29 deletions(-)
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index aa95551a8..14bc5dce9 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -57,25 +57,33 @@ const int anonymous_hugepages_supported =
*/
static int fallocate_supported = -1; /* unknown */
-/* for single-file segments, we need some kind of mechanism to keep track of
+/*
+ * we have two modes - single file segments, and file-per-page mode.
+ *
+ * for single-file segments, we need some kind of mechanism to keep track of
* which hugepages can be freed back to the system, and which cannot. we cannot
* use flock() because they don't allow locking parts of a file, and we cannot
* use fcntl() due to issues with their semantics, so we will have to rely on a
- * bunch of lockfiles for each page.
+ * bunch of lockfiles for each page. so, we will use 'fds' array to keep track
+ * of per-page lockfiles. we will store the actual segment list fd in the
+ * 'memseg_list_fd' field.
+ *
+ * for file-per-page mode, each page will have its own fd, so 'memseg_list_fd'
+ * will be invalid (set to -1), and we'll use 'fds' to keep track of page fd's.
*
* we cannot know how many pages a system will have in advance, but we do know
* that they come in lists, and we know lengths of these lists. so, simply store
* a malloc'd array of fd's indexed by list and segment index.
*
* they will be initialized at startup, and filled as we allocate/deallocate
- * segments. also, use this to track memseg list proper fd.
+ * segments.
*/
static struct {
int *fds; /**< dynamically allocated array of segment lock fd's */
int memseg_list_fd; /**< memseg list fd */
int len; /**< total length of the array */
int count; /**< entries used in an array */
-} lock_fds[RTE_MAX_MEMSEG_LISTS];
+} fd_list[RTE_MAX_MEMSEG_LISTS];
/** local copy of a memory map, used to synchronize memory hotplug in MP */
static struct rte_memseg_list local_memsegs[RTE_MAX_MEMSEG_LISTS];
@@ -209,12 +217,12 @@ static int get_segment_lock_fd(int list_idx, int seg_idx)
char path[PATH_MAX] = {0};
int fd;
- if (list_idx < 0 || list_idx >= (int)RTE_DIM(lock_fds))
+ if (list_idx < 0 || list_idx >= (int)RTE_DIM(fd_list))
return -1;
- if (seg_idx < 0 || seg_idx >= lock_fds[list_idx].len)
+ if (seg_idx < 0 || seg_idx >= fd_list[list_idx].len)
return -1;
- fd = lock_fds[list_idx].fds[seg_idx];
+ fd = fd_list[list_idx].fds[seg_idx];
/* does this lock already exist? */
if (fd >= 0)
return fd;
@@ -236,8 +244,8 @@ static int get_segment_lock_fd(int list_idx, int seg_idx)
return -1;
}
/* store it for future reference */
- lock_fds[list_idx].fds[seg_idx] = fd;
- lock_fds[list_idx].count++;
+ fd_list[list_idx].fds[seg_idx] = fd;
+ fd_list[list_idx].count++;
return fd;
}
@@ -245,12 +253,12 @@ static int unlock_segment(int list_idx, int seg_idx)
{
int fd, ret;
- if (list_idx < 0 || list_idx >= (int)RTE_DIM(lock_fds))
+ if (list_idx < 0 || list_idx >= (int)RTE_DIM(fd_list))
return -1;
- if (seg_idx < 0 || seg_idx >= lock_fds[list_idx].len)
+ if (seg_idx < 0 || seg_idx >= fd_list[list_idx].len)
return -1;
- fd = lock_fds[list_idx].fds[seg_idx];
+ fd = fd_list[list_idx].fds[seg_idx];
/* upgrade lock to exclusive to see if we can remove the lockfile */
ret = lock(fd, LOCK_EX);
@@ -270,8 +278,8 @@ static int unlock_segment(int list_idx, int seg_idx)
* and remove it from list anyway.
*/
close(fd);
- lock_fds[list_idx].fds[seg_idx] = -1;
- lock_fds[list_idx].count--;
+ fd_list[list_idx].fds[seg_idx] = -1;
+ fd_list[list_idx].count--;
if (ret < 0)
return -1;
@@ -288,7 +296,7 @@ get_seg_fd(char *path, int buflen, struct hugepage_info *hi,
/* create a hugepage file path */
eal_get_hugefile_path(path, buflen, hi->hugedir, list_idx);
- fd = lock_fds[list_idx].memseg_list_fd;
+ fd = fd_list[list_idx].memseg_list_fd;
if (fd < 0) {
fd = open(path, O_CREAT | O_RDWR, 0600);
@@ -304,7 +312,7 @@ get_seg_fd(char *path, int buflen, struct hugepage_info *hi,
close(fd);
return -1;
}
- lock_fds[list_idx].memseg_list_fd = fd;
+ fd_list[list_idx].memseg_list_fd = fd;
}
} else {
/* create a hugepage file path */
@@ -410,9 +418,9 @@ resize_hugefile(int fd, char *path, int list_idx, int seg_idx,
* page file fd, so that one of the processes
* could then delete the file after shrinking.
*/
- if (ret < 1 && lock_fds[list_idx].count == 0) {
+ if (ret < 1 && fd_list[list_idx].count == 0) {
close(fd);
- lock_fds[list_idx].memseg_list_fd = -1;
+ fd_list[list_idx].memseg_list_fd = -1;
}
if (ret < 0) {
@@ -448,13 +456,13 @@ resize_hugefile(int fd, char *path, int list_idx, int seg_idx,
* more segments active in this segment list,
* and remove the file if there aren't.
*/
- if (lock_fds[list_idx].count == 0) {
+ if (fd_list[list_idx].count == 0) {
if (unlink(path))
RTE_LOG(ERR, EAL, "%s(): unlinking '%s' failed: %s\n",
__func__, path,
strerror(errno));
close(fd);
- lock_fds[list_idx].memseg_list_fd = -1;
+ fd_list[list_idx].memseg_list_fd = -1;
}
}
}
@@ -1319,7 +1327,7 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
}
static int
-secondary_lock_list_create_walk(const struct rte_memseg_list *msl,
+fd_list_create_walk(const struct rte_memseg_list *msl,
void *arg __rte_unused)
{
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
@@ -1330,20 +1338,20 @@ secondary_lock_list_create_walk(const struct rte_memseg_list *msl,
msl_idx = msl - mcfg->memsegs;
len = msl->memseg_arr.len;
- /* ensure we have space to store lock fd per each possible segment */
+ /* ensure we have space to store fd per each possible segment */
data = malloc(sizeof(int) * len);
if (data == NULL) {
- RTE_LOG(ERR, EAL, "Unable to allocate space for lock descriptors\n");
+ RTE_LOG(ERR, EAL, "Unable to allocate space for file descriptors\n");
return -1;
}
/* set all fd's as invalid */
for (i = 0; i < len; i++)
data[i] = -1;
- lock_fds[msl_idx].fds = data;
- lock_fds[msl_idx].len = len;
- lock_fds[msl_idx].count = 0;
- lock_fds[msl_idx].memseg_list_fd = -1;
+ fd_list[msl_idx].fds = data;
+ fd_list[msl_idx].len = len;
+ fd_list[msl_idx].count = 0;
+ fd_list[msl_idx].memseg_list_fd = -1;
return 0;
}
@@ -1355,9 +1363,9 @@ eal_memalloc_init(void)
if (rte_memseg_list_walk(secondary_msl_create_walk, NULL) < 0)
return -1;
- /* initialize all of the lock fd lists */
+ /* initialize all of the fd lists */
if (internal_config.single_file_segments)
- if (rte_memseg_list_walk(secondary_lock_list_create_walk, NULL))
+ if (rte_memseg_list_walk(fd_list_create_walk, NULL))
return -1;
return 0;
}
--
2.17.1
^ permalink raw reply [flat|nested] 45+ messages in thread
* [dpdk-dev] [PATCH v2 5/9] memalloc: track page fd's in non-single file mode
2018-08-23 16:59 [dpdk-dev] [PATCH 0/8] Improve running DPDK without hugetlbfs mounpoint Anatoly Burakov
` (12 preceding siblings ...)
2018-09-04 15:01 ` [dpdk-dev] [PATCH v2 4/9] memalloc: rename lock list to fd list Anatoly Burakov
@ 2018-09-04 15:01 ` Anatoly Burakov
2018-09-04 15:01 ` [dpdk-dev] [PATCH v2 6/9] memalloc: add EAL-internal API to get and set segment fd's Anatoly Burakov
` (3 subsequent siblings)
17 siblings, 0 replies; 45+ messages in thread
From: Anatoly Burakov @ 2018-09-04 15:01 UTC (permalink / raw)
To: dev
Cc: tiwei.bie, ray.kinsella, zhihong.wang, maxime.coquelin,
kuralamudhan.ramakrishnan
Previously, we were only tracking lock file fd's in single-file
segments mode, but did not track fd's in non-single file mode
because we didn't need to (mmap() call still kept the lock). Now
that we are going to expose these fd's to the world, we need to
have access to them, so track them even in non-single file
segments mode.
We don't need to close fd's after mmap() because we're still
tracking them in an fd list. Also, for anonymous hugepages mode,
fd will always be -1 so exit early on error.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 44 ++++++++++++----------
1 file changed, 25 insertions(+), 19 deletions(-)
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 14bc5dce9..7d536350e 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -318,18 +318,24 @@ get_seg_fd(char *path, int buflen, struct hugepage_info *hi,
/* create a hugepage file path */
eal_get_hugefile_path(path, buflen, hi->hugedir,
list_idx * RTE_MAX_MEMSEG_PER_LIST + seg_idx);
- fd = open(path, O_CREAT | O_RDWR, 0600);
+
+ fd = fd_list[list_idx].fds[seg_idx];
+
if (fd < 0) {
- RTE_LOG(DEBUG, EAL, "%s(): open failed: %s\n", __func__,
- strerror(errno));
- return -1;
- }
- /* take out a read lock */
- if (lock(fd, LOCK_SH) < 0) {
- RTE_LOG(ERR, EAL, "%s(): lock failed: %s\n",
- __func__, strerror(errno));
- close(fd);
- return -1;
+ fd = open(path, O_CREAT | O_RDWR, 0600);
+ if (fd < 0) {
+ RTE_LOG(DEBUG, EAL, "%s(): open failed: %s\n",
+ __func__, strerror(errno));
+ return -1;
+ }
+ /* take out a read lock */
+ if (lock(fd, LOCK_SH) < 0) {
+ RTE_LOG(ERR, EAL, "%s(): lock failed: %s\n",
+ __func__, strerror(errno));
+ close(fd);
+ return -1;
+ }
+ fd_list[list_idx].fds[seg_idx] = fd;
}
}
return fd;
@@ -601,10 +607,6 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
goto mapped;
}
#endif
- /* for non-single file segments that aren't in-memory, we can close fd
- * here */
- if (!internal_config.single_file_segments && !internal_config.in_memory)
- close(fd);
ms->addr = addr;
ms->hugepage_sz = alloc_sz;
@@ -634,7 +636,10 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
RTE_LOG(CRIT, EAL, "Can't mmap holes in our virtual address space\n");
}
resized:
- /* in-memory mode will never be single-file-segments mode */
+ /* some codepaths will return negative fd, so exit early */
+ if (fd < 0)
+ return -1;
+
if (internal_config.single_file_segments) {
resize_hugefile(fd, path, list_idx, seg_idx, map_offset,
alloc_sz, false);
@@ -646,6 +651,7 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
lock(fd, LOCK_EX) == 1)
unlink(path);
close(fd);
+ fd_list[list_idx].fds[seg_idx] = -1;
}
return -1;
}
@@ -700,6 +706,7 @@ free_seg(struct rte_memseg *ms, struct hugepage_info *hi,
}
/* closing fd will drop the lock */
close(fd);
+ fd_list[list_idx].fds[seg_idx] = -1;
}
memset(ms, 0, sizeof(*ms));
@@ -1364,8 +1371,7 @@ eal_memalloc_init(void)
return -1;
/* initialize all of the fd lists */
- if (internal_config.single_file_segments)
- if (rte_memseg_list_walk(fd_list_create_walk, NULL))
- return -1;
+ if (rte_memseg_list_walk(fd_list_create_walk, NULL))
+ return -1;
return 0;
}
--
2.17.1
^ permalink raw reply [flat|nested] 45+ messages in thread
* [dpdk-dev] [PATCH v2 6/9] memalloc: add EAL-internal API to get and set segment fd's
2018-08-23 16:59 [dpdk-dev] [PATCH 0/8] Improve running DPDK without hugetlbfs mounpoint Anatoly Burakov
` (13 preceding siblings ...)
2018-09-04 15:01 ` [dpdk-dev] [PATCH v2 5/9] memalloc: track page fd's in non-single file mode Anatoly Burakov
@ 2018-09-04 15:01 ` Anatoly Burakov
2018-09-04 15:02 ` [dpdk-dev] [PATCH v2 7/9] mem: add external API to retrieve page fd from EAL Anatoly Burakov
` (2 subsequent siblings)
17 siblings, 0 replies; 45+ messages in thread
From: Anatoly Burakov @ 2018-09-04 15:01 UTC (permalink / raw)
To: dev
Cc: Bruce Richardson, tiwei.bie, ray.kinsella, zhihong.wang,
maxime.coquelin, kuralamudhan.ramakrishnan
Enable setting and retrieving segment fd's internally.
For now, retrieving fd's will not be used anywhere until we
get an external API, but it will be useful for things like
virtio, where we wish to share segment fd's.
Setting segment fd's will not be available as a public API
at this time, but internally it is needed for legacy mode,
because we're not allocating our hugepages in memalloc in
legacy mode case, and we still need to store the fd.
Another user of get segment fd API is memseg info dump, to
show which pages use which fd's.
Not supported on FreeBSD.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
lib/librte_eal/bsdapp/eal/eal_memalloc.c | 12 +++++
lib/librte_eal/common/eal_common_memory.c | 8 +--
lib/librte_eal/common/eal_memalloc.h | 6 +++
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 60 +++++++++++++++++-----
lib/librte_eal/linuxapp/eal/eal_memory.c | 44 +++++++++++++---
5 files changed, 109 insertions(+), 21 deletions(-)
diff --git a/lib/librte_eal/bsdapp/eal/eal_memalloc.c b/lib/librte_eal/bsdapp/eal/eal_memalloc.c
index f7f07abd6..a5fb09f71 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memalloc.c
@@ -47,6 +47,18 @@ eal_memalloc_sync_with_primary(void)
return -1;
}
+int
+eal_memalloc_get_seg_fd(int list_idx, int seg_idx)
+{
+ return -1;
+}
+
+int
+eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd)
+{
+ return -1;
+}
+
int
eal_memalloc_init(void)
{
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index fbfb1b055..034c2026a 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -294,7 +294,7 @@ dump_memseg(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
void *arg)
{
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
- int msl_idx, ms_idx;
+ int msl_idx, ms_idx, fd;
FILE *f = arg;
msl_idx = msl - mcfg->memsegs;
@@ -305,10 +305,11 @@ dump_memseg(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
if (ms_idx < 0)
return -1;
+ fd = eal_memalloc_get_seg_fd(msl_idx, ms_idx);
fprintf(f, "Segment %i-%i: IOVA:0x%"PRIx64", len:%zu, "
"virt:%p, socket_id:%"PRId32", "
"hugepage_sz:%"PRIu64", nchannel:%"PRIx32", "
- "nrank:%"PRIx32"\n",
+ "nrank:%"PRIx32" fd:%i\n",
msl_idx, ms_idx,
ms->iova,
ms->len,
@@ -316,7 +317,8 @@ dump_memseg(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
ms->socket_id,
ms->hugepage_sz,
ms->nchannel,
- ms->nrank);
+ ms->nrank,
+ fd);
return 0;
}
diff --git a/lib/librte_eal/common/eal_memalloc.h b/lib/librte_eal/common/eal_memalloc.h
index 36bb1a027..a46c69c72 100644
--- a/lib/librte_eal/common/eal_memalloc.h
+++ b/lib/librte_eal/common/eal_memalloc.h
@@ -76,6 +76,12 @@ eal_memalloc_mem_alloc_validator_unregister(const char *name, int socket_id);
int
eal_memalloc_mem_alloc_validate(int socket_id, size_t new_len);
+int
+eal_memalloc_get_seg_fd(int list_idx, int seg_idx);
+
+int
+eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd);
+
int
eal_memalloc_init(void);
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 7d536350e..b820989e9 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1334,16 +1334,10 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
}
static int
-fd_list_create_walk(const struct rte_memseg_list *msl,
- void *arg __rte_unused)
+alloc_list(int list_idx, int len)
{
- struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
- unsigned int i, len;
- int msl_idx;
int *data;
-
- msl_idx = msl - mcfg->memsegs;
- len = msl->memseg_arr.len;
+ int i;
/* ensure we have space to store fd per each possible segment */
data = malloc(sizeof(int) * len);
@@ -1355,14 +1349,56 @@ fd_list_create_walk(const struct rte_memseg_list *msl,
for (i = 0; i < len; i++)
data[i] = -1;
- fd_list[msl_idx].fds = data;
- fd_list[msl_idx].len = len;
- fd_list[msl_idx].count = 0;
- fd_list[msl_idx].memseg_list_fd = -1;
+ fd_list[list_idx].fds = data;
+ fd_list[list_idx].len = len;
+ fd_list[list_idx].count = 0;
+ fd_list[list_idx].memseg_list_fd = -1;
return 0;
}
+static int
+fd_list_create_walk(const struct rte_memseg_list *msl,
+ void *arg __rte_unused)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ unsigned int len;
+ int msl_idx;
+
+ msl_idx = msl - mcfg->memsegs;
+ len = msl->memseg_arr.len;
+
+ return alloc_list(msl_idx, len);
+}
+
+int
+eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+
+ /* if list is not allocated, allocate it */
+ if (fd_list[list_idx].len == 0) {
+ int len = mcfg->memsegs[list_idx].memseg_arr.len;
+
+ if (alloc_list(list_idx, len) < 0)
+ return -1;
+ }
+ fd_list[list_idx].fds[seg_idx] = fd;
+
+ return 0;
+}
+
+int
+eal_memalloc_get_seg_fd(int list_idx, int seg_idx)
+{
+ if (internal_config.single_file_segments)
+ return fd_list[list_idx].memseg_list_fd;
+ /* list not initialized */
+ if (fd_list[list_idx].len == 0)
+ return -1;
+ return fd_list[list_idx].fds[seg_idx];
+}
+
int
eal_memalloc_init(void)
{
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index dfb537f59..e3ac24815 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -772,7 +772,10 @@ remap_segment(struct hugepage_file *hugepages, int seg_start, int seg_end)
rte_fbarray_set_used(arr, ms_idx);
- close(fd);
+ /* store segment fd internally */
+ if (eal_memalloc_set_seg_fd(msl_idx, ms_idx, fd) < 0)
+ RTE_LOG(ERR, EAL, "Could not store segment fd: %s\n",
+ rte_strerror(rte_errno));
}
RTE_LOG(DEBUG, EAL, "Allocated %" PRIu64 "M on socket %i\n",
(seg_len * page_sz) >> 20, socket_id);
@@ -1771,6 +1774,7 @@ getFileSize(int fd)
static int
eal_legacy_hugepage_attach(void)
{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
struct hugepage_file *hp = NULL;
unsigned int num_hp = 0;
unsigned int i = 0;
@@ -1814,6 +1818,9 @@ eal_legacy_hugepage_attach(void)
struct hugepage_file *hf = &hp[i];
size_t map_sz = hf->size;
void *map_addr = hf->final_va;
+ int msl_idx, ms_idx;
+ struct rte_memseg_list *msl;
+ struct rte_memseg *ms;
/* if size is zero, no more pages left */
if (map_sz == 0)
@@ -1831,25 +1838,50 @@ eal_legacy_hugepage_attach(void)
if (map_addr == MAP_FAILED) {
RTE_LOG(ERR, EAL, "Could not map %s: %s\n",
hf->filepath, strerror(errno));
- close(fd);
- goto error;
+ goto fd_error;
}
/* set shared lock on the file. */
if (flock(fd, LOCK_SH) < 0) {
RTE_LOG(DEBUG, EAL, "%s(): Locking file failed: %s\n",
__func__, strerror(errno));
- close(fd);
- goto error;
+ goto fd_error;
}
- close(fd);
+ /* find segment data */
+ msl = rte_mem_virt2memseg_list(map_addr);
+ if (msl == NULL) {
+ RTE_LOG(DEBUG, EAL, "%s(): Cannot find memseg list\n",
+ __func__);
+ goto fd_error;
+ }
+ ms = rte_mem_virt2memseg(map_addr, msl);
+ if (ms == NULL) {
+ RTE_LOG(DEBUG, EAL, "%s(): Cannot find memseg\n",
+ __func__);
+ goto fd_error;
+ }
+
+ msl_idx = msl - mcfg->memsegs;
+ ms_idx = rte_fbarray_find_idx(&msl->memseg_arr, ms);
+ if (ms_idx < 0) {
+ RTE_LOG(DEBUG, EAL, "%s(): Cannot find memseg idx\n",
+ __func__);
+ goto fd_error;
+ }
+
+ /* store segment fd internally */
+ if (eal_memalloc_set_seg_fd(msl_idx, ms_idx, fd) < 0)
+ RTE_LOG(ERR, EAL, "Could not store segment fd: %s\n",
+ rte_strerror(rte_errno));
}
/* unmap the hugepage config file, since we are done using it */
munmap(hp, size);
close(fd_hugepage);
return 0;
+fd_error:
+ close(fd);
error:
/* map all segments into memory to make sure we get the addrs */
cur_seg = 0;
--
2.17.1
^ permalink raw reply [flat|nested] 45+ messages in thread
* [dpdk-dev] [PATCH v2 7/9] mem: add external API to retrieve page fd from EAL
2018-08-23 16:59 [dpdk-dev] [PATCH 0/8] Improve running DPDK without hugetlbfs mounpoint Anatoly Burakov
` (14 preceding siblings ...)
2018-09-04 15:01 ` [dpdk-dev] [PATCH v2 6/9] memalloc: add EAL-internal API to get and set segment fd's Anatoly Burakov
@ 2018-09-04 15:02 ` Anatoly Burakov
2018-09-04 15:02 ` [dpdk-dev] [PATCH v2 8/9] mem: allow querying offset into segment fd Anatoly Burakov
2018-09-04 15:02 ` [dpdk-dev] [PATCH v2 9/9] mem: support using memfd segments for in-memory mode Anatoly Burakov
17 siblings, 0 replies; 45+ messages in thread
From: Anatoly Burakov @ 2018-09-04 15:02 UTC (permalink / raw)
To: dev
Cc: Bruce Richardson, tiwei.bie, ray.kinsella, zhihong.wang,
maxime.coquelin, kuralamudhan.ramakrishnan
Now that we can retrieve page fd's internally, we can expose it
as an external API. This will add two flavors of API - thread-safe
and non-thread-safe. Fix up internal API's to return values we need
without modifying rte_errno internally if called from within EAL.
We do not want calling code to accidentally close an internal fd, so
we make a duplicate of it before we return it to the user. Caller is
therefore responsible for closing this fd.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
lib/librte_eal/bsdapp/eal/eal_memalloc.c | 5 ++-
lib/librte_eal/common/eal_common_memory.c | 49 ++++++++++++++++++++++
lib/librte_eal/common/eal_memalloc.h | 2 +
lib/librte_eal/common/include/rte_memory.h | 48 +++++++++++++++++++++
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 21 ++++++----
lib/librte_eal/rte_eal_version.map | 2 +
6 files changed, 118 insertions(+), 9 deletions(-)
diff --git a/lib/librte_eal/bsdapp/eal/eal_memalloc.c b/lib/librte_eal/bsdapp/eal/eal_memalloc.c
index a5fb09f71..80e4c3d4f 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memalloc.c
@@ -4,6 +4,7 @@
#include <inttypes.h>
+#include <rte_errno.h>
#include <rte_log.h>
#include <rte_memory.h>
@@ -50,13 +51,13 @@ eal_memalloc_sync_with_primary(void)
int
eal_memalloc_get_seg_fd(int list_idx, int seg_idx)
{
- return -1;
+ return -ENOTSUP;
}
int
eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd)
{
- return -1;
+ return -ENOTSUP;
}
int
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 034c2026a..4a80deaf5 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -550,6 +550,55 @@ rte_memseg_list_walk(rte_memseg_list_walk_t func, void *arg)
return ret;
}
+int __rte_experimental
+rte_memseg_get_fd_thread_unsafe(const struct rte_memseg *ms)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ struct rte_memseg_list *msl;
+ struct rte_fbarray *arr;
+ int msl_idx, seg_idx, ret;
+
+ if (ms == NULL) {
+ rte_errno = EINVAL;
+ return -1;
+ }
+
+ msl = rte_mem_virt2memseg_list(ms->addr);
+ if (msl == NULL) {
+ rte_errno = EINVAL;
+ return -1;
+ }
+ arr = &msl->memseg_arr;
+
+ msl_idx = msl - mcfg->memsegs;
+ seg_idx = rte_fbarray_find_idx(arr, ms);
+
+ if (!rte_fbarray_is_used(arr, seg_idx)) {
+ rte_errno = ENOENT;
+ return -1;
+ }
+
+ ret = eal_memalloc_get_seg_fd(msl_idx, seg_idx);
+ if (ret < 0) {
+ rte_errno = -ret;
+ ret = -1;
+ }
+ return ret;
+}
+
+int __rte_experimental
+rte_memseg_get_fd(const struct rte_memseg *ms)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ int ret;
+
+ rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+ ret = rte_memseg_get_fd_thread_unsafe(ms);
+ rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+
+ return ret;
+}
+
/* init memory subsystem */
int
rte_eal_memory_init(void)
diff --git a/lib/librte_eal/common/eal_memalloc.h b/lib/librte_eal/common/eal_memalloc.h
index a46c69c72..70a214de4 100644
--- a/lib/librte_eal/common/eal_memalloc.h
+++ b/lib/librte_eal/common/eal_memalloc.h
@@ -76,9 +76,11 @@ eal_memalloc_mem_alloc_validator_unregister(const char *name, int socket_id);
int
eal_memalloc_mem_alloc_validate(int socket_id, size_t new_len);
+/* returns fd or -errno */
int
eal_memalloc_get_seg_fd(int list_idx, int seg_idx);
+/* returns 0 or -errno */
int
eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd);
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index c4b7f4cff..0d2a30056 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -317,6 +317,54 @@ rte_memseg_contig_walk_thread_unsafe(rte_memseg_contig_walk_t func, void *arg);
int __rte_experimental
rte_memseg_list_walk_thread_unsafe(rte_memseg_list_walk_t func, void *arg);
+/**
+ * Return file descriptor associated with a particular memseg (if available).
+ *
+ * @note This function read-locks the memory hotplug subsystem, and thus cannot
+ * be used within memory-related callback functions.
+ *
+ * @note This returns an internal file descriptor. Performing any operations on
+ * this file descriptor is inherently dangerous, so it should be treated
+ * as read-only for all intents and purposes.
+ *
+ * @param ms
+ * A pointer to memseg for which to get file descriptor.
+ *
+ * @return
+ * Valid file descriptor in case of success.
+ * -1 in case of error, with ``rte_errno`` set to the following values:
+ * - EINVAL - ``ms`` pointer was NULL or did not point to a valid memseg
+ * - ENODEV - ``ms`` fd is not available
+ * - ENOENT - ``ms`` is an unused segment
+ * - ENOTSUP - segment fd's are not supported
+ */
+int __rte_experimental
+rte_memseg_get_fd(const struct rte_memseg *ms);
+
+/**
+ * Return file descriptor associated with a particular memseg (if available).
+ *
+ * @note This function does not perform any locking, and is only safe to call
+ * from within memory-related callback functions.
+ *
+ * @note This returns an internal file descriptor. Performing any operations on
+ * this file descriptor is inherently dangerous, so it should be treated
+ * as read-only for all intents and purposes.
+ *
+ * @param ms
+ * A pointer to memseg for which to get file descriptor.
+ *
+ * @return
+ * Valid file descriptor in case of success.
+ * -1 in case of error, with ``rte_errno`` set to the following values:
+ * - EINVAL - ``ms`` pointer was NULL or did not point to a valid memseg
+ * - ENODEV - ``ms`` fd is not available
+ * - ENOENT - ``ms`` is an unused segment
+ * - ENOTSUP - segment fd's are not supported
+ */
+int __rte_experimental
+rte_memseg_get_fd_thread_unsafe(const struct rte_memseg *ms);
+
/**
* Dump the physical memory layout to a file.
*
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index b820989e9..21f842753 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -34,6 +34,7 @@
#include <rte_log.h>
#include <rte_eal_memconfig.h>
#include <rte_eal.h>
+#include <rte_errno.h>
#include <rte_memory.h>
#include <rte_spinlock.h>
@@ -1381,7 +1382,7 @@ eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd)
int len = mcfg->memsegs[list_idx].memseg_arr.len;
if (alloc_list(list_idx, len) < 0)
- return -1;
+ return -ENOMEM;
}
fd_list[list_idx].fds[seg_idx] = fd;
@@ -1391,12 +1392,18 @@ eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd)
int
eal_memalloc_get_seg_fd(int list_idx, int seg_idx)
{
- if (internal_config.single_file_segments)
- return fd_list[list_idx].memseg_list_fd;
- /* list not initialized */
- if (fd_list[list_idx].len == 0)
- return -1;
- return fd_list[list_idx].fds[seg_idx];
+ int fd;
+ if (internal_config.single_file_segments) {
+ fd = fd_list[list_idx].memseg_list_fd;
+ } else if (fd_list[list_idx].len == 0) {
+ /* list not initialized */
+ fd = -1;
+ } else {
+ fd = fd_list[list_idx].fds[seg_idx];
+ }
+ if (fd < 0)
+ return -ENODEV;
+ return fd;
}
int
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 344a43d32..e659e19d6 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -320,6 +320,8 @@ EXPERIMENTAL {
rte_mem_virt2memseg_list;
rte_memseg_contig_walk;
rte_memseg_contig_walk_thread_unsafe;
+ rte_memseg_get_fd;
+ rte_memseg_get_fd_thread_unsafe;
rte_memseg_list_walk;
rte_memseg_list_walk_thread_unsafe;
rte_memseg_walk;
--
2.17.1
^ permalink raw reply [flat|nested] 45+ messages in thread
* [dpdk-dev] [PATCH v2 8/9] mem: allow querying offset into segment fd
2018-08-23 16:59 [dpdk-dev] [PATCH 0/8] Improve running DPDK without hugetlbfs mounpoint Anatoly Burakov
` (15 preceding siblings ...)
2018-09-04 15:02 ` [dpdk-dev] [PATCH v2 7/9] mem: add external API to retrieve page fd from EAL Anatoly Burakov
@ 2018-09-04 15:02 ` Anatoly Burakov
2018-09-04 15:02 ` [dpdk-dev] [PATCH v2 9/9] mem: support using memfd segments for in-memory mode Anatoly Burakov
17 siblings, 0 replies; 45+ messages in thread
From: Anatoly Burakov @ 2018-09-04 15:02 UTC (permalink / raw)
To: dev
Cc: Bruce Richardson, tiwei.bie, ray.kinsella, zhihong.wang,
maxime.coquelin, kuralamudhan.ramakrishnan
In a few cases, user may need to query offset into fd for a
particular memory segment (for example, to selectively map
pages). This commit adds a new API to do that.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
lib/librte_eal/bsdapp/eal/eal_memalloc.c | 6 +++
lib/librte_eal/common/eal_common_memory.c | 50 ++++++++++++++++++++++
lib/librte_eal/common/eal_memalloc.h | 3 ++
lib/librte_eal/common/include/rte_memory.h | 49 +++++++++++++++++++++
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 21 +++++++++
lib/librte_eal/rte_eal_version.map | 2 +
6 files changed, 131 insertions(+)
diff --git a/lib/librte_eal/bsdapp/eal/eal_memalloc.c b/lib/librte_eal/bsdapp/eal/eal_memalloc.c
index 80e4c3d4f..06afbcc99 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memalloc.c
@@ -60,6 +60,12 @@ eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd)
return -ENOTSUP;
}
+int
+eal_memalloc_get_seg_fd_offset(int list_idx, int seg_idx, size_t *offset)
+{
+ return -ENOTSUP;
+}
+
int
eal_memalloc_init(void)
{
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 4a80deaf5..0b69804ff 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -599,6 +599,56 @@ rte_memseg_get_fd(const struct rte_memseg *ms)
return ret;
}
+int __rte_experimental
+rte_memseg_get_fd_offset_thread_unsafe(const struct rte_memseg *ms,
+ size_t *offset)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ struct rte_memseg_list *msl;
+ struct rte_fbarray *arr;
+ int msl_idx, seg_idx, ret;
+
+ if (ms == NULL || offset == NULL) {
+ rte_errno = EINVAL;
+ return -1;
+ }
+
+ msl = rte_mem_virt2memseg_list(ms->addr);
+ if (msl == NULL) {
+ rte_errno = EINVAL;
+ return -1;
+ }
+ arr = &msl->memseg_arr;
+
+ msl_idx = msl - mcfg->memsegs;
+ seg_idx = rte_fbarray_find_idx(arr, ms);
+
+ if (!rte_fbarray_is_used(arr, seg_idx)) {
+ rte_errno = ENOENT;
+ return -1;
+ }
+
+ ret = eal_memalloc_get_seg_fd_offset(msl_idx, seg_idx, offset);
+ if (ret < 0) {
+ rte_errno = -ret;
+ ret = -1;
+ }
+ return ret;
+}
+
+int __rte_experimental
+rte_memseg_get_fd_offset(const struct rte_memseg *ms, size_t *offset)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ int ret;
+
+ rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+ ret = rte_memseg_get_fd_offset_thread_unsafe(ms, offset);
+ rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+
+ return ret;
+}
+
/* init memory subsystem */
int
rte_eal_memory_init(void)
diff --git a/lib/librte_eal/common/eal_memalloc.h b/lib/librte_eal/common/eal_memalloc.h
index 70a214de4..af917c2f9 100644
--- a/lib/librte_eal/common/eal_memalloc.h
+++ b/lib/librte_eal/common/eal_memalloc.h
@@ -84,6 +84,9 @@ eal_memalloc_get_seg_fd(int list_idx, int seg_idx);
int
eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd);
+int
+eal_memalloc_get_seg_fd_offset(int list_idx, int seg_idx, size_t *offset);
+
int
eal_memalloc_init(void);
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 0d2a30056..14bd277a4 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -365,6 +365,55 @@ rte_memseg_get_fd(const struct rte_memseg *ms);
int __rte_experimental
rte_memseg_get_fd_thread_unsafe(const struct rte_memseg *ms);
+/**
+ * Get offset into segment file descriptor associated with a particular memseg
+ * (if available).
+ *
+ * @note This function read-locks the memory hotplug subsystem, and thus cannot
+ * be used within memory-related callback functions.
+ *
+ * @param ms
+ * A pointer to memseg for which to get file descriptor.
+ * @param offset
+ * A pointer to offset value where the result will be stored.
+ *
+ * @return
+ * Valid file descriptor in case of success.
+ * -1 in case of error, with ``rte_errno`` set to the following values:
+ * - EINVAL - ``ms`` pointer was NULL or did not point to a valid memseg
+ * - EINVAL - ``offset`` pointer was NULL
+ * - ENODEV - ``ms`` fd is not available
+ * - ENOENT - ``ms`` is an unused segment
+ * - ENOTSUP - segment fd's are not supported
+ */
+int __rte_experimental
+rte_memseg_get_fd_offset(const struct rte_memseg *ms, size_t *offset);
+
+/**
+ * Get offset into segment file descriptor associated with a particular memseg
+ * (if available).
+ *
+ * @note This function does not perform any locking, and is only safe to call
+ * from within memory-related callback functions.
+ *
+ * @param ms
+ * A pointer to memseg for which to get file descriptor.
+ * @param offset
+ * A pointer to offset value where the result will be stored.
+ *
+ * @return
+ * Valid file descriptor in case of success.
+ * -1 in case of error, with ``rte_errno`` set to the following values:
+ * - EINVAL - ``ms`` pointer was NULL or did not point to a valid memseg
+ * - EINVAL - ``offset`` pointer was NULL
+ * - ENODEV - ``ms`` fd is not available
+ * - ENOENT - ``ms`` is an unused segment
+ * - ENOTSUP - segment fd's are not supported
+ */
+int __rte_experimental
+rte_memseg_get_fd_offset_thread_unsafe(const struct rte_memseg *ms,
+ size_t *offset);
+
/**
* Dump the physical memory layout to a file.
*
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 21f842753..66e1d87b6 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1406,6 +1406,27 @@ eal_memalloc_get_seg_fd(int list_idx, int seg_idx)
return fd;
}
+int
+eal_memalloc_get_seg_fd_offset(int list_idx, int seg_idx, size_t *offset)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+
+ /* fd_list not initialized? */
+ if (fd_list[list_idx].len == 0)
+ return -ENODEV;
+ /* segment not active? */
+ if (fd_list[list_idx].fds[seg_idx] < 0)
+ return -ENOENT;
+ /* for single-file segments mode, offset is seg idx times page size */
+ if (internal_config.single_file_segments) {
+ size_t pgsz = mcfg->memsegs[list_idx].page_sz;
+ *offset = pgsz * seg_idx;
+ } else {
+ *offset = 0;
+ }
+ return 0;
+}
+
int
eal_memalloc_init(void)
{
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index e659e19d6..ba159f7cd 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -321,7 +321,9 @@ EXPERIMENTAL {
rte_memseg_contig_walk;
rte_memseg_contig_walk_thread_unsafe;
rte_memseg_get_fd;
+ rte_memseg_get_fd_offset;
rte_memseg_get_fd_thread_unsafe;
+ rte_memseg_get_fd_offset_thread_unsafe;
rte_memseg_list_walk;
rte_memseg_list_walk_thread_unsafe;
rte_memseg_walk;
--
2.17.1
^ permalink raw reply [flat|nested] 45+ messages in thread
* [dpdk-dev] [PATCH v2 9/9] mem: support using memfd segments for in-memory mode
2018-08-23 16:59 [dpdk-dev] [PATCH 0/8] Improve running DPDK without hugetlbfs mounpoint Anatoly Burakov
` (16 preceding siblings ...)
2018-09-04 15:02 ` [dpdk-dev] [PATCH v2 8/9] mem: allow querying offset into segment fd Anatoly Burakov
@ 2018-09-04 15:02 ` Anatoly Burakov
17 siblings, 0 replies; 45+ messages in thread
From: Anatoly Burakov @ 2018-09-04 15:02 UTC (permalink / raw)
To: dev
Cc: tiwei.bie, ray.kinsella, zhihong.wang, maxime.coquelin,
kuralamudhan.ramakrishnan
Enable using memfd-created segments if supported by the system.
This will allow having real fd's for pages but without hugetlbfs
mounts, which will enable in-memory mode to be used with virtio.
The implementation is mostly piggy-backing on existing real-fd
code, except that we no longer need to unlink any files or track
per-page locks in single-file segments mode, because in-memory
mode does not support secondary processes anyway.
We move some checks from EAL command-line parsing code to memalloc
because it is now possible to use single-file segments mode with
in-memory mode, but only if memfd is supported.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
lib/librte_eal/common/eal_common_options.c | 6 +-
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 265 ++++++++++++++++++---
2 files changed, 235 insertions(+), 36 deletions(-)
diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index 873099acc..ddd624110 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -1384,10 +1384,10 @@ eal_check_common_options(struct internal_config *internal_cfg)
" is only supported in non-legacy memory mode\n");
}
if (internal_cfg->single_file_segments &&
- internal_cfg->hugepage_unlink) {
+ internal_cfg->hugepage_unlink &&
+ !internal_cfg->in_memory) {
RTE_LOG(ERR, EAL, "Option --"OPT_SINGLE_FILE_SEGMENTS" is "
- "not compatible with neither --"OPT_IN_MEMORY" nor "
- "--"OPT_HUGE_UNLINK"\n");
+ "not compatible with --"OPT_HUGE_UNLINK"\n");
return -1;
}
if (internal_cfg->legacy_mem &&
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 66e1d87b6..0422cbd8d 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -52,6 +52,23 @@ const int anonymous_hugepages_supported =
#define RTE_MAP_HUGE_SHIFT 26
#endif
+/*
+ * we don't actually care if memfd itself is supported - we only need to check
+ * if memfd supports hugetlbfs, as that already implies memfd support.
+ *
+ * also, this is not a constant, because while we may be *compiled* with memfd
+ * hugetlbfs support, we might not be *running* on a system that supports memfd
+ * and/or memfd with hugetlbfs, so we need to be able to adjust this flag at
+ * runtime, and fall back to anonymous memory.
+ */
+int memfd_create_supported =
+#ifdef MFD_HUGETLB
+#define MEMFD_SUPPORTED
+ 1;
+#else
+ 0;
+#endif
+
/*
* not all kernel version support fallocate on hugetlbfs, so fall back to
* ftruncate and disallow deallocation if fallocate is not supported.
@@ -191,6 +208,31 @@ get_file_size(int fd)
return st.st_size;
}
+static inline uint32_t
+bsf64(uint64_t v)
+{
+ return (uint32_t)__builtin_ctzll(v);
+}
+
+static inline uint32_t
+log2_u64(uint64_t v)
+{
+ if (v == 0)
+ return 0;
+ v = rte_align64pow2(v);
+ return bsf64(v);
+}
+
+static int
+pagesz_flags(uint64_t page_sz)
+{
+ /* as per mmap() manpage, all page sizes are log2 of page size
+ * shifted by MAP_HUGE_SHIFT
+ */
+ int log2 = log2_u64(page_sz);
+ return log2 << RTE_MAP_HUGE_SHIFT;
+}
+
/* returns 1 on successful lock, 0 on unsuccessful lock, -1 on error */
static int lock(int fd, int type)
{
@@ -287,12 +329,64 @@ static int unlock_segment(int list_idx, int seg_idx)
return 0;
}
+static int
+get_seg_memfd(struct hugepage_info *hi __rte_unused,
+ unsigned int list_idx __rte_unused,
+ unsigned int seg_idx __rte_unused)
+{
+#ifdef MEMFD_SUPPORTED
+ int fd;
+ char segname[250]; /* as per manpage, limit is 249 bytes plus null */
+
+ if (internal_config.single_file_segments) {
+ fd = fd_list[list_idx].memseg_list_fd;
+
+ if (fd < 0) {
+ int flags = MFD_HUGETLB | pagesz_flags(hi->hugepage_sz);
+
+ snprintf(segname, sizeof(segname), "seg_%i", list_idx);
+ fd = memfd_create(segname, flags);
+ if (fd < 0) {
+ RTE_LOG(DEBUG, EAL, "%s(): memfd create failed: %s\n",
+ __func__, strerror(errno));
+ return -1;
+ }
+ fd_list[list_idx].memseg_list_fd = fd;
+ }
+ } else {
+ fd = fd_list[list_idx].fds[seg_idx];
+
+ if (fd < 0) {
+ int flags = MFD_HUGETLB | pagesz_flags(hi->hugepage_sz);
+
+ snprintf(segname, sizeof(segname), "seg_%i-%i",
+ list_idx, seg_idx);
+ fd = memfd_create(segname, flags);
+ if (fd < 0) {
+ RTE_LOG(DEBUG, EAL, "%s(): memfd create failed: %s\n",
+ __func__, strerror(errno));
+ return -1;
+ }
+ fd_list[list_idx].fds[seg_idx] = fd;
+ }
+ }
+ return fd;
+#endif
+ return -1;
+}
+
static int
get_seg_fd(char *path, int buflen, struct hugepage_info *hi,
unsigned int list_idx, unsigned int seg_idx)
{
int fd;
+ /* for in-memory mode, we only make it here when we're sure we support
+ * memfd, and this is a special case.
+ */
+ if (internal_config.in_memory)
+ return get_seg_memfd(hi, list_idx, seg_idx);
+
if (internal_config.single_file_segments) {
/* create a hugepage file path */
eal_get_hugefile_path(path, buflen, hi->hugedir, list_idx);
@@ -347,6 +441,33 @@ resize_hugefile(int fd, char *path, int list_idx, int seg_idx,
uint64_t fa_offset, uint64_t page_sz, bool grow)
{
bool again = false;
+
+ /* in-memory mode is a special case, because we don't need to perform
+ * any locking, and we can be sure that fallocate() is supported.
+ */
+ if (internal_config.in_memory) {
+ int flags = grow ? 0 : FALLOC_FL_PUNCH_HOLE |
+ FALLOC_FL_KEEP_SIZE;
+ int ret;
+
+ /* grow or shrink the file */
+ ret = fallocate(fd, flags, fa_offset, page_sz);
+
+ if (ret < 0) {
+ RTE_LOG(DEBUG, EAL, "%s(): fallocate() failed: %s\n",
+ __func__,
+ strerror(errno));
+ return -1;
+ }
+ /* increase/decrease total segment count */
+ fd_list[list_idx].count += (grow ? 1 : -1);
+ if (!grow && fd_list[list_idx].count == 0) {
+ close(fd_list[list_idx].memseg_list_fd);
+ fd_list[list_idx].memseg_list_fd = -1;
+ }
+ return 0;
+ }
+
do {
if (fallocate_supported == 0) {
/* we cannot deallocate memory if fallocate() is not
@@ -496,26 +617,34 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
void *new_addr;
alloc_sz = hi->hugepage_sz;
- if (!internal_config.single_file_segments &&
- internal_config.in_memory &&
- anonymous_hugepages_supported) {
- int log2, flags;
-
- log2 = rte_log2_u32(alloc_sz);
- /* as per mmap() manpage, all page sizes are log2 of page size
- * shifted by MAP_HUGE_SHIFT
- */
- flags = (log2 << RTE_MAP_HUGE_SHIFT) | MAP_HUGETLB | MAP_FIXED |
+
+ /* these are checked at init, but code analyzers don't know that */
+ if (internal_config.in_memory && !anonymous_hugepages_supported) {
+ RTE_LOG(ERR, EAL, "Anonymous hugepages not supported, in-memory mode cannot allocate memory\n");
+ return -1;
+ }
+ if (internal_config.in_memory && !memfd_create_supported &&
+ internal_config.single_file_segments) {
+ RTE_LOG(ERR, EAL, "Single-file segments are not supported without memfd support\n");
+ return -1;
+ }
+
+ /* in-memory without memfd is a special case */
+ int mmap_flags;
+
+ if (internal_config.in_memory && !memfd_create_supported) {
+ int pagesz_flag, flags;
+
+ pagesz_flag = pagesz_flags(alloc_sz);
+ flags = pagesz_flag | MAP_HUGETLB | MAP_FIXED |
MAP_PRIVATE | MAP_ANONYMOUS;
fd = -1;
- va = mmap(addr, alloc_sz, PROT_READ | PROT_WRITE, flags, -1, 0);
+ mmap_flags = flags;
- /* single-file segments codepath will never be active because
- * in-memory mode is incompatible with it and it's stopped at
- * EAL initialization stage, however the compiler doesn't know
- * that and complains about map_offset being used uninitialized
- * on failure codepaths while having in-memory mode enabled. so,
- * assign a value here.
+ /* single-file segments codepath will never be active
+ * here because in-memory mode is incompatible with the
+ * fallback path, and it's stopped at EAL initialization
+ * stage.
*/
map_offset = 0;
} else {
@@ -539,7 +668,8 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
__func__, strerror(errno));
goto resized;
}
- if (internal_config.hugepage_unlink) {
+ if (internal_config.hugepage_unlink &&
+ !internal_config.in_memory) {
if (unlink(path)) {
RTE_LOG(DEBUG, EAL, "%s(): unlink() failed: %s\n",
__func__, strerror(errno));
@@ -547,16 +677,16 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
}
}
}
-
- /*
- * map the segment, and populate page tables, the kernel fills
- * this segment with zeros if it's a new page.
- */
- va = mmap(addr, alloc_sz, PROT_READ | PROT_WRITE,
- MAP_SHARED | MAP_POPULATE | MAP_FIXED, fd,
- map_offset);
+ mmap_flags = MAP_SHARED | MAP_POPULATE | MAP_FIXED;
}
+ /*
+ * map the segment, and populate page tables, the kernel fills
+ * this segment with zeros if it's a new page.
+ */
+ va = mmap(addr, alloc_sz, PROT_READ | PROT_WRITE, mmap_flags, fd,
+ map_offset);
+
if (va == MAP_FAILED) {
RTE_LOG(DEBUG, EAL, "%s(): mmap() failed: %s\n", __func__,
strerror(errno));
@@ -663,7 +793,8 @@ free_seg(struct rte_memseg *ms, struct hugepage_info *hi,
{
uint64_t map_offset;
char path[PATH_MAX];
- int fd, ret;
+ int fd, ret = 0;
+ bool exit_early;
/* erase page data */
memset(ms->addr, 0, ms->len);
@@ -675,8 +806,17 @@ free_seg(struct rte_memseg *ms, struct hugepage_info *hi,
return -1;
}
+ exit_early = false;
+
+ /* if we're using anonymous hugepages, nothing to be done */
+ if (internal_config.in_memory && !memfd_create_supported)
+ exit_early = true;
+
/* if we've already unlinked the page, nothing needs to be done */
- if (internal_config.hugepage_unlink) {
+ if (!internal_config.in_memory && internal_config.hugepage_unlink)
+ exit_early = true;
+
+ if (exit_early) {
memset(ms, 0, sizeof(*ms));
return 0;
}
@@ -699,11 +839,13 @@ free_seg(struct rte_memseg *ms, struct hugepage_info *hi,
/* if we're able to take out a write lock, we're the last one
* holding onto this page.
*/
- ret = lock(fd, LOCK_EX);
- if (ret >= 0) {
- /* no one else is using this page */
- if (ret == 1)
- unlink(path);
+ if (!internal_config.in_memory) {
+ ret = lock(fd, LOCK_EX);
+ if (ret >= 0) {
+ /* no one else is using this page */
+ if (ret == 1)
+ unlink(path);
+ }
}
/* closing fd will drop the lock */
close(fd);
@@ -1406,6 +1548,35 @@ eal_memalloc_get_seg_fd(int list_idx, int seg_idx)
return fd;
}
+static int
+test_memfd_create(void)
+{
+#ifdef MEMFD_SUPPORTED
+ unsigned int i;
+ for (i = 0; i < internal_config.num_hugepage_sizes; i++) {
+ uint64_t pagesz = internal_config.hugepage_info[i].hugepage_sz;
+ int pagesz_flag = pagesz_flags(pagesz);
+ int flags;
+
+ flags = pagesz_flag | MFD_HUGETLB;
+ int fd = memfd_create("test", flags);
+ if (fd < 0) {
+ /* we failed - let memalloc know this isn't working */
+ if (errno == EINVAL) {
+ memfd_create_supported = 0;
+ return 0; /* not supported */
+ }
+
+ /* we got other error - something's wrong */
+ return -1; /* error */
+ }
+ close(fd);
+ return 1; /* supported */
+ }
+#endif
+ return 0; /* not supported */
+}
+
int
eal_memalloc_get_seg_fd_offset(int list_idx, int seg_idx, size_t *offset)
{
@@ -1433,6 +1604,34 @@ eal_memalloc_init(void)
if (rte_eal_process_type() == RTE_PROC_SECONDARY)
if (rte_memseg_list_walk(secondary_msl_create_walk, NULL) < 0)
return -1;
+ if (rte_eal_process_type() == RTE_PROC_PRIMARY &&
+ internal_config.in_memory) {
+ int mfd_res = test_memfd_create();
+
+ if (mfd_res < 0) {
+ RTE_LOG(ERR, EAL, "Unable to check if memfd is supported\n");
+ return -1;
+ }
+ if (mfd_res == 1)
+ RTE_LOG(DEBUG, EAL, "Using memfd for anonymous memory\n");
+ else
+ RTE_LOG(INFO, EAL, "Using memfd is not supported, falling back to anonymous hugepages\n");
+
+ /* we only support single-file segments mode with in-memory mode
+ * if we support hugetlbfs with memfd_create. this code will
+ * test if we do.
+ */
+ if (internal_config.single_file_segments &&
+ mfd_res != 1) {
+ RTE_LOG(ERR, EAL, "Single-file segments mode cannot be used without memfd support\n");
+ return -1;
+ }
+ /* this cannot ever happen but better safe than sorry */
+ if (!anonymous_hugepages_supported) {
+ RTE_LOG(ERR, EAL, "Using anonymous memory is not supported\n");
+ return -1;
+ }
+ }
/* initialize all of the fd lists */
if (rte_memseg_list_walk(fd_list_create_walk, NULL))
--
2.17.1
^ permalink raw reply [flat|nested] 45+ messages in thread