* [dpdk-dev] [PATCH 19.02 0/2] Allow using virtio without hugepages
@ 2018-11-13 17:54 Anatoly Burakov
2018-11-13 17:54 ` [dpdk-dev] [PATCH 19.02 1/2] memalloc: allow setting up segment list fd's Anatoly Burakov
` (7 more replies)
0 siblings, 8 replies; 27+ messages in thread
From: Anatoly Burakov @ 2018-11-13 17:54 UTC
To: dev
Cc: przemyslawx.lal, kuralamudhan.ramakrishnan, ivan.coughlan,
tiwei.bie, ray.kinsella
It is already possible to use both DPDK in general, and
virtio specifically, without hugetlbfs mounts, but
currently virtio cannot be used without hugepage memory
(i.e. with the --no-huge EAL switch) because it needs
to share memory with the backend.
This patchset uses memfd to create actual files backing
anonymous memory. This enables virtio to work not only
without hugetlbfs, but without hugepages altogether,
which could be useful in Cloud Native scenarios.
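A minimal sketch of the approach described above, assuming Linux with glibc 2.27 or later (for the `memfd_create()` wrapper): create a memfd, size it with `ftruncate()`, and map it `MAP_SHARED`, falling back to a private anonymous mapping if either step fails. The helper name `map_shareable()` is hypothetical and not part of DPDK; in the patch itself this logic sits inline in `eal_legacy_hugepage_init()`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a DPDK API): map `len` bytes backed by a memfd so
 * the fd can later be passed to another process. If the memfd cannot be
 * created or resized, fall back to a private anonymous mapping and report
 * *out_fd = -1. Returns NULL if mmap() itself fails.
 */
static void *
map_shareable(size_t len, int *out_fd)
{
	int fd = -1;
	int flags = MAP_PRIVATE | MAP_ANONYMOUS;
	int memfd;

	assert(out_fd != NULL);

	memfd = memfd_create("nohuge", 0);
	if (memfd < 0) {
		fprintf(stderr, "Cannot create memfd: %s\n", strerror(errno));
	} else if (ftruncate(memfd, (off_t)len) < 0) {
		fprintf(stderr, "Cannot resize memfd: %s\n", strerror(errno));
		close(memfd);
	} else {
		/* writes must be visible to other mappers, so map shared */
		fd = memfd;
		flags = MAP_SHARED;
	}

	void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, fd, 0);
	if (addr == MAP_FAILED) {
		if (fd >= 0)
			close(fd);
		*out_fd = -1;
		return NULL;
	}
	*out_fd = fd;
	return addr;
}
```

Keeping the anonymous fallback means a missing memfd only costs the ability to share memory, not the ability to start, which matches the "Falling back to anonymous map" behavior in the patch.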
Anatoly Burakov (2):
memalloc: allow setting up segment list fd's
mem: use memfd for no-huge mode
lib/librte_eal/bsdapp/eal/eal_memalloc.c | 6 +++
lib/librte_eal/common/eal_memalloc.h | 4 ++
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 16 ++++++++
lib/librte_eal/linuxapp/eal/eal_memory.c | 46 +++++++++++++++++++++-
4 files changed, 70 insertions(+), 2 deletions(-)
--
2.17.1
* [dpdk-dev] [PATCH 19.02 1/2] memalloc: allow setting up segment list fd's
2018-11-13 17:54 [dpdk-dev] [PATCH 19.02 0/2] Allow using virtio without hugepages Anatoly Burakov
@ 2018-11-13 17:54 ` Anatoly Burakov
2018-11-13 17:54 ` [dpdk-dev] [PATCH 19.02 2/2] mem: use memfd for no-huge mode Anatoly Burakov
` (6 subsequent siblings)
7 siblings, 0 replies; 27+ messages in thread
From: Anatoly Burakov @ 2018-11-13 17:54 UTC
To: dev
Cc: Bruce Richardson, przemyslawx.lal, kuralamudhan.ramakrishnan,
ivan.coughlan, tiwei.bie, ray.kinsella
Currently, only per-segment fd's (multi-file segments mode) can be
set through the internal API, while memfd-backed no-huge memory needs
single-file segments mode. Add support for single-file segments in the internal API.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
lib/librte_eal/bsdapp/eal/eal_memalloc.c | 6 ++++++
lib/librte_eal/common/eal_memalloc.h | 4 ++++
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 16 ++++++++++++++++
3 files changed, 26 insertions(+)
diff --git a/lib/librte_eal/bsdapp/eal/eal_memalloc.c b/lib/librte_eal/bsdapp/eal/eal_memalloc.c
index a5847f0bd..6893448db 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memalloc.c
@@ -61,6 +61,12 @@ eal_memalloc_set_seg_fd(int list_idx __rte_unused, int seg_idx __rte_unused,
return -ENOTSUP;
}
+int
+eal_memalloc_set_seg_list_fd(int list_idx __rte_unused, int fd __rte_unused)
+{
+ return -ENOTSUP;
+}
+
int
eal_memalloc_get_seg_fd_offset(int list_idx __rte_unused,
int seg_idx __rte_unused, size_t *offset __rte_unused)
diff --git a/lib/librte_eal/common/eal_memalloc.h b/lib/librte_eal/common/eal_memalloc.h
index af917c2f9..b96c9c512 100644
--- a/lib/librte_eal/common/eal_memalloc.h
+++ b/lib/librte_eal/common/eal_memalloc.h
@@ -84,6 +84,10 @@ eal_memalloc_get_seg_fd(int list_idx, int seg_idx);
int
eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd);
+/* returns 0 or -errno */
+int
+eal_memalloc_set_seg_list_fd(int list_idx, int fd);
+
int
eal_memalloc_get_seg_fd_offset(int list_idx, int seg_idx, size_t *offset);
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 48b9c7360..5bda92717 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1529,6 +1529,10 @@ eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd)
{
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ /* single file segments mode doesn't support individual segment fd's */
+ if (internal_config.single_file_segments)
+ return -ENOTSUP;
+
/* if list is not allocated, allocate it */
if (fd_list[list_idx].len == 0) {
int len = mcfg->memsegs[list_idx].memseg_arr.len;
@@ -1541,6 +1545,18 @@ eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd)
return 0;
}
+int
+eal_memalloc_set_seg_list_fd(int list_idx, int fd)
+{
+ /* non-single file segment mode doesn't support segment list fd's */
+ if (!internal_config.single_file_segments)
+ return -ENOTSUP;
+
+ fd_list[list_idx].memseg_list_fd = fd;
+
+ return 0;
+}
+
int
eal_memalloc_get_seg_fd(int list_idx, int seg_idx)
{
--
2.17.1
* [dpdk-dev] [PATCH 19.02 2/2] mem: use memfd for no-huge mode
2018-11-13 17:54 [dpdk-dev] [PATCH 19.02 0/2] Allow using virtio without hugepages Anatoly Burakov
2018-11-13 17:54 ` [dpdk-dev] [PATCH 19.02 1/2] memalloc: allow setting up segment list fd's Anatoly Burakov
@ 2018-11-13 17:54 ` Anatoly Burakov
2018-11-28 4:57 ` Tiwei Bie
2018-12-11 16:43 ` [dpdk-dev] [PATCH v2 0/5] Allow using virtio without hugepages Anatoly Burakov
` (5 subsequent siblings)
7 siblings, 1 reply; 27+ messages in thread
From: Anatoly Burakov @ 2018-11-13 17:54 UTC
To: dev
Cc: przemyslawx.lal, kuralamudhan.ramakrishnan, ivan.coughlan,
tiwei.bie, ray.kinsella
When running in no-huge mode, we anonymously allocate our memory.
While this works for regular NICs and vdev's, it's not suitable
for memory sharing scenarios such as virtio with a vhost_user
backend.
To fix this, allocate no-huge memory using memfd, and register
it with memalloc just like any other memseg fd. This enables
use of the rte_memseg_get_fd() API with the --no-huge EAL flag.
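Why the memfd has to be mapped `MAP_SHARED` can be shown with a small self-contained experiment, assuming Linux with glibc 2.27 or later. A second `MAP_SHARED` mapping of the same memfd (standing in for the vhost_user backend that receives the fd) sees writes made through a `MAP_SHARED` mapping, but not through a `MAP_PRIVATE` one, whose written pages are copy-on-write. The function name is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical demo: returns 1 if a write through a mapping created with
 * `frontend_flags` (MAP_SHARED or MAP_PRIVATE) is visible through a second,
 * independent MAP_SHARED mapping of the same memfd, 0 if it is not, and -1
 * on error.
 */
static int
write_visible_to_backend(int frontend_flags)
{
	const size_t len = 4096;
	char *front = MAP_FAILED, *back = MAP_FAILED;
	int ret = -1;
	int fd;

	assert(frontend_flags == MAP_SHARED || frontend_flags == MAP_PRIVATE);

	fd = memfd_create("demo", 0);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)len) < 0)
		goto out;

	front = mmap(NULL, len, PROT_READ | PROT_WRITE, frontend_flags, fd, 0);
	back = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (front == MAP_FAILED || back == MAP_FAILED)
		goto out;

	strcpy(front, "hello");           /* "frontend" (DPDK side) writes */
	ret = strcmp(back, "hello") == 0; /* "backend" reads the shared file */
out:
	if (front != MAP_FAILED)
		munmap(front, len);
	if (back != MAP_FAILED)
		munmap(back, len);
	close(fd);
	return ret;
}
```

With `MAP_SHARED` the call returns 1; with `MAP_PRIVATE` the write lands in a private copy, the backend mapping still reads zeroes, and the call returns 0.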
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
lib/librte_eal/linuxapp/eal/eal_memory.c | 46 ++++++++++++++++++++++--
1 file changed, 44 insertions(+), 2 deletions(-)
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 48b23ce19..8feac2c56 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -25,6 +25,7 @@
#include <sys/time.h>
#include <signal.h>
#include <setjmp.h>
+#include <linux/memfd.h>
#ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES
#include <numa.h>
#include <numaif.h>
@@ -1345,12 +1346,15 @@ eal_legacy_hugepage_init(void)
/* hugetlbfs can be disabled */
if (internal_config.no_hugetlbfs) {
struct rte_memseg_list *msl;
+ int n_segs, cur_seg, fd, memfd, flags;
uint64_t page_sz;
- int n_segs, cur_seg;
/* nohuge mode is legacy mode */
internal_config.legacy_mem = 1;
+ /* nohuge mode is single-file segments mode */
+ internal_config.single_file_segments = 1;
+
/* create a memseg list */
msl = &mcfg->memsegs[0];
@@ -1363,8 +1367,36 @@ eal_legacy_hugepage_init(void)
return -1;
}
+ /* set up parameters for anonymous mmap */
+ fd = -1;
+ flags = MAP_PRIVATE | MAP_ANONYMOUS;
+
+ /* create a memfd and store it in the segment fd table */
+ memfd = memfd_create("nohuge", 0);
+ if (memfd < 0) {
+ RTE_LOG(ERR, EAL, "Cannot create memfd: %s\n",
+ strerror(errno));
+ RTE_LOG(ERR, EAL, "Falling back to anonymous map\n");
+ } else {
+ /* we got an fd - now resize it */
+ if (ftruncate(memfd, internal_config.memory) < 0) {
+ RTE_LOG(ERR, EAL, "Cannot resize memfd: %s\n",
+ strerror(errno));
+ RTE_LOG(ERR, EAL, "Falling back to anonymous map\n");
+ close(memfd);
+ } else {
+ /* creating memfd-backed file was successful.
+ * we want changes to memfd to be visible to
+ * other processes (such as vhost backend), so
+ * map it as shared memory.
+ */
+ RTE_LOG(DEBUG, EAL, "Using memfd for anonymous memory\n");
+ fd = memfd;
+ flags = MAP_SHARED;
+ }
+ }
addr = mmap(NULL, internal_config.memory, PROT_READ | PROT_WRITE,
- MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+ flags, fd, 0);
if (addr == MAP_FAILED) {
RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n", __func__,
strerror(errno));
@@ -1375,6 +1407,16 @@ eal_legacy_hugepage_init(void)
msl->socket_id = 0;
msl->len = internal_config.memory;
+ /* we're in single-file segments mode, so only the segment list
+ * fd needs to be set up.
+ */
+ if (fd != -1) {
+ if (eal_memalloc_set_seg_list_fd(0, fd) < 0) {
+ RTE_LOG(ERR, EAL, "Cannot set up segment list fd\n");
+ /* not a serious error, proceed */
+ }
+ }
+
/* populate memsegs. each memseg is one page long */
for (cur_seg = 0; cur_seg < n_segs; cur_seg++) {
arr = &msl->memseg_arr;
--
2.17.1
* Re: [dpdk-dev] [PATCH 19.02 2/2] mem: use memfd for no-huge mode
2018-11-13 17:54 ` [dpdk-dev] [PATCH 19.02 2/2] mem: use memfd for no-huge mode Anatoly Burakov
@ 2018-11-28 4:57 ` Tiwei Bie
2018-11-28 9:11 ` Burakov, Anatoly
0 siblings, 1 reply; 27+ messages in thread
From: Tiwei Bie @ 2018-11-28 4:57 UTC
To: Anatoly Burakov
Cc: dev, przemyslawx.lal, kuralamudhan.ramakrishnan, ivan.coughlan,
ray.kinsella
On Tue, Nov 13, 2018 at 05:54:48PM +0000, Anatoly Burakov wrote:
> When running in no-huge mode, we anonymously allocate our memory.
> While this works for regular NICs and vdev's, it's not suitable
> for memory sharing scenarios such as virtio with vhost_user
> backend.
>
> To fix this, allocate no-huge memory using memfd, and register
> it with memalloc just like any other memseg fd. This will enable
> using rte_memseg_get_fd() API with --no-huge EAL flag.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
> lib/librte_eal/linuxapp/eal/eal_memory.c | 46 ++++++++++++++++++++++--
> 1 file changed, 44 insertions(+), 2 deletions(-)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index 48b23ce19..8feac2c56 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -25,6 +25,7 @@
> #include <sys/time.h>
> #include <signal.h>
> #include <setjmp.h>
> +#include <linux/memfd.h>
> #ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES
> #include <numa.h>
> #include <numaif.h>
> @@ -1345,12 +1346,15 @@ eal_legacy_hugepage_init(void)
> /* hugetlbfs can be disabled */
> if (internal_config.no_hugetlbfs) {
> struct rte_memseg_list *msl;
> + int n_segs, cur_seg, fd, memfd, flags;
> uint64_t page_sz;
> - int n_segs, cur_seg;
>
> /* nohuge mode is legacy mode */
> internal_config.legacy_mem = 1;
>
> + /* nohuge mode is single-file segments mode */
> + internal_config.single_file_segments = 1;
> +
> /* create a memseg list */
> msl = &mcfg->memsegs[0];
>
> @@ -1363,8 +1367,36 @@ eal_legacy_hugepage_init(void)
> return -1;
> }
>
> + /* set up parameters for anonymous mmap */
> + fd = -1;
> + flags = MAP_PRIVATE | MAP_ANONYMOUS;
> +
> + /* create a memfd and store it in the segment fd table */
> + memfd = memfd_create("nohuge", 0);
> + if (memfd < 0) {
> + RTE_LOG(ERR, EAL, "Cannot create memfd: %s\n",
> + strerror(errno));
> + RTE_LOG(ERR, EAL, "Falling back to anonymous map\n");
> + } else {
> + /* we got an fd - now resize it */
> + if (ftruncate(memfd, internal_config.memory) < 0) {
> + RTE_LOG(ERR, EAL, "Cannot resize memfd: %s\n",
> + strerror(errno));
> + RTE_LOG(ERR, EAL, "Falling back to anonymous map\n");
> + close(memfd);
> + } else {
> + /* creating memfd-backed file was successful.
> + * we want changes to memfd to be visible to
> + * other processes (such as vhost backend), so
> + * map it as shared memory.
> + */
> + RTE_LOG(DEBUG, EAL, "Using memfd for anonymous memory\n");
> + fd = memfd;
> + flags = MAP_SHARED;
> + }
> + }
> addr = mmap(NULL, internal_config.memory, PROT_READ | PROT_WRITE,
> - MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> + flags, fd, 0);
> if (addr == MAP_FAILED) {
> RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n", __func__,
> strerror(errno));
> @@ -1375,6 +1407,16 @@ eal_legacy_hugepage_init(void)
> msl->socket_id = 0;
> msl->len = internal_config.memory;
>
> + /* we're in single-file segments mode, so only the segment list
> + * fd needs to be set up.
> + */
> + if (fd != -1) {
> + if (eal_memalloc_set_seg_list_fd(0, fd) < 0) {
> + RTE_LOG(ERR, EAL, "Cannot set up segment list fd\n");
> + /* not a serious error, proceed */
> + }
> + }
Hi Anatoly,
Thanks for the work!
It seems that support for getting the fd offset is missing in no-huge
mode. I got the error below in virtio-user while trying this series
with --no-huge:
update_memory_region(): Failed to get offset, ms=0x10002e000 rte_errno=19
Thanks
> +
> /* populate memsegs. each memseg is one page long */
> for (cur_seg = 0; cur_seg < n_segs; cur_seg++) {
> arr = &msl->memseg_arr;
> --
> 2.17.1
* Re: [dpdk-dev] [PATCH 19.02 2/2] mem: use memfd for no-huge mode
2018-11-28 4:57 ` Tiwei Bie
@ 2018-11-28 9:11 ` Burakov, Anatoly
0 siblings, 0 replies; 27+ messages in thread
From: Burakov, Anatoly @ 2018-11-28 9:11 UTC
To: Tiwei Bie
Cc: dev, przemyslawx.lal, kuralamudhan.ramakrishnan, ivan.coughlan,
ray.kinsella
On 28-Nov-18 4:57 AM, Tiwei Bie wrote:
> On Tue, Nov 13, 2018 at 05:54:48PM +0000, Anatoly Burakov wrote:
>> When running in no-huge mode, we anonymously allocate our memory.
>> While this works for regular NICs and vdev's, it's not suitable
>> for memory sharing scenarios such as virtio with vhost_user
>> backend.
>>
>> To fix this, allocate no-huge memory using memfd, and register
>> it with memalloc just like any other memseg fd. This will enable
>> using rte_memseg_get_fd() API with --no-huge EAL flag.
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>> ---
>> lib/librte_eal/linuxapp/eal/eal_memory.c | 46 ++++++++++++++++++++++--
>> 1 file changed, 44 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
>> index 48b23ce19..8feac2c56 100644
>> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
>> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
>> @@ -25,6 +25,7 @@
>> #include <sys/time.h>
>> #include <signal.h>
>> #include <setjmp.h>
>> +#include <linux/memfd.h>
>> #ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES
>> #include <numa.h>
>> #include <numaif.h>
>> @@ -1345,12 +1346,15 @@ eal_legacy_hugepage_init(void)
>> /* hugetlbfs can be disabled */
>> if (internal_config.no_hugetlbfs) {
>> struct rte_memseg_list *msl;
>> + int n_segs, cur_seg, fd, memfd, flags;
>> uint64_t page_sz;
>> - int n_segs, cur_seg;
>>
>> /* nohuge mode is legacy mode */
>> internal_config.legacy_mem = 1;
>>
>> + /* nohuge mode is single-file segments mode */
>> + internal_config.single_file_segments = 1;
>> +
>> /* create a memseg list */
>> msl = &mcfg->memsegs[0];
>>
>> @@ -1363,8 +1367,36 @@ eal_legacy_hugepage_init(void)
>> return -1;
>> }
>>
>> + /* set up parameters for anonymous mmap */
>> + fd = -1;
>> + flags = MAP_PRIVATE | MAP_ANONYMOUS;
>> +
>> + /* create a memfd and store it in the segment fd table */
>> + memfd = memfd_create("nohuge", 0);
>> + if (memfd < 0) {
>> + RTE_LOG(ERR, EAL, "Cannot create memfd: %s\n",
>> + strerror(errno));
>> + RTE_LOG(ERR, EAL, "Falling back to anonymous map\n");
>> + } else {
>> + /* we got an fd - now resize it */
>> + if (ftruncate(memfd, internal_config.memory) < 0) {
>> + RTE_LOG(ERR, EAL, "Cannot resize memfd: %s\n",
>> + strerror(errno));
>> + RTE_LOG(ERR, EAL, "Falling back to anonymous map\n");
>> + close(memfd);
>> + } else {
>> + /* creating memfd-backed file was successful.
>> + * we want changes to memfd to be visible to
>> + * other processes (such as vhost backend), so
>> + * map it as shared memory.
>> + */
>> + RTE_LOG(DEBUG, EAL, "Using memfd for anonymous memory\n");
>> + fd = memfd;
>> + flags = MAP_SHARED;
>> + }
>> + }
>> addr = mmap(NULL, internal_config.memory, PROT_READ | PROT_WRITE,
>> - MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>> + flags, fd, 0);
>> if (addr == MAP_FAILED) {
>> RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n", __func__,
>> strerror(errno));
>> @@ -1375,6 +1407,16 @@ eal_legacy_hugepage_init(void)
>> msl->socket_id = 0;
>> msl->len = internal_config.memory;
>>
>> + /* we're in single-file segments mode, so only the segment list
>> + * fd needs to be set up.
>> + */
>> + if (fd != -1) {
>> + if (eal_memalloc_set_seg_list_fd(0, fd) < 0) {
>> + RTE_LOG(ERR, EAL, "Cannot set up segment list fd\n");
>> + /* not a serious error, proceed */
>> + }
>> + }
>
> Hi Anatoly,
>
> Thanks for the work!
>
> It seems that support for getting the fd offset is missing in no-huge
> mode. I got the error below in virtio-user while trying this series
> with --no-huge:
>
> update_memory_region(): Failed to get offset, ms=0x10002e000 rte_errno=19
That's weird, it should have been working. I'll look into it, thanks!
>
> Thanks
>
>> +
>> /* populate memsegs. each memseg is one page long */
>> for (cur_seg = 0; cur_seg < n_segs; cur_seg++) {
>> arr = &msl->memseg_arr;
>> --
>> 2.17.1
>
--
Thanks,
Anatoly
* [dpdk-dev] [PATCH v2 0/5] Allow using virtio without hugepages
2018-11-13 17:54 [dpdk-dev] [PATCH 19.02 0/2] Allow using virtio without hugepages Anatoly Burakov
2018-11-13 17:54 ` [dpdk-dev] [PATCH 19.02 1/2] memalloc: allow setting up segment list fd's Anatoly Burakov
2018-11-13 17:54 ` [dpdk-dev] [PATCH 19.02 2/2] mem: use memfd for no-huge mode Anatoly Burakov
@ 2018-12-11 16:43 ` Anatoly Burakov
2018-12-13 4:53 ` Tiwei Bie
` (6 more replies)
2018-12-11 16:43 ` [dpdk-dev] [PATCH v2 1/5] mem: fix error code for segment fd API for external segs Anatoly Burakov
` (4 subsequent siblings)
7 siblings, 7 replies; 27+ messages in thread
From: Anatoly Burakov @ 2018-12-11 16:43 UTC
To: dev
Cc: przemyslawx.lal, kuralamudhan.ramakrishnan, ivan.coughlan,
tiwei.bie, ray.kinsella, maxime.coquelin
It is already possible to use both DPDK in general, and
virtio specifically, without hugetlbfs mounts, but
currently virtio cannot be used without hugepage memory
(i.e. with the --no-huge EAL switch) because it needs
to share memory with the backend.
This patchset uses memfd to create actual files backing
anonymous memory. This enables virtio to work not only
without hugetlbfs, but without hugepages altogether,
which could be useful in Cloud Native scenarios.
v2:
- Fixed segment fd list not being initialized
- Added some segment fd API fixes
- Added unit test for segment fd API
Anatoly Burakov (5):
mem: fix error code for segment fd API for external segs
memalloc: check for memfd support in segment fd API
memalloc: allow setting up segment list fd's
mem: use memfd for no-huge mode
test: add segment fd API test
doc/guides/rel_notes/release_19_02.rst | 13 +++++
lib/librte_eal/bsdapp/eal/eal_memalloc.c | 6 ++
lib/librte_eal/common/eal_common_memory.c | 12 ++++
lib/librte_eal/common/eal_memalloc.h | 4 ++
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 66 +++++++++++++++++++---
lib/librte_eal/linuxapp/eal/eal_memory.c | 54 +++++++++++++++++-
test/test/test_memory.c | 43 ++++++++++++++
7 files changed, 188 insertions(+), 10 deletions(-)
--
2.17.1
* [dpdk-dev] [PATCH v2 1/5] mem: fix error code for segment fd API for external segs
2018-11-13 17:54 [dpdk-dev] [PATCH 19.02 0/2] Allow using virtio without hugepages Anatoly Burakov
` (2 preceding siblings ...)
2018-12-11 16:43 ` [dpdk-dev] [PATCH v2 0/5] Allow using virtio without hugepages Anatoly Burakov
@ 2018-12-11 16:43 ` Anatoly Burakov
2018-12-11 16:43 ` [dpdk-dev] [PATCH v2 2/5] memalloc: check for memfd support in segment fd API Anatoly Burakov
` (3 subsequent siblings)
7 siblings, 0 replies; 27+ messages in thread
From: Anatoly Burakov @ 2018-12-11 16:43 UTC
To: dev
Cc: John McNamara, Marko Kovacevic, przemyslawx.lal,
kuralamudhan.ramakrishnan, ivan.coughlan, tiwei.bie,
ray.kinsella, maxime.coquelin, stable
Segment fd API does not support getting segment fd's from
externally allocated memory, so return a proper error code
(ENOTSUP) on any attempt to do so. This changes API behavior,
so document the change as well.
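The caller-visible contract can be sketched as follows. The struct and the `fake_rte_errno` variable are hypothetical stand-ins (real code uses `struct rte_memseg_list` and `rte_errno`), and returning `ENOENT` for a missing fd is an assumption of this sketch. The point is the ordering: the external-segment check runs before any lookup in the internal fd table, so callers see `ENOTSUP` rather than an unrelated error.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a memseg list. */
struct memseg_list {
	bool external; /* memory registered by the app, not allocated by EAL */
	int list_fd;   /* -1 if no fd is known */
};

static int fake_rte_errno; /* stands in for rte_errno */

/* Returns an fd, or -1 with fake_rte_errno set. */
static int
memseg_get_fd(const struct memseg_list *msl)
{
	if (msl == NULL) {
		fake_rte_errno = EINVAL;
		return -1;
	}
	/* segment fd API is not supported for external segments */
	if (msl->external) {
		fake_rte_errno = ENOTSUP;
		return -1;
	}
	if (msl->list_fd < 0) {
		fake_rte_errno = ENOENT; /* assumed error code for "no fd" */
		return -1;
	}
	return msl->list_fd;
}
```

A caller that gets `-1` with `ENOTSUP` can treat it as "this memory cannot be shared by fd" and pick another path, instead of treating it as a hard failure.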
Fixes: 5282bb1c3695 ("mem: allow memseg lists to be marked as external")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
Notes:
    The API is experimental, so no deprecation notice is needed.
doc/guides/rel_notes/release_19_02.rst | 6 ++++++
lib/librte_eal/common/eal_common_memory.c | 12 ++++++++++++
2 files changed, 18 insertions(+)
diff --git a/doc/guides/rel_notes/release_19_02.rst b/doc/guides/rel_notes/release_19_02.rst
index a94fa86a7..ade41b9c8 100644
--- a/doc/guides/rel_notes/release_19_02.rst
+++ b/doc/guides/rel_notes/release_19_02.rst
@@ -84,6 +84,12 @@ API Changes
=========================================================
+* eal: segment fd API on Linux now sets error code to ``ENOTSUP`` in more cases
+ where segment fd API is not expected to be supported:
+
+ - On attempt to get segment fd for an externally allocated memory segment
+
+
ABI Changes
-----------
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index d47ea4938..999ba24b4 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -704,6 +704,12 @@ rte_memseg_get_fd_thread_unsafe(const struct rte_memseg *ms)
return -1;
}
+ /* segment fd API is not supported for external segments */
+ if (msl->external) {
+ rte_errno = ENOTSUP;
+ return -1;
+ }
+
ret = eal_memalloc_get_seg_fd(msl_idx, seg_idx);
if (ret < 0) {
rte_errno = -ret;
@@ -754,6 +760,12 @@ rte_memseg_get_fd_offset_thread_unsafe(const struct rte_memseg *ms,
return -1;
}
+ /* segment fd API is not supported for external segments */
+ if (msl->external) {
+ rte_errno = ENOTSUP;
+ return -1;
+ }
+
ret = eal_memalloc_get_seg_fd_offset(msl_idx, seg_idx, offset);
if (ret < 0) {
rte_errno = -ret;
--
2.17.1
* [dpdk-dev] [PATCH v2 2/5] memalloc: check for memfd support in segment fd API
2018-11-13 17:54 [dpdk-dev] [PATCH 19.02 0/2] Allow using virtio without hugepages Anatoly Burakov
` (3 preceding siblings ...)
2018-12-11 16:43 ` [dpdk-dev] [PATCH v2 1/5] mem: fix error code for segment fd API for external segs Anatoly Burakov
@ 2018-12-11 16:43 ` Anatoly Burakov
2018-12-11 16:43 ` [dpdk-dev] [PATCH v2 3/5] memalloc: allow setting up segment list fd's Anatoly Burakov
` (2 subsequent siblings)
7 siblings, 0 replies; 27+ messages in thread
From: Anatoly Burakov @ 2018-12-11 16:43 UTC
To: dev
Cc: John McNamara, Marko Kovacevic, przemyslawx.lal,
kuralamudhan.ramakrishnan, ivan.coughlan, tiwei.bie,
ray.kinsella, maxime.coquelin, stable
If memfd support was not compiled in, or hugepage memfd support
is not available at runtime, the API will now return a proper
error code (ENOTSUP) indicating that it is unsupported. This
changes API behavior, so document the changes.
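A hedged sketch of the two-level check described above, assuming Linux and glibc 2.27 or later. Compile-time detection uses `F_ADD_SEALS` as a proxy for memfd (as the patch does), and a runtime probe tries `memfd_create()` with `MFD_HUGETLB`. The helper names and the `SKETCH_MEMFD_SUPPORTED` macro are invented for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Compile time: if file sealing is supported, assume memfd is too. */
#ifdef F_ADD_SEALS
#define SKETCH_MEMFD_SUPPORTED 1
#else
#define SKETCH_MEMFD_SUPPORTED 0
#endif

/* Older headers may lack MFD_HUGETLB; the patch falls back to 4U as well. */
#ifndef MFD_HUGETLB
#define MFD_HUGETLB 4U
#endif

/* Runtime: can this kernel create a hugetlbfs-backed memfd? */
static int
probe_memfd_hugetlb(void)
{
#if SKETCH_MEMFD_SUPPORTED
	int fd = memfd_create("probe", MFD_HUGETLB);

	if (fd < 0)
		return 0; /* e.g. EINVAL on kernels without MFD_HUGETLB */
	close(fd);
	return 1;
#else
	return 0;
#endif
}

/* Mirrors the shape of the patch's check; returns 0 or -ENOTSUP. */
static int
seg_fd_api_supported(int in_memory, int no_hugetlbfs)
{
	if (in_memory || no_hugetlbfs) {
		/* these modes rely on memfd to provide segment fd's */
		if (!SKETCH_MEMFD_SUPPORTED)
			return -ENOTSUP;
		/* memfd is available, but hugetlbfs memfd may not be */
		if (!no_hugetlbfs && !probe_memfd_hugetlb())
			return -ENOTSUP;
	}
	return 0;
}
```

The no-huge case skips the hugetlbfs probe because its memfd is created without `MFD_HUGETLB`; only in-memory mode with hugepages needs the runtime test.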
Fixes: 41dbdb68723b ("mem: add external API to retrieve page fd")
Fixes: 3a44687139eb ("mem: allow querying offset into segment fd")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
Notes:
    The API is experimental, so no deprecation notice is needed.
doc/guides/rel_notes/release_19_02.rst | 2 ++
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 40 +++++++++++++++++-----
2 files changed, 34 insertions(+), 8 deletions(-)
diff --git a/doc/guides/rel_notes/release_19_02.rst b/doc/guides/rel_notes/release_19_02.rst
index ade41b9c8..960098582 100644
--- a/doc/guides/rel_notes/release_19_02.rst
+++ b/doc/guides/rel_notes/release_19_02.rst
@@ -88,6 +88,8 @@ API Changes
where segment fd API is not expected to be supported:
- On attempt to get segment fd for an externally allocated memory segment
+ - In cases where memfd support would have been required to provide segment
+ fd's (such as in-memory or no-huge mode)
ABI Changes
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 784939566..a93548b8c 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -23,6 +23,10 @@
#include <sys/time.h>
#include <signal.h>
#include <setjmp.h>
+#ifdef F_ADD_SEALS /* if file sealing is supported, so is memfd */
+#include <linux/memfd.h>
+#define MEMFD_SUPPORTED
+#endif
#ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES
#include <numa.h>
#include <numaif.h>
@@ -53,8 +57,8 @@ const int anonymous_hugepages_supported =
#endif
/*
- * we don't actually care if memfd itself is supported - we only need to check
- * if memfd supports hugetlbfs, as that already implies memfd support.
+ * we've already checked memfd support at compile-time, but we also need to
+ * check if we can create hugepage files with memfd.
*
* also, this is not a constant, because while we may be *compiled* with memfd
* hugetlbfs support, we might not be *running* on a system that supports memfd
@@ -63,10 +67,11 @@ const int anonymous_hugepages_supported =
*/
static int memfd_create_supported =
#ifdef MFD_HUGETLB
-#define MEMFD_SUPPORTED
1;
+#define RTE_MFD_HUGETLB MFD_HUGETLB
#else
0;
+#define RTE_MFD_HUGETLB 4U
#endif
/*
@@ -338,12 +343,12 @@ get_seg_memfd(struct hugepage_info *hi __rte_unused,
int fd;
char segname[250]; /* as per manpage, limit is 249 bytes plus null */
+ int flags = RTE_MFD_HUGETLB | pagesz_flags(hi->hugepage_sz);
+
if (internal_config.single_file_segments) {
fd = fd_list[list_idx].memseg_list_fd;
if (fd < 0) {
- int flags = MFD_HUGETLB | pagesz_flags(hi->hugepage_sz);
-
snprintf(segname, sizeof(segname), "seg_%i", list_idx);
fd = memfd_create(segname, flags);
if (fd < 0) {
@@ -357,8 +362,6 @@ get_seg_memfd(struct hugepage_info *hi __rte_unused,
fd = fd_list[list_idx].fds[seg_idx];
if (fd < 0) {
- int flags = MFD_HUGETLB | pagesz_flags(hi->hugepage_sz);
-
snprintf(segname, sizeof(segname), "seg_%i-%i",
list_idx, seg_idx);
fd = memfd_create(segname, flags);
@@ -1542,6 +1545,17 @@ int
eal_memalloc_get_seg_fd(int list_idx, int seg_idx)
{
int fd;
+
+ if (internal_config.in_memory || internal_config.no_hugetlbfs) {
+#ifndef MEMFD_SUPPORTED
+ /* in in-memory or no-huge mode, we rely on memfd support */
+ return -ENOTSUP;
+#endif
+ /* memfd supported, but hugetlbfs memfd may not be */
+ if (!internal_config.no_hugetlbfs && !memfd_create_supported)
+ return -ENOTSUP;
+ }
+
if (internal_config.single_file_segments) {
fd = fd_list[list_idx].memseg_list_fd;
} else if (fd_list[list_idx].len == 0) {
@@ -1565,7 +1579,7 @@ test_memfd_create(void)
int pagesz_flag = pagesz_flags(pagesz);
int flags;
- flags = pagesz_flag | MFD_HUGETLB;
+ flags = pagesz_flag | RTE_MFD_HUGETLB;
int fd = memfd_create("test", flags);
if (fd < 0) {
/* we failed - let memalloc know this isn't working */
@@ -1589,6 +1603,16 @@ eal_memalloc_get_seg_fd_offset(int list_idx, int seg_idx, size_t *offset)
{
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ if (internal_config.in_memory || internal_config.no_hugetlbfs) {
+#ifndef MEMFD_SUPPORTED
+ /* in in-memory or no-huge mode, we rely on memfd support */
+ return -ENOTSUP;
+#endif
+ /* memfd supported, but hugetlbfs memfd may not be */
+ if (!internal_config.no_hugetlbfs && !memfd_create_supported)
+ return -ENOTSUP;
+ }
+
/* fd_list not initialized? */
if (fd_list[list_idx].len == 0)
return -ENODEV;
--
2.17.1
* [dpdk-dev] [PATCH v2 3/5] memalloc: allow setting up segment list fd's
2018-11-13 17:54 [dpdk-dev] [PATCH 19.02 0/2] Allow using virtio without hugepages Anatoly Burakov
` (4 preceding siblings ...)
2018-12-11 16:43 ` [dpdk-dev] [PATCH v2 2/5] memalloc: check for memfd support in segment fd API Anatoly Burakov
@ 2018-12-11 16:43 ` Anatoly Burakov
2018-12-11 16:43 ` [dpdk-dev] [PATCH v2 4/5] mem: use memfd for no-huge mode Anatoly Burakov
2018-12-11 16:43 ` [dpdk-dev] [PATCH v2 5/5] test: add segment fd API test Anatoly Burakov
7 siblings, 0 replies; 27+ messages in thread
From: Anatoly Burakov @ 2018-12-11 16:43 UTC
To: dev
Cc: Bruce Richardson, przemyslawx.lal, kuralamudhan.ramakrishnan,
ivan.coughlan, tiwei.bie, ray.kinsella, maxime.coquelin
Currently, only per-segment fd's (multi-file segments mode) can be
set through the internal API, while memfd-backed no-huge memory needs
single-file segments mode. Add support for single-file segments in the internal API.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
Notes:
v2:
- Add missing fd list allocation on setting segment
list fd
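The v2 fix can be illustrated with a simplified model. All names below are hypothetical, loosely modelled on the `fd_list` entries in the diff. If the list fd is stored without the list being allocated first, a later offset lookup still sees `len == 0` and returns `-ENODEV`, which appears consistent with the `rte_errno=19` (`ENODEV` on Linux) reported against v1. The sketch also assumes that, in single-file segments mode, segment `i` sits at offset `i * page_sz` within the list file.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for one entry of EAL's fd_list[]. */
struct fd_list_entry {
	int len;            /* 0 means "not allocated yet" */
	int *fds;           /* per-segment fds (multi-file mode) */
	int memseg_list_fd; /* single fd for the whole list */
};

static int
alloc_list(struct fd_list_entry *e, int len)
{
	e->fds = malloc(sizeof(int) * (size_t)len);
	if (e->fds == NULL)
		return -ENOMEM;
	for (int i = 0; i < len; i++)
		e->fds[i] = -1;
	e->len = len;
	e->memseg_list_fd = -1;
	return 0;
}

/* v2 behavior: allocate the list on first use, then record the list fd. */
static int
set_seg_list_fd(struct fd_list_entry *e, int n_segs, int fd)
{
	if (e->len == 0 && alloc_list(e, n_segs) < 0)
		return -ENOMEM;
	e->memseg_list_fd = fd;
	return 0;
}

/* Single-file mode (assumed layout): segment i lives at i * page_sz. */
static int
get_seg_fd_offset(const struct fd_list_entry *e, int seg_idx,
		size_t page_sz, size_t *offset)
{
	assert(offset != NULL);
	if (e->len == 0)
		return -ENODEV; /* list never allocated */
	if (e->memseg_list_fd < 0)
		return -ENOENT;
	*offset = page_sz * (size_t)seg_idx;
	return 0;
}
```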
lib/librte_eal/bsdapp/eal/eal_memalloc.c | 6 +++++
lib/librte_eal/common/eal_memalloc.h | 4 ++++
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 26 ++++++++++++++++++++++
3 files changed, 36 insertions(+)
diff --git a/lib/librte_eal/bsdapp/eal/eal_memalloc.c b/lib/librte_eal/bsdapp/eal/eal_memalloc.c
index a5847f0bd..6893448db 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memalloc.c
@@ -61,6 +61,12 @@ eal_memalloc_set_seg_fd(int list_idx __rte_unused, int seg_idx __rte_unused,
return -ENOTSUP;
}
+int
+eal_memalloc_set_seg_list_fd(int list_idx __rte_unused, int fd __rte_unused)
+{
+ return -ENOTSUP;
+}
+
int
eal_memalloc_get_seg_fd_offset(int list_idx __rte_unused,
int seg_idx __rte_unused, size_t *offset __rte_unused)
diff --git a/lib/librte_eal/common/eal_memalloc.h b/lib/librte_eal/common/eal_memalloc.h
index af917c2f9..b96c9c512 100644
--- a/lib/librte_eal/common/eal_memalloc.h
+++ b/lib/librte_eal/common/eal_memalloc.h
@@ -84,6 +84,10 @@ eal_memalloc_get_seg_fd(int list_idx, int seg_idx);
int
eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd);
+/* returns 0 or -errno */
+int
+eal_memalloc_set_seg_list_fd(int list_idx, int fd);
+
int
eal_memalloc_get_seg_fd_offset(int list_idx, int seg_idx, size_t *offset);
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index a93548b8c..eef140b33 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1529,6 +1529,10 @@ eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd)
{
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ /* single file segments mode doesn't support individual segment fd's */
+ if (internal_config.single_file_segments)
+ return -ENOTSUP;
+
/* if list is not allocated, allocate it */
if (fd_list[list_idx].len == 0) {
int len = mcfg->memsegs[list_idx].memseg_arr.len;
@@ -1541,6 +1545,28 @@ eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd)
return 0;
}
+int
+eal_memalloc_set_seg_list_fd(int list_idx, int fd)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+
+ /* non-single file segment mode doesn't support segment list fd's */
+ if (!internal_config.single_file_segments)
+ return -ENOTSUP;
+
+ /* if list is not allocated, allocate it */
+ if (fd_list[list_idx].len == 0) {
+ int len = mcfg->memsegs[list_idx].memseg_arr.len;
+
+ if (alloc_list(list_idx, len) < 0)
+ return -ENOMEM;
+ }
+
+ fd_list[list_idx].memseg_list_fd = fd;
+
+ return 0;
+}
+
int
eal_memalloc_get_seg_fd(int list_idx, int seg_idx)
{
--
2.17.1
* [dpdk-dev] [PATCH v2 4/5] mem: use memfd for no-huge mode
2018-11-13 17:54 [dpdk-dev] [PATCH 19.02 0/2] Allow using virtio without hugepages Anatoly Burakov
` (5 preceding siblings ...)
2018-12-11 16:43 ` [dpdk-dev] [PATCH v2 3/5] memalloc: allow setting up segment list fd's Anatoly Burakov
@ 2018-12-11 16:43 ` Anatoly Burakov
2018-12-13 4:59 ` Tiwei Bie
2018-12-11 16:43 ` [dpdk-dev] [PATCH v2 5/5] test: add segment fd API test Anatoly Burakov
7 siblings, 1 reply; 27+ messages in thread
From: Anatoly Burakov @ 2018-12-11 16:43 UTC
To: dev
Cc: John McNamara, Marko Kovacevic, przemyslawx.lal,
kuralamudhan.ramakrishnan, ivan.coughlan, tiwei.bie,
ray.kinsella, maxime.coquelin
When running in no-huge mode, we anonymously allocate our memory.
While this works for regular NICs and vdev's, it's not suitable
for memory sharing scenarios such as virtio with a vhost_user
backend.
To fix this, allocate no-huge memory using memfd, and register
it with memalloc just like any other memseg fd. This enables
use of the rte_memseg_get_fd() API with the --no-huge EAL flag.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
Notes:
v2:
- Detect memfd support at compile time
- Change memfd-related log level to debug
doc/guides/rel_notes/release_19_02.rst | 5 +++
lib/librte_eal/linuxapp/eal/eal_memory.c | 54 +++++++++++++++++++++++-
2 files changed, 57 insertions(+), 2 deletions(-)
diff --git a/doc/guides/rel_notes/release_19_02.rst b/doc/guides/rel_notes/release_19_02.rst
index 960098582..420d51b5b 100644
--- a/doc/guides/rel_notes/release_19_02.rst
+++ b/doc/guides/rel_notes/release_19_02.rst
@@ -23,6 +23,11 @@ DPDK Release 19.02
New Features
------------
+* **Support for using VirtIO without hugepages**
+
+ The --no-huge mode was augmented to use memfd-backed memory (on systems that
+ support memfd), to allow using VirtIO-based NICs without hugepages.
+
.. This section should contain new features added in this release.
Sample format:
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 32feb415d..7d922a965 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -25,6 +25,10 @@
#include <sys/time.h>
#include <signal.h>
#include <setjmp.h>
+#ifdef F_ADD_SEALS /* if file sealing is supported, so is memfd */
+#include <linux/memfd.h>
+#define MEMFD_SUPPORTED
+#endif
#ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES
#include <numa.h>
#include <numaif.h>
@@ -1341,12 +1345,18 @@ eal_legacy_hugepage_init(void)
/* hugetlbfs can be disabled */
if (internal_config.no_hugetlbfs) {
struct rte_memseg_list *msl;
+ int n_segs, cur_seg, fd, flags;
+#ifdef MEMFD_SUPPORTED
+ int memfd;
+#endif
uint64_t page_sz;
- int n_segs, cur_seg;
/* nohuge mode is legacy mode */
internal_config.legacy_mem = 1;
+ /* nohuge mode is single-file segments mode */
+ internal_config.single_file_segments = 1;
+
/* create a memseg list */
msl = &mcfg->memsegs[0];
@@ -1359,8 +1369,38 @@ eal_legacy_hugepage_init(void)
return -1;
}
+ /* set up parameters for anonymous mmap */
+ fd = -1;
+ flags = MAP_PRIVATE | MAP_ANONYMOUS;
+
+#ifdef MEMFD_SUPPORTED
+ /* create a memfd and store it in the segment fd table */
+ memfd = memfd_create("nohuge", 0);
+ if (memfd < 0) {
+ RTE_LOG(DEBUG, EAL, "Cannot create memfd: %s\n",
+ strerror(errno));
+ RTE_LOG(DEBUG, EAL, "Falling back to anonymous map\n");
+ } else {
+ /* we got an fd - now resize it */
+ if (ftruncate(memfd, internal_config.memory) < 0) {
+ RTE_LOG(ERR, EAL, "Cannot resize memfd: %s\n",
+ strerror(errno));
+ RTE_LOG(ERR, EAL, "Falling back to anonymous map\n");
+ close(memfd);
+ } else {
+ /* creating memfd-backed file was successful.
+ * we want changes to memfd to be visible to
+ * other processes (such as vhost backend), so
+ * map it as shared memory.
+ */
+ RTE_LOG(DEBUG, EAL, "Using memfd for anonymous memory\n");
+ fd = memfd;
+ flags = MAP_SHARED;
+ }
+ }
+#endif
addr = mmap(NULL, internal_config.memory, PROT_READ | PROT_WRITE,
- MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+ flags, fd, 0);
if (addr == MAP_FAILED) {
RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n", __func__,
strerror(errno));
@@ -1371,6 +1411,16 @@ eal_legacy_hugepage_init(void)
msl->socket_id = 0;
msl->len = internal_config.memory;
+ /* we're in single-file segments mode, so only the segment list
+ * fd needs to be set up.
+ */
+ if (fd != -1) {
+ if (eal_memalloc_set_seg_list_fd(0, fd) < 0) {
+ RTE_LOG(ERR, EAL, "Cannot set up segment list fd\n");
+ /* not a serious error, proceed */
+ }
+ }
+
/* populate memsegs. each memseg is one page long */
for (cur_seg = 0; cur_seg < n_segs; cur_seg++) {
arr = &msl->memseg_arr;
--
2.17.1
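The no-huge allocation path in the patch above can be sketched as a standalone C function. The assumptions: Linux with glibc 2.27 or later, which provides the `memfd_create()` wrapper. `map_nohuge()` is a hypothetical name, not a DPDK function, and plain `stderr` messages stand in for `RTE_LOG`. The sketch creates a memfd, sizes it with `ftruncate()`, and maps it `MAP_SHARED`. If any memfd step fails, it falls back to the previous `MAP_PRIVATE | MAP_ANONYMOUS` mapping.

```c
#define _GNU_SOURCE /* for memfd_create() */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not a DPDK API): map len bytes of read/write memory,
 * preferring a memfd so the pages can later be shared with another process
 * by handing it *out_fd. On any memfd failure, fall back to a private
 * anonymous mapping and report *out_fd = -1. */
static void *map_nohuge(size_t len, int *out_fd)
{
	int fd = -1;
	int flags = MAP_PRIVATE | MAP_ANONYMOUS;
	int memfd = memfd_create("nohuge", 0);

	if (memfd < 0) {
		fprintf(stderr, "memfd_create: %s, using anonymous map\n",
			strerror(errno));
	} else if (ftruncate(memfd, (off_t)len) < 0) {
		fprintf(stderr, "ftruncate: %s, using anonymous map\n",
			strerror(errno));
		close(memfd);
	} else {
		/* shared mapping, so writes are visible through the fd */
		fd = memfd;
		flags = MAP_SHARED;
	}

	void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, fd, 0);
	if (addr == MAP_FAILED) {
		if (fd >= 0)
			close(fd);
		*out_fd = -1;
		return NULL;
	}
	*out_fd = fd;
	return addr;
}
```

The patch keeps the fd open for the lifetime of the mapping and records it with `eal_memalloc_set_seg_list_fd()`. That stored fd is what lets `rte_memseg_get_fd()` return something in `--no-huge` mode. A caller of this sketch has the same responsibility for `*out_fd`.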
* [dpdk-dev] [PATCH v2 5/5] test: add segment fd API test
2018-11-13 17:54 [dpdk-dev] [PATCH 19.02 0/2] Allow using virtio without hugepages Anatoly Burakov
` (6 preceding siblings ...)
2018-12-11 16:43 ` [dpdk-dev] [PATCH v2 4/5] mem: use memfd for no-huge mode Anatoly Burakov
@ 2018-12-11 16:43 ` Anatoly Burakov
7 siblings, 0 replies; 27+ messages in thread
From: Anatoly Burakov @ 2018-12-11 16:43 UTC (permalink / raw)
To: dev
Cc: przemyslawx.lal, kuralamudhan.ramakrishnan, ivan.coughlan,
tiwei.bie, ray.kinsella, maxime.coquelin
Use the memory autotest to also test the segment fd API. This does
not validate any results; it only checks that the relevant APIs
either return success or indicate that the API is not supported.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
test/test/test_memory.c | 43 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 43 insertions(+)
diff --git a/test/test/test_memory.c b/test/test/test_memory.c
index b96bca771..3da803e4e 100644
--- a/test/test/test_memory.c
+++ b/test/test/test_memory.c
@@ -37,10 +37,44 @@ check_mem(const struct rte_memseg_list *msl __rte_unused,
return 0;
}
+static int
+check_seg_fds(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg __rte_unused)
+{
+ size_t offset;
+ int ret;
+
+ /* skip external segments */
+ if (msl->external)
+ return 0;
+
+ /* try segment fd first. we're in a callback, so thread-unsafe */
+ ret = rte_memseg_get_fd_thread_unsafe(ms);
+ if (ret < 0) {
+ /* ENOTSUP means segment is valid, but there is no support for
+ * segment fd API (e.g. on FreeBSD).
+ */
+ if (errno == ENOTSUP)
+ return 1;
+ /* all other errors are treated as failures */
+ return -1;
+ }
+
+ /* we're able to get memseg fd - try getting its offset */
+ ret = rte_memseg_get_fd_offset_thread_unsafe(ms, &offset);
+ if (ret < 0) {
+ if (errno == ENOTSUP)
+ return 1;
+ return -1;
+ }
+ return 0;
+}
+
static int
test_memory(void)
{
uint64_t s;
+ int ret;
/*
* dump the mapped memory: the python-expect script checks
@@ -59,6 +93,15 @@ test_memory(void)
/* try to read memory (should not segfault) */
rte_memseg_walk(check_mem, NULL);
+ /* check segment fd support */
+ ret = rte_memseg_walk(check_seg_fds, NULL);
+ if (ret == 1) {
+ printf("Segment fd API is unsupported\n");
+ } else if (ret == -1) {
+ printf("Error getting segment fd's\n");
+ return -1;
+ }
+
return 0;
}
--
2.17.1
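The test relies on how a memseg walk propagates callback results. As we understand `rte_memseg_walk()`, it stops at the first callback that returns a non-zero value and returns that value. So `1` ("unsupported") and `-1` ("error") from `check_seg_fds()` reach `test_memory()` directly. The following is a self-contained model of that control flow. `struct fake_seg`, `fake_walk()` and `check_seg()` are hypothetical stand-ins, not DPDK types. The `fd_err` field plays the role of the error code the real fd call would report.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical segment: fd_err is 0 if getting its fd "succeeds",
 * otherwise the error code the fd API would report. */
struct fake_seg {
	int external;
	int fd_err;
};

typedef int (*seg_cb)(const struct fake_seg *seg, void *arg);

/* Models the walk semantics: stop at the first non-zero callback result
 * and return it; return 0 if every segment was visited. */
static int fake_walk(const struct fake_seg *segs, size_t n, seg_cb cb,
		void *arg)
{
	for (size_t i = 0; i < n; i++) {
		int ret = cb(&segs[i], arg);
		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Same tri-state contract as check_seg_fds(): 0 = ok or skipped,
 * 1 = API unsupported (ENOTSUP), -1 = any other failure. */
static int check_seg(const struct fake_seg *seg, void *arg)
{
	(void)arg;
	if (seg->external)
		return 0;
	if (seg->fd_err == 0)
		return 0;
	if (seg->fd_err == ENOTSUP)
		return 1;
	return -1;
}
```

Because the walk short-circuits, a single unsupported segment ends the walk early. Any later segments, including ones that would have failed, are never checked.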
* Re: [dpdk-dev] [PATCH v2 0/5] Allow using virtio without hugepages
2018-12-11 16:43 ` [dpdk-dev] [PATCH v2 0/5] Allow using virtio without hugepages Anatoly Burakov
@ 2018-12-13 4:53 ` Tiwei Bie
2018-12-13 11:43 ` [dpdk-dev] [PATCH v3 0/5] Allow using virtio-user " Anatoly Burakov
` (5 subsequent siblings)
6 siblings, 0 replies; 27+ messages in thread
From: Tiwei Bie @ 2018-12-13 4:53 UTC (permalink / raw)
To: Anatoly Burakov
Cc: dev, przemyslawx.lal, kuralamudhan.ramakrishnan, ivan.coughlan,
ray.kinsella, maxime.coquelin
On Tue, Dec 11, 2018 at 04:43:27PM +0000, Anatoly Burakov wrote:
> It is already possible to use both DPDK in general and
> virtio specifically, without hugetlbfs mounts, but
> currently virtio cannot be used without hugepage memory
> (i.e. with a --no-huge EAL switch) due to the fact that
> it needs to share memory with the backend.
>
> This patchset uses memfd to create actual files backing
> anonymous memory. This enabled virtio to work not only
> without hugetlbfs, but without hugepages altogether,
> which could be useful in Cloud Native scenarios.
Nice work!
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
>
> v2:
> - Fixed segment fd list not being initialized
> - Added some segment fd API fixes
> - Added unit test for segment fd API
>
> Anatoly Burakov (5):
> mem: fix error code for segment fd API for external segs
> memalloc: check for memfd support in segment fd API
> memalloc: allow setting up segment list fd's
> mem: use memfd for no-huge mode
> test: add segment fd API test
>
> doc/guides/rel_notes/release_19_02.rst | 13 +++++
> lib/librte_eal/bsdapp/eal/eal_memalloc.c | 6 ++
> lib/librte_eal/common/eal_common_memory.c | 12 ++++
> lib/librte_eal/common/eal_memalloc.h | 4 ++
> lib/librte_eal/linuxapp/eal/eal_memalloc.c | 66 +++++++++++++++++++---
> lib/librte_eal/linuxapp/eal/eal_memory.c | 54 +++++++++++++++++-
> test/test/test_memory.c | 43 ++++++++++++++
> 7 files changed, 188 insertions(+), 10 deletions(-)
>
> --
> 2.17.1
* Re: [dpdk-dev] [PATCH v2 4/5] mem: use memfd for no-huge mode
2018-12-11 16:43 ` [dpdk-dev] [PATCH v2 4/5] mem: use memfd for no-huge mode Anatoly Burakov
@ 2018-12-13 4:59 ` Tiwei Bie
2018-12-13 11:36 ` Burakov, Anatoly
0 siblings, 1 reply; 27+ messages in thread
From: Tiwei Bie @ 2018-12-13 4:59 UTC (permalink / raw)
To: Anatoly Burakov
Cc: dev, John McNamara, Marko Kovacevic, przemyslawx.lal,
kuralamudhan.ramakrishnan, ivan.coughlan, ray.kinsella,
maxime.coquelin
On Tue, Dec 11, 2018 at 04:43:31PM +0000, Anatoly Burakov wrote:
> When running in no-huge mode, we anonymously allocate our memory.
> While this works for regular NICs and vdev's, it's not suitable
> for memory sharing scenarios such as virtio with vhost_user
> backend.
>
> To fix this, allocate no-huge memory using memfd, and register
> it with memalloc just like any other memseg fd. This will enable
> using rte_memseg_get_fd() API with --no-huge EAL flag.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>
> Notes:
> v2:
> - Detect memfd support at compile time
> - Change memfd-related log level to debug
>
> doc/guides/rel_notes/release_19_02.rst | 5 +++
> lib/librte_eal/linuxapp/eal/eal_memory.c | 54 +++++++++++++++++++++++-
> 2 files changed, 57 insertions(+), 2 deletions(-)
>
> diff --git a/doc/guides/rel_notes/release_19_02.rst b/doc/guides/rel_notes/release_19_02.rst
> index 960098582..420d51b5b 100644
> --- a/doc/guides/rel_notes/release_19_02.rst
> +++ b/doc/guides/rel_notes/release_19_02.rst
> @@ -23,6 +23,11 @@ DPDK Release 19.02
> New Features
> ------------
>
> +* **Support for using VirtIO without hugepages**
> +
> + The --no-huge mode was augmented to use memfd-backed memory (on systems that
> + support memfd), to allow using VirtIO-based NICs without hugepages.
It would be better to say virtio-user here, because virtio NICs,
e.g. the one emulated by QEMU, could be something quite different.
> +
> .. This section should contain new features added in this release.
> Sample format:
>
[...]
* Re: [dpdk-dev] [PATCH v2 4/5] mem: use memfd for no-huge mode
2018-12-13 4:59 ` Tiwei Bie
@ 2018-12-13 11:36 ` Burakov, Anatoly
0 siblings, 0 replies; 27+ messages in thread
From: Burakov, Anatoly @ 2018-12-13 11:36 UTC (permalink / raw)
To: Tiwei Bie
Cc: dev, John McNamara, Marko Kovacevic, przemyslawx.lal,
kuralamudhan.ramakrishnan, ivan.coughlan, ray.kinsella,
maxime.coquelin
On 13-Dec-18 4:59 AM, Tiwei Bie wrote:
> On Tue, Dec 11, 2018 at 04:43:31PM +0000, Anatoly Burakov wrote:
>> When running in no-huge mode, we anonymously allocate our memory.
>> While this works for regular NICs and vdev's, it's not suitable
>> for memory sharing scenarios such as virtio with vhost_user
>> backend.
>>
>> To fix this, allocate no-huge memory using memfd, and register
>> it with memalloc just like any other memseg fd. This will enable
>> using rte_memseg_get_fd() API with --no-huge EAL flag.
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>> ---
>>
>> Notes:
>> v2:
>> - Detect memfd support at compile time
>> - Change memfd-related log level to debug
>>
>> doc/guides/rel_notes/release_19_02.rst | 5 +++
>> lib/librte_eal/linuxapp/eal/eal_memory.c | 54 +++++++++++++++++++++++-
>> 2 files changed, 57 insertions(+), 2 deletions(-)
>>
>> diff --git a/doc/guides/rel_notes/release_19_02.rst b/doc/guides/rel_notes/release_19_02.rst
>> index 960098582..420d51b5b 100644
>> --- a/doc/guides/rel_notes/release_19_02.rst
>> +++ b/doc/guides/rel_notes/release_19_02.rst
>> @@ -23,6 +23,11 @@ DPDK Release 19.02
>> New Features
>> ------------
>>
>> +* **Support for using VirtIO without hugepages**
>> +
>> + The --no-huge mode was augmented to use memfd-backed memory (on systems that
>> + support memfd), to allow using VirtIO-based NICs without hugepages.
>
> It would be better to say virtio-user here, because virtio NICs
> e.g. the one emulated by QEMU, could be something quite different.
Thanks, will fix!
>
>> +
>> .. This section should contain new features added in this release.
>> Sample format:
>>
> [...]
>
--
Thanks,
Anatoly
* [dpdk-dev] [PATCH v3 0/5] Allow using virtio-user without hugepages
2018-12-11 16:43 ` [dpdk-dev] [PATCH v2 0/5] Allow using virtio without hugepages Anatoly Burakov
2018-12-13 4:53 ` Tiwei Bie
@ 2018-12-13 11:43 ` Anatoly Burakov
2018-12-20 22:01 ` Thomas Monjalon
2018-12-13 11:43 ` [dpdk-dev] [PATCH v3 1/5] mem: fix error code for segment fd API for external segs Anatoly Burakov
` (4 subsequent siblings)
6 siblings, 1 reply; 27+ messages in thread
From: Anatoly Burakov @ 2018-12-13 11:43 UTC (permalink / raw)
To: dev
Cc: przemyslawx.lal, kuralamudhan.ramakrishnan, ivan.coughlan,
tiwei.bie, ray.kinsella, maxime.coquelin
It is already possible to use both DPDK in general and
virtio-user specifically, without hugetlbfs mounts, but
currently virtio-user cannot be used without hugepage
memory (i.e. with a --no-huge EAL switch) due to the
fact that it needs to share memory with the backend.
This patchset uses memfd to create actual files backing
anonymous memory. This enables virtio-user to work not
only without hugetlbfs (which was already possible), but
without hugepages altogether, which could be useful in
Cloud Native scenarios.
v3:
- Clarify doc changes
v2:
- Fixed segment fd list not being initialized
- Added some segment fd API fixes
- Added unit test for segment fd API
Anatoly Burakov (5):
mem: fix error code for segment fd API for external segs
memalloc: check for memfd support in segment fd API
memalloc: allow setting up segment list fd's
mem: use memfd for no-huge mode
test: add segment fd API test
doc/guides/rel_notes/release_19_02.rst | 13 +++++
lib/librte_eal/bsdapp/eal/eal_memalloc.c | 6 ++
lib/librte_eal/common/eal_common_memory.c | 12 ++++
lib/librte_eal/common/eal_memalloc.h | 4 ++
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 66 +++++++++++++++++++---
lib/librte_eal/linuxapp/eal/eal_memory.c | 54 +++++++++++++++++-
test/test/test_memory.c | 43 ++++++++++++++
7 files changed, 188 insertions(+), 10 deletions(-)
--
2.17.1
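The cover letter says virtio-user "needs to share memory with the backend". With a real fd, that sharing happens by passing the fd to the other process over a UNIX domain socket as `SCM_RIGHTS` ancillary data. The vhost-user protocol uses this mechanism for memory regions. An anonymous `MAP_PRIVATE` mapping has no fd to pass, which is why `--no-huge` previously could not work. The following is a minimal sketch of the fd-passing step only. `send_fd()` and `recv_fd()` are hypothetical helpers, not DPDK or vhost-user API. In practice the fd would come from `rte_memseg_get_fd()` and travel inside a protocol message.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one fd. */
union fd_cmsg {
	struct cmsghdr align;
	char buf[CMSG_SPACE(sizeof(int))];
};

/* Send fd over a connected UNIX socket, with a 1-byte dummy payload
 * (at least one byte of real data must accompany ancillary data). */
static int send_fd(int sock, int fd)
{
	char byte = 0;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd; the kernel installs it as a new descriptor here. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg;
	int fd;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
			cmsg->cmsg_type != SCM_RIGHTS)
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The receiver gets a different descriptor number that refers to the same open file. For a memfd, mapping that descriptor `MAP_SHARED` gives the backend a view of the same pages.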
* [dpdk-dev] [PATCH v3 1/5] mem: fix error code for segment fd API for external segs
2018-12-11 16:43 ` [dpdk-dev] [PATCH v2 0/5] Allow using virtio without hugepages Anatoly Burakov
2018-12-13 4:53 ` Tiwei Bie
2018-12-13 11:43 ` [dpdk-dev] [PATCH v3 0/5] Allow using virtio-user " Anatoly Burakov
@ 2018-12-13 11:43 ` Anatoly Burakov
2018-12-14 9:15 ` Maxime Coquelin
2018-12-13 11:43 ` [dpdk-dev] [PATCH v3 2/5] memalloc: check for memfd support in segment fd API Anatoly Burakov
` (3 subsequent siblings)
6 siblings, 1 reply; 27+ messages in thread
From: Anatoly Burakov @ 2018-12-13 11:43 UTC (permalink / raw)
To: dev
Cc: John McNamara, Marko Kovacevic, przemyslawx.lal,
kuralamudhan.ramakrishnan, ivan.coughlan, tiwei.bie,
ray.kinsella, maxime.coquelin, stable
The segment fd API does not support getting segment fd's from
externally allocated memory, so return a proper error code
on any attempt to do so. This changes API behavior, so
document the change as well.
Fixes: 5282bb1c3695 ("mem: allow memseg lists to be marked as external")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
---
Notes:
The API is experimental, no deprecation notice needed.
doc/guides/rel_notes/release_19_02.rst | 6 ++++++
lib/librte_eal/common/eal_common_memory.c | 12 ++++++++++++
2 files changed, 18 insertions(+)
diff --git a/doc/guides/rel_notes/release_19_02.rst b/doc/guides/rel_notes/release_19_02.rst
index a94fa86a7..ade41b9c8 100644
--- a/doc/guides/rel_notes/release_19_02.rst
+++ b/doc/guides/rel_notes/release_19_02.rst
@@ -84,6 +84,12 @@ API Changes
=========================================================
+* eal: segment fd API on Linux now sets error code to ``ENOTSUP`` in more cases
+ where segment fd API is not expected to be supported:
+
+ - On attempt to get segment fd for an externally allocated memory segment
+
+
ABI Changes
-----------
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index d47ea4938..999ba24b4 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -704,6 +704,12 @@ rte_memseg_get_fd_thread_unsafe(const struct rte_memseg *ms)
return -1;
}
+ /* segment fd API is not supported for external segments */
+ if (msl->external) {
+ rte_errno = ENOTSUP;
+ return -1;
+ }
+
ret = eal_memalloc_get_seg_fd(msl_idx, seg_idx);
if (ret < 0) {
rte_errno = -ret;
@@ -754,6 +760,12 @@ rte_memseg_get_fd_offset_thread_unsafe(const struct rte_memseg *ms,
return -1;
}
+ /* segment fd API is not supported for external segments */
+ if (msl->external) {
+ rte_errno = ENOTSUP;
+ return -1;
+ }
+
ret = eal_memalloc_get_seg_fd_offset(msl_idx, seg_idx, offset);
if (ret < 0) {
rte_errno = -ret;
--
2.17.1
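The hunks above follow a two-layer error convention. The internal `eal_memalloc_*` functions return `-errno`. The public `rte_memseg_get_fd*_thread_unsafe()` wrappers return `-1` and store the positive code in `rte_errno` (`rte_errno = -ret`). The new external-segment check short-circuits with `ENOTSUP` before the internal layer is reached. Callers should therefore inspect `rte_errno`, not `errno`, after a `-1` return. This self-contained sketch models the convention. `sketch_errno` is a hypothetical thread-local stand-in for DPDK's per-lcore `rte_errno`, and both functions are illustrative only.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for DPDK's per-lcore rte_errno. */
static _Thread_local int sketch_errno;

/* Internal layer: returns an fd, or -errno on failure. */
static int internal_get_fd(int fd)
{
	return fd >= 0 ? fd : -ENODEV;
}

/* Public layer: returns an fd, or -1 with the reason in sketch_errno. */
static int public_get_fd(int external, int fd)
{
	int ret;

	/* external segments never have an fd: report "unsupported" */
	if (external) {
		sketch_errno = ENOTSUP;
		return -1;
	}
	ret = internal_get_fd(fd);
	if (ret < 0) {
		sketch_errno = -ret; /* translate -errno into error variable */
		return -1;
	}
	return ret;
}
```

Keeping `-errno` internally lets EAL code propagate errors with a single return value. The public wrappers then expose the usual DPDK `-1` plus `rte_errno` style.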
* [dpdk-dev] [PATCH v3 2/5] memalloc: check for memfd support in segment fd API
2018-12-11 16:43 ` [dpdk-dev] [PATCH v2 0/5] Allow using virtio without hugepages Anatoly Burakov
` (2 preceding siblings ...)
2018-12-13 11:43 ` [dpdk-dev] [PATCH v3 1/5] mem: fix error code for segment fd API for external segs Anatoly Burakov
@ 2018-12-13 11:43 ` Anatoly Burakov
2018-12-14 9:19 ` Maxime Coquelin
2018-12-13 11:43 ` [dpdk-dev] [PATCH v3 3/5] memalloc: allow setting up segment list fd's Anatoly Burakov
` (2 subsequent siblings)
6 siblings, 1 reply; 27+ messages in thread
From: Anatoly Burakov @ 2018-12-13 11:43 UTC (permalink / raw)
To: dev
Cc: John McNamara, Marko Kovacevic, przemyslawx.lal,
kuralamudhan.ramakrishnan, ivan.coughlan, tiwei.bie,
ray.kinsella, maxime.coquelin, stable
If memfd support was not compiled in, or hugepage memfd support
is not available at runtime, the API will now return a proper
error code indicating that it is unsupported. This
changes the API, so document the changes.
Fixes: 41dbdb68723b ("mem: add external API to retrieve page fd")
Fixes: 3a44687139eb ("mem: allow querying offset into segment fd")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
---
Notes:
The API is experimental, no deprecation notice needed.
doc/guides/rel_notes/release_19_02.rst | 2 ++
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 40 +++++++++++++++++-----
2 files changed, 34 insertions(+), 8 deletions(-)
diff --git a/doc/guides/rel_notes/release_19_02.rst b/doc/guides/rel_notes/release_19_02.rst
index ade41b9c8..960098582 100644
--- a/doc/guides/rel_notes/release_19_02.rst
+++ b/doc/guides/rel_notes/release_19_02.rst
@@ -88,6 +88,8 @@ API Changes
where segment fd API is not expected to be supported:
- On attempt to get segment fd for an externally allocated memory segment
+ - In cases where memfd support would have been required to provide segment
+ fd's (such as in-memory or no-huge mode)
ABI Changes
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 784939566..a93548b8c 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -23,6 +23,10 @@
#include <sys/time.h>
#include <signal.h>
#include <setjmp.h>
+#ifdef F_ADD_SEALS /* if file sealing is supported, so is memfd */
+#include <linux/memfd.h>
+#define MEMFD_SUPPORTED
+#endif
#ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES
#include <numa.h>
#include <numaif.h>
@@ -53,8 +57,8 @@ const int anonymous_hugepages_supported =
#endif
/*
- * we don't actually care if memfd itself is supported - we only need to check
- * if memfd supports hugetlbfs, as that already implies memfd support.
+ * we've already checked memfd support at compile-time, but we also need to
+ * check if we can create hugepage files with memfd.
*
* also, this is not a constant, because while we may be *compiled* with memfd
* hugetlbfs support, we might not be *running* on a system that supports memfd
@@ -63,10 +67,11 @@ const int anonymous_hugepages_supported =
*/
static int memfd_create_supported =
#ifdef MFD_HUGETLB
-#define MEMFD_SUPPORTED
1;
+#define RTE_MFD_HUGETLB MFD_HUGETLB
#else
0;
+#define RTE_MFD_HUGETLB 4U
#endif
/*
@@ -338,12 +343,12 @@ get_seg_memfd(struct hugepage_info *hi __rte_unused,
int fd;
char segname[250]; /* as per manpage, limit is 249 bytes plus null */
+ int flags = RTE_MFD_HUGETLB | pagesz_flags(hi->hugepage_sz);
+
if (internal_config.single_file_segments) {
fd = fd_list[list_idx].memseg_list_fd;
if (fd < 0) {
- int flags = MFD_HUGETLB | pagesz_flags(hi->hugepage_sz);
-
snprintf(segname, sizeof(segname), "seg_%i", list_idx);
fd = memfd_create(segname, flags);
if (fd < 0) {
@@ -357,8 +362,6 @@ get_seg_memfd(struct hugepage_info *hi __rte_unused,
fd = fd_list[list_idx].fds[seg_idx];
if (fd < 0) {
- int flags = MFD_HUGETLB | pagesz_flags(hi->hugepage_sz);
-
snprintf(segname, sizeof(segname), "seg_%i-%i",
list_idx, seg_idx);
fd = memfd_create(segname, flags);
@@ -1542,6 +1545,17 @@ int
eal_memalloc_get_seg_fd(int list_idx, int seg_idx)
{
int fd;
+
+ if (internal_config.in_memory || internal_config.no_hugetlbfs) {
+#ifndef MEMFD_SUPPORTED
+ /* in in-memory or no-huge mode, we rely on memfd support */
+ return -ENOTSUP;
+#endif
+ /* memfd supported, but hugetlbfs memfd may not be */
+ if (!internal_config.no_hugetlbfs && !memfd_create_supported)
+ return -ENOTSUP;
+ }
+
if (internal_config.single_file_segments) {
fd = fd_list[list_idx].memseg_list_fd;
} else if (fd_list[list_idx].len == 0) {
@@ -1565,7 +1579,7 @@ test_memfd_create(void)
int pagesz_flag = pagesz_flags(pagesz);
int flags;
- flags = pagesz_flag | MFD_HUGETLB;
+ flags = pagesz_flag | RTE_MFD_HUGETLB;
int fd = memfd_create("test", flags);
if (fd < 0) {
/* we failed - let memalloc know this isn't working */
@@ -1589,6 +1603,16 @@ eal_memalloc_get_seg_fd_offset(int list_idx, int seg_idx, size_t *offset)
{
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ if (internal_config.in_memory || internal_config.no_hugetlbfs) {
+#ifndef MEMFD_SUPPORTED
+ /* in in-memory or no-huge mode, we rely on memfd support */
+ return -ENOTSUP;
+#endif
+ /* memfd supported, but hugetlbfs memfd may not be */
+ if (!internal_config.no_hugetlbfs && !memfd_create_supported)
+ return -ENOTSUP;
+ }
+
/* fd_list not initialized? */
if (fd_list[list_idx].len == 0)
return -ENODEV;
--
2.17.1
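The patch above checks memfd support in two stages. The first is at compile time: `MEMFD_SUPPORTED`, derived from `F_ADD_SEALS`, plus a fallback `RTE_MFD_HUGETLB` value of `4U` when headers lack `MFD_HUGETLB`. The second is at run time: whether `memfd_create()` actually accepts the hugetlb flag, which is what `test_memfd_create()` probes. The following is a minimal standalone sketch of the same pattern. It assumes Linux and glibc 2.27 or later. `SKETCH_MFD_HUGETLB` and `probe_memfd()` are hypothetical names.

```c
#define _GNU_SOURCE /* for memfd_create() and MFD_* flags */
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>

/* Compile-time stage: use the header's flag if present, else the same
 * numeric value the patch hard-codes (0x0004, as in linux/memfd.h). */
#ifdef MFD_HUGETLB
#define SKETCH_MFD_HUGETLB MFD_HUGETLB
#else
#define SKETCH_MFD_HUGETLB 0x0004U
#endif

/* Run-time stage: returns 1 if the running kernel accepts these flags,
 * 0 otherwise (e.g. EINVAL for an unknown flag, ENOSYS without memfd). */
static int probe_memfd(unsigned int flags)
{
	int fd = memfd_create("probe", flags);

	if (fd < 0)
		return 0;
	close(fd);
	return 1;
}
```

A binary built against new headers may still run on an older kernel. That is why the hugetlb result is kept in a variable (`memfd_create_supported` in the patch) rather than a constant.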
* [dpdk-dev] [PATCH v3 3/5] memalloc: allow setting up segment list fd's
2018-12-11 16:43 ` [dpdk-dev] [PATCH v2 0/5] Allow using virtio without hugepages Anatoly Burakov
` (3 preceding siblings ...)
2018-12-13 11:43 ` [dpdk-dev] [PATCH v3 2/5] memalloc: check for memfd support in segment fd API Anatoly Burakov
@ 2018-12-13 11:43 ` Anatoly Burakov
2018-12-14 10:03 ` Maxime Coquelin
2018-12-13 11:43 ` [dpdk-dev] [PATCH v3 4/5] mem: use memfd for no-huge mode Anatoly Burakov
2018-12-13 11:43 ` [dpdk-dev] [PATCH v3 5/5] test: add segment fd API test Anatoly Burakov
6 siblings, 1 reply; 27+ messages in thread
From: Anatoly Burakov @ 2018-12-13 11:43 UTC (permalink / raw)
To: dev
Cc: Bruce Richardson, przemyslawx.lal, kuralamudhan.ramakrishnan,
ivan.coughlan, tiwei.bie, ray.kinsella, maxime.coquelin
Currently, only segment fd's for multi-file segments are supported,
while for memfd-backed no-huge memory we need single-file segments
mode. Add support for single-file segments in the internal API.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
---
Notes:
v2:
- Add missing fd list allocation on setting segment
list fd
lib/librte_eal/bsdapp/eal/eal_memalloc.c | 6 +++++
lib/librte_eal/common/eal_memalloc.h | 4 ++++
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 26 ++++++++++++++++++++++
3 files changed, 36 insertions(+)
diff --git a/lib/librte_eal/bsdapp/eal/eal_memalloc.c b/lib/librte_eal/bsdapp/eal/eal_memalloc.c
index a5847f0bd..6893448db 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memalloc.c
@@ -61,6 +61,12 @@ eal_memalloc_set_seg_fd(int list_idx __rte_unused, int seg_idx __rte_unused,
return -ENOTSUP;
}
+int
+eal_memalloc_set_seg_list_fd(int list_idx __rte_unused, int fd __rte_unused)
+{
+ return -ENOTSUP;
+}
+
int
eal_memalloc_get_seg_fd_offset(int list_idx __rte_unused,
int seg_idx __rte_unused, size_t *offset __rte_unused)
diff --git a/lib/librte_eal/common/eal_memalloc.h b/lib/librte_eal/common/eal_memalloc.h
index af917c2f9..b96c9c512 100644
--- a/lib/librte_eal/common/eal_memalloc.h
+++ b/lib/librte_eal/common/eal_memalloc.h
@@ -84,6 +84,10 @@ eal_memalloc_get_seg_fd(int list_idx, int seg_idx);
int
eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd);
+/* returns 0 or -errno */
+int
+eal_memalloc_set_seg_list_fd(int list_idx, int fd);
+
int
eal_memalloc_get_seg_fd_offset(int list_idx, int seg_idx, size_t *offset);
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index a93548b8c..eef140b33 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1529,6 +1529,10 @@ eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd)
{
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ /* single file segments mode doesn't support individual segment fd's */
+ if (internal_config.single_file_segments)
+ return -ENOTSUP;
+
/* if list is not allocated, allocate it */
if (fd_list[list_idx].len == 0) {
int len = mcfg->memsegs[list_idx].memseg_arr.len;
@@ -1541,6 +1545,28 @@ eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd)
return 0;
}
+int
+eal_memalloc_set_seg_list_fd(int list_idx, int fd)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+
+ /* non-single file segment mode doesn't support segment list fd's */
+ if (!internal_config.single_file_segments)
+ return -ENOTSUP;
+
+ /* if list is not allocated, allocate it */
+ if (fd_list[list_idx].len == 0) {
+ int len = mcfg->memsegs[list_idx].memseg_arr.len;
+
+ if (alloc_list(list_idx, len) < 0)
+ return -ENOMEM;
+ }
+
+ fd_list[list_idx].memseg_list_fd = fd;
+
+ return 0;
+}
+
int
eal_memalloc_get_seg_fd(int list_idx, int seg_idx)
{
--
2.17.1
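The patch above makes the two fd-setting paths mutually exclusive. In multi-file mode, each segment has its own fd and `eal_memalloc_set_seg_fd()` is used. In single-file mode, the whole memseg list shares one fd and the new `eal_memalloc_set_seg_list_fd()` is used. Either path lazily allocates the fd list on first use. The following is a simplified, self-contained model of that bookkeeping. All `sketch_*` names are hypothetical. The offset logic in `sketch_get_seg_fd_offset()` is an assumption about the single-file layout: page `i` of the list sits at byte `i * page_sz` of the shared file, while each multi-file page sits at offset 0 of its own file.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified model of one memseg list's fd bookkeeping. */
struct sketch_fd_list {
	int len;      /* 0 means "not allocated yet" */
	int *fds;     /* per-segment fds (multi-file mode) */
	int list_fd;  /* one fd for the whole list (single-file mode) */
};

static bool single_file_segments;

static int sketch_alloc_list(struct sketch_fd_list *l, int len)
{
	l->fds = malloc(sizeof(int) * (size_t)len);
	if (l->fds == NULL)
		return -1;
	for (int i = 0; i < len; i++)
		l->fds[i] = -1;
	l->len = len;
	l->list_fd = -1;
	return 0;
}

static int sketch_set_seg_fd(struct sketch_fd_list *l, int len, int seg_idx,
		int fd)
{
	if (single_file_segments)
		return -ENOTSUP; /* no per-segment fds in single-file mode */
	if (l->len == 0 && sketch_alloc_list(l, len) < 0)
		return -ENOMEM;
	l->fds[seg_idx] = fd;
	return 0;
}

static int sketch_set_seg_list_fd(struct sketch_fd_list *l, int len, int fd)
{
	if (!single_file_segments)
		return -ENOTSUP; /* no list fd in multi-file mode */
	if (l->len == 0 && sketch_alloc_list(l, len) < 0)
		return -ENOMEM;
	l->list_fd = fd;
	return 0;
}

static int sketch_get_seg_fd_offset(const struct sketch_fd_list *l,
		size_t page_sz, int seg_idx, size_t *offset)
{
	if (l->len == 0)
		return -ENODEV; /* fd list not initialized */
	if (single_file_segments) {
		if (l->list_fd < 0)
			return -ENOENT;
		*offset = page_sz * (size_t)seg_idx; /* pages back to back */
	} else {
		if (l->fds[seg_idx] < 0)
			return -ENOENT;
		*offset = 0; /* one file per page */
	}
	return 0;
}
```

This split is why the no-huge patch forces `single_file_segments = 1`. A single memfd backs the whole list, so only the list fd needs to be registered.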
* [dpdk-dev] [PATCH v3 4/5] mem: use memfd for no-huge mode
2018-12-11 16:43 ` [dpdk-dev] [PATCH v2 0/5] Allow using virtio without hugepages Anatoly Burakov
` (4 preceding siblings ...)
2018-12-13 11:43 ` [dpdk-dev] [PATCH v3 3/5] memalloc: allow setting up segment list fd's Anatoly Burakov
@ 2018-12-13 11:43 ` Anatoly Burakov
2018-12-13 11:59 ` Burakov, Anatoly
2018-12-14 10:06 ` Maxime Coquelin
2018-12-13 11:43 ` [dpdk-dev] [PATCH v3 5/5] test: add segment fd API test Anatoly Burakov
6 siblings, 2 replies; 27+ messages in thread
From: Anatoly Burakov @ 2018-12-13 11:43 UTC (permalink / raw)
To: dev
Cc: John McNamara, Marko Kovacevic, przemyslawx.lal,
kuralamudhan.ramakrishnan, ivan.coughlan, tiwei.bie,
ray.kinsella, maxime.coquelin
When running in no-huge mode, we anonymously allocate our memory.
While this works for regular NICs and vdev's, it's not suitable
for memory sharing scenarios such as virtio with vhost_user
backend.
To fix this, allocate no-huge memory using memfd, and register
it with memalloc just like any other memseg fd. This will enable
using rte_memseg_get_fd() API with --no-huge EAL flag.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
---
Notes:
v3:
- Clarify release notes to state that the changes apply to
virtio-user NICs rather than virtio in general
v2:
- Detect memfd support at compile time
- Change memfd-related log level to debug
doc/guides/rel_notes/release_19_02.rst | 5 +++
lib/librte_eal/linuxapp/eal/eal_memory.c | 54 +++++++++++++++++++++++-
2 files changed, 57 insertions(+), 2 deletions(-)
diff --git a/doc/guides/rel_notes/release_19_02.rst b/doc/guides/rel_notes/release_19_02.rst
index 960098582..f733ad139 100644
--- a/doc/guides/rel_notes/release_19_02.rst
+++ b/doc/guides/rel_notes/release_19_02.rst
@@ -23,6 +23,11 @@ DPDK Release 19.02
New Features
------------
+* **Support for using VirtIO without hugepages**
+
+ The --no-huge mode was augmented to use memfd-backed memory (on systems that
+ support memfd), to allow using virtio-user-based NICs without hugepages.
+
.. This section should contain new features added in this release.
Sample format:
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 32feb415d..7d922a965 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -25,6 +25,10 @@
#include <sys/time.h>
#include <signal.h>
#include <setjmp.h>
+#ifdef F_ADD_SEALS /* if file sealing is supported, so is memfd */
+#include <linux/memfd.h>
+#define MEMFD_SUPPORTED
+#endif
#ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES
#include <numa.h>
#include <numaif.h>
@@ -1341,12 +1345,18 @@ eal_legacy_hugepage_init(void)
/* hugetlbfs can be disabled */
if (internal_config.no_hugetlbfs) {
struct rte_memseg_list *msl;
+ int n_segs, cur_seg, fd, flags;
+#ifdef MEMFD_SUPPORTED
+ int memfd;
+#endif
uint64_t page_sz;
- int n_segs, cur_seg;
/* nohuge mode is legacy mode */
internal_config.legacy_mem = 1;
+ /* nohuge mode is single-file segments mode */
+ internal_config.single_file_segments = 1;
+
/* create a memseg list */
msl = &mcfg->memsegs[0];
@@ -1359,8 +1369,38 @@ eal_legacy_hugepage_init(void)
return -1;
}
+ /* set up parameters for anonymous mmap */
+ fd = -1;
+ flags = MAP_PRIVATE | MAP_ANONYMOUS;
+
+#ifdef MEMFD_SUPPORTED
+ /* create a memfd and store it in the segment fd table */
+ memfd = memfd_create("nohuge", 0);
+ if (memfd < 0) {
+ RTE_LOG(DEBUG, EAL, "Cannot create memfd: %s\n",
+ strerror(errno));
+ RTE_LOG(DEBUG, EAL, "Falling back to anonymous map\n");
+ } else {
+ /* we got an fd - now resize it */
+ if (ftruncate(memfd, internal_config.memory) < 0) {
+ RTE_LOG(ERR, EAL, "Cannot resize memfd: %s\n",
+ strerror(errno));
+ RTE_LOG(ERR, EAL, "Falling back to anonymous map\n");
+ close(memfd);
+ } else {
+ /* creating memfd-backed file was successful.
+ * we want changes to memfd to be visible to
+ * other processes (such as vhost backend), so
+ * map it as shared memory.
+ */
+ RTE_LOG(DEBUG, EAL, "Using memfd for anonymous memory\n");
+ fd = memfd;
+ flags = MAP_SHARED;
+ }
+ }
+#endif
addr = mmap(NULL, internal_config.memory, PROT_READ | PROT_WRITE,
- MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+ flags, fd, 0);
if (addr == MAP_FAILED) {
RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n", __func__,
strerror(errno));
@@ -1371,6 +1411,16 @@ eal_legacy_hugepage_init(void)
msl->socket_id = 0;
msl->len = internal_config.memory;
+ /* we're in single-file segments mode, so only the segment list
+ * fd needs to be set up.
+ */
+ if (fd != -1) {
+ if (eal_memalloc_set_seg_list_fd(0, fd) < 0) {
+ RTE_LOG(ERR, EAL, "Cannot set up segment list fd\n");
+ /* not a serious error, proceed */
+ }
+ }
+
/* populate memsegs. each memseg is one page long */
for (cur_seg = 0; cur_seg < n_segs; cur_seg++) {
arr = &msl->memseg_arr;
--
2.17.1
* [dpdk-dev] [PATCH v3 5/5] test: add segment fd API test
2018-12-11 16:43 ` [dpdk-dev] [PATCH v2 0/5] Allow using virtio without hugepages Anatoly Burakov
` (5 preceding siblings ...)
2018-12-13 11:43 ` [dpdk-dev] [PATCH v3 4/5] mem: use memfd for no-huge mode Anatoly Burakov
@ 2018-12-13 11:43 ` Anatoly Burakov
2018-12-14 10:09 ` Maxime Coquelin
6 siblings, 1 reply; 27+ messages in thread
From: Anatoly Burakov @ 2018-12-13 11:43 UTC (permalink / raw)
To: dev
Cc: przemyslawx.lal, kuralamudhan.ramakrishnan, ivan.coughlan,
tiwei.bie, ray.kinsella, maxime.coquelin
Use the memory autotest to also test the segment fd API. This does
not validate any results; it only checks that the relevant APIs
either return success or indicate that the API is not supported.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Tiwei Bie <tiwei.bie@intel.com>
---
test/test/test_memory.c | 43 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 43 insertions(+)
diff --git a/test/test/test_memory.c b/test/test/test_memory.c
index b96bca771..3da803e4e 100644
--- a/test/test/test_memory.c
+++ b/test/test/test_memory.c
@@ -37,10 +37,44 @@ check_mem(const struct rte_memseg_list *msl __rte_unused,
return 0;
}
+static int
+check_seg_fds(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg __rte_unused)
+{
+ size_t offset;
+ int ret;
+
+ /* skip external segments */
+ if (msl->external)
+ return 0;
+
+ /* try segment fd first. we're in a callback, so thread-unsafe */
+ ret = rte_memseg_get_fd_thread_unsafe(ms);
+ if (ret < 0) {
+ /* ENOTSUP means segment is valid, but there is no support for
+ * segment fd API (e.g. on FreeBSD).
+ */
+ if (errno == ENOTSUP)
+ return 1;
+ /* all other errors are treated as failures */
+ return -1;
+ }
+
+ /* we're able to get memseg fd - try getting its offset */
+ ret = rte_memseg_get_fd_offset_thread_unsafe(ms, &offset);
+ if (ret < 0) {
+ if (errno == ENOTSUP)
+ return 1;
+ return -1;
+ }
+ return 0;
+}
+
static int
test_memory(void)
{
uint64_t s;
+ int ret;
/*
* dump the mapped memory: the python-expect script checks
@@ -59,6 +93,15 @@ test_memory(void)
/* try to read memory (should not segfault) */
rte_memseg_walk(check_mem, NULL);
+ /* check segment fd support */
+ ret = rte_memseg_walk(check_seg_fds, NULL);
+ if (ret == 1) {
+ printf("Segment fd API is unsupported\n");
+ } else if (ret == -1) {
+ printf("Error getting segment fd's\n");
+ return -1;
+ }
+
return 0;
}
--
2.17.1
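This test relies on the walk stopping at the first nonzero callback return and propagating that value. A return of 1 can then mean "unsupported, stop quietly" and -1 can mean "error". The sketch below models that convention with no DPDK dependency. struct seg, walk_segs(), get_seg_fd(), check_seg_fd() and run_check() are hypothetical stand-ins, not the rte_memseg_* API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a memseg: only what the sketch needs. */
struct seg {
	int external; /* externally allocated memory is skipped */
	int fd_errno; /* 0 = fd available, otherwise errno to report */
};

typedef int (*seg_walk_cb)(const struct seg *s, void *arg);

/* Stop at the first nonzero callback return and pass it back;
 * 0 means every segment was visited. */
static int
walk_segs(const struct seg *segs, size_t n, seg_walk_cb cb, void *arg)
{
	size_t i;

	assert(segs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		int ret = cb(&segs[i], arg);

		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Hypothetical fd getter: returns -1 and sets errno on failure. */
static int
get_seg_fd(const struct seg *s)
{
	if (s->fd_errno != 0) {
		errno = s->fd_errno;
		return -1;
	}
	return 3; /* any non-negative value stands in for a valid fd */
}

/* Same decision logic as check_seg_fds() in the patch. */
static int
check_seg_fd(const struct seg *s, void *arg)
{
	(void)arg;
	if (s->external)
		return 0;
	if (get_seg_fd(s) < 0)
		return errno == ENOTSUP ? 1 : -1; /* 1: unsupported, -1: error */
	return 0;
}

/* Map the walk result to a test verdict: only -1 is a failure. */
static int
run_check(const struct seg *segs, size_t n)
{
	return walk_segs(segs, n, check_seg_fd, NULL) == -1 ? -1 : 0;
}
```

Because the walk stops at the first nonzero value, a single unsupported segment ends the check early. This is acceptable here because support for the API is platform-wide (for example, absent on FreeBSD) rather than per-segment.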
* Re: [dpdk-dev] [PATCH v3 4/5] mem: use memfd for no-huge mode
2018-12-13 11:43 ` [dpdk-dev] [PATCH v3 4/5] mem: use memfd for no-huge mode Anatoly Burakov
@ 2018-12-13 11:59 ` Burakov, Anatoly
2018-12-14 10:06 ` Maxime Coquelin
1 sibling, 0 replies; 27+ messages in thread
From: Burakov, Anatoly @ 2018-12-13 11:59 UTC (permalink / raw)
To: dev
Cc: John McNamara, Marko Kovacevic, przemyslawx.lal,
kuralamudhan.ramakrishnan, ivan.coughlan, tiwei.bie,
ray.kinsella, maxime.coquelin
On 13-Dec-18 11:43 AM, Anatoly Burakov wrote:
> When running in no-huge mode, we anonymously allocate our memory.
> While this works for regular NICs and vdev's, it's not suitable
> for memory sharing scenarios such as virtio with vhost_user
> backend.
>
> To fix this, allocate no-huge memory using memfd, and register
> it with memalloc just like any other memseg fd. This will enable
> using rte_memseg_get_fd() API with --no-huge EAL flag.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> Acked-by: Tiwei Bie <tiwei.bie@intel.com>
> ---
>
> Notes:
> v3:
> - Clarify release notes to state that the changes apply to
> virtio-user NICs rather than virtio in general
>
> v2:
> - Detect memfd support at compile time
> - Change memfd-related log level to debug
>
> doc/guides/rel_notes/release_19_02.rst | 5 +++
> lib/librte_eal/linuxapp/eal/eal_memory.c | 54 +++++++++++++++++++++++-
> 2 files changed, 57 insertions(+), 2 deletions(-)
>
> diff --git a/doc/guides/rel_notes/release_19_02.rst b/doc/guides/rel_notes/release_19_02.rst
> index 960098582..f733ad139 100644
> --- a/doc/guides/rel_notes/release_19_02.rst
> +++ b/doc/guides/rel_notes/release_19_02.rst
> @@ -23,6 +23,11 @@ DPDK Release 19.02
> New Features
> ------------
>
> +* **Support for using VirtIO without hugepages**
^^ oops, forgot to fix the title... Should be virtio-user.
--
Thanks,
Anatoly
* Re: [dpdk-dev] [PATCH v3 1/5] mem: fix error code for segment fd API for external segs
2018-12-13 11:43 ` [dpdk-dev] [PATCH v3 1/5] mem: fix error code for segment fd API for external segs Anatoly Burakov
@ 2018-12-14 9:15 ` Maxime Coquelin
0 siblings, 0 replies; 27+ messages in thread
From: Maxime Coquelin @ 2018-12-14 9:15 UTC (permalink / raw)
To: Anatoly Burakov, dev
Cc: John McNamara, Marko Kovacevic, przemyslawx.lal,
kuralamudhan.ramakrishnan, ivan.coughlan, tiwei.bie,
ray.kinsella, stable
On 12/13/18 12:43 PM, Anatoly Burakov wrote:
> Segment fd API does not support getting segment fd's from
> externally allocated memory, so return proper error code
> on any attempts to do so. This changes API behavior, so
> document the change as well.
>
> Fixes: 5282bb1c3695 ("mem: allow memseg lists to be marked as external")
> Cc: stable@dpdk.org
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> Acked-by: Tiwei Bie <tiwei.bie@intel.com>
> ---
>
> Notes:
> The API is experimental, no deprecation notice needed.
>
> doc/guides/rel_notes/release_19_02.rst | 6 ++++++
> lib/librte_eal/common/eal_common_memory.c | 12 ++++++++++++
> 2 files changed, 18 insertions(+)
>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thanks,
Maxime
* Re: [dpdk-dev] [PATCH v3 2/5] memalloc: check for memfd support in segment fd API
2018-12-13 11:43 ` [dpdk-dev] [PATCH v3 2/5] memalloc: check for memfd support in segment fd API Anatoly Burakov
@ 2018-12-14 9:19 ` Maxime Coquelin
0 siblings, 0 replies; 27+ messages in thread
From: Maxime Coquelin @ 2018-12-14 9:19 UTC (permalink / raw)
To: Anatoly Burakov, dev
Cc: John McNamara, Marko Kovacevic, przemyslawx.lal,
kuralamudhan.ramakrishnan, ivan.coughlan, tiwei.bie,
ray.kinsella, stable
On 12/13/18 12:43 PM, Anatoly Burakov wrote:
> If memfd support was not compiled, or hugepage memfd support
> is not available at runtime, the API will now return proper
> error code, indicating that this API is unsupported. This
> changes the API, so document the changes.
>
> Fixes: 41dbdb68723b ("mem: add external API to retrieve page fd")
> Fixes: 3a44687139eb ("mem: allow querying offset into segment fd")
> Cc: stable@dpdk.org
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> Acked-by: Tiwei Bie <tiwei.bie@intel.com>
> ---
>
> Notes:
> The API is experimental, no deprecation notice needed.
>
> doc/guides/rel_notes/release_19_02.rst | 2 ++
> lib/librte_eal/linuxapp/eal/eal_memalloc.c | 40 +++++++++++++++++-----
> 2 files changed, 34 insertions(+), 8 deletions(-)
>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thanks,
Maxime
* Re: [dpdk-dev] [PATCH v3 3/5] memalloc: allow setting up segment list fd's
2018-12-13 11:43 ` [dpdk-dev] [PATCH v3 3/5] memalloc: allow setting up segment list fd's Anatoly Burakov
@ 2018-12-14 10:03 ` Maxime Coquelin
0 siblings, 0 replies; 27+ messages in thread
From: Maxime Coquelin @ 2018-12-14 10:03 UTC (permalink / raw)
To: Anatoly Burakov, dev
Cc: Bruce Richardson, przemyslawx.lal, kuralamudhan.ramakrishnan,
ivan.coughlan, tiwei.bie, ray.kinsella
On 12/13/18 12:43 PM, Anatoly Burakov wrote:
> Currently, only segment fd's for multi-file segments are supported,
> while for memfd-backed no-huge memory we need single-file segments
> mode. Add support for single-file segments in the internal API.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> Acked-by: Tiwei Bie <tiwei.bie@intel.com>
> ---
>
> Notes:
> v2:
> - Add missing fd list allocation on setting segment
> list fd
>
> lib/librte_eal/bsdapp/eal/eal_memalloc.c | 6 +++++
> lib/librte_eal/common/eal_memalloc.h | 4 ++++
> lib/librte_eal/linuxapp/eal/eal_memalloc.c | 26 ++++++++++++++++++++++
> 3 files changed, 36 insertions(+)
>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thanks,
Maxime
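Patch 3/5 is needed because the no-huge memfd is a single file backing the whole segment list. Previously, the segment fd API only handled one file per segment. The sketch below illustrates the difference. It assumes that in single-file mode a segment sits at (segment index × page size) within the list's file, and that each per-segment file in multi-file mode starts at offset 0. struct seg_list and seg_fd_and_offset() are hypothetical illustrations, not the internal eal_memalloc API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified segment-list descriptor. */
struct seg_list {
	int single_file;    /* nonzero: one fd backs the whole list */
	int list_fd;        /* used in single-file mode */
	const int *seg_fds; /* per-segment fds in multi-file mode */
	size_t page_sz;
	size_t n_segs;
};

/* Return the fd backing segment `idx` and store its offset in *offset. */
static int
seg_fd_and_offset(const struct seg_list *l, size_t idx, size_t *offset)
{
	assert(idx < l->n_segs);
	if (l->single_file) {
		/* one file for the list: locate the page inside it */
		*offset = idx * l->page_sz;
		return l->list_fd;
	}
	/* one file per segment: each segment starts its own file */
	*offset = 0;
	return l->seg_fds[idx];
}
```

This is why the no-huge path only needs to register one list-wide fd, as the comment in the 4/5 hunk notes. Per-segment fds never exist in that mode.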
* Re: [dpdk-dev] [PATCH v3 4/5] mem: use memfd for no-huge mode
2018-12-13 11:43 ` [dpdk-dev] [PATCH v3 4/5] mem: use memfd for no-huge mode Anatoly Burakov
2018-12-13 11:59 ` Burakov, Anatoly
@ 2018-12-14 10:06 ` Maxime Coquelin
1 sibling, 0 replies; 27+ messages in thread
From: Maxime Coquelin @ 2018-12-14 10:06 UTC (permalink / raw)
To: Anatoly Burakov, dev
Cc: John McNamara, Marko Kovacevic, przemyslawx.lal,
kuralamudhan.ramakrishnan, ivan.coughlan, tiwei.bie,
ray.kinsella
On 12/13/18 12:43 PM, Anatoly Burakov wrote:
> When running in no-huge mode, we anonymously allocate our memory.
> While this works for regular NICs and vdev's, it's not suitable
> for memory sharing scenarios such as virtio with vhost_user
> backend.
>
> To fix this, allocate no-huge memory using memfd, and register
> it with memalloc just like any other memseg fd. This will enable
> using rte_memseg_get_fd() API with --no-huge EAL flag.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> Acked-by: Tiwei Bie <tiwei.bie@intel.com>
> ---
>
> Notes:
> v3:
> - Clarify release notes to state that the changes apply to
> virtio-user NICs rather than virtio in general
>
> v2:
> - Detect memfd support at compile time
> - Change memfd-related log level to debug
>
> doc/guides/rel_notes/release_19_02.rst | 5 +++
> lib/librte_eal/linuxapp/eal/eal_memory.c | 54 +++++++++++++++++++++++-
> 2 files changed, 57 insertions(+), 2 deletions(-)
>
> diff --git a/doc/guides/rel_notes/release_19_02.rst b/doc/guides/rel_notes/release_19_02.rst
> index 960098582..f733ad139 100644
> --- a/doc/guides/rel_notes/release_19_02.rst
> +++ b/doc/guides/rel_notes/release_19_02.rst
> @@ -23,6 +23,11 @@ DPDK Release 19.02
> New Features
> ------------
>
> +* **Support for using VirtIO without hugepages**
> +
With the title change you suggested:
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thanks,
Maxime
* Re: [dpdk-dev] [PATCH v3 5/5] test: add segment fd API test
2018-12-13 11:43 ` [dpdk-dev] [PATCH v3 5/5] test: add segment fd API test Anatoly Burakov
@ 2018-12-14 10:09 ` Maxime Coquelin
0 siblings, 0 replies; 27+ messages in thread
From: Maxime Coquelin @ 2018-12-14 10:09 UTC (permalink / raw)
To: Anatoly Burakov, dev
Cc: przemyslawx.lal, kuralamudhan.ramakrishnan, ivan.coughlan,
tiwei.bie, ray.kinsella
On 12/13/18 12:43 PM, Anatoly Burakov wrote:
> Use memory autotest to also test segment fd API. This will not do
> any checks - just see if the relevant API's return success or
> indicate that the API is not supported.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> Acked-by: Tiwei Bie <tiwei.bie@intel.com>
> ---
> test/test/test_memory.c | 43 +++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 43 insertions(+)
>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thanks,
Maxime
* Re: [dpdk-dev] [PATCH v3 0/5] Allow using virtio-user without hugepages
2018-12-13 11:43 ` [dpdk-dev] [PATCH v3 0/5] Allow using virtio-user " Anatoly Burakov
@ 2018-12-20 22:01 ` Thomas Monjalon
0 siblings, 0 replies; 27+ messages in thread
From: Thomas Monjalon @ 2018-12-20 22:01 UTC (permalink / raw)
To: Anatoly Burakov
Cc: dev, przemyslawx.lal, kuralamudhan.ramakrishnan, ivan.coughlan,
tiwei.bie, ray.kinsella, maxime.coquelin
> Anatoly Burakov (5):
> mem: fix error code for segment fd API for external segs
> memalloc: check for memfd support in segment fd API
> memalloc: allow setting up segment list fd's
> mem: use memfd for no-huge mode
> test: add segment fd API test
Applied, thanks