DPDK patches and discussions
From: Anatoly Burakov <anatoly.burakov@intel.com>
To: dev@dpdk.org
Cc: John McNamara <john.mcnamara@intel.com>,
	Marko Kovacevic <marko.kovacevic@intel.com>,
	iain.barker@oracle.com, edwin.leung@oracle.com
Subject: [dpdk-dev] [PATCH] eal: add option to not store segment fd's
Date: Fri, 22 Feb 2019 17:12:41 +0000
Message-ID: <07f664c33ddedaa5dcfe82ecb97d931e68b7e33a.1550855529.git.anatoly.burakov@intel.com>

Due to internal glibc limitations [1], DPDK may exhaust internal
file descriptor limits when using smaller page sizes, which leaves
user applications unable to use system calls such as select().

While the problem can be worked around using the --single-file-segments
option, that option does not work if --legacy-mem mode is also used. Add
(yet another) EAL flag to disable storing fd's internally. This
will sacrifice compatibility with Virtio with vhost-user backend, but
at least select() and friends will work.

[1] https://mails.dpdk.org/archives/dev/2019-February/124386.html

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
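Note (not part of the commit message): a minimal sketch of why one fd per
page collides with select(). It assumes the usual POSIX fd_set interface,
where only descriptors below FD_SETSIZE (typically 1024 with glibc) can be
represented. The helper names seg_fds_needed() and fd_fits_in_fd_set() are
made up for illustration and are not part of DPDK.

```c
#include <stdint.h>
#include <sys/select.h>

/* Hypothetical helper: number of per-page segment fds needed to back
 * mem_bytes of memory when every page has its own file descriptor. */
static uint64_t
seg_fds_needed(uint64_t mem_bytes, uint64_t page_bytes)
{
	return (mem_bytes + page_bytes - 1) / page_bytes;
}

/* Hypothetical helper: an fd_set can only represent descriptors in
 * [0, FD_SETSIZE). Using FD_SET() outside that range is undefined
 * behaviour, so a careful caller checks before calling select(). */
static int
fd_fits_in_fd_set(int fd)
{
	return fd >= 0 && fd < FD_SETSIZE;
}
```

For instance, 4 GiB of memory on 2 MiB pages needs 2048 per-page
descriptors, so any fd the application opens afterwards is likely to be
numbered too high for fd_set.
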
 doc/guides/linux_gsg/linux_eal_parameters.rst |  4 ++++
 .../prog_guide/env_abstraction_layer.rst      | 19 +++++++++++++++++++
 lib/librte_eal/common/eal_internal_cfg.h      |  4 ++++
 lib/librte_eal/common/eal_options.h           |  2 ++
 lib/librte_eal/linuxapp/eal/eal.c             |  4 ++++
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    | 19 ++++++++++++++++++-
 6 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/doc/guides/linux_gsg/linux_eal_parameters.rst b/doc/guides/linux_gsg/linux_eal_parameters.rst
index c63f0f49a..d50a7067e 100644
--- a/doc/guides/linux_gsg/linux_eal_parameters.rst
+++ b/doc/guides/linux_gsg/linux_eal_parameters.rst
@@ -94,6 +94,10 @@ Memory-related options
 
     Free hugepages back to system exactly as they were originally allocated.
 
+*   ``--no-seg-fds``
+
+    Do not store segment file descriptors in EAL.
+
 Other options
 ~~~~~~~~~~~~~
 
diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 929d76dba..ad540f158 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -214,6 +214,25 @@ Normally, these options do not need to be changed.
     can later be mapped into that preallocated VA space (if dynamic memory mode
     is enabled), and can optionally be mapped into it at startup.
 
++ Segment file descriptors
+
+On Linux, in most cases, EAL will store segment file descriptors internally. This
+can become a problem when using smaller page sizes due to underlying limitations
+of the ``glibc`` library. For example, Linux API calls such as ``select()`` may not
+work correctly because ``glibc`` does not support more than a certain number of
+file descriptors.
+
+There are several possible workarounds for this issue. One is to use
+``--single-file-segments`` mode, as that mode does not use a file descriptor
+per page. This is the recommended way of solving this issue, as it keeps
+compatibility with Virtio with vhost-user backend. This option is not available
+when using ``--legacy-mem`` mode.
+
+The other option is to use the ``--no-seg-fds`` command-line parameter
+to prevent EAL from storing any page file descriptors. This will break
+compatibility with Virtio with vhost-user backend, but this option will work
+with ``--legacy-mem`` mode.
+
 Support for Externally Allocated Memory
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/lib/librte_eal/common/eal_internal_cfg.h b/lib/librte_eal/common/eal_internal_cfg.h
index 60eaead8f..96596c6b6 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -63,6 +63,10 @@ struct internal_config {
 	/**< true if storing all pages within single files (per-page-size,
 	 * per-node) non-legacy mode only.
 	 */
+	volatile unsigned no_seg_fds;
+	/**< true if no segment file descriptors are to be stored internally
+	 * by EAL.
+	 */
 	volatile int syslog_facility;	  /**< facility passed to openlog() */
 	/** default interrupt mode for VFIO */
 	volatile enum rte_intr_mode vfio_intr_mode;
diff --git a/lib/librte_eal/common/eal_options.h b/lib/librte_eal/common/eal_options.h
index 58ee9ae33..94e39aed8 100644
--- a/lib/librte_eal/common/eal_options.h
+++ b/lib/librte_eal/common/eal_options.h
@@ -67,6 +67,8 @@ enum {
 	OPT_IOVA_MODE_NUM,
 #define OPT_MATCH_ALLOCATIONS  "match-allocations"
 	OPT_MATCH_ALLOCATIONS_NUM,
+#define OPT_NO_SEG_FDS         "no-seg-fds"
+	OPT_NO_SEG_FDS_NUM,
 	OPT_LONG_MAX_NUM
 };
 
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 13f401684..e8a98c505 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -519,6 +519,7 @@ eal_usage(const char *prgname)
 	       "  --"OPT_LEGACY_MEM"        Legacy memory mode (no dynamic allocation, contiguous segments)\n"
 	       "  --"OPT_SINGLE_FILE_SEGMENTS" Put all hugepage memory in single files\n"
 	       "  --"OPT_MATCH_ALLOCATIONS" Free hugepages exactly as allocated\n"
+	       "  --"OPT_NO_SEG_FDS"        Do not store segment file descriptors in EAL\n"
 	       "\n");
 	/* Allow the application to print its usage message too if hook is set */
 	if ( rte_application_usage_hook ) {
@@ -815,6 +816,9 @@ eal_parse_args(int argc, char **argv)
 		case OPT_MATCH_ALLOCATIONS_NUM:
 			internal_config.match_allocations = 1;
 			break;
+		case OPT_NO_SEG_FDS_NUM:
+			internal_config.no_seg_fds = 1;
+			break;
 
 		default:
 			if (opt < OPT_LONG_MIN_NUM && isprint(opt)) {
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index b6fb183db..420f82a54 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1518,6 +1518,10 @@ eal_memalloc_set_seg_fd(int list_idx, int seg_idx, int fd)
 	if (internal_config.single_file_segments)
 		return -ENOTSUP;
 
+	/* no seg fds mode doesn't support segment fd's */
+	if (internal_config.no_seg_fds)
+		return -ENOTSUP;
+
 	/* if list is not allocated, allocate it */
 	if (fd_list[list_idx].len == 0) {
 		int len = mcfg->memsegs[list_idx].memseg_arr.len;
@@ -1539,6 +1543,10 @@ eal_memalloc_set_seg_list_fd(int list_idx, int fd)
 	if (!internal_config.single_file_segments)
 		return -ENOTSUP;
 
+	/* no seg fds mode doesn't support segment fd's */
+	if (internal_config.no_seg_fds)
+		return -ENOTSUP;
+
 	/* if list is not allocated, allocate it */
 	if (fd_list[list_idx].len == 0) {
 		int len = mcfg->memsegs[list_idx].memseg_arr.len;
@@ -1557,6 +1565,10 @@ eal_memalloc_get_seg_fd(int list_idx, int seg_idx)
 {
 	int fd;
 
+	/* no seg fds mode doesn't support segment fd's */
+	if (internal_config.no_seg_fds)
+		return -ENOTSUP;
+
 	if (internal_config.in_memory || internal_config.no_hugetlbfs) {
 #ifndef MEMFD_SUPPORTED
 		/* in in-memory or no-huge mode, we rely on memfd support */
@@ -1614,6 +1626,10 @@ eal_memalloc_get_seg_fd_offset(int list_idx, int seg_idx, size_t *offset)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 
+	/* no seg fds mode doesn't support segment fd's */
+	if (internal_config.no_seg_fds)
+		return -ENOTSUP;
+
 	if (internal_config.in_memory || internal_config.no_hugetlbfs) {
 #ifndef MEMFD_SUPPORTED
 		/* in in-memory or no-huge mode, we rely on memfd support */
@@ -1679,7 +1695,8 @@ eal_memalloc_init(void)
 	}
 
 	/* initialize all of the fd lists */
-	if (rte_memseg_list_walk(fd_list_create_walk, NULL))
+	if (!internal_config.no_seg_fds &&
+			rte_memseg_list_walk(fd_list_create_walk, NULL))
 		return -1;
 	return 0;
 }
-- 
2.17.1
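
Note: the effect of the new flag on the fd accessors can be modelled in
isolation. The sketch below is a standalone illustration, not EAL code:
struct model_config, model_get_seg_fd() and model_can_share_segment() are
hypothetical stand-ins that only mirror the "return -ENOTSUP when
no_seg_fds is set" checks added in eal_memalloc.c above.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for EAL's internal_config; only the new flag. */
struct model_config {
	unsigned int no_seg_fds;
};

/* Hypothetical accessor: refuses with -ENOTSUP when fd storage is
 * disabled, otherwise returns the descriptor stored for the segment. */
static int
model_get_seg_fd(const struct model_config *cfg, const int *fds, int seg_idx)
{
	if (cfg->no_seg_fds)
		return -ENOTSUP;
	return fds[seg_idx];
}

/* Hypothetical caller that needs an fd to share memory with a peer.
 * Returns 1 if the segment can be shared by fd, 0 otherwise. */
static int
model_can_share_segment(const struct model_config *cfg, const int *fds,
		int seg_idx)
{
	int fd = model_get_seg_fd(cfg, fds, seg_idx);

	if (fd == -ENOTSUP) {
		fprintf(stderr, "segment %d: no fd stored, cannot share\n",
				seg_idx);
		return 0;
	}
	return fd >= 0;
}
```

A consumer that relies on segment fd's, such as the Virtio with vhost-user
backend case mentioned in the commit message, would take the "cannot
share" path in this model whenever the flag is set.
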
