DPDK patches and discussions
From: Slava Ovsiienko <viacheslavo@nvidia.com>
To: Elena Agostini <eagostini@nvidia.com>, "dev@dpdk.org" <dev@dpdk.org>
Subject: RE: [PATCH v2 1/1] app/testpmd: add GPU memory option in iofwd engine
Date: Tue, 16 Nov 2021 16:28:15 +0000	[thread overview]
Message-ID: <DM6PR12MB3753D60497643298D38C0075DF999@DM6PR12MB3753.namprd12.prod.outlook.com> (raw)
In-Reply-To: <20211111214141.26612-2-eagostini@nvidia.com>

Hi, Elena

> -----Original Message-----
> From: eagostini@nvidia.com <eagostini@nvidia.com>
> Sent: Thursday, November 11, 2021 23:42
> To: dev@dpdk.org
> Cc: Elena Agostini <eagostini@nvidia.com>
> Subject: [PATCH v2 1/1] app/testpmd: add GPU memory option in iofwd
> engine
> 
> From: eagostini <eagostini@nvidia.com>

Could you please set the "Author" field correctly to "Elena Agostini <eagostini@nvidia.com>"?
Otherwise, we see in the git log:

"Author: eagostini <eagostini@nvidia.com>"

Compare with:
"Author: Bing Zhao <bingz@nvidia.com>"
"Author: Viacheslav Ovsiienko <viacheslavo@nvidia.com>"

Also, please check the coding style issues - too many code lines go far beyond 100 chars.
Lines like this:
+		if (mbuf_mem_types[idx] == MBUF_MEM_GPU && strcmp(cur_fwd_eng->fwd_mode_name, "io") != 0) {
can easily be split.
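For example (same statement, just split):

+		if (mbuf_mem_types[idx] == MBUF_MEM_GPU &&
+		    strcmp(cur_fwd_eng->fwd_mode_name, "io") != 0) {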

>>app/testpmd: add GPU memory option in iofwd
As I see from the patch, it adds a new mbuf pool type (residing in GPU memory).
Maybe we should reword the title?
" app/testpmd: add GPU memory option for mbuf pools"

> 
> This patch introduces GPU memory in testpmd through the gpudev library.
> Testpmd can be used for network benchmarks when using GPU memory
> instead of regular CPU memory to send and receive packets.
> This option is currently limited to iofwd engine.
Why? Is it because iofwd is the only mode that does not touch the mbuf data?
Is it critical for functionality? Is the GPU mbuf pool memory accessible from the CPU side?
I would explain the reasons (for supporting iofwd mode only) in the commit message.

> 
> Signed-off-by: Elena Agostini <eagostini@nvidia.com>
> 
> Depends-on: series-19465 ("GPU library")
> Depends-on: series-20422 ("common/mlx5: fix external memory pool
> registration")
> ---
>  app/test-pmd/cmdline.c                |  14 +++
>  app/test-pmd/meson.build              |   2 +-
>  app/test-pmd/parameters.c             |  13 ++-
>  app/test-pmd/testpmd.c                | 133 +++++++++++++++++++++++---
>  app/test-pmd/testpmd.h                |  16 +++-
>  doc/guides/testpmd_app_ug/run_app.rst |   3 +
>  6 files changed, 164 insertions(+), 17 deletions(-)
> 
> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
> index 4f51b259fe..36193bc566 100644
> --- a/app/test-pmd/cmdline.c
> +++ b/app/test-pmd/cmdline.c
> @@ -3614,6 +3614,7 @@ parse_item_list(const char *str, const char *item_name, unsigned int max_items,

The "parse_item_list()" is used to parse the list looking like "number0, number1, ...., numberN"
and invoked for "core","port", "txtimes", etc. Not sure all of these params need to handle "g"
suffix. We should allow "g" processing only for "mbuf-size". We have "item_name" argument
to check whether we are invoked on "mbuf-size".
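A rough sketch of the idea (untested, assuming the mbuf-size option passes "mbuf-size" as item_name):

		if (c == 'g' && strcmp(item_name, "mbuf-size") == 0) {
			/* the 'g' suffix is meaningful only for mbuf-size items */
			gpu_mbuf = 1;
			continue;
		}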

>  	unsigned int j;
>  	int value_ok;
>  	char c;
> +	int gpu_mbuf = 0;
> 
>  	/*
>  	 * First parse all items in the list and store their value.
> @@ -3628,6 +3629,14 @@ parse_item_list(const char *str, const char *item_name, unsigned int max_items,
>  			value_ok = 1;
>  			continue;
>  		}
> +		if (c == 'g') {
We should check whether "g" is a single-character suffix (the last character).
Otherwise, "20g48" and "g20gggg48" would also be accepted as valid values.

> +			/*
> +			 * When this flag is set, mbufs for this segment
> +			 * will be created on GPU memory.
> +			 */
> +			gpu_mbuf = 1;
> +			continue;
> +		}
>  		if (c != ',') {
>  			fprintf(stderr, "character %c is not a decimal digit\n",
>  				c);
>  			return 0;
> @@ -3640,6 +3649,8 @@ parse_item_list(const char *str, const char *item_name, unsigned int max_items,
>  			parsed_items[nb_item] = value;
>  			value_ok = 0;
>  			value = 0;
> +			mbuf_mem_types[nb_item] = gpu_mbuf ? MBUF_MEM_GPU : MBUF_MEM_CPU;
> +			gpu_mbuf = 0;
>  		}
>  		nb_item++;
>  	}
> @@ -3648,6 +3659,9 @@ parse_item_list(const char *str, const char *item_name, unsigned int max_items,
>  			item_name, nb_item + 1, max_items);
>  		return 0;
>  	}
> +
> +	mbuf_mem_types[nb_item] = gpu_mbuf ? MBUF_MEM_GPU : MBUF_MEM_CPU;
> +
>  	parsed_items[nb_item++] = value;
>  	if (! check_unique_values)
>  		return nb_item;
> diff --git a/app/test-pmd/meson.build b/app/test-pmd/meson.build
> index d5df52c470..5c8ca68c9d 100644
> --- a/app/test-pmd/meson.build
> +++ b/app/test-pmd/meson.build
> @@ -32,7 +32,7 @@ if dpdk_conf.has('RTE_HAS_JANSSON')
>      ext_deps += jansson_dep
>  endif
> 
> -deps += ['ethdev', 'gro', 'gso', 'cmdline', 'metrics', 'bus_pci']
> +deps += ['ethdev', 'gro', 'gso', 'cmdline', 'metrics', 'bus_pci', 'gpudev']
>  if dpdk_conf.has('RTE_CRYPTO_SCHEDULER')
>      deps += 'crypto_scheduler'
>  endif
> diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
> index 0974b0a38f..d41f7f220b 100644
> --- a/app/test-pmd/parameters.c
> +++ b/app/test-pmd/parameters.c
> @@ -87,7 +87,10 @@ usage(char* progname)
>  	       "in NUMA mode.\n");
>  	printf("  --mbuf-size=N,[N1[,..Nn]: set the data size of mbuf to "
>  	       "N bytes. If multiple numbers are specified the extra pools "
> -	       "will be created to receive with packet split features\n");
> +	       "will be created to receive with packet split features\n"
> +		   "Use 'g' suffix for GPU memory.\n"
> +		   "If no or an unrecognized suffix is provided, CPU is assumed\n");
Unrecognized suffix? I would emit an error and abort the launch.

> +
>  	printf("  --total-num-mbufs=N: set the number of mbufs to be allocated "
>  	       "in mbuf pools.\n");
>  	printf("  --max-pkt-len=N: set the maximum size of packet to N bytes.\n");
> @@ -595,6 +598,7 @@ launch_args_parse(int argc, char** argv)
>  	struct rte_eth_dev_info dev_info;
>  	uint16_t rec_nb_pkts;
>  	int ret;
> +	uint32_t idx = 0;
> 
>  	static struct option lgopts[] = {
>  		{ "help",			0, 0, 0 },
> @@ -1538,4 +1542,11 @@ launch_args_parse(int argc, char** argv)
>  				  "ignored\n");
>  		mempool_flags = 0;
>  	}
> +
> +	for (idx = 0; idx < mbuf_data_size_n; idx++) {
> +		if (mbuf_mem_types[idx] == MBUF_MEM_GPU && strcmp(cur_fwd_eng->fwd_mode_name, "io") != 0) {
> +			fprintf(stderr, "GPU memory mbufs can be used with iofwd engine only\n");
> +			rte_exit(EXIT_FAILURE, "Command line is incorrect\n");
> +		}
> +	}
Please note, the forwarding mode can be changed from the interactive prompt with the "set fwd <mode>" command.
If iofwd mode is crucial for GPU functionality, we should prevent switching to other modes while GPU pools are engaged.
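A rough sketch of such a guard (placement is hypothetical - wherever the "set fwd <mode>" command is handled, with the requested mode name in fwd_mode_name):

	uint32_t i;

	if (strcmp(fwd_mode_name, "io") != 0) {
		for (i = 0; i < mbuf_data_size_n; i++) {
			if (mbuf_mem_types[i] == MBUF_MEM_GPU) {
				fprintf(stderr,
					"GPU memory mbuf pools require the io forwarding mode\n");
				return;
			}
		}
	}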


>  }
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> index a66dfb297c..1af235c4d8 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -205,6 +205,12 @@ uint32_t mbuf_data_size_n = 1; /* Number of specified mbuf sizes. */
>  uint16_t mbuf_data_size[MAX_SEGS_BUFFER_SPLIT] = {
>  	DEFAULT_MBUF_DATA_SIZE
>  }; /**< Mbuf data space size. */
> +
> +/* Mbuf memory types. */
> +enum mbuf_mem_type mbuf_mem_types[MAX_SEGS_BUFFER_SPLIT];
> +/* Pointers to external memory allocated for mempools. */
> +uintptr_t mempools_ext_ptr[MAX_SEGS_BUFFER_SPLIT];
> +
>  uint32_t param_total_num_mbufs = 0;  /**< number of mbufs in all pools - if
>                                        * specified on command-line. */
>  uint16_t stats_period; /**< Period to show statistics (disabled by default) */
> @@ -543,6 +549,12 @@ int proc_id;
>   */
>  unsigned int num_procs = 1;
> 
> +/*
> + * In case of GPU memory external mbufs use, for simplicity,
> + * the first GPU device in the list.
> + */
> +int gpu_id = 0;
It is assigned zero and never changes - do we support the first GPU only?
This limitation should be mentioned in the documentation.

> +
>  static void
>  eth_rx_metadata_negotiate_mp(uint16_t port_id)
>  {
> @@ -1103,6 +1115,79 @@ setup_extbuf(uint32_t nb_mbufs, uint16_t mbuf_sz, unsigned int socket_id,
>  	return ext_num;
>  }
> 
> +static struct rte_mempool *
> +gpu_mbuf_pool_create(uint16_t mbuf_seg_size, unsigned int nb_mbuf,
> +			unsigned int socket_id, uint16_t port_id,
> +			int gpu_id, uintptr_t * mp_addr)
> +{
> +	int ret = 0;
> +	char pool_name[RTE_MEMPOOL_NAMESIZE];
> +	struct rte_eth_dev_info dev_info;
> +	struct rte_mempool *rte_mp = NULL;
> +	struct rte_pktmbuf_extmem gpu_mem;
> +	struct rte_gpu_info ginfo;
> +	uint8_t gpu_page_shift = 16;
> +	uint32_t gpu_page_size = (1UL << gpu_page_shift);
> +
> +	ret = eth_dev_info_get_print_err(port_id, &dev_info);
> +	if (ret != 0)
> +		rte_exit(EXIT_FAILURE,
> +			"Failed to get device info for port %d\n",
> +			port_id);
> +
> +	mbuf_poolname_build(socket_id, pool_name, sizeof(pool_name), port_id, MBUF_MEM_GPU);
> +	if (!is_proc_primary()) {
> +		rte_mp = rte_mempool_lookup(pool_name);
> +		if (rte_mp == NULL)
> +			rte_exit(EXIT_FAILURE,
> +				"Get mbuf pool for socket %u failed: %s\n",
> +				socket_id, rte_strerror(rte_errno));
> +		return rte_mp;
> +	}
> +
> +	if (rte_gpu_info_get(gpu_id, &ginfo))
> +		rte_exit(EXIT_FAILURE, "Can't retrieve info about GPU %d - bye\n",
> +			gpu_id);
> +
> +	TESTPMD_LOG(INFO,
> +		"create a new mbuf pool <%s>: n=%u, size=%u, socket=%u GPU device=%s\n",
> +		pool_name, nb_mbuf, mbuf_seg_size, socket_id, ginfo.name);
> +
> +	/* Create an external memory mempool using memory allocated on the GPU. */
> +
> +	gpu_mem.elt_size = RTE_MBUF_DEFAULT_BUF_SIZE;
> +	gpu_mem.buf_len = RTE_ALIGN_CEIL(nb_mbuf * gpu_mem.elt_size, gpu_page_size);
> +	gpu_mem.buf_iova = RTE_BAD_IOVA;
> +
> +	gpu_mem.buf_ptr = rte_gpu_mem_alloc(gpu_id, gpu_mem.buf_len);
> +	if (gpu_mem.buf_ptr == NULL)
> +		rte_exit(EXIT_FAILURE, "Could not allocate GPU device memory\n");
> +
> +	ret = rte_extmem_register(gpu_mem.buf_ptr, gpu_mem.buf_len, NULL, gpu_mem.buf_iova, gpu_page_size);
> +	if (ret)
> +		rte_exit(EXIT_FAILURE, "Unable to register addr 0x%p, ret %d\n",
> +			gpu_mem.buf_ptr, ret);
> +
> +	uint16_t pid = 0;
> +
> +	RTE_ETH_FOREACH_DEV(pid)
> +	{
> +		ret = rte_dev_dma_map(dev_info.device, gpu_mem.buf_ptr,
> +					  gpu_mem.buf_iova, gpu_mem.buf_len);
> +		if (ret) {
> +			rte_exit(EXIT_FAILURE, "Unable to DMA map addr 0x%p for device %s\n",
> +					 gpu_mem.buf_ptr, dev_info.device->name);
> +		}
> +	}
> +
> +	rte_mp = rte_pktmbuf_pool_create_extbuf(pool_name, nb_mbuf, mb_mempool_cache, 0, mbuf_seg_size, socket_id, &gpu_mem, 1);
> +	if (rte_mp == NULL) {
> +		rte_exit(EXIT_FAILURE, "Creation of GPU mempool <%s> failed\n", pool_name);
> +	}
> +
> +	*mp_addr = (uintptr_t) gpu_mem.buf_ptr;
> +
> +	return rte_mp;
> +}
> +
>  /*
>   * Configuration initialisation done once at init time.
>   */
> @@ -1117,7 +1202,7 @@ mbuf_pool_create(uint16_t mbuf_seg_size, unsigned nb_mbuf,
> 
>  	mb_size = sizeof(struct rte_mbuf) + mbuf_seg_size;
>  #endif
> -	mbuf_poolname_build(socket_id, pool_name, sizeof(pool_name), size_idx);
> +	mbuf_poolname_build(socket_id, pool_name, sizeof(pool_name), size_idx,
> +			MBUF_MEM_CPU);
>  	if (!is_proc_primary()) {
>  		rte_mp = rte_mempool_lookup(pool_name);
>  		if (rte_mp == NULL)
> @@ -1700,19 +1785,42 @@ init_config(void)
> 
>  		for (i = 0; i < num_sockets; i++)
>  			for (j = 0; j < mbuf_data_size_n; j++)
> -				mempools[i * MAX_SEGS_BUFFER_SPLIT + j] =
> -					mbuf_pool_create(mbuf_data_size[j],
> -							  nb_mbuf_per_pool,
> -							  socket_ids[i], j);
> +			{
> +				if (mbuf_mem_types[j] == MBUF_MEM_GPU) {
> +					if (rte_gpu_count_avail() == 0)
> +						rte_exit(EXIT_FAILURE, "No GPU device available.\n");
> +
> +					mempools[i * MAX_SEGS_BUFFER_SPLIT + j] =
> +						gpu_mbuf_pool_create(mbuf_data_size[j],
What about GPU/CPU affinity? Do we create one GPU pool per CPU socket,
disregarding the hardware topology at all?
We can mitigate the issue with the "--socket-mem/--socket-num" parameters, though.

> +								     nb_mbuf_per_pool,
> +								     socket_ids[i], j, gpu_id,
> +								     &(mempools_ext_ptr[j]));
> +				} else {
> +					mempools[i * MAX_SEGS_BUFFER_SPLIT + j] =
> +						mbuf_pool_create(mbuf_data_size[j],
> +								 nb_mbuf_per_pool,
> +								 socket_ids[i], j);
> +				}
> +			}
>  	} else {
>  		uint8_t i;
> 
>  		for (i = 0; i < mbuf_data_size_n; i++)
> -			mempools[i] = mbuf_pool_create
> -					(mbuf_data_size[i],
> -					 nb_mbuf_per_pool,
> -					 socket_num == UMA_NO_CONFIG ?
> -					 0 : socket_num, i);
> +		{
> +			if (mbuf_mem_types[i] == MBUF_MEM_GPU) {
> +				mempools[i] = gpu_mbuf_pool_create(mbuf_data_size[i],
> +							   nb_mbuf_per_pool,
> +							   socket_num == UMA_NO_CONFIG ? 0 : socket_num,
> +							   i, gpu_id,
> +							   &(mempools_ext_ptr[i]));
> +			} else {
> +				mempools[i] = mbuf_pool_create(mbuf_data_size[i],
> +							nb_mbuf_per_pool,
> +							socket_num == UMA_NO_CONFIG ?
> +							0 : socket_num, i);
> +			}
> +		}
> +
>  	}
> 
>  	init_port_config();
> @@ -3415,8 +3523,11 @@ pmd_test_exit(void)
>  		}
>  	}
>  	for (i = 0 ; i < RTE_DIM(mempools) ; i++) {
> -		if (mempools[i])
> +		if (mempools[i]) {
>  			mempool_free_mp(mempools[i]);
> +			if (mbuf_mem_types[i] == MBUF_MEM_GPU)
> +				rte_gpu_mem_free(gpu_id, (void *)mempools_ext_ptr[i]);
> +		}
>  	}
>  	free(xstats_display);
> 
> diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
> index 669ce1e87d..9919044372 100644
> --- a/app/test-pmd/testpmd.h
> +++ b/app/test-pmd/testpmd.h
> @@ -12,6 +12,7 @@
>  #include <rte_gro.h>
>  #include <rte_gso.h>
>  #include <rte_os_shim.h>
> +#include <rte_gpudev.h>
>  #include <cmdline.h>
>  #include <sys/queue.h>
>  #ifdef RTE_HAS_JANSSON
> @@ -474,6 +475,11 @@ extern uint8_t dcb_config;
>  extern uint32_t mbuf_data_size_n;
>  extern uint16_t mbuf_data_size[MAX_SEGS_BUFFER_SPLIT];
>  /**< Mbuf data space size. */
> +enum mbuf_mem_type {
> +	MBUF_MEM_CPU,
> +	MBUF_MEM_GPU
> +};
> +extern enum mbuf_mem_type mbuf_mem_types[MAX_SEGS_BUFFER_SPLIT];
>  extern uint32_t param_total_num_mbufs;
> 
>  extern uint16_t stats_period;
> @@ -717,14 +723,16 @@ current_fwd_lcore(void)
>  /* Mbuf Pools */
>  static inline void
>  mbuf_poolname_build(unsigned int sock_id, char *mp_name,
> -		    int name_size, uint16_t idx)
> +		    int name_size, uint16_t idx, enum mbuf_mem_type mem_type)
>  {
> +	const char *suffix = mem_type == MBUF_MEM_GPU ? "_gpu" : "";
> +
>  	if (!idx)
>  		snprintf(mp_name, name_size,
> -			 MBUF_POOL_NAME_PFX "_%u", sock_id);
> +			 MBUF_POOL_NAME_PFX "_%u%s", sock_id, suffix);
>  	else
>  		snprintf(mp_name, name_size,
> -			 MBUF_POOL_NAME_PFX "_%hu_%hu", (uint16_t)sock_id, idx);
> +			 MBUF_POOL_NAME_PFX "_%hu_%hu%s", (uint16_t)sock_id, idx, suffix);
>  }
> 
>  static inline struct rte_mempool *
> @@ -732,7 +740,7 @@ mbuf_pool_find(unsigned int sock_id, uint16_t idx)
>  {
>  	char pool_name[RTE_MEMPOOL_NAMESIZE];
> 
> -	mbuf_poolname_build(sock_id, pool_name, sizeof(pool_name), idx);
> +	mbuf_poolname_build(sock_id, pool_name, sizeof(pool_name), idx,
> +			    mbuf_mem_types[idx]);
>  	return rte_mempool_lookup((const char *)pool_name);
>  }
> 
> diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
> index 30edef07ea..ede7b79abb 100644
> --- a/doc/guides/testpmd_app_ug/run_app.rst
> +++ b/doc/guides/testpmd_app_ug/run_app.rst
> @@ -119,6 +119,9 @@ The command line options are:
>      The default value is 2048. If multiple mbuf-size values are specified the
>      extra memory pools will be created for allocating mbufs to receive packets
>      with buffer splitting features.
> +    Providing mbuf size with a 'g' suffix (e.g. ``--mbuf-size=2048g``),
> +    will cause the mempool to be created on a GPU memory area allocated.
> +    This option is currently limited to iofwd engine with the first GPU.
Hmm, if we have multiple GPUs, we have no way to specify on which one the pool should be allocated.
The syntax is not complete ☹. Possibly we should specify the GPU port id after the suffix? Like 2048g2?
If the port index is omitted, we should assume we have only one GPU and check this.
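A possible parsing sketch for such a syntax (hypothetical; 'mbuf_gpu_ids[]' would be a new per-segment array next to mbuf_mem_types[]):

		if (c == 'g') {
			gpu_mbuf = 1;
			/* optional GPU device id right after the suffix, e.g. "2048g2" */
			if (isdigit((unsigned char)str[i + 1])) {
				char *end = NULL;

				mbuf_gpu_ids[nb_item] = strtoul(&str[i + 1], &end, 10);
				i = (end - str) - 1;
			} else if (rte_gpu_count_avail() != 1) {
				fprintf(stderr, "GPU id is required when multiple GPUs are present\n");
				return 0;
			}
			continue;
		}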

With best regards,
Slava


Thread overview: 31+ messages
2021-10-29 20:49 [dpdk-dev] [PATCH] " eagostini
2021-11-11 21:41 ` [PATCH v2 0/1] " eagostini
2021-11-11 21:41   ` [PATCH v2 1/1] " eagostini
2021-11-16 16:28     ` Slava Ovsiienko [this message]
2021-11-16 17:16       ` Ananyev, Konstantin
2021-11-16 18:15         ` Elena Agostini
2021-11-16 17:55     ` Ferruh Yigit
2021-11-16 18:06       ` Elena Agostini
2021-11-16 18:11         ` Ferruh Yigit
2021-11-16 19:09           ` Jerin Jacob
2021-11-16 19:14             ` Elena Agostini
2021-11-16 19:21               ` Jerin Jacob
2021-11-17  8:55                 ` Bruce Richardson
2021-11-17  3:04 ` [PATCH v3 0/1] app/testpmd: add GPU memory option for mbuf pools eagostini
2021-11-17  3:04   ` [PATCH v3 1/1] " eagostini
2021-11-16 21:34     ` Stephen Hemminger
2021-11-17 11:08       ` Elena Agostini
2021-11-17 11:23         ` Jerin Jacob
2021-11-17 11:26           ` Elena Agostini
2021-11-17 11:31             ` Jerin Jacob
2021-11-17 11:48               ` Ferruh Yigit
2021-11-17 12:36                 ` Ananyev, Konstantin
2021-11-17 12:39                   ` Elena Agostini
2021-11-17 13:39                     ` Jerin Jacob
2021-11-17 13:50                       ` Elena Agostini
2021-11-17 14:02                         ` Jerin Jacob
2021-11-17 14:07                           ` Elena Agostini
2021-11-17 17:44                             ` Elena Agostini
2021-11-17 21:49 ` [PATCH v4 0/1] " eagostini
2021-11-17 21:49   ` [PATCH v4 1/1] " eagostini
2021-11-17 14:04     ` Bruce Richardson
