[PATCH v1 0/3] GPU memory aligned

DPDK patches and discussions

* [PATCH v1 0/3] GPU memory aligned
@ 2022-01-04  1:47 eagostini
  2022-01-04  1:47 ` [PATCH v1 1/3] gpudev: mem alloc aligned memory eagostini
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: eagostini @ 2022-01-04  1:47 UTC (permalink / raw)
  To: dev; +Cc: Elena Agostini

From: Elena Agostini <eagostini@nvidia.com>

Applications may need to allocate GPU memory buffers
whose addresses are aligned to some value
(e.g. the page size).

As with the rte_malloc function, an alignment
can now be passed to rte_gpu_mem_alloc.

This patch set implements this functionality
in the gpudev library and the GPU CUDA driver.

Elena Agostini (3):
  gpudev: mem alloc aligned memory
  app/test-gpudev: test aligned memory allocation
  gpu/cuda: mem alloc aligned memory

 app/test-gpudev/main.c     | 13 ++++++++++---
 drivers/gpu/cuda/cuda.c    | 21 ++++++++++++++++-----
 lib/gpudev/gpudev.c        | 10 ++++++++--
 lib/gpudev/gpudev_driver.h |  2 +-
 lib/gpudev/rte_gpudev.h    | 10 +++++++---
 5 files changed, 42 insertions(+), 14 deletions(-)

-- 
2.17.1



* [PATCH v1 1/3] gpudev: mem alloc aligned memory
  2022-01-04  1:47 [PATCH v1 0/3] GPU memory aligned eagostini
@ 2022-01-04  1:47 ` eagostini
  2022-01-04  1:47 ` [PATCH v1 2/3] app/test-gpudev: test aligned memory allocation eagostini
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: eagostini @ 2022-01-04  1:47 UTC (permalink / raw)
  To: dev; +Cc: Elena Agostini

From: Elena Agostini <eagostini@nvidia.com>

As with rte_malloc, rte_gpu_mem_alloc now accepts
the memory alignment as input.

The GPU driver should return a GPU memory address
aligned to that value.

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
---
 lib/gpudev/gpudev.c        | 10 ++++++++--
 lib/gpudev/gpudev_driver.h |  2 +-
 lib/gpudev/rte_gpudev.h    | 10 +++++++---
 3 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/lib/gpudev/gpudev.c b/lib/gpudev/gpudev.c
index 9ae36dbae9..dc8c3baefa 100644
--- a/lib/gpudev/gpudev.c
+++ b/lib/gpudev/gpudev.c
@@ -527,7 +527,7 @@ rte_gpu_info_get(int16_t dev_id, struct rte_gpu_info *info)
 }
 
 void *
-rte_gpu_mem_alloc(int16_t dev_id, size_t size)
+rte_gpu_mem_alloc(int16_t dev_id, size_t size, unsigned int align)
 {
 	struct rte_gpu *dev;
 	void *ptr;
@@ -549,7 +549,13 @@ rte_gpu_mem_alloc(int16_t dev_id, size_t size)
 	if (size == 0) /* dry-run */
 		return NULL;
 
-	ret = dev->ops.mem_alloc(dev, size, &ptr);
+	if (align && !rte_is_power_of_2(align)) {
+		GPU_LOG(ERR, "requested alignment is not a power of two %u", align);
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	ret = dev->ops.mem_alloc(dev, size, &ptr, align);
 
 	switch (ret) {
 	case 0:
diff --git a/lib/gpudev/gpudev_driver.h b/lib/gpudev/gpudev_driver.h
index cb7b101f2f..d06f465194 100644
--- a/lib/gpudev/gpudev_driver.h
+++ b/lib/gpudev/gpudev_driver.h
@@ -27,7 +27,7 @@ enum rte_gpu_state {
 struct rte_gpu;
 typedef int (rte_gpu_close_t)(struct rte_gpu *dev);
 typedef int (rte_gpu_info_get_t)(struct rte_gpu *dev, struct rte_gpu_info *info);
-typedef int (rte_gpu_mem_alloc_t)(struct rte_gpu *dev, size_t size, void **ptr);
+typedef int (rte_gpu_mem_alloc_t)(struct rte_gpu *dev, size_t size, void **ptr, unsigned int align);
 typedef int (rte_gpu_mem_free_t)(struct rte_gpu *dev, void *ptr);
 typedef int (rte_gpu_mem_register_t)(struct rte_gpu *dev, size_t size, void *ptr);
 typedef int (rte_gpu_mem_unregister_t)(struct rte_gpu *dev, void *ptr);
diff --git a/lib/gpudev/rte_gpudev.h b/lib/gpudev/rte_gpudev.h
index fa3f3aad4f..9e2e2c5dce 100644
--- a/lib/gpudev/rte_gpudev.h
+++ b/lib/gpudev/rte_gpudev.h
@@ -364,18 +364,22 @@ int rte_gpu_info_get(int16_t dev_id, struct rte_gpu_info *info);
  * @param size
  *   Number of bytes to allocate.
  *   Requesting 0 will do nothing.
- *
+ * @param align
+ *   If 0, the return is a pointer that is suitably aligned for any kind of
+ *   variable (in the same manner as malloc()).
+ *   Otherwise, the return is a pointer that is a multiple of *align*. In
+ *   this case, it must obviously be a power of two.
  * @return
  *   A pointer to the allocated memory, otherwise NULL and rte_errno is set:
  *   - ENODEV if invalid dev_id
- *   - EINVAL if reserved flags
+ *   - EINVAL if align is not a power of two
  *   - ENOTSUP if operation not supported by the driver
  *   - E2BIG if size is higher than limit
  *   - ENOMEM if out of space
  *   - EPERM if driver error
  */
 __rte_experimental
-void *rte_gpu_mem_alloc(int16_t dev_id, size_t size)
+void *rte_gpu_mem_alloc(int16_t dev_id, size_t size, unsigned int align)
 __rte_alloc_size(2);
 
 /**
-- 
2.17.1
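
The contract documented for the align parameter in this patch can be
illustrated on the host side. The following is a minimal sketch, not the
gpudev implementation: sketch_mem_alloc and is_power_of_two are
hypothetical names, errno stands in for rte_errno, and C11 aligned_alloc()
stands in for the driver's allocator.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: true when x has exactly one bit set. */
static int
is_power_of_two(unsigned int x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Hypothetical host-side stand-in for the documented contract:
 * - size == 0: do nothing and return NULL
 * - align == 0: default alignment, as with malloc()
 * - align not a power of two: return NULL with errno set to EINVAL
 * - otherwise: return a pointer that is a multiple of align
 */
static void *
sketch_mem_alloc(size_t size, unsigned int align)
{
	size_t eff_align = align;
	size_t rounded;
	void *ptr;

	if (size == 0)
		return NULL;

	if (align == 0)
		return malloc(size);

	if (!is_power_of_two(align)) {
		errno = EINVAL;
		return NULL;
	}

	/* Some allocators reject alignments below sizeof(void *). */
	if (eff_align < sizeof(void *))
		eff_align = sizeof(void *);

	/* C11 aligned_alloc() expects size to be a multiple of the alignment. */
	rounded = (size + eff_align - 1) & ~(eff_align - 1);

	ptr = aligned_alloc(eff_align, rounded);
	if (ptr == NULL)
		errno = ENOMEM;
	return ptr;
}
```

Memory returned by this sketch is released with free(). Clamping the
alignment up to sizeof(void *) is harmless here because both values are
powers of two, so the result is still a multiple of the requested align.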



* [PATCH v1 2/3] app/test-gpudev: test aligned memory allocation
  2022-01-04  1:47 [PATCH v1 0/3] GPU memory aligned eagostini
  2022-01-04  1:47 ` [PATCH v1 1/3] gpudev: mem alloc aligned memory eagostini
@ 2022-01-04  1:47 ` eagostini
  2022-01-04  1:47 ` [PATCH v1 3/3] gpu/cuda: mem alloc aligned memory eagostini
  2022-01-08  0:20 ` [PATCH v2 1/3] gpudev: " eagostini
  3 siblings, 0 replies; 12+ messages in thread
From: eagostini @ 2022-01-04  1:47 UTC (permalink / raw)
  To: dev; +Cc: Elena Agostini

From: Elena Agostini <eagostini@nvidia.com>

Update the gpudev test app to cover aligned GPU memory allocation.

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
---
 app/test-gpudev/main.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/app/test-gpudev/main.c b/app/test-gpudev/main.c
index 5c1aa3d52f..f36f46cbca 100644
--- a/app/test-gpudev/main.c
+++ b/app/test-gpudev/main.c
@@ -69,11 +69,12 @@ alloc_gpu_memory(uint16_t gpu_id)
 	void *ptr_2 = NULL;
 	size_t buf_bytes = 1024;
 	int ret;
+	unsigned align = 4096;
 
 	printf("\n=======> TEST: Allocate GPU memory\n\n");
 
-	/* Alloc memory on GPU 0 */
-	ptr_1 = rte_gpu_mem_alloc(gpu_id, buf_bytes);
+	/* Alloc memory on GPU 0 without any specific alignment */
+	ptr_1 = rte_gpu_mem_alloc(gpu_id, buf_bytes, 0);
 	if (ptr_1 == NULL) {
 		fprintf(stderr, "rte_gpu_mem_alloc GPU memory returned error\n");
 		goto error;
@@ -81,7 +82,8 @@ alloc_gpu_memory(uint16_t gpu_id)
 	printf("GPU memory allocated at 0x%p size is %zd bytes\n",
 			ptr_1, buf_bytes);
 
-	ptr_2 = rte_gpu_mem_alloc(gpu_id, buf_bytes);
+	/* Alloc memory on GPU 0 with 4kB alignment */
+	ptr_2 = rte_gpu_mem_alloc(gpu_id, buf_bytes, align);
 	if (ptr_2 == NULL) {
 		fprintf(stderr, "rte_gpu_mem_alloc GPU memory returned error\n");
 		goto error;
@@ -89,6 +91,11 @@ alloc_gpu_memory(uint16_t gpu_id)
 	printf("GPU memory allocated at 0x%p size is %zd bytes\n",
 			ptr_2, buf_bytes);
 
+	if (((uintptr_t)ptr_2) % align) {
+		fprintf(stderr, "Memory address 0x%p is not aligned to %u\n", ptr_2, align);
+		goto error;
+	}
+
 	ret = rte_gpu_mem_free(gpu_id, (uint8_t *)(ptr_1)+0x700);
 	if (ret < 0) {
 		printf("GPU memory 0x%p NOT freed: GPU driver didn't find this memory address internally.\n",
-- 
2.17.1



* [PATCH v1 3/3] gpu/cuda: mem alloc aligned memory
  2022-01-04  1:47 [PATCH v1 0/3] GPU memory aligned eagostini
  2022-01-04  1:47 ` [PATCH v1 1/3] gpudev: mem alloc aligned memory eagostini
  2022-01-04  1:47 ` [PATCH v1 2/3] app/test-gpudev: test aligned memory allocation eagostini
@ 2022-01-04  1:47 ` eagostini
  2022-01-03 18:05   ` Stephen Hemminger
  2022-01-08  0:20 ` [PATCH v2 1/3] gpudev: " eagostini
  3 siblings, 1 reply; 12+ messages in thread
From: eagostini @ 2022-01-04  1:47 UTC (permalink / raw)
  To: dev; +Cc: Elena Agostini

From: Elena Agostini <eagostini@nvidia.com>

Implement aligned GPU memory allocation in the GPU CUDA driver.

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
---
 drivers/gpu/cuda/cuda.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/cuda/cuda.c b/drivers/gpu/cuda/cuda.c
index 882df08e56..4ad3f5fc90 100644
--- a/drivers/gpu/cuda/cuda.c
+++ b/drivers/gpu/cuda/cuda.c
@@ -139,8 +139,10 @@ typedef uintptr_t cuda_ptr_key;
 /* Single entry of the memory list */
 struct mem_entry {
 	CUdeviceptr ptr_d;
+	CUdeviceptr ptr_orig_d;
 	void *ptr_h;
 	size_t size;
+	size_t size_orig;
 	struct rte_gpu *dev;
 	CUcontext ctx;
 	cuda_ptr_key pkey;
@@ -569,7 +571,7 @@ cuda_dev_info_get(struct rte_gpu *dev, struct rte_gpu_info *info)
  */
 
 static int
-cuda_mem_alloc(struct rte_gpu *dev, size_t size, void **ptr)
+cuda_mem_alloc(struct rte_gpu *dev, size_t size, void **ptr, unsigned int align)
 {
 	CUresult res;
 	const char *err_string;
@@ -610,8 +612,10 @@ cuda_mem_alloc(struct rte_gpu *dev, size_t size, void **ptr)
 
 	/* Allocate memory */
 	mem_alloc_list_tail->size = size;
-	res = pfn_cuMemAlloc(&(mem_alloc_list_tail->ptr_d),
-			mem_alloc_list_tail->size);
+	mem_alloc_list_tail->size_orig = size + align;
+
+	res = pfn_cuMemAlloc(&(mem_alloc_list_tail->ptr_orig_d),
+			mem_alloc_list_tail->size_orig);
 	if (res != 0) {
 		pfn_cuGetErrorString(res, &(err_string));
 		rte_cuda_log(ERR, "cuCtxSetCurrent current failed with %s",
@@ -620,6 +624,13 @@ cuda_mem_alloc(struct rte_gpu *dev, size_t size, void **ptr)
 		return -rte_errno;
 	}
 
+
+	/* Align memory address */
+	mem_alloc_list_tail->ptr_d = mem_alloc_list_tail->ptr_orig_d;
+	if (align && ((uintptr_t)mem_alloc_list_tail->ptr_d) % align)
+		mem_alloc_list_tail->ptr_d += (align -
+				(((uintptr_t)mem_alloc_list_tail->ptr_d) % align));
+
 	/* GPUDirect RDMA attribute required */
 	res = pfn_cuPointerSetAttribute(&flag,
 			CU_POINTER_ATTRIBUTE_SYNC_MEMOPS,
@@ -634,7 +645,6 @@ cuda_mem_alloc(struct rte_gpu *dev, size_t size, void **ptr)
 
 	mem_alloc_list_tail->pkey = get_hash_from_ptr((void *)mem_alloc_list_tail->ptr_d);
 	mem_alloc_list_tail->ptr_h = NULL;
-	mem_alloc_list_tail->size = size;
 	mem_alloc_list_tail->dev = dev;
 	mem_alloc_list_tail->ctx = (CUcontext)((uintptr_t)dev->mpshared->info.context);
 	mem_alloc_list_tail->mtype = GPU_MEM;
@@ -761,6 +771,7 @@ cuda_mem_register(struct rte_gpu *dev, size_t size, void *ptr)
 	mem_alloc_list_tail->dev = dev;
 	mem_alloc_list_tail->ctx = (CUcontext)((uintptr_t)dev->mpshared->info.context);
 	mem_alloc_list_tail->mtype = CPU_REGISTERED;
+	mem_alloc_list_tail->ptr_orig_d = mem_alloc_list_tail->ptr_d;
 
 	/* Restore original ctx as current ctx */
 	res = pfn_cuCtxSetCurrent(current_ctx);
@@ -796,7 +807,7 @@ cuda_mem_free(struct rte_gpu *dev, void *ptr)
 	}
 
 	if (mem_item->mtype == GPU_MEM) {
-		res = pfn_cuMemFree(mem_item->ptr_d);
+		res = pfn_cuMemFree(mem_item->ptr_orig_d);
 		if (res != 0) {
 			pfn_cuGetErrorString(res, &(err_string));
 			rte_cuda_log(ERR, "cuMemFree current failed with %s",
-- 
2.17.1
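
The approach taken in this patch (allocate size + align bytes, slide the
address forward to the next aligned boundary, and keep the original
address for the later free) can be sketched on the host with malloc().
This is only an illustration under stated assumptions: struct sketch_entry
and both functions are hypothetical names, loosely modelled on the
ptr_d/ptr_orig_d and size/size_orig fields above, and no CUDA call is
involved.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bookkeeping entry for one allocation. */
struct sketch_entry {
	uintptr_t ptr;     /* aligned address handed to the caller */
	void *ptr_orig;    /* address returned by the allocator */
	size_t size;       /* size requested by the caller */
	size_t size_orig;  /* size actually allocated */
};

/*
 * Returns 0 on success, -1 on allocation failure.
 * align == 0 means "no specific alignment".
 */
static int
sketch_alloc_aligned(struct sketch_entry *e, size_t size, unsigned int align)
{
	e->size = size;
	/* Extra room so the address can move forward by up to align - 1 bytes. */
	e->size_orig = size + align;

	e->ptr_orig = malloc(e->size_orig);
	if (e->ptr_orig == NULL)
		return -1;

	/* Start from the original address and round up if it is misaligned. */
	e->ptr = (uintptr_t)e->ptr_orig;
	if (align && (e->ptr % align))
		e->ptr += align - (e->ptr % align);

	return 0;
}

/* The allocator must be given back the original address, not the aligned one. */
static void
sketch_free_aligned(struct sketch_entry *e)
{
	free(e->ptr_orig);
	e->ptr_orig = NULL;
	e->ptr = 0;
}
```

Because the aligned address moves forward by at most align - 1 bytes, the
caller's size bytes always fit inside the size + align bytes that were
allocated.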



* Re: [PATCH v1 3/3] gpu/cuda: mem alloc aligned memory
  2022-01-04  1:47 ` [PATCH v1 3/3] gpu/cuda: mem alloc aligned memory eagostini
@ 2022-01-03 18:05   ` Stephen Hemminger
  2022-01-03 18:15     ` Elena Agostini
  0 siblings, 1 reply; 12+ messages in thread
From: Stephen Hemminger @ 2022-01-03 18:05 UTC (permalink / raw)
  To: eagostini; +Cc: dev

On Tue, 4 Jan 2022 01:47:21 +0000
<eagostini@nvidia.com> wrote:

>  static int
> -cuda_mem_alloc(struct rte_gpu *dev, size_t size, void **ptr)
> +cuda_mem_alloc(struct rte_gpu *dev, size_t size, void **ptr, unsigned int align)
>  {
>  	CUresult res;
>  	const char *err_string;
> @@ -610,8 +612,10 @@ cuda_mem_alloc(struct rte_gpu *dev, size_t size, void **ptr)
>  
>  	/* Allocate memory */
>  	mem_alloc_list_tail->size = size;
> -	res = pfn_cuMemAlloc(&(mem_alloc_list_tail->ptr_d),
> -			mem_alloc_list_tail->size);
> +	mem_alloc_list_tail->size_orig = size + align;
> +
> +	res = pfn_cuMemAlloc(&(mem_alloc_list_tail->ptr_orig_d),
> +			mem_alloc_list_tail->size_orig);
>  	if (res != 0) {
>  		pfn_cuGetErrorString(res, &(err_string));
>  		rte_cuda_log(ERR, "cuCtxSetCurrent current failed with %s",
> @@ -620,6 +624,13 @@ cuda_mem_alloc(struct rte_gpu *dev, size_t size, void **ptr)
>  		return -rte_errno;
>  	}
>  
> +
> +	/* Align memory address */
> +	mem_alloc_list_tail->ptr_d = mem_alloc_list_tail->ptr_orig_d;
> +	if (align && ((uintptr_t)mem_alloc_list_tail->ptr_d) % align)
> +		mem_alloc_list_tail->ptr_d += (align -
> +				(((uintptr_t)mem_alloc_list_tail->ptr_d) % align));


POSIX posix_memalign() takes size_t for both size and alignment.

It is better to put the input parameters first and the resulting output
parameter last, for consistency; this follows the Rusty API design manifesto.

Alignment only makes sense if it is a power of two. The code should check
for that and optimize for it.


* Re: [PATCH v1 3/3] gpu/cuda: mem alloc aligned memory
  2022-01-03 18:05   ` Stephen Hemminger
@ 2022-01-03 18:15     ` Elena Agostini
  2022-01-03 18:17       ` Stephen Hemminger
  0 siblings, 1 reply; 12+ messages in thread
From: Elena Agostini @ 2022-01-03 18:15 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

> From: Stephen Hemminger <stephen@networkplumber.org>
> Date: Monday, 3 January 2022 at 19:05
> To: Elena Agostini <eagostini@nvidia.com>
> Cc: dev@dpdk.org <dev@dpdk.org>
> Subject: Re: [PATCH v1 3/3] gpu/cuda: mem alloc aligned memory
>
> On Tue, 4 Jan 2022 01:47:21 +0000
> <eagostini@nvidia.com> wrote:

> >  static int
> > -cuda_mem_alloc(struct rte_gpu *dev, size_t size, void **ptr)
> > +cuda_mem_alloc(struct rte_gpu *dev, size_t size, void **ptr, unsigned int align)
> >  {
> >       CUresult res;
> >       const char *err_string;
> > @@ -610,8 +612,10 @@ cuda_mem_alloc(struct rte_gpu *dev, size_t size, void **ptr)
> >
> >       /* Allocate memory */
> >       mem_alloc_list_tail->size = size;
> > -     res = pfn_cuMemAlloc(&(mem_alloc_list_tail->ptr_d),
> > -                     mem_alloc_list_tail->size);
> > +     mem_alloc_list_tail->size_orig = size + align;
> > +
> > +     res = pfn_cuMemAlloc(&(mem_alloc_list_tail->ptr_orig_d),
> > +                     mem_alloc_list_tail->size_orig);
> >       if (res != 0) {
> >               pfn_cuGetErrorString(res, &(err_string));
> >               rte_cuda_log(ERR, "cuCtxSetCurrent current failed with %s",
> > @@ -620,6 +624,13 @@ cuda_mem_alloc(struct rte_gpu *dev, size_t size, void **ptr)
> >               return -rte_errno;
> >       }
> >
> > +
> > +     /* Align memory address */
> > +     mem_alloc_list_tail->ptr_d = mem_alloc_list_tail->ptr_orig_d;
> > +     if (align && ((uintptr_t)mem_alloc_list_tail->ptr_d) % align)
> > +             mem_alloc_list_tail->ptr_d += (align -
> > +                             (((uintptr_t)mem_alloc_list_tail->ptr_d) % align));
>

> Posix memalign takes size_t for both size and alignment.

I've created this patch based on the rte_malloc function definition for consistency.

void * rte_malloc(const char *type, size_t size, unsigned align)


> Better to put the input parameters first, and then the resulting output parameter last
> for consistency; follows the Rusty API design manifesto.

Got it, will do.

> Alignment only makes sense if power of two. The code should check that and optimize
> for that.
>

The alignment value is checked in the gpudev library before
passing it to the driver.

Adding this kind of check in the driver has been rejected in the past because it was
considered dead code (the library already checks the input parameters).

Let me know which option is preferred.
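
The split described here, where the library validates the alignment once
and the driver callback trusts it, can be sketched as follows. This is a
simplified, hypothetical model rather than the gpudev code: the callback
type, both functions and the call counter are invented for illustration,
errno stands in for rte_errno, and the callback already uses the
"inputs first, output last" parameter order.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical driver callback type: inputs first, output last. */
typedef int (sketch_mem_alloc_t)(size_t size, unsigned int align, void **ptr);

/* Counts driver invocations so the dispatch can be observed. */
static unsigned int sketch_driver_calls;

/* Hypothetical driver: relies on the library having validated align. */
static int
sketch_driver_alloc(size_t size, unsigned int align, void **ptr)
{
	(void)align; /* a real driver would honour the alignment here */
	sketch_driver_calls++;
	*ptr = malloc(size);
	return *ptr == NULL ? -ENOMEM : 0;
}

/* Hypothetical library entry point: validate once, then dispatch. */
static void *
sketch_lib_alloc(sketch_mem_alloc_t *op, size_t size, unsigned int align)
{
	void *ptr = NULL;
	int ret;

	/* A non-zero alignment must have exactly one bit set. */
	if (align != 0 && (align & (align - 1)) != 0) {
		errno = EINVAL;
		return NULL; /* the driver is never called */
	}

	ret = op(size, align, &ptr);
	if (ret < 0) {
		errno = -ret;
		return NULL;
	}
	return ptr;
}
```

With this layering an invalid alignment never reaches the callback, which
is why a second check inside the driver would never trigger.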


* Re: [PATCH v1 3/3] gpu/cuda: mem alloc aligned memory
  2022-01-03 18:15     ` Elena Agostini
@ 2022-01-03 18:17       ` Stephen Hemminger
  2022-01-03 18:22         ` Elena Agostini
  0 siblings, 1 reply; 12+ messages in thread
From: Stephen Hemminger @ 2022-01-03 18:17 UTC (permalink / raw)
  To: Elena Agostini; +Cc: dev

On Mon, 3 Jan 2022 18:15:11 +0000
Elena Agostini <eagostini@nvidia.com> wrote:

> > Alignment only makes sense if power of two. The code should check that and optimize
> > for that.
> >  
> 
> The alignment value is checked in the gpudev library before
> passing it to the driver.
> 
> Adding this kind of checks in the driver has been rejected in the past because it was
> considered dead code (the library was already checking input parameters).
> 
> Let me know what are the preferred options.

Driver could use the mask instead of slow divide operation.
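
For illustration, the two ways of rounding an address up can be compared
side by side. This is a standalone sketch with hypothetical helper names,
not code from the driver; the mask variant is only valid when align is a
power of two, which the gpudev library is said to check before calling
the driver.

```c
#include <stdint.h>

/* Round addr up using a modulo; works for any non-zero align. */
static uintptr_t
align_up_mod(uintptr_t addr, uintptr_t align)
{
	uintptr_t rem = addr % align;

	return rem ? addr + (align - rem) : addr;
}

/* Round addr up using a mask; valid only when align is a power of two. */
static uintptr_t
align_up_mask(uintptr_t addr, uintptr_t align)
{
	return (addr + align - 1) & ~(align - 1);
}
```

Both helpers return the same value for every power-of-two alignment; the
mask form replaces the division with an addition and a bitwise AND.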


* Re: [PATCH v1 3/3] gpu/cuda: mem alloc aligned memory
  2022-01-03 18:17       ` Stephen Hemminger
@ 2022-01-03 18:22         ` Elena Agostini
  0 siblings, 0 replies; 12+ messages in thread
From: Elena Agostini @ 2022-01-03 18:22 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

> On Mon, 3 Jan 2022 18:15:11 +0000
> Elena Agostini <eagostini@nvidia.com> wrote:
>
> > > Alignment only makes sense if power of two. The code should check that and optimize
> > > for that.
> > >
> >
> > The alignment value is checked in the gpudev library before
> > passing it to the driver.
> >
> > Adding this kind of checks in the driver has been rejected in the past because it was
> > considered dead code (the library was already checking input parameters).
> >
> > Let me know what are the preferred options.
>
> Driver could use the mask instead of slow divide operation.

I would not be concerned about performance here.
Memory allocation is expensive; typically you want to do it
at initialization time.

What do you suggest for my other comments?


* [PATCH v2 1/3] gpudev: mem alloc aligned memory
  2022-01-04  1:47 [PATCH v1 0/3] GPU memory aligned eagostini
                   ` (2 preceding siblings ...)
  2022-01-04  1:47 ` [PATCH v1 3/3] gpu/cuda: mem alloc aligned memory eagostini
@ 2022-01-08  0:20 ` eagostini
  2022-01-08  0:20   ` [PATCH v2 2/3] app/test-gpudev: test aligned memory allocation eagostini
                     ` (2 more replies)
  3 siblings, 3 replies; 12+ messages in thread
From: eagostini @ 2022-01-08  0:20 UTC (permalink / raw)
  To: dev; +Cc: Elena Agostini

From: Elena Agostini <eagostini@nvidia.com>

As with rte_malloc, rte_gpu_mem_alloc now accepts
the memory alignment as input.

The GPU driver should return a GPU memory address
aligned to that value.

Changelog:
- reordered the mem_alloc driver callback parameters (output pointer last)

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
---
 lib/gpudev/gpudev.c        | 10 ++++++++--
 lib/gpudev/gpudev_driver.h |  2 +-
 lib/gpudev/rte_gpudev.h    | 10 +++++++---
 3 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/lib/gpudev/gpudev.c b/lib/gpudev/gpudev.c
index 9ae36dbae9..59e2169292 100644
--- a/lib/gpudev/gpudev.c
+++ b/lib/gpudev/gpudev.c
@@ -527,7 +527,7 @@ rte_gpu_info_get(int16_t dev_id, struct rte_gpu_info *info)
 }
 
 void *
-rte_gpu_mem_alloc(int16_t dev_id, size_t size)
+rte_gpu_mem_alloc(int16_t dev_id, size_t size, unsigned int align)
 {
 	struct rte_gpu *dev;
 	void *ptr;
@@ -549,7 +549,13 @@ rte_gpu_mem_alloc(int16_t dev_id, size_t size)
 	if (size == 0) /* dry-run */
 		return NULL;
 
-	ret = dev->ops.mem_alloc(dev, size, &ptr);
+	if (align && !rte_is_power_of_2(align)) {
+		GPU_LOG(ERR, "requested alignment is not a power of two %u", align);
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	ret = dev->ops.mem_alloc(dev, size, align, &ptr);
 
 	switch (ret) {
 	case 0:
diff --git a/lib/gpudev/gpudev_driver.h b/lib/gpudev/gpudev_driver.h
index cb7b101f2f..0ed7478e9b 100644
--- a/lib/gpudev/gpudev_driver.h
+++ b/lib/gpudev/gpudev_driver.h
@@ -27,7 +27,7 @@ enum rte_gpu_state {
 struct rte_gpu;
 typedef int (rte_gpu_close_t)(struct rte_gpu *dev);
 typedef int (rte_gpu_info_get_t)(struct rte_gpu *dev, struct rte_gpu_info *info);
-typedef int (rte_gpu_mem_alloc_t)(struct rte_gpu *dev, size_t size, void **ptr);
+typedef int (rte_gpu_mem_alloc_t)(struct rte_gpu *dev, size_t size, unsigned int align, void **ptr);
 typedef int (rte_gpu_mem_free_t)(struct rte_gpu *dev, void *ptr);
 typedef int (rte_gpu_mem_register_t)(struct rte_gpu *dev, size_t size, void *ptr);
 typedef int (rte_gpu_mem_unregister_t)(struct rte_gpu *dev, void *ptr);
diff --git a/lib/gpudev/rte_gpudev.h b/lib/gpudev/rte_gpudev.h
index fa3f3aad4f..9e2e2c5dce 100644
--- a/lib/gpudev/rte_gpudev.h
+++ b/lib/gpudev/rte_gpudev.h
@@ -364,18 +364,22 @@ int rte_gpu_info_get(int16_t dev_id, struct rte_gpu_info *info);
  * @param size
  *   Number of bytes to allocate.
  *   Requesting 0 will do nothing.
- *
+ * @param align
+ *   If 0, the return is a pointer that is suitably aligned for any kind of
+ *   variable (in the same manner as malloc()).
+ *   Otherwise, the return is a pointer that is a multiple of *align*. In
+ *   this case, it must obviously be a power of two.
  * @return
  *   A pointer to the allocated memory, otherwise NULL and rte_errno is set:
  *   - ENODEV if invalid dev_id
- *   - EINVAL if reserved flags
+ *   - EINVAL if align is not a power of two
  *   - ENOTSUP if operation not supported by the driver
  *   - E2BIG if size is higher than limit
  *   - ENOMEM if out of space
  *   - EPERM if driver error
  */
 __rte_experimental
-void *rte_gpu_mem_alloc(int16_t dev_id, size_t size)
+void *rte_gpu_mem_alloc(int16_t dev_id, size_t size, unsigned int align)
 __rte_alloc_size(2);
 
 /**
-- 
2.17.1



* [PATCH v2 2/3] app/test-gpudev: test aligned memory allocation
  2022-01-08  0:20 ` [PATCH v2 1/3] gpudev: " eagostini
@ 2022-01-08  0:20   ` eagostini
  2022-01-08  0:20   ` [PATCH v2 3/3] gpu/cuda: mem alloc aligned memory eagostini
  2022-01-21 10:34   ` [PATCH v2 1/3] gpudev: " Thomas Monjalon
  2 siblings, 0 replies; 12+ messages in thread
From: eagostini @ 2022-01-08  0:20 UTC (permalink / raw)
  To: dev; +Cc: Elena Agostini

From: Elena Agostini <eagostini@nvidia.com>

Update the gpudev test app to cover aligned GPU memory allocation.

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
---
 app/test-gpudev/main.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/app/test-gpudev/main.c b/app/test-gpudev/main.c
index 5c1aa3d52f..f36f46cbca 100644
--- a/app/test-gpudev/main.c
+++ b/app/test-gpudev/main.c
@@ -69,11 +69,12 @@ alloc_gpu_memory(uint16_t gpu_id)
 	void *ptr_2 = NULL;
 	size_t buf_bytes = 1024;
 	int ret;
+	unsigned align = 4096;
 
 	printf("\n=======> TEST: Allocate GPU memory\n\n");
 
-	/* Alloc memory on GPU 0 */
-	ptr_1 = rte_gpu_mem_alloc(gpu_id, buf_bytes);
+	/* Alloc memory on GPU 0 without any specific alignment */
+	ptr_1 = rte_gpu_mem_alloc(gpu_id, buf_bytes, 0);
 	if (ptr_1 == NULL) {
 		fprintf(stderr, "rte_gpu_mem_alloc GPU memory returned error\n");
 		goto error;
@@ -81,7 +82,8 @@ alloc_gpu_memory(uint16_t gpu_id)
 	printf("GPU memory allocated at 0x%p size is %zd bytes\n",
 			ptr_1, buf_bytes);
 
-	ptr_2 = rte_gpu_mem_alloc(gpu_id, buf_bytes);
+	/* Alloc memory on GPU 0 with 4kB alignment */
+	ptr_2 = rte_gpu_mem_alloc(gpu_id, buf_bytes, align);
 	if (ptr_2 == NULL) {
 		fprintf(stderr, "rte_gpu_mem_alloc GPU memory returned error\n");
 		goto error;
@@ -89,6 +91,11 @@ alloc_gpu_memory(uint16_t gpu_id)
 	printf("GPU memory allocated at 0x%p size is %zd bytes\n",
 			ptr_2, buf_bytes);
 
+	if (((uintptr_t)ptr_2) % align) {
+		fprintf(stderr, "Memory address 0x%p is not aligned to %u\n", ptr_2, align);
+		goto error;
+	}
+
 	ret = rte_gpu_mem_free(gpu_id, (uint8_t *)(ptr_1)+0x700);
 	if (ret < 0) {
 		printf("GPU memory 0x%p NOT freed: GPU driver didn't find this memory address internally.\n",
-- 
2.17.1



* [PATCH v2 3/3] gpu/cuda: mem alloc aligned memory
  2022-01-08  0:20 ` [PATCH v2 1/3] gpudev: " eagostini
  2022-01-08  0:20   ` [PATCH v2 2/3] app/test-gpudev: test aligned memory allocation eagostini
@ 2022-01-08  0:20   ` eagostini
  2022-01-21 10:34   ` [PATCH v2 1/3] gpudev: " Thomas Monjalon
  2 siblings, 0 replies; 12+ messages in thread
From: eagostini @ 2022-01-08  0:20 UTC (permalink / raw)
  To: dev; +Cc: Elena Agostini

From: Elena Agostini <eagostini@nvidia.com>

Implement aligned GPU memory allocation in the GPU CUDA driver.

Changelog:
- reordered the cuda_mem_alloc parameters (output pointer last)

Signed-off-by: Elena Agostini <eagostini@nvidia.com>
---
 drivers/gpu/cuda/cuda.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/cuda/cuda.c b/drivers/gpu/cuda/cuda.c
index 882df08e56..dc8d3d3b5a 100644
--- a/drivers/gpu/cuda/cuda.c
+++ b/drivers/gpu/cuda/cuda.c
@@ -139,8 +139,10 @@ typedef uintptr_t cuda_ptr_key;
 /* Single entry of the memory list */
 struct mem_entry {
 	CUdeviceptr ptr_d;
+	CUdeviceptr ptr_orig_d;
 	void *ptr_h;
 	size_t size;
+	size_t size_orig;
 	struct rte_gpu *dev;
 	CUcontext ctx;
 	cuda_ptr_key pkey;
@@ -569,7 +571,7 @@ cuda_dev_info_get(struct rte_gpu *dev, struct rte_gpu_info *info)
  */
 
 static int
-cuda_mem_alloc(struct rte_gpu *dev, size_t size, void **ptr)
+cuda_mem_alloc(struct rte_gpu *dev, size_t size, unsigned int align, void **ptr)
 {
 	CUresult res;
 	const char *err_string;
@@ -610,8 +612,10 @@ cuda_mem_alloc(struct rte_gpu *dev, size_t size, void **ptr)
 
 	/* Allocate memory */
 	mem_alloc_list_tail->size = size;
-	res = pfn_cuMemAlloc(&(mem_alloc_list_tail->ptr_d),
-			mem_alloc_list_tail->size);
+	mem_alloc_list_tail->size_orig = size + align;
+
+	res = pfn_cuMemAlloc(&(mem_alloc_list_tail->ptr_orig_d),
+			mem_alloc_list_tail->size_orig);
 	if (res != 0) {
 		pfn_cuGetErrorString(res, &(err_string));
 		rte_cuda_log(ERR, "cuCtxSetCurrent current failed with %s",
@@ -620,6 +624,13 @@ cuda_mem_alloc(struct rte_gpu *dev, size_t size, void **ptr)
 		return -rte_errno;
 	}
 
+
+	/* Align memory address */
+	mem_alloc_list_tail->ptr_d = mem_alloc_list_tail->ptr_orig_d;
+	if (align && ((uintptr_t)mem_alloc_list_tail->ptr_d) % align)
+		mem_alloc_list_tail->ptr_d += (align -
+				(((uintptr_t)mem_alloc_list_tail->ptr_d) % align));
+
 	/* GPUDirect RDMA attribute required */
 	res = pfn_cuPointerSetAttribute(&flag,
 			CU_POINTER_ATTRIBUTE_SYNC_MEMOPS,
@@ -634,7 +645,6 @@ cuda_mem_alloc(struct rte_gpu *dev, size_t size, void **ptr)
 
 	mem_alloc_list_tail->pkey = get_hash_from_ptr((void *)mem_alloc_list_tail->ptr_d);
 	mem_alloc_list_tail->ptr_h = NULL;
-	mem_alloc_list_tail->size = size;
 	mem_alloc_list_tail->dev = dev;
 	mem_alloc_list_tail->ctx = (CUcontext)((uintptr_t)dev->mpshared->info.context);
 	mem_alloc_list_tail->mtype = GPU_MEM;
@@ -761,6 +771,7 @@ cuda_mem_register(struct rte_gpu *dev, size_t size, void *ptr)
 	mem_alloc_list_tail->dev = dev;
 	mem_alloc_list_tail->ctx = (CUcontext)((uintptr_t)dev->mpshared->info.context);
 	mem_alloc_list_tail->mtype = CPU_REGISTERED;
+	mem_alloc_list_tail->ptr_orig_d = mem_alloc_list_tail->ptr_d;
 
 	/* Restore original ctx as current ctx */
 	res = pfn_cuCtxSetCurrent(current_ctx);
@@ -796,7 +807,7 @@ cuda_mem_free(struct rte_gpu *dev, void *ptr)
 	}
 
 	if (mem_item->mtype == GPU_MEM) {
-		res = pfn_cuMemFree(mem_item->ptr_d);
+		res = pfn_cuMemFree(mem_item->ptr_orig_d);
 		if (res != 0) {
 			pfn_cuGetErrorString(res, &(err_string));
 			rte_cuda_log(ERR, "cuMemFree current failed with %s",
-- 
2.17.1



* Re: [PATCH v2 1/3] gpudev: mem alloc aligned memory
  2022-01-08  0:20 ` [PATCH v2 1/3] gpudev: " eagostini
  2022-01-08  0:20   ` [PATCH v2 2/3] app/test-gpudev: test aligned memory allocation eagostini
  2022-01-08  0:20   ` [PATCH v2 3/3] gpu/cuda: mem alloc aligned memory eagostini
@ 2022-01-21 10:34   ` Thomas Monjalon
  2 siblings, 0 replies; 12+ messages in thread
From: Thomas Monjalon @ 2022-01-21 10:34 UTC (permalink / raw)
  To: Elena Agostini; +Cc: dev

08/01/2022 01:20, eagostini@nvidia.com:
> From: Elena Agostini <eagostini@nvidia.com>
> 
> Similarly to rte_malloc, rte_gpu_mem_alloc accept as
> input the memory alignment size.
> 
> GPU driver should return GPU memory address aligned
> with the input value.
> 
> Changelog:
> - rte_gpu_mem_alloc parameters order
> 
> Signed-off-by: Elena Agostini <eagostini@nvidia.com>

Squashed and applied, thanks.



