patches for DPDK stable branches
* [PATCH] dmadev: fix structure alignment
@ 2024-03-08  5:37 Wenwu Ma
  2024-03-08  7:01 ` fengchengwen
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Wenwu Ma @ 2024-03-08  5:37 UTC (permalink / raw)
  To: dev, fengchengwen; +Cc: songx.jiale, Wenwu Ma, stable

The structure rte_dma_dev needs cacheline alignment, but the return
value of malloc may not be aligned to the cacheline. Therefore,
extra memory is applied for realignment.

Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
Cc: stable@dpdk.org

Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
---
 lib/dmadev/rte_dmadev.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/lib/dmadev/rte_dmadev.c b/lib/dmadev/rte_dmadev.c
index 5953a77bd6..61e106d574 100644
--- a/lib/dmadev/rte_dmadev.c
+++ b/lib/dmadev/rte_dmadev.c
@@ -160,15 +160,25 @@ static int
 dma_dev_data_prepare(void)
 {
 	size_t size;
+	void *ptr;
 
 	if (rte_dma_devices != NULL)
 		return 0;
 
-	size = dma_devices_max * sizeof(struct rte_dma_dev);
-	rte_dma_devices = malloc(size);
-	if (rte_dma_devices == NULL)
+	/* The dma device object is expected to align cacheline, but
+	 * the return value of malloc may not be aligned to the cache line.
+	 * Therefore, extra memory is applied for realignment.
+	 * note: We do not call posix_memalign/aligned_alloc because it is
+	 * version dependent on libc.
+	 */
+	size = dma_devices_max * sizeof(struct rte_dma_dev) +
+		RTE_CACHE_LINE_SIZE;
+	ptr = malloc(size);
+	if (ptr == NULL)
 		return -ENOMEM;
-	memset(rte_dma_devices, 0, size);
+	memset(ptr, 0, size);
+
+	rte_dma_devices = RTE_PTR_ALIGN(ptr, RTE_CACHE_LINE_SIZE);
 
 	return 0;
 }
-- 
2.25.1



* Re: [PATCH] dmadev: fix structure alignment
  2024-03-08  5:37 [PATCH] dmadev: fix structure alignment Wenwu Ma
@ 2024-03-08  7:01 ` fengchengwen
  2024-03-15  1:43 ` [PATCH v2] " Wenwu Ma
  2024-03-20  7:23 ` [PATCH v3] " Wenwu Ma
  2 siblings, 0 replies; 22+ messages in thread
From: fengchengwen @ 2024-03-08  7:01 UTC (permalink / raw)
  To: Wenwu Ma, dev; +Cc: songx.jiale, stable

Acked-by: Chengwen Feng <fengchengwen@huawei.com>

On 2024/3/8 13:37, Wenwu Ma wrote:
> The structure rte_dma_dev needs cacheline alignment, but the return
> value of malloc may not be aligned to the cacheline. Therefore,
> extra memory is applied for realignment.
> 
> Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> ---
>  lib/dmadev/rte_dmadev.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/dmadev/rte_dmadev.c b/lib/dmadev/rte_dmadev.c
> index 5953a77bd6..61e106d574 100644
> --- a/lib/dmadev/rte_dmadev.c
> +++ b/lib/dmadev/rte_dmadev.c
> @@ -160,15 +160,25 @@ static int
>  dma_dev_data_prepare(void)
>  {
>  	size_t size;
> +	void *ptr;
>  
>  	if (rte_dma_devices != NULL)
>  		return 0;
>  
> -	size = dma_devices_max * sizeof(struct rte_dma_dev);
> -	rte_dma_devices = malloc(size);
> -	if (rte_dma_devices == NULL)
> +	/* The dma device object is expected to align cacheline, but
> +	 * the return value of malloc may not be aligned to the cache line.
> +	 * Therefore, extra memory is applied for realignment.
> +	 * note: We do not call posix_memalign/aligned_alloc because it is
> +	 * version dependent on libc.
> +	 */
> +	size = dma_devices_max * sizeof(struct rte_dma_dev) +
> +		RTE_CACHE_LINE_SIZE;
> +	ptr = malloc(size);
> +	if (ptr == NULL)
>  		return -ENOMEM;
> -	memset(rte_dma_devices, 0, size);
> +	memset(ptr, 0, size);
> +
> +	rte_dma_devices = RTE_PTR_ALIGN(ptr, RTE_CACHE_LINE_SIZE);
>  
>  	return 0;
>  }
> 


* [PATCH v2] dmadev: fix structure alignment
  2024-03-08  5:37 [PATCH] dmadev: fix structure alignment Wenwu Ma
  2024-03-08  7:01 ` fengchengwen
@ 2024-03-15  1:43 ` Wenwu Ma
  2024-03-15  6:02   ` Tyler Retzlaff
                     ` (2 more replies)
  2024-03-20  7:23 ` [PATCH v3] " Wenwu Ma
  2 siblings, 3 replies; 22+ messages in thread
From: Wenwu Ma @ 2024-03-15  1:43 UTC (permalink / raw)
  To: dev, fengchengwen; +Cc: songx.jiale, Wenwu Ma, stable

The structure rte_dma_dev needs only 8 byte alignment.
This patch replaces __rte_cache_aligned of rte_dma_dev
with __rte_aligned(8).

Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
Cc: stable@dpdk.org

Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
---
v2:
 - Because of performance drop, adjust the code to
   no longer demand cache line alignment

---
 lib/dmadev/rte_dmadev_pmd.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/dmadev/rte_dmadev_pmd.h b/lib/dmadev/rte_dmadev_pmd.h
index 58729088ff..b569bb3502 100644
--- a/lib/dmadev/rte_dmadev_pmd.h
+++ b/lib/dmadev/rte_dmadev_pmd.h
@@ -122,7 +122,7 @@ enum rte_dma_dev_state {
  * @internal
  * The generic data structure associated with each DMA device.
  */
-struct __rte_cache_aligned rte_dma_dev {
+struct __rte_aligned(8) rte_dma_dev {
 	/** Device info which supplied during device initialization. */
 	struct rte_device *device;
 	struct rte_dma_dev_data *data; /**< Pointer to shared device data. */
-- 
2.25.1
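
For illustration only (not part of the patch): the one-line change above
alters what the compiler may assume about the address of every
struct rte_dma_dev object. The sketch below uses hypothetical stand-in
structs (demo_dev_cache_aligned, demo_dev_8_aligned) and an assumed 64-byte
cache line in place of RTE_CACHE_LINE_SIZE, and spells out the GCC/clang
aligned attribute directly instead of going through the DPDK macros.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdalign.h> /* alignof */
#include <stddef.h>   /* size_t, max_align_t */

/* Assumption: 64-byte cache line, standing in for RTE_CACHE_LINE_SIZE. */
#define DEMO_CACHE_LINE 64

/* Hypothetical stand-in for the declaration before the change. */
struct __attribute__((aligned(DEMO_CACHE_LINE))) demo_dev_cache_aligned {
	void *device;
	void *data;
};

/* Hypothetical stand-in for the declaration after the change. */
struct __attribute__((aligned(8))) demo_dev_8_aligned {
	void *device;
	void *data;
};

/* The attribute raises both the alignment and the padded size. */
static_assert(alignof(struct demo_dev_cache_aligned) == DEMO_CACHE_LINE,
	      "cache-aligned struct must report cache-line alignment");
static_assert(sizeof(struct demo_dev_cache_aligned) % DEMO_CACHE_LINE == 0,
	      "size is padded to a multiple of the alignment");

/* Alignment the compiler may assume for the cache-aligned variant. */
size_t
demo_cache_aligned_alignment(void)
{
	return alignof(struct demo_dev_cache_aligned);
}

/* Alignment the compiler may assume for the 8-byte variant. */
size_t
demo_8_aligned_alignment(void)
{
	return alignof(struct demo_dev_8_aligned);
}

/* malloc() only promises alignment suitable for fundamental types. */
size_t
demo_malloc_guaranteed_alignment(void)
{
	return alignof(max_align_t);
}
```

When demo_malloc_guaranteed_alignment() is smaller than
demo_cache_aligned_alignment(), a plain malloc() result may violate the
alignment the compiler is entitled to assume for the first struct. Lowering
the declared alignment to 8 removes that assumption instead of satisfying it.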



* Re: [PATCH v2] dmadev: fix structure alignment
  2024-03-15  1:43 ` [PATCH v2] " Wenwu Ma
@ 2024-03-15  6:02   ` Tyler Retzlaff
  2024-03-15  6:06   ` fengchengwen
  2024-03-19  9:48   ` Jiale, SongX
  2 siblings, 0 replies; 22+ messages in thread
From: Tyler Retzlaff @ 2024-03-15  6:02 UTC (permalink / raw)
  To: Wenwu Ma; +Cc: dev, fengchengwen, songx.jiale, stable

On Fri, Mar 15, 2024 at 09:43:31AM +0800, Wenwu Ma wrote:
> The structure rte_dma_dev needs only 8 byte alignment.
> This patch replaces __rte_cache_aligned of rte_dma_dev
> with __rte_aligned(8).
> 
> Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> ---

Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>



* Re: [PATCH v2] dmadev: fix structure alignment
  2024-03-15  1:43 ` [PATCH v2] " Wenwu Ma
  2024-03-15  6:02   ` Tyler Retzlaff
@ 2024-03-15  6:06   ` fengchengwen
  2024-03-15  6:25     ` Ma, WenwuX
  2024-03-19  9:48   ` Jiale, SongX
  2 siblings, 1 reply; 22+ messages in thread
From: fengchengwen @ 2024-03-15  6:06 UTC (permalink / raw)
  To: Wenwu Ma, dev; +Cc: songx.jiale, stable

Hi Wenwu,

On 2024/3/15 9:43, Wenwu Ma wrote:
> The structure rte_dma_dev needs only 8 byte alignment.
> This patch replaces __rte_cache_aligned of rte_dma_dev
> with __rte_aligned(8).
> 
> Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> ---
> v2:
>  - Because of performance drop, adjust the code to
>    no longer demand cache line alignment

Between which two versions was the performance drop observed, and in which benchmark?
Could you provide more information?

> 
> ---
>  lib/dmadev/rte_dmadev_pmd.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/dmadev/rte_dmadev_pmd.h b/lib/dmadev/rte_dmadev_pmd.h
> index 58729088ff..b569bb3502 100644
> --- a/lib/dmadev/rte_dmadev_pmd.h
> +++ b/lib/dmadev/rte_dmadev_pmd.h
> @@ -122,7 +122,7 @@ enum rte_dma_dev_state {
>   * @internal
>   * The generic data structure associated with each DMA device.
>   */
> -struct __rte_cache_aligned rte_dma_dev {
> +struct __rte_aligned(8) rte_dma_dev {

The DMA fast path is implemented by struct rte_dma_fp_objs, not by
rte_dma_dev, so why is it a problem here?

Thanks

>  	/** Device info which supplied during device initialization. */
>  	struct rte_device *device;
>  	struct rte_dma_dev_data *data; /**< Pointer to shared device data. */
> 


* RE: [PATCH v2] dmadev: fix structure alignment
  2024-03-15  6:06   ` fengchengwen
@ 2024-03-15  6:25     ` Ma, WenwuX
  2024-03-15  7:44       ` Ma, WenwuX
  0 siblings, 1 reply; 22+ messages in thread
From: Ma, WenwuX @ 2024-03-15  6:25 UTC (permalink / raw)
  To: fengchengwen, dev; +Cc: Jiale, SongX, stable

Hi Chengwen,

> -----Original Message-----
> From: fengchengwen <fengchengwen@huawei.com>
> Sent: Friday, March 15, 2024 2:06 PM
> To: Ma, WenwuX <wenwux.ma@intel.com>; dev@dpdk.org
> Cc: Jiale, SongX <songx.jiale@intel.com>; stable@dpdk.org
> Subject: Re: [PATCH v2] dmadev: fix structure alignment
> 
> Hi Wenwu,
> 
> On 2024/3/15 9:43, Wenwu Ma wrote:
> > The structure rte_dma_dev needs only 8 byte alignment.
> > This patch replaces __rte_cache_aligned of rte_dma_dev with
> > __rte_aligned(8).
> >
> > Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> > ---
> > v2:
> >  - Because of performance drop, adjust the code to
> >    no longer demand cache line alignment
> 
> Which two versions observed performance drop? And which benchmark
> observed drop?
> Could you provide more information?
> 
> >
V1 patch:
https://patches.dpdk.org/project/dpdk/patch/20240308053711.1260154-1-wenwux.ma@intel.com/

To view detailed results, visit:
https://lab.dpdk.org/results/dashboard/patchsets/29472/

> > ---
> >  lib/dmadev/rte_dmadev_pmd.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/lib/dmadev/rte_dmadev_pmd.h
> b/lib/dmadev/rte_dmadev_pmd.h
> > index 58729088ff..b569bb3502 100644
> > --- a/lib/dmadev/rte_dmadev_pmd.h
> > +++ b/lib/dmadev/rte_dmadev_pmd.h
> > @@ -122,7 +122,7 @@ enum rte_dma_dev_state {
> >   * @internal
> >   * The generic data structure associated with each DMA device.
> >   */
> > -struct __rte_cache_aligned rte_dma_dev {
> > +struct __rte_aligned(8) rte_dma_dev {
> 
> The DMA fast-path was implemented by struct rte_dma_fp_objs, which is not
> rte_dma_dev? So why is it a problem here?
> 
> Thanks
> 
The DMA device object is declared cache-line aligned, so clang uses the “vmovaps” assembly instruction to access it.

That instruction requires an aligned memory operand (16 bytes for xmm, 32 bytes for ymm); otherwise it causes a segmentation fault in some environments.


> >  	/** Device info which supplied during device initialization. */
> >  	struct rte_device *device;
> >  	struct rte_dma_dev_data *data; /**< Pointer to shared device data.
> > */
> >


* RE: [PATCH v2] dmadev: fix structure alignment
  2024-03-15  6:25     ` Ma, WenwuX
@ 2024-03-15  7:44       ` Ma, WenwuX
  2024-03-15  8:31         ` fengchengwen
  0 siblings, 1 reply; 22+ messages in thread
From: Ma, WenwuX @ 2024-03-15  7:44 UTC (permalink / raw)
  To: fengchengwen, dev; +Cc: Jiale, SongX, stable

Hi Chengwen,

> -----Original Message-----
> From: Ma, WenwuX
> Sent: Friday, March 15, 2024 2:26 PM
> To: fengchengwen <fengchengwen@huawei.com>; dev@dpdk.org
> Cc: Jiale, SongX <songx.jiale@intel.com>; stable@dpdk.org
> Subject: RE: [PATCH v2] dmadev: fix structure alignment
> 
> Hi Chengwen,
> 
> > -----Original Message-----
> > From: fengchengwen <fengchengwen@huawei.com>
> > Sent: Friday, March 15, 2024 2:06 PM
> > To: Ma, WenwuX <wenwux.ma@intel.com>; dev@dpdk.org
> > Cc: Jiale, SongX <songx.jiale@intel.com>; stable@dpdk.org
> > Subject: Re: [PATCH v2] dmadev: fix structure alignment
> >
> > Hi Wenwu,
> >
> > On 2024/3/15 9:43, Wenwu Ma wrote:
> > > The structure rte_dma_dev needs only 8 byte alignment.
> > > This patch replaces __rte_cache_aligned of rte_dma_dev with
> > > __rte_aligned(8).
> > >
> > > Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
> > > Cc: stable@dpdk.org
> > >
> > > Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> > > ---
> > > v2:
> > >  - Because of performance drop, adjust the code to
> > >    no longer demand cache line alignment
> >
> > Which two versions observed performance drop? And which benchmark
> > observed drop?
> > Could you provide more information?
> >
> > >
> V1 patch:
> https://patches.dpdk.org/project/dpdk/patch/20240308053711.1260154-
> 1-wenwux.ma@intel.com/
> 
> To view detailed results, visit:
> https://lab.dpdk.org/results/dashboard/patchsets/29472/
> 
> > > ---
> > >  lib/dmadev/rte_dmadev_pmd.h | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/lib/dmadev/rte_dmadev_pmd.h
> > b/lib/dmadev/rte_dmadev_pmd.h
> > > index 58729088ff..b569bb3502 100644
> > > --- a/lib/dmadev/rte_dmadev_pmd.h
> > > +++ b/lib/dmadev/rte_dmadev_pmd.h
> > > @@ -122,7 +122,7 @@ enum rte_dma_dev_state {
> > >   * @internal
> > >   * The generic data structure associated with each DMA device.
> > >   */
> > > -struct __rte_cache_aligned rte_dma_dev {
> > > +struct __rte_aligned(8) rte_dma_dev {
> >
> > The DMA fast-path was implemented by struct rte_dma_fp_objs, which is
> > not rte_dma_dev? So why is it a problem here?
> >
> > Thanks
> >
> The DMA device object is expected to align cache line, so clang will use
> “vmovaps” assembly instruction,
> 
> And the instruction demands 16 bytes alignment or will cause segment fault in
> some environments.
> 
Test case:
1. Compile DPDK:
   rm -rf x86_64-native-linuxapp-clang
   CC=clang meson -Denable_kmods=True -Dlibdir=lib --default-library=static x86_64-native-linuxapp-clang
   ninja -C x86_64-native-linuxapp-clang -j 72
2. Start dpdk-test:
   /root/dpdk/x86_64-native-linuxapp-clang/app/dpdk-test -l 0-39 --vdev=dma_skeleton -a 31:00.0 -a 31:00.1 -a 31:00.2 -a 31:00.3
   (Note: If it cannot be reproduced, please try using a different core)
3. Exit dpdk-test:
   RTE>>quit
   Segmentation fault (core dumped)

> 
> > >  	/** Device info which supplied during device initialization. */
> > >  	struct rte_device *device;
> > >  	struct rte_dma_dev_data *data; /**< Pointer to shared device data.
> > > */
> > >


* Re: [PATCH v2] dmadev: fix structure alignment
  2024-03-15  7:44       ` Ma, WenwuX
@ 2024-03-15  8:31         ` fengchengwen
  2024-03-15  9:27           ` Ma, WenwuX
  0 siblings, 1 reply; 22+ messages in thread
From: fengchengwen @ 2024-03-15  8:31 UTC (permalink / raw)
  To: Ma, WenwuX, dev; +Cc: Jiale, SongX, stable

Hi Wenwu,

On 2024/3/15 15:44, Ma, WenwuX wrote:
> Hi Chengwen,
> 
>> -----Original Message-----
>> From: Ma, WenwuX
>> Sent: Friday, March 15, 2024 2:26 PM
>> To: fengchengwen <fengchengwen@huawei.com>; dev@dpdk.org
>> Cc: Jiale, SongX <songx.jiale@intel.com>; stable@dpdk.org
>> Subject: RE: [PATCH v2] dmadev: fix structure alignment
>>
>> Hi Chengwen,
>>
>>> -----Original Message-----
>>> From: fengchengwen <fengchengwen@huawei.com>
>>> Sent: Friday, March 15, 2024 2:06 PM
>>> To: Ma, WenwuX <wenwux.ma@intel.com>; dev@dpdk.org
>>> Cc: Jiale, SongX <songx.jiale@intel.com>; stable@dpdk.org
>>> Subject: Re: [PATCH v2] dmadev: fix structure alignment
>>>
>>> Hi Wenwu,
>>>
>>> On 2024/3/15 9:43, Wenwu Ma wrote:
>>>> The structure rte_dma_dev needs only 8 byte alignment.
>>>> This patch replaces __rte_cache_aligned of rte_dma_dev with
>>>> __rte_aligned(8).
>>>>
>>>> Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
>>>> Cc: stable@dpdk.org
>>>>
>>>> Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
>>>> ---
>>>> v2:
>>>>  - Because of performance drop, adjust the code to
>>>>    no longer demand cache line alignment
>>>
>>> Which two versions observed performance drop? And which benchmark
>>> observed drop?
>>> Could you provide more information?
>>>
>>>>
>> V1 patch:
>> https://patches.dpdk.org/project/dpdk/patch/20240308053711.1260154-
>> 1-wenwux.ma@intel.com/
>>
>> To view detailed results, visit:
>> https://lab.dpdk.org/results/dashboard/patchsets/29472/
>>
>>>> ---
>>>>  lib/dmadev/rte_dmadev_pmd.h | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/lib/dmadev/rte_dmadev_pmd.h
>>> b/lib/dmadev/rte_dmadev_pmd.h
>>>> index 58729088ff..b569bb3502 100644
>>>> --- a/lib/dmadev/rte_dmadev_pmd.h
>>>> +++ b/lib/dmadev/rte_dmadev_pmd.h
>>>> @@ -122,7 +122,7 @@ enum rte_dma_dev_state {
>>>>   * @internal
>>>>   * The generic data structure associated with each DMA device.
>>>>   */
>>>> -struct __rte_cache_aligned rte_dma_dev {
>>>> +struct __rte_aligned(8) rte_dma_dev {
>>>
>>> The DMA fast-path was implemented by struct rte_dma_fp_objs, which is
>>> not rte_dma_dev? So why is it a problem here?
>>>
>>> Thanks
>>>
>> The DMA device object is expected to align cache line, so clang will use
>> “vmovaps” assembly instruction,
>>
>> And the instruction demands 16 bytes alignment or will cause segment fault in
>> some environments.
>>
> Test case:
> 1. compile dpdk 
> rm -rf x86_64-native-linuxapp-clang
> CC=clang meson -Denable_kmods=True -Dlibdir=lib --default-library=static x86_64-native-linuxapp-clang
> ninja -C x86_64-native-linuxapp-clang -j 72 
> 2. start dpdk-test
> /root/dpdk/x86_64-native-linuxapp-clang/app/dpdk-test -l 0-39 --vdev=dma_skeleton -a 31:00.0 -a 31:00.1 -a 31:00.2 -a 31:00.3 (Note: If it cannot be reproduced, please try using a different core)
> 3. exit dpdk-test
> RTE>>quit
> Segmentation fault (core dumped)

I will try to reproduce it, but one question remains: was the above test run with your patch [1] applied, or does the current main branch code have this problem?

[1] https://patches.dpdk.org/project/dpdk/patch/20240308053711.1260154-1-wenwux.ma@intel.com/

Thanks

> 
>>
>>>>  	/** Device info which supplied during device initialization. */
>>>>  	struct rte_device *device;
>>>>  	struct rte_dma_dev_data *data; /**< Pointer to shared device data.
>>>> */
>>>>


* RE: [PATCH v2] dmadev: fix structure alignment
  2024-03-15  8:31         ` fengchengwen
@ 2024-03-15  9:27           ` Ma, WenwuX
  2024-03-20  4:11             ` fengchengwen
  0 siblings, 1 reply; 22+ messages in thread
From: Ma, WenwuX @ 2024-03-15  9:27 UTC (permalink / raw)
  To: fengchengwen, dev; +Cc: Jiale, SongX, stable

Hi Chengwen

> -----Original Message-----
> From: fengchengwen <fengchengwen@huawei.com>
> Sent: Friday, March 15, 2024 4:32 PM
> To: Ma, WenwuX <wenwux.ma@intel.com>; dev@dpdk.org
> Cc: Jiale, SongX <songx.jiale@intel.com>; stable@dpdk.org
> Subject: Re: [PATCH v2] dmadev: fix structure alignment
> 
> Hi Wenwu,
> 
> On 2024/3/15 15:44, Ma, WenwuX wrote:
> > Hi Chengwen,
> >
> >> -----Original Message-----
> >> From: Ma, WenwuX
> >> Sent: Friday, March 15, 2024 2:26 PM
> >> To: fengchengwen <fengchengwen@huawei.com>; dev@dpdk.org
> >> Cc: Jiale, SongX <songx.jiale@intel.com>; stable@dpdk.org
> >> Subject: RE: [PATCH v2] dmadev: fix structure alignment
> >>
> >> Hi Chengwen,
> >>
> >>> -----Original Message-----
> >>> From: fengchengwen <fengchengwen@huawei.com>
> >>> Sent: Friday, March 15, 2024 2:06 PM
> >>> To: Ma, WenwuX <wenwux.ma@intel.com>; dev@dpdk.org
> >>> Cc: Jiale, SongX <songx.jiale@intel.com>; stable@dpdk.org
> >>> Subject: Re: [PATCH v2] dmadev: fix structure alignment
> >>>
> >>> Hi Wenwu,
> >>>
> >>> On 2024/3/15 9:43, Wenwu Ma wrote:
> >>>> The structure rte_dma_dev needs only 8 byte alignment.
> >>>> This patch replaces __rte_cache_aligned of rte_dma_dev with
> >>>> __rte_aligned(8).
> >>>>
> >>>> Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
> >>>> Cc: stable@dpdk.org
> >>>>
> >>>> Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> >>>> ---
> >>>> v2:
> >>>>  - Because of performance drop, adjust the code to
> >>>>    no longer demand cache line alignment
> >>>
> >>> Which two versions observed performance drop? And which benchmark
> >>> observed drop?
> >>> Could you provide more information?
> >>>
> >>>>
> >> V1 patch:
> >>
> https://patches.dpdk.org/project/dpdk/patch/20240308053711.1260154-
> >> 1-wenwux.ma@intel.com/
> >>
> >> To view detailed results, visit:
> >> https://lab.dpdk.org/results/dashboard/patchsets/29472/
> >>
> >>>> ---
> >>>>  lib/dmadev/rte_dmadev_pmd.h | 2 +-
> >>>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/lib/dmadev/rte_dmadev_pmd.h
> >>> b/lib/dmadev/rte_dmadev_pmd.h
> >>>> index 58729088ff..b569bb3502 100644
> >>>> --- a/lib/dmadev/rte_dmadev_pmd.h
> >>>> +++ b/lib/dmadev/rte_dmadev_pmd.h
> >>>> @@ -122,7 +122,7 @@ enum rte_dma_dev_state {
> >>>>   * @internal
> >>>>   * The generic data structure associated with each DMA device.
> >>>>   */
> >>>> -struct __rte_cache_aligned rte_dma_dev {
> >>>> +struct __rte_aligned(8) rte_dma_dev {
> >>>
> >>> The DMA fast-path was implemented by struct rte_dma_fp_objs, which
> >>> is not rte_dma_dev? So why is it a problem here?
> >>>
> >>> Thanks
> >>>
> >> The DMA device object is expected to align cache line, so clang will
> >> use “vmovaps” assembly instruction,
> >>
> >> And the instruction demands 16 bytes alignment or will cause segment
> >> fault in some environments.
> >>
> > Test case:
> > 1. compile dpdk
> > rm -rf x86_64-native-linuxapp-clang
> > CC=clang meson -Denable_kmods=True -Dlibdir=lib
> > --default-library=static x86_64-native-linuxapp-clang ninja -C
> > x86_64-native-linuxapp-clang -j 72 2. start dpdk-test
> > /root/dpdk/x86_64-native-linuxapp-clang/app/dpdk-test -l 0-39
> > --vdev=dma_skeleton -a 31:00.0 -a 31:00.1 -a 31:00.2 -a 31:00.3 (Note:
> > If it cannot be reproduced, please try using a different core)
> > 3. exit dpdk-test
> > RTE>>quit
> > Segmentation fault (core dumped)
> 
> I will try to reproduce, but still a question: does above test has already merged
> your patch [1] or the current main branch code has this problem?
> 
> [1]
> https://patches.dpdk.org/project/dpdk/patch/20240308053711.1260154-
> 1-wenwux.ma@intel.com/
> 
> Thanks
> 
The current main branch code has this problem.

Both patch v1 and patch v2 solve it, but v1 has a performance issue.

> >
> >>
> >>>>  	/** Device info which supplied during device initialization. */
> >>>>  	struct rte_device *device;
> >>>>  	struct rte_dma_dev_data *data; /**< Pointer to shared device data.
> >>>> */
> >>>>


* RE: [PATCH v2] dmadev: fix structure alignment
  2024-03-15  1:43 ` [PATCH v2] " Wenwu Ma
  2024-03-15  6:02   ` Tyler Retzlaff
  2024-03-15  6:06   ` fengchengwen
@ 2024-03-19  9:48   ` Jiale, SongX
  2 siblings, 0 replies; 22+ messages in thread
From: Jiale, SongX @ 2024-03-19  9:48 UTC (permalink / raw)
  To: Ma, WenwuX, dev, fengchengwen; +Cc: stable

> -----Original Message-----
> From: Ma, WenwuX <wenwux.ma@intel.com>
> Sent: Friday, March 15, 2024 9:44 AM
> To: dev@dpdk.org; fengchengwen@huawei.com
> Cc: Jiale, SongX <songx.jiale@intel.com>; Ma, WenwuX
> <wenwux.ma@intel.com>; stable@dpdk.org
> Subject: [PATCH v2] dmadev: fix structure alignment
> 
> The structure rte_dma_dev needs only 8 byte alignment.
> This patch replaces __rte_cache_aligned of rte_dma_dev with
> __rte_aligned(8).
> 
> Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> ---
Tested-by: Jiale Song <songx.jiale@intel.com>


* Re: [PATCH v2] dmadev: fix structure alignment
  2024-03-15  9:27           ` Ma, WenwuX
@ 2024-03-20  4:11             ` fengchengwen
  2024-03-20  7:34               ` Ma, WenwuX
  0 siblings, 1 reply; 22+ messages in thread
From: fengchengwen @ 2024-03-20  4:11 UTC (permalink / raw)
  To: Ma, WenwuX, dev; +Cc: Jiale, SongX, stable, Pavan Nikhilesh, Thomas Monjalon

Hi Wenwu,

On 2024/3/15 17:27, Ma, WenwuX wrote:
> Hi Chengwen
> 
>> -----Original Message-----
>> From: fengchengwen <fengchengwen@huawei.com>
>> Sent: Friday, March 15, 2024 4:32 PM
>> To: Ma, WenwuX <wenwux.ma@intel.com>; dev@dpdk.org
>> Cc: Jiale, SongX <songx.jiale@intel.com>; stable@dpdk.org
>> Subject: Re: [PATCH v2] dmadev: fix structure alignment
>>
>> Hi Wenwu,
>>
>> On 2024/3/15 15:44, Ma, WenwuX wrote:
>>> Hi Chengwen,
>>>
>>>> -----Original Message-----
>>>> From: Ma, WenwuX
>>>> Sent: Friday, March 15, 2024 2:26 PM
>>>> To: fengchengwen <fengchengwen@huawei.com>; dev@dpdk.org
>>>> Cc: Jiale, SongX <songx.jiale@intel.com>; stable@dpdk.org
>>>> Subject: RE: [PATCH v2] dmadev: fix structure alignment
>>>>
>>>> Hi Chengwen,
>>>>
>>>>> -----Original Message-----
>>>>> From: fengchengwen <fengchengwen@huawei.com>
>>>>> Sent: Friday, March 15, 2024 2:06 PM
>>>>> To: Ma, WenwuX <wenwux.ma@intel.com>; dev@dpdk.org
>>>>> Cc: Jiale, SongX <songx.jiale@intel.com>; stable@dpdk.org
>>>>> Subject: Re: [PATCH v2] dmadev: fix structure alignment
>>>>>
>>>>> Hi Wenwu,
>>>>>
>>>>> On 2024/3/15 9:43, Wenwu Ma wrote:
>>>>>> The structure rte_dma_dev needs only 8 byte alignment.
>>>>>> This patch replaces __rte_cache_aligned of rte_dma_dev with
>>>>>> __rte_aligned(8).
>>>>>>
>>>>>> Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
>>>>>> Cc: stable@dpdk.org
>>>>>>
>>>>>> Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
>>>>>> ---
>>>>>> v2:
>>>>>>  - Because of performance drop, adjust the code to
>>>>>>    no longer demand cache line alignment
>>>>>
>>>>> Which two versions observed performance drop? And which benchmark
>>>>> observed drop?
>>>>> Could you provide more information?
>>>>>
>>>>>>
>>>> V1 patch:
>>>>
>> https://patches.dpdk.org/project/dpdk/patch/20240308053711.1260154-
>>>> 1-wenwux.ma@intel.com/
>>>>
>>>> To view detailed results, visit:
>>>> https://lab.dpdk.org/results/dashboard/patchsets/29472/
>>>>
>>>>>> ---
>>>>>>  lib/dmadev/rte_dmadev_pmd.h | 2 +-
>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/lib/dmadev/rte_dmadev_pmd.h
>>>>> b/lib/dmadev/rte_dmadev_pmd.h
>>>>>> index 58729088ff..b569bb3502 100644
>>>>>> --- a/lib/dmadev/rte_dmadev_pmd.h
>>>>>> +++ b/lib/dmadev/rte_dmadev_pmd.h
>>>>>> @@ -122,7 +122,7 @@ enum rte_dma_dev_state {
>>>>>>   * @internal
>>>>>>   * The generic data structure associated with each DMA device.
>>>>>>   */
>>>>>> -struct __rte_cache_aligned rte_dma_dev {
>>>>>> +struct __rte_aligned(8) rte_dma_dev {
>>>>>
>>>>> The DMA fast-path was implemented by struct rte_dma_fp_objs, which
>>>>> is not rte_dma_dev? So why is it a problem here?
>>>>>
>>>>> Thanks
>>>>>
>>>> The DMA device object is expected to align cache line, so clang will
>>>> use “vmovaps” assembly instruction,
>>>>
>>>> And the instruction demands 16 bytes alignment or will cause segment
>>>> fault in some environments.
>>>>
>>> Test case:
>>> 1. compile dpdk
>>> rm -rf x86_64-native-linuxapp-clang
>>> CC=clang meson -Denable_kmods=True -Dlibdir=lib
>>> --default-library=static x86_64-native-linuxapp-clang ninja -C
>>> x86_64-native-linuxapp-clang -j 72 2. start dpdk-test
>>> /root/dpdk/x86_64-native-linuxapp-clang/app/dpdk-test -l 0-39
>>> --vdev=dma_skeleton -a 31:00.0 -a 31:00.1 -a 31:00.2 -a 31:00.3 (Note:
>>> If it cannot be reproduced, please try using a different core)
>>> 3. exit dpdk-test
>>> RTE>>quit
>>> Segmentation fault (core dumped)

I reproduced it with just --vdev=dma_skeleton.
Executing the quit command invokes rte_dma_close -> dma_release; please see my annotations (//) below:

void
dma_release(struct rte_dma_dev *dev)
{
	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
		rte_free(dev->data->dev_private);
		memset(dev->data, 0, sizeof(struct rte_dma_dev_data));
	}

	dma_fp_object_dummy(dev->fp_obj);
	memset(dev, 0, sizeof(struct rte_dma_dev));   // this memset was compiled to vmovaps stores:
						//  8c24da:       c5 f8 57 c0             vxorps %xmm0,%xmm0,%xmm0
						//  8c24de:       c5 fc 29 43 20          vmovaps %ymm0,0x20(%rbx)
						//  8c24e3:       c5 fc 29 03             vmovaps %ymm0,(%rbx)
						// but dev is not aligned as these stores require: in my env the rte_dma_devices
						// addr is 0x15d39950, which is 16B-aligned but not 32B-aligned (ymm needs 32B)
}
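
A quick check of the address quoted above, as a sketch: the helper name
demo_misalignment is hypothetical and only does the modulo arithmetic. The
256-bit (ymm) form of vmovaps requires a 32-byte-aligned memory operand, so
the question is what remainder 0x15d39950 leaves for each power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: remainder of addr modulo align.
 * A result of 0 means addr is aligned; align must be a power of two.
 */
size_t
demo_misalignment(uintptr_t addr, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (size_t)(addr & (align - 1));
}
```

For 0x15d39950 the low byte is 0x50 (80), so demo_misalignment() gives 0 for
align 16, and 16 for align 32 and align 64. The store to 0x20(%rbx) lands at
an address ending in 0x70, which is also off by 16 from a 32-byte boundary.
Assuming sizeof(struct rte_dma_dev) is a multiple of 64, every element of the
array shares this remainder.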

>>
>> I will try to reproduce, but still a question: does above test has already merged
>> your patch [1] or the current main branch code has this problem?
>>
>> [1]
>> https://patches.dpdk.org/project/dpdk/patch/20240308053711.1260154-
>> 1-wenwux.ma@intel.com/
>>
>> Thanks
>>
> the current main branch code has this problem.
> 
> Both patch v1 and v2 are able to solve this problem, but v1 has a performance issue.

The performance issue was reported by an ethdev benchmark, which does not invoke any dmadev API, so I don't think the two are related.

So I prefer v1. In addition, Pavan submitted a commit [1] to align the struct, although it was not a fix for the clang x86 platform.

[1] https://lore.kernel.org/all/20240210062758.1510-1-pbhagavatula@marvell.com/T/

> 
>>>
>>>>
>>>>>>  	/** Device info which supplied during device initialization. */
>>>>>>  	struct rte_device *device;
>>>>>>  	struct rte_dma_dev_data *data; /**< Pointer to shared device data.
>>>>>> */
>>>>>>

Also, could you please send a v3? I hope it will describe the root cause and the possible solutions of the segmentation fault.

BTW: dmadev is the first library to allocate its device struct array dynamically; more xxxdev libraries may do the same later, so I think this case is typical.
     Maybe we should add such a mem_align() function to the EAL library, but this could be done later.
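
As a rough sketch of what such a helper could look like: the names
demo_zmalloc_aligned and demo_free_aligned are hypothetical and are not an
existing EAL API. Unlike the patch, this version stores the raw malloc()
pointer just in front of the aligned block so that the memory can be freed
later.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return zeroed memory whose address is a multiple
 * of align (a power of two), built on plain malloc().
 */
void *
demo_zmalloc_aligned(size_t size, size_t align)
{
	void *raw;
	uintptr_t aligned;

	assert(align != 0 && (align & (align - 1)) == 0);
	/* Reject sizes that would overflow once padding is added. */
	if (align > SIZE_MAX / 2 || size > SIZE_MAX - align - sizeof(void *))
		return NULL;

	/* Room for the payload, worst-case padding and the saved pointer. */
	raw = malloc(size + align + sizeof(void *));
	if (raw == NULL)
		return NULL;

	/* Skip the header slot, then round up to the next multiple of align. */
	aligned = ((uintptr_t)raw + sizeof(void *) + align - 1) &
		  ~(uintptr_t)(align - 1);

	/* Remember the raw pointer immediately before the aligned block. */
	memcpy((void *)(aligned - sizeof(void *)), &raw, sizeof(raw));
	memset((void *)aligned, 0, size);

	return (void *)aligned;
}

/* Hypothetical counterpart: free memory from demo_zmalloc_aligned(). */
void
demo_free_aligned(void *ptr)
{
	void *raw;

	if (ptr == NULL)
		return;
	memcpy(&raw, (char *)ptr - sizeof(void *), sizeof(raw));
	free(raw);
}
```

A caller would use demo_zmalloc_aligned(n * sizeof(struct foo), 64) in place
of malloc() plus memset(), and demo_free_aligned() in place of free(). The
cost is up to align + sizeof(void *) extra bytes per allocation.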

Thanks



* [PATCH v3] dmadev: fix structure alignment
  2024-03-08  5:37 [PATCH] dmadev: fix structure alignment Wenwu Ma
  2024-03-08  7:01 ` fengchengwen
  2024-03-15  1:43 ` [PATCH v2] " Wenwu Ma
@ 2024-03-20  7:23 ` Wenwu Ma
  2024-03-20  9:31   ` fengchengwen
  2024-03-20 11:37   ` Thomas Monjalon
  2 siblings, 2 replies; 22+ messages in thread
From: Wenwu Ma @ 2024-03-20  7:23 UTC (permalink / raw)
  To: dev, fengchengwen; +Cc: songx.jiale, Wenwu Ma, stable

The structure rte_dma_dev needs to be aligned to the cache line, but
the return value of malloc may not be aligned to the cache line. When
we use memset to clear the rte_dma_dev object, it may cause a segmentation
fault on the clang x86 platform.

This is because clang uses the "vmovaps" assembly instruction for
memset, which requires that its memory operand (the rte_dma_dev object)
be aligned on a 16-byte boundary (32-byte for the ymm form), or a
general-protection exception (#GP) is generated.

Therefore, either additional memory is allocated for re-alignment, or the
rte_dma_dev object must not require cache line alignment. The patch
chooses the former option to fix the issue.

Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
Cc: stable@dpdk.org

Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
---
v2:
 - Because of performance drop, adjust the code to
   no longer demand cache line alignment
v3:
 - back to v1 patch

---
 lib/dmadev/rte_dmadev.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/lib/dmadev/rte_dmadev.c b/lib/dmadev/rte_dmadev.c
index 5953a77bd6..61e106d574 100644
--- a/lib/dmadev/rte_dmadev.c
+++ b/lib/dmadev/rte_dmadev.c
@@ -160,15 +160,25 @@ static int
 dma_dev_data_prepare(void)
 {
 	size_t size;
+	void *ptr;
 
 	if (rte_dma_devices != NULL)
 		return 0;
 
-	size = dma_devices_max * sizeof(struct rte_dma_dev);
-	rte_dma_devices = malloc(size);
-	if (rte_dma_devices == NULL)
+	/* The dma device object is expected to align cacheline, but
+	 * the return value of malloc may not be aligned to the cache line.
+	 * Therefore, extra memory is applied for realignment.
+	 * note: We do not call posix_memalign/aligned_alloc because it is
+	 * version dependent on libc.
+	 */
+	size = dma_devices_max * sizeof(struct rte_dma_dev) +
+		RTE_CACHE_LINE_SIZE;
+	ptr = malloc(size);
+	if (ptr == NULL)
 		return -ENOMEM;
-	memset(rte_dma_devices, 0, size);
+	memset(ptr, 0, size);
+
+	rte_dma_devices = RTE_PTR_ALIGN(ptr, RTE_CACHE_LINE_SIZE);
 
 	return 0;
 }
-- 
2.25.1
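
For illustration only (not part of the patch): the round-up step can be
checked with plain integer arithmetic. The helper demo_align_ceil below is a
hypothetical stand-in that assumes RTE_PTR_ALIGN rounds an address up to the
next multiple of a power-of-two alignment, as its use in the diff implies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the rounding done in the patch: round addr up
 * to the next multiple of align (a power of two). An address that is
 * already aligned is returned unchanged.
 */
uintptr_t
demo_align_ceil(uintptr_t addr, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(uintptr_t)(align - 1);
}

/* Bytes skipped at the start of the buffer; always less than align. */
size_t
demo_align_padding(uintptr_t addr, size_t align)
{
	return (size_t)(demo_align_ceil(addr, align) - addr);
}
```

Taking the address 0x15d39950 reported earlier in this thread and a 64-byte
cache line, the aligned pointer would be 0x15d39980 and 48 bytes would be
skipped. Because the padding is always smaller than the alignment, one extra
RTE_CACHE_LINE_SIZE bytes in the malloc() size is enough to keep the whole
array inside the allocation.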



* RE: [PATCH v2] dmadev: fix structure alignment
  2024-03-20  4:11             ` fengchengwen
@ 2024-03-20  7:34               ` Ma, WenwuX
  0 siblings, 0 replies; 22+ messages in thread
From: Ma, WenwuX @ 2024-03-20  7:34 UTC (permalink / raw)
  To: fengchengwen, dev; +Cc: Jiale, SongX, stable, Pavan Nikhilesh, Thomas Monjalon

Hi chengwen,

> -----Original Message-----
> From: fengchengwen <fengchengwen@huawei.com>
> Sent: Wednesday, March 20, 2024 12:12 PM
> To: Ma, WenwuX <wenwux.ma@intel.com>; dev@dpdk.org
> Cc: Jiale, SongX <songx.jiale@intel.com>; stable@dpdk.org; Pavan Nikhilesh
> <pbhagavatula@marvell.com>; Thomas Monjalon <thomas@monjalon.net>
> Subject: Re: [PATCH v2] dmadev: fix structure alignment
> 
> Hi Wenwu,
> 
> On 2024/3/15 17:27, Ma, WenwuX wrote:
> > Hi Chengwen
> >
> >> -----Original Message-----
> >> From: fengchengwen <fengchengwen@huawei.com>
> >> Sent: Friday, March 15, 2024 4:32 PM
> >> To: Ma, WenwuX <wenwux.ma@intel.com>; dev@dpdk.org
> >> Cc: Jiale, SongX <songx.jiale@intel.com>; stable@dpdk.org
> >> Subject: Re: [PATCH v2] dmadev: fix structure alignment
> >>
> >> Hi Wenwu,
> >>
> >> On 2024/3/15 15:44, Ma, WenwuX wrote:
> >>> Hi Chengwen,
> >>>
> >>>> -----Original Message-----
> >>>> From: Ma, WenwuX
> >>>> Sent: Friday, March 15, 2024 2:26 PM
> >>>> To: fengchengwen <fengchengwen@huawei.com>; dev@dpdk.org
> >>>> Cc: Jiale, SongX <songx.jiale@intel.com>; stable@dpdk.org
> >>>> Subject: RE: [PATCH v2] dmadev: fix structure alignment
> >>>>
> >>>> Hi Chengwen,
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: fengchengwen <fengchengwen@huawei.com>
> >>>>> Sent: Friday, March 15, 2024 2:06 PM
> >>>>> To: Ma, WenwuX <wenwux.ma@intel.com>; dev@dpdk.org
> >>>>> Cc: Jiale, SongX <songx.jiale@intel.com>; stable@dpdk.org
> >>>>> Subject: Re: [PATCH v2] dmadev: fix structure alignment
> >>>>>
> >>>>> Hi Wenwu,
> >>>>>
> >>>>> On 2024/3/15 9:43, Wenwu Ma wrote:
> >>>>>> The structure rte_dma_dev needs only 8 byte alignment.
> >>>>>> This patch replaces __rte_cache_aligned of rte_dma_dev with
> >>>>>> __rte_aligned(8).
> >>>>>>
> >>>>>> Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
> >>>>>> Cc: stable@dpdk.org
> >>>>>>
> >>>>>> Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> >>>>>> ---
> >>>>>> v2:
> >>>>>>  - Because of performance drop, adjust the code to
> >>>>>>    no longer demand cache line alignment
> >>>>>
> >>>>> Which two versions observed performance drop? And which benchmark
> >>>>> observed drop?
> >>>>> Could you provide more information?
> >>>>>
> >>>>>>
> >>>> V1 patch:
> >>>>
> >>>> https://patches.dpdk.org/project/dpdk/patch/20240308053711.1260154-1-wenwux.ma@intel.com/
> >>>>
> >>>> To view detailed results, visit:
> >>>> https://lab.dpdk.org/results/dashboard/patchsets/29472/
> >>>>
> >>>>>> ---
> >>>>>>  lib/dmadev/rte_dmadev_pmd.h | 2 +-
> >>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>>>
> >>>>>> diff --git a/lib/dmadev/rte_dmadev_pmd.h b/lib/dmadev/rte_dmadev_pmd.h
> >>>>>> index 58729088ff..b569bb3502 100644
> >>>>>> --- a/lib/dmadev/rte_dmadev_pmd.h
> >>>>>> +++ b/lib/dmadev/rte_dmadev_pmd.h
> >>>>>> @@ -122,7 +122,7 @@ enum rte_dma_dev_state {
> >>>>>>   * @internal
> >>>>>>   * The generic data structure associated with each DMA device.
> >>>>>>   */
> >>>>>> -struct __rte_cache_aligned rte_dma_dev {
> >>>>>> +struct __rte_aligned(8) rte_dma_dev {
> >>>>>
> >>>>> The DMA fast-path was implemented by struct rte_dma_fp_objs, which
> >>>>> is not rte_dma_dev? So why is it a problem here?
> >>>>>
> >>>>> Thanks
> >>>>>
> >>>> The DMA device object is expected to be cache-line aligned, so clang
> >>>> uses the "vmovaps" assembly instruction.
> >>>>
> >>>> That instruction demands 16-byte alignment; otherwise it causes a
> >>>> segmentation fault in some environments.
> >>>>
> >>> Test case:
> >>> 1. compile dpdk
> >>> rm -rf x86_64-native-linuxapp-clang
> >>> CC=clang meson -Denable_kmods=True -Dlibdir=lib
> >>> --default-library=static x86_64-native-linuxapp-clang
> >>> ninja -C x86_64-native-linuxapp-clang -j 72
> >>> 2. start dpdk-test
> >>> /root/dpdk/x86_64-native-linuxapp-clang/app/dpdk-test -l 0-39
> >>> --vdev=dma_skeleton -a 31:00.0 -a 31:00.1 -a 31:00.2 -a 31:00.3
> >>> (Note: If it cannot be reproduced, please try using a different core)
> >>> 3. exit dpdk-test
> >>> RTE>>quit
> >>> Segmentation fault (core dumped)
> 
> I reproduced it with just --vdev=dma_skeleton.
> When the quit command is executed, it invokes rte_dma_close->dma_release;
> please see my annotations (//) below:
> 
> void
> dma_release(struct rte_dma_dev *dev)
> {
> 	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
> 		rte_free(dev->data->dev_private);
> 		memset(dev->data, 0, sizeof(struct rte_dma_dev_data));
> 	}
> 
> 	dma_fp_object_dummy(dev->fp_obj);
> 	memset(dev, 0, sizeof(struct rte_dma_dev));
> 	// this memset was compiled using vmovaps:
> 	//   8c24da:       c5 f8 57 c0       vxorps %xmm0,%xmm0,%xmm0
> 	//   8c24de:       c5 fc 29 43 20    vmovaps %ymm0,0x20(%rbx)
> 	//   8c24e3:       c5 fc 29 03       vmovaps %ymm0,(%rbx)
> 	// but the dev is not aligned as these stores require
> 	// (in my env the rte_dma_devices addr is 0x15d39950)
> }
> 
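
The address quoted in the annotation can be checked with a few lines of C. The sketch assumes what the Intel SDM documents for VMOVAPS: the 128-bit (xmm) form needs a 16-byte-aligned memory operand, while the 256-bit (ymm) form seen in the disassembly needs a 32-byte-aligned one. sketch_is_aligned() is a hypothetical helper, not a DPDK function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of align (align must be a power of two). */
static bool
sketch_is_aligned(uintptr_t addr, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr & (align - 1)) == 0;
}
```

For 0x15d39950 the helper reports 16-byte alignment but neither 32-byte nor 64-byte alignment. Under the assumption above, that is consistent with a fault on the two ymm stores rather than on a 16-byte requirement, and it is why aligning the array to the cache line removes the problem.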
> >>
> >> I will try to reproduce, but still a question: has the above test
> >> already merged your patch [1], or does the current main branch code
> >> have this problem?
> >>
> >> [1]
> >> https://patches.dpdk.org/project/dpdk/patch/20240308053711.1260154-1-wenwux.ma@intel.com/
> >>
> >> Thanks
> >>
> > the current main branch code has this problem.
> >
> > Both patch v1 and v2 are able to solve this problem, but v1 has a
> > performance issue.
> 
> The performance issue is in an ethdev benchmark, which does not invoke any
> dmadev API, so I don't think the two are related.
> 
> So I prefer v1. Pavan also submitted a commit [1] to align the struct, but it
> was not a fix for the clang-x86 platform.
> 
The performance issue is subtle, as it doesn't occur in the v2 patch. 
So, maybe it needs more investigation.

> [1] https://lore.kernel.org/all/20240210062758.1510-1-pbhagavatula@marvell.com/T/
> 
> >
> >>>
> >>>>
> >>>>>>  	/** Device info which supplied during device initialization. */
> >>>>>>  	struct rte_device *device;
> >>>>>>  	struct rte_dma_dev_data *data; /**< Pointer to shared device data. */
> >>>>>>
> 
> What's more, could you please send v3? I hope it will contain the root cause
> and possible solutions of the segmentation fault problem.
> 
I will submit v3 patch later.

> BTW: dmadev is the first one which dynamically allocates its device struct;
> later maybe more xxxdev will do the same, I think that's typical.
>      Maybe we should add such a mem_align() function to the EAL library, but
> this could be done later.
> 
> Thanks



* Re: [PATCH v3] dmadev: fix structure alignment
  2024-03-20  7:23 ` [PATCH v3] " Wenwu Ma
@ 2024-03-20  9:31   ` fengchengwen
  2024-06-27 12:46     ` Thomas Monjalon
  2024-03-20 11:37   ` Thomas Monjalon
  1 sibling, 1 reply; 22+ messages in thread
From: fengchengwen @ 2024-03-20  9:31 UTC (permalink / raw)
  To: Wenwu Ma, dev; +Cc: songx.jiale, stable, Thomas Monjalon

Reviewed-by: Chengwen Feng <fengchengwen@huawei.com>

On 2024/3/20 15:23, Wenwu Ma wrote:
> The structure rte_dma_dev needs to be aligned to the cache line, but
> the return value of malloc may not be aligned to the cache line. When
> we use memset to clear the rte_dma_dev object, it may cause a segmentation
> fault in clang-x86-platform.
> 
> This is because clang uses the "vmovaps" assembly instruction for
> memset, which requires that the operands (rte_dma_dev objects) must
> aligned on a 16-byte boundary or a general-protection exception (#GP)
> is generated.
> 
> Therefore, either additional memory is applied for re-alignment, or the
> rte_dma_dev object does not require cache line alignment. The patch
> chooses the former option to fix the issue.
> 
> Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> ---
> v2:
>  - Because of performance drop, adjust the code to
>    no longer demand cache line alignment
> v3:
>  - back to v1 patch
> 
> ---
>  lib/dmadev/rte_dmadev.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/dmadev/rte_dmadev.c b/lib/dmadev/rte_dmadev.c
> index 5953a77bd6..61e106d574 100644
> --- a/lib/dmadev/rte_dmadev.c
> +++ b/lib/dmadev/rte_dmadev.c
> @@ -160,15 +160,25 @@ static int
>  dma_dev_data_prepare(void)
>  {
>  	size_t size;
> +	void *ptr;
>  
>  	if (rte_dma_devices != NULL)
>  		return 0;
>  
> -	size = dma_devices_max * sizeof(struct rte_dma_dev);
> -	rte_dma_devices = malloc(size);
> -	if (rte_dma_devices == NULL)
> +	/* The dma device object is expected to align cacheline, but
> +	 * the return value of malloc may not be aligned to the cache line.
> +	 * Therefore, extra memory is applied for realignment.
> +	 * note: We do not call posix_memalign/aligned_alloc because it is
> +	 * version dependent on libc.
> +	 */
> +	size = dma_devices_max * sizeof(struct rte_dma_dev) +
> +		RTE_CACHE_LINE_SIZE;
> +	ptr = malloc(size);
> +	if (ptr == NULL)
>  		return -ENOMEM;
> -	memset(rte_dma_devices, 0, size);
> +	memset(ptr, 0, size);
> +
> +	rte_dma_devices = RTE_PTR_ALIGN(ptr, RTE_CACHE_LINE_SIZE);
>  
>  	return 0;
>  }
> 


* Re: [PATCH v3] dmadev: fix structure alignment
  2024-03-20  7:23 ` [PATCH v3] " Wenwu Ma
  2024-03-20  9:31   ` fengchengwen
@ 2024-03-20 11:37   ` Thomas Monjalon
  2024-03-21  1:25     ` Ma, WenwuX
  1 sibling, 1 reply; 22+ messages in thread
From: Thomas Monjalon @ 2024-03-20 11:37 UTC (permalink / raw)
  To: fengchengwen, Wenwu Ma; +Cc: dev, songx.jiale, stable

20/03/2024 08:23, Wenwu Ma:
> The structure rte_dma_dev needs to be aligned to the cache line, but
> the return value of malloc may not be aligned to the cache line. When
> we use memset to clear the rte_dma_dev object, it may cause a segmentation
> fault in clang-x86-platform.
> 
> This is because clang uses the "vmovaps" assembly instruction for
> memset, which requires that the operands (rte_dma_dev objects) must
> aligned on a 16-byte boundary or a general-protection exception (#GP)
> is generated.
> 
> Therefore, either additional memory is applied for re-alignment, or the
> rte_dma_dev object does not require cache line alignment. The patch
> chooses the former option to fix the issue.
> 
> Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
[..]
> -	size = dma_devices_max * sizeof(struct rte_dma_dev);
> -	rte_dma_devices = malloc(size);
> -	if (rte_dma_devices == NULL)
> +	/* The dma device object is expected to align cacheline, but
> +	 * the return value of malloc may not be aligned to the cache line.
> +	 * Therefore, extra memory is applied for realignment.
> +	 * note: We do not call posix_memalign/aligned_alloc because it is
> +	 * version dependent on libc.
> +	 */
> +	size = dma_devices_max * sizeof(struct rte_dma_dev) +
> +		RTE_CACHE_LINE_SIZE;
> +	ptr = malloc(size);
> +	if (ptr == NULL)
>  		return -ENOMEM;
> -	memset(rte_dma_devices, 0, size);
> +	memset(ptr, 0, size);
> +
> +	rte_dma_devices = RTE_PTR_ALIGN(ptr, RTE_CACHE_LINE_SIZE);

Why not use aligned_alloc()?
https://en.cppreference.com/w/c/memory/aligned_alloc





* RE: [PATCH v3] dmadev: fix structure alignment
  2024-03-20 11:37   ` Thomas Monjalon
@ 2024-03-21  1:25     ` Ma, WenwuX
  2024-03-21  8:30       ` Thomas Monjalon
  0 siblings, 1 reply; 22+ messages in thread
From: Ma, WenwuX @ 2024-03-21  1:25 UTC (permalink / raw)
  To: Thomas Monjalon, fengchengwen; +Cc: dev, Jiale, SongX, stable

Hi, Thomas

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Wednesday, March 20, 2024 7:37 PM
> To: fengchengwen@huawei.com; Ma, WenwuX <wenwux.ma@intel.com>
> Cc: dev@dpdk.org; Jiale, SongX <songx.jiale@intel.com>; stable@dpdk.org
> Subject: Re: [PATCH v3] dmadev: fix structure alignment
> 
> 20/03/2024 08:23, Wenwu Ma:
> > The structure rte_dma_dev needs to be aligned to the cache line, but
> > the return value of malloc may not be aligned to the cache line. When
> > we use memset to clear the rte_dma_dev object, it may cause a
> > segmentation fault in clang-x86-platform.
> >
> > This is because clang uses the "vmovaps" assembly instruction for
> > memset, which requires that the operands (rte_dma_dev objects) must
> > aligned on a 16-byte boundary or a general-protection exception (#GP)
> > is generated.
> >
> > Therefore, either additional memory is applied for re-alignment, or
> > the rte_dma_dev object does not require cache line alignment. The
> > patch chooses the former option to fix the issue.
> >
> > Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> [..]
> > -	size = dma_devices_max * sizeof(struct rte_dma_dev);
> > -	rte_dma_devices = malloc(size);
> > -	if (rte_dma_devices == NULL)
> > +	/* The dma device object is expected to align cacheline, but
> > +	 * the return value of malloc may not be aligned to the cache line.
> > +	 * Therefore, extra memory is applied for realignment.
> > +	 * note: We do not call posix_memalign/aligned_alloc because it is
> > +	 * version dependent on libc.
> > +	 */
> > +	size = dma_devices_max * sizeof(struct rte_dma_dev) +
> > +		RTE_CACHE_LINE_SIZE;
> > +	ptr = malloc(size);
> > +	if (ptr == NULL)
> >  		return -ENOMEM;
> > -	memset(rte_dma_devices, 0, size);
> > +	memset(ptr, 0, size);
> > +
> > +	rte_dma_devices = RTE_PTR_ALIGN(ptr, RTE_CACHE_LINE_SIZE);
> 
> Why not using aligned_alloc()?
> https://en.cppreference.com/w/c/memory/aligned_alloc
> 
> 
Because it depends on the libc version.



* Re: [PATCH v3] dmadev: fix structure alignment
  2024-03-21  1:25     ` Ma, WenwuX
@ 2024-03-21  8:30       ` Thomas Monjalon
  2024-03-21  8:57         ` Ma, WenwuX
  2024-03-21  9:18         ` Ma, WenwuX
  0 siblings, 2 replies; 22+ messages in thread
From: Thomas Monjalon @ 2024-03-21  8:30 UTC (permalink / raw)
  To: fengchengwen, Ma, WenwuX; +Cc: dev, Jiale, SongX, stable

21/03/2024 02:25, Ma, WenwuX:
> Hi, Thomas
> 
> From: Thomas Monjalon <thomas@monjalon.net>
> > 20/03/2024 08:23, Wenwu Ma:
> > > The structure rte_dma_dev needs to be aligned to the cache line, but
> > > the return value of malloc may not be aligned to the cache line. When
> > > we use memset to clear the rte_dma_dev object, it may cause a
> > > segmentation fault in clang-x86-platform.
> > >
> > > This is because clang uses the "vmovaps" assembly instruction for
> > > memset, which requires that the operands (rte_dma_dev objects) must
> > > aligned on a 16-byte boundary or a general-protection exception (#GP)
> > > is generated.
> > >
> > > Therefore, either additional memory is applied for re-alignment, or
> > > the rte_dma_dev object does not require cache line alignment. The
> > > patch chooses the former option to fix the issue.
> > >
> > > Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
> > > Cc: stable@dpdk.org
> > >
> > > Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> > [..]
> > > -	size = dma_devices_max * sizeof(struct rte_dma_dev);
> > > -	rte_dma_devices = malloc(size);
> > > -	if (rte_dma_devices == NULL)
> > > +	/* The dma device object is expected to align cacheline, but
> > > +	 * the return value of malloc may not be aligned to the cache line.
> > > +	 * Therefore, extra memory is applied for realignment.
> > > +	 * note: We do not call posix_memalign/aligned_alloc because it is
> > > +	 * version dependent on libc.
> > > +	 */
> > > +	size = dma_devices_max * sizeof(struct rte_dma_dev) +
> > > +		RTE_CACHE_LINE_SIZE;
> > > +	ptr = malloc(size);
> > > +	if (ptr == NULL)
> > >  		return -ENOMEM;
> > > -	memset(rte_dma_devices, 0, size);
> > > +	memset(ptr, 0, size);
> > > +
> > > +	rte_dma_devices = RTE_PTR_ALIGN(ptr, RTE_CACHE_LINE_SIZE);
> > 
> > Why not using aligned_alloc()?
> > https://en.cppreference.com/w/c/memory/aligned_alloc
> > 
> > 
> because it is version dependent on libc.

Which libc is required?




* RE: [PATCH v3] dmadev: fix structure alignment
  2024-03-21  8:30       ` Thomas Monjalon
@ 2024-03-21  8:57         ` Ma, WenwuX
  2024-03-21  9:18         ` Ma, WenwuX
  1 sibling, 0 replies; 22+ messages in thread
From: Ma, WenwuX @ 2024-03-21  8:57 UTC (permalink / raw)
  To: Thomas Monjalon, fengchengwen; +Cc: dev, Jiale, SongX, stable

Hi, Thomas

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Thursday, March 21, 2024 4:31 PM
> To: fengchengwen@huawei.com; Ma, WenwuX <wenwux.ma@intel.com>
> Cc: dev@dpdk.org; Jiale, SongX <songx.jiale@intel.com>; stable@dpdk.org
> Subject: Re: [PATCH v3] dmadev: fix structure alignment
> 
> 21/03/2024 02:25, Ma, WenwuX:
> > Hi, Thomas
> >
> > From: Thomas Monjalon <thomas@monjalon.net>
> > > 20/03/2024 08:23, Wenwu Ma:
> > > > The structure rte_dma_dev needs to be aligned to the cache line,
> > > > but the return value of malloc may not be aligned to the cache
> > > > line. When we use memset to clear the rte_dma_dev object, it may
> > > > cause a segmentation fault in clang-x86-platform.
> > > >
> > > > This is because clang uses the "vmovaps" assembly instruction for
> > > > memset, which requires that the operands (rte_dma_dev objects)
> > > > must aligned on a 16-byte boundary or a general-protection
> > > > exception (#GP) is generated.
> > > >
> > > > Therefore, either additional memory is applied for re-alignment,
> > > > or the rte_dma_dev object does not require cache line alignment.
> > > > The patch chooses the former option to fix the issue.
> > > >
> > > > Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
> > > > Cc: stable@dpdk.org
> > > >
> > > > Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> > > [..]
> > > > -	size = dma_devices_max * sizeof(struct rte_dma_dev);
> > > > -	rte_dma_devices = malloc(size);
> > > > -	if (rte_dma_devices == NULL)
> > > > +	/* The dma device object is expected to align cacheline, but
> > > > +	 * the return value of malloc may not be aligned to the cache line.
> > > > +	 * Therefore, extra memory is applied for realignment.
> > > > +	 * note: We do not call posix_memalign/aligned_alloc because it is
> > > > +	 * version dependent on libc.
> > > > +	 */
> > > > +	size = dma_devices_max * sizeof(struct rte_dma_dev) +
> > > > +		RTE_CACHE_LINE_SIZE;
> > > > +	ptr = malloc(size);
> > > > +	if (ptr == NULL)
> > > >  		return -ENOMEM;
> > > > -	memset(rte_dma_devices, 0, size);
> > > > +	memset(ptr, 0, size);
> > > > +
> > > > +	rte_dma_devices = RTE_PTR_ALIGN(ptr, RTE_CACHE_LINE_SIZE);
> > >
> > > Why not using aligned_alloc()?
> > > https://en.cppreference.com/w/c/memory/aligned_alloc
> > >
> > >
> > because it is version dependent on libc.
> 
> Which libc is required?
> 
In the NOTE section of the link you gave there is this quote:

This function is not supported in Microsoft C Runtime library because its implementation of std::free is unable to handle aligned allocations of any kind. Instead, MS CRT provides _aligned_malloc (to be freed with _aligned_free).



* RE: [PATCH v3] dmadev: fix structure alignment
  2024-03-21  8:30       ` Thomas Monjalon
  2024-03-21  8:57         ` Ma, WenwuX
@ 2024-03-21  9:18         ` Ma, WenwuX
  2024-03-21 10:06           ` Thomas Monjalon
  1 sibling, 1 reply; 22+ messages in thread
From: Ma, WenwuX @ 2024-03-21  9:18 UTC (permalink / raw)
  To: Thomas Monjalon, fengchengwen; +Cc: dev, Jiale, SongX, stable

Hi, Thomas

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Thursday, March 21, 2024 4:31 PM
> To: fengchengwen@huawei.com; Ma, WenwuX <wenwux.ma@intel.com>
> Cc: dev@dpdk.org; Jiale, SongX <songx.jiale@intel.com>; stable@dpdk.org
> Subject: Re: [PATCH v3] dmadev: fix structure alignment
> 
> 21/03/2024 02:25, Ma, WenwuX:
> > Hi, Thomas
> >
> > From: Thomas Monjalon <thomas@monjalon.net>
> > > 20/03/2024 08:23, Wenwu Ma:
> > > > The structure rte_dma_dev needs to be aligned to the cache line,
> > > > but the return value of malloc may not be aligned to the cache
> > > > line. When we use memset to clear the rte_dma_dev object, it may
> > > > cause a segmentation fault in clang-x86-platform.
> > > >
> > > > This is because clang uses the "vmovaps" assembly instruction for
> > > > memset, which requires that the operands (rte_dma_dev objects)
> > > > must aligned on a 16-byte boundary or a general-protection
> > > > exception (#GP) is generated.
> > > >
> > > > Therefore, either additional memory is applied for re-alignment,
> > > > or the rte_dma_dev object does not require cache line alignment.
> > > > The patch chooses the former option to fix the issue.
> > > >
> > > > Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
> > > > Cc: stable@dpdk.org
> > > >
> > > > Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> > > [..]
> > > > -	size = dma_devices_max * sizeof(struct rte_dma_dev);
> > > > -	rte_dma_devices = malloc(size);
> > > > -	if (rte_dma_devices == NULL)
> > > > +	/* The dma device object is expected to align cacheline, but
> > > > +	 * the return value of malloc may not be aligned to the cache line.
> > > > +	 * Therefore, extra memory is applied for realignment.
> > > > +	 * note: We do not call posix_memalign/aligned_alloc because it is
> > > > +	 * version dependent on libc.
> > > > +	 */
> > > > +	size = dma_devices_max * sizeof(struct rte_dma_dev) +
> > > > +		RTE_CACHE_LINE_SIZE;
> > > > +	ptr = malloc(size);
> > > > +	if (ptr == NULL)
> > > >  		return -ENOMEM;
> > > > -	memset(rte_dma_devices, 0, size);
> > > > +	memset(ptr, 0, size);
> > > > +
> > > > +	rte_dma_devices = RTE_PTR_ALIGN(ptr, RTE_CACHE_LINE_SIZE);
> > >
> > > Why not using aligned_alloc()?
> > > https://en.cppreference.com/w/c/memory/aligned_alloc
> > >
> > >
> > because it is version dependent on libc.
> 
> Which libc is required?
> 

Using the 'man aligned_alloc' command, we get the following description:

VERSIONS
       The functions memalign(), valloc(), and pvalloc() have been available in all Linux libc libraries.

       The function aligned_alloc() was added to glibc in version 2.16.

       The function posix_memalign() is available since glibc 2.1.91.



* Re: [PATCH v3] dmadev: fix structure alignment
  2024-03-21  9:18         ` Ma, WenwuX
@ 2024-03-21 10:06           ` Thomas Monjalon
  2024-03-21 16:05             ` Tyler Retzlaff
  0 siblings, 1 reply; 22+ messages in thread
From: Thomas Monjalon @ 2024-03-21 10:06 UTC (permalink / raw)
  To: fengchengwen, Ma, WenwuX
  Cc: dev, Jiale, SongX, stable, Tyler Retzlaff, david.marchand,
	bruce.richardson

21/03/2024 10:18, Ma, WenwuX:
> From: Thomas Monjalon <thomas@monjalon.net>
> > 21/03/2024 02:25, Ma, WenwuX:
> > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > 20/03/2024 08:23, Wenwu Ma:
> > > > > The structure rte_dma_dev needs to be aligned to the cache line,
> > > > > but the return value of malloc may not be aligned to the cache
> > > > > line. When we use memset to clear the rte_dma_dev object, it may
> > > > > cause a segmentation fault in clang-x86-platform.
> > > > >
> > > > > This is because clang uses the "vmovaps" assembly instruction for
> > > > > memset, which requires that the operands (rte_dma_dev objects)
> > > > > must aligned on a 16-byte boundary or a general-protection
> > > > > exception (#GP) is generated.
> > > > >
> > > > > Therefore, either additional memory is applied for re-alignment,
> > > > > or the rte_dma_dev object does not require cache line alignment.
> > > > > The patch chooses the former option to fix the issue.
> > > > >
> > > > > Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
> > > > > Cc: stable@dpdk.org
> > > > >
> > > > > Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> > > > [..]
> > > > > -	size = dma_devices_max * sizeof(struct rte_dma_dev);
> > > > > -	rte_dma_devices = malloc(size);
> > > > > -	if (rte_dma_devices == NULL)
> > > > > +	/* The dma device object is expected to align cacheline, but
> > > > > +	 * the return value of malloc may not be aligned to the cache line.
> > > > > +	 * Therefore, extra memory is applied for realignment.
> > > > > +	 * note: We do not call posix_memalign/aligned_alloc because it is
> > > > > +	 * version dependent on libc.
> > > > > +	 */
> > > > > +	size = dma_devices_max * sizeof(struct rte_dma_dev) +
> > > > > +		RTE_CACHE_LINE_SIZE;
> > > > > +	ptr = malloc(size);
> > > > > +	if (ptr == NULL)
> > > > >  		return -ENOMEM;
> > > > > -	memset(rte_dma_devices, 0, size);
> > > > > +	memset(ptr, 0, size);
> > > > > +
> > > > > +	rte_dma_devices = RTE_PTR_ALIGN(ptr, RTE_CACHE_LINE_SIZE);
> > > >
> > > > Why not using aligned_alloc()?
> > > > https://en.cppreference.com/w/c/memory/aligned_alloc
> > > >
> > > >
> > > because it is version dependent on libc.
> > 
> > Which libc is required?
> > 
> 
> using the 'man aligned_alloc' command, we has the following description:
> 
> VERSIONS
>        The functions memalign(), valloc(), and pvalloc() have been available in all Linux libc libraries.
> 
>        The function aligned_alloc() was added to glibc in version 2.16.

released on 2012-06-30

>        The function posix_memalign() is available since glibc 2.1.91.

I think we could bump our libc requirements for these functions.

I understand there is also a concern on Windows,
but an alternative exists there.
We may need a wrapper like "rte_alloc_align".




* Re: [PATCH v3] dmadev: fix structure alignment
  2024-03-21 10:06           ` Thomas Monjalon
@ 2024-03-21 16:05             ` Tyler Retzlaff
  0 siblings, 0 replies; 22+ messages in thread
From: Tyler Retzlaff @ 2024-03-21 16:05 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: fengchengwen, Ma, WenwuX, dev, Jiale, SongX, stable,
	Tyler Retzlaff, david.marchand, bruce.richardson

On Thu, Mar 21, 2024 at 11:06:34AM +0100, Thomas Monjalon wrote:
> 21/03/2024 10:18, Ma, WenwuX:
> > From: Thomas Monjalon <thomas@monjalon.net>
> > > 21/03/2024 02:25, Ma, WenwuX:
> > > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > > 20/03/2024 08:23, Wenwu Ma:
> > > > > > The structure rte_dma_dev needs to be aligned to the cache line,
> > > > > > but the return value of malloc may not be aligned to the cache
> > > > > > line. When we use memset to clear the rte_dma_dev object, it may
> > > > > > cause a segmentation fault in clang-x86-platform.
> > > > > >
> > > > > > This is because clang uses the "vmovaps" assembly instruction for
> > > > > > memset, which requires that the operands (rte_dma_dev objects)
> > > > > > must aligned on a 16-byte boundary or a general-protection
> > > > > > exception (#GP) is generated.
> > > > > >
> > > > > > Therefore, either additional memory is applied for re-alignment,
> > > > > > or the rte_dma_dev object does not require cache line alignment.
> > > > > > The patch chooses the former option to fix the issue.
> > > > > >
> > > > > > Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
> > > > > > Cc: stable@dpdk.org
> > > > > >
> > > > > > Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>
> > > > > [..]
> > > > > > -	size = dma_devices_max * sizeof(struct rte_dma_dev);
> > > > > > -	rte_dma_devices = malloc(size);
> > > > > > -	if (rte_dma_devices == NULL)
> > > > > > +	/* The dma device object is expected to align cacheline, but
> > > > > > +	 * the return value of malloc may not be aligned to the cache line.
> > > > > > +	 * Therefore, extra memory is applied for realignment.
> > > > > > +	 * note: We do not call posix_memalign/aligned_alloc because it is
> > > > > > +	 * version dependent on libc.
> > > > > > +	 */
> > > > > > +	size = dma_devices_max * sizeof(struct rte_dma_dev) +
> > > > > > +		RTE_CACHE_LINE_SIZE;
> > > > > > +	ptr = malloc(size);
> > > > > > +	if (ptr == NULL)
> > > > > >  		return -ENOMEM;
> > > > > > -	memset(rte_dma_devices, 0, size);
> > > > > > +	memset(ptr, 0, size);
> > > > > > +
> > > > > > +	rte_dma_devices = RTE_PTR_ALIGN(ptr, RTE_CACHE_LINE_SIZE);
> > > > >
> > > > > Why not using aligned_alloc()?
> > > > > https://en.cppreference.com/w/c/memory/aligned_alloc
> > > > >
> > > > >
> > > > because it is version dependent on libc.
> > > 
> > > Which libc is required?
> > > 
> > 
> > using the 'man aligned_alloc' command, we has the following description:
> > 
> > VERSIONS
> >        The functions memalign(), valloc(), and pvalloc() have been available in all Linux libc libraries.
> > 
> >        The function aligned_alloc() was added to glibc in version 2.16.
> 
> released in 2012-06-30

If we are using C11 we probably already implicitly depend on the glibc
that supports aligned_alloc (introduced in C11).

> 
> >        The function posix_memalign() is available since glibc 2.1.91.
> 
> I think we could bump our libc requirements for these functions.
> 
> I understand there is also a concern on Windows,
> but an alternative exists there.
> We may need a wrapper like "rte_alloc_align".

Yes, I'm afraid we would probably have to introduce
rte_aligned_alloc/rte_aligned_free. On Windows this would simply
forward to _aligned_malloc() and _aligned_free() respectively.

ty
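
A rough sketch of the wrapper pair discussed above. The names sketch_aligned_alloc() and sketch_aligned_free() are hypothetical and are not an existing DPDK API. The sketch assumes a C11 toolchain (with glibc 2.16 or later on Linux) and, on Windows, the MS CRT functions _aligned_malloc() and _aligned_free().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#ifdef _WIN32
#include <malloc.h>
#endif

/* Hypothetical wrapper; align must be a power of two. */
static void *
sketch_aligned_alloc(size_t align, size_t size)
{
	assert(align != 0 && (align & (align - 1)) == 0);
#ifdef _WIN32
	/* MS CRT: note the (size, alignment) argument order. */
	return _aligned_malloc(size, align);
#else
	/* Round size up to a multiple of align, as C11 originally required. */
	size_t rounded = (size + align - 1) & ~(align - 1);

	return aligned_alloc(align, rounded);
#endif
}

/* Memory from sketch_aligned_alloc() must be released through this. */
static void
sketch_aligned_free(void *ptr)
{
#ifdef _WIN32
	_aligned_free(ptr);
#else
	free(ptr);
#endif
}
```

The free side needs its own wrapper because memory from _aligned_malloc() cannot be passed to free(), which is the limitation quoted from cppreference earlier in the thread.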


* Re: [PATCH v3] dmadev: fix structure alignment
  2024-03-20  9:31   ` fengchengwen
@ 2024-06-27 12:46     ` Thomas Monjalon
  0 siblings, 0 replies; 22+ messages in thread
From: Thomas Monjalon @ 2024-06-27 12:46 UTC (permalink / raw)
  To: Wenwu Ma, fengchengwen, Tyler Retzlaff
  Cc: dev, stable, songx.jiale, david.marchand

20/03/2024 10:31, fengchengwen:
> Reviewed-by: Chengwen Feng <fengchengwen@huawei.com>
> 
> On 2024/3/20 15:23, Wenwu Ma wrote:
> > The structure rte_dma_dev needs to be aligned to the cache line, but
> > the return value of malloc may not be aligned to the cache line. When
> > we use memset to clear the rte_dma_dev object, it may cause a segmentation
> > fault in clang-x86-platform.
> > 
> > This is because clang uses the "vmovaps" assembly instruction for
> > memset, which requires that the operands (rte_dma_dev objects) must
> > aligned on a 16-byte boundary or a general-protection exception (#GP)
> > is generated.
> > 
> > Therefore, either additional memory is applied for re-alignment, or the
> > rte_dma_dev object does not require cache line alignment. The patch
> > chooses the former option to fix the issue.
> > 
> > Fixes: b36970f2e13e ("dmadev: introduce DMA device library")
> > Cc: stable@dpdk.org
> > 
> > Signed-off-by: Wenwu Ma <wenwux.ma@intel.com>

I keep thinking we should have a wrapper for aligned allocations,
with Windows support and fallback to malloc + RTE_PTR_ALIGN.

Probably not a reason to block this patch, so applied, thanks.
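
The "fallback to malloc + RTE_PTR_ALIGN" variant mentioned above has to remember the pointer that malloc() returned, otherwise the block cannot be freed. One common way, shown here as a sketch with hypothetical names and no claim about what DPDK will adopt, is to store that pointer immediately below the aligned address.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical fallback: over-allocate with malloc(), round up, and stash
 * the raw pointer in the slot just below the aligned block.
 * align must be a power of two and at least sizeof(void *).
 */
static void *
sketch_fallback_aligned_alloc(size_t align, size_t size)
{
	void *raw;
	uintptr_t start;
	void **slot;

	assert(align >= sizeof(void *) && (align & (align - 1)) == 0);

	/* Slack for rounding up plus room for the stashed pointer. */
	raw = malloc(size + align - 1 + sizeof(void *));
	if (raw == NULL)
		return NULL;

	start = ((uintptr_t)raw + sizeof(void *) + align - 1) &
		~((uintptr_t)align - 1);
	slot = (void **)(start - sizeof(void *));
	*slot = raw;
	return (void *)start;
}

/* Recover the raw pointer stored below the aligned block and free it. */
static void
sketch_fallback_aligned_free(void *ptr)
{
	if (ptr != NULL)
		free(((void **)ptr)[-1]);
}
```

The cost is up to align - 1 + sizeof(void *) bytes of overhead per allocation, which matters little for a single array such as rte_dma_devices.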




end of thread, other threads:[~2024-06-27 12:52 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
2024-03-08  5:37 [PATCH] dmadev: fix structure alignment Wenwu Ma
2024-03-08  7:01 ` fengchengwen
2024-03-15  1:43 ` [PATCH v2] " Wenwu Ma
2024-03-15  6:02   ` Tyler Retzlaff
2024-03-15  6:06   ` fengchengwen
2024-03-15  6:25     ` Ma, WenwuX
2024-03-15  7:44       ` Ma, WenwuX
2024-03-15  8:31         ` fengchengwen
2024-03-15  9:27           ` Ma, WenwuX
2024-03-20  4:11             ` fengchengwen
2024-03-20  7:34               ` Ma, WenwuX
2024-03-19  9:48   ` Jiale, SongX
2024-03-20  7:23 ` [PATCH v3] " Wenwu Ma
2024-03-20  9:31   ` fengchengwen
2024-06-27 12:46     ` Thomas Monjalon
2024-03-20 11:37   ` Thomas Monjalon
2024-03-21  1:25     ` Ma, WenwuX
2024-03-21  8:30       ` Thomas Monjalon
2024-03-21  8:57         ` Ma, WenwuX
2024-03-21  9:18         ` Ma, WenwuX
2024-03-21 10:06           ` Thomas Monjalon
2024-03-21 16:05             ` Tyler Retzlaff
