* RE: [PATCH] net/af_xdp: fix umem map size for zero copy
2024-04-26 0:51 [PATCH] net/af_xdp: fix umem map size for zero copy Frank Du
@ 2024-04-26 10:43 ` Loftus, Ciara
2024-04-28 0:46 ` Du, Frank
2024-05-11 5:26 ` [PATCH v2] " Frank Du
` (3 subsequent siblings)
4 siblings, 1 reply; 29+ messages in thread
From: Loftus, Ciara @ 2024-04-26 10:43 UTC (permalink / raw)
To: Du, Frank; +Cc: dev
> Subject: [PATCH] net/af_xdp: fix umem map size for zero copy
>
> The current calculation assumes that the mbufs are contiguous. However,
> this assumption is incorrect when the memory spans across a huge page.
> Correct to directly read the size from the mempool memory chunks.
>
> Signed-off-by: Frank Du <frank.du@intel.com>
> ---
> drivers/net/af_xdp/rte_eth_af_xdp.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 268a130c49..cb95d17d13 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -1039,7 +1039,7 @@ eth_link_update(struct rte_eth_dev *dev
> __rte_unused,
> }
>
> #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> -static inline uintptr_t get_base_addr(struct rte_mempool *mp, uint64_t
> *align)
> +static inline uintptr_t get_memhdr_info(struct rte_mempool *mp, uint64_t
> *align, size_t *len)
> {
> struct rte_mempool_memhdr *memhdr;
> uintptr_t memhdr_addr, aligned_addr;
> @@ -1048,6 +1048,7 @@ static inline uintptr_t get_base_addr(struct
> rte_mempool *mp, uint64_t *align)
> memhdr_addr = (uintptr_t)memhdr->addr;
> aligned_addr = memhdr_addr & ~(getpagesize() - 1);
> *align = memhdr_addr - aligned_addr;
> + *len = memhdr->len;
>
> return aligned_addr;
> }
> @@ -1125,6 +1126,7 @@ xsk_umem_info *xdp_umem_configure(struct
> pmd_internals *internals,
> void *base_addr = NULL;
> struct rte_mempool *mb_pool = rxq->mb_pool;
> uint64_t umem_size, align = 0;
> + size_t len = 0;
>
> if (internals->shared_umem) {
> if (get_shared_umem(rxq, internals->if_name, &umem) < 0)
> @@ -1156,10 +1158,8 @@ xsk_umem_info *xdp_umem_configure(struct
> pmd_internals *internals,
> }
>
> umem->mb_pool = mb_pool;
> - base_addr = (void *)get_base_addr(mb_pool, &align);
> - umem_size = (uint64_t)mb_pool->populated_size *
> - (uint64_t)usr_config.frame_size +
> - align;
> + base_addr = (void *)get_memhdr_info(mb_pool, &align,
> &len);
> + umem_size = (uint64_t)len + align;
len is set to the length of the first memhdr of the mempool. There may be many other memhdrs in the mempool. So I don't think this is the correct value to use for calculating the entire umem size.
>
> ret = xsk_umem__create(&umem->umem, base_addr,
> umem_size,
> &rxq->fq, &rxq->cq, &usr_config);
> --
> 2.34.1
* RE: [PATCH] net/af_xdp: fix umem map size for zero copy
2024-04-26 10:43 ` Loftus, Ciara
@ 2024-04-28 0:46 ` Du, Frank
2024-04-30 9:22 ` Loftus, Ciara
0 siblings, 1 reply; 29+ messages in thread
From: Du, Frank @ 2024-04-28 0:46 UTC (permalink / raw)
To: Loftus, Ciara; +Cc: dev
> -----Original Message-----
> From: Loftus, Ciara <ciara.loftus@intel.com>
> Sent: Friday, April 26, 2024 6:44 PM
> To: Du, Frank <frank.du@intel.com>
> Cc: dev@dpdk.org
> Subject: RE: [PATCH] net/af_xdp: fix umem map size for zero copy
>
> > Subject: [PATCH] net/af_xdp: fix umem map size for zero copy
> >
> > The current calculation assumes that the mbufs are contiguous.
> > However, this assumption is incorrect when the memory spans across a huge
> page.
> > Correct to directly read the size from the mempool memory chunks.
> >
> > Signed-off-by: Frank Du <frank.du@intel.com>
> > ---
> > drivers/net/af_xdp/rte_eth_af_xdp.c | 10 +++++-----
> > 1 file changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> > b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > index 268a130c49..cb95d17d13 100644
> > --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> > +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > @@ -1039,7 +1039,7 @@ eth_link_update(struct rte_eth_dev *dev
> > __rte_unused, }
> >
> > #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> > -static inline uintptr_t get_base_addr(struct rte_mempool *mp,
> > uint64_t
> > *align)
> > +static inline uintptr_t get_memhdr_info(struct rte_mempool *mp,
> > +uint64_t
> > *align, size_t *len)
> > {
> > struct rte_mempool_memhdr *memhdr;
> > uintptr_t memhdr_addr, aligned_addr; @@ -1048,6 +1048,7 @@ static
> > inline uintptr_t get_base_addr(struct rte_mempool *mp, uint64_t
> > *align)
> > memhdr_addr = (uintptr_t)memhdr->addr;
> > aligned_addr = memhdr_addr & ~(getpagesize() - 1);
> > *align = memhdr_addr - aligned_addr;
> > + *len = memhdr->len;
> >
> > return aligned_addr;
> > }
> > @@ -1125,6 +1126,7 @@ xsk_umem_info *xdp_umem_configure(struct
> > pmd_internals *internals,
> > void *base_addr = NULL;
> > struct rte_mempool *mb_pool = rxq->mb_pool;
> > uint64_t umem_size, align = 0;
> > + size_t len = 0;
> >
> > if (internals->shared_umem) {
> > if (get_shared_umem(rxq, internals->if_name, &umem) < 0) @@
> > -1156,10 +1158,8 @@ xsk_umem_info *xdp_umem_configure(struct
> > pmd_internals *internals,
> > }
> >
> > umem->mb_pool = mb_pool;
> > - base_addr = (void *)get_base_addr(mb_pool, &align);
> > - umem_size = (uint64_t)mb_pool->populated_size *
> > - (uint64_t)usr_config.frame_size +
> > - align;
> > + base_addr = (void *)get_memhdr_info(mb_pool, &align,
> > &len);
> > + umem_size = (uint64_t)len + align;
>
> len is set to the length of the first memhdr of the mempool. There may be many
> other memhdrs in the mempool. So I don't think this is the correct value to use for
> calculating the entire umem size.
Currently each XDP RX ring is bound to one single umem region; it can't reuse the memory
if there are multiple memhdrs in the mempool. How about adding a check on the number
of memory chunks, so that only a mempool with a single memhdr is allowed here?
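Something like this rough sketch, e.g. in xdp_umem_configure() (untested,
relying on the public nb_mem_chunks count):

        if (mb_pool->nb_mem_chunks != 1) {
                AF_XDP_LOG(ERR, "mempool with multiple memory chunks is not supported\n");
                goto err;
        }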
>
> >
> > ret = xsk_umem__create(&umem->umem, base_addr,
> umem_size,
> > &rxq->fq, &rxq->cq, &usr_config);
> > --
> > 2.34.1
* RE: [PATCH] net/af_xdp: fix umem map size for zero copy
2024-04-28 0:46 ` Du, Frank
@ 2024-04-30 9:22 ` Loftus, Ciara
0 siblings, 0 replies; 29+ messages in thread
From: Loftus, Ciara @ 2024-04-30 9:22 UTC (permalink / raw)
To: Du, Frank; +Cc: dev
> >
> > > Subject: [PATCH] net/af_xdp: fix umem map size for zero copy
> > >
> > > The current calculation assumes that the mbufs are contiguous.
> > > However, this assumption is incorrect when the memory spans across a
> huge
> > page.
> > > Correct to directly read the size from the mempool memory chunks.
> > >
> > > Signed-off-by: Frank Du <frank.du@intel.com>
> > > ---
> > > drivers/net/af_xdp/rte_eth_af_xdp.c | 10 +++++-----
> > > 1 file changed, 5 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > index 268a130c49..cb95d17d13 100644
> > > --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > @@ -1039,7 +1039,7 @@ eth_link_update(struct rte_eth_dev *dev
> > > __rte_unused, }
> > >
> > > #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> > > -static inline uintptr_t get_base_addr(struct rte_mempool *mp,
> > > uint64_t
> > > *align)
> > > +static inline uintptr_t get_memhdr_info(struct rte_mempool *mp,
> > > +uint64_t
> > > *align, size_t *len)
> > > {
> > > struct rte_mempool_memhdr *memhdr;
> > > uintptr_t memhdr_addr, aligned_addr; @@ -1048,6 +1048,7 @@
> static
> > > inline uintptr_t get_base_addr(struct rte_mempool *mp, uint64_t
> > > *align)
> > > memhdr_addr = (uintptr_t)memhdr->addr;
> > > aligned_addr = memhdr_addr & ~(getpagesize() - 1);
> > > *align = memhdr_addr - aligned_addr;
> > > + *len = memhdr->len;
> > >
> > > return aligned_addr;
> > > }
> > > @@ -1125,6 +1126,7 @@ xsk_umem_info *xdp_umem_configure(struct
> > > pmd_internals *internals,
> > > void *base_addr = NULL;
> > > struct rte_mempool *mb_pool = rxq->mb_pool;
> > > uint64_t umem_size, align = 0;
> > > + size_t len = 0;
> > >
> > > if (internals->shared_umem) {
> > > if (get_shared_umem(rxq, internals->if_name, &umem) < 0)
> @@
> > > -1156,10 +1158,8 @@ xsk_umem_info *xdp_umem_configure(struct
> > > pmd_internals *internals,
> > > }
> > >
> > > umem->mb_pool = mb_pool;
> > > - base_addr = (void *)get_base_addr(mb_pool, &align);
> > > - umem_size = (uint64_t)mb_pool->populated_size *
> > > - (uint64_t)usr_config.frame_size +
> > > - align;
> > > + base_addr = (void *)get_memhdr_info(mb_pool, &align,
> > > &len);
> > > + umem_size = (uint64_t)len + align;
> >
> > len is set to the length of the first memhdr of the mempool. There may be
> many
> > other memhdrs in the mempool. So I don't think this is the correct value to
> use for
> > calculating the entire umem size.
>
> Current each xdp rx ring is bonded to one single umem region, it can't reuse
> the memory
> if there are multiple memhdrs in the mempool. How about adding a check on
> the number
> of the memory chunks to only allow one single memhdr mempool can be used
> here?
The UMEM needs to be a region of virtually contiguous memory. I think this can still be the case, even if the mempool has multiple memhdrs.
If we detect >1 memhdrs, perhaps we need to verify that the RTE_MEMPOOL_F_NO_IOVA_CONTIG flag is not set, which I think would mean that the mempool may not be virtually contiguous.
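Roughly something like this (untested, just to illustrate the idea):

        if (mp->nb_mem_chunks > 1 &&
            (mp->flags & RTE_MEMPOOL_F_NO_IOVA_CONTIG)) {
                AF_XDP_LOG(ERR, "mempool is not guaranteed to be virtually contiguous\n");
                return 0;
        }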
>
> >
> > >
> > > ret = xsk_umem__create(&umem->umem, base_addr,
> > umem_size,
> > > &rxq->fq, &rxq->cq, &usr_config);
> > > --
> > > 2.34.1
* [PATCH v2] net/af_xdp: fix umem map size for zero copy
2024-04-26 0:51 [PATCH] net/af_xdp: fix umem map size for zero copy Frank Du
2024-04-26 10:43 ` Loftus, Ciara
@ 2024-05-11 5:26 ` Frank Du
2024-05-17 13:19 ` Loftus, Ciara
` (2 more replies)
2024-05-23 6:53 ` [PATCH v3] " Frank Du
` (2 subsequent siblings)
4 siblings, 3 replies; 29+ messages in thread
From: Frank Du @ 2024-05-11 5:26 UTC (permalink / raw)
To: dev; +Cc: ciara.loftus
The current calculation assumes that the mbufs are contiguous. However,
this assumption is incorrect when the memory spans across a huge page.
Correct to directly read the size from the mempool memory chunks.
Signed-off-by: Frank Du <frank.du@intel.com>
---
v2:
* Add virtually contiguous detection for multiple memhdrs.
---
drivers/net/af_xdp/rte_eth_af_xdp.c | 34 ++++++++++++++++++++++++-----
1 file changed, 28 insertions(+), 6 deletions(-)
diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index 268a130c49..7456108d6d 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -1039,16 +1039,35 @@ eth_link_update(struct rte_eth_dev *dev __rte_unused,
}
#if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
-static inline uintptr_t get_base_addr(struct rte_mempool *mp, uint64_t *align)
+static inline uintptr_t get_memhdr_info(struct rte_mempool *mp, uint64_t *align, size_t *len)
{
- struct rte_mempool_memhdr *memhdr;
+ struct rte_mempool_memhdr *memhdr, *next;
uintptr_t memhdr_addr, aligned_addr;
+ size_t memhdr_len = 0;
+ /* get the mempool base addr and align */
memhdr = STAILQ_FIRST(&mp->mem_list);
memhdr_addr = (uintptr_t)memhdr->addr;
aligned_addr = memhdr_addr & ~(getpagesize() - 1);
*align = memhdr_addr - aligned_addr;
+ memhdr_len += memhdr->len;
+
+ /* check if virtual contiguous memory for multiple memhdrs */
+ next = STAILQ_NEXT(memhdr, next);
+ while (next != NULL) {
+ if ((uintptr_t)next->addr != (uintptr_t)memhdr->addr + memhdr->len) {
+ AF_XDP_LOG(ERR, "memory chunks not virtual contiguous, "
+ "next: %p, cur: %p(len: %" PRId64 " )\n",
+ next->addr, memhdr->addr, memhdr->len);
+ return 0;
+ }
+ /* virtual contiguous */
+ memhdr = next;
+ memhdr_len += memhdr->len;
+ next = STAILQ_NEXT(memhdr, next);
+ }
+ *len = memhdr_len;
return aligned_addr;
}
@@ -1125,6 +1144,7 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals,
void *base_addr = NULL;
struct rte_mempool *mb_pool = rxq->mb_pool;
uint64_t umem_size, align = 0;
+ size_t len = 0;
if (internals->shared_umem) {
if (get_shared_umem(rxq, internals->if_name, &umem) < 0)
@@ -1156,10 +1176,12 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals,
}
umem->mb_pool = mb_pool;
- base_addr = (void *)get_base_addr(mb_pool, &align);
- umem_size = (uint64_t)mb_pool->populated_size *
- (uint64_t)usr_config.frame_size +
- align;
+ base_addr = (void *)get_memhdr_info(mb_pool, &align, &len);
+ if (!base_addr) {
+ AF_XDP_LOG(ERR, "Failed to parse memhdr info from pool\n");
+ goto err;
+ }
+ umem_size = (uint64_t)len + align;
ret = xsk_umem__create(&umem->umem, base_addr, umem_size,
&rxq->fq, &rxq->cq, &usr_config);
--
2.34.1
* RE: [PATCH v2] net/af_xdp: fix umem map size for zero copy
2024-05-11 5:26 ` [PATCH v2] " Frank Du
@ 2024-05-17 13:19 ` Loftus, Ciara
2024-05-20 1:28 ` Du, Frank
2024-05-21 15:43 ` Ferruh Yigit
2024-05-21 17:57 ` Ferruh Yigit
2 siblings, 1 reply; 29+ messages in thread
From: Loftus, Ciara @ 2024-05-17 13:19 UTC (permalink / raw)
To: Du, Frank; +Cc: dev
>
> The current calculation assumes that the mbufs are contiguous. However,
> this assumption is incorrect when the memory spans across a huge page.
> Correct to directly read the size from the mempool memory chunks.
>
> Signed-off-by: Frank Du <frank.du@intel.com>
Hi Frank,
Thanks for the patch.
Before your patch the umem_size was calculated using mb_pool->populated_size * rte_mempool_calc_obj_size(mb_pool->elt_size, ..)
With your patch we sum up the lens of all memhdrs in the mempool.
When debugging I see the new calculation can yield a larger value, but the new logic looks good and more thorough to me, so I'm happy to go with the new approach.
Acked-by: Ciara Loftus <ciara.loftus@intel.com>
Thanks,
Ciara
>
> ---
> v2:
> * Add virtually contiguous detection for multiple memhdrs.
> ---
> drivers/net/af_xdp/rte_eth_af_xdp.c | 34 ++++++++++++++++++++++++----
> -
> 1 file changed, 28 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 268a130c49..7456108d6d 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -1039,16 +1039,35 @@ eth_link_update(struct rte_eth_dev *dev
> __rte_unused,
> }
>
> #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> -static inline uintptr_t get_base_addr(struct rte_mempool *mp, uint64_t
> *align)
> +static inline uintptr_t get_memhdr_info(struct rte_mempool *mp, uint64_t
> *align, size_t *len)
> {
> - struct rte_mempool_memhdr *memhdr;
> + struct rte_mempool_memhdr *memhdr, *next;
> uintptr_t memhdr_addr, aligned_addr;
> + size_t memhdr_len = 0;
>
> + /* get the mempool base addr and align */
> memhdr = STAILQ_FIRST(&mp->mem_list);
> memhdr_addr = (uintptr_t)memhdr->addr;
> aligned_addr = memhdr_addr & ~(getpagesize() - 1);
> *align = memhdr_addr - aligned_addr;
> + memhdr_len += memhdr->len;
> +
> + /* check if virtual contiguous memory for multiple memhdrs */
> + next = STAILQ_NEXT(memhdr, next);
> + while (next != NULL) {
> + if ((uintptr_t)next->addr != (uintptr_t)memhdr->addr +
> memhdr->len) {
> + AF_XDP_LOG(ERR, "memory chunks not virtual
> contiguous, "
> + "next: %p, cur: %p(len: %" PRId64 "
> )\n",
> + next->addr, memhdr->addr, memhdr-
> >len);
> + return 0;
> + }
> + /* virtual contiguous */
> + memhdr = next;
> + memhdr_len += memhdr->len;
> + next = STAILQ_NEXT(memhdr, next);
> + }
>
> + *len = memhdr_len;
> return aligned_addr;
> }
>
> @@ -1125,6 +1144,7 @@ xsk_umem_info *xdp_umem_configure(struct
> pmd_internals *internals,
> void *base_addr = NULL;
> struct rte_mempool *mb_pool = rxq->mb_pool;
> uint64_t umem_size, align = 0;
> + size_t len = 0;
>
> if (internals->shared_umem) {
> if (get_shared_umem(rxq, internals->if_name, &umem) < 0)
> @@ -1156,10 +1176,12 @@ xsk_umem_info *xdp_umem_configure(struct
> pmd_internals *internals,
> }
>
> umem->mb_pool = mb_pool;
> - base_addr = (void *)get_base_addr(mb_pool, &align);
> - umem_size = (uint64_t)mb_pool->populated_size *
> - (uint64_t)usr_config.frame_size +
> - align;
> + base_addr = (void *)get_memhdr_info(mb_pool, &align,
> &len);
> + if (!base_addr) {
> + AF_XDP_LOG(ERR, "Failed to parse memhdr info from
> pool\n");
> + goto err;
> + }
> + umem_size = (uint64_t)len + align;
>
> ret = xsk_umem__create(&umem->umem, base_addr,
> umem_size,
> &rxq->fq, &rxq->cq, &usr_config);
> --
> 2.34.1
* RE: [PATCH v2] net/af_xdp: fix umem map size for zero copy
2024-05-17 13:19 ` Loftus, Ciara
@ 2024-05-20 1:28 ` Du, Frank
0 siblings, 0 replies; 29+ messages in thread
From: Du, Frank @ 2024-05-20 1:28 UTC (permalink / raw)
To: Loftus, Ciara; +Cc: dev
> -----Original Message-----
> From: Loftus, Ciara <ciara.loftus@intel.com>
> Sent: Friday, May 17, 2024 9:20 PM
> To: Du, Frank <frank.du@intel.com>
> Cc: dev@dpdk.org
> Subject: RE: [PATCH v2] net/af_xdp: fix umem map size for zero copy
>
> >
> > The current calculation assumes that the mbufs are contiguous.
> > However, this assumption is incorrect when the memory spans across a huge
> page.
> > Correct to directly read the size from the mempool memory chunks.
> >
> > Signed-off-by: Frank Du <frank.du@intel.com>
>
> Hi Frank,
>
> Thanks for the patch.
>
> Before your patch the umem_size was calculated using mb_pool->populated_size
> * rte_mempool_calc_obj_size(mb_pool->elt_size, ..) With your patch we sum up
> the lens of all memhdrs in the mempool.
>
> When debugging I see the new calculation can yield a larger value, but the new
> logic looks good and more thorough to me so I'm happy to opt with the new
> approach.
Hi Ciara,
Thanks for the review. The reason for the larger value is that the pool contains some necessary padding holes to ensure a single mbuf does not span two huge pages, so that the PA of each mbuf is safe for hardware operation.
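As a concrete illustration (the numbers are only an example): with a 2MB huge
page and a 2176B object, 963 objects fit in one page (963 * 2176 = 2095488),
leaving a 1664B hole before the next page boundary. The old
populated_size * frame_size formula does not count those holes, while summing
the memhdr lengths does.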
>
> Acked-by: Ciara Loftus <ciara.loftus@intel.com>
>
> Thanks,
> Ciara
>
> >
> > ---
> > v2:
> > * Add virtually contiguous detection for multiple memhdrs.
> > ---
> > drivers/net/af_xdp/rte_eth_af_xdp.c | 34 ++++++++++++++++++++++++----
> > -
> > 1 file changed, 28 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> > b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > index 268a130c49..7456108d6d 100644
> > --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> > +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > @@ -1039,16 +1039,35 @@ eth_link_update(struct rte_eth_dev *dev
> > __rte_unused, }
> >
> > #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> > -static inline uintptr_t get_base_addr(struct rte_mempool *mp,
> > uint64_t
> > *align)
> > +static inline uintptr_t get_memhdr_info(struct rte_mempool *mp,
> > +uint64_t
> > *align, size_t *len)
> > {
> > - struct rte_mempool_memhdr *memhdr;
> > + struct rte_mempool_memhdr *memhdr, *next;
> > uintptr_t memhdr_addr, aligned_addr;
> > + size_t memhdr_len = 0;
> >
> > + /* get the mempool base addr and align */
> > memhdr = STAILQ_FIRST(&mp->mem_list);
> > memhdr_addr = (uintptr_t)memhdr->addr;
> > aligned_addr = memhdr_addr & ~(getpagesize() - 1);
> > *align = memhdr_addr - aligned_addr;
> > + memhdr_len += memhdr->len;
> > +
> > + /* check if virtual contiguous memory for multiple memhdrs */
> > + next = STAILQ_NEXT(memhdr, next);
> > + while (next != NULL) {
> > + if ((uintptr_t)next->addr != (uintptr_t)memhdr->addr +
> > memhdr->len) {
> > + AF_XDP_LOG(ERR, "memory chunks not virtual
> > contiguous, "
> > + "next: %p, cur: %p(len: %" PRId64 "
> > )\n",
> > + next->addr, memhdr->addr, memhdr-
> > >len);
> > + return 0;
> > + }
> > + /* virtual contiguous */
> > + memhdr = next;
> > + memhdr_len += memhdr->len;
> > + next = STAILQ_NEXT(memhdr, next);
> > + }
> >
> > + *len = memhdr_len;
> > return aligned_addr;
> > }
> >
> > @@ -1125,6 +1144,7 @@ xsk_umem_info *xdp_umem_configure(struct
> > pmd_internals *internals,
> > void *base_addr = NULL;
> > struct rte_mempool *mb_pool = rxq->mb_pool;
> > uint64_t umem_size, align = 0;
> > + size_t len = 0;
> >
> > if (internals->shared_umem) {
> > if (get_shared_umem(rxq, internals->if_name, &umem) < 0) @@
> > -1156,10 +1176,12 @@ xsk_umem_info *xdp_umem_configure(struct
> > pmd_internals *internals,
> > }
> >
> > umem->mb_pool = mb_pool;
> > - base_addr = (void *)get_base_addr(mb_pool, &align);
> > - umem_size = (uint64_t)mb_pool->populated_size *
> > - (uint64_t)usr_config.frame_size +
> > - align;
> > + base_addr = (void *)get_memhdr_info(mb_pool, &align,
> > &len);
> > + if (!base_addr) {
> > + AF_XDP_LOG(ERR, "Failed to parse memhdr info from
> > pool\n");
> > + goto err;
> > + }
> > + umem_size = (uint64_t)len + align;
> >
> > ret = xsk_umem__create(&umem->umem, base_addr,
> umem_size,
> > &rxq->fq, &rxq->cq, &usr_config);
> > --
> > 2.34.1
* Re: [PATCH v2] net/af_xdp: fix umem map size for zero copy
2024-05-11 5:26 ` [PATCH v2] " Frank Du
2024-05-17 13:19 ` Loftus, Ciara
@ 2024-05-21 15:43 ` Ferruh Yigit
2024-05-21 17:57 ` Ferruh Yigit
2 siblings, 0 replies; 29+ messages in thread
From: Ferruh Yigit @ 2024-05-21 15:43 UTC (permalink / raw)
To: Frank Du, dev; +Cc: ciara.loftus
On 5/11/2024 6:26 AM, Frank Du wrote:
> The current calculation assumes that the mbufs are contiguous. However,
> this assumption is incorrect when the memory spans across a huge page.
> Correct to directly read the size from the mempool memory chunks.
>
> Signed-off-by: Frank Du <frank.du@intel.com>
>
Hi Frank,
As this is a fix patch, can you please provide a fixes tag [1]? This
helps us understand the logic in the original code and enables us to
figure out which stable tree to backport this fix to.
[1]
https://doc.dpdk.org/guides/contributing/patches.html#commit-messages-body
* Re: [PATCH v2] net/af_xdp: fix umem map size for zero copy
2024-05-11 5:26 ` [PATCH v2] " Frank Du
2024-05-17 13:19 ` Loftus, Ciara
2024-05-21 15:43 ` Ferruh Yigit
@ 2024-05-21 17:57 ` Ferruh Yigit
2024-05-22 1:25 ` Du, Frank
2 siblings, 1 reply; 29+ messages in thread
From: Ferruh Yigit @ 2024-05-21 17:57 UTC (permalink / raw)
To: Frank Du, dev, Andrew Rybchenko, Morten Brørup
Cc: ciara.loftus, Burakov, Anatoly
On 5/11/2024 6:26 AM, Frank Du wrote:
> The current calculation assumes that the mbufs are contiguous. However,
> this assumption is incorrect when the memory spans across a huge page.
> Correct to directly read the size from the mempool memory chunks.
>
> Signed-off-by: Frank Du <frank.du@intel.com>
>
> ---
> v2:
> * Add virtually contiguous detection for multiple memhdrs.
> ---
> drivers/net/af_xdp/rte_eth_af_xdp.c | 34 ++++++++++++++++++++++++-----
> 1 file changed, 28 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 268a130c49..7456108d6d 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -1039,16 +1039,35 @@ eth_link_update(struct rte_eth_dev *dev __rte_unused,
> }
>
> #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> -static inline uintptr_t get_base_addr(struct rte_mempool *mp, uint64_t *align)
> +static inline uintptr_t get_memhdr_info(struct rte_mempool *mp, uint64_t *align, size_t *len)
> {
> - struct rte_mempool_memhdr *memhdr;
> + struct rte_mempool_memhdr *memhdr, *next;
> uintptr_t memhdr_addr, aligned_addr;
> + size_t memhdr_len = 0;
>
> + /* get the mempool base addr and align */
> memhdr = STAILQ_FIRST(&mp->mem_list);
> memhdr_addr = (uintptr_t)memhdr->addr;
> aligned_addr = memhdr_addr & ~(getpagesize() - 1);
> *align = memhdr_addr - aligned_addr;
>
I am aware this is not part of this patch, but as a note, can't we use
'RTE_ALIGN_FLOOR' to calculate the aligned address?
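i.e. something like:

        aligned_addr = RTE_ALIGN_FLOOR(memhdr_addr, getpagesize());

which should be equivalent to the open-coded mask for power-of-two page sizes.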
> + memhdr_len += memhdr->len;
> +
> + /* check if virtual contiguous memory for multiple memhdrs */
> + next = STAILQ_NEXT(memhdr, next);
> + while (next != NULL) {
> + if ((uintptr_t)next->addr != (uintptr_t)memhdr->addr + memhdr->len) {
> + AF_XDP_LOG(ERR, "memory chunks not virtual contiguous, "
> + "next: %p, cur: %p(len: %" PRId64 " )\n",
> + next->addr, memhdr->addr, memhdr->len);
> + return 0;
> + }
>
Isn't there a mempool flag that can help us figure out whether the mempool is
not IOVA contiguous? Isn't it sufficient on its own?
> + /* virtual contiguous */
> + memhdr = next;
> + memhdr_len += memhdr->len;
> + next = STAILQ_NEXT(memhdr, next);
> + }
>
> + *len = memhdr_len;
> return aligned_addr;
> }
>
This function goes into too much detail of the mempool internals, and any
change in those details has the potential to break this code.
@Andrew, @Morten, do you think it makes sense to have an
'rte_mempool_info_get()' kind of function that provides at least the
address and length of the mempool, to be used here?
This would help hide the internal details and complexity of the mempool from users.
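For example (hypothetical API, the signature is only to illustrate the idea):

        /* not in DPDK today; could return an error when the mempool
         * memory is not one virtually contiguous area */
        int rte_mempool_info_get(const struct rte_mempool *mp,
                                 void **base_addr, size_t *len);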
>
> @@ -1125,6 +1144,7 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals,
> void *base_addr = NULL;
> struct rte_mempool *mb_pool = rxq->mb_pool;
> uint64_t umem_size, align = 0;
> + size_t len = 0;
>
> if (internals->shared_umem) {
> if (get_shared_umem(rxq, internals->if_name, &umem) < 0)
> @@ -1156,10 +1176,12 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals,
> }
>
> umem->mb_pool = mb_pool;
> - base_addr = (void *)get_base_addr(mb_pool, &align);
> - umem_size = (uint64_t)mb_pool->populated_size *
> - (uint64_t)usr_config.frame_size +
> - align;
> + base_addr = (void *)get_memhdr_info(mb_pool, &align, &len);
>
Is this calculation correct if the mempool is not already aligned to the page size?
For example, if the page size is '0x1000' and "memhdr_addr = 0x000a1080", the
returned aligned address is '0x000a1000', so "base_addr = 0x000a1000".
Any access between '0x000a1000' and '0x000a1080' is invalid. Is this expected?
> + if (!base_addr) {
> + AF_XDP_LOG(ERR, "Failed to parse memhdr info from pool\n");
>
The log message is not accurate: it is not that parsing the memhdr info
failed, but that the mempool did not satisfy expectations.
> + goto err;
> + }
> + umem_size = (uint64_t)len + align;
>
> ret = xsk_umem__create(&umem->umem, base_addr, umem_size,
> &rxq->fq, &rxq->cq, &usr_config);
* RE: [PATCH v2] net/af_xdp: fix umem map size for zero copy
2024-05-21 17:57 ` Ferruh Yigit
@ 2024-05-22 1:25 ` Du, Frank
2024-05-22 7:26 ` Morten Brørup
2024-05-22 10:00 ` Ferruh Yigit
0 siblings, 2 replies; 29+ messages in thread
From: Du, Frank @ 2024-05-22 1:25 UTC (permalink / raw)
To: Ferruh Yigit, dev, Andrew Rybchenko, Morten Brørup
Cc: Loftus, Ciara, Burakov, Anatoly
> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Wednesday, May 22, 2024 1:58 AM
> To: Du, Frank <frank.du@intel.com>; dev@dpdk.org; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>; Morten Brørup
> <mb@smartsharesystems.com>
> Cc: Loftus, Ciara <ciara.loftus@intel.com>; Burakov, Anatoly
> <anatoly.burakov@intel.com>
> Subject: Re: [PATCH v2] net/af_xdp: fix umem map size for zero copy
>
> On 5/11/2024 6:26 AM, Frank Du wrote:
> > The current calculation assumes that the mbufs are contiguous.
> > However, this assumption is incorrect when the memory spans across a huge
> page.
> > Correct to directly read the size from the mempool memory chunks.
> >
> > Signed-off-by: Frank Du <frank.du@intel.com>
> >
> > ---
> > v2:
> > * Add virtually contiguous detection for multiple memhdrs.
> > ---
> > drivers/net/af_xdp/rte_eth_af_xdp.c | 34
> > ++++++++++++++++++++++++-----
> > 1 file changed, 28 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> > b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > index 268a130c49..7456108d6d 100644
> > --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> > +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > @@ -1039,16 +1039,35 @@ eth_link_update(struct rte_eth_dev *dev
> > __rte_unused, }
> >
> > #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> > -static inline uintptr_t get_base_addr(struct rte_mempool *mp,
> > uint64_t *align)
> > +static inline uintptr_t get_memhdr_info(struct rte_mempool *mp,
> > +uint64_t *align, size_t *len)
> > {
> > - struct rte_mempool_memhdr *memhdr;
> > + struct rte_mempool_memhdr *memhdr, *next;
> > uintptr_t memhdr_addr, aligned_addr;
> > + size_t memhdr_len = 0;
> >
> > + /* get the mempool base addr and align */
> > memhdr = STAILQ_FIRST(&mp->mem_list);
> > memhdr_addr = (uintptr_t)memhdr->addr;
> > aligned_addr = memhdr_addr & ~(getpagesize() - 1);
> > *align = memhdr_addr - aligned_addr;
> >
>
> I am aware this is not part of this patch, but as note, can't we use
> 'RTE_ALIGN_FLOOR' to calculate aligned address.
Sure, will use RTE_ALIGN_FLOOR in next version.
>
>
> > + memhdr_len += memhdr->len;
> > +
> > + /* check if virtual contiguous memory for multiple memhdrs */
> > + next = STAILQ_NEXT(memhdr, next);
> > + while (next != NULL) {
> > + if ((uintptr_t)next->addr != (uintptr_t)memhdr->addr + memhdr-
> >len) {
> > + AF_XDP_LOG(ERR, "memory chunks not virtual
> contiguous, "
> > + "next: %p, cur: %p(len: %" PRId64
> " )\n",
> > + next->addr, memhdr->addr, memhdr-
> >len);
> > + return 0;
> > + }
> >
>
> Isn't there a mempool flag that can help us figure out mempool is not IOVA
> contiguous? Isn't it sufficient on its own?
Indeed, what we need to ascertain is whether it's contiguous in CPU virtual space, not IOVA. I haven't come across a flag specifically for CPU virtual contiguity. The major limitation in XDP is that XSK UMEM only supports registering a single contiguous virtual memory area.
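For reference, the registration API takes exactly one (area, size) pair; the
signature, as I recall it from xsk.h, is:

        int xsk_umem__create(struct xsk_umem **umem, void *umem_area,
                             __u64 size, struct xsk_ring_prod *fill,
                             struct xsk_ring_cons *comp,
                             const struct xsk_umem_config *config);

so everything the PMD registers must live inside that single virtual range.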
>
>
> > + /* virtual contiguous */
> > + memhdr = next;
> > + memhdr_len += memhdr->len;
> > + next = STAILQ_NEXT(memhdr, next);
> > + }
> >
> > + *len = memhdr_len;
> > return aligned_addr;
> > }
> >
>
> This function goes too much details of the mempool object, and any change in
> mempool details has potential to break this code.
>
> @Andrew, @Morten, do you think does it make sense to have
> 'rte_mempool_info_get()' kind of function, that provides at least address and
> length of the mempool, and used here?
>
> This helps to hide internal details and complexity of the mempool for users.
>
>
> >
> > @@ -1125,6 +1144,7 @@ xsk_umem_info *xdp_umem_configure(struct
> pmd_internals *internals,
> > void *base_addr = NULL;
> > struct rte_mempool *mb_pool = rxq->mb_pool;
> > uint64_t umem_size, align = 0;
> > + size_t len = 0;
> >
> > if (internals->shared_umem) {
> > if (get_shared_umem(rxq, internals->if_name, &umem) < 0) @@
> > -1156,10 +1176,12 @@ xsk_umem_info *xdp_umem_configure(struct
> pmd_internals *internals,
> > }
> >
> > umem->mb_pool = mb_pool;
> > - base_addr = (void *)get_base_addr(mb_pool, &align);
> > - umem_size = (uint64_t)mb_pool->populated_size *
> > - (uint64_t)usr_config.frame_size +
> > - align;
> > + base_addr = (void *)get_memhdr_info(mb_pool, &align, &len);
> >
>
> Is this calculation correct if mempool is not already aligned to page size?
>
> Like in an example page size is '0x1000', and "memhdr_addr = 0x000a1080"
> returned aligned address is '0x000a1000', "base_addr = 0x000a1000"
>
> Any access between '0x000a1000' & '0x000a1080' is invalid. Is this expected?
Yes, since the XSK UMEM memory area requires page alignment. However, there is no need to worry: the memory pointer in each XSK TX/RX descriptor is obtained from the mbuf data area, which in this example always lies at or above 0x000a1080, so we never access the invalid range [0x000a1000: 0x000a1080].
>
>
> > + if (!base_addr) {
> > + AF_XDP_LOG(ERR, "Failed to parse memhdr info from
> pool\n");
> >
>
> Log message is not accurate, it is not parsing memhdr info failed, but mempool
> was not satisfying expectation.
Thanks, will correct it in next version.
>
> > + goto err;
> > + }
> > + umem_size = (uint64_t)len + align;
> >
> > ret = xsk_umem__create(&umem->umem, base_addr,
> umem_size,
> > &rxq->fq, &rxq->cq, &usr_config);
* RE: [PATCH v2] net/af_xdp: fix umem map size for zero copy
2024-05-22 1:25 ` Du, Frank
@ 2024-05-22 7:26 ` Morten Brørup
2024-05-22 10:20 ` Ferruh Yigit
2024-05-23 6:56 ` Du, Frank
2024-05-22 10:00 ` Ferruh Yigit
1 sibling, 2 replies; 29+ messages in thread
From: Morten Brørup @ 2024-05-22 7:26 UTC (permalink / raw)
To: Du, Frank, Ferruh Yigit, dev, Andrew Rybchenko, Burakov, Anatoly
Cc: Loftus, Ciara
> From: Du, Frank [mailto:frank.du@intel.com]
> Sent: Wednesday, 22 May 2024 03.25
>
> > From: Ferruh Yigit <ferruh.yigit@amd.com>
> > Sent: Wednesday, May 22, 2024 1:58 AM
> >
> > On 5/11/2024 6:26 AM, Frank Du wrote:
> > > The current calculation assumes that the mbufs are contiguous.
> > > However, this assumption is incorrect when the memory spans across a huge
> > page.
> > > Correct to directly read the size from the mempool memory chunks.
> > >
> > > Signed-off-by: Frank Du <frank.du@intel.com>
> > >
> > > ---
> > > v2:
> > > * Add virtually contiguous detection for multiple memhdrs.
> > > ---
> > > drivers/net/af_xdp/rte_eth_af_xdp.c | 34
> > > ++++++++++++++++++++++++-----
> > > 1 file changed, 28 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > index 268a130c49..7456108d6d 100644
> > > --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > @@ -1039,16 +1039,35 @@ eth_link_update(struct rte_eth_dev *dev
> > > __rte_unused, }
> > >
> > > #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> > > -static inline uintptr_t get_base_addr(struct rte_mempool *mp,
> > > uint64_t *align)
> > > +static inline uintptr_t get_memhdr_info(struct rte_mempool *mp,
> > > +uint64_t *align, size_t *len)
> > > {
> > > - struct rte_mempool_memhdr *memhdr;
> > > + struct rte_mempool_memhdr *memhdr, *next;
> > > uintptr_t memhdr_addr, aligned_addr;
> > > + size_t memhdr_len = 0;
> > >
> > > + /* get the mempool base addr and align */
> > > memhdr = STAILQ_FIRST(&mp->mem_list);
> > > memhdr_addr = (uintptr_t)memhdr->addr;
This is not a new bug, but if the mempool is not populated, memhdr is NULL here.
> > > aligned_addr = memhdr_addr & ~(getpagesize() - 1);
> > > *align = memhdr_addr - aligned_addr;
> > >
> >
> > I am aware this is not part of this patch, but as note, can't we use
> > 'RTE_ALIGN_FLOOR' to calculate aligned address.
>
> Sure, will use RTE_ALIGN_FLOOR in next version.
>
> >
> >
> > > + memhdr_len += memhdr->len;
> > > +
> > > + /* check if virtual contiguous memory for multiple memhdrs */
> > > + next = STAILQ_NEXT(memhdr, next);
> > > + while (next != NULL) {
> > > + if ((uintptr_t)next->addr != (uintptr_t)memhdr->addr + memhdr-
> > >len) {
> > > + AF_XDP_LOG(ERR, "memory chunks not virtual
> > contiguous, "
> > > + "next: %p, cur: %p(len: %" PRId64
> > " )\n",
> > > + next->addr, memhdr->addr, memhdr-
> > >len);
> > > + return 0;
> > > + }
> > >
> >
> > Isn't there a mempool flag that can help us figure out mempool is not IOVA
> > contiguous? Isn't it sufficient on its own?
>
> Indeed, what we need to ascertain is whether it's contiguous in CPU virtual
> space, not IOVA. I haven't come across a flag specifically for CPU virtual
> contiguity. The major limitation in XDP is XSK UMEM only supports registering
> a single contiguous virtual memory area.
I would assume that the EAL memory manager merges free memory into contiguous chunks whenever possible.
@Anatoly, please confirm?
If my assumption is correct, it means that if mp->nb_mem_chunks != 1, then the mempool is not virtual contiguous. And if mp->nb_mem_chunks == 1, then it is; there is no need to iterate through the memhdr list.
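In other words (untested sketch):

        if (mp->nb_mem_chunks != 1)
                return 0; /* cannot guarantee one virtually contiguous area */

and the rest of the helper could stay as simple as before this patch, plus
reading the length from the single memhdr.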
>
> >
> >
> > > + /* virtual contiguous */
> > > + memhdr = next;
> > > + memhdr_len += memhdr->len;
> > > + next = STAILQ_NEXT(memhdr, next);
> > > + }
> > >
> > > + *len = memhdr_len;
> > > return aligned_addr;
> > > }
> > >
> >
> > This function goes too much details of the mempool object, and any change in
> > mempool details has potential to break this code.
> >
> > @Andrew, @Morten, do you think does it make sense to have
> > 'rte_mempool_info_get()' kind of function, that provides at least address
> and
> > length of the mempool, and used here?
> >
> > This helps to hide internal details and complexity of the mempool for users.
I think all the relevant information is available as (public) fields directly in the rte_mempool.
I agree about hiding internal details. For discriminating between private and public information, I would prefer marking the "private" fields in the rte_mempool structure as such.
Ideally we need an rte_mempool_create() flag specifying that the mempool objects must be allocated as one chunk of memory with contiguous virtual addresses when populating the mempool. As discussed in another thread [1], the proposed pointer compression library would also benefit from such a mempool flag.
[1] https://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35E9F45C@smartserver.smartshare.dk/
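E.g. (hypothetical flag name, for illustration only):

        mp = rte_mempool_create("xsk_pool", n, elt_size, cache_size,
                                priv_size, NULL, NULL, NULL, NULL,
                                rte_socket_id(),
                                RTE_MEMPOOL_F_VIRT_CONTIG /* hypothetical */);

Population would fail unless all objects end up in one virtually contiguous
area, and the AF_XDP PMD could then simply require this flag.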
> >
> >
> > >
> > > @@ -1125,6 +1144,7 @@ xsk_umem_info *xdp_umem_configure(struct
> > pmd_internals *internals,
> > > void *base_addr = NULL;
> > > struct rte_mempool *mb_pool = rxq->mb_pool;
> > > uint64_t umem_size, align = 0;
> > > + size_t len = 0;
> > >
> > > if (internals->shared_umem) {
> > > if (get_shared_umem(rxq, internals->if_name, &umem) < 0) @@
> > > -1156,10 +1176,12 @@ xsk_umem_info *xdp_umem_configure(struct
> > pmd_internals *internals,
> > > }
> > >
> > > umem->mb_pool = mb_pool;
> > > - base_addr = (void *)get_base_addr(mb_pool, &align);
> > > - umem_size = (uint64_t)mb_pool->populated_size *
> > > - (uint64_t)usr_config.frame_size +
> > > - align;
> > > + base_addr = (void *)get_memhdr_info(mb_pool, &align, &len);
> > >
> >
> > Is this calculation correct if mempool is not already aligned to page size?
Please note: The mempool uses one memzone for the mempool structure itself. The objects in the mempool are stored in another memzone (or multiple other memzones, if necessary). I think you are talking about the alignment of the mempool object chunk, not of the mempool structure itself.
> >
> > Like in an example page size is '0x1000', and "memhdr_addr = 0x000a1080"
> > returned aligned address is '0x000a1000', "base_addr = 0x000a1000"
> >
> > Any access between '0x000a1000' & '0x000a1080' is invalid. Is this expected?
>
> Yes, since the XSK UMEM memory area requires page alignment. However, no need
> to worry; the memory pointer in the XSK TX/RX descriptor is obtained from the
> mbuf data area. We don’t have any chance to access the invalid range
> [0x000a1000: 0x000a1080] here.
>
> >
> >
> > > + if (!base_addr) {
> > > + AF_XDP_LOG(ERR, "Failed to parse memhdr info from
> > pool\n");
> > >
> >
> > Log message is not accurate, it is not parsing memhdr info failed, but
> mempool
> > was not satisfying expectation.
Looking at get_memhdr_info() above, it could be either mempool or memhdr failing to parse.
>
> Thanks, will correct it in next version.
>
> >
> > > + goto err;
> > > + }
> > > + umem_size = (uint64_t)len + align;
> > >
> > > ret = xsk_umem__create(&umem->umem, base_addr,
> > umem_size,
> > > &rxq->fq, &rxq->cq, &usr_config);
* Re: [PATCH v2] net/af_xdp: fix umem map size for zero copy
2024-05-22 7:26 ` Morten Brørup
@ 2024-05-22 10:20 ` Ferruh Yigit
2024-05-23 6:56 ` Du, Frank
1 sibling, 0 replies; 29+ messages in thread
From: Ferruh Yigit @ 2024-05-22 10:20 UTC (permalink / raw)
To: Morten Brørup, Du, Frank, dev, Andrew Rybchenko, Burakov, Anatoly
Cc: Loftus, Ciara
On 5/22/2024 8:26 AM, Morten Brørup wrote:
>> From: Du, Frank [mailto:frank.du@intel.com]
>> Sent: Wednesday, 22 May 2024 03.25
>>
>>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>>> Sent: Wednesday, May 22, 2024 1:58 AM
>>>
>>> On 5/11/2024 6:26 AM, Frank Du wrote:
>>>> The current calculation assumes that the mbufs are contiguous.
>>>> However, this assumption is incorrect when the memory spans across a huge
>>> page.
>>>> Correct to directly read the size from the mempool memory chunks.
>>>>
>>>> Signed-off-by: Frank Du <frank.du@intel.com>
>>>>
>>>> ---
>>>> v2:
>>>> * Add virtually contiguous detection for multiple memhdrs.
>>>> ---
>>>> drivers/net/af_xdp/rte_eth_af_xdp.c | 34
>>>> ++++++++++++++++++++++++-----
>>>> 1 file changed, 28 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
>>>> b/drivers/net/af_xdp/rte_eth_af_xdp.c
>>>> index 268a130c49..7456108d6d 100644
>>>> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
>>>> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
>>>> @@ -1039,16 +1039,35 @@ eth_link_update(struct rte_eth_dev *dev
>>>> __rte_unused, }
>>>>
>>>> #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
>>>> -static inline uintptr_t get_base_addr(struct rte_mempool *mp,
>>>> uint64_t *align)
>>>> +static inline uintptr_t get_memhdr_info(struct rte_mempool *mp,
>>>> +uint64_t *align, size_t *len)
>>>> {
>>>> - struct rte_mempool_memhdr *memhdr;
>>>> + struct rte_mempool_memhdr *memhdr, *next;
>>>> uintptr_t memhdr_addr, aligned_addr;
>>>> + size_t memhdr_len = 0;
>>>>
>>>> + /* get the mempool base addr and align */
>>>> memhdr = STAILQ_FIRST(&mp->mem_list);
>>>> memhdr_addr = (uintptr_t)memhdr->addr;
>
> This is not a new bug; but if the mempool is not populated, memhdr is NULL here.
>
>>>> aligned_addr = memhdr_addr & ~(getpagesize() - 1);
>>>> *align = memhdr_addr - aligned_addr;
>>>>
>>>
>>> I am aware this is not part of this patch, but as note, can't we use
>>> 'RTE_ALIGN_FLOOR' to calculate aligned address.
>>
>> Sure, will use RTE_ALIGN_FLOOR in next version.
>>
>>>
>>>
>>>> + memhdr_len += memhdr->len;
>>>> +
>>>> + /* check if virtual contiguous memory for multiple memhdrs */
>>>> + next = STAILQ_NEXT(memhdr, next);
>>>> + while (next != NULL) {
>>>> + if ((uintptr_t)next->addr != (uintptr_t)memhdr->addr + memhdr-
>>>> len) {
>>>> + AF_XDP_LOG(ERR, "memory chunks not virtual
>>> contiguous, "
>>>> + "next: %p, cur: %p(len: %" PRId64
>>> " )\n",
>>>> + next->addr, memhdr->addr, memhdr-
>>>> len);
>>>> + return 0;
>>>> + }
>>>>
>>>
>>> Isn't there a mempool flag that can help us figure out mempool is not IOVA
>>> contiguous? Isn't it sufficient on its own?
>>
>> Indeed, what we need to ascertain is whether it's contiguous in CPU virtual
>> space, not IOVA. I haven't come across a flag specifically for CPU virtual
>> contiguity. The major limitation in XDP is XSK UMEM only supports registering
>> a single contiguous virtual memory area.
>
> I would assume that the EAL memory manager merges free memory into contiguous chunks whenever possible.
> @Anatoly, please confirm?
>
> If my assumption is correct, it means that if mp->nb_mem_chunks != 1, then the mempool is not virtual contiguous. And if mp->nb_mem_chunks == 1, then it is; there is no need to iterate through the memhdr list.
>
>>
>>>
>>>
>>>> + /* virtual contiguous */
>>>> + memhdr = next;
>>>> + memhdr_len += memhdr->len;
>>>> + next = STAILQ_NEXT(memhdr, next);
>>>> + }
>>>>
>>>> + *len = memhdr_len;
>>>> return aligned_addr;
>>>> }
>>>>
>>>
>>> This function goes too much details of the mempool object, and any change in
>>> mempool details has potential to break this code.
>>>
>>> @Andrew, @Morten, do you think does it make sense to have
>>> 'rte_mempool_info_get()' kind of function, that provides at least address
>> and
>>> length of the mempool, and used here?
>>>
>>> This helps to hide internal details and complexity of the mempool for users.
>
> I think all the relevant information is available as (public) fields directly in the rte_mempool.
> I agree about hiding internal details. For discriminating between private and public information, I would prefer marking the "private" fields in the rte_mempool structure as such.
>
Marking fields 'private' may break some user code, so I don't know about
this, although I agree it would be the better thing to do programmatically.
But even if the required fields are public, having an _info() API may help
abstract the implementation details and keep them within the library
(instead of leaking to drivers like this).
But I am not sure if there are other potential users of such an API.
> Optimally we need an rte_mempool_create() flag, specifying that the mempool objects must be allocated as one chunk of memory with contiguous virtual addresses when populating the mempool. As discussed in another thread [1], the proposed pointer compression library would also benefit from such a mempool flag.
>
Ack, this can also be useful for this use case.
> [1] https://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35E9F45C@smartserver.smartshare.dk/
>
>>>
>>>
>>>>
>>>> @@ -1125,6 +1144,7 @@ xsk_umem_info *xdp_umem_configure(struct
>>> pmd_internals *internals,
>>>> void *base_addr = NULL;
>>>> struct rte_mempool *mb_pool = rxq->mb_pool;
>>>> uint64_t umem_size, align = 0;
>>>> + size_t len = 0;
>>>>
>>>> if (internals->shared_umem) {
>>>> if (get_shared_umem(rxq, internals->if_name, &umem) < 0) @@
>>>> -1156,10 +1176,12 @@ xsk_umem_info *xdp_umem_configure(struct
>>> pmd_internals *internals,
>>>> }
>>>>
>>>> umem->mb_pool = mb_pool;
>>>> - base_addr = (void *)get_base_addr(mb_pool, &align);
>>>> - umem_size = (uint64_t)mb_pool->populated_size *
>>>> - (uint64_t)usr_config.frame_size +
>>>> - align;
>>>> + base_addr = (void *)get_memhdr_info(mb_pool, &align, &len);
>>>>
>>>
>>> Is this calculation correct if mempool is not already aligned to page size?
>
> Please note: The mempool uses one memzone for the mempool structure itself. The objects in the mempool are stored in another memzone (or multiple other memzones, if necessary). I think you are talking about the alignment of the mempool object chunk, not of the mempool structure itself.
>
Yes, about the mempool object chunk.
* RE: [PATCH v2] net/af_xdp: fix umem map size for zero copy
2024-05-22 7:26 ` Morten Brørup
2024-05-22 10:20 ` Ferruh Yigit
@ 2024-05-23 6:56 ` Du, Frank
2024-05-23 7:40 ` Morten Brørup
1 sibling, 1 reply; 29+ messages in thread
From: Du, Frank @ 2024-05-23 6:56 UTC (permalink / raw)
To: Morten Brørup, Ferruh Yigit, dev, Andrew Rybchenko, Burakov,
Anatoly
Cc: Loftus, Ciara
> -----Original Message-----
> From: Morten Brørup <mb@smartsharesystems.com>
> Sent: Wednesday, May 22, 2024 3:27 PM
> To: Du, Frank <frank.du@intel.com>; Ferruh Yigit <ferruh.yigit@amd.com>;
> dev@dpdk.org; Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; Burakov,
> Anatoly <anatoly.burakov@intel.com>
> Cc: Loftus, Ciara <ciara.loftus@intel.com>
> Subject: RE: [PATCH v2] net/af_xdp: fix umem map size for zero copy
>
> > From: Du, Frank [mailto:frank.du@intel.com]
> > Sent: Wednesday, 22 May 2024 03.25
> >
> > > From: Ferruh Yigit <ferruh.yigit@amd.com>
> > > Sent: Wednesday, May 22, 2024 1:58 AM
> > >
> > > On 5/11/2024 6:26 AM, Frank Du wrote:
> > > > The current calculation assumes that the mbufs are contiguous.
> > > > However, this assumption is incorrect when the memory spans across
> > > > a huge
> > > page.
> > > > Correct to directly read the size from the mempool memory chunks.
> > > >
> > > > Signed-off-by: Frank Du <frank.du@intel.com>
> > > >
> > > > ---
> > > > v2:
> > > > * Add virtually contiguous detection for multiple memhdrs.
> > > > ---
> > > > drivers/net/af_xdp/rte_eth_af_xdp.c | 34
> > > > ++++++++++++++++++++++++-----
> > > > 1 file changed, 28 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > > b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > > index 268a130c49..7456108d6d 100644
> > > > --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > > +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > > @@ -1039,16 +1039,35 @@ eth_link_update(struct rte_eth_dev *dev
> > > > __rte_unused, }
> > > >
> > > > #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> > > > -static inline uintptr_t get_base_addr(struct rte_mempool *mp,
> > > > uint64_t *align)
> > > > +static inline uintptr_t get_memhdr_info(struct rte_mempool *mp,
> > > > +uint64_t *align, size_t *len)
> > > > {
> > > > - struct rte_mempool_memhdr *memhdr;
> > > > + struct rte_mempool_memhdr *memhdr, *next;
> > > > uintptr_t memhdr_addr, aligned_addr;
> > > > + size_t memhdr_len = 0;
> > > >
> > > > + /* get the mempool base addr and align */
> > > > memhdr = STAILQ_FIRST(&mp->mem_list);
> > > > memhdr_addr = (uintptr_t)memhdr->addr;
>
> This is not a new bug; but if the mempool is not populated, memhdr is NULL here.
Thanks, will add a check later.
>
> > > > aligned_addr = memhdr_addr & ~(getpagesize() - 1);
> > > > *align = memhdr_addr - aligned_addr;
> > > >
> > >
> > > I am aware this is not part of this patch, but as note, can't we use
> > > 'RTE_ALIGN_FLOOR' to calculate aligned address.
> >
> > Sure, will use RTE_ALIGN_FLOOR in next version.
> >
> > >
> > >
> > > > + memhdr_len += memhdr->len;
> > > > +
> > > > + /* check if virtual contiguous memory for multiple memhdrs */
> > > > + next = STAILQ_NEXT(memhdr, next);
> > > > + while (next != NULL) {
> > > > + if ((uintptr_t)next->addr != (uintptr_t)memhdr->addr + memhdr-
> > > >len) {
> > > > + AF_XDP_LOG(ERR, "memory chunks not virtual
> > > contiguous, "
> > > > + "next: %p, cur: %p(len: %" PRId64
> > > " )\n",
> > > > + next->addr, memhdr->addr, memhdr-
> > > >len);
> > > > + return 0;
> > > > + }
> > > >
> > >
> > > Isn't there a mempool flag that can help us figure out mempool is
> > > not IOVA contiguous? Isn't it sufficient on its own?
> >
> > Indeed, what we need to ascertain is whether it's contiguous in CPU
> > virtual space, not IOVA. I haven't come across a flag specifically for
> > CPU virtual contiguity. The major limitation in XDP is XSK UMEM only
> > supports registering a single contiguous virtual memory area.
>
> I would assume that the EAL memory manager merges free memory into
> contiguous chunks whenever possible.
> @Anatoly, please confirm?
>
> If my assumption is correct, it means that if mp->nb_mem_chunks != 1, then the
> mempool is not virtual contiguous. And if mp->nb_mem_chunks == 1, then it is;
> there is no need to iterate through the memhdr list.
Even if this is true now, the assumption may not hold after future code changes. Iterating through the list is a safer way, as it explicitly checks the virtual addresses without relying on any such condition.
* RE: [PATCH v2] net/af_xdp: fix umem map size for zero copy
2024-05-23 6:56 ` Du, Frank
@ 2024-05-23 7:40 ` Morten Brørup
2024-05-23 7:56 ` Du, Frank
0 siblings, 1 reply; 29+ messages in thread
From: Morten Brørup @ 2024-05-23 7:40 UTC (permalink / raw)
To: Du, Frank, Ferruh Yigit, dev, Andrew Rybchenko, Burakov, Anatoly
Cc: Loftus, Ciara
> From: Du, Frank [mailto:frank.du@intel.com]
> Sent: Thursday, 23 May 2024 08.56
>
> > From: Morten Brørup <mb@smartsharesystems.com>
> > Sent: Wednesday, May 22, 2024 3:27 PM
> >
> > > From: Du, Frank [mailto:frank.du@intel.com]
> > > Sent: Wednesday, 22 May 2024 03.25
> > >
> > > > From: Ferruh Yigit <ferruh.yigit@amd.com>
> > > > Sent: Wednesday, May 22, 2024 1:58 AM
> > > >
> > > > On 5/11/2024 6:26 AM, Frank Du wrote:
> > > > > The current calculation assumes that the mbufs are contiguous.
> > > > > However, this assumption is incorrect when the memory spans across
> > > > > a huge
> > > > page.
What does "the memory spans across a huge page" mean?
Should it be "the memory spans across multiple memory chunks"?
> > > > > Correct to directly read the size from the mempool memory chunks.
> > > > >
> > > > > Signed-off-by: Frank Du <frank.du@intel.com>
> > > > >
> > > > > ---
> > > > > v2:
> > > > > * Add virtually contiguous detection for multiple memhdrs.
> > > > > ---
> > > > > drivers/net/af_xdp/rte_eth_af_xdp.c | 34
> > > > > ++++++++++++++++++++++++-----
> > > > > 1 file changed, 28 insertions(+), 6 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > > > b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > > > index 268a130c49..7456108d6d 100644
> > > > > --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > > > +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > > > @@ -1039,16 +1039,35 @@ eth_link_update(struct rte_eth_dev *dev
> > > > > __rte_unused, }
> > > > >
> > > > > #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> > > > > -static inline uintptr_t get_base_addr(struct rte_mempool *mp,
> > > > > uint64_t *align)
> > > > > +static inline uintptr_t get_memhdr_info(struct rte_mempool *mp,
> > > > > +uint64_t *align, size_t *len)
> > > > > {
> > > > > - struct rte_mempool_memhdr *memhdr;
> > > > > + struct rte_mempool_memhdr *memhdr, *next;
> > > > > uintptr_t memhdr_addr, aligned_addr;
> > > > > + size_t memhdr_len = 0;
> > > > >
> > > > > + /* get the mempool base addr and align */
> > > > > memhdr = STAILQ_FIRST(&mp->mem_list);
> > > > > memhdr_addr = (uintptr_t)memhdr->addr;
> >
> > This is not a new bug; but if the mempool is not populated, memhdr is NULL
> here.
>
> Thanks, will add a check later.
>
> >
> > > > > aligned_addr = memhdr_addr & ~(getpagesize() - 1);
> > > > > *align = memhdr_addr - aligned_addr;
> > > > >
> > > >
> > > > I am aware this is not part of this patch, but as note, can't we use
> > > > 'RTE_ALIGN_FLOOR' to calculate aligned address.
> > >
> > > Sure, will use RTE_ALIGN_FLOOR in next version.
> > >
> > > >
> > > >
> > > > > + memhdr_len += memhdr->len;
> > > > > +
> > > > > + /* check if virtual contiguous memory for multiple memhdrs */
> > > > > + next = STAILQ_NEXT(memhdr, next);
> > > > > + while (next != NULL) {
> > > > > + if ((uintptr_t)next->addr != (uintptr_t)memhdr->addr +
> memhdr-
> > > > >len) {
> > > > > + AF_XDP_LOG(ERR, "memory chunks not virtual
> > > > contiguous, "
> > > > > + "next: %p, cur: %p(len: %" PRId64
> > > > " )\n",
> > > > > + next->addr, memhdr->addr, memhdr-
> > > > >len);
> > > > > + return 0;
> > > > > + }
> > > > >
> > > >
> > > > Isn't there a mempool flag that can help us figure out mempool is
> > > > not IOVA contiguous? Isn't it sufficient on its own?
> > >
> > > Indeed, what we need to ascertain is whether it's contiguous in CPU
> > > virtual space, not IOVA. I haven't come across a flag specifically for
> > > CPU virtual contiguity. The major limitation in XDP is XSK UMEM only
> > > supports registering a single contiguous virtual memory area.
> >
> > I would assume that the EAL memory manager merges free memory into
> > contiguous chunks whenever possible.
> > @Anatoly, please confirm?
> >
> > If my assumption is correct, it means that if mp->nb_mem_chunks != 1, then
> the
> > mempool is not virtual contiguous. And if mp->nb_mem_chunks == 1, then it
> is;
> > there is no need to iterate through the memhdr list.
>
> If this's true now, however, this assumption may not hold true in the future
> code change, iterating through the list may is a safer way as it carefully
> checks the virtual address without relying on any condition.
If there is exactly one memory chunk, it is virtually contiguous. It has one address and one length, so it must be.
If there is more than one memory chunk, I consider it unlikely that they are contiguous.
Have you ever observed the opposite, i.e. a mempool with multiple memory chunks being virtually contiguous?
Iterating through the list does not seem safer to me, quite the opposite.
Which future change are you trying to prepare for?
Keeping it simple is more likely to not break with future changes.
* RE: [PATCH v2] net/af_xdp: fix umem map size for zero copy
2024-05-23 7:40 ` Morten Brørup
@ 2024-05-23 7:56 ` Du, Frank
2024-05-29 12:57 ` Loftus, Ciara
0 siblings, 1 reply; 29+ messages in thread
From: Du, Frank @ 2024-05-23 7:56 UTC (permalink / raw)
To: Morten Brørup, Ferruh Yigit, dev, Andrew Rybchenko, Burakov,
Anatoly
Cc: Loftus, Ciara
> -----Original Message-----
> From: Morten Brørup <mb@smartsharesystems.com>
> Sent: Thursday, May 23, 2024 3:41 PM
> To: Du, Frank <frank.du@intel.com>; Ferruh Yigit <ferruh.yigit@amd.com>;
> dev@dpdk.org; Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; Burakov,
> Anatoly <anatoly.burakov@intel.com>
> Cc: Loftus, Ciara <ciara.loftus@intel.com>
> Subject: RE: [PATCH v2] net/af_xdp: fix umem map size for zero copy
>
> > From: Du, Frank [mailto:frank.du@intel.com]
> > Sent: Thursday, 23 May 2024 08.56
> >
> > > From: Morten Brørup <mb@smartsharesystems.com>
> > > Sent: Wednesday, May 22, 2024 3:27 PM
> > >
> > > > From: Du, Frank [mailto:frank.du@intel.com]
> > > > Sent: Wednesday, 22 May 2024 03.25
> > > >
> > > > > From: Ferruh Yigit <ferruh.yigit@amd.com>
> > > > > Sent: Wednesday, May 22, 2024 1:58 AM
> > > > >
> > > > > On 5/11/2024 6:26 AM, Frank Du wrote:
> > > > > > The current calculation assumes that the mbufs are contiguous.
> > > > > > However, this assumption is incorrect when the memory spans
> > > > > > across a huge page.
>
> What does "the memory spans across a huge page" mean?
>
> Should it be "the memory spans across multiple memory chunks"?
This does not pertain to multiple memory chunks but rather to mbuf memory. The scenario involves a single memory chunk utilizing multiple 2M pages. To ensure that each mbuf resides exclusively within a single page, there are deliberate spacing gaps when allocating mbufs across the 2M page boundaries.
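A rough sketch of that gap logic (illustrative only, not the actual mempool
populate code; the helper name is made up):

#include <stddef.h>

/* Hypothetical helper: return the offset at which the next object may be
 * placed. If an object of obj_sz bytes starting at off would straddle a
 * pg_sz page boundary (pg_sz a power of two), skip to the next page start;
 * the skipped bytes are the deliberate gaps mentioned above. */
static size_t
next_obj_offset(size_t off, size_t obj_sz, size_t pg_sz)
{
	if (off / pg_sz != (off + obj_sz - 1) / pg_sz)
		off = (off + pg_sz - 1) & ~(pg_sz - 1); /* round up to page */
	return off;
}

Because of these gaps, populated_size * frame_size underestimates the real
span of the chunk, which is why the umem map size must come from the memhdr.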
>
> > > > > > Correct to directly read the size from the mempool memory chunks.
> > > > > >
> > > > > > Signed-off-by: Frank Du <frank.du@intel.com>
> > > > > >
> > > > > > ---
> > > > > > v2:
> > > > > > * Add virtual contiguous detect for multiple memhdrs.
> > > > > > ---
> > > > > > drivers/net/af_xdp/rte_eth_af_xdp.c | 34
> > > > > > ++++++++++++++++++++++++-----
> > > > > > 1 file changed, 28 insertions(+), 6 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > > > > b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > > > > index 268a130c49..7456108d6d 100644
> > > > > > --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > > > > +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > > > > @@ -1039,16 +1039,35 @@ eth_link_update(struct rte_eth_dev
> > > > > > *dev __rte_unused, }
> > > > > >
> > > > > > #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> > > > > > -static inline uintptr_t get_base_addr(struct rte_mempool *mp,
> > > > > > uint64_t *align)
> > > > > > +static inline uintptr_t get_memhdr_info(struct rte_mempool
> > > > > > +*mp, uint64_t *align, size_t *len)
> > > > > > {
> > > > > > - struct rte_mempool_memhdr *memhdr;
> > > > > > + struct rte_mempool_memhdr *memhdr, *next;
> > > > > > uintptr_t memhdr_addr, aligned_addr;
> > > > > > + size_t memhdr_len = 0;
> > > > > >
> > > > > > + /* get the mempool base addr and align */
> > > > > > memhdr = STAILQ_FIRST(&mp->mem_list);
> > > > > > memhdr_addr = (uintptr_t)memhdr->addr;
> > >
> > > This is not a new bug; but if the mempool is not populated, memhdr
> > > is NULL here.
> >
> > Thanks, will add a check later.
> >
> > >
> > > > > > aligned_addr = memhdr_addr & ~(getpagesize() - 1);
> > > > > > *align = memhdr_addr - aligned_addr;
> > > > > >
> > > > >
> > > > > I am aware this is not part of this patch, but as note, can't we
> > > > > use 'RTE_ALIGN_FLOOR' to calculate aligned address.
> > > >
> > > > Sure, will use RTE_ALIGN_FLOOR in next version.
> > > >
> > > > >
> > > > >
> > > > > > + memhdr_len += memhdr->len;
> > > > > > +
> > > > > > + /* check if virtual contiguous memory for multiple memhdrs */
> > > > > > + next = STAILQ_NEXT(memhdr, next);
> > > > > > + while (next != NULL) {
> > > > > > + if ((uintptr_t)next->addr != (uintptr_t)memhdr->addr + memhdr->len) {
> > > > > > + AF_XDP_LOG(ERR, "memory chunks not virtual contiguous, "
> > > > > > + "next: %p, cur: %p(len: %" PRId64 " )\n",
> > > > > > + next->addr, memhdr->addr, memhdr->len);
> > > > > > + return 0;
> > > > > > + }
> > > > > >
> > > > >
> > > > > Isn't there a mempool flag that can help us figure out mempool
> > > > > is not IOVA contiguous? Isn't it sufficient on its own?
> > > >
> > > > Indeed, what we need to ascertain is whether it's contiguous in
> > > > CPU virtual space, not IOVA. I haven't come across a flag
> > > > specifically for CPU virtual contiguity. The major limitation in
> > > > XDP is XSK UMEM only supports registering a single contiguous virtual
> > > > memory area.
> > >
> > > I would assume that the EAL memory manager merges free memory into
> > > contiguous chunks whenever possible.
> > > @Anatoly, please confirm?
> > >
> > > If my assumption is correct, it means that if mp->nb_mem_chunks != 1,
> > > then the mempool is not virtual contiguous. And if mp->nb_mem_chunks == 1,
> > > then it is; there is no need to iterate through the memhdr list.
> >
> > Even if this is true now, the assumption may not hold after future code
> > changes. Iterating through the list is a safer way, as it checks the
> > virtual addresses directly without relying on any such condition.
>
> If there is exactly one memory chunk, it is virtual contiguous. It has one address
> and one length, so it must be.
>
> If there are more than one memory chunk, I consider it unlikely that they are
> contiguous.
> Have you ever observed the opposite, i.e. a mempool with multiple memory
> chunks being virtual contiguous?
>
> Iterating through the list does not seem safer to me, quite the opposite.
> Which future change are you trying to prepare for?
>
> Keeping it simple is more likely to not break with future changes.
No, I haven't actually encountered a mempool with multiple memory chunks, and I don't know how to construct such a mempool. The initial approach was to return an error if multiple chunks were detected; the iteration method was introduced later. I can revert to the original, simpler way.
^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: [PATCH v2] net/af_xdp: fix umem map size for zero copy
2024-05-23 7:56 ` Du, Frank
@ 2024-05-29 12:57 ` Loftus, Ciara
2024-05-29 14:16 ` Morten Brørup
0 siblings, 1 reply; 29+ messages in thread
From: Loftus, Ciara @ 2024-05-29 12:57 UTC (permalink / raw)
To: Du, Frank, Morten Brørup
Cc: Ferruh Yigit, dev, Andrew Rybchenko, Burakov, Anatoly
> -----Original Message-----
> From: Du, Frank <frank.du@intel.com>
> Sent: Thursday, May 23, 2024 8:56 AM
> To: Morten Brørup <mb@smartsharesystems.com>; Ferruh Yigit
> <ferruh.yigit@amd.com>; dev@dpdk.org; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>; Burakov, Anatoly
> <anatoly.burakov@intel.com>
> Cc: Loftus, Ciara <ciara.loftus@intel.com>
> Subject: RE: [PATCH v2] net/af_xdp: fix umem map size for zero copy
>
> > -----Original Message-----
> > From: Morten Brørup <mb@smartsharesystems.com>
> > Sent: Thursday, May 23, 2024 3:41 PM
> > To: Du, Frank <frank.du@intel.com>; Ferruh Yigit <ferruh.yigit@amd.com>;
> > dev@dpdk.org; Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>;
> Burakov,
> > Anatoly <anatoly.burakov@intel.com>
> > Cc: Loftus, Ciara <ciara.loftus@intel.com>
> > Subject: RE: [PATCH v2] net/af_xdp: fix umem map size for zero copy
> >
> > > From: Du, Frank [mailto:frank.du@intel.com]
> > > Sent: Thursday, 23 May 2024 08.56
> > >
> > > > From: Morten Brørup <mb@smartsharesystems.com>
> > > > Sent: Wednesday, May 22, 2024 3:27 PM
> > > >
> > > > > From: Du, Frank [mailto:frank.du@intel.com]
> > > > > Sent: Wednesday, 22 May 2024 03.25
> > > > >
> > > > > > From: Ferruh Yigit <ferruh.yigit@amd.com>
> > > > > > Sent: Wednesday, May 22, 2024 1:58 AM
> > > > > >
> > > > > > On 5/11/2024 6:26 AM, Frank Du wrote:
> > > > > > > The current calculation assumes that the mbufs are contiguous.
> > > > > > > However, this assumption is incorrect when the memory spans
> > > > > > > across a huge page.
> >
> > What does "the memory spans across a huge page" mean?
> >
> > Should it be "the memory spans across multiple memory chunks"?
>
> This does not pertain to multiple memory chunks but rather to mbuf memory.
> The scenario involves a single memory chunk utilizing multiple 2M pages. To
> ensure that each mbuf resides exclusively within a single page, there are
> deliberate spacing gaps when allocating mbufs across the 2M page
> boundaries.
>
> >
> > > > > > > Correct to directly read the size from the mempool memory chunks.
> > > > > > >
> > > > > > > Signed-off-by: Frank Du <frank.du@intel.com>
> > > > > > >
> > > > > > > ---
> > > > > > > v2:
> > > > > > > * Add virtual contiguous detect for multiple memhdrs.
> > > > > > > ---
> > > > > > > drivers/net/af_xdp/rte_eth_af_xdp.c | 34
> > > > > > > ++++++++++++++++++++++++-----
> > > > > > > 1 file changed, 28 insertions(+), 6 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > > > > > b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > > > > > index 268a130c49..7456108d6d 100644
> > > > > > > --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > > > > > +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > > > > > @@ -1039,16 +1039,35 @@ eth_link_update(struct rte_eth_dev
> > > > > > > *dev __rte_unused, }
> > > > > > >
> > > > > > > #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> > > > > > > -static inline uintptr_t get_base_addr(struct rte_mempool *mp,
> > > > > > > uint64_t *align)
> > > > > > > +static inline uintptr_t get_memhdr_info(struct rte_mempool
> > > > > > > +*mp, uint64_t *align, size_t *len)
> > > > > > > {
> > > > > > > - struct rte_mempool_memhdr *memhdr;
> > > > > > > + struct rte_mempool_memhdr *memhdr, *next;
> > > > > > > uintptr_t memhdr_addr, aligned_addr;
> > > > > > > + size_t memhdr_len = 0;
> > > > > > >
> > > > > > > + /* get the mempool base addr and align */
> > > > > > > memhdr = STAILQ_FIRST(&mp->mem_list);
> > > > > > > memhdr_addr = (uintptr_t)memhdr->addr;
> > > >
> > > > This is not a new bug; but if the mempool is not populated, memhdr
> > > > is NULL here.
> > >
> > > Thanks, will add a check later.
> > >
> > > >
> > > > > > > aligned_addr = memhdr_addr & ~(getpagesize() - 1);
> > > > > > > *align = memhdr_addr - aligned_addr;
> > > > > > >
> > > > > >
> > > > > > I am aware this is not part of this patch, but as note, can't we
> > > > > > use 'RTE_ALIGN_FLOOR' to calculate aligned address.
> > > > >
> > > > > Sure, will use RTE_ALIGN_FLOOR in next version.
> > > > >
> > > > > >
> > > > > >
> > > > > > > + memhdr_len += memhdr->len;
> > > > > > > +
> > > > > > > + /* check if virtual contiguous memory for multiple memhdrs */
> > > > > > > + next = STAILQ_NEXT(memhdr, next);
> > > > > > > + while (next != NULL) {
> > > > > > > + if ((uintptr_t)next->addr != (uintptr_t)memhdr->addr + memhdr->len) {
> > > > > > > + AF_XDP_LOG(ERR, "memory chunks not virtual contiguous, "
> > > > > > > + "next: %p, cur: %p(len: %" PRId64 " )\n",
> > > > > > > + next->addr, memhdr->addr, memhdr->len);
> > > > > > > + return 0;
> > > > > > > + }
> > > > > > >
> > > > > >
> > > > > > Isn't there a mempool flag that can help us figure out mempool
> > > > > > is not IOVA contiguous? Isn't it sufficient on its own?
> > > > >
> > > > > Indeed, what we need to ascertain is whether it's contiguous in
> > > > > CPU virtual space, not IOVA. I haven't come across a flag
> > > > > specifically for CPU virtual contiguity. The major limitation in
> > > > > XDP is XSK UMEM only supports registering a single contiguous virtual
> > > > > memory area.
> > > >
> > > > I would assume that the EAL memory manager merges free memory into
> > > > contiguous chunks whenever possible.
> > > > @Anatoly, please confirm?
> > > >
> > > > If my assumption is correct, it means that if mp->nb_mem_chunks != 1,
> > > > then the mempool is not virtual contiguous. And if mp->nb_mem_chunks == 1,
> > > > then it is; there is no need to iterate through the memhdr list.
> > >
> > > Even if this is true now, the assumption may not hold after future code
> > > changes. Iterating through the list is a safer way, as it checks the
> > > virtual addresses directly without relying on any such condition.
> >
> > If there is exactly one memory chunk, it is virtual contiguous. It has one
> address
> > and one length, so it must be.
> >
> > If there are more than one memory chunk, I consider it unlikely that they are
> > contiguous.
> > Have you ever observed the opposite, i.e. a mempool with multiple memory
> > chunks being virtual contiguous?
> >
> > Iterating through the list does not seem safer to me, quite the opposite.
> > Which future change are you trying to prepare for?
> >
> > Keeping it simple is more likely to not break with future changes.
>
> No, I haven't actually encountered a mempool with multiple memory chunks,
> and I don't know how to construct such a mempool. The initial approach was
> to return an error if multiple chunks were detected; the iteration method
> was introduced later. I can revert to the original, simpler way.
The mempool created in my (virtualized) test environment always has multiple memory chunks and the iterative check for virtual contiguity in v2 of this patch succeeds for me.
However in v4, since mp->nb_mem_chunks != 1, it will fail for me.
So it appears that virtual contiguity is possible even if mp->nb_mem_chunks != 1, so I don't think we can rely on that value to determine virtual contiguity.
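For reference, the per-chunk walk from v2 is the check that worked in my
environment; as a standalone sketch (real rte_mempool structures, helper
name made up):

#include <stdbool.h>
#include <stdint.h>
#include <rte_mempool.h>

/* Sketch: report whether the populated chunks of a mempool form one
 * contiguous virtual range, independent of mp->nb_mem_chunks. */
static bool
mempool_is_virt_contiguous(const struct rte_mempool *mp)
{
	struct rte_mempool_memhdr *memhdr, *next;

	memhdr = STAILQ_FIRST(&mp->mem_list);
	if (memhdr == NULL)
		return false; /* not populated */
	while ((next = STAILQ_NEXT(memhdr, next)) != NULL) {
		if ((uintptr_t)next->addr != (uintptr_t)memhdr->addr + memhdr->len)
			return false; /* virtual gap between chunks */
		memhdr = next;
	}
	return true;
}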
^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: [PATCH v2] net/af_xdp: fix umem map size for zero copy
2024-05-29 12:57 ` Loftus, Ciara
@ 2024-05-29 14:16 ` Morten Brørup
0 siblings, 0 replies; 29+ messages in thread
From: Morten Brørup @ 2024-05-29 14:16 UTC (permalink / raw)
To: Loftus, Ciara, Du, Frank, Andrew Rybchenko, Paul Szczepanek
Cc: Ferruh Yigit, dev, Burakov, Anatoly
> From: Loftus, Ciara [mailto:ciara.loftus@intel.com]
> Sent: Wednesday, 29 May 2024 14.58
>
> > From: Du, Frank <frank.du@intel.com>
> > Sent: Thursday, May 23, 2024 8:56 AM
> >
> > > From: Morten Brørup <mb@smartsharesystems.com>
> > > Sent: Thursday, May 23, 2024 3:41 PM
> > >
> > > > From: Du, Frank [mailto:frank.du@intel.com]
> > > > Sent: Thursday, 23 May 2024 08.56
> > > >
> > > > > From: Morten Brørup <mb@smartsharesystems.com>
> > > > > Sent: Wednesday, May 22, 2024 3:27 PM
> > > > >
> > > > > > From: Du, Frank [mailto:frank.du@intel.com]
> > > > > > Sent: Wednesday, 22 May 2024 03.25
> > > > > >
> > > > > > > From: Ferruh Yigit <ferruh.yigit@amd.com>
> > > > > > > Sent: Wednesday, May 22, 2024 1:58 AM
[...]
> > > > > > >
> > > > > > > Isn't there a mempool flag that can help us figure out mempool
> > > > > > > is not IOVA contiguous? Isn't it sufficient on its own?
> > > > > >
> > > > > > Indeed, what we need to ascertain is whether it's contiguous in
> > > > > > CPU virtual space, not IOVA. I haven't come across a flag
> > > > > > specifically for CPU virtual contiguity. The major limitation in
> > > > > > XDP is XSK UMEM only supports registering a single contiguous
> > > > > > virtual memory area.
> > > > >
> > > > > I would assume that the EAL memory manager merges free memory into
> > > > > contiguous chunks whenever possible.
> > > > > @Anatoly, please confirm?
> > > > >
> > > > > If my assumption is correct, it means that if mp->nb_mem_chunks !=
> > > > > 1, then the
> > > > > mempool is not virtual contiguous. And if mp->nb_mem_chunks == 1,
> > > > > then it is;
> > > > > there is no need to iterate through the memhdr list.
> > > >
> > > > Even if this is true now, the assumption may not hold after future code
> > > > changes. Iterating through the list is a safer way, as it checks the
> > > > virtual addresses directly without relying on any such condition.
> > >
> > > If there is exactly one memory chunk, it is virtual contiguous. It has one
> > > address and one length, so it must be.
> > >
> > > If there are more than one memory chunk, I consider it unlikely that they
> > > are contiguous.
> > > Have you ever observed the opposite, i.e. a mempool with multiple memory
> > > chunks being virtual contiguous?
> > >
> > > Iterating through the list does not seem safer to me, quite the opposite.
> > > Which future change are you trying to prepare for?
> > >
> > > Keeping it simple is more likely to not break with future changes.
> >
> > No, I haven't actually encountered a mempool with multiple memory chunks,
> > and I don't know how to construct such a mempool. The initial approach was
> > to return an error if multiple chunks were detected; the iteration method
> > was introduced later. I can revert to the original, simpler way.
>
> The mempool created in my (virtualized) test environment always has multiple
> memory chunks and the iterative check for virtual contiguity in v2 of this
> patch succeeds for me.
> However in v4, since mp->nb_mem_chunks != 1, it will fail for me.
> So it appears that virtual contiguity is possible even if mp->nb_mem_chunks
> != 1, so I don't think we can rely on that value to determine virtual
> contiguity.
Excellent feedback, Ciara!
Once again, reality beats assumptions. Memory management is not easy. :-)
In another thread [1], I have asked Paul Szczepanek to pick up on this, and coordinate directly with Frank Du.
Paul is working on a closely related function in the mempool library, and it makes sense to merge this feature into the function he is providing.
Alternatively, he could provide another function in the mempool library.
[1]: https://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35E9F4C4@smartserver.smartshare.dk/T/#m8bf62c45d12f34659becf965cf101f8456723c94
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2] net/af_xdp: fix umem map size for zero copy
2024-05-22 1:25 ` Du, Frank
2024-05-22 7:26 ` Morten Brørup
@ 2024-05-22 10:00 ` Ferruh Yigit
2024-05-22 11:03 ` Morten Brørup
1 sibling, 1 reply; 29+ messages in thread
From: Ferruh Yigit @ 2024-05-22 10:00 UTC (permalink / raw)
To: Du, Frank, dev, Andrew Rybchenko, Morten Brørup
Cc: Loftus, Ciara, Burakov, Anatoly
On 5/22/2024 2:25 AM, Du, Frank wrote:
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>> Sent: Wednesday, May 22, 2024 1:58 AM
>> To: Du, Frank <frank.du@intel.com>; dev@dpdk.org; Andrew Rybchenko
>> <andrew.rybchenko@oktetlabs.ru>; Morten Brørup
>> <mb@smartsharesystems.com>
>> Cc: Loftus, Ciara <ciara.loftus@intel.com>; Burakov, Anatoly
>> <anatoly.burakov@intel.com>
>> Subject: Re: [PATCH v2] net/af_xdp: fix umem map size for zero copy
>>
>> On 5/11/2024 6:26 AM, Frank Du wrote:
>>> The current calculation assumes that the mbufs are contiguous.
>>> However, this assumption is incorrect when the memory spans across a huge
>>> page.
>>> Correct to directly read the size from the mempool memory chunks.
>>>
>>> Signed-off-by: Frank Du <frank.du@intel.com>
>>>
>>> ---
>>> v2:
>>> * Add virtual contiguous detect for multiple memhdrs.
>>> ---
>>> drivers/net/af_xdp/rte_eth_af_xdp.c | 34
>>> ++++++++++++++++++++++++-----
>>> 1 file changed, 28 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
>>> b/drivers/net/af_xdp/rte_eth_af_xdp.c
>>> index 268a130c49..7456108d6d 100644
>>> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
>>> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
>>> @@ -1039,16 +1039,35 @@ eth_link_update(struct rte_eth_dev *dev
>>> __rte_unused, }
>>>
>>> #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
>>> -static inline uintptr_t get_base_addr(struct rte_mempool *mp,
>>> uint64_t *align)
>>> +static inline uintptr_t get_memhdr_info(struct rte_mempool *mp,
>>> +uint64_t *align, size_t *len)
>>> {
>>> - struct rte_mempool_memhdr *memhdr;
>>> + struct rte_mempool_memhdr *memhdr, *next;
>>> uintptr_t memhdr_addr, aligned_addr;
>>> + size_t memhdr_len = 0;
>>>
>>> + /* get the mempool base addr and align */
>>> memhdr = STAILQ_FIRST(&mp->mem_list);
>>> memhdr_addr = (uintptr_t)memhdr->addr;
>>> aligned_addr = memhdr_addr & ~(getpagesize() - 1);
>>> *align = memhdr_addr - aligned_addr;
>>>
>>
>> I am aware this is not part of this patch, but as note, can't we use
>> 'RTE_ALIGN_FLOOR' to calculate aligned address.
>
> Sure, will use RTE_ALIGN_FLOOR in next version.
>
>>
>>
>>> + memhdr_len += memhdr->len;
>>> +
>>> + /* check if virtual contiguous memory for multiple memhdrs */
>>> + next = STAILQ_NEXT(memhdr, next);
>>> + while (next != NULL) {
>>> + if ((uintptr_t)next->addr != (uintptr_t)memhdr->addr + memhdr->len) {
>>> + AF_XDP_LOG(ERR, "memory chunks not virtual contiguous, "
>>> + "next: %p, cur: %p(len: %" PRId64 " )\n",
>>> + next->addr, memhdr->addr, memhdr->len);
>>> + return 0;
>>> + }
>>>
>>
>> Isn't there a mempool flag that can help us figure out mempool is not IOVA
>> contiguous? Isn't it sufficient on its own?
>
> Indeed, what we need to ascertain is whether it's contiguous in CPU virtual space, not IOVA. I haven't come across a flag specifically for CPU virtual contiguity. The major limitation in XDP is XSK UMEM only supports registering a single contiguous virtual memory area.
>
'RTE_MEMPOOL_F_NO_IOVA_CONTIG' is the flag I was looking for. This flag
being *cleared* implies IOVA contiguous but not sure if it is
guaranteed, need to check.
And I may be wrong here, but as far as I remember, in IOVA-as-VA mode the
process virtual address and the IOVA are the same, so IOVA contiguity is
the same as CPU virtual address contiguity.
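A quick sketch of that check (rte_eal_iova_mode() is the EAL query; whether
VA mode alone is a sufficient guarantee is exactly what needs checking):

#include <rte_eal.h>

/* In IOVA-as-VA mode, a buffer's IOVA equals its process virtual address,
 * so an IOVA-contiguous region would also be virtually contiguous. */
if (rte_eal_iova_mode() == RTE_IOVA_VA) {
	/* IOVA contiguity would imply virtual contiguity here */
}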
>>
>>
>>> + /* virtual contiguous */
>>> + memhdr = next;
>>> + memhdr_len += memhdr->len;
>>> + next = STAILQ_NEXT(memhdr, next);
>>> + }
>>>
>>> + *len = memhdr_len;
>>> return aligned_addr;
>>> }
>>>
>>
>> This function goes too much details of the mempool object, and any change in
>> mempool details has potential to break this code.
>>
>> @Andrew, @Morten, do you think does it make sense to have
>> 'rte_mempool_info_get()' kind of function, that provides at least address and
>> length of the mempool, and used here?
>>
>> This helps to hide internal details and complexity of the mempool for users.
>>
>>
>>>
>>> @@ -1125,6 +1144,7 @@ xsk_umem_info *xdp_umem_configure(struct
>> pmd_internals *internals,
>>> void *base_addr = NULL;
>>> struct rte_mempool *mb_pool = rxq->mb_pool;
>>> uint64_t umem_size, align = 0;
>>> + size_t len = 0;
>>>
>>> if (internals->shared_umem) {
>>> if (get_shared_umem(rxq, internals->if_name, &umem) < 0)
>>> @@ -1156,10 +1176,12 @@ xsk_umem_info *xdp_umem_configure(struct
>>> pmd_internals *internals,
>>> }
>>>
>>> umem->mb_pool = mb_pool;
>>> - base_addr = (void *)get_base_addr(mb_pool, &align);
>>> - umem_size = (uint64_t)mb_pool->populated_size *
>>> - (uint64_t)usr_config.frame_size +
>>> - align;
>>> + base_addr = (void *)get_memhdr_info(mb_pool, &align, &len);
>>>
>>
>> Is this calculation correct if mempool is not already aligned to page size?
>>
>> Like in an example page size is '0x1000', and "memhdr_addr = 0x000a1080"
>> returned aligned address is '0x000a1000', "base_addr = 0x000a1000"
>>
>> Any access between '0x000a1000' & '0x000a1080' is invalid. Is this expected?
>
> Yes, since the XSK UMEM memory area requires page alignment. However, no need to worry; the memory pointer in the XSK TX/RX descriptor is obtained from the mbuf data area. We don’t have any chance to access the invalid range [0x000a1000: 0x000a1080] here.
>
Thanks for clarification.
^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: [PATCH v2] net/af_xdp: fix umem map size for zero copy
2024-05-22 10:00 ` Ferruh Yigit
@ 2024-05-22 11:03 ` Morten Brørup
2024-05-22 14:05 ` Ferruh Yigit
0 siblings, 1 reply; 29+ messages in thread
From: Morten Brørup @ 2024-05-22 11:03 UTC (permalink / raw)
To: Ferruh Yigit, Du, Frank, dev, Andrew Rybchenko
Cc: Loftus, Ciara, Burakov, Anatoly
> From: Ferruh Yigit [mailto:ferruh.yigit@amd.com]
> Sent: Wednesday, 22 May 2024 12.01
>
> On 5/22/2024 2:25 AM, Du, Frank wrote:
> >> From: Ferruh Yigit <ferruh.yigit@amd.com>
> >> Sent: Wednesday, May 22, 2024 1:58 AM
> >>
> >> Isn't there a mempool flag that can help us figure out mempool is not IOVA
> >> contiguous? Isn't it sufficient on its own?
> >
> > Indeed, what we need to ascertain is whether it's contiguous in CPU virtual
> > space, not IOVA. I haven't come across a flag specifically for CPU virtual
> > contiguity. The major limitation in XDP is XSK UMEM only supports registering
> > a single contiguous virtual memory area.
> >
>
> 'RTE_MEMPOOL_F_NO_IOVA_CONTIG' is the flag I was looking for. This flag
> being *cleared* implies IOVA contiguous but not sure if it is
> guaranteed, need to check.
Wrong.
RTE_MEMPOOL_F_NO_IOVA_CONTIG only relates to individual objects in the pool. This flag being cleared only means that each individual object in the mempool is IOVA contiguous.
It does not imply that the entire pool of objects is IOVA contiguous.
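To illustrate the distinction with a sketch (the flag and mp->flags are real
mempool API; the helper is made up):

#include <stdbool.h>
#include <rte_mempool.h>

/* Illustrative only: the flag describes objects, not the pool. */
static bool
objects_are_iova_contig(const struct rte_mempool *mp)
{
	/* Flag cleared => each individual object is IOVA-contiguous, but
	 * the pool may still consist of several chunks that are not
	 * adjacent in IOVA or virtual address space. */
	return (mp->flags & RTE_MEMPOOL_F_NO_IOVA_CONTIG) == 0;
}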
>
> And I may be wrong here, but as far as I remember, in IOVA-as-VA mode the
> process virtual address and the IOVA are the same, so IOVA contiguity is
> the same as CPU virtual address contiguity.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2] net/af_xdp: fix umem map size for zero copy
2024-05-22 11:03 ` Morten Brørup
@ 2024-05-22 14:05 ` Ferruh Yigit
0 siblings, 0 replies; 29+ messages in thread
From: Ferruh Yigit @ 2024-05-22 14:05 UTC (permalink / raw)
To: Morten Brørup, Du, Frank, dev, Andrew Rybchenko
Cc: Loftus, Ciara, Burakov, Anatoly
On 5/22/2024 12:03 PM, Morten Brørup wrote:
>> From: Ferruh Yigit [mailto:ferruh.yigit@amd.com]
>> Sent: Wednesday, 22 May 2024 12.01
>>
>> On 5/22/2024 2:25 AM, Du, Frank wrote:
>>>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>>>> Sent: Wednesday, May 22, 2024 1:58 AM
>>>>
>>>> Isn't there a mempool flag that can help us figure out mempool is not IOVA
>>>> contiguous? Isn't it sufficient on its own?
>>>
>>> Indeed, what we need to ascertain is whether it's contiguous in CPU virtual
>>> space, not IOVA. I haven't come across a flag specifically for CPU virtual
>>> contiguity. The major limitation in XDP is XSK UMEM only supports registering
>>> a single contiguous virtual memory area.
>>>
>>
>> 'RTE_MEMPOOL_F_NO_IOVA_CONTIG' is the flag I was looking for. This flag
>> being *cleared* implies IOVA contiguous but not sure if it is
>> guaranteed, need to check.
>
> Wrong.
> RTE_MEMPOOL_F_NO_IOVA_CONTIG only relates to individual objects in the pool. This flag being cleared only means that each individual object in the mempool is IOVA contiguous.
> It does not imply that the entire pool of objects is IOVA contiguous.
>
Ah, thanks for clarification.
^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH v3] net/af_xdp: fix umem map size for zero copy
2024-04-26 0:51 [PATCH] net/af_xdp: fix umem map size for zero copy Frank Du
2024-04-26 10:43 ` Loftus, Ciara
2024-05-11 5:26 ` [PATCH v2] " Frank Du
@ 2024-05-23 6:53 ` Frank Du
2024-05-23 8:07 ` [PATCH v4] " Frank Du
2024-06-20 3:25 ` [PATCH v5] net/af_xdp: parse umem map info from mempool range api Frank Du
4 siblings, 0 replies; 29+ messages in thread
From: Frank Du @ 2024-05-23 6:53 UTC (permalink / raw)
To: dev; +Cc: ciara.loftus, ferruh.yigit, mb
The current calculation assumes that the mbufs are contiguous. However,
this assumption is incorrect when the memory spans across a huge page.
Correct to directly read the size from the mempool memory chunks.
Fixes: d8a210774e1d ("net/af_xdp: support unaligned umem chunks")
Cc: stable@dpdk.org
Signed-off-by: Frank Du <frank.du@intel.com>
---
v2:
* Add virtual contiguous detect for multiple memhdrs.
v3:
* Use RTE_ALIGN_FLOOR to get the aligned addr
* Add check on the first memhdr of memory chunks
---
drivers/net/af_xdp/rte_eth_af_xdp.c | 40 ++++++++++++++++++++++++-----
1 file changed, 33 insertions(+), 7 deletions(-)
diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index 6ba455bb9b..986665d1d4 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -1040,16 +1040,39 @@ eth_link_update(struct rte_eth_dev *dev __rte_unused,
}
#if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
-static inline uintptr_t get_base_addr(struct rte_mempool *mp, uint64_t *align)
+static inline uintptr_t get_memhdr_info(struct rte_mempool *mp, uint64_t *align, size_t *len)
{
- struct rte_mempool_memhdr *memhdr;
+ struct rte_mempool_memhdr *memhdr, *next;
uintptr_t memhdr_addr, aligned_addr;
+ size_t memhdr_len = 0;
+ /* get the mempool base addr and align */
memhdr = STAILQ_FIRST(&mp->mem_list);
+ if (!memhdr) {
+ AF_XDP_LOG(ERR, "The mempool is not populated\n");
+ return 0;
+ }
memhdr_addr = (uintptr_t)memhdr->addr;
- aligned_addr = memhdr_addr & ~(getpagesize() - 1);
+ aligned_addr = RTE_ALIGN_FLOOR(memhdr_addr, getpagesize());
*align = memhdr_addr - aligned_addr;
+ memhdr_len += memhdr->len;
+
+ /* check if virtual contiguous memory for multiple memhdrs */
+ next = STAILQ_NEXT(memhdr, next);
+ while (next) {
+ if ((uintptr_t)next->addr != (uintptr_t)memhdr->addr + memhdr->len) {
+ AF_XDP_LOG(ERR, "Memory chunks not virtual contiguous, "
+ "next: %p, cur: %p(len: %" PRId64 " )\n",
+ next->addr, memhdr->addr, memhdr->len);
+ return 0;
+ }
+ /* virtual contiguous */
+ memhdr = next;
+ memhdr_len += memhdr->len;
+ next = STAILQ_NEXT(memhdr, next);
+ }
+ *len = memhdr_len;
return aligned_addr;
}
@@ -1126,6 +1149,7 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals,
void *base_addr = NULL;
struct rte_mempool *mb_pool = rxq->mb_pool;
uint64_t umem_size, align = 0;
+ size_t len = 0;
if (internals->shared_umem) {
if (get_shared_umem(rxq, internals->if_name, &umem) < 0)
@@ -1157,10 +1181,12 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals,
}
umem->mb_pool = mb_pool;
- base_addr = (void *)get_base_addr(mb_pool, &align);
- umem_size = (uint64_t)mb_pool->populated_size *
- (uint64_t)usr_config.frame_size +
- align;
+ base_addr = (void *)get_memhdr_info(mb_pool, &align, &len);
+ if (!base_addr) {
+ AF_XDP_LOG(ERR, "The memory pool can't be mapped into umem\n");
+ goto err;
+ }
+ umem_size = (uint64_t)len + align;
ret = xsk_umem__create(&umem->umem, base_addr, umem_size,
&rxq->fq, &rxq->cq, &usr_config);
--
2.34.1
^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH v4] net/af_xdp: fix umem map size for zero copy
2024-04-26 0:51 [PATCH] net/af_xdp: fix umem map size for zero copy Frank Du
` (2 preceding siblings ...)
2024-05-23 6:53 ` [PATCH v3] " Frank Du
@ 2024-05-23 8:07 ` Frank Du
2024-05-23 9:22 ` Morten Brørup
2024-06-20 3:25 ` [PATCH v5] net/af_xdp: parse umem map info from mempool range api Frank Du
4 siblings, 1 reply; 29+ messages in thread
From: Frank Du @ 2024-05-23 8:07 UTC (permalink / raw)
To: dev; +Cc: ciara.loftus, ferruh.yigit, mb
The current calculation assumes that the mbufs are contiguous. However,
this assumption is incorrect when the mbuf memory spans across a huge page.
To ensure that each mbuf resides exclusively within a single page, there
are deliberate spacing gaps when allocating mbufs across the boundaries.
Correct to directly read the size from the mempool memory chunk.
Fixes: d8a210774e1d ("net/af_xdp: support unaligned umem chunks")
Cc: stable@dpdk.org
Signed-off-by: Frank Du <frank.du@intel.com>
---
v2:
* Add virtual contiguous detect for multiple memhdrs
v3:
* Use RTE_ALIGN_FLOOR to get the aligned addr
* Add check on the first memhdr of memory chunks
v4:
* Replace the iterating with simple nb_mem_chunks check
---
drivers/net/af_xdp/rte_eth_af_xdp.c | 33 +++++++++++++++++++++++------
1 file changed, 26 insertions(+), 7 deletions(-)
diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index 6ba455bb9b..d0431ec089 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -1040,16 +1040,32 @@ eth_link_update(struct rte_eth_dev *dev __rte_unused,
}
#if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
-static inline uintptr_t get_base_addr(struct rte_mempool *mp, uint64_t *align)
+static inline uintptr_t
+get_memhdr_info(const struct rte_mempool *mp, uint64_t *align, size_t *len)
{
struct rte_mempool_memhdr *memhdr;
uintptr_t memhdr_addr, aligned_addr;
+ if (mp->nb_mem_chunks != 1) {
+ /*
+ * The mempool with multiple chunks is not virtual contiguous but
+ * xsk umem only support single virtual region mapping.
+ */
+ AF_XDP_LOG(ERR, "The mempool contain multiple %u memory chunks\n",
+ mp->nb_mem_chunks);
+ return 0;
+ }
+
+ /* Get the mempool base addr and align from the header now */
memhdr = STAILQ_FIRST(&mp->mem_list);
+ if (!memhdr) {
+ AF_XDP_LOG(ERR, "The mempool is not populated\n");
+ return 0;
+ }
memhdr_addr = (uintptr_t)memhdr->addr;
- aligned_addr = memhdr_addr & ~(getpagesize() - 1);
+ aligned_addr = RTE_ALIGN_FLOOR(memhdr_addr, getpagesize());
*align = memhdr_addr - aligned_addr;
-
+ *len = memhdr->len;
return aligned_addr;
}
@@ -1126,6 +1142,7 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals,
void *base_addr = NULL;
struct rte_mempool *mb_pool = rxq->mb_pool;
uint64_t umem_size, align = 0;
+ size_t len = 0;
if (internals->shared_umem) {
if (get_shared_umem(rxq, internals->if_name, &umem) < 0)
@@ -1157,10 +1174,12 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals,
}
umem->mb_pool = mb_pool;
- base_addr = (void *)get_base_addr(mb_pool, &align);
- umem_size = (uint64_t)mb_pool->populated_size *
- (uint64_t)usr_config.frame_size +
- align;
+ base_addr = (void *)get_memhdr_info(mb_pool, &align, &len);
+ if (!base_addr) {
+ AF_XDP_LOG(ERR, "The memory pool can't be mapped as umem\n");
+ goto err;
+ }
+ umem_size = (uint64_t)len + align;
ret = xsk_umem__create(&umem->umem, base_addr, umem_size,
&rxq->fq, &rxq->cq, &usr_config);
--
2.34.1
^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: [PATCH v4] net/af_xdp: fix umem map size for zero copy
2024-05-23 8:07 ` [PATCH v4] " Frank Du
@ 2024-05-23 9:22 ` Morten Brørup
2024-05-23 13:31 ` Ferruh Yigit
0 siblings, 1 reply; 29+ messages in thread
From: Morten Brørup @ 2024-05-23 9:22 UTC (permalink / raw)
To: Frank Du, dev; +Cc: ciara.loftus, ferruh.yigit
> From: Frank Du [mailto:frank.du@intel.com]
> Sent: Thursday, 23 May 2024 10.08
>
> The current calculation assumes that the mbufs are contiguous. However,
> this assumption is incorrect when the mbuf memory spans across a huge page.
> To ensure that each mbuf resides exclusively within a single page, there
> are deliberate spacing gaps when allocating mbufs across the boundaries.
I agree that this patch is an improvement of what existed previously.
But I still don't understand the patch description. To me, it looks like the patch adds a missing check for contiguous memory, and the patch itself has nothing to do with huge pages. Anyway, if the maintainer agrees with the description, I don't mind not grasping it. ;-)
However, while trying to understand what is happening, I think I found one more (already existing) bug.
I will show through an example inline below.
>
> Correct to directly read the size from the mempool memory chunk.
>
> Fixes: d8a210774e1d ("net/af_xdp: support unaligned umem chunks")
> Cc: stable@dpdk.org
>
> Signed-off-by: Frank Du <frank.du@intel.com>
>
> ---
> v2:
> * Add virtual contiguous detect for multiple memhdrs
> v3:
> * Use RTE_ALIGN_FLOOR to get the aligned addr
> * Add check on the first memhdr of memory chunks
> v4:
> * Replace the iterating with simple nb_mem_chunks check
> ---
> drivers/net/af_xdp/rte_eth_af_xdp.c | 33 +++++++++++++++++++++++------
> 1 file changed, 26 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> index 6ba455bb9b..d0431ec089 100644
> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> @@ -1040,16 +1040,32 @@ eth_link_update(struct rte_eth_dev *dev __rte_unused,
> }
>
> #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> -static inline uintptr_t get_base_addr(struct rte_mempool *mp, uint64_t
> *align)
> +static inline uintptr_t
> +get_memhdr_info(const struct rte_mempool *mp, uint64_t *align, size_t *len)
> {
> struct rte_mempool_memhdr *memhdr;
> uintptr_t memhdr_addr, aligned_addr;
>
> + if (mp->nb_mem_chunks != 1) {
> + /*
> + * The mempool with multiple chunks is not virtual contiguous but
> + * xsk umem only support single virtual region mapping.
> + */
> + AF_XDP_LOG(ERR, "The mempool contain multiple %u memory
> chunks\n",
> + mp->nb_mem_chunks);
> + return 0;
> + }
> +
> + /* Get the mempool base addr and align from the header now */
> memhdr = STAILQ_FIRST(&mp->mem_list);
> + if (!memhdr) {
> + AF_XDP_LOG(ERR, "The mempool is not populated\n");
> + return 0;
> + }
> memhdr_addr = (uintptr_t)memhdr->addr;
> - aligned_addr = memhdr_addr & ~(getpagesize() - 1);
> + aligned_addr = RTE_ALIGN_FLOOR(memhdr_addr, getpagesize());
> *align = memhdr_addr - aligned_addr;
> -
> + *len = memhdr->len;
> return aligned_addr;
On x86_64, the page size is 4 KB = 0x1000.
Let's look at an example where memhdr->addr is not aligned to the page size:
In the example,
memhdr->addr is 0x700100, and
memhdr->len is 0x20000.
Then
aligned_addr becomes 0x700000,
*align becomes 0x100, and
*len becomes 0x20000.
> }
>
> @@ -1126,6 +1142,7 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals
> *internals,
> void *base_addr = NULL;
> struct rte_mempool *mb_pool = rxq->mb_pool;
> uint64_t umem_size, align = 0;
> + size_t len = 0;
>
> if (internals->shared_umem) {
> if (get_shared_umem(rxq, internals->if_name, &umem) < 0)
> @@ -1157,10 +1174,12 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals
> *internals,
> }
>
> umem->mb_pool = mb_pool;
> - base_addr = (void *)get_base_addr(mb_pool, &align);
> - umem_size = (uint64_t)mb_pool->populated_size *
> - (uint64_t)usr_config.frame_size +
> - align;
> + base_addr = (void *)get_memhdr_info(mb_pool, &align, &len);
> + if (!base_addr) {
> + AF_XDP_LOG(ERR, "The memory pool can't be mapped as
> umem\n");
> + goto err;
> + }
> + umem_size = (uint64_t)len + align;
Here, umem_size becomes 0x20100.
>
> ret = xsk_umem__create(&umem->umem, base_addr, umem_size,
> &rxq->fq, &rxq->cq, &usr_config);
Here, xsk_umem__create() is called with the base_address (0x700000) preceding the address of the memory chunk (0x700100).
It looks like a bug, causing a buffer underrun. I.e. will it access memory starting at base_address?
If I'm correct, the code should probably do this for alignment instead:
aligned_addr = RTE_ALIGN_CEIL(memhdr_addr, getpagesize());
*align = aligned_addr - memhdr_addr;
umem_size = (uint64_t)len - align;
Disclaimer: I don't know much about the AF_XDP implementation, so maybe I just don't understand what is going on.
> --
> 2.34.1
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v4] net/af_xdp: fix umem map size for zero copy
2024-05-23 9:22 ` Morten Brørup
@ 2024-05-23 13:31 ` Ferruh Yigit
2024-05-24 1:05 ` Du, Frank
0 siblings, 1 reply; 29+ messages in thread
From: Ferruh Yigit @ 2024-05-23 13:31 UTC (permalink / raw)
To: Morten Brørup, Frank Du, dev; +Cc: ciara.loftus
On 5/23/2024 10:22 AM, Morten Brørup wrote:
>> From: Frank Du [mailto:frank.du@intel.com]
>> Sent: Thursday, 23 May 2024 10.08
>>
>> The current calculation assumes that the mbufs are contiguous. However,
>> this assumption is incorrect when the mbuf memory spans across a huge page.
>> To ensure that each mbuf resides exclusively within a single page, there
>> are deliberate spacing gaps when allocating mbufs across the boundaries.
>
> I agree that this patch is an improvement of what existed previously.
> But I still don't understand the patch description. To me, it looks like the patch adds a missing check for contiguous memory, and the patch itself has nothing to do with huge pages. Anyway, if the maintainer agrees with the description, I don't mind not grasping it. ;-)
>
> However, while trying to understand what is happening, I think I found one more (already existing) bug.
> I will show through an example inline below.
>
>>
>> Correct to directly read the size from the mempool memory chunk.
>>
>> Fixes: d8a210774e1d ("net/af_xdp: support unaligned umem chunks")
>> Cc: stable@dpdk.org
>>
>> Signed-off-by: Frank Du <frank.du@intel.com>
>>
>> ---
>> v2:
>> * Add virtual contiguous detect for for multiple memhdrs
>> v3:
>> * Use RTE_ALIGN_FLOOR to get the aligned addr
>> * Add check on the first memhdr of memory chunks
>> v4:
>> * Replace the iterating with simple nb_mem_chunks check
>> ---
>> drivers/net/af_xdp/rte_eth_af_xdp.c | 33 +++++++++++++++++++++++------
>> 1 file changed, 26 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
>> b/drivers/net/af_xdp/rte_eth_af_xdp.c
>> index 6ba455bb9b..d0431ec089 100644
>> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
>> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
>> @@ -1040,16 +1040,32 @@ eth_link_update(struct rte_eth_dev *dev __rte_unused,
>> }
>>
>> #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
>> -static inline uintptr_t get_base_addr(struct rte_mempool *mp, uint64_t
>> *align)
>> +static inline uintptr_t
>> +get_memhdr_info(const struct rte_mempool *mp, uint64_t *align, size_t *len)
>> {
>> struct rte_mempool_memhdr *memhdr;
>> uintptr_t memhdr_addr, aligned_addr;
>>
>> + if (mp->nb_mem_chunks != 1) {
>> + /*
>> + * The mempool with multiple chunks is not virtual contiguous but
>> + * xsk umem only support single virtual region mapping.
>> + */
>> + AF_XDP_LOG(ERR, "The mempool contain multiple %u memory
>> chunks\n",
>> + mp->nb_mem_chunks);
>> + return 0;
>> + }
>> +
>> + /* Get the mempool base addr and align from the header now */
>> memhdr = STAILQ_FIRST(&mp->mem_list);
>> + if (!memhdr) {
>> + AF_XDP_LOG(ERR, "The mempool is not populated\n");
>> + return 0;
>> + }
>> memhdr_addr = (uintptr_t)memhdr->addr;
>> - aligned_addr = memhdr_addr & ~(getpagesize() - 1);
>> + aligned_addr = RTE_ALIGN_FLOOR(memhdr_addr, getpagesize());
>> *align = memhdr_addr - aligned_addr;
>> -
>> + *len = memhdr->len;
>> return aligned_addr;
>
> On x86_64, the page size is 4 KB = 0x1000.
>
> Let's look at an example where memhdr->addr is not aligned to the page size:
>
> In the example,
> memhdr->addr is 0x700100, and
> memhdr->len is 0x20000.
>
> Then
> aligned_addr becomes 0x700000,
> *align becomes 0x100, and
> *len becomes 0x20000.
>
>> }
>>
>> @@ -1126,6 +1142,7 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals
>> *internals,
>> void *base_addr = NULL;
>> struct rte_mempool *mb_pool = rxq->mb_pool;
>> uint64_t umem_size, align = 0;
>> + size_t len = 0;
>>
>> if (internals->shared_umem) {
>> if (get_shared_umem(rxq, internals->if_name, &umem) < 0)
>> @@ -1157,10 +1174,12 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals
>> *internals,
>> }
>>
>> umem->mb_pool = mb_pool;
>> - base_addr = (void *)get_base_addr(mb_pool, &align);
>> - umem_size = (uint64_t)mb_pool->populated_size *
>> - (uint64_t)usr_config.frame_size +
>> - align;
>> + base_addr = (void *)get_memhdr_info(mb_pool, &align, &len);
>> + if (!base_addr) {
>> + AF_XDP_LOG(ERR, "The memory pool can't be mapped as
>> umem\n");
>> + goto err;
>> + }
>> + umem_size = (uint64_t)len + align;
>
> Here, umem_size becomes 0x20100.
>
>>
>> ret = xsk_umem__create(&umem->umem, base_addr, umem_size,
>> &rxq->fq, &rxq->cq, &usr_config);
>
> Here, xsk_umem__create() is called with the base_address (0x700000) preceding the address of the memory chunk (0x700100).
> It looks like a bug, causing a buffer underrun. I.e. will it access memory starting at base_address?
>
I already asked about this on v2; Frank mentioned that area is not
accessed and having a gap is safe.
> If I'm correct, the code should probably do this for alignment instead:
>
> aligned_addr = RTE_ALIGN_CEIL(memhdr_addr, getpagesize());
> *align = aligned_addr - memhdr_addr;
> umem_size = (uint64_t)len - align;
> umem_size = (uint64_t)len - align;
>
>
> Disclaimer: I don't know much about the AF_XDP implementation, so maybe I just don't understand what is going on.
>
>> --
>> 2.34.1
>
^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: [PATCH v4] net/af_xdp: fix umem map size for zero copy
2024-05-23 13:31 ` Ferruh Yigit
@ 2024-05-24 1:05 ` Du, Frank
2024-05-24 5:30 ` Morten Brørup
0 siblings, 1 reply; 29+ messages in thread
From: Du, Frank @ 2024-05-24 1:05 UTC (permalink / raw)
To: Ferruh Yigit, Morten Brørup, dev; +Cc: Loftus, Ciara
> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Thursday, May 23, 2024 9:32 PM
> To: Morten Brørup <mb@smartsharesystems.com>; Du, Frank
> <frank.du@intel.com>; dev@dpdk.org
> Cc: Loftus, Ciara <ciara.loftus@intel.com>
> Subject: Re: [PATCH v4] net/af_xdp: fix umem map size for zero copy
>
> On 5/23/2024 10:22 AM, Morten Brørup wrote:
> >> From: Frank Du [mailto:frank.du@intel.com]
> >> Sent: Thursday, 23 May 2024 10.08
> >>
> >> The current calculation assumes that the mbufs are contiguous.
> >> However, this assumption is incorrect when the mbuf memory spans across
> >> a huge page.
> >> To ensure that each mbuf resides exclusively within a single page,
> >> there are deliberate spacing gaps when allocating mbufs across the
> boundaries.
> >
> > I agree that this patch is an improvement of what existed previously.
> > But I still don't understand the patch description. To me, it looks
> > like the patch adds a missing check for contiguous memory, and the
> > patch itself has nothing to do with huge pages. Anyway, if the
> > maintainer agrees with the description, I don't mind not grasping it.
> > ;-)
> >
> > However, while trying to understand what is happening, I think I found one
> more (already existing) bug.
> > I will show through an example inline below.
> >
> >>
> >> Correct to directly read the size from the mempool memory chunk.
> >>
> >> Fixes: d8a210774e1d ("net/af_xdp: support unaligned umem chunks")
> >> Cc: stable@dpdk.org
> >>
> >> Signed-off-by: Frank Du <frank.du@intel.com>
> >>
> >> ---
> >> v2:
> >> * Add virtual contiguous detect for multiple memhdrs
> >> v3:
> >> * Use RTE_ALIGN_FLOOR to get the aligned addr
> >> * Add check on the first memhdr of memory chunks
> >> v4:
> >> * Replace the iterating with simple nb_mem_chunks check
> >> ---
> >> drivers/net/af_xdp/rte_eth_af_xdp.c | 33
> >> +++++++++++++++++++++++------
> >> 1 file changed, 26 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> >> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> >> index 6ba455bb9b..d0431ec089 100644
> >> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> >> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> >> @@ -1040,16 +1040,32 @@ eth_link_update(struct rte_eth_dev *dev
> >> __rte_unused, }
> >>
> >> #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> >> -static inline uintptr_t get_base_addr(struct rte_mempool *mp,
> >> uint64_t
> >> *align)
> >> +static inline uintptr_t
> >> +get_memhdr_info(const struct rte_mempool *mp, uint64_t *align,
> >> +size_t *len)
> >> {
> >> struct rte_mempool_memhdr *memhdr;
> >> uintptr_t memhdr_addr, aligned_addr;
> >>
> >> + if (mp->nb_mem_chunks != 1) {
> >> + /*
> >> + * The mempool with multiple chunks is not virtual contiguous but
> >> + * xsk umem only support single virtual region mapping.
> >> + */
> >> + AF_XDP_LOG(ERR, "The mempool contain multiple %u memory
> >> chunks\n",
> >> + mp->nb_mem_chunks);
> >> + return 0;
> >> + }
> >> +
> >> + /* Get the mempool base addr and align from the header now */
> >> memhdr = STAILQ_FIRST(&mp->mem_list);
> >> + if (!memhdr) {
> >> + AF_XDP_LOG(ERR, "The mempool is not populated\n");
> >> + return 0;
> >> + }
> >> memhdr_addr = (uintptr_t)memhdr->addr;
> >> - aligned_addr = memhdr_addr & ~(getpagesize() - 1);
> >> + aligned_addr = RTE_ALIGN_FLOOR(memhdr_addr, getpagesize());
> >> *align = memhdr_addr - aligned_addr;
> >> -
> >> + *len = memhdr->len;
> >> return aligned_addr;
> >
> > On x86_64, the page size is 4 KB = 0x1000.
> >
> > Let's look at an example where memhdr->addr is not aligned to the page size:
> >
> > In the example,
> > memhdr->addr is 0x700100, and
> > memhdr->len is 0x20000.
> >
> > Then
> > aligned_addr becomes 0x700000,
> > *align becomes 0x100, and
> > *len becomes 0x20000.
> >
> >> }
> >>
> >> @@ -1126,6 +1142,7 @@ xsk_umem_info *xdp_umem_configure(struct
> >> pmd_internals *internals,
> >> void *base_addr = NULL;
> >> struct rte_mempool *mb_pool = rxq->mb_pool;
> >> uint64_t umem_size, align = 0;
> >> + size_t len = 0;
> >>
> >> if (internals->shared_umem) {
> >> if (get_shared_umem(rxq, internals->if_name, &umem) < 0) @@
> >> -1157,10 +1174,12 @@ xsk_umem_info *xdp_umem_configure(struct
> >> pmd_internals *internals,
> >> }
> >>
> >> umem->mb_pool = mb_pool;
> >> - base_addr = (void *)get_base_addr(mb_pool, &align);
> >> - umem_size = (uint64_t)mb_pool->populated_size *
> >> - (uint64_t)usr_config.frame_size +
> >> - align;
> >> + base_addr = (void *)get_memhdr_info(mb_pool, &align, &len);
> >> + if (!base_addr) {
> >> + AF_XDP_LOG(ERR, "The memory pool can't be mapped
> as
> >> umem\n");
> >> + goto err;
> >> + }
> >> + umem_size = (uint64_t)len + align;
> >
> > Here, umem_size becomes 0x20100.
> >
> >>
> >> ret = xsk_umem__create(&umem->umem, base_addr, umem_size,
> >> &rxq->fq, &rxq->cq, &usr_config);
> >
> > Here, xsk_umem__create() is called with the base_address (0x700000)
> > preceding the address of the memory chunk (0x700100).
> > It looks like a bug, causing a buffer underrun. I.e. will it access memory
> > starting at base_address?
> >
>
> I already asked about this on v2; Frank mentioned that area is not accessed
> and having a gap is safe.
xsk_umem__create() requires a base address that is aligned to a page boundary.
And, there is no chance to access the area between 0x700000 and 0x700100,
because the memory pointer for each XSK TX/RX descriptor is derived from the
mbuf data area.
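As a sketch of why the gap bytes are never touched (illustrative; the helper
name is made up, rte_pktmbuf_mtod() is the standard mbuf accessor):

#include <stdint.h>
#include <rte_mbuf.h>

/* Hypothetical helper: umem offset of an mbuf's data, relative to the
 * page-aligned base passed to xsk_umem__create(). The mbuf data area lies
 * inside the populated chunk, so the offset is always >= the gap size and
 * the bytes below the chunk start are never dereferenced. */
static inline uint64_t
mbuf_to_umem_offset(struct rte_mbuf *mbuf, const void *umem_base)
{
	return (uint64_t)((uintptr_t)rte_pktmbuf_mtod(mbuf, char *) -
			  (uintptr_t)umem_base);
}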
>
> > If I'm correct, the code should probably do this for alignment instead:
> >
> > aligned_addr = RTE_ALIGN_CEIL(memhdr_addr, getpagesize());
> > *align = aligned_addr - memhdr_addr;
> > umem_size = (uint64_t)len - align;
> >
> >
> > Disclaimer: I don't know much about the AF_XDP implementation, so maybe I
> just don't understand what is going on.
> >
> >> --
> >> 2.34.1
> >
^ permalink raw reply [flat|nested] 29+ messages in thread
* RE: [PATCH v4] net/af_xdp: fix umem map size for zero copy
2024-05-24 1:05 ` Du, Frank
@ 2024-05-24 5:30 ` Morten Brørup
0 siblings, 0 replies; 29+ messages in thread
From: Morten Brørup @ 2024-05-24 5:30 UTC (permalink / raw)
To: Du, Frank, Ferruh Yigit, dev; +Cc: Loftus, Ciara
> From: Du, Frank [mailto:frank.du@intel.com]
> Sent: Friday, 24 May 2024 03.05
>
> > From: Ferruh Yigit <ferruh.yigit@amd.com>
> > Sent: Thursday, May 23, 2024 9:32 PM
> >
> > On 5/23/2024 10:22 AM, Morten Brørup wrote:
> > >> From: Frank Du [mailto:frank.du@intel.com]
> > >> Sent: Thursday, 23 May 2024 10.08
> > >>
> > >> The current calculation assumes that the mbufs are contiguous.
> > >> However, this assumption is incorrect when the mbuf memory spans across
> > >> a huge page.
> > >> To ensure that each mbuf resides exclusively within a single page,
> > >> there are deliberate spacing gaps when allocating mbufs across the
> > boundaries.
> > >
> > > I agree that this patch is an improvement of what existed previously.
> > > But I still don't understand the patch description. To me, it looks
> > > like the patch adds a missing check for contiguous memory, and the
> > > patch itself has nothing to do with huge pages. Anyway, if the
> > > maintainer agrees with the description, I don't mind not grasping it.
> > > ;-)
> > >
> > > However, while trying to understand what is happening, I think I found one
> > more (already existing) bug.
> > > I will show through an example inline below.
> > >
> > >>
> > >> Correct to directly read the size from the mempool memory chunk.
> > >>
> > >> Fixes: d8a210774e1d ("net/af_xdp: support unaligned umem chunks")
> > >> Cc: stable@dpdk.org
> > >>
> > >> Signed-off-by: Frank Du <frank.du@intel.com>
> > >>
> > >> ---
> > >> v2:
> > >> * Add virtual contiguous detect for multiple memhdrs
> > >> v3:
> > >> * Use RTE_ALIGN_FLOOR to get the aligned addr
> > >> * Add check on the first memhdr of memory chunks
> > >> v4:
> > >> * Replace the iterating with simple nb_mem_chunks check
> > >> ---
> > >> drivers/net/af_xdp/rte_eth_af_xdp.c | 33
> > >> +++++++++++++++++++++++------
> > >> 1 file changed, 26 insertions(+), 7 deletions(-)
> > >>
> > >> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> > >> b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > >> index 6ba455bb9b..d0431ec089 100644
> > >> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> > >> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > >> @@ -1040,16 +1040,32 @@ eth_link_update(struct rte_eth_dev *dev
> > >> __rte_unused, }
> > >>
> > >> #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> > >> -static inline uintptr_t get_base_addr(struct rte_mempool *mp,
> > >> uint64_t
> > >> *align)
> > >> +static inline uintptr_t
> > >> +get_memhdr_info(const struct rte_mempool *mp, uint64_t *align,
> > >> +size_t *len)
> > >> {
> > >> struct rte_mempool_memhdr *memhdr;
> > >> uintptr_t memhdr_addr, aligned_addr;
> > >>
> > >> + if (mp->nb_mem_chunks != 1) {
> > >> + /*
> > >> + * The mempool with multiple chunks is not virtual contiguous but
> > >> + * xsk umem only support single virtual region mapping.
> > >> + */
> > >> + AF_XDP_LOG(ERR, "The mempool contain multiple %u memory
> > >> chunks\n",
> > >> + mp->nb_mem_chunks);
> > >> + return 0;
> > >> + }
> > >> +
> > >> + /* Get the mempool base addr and align from the header now */
> > >> memhdr = STAILQ_FIRST(&mp->mem_list);
> > >> + if (!memhdr) {
> > >> + AF_XDP_LOG(ERR, "The mempool is not populated\n");
> > >> + return 0;
> > >> + }
> > >> memhdr_addr = (uintptr_t)memhdr->addr;
> > >> - aligned_addr = memhdr_addr & ~(getpagesize() - 1);
> > >> + aligned_addr = RTE_ALIGN_FLOOR(memhdr_addr, getpagesize());
> > >> *align = memhdr_addr - aligned_addr;
> > >> -
> > >> + *len = memhdr->len;
> > >> return aligned_addr;
> > >
> > > On x86_64, the page size is 4 KB = 0x1000.
> > >
> > > Let's look at an example where memhdr->addr is not aligned to the page
> size:
> > >
> > > In the example,
> > > memhdr->addr is 0x700100, and
> > > memhdr->len is 0x20000.
> > >
> > > Then
> > > aligned_addr becomes 0x700000,
> > > *align becomes 0x100, and
> > > *len becomes 0x20000.
> > >
> > >> }
> > >>
> > >> @@ -1126,6 +1142,7 @@ xsk_umem_info *xdp_umem_configure(struct
> > >> pmd_internals *internals,
> > >> void *base_addr = NULL;
> > >> struct rte_mempool *mb_pool = rxq->mb_pool;
> > >> uint64_t umem_size, align = 0;
> > >> + size_t len = 0;
> > >>
> > >> if (internals->shared_umem) {
> > >> if (get_shared_umem(rxq, internals->if_name, &umem) < 0) @@
> > >> -1157,10 +1174,12 @@ xsk_umem_info *xdp_umem_configure(struct
> > >> pmd_internals *internals,
> > >> }
> > >>
> > >> umem->mb_pool = mb_pool;
> > >> - base_addr = (void *)get_base_addr(mb_pool, &align);
> > >> - umem_size = (uint64_t)mb_pool->populated_size *
> > >> - (uint64_t)usr_config.frame_size +
> > >> - align;
> > >> + base_addr = (void *)get_memhdr_info(mb_pool, &align, &len);
> > >> + if (!base_addr) {
> > >> + AF_XDP_LOG(ERR, "The memory pool can't be mapped
> > as
> > >> umem\n");
> > >> + goto err;
> > >> + }
> > >> + umem_size = (uint64_t)len + align;
> > >
> > > Here, umem_size becomes 0x20100.
> > >
> > >>
> > >> ret = xsk_umem__create(&umem->umem, base_addr, umem_size,
> > >> &rxq->fq, &rxq->cq, &usr_config);
> > >
> > > Here, xsk_umem__create() is called with the base_address (0x700000)
> > > preceding the address of the memory chunk (0x700100).
> > > It looks like a bug, causing a buffer underrun. I.e. will it access memory
> > > starting at base_address?
> > >
> >
> > I already asked about this on v2; Frank mentioned that area is not accessed
> > and having a gap is safe.
>
> xsk_umem__create() requires a base address that is aligned to a page boundary.
> And, there is no chance to access the area between 0x700000 and 0x700100,
> because the memory pointer for each XSK TX/RX descriptor is derived from the
> mbuf data area.
OK, thanks for explaining.
Acked-by: Morten Brørup <mb@smartsharesystems.com>
>
> >
> > > If I'm correct, the code should probably do this for alignment instead:
> > >
> > > aligned_addr = RTE_ALIGN_CEIL(memhdr_addr, getpagesize());
> > > *align = aligned_addr - memhdr_addr;
> > > umem_size = (uint64_t)len - align;
> > >
> > >
> > > Disclaimer: I don't know much about the AF_XDP implementation, so maybe I
> > just don't understand what is going on.
> > >
> > >> --
> > >> 2.34.1
> > >
^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH v5] net/af_xdp: parse umem map info from mempool range api
2024-04-26 0:51 [PATCH] net/af_xdp: fix umem map size for zero copy Frank Du
` (3 preceding siblings ...)
2024-05-23 8:07 ` [PATCH v4] " Frank Du
@ 2024-06-20 3:25 ` Frank Du
2024-06-20 7:10 ` Morten Brørup
4 siblings, 1 reply; 29+ messages in thread
From: Frank Du @ 2024-06-20 3:25 UTC (permalink / raw)
To: dev; +Cc: ciara.loftus, ferruh.yigit, mb
The current calculation assumes that the mbufs are contiguous. However,
this assumption is incorrect when the mbuf memory spans across a huge page.
Correct this by reading the range directly with the mempool get range API.
Fixes: d8a210774e1d ("net/af_xdp: support unaligned umem chunks")
Cc: stable@dpdk.org
Signed-off-by: Frank Du <frank.du@intel.com>
---
v2:
* Add virtual contiguous detect for multiple memhdrs
v3:
* Use RTE_ALIGN_FLOOR to get the aligned addr
* Add check on the first memhdr of memory chunks
v4:
* Replace the iterating with simple nb_mem_chunks check
v5:
* Use rte_mempool_get_mem_range to query the mempool range
---
drivers/net/af_xdp/rte_eth_af_xdp.c | 42 ++++++++++++++---------------
1 file changed, 20 insertions(+), 22 deletions(-)
diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index 4b282adb03..0bc0d9a55a 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -1067,19 +1067,6 @@ eth_link_update(struct rte_eth_dev *dev __rte_unused,
}
#if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
-static inline uintptr_t get_base_addr(struct rte_mempool *mp, uint64_t *align)
-{
- struct rte_mempool_memhdr *memhdr;
- uintptr_t memhdr_addr, aligned_addr;
-
- memhdr = STAILQ_FIRST(&mp->mem_list);
- memhdr_addr = (uintptr_t)memhdr->addr;
- aligned_addr = memhdr_addr & ~(getpagesize() - 1);
- *align = memhdr_addr - aligned_addr;
-
- return aligned_addr;
-}
-
/* Check if the netdev,qid context already exists */
static inline bool
ctx_exists(struct pkt_rx_queue *rxq, const char *ifname,
@@ -1150,9 +1137,10 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals,
.fill_size = ETH_AF_XDP_DFLT_NUM_DESCS * 2,
.comp_size = ETH_AF_XDP_DFLT_NUM_DESCS,
.flags = XDP_UMEM_UNALIGNED_CHUNK_FLAG};
- void *base_addr = NULL;
struct rte_mempool *mb_pool = rxq->mb_pool;
- uint64_t umem_size, align = 0;
+ void *aligned_addr;
+ uint64_t umem_size;
+ struct rte_mempool_mem_range_info range;
if (internals->shared_umem) {
if (get_shared_umem(rxq, internals->if_name, &umem) < 0)
@@ -1184,19 +1172,29 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals,
}
umem->mb_pool = mb_pool;
- base_addr = (void *)get_base_addr(mb_pool, &align);
- umem_size = (uint64_t)mb_pool->populated_size *
- (uint64_t)usr_config.frame_size +
- align;
-
- ret = xsk_umem__create(&umem->umem, base_addr, umem_size,
+ ret = rte_mempool_get_mem_range(mb_pool, &range);
+ if (ret < 0) {
+ AF_XDP_LOG(ERR, "Failed(%d) to get range from mempool\n", ret);
+ goto err;
+ }
+ if (!range.is_contiguous) {
+ AF_XDP_LOG(ERR, "Can't mapped to umem as mempool is not contiguous\n");
+ goto err;
+ }
+ /*
+ * umem requires the memory area to be page aligned; mapping a larger area is safe
+ * as the memory pointer for each XSK TX/RX descriptor is derived from the mbuf data area.
+ */
+ aligned_addr = (void *)RTE_ALIGN_FLOOR((uintptr_t)range.start, getpagesize());
+ umem_size = range.length + RTE_PTR_DIFF(range.start, aligned_addr);
+ ret = xsk_umem__create(&umem->umem, aligned_addr, umem_size,
&rxq->fq, &rxq->cq, &usr_config);
if (ret) {
AF_XDP_LOG(ERR, "Failed to create umem [%d]: [%s]\n",
errno, strerror(errno));
goto err;
}
- umem->buffer = base_addr;
+ umem->buffer = aligned_addr;
if (internals->shared_umem) {
umem->max_xsks = mb_pool->populated_size /
--
2.34.1
^ permalink raw reply [flat|nested] 29+ messages in thread
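For context on the API the v5 patch relies on, below is a minimal sketch of the sizing logic factored into a helper. The helper name is hypothetical, error handling is reduced to a return code, and the sketch assumes a DPDK version that provides rte_mempool_get_mem_range():

    #include <unistd.h>
    #include <rte_common.h>
    #include <rte_mempool.h>

    /* Hypothetical helper mirroring the v5 logic above. */
    static int
    umem_region_from_pool(struct rte_mempool *mb_pool, void **base,
                          uint64_t *size)
    {
            struct rte_mempool_mem_range_info range;
            uintptr_t aligned_addr;

            if (rte_mempool_get_mem_range(mb_pool, &range) < 0)
                    return -1; /* cannot query the pool's memory range */
            if (!range.is_contiguous)
                    return -1; /* umem needs one contiguous virtual range */

            /* Page-align downwards; the pre-gap is mapped but never
             * accessed, as XSK descriptors point into mbuf data areas. */
            aligned_addr = RTE_ALIGN_FLOOR((uintptr_t)range.start,
                                           getpagesize());
            *base = (void *)aligned_addr;
            *size = range.length + RTE_PTR_DIFF(range.start,
                                                (void *)aligned_addr);
            return 0;
    }

xsk_umem__create() would then be called with the returned base and size, as the diff above does.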
* RE: [PATCH v5] net/af_xdp: parse umem map info from mempool range api
2024-06-20 3:25 ` [PATCH v5] net/af_xdp: parse umem map info from mempool range api Frank Du
@ 2024-06-20 7:10 ` Morten Brørup
2024-07-06 3:40 ` Ferruh Yigit
0 siblings, 1 reply; 29+ messages in thread
From: Morten Brørup @ 2024-06-20 7:10 UTC (permalink / raw)
To: Frank Du, dev; +Cc: ciara.loftus, ferruh.yigit
> From: Frank Du [mailto:frank.du@intel.com]
>
> The current calculation assumes that the mbufs are contiguous. However,
> this assumption is incorrect when the mbuf memory spans across huge pages.
>
> Correct this by reading the range directly with the mempool get range API.
>
> Fixes: d8a210774e1d ("net/af_xdp: support unaligned umem chunks")
> Cc: stable@dpdk.org
>
> Signed-off-by: Frank Du <frank.du@intel.com>
>
> ---
> v2:
> * Add virtually contiguous detection for multiple memhdrs
> v3:
> * Use RTE_ALIGN_FLOOR to get the aligned addr
> * Add check on the first memhdr of memory chunks
> v4:
> * Replace the iteration with a simple nb_mem_chunks check
> v5:
> * Use rte_mempool_get_mem_range to query the mempool range
> ---
Acked-by: Morten Brørup <mb@smartsharesystems.com>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v5] net/af_xdp: parse umem map info from mempool range api
2024-06-20 7:10 ` Morten Brørup
@ 2024-07-06 3:40 ` Ferruh Yigit
0 siblings, 0 replies; 29+ messages in thread
From: Ferruh Yigit @ 2024-07-06 3:40 UTC (permalink / raw)
To: Morten Brørup, Frank Du, dev; +Cc: ciara.loftus
On 6/20/2024 8:10 AM, Morten Brørup wrote:
>> From: Frank Du [mailto:frank.du@intel.com]
>>
>> The current calculation assumes that the mbufs are contiguous. However,
>> this assumption is incorrect when the mbuf memory spans across huge pages.
>>
>> Correct this by reading the range directly with the mempool get range API.
>>
>> Fixes: d8a210774e1d ("net/af_xdp: support unaligned umem chunks")
>> Cc: stable@dpdk.org
>>
>> Signed-off-by: Frank Du <frank.du@intel.com>
>>
>> ---
>> v2:
>> * Add virtually contiguous detection for multiple memhdrs
>> v3:
>> * Use RTE_ALIGN_FLOOR to get the aligned addr
>> * Add check on the first memhdr of memory chunks
>> v4:
>> * Replace the iteration with a simple nb_mem_chunks check
>> v5:
>> * Use rte_mempool_get_mem_range to query the mempool range
>> ---
>
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
>
Acked-by: Ferruh Yigit <ferruh.yigit@amd.com>
Applied to dpdk-next-net/main, thanks.
^ permalink raw reply [flat|nested] 29+ messages in thread