DPDK patches and discussions
From: "Du, Frank" <frank.du@intel.com>
To: "Ferruh Yigit" <ferruh.yigit@amd.com>,
	"dev@dpdk.org" <dev@dpdk.org>,
	"Andrew Rybchenko" <andrew.rybchenko@oktetlabs.ru>,
	"Morten Brørup" <mb@smartsharesystems.com>
Cc: "Loftus, Ciara" <ciara.loftus@intel.com>,
	"Burakov, Anatoly" <anatoly.burakov@intel.com>
Subject: RE: [PATCH v2] net/af_xdp: fix umem map size for zero copy
Date: Wed, 22 May 2024 01:25:15 +0000
Message-ID: <PH0PR11MB4775D4E96A677923E26C2C6080EB2@PH0PR11MB4775.namprd11.prod.outlook.com>
In-Reply-To: <0d02e8c6-0ef4-44e3-9dd2-94685b46136a@amd.com>

> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Wednesday, May 22, 2024 1:58 AM
> To: Du, Frank <frank.du@intel.com>; dev@dpdk.org; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>; Morten Brørup
> <mb@smartsharesystems.com>
> Cc: Loftus, Ciara <ciara.loftus@intel.com>; Burakov, Anatoly
> <anatoly.burakov@intel.com>
> Subject: Re: [PATCH v2] net/af_xdp: fix umem map size for zero copy
> 
> On 5/11/2024 6:26 AM, Frank Du wrote:
> > The current calculation assumes that the mbufs are contiguous.
> > However, this assumption is incorrect when the memory spans across a huge page.
> > Fix this by reading the size directly from the mempool memory chunks.
> >
> > Signed-off-by: Frank Du <frank.du@intel.com>
> >
> > ---
> > v2:
> > * Add virtual contiguity detection for multiple memhdrs.
> > ---
> >  drivers/net/af_xdp/rte_eth_af_xdp.c | 34 ++++++++++++++++++++++++-----
> >  1 file changed, 28 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > index 268a130c49..7456108d6d 100644
> > --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> > +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > @@ -1039,16 +1039,35 @@ eth_link_update(struct rte_eth_dev *dev __rte_unused,
> >  }
> >
> >  #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> > -static inline uintptr_t get_base_addr(struct rte_mempool *mp, uint64_t *align)
> > +static inline uintptr_t get_memhdr_info(struct rte_mempool *mp,
> > +uint64_t *align, size_t *len)
> >  {
> > -	struct rte_mempool_memhdr *memhdr;
> > +	struct rte_mempool_memhdr *memhdr, *next;
> >  	uintptr_t memhdr_addr, aligned_addr;
> > +	size_t memhdr_len = 0;
> >
> > +	/* get the mempool base addr and align */
> >  	memhdr = STAILQ_FIRST(&mp->mem_list);
> >  	memhdr_addr = (uintptr_t)memhdr->addr;
> >  	aligned_addr = memhdr_addr & ~(getpagesize() - 1);
> >  	*align = memhdr_addr - aligned_addr;
> >
> 
> I am aware this is not part of this patch, but as a note, can't we use
> 'RTE_ALIGN_FLOOR' to calculate the aligned address?

Sure, will use RTE_ALIGN_FLOOR in the next version.
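As a standalone illustration of that change, the sketch below computes the page-aligned base and the align offset the way the patch does, but with a local ALIGN_FLOOR macro written as a stand-in for DPDK's RTE_ALIGN_FLOOR (both assume a power-of-two alignment). The helper name page_floor_base() is hypothetical, and the page size is passed in rather than read from getpagesize().

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for DPDK's RTE_ALIGN_FLOOR(): round val down to a
 * multiple of align, which must be a power of two. */
#define ALIGN_FLOOR(val, align) ((val) & ~((uintptr_t)(align) - 1))

/* Hypothetical helper: return the page-aligned address at or below
 * chunk_addr and store the distance from it to chunk_addr in *align. */
static uintptr_t page_floor_base(uintptr_t chunk_addr, uintptr_t page_size,
				 uint64_t *align)
{
	/* The mask trick is only valid for power-of-two page sizes. */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

	uintptr_t base = ALIGN_FLOOR(chunk_addr, page_size);
	*align = chunk_addr - base;
	return base;
}
```

For example, page_floor_base(0x000a1080, 0x1000, &align) returns 0x000a1000 and sets align to 0x80.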

> 
> 
> > +	memhdr_len += memhdr->len;
> > +
> > +	/* check if virtual contiguous memory for multiple memhdrs */
> > +	next = STAILQ_NEXT(memhdr, next);
> > +	while (next != NULL) {
> > +		if ((uintptr_t)next->addr != (uintptr_t)memhdr->addr + memhdr->len) {
> > +			AF_XDP_LOG(ERR, "memory chunks not virtual contiguous, "
> > +					"next: %p, cur: %p(len: %" PRId64 " )\n",
> > +					next->addr, memhdr->addr, memhdr->len);
> > +			return 0;
> > +		}
> >
> 
> Isn't there a mempool flag that can help us figure out that the mempool is not IOVA
> contiguous? Isn't it sufficient on its own?

What we need to ascertain here is whether the memory is contiguous in CPU virtual address space, not in IOVA space, and I haven't come across a mempool flag that reports CPU virtual contiguity. The key limitation on the XDP side is that an XSK UMEM can only be registered over a single virtually contiguous memory area.
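The virtual-contiguity check described above can be sketched without DPDK types. This is a minimal sketch, assuming a hypothetical struct chunk that mirrors only the addr and len fields of struct rte_mempool_memhdr, with the chunks supplied as an array in list order instead of the mempool's STAILQ; virt_contig_len() is likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two rte_mempool_memhdr fields used here. */
struct chunk {
	void *addr;   /* virtual start address of the chunk */
	size_t len;   /* length of the chunk in bytes */
};

/* Return the total length of chunks[0..n) if every chunk begins exactly
 * where the previous one ends in virtual address space, or 0 otherwise,
 * since an XSK UMEM must be a single virtually contiguous area. */
static size_t virt_contig_len(const struct chunk *chunks, size_t n)
{
	size_t total = 0;

	assert(chunks != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (i > 0 && (uintptr_t)chunks[i].addr !=
		    (uintptr_t)chunks[i - 1].addr + chunks[i - 1].len)
			return 0; /* gap or out-of-order chunk */
		total += chunks[i].len;
	}
	return total;
}
```

It returns the summed length, which would become the UMEM length, or 0 as soon as one chunk does not start exactly where the previous one ends. IOVA addresses play no part in the decision.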

> 
> 
> > +		/* virtual contiguous */
> > +		memhdr = next;
> > +		memhdr_len += memhdr->len;
> > +		next = STAILQ_NEXT(memhdr, next);
> > +	}
> >
> > +	*len = memhdr_len;
> >  	return aligned_addr;
> >  }
> >
> 
> This function goes into too much detail about the mempool object, and any change in
> mempool internals has the potential to break this code.
> 
> @Andrew, @Morten, do you think it makes sense to have an
> 'rte_mempool_info_get()' kind of function that provides at least the address and
> length of the mempool, to be used here?
> 
> This helps to hide internal details and complexity of the mempool for users.
> 
> 
> >
> > @@ -1125,6 +1144,7 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals,
> >  	void *base_addr = NULL;
> >  	struct rte_mempool *mb_pool = rxq->mb_pool;
> >  	uint64_t umem_size, align = 0;
> > +	size_t len = 0;
> >
> >  	if (internals->shared_umem) {
> >  		if (get_shared_umem(rxq, internals->if_name, &umem) < 0)
> > @@ -1156,10 +1176,12 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals,
> >  		}
> >
> >  		umem->mb_pool = mb_pool;
> > -		base_addr = (void *)get_base_addr(mb_pool, &align);
> > -		umem_size = (uint64_t)mb_pool->populated_size *
> > -				(uint64_t)usr_config.frame_size +
> > -				align;
> > +		base_addr = (void *)get_memhdr_info(mb_pool, &align, &len);
> >
> 
> Is this calculation correct if the mempool is not already aligned to the page size?
> 
> For example, if the page size is '0x1000' and "memhdr_addr = 0x000a1080",
> the returned aligned address is '0x000a1000', so "base_addr = 0x000a1000".
> 
> Any access between '0x000a1000' and '0x000a1080' is invalid. Is this expected?

Yes, this is expected, since the XSK UMEM memory area must be page aligned. There is no need to worry, though: the memory pointer in each XSK TX/RX descriptor is obtained from the mbuf data area, so the invalid range [0x000a1000, 0x000a1080) is never accessed here.
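A simplified model of why the unused prefix is harmless, assuming (as a simplification, not the PMD's exact descriptor encoding) that each descriptor address is an offset from the registered, page-aligned UMEM base and is derived from a pointer into the mempool memory. The helper in_mempool_part() is hypothetical; umem_size is computed as len + align, as in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper modelling the UMEM registered by the patch:
 *   umem_base  page-aligned start passed to xsk_umem__create()
 *   align      bytes between umem_base and the first mempool chunk
 *   len        total length of the (virtually contiguous) chunks
 * Returns true if [addr, addr + size) lies in the mempool part of the
 * UMEM, i.e. inside the registered area but past the unused prefix
 * [umem_base, umem_base + align). */
static bool in_mempool_part(uintptr_t umem_base, uint64_t align, uint64_t len,
			    uintptr_t addr, uint64_t size)
{
	uint64_t umem_size = len + align; /* same sum as the patch */

	assert(umem_size >= len); /* no 64-bit wraparound */
	if (addr < umem_base + align)
		return false; /* unused prefix, or below the UMEM entirely */

	uint64_t off = (uint64_t)(addr - umem_base); /* offset from UMEM base */
	return off <= umem_size && size <= umem_size - off;
}
```

With base 0x000a1000, align 0x80 and len 0x3000, any buffer inside [0x000a1080, 0x000a4080) passes, while an address such as 0x000a1040 in the unused prefix is rejected. Because mbuf data pointers always lie inside the mempool chunk, they never land in that prefix.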

> 
> 
> > +		if (!base_addr) {
> > +			AF_XDP_LOG(ERR, "Failed to parse memhdr info from pool\n");
> >
> 
> The log message is not accurate: it is not that parsing the memhdr info failed, but
> that the mempool did not satisfy the expectation.

Thanks, will correct it in the next version.

> 
> > +			goto err;
> > +		}
> > +		umem_size = (uint64_t)len + align;
> >
> >  		ret = xsk_umem__create(&umem->umem, base_addr, umem_size,
> >  				&rxq->fq, &rxq->cq, &usr_config);
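The commit message's point, that a back-to-back estimate undercounts once memory spans pages, can be checked with simple arithmetic. This is a minimal sketch under a simplified model that is an assumption here, not DPDK's exact populate logic: an object is never split across a page boundary, so any page tail smaller than one object is skipped. The helpers packed_span() and unsplit_span() and the 2304-byte object size are hypothetical; packed_span() plays the role of the removed populated_size * frame_size term.

```c
#include <assert.h>
#include <stdint.h>

/* Span assumed by a back-to-back estimate (obj_size * n). */
static uint64_t packed_span(uint64_t obj_size, uint64_t n)
{
	return obj_size * n;
}

/* Simplified model (an assumption, not DPDK's exact populate logic): an
 * object never straddles a page boundary, so when the rest of a page is
 * smaller than obj_size the next object starts on the following page.
 * Returns the offset just past the last of n objects. */
static uint64_t unsplit_span(uint64_t page_size, uint64_t obj_size, uint64_t n)
{
	assert(obj_size > 0 && obj_size <= page_size && n > 0);

	uint64_t per_page = page_size / obj_size; /* whole objects per page */
	uint64_t last = n - 1;                    /* index of the last object */

	return (last / per_page) * page_size + (last % per_page + 1) * obj_size;
}
```

With 2 MiB pages and 2304-byte objects, 910 objects fit in one page with 512 bytes left over, so 1000 objects end at 2,304,512 bytes while the packed estimate gives 2,304,000. Sizing the UMEM from the chunk lengths (umem_size = len + align) avoids depending on that estimate.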



Thread overview: 28+ messages
2024-04-26  0:51 [PATCH] " Frank Du
2024-04-26 10:43 ` Loftus, Ciara
2024-04-28  0:46   ` Du, Frank
2024-04-30  9:22     ` Loftus, Ciara
2024-05-11  5:26 ` [PATCH v2] " Frank Du
2024-05-17 13:19   ` Loftus, Ciara
2024-05-20  1:28     ` Du, Frank
2024-05-21 15:43   ` Ferruh Yigit
2024-05-21 17:57   ` Ferruh Yigit
2024-05-22  1:25     ` Du, Frank [this message]
2024-05-22  7:26       ` Morten Brørup
2024-05-22 10:20         ` Ferruh Yigit
2024-05-23  6:56         ` Du, Frank
2024-05-23  7:40           ` Morten Brørup
2024-05-23  7:56             ` Du, Frank
2024-05-29 12:57               ` Loftus, Ciara
2024-05-29 14:16                 ` Morten Brørup
2024-05-22 10:00       ` Ferruh Yigit
2024-05-22 11:03         ` Morten Brørup
2024-05-22 14:05           ` Ferruh Yigit
2024-05-23  6:53 ` [PATCH v3] " Frank Du
2024-05-23  8:07 ` [PATCH v4] " Frank Du
2024-05-23  9:22   ` Morten Brørup
2024-05-23 13:31     ` Ferruh Yigit
2024-05-24  1:05       ` Du, Frank
2024-05-24  5:30         ` Morten Brørup
2024-06-20  3:25 ` [PATCH v5] net/af_xdp: parse umem map info from mempool range api Frank Du
2024-06-20  7:10   ` Morten Brørup
