DPDK patches and discussions
 help / color / mirror / Atom feed
From: "Du, Frank" <frank.du@intel.com>
To: "Morten Brørup" <mb@smartsharesystems.com>,
	"Ferruh Yigit" <ferruh.yigit@amd.com>,
	"dev@dpdk.org" <dev@dpdk.org>,
	"Andrew Rybchenko" <andrew.rybchenko@oktetlabs.ru>,
	"Burakov, Anatoly" <anatoly.burakov@intel.com>
Cc: "Loftus, Ciara" <ciara.loftus@intel.com>
Subject: RE: [PATCH v2] net/af_xdp: fix umem map size for zero copy
Date: Thu, 23 May 2024 07:56:06 +0000	[thread overview]
Message-ID: <PH0PR11MB4775FE8DA69364E04209E4DF80F42@PH0PR11MB4775.namprd11.prod.outlook.com> (raw)
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35E9F491@smartserver.smartshare.dk>

> -----Original Message-----
> From: Morten Brørup <mb@smartsharesystems.com>
> Sent: Thursday, May 23, 2024 3:41 PM
> To: Du, Frank <frank.du@intel.com>; Ferruh Yigit <ferruh.yigit@amd.com>;
> dev@dpdk.org; Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; Burakov,
> Anatoly <anatoly.burakov@intel.com>
> Cc: Loftus, Ciara <ciara.loftus@intel.com>
> Subject: RE: [PATCH v2] net/af_xdp: fix umem map size for zero copy
> 
> > From: Du, Frank [mailto:frank.du@intel.com]
> > Sent: Thursday, 23 May 2024 08.56
> >
> > > From: Morten Brørup <mb@smartsharesystems.com>
> > > Sent: Wednesday, May 22, 2024 3:27 PM
> > >
> > > > From: Du, Frank [mailto:frank.du@intel.com]
> > > > Sent: Wednesday, 22 May 2024 03.25
> > > >
> > > > > From: Ferruh Yigit <ferruh.yigit@amd.com>
> > > > > Sent: Wednesday, May 22, 2024 1:58 AM
> > > > >
> > > > > On 5/11/2024 6:26 AM, Frank Du wrote:
> > > > > > The current calculation assumes that the mbufs are contiguous.
> > > > > > However, this assumption is incorrect when the memory spans
> > > > > > across a huge
> > > > > page.
> 
> What does "the memory spans across a huge page" mean?
> 
> Should it be "the memory spans across multiple memory chunks"?

This does not pertain to multiple memory chunks but rather to mbuf memory. The scenario involves a single memory chunk utilizing multiple 2M pages. To ensure that each mbuf resides exclusively within a single page, there are deliberate spacing gaps when allocating mbufs across the 2M page boundaries.

> 
> > > > > > Correct to directly read the size from the mempool memory chunks.
> > > > > >
> > > > > > Signed-off-by: Frank Du <frank.du@intel.com>
> > > > > >
> > > > > > ---
> > > > > > v2:
> > > > > > * Add virtual contiguous detect for for multiple memhdrs.
> > > > > > ---
> > > > > >  drivers/net/af_xdp/rte_eth_af_xdp.c | 34
> > > > > > ++++++++++++++++++++++++-----
> > > > > >  1 file changed, 28 insertions(+), 6 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > > > > b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > > > > index 268a130c49..7456108d6d 100644
> > > > > > --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > > > > +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > > > > > @@ -1039,16 +1039,35 @@ eth_link_update(struct rte_eth_dev
> > > > > > *dev __rte_unused,  }
> > > > > >
> > > > > >  #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> > > > > > -static inline uintptr_t get_base_addr(struct rte_mempool *mp,
> > > > > > uint64_t *align)
> > > > > > +static inline uintptr_t get_memhdr_info(struct rte_mempool
> > > > > > +*mp, uint64_t *align, size_t *len)
> > > > > >  {
> > > > > > -	struct rte_mempool_memhdr *memhdr;
> > > > > > +	struct rte_mempool_memhdr *memhdr, *next;
> > > > > >  	uintptr_t memhdr_addr, aligned_addr;
> > > > > > +	size_t memhdr_len = 0;
> > > > > >
> > > > > > +	/* get the mempool base addr and align */
> > > > > >  	memhdr = STAILQ_FIRST(&mp->mem_list);
> > > > > >  	memhdr_addr = (uintptr_t)memhdr->addr;
> > >
> > > This is not a new bug; but if the mempool is not populated, memhdr
> > > is NULL
> > here.
> >
> > Thanks, will add a check later.
> >
> > >
> > > > > >  	aligned_addr = memhdr_addr & ~(getpagesize() - 1);
> > > > > >  	*align = memhdr_addr - aligned_addr;
> > > > > >
> > > > >
> > > > > I am aware this is not part of this patch, but as note, can't we
> > > > > use 'RTE_ALIGN_FLOOR' to calculate aligned address.
> > > >
> > > > Sure, will use RTE_ALIGN_FLOOR in next version.
> > > >
> > > > >
> > > > >
> > > > > > +	memhdr_len += memhdr->len;
> > > > > > +
> > > > > > +	/* check if virtual contiguous memory for multiple memhdrs */
> > > > > > +	next = STAILQ_NEXT(memhdr, next);
> > > > > > +	while (next != NULL) {
> > > > > > +		if ((uintptr_t)next->addr != (uintptr_t)memhdr->addr +
> > memhdr-
> > > > > >len) {
> > > > > > +			AF_XDP_LOG(ERR, "memory chunks not virtual
> > > > > contiguous, "
> > > > > > +					"next: %p, cur: %p(len: %"
> PRId64
> > > > > " )\n",
> > > > > > +					next->addr, memhdr->addr,
> memhdr-
> > > > > >len);
> > > > > > +			return 0;
> > > > > > +		}
> > > > > >
> > > > >
> > > > > Isn't there a mempool flag that can help us figure out mempool
> > > > > is not IOVA contiguous? Isn't it sufficient on its own?
> > > >
> > > > Indeed, what we need to ascertain is whether it's contiguous in
> > > > CPU virtual space, not IOVA. I haven't come across a flag
> > > > specifically for CPU virtual contiguity. The major limitation in
> > > > XDP is XSK UMEM only supports registering a single contiguous virtual
> memory area.
> > >
> > > I would assume that the EAL memory manager merges free memory into
> > > contiguous chunks whenever possible.
> > > @Anatoly, please confirm?
> > >
> > > If my assumption is correct, it means that if mp->nb_mem_chunks !=
> > > 1, then
> > the
> > > mempool is not virtual contiguous. And if mp->nb_mem_chunks == 1,
> > > then it
> > is;
> > > there is no need to iterate through the memhdr list.
> >
> > If this's true now, however, this assumption may not hold true in the
> > future code change, iterating through the list may is a safer way as
> > it carefully checks the virtual address without relying on any condition.
> 
> If there is exactly one memory chunk, it is virtual contiguous. It has one address
> and one length, so it must be.
> 
> If there are more than one memory chunk, I consider it unlikely that they are
> contiguous.
> Have you ever observed the opposite, i.e. a mempool with multiple memory
> chunks being virtual contiguous?
> 
> Iterating through the list does not seem safer to me, quite the opposite.
> Which future change are you trying to prepare for?
> 
> Keeping it simple is more likely to not break with future changes.

No, I haven't encountered a mempool with multiple memory chunks actually, not know how to construct such mempool. The initial approach was to return an error if multiple chunks were detected, and the iteration method was introduced later. I can revert to the original, simpler way.


  reply	other threads:[~2024-05-23  7:56 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-26  0:51 [PATCH] " Frank Du
2024-04-26 10:43 ` Loftus, Ciara
2024-04-28  0:46   ` Du, Frank
2024-04-30  9:22     ` Loftus, Ciara
2024-05-11  5:26 ` [PATCH v2] " Frank Du
2024-05-17 13:19   ` Loftus, Ciara
2024-05-20  1:28     ` Du, Frank
2024-05-21 15:43   ` Ferruh Yigit
2024-05-21 17:57   ` Ferruh Yigit
2024-05-22  1:25     ` Du, Frank
2024-05-22  7:26       ` Morten Brørup
2024-05-22 10:20         ` Ferruh Yigit
2024-05-23  6:56         ` Du, Frank
2024-05-23  7:40           ` Morten Brørup
2024-05-23  7:56             ` Du, Frank [this message]
2024-05-29 12:57               ` Loftus, Ciara
2024-05-29 14:16                 ` Morten Brørup
2024-05-22 10:00       ` Ferruh Yigit
2024-05-22 11:03         ` Morten Brørup
2024-05-22 14:05           ` Ferruh Yigit
2024-05-23  6:53 ` [PATCH v3] " Frank Du
2024-05-23  8:07 ` [PATCH v4] " Frank Du
2024-05-23  9:22   ` Morten Brørup
2024-05-23 13:31     ` Ferruh Yigit
2024-05-24  1:05       ` Du, Frank
2024-05-24  5:30         ` Morten Brørup
2024-06-20  3:25 ` [PATCH v5] net/af_xdp: parse umem map info from mempool range api Frank Du
2024-06-20  7:10   ` Morten Brørup

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=PH0PR11MB4775FE8DA69364E04209E4DF80F42@PH0PR11MB4775.namprd11.prod.outlook.com \
    --to=frank.du@intel.com \
    --cc=anatoly.burakov@intel.com \
    --cc=andrew.rybchenko@oktetlabs.ru \
    --cc=ciara.loftus@intel.com \
    --cc=dev@dpdk.org \
    --cc=ferruh.yigit@amd.com \
    --cc=mb@smartsharesystems.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).