From: Ferruh Yigit <ferruh.yigit@amd.com>
To: "Morten Brørup" <mb@smartsharesystems.com>,
"Du, Frank" <frank.du@intel.com>,
dev@dpdk.org, "Andrew Rybchenko" <andrew.rybchenko@oktetlabs.ru>,
"Burakov, Anatoly" <anatoly.burakov@intel.com>
Cc: "Loftus, Ciara" <ciara.loftus@intel.com>
Subject: Re: [PATCH v2] net/af_xdp: fix umem map size for zero copy
Date: Wed, 22 May 2024 11:20:03 +0100 [thread overview]
Message-ID: <d6f00c1b-20e4-491e-894d-33edb0f2dc6d@amd.com> (raw)
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35E9F487@smartserver.smartshare.dk>
On 5/22/2024 8:26 AM, Morten Brørup wrote:
>> From: Du, Frank [mailto:frank.du@intel.com]
>> Sent: Wednesday, 22 May 2024 03.25
>>
>>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>>> Sent: Wednesday, May 22, 2024 1:58 AM
>>>
>>> On 5/11/2024 6:26 AM, Frank Du wrote:
>>>> The current calculation assumes that the mbufs are contiguous.
>>>> However, this assumption is incorrect when the memory spans across a huge
>>> page.
>>>> Correct to directly read the size from the mempool memory chunks.
>>>>
>>>> Signed-off-by: Frank Du <frank.du@intel.com>
>>>>
>>>> ---
>>>> v2:
>>>> * Add virtual contiguous detect for for multiple memhdrs.
>>>> ---
>>>> drivers/net/af_xdp/rte_eth_af_xdp.c | 34
>>>> ++++++++++++++++++++++++-----
>>>> 1 file changed, 28 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c
>>>> b/drivers/net/af_xdp/rte_eth_af_xdp.c
>>>> index 268a130c49..7456108d6d 100644
>>>> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
>>>> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
>>>> @@ -1039,16 +1039,35 @@ eth_link_update(struct rte_eth_dev *dev
>>>> __rte_unused, }
>>>>
>>>> #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
>>>> -static inline uintptr_t get_base_addr(struct rte_mempool *mp,
>>>> uint64_t *align)
>>>> +static inline uintptr_t get_memhdr_info(struct rte_mempool *mp,
>>>> +uint64_t *align, size_t *len)
>>>> {
>>>> - struct rte_mempool_memhdr *memhdr;
>>>> + struct rte_mempool_memhdr *memhdr, *next;
>>>> uintptr_t memhdr_addr, aligned_addr;
>>>> + size_t memhdr_len = 0;
>>>>
>>>> + /* get the mempool base addr and align */
>>>> memhdr = STAILQ_FIRST(&mp->mem_list);
>>>> memhdr_addr = (uintptr_t)memhdr->addr;
>
> This is not a new bug; but if the mempool is not populated, memhdr is NULL here.
>
>>>> aligned_addr = memhdr_addr & ~(getpagesize() - 1);
>>>> *align = memhdr_addr - aligned_addr;
>>>>
>>>
>>> I am aware this is not part of this patch, but as note, can't we use
>>> 'RTE_ALIGN_FLOOR' to calculate aligned address.
>>
>> Sure, will use RTE_ALIGN_FLOOR in next version.
>>
>>>
>>>
>>>> + memhdr_len += memhdr->len;
>>>> +
>>>> + /* check if virtual contiguous memory for multiple memhdrs */
>>>> + next = STAILQ_NEXT(memhdr, next);
>>>> + while (next != NULL) {
>>>> + if ((uintptr_t)next->addr != (uintptr_t)memhdr->addr + memhdr-
>>>> len) {
>>>> + AF_XDP_LOG(ERR, "memory chunks not virtual
>>> contiguous, "
>>>> + "next: %p, cur: %p(len: %" PRId64
>>> " )\n",
>>>> + next->addr, memhdr->addr, memhdr-
>>>> len);
>>>> + return 0;
>>>> + }
>>>>
>>>
>>> Isn't there a mempool flag that can help us figure out mempool is not IOVA
>>> contiguous? Isn't it sufficient on its own?
>>
>> Indeed, what we need to ascertain is whether it's contiguous in CPU virtual
>> space, not IOVA. I haven't come across a flag specifically for CPU virtual
>> contiguity. The major limitation in XDP is XSK UMEM only supports registering
>> a single contiguous virtual memory area.
>
> I would assume that the EAL memory manager merges free memory into contiguous chunks whenever possible.
> @Anatoly, please confirm?
>
> If my assumption is correct, it means that if mp->nb_mem_chunks != 1, then the mempool is not virtual contiguous. And if mp->nb_mem_chunks == 1, then it is; there is no need to iterate through the memhdr list.
>
>>
>>>
>>>
>>>> + /* virtual contiguous */
>>>> + memhdr = next;
>>>> + memhdr_len += memhdr->len;
>>>> + next = STAILQ_NEXT(memhdr, next);
>>>> + }
>>>>
>>>> + *len = memhdr_len;
>>>> return aligned_addr;
>>>> }
>>>>
>>>
>>> This function goes too much details of the mempool object, and any change in
>>> mempool details has potential to break this code.
>>>
>>> @Andrew, @Morten, do you think does it make sense to have
>>> 'rte_mempool_info_get()' kind of function, that provides at least address
>> and
>>> length of the mempool, and used here?
>>>
>>> This helps to hide internal details and complexity of the mempool for users.
>
> I think all the relevant information is available as (public) fields directly in the rte_mempool.
> I agree about hiding internal details. For discriminating between private and public information, I would prefer marking the "private" fields in the rte_mempool structure as such.
>
Marking fields 'private' may break some user code, so I don't know about
this, although I agree better thing to do programmatically.
But even required fields are public, having an _info() API may help
abstracting the implementation details and kept it withing the library
(instead of leaking to drivers like this).
But not sure if there are other potential users of such an API.
> Optimally we need an rte_mempool_create() flag, specifying that the mempool objects must be allocated as one chunk of memory with contiguous virtual addresses when populating the mempool. As discussed in another thread [1], the proposed pointer compression library would also benefit from such a mempool flag.
>
Ack, this also can be useful for this usecase.
> [1] https://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35E9F45C@smartserver.smartshare.dk/
>
>>>
>>>
>>>>
>>>> @@ -1125,6 +1144,7 @@ xsk_umem_info *xdp_umem_configure(struct
>>> pmd_internals *internals,
>>>> void *base_addr = NULL;
>>>> struct rte_mempool *mb_pool = rxq->mb_pool;
>>>> uint64_t umem_size, align = 0;
>>>> + size_t len = 0;
>>>>
>>>> if (internals->shared_umem) {
>>>> if (get_shared_umem(rxq, internals->if_name, &umem) < 0) @@
>>>> -1156,10 +1176,12 @@ xsk_umem_info *xdp_umem_configure(struct
>>> pmd_internals *internals,
>>>> }
>>>>
>>>> umem->mb_pool = mb_pool;
>>>> - base_addr = (void *)get_base_addr(mb_pool, &align);
>>>> - umem_size = (uint64_t)mb_pool->populated_size *
>>>> - (uint64_t)usr_config.frame_size +
>>>> - align;
>>>> + base_addr = (void *)get_memhdr_info(mb_pool, &align, &len);
>>>>
>>>
>>> Is this calculation correct if mempool is not already aligned to page size?
>
> Please note: The mempool uses one memzone for the mempool structure itself. The objects in the mempool are stored in another memzone (or multiple other memzones, if necessary). I think you are talking about the alignment of the mempool object chunk, not of the mempool structure itself.
>
Yes, about the mempool object chunk.
next prev parent reply other threads:[~2024-05-22 10:20 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-26 0:51 [PATCH] " Frank Du
2024-04-26 10:43 ` Loftus, Ciara
2024-04-28 0:46 ` Du, Frank
2024-04-30 9:22 ` Loftus, Ciara
2024-05-11 5:26 ` [PATCH v2] " Frank Du
2024-05-17 13:19 ` Loftus, Ciara
2024-05-20 1:28 ` Du, Frank
2024-05-21 15:43 ` Ferruh Yigit
2024-05-21 17:57 ` Ferruh Yigit
2024-05-22 1:25 ` Du, Frank
2024-05-22 7:26 ` Morten Brørup
2024-05-22 10:20 ` Ferruh Yigit [this message]
2024-05-23 6:56 ` Du, Frank
2024-05-23 7:40 ` Morten Brørup
2024-05-23 7:56 ` Du, Frank
2024-05-29 12:57 ` Loftus, Ciara
2024-05-29 14:16 ` Morten Brørup
2024-05-22 10:00 ` Ferruh Yigit
2024-05-22 11:03 ` Morten Brørup
2024-05-22 14:05 ` Ferruh Yigit
2024-05-23 6:53 ` [PATCH v3] " Frank Du
2024-05-23 8:07 ` [PATCH v4] " Frank Du
2024-05-23 9:22 ` Morten Brørup
2024-05-23 13:31 ` Ferruh Yigit
2024-05-24 1:05 ` Du, Frank
2024-05-24 5:30 ` Morten Brørup
2024-06-20 3:25 ` [PATCH v5] net/af_xdp: parse umem map info from mempool range api Frank Du
2024-06-20 7:10 ` Morten Brørup
2024-07-06 3:40 ` Ferruh Yigit
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=d6f00c1b-20e4-491e-894d-33edb0f2dc6d@amd.com \
--to=ferruh.yigit@amd.com \
--cc=anatoly.burakov@intel.com \
--cc=andrew.rybchenko@oktetlabs.ru \
--cc=ciara.loftus@intel.com \
--cc=dev@dpdk.org \
--cc=frank.du@intel.com \
--cc=mb@smartsharesystems.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).