From: Ferruh Yigit <ferruh.yigit@amd.com>
To: "Du, Frank" <frank.du@intel.com>, "dev@dpdk.org" <dev@dpdk.org>,
"Andrew Rybchenko" <andrew.rybchenko@oktetlabs.ru>,
"Morten Brørup" <mb@smartsharesystems.com>
Cc: "Loftus, Ciara" <ciara.loftus@intel.com>,
"Burakov, Anatoly" <anatoly.burakov@intel.com>
Subject: Re: [PATCH v2] net/af_xdp: fix umem map size for zero copy
Date: Wed, 22 May 2024 11:00:36 +0100 [thread overview]
Message-ID: <36e654fe-078e-4df3-bb2f-de3917da3e17@amd.com> (raw)
In-Reply-To: <PH0PR11MB4775D4E96A677923E26C2C6080EB2@PH0PR11MB4775.namprd11.prod.outlook.com>
On 5/22/2024 2:25 AM, Du, Frank wrote:
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>> Sent: Wednesday, May 22, 2024 1:58 AM
>> To: Du, Frank <frank.du@intel.com>; dev@dpdk.org; Andrew Rybchenko
>> <andrew.rybchenko@oktetlabs.ru>; Morten Brørup
>> <mb@smartsharesystems.com>
>> Cc: Loftus, Ciara <ciara.loftus@intel.com>; Burakov, Anatoly
>> <anatoly.burakov@intel.com>
>> Subject: Re: [PATCH v2] net/af_xdp: fix umem map size for zero copy
>>
>> On 5/11/2024 6:26 AM, Frank Du wrote:
>>> The current calculation assumes that the mbufs are contiguous.
>>> However, this assumption is incorrect when the memory spans across a
>>> huge page. Correct this by directly reading the size from the mempool
>>> memory chunks.
>>>
>>> Signed-off-by: Frank Du <frank.du@intel.com>
>>>
>>> ---
>>> v2:
>>> * Add virtual contiguity detection for multiple memhdrs.
>>> ---
>>> drivers/net/af_xdp/rte_eth_af_xdp.c | 34 ++++++++++++++++++++++++-----
>>> 1 file changed, 28 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
>>> index 268a130c49..7456108d6d 100644
>>> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
>>> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
>>> @@ -1039,16 +1039,35 @@ eth_link_update(struct rte_eth_dev *dev __rte_unused,
>>>  }
>>>
>>> #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
>>> -static inline uintptr_t get_base_addr(struct rte_mempool *mp, uint64_t *align)
>>> +static inline uintptr_t get_memhdr_info(struct rte_mempool *mp, uint64_t *align, size_t *len)
>>> {
>>> - struct rte_mempool_memhdr *memhdr;
>>> + struct rte_mempool_memhdr *memhdr, *next;
>>> uintptr_t memhdr_addr, aligned_addr;
>>> + size_t memhdr_len = 0;
>>>
>>> + /* get the mempool base addr and align */
>>> memhdr = STAILQ_FIRST(&mp->mem_list);
>>> memhdr_addr = (uintptr_t)memhdr->addr;
>>> aligned_addr = memhdr_addr & ~(getpagesize() - 1);
>>> *align = memhdr_addr - aligned_addr;
>>>
>>
>> I am aware this is not part of this patch, but as a note, can't we use
>> 'RTE_ALIGN_FLOOR' to calculate the aligned address?
>
> Sure, will use RTE_ALIGN_FLOOR in next version.
>
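As a side note, here is a minimal sketch of what I mean (illustrative only,
not driver code; 'page_floor' is just a made-up name and it assumes the page
size is a power of two):

#include <stdint.h>
#include <unistd.h>      /* getpagesize() */
#include <rte_common.h>  /* RTE_ALIGN_FLOOR */

/* Illustrative only: RTE_ALIGN_FLOOR expresses the same rounding as the
 * open-coded mask, as long as the page size is a power of two. */
static inline uintptr_t
page_floor(uintptr_t addr)
{
	uintptr_t page_sz = (uintptr_t)getpagesize();

	/* equivalent to: addr & ~(page_sz - 1) */
	return RTE_ALIGN_FLOOR(addr, page_sz);
}
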
>>
>>
>>> + memhdr_len += memhdr->len;
>>> +
>>> + /* check if virtual contiguous memory for multiple memhdrs */
>>> + next = STAILQ_NEXT(memhdr, next);
>>> + while (next != NULL) {
>>> +	if ((uintptr_t)next->addr != (uintptr_t)memhdr->addr + memhdr->len) {
>>> +		AF_XDP_LOG(ERR, "memory chunks not virtual contiguous, "
>>> +			   "next: %p, cur: %p(len: %" PRId64 " )\n",
>>> +			   next->addr, memhdr->addr, memhdr->len);
>>> +		return 0;
>>> +	}
>>>
>>
>> Isn't there a mempool flag that can tell us whether the mempool is IOVA
>> contiguous? Isn't that sufficient on its own?
>
> Indeed, what we need to ascertain is whether it is contiguous in CPU virtual space, not IOVA. I haven't come across a flag specifically for CPU virtual contiguity. The major limitation in XDP is that XSK UMEM only supports registering a single contiguous virtual memory area.
>
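(For reference, the registration this refers to boils down to something like
the sketch below; illustrative only, error handling trimmed, and the header
name depends on whether the xsk helpers come from libbpf or libxdp.)

#include <stdint.h>
#include <bpf/xsk.h>   /* or <xdp/xsk.h> with libxdp */

/* Illustrative only: xsk_umem__create() takes a single (base, size) pair,
 * so the whole mempool memory must be one mapped, VA-contiguous range. */
static int
register_umem_area(void *base, uint64_t size,
		   struct xsk_ring_prod *fq, struct xsk_ring_cons *cq,
		   const struct xsk_umem_config *cfg, struct xsk_umem **umem)
{
	/* 'base' must be page aligned; a hole between mempool chunks would
	 * leave part of [base, base + size) unmapped. */
	return xsk_umem__create(umem, base, size, fq, cq, cfg);
}
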
'RTE_MEMPOOL_F_NO_IOVA_CONTIG' is the flag I was looking for. This flag
being *cleared* implies IOVA contiguity, but I am not sure whether that
is guaranteed; it needs to be checked.
And I may be wrong here, but as far as I remember, in IOVA-as-VA mode
the process virtual address and the IOVA are the same, so IOVA
contiguity is the same as CPU virtual address contiguity.
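
If the flag can be relied on, the driver-side check could be reduced to
something like this rough, untested sketch (whether the cleared flag
guarantees pool-wide, not just per-object, contiguity still needs to be
confirmed):

#include <stdbool.h>
#include <rte_eal.h>      /* rte_eal_iova_mode() */
#include <rte_mempool.h>  /* RTE_MEMPOOL_F_NO_IOVA_CONTIG */

/* Rough, untested sketch: in IOVA-as-VA mode a mempool created without
 * RTE_MEMPOOL_F_NO_IOVA_CONTIG *may* be usable as a single VA range. */
static bool
mempool_maybe_va_contig(const struct rte_mempool *mp)
{
	if (rte_eal_iova_mode() != RTE_IOVA_VA)
		return false; /* VA and IOVA can differ, no conclusion */

	return (mp->flags & RTE_MEMPOOL_F_NO_IOVA_CONTIG) == 0;
}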
>>
>>
>>> + /* virtual contiguous */
>>> + memhdr = next;
>>> + memhdr_len += memhdr->len;
>>> + next = STAILQ_NEXT(memhdr, next);
>>> + }
>>>
>>> + *len = memhdr_len;
>>> return aligned_addr;
>>> }
>>>
>>
>> This function goes into too much detail of the mempool object, and any change
>> in mempool internals has the potential to break this code.
>>
>> @Andrew, @Morten, do you think it makes sense to have a
>> 'rte_mempool_info_get()' kind of function that provides at least the address
>> and length of the mempool, and could be used here?
>>
>> This would help hide the internal details and complexity of the mempool from
>> its users.
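(Something along these lines is what I have in mind; the struct and function
names below are made up purely for illustration, no such API exists today.)

#include <stdbool.h>
#include <stddef.h>
#include <rte_mempool.h>

/* Hypothetical helper, names invented for illustration only. */
struct rte_mempool_mem_range_info {
	void *start;        /* base VA of the mempool memory         */
	size_t length;      /* total length of the memory range      */
	bool is_contiguous; /* false if chunks are not VA contiguous */
};

/* Would fill 'info' from the mempool's memhdr list, so PMDs do not have
 * to walk mp->mem_list themselves. */
int rte_mempool_get_mem_range(const struct rte_mempool *mp,
			      struct rte_mempool_mem_range_info *info);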
>>
>>
>>>
>>> @@ -1125,6 +1144,7 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals,
>>> void *base_addr = NULL;
>>> struct rte_mempool *mb_pool = rxq->mb_pool;
>>> uint64_t umem_size, align = 0;
>>> + size_t len = 0;
>>>
>>> if (internals->shared_umem) {
>>> if (get_shared_umem(rxq, internals->if_name, &umem) < 0) @@
>>> -1156,10 +1176,12 @@ xsk_umem_info *xdp_umem_configure(struct
>> pmd_internals *internals,
>>> }
>>>
>>> umem->mb_pool = mb_pool;
>>> - base_addr = (void *)get_base_addr(mb_pool, &align);
>>> - umem_size = (uint64_t)mb_pool->populated_size *
>>> - (uint64_t)usr_config.frame_size +
>>> - align;
>>> + base_addr = (void *)get_memhdr_info(mb_pool, &align, &len);
>>>
>>
>> Is this calculation correct if the mempool is not already aligned to the page
>> size?
>>
>> For example, if the page size is '0x1000' and "memhdr_addr = 0x000a1080", the
>> returned aligned address is '0x000a1000', so "base_addr = 0x000a1000".
>>
>> Any access between '0x000a1000' & '0x000a1080' is invalid. Is this expected?
>
> Yes, since the XSK UMEM memory area requires page alignment. However, there is no need to worry; the memory pointer in each XSK TX/RX descriptor is obtained from the mbuf data area, so we never have a chance to access the invalid range [0x000a1000: 0x000a1080] here.
>
Thanks for clarification.
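
Just to restate the arithmetic with the numbers from the example above
(purely illustrative; the mbuf data address is made up):

#include <stdint.h>

static uint64_t
xsk_frame_offset_example(void)
{
	uintptr_t memhdr_addr = 0x000a1080;              /* first chunk VA     */
	uintptr_t base_addr   = memhdr_addr & ~0xfffUL;  /* 0x000a1000         */
	uintptr_t mbuf_data   = 0x000a1100;              /* some mbuf data ptr */

	/* The offset placed in an XSK descriptor is relative to base_addr and
	 * is always >= 0x80 here, so [0x000a1000, 0x000a1080) is never read. */
	return (uint64_t)(mbuf_data - base_addr);
}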