From: Ferruh Yigit <ferruh.yigit@amd.com>
To: Bruce Richardson <bruce.richardson@intel.com>
Cc: dev@dpdk.org, stable@dpdk.org,
Padraig Connolly <padraig.j.connolly@intel.com>
Subject: Re: [PATCH] ethdev: fix device init without socket-local memory
Date: Fri, 19 Jul 2024 12:10:24 +0100 [thread overview]
Message-ID: <57b08bdb-5f45-4343-a8d2-598d66d82fe8@amd.com> (raw)
In-Reply-To: <Zpo4itRTrrQncmPu@bricha3-mobl1.ger.corp.intel.com>
On 7/19/2024 10:57 AM, Bruce Richardson wrote:
> On Fri, Jul 19, 2024 at 09:59:50AM +0100, Ferruh Yigit wrote:
>> On 7/11/2024 1:35 PM, Bruce Richardson wrote:
>>> When allocating memory for an ethdev, the rte_malloc_socket call used
>>> only allocates memory on the NUMA node/socket local to the device. This
>>> means that even if the user wanted to, they could never use a remote NIC
>>> without also having memory on that NIC's socket.
>>>
>>> For example, if we change examples/skeleton/basicfwd.c to have
>>> SOCKET_ID_ANY as the socket_id parameter for Rx and Tx rings, we should
>>> be able to run the app cross-numa e.g. as below, where the two PCI
>>> devices are on socket 1, and core 1 is on socket 0:
>>>
>>> ./build/examples/dpdk-skeleton -l 1 --legacy-mem --socket-mem=1024,0 \
>>> -a a8:00.0 -a b8:00.0
>>>
>>> This fails however, with the error:
>>>
>>> ETHDEV: failed to allocate private data
>>> PCI_BUS: Requested device 0000:a8:00.0 cannot be used
>>>
>>> We can remove this restriction by doing a fallback call to general
>>> rte_malloc after a call to rte_malloc_socket fails. This should be safe
>>> to do because the later ethdev calls to setup Rx/Tx queues all take a
>>> socket_id parameter, which can be used by applications to enforce the
>>> requirement for local-only memory for a device, if so desired. [If
>>> device-local memory is present it will be used as before, while if not
>>> present the rte_eth_dev_configure call will now pass, but the subsequent
>>> queue setup calls requesting local memory will fail].
>>>
>>> Fixes: e489007a411c ("ethdev: add generic create/destroy ethdev APIs")
>>> Fixes: dcd5c8112bc3 ("ethdev: add PCI driver helpers")
>>> Cc: stable@dpdk.org
>>>
>>> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
>>> Signed-off-by: Padraig Connolly <padraig.j.connolly@intel.com>
>>>
>>
>> Hi Bruce,
>>
>> If device-local memory is present, the behavior will be the same, so I
>> agree this is low impact.
>>
>> But for the case where device-local memory is NOT present, the question
>> is whether we should enforce the HW setup. This can be beneficial for
>> users new to DPDK.
>>
>
> No we should not do so, because if we do, there is no way for the user to
> allow using remote memory - the probe/init and even configure call has NO
> socket_id parameter in it, so the enforcement of local memory is an
> internal assumption on the part of the API which is not documented
> anywhere, and is not possible for the user to override.
>
>> Probably 'dev_private' on its own has a small impact on performance
>> (although it may depend on whether these fields are used in the
>> datapath), but it may be a vehicle to enforce local memory.
>>
>
> As I explain above in the cover letter, there are already other ways to
> enforce local memory - we don't need another one. If the user only wants to
> use local memory for a port, they can do so by setting the socket_id
> parameter of the rx and tx queues. Enforcing local memory in probe
> doesn't add anything to that, and just prevents other use-cases.
>
>> What is enabled by enabling app to run on cross-numa memory, since on a
>> production I expect users would like to use device-local memory for
>> performance reasons anyway?
>>
>
> Mostly users want socket-local memory, but with increasing speeds of NICs
> we are already seeing cases where users want to run cross-NUMA. For
> example, a multi-port NIC may have some ports in use on each socket.
>
Ack.
>>
>> Also I am not sure if this is a fix or a change of an intentional behavior.
>>
>
> I suppose it can be viewed either way. However, for me this is a fix,
> because right now it is impossible for many users to run their applications
> with memory on a different socket to the ports. Nowhere is it documented in
> DPDK that there is a hard restriction that ports must have local memory, so
> any enforcement of such a policy is wrong.
>
Although it seems this was done intentionally in the API, I agree that
the restriction is not documented anywhere and is not part of the API
definition.
> Turning things the other way around - I can't see how anything will break
> or even slow down with this patch applied. As I point out above, the user
> can already enforce local memory by passing the required socket id when
> allocating rx and tx rings - this patch only pushes the failure for
> non-local memory a bit later in the initialization sequence, to where the
> user can actually specify the desired NUMA behaviour. Is there some
> case I'm missing where you foresee this causing problems?
>
A new user may not know that allocating memory cross-NUMA impacts
performance negatively, and could end up with this configuration
unintentionally; the current failure enforces the ideal configuration.
One option could be adding a warning log to the fallback case, saying
that memory was allocated from a non-local socket and performance will
be reduced.
Although this message may not mean much to a new user, it may still help
via a support engineer or an internet search...