patches for DPDK stable branches
From: Ferruh Yigit <ferruh.yigit@amd.com>
To: Bruce Richardson <bruce.richardson@intel.com>
Cc: dev@dpdk.org, stable@dpdk.org,
	Padraig Connolly <padraig.j.connolly@intel.com>
Subject: Re: [PATCH] ethdev: fix device init without socket-local memory
Date: Fri, 19 Jul 2024 16:31:11 +0100
Message-ID: <e22166ca-04ce-42a0-9435-14935c91d8b2@amd.com>
In-Reply-To: <ZppogrHvlJcb7t4M@bricha3-mobl1.ger.corp.intel.com>

On 7/19/2024 2:22 PM, Bruce Richardson wrote:
> On Fri, Jul 19, 2024 at 12:10:24PM +0100, Ferruh Yigit wrote:
>> On 7/19/2024 10:57 AM, Bruce Richardson wrote:
>>> On Fri, Jul 19, 2024 at 09:59:50AM +0100, Ferruh Yigit wrote:
>>>> On 7/11/2024 1:35 PM, Bruce Richardson wrote:
>>>>> When allocating memory for an ethdev, the rte_malloc_socket call that is
>>>>> used allocates memory only on the NUMA node/socket local to the device.
>>>>> This means that, even if the user wanted to, they could never use a
>>>>> remote NIC without also having memory on that NIC's socket.
>>>>>
>>>>> For example, if we change examples/skeleton/basicfwd.c to use
>>>>> SOCKET_ID_ANY as the socket_id parameter for the Rx and Tx rings, we
>>>>> should be able to run the app cross-NUMA, e.g. as below, where the two
>>>>> PCI devices are on socket 1 and core 1 is on socket 0:
>>>>>
>>>>>  ./build/examples/dpdk-skeleton -l 1 --legacy-mem --socket-mem=1024,0 \
>>>>> 		-a a8:00.0 -a b8:00.0
>>>>>
>>>>> This fails, however, with the error:
>>>>>
>>>>>   ETHDEV: failed to allocate private data
>>>>>   PCI_BUS: Requested device 0000:a8:00.0 cannot be used
>>>>>
>>>>> We can remove this restriction by adding a fallback call to the general
>>>>> rte_malloc after a failed call to rte_malloc_socket. This should be safe
>>>>> to do because the later ethdev calls to set up the Rx/Tx queues all take
>>>>> a socket_id parameter, which applications can use to enforce the
>>>>> requirement for local-only memory for a device, if so desired. [If
>>>>> device-local memory is present it will be used as before; if it is not,
>>>>> the rte_eth_dev_configure call will now pass, but the subsequent queue
>>>>> setup calls requesting local memory will fail.]
>>>>>
>>>>> Fixes: e489007a411c ("ethdev: add generic create/destroy ethdev APIs")
>>>>> Fixes: dcd5c8112bc3 ("ethdev: add PCI driver helpers")
>>>>> Cc: stable@dpdk.org
>>>>>
>>>>> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
>>>>> Signed-off-by: Padraig Connolly <padraig.j.connolly@intel.com>
>>>>>
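
For reference, the fallback being described amounts to something like the
below (a minimal sketch; the function name, the "type" string, and the
alignment are illustrative, not the actual ethdev code):

  #include <rte_common.h>
  #include <rte_malloc.h>

  /* Allocate dev_private on the device's socket; if that socket has no
   * memory, fall back to any socket rather than failing probe. */
  static void *
  alloc_dev_private(size_t priv_size, int dev_socket_id)
  {
          void *priv = rte_zmalloc_socket("ethdev private", priv_size,
                          RTE_CACHE_LINE_SIZE, dev_socket_id);
          if (priv == NULL) /* no memory on the device's socket */
                  priv = rte_zmalloc("ethdev private", priv_size,
                                  RTE_CACHE_LINE_SIZE);
          return priv;
  }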
>>>>
>>>> Hi Bruce,
>>>>
>>>> If device-local memory is present, the behavior will be the same, so I
>>>> agree this is low impact.
>>>>
>>>> But for the case where device-local memory is NOT present, the question
>>>> is whether we should enforce the HW setup. Doing so can be beneficial
>>>> for users new to DPDK.
>>>>
>>>
>>> No, we should not, because if we do there is no way for the user to allow
>>> the use of remote memory - neither the probe/init call nor even the
>>> configure call has a socket_id parameter, so the enforcement of local
>>> memory is an internal assumption on the part of the API which is not
>>> documented anywhere, and which the user cannot override.
>>>
>>>> Probably 'dev_private' on its own has a small impact on performance
>>>> (although that may depend on whether its fields are used in the
>>>> datapath), but it may be a vehicle to enforce local memory.
>>>>
>>>
>>> As I explain above in the cover letter, there are already other ways to
>>> enforce local memory - we don't need another one. If the user wants only
>>> local memory for a port, they can enforce that by setting the socket_id
>>> parameter of the Rx and Tx queues. Enforcing local memory in probe
>>> doesn't add anything to that, and just prevents other use cases.
>>>
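
To make that concrete, an application requiring device-local memory can
enforce it per port at queue setup time, e.g. as in the sketch below
(queue index and descriptor count are illustrative):

  #include <rte_ethdev.h>

  /* set up one Rx and one Tx queue, requiring memory local to the port */
  static int
  setup_queues_local(uint16_t port_id, struct rte_mempool *mbuf_pool)
  {
          /* NUMA node the device is attached to (SOCKET_ID_ANY if unknown);
           * passing SOCKET_ID_ANY below would instead allow remote memory */
          int sock = rte_eth_dev_socket_id(port_id);

          int ret = rte_eth_rx_queue_setup(port_id, 0, 1024, sock,
                          NULL, mbuf_pool);
          if (ret == 0)
                  ret = rte_eth_tx_queue_setup(port_id, 0, 1024, sock, NULL);
          return ret;
  }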
>>>> What is enabled by allowing the app to run with cross-NUMA memory,
>>>> given that in production I expect users would want device-local memory
>>>> for performance reasons anyway?
>>>>
>>>
>>> Mostly users want socket-local memory, but with increasing speeds of NICs
>>> we are already seeing cases where users want to run cross-NUMA. For
>>> example, a multi-port NIC may have some ports in use on each socket.
>>>
>>
>> Ack.
>>
>>>>
>>>> Also, I am not sure if this is a fix or a change of intentional behavior.
>>>>
>>>
>>> I suppose it can be viewed either way. However, for me this is a fix,
>>> because right now it is impossible for many users to run their applications
>>> with memory on a different socket to the ports. Nowhere is it documented in
>>> DPDK that there is a hard restriction that ports must have local memory, so
>>> any enforcement of such a policy is wrong.
>>>
>>
>> Although it seems this was done intentionally in the API, I agree that it
>> is not documented, and that the restriction is not part of the API
>> definition.
>>
>>> Turning things the other way around - I can't see how anything will break
>>> or even slow down with this patch applied. As I point out above, the user
>>> can already enforce local memory by passing the required socket id when
>>> allocating Rx and Tx rings - this patch only pushes the failure for
>>> non-local memory a bit later in the initialization sequence, to where the
>>> user can actually specify the desired NUMA behaviour. Is there some case
>>> I'm missing where you foresee this causing problems?
>>>
>>
>> A new user may not know that allocating memory cross-NUMA impacts
>> performance negatively, and so may end up with this configuration
>> unintentionally; the current failure enforces the ideal configuration.
>>
>> One option could be to add a warning log to the fallback case, saying
>> that memory was allocated from a non-local socket and performance will
>> be lower. Although this message may not mean much to a new user, it may
>> still help via a support engineer or an internet search...
>>
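
Extending the earlier allocation sketch, that warning might look something
like this (log type and wording are illustrative only):

  priv = rte_zmalloc_socket("ethdev private", priv_size,
                  RTE_CACHE_LINE_SIZE, dev_socket_id);
  if (priv == NULL) {
          RTE_LOG(WARNING, EAL,
                  "no memory on socket %d for port private data, "
                  "falling back to any socket; cross-NUMA access "
                  "may reduce performance\n", dev_socket_id);
          priv = rte_zmalloc("ethdev private", priv_size,
                          RTE_CACHE_LINE_SIZE);
  }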
> 
> Yes, we could add a warning, but that is probably better in the app itself.
> Thinking about where we get issues, they primarily stem from running the
> cores on a different NUMA node. Since the private_data struct is accessed
> by cores, not devices, any perf degradation will come from having it
> remote to the cores. Because of that, I actually think the original
> implementation should really have used "rte_socket_id()" rather than the
> device socket id for the allocation.
>

Yes, I guess private_data is not accessed by the device, but it may be
accessed by the cores running the datapath.

This API may be called by a core that is not involved in the datapath, so
it may not be correct to allocate memory from that core's NUMA node.

Would it be wrong to assume that the cores used for the datapath will be
on the same NUMA node as the device? If so, allocating memory from that
NUMA node (the one the device is in) makes more sense. Am I missing
something?


> 
> For v2, should I therefore include a warning for the case where
> rte_socket_id() != device socket_id? WDYT?
> 
> /Bruce
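
The check being proposed for v2 would presumably look something like the
below (placement at allocation time is hypothetical; log type and wording
illustrative):

  if (rte_socket_id() != (unsigned int)rte_eth_dev_socket_id(port_id))
          RTE_LOG(WARNING, EAL,
                  "port %u private data is being allocated by a core on "
                  "a remote NUMA node; performance may be reduced\n",
                  port_id);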

