From: Bruce Richardson <bruce.richardson@intel.com>
To: Ferruh Yigit <ferruh.yigit@amd.com>
Cc: <dev@dpdk.org>, <stable@dpdk.org>,
Padraig Connolly <padraig.j.connolly@intel.com>
Subject: Re: [PATCH] ethdev: fix device init without socket-local memory
Date: Fri, 19 Jul 2024 14:22:10 +0100 [thread overview]
Message-ID: <ZppogrHvlJcb7t4M@bricha3-mobl1.ger.corp.intel.com> (raw)
In-Reply-To: <57b08bdb-5f45-4343-a8d2-598d66d82fe8@amd.com>
On Fri, Jul 19, 2024 at 12:10:24PM +0100, Ferruh Yigit wrote:
> On 7/19/2024 10:57 AM, Bruce Richardson wrote:
> > On Fri, Jul 19, 2024 at 09:59:50AM +0100, Ferruh Yigit wrote:
> >> On 7/11/2024 1:35 PM, Bruce Richardson wrote:
> >>> When allocating memory for an ethdev, the rte_malloc_socket call used
> >>> only allocates memory on the NUMA node/socket local to the device. This
> >>> means that even if the user wanted to, they could never use a remote NIC
> >>> without also having memory on that NIC's socket.
> >>>
> >>> For example, if we change examples/skeleton/basicfwd.c to have
> >>> SOCKET_ID_ANY as the socket_id parameter for Rx and Tx rings, we should
> >>> be able to run the app cross-numa e.g. as below, where the two PCI
> >>> devices are on socket 1, and core 1 is on socket 0:
> >>>
> >>> ./build/examples/dpdk-skeleton -l 1 --legacy-mem --socket-mem=1024,0 \
> >>> -a a8:00.0 -a b8:00.0
> >>>
> >>> This fails however, with the error:
> >>>
> >>> ETHDEV: failed to allocate private data
> >>> PCI_BUS: Requested device 0000:a8:00.0 cannot be used
> >>>
> >>> We can remove this restriction by doing a fallback call to general
> >>> rte_malloc after a call to rte_malloc_socket fails. This should be safe
> >>> to do because the later ethdev calls to setup Rx/Tx queues all take a
> >>> socket_id parameter, which can be used by applications to enforce the
> >>> requirement for local-only memory for a device, if so desired. [If
> >>> device-local memory is present it will be used as before, while if not
> >>> present the rte_eth_dev_configure call will now pass, but the subsequent
> >>> queue setup calls requesting local memory will fail].
> >>>
> >>> Fixes: e489007a411c ("ethdev: add generic create/destroy ethdev APIs")
> >>> Fixes: dcd5c8112bc3 ("ethdev: add PCI driver helpers")
> >>> Cc: stable@dpdk.org
> >>>
> >>> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> >>> Signed-off-by: Padraig Connolly <padraig.j.connolly@intel.com>
> >>>
> >>
> >> Hi Bruce,
> >>
> >> If device-local memory is present, the behavior will be the same, so I
> >> agree this is low impact.
> >>
> >> But for the case where device-local memory is NOT present, the question
> >> is whether we should enforce the HW setup. This can be beneficial for
> >> users new to DPDK.
> >>
> >
> > No, we should not do so, because if we do, there is no way for the user
> > to allow using remote memory - the probe/init and even the configure
> > calls have NO socket_id parameter in them, so the enforcement of local
> > memory is an internal assumption on the part of the API which is not
> > documented anywhere, and is not possible for the user to override.
> >
> >> Probably 'dev_private' on its own has a small impact on performance
> >> (although it may depend on whether these fields are used in the
> >> datapath), but it may be a vehicle to enforce local memory.
> >>
> >
> > As I explain above in the cover letter, there are already other ways to
> > enforce local memory - we don't need another one. If the user only wants to
> > use local memory for a port, they can do so by setting the socket_id
> > parameter of the rx and tx queues. Enforcing local memory in probe
> > doesn't add anything to that, and just prevents other use-cases.
> >
> >> What is enabled by allowing the app to run with cross-NUMA memory,
> >> since in production I expect users would want to use device-local
> >> memory for performance reasons anyway?
> >>
> >
> > Mostly users want socket-local memory, but with increasing speeds of NICs
> > we are already seeing cases where users want to run cross-NUMA. For
> > example, a multi-port NIC may have some ports in use on each socket.
> >
>
> Ack.
>
> >>
> >> Also I am not sure if this is a fix, or a change of intentional behavior.
> >>
> >
> > I suppose it can be viewed either way. However, for me this is a fix,
> > because right now it is impossible for many users to run their applications
> > with memory on a different socket to the ports. Nowhere is it documented in
> > DPDK that there is a hard restriction that ports must have local memory, so
> > any enforcement of such a policy is wrong.
> >
>
> Although it seems this is done intentionally in the API, I agree that
> this is not documented, nor is the restriction part of the API
> definition.
>
> > Turning things the other way around - I can't see how anything will
> > break or even slow down with this patch applied. As I point out above,
> > the user can already enforce local memory by passing the required socket
> > id when allocating rx and tx rings - this patch only pushes the failure
> > for non-local memory a bit later in the initialization sequence, to
> > where the user can actually specify the desired NUMA behaviour. Is there
> > some case I'm missing where you foresee this causing problems?
> >
>
> A new user may not know that allocating memory cross-NUMA impacts
> performance negatively, and may end up with this configuration
> unintentionally; the current failure enforces the ideal configuration.
>
> One option can be adding a warning log to the fallback case, saying that
> memory was allocated from a non-local socket and performance will be
> lower. Although this message may not mean much to a new user, it may
> still help via a support engineer or internet search...
>
Yes, we could add a warning, but that is probably better placed in the app
itself. Thinking about where we get issues, they primarily stem from
running the cores on a different NUMA node. Since the private_data struct
is accessed by cores, not devices, any perf degradation will come from
having it remote to the cores. Because of that, I actually think the
original implementation should really have used "rte_socket_id()" rather
than the device socket id for the allocation.

For v2, shall I therefore include a warning in the case that
rte_socket_id() != device socket_id? WDYT?
/Bruce
Thread overview: 12+ messages
2024-07-11 12:35 Bruce Richardson
2024-07-19 8:59 ` Ferruh Yigit
2024-07-19 9:57 ` Bruce Richardson
2024-07-19 11:10 ` Ferruh Yigit
2024-07-19 13:22 ` Bruce Richardson [this message]
2024-07-19 15:31 ` Ferruh Yigit
2024-07-19 16:10 ` Bruce Richardson
2024-07-21 22:56 ` Ferruh Yigit
2024-07-22 10:06 ` Bruce Richardson
2024-07-19 10:41 ` Bruce Richardson
2024-07-22 10:02 ` [PATCH v2] " Bruce Richardson
2024-07-22 13:24 ` Ferruh Yigit