From: Bruce Richardson <bruce.richardson@intel.com>
To: Ferruh Yigit <ferruh.yigit@amd.com>
Cc: <dev@dpdk.org>, <stable@dpdk.org>,
Padraig Connolly <padraig.j.connolly@intel.com>
Subject: Re: [PATCH] ethdev: fix device init without socket-local memory
Date: Fri, 19 Jul 2024 14:22:10 +0100 [thread overview]
Message-ID: <ZppogrHvlJcb7t4M@bricha3-mobl1.ger.corp.intel.com> (raw)
In-Reply-To: <57b08bdb-5f45-4343-a8d2-598d66d82fe8@amd.com>
On Fri, Jul 19, 2024 at 12:10:24PM +0100, Ferruh Yigit wrote:
> On 7/19/2024 10:57 AM, Bruce Richardson wrote:
> > On Fri, Jul 19, 2024 at 09:59:50AM +0100, Ferruh Yigit wrote:
> >> On 7/11/2024 1:35 PM, Bruce Richardson wrote:
> >>> When allocating memory for an ethdev, the rte_malloc_socket call used
> >>> only allocates memory on the NUMA node/socket local to the device. This
> >>> means that even if the user wanted to, they could never use a remote NIC
> >>> without also having memory on that NIC's socket.
> >>>
> >>> For example, if we change examples/skeleton/basicfwd.c to have
> >>> SOCKET_ID_ANY as the socket_id parameter for Rx and Tx rings, we should
> >>> be able to run the app cross-numa e.g. as below, where the two PCI
> >>> devices are on socket 1, and core 1 is on socket 0:
> >>>
> >>> ./build/examples/dpdk-skeleton -l 1 --legacy-mem --socket-mem=1024,0 \
> >>> -a a8:00.0 -a b8:00.0
> >>>
> >>> This fails however, with the error:
> >>>
> >>> ETHDEV: failed to allocate private data
> >>> PCI_BUS: Requested device 0000:a8:00.0 cannot be used
> >>>
> >>> We can remove this restriction by doing a fallback call to general
> >>> rte_malloc after a call to rte_malloc_socket fails. This should be safe
> >>> to do because the later ethdev calls to setup Rx/Tx queues all take a
> >>> socket_id parameter, which can be used by applications to enforce the
> >>> requirement for local-only memory for a device, if so desired. [If
> >>> device-local memory is present it will be used as before, while if not
> >>> present the rte_eth_dev_configure call will now pass, but the subsequent
> >>> queue setup calls requesting local memory will fail].
> >>>
> >>> Fixes: e489007a411c ("ethdev: add generic create/destroy ethdev APIs")
> >>> Fixes: dcd5c8112bc3 ("ethdev: add PCI driver helpers")
> >>> Cc: stable@dpdk.org
> >>>
> >>> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> >>> Signed-off-by: Padraig Connolly <padraig.j.connolly@intel.com>
> >>>
> >>
> >> Hi Bruce,
> >>
> >> If device-local memory is present, the behavior will be the same, so I
> >> agree this is low impact.
> >>
> >> But for the case where device-local memory is NOT present, the question
> >> is whether we should enforce the HW setup. This can be beneficial for
> >> users new to DPDK.
> >>
> >
> > No, we should not do so, because if we do, there is no way for the user
> > to allow using remote memory - the probe/init and even the configure
> > calls have NO socket_id parameter in them, so the enforcement of local
> > memory is an internal assumption on the part of the API which is not
> > documented anywhere, and is not possible for the user to override.
> >
> >> Probably 'dev_private' on its own has a small impact on performance
> >> (although it may depend on whether these fields are used in the
> >> datapath), but it may be a vehicle to enforce local memory.
> >>
> >
> > As I explain above in the cover letter, there are already other ways to
> > enforce local memory - we don't need another one. If the user only wants to
> > use local memory for a port, they can do so by setting the socket_id
> > parameter of the rx and tx queues. Enforcing local memory in probe
> > doesn't add anything to that, and just prevents other use-cases.
> >
> >> What is enabled by allowing the app to run with cross-NUMA memory,
> >> since in production I expect users would want to use device-local
> >> memory for performance reasons anyway?
> >>
> >
> > Mostly users want socket-local memory, but with increasing speeds of NICs
> > we are already seeing cases where users want to run cross-NUMA. For
> > example, a multi-port NIC may have some ports in use on each socket.
> >
>
> Ack.
>
> >>
> >> Also I am not sure if this is a fix, or a change of intentional behavior.
> >>
> >
> > I suppose it can be viewed either way. However, for me this is a fix,
> > because right now it is impossible for many users to run their applications
> > with memory on a different socket to the ports. Nowhere is it documented in
> > DPDK that there is a hard restriction that ports must have local memory, so
> > any enforcement of such a policy is wrong.
> >
>
> Although it seems this is done intentionally in the API, I agree that
> this is not documented, nor is the restriction part of the API
> definition.
>
> > Turning things the other way around - I can't see how anything will
> > break or even slow down with this patch applied. As I point out above,
> > the user can already enforce local memory by passing the required socket
> > id when allocating rx and tx rings - this patch only pushes the failure
> > for non-local memory a bit later in the initialization sequence, to
> > where the user can actually specify the desired NUMA behaviour. Is there
> > some case I'm missing where you foresee this causing problems?
> >
>
> A new user may not know that allocating memory cross-NUMA impacts
> performance negatively, and may end up with this configuration
> unintentionally; the current failure enforces the ideal configuration.
>
> One option can be adding a warning log to the fallback case, saying that
> memory was allocated from a non-local socket and performance will be
> lower. Although this message may not mean much to a new user, it may
> still help via a support engineer or internet search...
>
Yes, we could add a warning, but that is probably better placed in the app
itself. Thinking about where we get issues, they primarily stem from
running the cores on a different NUMA node. Since the private_data struct
is accessed by cores, not devices, any perf degradation will come from
having it remote to the cores. Because of that, I actually think the
original implementation should really have used "rte_socket_id()" rather
than the device socket id for the allocation.

For v2, shall I therefore include a warning in the case that
rte_socket_id() != device socket_id? WDYT?
/Bruce
Thread overview: 12+ messages
2024-07-11 12:35 Bruce Richardson
2024-07-19 8:59 ` Ferruh Yigit
2024-07-19 9:57 ` Bruce Richardson
2024-07-19 11:10 ` Ferruh Yigit
2024-07-19 13:22 ` Bruce Richardson [this message]
2024-07-19 15:31 ` Ferruh Yigit
2024-07-19 16:10 ` Bruce Richardson
2024-07-21 22:56 ` Ferruh Yigit
2024-07-22 10:06 ` Bruce Richardson
2024-07-19 10:41 ` Bruce Richardson
2024-07-22 10:02 ` [PATCH v2] " Bruce Richardson
2024-07-22 13:24 ` Ferruh Yigit