DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] Possible bug in eal_pci pci_scan_one
@ 2014-10-06  9:13 Matthew Hall
  2014-10-14  5:47 ` Matthew Hall
  2014-10-24 13:06 ` Stephen Hemminger
  0 siblings, 2 replies; 5+ messages in thread
From: Matthew Hall @ 2014-10-06  9:13 UTC (permalink / raw)
  To: dev

Hi Guys,

I'm doing my development on kind of a cheap machine with no NUMA support... 
but several years ago I used DPDK to build a NUMA box that could do 40 gbits 
bidirectional L4-L7 stateful traffic replay.

So given the past experiences I had before, I wanted to clean the code up so 
it'd work well if some crazy guy tried my code on one of these huge boxes, 
too, but then I ran into some weird issues.

1) When I call rte_eth_dev_socket_id() I get back -1. But the call can return 
-1 if the port_id is bogus or if pci_scan_one didn't get a numa_node (because 
you're on a non-NUMA box for example).

int rte_eth_dev_socket_id(uint8_t port_id)
{
        if (port_id >= nb_ports)
                return -1;
        return rte_eth_devices[port_id].pci_dev->numa_node;
}

So you couldn't tell the different between non-NUMA or a bad port value, etc.

2) The code's behavior and comments disagree with one another. In the 
pci_scan_one function, there's this code:

/* get numa node */
snprintf(filename, sizeof(filename), "%s/numa_node",
         dirname);
if (access(filename, R_OK) != 0) {
        /* if no NUMA support just set node to 0 */
        dev->numa_node = -1;
} else {
        if (eal_parse_sysfs_value(filename, &tmp) < 0) {
                free(dev);
                return -1;
        }
        dev->numa_node = tmp;
}

It says, just use NUMA node 0 if there is no NUMA support. But then proceeds 
to set the value to -1 in disagreement with the comment, and also stomping on 
the other meaning for -1 in the higher function rte_eth_dev_socket_id.

3) In conclusion, it seems like some stuff is missing... first there needs to 
be a function that will tell you the number of NUMA nodes present on the box 
so you can create the right number of mbuf_pools, but I couldn't find that function.

Then if you have the function, you can do some magic and shuffle the NICs 
around to get them hooked to a core on the same NUMA, and the mbuf_pool on the 
same NUMA.

When NUMA is not present, can we return 0 instead of -1, or return a specific 
error code that the client can use to know he should just use Socket 0? Right 
now I can't tell apart any potential errors or weird values from correct 
values.

4) I'm willing to help make and test some patches... but first I want to 
understand what is happening with these funny functions before doing things 
blindly.

Thanks,
Matthew.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [dpdk-dev] Possible bug in eal_pci pci_scan_one
  2014-10-06  9:13 [dpdk-dev] Possible bug in eal_pci pci_scan_one Matthew Hall
@ 2014-10-14  5:47 ` Matthew Hall
  2014-10-23  0:44   ` Matthew Hall
  2014-10-24 13:06 ` Stephen Hemminger
  1 sibling, 1 reply; 5+ messages in thread
From: Matthew Hall @ 2014-10-14  5:47 UTC (permalink / raw)
  To: dev

Hi,

Did anybody get a chance to look what might be going on in this weird NUMA 
bug? I could use some help to understand how you're supposed to make code that 
will work right on both NUMA and non-NUMA. Otherwise it's hard to make a 
bulletproof DPDK based app that will be able to reliably init on single 
socket, dual socket non-NUMA, and dual socket NUMA boxes.

Thanks,
Matthew.

On Mon, Oct 06, 2014 at 02:13:44AM -0700, Matthew Hall wrote:
> Hi Guys,
> 
> I'm doing my development on kind of a cheap machine with no NUMA support... 
> but several years ago I used DPDK to build a NUMA box that could do 40 gbits 
> bidirectional L4-L7 stateful traffic replay.
> 
> So given the past experiences I had before, I wanted to clean the code up so 
> it'd work well if some crazy guy tried my code on one of these huge boxes, 
> too, but then I ran into some weird issues.
> 
> 1) When I call rte_eth_dev_socket_id() I get back -1. But the call can return 
> -1 if the port_id is bogus or if pci_scan_one didn't get a numa_node (because 
> you're on a non-NUMA box for example).
> 
> int rte_eth_dev_socket_id(uint8_t port_id)
> {
>         if (port_id >= nb_ports)
>                 return -1;
>         return rte_eth_devices[port_id].pci_dev->numa_node;
> }
> 
> So you couldn't tell the different between non-NUMA or a bad port value, etc.
> 
> 2) The code's behavior and comments disagree with one another. In the 
> pci_scan_one function, there's this code:
> 
> /* get numa node */
> snprintf(filename, sizeof(filename), "%s/numa_node",
>          dirname);
> if (access(filename, R_OK) != 0) {
>         /* if no NUMA support just set node to 0 */
>         dev->numa_node = -1;
> } else {
>         if (eal_parse_sysfs_value(filename, &tmp) < 0) {
>                 free(dev);
>                 return -1;
>         }
>         dev->numa_node = tmp;
> }
> 
> It says, just use NUMA node 0 if there is no NUMA support. But then proceeds 
> to set the value to -1 in disagreement with the comment, and also stomping on 
> the other meaning for -1 in the higher function rte_eth_dev_socket_id.
> 
> 3) In conclusion, it seems like some stuff is missing... first there needs to 
> be a function that will tell you the number of NUMA nodes present on the box 
> so you can create the right number of mbuf_pools, but I couldn't find that function.
> 
> Then if you have the function, you can do some magic and shuffle the NICs 
> around to get them hooked to a core on the same NUMA, and the mbuf_pool on the 
> same NUMA.
> 
> When NUMA is not present, can we return 0 instead of -1, or return a specific 
> error code that the client can use to know he should just use Socket 0? Right 
> now I can't tell apart any potential errors or weird values from correct 
> values.
> 
> 4) I'm willing to help make and test some patches... but first I want to 
> understand what is happening with these funny functions before doing things 
> blindly.
> 
> Thanks,
> Matthew.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [dpdk-dev] Possible bug in eal_pci pci_scan_one
  2014-10-14  5:47 ` Matthew Hall
@ 2014-10-23  0:44   ` Matthew Hall
  0 siblings, 0 replies; 5+ messages in thread
From: Matthew Hall @ 2014-10-23  0:44 UTC (permalink / raw)
  To: dev

Hi guys,

Could anybody comment on my observations about the PCI scan process and how to 
improve the NUMA compatibility, etc.?

The original bug contains a bogus commit history (fake source for commit, fake 
timestamp, no detail about what was done in the commit). I also don't have a 
list of subsystem maintainers for DPDK or subsystem mailing lists so no clue 
who might be able to assist with the issue I was seeing:

commit b6ea6408fbc784f174626aa6ea2fa49610b1f432
Author: Intel <intel.com>
Date:   Mon Jun 3 00:00:00 2013 +0000

    ethdev: store numa_node per device

    Signed-off-by: Intel

I hope that future Intel commits don't come with a censored commit history... 
it isn't very friendly when you're trying to get help tracking down bugs and 
fixing stuff from the community side.

Thanks,
Matthew.

On Mon, Oct 13, 2014 at 10:47:12PM -0700, Matthew Hall wrote:
> Hi,
> 
> Did anybody get a chance to look what might be going on in this weird NUMA 
> bug? I could use some help to understand how you're supposed to make code that 
> will work right on both NUMA and non-NUMA. Otherwise it's hard to make a 
> bulletproof DPDK based app that will be able to reliably init on single 
> socket, dual socket non-NUMA, and dual socket NUMA boxes.
> 
> Thanks,
> Matthew.
> 
> On Mon, Oct 06, 2014 at 02:13:44AM -0700, Matthew Hall wrote:
> > Hi Guys,
> > 
> > I'm doing my development on kind of a cheap machine with no NUMA support... 
> > but several years ago I used DPDK to build a NUMA box that could do 40 gbits 
> > bidirectional L4-L7 stateful traffic replay.
> > 
> > So given the past experiences I had before, I wanted to clean the code up so 
> > it'd work well if some crazy guy tried my code on one of these huge boxes, 
> > too, but then I ran into some weird issues.
> > 
> > 1) When I call rte_eth_dev_socket_id() I get back -1. But the call can return 
> > -1 if the port_id is bogus or if pci_scan_one didn't get a numa_node (because 
> > you're on a non-NUMA box for example).
> > 
> > int rte_eth_dev_socket_id(uint8_t port_id)
> > {
> >         if (port_id >= nb_ports)
> >                 return -1;
> >         return rte_eth_devices[port_id].pci_dev->numa_node;
> > }
> > 
> > So you couldn't tell the different between non-NUMA or a bad port value, etc.
> > 
> > 2) The code's behavior and comments disagree with one another. In the 
> > pci_scan_one function, there's this code:
> > 
> > /* get numa node */
> > snprintf(filename, sizeof(filename), "%s/numa_node",
> >          dirname);
> > if (access(filename, R_OK) != 0) {
> >         /* if no NUMA support just set node to 0 */
> >         dev->numa_node = -1;
> > } else {
> >         if (eal_parse_sysfs_value(filename, &tmp) < 0) {
> >                 free(dev);
> >                 return -1;
> >         }
> >         dev->numa_node = tmp;
> > }
> > 
> > It says, just use NUMA node 0 if there is no NUMA support. But then proceeds 
> > to set the value to -1 in disagreement with the comment, and also stomping on 
> > the other meaning for -1 in the higher function rte_eth_dev_socket_id.
> > 
> > 3) In conclusion, it seems like some stuff is missing... first there needs to 
> > be a function that will tell you the number of NUMA nodes present on the box 
> > so you can create the right number of mbuf_pools, but I couldn't find that function.
> > 
> > Then if you have the function, you can do some magic and shuffle the NICs 
> > around to get them hooked to a core on the same NUMA, and the mbuf_pool on the 
> > same NUMA.
> > 
> > When NUMA is not present, can we return 0 instead of -1, or return a specific 
> > error code that the client can use to know he should just use Socket 0? Right 
> > now I can't tell apart any potential errors or weird values from correct 
> > values.
> > 
> > 4) I'm willing to help make and test some patches... but first I want to 
> > understand what is happening with these funny functions before doing things 
> > blindly.
> > 
> > Thanks,
> > Matthew.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [dpdk-dev] Possible bug in eal_pci pci_scan_one
  2014-10-06  9:13 [dpdk-dev] Possible bug in eal_pci pci_scan_one Matthew Hall
  2014-10-14  5:47 ` Matthew Hall
@ 2014-10-24 13:06 ` Stephen Hemminger
  2014-10-24 19:03   ` Matthew Hall
  1 sibling, 1 reply; 5+ messages in thread
From: Stephen Hemminger @ 2014-10-24 13:06 UTC (permalink / raw)
  To: Matthew Hall; +Cc: dev

On Mon, 6 Oct 2014 02:13:44 -0700
Matthew Hall <mhall@mhcomputing.net> wrote:

> Hi Guys,
> 
> I'm doing my development on kind of a cheap machine with no NUMA support... 
> but several years ago I used DPDK to build a NUMA box that could do 40 gbits 
> bidirectional L4-L7 stateful traffic replay.
> 
> So given the past experiences I had before, I wanted to clean the code up so 
> it'd work well if some crazy guy tried my code on one of these huge boxes, 
> too, but then I ran into some weird issues.
> 
> 1) When I call rte_eth_dev_socket_id() I get back -1. But the call can return 
> -1 if the port_id is bogus or if pci_scan_one didn't get a numa_node (because 
> you're on a non-NUMA box for example).
> 
> int rte_eth_dev_socket_id(uint8_t port_id)
> {
>         if (port_id >= nb_ports)
>                 return -1;
>         return rte_eth_devices[port_id].pci_dev->numa_node;
> }
> 
> So you couldn't tell the different between non-NUMA or a bad port value, etc.
> 
> 2) The code's behavior and comments disagree with one another. In the 
> pci_scan_one function, there's this code:
> 
> /* get numa node */
> snprintf(filename, sizeof(filename), "%s/numa_node",
>          dirname);
> if (access(filename, R_OK) != 0) {
>         /* if no NUMA support just set node to 0 */
>         dev->numa_node = -1;
> } else {
>         if (eal_parse_sysfs_value(filename, &tmp) < 0) {
>                 free(dev);
>                 return -1;
>         }
>         dev->numa_node = tmp;
> }
> 
> It says, just use NUMA node 0 if there is no NUMA support. But then proceeds 
> to set the value to -1 in disagreement with the comment, and also stomping on 
> the other meaning for -1 in the higher function rte_eth_dev_socket_id.
> 
> 3) In conclusion, it seems like some stuff is missing... first there needs to 
> be a function that will tell you the number of NUMA nodes present on the box 
> so you can create the right number of mbuf_pools, but I couldn't find that function.
> 
> Then if you have the function, you can do some magic and shuffle the NICs 
> around to get them hooked to a core on the same NUMA, and the mbuf_pool on the 
> same NUMA.
> 
> When NUMA is not present, can we return 0 instead of -1, or return a specific 
> error code that the client can use to know he should just use Socket 0? Right 
> now I can't tell apart any potential errors or weird values from correct 
> values.
> 
> 4) I'm willing to help make and test some patches... but first I want to 
> understand what is happening with these funny functions before doing things 
> blindly.
> 
> Thanks,
> Matthew.

The code is fairly consistent in returning -1 for cases of not a NUMA socket,
bogus port value. It is interpreted as SOCKET_ID_ANY in several places.
The examples mostly check for -1 and use socket 0 as a fallback.
Probably not worth introducing more return values and breaking existing
applications.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [dpdk-dev] Possible bug in eal_pci pci_scan_one
  2014-10-24 13:06 ` Stephen Hemminger
@ 2014-10-24 19:03   ` Matthew Hall
  0 siblings, 0 replies; 5+ messages in thread
From: Matthew Hall @ 2014-10-24 19:03 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

On Fri, Oct 24, 2014 at 06:36:29PM +0530, Stephen Hemminger wrote:
> The code is fairly consistent in returning -1 for cases of not a NUMA socket,
> bogus port value. It is interpreted as SOCKET_ID_ANY in several places.
> The examples mostly check for -1 and use socket 0 as a fallback.
> Probably not worth introducing more return values and breaking existing
> applications.

OK. So I'll make a patch to correct the comment which was wrong.

Matthew.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2014-10-24 18:56 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-10-06  9:13 [dpdk-dev] Possible bug in eal_pci pci_scan_one Matthew Hall
2014-10-14  5:47 ` Matthew Hall
2014-10-23  0:44   ` Matthew Hall
2014-10-24 13:06 ` Stephen Hemminger
2014-10-24 19:03   ` Matthew Hall

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).