From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail.mhcomputing.net (master.mhcomputing.net [74.208.46.186]) by dpdk.org (Postfix) with ESMTP id A829B7E75 for ; Thu, 23 Oct 2014 02:36:57 +0200 (CEST) Received: by mail.mhcomputing.net (Postfix, from userid 1000) id 29A5680C50B; Wed, 22 Oct 2014 17:44:22 -0700 (PDT) Date: Wed, 22 Oct 2014 17:44:22 -0700 From: Matthew Hall To: dev@dpdk.org Message-ID: <20141023004422.GA11344@mhcomputing.net> References: <20141006091344.GA14759@mhcomputing.net> <20141014054711.GB16919@mhcomputing.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20141014054711.GB16919@mhcomputing.net> User-Agent: Mutt/1.5.23 (2014-03-12) Subject: Re: [dpdk-dev] Possible bug in eal_pci pci_scan_one X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Oct 2014 00:36:58 -0000 Hi guys, Could anybody comment on my observations about the PCI scan process and how to improve the NUMA compatibility, etc.? The original bug contains a bogus commit history (fake source for commit, fake timestamp, no detail about what was done in the commit). I also don't have a list of subsystem maintainers for DPDK or subsystem mailing lists so no clue who might be able to assist with the issue I was seeing: commit b6ea6408fbc784f174626aa6ea2fa49610b1f432 Author: Intel Date: Mon Jun 3 00:00:00 2013 +0000 ethdev: store numa_node per device Signed-off-by: Intel I hope that future Intel commits don't come with a censored commit history... it isn't very friendly when you're trying to get help tracking down bugs and fixing stuff from the community side. Thanks, Matthew. On Mon, Oct 13, 2014 at 10:47:12PM -0700, Matthew Hall wrote: > Hi, > > Did anybody get a chance to look what might be going on in this weird NUMA > bug? I could use some help to understand how you're supposed to make code that > will work right on both NUMA and non-NUMA. Otherwise it's hard to make a > bulletproof DPDK based app that will be able to reliably init on single > socket, dual socket non-NUMA, and dual socket NUMA boxes. > > Thanks, > Matthew. > > On Mon, Oct 06, 2014 at 02:13:44AM -0700, Matthew Hall wrote: > > Hi Guys, > > > > I'm doing my development on kind of a cheap machine with no NUMA support... > > but several years ago I used DPDK to build a NUMA box that could do 40 gbits > > bidirectional L4-L7 stateful traffic replay. > > > > So given the past experiences I had before, I wanted to clean the code up so > > it'd work well if some crazy guy tried my code on one of these huge boxes, > > too, but then I ran into some weird issues. > > > > 1) When I call rte_eth_dev_socket_id() I get back -1. But the call can return > > -1 if the port_id is bogus or if pci_scan_one didn't get a numa_node (because > > you're on a non-NUMA box for example). > > > > int rte_eth_dev_socket_id(uint8_t port_id) > > { > > if (port_id >= nb_ports) > > return -1; > > return rte_eth_devices[port_id].pci_dev->numa_node; > > } > > > > So you couldn't tell the different between non-NUMA or a bad port value, etc. > > > > 2) The code's behavior and comments disagree with one another. In the > > pci_scan_one function, there's this code: > > > > /* get numa node */ > > snprintf(filename, sizeof(filename), "%s/numa_node", > > dirname); > > if (access(filename, R_OK) != 0) { > > /* if no NUMA support just set node to 0 */ > > dev->numa_node = -1; > > } else { > > if (eal_parse_sysfs_value(filename, &tmp) < 0) { > > free(dev); > > return -1; > > } > > dev->numa_node = tmp; > > } > > > > It says, just use NUMA node 0 if there is no NUMA support. But then proceeds > > to set the value to -1 in disagreement with the comment, and also stomping on > > the other meaning for -1 in the higher function rte_eth_dev_socket_id. > > > > 3) In conclusion, it seems like some stuff is missing... first there needs to > > be a function that will tell you the number of NUMA nodes present on the box > > so you can create the right number of mbuf_pools, but I couldn't find that function. > > > > Then if you have the function, you can do some magic and shuffle the NICs > > around to get them hooked to a core on the same NUMA, and the mbuf_pool on the > > same NUMA. > > > > When NUMA is not present, can we return 0 instead of -1, or return a specific > > error code that the client can use to know he should just use Socket 0? Right > > now I can't tell apart any potential errors or weird values from correct > > values. > > > > 4) I'm willing to help make and test some patches... but first I want to > > understand what is happening with these funny functions before doing things > > blindly. > > > > Thanks, > > Matthew.