From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp.tuxdriver.com (charlotte.tuxdriver.com [70.61.120.58]) by dpdk.org (Postfix) with ESMTP id 1EAB0AFD9 for ; Mon, 16 Jun 2014 13:07:50 +0200 (CEST) Received: from hmsreliant.think-freely.org ([2001:470:8:a08:7aac:c0ff:fec2:933b] helo=localhost) by smtp.tuxdriver.com with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.63) (envelope-from ) id 1WwUlc-0001LF-9H; Mon, 16 Jun 2014 07:08:04 -0400 Date: Mon, 16 Jun 2014 07:07:58 -0400 From: Neil Horman To: "Doherty, Declan" Message-ID: <20140616110758.GA15165@hmsreliant.think-freely.org> References: <258914f35917ae07dddc991ac9726542964dce44.1402662300.git.declan.doherty@intel.com> <20140613160807.GD22451@hmsreliant.think-freely.org> <345C63BAECC1AD42A2EC8C63AFFC3ADC13D38EE9@IRSMSX101.ger.corp.intel.com> <20140613193815.GE22451@hmsreliant.think-freely.org> <345C63BAECC1AD42A2EC8C63AFFC3ADC13D39383@IRSMSX101.ger.corp.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <345C63BAECC1AD42A2EC8C63AFFC3ADC13D39383@IRSMSX101.ger.corp.intel.com> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Score: -2.9 (--) X-Spam-Status: No Cc: "dev@dpdk.org" Subject: Re: [dpdk-dev] [PATCH v3 2/5] Link Bonding PMD Library (librte_eal/librte_ether link bonding support changes) X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 16 Jun 2014 11:07:50 -0000 On Mon, Jun 16, 2014 at 08:59:25AM +0000, Doherty, Declan wrote: > > -----Original Message----- > > From: Neil Horman [mailto:nhorman@tuxdriver.com] > > Sent: Friday, June 13, 2014 8:38 PM > > To: Doherty, Declan > > Cc: dev@dpdk.org > > Subject: Re: [dpdk-dev] [PATCH v3 2/5] Link Bonding PMD Library > > (librte_eal/librte_ether link bonding support changes) > > > > On Fri, Jun 13, 2014 at 06:34:04PM +0000, Doherty, Declan wrote: > > > > -----Original Message----- > > > > From: Neil Horman [mailto:nhorman@tuxdriver.com] > > > > Sent: Friday, June 13, 2014 5:08 PM > > > > To: Doherty, Declan > > > > Cc: dev@dpdk.org > > > > Subject: Re: [dpdk-dev] [PATCH v3 2/5] Link Bonding PMD Library > > > > (librte_eal/librte_ether link bonding support changes) > > > > > > > > On Fri, Jun 13, 2014 at 03:41:59PM +0100, Declan Doherty wrote: > > > > > Updating functionality in EAL to support adding link bonding > > > > > devices via –vdev option. Link bonding devices will be > > > > > initialized after all physical devices have been probed and > > > > > initialized. > > > > > > > > > > Signed-off-by: Declan Doherty > > > > > --- > > > > > lib/librte_eal/common/eal_common_dev.c | 66 > > > > +++++++++++++++++++++++++++-- > > > > > lib/librte_eal/common/eal_common_pci.c | 6 +++ > > > > > lib/librte_eal/common/include/eal_private.h | 7 +++ > > > > > lib/librte_eal/common/include/rte_dev.h | 1 + > > > > > lib/librte_ether/rte_ethdev.c | 34 +++++++++++++-- > > > > > lib/librte_ether/rte_ethdev.h | 7 ++- > > > > > lib/librte_pmd_pcap/rte_eth_pcap.c | 22 +++++----- > > > > > lib/librte_pmd_ring/rte_eth_ring.c | 32 +++++++------- > > > > > lib/librte_pmd_ring/rte_eth_ring.h | 3 +- > > > > > lib/librte_pmd_xenvirt/rte_eth_xenvirt.c | 2 +- > > > > > 10 files changed, 144 insertions(+), 36 deletions(-) > > > > > > > > > > diff --git a/lib/librte_eal/common/eal_common_dev.c > > > > b/lib/librte_eal/common/eal_common_dev.c > > > > > index eae5656..b50c908 100644 > > > > > --- a/lib/librte_eal/common/eal_common_dev.c > > > > > +++ b/lib/librte_eal/common/eal_common_dev.c > > > > > @@ -75,14 +75,28 @@ rte_eal_dev_init(void) > > > > > > > > > > /* call the init function for each virtual device */ > > > > > TAILQ_FOREACH(devargs, &devargs_list, next) { > > > > > + uint8_t bdev = 0; > > > > > > > > > > if (devargs->type != RTE_DEVTYPE_VIRTUAL) > > > > > continue; > > > > > > > > > > TAILQ_FOREACH(driver, &dev_driver_list, next) { > > > > > - if (driver->type != PMD_VDEV) > > > > > + /* RTE_DEVTYPE_VIRTUAL can only be a virtual or bonded > > > > device*/ > > > > > + if (driver->type != PMD_VDEV && driver->type != > > > > PMD_BDEV) > > > > > continue; > > > > > > > > > > + /* > > > > > + * Bonded devices are not initialize here, we do it later in > > > > > + * rte_eal_bonded_dev_init() after all physical devices > > > > have been > > > > > + * probed and initialized > > > > > + */ > > > > > + if (driver->type == PMD_BDEV && > > > > > + !strncmp(driver->name, devargs- > > > > >virtual.drv_name, > > > > > + strlen(driver->name))) { > > > > > + bdev = 1; > > > > > + break; > > > > > + } > > > > > + > > > > I really don't think you need to add a new device type for bonded devs. Its got > > > > no specific hardware that it drives, and you configure it with a --vdev > > command, > > > > so treat it as one here. I understand that you need to pass additional > > > > information about slaves to a bonded device, which is fine, but you can do that > > > > with kvargs pretty easily, at which point its just another vdev. The only other > > > > requirement is that you initilize the bonded vdev after the slave vdevs have > > > > been created, which you can do by any of several methods (a priority field to > > > > indicate that bonded drivers should be initilized last/later, a deferral return > > > > code from the init routine, or by dead reckoning via the careful construction > > of > > > > the application command line (placed the bonded --vdev option last on the > > > > command line argument list at run time). > > > > > > > It was my initial intent to do as you have describe above, but the physical devices > > > cause a real issue here, physical devices don't call through to > > rte_eth_dev_allocate until > > > during rte_eal_pci_probe call, so it's not possible to initialize the bonded device > > from > > > within rte_eal_dev_init as the physical devices have not been fully initialized at > > this > > > point, as a port_id has not been allocated and can't be added as bonding slaves. I > > don't > > > see away around this without changing the EAL API, which I've tried to avoid with > > this solution. > > > > > > Ordering isn't an issue, and can easily be solved if the above problem didn't > > exist, and although > > > a new device type isn't technically required, I think it's a cleaner solution than > > doing > > > string comparisons. > > > > > > > > > Ugh, you're right. That stinks. We still shouldn't just drop the init code in > > with the pci probe code though, especially if dpdk starts supporting more > > stacked network devices. A notification system for interface creation that you > > could monitor for pci_device addresses (B:D:F tupples) would be nice, but might > > be overkill since dpdk doesn't really support device freeing yet. what might be > > useful is a generic ordering mechanism for rte_eal_dev_init. If you added a > > priority field to the rte_driver structure, you could create an enumeration for > > the field to define when the init routine for that driver was called something > > like: > > typdef enum { > > PMD_INIT_FIRST = 0, > > PMD_INIT_POST_PROBE = 1, > > } > > > > Then you could make multiple calls to rte_eal_dev_init from rte_eal_init at the > > appropriate places in the initalization process, each time passing one of the > > above values. rte_eal_dev_init would only initalize those drivers whos priority > > matched the passed in priority. That would be helpful for interfaces like > > vlans, vxlans, tunnels, etc. > > > I'm not sure I understand how you intend this to work, are you proposing that > rte_eal_dev_init is call multiple times by the user pre and post rte_eal_pci_probe Mostly, I'm proposing that rte_eal_pci_probe be called by rte_eal_init, and that we then call rte_eal_dev_init multiple times from within rte_eal_dev_init, passing a parameter to indicate the priority level of device that should be initalized within that call. > this doesn't seem like an idea solution either. I'm not 100% clear why > rte_eal_pci_probe is currently called by the application code and not initiated > from within rte_eal_dev_init, if this was the case we would be able to figure out > a dependency tree for initialization of devices, based on the parsed input > arguments without having extra user input to or multiple calls of rte_eal_dev_init. > I agree, I think you should move it into its proper place within rte_eal_init, though I think multiple calls to rte_eal_dev_init is perfectly acceptible in this scenario. You certainly could parse the command line to build a dependency tree if you wish, though I don't think thats crruently overly complex. The current dependency tree is static at (physical devices, virtual devices, probe, stacked devices). If you need something more complex later, you can re-write it freely as it will be an internal library mechanism with no external visibility. > Either way I don't think that any of these solution are trivial and they would > need to be thoroughly tested, which I don't think will be possible in the time > frame for 1.7. If my current implementation calling rte_eal_bonded_dev_init > from within rte_eal_pci_probe is unacceptable, then I propose that I remove > the EAL changes for --vdev initialization of bonding devices from this release and > only support the C API, and that an overall strategy is discussed for static device > initialization that is developed for 1.7.1 or 1.8. > I would assert that if you can't write the pmd to support the existing pmd interfaces, then this probably needs to wait until 1.8. I'll bow to consensus on that, but I would think that, if it gets in now, the additional work won't happen. Neil > > > > > > > > > > +static int > > > > > +rte_eth_dev_name_unique(const char* name) > > > > > +{ > > > > > + unsigned i; > > > > > + > > > > > + for (i = 0; i < nb_ports; i++) { > > > > > + if (strcmp(rte_eth_devices[i].data->name, name) == 0) > > > > > + return -1; > > > > > + } > > > > > + > > > > > + return 0; > > > > > +} > > > > > + > > > > > struct rte_eth_dev * > > > > > -rte_eth_dev_allocate(void) > > > > > +rte_eth_dev_allocate(const char* name) > > > > > { > > > > > struct rte_eth_dev *eth_dev; > > > > > > > > > > @@ -165,23 +179,37 @@ rte_eth_dev_allocate(void) > > > > > if (rte_eth_dev_data == NULL) > > > > > rte_eth_dev_data_alloc(); > > > > > > > > > > + if (rte_eth_dev_name_unique(name)) { > > > > > + PMD_DEBUG_TRACE("Ethernet Device with name %s already > > > > allocated!\n"); > > > > > + return NULL; > > > > > + } > > > > > + > > > > This seems fairly racy if you allow dynamic device creation at run time from > > the > > > > application, if multiple threads attempt to create bonds in parallel. > > > > > > True but if this is an issue then there probably should be some locking around > > this allocation anyway. > > Agreed the allocation of the port id is also racy. > > > > > > > > > > > /** > > > > > diff --git a/lib/librte_pmd_pcap/rte_eth_pcap.c > > > > b/lib/librte_pmd_pcap/rte_eth_pcap.c > > > > Hmm, we're modifying other pmds for the naming feature, I think it would be > > best > > > > split out into a separate patch. Something entitled "support unique interface > > > > naming for virtual pmds" or something. > > > > > > True, but if I do that, I will need to break support for virtual devices in this patch > > set, and > > > I would prefer to keep this functionality working initial release of link bonding. > > Also another > > > patchset will be required at some point in the near future to tidy up/rationalize > > the current > > > handling of virtual devices identification and these changes do not change > > functionally how > > > these devices work. > > > > I'm not sure I'm following what you're saying here. If you split this into > > another patch set, and order earlier in this series, then you won't break > > anything. I'm not suggesting a functional change here, just a organizational > > one, so that this change doesn't do more than the changelog indicates. > > > > Sorry I misunderstood, I thought that you where proposing that I create a completely new > patchset separate to the link bonding release set for these changes. > > > Neil > > Declan