From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id CB35EA04B3; Mon, 10 Feb 2020 16:00:17 +0100 (CET) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id C489B4C90; Mon, 10 Feb 2020 16:00:16 +0100 (CET) Received: from relay2-d.mail.gandi.net (relay2-d.mail.gandi.net [217.70.183.194]) by dpdk.org (Postfix) with ESMTP id DBD1E493D for ; Mon, 10 Feb 2020 16:00:15 +0100 (CET) X-Originating-IP: 37.58.153.206 Received: from [10.0.3.185] (bny206.haproxy.com [37.58.153.206]) (Authenticated sender: grive@u256.net) by relay2-d.mail.gandi.net (Postfix) with ESMTPSA id BA1714001C for ; Mon, 10 Feb 2020 15:00:15 +0000 (UTC) To: dev@dpdk.org References: <20200204091548.2861-1-somnath.kotur@broadcom.com> <83f7e3a5-c36f-a755-1c71-0c248fe48d0e@u256.net> From: Gaetan Rivet Message-ID: Date: Mon, 10 Feb 2020 16:00:15 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.4.2 MIME-Version: 1.0 In-Reply-To: <83f7e3a5-c36f-a755-1c71-0c248fe48d0e@u256.net> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit Subject: Re: [dpdk-dev] [PATCH] eal: fix to set the rte_device ptr's device args before hotplug X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On 10/02/2020 15:34, Gaetan Rivet wrote: > Hi Somnath, > > Reformating the mails, to keep with the inner-posting (mixing top-posting and inner-posting > makes it hard to follow). See the end of the mail. > > [...] >> Any updates on this ? >> My thoughts on this are just as I'd suspected / suggested earlier the >> parent device (06:02:0) gets probed once >> during ovs-vswitchd bring up right after these 2 cmds >> >> ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true >> # Now start ovs-vswitchd (--no-ovsdb-server, since it is already started) >> ovs-ctl --no-ovsdb-server --db-sock="$DB_SOCK" start >> >> After this point the pci probe was invoked on the parent device, but >> without any devargs hence it is NULL. >> >> Now when the 'ovs add-port' cmd is issued to add a representor port >> for example say 06:02:01 (using 06:02:00 as parent device) >> ovs-vsctl add-port ovsbr0 vfrep1 -- set Interface vfrep1 type=dpdk >> options:dpdk-devargs=0000:06:02.0,representor=[1] >> >> What happens is >> >> *before* pci_name_set() >> p dev >> $1 = (struct rte_pci_device *) 0x55bd6b0 >> (gdb) p /x *dev >> $2 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, device = {next = >> {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x0, driver = 0x0, bus = >> 0x3a59a00, numa_node = 0x0, >>      devargs = 0x0}, addr = {domain = 0x0, bus = 0x6, devid = 0x2, >> function = 0x0}.......} >> >> *after* pci_name_set() >> >> p /x *dev >> $3 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, device = {next = >> {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x0, driver = 0x0, bus = >> 0x3a59a00, numa_node = 0x0, >>      devargs = 0x55a4ae0},addr = {domain = 0x0, bus = 0x6, devid = 0x2, >> function = 0x0}......} >> >> As you can see, 'devargs' for the pci_device is now populated .. >> >> But if you go further down in pci_scan_one() >> >> pci_scan_one (dirname=0x7ffea330a5e0 >> "/sys/bus/pci/devices/0000:06:02.0", addr=0x7ffea330a5d0) >>      at /root/dpdk-int_nxt/drivers/bus/pci/linux/pci.c:351 >> 351                                     if (!rte_dev_is_probed(&dev2->device)) { >> (gdb) p dev2 >> $5 = (struct rte_pci_device *) 0x54de5e0 >> (gdb) p /x *dev2 >> $6 = {next = {tqe_next = 0x5307460, tqe_prev = 0x5539f80}, device = >> {next = {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x54e4f00, driver = >> 0x3a6b7f0, >>      bus = 0x3a59a00, numa_node = 0x0, devargs = 0x0}, addr = {domain = >> 0x0, bus = 0x6, devid = 0x2, function = 0x0} >> >> we hit the pci_device ( 0x54de5e0) that was created during the first >> time probe of the parent device ? >> Since it is already probed, it goes to line 381 where it does frees >> the just allocated 'pci_device'  inside pci_scan_one() via 'free(dev)' >> >> As you can see this pci_device does not have it's devargs set (rightly >> so as initially there were no devargs for this device) >> >> But as shown in the stack trace in my previous reply, when >> pci_find_device() walks the rte_pci_devices list , it finds this >> earlier probed device (without devargs) >> > > Alright, I think you are right, this is what is happening. > The issue IMO, is that the PCI scan is thus hitting an already existing device, but we have > missed the case where the new device has more info that the previous one (linked devargs). > >> pci_find_device (start=0x0, cmp=0x4b92d8 , >> data=0x55a4af8) at /root/dpdk-19.11/drivers/bus/pci/pci_common.c:426 >> 426                             return &pdev->device; >> (gdb) p pdev >> $15 = (struct rte_pci_device *) 0x54de5e0 >> >> (gdb) p /x *pdev >> $16 = {next = {tqe_next = 0x5307460, tqe_prev = 0x5539f80}, device = >> {next = {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x54e4f00, driver = >> 0x3a6b7f0, >>      bus = 0x3a59a00, numa_node = 0x0, devargs = 0x0}, addr = {domain = >> 0x0, bus = 0x6, devid = 0x2, function = 0x0}, >> >> Hope this explains why the pci_device has NULL devargs at this point >> and how my fix to set it at this point helps solve the problem? >> >> Please let me know your thoughts >> > > I think the proper fix is instead to have a clean update during the PCI scan. > The proper way is to keep the old device, but override its fields as new info was found. > > Something like calling pci_name_set(dev2); line 359 maybe. BSD should also be updated for consistency. > > My issue with your patch is that I think this issue is very specific to the PCI bus and the capabilities > of some devices on it. It would be better to have a fully compliant scan + probe ops considering > the supported capabilities, instead of forcing this on all buses. > > I'm wondering whether this can happen with an already existing devargs? If so, is it more correct to keep > the new one, or ignore the probe, free the new devargs and report an error? If the former, > please clean up properly the devargs using rte_devargs_remove() (before calling pci_name_set() of course). Unfortunately, this could probably happen with some devargs. In particular, the rte_devargs_insert() function will test whether devargs name are matching before inserting a new devargs. There is an ambiguity with the BDF / DBDF notation for the PCI devices in particular, which could lead to devargs already existing for some devices, but with a new one being inserted. The fix would be to have the bus parse the device, and compare the binary blob resulting from this parsing. The current bus->parse() API might help, but is currently too limited (no way to know how many bytes long the result is). This is fastidious to have to write this because some leeway was allowed to the PCI back in the day. For now at least, mind this possibility when writing your fix, those above considerations could maybe be worked out with future API evolutions.