DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] [PATCH] eal: fix to set the rte_device ptr's device args before hotplug
@ 2020-02-04  9:15 Somnath Kotur
  2020-02-04  9:49 ` Gaetan Rivet
  0 siblings, 1 reply; 11+ messages in thread
From: Somnath Kotur @ 2020-02-04  9:15 UTC (permalink / raw)
  To: dev; +Cc: ferruh.yigit

As per the comments in this code section, "since there is a matching device,
it is now its responsibility to manage the devargs we've just inserted."
But the matching device ptr's devargs is still uninitialized or not pointing
to the newest dev_args that were passed as a parameter to local_dev_probe().
This is needed particularly in the case when *probe is called again* on an
already probed device(the parent device for the representor) as part of adding
a representor port to an OVS switch(OVS-DPDK) like so:
ovs-vsctl add-port ovsbr0 vfrep1 -- set Interface vfrep1 type=dpdk \
options:dpdk-devargs=0000:06:02.0,representor=[1]

Fixes: 7e8b26650146 ("eal: fix hotplug add / remove")

Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
---
 lib/librte_eal/common/eal_common_dev.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index 9e4f09d..311eef5 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -171,6 +171,7 @@ static int cmp_dev_name(const struct rte_device *dev, const void *_name)
 	 * those devargs shouldn't be removed manually anymore.
 	 */
 
+	dev->devargs = da;
 	ret = dev->bus->plug(dev);
 	if (ret > 0)
 		ret = -ENOTSUP;
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [dpdk-dev] [PATCH] eal: fix to set the rte_device ptr's device args before hotplug
  2020-02-04  9:15 [dpdk-dev] [PATCH] eal: fix to set the rte_device ptr's device args before hotplug Somnath Kotur
@ 2020-02-04  9:49 ` Gaetan Rivet
  2020-02-05  8:52   ` Somnath Kotur
  0 siblings, 1 reply; 11+ messages in thread
From: Gaetan Rivet @ 2020-02-04  9:49 UTC (permalink / raw)
  To: dev

On 04/02/2020 10:15, Somnath Kotur wrote:
> As per the comments in this code section, "since there is a matching device,
> it is now its responsibility to manage the devargs we've just inserted."
> But the matching device ptr's devargs is still uninitialized or not pointing
> to the newest dev_args that were passed as a parameter to local_dev_probe().
> This is needed particularly in the case when *probe is called again* on an
> already probed device(the parent device for the representor) as part of adding
> a representor port to an OVS switch(OVS-DPDK) like so:
> ovs-vsctl add-port ovsbr0 vfrep1 -- set Interface vfrep1 type=dpdk \
> options:dpdk-devargs=0000:06:02.0,representor=[1]
> 
> Fixes: 7e8b26650146 ("eal: fix hotplug add / remove")
> 
> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
> ---
>   lib/librte_eal/common/eal_common_dev.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
> index 9e4f09d..311eef5 100644
> --- a/lib/librte_eal/common/eal_common_dev.c
> +++ b/lib/librte_eal/common/eal_common_dev.c
> @@ -171,6 +171,7 @@ static int cmp_dev_name(const struct rte_device *dev, const void *_name)
>   	 * those devargs shouldn't be removed manually anymore.
>   	 */
>   
> +	dev->devargs = da;
>   	ret = dev->bus->plug(dev);
>   	if (ret > 0)
>   		ret = -ENOTSUP;
> 

Hello Somnath,

On a surface level, the fix does not seem correct.
The comment

    "Since there is a matching device, it is now its responsibility
     to manage the devargs we've just inserted. From this point,
     those devargs should'nt be removed manually anymore."

means that the err_devarg label is not correct, on further error in the
function, returning the error code and cleaning up the device is sufficient.

Setting the devargs for a device is the responsibility of the bus scan function.
In the PCI bus for example, this is done in pci_name_set(), called once a device name is fully qualified
after scanning the system and thus being able to match a devargs to the device name.

Can you please give more information about the device bus, and maybe trace the path taken
by the line "ret = da->bus->scan();" a few lines above your edit? If your dev->devargs is not set
afterward, it seems the bug would be there.

It could be the bus that is not properly implemented, the devargs name not fully qualified, a special path taken by a vdev (I'm not sure what the ovsbr0 device is for example), or a specific representor pluging function, difficult to pinpoint exactly right now.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [dpdk-dev] [PATCH] eal: fix to set the rte_device ptr's device args before hotplug
  2020-02-04  9:49 ` Gaetan Rivet
@ 2020-02-05  8:52   ` Somnath Kotur
  2020-02-05  9:17     ` Gaetan Rivet
  0 siblings, 1 reply; 11+ messages in thread
From: Somnath Kotur @ 2020-02-05  8:52 UTC (permalink / raw)
  To: Gaetan Rivet; +Cc: dev

Hello Gaetan,


On Tue, Feb 4, 2020 at 3:19 PM Gaetan Rivet <grive@u256.net> wrote:
>
> On 04/02/2020 10:15, Somnath Kotur wrote:
> > As per the comments in this code section, "since there is a matching device,
> > it is now its responsibility to manage the devargs we've just inserted."
> > But the matching device ptr's devargs is still uninitialized or not pointing
> > to the newest dev_args that were passed as a parameter to local_dev_probe().
> > This is needed particularly in the case when *probe is called again* on an
> > already probed device(the parent device for the representor) as part of adding
> > a representor port to an OVS switch(OVS-DPDK) like so:
> > ovs-vsctl add-port ovsbr0 vfrep1 -- set Interface vfrep1 type=dpdk \
> > options:dpdk-devargs=0000:06:02.0,representor=[1]
> >
> > Fixes: 7e8b26650146 ("eal: fix hotplug add / remove")
> >
> > Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
> > ---
> >   lib/librte_eal/common/eal_common_dev.c | 1 +
> >   1 file changed, 1 insertion(+)
> >
> > diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
> > index 9e4f09d..311eef5 100644
> > --- a/lib/librte_eal/common/eal_common_dev.c
> > +++ b/lib/librte_eal/common/eal_common_dev.c
> > @@ -171,6 +171,7 @@ static int cmp_dev_name(const struct rte_device *dev, const void *_name)
> >        * those devargs shouldn't be removed manually anymore.
> >        */
> >
> > +     dev->devargs = da;
> >       ret = dev->bus->plug(dev);
> >       if (ret > 0)
> >               ret = -ENOTSUP;
> >
>
> Hello Somnath,
>
> On a surface level, the fix does not seem correct.
> The comment
>
>     "Since there is a matching device, it is now its responsibility
>      to manage the devargs we've just inserted. From this point,
>      those devargs should'nt be removed manually anymore."
>
> means that the err_devarg label is not correct, on further error in the
> function, returning the error code and cleaning up the device is sufficient.
>
> Setting the devargs for a device is the responsibility of the bus scan function.
> In the PCI bus for example, this is done in pci_name_set(), called once a device name is fully qualified
> after scanning the system and thus being able to match a devargs to the device name.
>
> Can you please give more information about the device bus, and maybe trace the path taken
> by the line "ret = da->bus->scan();" a few lines above your edit? If your dev->devargs is not set
> afterward, it seems the bug would be there.
>
Sure, here is the stack trace from the pci_name_set()
pci_name_set (dev=0x570eab0) at
/root/dpdk-int_nxt/drivers/bus/pci/pci_common.c:65
65                              dev->name, sizeof(dev->name));
(gdb) p /x *dev
$1 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, device = {next =
{tqe_next = 0x0, tqe_prev = 0x0}, name = 0x0, driver = 0x0, bus =
0x3a59a00, numa_node = 0x0, devargs = 0x0}, addr = {domain = 0x0, bus
= 0x6, devid = 0x2, function = 0x1}, id = {class_id = 0x20000,
vendor_id = 0x14e4,
    device_id = 0x16dc, subsystem_vendor_id = 0x14e4,
subsystem_device_id = 0x16d7}.........
(gdb) bt
#0  pci_name_set (dev=0x570eab0) at
/root/dpdk19.11/drivers/bus/pci/pci_common.c:65
#1  0x0000000000457b69 in pci_scan_one (dirname=0x7ffdb46ef230
"/sys/bus/pci/devices/0000:06:02.1", addr=0x7ffdb46ef220) at
/root/dpdk-19.11/drivers/bus/pci/linux/pci.c:305
#2  0x0000000000458188 in rte_pci_scan () at
/root/dpdk-int_nxt/drivers/bus/pci/linux/pci.c:488
#3  0x00000000004b9530 in local_dev_probe (devargs=0x527cc10
"0000:06:02.0,representor=[1]", new_dev=0x7ffdb46f02b8) at
/root/dpdk-19.11/lib/librte_eal/common/eal_common_dev.c:158
#4  0x00000000004b9726 in rte_dev_probe (devargs=0x527cc10
"0000:06:02.0,representor=[1]") at
/root/dpdk-int_nxt/lib/librte_eal/common/eal_common_dev.c:227
#5  0x00000000033d196a in netdev_dpdk_process_devargs
(dev=0x2000e4800, devargs=0x527cc10 "0000:06:02.0,representor=[1]",
errp=0x7ffdb46f0478) at lib/netdev-dpdk.c:1798
#6  0x00000000033d1d94 in netdev_dpdk_set_config (netdev=0x2000e4880,
args=0x56edf10, errp=0x7ffdb46f0478) at lib/netdev-dpdk.c:1921
#7  0x00000000032c2e48 in netdev_set_config (netdev=0x2000e4880,
args=0x56edf10, errp=0x7ffdb46f0598) at lib/netdev.c:494
#8  0x00000000031e9a4e in iface_set_netdev_config
(iface_cfg=0x56edc90, netdev=0x2000e4880, errp=0x7ffdb46f0598) at
vswitchd/bridge.c:2015
#9  0x00000000031e9bd1 in iface_do_create (br=0x52801d0,
iface_cfg=0x56edc90, ofp_portp=0x7ffdb46f05a4, netdevp=0x7ffdb46f05a8,
errp=0x7ffdb46f0598) at vswitchd/bridge.c:2049
#10 0x00000000031e9da6 in iface_create (br=0x52801d0,
iface_cfg=0x56edc90, port_cfg=0x56bae30) at vswitchd/bridge.c:2100
#11 0x00000000031e7494 in bridge_add_ports__ (br=0x52801d0,
wanted_ports=0x52802b0, with_requested_port=false) at
vswitchd/bridge.c:1164
#12 0x00000000031e7525 in bridge_add_ports (br=0x52801d0,
wanted_ports=0x52802b0) at vswitchd/bridge.c:1180
#13 0x00000000031e6a4f in bridge_reconfigure (ovs_cfg=0x52b2c40) at
vswitchd/bridge.c:893
#14 0x00000000031ed37c in bridge_run () at vswitchd/bridge.c:3324
#15 0x00000000031f2977 in main (argc=11, argv=0x7ffdb46f0878) at
vswitchd/ovs-vswitchd.c:127

> It could be the bus that is not properly implemented, the devargs name not fully qualified, a special path taken by a vdev (I'm not sure what the ovsbr0 device is for example), or a specific representor pluging function, difficult to pinpoint exactly right now.
As you can see , the DBDF along with the 'representor=[1]' is passed
all the way down to local_dev_probe(). However when i traced further
inside pci_name_set() =>pci_devargs_lookup()

static struct rte_devargs *pci_devargs_lookup(struct rte_pci_device *dev)
{
        struct rte_devargs *devargs;
        struct rte_pci_addr addr;

        RTE_EAL_DEVARGS_FOREACH("pci", devargs) {
                devargs->bus->parse(devargs->name, &addr);
======================================> pci_parse() [1]

                if (!rte_pci_addr_cmp(&dev->addr, &addr))
                        return devargs;
        }
        return NULL;
}

[1]: This is the value of dev_args at this point in time -
p *devargs
$2 = {next = {tqe_next = 0x0, tqe_prev = 0x3a67970 <devargs_list>},
type = RTE_DEVTYPE_WHITELISTED_PCI, policy = RTE_DEV_WHITELISTED, name
= "0000:06:02.0", '\000' <repeats 51 times>, {args = 0x56eec00
"representor=[1]", drv_str = 0x56eec00 "representor=[1]"},
  bus = 0x3a59a00 <rte_pci_bus>, cls = 0x0, bus_str = 0x0, cls_str =
0x0, data = 0x0}

It has the 'representor' keyword still intact, but what gets passed to
pci_parse() is pci_parse (name=0x56d76b8 "0000:06:02.0",
addr=0x7ffdb46ed180) i.e just the original DBDF of the function
that was already probed once during OVS-DPDK init... I'm not sure if
that is the reason , but eventually it walks the entire for_each loop
and returns NULL ...i.e pci_devargs_lookup() returns NULL ... So this
is where the 'devargs' value passed is getting lost
Hope that gives what you were looking for ?


Thanks
Som

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [dpdk-dev] [PATCH] eal: fix to set the rte_device ptr's device args before hotplug
  2020-02-05  8:52   ` Somnath Kotur
@ 2020-02-05  9:17     ` Gaetan Rivet
  2020-02-06  4:06       ` Somnath Kotur
  0 siblings, 1 reply; 11+ messages in thread
From: Gaetan Rivet @ 2020-02-05  9:17 UTC (permalink / raw)
  To: Somnath Kotur; +Cc: dev

On 05/02/2020 09:52, Somnath Kotur wrote:
> Hello Gaetan,
> 
> 
> On Tue, Feb 4, 2020 at 3:19 PM Gaetan Rivet <grive@u256.net> wrote:
>>
>> On 04/02/2020 10:15, Somnath Kotur wrote:
>>> As per the comments in this code section, "since there is a matching device,
>>> it is now its responsibility to manage the devargs we've just inserted."
>>> But the matching device ptr's devargs is still uninitialized or not pointing
>>> to the newest dev_args that were passed as a parameter to local_dev_probe().
>>> This is needed particularly in the case when *probe is called again* on an
>>> already probed device(the parent device for the representor) as part of adding
>>> a representor port to an OVS switch(OVS-DPDK) like so:
>>> ovs-vsctl add-port ovsbr0 vfrep1 -- set Interface vfrep1 type=dpdk \
>>> options:dpdk-devargs=0000:06:02.0,representor=[1]
>>>
>>> Fixes: 7e8b26650146 ("eal: fix hotplug add / remove")
>>>
>>> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
>>> ---
>>>    lib/librte_eal/common/eal_common_dev.c | 1 +
>>>    1 file changed, 1 insertion(+)
>>>
>>> diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
>>> index 9e4f09d..311eef5 100644
>>> --- a/lib/librte_eal/common/eal_common_dev.c
>>> +++ b/lib/librte_eal/common/eal_common_dev.c
>>> @@ -171,6 +171,7 @@ static int cmp_dev_name(const struct rte_device *dev, const void *_name)
>>>         * those devargs shouldn't be removed manually anymore.
>>>         */
>>>
>>> +     dev->devargs = da;
>>>        ret = dev->bus->plug(dev);
>>>        if (ret > 0)
>>>                ret = -ENOTSUP;
>>>
>>
>> Hello Somnath,
>>
>> On a surface level, the fix does not seem correct.
>> The comment
>>
>>      "Since there is a matching device, it is now its responsibility
>>       to manage the devargs we've just inserted. From this point,
>>       those devargs should'nt be removed manually anymore."
>>
>> means that the err_devarg label is not correct, on further error in the
>> function, returning the error code and cleaning up the device is sufficient.
>>
>> Setting the devargs for a device is the responsibility of the bus scan function.
>> In the PCI bus for example, this is done in pci_name_set(), called once a device name is fully qualified
>> after scanning the system and thus being able to match a devargs to the device name.
>>
>> Can you please give more information about the device bus, and maybe trace the path taken
>> by the line "ret = da->bus->scan();" a few lines above your edit? If your dev->devargs is not set
>> afterward, it seems the bug would be there.
>>
> Sure, here is the stack trace from the pci_name_set()
> pci_name_set (dev=0x570eab0) at
> /root/dpdk-int_nxt/drivers/bus/pci/pci_common.c:65
> 65                              dev->name, sizeof(dev->name));
> (gdb) p /x *dev
> $1 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, device = {next =
> {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x0, driver = 0x0, bus =
> 0x3a59a00, numa_node = 0x0, devargs = 0x0}, addr = {domain = 0x0, bus
> = 0x6, devid = 0x2, function = 0x1}, id = {class_id = 0x20000,
> vendor_id = 0x14e4,
>      device_id = 0x16dc, subsystem_vendor_id = 0x14e4,
> subsystem_device_id = 0x16d7}.........
> (gdb) bt
> #0  pci_name_set (dev=0x570eab0) at
> /root/dpdk19.11/drivers/bus/pci/pci_common.c:65
> #1  0x0000000000457b69 in pci_scan_one (dirname=0x7ffdb46ef230
> "/sys/bus/pci/devices/0000:06:02.1", addr=0x7ffdb46ef220) at
> /root/dpdk-19.11/drivers/bus/pci/linux/pci.c:305
> #2  0x0000000000458188 in rte_pci_scan () at
> /root/dpdk-int_nxt/drivers/bus/pci/linux/pci.c:488
> #3  0x00000000004b9530 in local_dev_probe (devargs=0x527cc10
> "0000:06:02.0,representor=[1]", new_dev=0x7ffdb46f02b8) at
> /root/dpdk-19.11/lib/librte_eal/common/eal_common_dev.c:158
> #4  0x00000000004b9726 in rte_dev_probe (devargs=0x527cc10
> "0000:06:02.0,representor=[1]") at
> /root/dpdk-int_nxt/lib/librte_eal/common/eal_common_dev.c:227
> #5  0x00000000033d196a in netdev_dpdk_process_devargs
> (dev=0x2000e4800, devargs=0x527cc10 "0000:06:02.0,representor=[1]",
> errp=0x7ffdb46f0478) at lib/netdev-dpdk.c:1798
> #6  0x00000000033d1d94 in netdev_dpdk_set_config (netdev=0x2000e4880,
> args=0x56edf10, errp=0x7ffdb46f0478) at lib/netdev-dpdk.c:1921
> #7  0x00000000032c2e48 in netdev_set_config (netdev=0x2000e4880,
> args=0x56edf10, errp=0x7ffdb46f0598) at lib/netdev.c:494
> #8  0x00000000031e9a4e in iface_set_netdev_config
> (iface_cfg=0x56edc90, netdev=0x2000e4880, errp=0x7ffdb46f0598) at
> vswitchd/bridge.c:2015
> #9  0x00000000031e9bd1 in iface_do_create (br=0x52801d0,
> iface_cfg=0x56edc90, ofp_portp=0x7ffdb46f05a4, netdevp=0x7ffdb46f05a8,
> errp=0x7ffdb46f0598) at vswitchd/bridge.c:2049
> #10 0x00000000031e9da6 in iface_create (br=0x52801d0,
> iface_cfg=0x56edc90, port_cfg=0x56bae30) at vswitchd/bridge.c:2100
> #11 0x00000000031e7494 in bridge_add_ports__ (br=0x52801d0,
> wanted_ports=0x52802b0, with_requested_port=false) at
> vswitchd/bridge.c:1164
> #12 0x00000000031e7525 in bridge_add_ports (br=0x52801d0,
> wanted_ports=0x52802b0) at vswitchd/bridge.c:1180
> #13 0x00000000031e6a4f in bridge_reconfigure (ovs_cfg=0x52b2c40) at
> vswitchd/bridge.c:893
> #14 0x00000000031ed37c in bridge_run () at vswitchd/bridge.c:3324
> #15 0x00000000031f2977 in main (argc=11, argv=0x7ffdb46f0878) at
> vswitchd/ovs-vswitchd.c:127
> 
>> It could be the bus that is not properly implemented, the devargs name not fully qualified, a special path taken by a vdev (I'm not sure what the ovsbr0 device is for example), or a specific representor pluging function, difficult to pinpoint exactly right now.
> As you can see , the DBDF along with the 'representor=[1]' is passed
> all the way down to local_dev_probe(). However when i traced further
> inside pci_name_set() =>pci_devargs_lookup()
> 
> static struct rte_devargs *pci_devargs_lookup(struct rte_pci_device *dev)
> {
>          struct rte_devargs *devargs;
>          struct rte_pci_addr addr;
> 
>          RTE_EAL_DEVARGS_FOREACH("pci", devargs) {
>                  devargs->bus->parse(devargs->name, &addr);
> ======================================> pci_parse() [1]
> 
>                  if (!rte_pci_addr_cmp(&dev->addr, &addr))
>                          return devargs;
>          }
>          return NULL;
> }
> 
> [1]: This is the value of dev_args at this point in time -
> p *devargs
> $2 = {next = {tqe_next = 0x0, tqe_prev = 0x3a67970 <devargs_list>},
> type = RTE_DEVTYPE_WHITELISTED_PCI, policy = RTE_DEV_WHITELISTED, name
> = "0000:06:02.0", '\000' <repeats 51 times>, {args = 0x56eec00
> "representor=[1]", drv_str = 0x56eec00 "representor=[1]"},
>    bus = 0x3a59a00 <rte_pci_bus>, cls = 0x0, bus_str = 0x0, cls_str =
> 0x0, data = 0x0}
> 
> It has the 'representor' keyword still intact, but what gets passed to
> pci_parse() is pci_parse (name=0x56d76b8 "0000:06:02.0",
> addr=0x7ffdb46ed180) i.e just the original DBDF of the function
> that was already probed once during OVS-DPDK init... I'm not sure if
> that is the reason , but eventually it walks the entire for_each loop
> and returns NULL ...i.e pci_devargs_lookup() returns NULL ... So this
> is where the 'devargs' value passed is getting lost
> Hope that gives what you were looking for ?
> 
> 
> Thanks
> Som
> 

Hello,

Yes, thank you for the thorough response.
Your second device has the BDF 06:02.01 while its parent has 06:02.00. The parent device already matched the devargs and its rte_device should already point to it.

I think it is not correct to force link a parent device devargs to virtual functions spawned from it.
The devargs is NULL, because there are none that were used to spawn the virtual functions.

Now, can you describe your overarching issue more precisely? You are able to call rte_dev_probe twice on the same virtual function, because the check against it does not work without the devargs being linked? If that's correct, this is the real bug IMO, and there could be other ways to mitigate this.

Thanks
Gaetan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [dpdk-dev] [PATCH] eal: fix to set the rte_device ptr's device args before hotplug
  2020-02-05  9:17     ` Gaetan Rivet
@ 2020-02-06  4:06       ` Somnath Kotur
  2020-02-07  5:52         ` Somnath Kotur
  0 siblings, 1 reply; 11+ messages in thread
From: Somnath Kotur @ 2020-02-06  4:06 UTC (permalink / raw)
  To: Gaetan Rivet; +Cc: dev

Hello Gaetan,
            Sorry my bad, i think i captured it for the wrong BDF ...
So the overarching goal here is that i want to create/add a
representor port for 06:02.01 using 06:02:00 as the backing/parent
device on the vswitch.

I recaptured the debug with 06:02:00

Breakpoint 1, pci_name_set (dev=0x5d44e00) at
/root/dpdk-19.11/drivers/bus/pci/pci_common.c:65
65                              dev->name, sizeof(dev->name));
(gdb) p /x *dev
$3 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, device = {next =
{tqe_next = 0x0, tqe_prev = 0x0}, name = 0x0, driver = 0x0, bus =
0x3a59a00, numa_node = 0x0, devargs = 0x0}, addr = {domain = 0x0, bus
= 0x6, devid = 0x2, function = 0x0}, id = {class_id = 0x20000,
vendor_id = 0x14e4,
    device_id = 0x16dc, subsystem_vendor_id = 0x14e4,
subsystem_device_id = 0x16d7}

devargs = pci_devargs_lookup(dev);

 p *devargs
$5 = {next = {tqe_next = 0x0, tqe_prev = 0x3a67970 <devargs_list>},
type = RTE_DEVTYPE_WHITELISTED_PCI, policy = RTE_DEV_WHITELISTED, name
= "0000:06:02.0", '\000' <repeats 51 times>, {args = 0x5d1c8a0
"representor=[1]", drv_str = 0x5d1c8a0 "representor=[1]"},
  bus = 0x3a59a00 <rte_pci_bus>, cls = 0x0, bus_str = 0x0, cls_str =
0x0, data = 0x0}

So as you can see here , this time the devargs is *not NULL* and has
the value of 'representor=[1]'

local_dev_probe (devargs=0x5cefec0 "0000:06:02.0,representor=[1]",
new_dev=0x7ffd3e872758) at
/root/dpdk-19.11/lib/librte_eal/common/eal_common_dev.c:162
162             dev = da->bus->find_device(NULL, cmp_dev_name, da->name);
(gdb) p *da
$7 = {next = {tqe_next = 0x0, tqe_prev = 0x3a67970 <devargs_list>},
type = RTE_DEVTYPE_WHITELISTED_PCI, policy = RTE_DEV_WHITELISTED, name
= "0000:06:02.0", '\000' <repeats 51 times>, {args = 0x5d1c8a0
"representor=[1]", drv_str = 0x5d1c8a0 "representor=[1]"},
  bus = 0x3a59a00 <rte_pci_bus>, cls = 0x0, bus_str = 0x0, cls_str =
0x0, data = 0x0}

(gdb) p *dev
$8 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x5c6df00
"0000:06:02.0", driver = 0x3a6b7f0 <bnxt_rte_pmd+16>, bus = 0x3a59a00
<rte_pci_bus>, numa_node = 0, devargs = 0x0}

So it appears that 'find_device' is returning NULL devargs ?

I found that indeed is the case in pci_find_device()

pci_find_device (start=0x0, cmp=0x4b92d8 <cmp_dev_name>,
data=0x5bb53f8) at /root/dpdk-19.11/drivers/bus/pci/pci_common.c:426
426                             return &pdev->device;

gdb) p pdev->device
$2 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x5b14f00
"0000:06:02.0", driver = 0x3a6b7f0 <bnxt_rte_pmd+16>, bus = 0x3a59a00
<rte_pci_bus>, numa_node = 0, devargs = 0x0}

Any further ideas ?

Thanks
Som

On Wed, Feb 5, 2020 at 2:47 PM Gaetan Rivet <grive@u256.net> wrote:
>
> On 05/02/2020 09:52, Somnath Kotur wrote:
> > Hello Gaetan,
> >
> >
> > On Tue, Feb 4, 2020 at 3:19 PM Gaetan Rivet <grive@u256.net> wrote:
> >>
> >> On 04/02/2020 10:15, Somnath Kotur wrote:
> >>> As per the comments in this code section, "since there is a matching device,
> >>> it is now its responsibility to manage the devargs we've just inserted."
> >>> But the matching device ptr's devargs is still uninitialized or not pointing
> >>> to the newest dev_args that were passed as a parameter to local_dev_probe().
> >>> This is needed particularly in the case when *probe is called again* on an
> >>> already probed device(the parent device for the representor) as part of adding
> >>> a representor port to an OVS switch(OVS-DPDK) like so:
> >>> ovs-vsctl add-port ovsbr0 vfrep1 -- set Interface vfrep1 type=dpdk \
> >>> options:dpdk-devargs=0000:06:02.0,representor=[1]
> >>>
> >>> Fixes: 7e8b26650146 ("eal: fix hotplug add / remove")
> >>>
> >>> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
> >>> ---
> >>>    lib/librte_eal/common/eal_common_dev.c | 1 +
> >>>    1 file changed, 1 insertion(+)
> >>>
> >>> diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
> >>> index 9e4f09d..311eef5 100644
> >>> --- a/lib/librte_eal/common/eal_common_dev.c
> >>> +++ b/lib/librte_eal/common/eal_common_dev.c
> >>> @@ -171,6 +171,7 @@ static int cmp_dev_name(const struct rte_device *dev, const void *_name)
> >>>         * those devargs shouldn't be removed manually anymore.
> >>>         */
> >>>
> >>> +     dev->devargs = da;
> >>>        ret = dev->bus->plug(dev);
> >>>        if (ret > 0)
> >>>                ret = -ENOTSUP;
> >>>
> >>
> >> Hello Somnath,
> >>
> >> On a surface level, the fix does not seem correct.
> >> The comment
> >>
> >>      "Since there is a matching device, it is now its responsibility
> >>       to manage the devargs we've just inserted. From this point,
> >>       those devargs should'nt be removed manually anymore."
> >>
> >> means that the err_devarg label is not correct, on further error in the
> >> function, returning the error code and cleaning up the device is sufficient.
> >>
> >> Setting the devargs for a device is the responsibility of the bus scan function.
> >> In the PCI bus for example, this is done in pci_name_set(), called once a device name is fully qualified
> >> after scanning the system and thus being able to match a devargs to the device name.
> >>
> >> Can you please give more information about the device bus, and maybe trace the path taken
> >> by the line "ret = da->bus->scan();" a few lines above your edit? If your dev->devargs is not set
> >> afterward, it seems the bug would be there.
> >>
> > Sure, here is the stack trace from the pci_name_set()
> > pci_name_set (dev=0x570eab0) at
> > /root/dpdk-int_nxt/drivers/bus/pci/pci_common.c:65
> > 65                              dev->name, sizeof(dev->name));
> > (gdb) p /x *dev
> > $1 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, device = {next =
> > {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x0, driver = 0x0, bus =
> > 0x3a59a00, numa_node = 0x0, devargs = 0x0}, addr = {domain = 0x0, bus
> > = 0x6, devid = 0x2, function = 0x1}, id = {class_id = 0x20000,
> > vendor_id = 0x14e4,
> >      device_id = 0x16dc, subsystem_vendor_id = 0x14e4,
> > subsystem_device_id = 0x16d7}.........
> > (gdb) bt
> > #0  pci_name_set (dev=0x570eab0) at
> > /root/dpdk19.11/drivers/bus/pci/pci_common.c:65
> > #1  0x0000000000457b69 in pci_scan_one (dirname=0x7ffdb46ef230
> > "/sys/bus/pci/devices/0000:06:02.1", addr=0x7ffdb46ef220) at
> > /root/dpdk-19.11/drivers/bus/pci/linux/pci.c:305
> > #2  0x0000000000458188 in rte_pci_scan () at
> > /root/dpdk-int_nxt/drivers/bus/pci/linux/pci.c:488
> > #3  0x00000000004b9530 in local_dev_probe (devargs=0x527cc10
> > "0000:06:02.0,representor=[1]", new_dev=0x7ffdb46f02b8) at
> > /root/dpdk-19.11/lib/librte_eal/common/eal_common_dev.c:158
> > #4  0x00000000004b9726 in rte_dev_probe (devargs=0x527cc10
> > "0000:06:02.0,representor=[1]") at
> > /root/dpdk-int_nxt/lib/librte_eal/common/eal_common_dev.c:227
> > #5  0x00000000033d196a in netdev_dpdk_process_devargs
> > (dev=0x2000e4800, devargs=0x527cc10 "0000:06:02.0,representor=[1]",
> > errp=0x7ffdb46f0478) at lib/netdev-dpdk.c:1798
> > #6  0x00000000033d1d94 in netdev_dpdk_set_config (netdev=0x2000e4880,
> > args=0x56edf10, errp=0x7ffdb46f0478) at lib/netdev-dpdk.c:1921
> > #7  0x00000000032c2e48 in netdev_set_config (netdev=0x2000e4880,
> > args=0x56edf10, errp=0x7ffdb46f0598) at lib/netdev.c:494
> > #8  0x00000000031e9a4e in iface_set_netdev_config
> > (iface_cfg=0x56edc90, netdev=0x2000e4880, errp=0x7ffdb46f0598) at
> > vswitchd/bridge.c:2015
> > #9  0x00000000031e9bd1 in iface_do_create (br=0x52801d0,
> > iface_cfg=0x56edc90, ofp_portp=0x7ffdb46f05a4, netdevp=0x7ffdb46f05a8,
> > errp=0x7ffdb46f0598) at vswitchd/bridge.c:2049
> > #10 0x00000000031e9da6 in iface_create (br=0x52801d0,
> > iface_cfg=0x56edc90, port_cfg=0x56bae30) at vswitchd/bridge.c:2100
> > #11 0x00000000031e7494 in bridge_add_ports__ (br=0x52801d0,
> > wanted_ports=0x52802b0, with_requested_port=false) at
> > vswitchd/bridge.c:1164
> > #12 0x00000000031e7525 in bridge_add_ports (br=0x52801d0,
> > wanted_ports=0x52802b0) at vswitchd/bridge.c:1180
> > #13 0x00000000031e6a4f in bridge_reconfigure (ovs_cfg=0x52b2c40) at
> > vswitchd/bridge.c:893
> > #14 0x00000000031ed37c in bridge_run () at vswitchd/bridge.c:3324
> > #15 0x00000000031f2977 in main (argc=11, argv=0x7ffdb46f0878) at
> > vswitchd/ovs-vswitchd.c:127
> >
> >> It could be the bus that is not properly implemented, the devargs name not fully qualified, a special path taken by a vdev (I'm not sure what the ovsbr0 device is for example), or a specific representor pluging function, difficult to pinpoint exactly right now.
> > As you can see , the DBDF along with the 'representor=[1]' is passed
> > all the way down to local_dev_probe(). However when i traced further
> > inside pci_name_set() =>pci_devargs_lookup()
> >
> > static struct rte_devargs *pci_devargs_lookup(struct rte_pci_device *dev)
> > {
> >          struct rte_devargs *devargs;
> >          struct rte_pci_addr addr;
> >
> >          RTE_EAL_DEVARGS_FOREACH("pci", devargs) {
> >                  devargs->bus->parse(devargs->name, &addr);
> > ======================================> pci_parse() [1]
> >
> >                  if (!rte_pci_addr_cmp(&dev->addr, &addr))
> >                          return devargs;
> >          }
> >          return NULL;
> > }
> >
> > [1]: This is the value of dev_args at this point in time -
> > p *devargs
> > $2 = {next = {tqe_next = 0x0, tqe_prev = 0x3a67970 <devargs_list>},
> > type = RTE_DEVTYPE_WHITELISTED_PCI, policy = RTE_DEV_WHITELISTED, name
> > = "0000:06:02.0", '\000' <repeats 51 times>, {args = 0x56eec00
> > "representor=[1]", drv_str = 0x56eec00 "representor=[1]"},
> >    bus = 0x3a59a00 <rte_pci_bus>, cls = 0x0, bus_str = 0x0, cls_str =
> > 0x0, data = 0x0}
> >
> > It has the 'representor' keyword still intact, but what gets passed to
> > pci_parse() is pci_parse (name=0x56d76b8 "0000:06:02.0",
> > addr=0x7ffdb46ed180) i.e just the original DBDF of the function
> > that was already probed once during OVS-DPDK init... I'm not sure if
> > that is the reason , but eventually it walks the entire for_each loop
> > and returns NULL ...i.e pci_devargs_lookup() returns NULL ... So this
> > is where the 'devargs' value passed is getting lost
> > Hope that gives what you were looking for ?
> >
> >
> > Thanks
> > Som
> >
>
> Hello,
>
> Yes, thank you for the thorough response.
> Your second device has the BDF 06:02.01 while its parent has 06:02.00. The parent device already matched the devargs and its rte_device should already point to it.
>
> I think it is not correct to force link a parent device devargs to virtual functions spawned from it.
> The devargs is NULL, because there are none that were used to spawn the virtual functions.
>
> Now, can you describe your overarching issue more precisely? You are able to call rte_dev_probe twice on the same virtual function, because the check against it does not work without the devargs being linked? If that's correct, this is the real bug IMO, and there could be other ways to mitigate this.
>
> Thanks
> Gaetan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [dpdk-dev] [PATCH] eal: fix to set the rte_device ptr's device args before hotplug
  2020-02-06  4:06       ` Somnath Kotur
@ 2020-02-07  5:52         ` Somnath Kotur
  2020-02-10 14:34           ` Gaetan Rivet
  0 siblings, 1 reply; 11+ messages in thread
From: Somnath Kotur @ 2020-02-07  5:52 UTC (permalink / raw)
  To: Gaetan Rivet; +Cc: dev

HI Gaetan,


On Thu, Feb 6, 2020 at 9:36 AM Somnath Kotur <somnath.kotur@broadcom.com> wrote:
>
> Hello Gaetan,
>             Sorry my bad, i think i captured it for the wrong BDF ...
> So the overarching goal here is that i want to create/add a
> representor port for 06:02.01 using 06:02:00 as the backing/parent
> device on the vswitch.
>
> I recaptured the debug with 06:02:00
>
> Breakpoint 1, pci_name_set (dev=0x5d44e00) at
> /root/dpdk-19.11/drivers/bus/pci/pci_common.c:65
> 65                              dev->name, sizeof(dev->name));
> (gdb) p /x *dev
> $3 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, device = {next =
> {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x0, driver = 0x0, bus =
> 0x3a59a00, numa_node = 0x0, devargs = 0x0}, addr = {domain = 0x0, bus
> = 0x6, devid = 0x2, function = 0x0}, id = {class_id = 0x20000,
> vendor_id = 0x14e4,
>     device_id = 0x16dc, subsystem_vendor_id = 0x14e4,
> subsystem_device_id = 0x16d7}
>
> devargs = pci_devargs_lookup(dev);
>
>  p *devargs
> $5 = {next = {tqe_next = 0x0, tqe_prev = 0x3a67970 <devargs_list>},
> type = RTE_DEVTYPE_WHITELISTED_PCI, policy = RTE_DEV_WHITELISTED, name
> = "0000:06:02.0", '\000' <repeats 51 times>, {args = 0x5d1c8a0
> "representor=[1]", drv_str = 0x5d1c8a0 "representor=[1]"},
>   bus = 0x3a59a00 <rte_pci_bus>, cls = 0x0, bus_str = 0x0, cls_str =
> 0x0, data = 0x0}
>
> So as you can see here , this time the devargs is *not NULL* and has
> the value of 'representor=[1]'
>
> local_dev_probe (devargs=0x5cefec0 "0000:06:02.0,representor=[1]",
> new_dev=0x7ffd3e872758) at
> /root/dpdk-19.11/lib/librte_eal/common/eal_common_dev.c:162
> 162             dev = da->bus->find_device(NULL, cmp_dev_name, da->name);
> (gdb) p *da
> $7 = {next = {tqe_next = 0x0, tqe_prev = 0x3a67970 <devargs_list>},
> type = RTE_DEVTYPE_WHITELISTED_PCI, policy = RTE_DEV_WHITELISTED, name
> = "0000:06:02.0", '\000' <repeats 51 times>, {args = 0x5d1c8a0
> "representor=[1]", drv_str = 0x5d1c8a0 "representor=[1]"},
>   bus = 0x3a59a00 <rte_pci_bus>, cls = 0x0, bus_str = 0x0, cls_str =
> 0x0, data = 0x0}
>
> (gdb) p *dev
> $8 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x5c6df00
> "0000:06:02.0", driver = 0x3a6b7f0 <bnxt_rte_pmd+16>, bus = 0x3a59a00
> <rte_pci_bus>, numa_node = 0, devargs = 0x0}
>
> So it appears that 'find_device' is returning NULL devargs ?
>
> I found that indeed is the case in pci_find_device()
>
> pci_find_device (start=0x0, cmp=0x4b92d8 <cmp_dev_name>,
> data=0x5bb53f8) at /root/dpdk-19.11/drivers/bus/pci/pci_common.c:426
> 426                             return &pdev->device;
>
> gdb) p pdev->device
> $2 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x5b14f00
> "0000:06:02.0", driver = 0x3a6b7f0 <bnxt_rte_pmd+16>, bus = 0x3a59a00
> <rte_pci_bus>, numa_node = 0, devargs = 0x0}
>
Any updates on this ?
My thoughts on this are just as I'd suspected / suggested earlier the
parent device (06:02:0) gets probed once
during ovs-vswitchd bring up right after these 2 cmds

ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
# Now start ovs-vswitchd (--no-ovsdb-server, since it is already started)
ovs-ctl --no-ovsdb-server --db-sock="$DB_SOCK" start

After this point the pci probe was invoked on the parent device, but
without any devargs hence it is NULL.

Now when the 'ovs add-port' cmd is issued to add a representor port
for example say 06:02:01 (using 06:02:00 as parent device)
ovs-vsctl add-port ovsbr0 vfrep1 -- set Interface vfrep1 type=dpdk
options:dpdk-devargs=0000:06:02.0,representor=[1]

What happens is

*before* pci_name_set()
p dev
$1 = (struct rte_pci_device *) 0x55bd6b0
(gdb) p /x *dev
$2 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, device = {next =
{tqe_next = 0x0, tqe_prev = 0x0}, name = 0x0, driver = 0x0, bus =
0x3a59a00, numa_node = 0x0,
    devargs = 0x0}, addr = {domain = 0x0, bus = 0x6, devid = 0x2,
function = 0x0}.......}

*after* pci_name_set()

p /x *dev
$3 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, device = {next =
{tqe_next = 0x0, tqe_prev = 0x0}, name = 0x0, driver = 0x0, bus =
0x3a59a00, numa_node = 0x0,
    devargs = 0x55a4ae0},addr = {domain = 0x0, bus = 0x6, devid = 0x2,
function = 0x0}......}

As you can see, 'devargs' for the pci_device is now populated ..

But if you go further down in pci_scan_one()

pci_scan_one (dirname=0x7ffea330a5e0
"/sys/bus/pci/devices/0000:06:02.0", addr=0x7ffea330a5d0)
    at /root/dpdk-int_nxt/drivers/bus/pci/linux/pci.c:351
351                                     if (!rte_dev_is_probed(&dev2->device)) {
(gdb) p dev2
$5 = (struct rte_pci_device *) 0x54de5e0
(gdb) p /x *dev2
$6 = {next = {tqe_next = 0x5307460, tqe_prev = 0x5539f80}, device =
{next = {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x54e4f00, driver =
0x3a6b7f0,
    bus = 0x3a59a00, numa_node = 0x0, devargs = 0x0}, addr = {domain =
0x0, bus = 0x6, devid = 0x2, function = 0x0}

we hit the pci_device ( 0x54de5e0) that was created during the first
time probe of the parent device ?
Since it is already probed, it goes to line 381 where it does frees
the just allocated 'pci_device'  inside pci_scan_one() via 'free(dev)'

As you can see this pci_device does not have it's devargs set (rightly
so as initially there were no devargs for this device)

But as shown in the stack trace in my previous reply, when
pci_find_device() walks the rte_pci_devices list , it finds this
earlier probed device (without devargs)

pci_find_device (start=0x0, cmp=0x4b92d8 <cmp_dev_name>,
data=0x55a4af8) at /root/dpdk-19.11/drivers/bus/pci/pci_common.c:426
426                             return &pdev->device;
(gdb) p pdev
$15 = (struct rte_pci_device *) 0x54de5e0

(gdb) p /x *pdev
$16 = {next = {tqe_next = 0x5307460, tqe_prev = 0x5539f80}, device =
{next = {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x54e4f00, driver =
0x3a6b7f0,
    bus = 0x3a59a00, numa_node = 0x0, devargs = 0x0}, addr = {domain =
0x0, bus = 0x6, devid = 0x2, function = 0x0},

Hope this explains why the pci_device has NULL devargs at this point
and how my fix to set it at this point helps solve the problem?

Please let me know your thoughts

Thanks
Som




> Som
>
> On Wed, Feb 5, 2020 at 2:47 PM Gaetan Rivet <grive@u256.net> wrote:
> >
> > On 05/02/2020 09:52, Somnath Kotur wrote:
> > > Hello Gaetan,
> > >
> > >
> > > On Tue, Feb 4, 2020 at 3:19 PM Gaetan Rivet <grive@u256.net> wrote:
> > >>
> > >> On 04/02/2020 10:15, Somnath Kotur wrote:
> > >>> As per the comments in this code section, "since there is a matching device,
> > >>> it is now its responsibility to manage the devargs we've just inserted."
> > >>> But the matching device ptr's devargs is still uninitialized or not pointing
> > >>> to the newest dev_args that were passed as a parameter to local_dev_probe().
> > >>> This is needed particularly in the case when *probe is called again* on an
> > >>> already probed device(the parent device for the representor) as part of adding
> > >>> a representor port to an OVS switch(OVS-DPDK) like so:
> > >>> ovs-vsctl add-port ovsbr0 vfrep1 -- set Interface vfrep1 type=dpdk \
> > >>> options:dpdk-devargs=0000:06:02.0,representor=[1]
> > >>>
> > >>> Fixes: 7e8b26650146 ("eal: fix hotplug add / remove")
> > >>>
> > >>> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
> > >>> ---
> > >>>    lib/librte_eal/common/eal_common_dev.c | 1 +
> > >>>    1 file changed, 1 insertion(+)
> > >>>
> > >>> diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
> > >>> index 9e4f09d..311eef5 100644
> > >>> --- a/lib/librte_eal/common/eal_common_dev.c
> > >>> +++ b/lib/librte_eal/common/eal_common_dev.c
> > >>> @@ -171,6 +171,7 @@ static int cmp_dev_name(const struct rte_device *dev, const void *_name)
> > >>>         * those devargs shouldn't be removed manually anymore.
> > >>>         */
> > >>>
> > >>> +     dev->devargs = da;
> > >>>        ret = dev->bus->plug(dev);
> > >>>        if (ret > 0)
> > >>>                ret = -ENOTSUP;
> > >>>
> > >>
> > >> Hello Somnath,
> > >>
> > >> On a surface level, the fix does not seem correct.
> > >> The comment
> > >>
> > >>      "Since there is a matching device, it is now its responsibility
> > >>       to manage the devargs we've just inserted. From this point,
> > >>       those devargs should'nt be removed manually anymore."
> > >>
> > >> means that the err_devarg label is not correct, on further error in the
> > >> function, returning the error code and cleaning up the device is sufficient.
> > >>
> > >> Setting the devargs for a device is the responsibility of the bus scan function.
> > >> In the PCI bus for example, this is done in pci_name_set(), called once a device name is fully qualified
> > >> after scanning the system and thus being able to match a devargs to the device name.
> > >>
> > >> Can you please give more information about the device bus, and maybe trace the path taken
> > >> by the line "ret = da->bus->scan();" a few lines above your edit? If your dev->devargs is not set
> > >> afterward, it seems the bug would be there.
> > >>
> > > Sure, here is the stack trace from the pci_name_set()
> > > pci_name_set (dev=0x570eab0) at
> > > /root/dpdk-int_nxt/drivers/bus/pci/pci_common.c:65
> > > 65                              dev->name, sizeof(dev->name));
> > > (gdb) p /x *dev
> > > $1 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, device = {next =
> > > {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x0, driver = 0x0, bus =
> > > 0x3a59a00, numa_node = 0x0, devargs = 0x0}, addr = {domain = 0x0, bus
> > > = 0x6, devid = 0x2, function = 0x1}, id = {class_id = 0x20000,
> > > vendor_id = 0x14e4,
> > >      device_id = 0x16dc, subsystem_vendor_id = 0x14e4,
> > > subsystem_device_id = 0x16d7}.........
> > > (gdb) bt
> > > #0  pci_name_set (dev=0x570eab0) at
> > > /root/dpdk19.11/drivers/bus/pci/pci_common.c:65
> > > #1  0x0000000000457b69 in pci_scan_one (dirname=0x7ffdb46ef230
> > > "/sys/bus/pci/devices/0000:06:02.1", addr=0x7ffdb46ef220) at
> > > /root/dpdk-19.11/drivers/bus/pci/linux/pci.c:305
> > > #2  0x0000000000458188 in rte_pci_scan () at
> > > /root/dpdk-int_nxt/drivers/bus/pci/linux/pci.c:488
> > > #3  0x00000000004b9530 in local_dev_probe (devargs=0x527cc10
> > > "0000:06:02.0,representor=[1]", new_dev=0x7ffdb46f02b8) at
> > > /root/dpdk-19.11/lib/librte_eal/common/eal_common_dev.c:158
> > > #4  0x00000000004b9726 in rte_dev_probe (devargs=0x527cc10
> > > "0000:06:02.0,representor=[1]") at
> > > /root/dpdk-int_nxt/lib/librte_eal/common/eal_common_dev.c:227
> > > #5  0x00000000033d196a in netdev_dpdk_process_devargs
> > > (dev=0x2000e4800, devargs=0x527cc10 "0000:06:02.0,representor=[1]",
> > > errp=0x7ffdb46f0478) at lib/netdev-dpdk.c:1798
> > > #6  0x00000000033d1d94 in netdev_dpdk_set_config (netdev=0x2000e4880,
> > > args=0x56edf10, errp=0x7ffdb46f0478) at lib/netdev-dpdk.c:1921
> > > #7  0x00000000032c2e48 in netdev_set_config (netdev=0x2000e4880,
> > > args=0x56edf10, errp=0x7ffdb46f0598) at lib/netdev.c:494
> > > #8  0x00000000031e9a4e in iface_set_netdev_config
> > > (iface_cfg=0x56edc90, netdev=0x2000e4880, errp=0x7ffdb46f0598) at
> > > vswitchd/bridge.c:2015
> > > #9  0x00000000031e9bd1 in iface_do_create (br=0x52801d0,
> > > iface_cfg=0x56edc90, ofp_portp=0x7ffdb46f05a4, netdevp=0x7ffdb46f05a8,
> > > errp=0x7ffdb46f0598) at vswitchd/bridge.c:2049
> > > #10 0x00000000031e9da6 in iface_create (br=0x52801d0,
> > > iface_cfg=0x56edc90, port_cfg=0x56bae30) at vswitchd/bridge.c:2100
> > > #11 0x00000000031e7494 in bridge_add_ports__ (br=0x52801d0,
> > > wanted_ports=0x52802b0, with_requested_port=false) at
> > > vswitchd/bridge.c:1164
> > > #12 0x00000000031e7525 in bridge_add_ports (br=0x52801d0,
> > > wanted_ports=0x52802b0) at vswitchd/bridge.c:1180
> > > #13 0x00000000031e6a4f in bridge_reconfigure (ovs_cfg=0x52b2c40) at
> > > vswitchd/bridge.c:893
> > > #14 0x00000000031ed37c in bridge_run () at vswitchd/bridge.c:3324
> > > #15 0x00000000031f2977 in main (argc=11, argv=0x7ffdb46f0878) at
> > > vswitchd/ovs-vswitchd.c:127
> > >
> > >> It could be the bus that is not properly implemented, the devargs name not fully qualified, a special path taken by a vdev (I'm not sure what the ovsbr0 device is for example), or a specific representor pluging function, difficult to pinpoint exactly right now.
> > > As you can see , the DBDF along with the 'representor=[1]' is passed
> > > all the way down to local_dev_probe(). However when i traced further
> > > inside pci_name_set() =>pci_devargs_lookup()
> > >
> > > static struct rte_devargs *pci_devargs_lookup(struct rte_pci_device *dev)
> > > {
> > >          struct rte_devargs *devargs;
> > >          struct rte_pci_addr addr;
> > >
> > >          RTE_EAL_DEVARGS_FOREACH("pci", devargs) {
> > >                  devargs->bus->parse(devargs->name, &addr);
> > > ======================================> pci_parse() [1]
> > >
> > >                  if (!rte_pci_addr_cmp(&dev->addr, &addr))
> > >                          return devargs;
> > >          }
> > >          return NULL;
> > > }
> > >
> > > [1]: This is the value of dev_args at this point in time -
> > > p *devargs
> > > $2 = {next = {tqe_next = 0x0, tqe_prev = 0x3a67970 <devargs_list>},
> > > type = RTE_DEVTYPE_WHITELISTED_PCI, policy = RTE_DEV_WHITELISTED, name
> > > = "0000:06:02.0", '\000' <repeats 51 times>, {args = 0x56eec00
> > > "representor=[1]", drv_str = 0x56eec00 "representor=[1]"},
> > >    bus = 0x3a59a00 <rte_pci_bus>, cls = 0x0, bus_str = 0x0, cls_str =
> > > 0x0, data = 0x0}
> > >
> > > It has the 'representor' keyword still intact, but what gets passed to
> > > pci_parse() is pci_parse (name=0x56d76b8 "0000:06:02.0",
> > > addr=0x7ffdb46ed180) i.e just the original DBDF of the function
> > > that was already probed once during OVS-DPDK init... I'm not sure if
> > > that is the reason , but eventually it walks the entire for_each loop
> > > and returns NULL ...i.e pci_devargs_lookup() returns NULL ... So this
> > > is where the 'devargs' value passed is getting lost
> > > Hope that gives what you were looking for ?
> > >
> > >
> > > Thanks
> > > Som
> > >
> >
> > Hello,
> >
> > Yes, thank you for the thorough response.
> > Your second device has the BDF 06:02.01 while its parent has 06:02.00. The parent device already matched the devargs and its rte_device should already point to it.
> >
> > I think it is not correct to force link a parent device devargs to virtual functions spawned from it.
> > The devargs is NULL, because there are none that were used to spawn the virtual functions.
> >
> > Now, can you describe your overarching issue more precisely? You are able to call rte_dev_probe twice on the same virtual function, because the check against it does not work without the devargs being linked? If that's correct, this is the real bug IMO, and there could be other ways to mitigate this.
> >
> > Thanks
> > Gaetan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [dpdk-dev] [PATCH] eal: fix to set the rte_device ptr's device args before hotplug
  2020-02-07  5:52         ` Somnath Kotur
@ 2020-02-10 14:34           ` Gaetan Rivet
  2020-02-10 15:00             ` Gaetan Rivet
  2020-02-11  4:33             ` Somnath Kotur
  0 siblings, 2 replies; 11+ messages in thread
From: Gaetan Rivet @ 2020-02-10 14:34 UTC (permalink / raw)
  To: Somnath Kotur; +Cc: dev

Hi Somnath,

Reformating the mails, to keep with the inner-posting (mixing top-posting and inner-posting
makes it hard to follow). See the end of the mail.


>> On Wed, Feb 5, 2020 at 2:47 PM Gaetan Rivet <grive@u256.net> wrote:
>>>
>>> On 05/02/2020 09:52, Somnath Kotur wrote:
>>>> Hello Gaetan,
>>>>
>>>>
>>>> On Tue, Feb 4, 2020 at 3:19 PM Gaetan Rivet <grive@u256.net> wrote:
>>>>>
>>>>> On 04/02/2020 10:15, Somnath Kotur wrote:
>>>>>> As per the comments in this code section, "since there is a matching device,
>>>>>> it is now its responsibility to manage the devargs we've just inserted."
>>>>>> But the matching device ptr's devargs is still uninitialized or not pointing
>>>>>> to the newest dev_args that were passed as a parameter to local_dev_probe().
>>>>>> This is needed particularly in the case when *probe is called again* on an
>>>>>> already probed device(the parent device for the representor) as part of adding
>>>>>> a representor port to an OVS switch(OVS-DPDK) like so:
>>>>>> ovs-vsctl add-port ovsbr0 vfrep1 -- set Interface vfrep1 type=dpdk \
>>>>>> options:dpdk-devargs=0000:06:02.0,representor=[1]
>>>>>>
>>>>>> Fixes: 7e8b26650146 ("eal: fix hotplug add / remove")
>>>>>>
>>>>>> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
>>>>>> ---
>>>>>>     lib/librte_eal/common/eal_common_dev.c | 1 +
>>>>>>     1 file changed, 1 insertion(+)
>>>>>>
>>>>>> diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
>>>>>> index 9e4f09d..311eef5 100644
>>>>>> --- a/lib/librte_eal/common/eal_common_dev.c
>>>>>> +++ b/lib/librte_eal/common/eal_common_dev.c
>>>>>> @@ -171,6 +171,7 @@ static int cmp_dev_name(const struct rte_device *dev, const void *_name)
>>>>>>          * those devargs shouldn't be removed manually anymore.
>>>>>>          */
>>>>>>
>>>>>> +     dev->devargs = da;
>>>>>>         ret = dev->bus->plug(dev);
>>>>>>         if (ret > 0)
>>>>>>                 ret = -ENOTSUP;
>>>>>>
>>>>>
>>>>> Hello Somnath,
>>>>>
>>>>> On a surface level, the fix does not seem correct.
>>>>> The comment
>>>>>
>>>>>       "Since there is a matching device, it is now its responsibility
>>>>>        to manage the devargs we've just inserted. From this point,
>>>>>        those devargs should'nt be removed manually anymore."
>>>>>
>>>>> means that the err_devarg label is not correct, on further error in the
>>>>> function, returning the error code and cleaning up the device is sufficient.
>>>>>
>>>>> Setting the devargs for a device is the responsibility of the bus scan function.
>>>>> In the PCI bus for example, this is done in pci_name_set(), called once a device name is fully qualified
>>>>> after scanning the system and thus being able to match a devargs to the device name.
>>>>>
>>>>> Can you please give more information about the device bus, and maybe trace the path taken
>>>>> by the line "ret = da->bus->scan();" a few lines above your edit? If your dev->devargs is not set
>>>>> afterward, it seems the bug would be there.
>>>>>
>>>> Sure, here is the stack trace from the pci_name_set()
>>>> pci_name_set (dev=0x570eab0) at
>>>> /root/dpdk-int_nxt/drivers/bus/pci/pci_common.c:65
>>>> 65                              dev->name, sizeof(dev->name));
>>>> (gdb) p /x *dev
>>>> $1 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, device = {next =
>>>> {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x0, driver = 0x0, bus =
>>>> 0x3a59a00, numa_node = 0x0, devargs = 0x0}, addr = {domain = 0x0, bus
>>>> = 0x6, devid = 0x2, function = 0x1}, id = {class_id = 0x20000,
>>>> vendor_id = 0x14e4,
>>>>       device_id = 0x16dc, subsystem_vendor_id = 0x14e4,
>>>> subsystem_device_id = 0x16d7}.........
>>>> (gdb) bt
>>>> #0  pci_name_set (dev=0x570eab0) at
>>>> /root/dpdk19.11/drivers/bus/pci/pci_common.c:65
>>>> #1  0x0000000000457b69 in pci_scan_one (dirname=0x7ffdb46ef230
>>>> "/sys/bus/pci/devices/0000:06:02.1", addr=0x7ffdb46ef220) at
>>>> /root/dpdk-19.11/drivers/bus/pci/linux/pci.c:305
>>>> #2  0x0000000000458188 in rte_pci_scan () at
>>>> /root/dpdk-int_nxt/drivers/bus/pci/linux/pci.c:488
>>>> #3  0x00000000004b9530 in local_dev_probe (devargs=0x527cc10
>>>> "0000:06:02.0,representor=[1]", new_dev=0x7ffdb46f02b8) at
>>>> /root/dpdk-19.11/lib/librte_eal/common/eal_common_dev.c:158
>>>> #4  0x00000000004b9726 in rte_dev_probe (devargs=0x527cc10
>>>> "0000:06:02.0,representor=[1]") at
>>>> /root/dpdk-int_nxt/lib/librte_eal/common/eal_common_dev.c:227
>>>> #5  0x00000000033d196a in netdev_dpdk_process_devargs
>>>> (dev=0x2000e4800, devargs=0x527cc10 "0000:06:02.0,representor=[1]",
>>>> errp=0x7ffdb46f0478) at lib/netdev-dpdk.c:1798
>>>> #6  0x00000000033d1d94 in netdev_dpdk_set_config (netdev=0x2000e4880,
>>>> args=0x56edf10, errp=0x7ffdb46f0478) at lib/netdev-dpdk.c:1921
>>>> #7  0x00000000032c2e48 in netdev_set_config (netdev=0x2000e4880,
>>>> args=0x56edf10, errp=0x7ffdb46f0598) at lib/netdev.c:494
>>>> #8  0x00000000031e9a4e in iface_set_netdev_config
>>>> (iface_cfg=0x56edc90, netdev=0x2000e4880, errp=0x7ffdb46f0598) at
>>>> vswitchd/bridge.c:2015
>>>> #9  0x00000000031e9bd1 in iface_do_create (br=0x52801d0,
>>>> iface_cfg=0x56edc90, ofp_portp=0x7ffdb46f05a4, netdevp=0x7ffdb46f05a8,
>>>> errp=0x7ffdb46f0598) at vswitchd/bridge.c:2049
>>>> #10 0x00000000031e9da6 in iface_create (br=0x52801d0,
>>>> iface_cfg=0x56edc90, port_cfg=0x56bae30) at vswitchd/bridge.c:2100
>>>> #11 0x00000000031e7494 in bridge_add_ports__ (br=0x52801d0,
>>>> wanted_ports=0x52802b0, with_requested_port=false) at
>>>> vswitchd/bridge.c:1164
>>>> #12 0x00000000031e7525 in bridge_add_ports (br=0x52801d0,
>>>> wanted_ports=0x52802b0) at vswitchd/bridge.c:1180
>>>> #13 0x00000000031e6a4f in bridge_reconfigure (ovs_cfg=0x52b2c40) at
>>>> vswitchd/bridge.c:893
>>>> #14 0x00000000031ed37c in bridge_run () at vswitchd/bridge.c:3324
>>>> #15 0x00000000031f2977 in main (argc=11, argv=0x7ffdb46f0878) at
>>>> vswitchd/ovs-vswitchd.c:127
>>>>
>>>>> It could be the bus that is not properly implemented, the devargs name not fully qualified, a special path taken by a vdev (I'm not sure what the ovsbr0 device is for example), or a specific representor pluging function, difficult to pinpoint exactly right now.
>>>> As you can see , the DBDF along with the 'representor=[1]' is passed
>>>> all the way down to local_dev_probe(). However when i traced further
>>>> inside pci_name_set() =>pci_devargs_lookup()
>>>>
>>>> static struct rte_devargs *pci_devargs_lookup(struct rte_pci_device *dev)
>>>> {
>>>>           struct rte_devargs *devargs;
>>>>           struct rte_pci_addr addr;
>>>>
>>>>           RTE_EAL_DEVARGS_FOREACH("pci", devargs) {
>>>>                   devargs->bus->parse(devargs->name, &addr);
>>>> ======================================> pci_parse() [1]
>>>>
>>>>                   if (!rte_pci_addr_cmp(&dev->addr, &addr))
>>>>                           return devargs;
>>>>           }
>>>>           return NULL;
>>>> }
>>>>
>>>> [1]: This is the value of dev_args at this point in time -
>>>> p *devargs
>>>> $2 = {next = {tqe_next = 0x0, tqe_prev = 0x3a67970 <devargs_list>},
>>>> type = RTE_DEVTYPE_WHITELISTED_PCI, policy = RTE_DEV_WHITELISTED, name
>>>> = "0000:06:02.0", '\000' <repeats 51 times>, {args = 0x56eec00
>>>> "representor=[1]", drv_str = 0x56eec00 "representor=[1]"},
>>>>     bus = 0x3a59a00 <rte_pci_bus>, cls = 0x0, bus_str = 0x0, cls_str =
>>>> 0x0, data = 0x0}
>>>>
>>>> It has the 'representor' keyword still intact, but what gets passed to
>>>> pci_parse() is pci_parse (name=0x56d76b8 "0000:06:02.0",
>>>> addr=0x7ffdb46ed180) i.e just the original DBDF of the function
>>>> that was already probed once during OVS-DPDK init... I'm not sure if
>>>> that is the reason , but eventually it walks the entire for_each loop
>>>> and returns NULL ...i.e pci_devargs_lookup() returns NULL ... So this
>>>> is where the 'devargs' value passed is getting lost
>>>> Hope that gives what you were looking for ?
>>>>
>>>>
>>>> Thanks
>>>> Som
>>>>
>>>
>>> Hello,
>>>
>>> Yes, thank you for the thorough response.
>>> Your second device has the BDF 06:02.01 while its parent has 06:02.00. The parent device already matched the devargs and its rte_device should already point to it.
>>>
>>> I think it is not correct to force link a parent device devargs to virtual functions spawned from it.
>>> The devargs is NULL, because there are none that were used to spawn the virtual functions.
>>>
>>> Now, can you describe your overarching issue more precisely? You are able to call rte_dev_probe twice on the same virtual function, because the check against it does not work without the devargs being linked? If that's correct, this is the real bug IMO, and there could be other ways to mitigate this.
>>>
>>> Thanks
>>> Gaetan
> 
> On 07/02/2020 06:52, Somnath Kotur wrote:
> HI Gaetan,
> 
> 
> On Thu, Feb 6, 2020 at 9:36 AM Somnath Kotur <somnath.kotur@broadcom.com> wrote:
>>
>> Hello Gaetan,
>>              Sorry my bad, i think i captured it for the wrong BDF ...
>> So the overarching goal here is that i want to create/add a
>> representor port for 06:02.01 using 06:02:00 as the backing/parent
>> device on the vswitch.
>>
>> I recaptured the debug with 06:02:00
>>
>> Breakpoint 1, pci_name_set (dev=0x5d44e00) at
>> /root/dpdk-19.11/drivers/bus/pci/pci_common.c:65
>> 65                              dev->name, sizeof(dev->name));
>> (gdb) p /x *dev
>> $3 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, device = {next =
>> {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x0, driver = 0x0, bus =
>> 0x3a59a00, numa_node = 0x0, devargs = 0x0}, addr = {domain = 0x0, bus
>> = 0x6, devid = 0x2, function = 0x0}, id = {class_id = 0x20000,
>> vendor_id = 0x14e4,
>>      device_id = 0x16dc, subsystem_vendor_id = 0x14e4,
>> subsystem_device_id = 0x16d7}
>>
>> devargs = pci_devargs_lookup(dev);
>>
>>   p *devargs
>> $5 = {next = {tqe_next = 0x0, tqe_prev = 0x3a67970 <devargs_list>},
>> type = RTE_DEVTYPE_WHITELISTED_PCI, policy = RTE_DEV_WHITELISTED, name
>> = "0000:06:02.0", '\000' <repeats 51 times>, {args = 0x5d1c8a0
>> "representor=[1]", drv_str = 0x5d1c8a0 "representor=[1]"},
>>    bus = 0x3a59a00 <rte_pci_bus>, cls = 0x0, bus_str = 0x0, cls_str =
>> 0x0, data = 0x0}
>>
>> So as you can see here , this time the devargs is *not NULL* and has
>> the value of 'representor=[1]'
>>
>> local_dev_probe (devargs=0x5cefec0 "0000:06:02.0,representor=[1]",
>> new_dev=0x7ffd3e872758) at
>> /root/dpdk-19.11/lib/librte_eal/common/eal_common_dev.c:162
>> 162             dev = da->bus->find_device(NULL, cmp_dev_name, da->name);
>> (gdb) p *da
>> $7 = {next = {tqe_next = 0x0, tqe_prev = 0x3a67970 <devargs_list>},
>> type = RTE_DEVTYPE_WHITELISTED_PCI, policy = RTE_DEV_WHITELISTED, name
>> = "0000:06:02.0", '\000' <repeats 51 times>, {args = 0x5d1c8a0
>> "representor=[1]", drv_str = 0x5d1c8a0 "representor=[1]"},
>>    bus = 0x3a59a00 <rte_pci_bus>, cls = 0x0, bus_str = 0x0, cls_str =
>> 0x0, data = 0x0}
>>
>> (gdb) p *dev
>> $8 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x5c6df00
>> "0000:06:02.0", driver = 0x3a6b7f0 <bnxt_rte_pmd+16>, bus = 0x3a59a00
>> <rte_pci_bus>, numa_node = 0, devargs = 0x0}
>>
>> So it appears that 'find_device' is returning NULL devargs ?
>>
>> I found that indeed is the case in pci_find_device()
>>
>> pci_find_device (start=0x0, cmp=0x4b92d8 <cmp_dev_name>,
>> data=0x5bb53f8) at /root/dpdk-19.11/drivers/bus/pci/pci_common.c:426
>> 426                             return &pdev->device;
>>
>> gdb) p pdev->device
>> $2 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x5b14f00
>> "0000:06:02.0", driver = 0x3a6b7f0 <bnxt_rte_pmd+16>, bus = 0x3a59a00
>> <rte_pci_bus>, numa_node = 0, devargs = 0x0}
>>
> Any updates on this ?
> My thoughts on this are just as I'd suspected / suggested earlier the
> parent device (06:02:0) gets probed once
> during ovs-vswitchd bring up right after these 2 cmds
> 
> ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
> # Now start ovs-vswitchd (--no-ovsdb-server, since it is already started)
> ovs-ctl --no-ovsdb-server --db-sock="$DB_SOCK" start
> 
> After this point the pci probe was invoked on the parent device, but
> without any devargs hence it is NULL.
> 
> Now when the 'ovs add-port' cmd is issued to add a representor port
> for example say 06:02:01 (using 06:02:00 as parent device)
> ovs-vsctl add-port ovsbr0 vfrep1 -- set Interface vfrep1 type=dpdk
> options:dpdk-devargs=0000:06:02.0,representor=[1]
> 
> What happens is
> 
> *before* pci_name_set()
> p dev
> $1 = (struct rte_pci_device *) 0x55bd6b0
> (gdb) p /x *dev
> $2 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, device = {next =
> {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x0, driver = 0x0, bus =
> 0x3a59a00, numa_node = 0x0,
>      devargs = 0x0}, addr = {domain = 0x0, bus = 0x6, devid = 0x2,
> function = 0x0}.......}
> 
> *after* pci_name_set()
> 
> p /x *dev
> $3 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, device = {next =
> {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x0, driver = 0x0, bus =
> 0x3a59a00, numa_node = 0x0,
>      devargs = 0x55a4ae0},addr = {domain = 0x0, bus = 0x6, devid = 0x2,
> function = 0x0}......}
> 
> As you can see, 'devargs' for the pci_device is now populated ..
> 
> But if you go further down in pci_scan_one()
> 
> pci_scan_one (dirname=0x7ffea330a5e0
> "/sys/bus/pci/devices/0000:06:02.0", addr=0x7ffea330a5d0)
>      at /root/dpdk-int_nxt/drivers/bus/pci/linux/pci.c:351
> 351                                     if (!rte_dev_is_probed(&dev2->device)) {
> (gdb) p dev2
> $5 = (struct rte_pci_device *) 0x54de5e0
> (gdb) p /x *dev2
> $6 = {next = {tqe_next = 0x5307460, tqe_prev = 0x5539f80}, device =
> {next = {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x54e4f00, driver =
> 0x3a6b7f0,
>      bus = 0x3a59a00, numa_node = 0x0, devargs = 0x0}, addr = {domain =
> 0x0, bus = 0x6, devid = 0x2, function = 0x0}
> 
> we hit the pci_device ( 0x54de5e0) that was created during the first
> time probe of the parent device ?
> Since it is already probed, it goes to line 381 where it does frees
> the just allocated 'pci_device'  inside pci_scan_one() via 'free(dev)'
> 
> As you can see this pci_device does not have it's devargs set (rightly
> so as initially there were no devargs for this device)
> 
> But as shown in the stack trace in my previous reply, when
> pci_find_device() walks the rte_pci_devices list , it finds this
> earlier probed device (without devargs)
> 

Alright, I think you are right, this is what is happening.
The issue IMO, is that the PCI scan is thus hitting an already existing device, but we have
missed the case where the new device has more info that the previous one (linked devargs).

> pci_find_device (start=0x0, cmp=0x4b92d8 <cmp_dev_name>,
> data=0x55a4af8) at /root/dpdk-19.11/drivers/bus/pci/pci_common.c:426
> 426                             return &pdev->device;
> (gdb) p pdev
> $15 = (struct rte_pci_device *) 0x54de5e0
> 
> (gdb) p /x *pdev
> $16 = {next = {tqe_next = 0x5307460, tqe_prev = 0x5539f80}, device =
> {next = {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x54e4f00, driver =
> 0x3a6b7f0,
>      bus = 0x3a59a00, numa_node = 0x0, devargs = 0x0}, addr = {domain =
> 0x0, bus = 0x6, devid = 0x2, function = 0x0},
> 
> Hope this explains why the pci_device has NULL devargs at this point
> and how my fix to set it at this point helps solve the problem?
> 
> Please let me know your thoughts
> 

I think the proper fix is instead to have a clean update during the PCI scan.
The proper way is to keep the old device, but override its fields as new info was found.

Something like calling pci_name_set(dev2); line 359 maybe. BSD should also be updated for consistency.

My issue with your patch is that I think this issue is very specific to the PCI bus and the capabilities
of some devices on it. It would be better to have a fully compliant scan + probe ops considering
the supported capabilities, instead of forcing this on all buses.

I'm wondering whether this can happen with an already existing devargs? If so, is it more correct to keep
the new one, or ignore the probe, free the new devargs and report an error? If the former,
please clean up properly the devargs using rte_devargs_remove() (before calling pci_name_set() of course).

> Thanks
> Som
> 
> 
> 
> 
>> Som
>>

BR,
Gaetan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [dpdk-dev] [PATCH] eal: fix to set the rte_device ptr's device args before hotplug
  2020-02-10 14:34           ` Gaetan Rivet
@ 2020-02-10 15:00             ` Gaetan Rivet
  2020-02-11  4:33             ` Somnath Kotur
  1 sibling, 0 replies; 11+ messages in thread
From: Gaetan Rivet @ 2020-02-10 15:00 UTC (permalink / raw)
  To: dev

On 10/02/2020 15:34, Gaetan Rivet wrote:
> Hi Somnath,
> 
> Reformating the mails, to keep with the inner-posting (mixing top-posting and inner-posting
> makes it hard to follow). See the end of the mail.
> 
> 

[...]

>> Any updates on this ?
>> My thoughts on this are just as I'd suspected / suggested earlier the
>> parent device (06:02:0) gets probed once
>> during ovs-vswitchd bring up right after these 2 cmds
>>
>> ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
>> # Now start ovs-vswitchd (--no-ovsdb-server, since it is already started)
>> ovs-ctl --no-ovsdb-server --db-sock="$DB_SOCK" start
>>
>> After this point the pci probe was invoked on the parent device, but
>> without any devargs hence it is NULL.
>>
>> Now when the 'ovs add-port' cmd is issued to add a representor port
>> for example say 06:02:01 (using 06:02:00 as parent device)
>> ovs-vsctl add-port ovsbr0 vfrep1 -- set Interface vfrep1 type=dpdk
>> options:dpdk-devargs=0000:06:02.0,representor=[1]
>>
>> What happens is
>>
>> *before* pci_name_set()
>> p dev
>> $1 = (struct rte_pci_device *) 0x55bd6b0
>> (gdb) p /x *dev
>> $2 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, device = {next =
>> {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x0, driver = 0x0, bus =
>> 0x3a59a00, numa_node = 0x0,
>>      devargs = 0x0}, addr = {domain = 0x0, bus = 0x6, devid = 0x2,
>> function = 0x0}.......}
>>
>> *after* pci_name_set()
>>
>> p /x *dev
>> $3 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, device = {next =
>> {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x0, driver = 0x0, bus =
>> 0x3a59a00, numa_node = 0x0,
>>      devargs = 0x55a4ae0},addr = {domain = 0x0, bus = 0x6, devid = 0x2,
>> function = 0x0}......}
>>
>> As you can see, 'devargs' for the pci_device is now populated ..
>>
>> But if you go further down in pci_scan_one()
>>
>> pci_scan_one (dirname=0x7ffea330a5e0
>> "/sys/bus/pci/devices/0000:06:02.0", addr=0x7ffea330a5d0)
>>      at /root/dpdk-int_nxt/drivers/bus/pci/linux/pci.c:351
>> 351                                     if (!rte_dev_is_probed(&dev2->device)) {
>> (gdb) p dev2
>> $5 = (struct rte_pci_device *) 0x54de5e0
>> (gdb) p /x *dev2
>> $6 = {next = {tqe_next = 0x5307460, tqe_prev = 0x5539f80}, device =
>> {next = {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x54e4f00, driver =
>> 0x3a6b7f0,
>>      bus = 0x3a59a00, numa_node = 0x0, devargs = 0x0}, addr = {domain =
>> 0x0, bus = 0x6, devid = 0x2, function = 0x0}
>>
>> we hit the pci_device ( 0x54de5e0) that was created during the first
>> time probe of the parent device ?
>> Since it is already probed, it goes to line 381 where it does frees
>> the just allocated 'pci_device'  inside pci_scan_one() via 'free(dev)'
>>
>> As you can see this pci_device does not have it's devargs set (rightly
>> so as initially there were no devargs for this device)
>>
>> But as shown in the stack trace in my previous reply, when
>> pci_find_device() walks the rte_pci_devices list , it finds this
>> earlier probed device (without devargs)
>>
> 
> Alright, I think you are right, this is what is happening.
> The issue IMO, is that the PCI scan is thus hitting an already existing device, but we have
> missed the case where the new device has more info that the previous one (linked devargs).
> 
>> pci_find_device (start=0x0, cmp=0x4b92d8 <cmp_dev_name>,
>> data=0x55a4af8) at /root/dpdk-19.11/drivers/bus/pci/pci_common.c:426
>> 426                             return &pdev->device;
>> (gdb) p pdev
>> $15 = (struct rte_pci_device *) 0x54de5e0
>>
>> (gdb) p /x *pdev
>> $16 = {next = {tqe_next = 0x5307460, tqe_prev = 0x5539f80}, device =
>> {next = {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x54e4f00, driver =
>> 0x3a6b7f0,
>>      bus = 0x3a59a00, numa_node = 0x0, devargs = 0x0}, addr = {domain =
>> 0x0, bus = 0x6, devid = 0x2, function = 0x0},
>>
>> Hope this explains why the pci_device has NULL devargs at this point
>> and how my fix to set it at this point helps solve the problem?
>>
>> Please let me know your thoughts
>>
> 
> I think the proper fix is instead to have a clean update during the PCI scan.
> The proper way is to keep the old device, but override its fields as new info was found.
> 
> Something like calling pci_name_set(dev2); line 359 maybe. BSD should also be updated for consistency.
> 
> My issue with your patch is that I think this issue is very specific to the PCI bus and the capabilities
> of some devices on it. It would be better to have a fully compliant scan + probe ops considering
> the supported capabilities, instead of forcing this on all buses.
> 
> I'm wondering whether this can happen with an already existing devargs? If so, is it more correct to keep
> the new one, or ignore the probe, free the new devargs and report an error? If the former,
> please clean up properly the devargs using rte_devargs_remove() (before calling pci_name_set() of course).

Unfortunately, this could probably happen with some devargs. In particular, the rte_devargs_insert() function will test whether devargs name are matching before inserting a new devargs. There is an ambiguity with the BDF / DBDF notation for the PCI devices in particular, which could lead to devargs already existing for some devices, but with a new one being inserted.

The fix would be to have the bus parse the device, and compare the binary blob resulting from this parsing.
The current bus->parse() API might help, but is currently too limited (no way to know how many bytes long the result is).

This is fastidious to have to write this because some leeway was allowed to the PCI back in the day. For now at least, mind this possibility when writing your fix, those above considerations could maybe be worked out with future API evolutions.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [dpdk-dev] [PATCH] eal: fix to set the rte_device ptr's device args before hotplug
  2020-02-10 14:34           ` Gaetan Rivet
  2020-02-10 15:00             ` Gaetan Rivet
@ 2020-02-11  4:33             ` Somnath Kotur
  2020-02-11  8:38               ` Gaetan Rivet
  1 sibling, 1 reply; 11+ messages in thread
From: Somnath Kotur @ 2020-02-11  4:33 UTC (permalink / raw)
  To: Gaetan Rivet; +Cc: dev

On Mon, Feb 10, 2020 at 8:04 PM Gaetan Rivet <grive@u256.net> wrote:
>
> Hi Somnath,
>
> Reformating the mails, to keep with the inner-posting (mixing top-posting and inner-posting
> makes it hard to follow). See the end of the mail.
>
>
> >> On Wed, Feb 5, 2020 at 2:47 PM Gaetan Rivet <grive@u256.net> wrote:
> >>>
> >>> On 05/02/2020 09:52, Somnath Kotur wrote:
> >>>> Hello Gaetan,
> >>>>
> >>>>
> >>>> On Tue, Feb 4, 2020 at 3:19 PM Gaetan Rivet <grive@u256.net> wrote:
> >>>>>
> >>>>> On 04/02/2020 10:15, Somnath Kotur wrote:
> >>>>>> As per the comments in this code section, "since there is a matching device,
> >>>>>> it is now its responsibility to manage the devargs we've just inserted."
> >>>>>> But the matching device ptr's devargs is still uninitialized or not pointing
> >>>>>> to the newest dev_args that were passed as a parameter to local_dev_probe().
> >>>>>> This is needed particularly in the case when *probe is called again* on an
> >>>>>> already probed device(the parent device for the representor) as part of adding
> >>>>>> a representor port to an OVS switch(OVS-DPDK) like so:
> >>>>>> ovs-vsctl add-port ovsbr0 vfrep1 -- set Interface vfrep1 type=dpdk \
> >>>>>> options:dpdk-devargs=0000:06:02.0,representor=[1]
> >>>>>>
> >>>>>> Fixes: 7e8b26650146 ("eal: fix hotplug add / remove")
> >>>>>>
> >>>>>> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
> >>>>>> ---
> >>>>>>     lib/librte_eal/common/eal_common_dev.c | 1 +
> >>>>>>     1 file changed, 1 insertion(+)
> >>>>>>
> >>>>>> diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
> >>>>>> index 9e4f09d..311eef5 100644
> >>>>>> --- a/lib/librte_eal/common/eal_common_dev.c
> >>>>>> +++ b/lib/librte_eal/common/eal_common_dev.c
> >>>>>> @@ -171,6 +171,7 @@ static int cmp_dev_name(const struct rte_device *dev, const void *_name)
> >>>>>>          * those devargs shouldn't be removed manually anymore.
> >>>>>>          */
> >>>>>>
> >>>>>> +     dev->devargs = da;
> >>>>>>         ret = dev->bus->plug(dev);
> >>>>>>         if (ret > 0)
> >>>>>>                 ret = -ENOTSUP;
> >>>>>>
> >>>>>
> >>>>> Hello Somnath,
> >>>>>
> >>>>> On a surface level, the fix does not seem correct.
> >>>>> The comment
> >>>>>
> >>>>>       "Since there is a matching device, it is now its responsibility
> >>>>>        to manage the devargs we've just inserted. From this point,
> >>>>>        those devargs should'nt be removed manually anymore."
> >>>>>
> >>>>> means that the err_devarg label is not correct, on further error in the
> >>>>> function, returning the error code and cleaning up the device is sufficient.
> >>>>>
> >>>>> Setting the devargs for a device is the responsibility of the bus scan function.
> >>>>> In the PCI bus for example, this is done in pci_name_set(), called once a device name is fully qualified
> >>>>> after scanning the system and thus being able to match a devargs to the device name.
> >>>>>
> >>>>> Can you please give more information about the device bus, and maybe trace the path taken
> >>>>> by the line "ret = da->bus->scan();" a few lines above your edit? If your dev->devargs is not set
> >>>>> afterward, it seems the bug would be there.
> >>>>>
> >>>> Sure, here is the stack trace from the pci_name_set()
> >>>> pci_name_set (dev=0x570eab0) at
> >>>> /root/dpdk-int_nxt/drivers/bus/pci/pci_common.c:65
> >>>> 65                              dev->name, sizeof(dev->name));
> >>>> (gdb) p /x *dev
> >>>> $1 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, device = {next =
> >>>> {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x0, driver = 0x0, bus =
> >>>> 0x3a59a00, numa_node = 0x0, devargs = 0x0}, addr = {domain = 0x0, bus
> >>>> = 0x6, devid = 0x2, function = 0x1}, id = {class_id = 0x20000,
> >>>> vendor_id = 0x14e4,
> >>>>       device_id = 0x16dc, subsystem_vendor_id = 0x14e4,
> >>>> subsystem_device_id = 0x16d7}.........
> >>>> (gdb) bt
> >>>> #0  pci_name_set (dev=0x570eab0) at
> >>>> /root/dpdk19.11/drivers/bus/pci/pci_common.c:65
> >>>> #1  0x0000000000457b69 in pci_scan_one (dirname=0x7ffdb46ef230
> >>>> "/sys/bus/pci/devices/0000:06:02.1", addr=0x7ffdb46ef220) at
> >>>> /root/dpdk-19.11/drivers/bus/pci/linux/pci.c:305
> >>>> #2  0x0000000000458188 in rte_pci_scan () at
> >>>> /root/dpdk-int_nxt/drivers/bus/pci/linux/pci.c:488
> >>>> #3  0x00000000004b9530 in local_dev_probe (devargs=0x527cc10
> >>>> "0000:06:02.0,representor=[1]", new_dev=0x7ffdb46f02b8) at
> >>>> /root/dpdk-19.11/lib/librte_eal/common/eal_common_dev.c:158
> >>>> #4  0x00000000004b9726 in rte_dev_probe (devargs=0x527cc10
> >>>> "0000:06:02.0,representor=[1]") at
> >>>> /root/dpdk-int_nxt/lib/librte_eal/common/eal_common_dev.c:227
> >>>> #5  0x00000000033d196a in netdev_dpdk_process_devargs
> >>>> (dev=0x2000e4800, devargs=0x527cc10 "0000:06:02.0,representor=[1]",
> >>>> errp=0x7ffdb46f0478) at lib/netdev-dpdk.c:1798
> >>>> #6  0x00000000033d1d94 in netdev_dpdk_set_config (netdev=0x2000e4880,
> >>>> args=0x56edf10, errp=0x7ffdb46f0478) at lib/netdev-dpdk.c:1921
> >>>> #7  0x00000000032c2e48 in netdev_set_config (netdev=0x2000e4880,
> >>>> args=0x56edf10, errp=0x7ffdb46f0598) at lib/netdev.c:494
> >>>> #8  0x00000000031e9a4e in iface_set_netdev_config
> >>>> (iface_cfg=0x56edc90, netdev=0x2000e4880, errp=0x7ffdb46f0598) at
> >>>> vswitchd/bridge.c:2015
> >>>> #9  0x00000000031e9bd1 in iface_do_create (br=0x52801d0,
> >>>> iface_cfg=0x56edc90, ofp_portp=0x7ffdb46f05a4, netdevp=0x7ffdb46f05a8,
> >>>> errp=0x7ffdb46f0598) at vswitchd/bridge.c:2049
> >>>> #10 0x00000000031e9da6 in iface_create (br=0x52801d0,
> >>>> iface_cfg=0x56edc90, port_cfg=0x56bae30) at vswitchd/bridge.c:2100
> >>>> #11 0x00000000031e7494 in bridge_add_ports__ (br=0x52801d0,
> >>>> wanted_ports=0x52802b0, with_requested_port=false) at
> >>>> vswitchd/bridge.c:1164
> >>>> #12 0x00000000031e7525 in bridge_add_ports (br=0x52801d0,
> >>>> wanted_ports=0x52802b0) at vswitchd/bridge.c:1180
> >>>> #13 0x00000000031e6a4f in bridge_reconfigure (ovs_cfg=0x52b2c40) at
> >>>> vswitchd/bridge.c:893
> >>>> #14 0x00000000031ed37c in bridge_run () at vswitchd/bridge.c:3324
> >>>> #15 0x00000000031f2977 in main (argc=11, argv=0x7ffdb46f0878) at
> >>>> vswitchd/ovs-vswitchd.c:127
> >>>>
> >>>>> It could be the bus that is not properly implemented, the devargs name not fully qualified, a special path taken by a vdev (I'm not sure what the ovsbr0 device is for example), or a specific representor pluging function, difficult to pinpoint exactly right now.
> >>>> As you can see , the DBDF along with the 'representor=[1]' is passed
> >>>> all the way down to local_dev_probe(). However when i traced further
> >>>> inside pci_name_set() =>pci_devargs_lookup()
> >>>>
> >>>> static struct rte_devargs *pci_devargs_lookup(struct rte_pci_device *dev)
> >>>> {
> >>>>           struct rte_devargs *devargs;
> >>>>           struct rte_pci_addr addr;
> >>>>
> >>>>           RTE_EAL_DEVARGS_FOREACH("pci", devargs) {
> >>>>                   devargs->bus->parse(devargs->name, &addr);
> >>>> ======================================> pci_parse() [1]
> >>>>
> >>>>                   if (!rte_pci_addr_cmp(&dev->addr, &addr))
> >>>>                           return devargs;
> >>>>           }
> >>>>           return NULL;
> >>>> }
> >>>>
> >>>> [1]: This is the value of dev_args at this point in time -
> >>>> p *devargs
> >>>> $2 = {next = {tqe_next = 0x0, tqe_prev = 0x3a67970 <devargs_list>},
> >>>> type = RTE_DEVTYPE_WHITELISTED_PCI, policy = RTE_DEV_WHITELISTED, name
> >>>> = "0000:06:02.0", '\000' <repeats 51 times>, {args = 0x56eec00
> >>>> "representor=[1]", drv_str = 0x56eec00 "representor=[1]"},
> >>>>     bus = 0x3a59a00 <rte_pci_bus>, cls = 0x0, bus_str = 0x0, cls_str =
> >>>> 0x0, data = 0x0}
> >>>>
> >>>> It has the 'representor' keyword still intact, but what gets passed to
> >>>> pci_parse() is pci_parse (name=0x56d76b8 "0000:06:02.0",
> >>>> addr=0x7ffdb46ed180) i.e just the original DBDF of the function
> >>>> that was already probed once during OVS-DPDK init... I'm not sure if
> >>>> that is the reason , but eventually it walks the entire for_each loop
> >>>> and returns NULL ...i.e pci_devargs_lookup() returns NULL ... So this
> >>>> is where the 'devargs' value passed is getting lost
> >>>> Hope that gives what you were looking for ?
> >>>>
> >>>>
> >>>> Thanks
> >>>> Som
> >>>>
> >>>
> >>> Hello,
> >>>
> >>> Yes, thank you for the thorough response.
> >>> Your second device has the BDF 06:02.01 while its parent has 06:02.00. The parent device already matched the devargs and its rte_device should already point to it.
> >>>
> >>> I think it is not correct to force link a parent device devargs to virtual functions spawned from it.
> >>> The devargs is NULL, because there are none that were used to spawn the virtual functions.
> >>>
> >>> Now, can you describe your overarching issue more precisely? You are able to call rte_dev_probe twice on the same virtual function, because the check against it does not work without the devargs being linked? If that's correct, this is the real bug IMO, and there could be other ways to mitigate this.
> >>>
> >>> Thanks
> >>> Gaetan
> >
> > On 07/02/2020 06:52, Somnath Kotur wrote:
> > HI Gaetan,
> >
> >
> > On Thu, Feb 6, 2020 at 9:36 AM Somnath Kotur <somnath.kotur@broadcom.com> wrote:
> >>
> >> Hello Gaetan,
> >>              Sorry my bad, i think i captured it for the wrong BDF ...
> >> So the overarching goal here is that i want to create/add a
> >> representor port for 06:02.01 using 06:02:00 as the backing/parent
> >> device on the vswitch.
> >>
> >> I recaptured the debug with 06:02:00
> >>
> >> Breakpoint 1, pci_name_set (dev=0x5d44e00) at
> >> /root/dpdk-19.11/drivers/bus/pci/pci_common.c:65
> >> 65                              dev->name, sizeof(dev->name));
> >> (gdb) p /x *dev
> >> $3 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, device = {next =
> >> {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x0, driver = 0x0, bus =
> >> 0x3a59a00, numa_node = 0x0, devargs = 0x0}, addr = {domain = 0x0, bus
> >> = 0x6, devid = 0x2, function = 0x0}, id = {class_id = 0x20000,
> >> vendor_id = 0x14e4,
> >>      device_id = 0x16dc, subsystem_vendor_id = 0x14e4,
> >> subsystem_device_id = 0x16d7}
> >>
> >> devargs = pci_devargs_lookup(dev);
> >>
> >>   p *devargs
> >> $5 = {next = {tqe_next = 0x0, tqe_prev = 0x3a67970 <devargs_list>},
> >> type = RTE_DEVTYPE_WHITELISTED_PCI, policy = RTE_DEV_WHITELISTED, name
> >> = "0000:06:02.0", '\000' <repeats 51 times>, {args = 0x5d1c8a0
> >> "representor=[1]", drv_str = 0x5d1c8a0 "representor=[1]"},
> >>    bus = 0x3a59a00 <rte_pci_bus>, cls = 0x0, bus_str = 0x0, cls_str =
> >> 0x0, data = 0x0}
> >>
> >> So as you can see here , this time the devargs is *not NULL* and has
> >> the value of 'representor=[1]'
> >>
> >> local_dev_probe (devargs=0x5cefec0 "0000:06:02.0,representor=[1]",
> >> new_dev=0x7ffd3e872758) at
> >> /root/dpdk-19.11/lib/librte_eal/common/eal_common_dev.c:162
> >> 162             dev = da->bus->find_device(NULL, cmp_dev_name, da->name);
> >> (gdb) p *da
> >> $7 = {next = {tqe_next = 0x0, tqe_prev = 0x3a67970 <devargs_list>},
> >> type = RTE_DEVTYPE_WHITELISTED_PCI, policy = RTE_DEV_WHITELISTED, name
> >> = "0000:06:02.0", '\000' <repeats 51 times>, {args = 0x5d1c8a0
> >> "representor=[1]", drv_str = 0x5d1c8a0 "representor=[1]"},
> >>    bus = 0x3a59a00 <rte_pci_bus>, cls = 0x0, bus_str = 0x0, cls_str =
> >> 0x0, data = 0x0}
> >>
> >> (gdb) p *dev
> >> $8 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x5c6df00
> >> "0000:06:02.0", driver = 0x3a6b7f0 <bnxt_rte_pmd+16>, bus = 0x3a59a00
> >> <rte_pci_bus>, numa_node = 0, devargs = 0x0}
> >>
> >> So it appears that 'find_device' is returning NULL devargs ?
> >>
> >> I found that indeed is the case in pci_find_device()
> >>
> >> pci_find_device (start=0x0, cmp=0x4b92d8 <cmp_dev_name>,
> >> data=0x5bb53f8) at /root/dpdk-19.11/drivers/bus/pci/pci_common.c:426
> >> 426                             return &pdev->device;
> >>
> >> gdb) p pdev->device
> >> $2 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x5b14f00
> >> "0000:06:02.0", driver = 0x3a6b7f0 <bnxt_rte_pmd+16>, bus = 0x3a59a00
> >> <rte_pci_bus>, numa_node = 0, devargs = 0x0}
> >>
> > Any updates on this ?
> > My thoughts on this are just as I'd suspected / suggested earlier the
> > parent device (06:02:0) gets probed once
> > during ovs-vswitchd bring up right after these 2 cmds
> >
> > ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
> > # Now start ovs-vswitchd (--no-ovsdb-server, since it is already started)
> > ovs-ctl --no-ovsdb-server --db-sock="$DB_SOCK" start
> >
> > After this point the pci probe was invoked on the parent device, but
> > without any devargs hence it is NULL.
> >
> > Now when the 'ovs add-port' cmd is issued to add a representor port
> > for example say 06:02:01 (using 06:02:00 as parent device)
> > ovs-vsctl add-port ovsbr0 vfrep1 -- set Interface vfrep1 type=dpdk
> > options:dpdk-devargs=0000:06:02.0,representor=[1]
> >
> > What happens is
> >
> > *before* pci_name_set()
> > p dev
> > $1 = (struct rte_pci_device *) 0x55bd6b0
> > (gdb) p /x *dev
> > $2 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, device = {next =
> > {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x0, driver = 0x0, bus =
> > 0x3a59a00, numa_node = 0x0,
> >      devargs = 0x0}, addr = {domain = 0x0, bus = 0x6, devid = 0x2,
> > function = 0x0}.......}
> >
> > *after* pci_name_set()
> >
> > p /x *dev
> > $3 = {next = {tqe_next = 0x0, tqe_prev = 0x0}, device = {next =
> > {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x0, driver = 0x0, bus =
> > 0x3a59a00, numa_node = 0x0,
> >      devargs = 0x55a4ae0},addr = {domain = 0x0, bus = 0x6, devid = 0x2,
> > function = 0x0}......}
> >
> > As you can see, 'devargs' for the pci_device is now populated ..
> >
> > But if you go further down in pci_scan_one()
> >
> > pci_scan_one (dirname=0x7ffea330a5e0
> > "/sys/bus/pci/devices/0000:06:02.0", addr=0x7ffea330a5d0)
> >      at /root/dpdk-int_nxt/drivers/bus/pci/linux/pci.c:351
> > 351                                     if (!rte_dev_is_probed(&dev2->device)) {
> > (gdb) p dev2
> > $5 = (struct rte_pci_device *) 0x54de5e0
> > (gdb) p /x *dev2
> > $6 = {next = {tqe_next = 0x5307460, tqe_prev = 0x5539f80}, device =
> > {next = {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x54e4f00, driver =
> > 0x3a6b7f0,
> >      bus = 0x3a59a00, numa_node = 0x0, devargs = 0x0}, addr = {domain =
> > 0x0, bus = 0x6, devid = 0x2, function = 0x0}
> >
> > we hit the pci_device ( 0x54de5e0) that was created during the first
> > time probe of the parent device ?
> > Since it is already probed, it goes to line 381 where it does frees
> > the just allocated 'pci_device'  inside pci_scan_one() via 'free(dev)'
> >
> > As you can see this pci_device does not have it's devargs set (rightly
> > so as initially there were no devargs for this device)
> >
> > But as shown in the stack trace in my previous reply, when
> > pci_find_device() walks the rte_pci_devices list , it finds this
> > earlier probed device (without devargs)
> >
>
> Alright, I think you are right, this is what is happening.
> The issue IMO, is that the PCI scan is thus hitting an already existing device, but we have
> missed the case where the new device has more info that the previous one (linked devargs).
>
That is correct

> > pci_find_device (start=0x0, cmp=0x4b92d8 <cmp_dev_name>,
> > data=0x55a4af8) at /root/dpdk-19.11/drivers/bus/pci/pci_common.c:426
> > 426                             return &pdev->device;
> > (gdb) p pdev
> > $15 = (struct rte_pci_device *) 0x54de5e0
> >
> > (gdb) p /x *pdev
> > $16 = {next = {tqe_next = 0x5307460, tqe_prev = 0x5539f80}, device =
> > {next = {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x54e4f00, driver =
> > 0x3a6b7f0,
> >      bus = 0x3a59a00, numa_node = 0x0, devargs = 0x0}, addr = {domain =
> > 0x0, bus = 0x6, devid = 0x2, function = 0x0},
> >
> > Hope this explains why the pci_device has NULL devargs at this point
> > and how my fix to set it at this point helps solve the problem?
> >
> > Please let me know your thoughts
> >
>
> I think the proper fix is instead to have a clean update during the PCI scan.
> The proper way is to keep the old device, but override its fields as new info was found.

Agreed
>
> Something like calling pci_name_set(dev2); line 359 maybe. BSD should also be updated for consistency.
>
> My issue with your patch is that I think this issue is very specific to the PCI bus and the capabilities
> of some devices on it. It would be better to have a fully compliant scan + probe ops considering
> the supported capabilities, instead of forcing this on all buses.
Sure, so i'm guessing you meant something like this ?

dex 740a2cd..92a41c2 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -377,6 +377,8 @@
                                                 */
                                                RTE_LOG(ERR, EAL,
"Unexpected device scan at %s!\n",
                                                        filename);
+                                       else if (dev2->device.devargs
!= dev->device.devargs)
+                                               pci_name_set(dev2);
                                }
                                free(dev);
                        }

Which is basically checking for devargs mismatch between the 'new'
device and a match with an
already probed device and  return the old device while adjusting the
new info (devargs)
Is that OK?

>
> I'm wondering whether this can happen with an already existing devargs? If so, is it more correct to keep
> the new one, or ignore the probe, free the new devargs and report an error? If the former,
> please clean up properly the devargs using rte_devargs_remove() (before calling pci_name_set() of course).
>
> > Thanks
> > Som
> >
> >
> >
> >
> >> Som
> >>
>
> BR,
> Gaetan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [dpdk-dev] [PATCH] eal: fix to set the rte_device ptr's device args before hotplug
  2020-02-11  4:33             ` Somnath Kotur
@ 2020-02-11  8:38               ` Gaetan Rivet
  2020-02-14  6:46                 ` Somnath Kotur
  0 siblings, 1 reply; 11+ messages in thread
From: Gaetan Rivet @ 2020-02-11  8:38 UTC (permalink / raw)
  To: Somnath Kotur; +Cc: dev

[...]
>>> (gdb) p dev2
>>> $5 = (struct rte_pci_device *) 0x54de5e0
>>> (gdb) p /x *dev2
>>> $6 = {next = {tqe_next = 0x5307460, tqe_prev = 0x5539f80}, device =
>>> {next = {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x54e4f00, driver =
>>> 0x3a6b7f0,
>>>       bus = 0x3a59a00, numa_node = 0x0, devargs = 0x0}, addr = {domain =
>>> 0x0, bus = 0x6, devid = 0x2, function = 0x0}
>>>
>>> we hit the pci_device ( 0x54de5e0) that was created during the first
>>> time probe of the parent device ?
>>> Since it is already probed, it goes to line 381 where it does frees
>>> the just allocated 'pci_device'  inside pci_scan_one() via 'free(dev)'
>>>
>>> As you can see this pci_device does not have it's devargs set (rightly
>>> so as initially there were no devargs for this device)
>>>
>>> But as shown in the stack trace in my previous reply, when
>>> pci_find_device() walks the rte_pci_devices list , it finds this
>>> earlier probed device (without devargs)
>>>
>>
>> Alright, I think you are right, this is what is happening.
>> The issue IMO, is that the PCI scan is thus hitting an already existing device, but we have
>> missed the case where the new device has more info that the previous one (linked devargs).
>>
> That is correct
> 
>>> pci_find_device (start=0x0, cmp=0x4b92d8 <cmp_dev_name>,
>>> data=0x55a4af8) at /root/dpdk-19.11/drivers/bus/pci/pci_common.c:426
>>> 426                             return &pdev->device;
>>> (gdb) p pdev
>>> $15 = (struct rte_pci_device *) 0x54de5e0
>>>
>>> (gdb) p /x *pdev
>>> $16 = {next = {tqe_next = 0x5307460, tqe_prev = 0x5539f80}, device =
>>> {next = {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x54e4f00, driver =
>>> 0x3a6b7f0,
>>>       bus = 0x3a59a00, numa_node = 0x0, devargs = 0x0}, addr = {domain =
>>> 0x0, bus = 0x6, devid = 0x2, function = 0x0},
>>>
>>> Hope this explains why the pci_device has NULL devargs at this point
>>> and how my fix to set it at this point helps solve the problem?
>>>
>>> Please let me know your thoughts
>>>
>>
>> I think the proper fix is instead to have a clean update during the PCI scan.
>> The proper way is to keep the old device, but override its fields as new info was found.
> 
> Agreed
>>
>> Something like calling pci_name_set(dev2); line 359 maybe. BSD should also be updated for consistency.
>>
>> My issue with your patch is that I think this issue is very specific to the PCI bus and the capabilities
>> of some devices on it. It would be better to have a fully compliant scan + probe ops considering
>> the supported capabilities, instead of forcing this on all buses.
> Sure, so i'm guessing you meant something like this ?
> 
> dex 740a2cd..92a41c2 100644
> --- a/drivers/bus/pci/linux/pci.c
> +++ b/drivers/bus/pci/linux/pci.c
> @@ -377,6 +377,8 @@
>                                                   */
>                                                  RTE_LOG(ERR, EAL,
> "Unexpected device scan at %s!\n",
>                                                          filename);
> +                                       else if (dev2->device.devargs
> != dev->device.devargs)
> +                                               pci_name_set(dev2);
>                                  }
>                                  free(dev);
>                          }
> 
> Which is basically checking for devargs mismatch between the 'new'
> device and a match with an
> already probed device and  return the old device while adjusting the
> new info (devargs)
> Is that OK?
> 
>>
>> I'm wondering whether this can happen with an already existing devargs? If so, is it more correct to keep
>> the new one, or ignore the probe, free the new devargs and report an error? If the former,
>> please clean up properly the devargs using rte_devargs_remove() (before calling pci_name_set() of course).

Hello Somnath,

Yes, this is basically it, however you also need to take care of a potential already existing devargs here, because the PCI bus does not guarantee that all devargs of a device will be named the same.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [dpdk-dev] [PATCH] eal: fix to set the rte_device ptr's device args before hotplug
  2020-02-11  8:38               ` Gaetan Rivet
@ 2020-02-14  6:46                 ` Somnath Kotur
  0 siblings, 0 replies; 11+ messages in thread
From: Somnath Kotur @ 2020-02-14  6:46 UTC (permalink / raw)
  To: Gaetan Rivet; +Cc: dev

On Tue, Feb 11, 2020 at 2:08 PM Gaetan Rivet <grive@u256.net> wrote:
>
> [...]
> >>> (gdb) p dev2
> >>> $5 = (struct rte_pci_device *) 0x54de5e0
> >>> (gdb) p /x *dev2
> >>> $6 = {next = {tqe_next = 0x5307460, tqe_prev = 0x5539f80}, device =
> >>> {next = {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x54e4f00, driver =
> >>> 0x3a6b7f0,
> >>>       bus = 0x3a59a00, numa_node = 0x0, devargs = 0x0}, addr = {domain =
> >>> 0x0, bus = 0x6, devid = 0x2, function = 0x0}
> >>>
> >>> we hit the pci_device ( 0x54de5e0) that was created during the first
> >>> time probe of the parent device ?
> >>> Since it is already probed, it goes to line 381 where it does frees
> >>> the just allocated 'pci_device'  inside pci_scan_one() via 'free(dev)'
> >>>
> >>> As you can see this pci_device does not have it's devargs set (rightly
> >>> so as initially there were no devargs for this device)
> >>>
> >>> But as shown in the stack trace in my previous reply, when
> >>> pci_find_device() walks the rte_pci_devices list , it finds this
> >>> earlier probed device (without devargs)
> >>>
> >>
> >> Alright, I think you are right, this is what is happening.
> >> The issue IMO, is that the PCI scan is thus hitting an already existing device, but we have
> >> missed the case where the new device has more info that the previous one (linked devargs).
> >>
> > That is correct
> >
> >>> pci_find_device (start=0x0, cmp=0x4b92d8 <cmp_dev_name>,
> >>> data=0x55a4af8) at /root/dpdk-19.11/drivers/bus/pci/pci_common.c:426
> >>> 426                             return &pdev->device;
> >>> (gdb) p pdev
> >>> $15 = (struct rte_pci_device *) 0x54de5e0
> >>>
> >>> (gdb) p /x *pdev
> >>> $16 = {next = {tqe_next = 0x5307460, tqe_prev = 0x5539f80}, device =
> >>> {next = {tqe_next = 0x0, tqe_prev = 0x0}, name = 0x54e4f00, driver =
> >>> 0x3a6b7f0,
> >>>       bus = 0x3a59a00, numa_node = 0x0, devargs = 0x0}, addr = {domain =
> >>> 0x0, bus = 0x6, devid = 0x2, function = 0x0},
> >>>
> >>> Hope this explains why the pci_device has NULL devargs at this point
> >>> and how my fix to set it at this point helps solve the problem?
> >>>
> >>> Please let me know your thoughts
> >>>
> >>
> >> I think the proper fix is instead to have a clean update during the PCI scan.
> >> The proper way is to keep the old device, but override its fields as new info was found.
> >
> > Agreed
> >>
> >> Something like calling pci_name_set(dev2); line 359 maybe. BSD should also be updated for consistency.
> >>
> >> My issue with your patch is that I think this issue is very specific to the PCI bus and the capabilities
> >> of some devices on it. It would be better to have a fully compliant scan + probe ops considering
> >> the supported capabilities, instead of forcing this on all buses.
> > Sure, so i'm guessing you meant something like this ?
> >
> > dex 740a2cd..92a41c2 100644
> > --- a/drivers/bus/pci/linux/pci.c
> > +++ b/drivers/bus/pci/linux/pci.c
> > @@ -377,6 +377,8 @@
> >                                                   */
> >                                                  RTE_LOG(ERR, EAL,
> > "Unexpected device scan at %s!\n",
> >                                                          filename);
> > +                                       else if (dev2->device.devargs
> > != dev->device.devargs)
> > +                                               pci_name_set(dev2);
> >                                  }
> >                                  free(dev);
> >                          }
> >
> > Which is basically checking for devargs mismatch between the 'new'
> > device and a match with an
> > already probed device and  return the old device while adjusting the
> > new info (devargs)
> > Is that OK?
> >
> >>
> >> I'm wondering whether this can happen with an already existing devargs? If so, is it more correct to keep
> >> the new one, or ignore the probe, free the new devargs and report an error? If the former,
> >> please clean up properly the devargs using rte_devargs_remove() (before calling pci_name_set() of course).
>
> Hello Somnath,
>
> Yes, this is basically it, however you also need to take care of a potential already existing devargs here, because the PCI bus does not guarantee that all devargs of a device will be named the same
Thanks a lot Gaetan, have sent out V2 incorporating your suggestions
here.. Please ack


>
> .

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-02-14  6:46 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-04  9:15 [dpdk-dev] [PATCH] eal: fix to set the rte_device ptr's device args before hotplug Somnath Kotur
2020-02-04  9:49 ` Gaetan Rivet
2020-02-05  8:52   ` Somnath Kotur
2020-02-05  9:17     ` Gaetan Rivet
2020-02-06  4:06       ` Somnath Kotur
2020-02-07  5:52         ` Somnath Kotur
2020-02-10 14:34           ` Gaetan Rivet
2020-02-10 15:00             ` Gaetan Rivet
2020-02-11  4:33             ` Somnath Kotur
2020-02-11  8:38               ` Gaetan Rivet
2020-02-14  6:46                 ` Somnath Kotur

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).