DPDK patches and discussions
* [dpdk-dev] [rte_ethdev] mac_addrs as part of dev_private may cause primary process crash
@ 2020-03-16  8:47 胡林帆
From: 胡林帆 @ 2020-03-16  8:47 UTC
  To: dev

Hi all,


struct rte_eth_dev_data has a member named dev_private and another named mac_addrs, as shown below:


struct rte_eth_dev_data {
    ...
    void *dev_private;
    /**< PMD-specific private data.
     *   @see rte_eth_dev_release_port()
     */

    struct ether_addr *mac_addrs;
    /**< Device Ethernet link address.
     *   @see rte_eth_dev_release_port()
     */
    ...
};


Some drivers, such as mlx5, store the MAC address table inside dev_private and point mac_addrs at it:


static struct rte_eth_dev *
mlx5_dev_spawn(struct rte_device *dpdk_dev,
      struct ibv_device *ibv_dev,
      struct mlx5_dev_config config,
      const struct mlx5_switch_info *switch_info)
{
...
    eth_dev->data->mac_addrs = priv->mac;
...
}


I don't think this is good practice, because it can cause problems when dev_private and/or mac_addrs are freed, even though the drivers carry a comment such as:


/* mac_addrs must not be freed because part of dev_private */
eth_dev->data->mac_addrs = NULL;
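To make the ownership problem concrete, here is a minimal C sketch built on hypothetical mock types; mock_priv, mock_dev_data, mock_spawn(), mock_release_port() and mock_close() are illustrative names, not DPDK APIs. It assumes, as the `@see rte_eth_dev_release_port()` comments suggest, that the generic release path frees dev_private and mac_addrs independently, so a driver that points mac_addrs into dev_private has to clear that pointer before the release runs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for the ethdev structures. */
struct mock_ether_addr {
    unsigned char addr_bytes[6];
};

struct mock_priv {
    int driver_state;
    struct mock_ether_addr mac[4];      /* MAC table embedded in the private area */
};

struct mock_dev_data {
    void *dev_private;
    struct mock_ether_addr *mac_addrs;
};

/* Models a generic release path that frees both pointers independently. */
static void mock_release_port(struct mock_dev_data *data)
{
    free(data->mac_addrs);              /* free(NULL) is a no-op */
    free(data->dev_private);
    memset(data, 0, sizeof(*data));
}

/* Driver-style setup: mac_addrs aliases memory inside dev_private. */
static int mock_spawn(struct mock_dev_data *data)
{
    struct mock_priv *priv = calloc(1, sizeof(*priv));

    if (priv == NULL)
        return -1;
    data->dev_private = priv;
    data->mac_addrs = priv->mac;
    return 0;
}

/* Driver-style teardown: drop the alias first, otherwise the release path
 * would call free() on a pointer into the middle of priv (undefined behavior). */
static void mock_close(struct mock_dev_data *data)
{
    /* sanity check: the alias must point into the private area */
    assert(data->dev_private != NULL &&
           (void *)data->mac_addrs ==
           (void *)((struct mock_priv *)data->dev_private)->mac);
    data->mac_addrs = NULL;
    mock_release_port(data);
}
```

Calling mock_spawn() and then mock_close() releases the mock port cleanly; removing the `data->mac_addrs = NULL` line would pass an interior pointer of the private area to free().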


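Clearing mac_addrs this way is harmless when everything happens in the primary process. But if this cleanup runs in a secondary process, it crashes the primary: eth_dev->data points into struct rte_eth_dev_data, which is shared between the primary and secondary processes, so the secondary's write also clears the primary's mac_addrs pointer.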
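The following C sketch isolates that visibility effect, since eth_dev->data is shared across processes. It is an illustration under stated assumptions, not DPDK code: fork() plus an anonymous MAP_SHARED mapping stands in for the memory that DPDK primary and secondary processes share (DPDK itself maps its shared memory at the same virtual addresses in independently started processes), and struct shared_port_data and secondary_clear_is_visible() are hypothetical names. The child plays the secondary whose error path clears mac_addrs; the parent, playing the primary, then finds the pointer NULL.

```c
#define _DEFAULT_SOURCE                 /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a port's entry in the shared ethdev data. */
struct shared_port_data {
    unsigned char mac[6];
    unsigned char *mac_addrs;           /* same address in both processes */
};

/*
 * Returns 1 if the parent ("primary") sees mac_addrs == NULL after the
 * child ("secondary") clears it, 0 if it does not, -1 on error.
 */
static int secondary_clear_is_visible(void)
{
    struct shared_port_data *data;
    pid_t pid;
    int status, cleared;

    data = mmap(NULL, sizeof(*data), PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (data == MAP_FAILED)
        return -1;
    data->mac_addrs = data->mac;        /* "primary" sets up the pointer */

    pid = fork();
    if (pid < 0) {
        munmap(data, sizeof(*data));
        return -1;
    }
    if (pid == 0) {
        /* "secondary": failed probe, error path clears the shared field */
        data->mac_addrs = NULL;
        _exit(0);
    }

    if (waitpid(pid, &status, 0) != pid) {
        munmap(data, sizeof(*data));
        return -1;
    }
    /* the primary's next reader would dereference this pointer */
    cleared = (data->mac_addrs == NULL);
    munmap(data, sizeof(*data));
    return cleared;
}
```

On Linux, secondary_clear_is_visible() returns 1: the parent observes the child's write, which is the same kind of state rte_eth_macaddr_get() runs into in the primary's backtrace.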


In my test environment, two Mellanox ports are bonded in 802.3ad (LACP) mode, and my DPDK application has rte_pdump enabled. After starting the application, I launched dpdk-pdump to capture packets, and the application crashed.


Here is the backtrace of my application (the primary process):


(gdb) bt
#0  0x00005555556a477b in rte_eth_macaddr_get ()
#1  0x0000555555699cbc in bond_mode_8023ad_periodic_cb ()
#2  0x00005555556d836c in eal_alarm_callback ()
#3  0x00005555556d68a2 in eal_intr_thread_main ()
#4  0x00007ffff67744a4 in start_thread (arg=0x7ffff5d3a700) at pthread_create.c:456
#5  0x00007ffff62abd0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97


Here is the dpdk-pdump (secondary process) session under gdb:


(gdb) r
Starting program: /home/hulinfan/dpdk-stable-18.11.2/build/app/dpdk-pdump -- --pdump 'port=0,queue=*,rx-dev=eth7,tx-dev=eth7'
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
EAL: Detected 40 lcore(s)
EAL: Detected 2 NUMA nodes
pathname for rte_mem_config: /var/run/dpdk/rte/config
[New Thread 0x7ffff61d3700 (LWP 5508)]
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_5507_a3efe2a87b368c
[New Thread 0x7ffff59d2700 (LWP 5509)]
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: PCI device 0000:01:00.0 on NUMA socket 0
EAL:   probe driver: 8086:10fb net_ixgbe
EAL: PCI device 0000:01:00.1 on NUMA socket 0
EAL:   probe driver: 8086:10fb net_ixgbe
EAL: PCI device 0000:04:00.0 on NUMA socket 0
EAL:   probe driver: 15b3:1015 net_mlx5
net_mlx5: port 0 UAR address 0x7ffef4538000 size 4294967296 occupied, please adjust MLX5_UAR_OFFSET or try EAL parameter --base-virtaddr


Thread 1 "dpdk-pdump" hit Breakpoint 4, mlx5_dev_spawn (dpdk_dev=dpdk_dev@entry=0x5555563d84e0, ibv_dev=<optimized out>, config=..., switch_info=switch_info@entry=0x7fffffffe5a8)
    at /home/hulinfan/dpdk-stable-18.11.2/drivers/net/mlx5/mlx5.c:1279
1279		eth_dev->data->mac_addrs = NULL;
(gdb) l
1274	}


Let me try to explain:
1) The secondary process (e.g. dpdk-pdump) initializes the EAL with rte_eal_init();
2) rte_eal_init() probes all buses and devices/drivers via rte_bus_probe();
3) The PCI probe handler mlx5_pci_probe() is called;
4) mlx5_dev_spawn() is called;
5) mlx5_uar_init_secondary() is called and fails, as reported by the net_mlx5 "UAR address ... occupied" message in the log above;
6) The error path cleans up resources and sets eth_dev->data->mac_addrs = NULL;
7) The primary's alarm callback bond_mode_8023ad_periodic_cb() then tries to read the MAC address with rte_eth_macaddr_get(), which dereferences the now-NULL mac_addrs, and the primary crashes.
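One possible direction for a fix, sketched with hypothetical mock types (this is not DPDK's API and not necessarily how upstream resolved the issue): only the primary process modifies the shared port data on the error path, and the getter checks for a NULL table before copying. mock_spawn_error_cleanup(), mock_macaddr_get(), the mock_proc_type enum and the -ENODEV return value are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

enum mock_proc_type { MOCK_PROC_PRIMARY, MOCK_PROC_SECONDARY };

struct mock_ether_addr {
    unsigned char addr_bytes[6];
};

/* Hypothetical port data living in memory shared by all processes. */
struct mock_dev_data {
    void *dev_private;
    struct mock_ether_addr *mac_addrs;
};

/* Hypothetical error-path cleanup: only the primary owns the shared
 * port data, so a secondary leaves mac_addrs untouched. */
static void mock_spawn_error_cleanup(struct mock_dev_data *shared,
                                     enum mock_proc_type proc)
{
    if (proc == MOCK_PROC_PRIMARY)
        shared->mac_addrs = NULL;       /* part of dev_private, not freed alone */
    /* a secondary would only drop its process-local state here */
}

/* Hypothetical getter that refuses to dereference a NULL table
 * instead of dereferencing it unconditionally. */
static int mock_macaddr_get(const struct mock_dev_data *shared,
                            struct mock_ether_addr *out)
{
    if (shared->mac_addrs == NULL)
        return -ENODEV;
    memcpy(out, &shared->mac_addrs[0], sizeof(*out));
    return 0;
}
```

With this split, a failed probe in dpdk-pdump would leave the primary's mac_addrs intact, and even if the table were cleared, a reader such as the 802.3ad periodic callback would get an error code rather than a segmentation fault.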


Please confirm whether the following drivers have the same bug:


---- part of dev_private Matches (12 in 9 files) ----
fs_eth_dev_create in failsafe.c (drivers\net\failsafe) : /* mac_addrs must not be freed alone because part of dev_private */
fs_rte_eth_free in failsafe.c (drivers\net\failsafe) : /* mac_addrs must not be freed alone because part of dev_private */
eth_dev_vmbus_release in hn_ethdev.c (drivers\net\netvsc) : /* mac_addrs must not be freed alone because part of dev_private */
mlx4_pci_probe in mlx4.c (drivers\net\mlx4) : /* mac_addrs must not be freed because part of dev_private */
mlx5_dev_close in mlx5.c (drivers\net\mlx5) : * rte_eth_dev_release_port(). mac_addrs is part of dev_private so
mlx5_dev_spawn in mlx5.c (drivers\net\mlx5) : /* mac_addrs must not be freed alone because part of dev_private */
rte_pmd_af_packet_remove in rte_eth_af_packet.c (drivers\net\af_packet) : /* mac_addrs must not be freed alone because part of dev_private */
eth_kni_remove in rte_eth_kni.c (drivers\net\kni) : /* mac_addrs must not be freed alone because part of dev_private */
rte_pmd_null_remove in rte_eth_null.c (drivers\net\null) : /* mac_addrs must not be freed alone because part of dev_private */
rte_pmd_ring_remove in rte_eth_ring.c (drivers\net\ring) : /* mac_addrs must not be freed alone because part of dev_private */
eth_dev_tap_create in rte_eth_tap.c (drivers\net\tap) : /* mac_addrs must not be freed alone because part of dev_private */
rte_pmd_tap_remove in rte_eth_tap.c (drivers\net\tap) : /* mac_addrs must not be freed alone because part of dev_private */




Linfan Hu
zhongdahulinfan@163.com
