DPDK patches and discussions
From: "junwang01@cestc.cn" <junwang01@cestc.cn>
To: "Stephen Hemminger" <stephen@networkplumber.org>
Cc: dev <dev@dpdk.org>
Subject: Re: Re: dumpcap coredump for 82599 NIC
Date: Thu, 14 Mar 2024 17:22:37 +0800
Message-ID: <2024031417223703681414@cestc.cn>
In-Reply-To: <20240313092953.517ac6c7@hermes.local>

Yes, I think you are right. After adding some debug output, I can confirm that this is probably an initialization issue in the ixgbe driver.
Secondary processes need to initialize some callback functions, but that initialization appears to be missing.

As an experiment, I moved the ixgbe_init_shared_code(hw) call ahead of the secondary-process early return in eth_ixgbe_dev_init().
This changed the behavior, but a core dump still occurred.
I suspect that there are other issues, or that this modification is not appropriate.
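
To make the ordering problem concrete, here is a minimal, self-contained sketch. It is not the ixgbe code: struct hw, struct mac_ops, init_shared_code(), dev_init() and link_status() are hypothetical stand-ins, and the sketch assumes the callback table is private to each process. It only shows that an early return for secondary processes skips the callback setup unless the setup runs first.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; none of these names are DPDK APIs. */
struct mac_ops {
	int (*check_link)(bool *link_up);
};

/* Assumption: this state is per-process, so each process must fill it in. */
struct hw {
	struct mac_ops ops;
};

enum proc_type { PROC_PRIMARY, PROC_SECONDARY };

static int demo_check_link(bool *link_up)
{
	*link_up = true;
	return 0;
}

/* Analogue of the shared-code init: installs the callbacks for this process. */
static int init_shared_code(struct hw *hw)
{
	hw->ops.check_link = demo_check_link;
	return 0;
}

/*
 * Analogue of the device init path. With init_early == false the callbacks
 * are installed only after the secondary-process early return, so a
 * secondary never gets them. With init_early == true they are installed
 * before the early return, as in the diff in this mail.
 */
static int dev_init(struct hw *hw, enum proc_type type, bool init_early)
{
	if (init_early && init_shared_code(hw) != 0)
		return -1;

	if (type == PROC_SECONDARY)
		return 0; /* secondary stops here */

	if (!init_early && init_shared_code(hw) != 0)
		return -1;

	return 0;
}

/* Defensive caller: reports an error instead of jumping through NULL. */
static int link_status(const struct hw *hw, bool *link_up)
{
	if (hw->ops.check_link == NULL)
		return -1;
	return hw->ops.check_link(link_up);
}
```

With init_early set to false, a secondary ends up with a NULL check_link and link_status() returns -1; with init_early set to true, the callback is present and the call succeeds. One caveat about the per-process assumption: if the real hw structure lives in memory shared with the primary, writing function pointers from a secondary would also change what the primary sees, which may be one reason this kind of reordering is not sufficient on its own.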

[root@xc03-compute3 /]# /dpdk/app/dpdk-dumpcap -i 0000:18:00.0
mlx5_net: Cannot attach mlx5 shared data
mlx5_net: Unable to init PMD global data: No such file or directory
mlx5_common: Failed to load driver mlx5_eth
EAL: Requested device 0000:3b:00.0 cannot be used
mlx5_net: Cannot attach mlx5 shared data
mlx5_net: Unable to init PMD global data: No such file or directory
mlx5_common: Failed to load driver mlx5_eth
EAL: Requested device 0000:3b:00.1 cannot be used
File: /tmp/dpdk-dumpcap_0_0000:18:00.0_20240314091910.pcapng
Capturing on '0000:18:00.0'
Packets captured: 2 Primary process is no longer active, exiting...
EAL: Fail to recv reply for request /var/run/dpdk/rte/mp_socket:mp_pdump
pdump_prepare_client_request(): client request for pdump enable/disable failed
Floating point exception (core dumped)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index d6cf00317e77b64f9822c155115f388ae62241eb..0bf885d7eaba3689fb9b98cdcaa6a928aa787985 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -1104,6 +1104,24 @@ eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev, void *init_params __rte_unused)
 	eth_dev->tx_pkt_burst = &ixgbe_xmit_pkts;
 	eth_dev->tx_pkt_prepare = &ixgbe_prep_pkts;
 
+	/* Vendor and Device ID need to be set before init of shared code */
+	hw->device_id = pci_dev->id.device_id;
+	hw->vendor_id = pci_dev->id.vendor_id;
+	hw->hw_addr = (void *)pci_dev->mem_resource[0].addr;
+	hw->allow_unsupported_sfp = 1;
+
+	/* Initialize the shared code (base driver) */
+#ifdef RTE_LIBRTE_IXGBE_BYPASS
+	diag = ixgbe_bypass_init_shared_code(hw);
+#else
+	diag = ixgbe_init_shared_code(hw);
+#endif /* RTE_LIBRTE_IXGBE_BYPASS */
+
+	if (diag != IXGBE_SUCCESS) {
+		PMD_INIT_LOG(ERR, "Shared code init failed: %d", diag);
+		return -EIO;
+	}
+
 	/*
 	 * For secondary processes, we don't initialise any further as primary
 	 * has already done this work. Only check we don't need a different
@@ -1135,24 +1153,6 @@ eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev, void *init_params __rte_unused)
 	rte_eth_copy_pci_info(eth_dev, pci_dev);
 	eth_dev->data->dev_flags |= RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS;
 
-	/* Vendor and Device ID need to be set before init of shared code */
-	hw->device_id = pci_dev->id.device_id;
-	hw->vendor_id = pci_dev->id.vendor_id;
-	hw->hw_addr = (void *)pci_dev->mem_resource[0].addr;
-	hw->allow_unsupported_sfp = 1;
-
-	/* Initialize the shared code (base driver) */
-#ifdef RTE_LIBRTE_IXGBE_BYPASS
-	diag = ixgbe_bypass_init_shared_code(hw);
-#else
-	diag = ixgbe_init_shared_code(hw);
-#endif /* RTE_LIBRTE_IXGBE_BYPASS */
-
-	if (diag != IXGBE_SUCCESS) {
-		PMD_INIT_LOG(ERR, "Shared code init failed: %d", diag);
-		return -EIO;
-	}
-
 	if (hw->mac.ops.fw_recovery_mode && hw->mac.ops.fw_recovery_mode(hw)) {
 		PMD_INIT_LOG(ERR, "\nERROR: "
 			"Firmware recovery mode detected. Limiting functionality.\n"


Additionally, I'm using a debug build, but the printed call stack is still not clear, which is strange. The build commands were:

    meson  -Dc_args="-mno-avx512f" -Ddisable_drivers=net/ark,net/atlantic,net/avp,net/axgbe,net/pfe,net/netvsc -Dmax_numa_nodes=8 -Dmax_ethports=128 --buildtype=debug --optimization=0 build 
    ninja -C build install




junwang01@cestc.cn

From: Stephen Hemminger
Date: 2024-03-14 00:29
To: junwang01@cestc.cn
CC: dev
Subject: Re: dumpcap coredump for 82599 NIC
On Wed, 13 Mar 2024 10:00:17 +0800
"junwang01@cestc.cn" <junwang01@cestc.cn> wrote:

> Hi, when I use dumpcap to capture packets on the 82599 network card, a core dump occurs.
> The network card bound to ovs-dpdk is an 82599; capturing on other, non-82599 network cards (Mellanox CX5/CX6 or Intel E810) works normally.
> The DPDK version I am using is 22.11.1. The call stack looks strange, so I am asking for help.
>
> I thought a newer DPDK version might fix it, so I upgraded to 23.11, but the problem persists; the call stack is different and even stranger.
>
> junwang01@cestc.cn

This is not an issue with dumpcap. The problem is in ixgbe driver.
Some part of the code for checking link status is not safe to be called in
secondary process.

The backtrace looks a bit messed up, since ixgbe driver should not be calling i40e code.
Maybe do a debug build (so more complete symbols available).
