From: Alan Beadle
Date: Sun, 5 Jan 2025 11:01:28 -0500
Subject: Re: Multiprocess App Problems with tx_burst
To: Dmitry Kozlyuk
Cc: users@dpdk.org
References: <20250104214032.04eb6d25@sovereign> <20250105010148.1ef26333@sovereign>
In-Reply-To: <20250105010148.1ef26333@sovereign>

> So, "deamon" and "server" may try using the same queue sometimes, correct?
> Synchronizing all access to the single queue should work in this case.

That is correct.

> BTW, rte_eth_tx_burst() returning >0 does not mean the packets have been sent.
> It only means they have been enqueued for sending.
> At some point the NIC will complete sending,
> only then the PMD can free the mbuf (or decrement its reference count).
> For most PMDs, this happens on a subsequent call to rte_eth_tx_burst().
> Which PMD and HW is it?

Here is the output of 'dpdk-devbind.py --status':

Network devices using DPDK-compatible driver
============================================
0000:65:00.1 'Ethernet Controller 10G X550T 1563' drv=vfio-pci unused=uio_pci_generic

> Have you tried to print as many stats as possible when rte_eth_tx_burst()
> can't consume all packets (rte_eth_stats_get(), rte_eth_xstats_get())?

In setting this up, I discovered that this error only occurs when the
primary process on the other host exits (due to an error) or is not
initially running (the NIC is "down" in this case?). It happens
consistently when I only launch the processes on one of the two
machines. ***But*** counterintuitively, it looks like packets are
successfully "sent" by the daemon until the other process begins to
run. In case it is useful, I summarize the stats for this case below.

Note that I am also seeing another error. Sometimes, rather than tx
failing, my app detects incorrect/corrupted mbuf contents and exits
immediately. It appears that mbufs are being re-allocated when they
should not be. I thought I had finally solved this (see my earlier
threads) but with multi-core concurrency this problem has returned. It
is very possible that this error is somewhere in my own library code,
as it looks like the accompanying non-DPDK structures are also being
corrupted (probably first).

For background, I maintain a hash table of header structs to track
individual mbufs. The sequence numbers in the headers should match
those contained in the mbuf's payload. This check is failing after a
few hundred successful data messages have been exchanged between the
hosts. The sequence number in the mbuf shows that it is in the wrong
hash bucket, and the sequence number in the header is a large
corrupted value which is out of range for my sequence numbers (and
also not matching the bucket).
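
To make the consistency check described above concrete, here is a minimal sketch in plain C. It is not the actual library code: the struct layout, the modulo bucket function, and the names NUM_BUCKETS, MAX_SEQ, msg_header, bucket_of and check_entry are all assumptions made for illustration. It only shows how the check can report which side looks corrupted (the header or the mbuf payload) rather than giving a single pass/fail result.

```c
#include <stdint.h>

/* Hypothetical values; the real table size and sequence range are not given. */
#define NUM_BUCKETS 1024u
#define MAX_SEQ     (1u << 24)   /* valid sequence numbers are < MAX_SEQ */

/* Hypothetical non-DPDK header kept in the hash table, one per tracked mbuf. */
struct msg_header {
    uint32_t seq;   /* sequence number recorded when the mbuf was tracked */
};

enum check_result {
    CHECK_OK,
    CHECK_HDR_OUT_OF_RANGE,   /* header seq is not a valid sequence number */
    CHECK_HDR_WRONG_BUCKET,   /* header seq does not hash to this bucket */
    CHECK_PAYLOAD_MISMATCH    /* header looks sane, payload seq differs */
};

/* Assumed hash: sequence number modulo the table size. */
static uint32_t bucket_of(uint32_t seq)
{
    return seq % NUM_BUCKETS;
}

/*
 * Validate one table entry. 'bucket' is the bucket the header was found in,
 * 'payload_seq' is the sequence number read from the mbuf payload.
 * The header is checked first, so header corruption is reported as such
 * even when the payload is also wrong.
 */
static enum check_result check_entry(const struct msg_header *hdr,
                                     uint32_t bucket, uint32_t payload_seq)
{
    if (hdr->seq >= MAX_SEQ)
        return CHECK_HDR_OUT_OF_RANGE;
    if (bucket_of(hdr->seq) != bucket)
        return CHECK_HDR_WRONG_BUCKET;
    if (payload_seq != hdr->seq)
        return CHECK_PAYLOAD_MISMATCH;
    return CHECK_OK;
}
```

Logging the specific result code together with the core ID at the point of failure may help narrow down whether the header table is being overwritten before the mbuf is reused.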

Back to the issue of failed tx bursts: Here are the stats I observed
after a packet failed to send from the daemon (after only launching
the primary+secondary processes on one of the machines). This failure
occurred after the daemon had successfully "sent" hundreds of
handshake packets (to nowhere, presumably?), and the failure occurred
as soon as the second process had finished initialization:

ipackets:0, opackets:0, ibytes:0, obytes:0, ierrors:0, oerrors:0
Got 146 xstats
Port:0, tx_q0_packets:1138
Port:0, tx_q0_bytes:125180
Port:0, mac_local_errors:2
Port:0, out_pkts_untagged:5

(All other stats had a value of 0 and are omitted).

I will continue investigating the corruption bug in the (likely) case
that it is in my library code. In the meantime please let me know if I
am using DPDK incorrectly.

Thank you again!
-Alan