From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 08 Jun 2020 00:56:17 +0200
From: Alex Kiselev
To: Cliff Burdick
Cc: Stephen Hemminger, users
Subject: Re: [dpdk-users] segmentation fault while accessing mbuf
Message-ID: <35e42a78fa4485ed1993da352d519c0b@therouter.net>
References: <504fcb6e5a12a03035e7b55507e7c279@therouter.net>
 <20200601091729.03ea9e50@hermes.lan>
 <7DA537F2-9887-4B0A-9249-064736E8A9AD@therouter.net>
 <5e91c3aa80e354241b03b908f5529d6b@therouter.net>
 <1c5e65d6b5e388ac0b5c190b4084b53e@therouter.net>
List-Id: DPDK usage discussions

On 2020-06-07 20:11, Cliff Burdick wrote:
> I don't think so since they're completely independent mempools.

They are not. Just think of a typical middle box: you receive a packet,
alter some headers, and send it back. It's the same mbuf that goes from
an rx queue to a tx queue.

> I also didn't think the mtu function actually has anything to do with
> prepping the card for the mbuf size you want; that's typically done in
> rte_eth_dev_configure inside of eth_conf in rx_mode and tx_mode.
> I would have to look at the code to confirm, but also check what
> you're setting these structures to.
>
> On Sun, Jun 7, 2020, 10:11 Alex Kiselev wrote:
>
>> On 2020-06-07 17:21, Cliff Burdick wrote:
>>> The mbuf pool should be configured to be the size of the largest
>>> packet you expect to receive. If you're getting packets longer than
>>> that, I would expect you to see problems. Same goes for
>>> transmitting; I believe it will just read past the end of the mbuf
>>> data.
>>
>> I am using the rte_eth_dev_set_mtu() call with an MTU value that is
>> consistent with the mbuf size. Therefore I believe I don't have any
>> overflow bugs in the RX code.
>>
>> And I've found a couple of bugs in the TX code. Both of them have to
>> do with incorrect use of the mbuf pkt_len/data_len fields.
>>
>> But the crash happened while receiving packets, that's why I am
>> wondering: could the bugs I found in the TX code cause the crash
>> in RX?
>>
>>> On Sun, Jun 7, 2020, 06:36 Alex Kiselev wrote:
>>>
>>>> On 2020-06-07 15:16, Cliff Burdick wrote:
>>>>> That shouldn't matter. The mbuf size is allocated when you create
>>>>> the mempool, and data_len/pkt_len are just to specify the size of
>>>>> the total packet and each segment. The underlying storage size is
>>>>> still the same.
>>>>
>>>> It does matter. I've done some tests, and after sending a few
>>>> mbufs with data_len/pkt_len bigger than the size of the mbuf's
>>>> underlying buffer, the app stops sending/receiving packets. The
>>>> PMD apparently goes beyond the mbuf's buffer, which is why I still
>>>> think that my question about the impact of using incorrect
>>>> data_len/pkt_len is valid.
>>>>
>>>>> Have you checked to see if it's potentially a hugepage issue?
>>>>
>>>> Please, explain.
>>>>
>>>> The app had been working for two months before the crash and the
>>>> load was 3-4 gbit/s, so no, I don't think that something is wrong
>>>> with hugepages on that machine.
>>>>
>>>>> On Sun, Jun 7, 2020, 02:59 Alex Kiselev wrote:
>>>>>
>>>>>> On 2020-06-07 04:41, Cliff Burdick wrote:
>>>>>>> I can't tell from your code, but you assigned nb_rx to the
>>>>>>> number of packets received, but then used vec_size, which might
>>>>>>> be larger. Does this happen if you use nb_rx in your loops?
>>>>>>
>>>>>> No, this doesn't happen.
>>>>>> I just skipped the part of the code that translates nb_rx to
>>>>>> vec_size, since that code is double-checked.
>>>>>>
>>>>>> My actual question now is about the possible impact of using
>>>>>> incorrect values of the mbuf's pkt_len and data_len fields.
>>>>>>
>>>>>>> On Sat, Jun 6, 2020 at 5:59 AM Alex Kiselev wrote:
>>>>>>>
>>>>>>>> On 1 June 2020, at 19:17, Stephen Hemminger wrote:
>>>>>>>>>
>>>>>>>>> On Mon, 01 Jun 2020 15:24:25 +0200
>>>>>>>>> Alex Kiselev wrote:
>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> I've got a segmentation fault error in my data plane path.
>>>>>>>>>> I am pretty sure the code where the segfault happened is ok,
>>>>>>>>>> so my guess is that I somehow received a corrupted mbuf.
>>>>>>>>>> How could I troubleshoot this? Is there any way?
>>>>>>>>>> Is it possible that other threads of the application
>>>>>>>>>> corrupted that mbuf?
>>>>>>>>>>
>>>>>>>>>> I would really appreciate any advice.
>>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>> DPDK 18.11.3
>>>>>>>>>> NIC: 82599ES
>>>>>>>>>>
>>>>>>>>>> Code:
>>>>>>>>>>
>>>>>>>>>> nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts_burst,
>>>>>>>>>>                          MAX_PKT_BURST);
>>>>>>>>>>
>>>>>>>>>> ...
>>>>>>>>>>
>>>>>>>>>> for (i = 0; i < vec_size; i++) {
>>>>>>>>>>     rte_prefetch0(rte_pktmbuf_mtod(m_v[i], void *));
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> for (i = 0; i < vec_size; i++) {
>>>>>>>>>>     m = m_v[i];
>>>>>>>>>>     eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
>>>>>>>>>>     eth_type = rte_be_to_cpu_16(eth_hdr->ether_type); <--- Segmentation fault
>>>>>>>>>> ...
>>>>>>>>>>
>>>>>>>>>> #0 rte_arch_bswap16 (_x=<error reading variable: Cannot
>>>>>>>>>> access memory at address 0x4d80000000053010>)
>>>>>>>>>
>>>>>>>>> Build with as many of the debug options turned on in the DPDK
>>>>>>>>> config, and build with EXTRA_CFLAGS of -g.
>>>>>>>>
>>>>>>>> Could using an incorrect (a very big one) value of the mbuf
>>>>>>>> pkt_len and data_len while transmitting cause mbuf corruption
>>>>>>>> and a following segmentation fault on rx?
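
P.S. To make the pkt_len/data_len question above concrete, here is a
minimal TX-side sketch. It is illustrative only: the function and
variable names are not from the code discussed in this thread, and a
single-segment mbuf is assumed. The point is that letting data_len
exceed the buffer's real capacity is exactly what lets a PMD read past
the end of the mbuf.

#include <string.h>
#include <rte_mbuf.h>
#include <rte_ethdev.h>

/* Illustrative helper (names are hypothetical): copy a payload into a
 * freshly allocated single-segment mbuf and transmit it, refusing
 * payloads that do not fit into the data room. rte_pktmbuf_append()
 * updates both data_len and pkt_len, so the two fields can never
 * disagree with the real buffer size. */
static int
tx_one_packet(struct rte_mempool *mp, uint16_t port_id, uint16_t queue_id,
              const void *payload, uint16_t payload_len)
{
        struct rte_mbuf *m = rte_pktmbuf_alloc(mp);
        void *dst;

        if (m == NULL)
                return -1;

        dst = rte_pktmbuf_append(m, payload_len);
        if (dst == NULL) {                /* payload larger than tailroom */
                rte_pktmbuf_free(m);
                return -1;
        }
        memcpy(dst, payload, payload_len);

        if (rte_eth_tx_burst(port_id, queue_id, &m, 1) != 1) {
                rte_pktmbuf_free(m);      /* not accepted by the PMD */
                return -1;
        }
        return 0;
}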
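
On the RX side, a similar sketch (again illustrative, not the code from
the backtrace above) that bounds both loops strictly by nb_rx, as Cliff
suggested, and drops runt packets before dereferencing the Ethernet
header. The MAX_PKT_BURST value is assumed; the thread does not show it.

#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_ether.h>
#include <rte_byteorder.h>
#include <rte_prefetch.h>

#define MAX_PKT_BURST 32    /* assumed value, not from the original code */

static void
rx_loop_once(uint16_t port_id, uint16_t queue_id)
{
        struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
        uint16_t nb_rx, i;

        nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts_burst,
                                 MAX_PKT_BURST);

        /* Prefetch only the mbufs the PMD actually returned. */
        for (i = 0; i < nb_rx; i++)
                rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i], void *));

        for (i = 0; i < nb_rx; i++) {
                struct rte_mbuf *m = pkts_burst[i];
                struct ether_hdr *eth_hdr;
                uint16_t eth_type;

                /* Too short to contain an Ethernet header: drop it. */
                if (rte_pktmbuf_data_len(m) < sizeof(struct ether_hdr)) {
                        rte_pktmbuf_free(m);
                        continue;
                }

                eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
                eth_type = rte_be_to_cpu_16(eth_hdr->ether_type);

                /* ... dispatch on eth_type, then forward or free ... */
                (void)eth_type;
                rte_pktmbuf_free(m);
        }
}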
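
Finally, on keeping the MTU consistent with the mbuf size: a small
sketch of checking that the mempool's data room can actually hold an
MTU-sized frame before calling rte_eth_dev_set_mtu(). The assumptions
here are mine: single-segment RX without scatter, and the usual
Ethernet header plus CRC overhead on top of the MTU.

#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_ether.h>

/* Illustrative check (function name is hypothetical): make sure one
 * mbuf can hold a full MTU-sized frame, L2 header and CRC included,
 * when RX scatter is not used. */
static int
set_mtu_checked(uint16_t port_id, uint16_t mtu, struct rte_mempool *mp)
{
        uint16_t data_room = rte_pktmbuf_data_room_size(mp);
        uint32_t max_frame = (uint32_t)mtu + ETHER_HDR_LEN + ETHER_CRC_LEN;

        if (data_room < RTE_PKTMBUF_HEADROOM ||
            max_frame > (uint32_t)(data_room - RTE_PKTMBUF_HEADROOM))
                return -1;   /* frames of this MTU would not fit in one mbuf */

        return rte_eth_dev_set_mtu(port_id, mtu);
}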