From: Cliff Burdick
Date: Sun, 7 Jun 2020 11:11:41 -0700
To: Alex Kiselev
Cc: Stephen Hemminger, users <users@dpdk.org>
Subject: Re: [dpdk-users] segmentation fault while accessing mbuf

I don't think so, since they're completely independent mempools. I also
don't think the MTU function actually has anything to do with prepping
the card for the mbuf size you want; I believe that's typically done in
rte_eth_dev_configure(), inside eth_conf's rx_mode and tx_mode. I would
have to look at the code to confirm, but also check what you're setting
these structures to.
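
(A minimal sketch of how those pieces usually fit together on 18.11: the
mempool's data room bounds what a single mbuf can hold, and
rxmode.max_rx_pkt_len in rte_eth_conf tells the port how large a frame to
accept. All names and sizes below are illustrative, not taken from the
application under discussion.)

    #include <rte_ethdev.h>
    #include <rte_lcore.h>
    #include <rte_mbuf.h>

    /* Illustrative sizes: the mempool data room bounds what one mbuf can
     * hold, and max_rx_pkt_len must not exceed it (minus headroom). */
    #define MBUF_DATA_ROOM (RTE_PKTMBUF_HEADROOM + 2048)
    #define MAX_RX_PKT_LEN 2048

    static int
    setup_rx_port(uint16_t port_id)
    {
        struct rte_mempool *pool;
        struct rte_eth_conf conf = { 0 };

        pool = rte_pktmbuf_pool_create("rx_pool", 8192, 256, 0,
                                       MBUF_DATA_ROOM, rte_socket_id());
        if (pool == NULL)
            return -1;

        /* In 18.11 max_rx_pkt_len is honoured when the JUMBO_FRAME
         * offload is enabled; otherwise the PMD's default limit applies. */
        conf.rxmode.max_rx_pkt_len = MAX_RX_PKT_LEN;

        if (rte_eth_dev_configure(port_id, 1, 1, &conf) < 0)
            return -1;

        return rte_eth_rx_queue_setup(port_id, 0, 512,
                                      rte_eth_dev_socket_id(port_id),
                                      NULL, pool);
    }
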
On Sun, Jun 7, 2020, 10:11 Alex Kiselev wrote:

> On 2020-06-07 17:21, Cliff Burdick wrote:
> > The mbuf pool should be configured for the size of the largest packet
> > you expect to receive. If you're getting packets longer than that, I
> > would expect you to see problems. Same goes for transmitting; I
> > believe it will just read past the end of the mbuf data.
>
> I am using the rte_eth_dev_set_mtu() call with an MTU value that is
> consistent with the mbuf size, therefore I believe I don't have any
> overflow bugs in the RX code.
>
> And I've found a couple of bugs in the TX code. Both of them have to do
> with incorrect use of the mbuf pkt_len/data_len fields.
>
> But the crash happened while receiving packets, which is why I am
> wondering: could the bugs I found in the TX code cause the crash in RX?
>
> > On Sun, Jun 7, 2020, 06:36 Alex Kiselev wrote:
> >
> >> On 2020-06-07 15:16, Cliff Burdick wrote:
> >>> That shouldn't matter. The mbuf size is allocated when you create
> >>> the mempool, and data_len/pkt_len are just there to specify the
> >>> size of the total packet and each segment. The underlying storage
> >>> size is still the same.
> >>
> >> It does matter. I've done some tests, and after sending a few mbufs
> >> with data_len/pkt_len bigger than the size of the mbuf's underlying
> >> buffer, the app stops sending/receiving packets. The PMD apparently
> >> goes beyond the mbuf's buffer, which is why I still think my
> >> question about the impact of using incorrect data_len/pkt_len is
> >> valid.
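
(A side note on the data_len/pkt_len point above: for a single-segment
mbuf the two fields should be equal and must never exceed the buffer's
data room. The hypothetical helper below uses rte_pktmbuf_append(), which
updates both fields together and fails instead of overflowing when the
payload does not fit, which avoids exactly that class of TX bug.)

    #include <string.h>
    #include <rte_mbuf.h>

    /* Illustrative helper: copy 'len' bytes into a fresh single-segment
     * mbuf. rte_pktmbuf_append() bumps data_len and pkt_len together and
     * returns NULL if 'len' exceeds the tailroom, so the fields can never
     * describe more data than the underlying buffer holds. */
    static struct rte_mbuf *
    build_tx_mbuf(struct rte_mempool *mp, const void *payload, uint16_t len)
    {
        struct rte_mbuf *m = rte_pktmbuf_alloc(mp);
        if (m == NULL)
            return NULL;

        char *dst = rte_pktmbuf_append(m, len);
        if (dst == NULL) {          /* payload larger than the data room */
            rte_pktmbuf_free(m);
            return NULL;
        }

        memcpy(dst, payload, len);
        /* Here m->data_len == m->pkt_len == len. */
        return m;
    }
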
> >>
> >>> Have you checked to see if it's potentially a hugepage issue?
> >>
> >> Please, explain.
> >>
> >> The app had been working for two months before the crash, and the
> >> load was 3-4 Gbit/s, so no, I don't think anything is wrong with
> >> hugepages on that machine.
> >>
> >>> On Sun, Jun 7, 2020, 02:59 Alex Kiselev wrote:
> >>>
> >>>> On 2020-06-07 04:41, Cliff Burdick wrote:
> >>>>> I can't tell from your code, but you assigned nb_rx to the number
> >>>>> of packets received, but then used vec_size, which might be
> >>>>> larger. Does this happen if you use nb_rx in your loops?
> >>>>
> >>>> No, this doesn't happen.
> >>>> I just skipped the part of the code that translates nb_rx to
> >>>> vec_size, since that code has been double-checked.
> >>>>
> >>>> My actual question now is about the possible impact of using
> >>>> incorrect values in the mbuf pkt_len and data_len fields.
> >>>>
> >>>>> On Sat, Jun 6, 2020 at 5:59 AM Alex Kiselev wrote:
> >>>>>
> >>>>>>> On 1 June 2020, at 19:17, Stephen Hemminger wrote:
> >>>>>>>
> >>>>>>> On Mon, 01 Jun 2020 15:24:25 +0200
> >>>>>>> Alex Kiselev wrote:
> >>>>>>>
> >>>>>>>> Hello,
> >>>>>>>>
> >>>>>>>> I've got a segmentation fault error in my data plane path.
> >>>>>>>> I am pretty sure the code where the segfault happened is ok,
> >>>>>>>> so my guess is that I somehow received a corrupted mbuf.
> >>>>>>>> How could I troubleshoot this? Is there any way?
> >>>>>>>> Is it possible that other threads of the application
> >>>>>>>> corrupted that mbuf?
> >>>>>>>>
> >>>>>>>> I would really appreciate any advice.
> >>>>>>>> Thanks.
> >>>>>>>>
> >>>>>>>> DPDK 18.11.3
> >>>>>>>> NIC: 82599ES
> >>>>>>>>
> >>>>>>>> Code:
> >>>>>>>>
> >>>>>>>> nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts_burst,
> >>>>>>>>                          MAX_PKT_BURST);
> >>>>>>>>
> >>>>>>>> ...
> >>>>>>>>
> >>>>>>>> for (i = 0; i < vec_size; i++)
> >>>>>>>>     rte_prefetch0(rte_pktmbuf_mtod(m_v[i], void *));
> >>>>>>>>
> >>>>>>>> for (i = 0; i < vec_size; i++) {
> >>>>>>>>     m = m_v[i];
> >>>>>>>>     eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
> >>>>>>>>     eth_type = rte_be_to_cpu_16(eth_hdr->ether_type); <--- Segmentation fault
> >>>>>>>>     ...
> >>>>>>>>
> >>>>>>>> #0 rte_arch_bswap16 (_x=<error reading variable: Cannot access
> >>>>>>>> memory at address 0x4d80000000053010>)
> >>>>>>>
> >>>>>>> Build with as many of the debug options turned on in the DPDK
> >>>>>>> config as you can, and build with EXTRA_CFLAGS of -g.
> >>>>>>
> >>>>>> Could using an incorrect (very large) value in the mbuf pkt_len
> >>>>>> and data_len fields while transmitting cause mbuf corruption and
> >>>>>> the subsequent segmentation fault on RX?
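
(To close the loop on the quoted RX code and the debugging advice, here
is a sketch of the same kind of burst loop, bounded by nb_rx and
instrumented with rte_mbuf_sanity_check(); the surrounding setup and
names are assumed, not taken from the original application.)

    #include <rte_byteorder.h>
    #include <rte_ethdev.h>
    #include <rte_ether.h>
    #include <rte_mbuf.h>
    #include <rte_prefetch.h>

    #define MAX_PKT_BURST 32    /* illustrative burst size */

    /* Illustrative DPDK 18.11-style RX loop: every index is bounded by
     * nb_rx, and rte_mbuf_sanity_check() panics with a descriptive
     * message as soon as a malformed mbuf is seen, instead of faulting
     * later on a wild pointer. */
    static void
    rx_burst_once(uint16_t port_id, uint16_t queue_id)
    {
        struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
        uint16_t nb_rx, i;

        nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts_burst,
                                 MAX_PKT_BURST);

        for (i = 0; i < nb_rx; i++)
            rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i], void *));

        for (i = 0; i < nb_rx; i++) {
            struct rte_mbuf *m = pkts_burst[i];

            rte_mbuf_sanity_check(m, 1);

            struct ether_hdr *eth_hdr =
                rte_pktmbuf_mtod(m, struct ether_hdr *);
            uint16_t eth_type = rte_be_to_cpu_16(eth_hdr->ether_type);

            (void)eth_type;        /* dispatch on eth_type here */
            rte_pktmbuf_free(m);   /* or hand off to the application */
        }
    }

On 18.11 the debug build Stephen suggests would, if I remember the config
layout correctly, mean enabling options such as CONFIG_RTE_LIBRTE_MBUF_DEBUG
and CONFIG_RTE_LIBRTE_MEMPOOL_DEBUG in config/common_base and building with
EXTRA_CFLAGS='-g', so corruption is caught closer to its source.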