From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 08 Jun 2020 00:56:17 +0200
From: Alex Kiselev
To: Cliff Burdick
Cc: Stephen Hemminger, users
Subject: Re: [dpdk-users] segmentation fault while accessing mbuf
Message-ID: <35e42a78fa4485ed1993da352d519c0b@therouter.net>
References: <504fcb6e5a12a03035e7b55507e7c279@therouter.net>
 <20200601091729.03ea9e50@hermes.lan>
 <7DA537F2-9887-4B0A-9249-064736E8A9AD@therouter.net>
 <5e91c3aa80e354241b03b908f5529d6b@therouter.net>
 <1c5e65d6b5e388ac0b5c190b4084b53e@therouter.net>
List-Id: DPDK usage discussions

On 2020-06-07 20:11, Cliff Burdick wrote:
> I don't think so since they're completely independent mempools.

They are not. Just think of a typical middle box: you receive a packet,
alter some headers, and send it back. It's the same mbuf that goes from
an rx queue to a tx queue.

> I also didn't think the mtu function actually has anything to do with
> prepping the card for the mbuf size you want; that's typically done in
> rte_eth_dev_configure inside of eth_conf in rx_mode and tx_mode.
> I would have to look at the code to confirm, but also check what
> you're setting these structures to.
>
> On Sun, Jun 7, 2020, 10:11 Alex Kiselev wrote:
>
>> On 2020-06-07 17:21, Cliff Burdick wrote:
>>> The mbuf pool should be configured to be the size of the largest
>>> packet you expect to receive. If you're getting packets longer than
>>> that, I would expect you to see problems. Same goes for
>>> transmitting; I believe it will just read past the end of the mbuf
>>> data.
>>
>> I am using the rte_eth_dev_set_mtu() call with an MTU value that is
>> consistent with the mbuf size. Therefore I believe I don't have any
>> overflow bugs in the RX code.
>>
>> And I've found a couple of bugs in the TX code. Both of them have to
>> do with incorrect use of the mbuf pkt_len/data_len fields.
>>
>> But the crash happened while receiving packets, that's why I am
>> wondering: could the bugs I found in the TX code cause the crash
>> in RX?
>>
>>> On Sun, Jun 7, 2020, 06:36 Alex Kiselev wrote:
>>>
>>>> On 2020-06-07 15:16, Cliff Burdick wrote:
>>>>> That shouldn't matter. The mbuf size is allocated when you create
>>>>> the mempool, and data_len/pkt_len are just to specify the size of
>>>>> the total packet and each segment. The underlying storage size is
>>>>> still the same.
>>>>
>>>> It does matter. I've done some tests, and after sending a few
>>>> mbufs with data_len/pkt_len bigger than the size of the mbuf's
>>>> underlying buffer, the app stops sending/receiving packets. The
>>>> PMD apparently goes beyond the mbuf's buffer, which is why I still
>>>> think that my question about the impact of using incorrect
>>>> data_len/pkt_len is valid.
>>>>
>>>>> Have you checked to see if it's potentially a hugepage issue?
>>>>
>>>> Please, explain.
>>>>
>>>> The app had been working for two months before the crash and the
>>>> load was 3-4 gbit/s, so no, I don't think that something is wrong
>>>> with hugepages on that machine.
>>>>
>>>>> On Sun, Jun 7, 2020, 02:59 Alex Kiselev wrote:
>>>>>
>>>>>> On 2020-06-07 04:41, Cliff Burdick wrote:
>>>>>>> I can't tell from your code, but you assigned nb_rx to the
>>>>>>> number of packets received, but then used vec_size, which might
>>>>>>> be larger. Does this happen if you use nb_rx in your loops?
>>>>>>
>>>>>> No, this doesn't happen.
>>>>>> I just skipped the part of the code that translates nb_rx to
>>>>>> vec_size, since that code is double-checked.
>>>>>>
>>>>>> My actual question now is about the possible impact of using
>>>>>> incorrect values of the mbuf's pkt_len and data_len fields.
>>>>>>
>>>>>>> On Sat, Jun 6, 2020 at 5:59 AM Alex Kiselev wrote:
>>>>>>>
>>>>>>>> On 1 June 2020, at 19:17, Stephen Hemminger wrote:
>>>>>>>>>
>>>>>>>>> On Mon, 01 Jun 2020 15:24:25 +0200
>>>>>>>>> Alex Kiselev wrote:
>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> I've got a segmentation fault error in my data plane path.
>>>>>>>>>> I am pretty sure the code where the segfault happened is ok,
>>>>>>>>>> so my guess is that I somehow received a corrupted mbuf.
>>>>>>>>>> How could I troubleshoot this? Is there any way?
>>>>>>>>>> Is it possible that other threads of the application
>>>>>>>>>> corrupted that mbuf?
>>>>>>>>>>
>>>>>>>>>> I would really appreciate any advice.
>>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>> DPDK 18.11.3
>>>>>>>>>> NIC: 82599ES
>>>>>>>>>>
>>>>>>>>>> Code:
>>>>>>>>>>
>>>>>>>>>> nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts_burst,
>>>>>>>>>>                          MAX_PKT_BURST);
>>>>>>>>>>
>>>>>>>>>> ...
>>>>>>>>>>
>>>>>>>>>> for (i = 0; i < vec_size; i++) {
>>>>>>>>>>     rte_prefetch0(rte_pktmbuf_mtod(m_v[i], void *));
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> for (i = 0; i < vec_size; i++) {
>>>>>>>>>>     m = m_v[i];
>>>>>>>>>>     eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
>>>>>>>>>>     eth_type = rte_be_to_cpu_16(eth_hdr->ether_type); <--- Segmentation fault
>>>>>>>>>> ...
>>>>>>>>>>
>>>>>>>>>> #0 rte_arch_bswap16 (_x=<error reading variable: Cannot
>>>>>>>>>> access memory at address 0x4d80000000053010>)
>>>>>>>>>
>>>>>>>>> Build with as many of the debug options turned on in the DPDK
>>>>>>>>> config, and build with EXTRA_CFLAGS of -g.
>>>>>>>>
>>>>>>>> Could using an incorrect (a very big one) value of the mbuf
>>>>>>>> pkt_len and data_len while transmitting cause mbuf corruption
>>>>>>>> and a following segmentation fault on rx?
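
P.S. To make the pkt_len/data_len question above concrete, here is a
minimal TX-side sketch. It is illustrative only: the function and
variable names are not from the code discussed in this thread, and a
single-segment mbuf is assumed. The point is that letting data_len
exceed the buffer's real capacity is exactly what lets a PMD read past
the end of the mbuf.

#include <string.h>
#include <rte_mbuf.h>
#include <rte_ethdev.h>

/* Illustrative helper (names are hypothetical): copy a payload into a
 * freshly allocated single-segment mbuf and transmit it, refusing
 * payloads that do not fit into the data room. rte_pktmbuf_append()
 * updates both data_len and pkt_len, so the two fields can never
 * disagree with the real buffer size. */
static int
tx_one_packet(struct rte_mempool *mp, uint16_t port_id, uint16_t queue_id,
              const void *payload, uint16_t payload_len)
{
        struct rte_mbuf *m = rte_pktmbuf_alloc(mp);
        void *dst;

        if (m == NULL)
                return -1;

        dst = rte_pktmbuf_append(m, payload_len);
        if (dst == NULL) {                /* payload larger than tailroom */
                rte_pktmbuf_free(m);
                return -1;
        }
        memcpy(dst, payload, payload_len);

        if (rte_eth_tx_burst(port_id, queue_id, &m, 1) != 1) {
                rte_pktmbuf_free(m);      /* not accepted by the PMD */
                return -1;
        }
        return 0;
}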
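
On the RX side, a similar sketch (again illustrative, not the code from
the backtrace above) that bounds both loops strictly by nb_rx, as Cliff
suggested, and drops runt packets before dereferencing the Ethernet
header. The MAX_PKT_BURST value is assumed; the thread does not show it.

#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_ether.h>
#include <rte_byteorder.h>
#include <rte_prefetch.h>

#define MAX_PKT_BURST 32    /* assumed value, not from the original code */

static void
rx_loop_once(uint16_t port_id, uint16_t queue_id)
{
        struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
        uint16_t nb_rx, i;

        nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts_burst,
                                 MAX_PKT_BURST);

        /* Prefetch only the mbufs the PMD actually returned. */
        for (i = 0; i < nb_rx; i++)
                rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i], void *));

        for (i = 0; i < nb_rx; i++) {
                struct rte_mbuf *m = pkts_burst[i];
                struct ether_hdr *eth_hdr;
                uint16_t eth_type;

                /* Too short to contain an Ethernet header: drop it. */
                if (rte_pktmbuf_data_len(m) < sizeof(struct ether_hdr)) {
                        rte_pktmbuf_free(m);
                        continue;
                }

                eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
                eth_type = rte_be_to_cpu_16(eth_hdr->ether_type);

                /* ... dispatch on eth_type, then forward or free ... */
                (void)eth_type;
                rte_pktmbuf_free(m);
        }
}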
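
Finally, on keeping the MTU consistent with the mbuf size: a small
sketch of checking that the mempool's data room can actually hold an
MTU-sized frame before calling rte_eth_dev_set_mtu(). The assumptions
here are mine: single-segment RX without scatter, and the usual
Ethernet header plus CRC overhead on top of the MTU.

#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_ether.h>

/* Illustrative check (function name is hypothetical): make sure one
 * mbuf can hold a full MTU-sized frame, L2 header and CRC included,
 * when RX scatter is not used. */
static int
set_mtu_checked(uint16_t port_id, uint16_t mtu, struct rte_mempool *mp)
{
        uint16_t data_room = rte_pktmbuf_data_room_size(mp);
        uint32_t max_frame = (uint32_t)mtu + ETHER_HDR_LEN + ETHER_CRC_LEN;

        if (data_room < RTE_PKTMBUF_HEADROOM ||
            max_frame > (uint32_t)(data_room - RTE_PKTMBUF_HEADROOM))
                return -1;   /* frames of this MTU would not fit in one mbuf */

        return rte_eth_dev_set_mtu(port_id, mtu);
}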