DPDK usage discussions
* [dpdk-users] segmentation fault while accessing mbuf
@ 2020-06-01 13:24 Alex Kiselev
  2020-06-01 16:17 ` Stephen Hemminger
  0 siblings, 1 reply; 12+ messages in thread
From: Alex Kiselev @ 2020-06-01 13:24 UTC (permalink / raw)
  To: users

Hello,

I've got a segmentation fault error in my data plane path.
I am pretty sure the code where the segfault happened is ok,
so my guess is that I somehow received a corrupted mbuf.
How could I troubleshoot this? Is there any way?
Is it possible that other threads of the application
corrupted that mbuf?

I would really appreciate any advice.
Thanks.

DPDK 18.11.3
NIC: 82599ES

Code:

nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts_burst,
		  MAX_PKT_BURST);

...

for (i = 0; i < vec_size; i++) {
	rte_prefetch0(rte_pktmbuf_mtod(m_v[i], void *));
}

for (i = 0; i < vec_size; i++) {
	m = m_v[i];
	eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
	eth_type = rte_be_to_cpu_16(eth_hdr->ether_type);    <--- Segmentation fault
	...

#0  rte_arch_bswap16 (_x=<error reading variable: Cannot access memory 
at address 0x4d80000000053010>)
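
For reference, the kind of defensive per-mbuf check I could add before
dereferencing the data would be roughly this (a sketch, not code from the
application; rte_mbuf_sanity_check() is part of librte_mbuf and panics with
a reason when the mbuf header itself is broken):

#include <rte_mbuf.h>
#include <rte_ether.h>

static inline int mbuf_looks_sane(const struct rte_mbuf *m)
{
	/* panics with a reason if the mbuf header is corrupted */
	rte_mbuf_sanity_check(m, 1);

	/* the declared data must fit into the attached buffer */
	if ((uint32_t)m->data_off + m->data_len > m->buf_len)
		return 0;
	/* the first segment must at least hold an Ethernet header */
	if (m->data_len < sizeof(struct ether_hdr))
		return 0;
	return 1;
}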


* Re: [dpdk-users] segmentation fault while accessing mbuf
  2020-06-01 13:24 [dpdk-users] segmentation fault while accessing mbuf Alex Kiselev
@ 2020-06-01 16:17 ` Stephen Hemminger
  2020-06-02 18:46   ` Alex Kiselev
  2020-06-06 12:59   ` Alex Kiselev
  0 siblings, 2 replies; 12+ messages in thread
From: Stephen Hemminger @ 2020-06-01 16:17 UTC (permalink / raw)
  To: Alex Kiselev; +Cc: users

On Mon, 01 Jun 2020 15:24:25 +0200
Alex Kiselev <alex@therouter.net> wrote:

> Hello,
> 
> I've got a segmentation fault error in my data plane path.
> I am pretty sure the code where the segfault happened is ok,
> so my guess is that I somehow received a corrupted mbuf.
> How could I troubleshoot this? Is there any way?
> Is it possible that other threads of the application
> corrupted that mbuf?
> 
> I would really appriciate any advice.
> Thanks.
> 
> DPDK 18.11.3
> NIC: 82599ES
> 
> Code:
> 
> nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts_burst,
> 		  MAX_PKT_BURST);
> 
> ...
> 
> for (i=0; i < vec_size; i++) {
> 	rte_prefetch0(rte_pktmbuf_mtod(m_v[i], void *));
> 
> for (i=0; i < vec_size; i++) {
> 	m = m_v[i];
> 	eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
> 	eth_type = rte_be_to_cpu_16(eth_hdr->ether_type);               <--- 
> Segmentation fault
> 	...
> 
> #0  rte_arch_bswap16 (_x=<error reading variable: Cannot access memory 
> at address 0x4d80000000053010>)

Build with as many of the debug options turned on in the DPDK config,
and build with EXTRA_CFLAGS of -g.
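
With the 18.11 make-based build, that would be roughly the following
(option names as in config/common_base; adjust the target to your
environment):

   # in config/common_base (or your target's .config), for example:
   CONFIG_RTE_LIBRTE_MBUF_DEBUG=y
   CONFIG_RTE_LIBRTE_MEMPOOL_DEBUG=y
   CONFIG_RTE_MALLOC_DEBUG=y
   CONFIG_RTE_LIBRTE_ETHDEV_DEBUG=y

   # rebuild DPDK and the application with symbols and no optimisation
   make install T=x86_64-native-linuxapp-gcc EXTRA_CFLAGS="-g -O0"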


* Re: [dpdk-users] segmentation fault while accessing mbuf
  2020-06-01 16:17 ` Stephen Hemminger
@ 2020-06-02 18:46   ` Alex Kiselev
  2020-06-06 12:59   ` Alex Kiselev
  1 sibling, 0 replies; 12+ messages in thread
From: Alex Kiselev @ 2020-06-02 18:46 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: users

On 2020-06-01 18:17, Stephen Hemminger wrote:
> On Mon, 01 Jun 2020 15:24:25 +0200
> Alex Kiselev <alex@therouter.net> wrote:
> 
>> Hello,
>> 
>> I've got a segmentation fault error in my data plane path.
>> I am pretty sure the code where the segfault happened is ok,
>> so my guess is that I somehow received a corrupted mbuf.
>> How could I troubleshoot this? Is there any way?
>> Is it possible that other threads of the application
>> corrupted that mbuf?
>> 
>> I would really appriciate any advice.
>> Thanks.
>> 
>> DPDK 18.11.3
>> NIC: 82599ES
>> 
>> Code:
>> 
>> nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts_burst,
>> 		  MAX_PKT_BURST);
>> 
>> ...
>> 
>> for (i=0; i < vec_size; i++) {
>> 	rte_prefetch0(rte_pktmbuf_mtod(m_v[i], void *));
>> 
>> for (i=0; i < vec_size; i++) {
>> 	m = m_v[i];
>> 	eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
>> 	eth_type = rte_be_to_cpu_16(eth_hdr->ether_type);               <---
>> Segmentation fault
>> 	...
>> 
>> #0  rte_arch_bswap16 (_x=<error reading variable: Cannot access memory
>> at address 0x4d80000000053010>)
> 
> Build with as many of the debug options turned on in the DPDK config,
> and build with EXTRA_CFLAGS of -g.

I usually use these options in some debug environments:

   CONFIG_RTE_LIBRTE_MEMPOOL_DEBUG=y
   CONFIG_RTE_MALLOC_DEBUG=y

as well as GCC sanitizer options and
some custom memory-sanitizing techniques
to make sure there are no mempool leaks,
use-after-free or double-free bugs.
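
The sanitizer part is just plain GCC flags passed through the build,
roughly (a sketch):

   make install T=x86_64-native-linuxapp-gcc \
       EXTRA_CFLAGS="-g -O0 -fsanitize=address -fno-omit-frame-pointer" \
       EXTRA_LDFLAGS="-fsanitize=address"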

But what was your point? Could you please explain
what I should expect from enabling other debug options,
or what I should pay attention to?
Should I take a look at the PMD drivers' debug options?

Unfortunately, there is no way to reproduce the bug,
so I need to understand what could cause it in order to make
my search more precise.



* Re: [dpdk-users] segmentation fault while accessing mbuf
  2020-06-01 16:17 ` Stephen Hemminger
  2020-06-02 18:46   ` Alex Kiselev
@ 2020-06-06 12:59   ` Alex Kiselev
  2020-06-07  2:41     ` Cliff Burdick
  1 sibling, 1 reply; 12+ messages in thread
From: Alex Kiselev @ 2020-06-06 12:59 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: users



> On 1 June 2020, at 19:17, Stephen Hemminger <stephen@networkplumber.org> wrote:
> 
> On Mon, 01 Jun 2020 15:24:25 +0200
> Alex Kiselev <alex@therouter.net> wrote:
> 
>> Hello,
>> 
>> I've got a segmentation fault error in my data plane path.
>> I am pretty sure the code where the segfault happened is ok,
>> so my guess is that I somehow received a corrupted mbuf.
>> How could I troubleshoot this? Is there any way?
>> Is it possible that other threads of the application
>> corrupted that mbuf?
>> 
>> I would really appriciate any advice.
>> Thanks.
>> 
>> DPDK 18.11.3
>> NIC: 82599ES
>> 
>> Code:
>> 
>> nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts_burst,
>>          MAX_PKT_BURST);
>> 
>> ...
>> 
>> for (i=0; i < vec_size; i++) {
>>    rte_prefetch0(rte_pktmbuf_mtod(m_v[i], void *));
>> 
>> for (i=0; i < vec_size; i++) {
>>    m = m_v[i];
>>    eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
>>    eth_type = rte_be_to_cpu_16(eth_hdr->ether_type);               <--- 
>> Segmentation fault
>>    ...
>> 
>> #0  rte_arch_bswap16 (_x=<error reading variable: Cannot access memory 
>> at address 0x4d80000000053010>)
> 
> Build with as many of the debug options turned on in the DPDK config,
> and build with EXTRA_CFLAGS of -g.

Could using an incorrect (very large) value for an mbuf's pkt_len and data_len while transmitting cause mbuf corruption and a subsequent segmentation fault on RX?


* Re: [dpdk-users] segmentation fault while accessing mbuf
  2020-06-06 12:59   ` Alex Kiselev
@ 2020-06-07  2:41     ` Cliff Burdick
  2020-06-07  9:59       ` Alex Kiselev
  0 siblings, 1 reply; 12+ messages in thread
From: Cliff Burdick @ 2020-06-07  2:41 UTC (permalink / raw)
  To: Alex Kiselev; +Cc: Stephen Hemminger, users

I can't tell from your code, but you assigned nb_rx to the number of
packets received, but then used vec_size, which might be larger. Does this
happen if you use nb_rx in your loops?
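
I.e., something along these lines (a sketch using the names from your
snippet):

nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts_burst, MAX_PKT_BURST);

for (i = 0; i < nb_rx; i++)
	rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i], void *));

for (i = 0; i < nb_rx; i++) {
	m = pkts_burst[i];
	eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
	...
}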

On Sat, Jun 6, 2020 at 5:59 AM Alex Kiselev <alex@therouter.net> wrote:

>
>
> > 1 июня 2020 г., в 19:17, Stephen Hemminger <stephen@networkplumber.org>
> написал(а):
> >
> > On Mon, 01 Jun 2020 15:24:25 +0200
> > Alex Kiselev <alex@therouter.net> wrote:
> >
> >> Hello,
> >>
> >> I've got a segmentation fault error in my data plane path.
> >> I am pretty sure the code where the segfault happened is ok,
> >> so my guess is that I somehow received a corrupted mbuf.
> >> How could I troubleshoot this? Is there any way?
> >> Is it possible that other threads of the application
> >> corrupted that mbuf?
> >>
> >> I would really appriciate any advice.
> >> Thanks.
> >>
> >> DPDK 18.11.3
> >> NIC: 82599ES
> >>
> >> Code:
> >>
> >> nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts_burst,
> >>          MAX_PKT_BURST);
> >>
> >> ...
> >>
> >> for (i=0; i < vec_size; i++) {
> >>    rte_prefetch0(rte_pktmbuf_mtod(m_v[i], void *));
> >>
> >> for (i=0; i < vec_size; i++) {
> >>    m = m_v[i];
> >>    eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
> >>    eth_type = rte_be_to_cpu_16(eth_hdr->ether_type);               <---
> >> Segmentation fault
> >>    ...
> >>
> >> #0  rte_arch_bswap16 (_x=<error reading variable: Cannot access memory
> >> at address 0x4d80000000053010>)
> >
> > Build with as many of the debug options turned on in the DPDK config,
> > and build with EXTRA_CFLAGS of -g.
>
> Could using an incorrect (a very big one) value of mbuf pkt_len and
> data_len while transmitting cause mbuf corruption and following
> segmentation fault on rx?
>


* Re: [dpdk-users] segmentation fault while accessing mbuf
  2020-06-07  2:41     ` Cliff Burdick
@ 2020-06-07  9:59       ` Alex Kiselev
  2020-06-07 13:16         ` Cliff Burdick
  0 siblings, 1 reply; 12+ messages in thread
From: Alex Kiselev @ 2020-06-07  9:59 UTC (permalink / raw)
  To: Cliff Burdick; +Cc: Stephen Hemminger, users

On 2020-06-07 04:41, Cliff Burdick wrote:
> I can't tell from your code, but you assigned nb_rx to the number of
> packets received, but then used vec_size, which might be larger. Does
> this happen if you use nb_rx in your loops?

No, this doesn't happen.
I just omitted from the snippet the part of the code that translates
nb_rx into vec_size, since that code has been double-checked.

My actual question now is about the possible impact of using
incorrect values for an mbuf's pkt_len and data_len fields.

> 
> On Sat, Jun 6, 2020 at 5:59 AM Alex Kiselev <alex@therouter.net>
> wrote:
> 
>>> 1 июня 2020 г., в 19:17, Stephen Hemminger
>> <stephen@networkplumber.org> написал(а):
>>> 
>>> On Mon, 01 Jun 2020 15:24:25 +0200
>>> Alex Kiselev <alex@therouter.net> wrote:
>>> 
>>>> Hello,
>>>> 
>>>> I've got a segmentation fault error in my data plane path.
>>>> I am pretty sure the code where the segfault happened is ok,
>>>> so my guess is that I somehow received a corrupted mbuf.
>>>> How could I troubleshoot this? Is there any way?
>>>> Is it possible that other threads of the application
>>>> corrupted that mbuf?
>>>> 
>>>> I would really appriciate any advice.
>>>> Thanks.
>>>> 
>>>> DPDK 18.11.3
>>>> NIC: 82599ES
>>>> 
>>>> Code:
>>>> 
>>>> nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts_burst,
>>>> MAX_PKT_BURST);
>>>> 
>>>> ...
>>>> 
>>>> for (i=0; i < vec_size; i++) {
>>>> rte_prefetch0(rte_pktmbuf_mtod(m_v[i], void *));
>>>> 
>>>> for (i=0; i < vec_size; i++) {
>>>> m = m_v[i];
>>>> eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
>>>> eth_type = rte_be_to_cpu_16(eth_hdr->ether_type);
>> <---
>>>> Segmentation fault
>>>> ...
>>>> 
>>>> #0  rte_arch_bswap16 (_x=<error reading variable: Cannot access
>> memory
>>>> at address 0x4d80000000053010>)
>>> 
>>> Build with as many of the debug options turned on in the DPDK
>> config,
>>> and build with EXTRA_CFLAGS of -g.
>> 
>> Could using an incorrect (a very big one) value of mbuf pkt_len and
>> data_len while transmitting cause mbuf corruption and following
>> segmentation fault on rx?


* Re: [dpdk-users] segmentation fault while accessing mbuf
  2020-06-07  9:59       ` Alex Kiselev
@ 2020-06-07 13:16         ` Cliff Burdick
  2020-06-07 13:36           ` Alex Kiselev
  0 siblings, 1 reply; 12+ messages in thread
From: Cliff Burdick @ 2020-06-07 13:16 UTC (permalink / raw)
  To: Alex Kiselev; +Cc: Stephen Hemminger, users

That shouldn't matter. The mbuf size is allocated when you create the
mempool; data_len/pkt_len just specify the size of each segment and of the
total packet. The underlying storage size is still the same.
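
In other words, for an mbuf m (a sketch; accessors from rte_mbuf.h):

	/* fixed once, at pool creation: the data room behind every mbuf */
	uint16_t room = rte_pktmbuf_data_room_size(m->pool);  /* == m->buf_len */

	/* per-packet bookkeeping only; must never exceed that room */
	uint16_t seg_len = rte_pktmbuf_data_len(m);
	uint32_t tot_len = rte_pktmbuf_pkt_len(m);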

Have you checked to see if it's potentially a hugepage issue?



On Sun, Jun 7, 2020, 02:59 Alex Kiselev <alex@therouter.net> wrote:

> On 2020-06-07 04:41, Cliff Burdick wrote:
> > I can't tell from your code, but you assigned nb_rx to the number of
> > packets received, but then used vec_size, which might be larger. Does
> > this happen if you use nb_rx in your loops?
>
> No, this doesn't happen.
> I just skip the part of the code that translates nb_rx to vec_size,
> since that code is double checked.
>
> My actual question now is about possible impact of using
> incorrect values of mbuf's pkt_len and data_len fields.
>
> >
> > On Sat, Jun 6, 2020 at 5:59 AM Alex Kiselev <alex@therouter.net>
> > wrote:
> >
> >>> 1 июня 2020 г., в 19:17, Stephen Hemminger
> >> <stephen@networkplumber.org> написал(а):
> >>>
> >>> On Mon, 01 Jun 2020 15:24:25 +0200
> >>> Alex Kiselev <alex@therouter.net> wrote:
> >>>
> >>>> Hello,
> >>>>
> >>>> I've got a segmentation fault error in my data plane path.
> >>>> I am pretty sure the code where the segfault happened is ok,
> >>>> so my guess is that I somehow received a corrupted mbuf.
> >>>> How could I troubleshoot this? Is there any way?
> >>>> Is it possible that other threads of the application
> >>>> corrupted that mbuf?
> >>>>
> >>>> I would really appriciate any advice.
> >>>> Thanks.
> >>>>
> >>>> DPDK 18.11.3
> >>>> NIC: 82599ES
> >>>>
> >>>> Code:
> >>>>
> >>>> nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts_burst,
> >>>> MAX_PKT_BURST);
> >>>>
> >>>> ...
> >>>>
> >>>> for (i=0; i < vec_size; i++) {
> >>>> rte_prefetch0(rte_pktmbuf_mtod(m_v[i], void *));
> >>>>
> >>>> for (i=0; i < vec_size; i++) {
> >>>> m = m_v[i];
> >>>> eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
> >>>> eth_type = rte_be_to_cpu_16(eth_hdr->ether_type);
> >> <---
> >>>> Segmentation fault
> >>>> ...
> >>>>
> >>>> #0  rte_arch_bswap16 (_x=<error reading variable: Cannot access
> >> memory
> >>>> at address 0x4d80000000053010>)
> >>>
> >>> Build with as many of the debug options turned on in the DPDK
> >> config,
> >>> and build with EXTRA_CFLAGS of -g.
> >>
> >> Could using an incorrect (a very big one) value of mbuf pkt_len and
> >> data_len while transmitting cause mbuf corruption and following
> >> segmentation fault on rx?
>


* Re: [dpdk-users] segmentation fault while accessing mbuf
  2020-06-07 13:16         ` Cliff Burdick
@ 2020-06-07 13:36           ` Alex Kiselev
  2020-06-07 15:21             ` Cliff Burdick
  0 siblings, 1 reply; 12+ messages in thread
From: Alex Kiselev @ 2020-06-07 13:36 UTC (permalink / raw)
  To: Cliff Burdick; +Cc: Stephen Hemminger, users

On 2020-06-07 15:16, Cliff Burdick wrote:
> That shouldn't matter. The mbuf size is allocated when you create the
> mempool, and data_len/pkt_len are just to specify the size of the
> total packet and each segment. The underlying storage size is still
> the same.

It does matter. I've done some tests, and after
sending a few mbufs with data_len/pkt_len bigger than the size
of the mbuf's underlying buffer, the app stops sending/receiving packets.
The PMD apparently goes beyond the mbuf's buffer, which is why
I still think my question about the impact of using incorrect
data_len/pkt_len is valid.
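
A check along these lines is what I mean by incorrect data_len/pkt_len
(a sketch, single-segment case only):

#include <rte_mbuf.h>

static inline int tx_lengths_ok(const struct rte_mbuf *m)
{
	if (m->nb_segs == 1 && m->pkt_len != m->data_len)
		return 0;
	/* declared data must not run past the attached buffer */
	if ((uint32_t)m->data_off + m->data_len > m->buf_len)
		return 0;
	return 1;
}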

> 
> Have you checked to see if it's potentially a hugepage issue?

Please explain.

The app had been working for two months before the crash,
and the load was 3-4 Gbit/s, so no, I don't think
anything is wrong with the hugepages on that machine.


> 
> On Sun, Jun 7, 2020, 02:59 Alex Kiselev <alex@therouter.net> wrote:
> 
>> On 2020-06-07 04:41, Cliff Burdick wrote:
>>> I can't tell from your code, but you assigned nb_rx to the number
>> of
>>> packets received, but then used vec_size, which might be larger.
>> Does
>>> this happen if you use nb_rx in your loops?
>> 
>> No, this doesn't happen.
>> I just skip the part of the code that translates nb_rx to vec_size,
>> since that code is double checked.
>> 
>> My actual question now is about possible impact of using
>> incorrect values of mbuf's pkt_len and data_len fields.
>> 
>>> 
>>> On Sat, Jun 6, 2020 at 5:59 AM Alex Kiselev <alex@therouter.net>
>>> wrote:
>>> 
>>>>> 1 июня 2020 г., в 19:17, Stephen Hemminger
>>>> <stephen@networkplumber.org> написал(а):
>>>>> 
>>>>> On Mon, 01 Jun 2020 15:24:25 +0200
>>>>> Alex Kiselev <alex@therouter.net> wrote:
>>>>> 
>>>>>> Hello,
>>>>>> 
>>>>>> I've got a segmentation fault error in my data plane path.
>>>>>> I am pretty sure the code where the segfault happened is ok,
>>>>>> so my guess is that I somehow received a corrupted mbuf.
>>>>>> How could I troubleshoot this? Is there any way?
>>>>>> Is it possible that other threads of the application
>>>>>> corrupted that mbuf?
>>>>>> 
>>>>>> I would really appriciate any advice.
>>>>>> Thanks.
>>>>>> 
>>>>>> DPDK 18.11.3
>>>>>> NIC: 82599ES
>>>>>> 
>>>>>> Code:
>>>>>> 
>>>>>> nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts_burst,
>>>>>> MAX_PKT_BURST);
>>>>>> 
>>>>>> ...
>>>>>> 
>>>>>> for (i=0; i < vec_size; i++) {
>>>>>> rte_prefetch0(rte_pktmbuf_mtod(m_v[i], void *));
>>>>>> 
>>>>>> for (i=0; i < vec_size; i++) {
>>>>>> m = m_v[i];
>>>>>> eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
>>>>>> eth_type = rte_be_to_cpu_16(eth_hdr->ether_type);
>>>> <---
>>>>>> Segmentation fault
>>>>>> ...
>>>>>> 
>>>>>> #0  rte_arch_bswap16 (_x=<error reading variable: Cannot access
>>>> memory
>>>>>> at address 0x4d80000000053010>)
>>>>> 
>>>>> Build with as many of the debug options turned on in the DPDK
>>>> config,
>>>>> and build with EXTRA_CFLAGS of -g.
>>>> 
>>>> Could using an incorrect (a very big one) value of mbuf pkt_len
>> and
>>>> data_len while transmitting cause mbuf corruption and following
>>>> segmentation fault on rx?


* Re: [dpdk-users] segmentation fault while accessing mbuf
  2020-06-07 13:36           ` Alex Kiselev
@ 2020-06-07 15:21             ` Cliff Burdick
  2020-06-07 17:11               ` Alex Kiselev
  0 siblings, 1 reply; 12+ messages in thread
From: Cliff Burdick @ 2020-06-07 15:21 UTC (permalink / raw)
  To: Alex Kiselev; +Cc: Stephen Hemminger, users

The mbuf pool should be configured for the size of the largest packet you
expect to receive. If you're getting packets longer than that, I would
expect you to see problems. The same goes for transmitting; I believe it
will just read past the end of the mbuf data.
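
For example, for a 9000-byte MTU the pool would be created roughly like
this (a sketch; the mbuf count and cache size are placeholders):

#include <rte_mbuf.h>
#include <rte_lcore.h>

#define MAX_FRAME_LEN 9018	/* MTU 9000 + 14 B Ethernet header + 4 B CRC */

struct rte_mempool *mp = rte_pktmbuf_pool_create("rx_pool",
	8192,					/* number of mbufs (placeholder) */
	256,					/* per-lcore cache */
	0,					/* private data size */
	RTE_PKTMBUF_HEADROOM + MAX_FRAME_LEN,	/* data room per mbuf */
	rte_socket_id());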



On Sun, Jun 7, 2020, 06:36 Alex Kiselev <alex@therouter.net> wrote:

> On 2020-06-07 15:16, Cliff Burdick wrote:
> > That shouldn't matter. The mbuf size is allocated when you create the
> > mempool, and data_len/pkt_len are just to specify the size of the
> > total packet and each segment. The underlying storage size is still
> > the same.
>
> It does matter. I've done some tests and after
> sending a few mbufs with data_len/pkt_len bigger than the size
> of mbuf's underlying buffer the app stops sending/receiving packets.
> The PMD apparently goes beyong the mbuf's buffer, that's why
> I sill think that my question about the impact of using incorrect
> data_len/pkt is valid.
>
> >
> > Have you checked to see if it's potentially a hugepage issue?
>
> Please, explain.
>
> The app had been working two monghts before the crush
> and the load was 3-4 gbit/s, so no, I don't think that
> something is wrong with hugepages on that machine.
>
>
> >
> > On Sun, Jun 7, 2020, 02:59 Alex Kiselev <alex@therouter.net> wrote:
> >
> >> On 2020-06-07 04:41, Cliff Burdick wrote:
> >>> I can't tell from your code, but you assigned nb_rx to the number
> >> of
> >>> packets received, but then used vec_size, which might be larger.
> >> Does
> >>> this happen if you use nb_rx in your loops?
> >>
> >> No, this doesn't happen.
> >> I just skip the part of the code that translates nb_rx to vec_size,
> >> since that code is double checked.
> >>
> >> My actual question now is about possible impact of using
> >> incorrect values of mbuf's pkt_len and data_len fields.
> >>
> >>>
> >>> On Sat, Jun 6, 2020 at 5:59 AM Alex Kiselev <alex@therouter.net>
> >>> wrote:
> >>>
> >>>>> 1 июня 2020 г., в 19:17, Stephen Hemminger
> >>>> <stephen@networkplumber.org> написал(а):
> >>>>>
> >>>>> On Mon, 01 Jun 2020 15:24:25 +0200
> >>>>> Alex Kiselev <alex@therouter.net> wrote:
> >>>>>
> >>>>>> Hello,
> >>>>>>
> >>>>>> I've got a segmentation fault error in my data plane path.
> >>>>>> I am pretty sure the code where the segfault happened is ok,
> >>>>>> so my guess is that I somehow received a corrupted mbuf.
> >>>>>> How could I troubleshoot this? Is there any way?
> >>>>>> Is it possible that other threads of the application
> >>>>>> corrupted that mbuf?
> >>>>>>
> >>>>>> I would really appriciate any advice.
> >>>>>> Thanks.
> >>>>>>
> >>>>>> DPDK 18.11.3
> >>>>>> NIC: 82599ES
> >>>>>>
> >>>>>> Code:
> >>>>>>
> >>>>>> nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts_burst,
> >>>>>> MAX_PKT_BURST);
> >>>>>>
> >>>>>> ...
> >>>>>>
> >>>>>> for (i=0; i < vec_size; i++) {
> >>>>>> rte_prefetch0(rte_pktmbuf_mtod(m_v[i], void *));
> >>>>>>
> >>>>>> for (i=0; i < vec_size; i++) {
> >>>>>> m = m_v[i];
> >>>>>> eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
> >>>>>> eth_type = rte_be_to_cpu_16(eth_hdr->ether_type);
> >>>> <---
> >>>>>> Segmentation fault
> >>>>>> ...
> >>>>>>
> >>>>>> #0  rte_arch_bswap16 (_x=<error reading variable: Cannot access
> >>>> memory
> >>>>>> at address 0x4d80000000053010>)
> >>>>>
> >>>>> Build with as many of the debug options turned on in the DPDK
> >>>> config,
> >>>>> and build with EXTRA_CFLAGS of -g.
> >>>>
> >>>> Could using an incorrect (a very big one) value of mbuf pkt_len
> >> and
> >>>> data_len while transmitting cause mbuf corruption and following
> >>>> segmentation fault on rx?
>


* Re: [dpdk-users] segmentation fault while accessing mbuf
  2020-06-07 15:21             ` Cliff Burdick
@ 2020-06-07 17:11               ` Alex Kiselev
  2020-06-07 18:11                 ` Cliff Burdick
  0 siblings, 1 reply; 12+ messages in thread
From: Alex Kiselev @ 2020-06-07 17:11 UTC (permalink / raw)
  To: Cliff Burdick; +Cc: Stephen Hemminger, users

On 2020-06-07 17:21, Cliff Burdick wrote:
> The mbuf pool said be configured to be the size of the largest packet
> you expect to receive. If you're getting packets longer than that, I
> would expect you to see problems. Same goes for transmitting; I
> believe it will just read past the end of the mbuf data.

I am using the rte_eth_dev_set_mtu() call with an MTU value that is
consistent with the mbuf size, so I believe I don't have any overflow
bugs in the RX code.
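
By consistent I mean roughly the following (a sketch; mbuf_pool and
port_id stand in for the application's own variables):

	uint16_t room = rte_pktmbuf_data_room_size(mbuf_pool) - RTE_PKTMBUF_HEADROOM;
	uint16_t mtu  = room - ETHER_HDR_LEN - ETHER_CRC_LEN;

	if (rte_eth_dev_set_mtu(port_id, mtu) != 0)
		/* refuse to start if the MTU cannot be applied */
		return -1;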

And I've found a couple of bugs in the TX code. Both of them
have to do with incorrect use of the mbufs' pkt_len/data_len fields.

But the crash happened while receiving packets, which is why
I am wondering whether the bugs I found in the TX code could cause
the crash in RX.


> 
> On Sun, Jun 7, 2020, 06:36 Alex Kiselev <alex@therouter.net> wrote:
> 
>> On 2020-06-07 15:16, Cliff Burdick wrote:
>>> That shouldn't matter. The mbuf size is allocated when you create
>> the
>>> mempool, and data_len/pkt_len are just to specify the size of the
>>> total packet and each segment. The underlying storage size is
>> still
>>> the same.
>> 
>> It does matter. I've done some tests and after
>> sending a few mbufs with data_len/pkt_len bigger than the size
>> of mbuf's underlying buffer the app stops sending/receiving packets.
>> The PMD apparently goes beyong the mbuf's buffer, that's why
>> I sill think that my question about the impact of using incorrect
>> data_len/pkt is valid.
>> 
>>> 
>>> Have you checked to see if it's potentially a hugepage issue?
>> 
>> Please, explain.
>> 
>> The app had been working two monghts before the crush
>> and the load was 3-4 gbit/s, so no, I don't think that
>> something is wrong with hugepages on that machine.
>> 
>>> 
>>> On Sun, Jun 7, 2020, 02:59 Alex Kiselev <alex@therouter.net>
>> wrote:
>>> 
>>>> On 2020-06-07 04:41, Cliff Burdick wrote:
>>>>> I can't tell from your code, but you assigned nb_rx to the
>> number
>>>> of
>>>>> packets received, but then used vec_size, which might be larger.
>>>> Does
>>>>> this happen if you use nb_rx in your loops?
>>>> 
>>>> No, this doesn't happen.
>>>> I just skip the part of the code that translates nb_rx to
>> vec_size,
>>>> since that code is double checked.
>>>> 
>>>> My actual question now is about possible impact of using
>>>> incorrect values of mbuf's pkt_len and data_len fields.
>>>> 
>>>>> 
>>>>> On Sat, Jun 6, 2020 at 5:59 AM Alex Kiselev <alex@therouter.net>
>>>>> wrote:
>>>>> 
>>>>>>> 1 июня 2020 г., в 19:17, Stephen Hemminger
>>>>>> <stephen@networkplumber.org> написал(а):
>>>>>>> 
>>>>>>> On Mon, 01 Jun 2020 15:24:25 +0200
>>>>>>> Alex Kiselev <alex@therouter.net> wrote:
>>>>>>> 
>>>>>>>> Hello,
>>>>>>>> 
>>>>>>>> I've got a segmentation fault error in my data plane path.
>>>>>>>> I am pretty sure the code where the segfault happened is ok,
>>>>>>>> so my guess is that I somehow received a corrupted mbuf.
>>>>>>>> How could I troubleshoot this? Is there any way?
>>>>>>>> Is it possible that other threads of the application
>>>>>>>> corrupted that mbuf?
>>>>>>>> 
>>>>>>>> I would really appriciate any advice.
>>>>>>>> Thanks.
>>>>>>>> 
>>>>>>>> DPDK 18.11.3
>>>>>>>> NIC: 82599ES
>>>>>>>> 
>>>>>>>> Code:
>>>>>>>> 
>>>>>>>> nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts_burst,
>>>>>>>> MAX_PKT_BURST);
>>>>>>>> 
>>>>>>>> ...
>>>>>>>> 
>>>>>>>> for (i=0; i < vec_size; i++) {
>>>>>>>> rte_prefetch0(rte_pktmbuf_mtod(m_v[i], void *));
>>>>>>>> 
>>>>>>>> for (i=0; i < vec_size; i++) {
>>>>>>>> m = m_v[i];
>>>>>>>> eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
>>>>>>>> eth_type = rte_be_to_cpu_16(eth_hdr->ether_type);
>>>>>> <---
>>>>>>>> Segmentation fault
>>>>>>>> ...
>>>>>>>> 
>>>>>>>> #0  rte_arch_bswap16 (_x=<error reading variable: Cannot
>> access
>>>>>> memory
>>>>>>>> at address 0x4d80000000053010>)
>>>>>>> 
>>>>>>> Build with as many of the debug options turned on in the DPDK
>>>>>> config,
>>>>>>> and build with EXTRA_CFLAGS of -g.
>>>>>> 
>>>>>> Could using an incorrect (a very big one) value of mbuf pkt_len
>>>> and
>>>>>> data_len while transmitting cause mbuf corruption and following
>>>>>> segmentation fault on rx?


* Re: [dpdk-users] segmentation fault while accessing mbuf
  2020-06-07 17:11               ` Alex Kiselev
@ 2020-06-07 18:11                 ` Cliff Burdick
  2020-06-07 22:56                   ` Alex Kiselev
  0 siblings, 1 reply; 12+ messages in thread
From: Cliff Burdick @ 2020-06-07 18:11 UTC (permalink / raw)
  To: Alex Kiselev; +Cc: Stephen Hemminger, users

I don't think so, since they're completely independent mempools. I also
don't think the MTU function actually has anything to do with preparing the
card for the mbuf size you want; that's typically done in
rte_eth_dev_configure() inside eth_conf's rx_mode and tx_mode.
I would have to look at the code to confirm, but also check what you're
setting those structures to.
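
In 18.11 that's roughly the following (a sketch showing only the
RX-length-related fields; MAX_FRAME_LEN, nb_rx_queues and nb_tx_queues are
placeholders):

	struct rte_eth_conf port_conf = {
		.rxmode = {
			.max_rx_pkt_len = MAX_FRAME_LEN,
			.offloads = DEV_RX_OFFLOAD_JUMBO_FRAME,	/* only needed above 1518 bytes */
		},
	};

	if (rte_eth_dev_configure(port_id, nb_rx_queues, nb_tx_queues, &port_conf) != 0)
		return -1;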


On Sun, Jun 7, 2020, 10:11 Alex Kiselev <alex@therouter.net> wrote:

> On 2020-06-07 17:21, Cliff Burdick wrote:
> > The mbuf pool said be configured to be the size of the largest packet
> > you expect to receive. If you're getting packets longer than that, I
> > would expect you to see problems. Same goes for transmitting; I
> > believe it will just read past the end of the mbuf data.
>
> I am using rte_eth_dev_set_mtu() call with mtu value that is consistent
> with the mbuf size. Therefore I believe I don't have any overflow bugs
> in the
> RX code.
>
> And I've found a couple of bugs in the TX code. Both of them are
> have to do with the incorrect use of pkt_len/data_len mbufs field.
>
> But, the crash happened while receiving packets, that's why
> I am wondering could the bugs I found in the TX code cause the crush
> in RX?
>
>
> >
> > On Sun, Jun 7, 2020, 06:36 Alex Kiselev <alex@therouter.net> wrote:
> >
> >> On 2020-06-07 15:16, Cliff Burdick wrote:
> >>> That shouldn't matter. The mbuf size is allocated when you create
> >> the
> >>> mempool, and data_len/pkt_len are just to specify the size of the
> >>> total packet and each segment. The underlying storage size is
> >> still
> >>> the same.
> >>
> >> It does matter. I've done some tests and after
> >> sending a few mbufs with data_len/pkt_len bigger than the size
> >> of mbuf's underlying buffer the app stops sending/receiving packets.
> >> The PMD apparently goes beyong the mbuf's buffer, that's why
> >> I sill think that my question about the impact of using incorrect
> >> data_len/pkt is valid.
> >>
> >>>
> >>> Have you checked to see if it's potentially a hugepage issue?
> >>
> >> Please, explain.
> >>
> >> The app had been working two monghts before the crush
> >> and the load was 3-4 gbit/s, so no, I don't think that
> >> something is wrong with hugepages on that machine.
> >>
> >>>
> >>> On Sun, Jun 7, 2020, 02:59 Alex Kiselev <alex@therouter.net>
> >> wrote:
> >>>
> >>>> On 2020-06-07 04:41, Cliff Burdick wrote:
> >>>>> I can't tell from your code, but you assigned nb_rx to the
> >> number
> >>>> of
> >>>>> packets received, but then used vec_size, which might be larger.
> >>>> Does
> >>>>> this happen if you use nb_rx in your loops?
> >>>>
> >>>> No, this doesn't happen.
> >>>> I just skip the part of the code that translates nb_rx to
> >> vec_size,
> >>>> since that code is double checked.
> >>>>
> >>>> My actual question now is about possible impact of using
> >>>> incorrect values of mbuf's pkt_len and data_len fields.
> >>>>
> >>>>>
> >>>>> On Sat, Jun 6, 2020 at 5:59 AM Alex Kiselev <alex@therouter.net>
> >>>>> wrote:
> >>>>>
> >>>>>>> 1 июня 2020 г., в 19:17, Stephen Hemminger
> >>>>>> <stephen@networkplumber.org> написал(а):
> >>>>>>>
> >>>>>>> On Mon, 01 Jun 2020 15:24:25 +0200
> >>>>>>> Alex Kiselev <alex@therouter.net> wrote:
> >>>>>>>
> >>>>>>>> Hello,
> >>>>>>>>
> >>>>>>>> I've got a segmentation fault error in my data plane path.
> >>>>>>>> I am pretty sure the code where the segfault happened is ok,
> >>>>>>>> so my guess is that I somehow received a corrupted mbuf.
> >>>>>>>> How could I troubleshoot this? Is there any way?
> >>>>>>>> Is it possible that other threads of the application
> >>>>>>>> corrupted that mbuf?
> >>>>>>>>
> >>>>>>>> I would really appriciate any advice.
> >>>>>>>> Thanks.
> >>>>>>>>
> >>>>>>>> DPDK 18.11.3
> >>>>>>>> NIC: 82599ES
> >>>>>>>>
> >>>>>>>> Code:
> >>>>>>>>
> >>>>>>>> nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts_burst,
> >>>>>>>> MAX_PKT_BURST);
> >>>>>>>>
> >>>>>>>> ...
> >>>>>>>>
> >>>>>>>> for (i=0; i < vec_size; i++) {
> >>>>>>>> rte_prefetch0(rte_pktmbuf_mtod(m_v[i], void *));
> >>>>>>>>
> >>>>>>>> for (i=0; i < vec_size; i++) {
> >>>>>>>> m = m_v[i];
> >>>>>>>> eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
> >>>>>>>> eth_type = rte_be_to_cpu_16(eth_hdr->ether_type);
> >>>>>> <---
> >>>>>>>> Segmentation fault
> >>>>>>>> ...
> >>>>>>>>
> >>>>>>>> #0  rte_arch_bswap16 (_x=<error reading variable: Cannot
> >> access
> >>>>>> memory
> >>>>>>>> at address 0x4d80000000053010>)
> >>>>>>>
> >>>>>>> Build with as many of the debug options turned on in the DPDK
> >>>>>> config,
> >>>>>>> and build with EXTRA_CFLAGS of -g.
> >>>>>>
> >>>>>> Could using an incorrect (a very big one) value of mbuf pkt_len
> >>>> and
> >>>>>> data_len while transmitting cause mbuf corruption and following
> >>>>>> segmentation fault on rx?
>


* Re: [dpdk-users] segmentation fault while accessing mbuf
  2020-06-07 18:11                 ` Cliff Burdick
@ 2020-06-07 22:56                   ` Alex Kiselev
  0 siblings, 0 replies; 12+ messages in thread
From: Alex Kiselev @ 2020-06-07 22:56 UTC (permalink / raw)
  To: Cliff Burdick; +Cc: Stephen Hemminger, users

On 2020-06-07 20:11, Cliff Burdick wrote:
> I don't think so since they're completely independent mempools.

They are not. Just think of a typical middlebox: you receive a packet,
alter some headers, and send it back. It's the same mbuf that goes from
an RX queue to a TX queue.
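
I.e., the usual forwarding shape (a sketch; rewrite_headers() is a
placeholder for the application logic):

	nb_rx = rte_eth_rx_burst(rx_port, rx_queue, pkts, MAX_PKT_BURST);

	for (i = 0; i < nb_rx; i++)
		rewrite_headers(pkts[i]);	/* placeholder for the application logic */

	/* the very same mbufs, still owned by the RX mempool, go out here */
	nb_tx = rte_eth_tx_burst(tx_port, tx_queue, pkts, nb_rx);

	/* whatever the NIC did not accept must be freed by the application */
	for (i = nb_tx; i < nb_rx; i++)
		rte_pktmbuf_free(pkts[i]);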

> I also
> didn't think the mtu function actually has anything to do with
> prepping the card for the mbuf size you want, and that it's typically
> done in rte_eth_dev_configure inside of eth_conf in rx_mode and
> tx_mode.
> I would have to look at the code to confirm, but also check what
> you're setting this structures to.
> 
> On Sun, Jun 7, 2020, 10:11 Alex Kiselev <alex@therouter.net> wrote:
> 
>> On 2020-06-07 17:21, Cliff Burdick wrote:
>>> The mbuf pool said be configured to be the size of the largest
>> packet
>>> you expect to receive. If you're getting packets longer than that,
>> I
>>> would expect you to see problems. Same goes for transmitting; I
>>> believe it will just read past the end of the mbuf data.
>> 
>> I am using rte_eth_dev_set_mtu() call with mtu value that is
>> consistent
>> with the mbuf size. Therefore I believe I don't have any overflow
>> bugs
>> in the
>> RX code.
>> 
>> And I've found a couple of bugs in the TX code. Both of them are
>> have to do with the incorrect use of pkt_len/data_len mbufs field.
>> 
>> But, the crash happened while receiving packets, that's why
>> I am wondering could the bugs I found in the TX code cause the crush
>> in RX?
>> 
>>> 
>>> On Sun, Jun 7, 2020, 06:36 Alex Kiselev <alex@therouter.net>
>> wrote:
>>> 
>>>> On 2020-06-07 15:16, Cliff Burdick wrote:
>>>>> That shouldn't matter. The mbuf size is allocated when you
>> create
>>>> the
>>>>> mempool, and data_len/pkt_len are just to specify the size of
>> the
>>>>> total packet and each segment. The underlying storage size is
>>>> still
>>>>> the same.
>>>> 
>>>> It does matter. I've done some tests and after
>>>> sending a few mbufs with data_len/pkt_len bigger than the size
>>>> of mbuf's underlying buffer the app stops sending/receiving
>> packets.
>>>> The PMD apparently goes beyong the mbuf's buffer, that's why
>>>> I sill think that my question about the impact of using incorrect
>>>> data_len/pkt is valid.
>>>> 
>>>>> 
>>>>> Have you checked to see if it's potentially a hugepage issue?
>>>> 
>>>> Please, explain.
>>>> 
>>>> The app had been working two monghts before the crush
>>>> and the load was 3-4 gbit/s, so no, I don't think that
>>>> something is wrong with hugepages on that machine.
>>>> 
>>>>> 
>>>>> On Sun, Jun 7, 2020, 02:59 Alex Kiselev <alex@therouter.net>
>>>> wrote:
>>>>> 
>>>>>> On 2020-06-07 04:41, Cliff Burdick wrote:
>>>>>>> I can't tell from your code, but you assigned nb_rx to the
>>>> number
>>>>>> of
>>>>>>> packets received, but then used vec_size, which might be
>> larger.
>>>>>> Does
>>>>>>> this happen if you use nb_rx in your loops?
>>>>>> 
>>>>>> No, this doesn't happen.
>>>>>> I just skip the part of the code that translates nb_rx to
>>>> vec_size,
>>>>>> since that code is double checked.
>>>>>> 
>>>>>> My actual question now is about possible impact of using
>>>>>> incorrect values of mbuf's pkt_len and data_len fields.
>>>>>> 
>>>>>>> 
>>>>>>> On Sat, Jun 6, 2020 at 5:59 AM Alex Kiselev
>> <alex@therouter.net>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>>> 1 июня 2020 г., в 19:17, Stephen Hemminger
>>>>>>>> <stephen@networkplumber.org> написал(а):
>>>>>>>>> 
>>>>>>>>> On Mon, 01 Jun 2020 15:24:25 +0200
>>>>>>>>> Alex Kiselev <alex@therouter.net> wrote:
>>>>>>>>> 
>>>>>>>>>> Hello,
>>>>>>>>>> 
>>>>>>>>>> I've got a segmentation fault error in my data plane path.
>>>>>>>>>> I am pretty sure the code where the segfault happened is
>> ok,
>>>>>>>>>> so my guess is that I somehow received a corrupted mbuf.
>>>>>>>>>> How could I troubleshoot this? Is there any way?
>>>>>>>>>> Is it possible that other threads of the application
>>>>>>>>>> corrupted that mbuf?
>>>>>>>>>> 
>>>>>>>>>> I would really appriciate any advice.
>>>>>>>>>> Thanks.
>>>>>>>>>> 
>>>>>>>>>> DPDK 18.11.3
>>>>>>>>>> NIC: 82599ES
>>>>>>>>>> 
>>>>>>>>>> Code:
>>>>>>>>>> 
>>>>>>>>>> nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts_burst,
>>>>>>>>>> MAX_PKT_BURST);
>>>>>>>>>> 
>>>>>>>>>> ...
>>>>>>>>>> 
>>>>>>>>>> for (i=0; i < vec_size; i++) {
>>>>>>>>>> rte_prefetch0(rte_pktmbuf_mtod(m_v[i], void *));
>>>>>>>>>> 
>>>>>>>>>> for (i=0; i < vec_size; i++) {
>>>>>>>>>> m = m_v[i];
>>>>>>>>>> eth_hdr = rte_pktmbuf_mtod(m, struct ether_hdr *);
>>>>>>>>>> eth_type = rte_be_to_cpu_16(eth_hdr->ether_type);
>>>>>>>> <---
>>>>>>>>>> Segmentation fault
>>>>>>>>>> ...
>>>>>>>>>> 
>>>>>>>>>> #0  rte_arch_bswap16 (_x=<error reading variable: Cannot
>>>> access
>>>>>>>> memory
>>>>>>>>>> at address 0x4d80000000053010>)
>>>>>>>>> 
>>>>>>>>> Build with as many of the debug options turned on in the
>> DPDK
>>>>>>>> config,
>>>>>>>>> and build with EXTRA_CFLAGS of -g.
>>>>>>>> 
>>>>>>>> Could using an incorrect (a very big one) value of mbuf
>> pkt_len
>>>>>> and
>>>>>>>> data_len while transmitting cause mbuf corruption and
>> following
>>>>>>>> segmentation fault on rx?


End of thread.

Thread overview: 12+ messages
2020-06-01 13:24 [dpdk-users] segmentation fault while accessing mbuf Alex Kiselev
2020-06-01 16:17 ` Stephen Hemminger
2020-06-02 18:46   ` Alex Kiselev
2020-06-06 12:59   ` Alex Kiselev
2020-06-07  2:41     ` Cliff Burdick
2020-06-07  9:59       ` Alex Kiselev
2020-06-07 13:16         ` Cliff Burdick
2020-06-07 13:36           ` Alex Kiselev
2020-06-07 15:21             ` Cliff Burdick
2020-06-07 17:11               ` Alex Kiselev
2020-06-07 18:11                 ` Cliff Burdick
2020-06-07 22:56                   ` Alex Kiselev
