From: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
To: "Van Haaren, Harry" <harry.van.haaren@intel.com>,
"Eads, Gage" <gage.eads@intel.com>,
"jerin.jacob@caviumnetworks.com" <jerin.jacob@caviumnetworks.com>,
"santosh.shukla@caviumnetworks.com"
<santosh.shukla@caviumnetworks.com>
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH 2/2] event/sw: use dynamically-sized IQs
Date: Mon, 8 Jan 2018 21:35:30 +0530 [thread overview]
Message-ID: <20180108160529.gven7vlrbmrrlw2p@Pavan-LT>
In-Reply-To: <E923DB57A917B54B9182A2E928D00FA650FEB40B@IRSMSX102.ger.corp.intel.com>
On Mon, Jan 08, 2018 at 03:50:24PM +0000, Van Haaren, Harry wrote:
> > From: Pavan Nikhilesh [mailto:pbhagavatula@caviumnetworks.com]
> > Sent: Monday, January 8, 2018 3:32 PM
> > To: Eads, Gage <gage.eads@intel.com>; Van Haaren, Harry
> > <harry.van.haaren@intel.com>; jerin.jacob@caviumnetworks.com;
> > santosh.shukla@caviumnetworks.com
> > Cc: dev@dpdk.org
> > Subject: Re: [PATCH 2/2] event/sw: use dynamically-sized IQs
> >
> > On Wed, Nov 29, 2017 at 09:08:34PM -0600, Gage Eads wrote:
> > > This commit introduces dynamically-sized IQs, by switching the underlying
> > > data structure from a fixed-size ring to a linked list of queue 'chunks.'
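
For readers following along: as I understand it, the patch makes each IQ a
singly-linked list of chunks, where every chunk carries a small fixed array
of events plus a pointer to the next chunk, and the device keeps a freelist
of spare chunks. A rough sketch of that layout (paraphrased names and a
hypothetical chunk size, not the exact iq_chunk.h definitions):

    #include <rte_eventdev.h>

    #define SKETCH_EVS_PER_CHUNK 32  /* hypothetical per-chunk capacity */

    struct iq_chunk_sketch {
            struct rte_event events[SKETCH_EVS_PER_CHUNK];
            struct iq_chunk_sketch *next;  /* next chunk in the IQ, or freelist link */
    };

    struct iq_sketch {
            struct iq_chunk_sketch *head;  /* chunk currently being dequeued from */
            struct iq_chunk_sketch *tail;  /* chunk currently being enqueued to */
            uint16_t head_idx;             /* next slot to read in head */
            uint16_t tail_idx;             /* next slot to write in tail */
            uint32_t count;                /* events across all chunks */
    };
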
>
> <snip>
>
> > The SW eventdev crashes when used alongside the Rx adapter. The crash happens
> > when pumping traffic at > 1.4 Mpps. This commit seems to be responsible.
> >
> >
> > To reproduce, apply the following Rx adapter patch:
> > http://dpdk.org/dev/patchwork/patch/31977/
> > Command used:
> > ./build/eventdev_pipeline_sw_pmd -c 0xfffff8 --vdev="event_sw" -- -r0x800
> > -t0x100 -w F000 -e 0x10
>
> Applied the patch to current master, recompiled; cannot reproduce here.
>
By master, do you mean dpdk-next-eventdev?
> Is it 100% reproducible and "instant" or can it take some time to occur there?
>
It is instant.
>
> > Backtrace:
> >
> > Thread 4 "lcore-slave-4" received signal SIGSEGV, Segmentation fault.
> > [Switching to Thread 0xffffb6c8f040 (LWP 25291)]
> > 0x0000aaaaaadcc0d4 in iq_dequeue_burst (count=48, ev=0xffffb6c8dd38,
> > iq=0xffff9f764720, sw=0xffff9f332600) at
> > /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/iq_chunk.h:142
> > 142 ev[total++] = current->events[index++];
>
> Could we get the output of (gdb) info locals?
>
Thread 4 "lcore-slave-4" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xffffb6c8f040 (LWP 19751)]
0x0000aaaaaadcc0d4 in iq_dequeue_burst (count=48, ev=0xffffb6c8dd38,
iq=0xffff9f764620, sw=0xffff9f332500) at
/root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/iq_chunk.h:142
142 ev[total++] = current->events[index++];
(gdb) info locals
next = 0x7000041400be73b
current = 0x7000041400be73b
total = 36
index = 1
(gdb)
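
These locals explain the fault: the dequeue loop copies events out of
'current' and follows current->next once a chunk is exhausted, so when a
chunk's next field has been overwritten (current/next = 0x7000041400be73b
here), the very next element copy dereferences a wild pointer. A paraphrased
sketch of that walk, reusing the hypothetical types from the earlier sketch
rather than the exact iq_chunk.h code:

    /* 'current', 'next', 'total' and 'index' correspond to the gdb locals
     * above; with a corrupted chunk list, the events[] load is the first
     * dereference of the bad pointer. */
    static uint16_t
    iq_dequeue_burst_sketch(struct iq_sketch *iq, struct rte_event *ev,
                            uint16_t count)
    {
            struct iq_chunk_sketch *current = iq->head;
            uint16_t index = iq->head_idx;
            uint16_t total = 0;

            count = (count < iq->count) ? count : iq->count;
            while (total < count) {
                    ev[total++] = current->events[index++];  /* faults here */
                    if (index == SKETCH_EVS_PER_CHUNK) {
                            struct iq_chunk_sketch *next = current->next;
                            /* exhausted chunk would go back to the freelist here */
                            current = next;
                            index = 0;
                    }
            }
            iq->head = current;
            iq->head_idx = index;
            iq->count -= total;
            return total;
    }
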
Noticed another crash:
Thread 4 "lcore-slave-4" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xffffb6c8f040 (LWP 19690)]
0x0000aaaaaadcfb78 in iq_alloc_chunk (sw=0xffff9f332500) at
/root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/iq_chunk.h:63
63 sw->chunk_list_head = chunk->next;
(gdb) info locals
chunk = 0x14340000119
(gdb) bt
#0 0x0000aaaaaadcfb78 in iq_alloc_chunk (sw=0xffff9f332500) at
/root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/iq_chunk.h:63
#1 iq_enqueue (ev=0xffff9f3967c0, iq=0xffff9f764620, sw=0xffff9f332500) at
/root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/iq_chunk.h:95
#2 __pull_port_lb (allow_reorder=0, port_id=5, sw=0xffff9f332500) at
/root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/sw_evdev_scheduler.c:463
#3 sw_schedule_pull_port_no_reorder (sw=0xffff9f332500, port_id=5) at
/root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/sw_evdev_scheduler.c:486
#4 0x0000aaaaaadd0608 in sw_event_schedule (dev=0xaaaaaafbd200
<rte_event_devices>) at
/root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/sw_evdev_scheduler.c:554
#5 0x0000aaaaaadca008 in sw_sched_service_func (args=0xaaaaaafbd200
<rte_event_devices>) at
/root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/sw_evdev.c:767
#6 0x0000aaaaaab54740 in rte_service_runner_do_callback (s=0xffff9fffdf80,
cs=0xffff9ffef900, service_idx=0) at
/root/clean/rebase/dpdk-next-eventdev/lib/librte_eal/common/rte_service.c:349
#7 0x0000aaaaaab54868 in service_run (i=0, cs=0xffff9ffef900,
service_mask=18446744073709551615) at
/root/clean/rebase/dpdk-next-eventdev/lib/librte_eal/common/rte_service.c:376
#8 0x0000aaaaaab54954 in rte_service_run_iter_on_app_lcore (id=0,
serialize_mt_unsafe=1) at
/root/clean/rebase/dpdk-next-eventdev/lib/librte_eal/common/rte_service.c:405
#9 0x0000aaaaaaaef04c in schedule_devices (lcore_id=4) at
/root/clean/rebase/dpdk-next-eventdev/examples/eventdev_pipeline_sw_pmd/main.c:223
#10 0x0000aaaaaaaef234 in worker (arg=0xffff9f331c80) at
/root/clean/rebase/dpdk-next-eventdev/examples/eventdev_pipeline_sw_pmd/main.c:274
#11 0x0000aaaaaab4382c in eal_thread_loop (arg=0x0) at
/root/clean/rebase/dpdk-next-eventdev/lib/librte_eal/linuxapp/eal/eal_thread.c:182
#12 0x0000ffffb7e46d64 in start_thread () from /usr/lib/libpthread.so.0
#13 0x0000ffffb7da8bbc in thread_start () from /usr/lib/libc.so.6
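
This second backtrace is the same corruption seen from the allocation side:
iq_enqueue pulls a fresh chunk off sw->chunk_list_head, and dereferencing a
bogus head (chunk = 0x14340000119 above) faults on the chunk->next load.
Roughly, with the same paraphrased names as before:

    /* Rough sketch of the freelist pop at iq_chunk.h:63. If chunk_list_head
     * is corrupted, or the freelist underflows and a stale pointer is read,
     * the chunk->next load below is where it blows up. 'sw_evdev_sketch'
     * stands in for the real device struct. */
    struct sw_evdev_sketch {
            struct iq_chunk_sketch *chunk_list_head;  /* freelist of spare chunks */
            /* ... other device state ... */
    };

    static inline struct iq_chunk_sketch *
    iq_alloc_chunk_sketch(struct sw_evdev_sketch *sw)
    {
            struct iq_chunk_sketch *chunk = sw->chunk_list_head;

            sw->chunk_list_head = chunk->next;  /* faulting line */
            chunk->next = NULL;
            return chunk;
    }
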
>
>
> > (gdb) bt
> > #0 0x0000aaaaaadcc0d4 in iq_dequeue_burst (count=48, ev=0xffffb6c8dd38,
> > iq=0xffff9f764720, sw=0xffff9f332600) at
> > /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/iq_chunk.h:142
> > #1 sw_schedule_atomic_to_cq (sw=0xffff9f332600, qid=0xffff9f764700,
> > iq_num=0,
> > count=48) at
> > /root/clean/rebase/dpdk-next-
> > eventdev/drivers/event/sw/sw_evdev_scheduler.c:74
> > #2 0x0000aaaaaadcdc44 in sw_schedule_qid_to_cq (sw=0xffff9f332600) at
> > /root/clean/rebase/dpdk-next-
> > eventdev/drivers/event/sw/sw_evdev_scheduler.c:262
> > #3 0x0000aaaaaadd069c in sw_event_schedule (dev=0xaaaaaafbd200
> > <rte_event_devices>) at
> > /root/clean/rebase/dpdk-next-
> > eventdev/drivers/event/sw/sw_evdev_scheduler.c:564
> > #4 0x0000aaaaaadca008 in sw_sched_service_func (args=0xaaaaaafbd200
> > <rte_event_devices>) at
> > /root/clean/rebase/dpdk-next-eventdev/drivers/event/sw/sw_evdev.c:767
> > #5 0x0000aaaaaab54740 in rte_service_runner_do_callback (s=0xffff9fffdf80,
> > cs=0xffff9ffef900, service_idx=0) at
> > /root/clean/rebase/dpdk-next-
> > eventdev/lib/librte_eal/common/rte_service.c:349
> > #6 0x0000aaaaaab54868 in service_run (i=0, cs=0xffff9ffef900,
> > service_mask=18446744073709551615) at
> > /root/clean/rebase/dpdk-next-
> > eventdev/lib/librte_eal/common/rte_service.c:376
> > #7 0x0000aaaaaab54954 in rte_service_run_iter_on_app_lcore (id=0,
> > serialize_mt_unsafe=1) at
> > /root/clean/rebase/dpdk-next-
> > eventdev/lib/librte_eal/common/rte_service.c:405
> > #8 0x0000aaaaaaaef04c in schedule_devices (lcore_id=4) at
> > /root/clean/rebase/dpdk-next-
> > eventdev/examples/eventdev_pipeline_sw_pmd/main.c:223
> > #9 0x0000aaaaaaaef234 in worker (arg=0xffff9f331d80) at
> > /root/clean/rebase/dpdk-next-
> > eventdev/examples/eventdev_pipeline_sw_pmd/main.c:274
> > #10 0x0000aaaaaab4382c in eal_thread_loop (arg=0x0) at
> > /root/clean/rebase/dpdk-next-
> > eventdev/lib/librte_eal/linuxapp/eal/eal_thread.c:182
> > #11 0x0000ffffb7e46d64 in start_thread () from /usr/lib/libpthread.so.0
> > #12 0x0000ffffb7da8bbc in thread_start () from /usr/lib/libc.so.6
> >
> > The segfault seems to happen in sw_event_schedule, and only under high
> > traffic load.
>
> I've added -n 0 to the command line allowing it to run forever,
> and after ~2 mins its still happily forwarding pkts at ~10G line rate here.
>
On arm64 the crash is instant even without -n 0.
>
> > Thanks,
> > Pavan
>
> Thanks for reporting - I'm afraid I'll have to ask a few questions to identify why I can't reproduce here before I can dig in and identify a fix.
>
> Anything special about the system that it is on?
Running on an arm64 OcteonTx platform with 8x10G ports connected.
> What traffic pattern is being sent to the app?
Using something similar to trafficgen, sending IPv4/UDP packets:
0:00:51 958245 |0xB00 2816|0xB10 2832|0xB20 2848|0xB30 2864|0xC00 * 3072|0xC10 * 3088|0xC20 * 3104|0xC30 * 3120| Totals
Port Status |XFI30 Up|XFI31 Up|XFI32 Up|XFI33 Up|XFI40 Up|XFI41 Up|XFI42 Up|XFI43 Up|
1:Total TX packets | 7197041566| 5194976604| 5120240981| 4424870160| 5860892739| 5191225514| 5126500427| 4429259828|42545007819
3:Total RX packets | 358886055| 323055411| 321000948| 277179800| 387486466| 350278086| 348080242| 295460613|2661427621
6:TX packet rate | 0| 0| 0| 0| 0| 0| 0| 0| 0
7:TX octet rate | 0| 0| 0| 0| 0| 0| 0| 0| 0
8:TX bit rate, Mbps | 0| 0| 0| 0| 0| 0| 0| 0| 0
10:RX packet rate | 0| 0| 0| 0| 0| 0| 0| 0| 0
11:RX octet rate | 0| 0| 0| 0| 0| 0| 0| 0| 0
12:RX bit rate, Mbps | 0| 0| 0| 0| 0| 0| 0| 0| 0
36:tx.size | 60| 60| 60| 60| 60| 60| 60| 60|
37:tx.type | IPv4+UDP| IPv4+UDP| IPv4+UDP| IPv4+UDP| IPv4+UDP| IPv4+UDP| IPv4+UDP| IPv4+UDP|
38:tx.payload | abc| abc| abc| abc| abc| abc| abc| abc|
47:dest.mac | fb71189c0| fb71189d0| fb71189e0| fb71189bf| fb7118ac0| fb7118ad0| fb7118ae0| fb7118abf|
51:src.mac | fb71189bf| fb71189cf| fb71189df| fb71189ef| fb7118abf| fb7118acf| fb7118adf| fb7118aef|
55:dest.ip | 11.1.0.99| 11.17.0.99| 11.33.0.99| 11.0.0.99| 14.1.0.99| 14.17.0.99| 14.33.0.99| 14.0.0.99|
59:src.ip | 11.0.0.99| 11.16.0.99| 11.32.0.99| 11.48.0.99| 14.0.0.99| 14.16.0.99| 14.32.0.99| 14.48.0.99|
73:bridge | off| off| off| off| off| off| off| off|
77:validate packets | off| off| off| off| off| off| off| off|
Thanks,
Pavan.
>
> Thanks
>
>
> <snip>
>