DPDK patches and discussions
 help / color / mirror / Atom feed
From: "Van Haaren, Harry" <harry.van.haaren@intel.com>
To: Aaron Conole <aconole@redhat.com>,
	David Marchand <david.marchand@redhat.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [BUG] service_lcore_en_dis_able from service_autotest failing
Date: Mon, 7 Oct 2019 09:50:46 +0000	[thread overview]
Message-ID: <E923DB57A917B54B9182A2E928D00FA6759075DA@IRSMSX102.ger.corp.intel.com> (raw)
In-Reply-To: <f7ttv9ral9c.fsf@dhcp-25.97.bos.redhat.com>

> -----Original Message-----
> From: Aaron Conole [mailto:aconole@redhat.com]
> Sent: Wednesday, September 4, 2019 8:56 PM
> To: David Marchand <david.marchand@redhat.com>
> Cc: Van Haaren, Harry <harry.van.haaren@intel.com>; dev@dpdk.org
> Subject: Re: [dpdk-dev] [BUG] service_lcore_en_dis_able from service_autotest
> failing
> 
> David Marchand <david.marchand@redhat.com> writes:
> 
> > On Wed, Sep 4, 2019 at 12:04 PM David Marchand
> > <david.marchand@redhat.com> wrote:
> >>
> >> On Wed, Sep 4, 2019 at 11:42 AM Van Haaren, Harry
> >> <harry.van.haaren@intel.com> wrote:
> >> >
> >> > > -----Original Message-----
> >> > > From: Aaron Conole [mailto:aconole@redhat.com]
> >> > > Sent: Tuesday, September 3, 2019 3:46 PM
> >> > > To: Van Haaren, Harry <harry.van.haaren@intel.com>
> >> > > Cc: dev@dpdk.org
> >> > > Subject: [BUG] service_lcore_en_dis_able from service_autotest failing
> >> > >
> >> > > Hi Harry,
> >> >
> >> > Hey Aaron,
> >> >
> >> > > I noticed as part of series_6218
> >> > > (http://patches.dpdk.org/project/dpdk/list/?series=6218) that the
> travis
> >> > > build had a single failure, in service_autotest but it doesn't seem
> >> > > related to the series at all.
> >> > >
> >> > >   https://travis-ci.com/ovsrobot/dpdk/jobs/230358460
> >> > >
> >> > > Not sure if there's some kind of debugging we can add or look at to
> help
> >> > > diagnose failures when they occur.  Do you have time to have a look?
> >> >
> >> > Thanks for flagging this.
> >> >
> >> > I've just re-run the unit tests here multiple times to see if I can
> >> > reproduce something strange, no luck on reproducing the issue.
> >> >
> >> > Attempted with clang-6 and clang-7 (travis error on clang-7),
> >> > still no issues found.
> >> >
> >> > Building with Clang-7 and Shared libs (instead of default static)
> >> > still no issues found.
> >> >
> >> > If somebody can reproduce please send an update to here and I'll
> >> > attempt to replicate that setup. Right now I can't reproduce the issue.
> >>
> >> You have to be patient, but I caught it on my laptop:
> >>
> >
> > Ok, and now with the logs:
> >
> >
> > # time (log=/tmp/$$.log; while true; do echo service_autotest |taskset
> > -c 0-1 ./build-gcc-static/app/test/dpdk-test --log-level *:8 -l 0-1
> >>$log 2>&1; grep -q 'Test OK' $log || break; done; cat $log; rm -f
> > $log)
> > EAL: Detected lcore 0 as core 0 on socket 0
> > EAL: Detected lcore 1 as core 1 on socket 0
> > EAL: Detected lcore 2 as core 2 on socket 0
<snip verbose logs>
> > EAL:   probe driver: 8086:15d7 net_e1000_em
> > EAL:   Not managed by a supported kernel driver, skipped
> > EAL: Module /sys/module/vfio not found! error 2 (No such file or directory)
> > APP: HPET is not enabled, using TSC as default timer
> > RTE>>service_autotest
> >  + ------------------------------------------------------- +
> >  + Test Suite : service core test suite
> >  + ------------------------------------------------------- +
> >  + TestCase [ 0] : unregister_all succeeded
> >  + TestCase [ 1] : service_name succeeded
> >  + TestCase [ 2] : service_get_by_name succeeded
> > Service dummy_service Summary
> >   dummy_service: stats 1    calls 0    cycles 0    avg: 0
> > Service dummy_service Summary
> >   dummy_service: stats 0    calls 0    cycles 0    avg: 0
> >  + TestCase [ 3] : service_dump succeeded
> >  + TestCase [ 4] : service_attr_get succeeded
> >  + TestCase [ 5] : service_lcore_attr_get succeeded
> >  + TestCase [ 6] : service_probe_capability succeeded
> >  + TestCase [ 7] : service_start_stop succeeded
> >  + TestCase [ 8] : service_lcore_add_del skipped
> >  + TestCase [ 9] : service_lcore_start_stop succeeded
> > EAL: Test assert service_lcore_en_dis_able line 488 failed: Ex-service
> > core function call had no effect.
> >  + TestCase [10] : service_lcore_en_dis_able failed
> >  + TestCase [11] : service_mt_unsafe_poll skipped
> >  + TestCase [12] : service_mt_safe_poll skipped
> >  + TestCase [13] : service_app_lcore_mt_safe succeeded
> >  + TestCase [14] : service_app_lcore_mt_unsafe succeeded
> >  + TestCase [15] : service_may_be_active succeeded
> >  + ------------------------------------------------------- +
> >  + Test Suite Summary
> >  + Tests Total :       16
> >  + Tests Skipped :      3
> >  + Tests Executed :    16
> >  + Tests Unsupported:   0
> >  + Tests Passed :      12
> >  + Tests Failed :       1
> >  + ------------------------------------------------------- +
> > Test Failed
> > RTE>>EAL: request: mp_malloc_sync
> > EAL: Heap on socket 0 was shrunk by 2MB
> >
> > real    2m42.884s
> > user    5m1.902s
> > sys    0m2.208s
> 
> I can confirm - takes about 1m to fail.


Hi Aaron and David,

I've been attempting to reproduce this, still no errors here.

Given the nature of service-cores, and the difficulty to reproduce
here this feels like a race-condition - one that may not exist in all
binaries. Can you describe your compiler/command setup? (gcc 7.4.0 here).

I'm using Meson to build, so reproducing using this instead of the command
as provided above. There should be no difference in reproducing due to this:

$ meson test service_autotest --repeat 50

1/1 DPDK:fast-tests / service_autotest      OK       3.86 s
1/1 DPDK:fast-tests / service_autotest      OK       3.87 s
...
1/1 DPDK:fast-tests / service_autotest      OK       3.84 s

OK:        50
FAIL:       0
SKIP:       0
TIMEOUT:    0

I'll keep it running for a few hours but I have little faith if it only
takes 1 minute on your machines...

Regards, -Harry

  reply	other threads:[~2019-10-07  9:51 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-03 14:45 Aaron Conole
2019-09-04  9:41 ` Van Haaren, Harry
2019-09-04 10:04   ` David Marchand
2019-09-04 10:38     ` David Marchand
2019-09-04 19:56       ` Aaron Conole
2019-10-07  9:50         ` Van Haaren, Harry [this message]
2019-10-07 12:38           ` Aaron Conole
2019-10-14 14:53             ` Aaron Conole
2019-10-14 16:48               ` Van Haaren, Harry
2019-10-15 16:42                 ` Van Haaren, Harry
2019-09-04  9:55 ` David Marchand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=E923DB57A917B54B9182A2E928D00FA6759075DA@IRSMSX102.ger.corp.intel.com \
    --to=harry.van.haaren@intel.com \
    --cc=aconole@redhat.com \
    --cc=david.marchand@redhat.com \
    --cc=dev@dpdk.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).