From: Adam Hassick <ahassick@iol.unh.edu>
To: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Cc: Patrick Robb <probb@iol.unh.edu>,
Konstantin Ushakov <Konstantin.Ushakov@oktetlabs.ru>,
ci@dpdk.org
Subject: Re: Setting up DPDK PMD Test Suite
Date: Fri, 25 Aug 2023 10:06:00 -0400
Message-ID: <CAC-YWqj28X-YhLzM-rChuKMdaoTrDS6fN68YxzS+rb1N-i5AkQ@mail.gmail.com>
In-Reply-To: <cc758ad7-2f40-7c6c-ffab-c574011ba770@oktetlabs.ru>
Hi Andrew,
Two of our systems (the Test Engine runner and the DUT host) are running
Ubuntu 20.04 LTS. However, this morning I noticed that the tester system
(the one having issues) is running Ubuntu 22.04 LTS.
This could be the source of the problem. I encountered a dependency issue
trying to run the Test Engine on 22.04 LTS, so I downgraded the system.
Since the tester is also the host having connection issues, I will try
downgrading that system to 20.04, and see if that changes anything.
I did try passing in the "--vg-rcf" argument to the run.sh script of the
test suite after installing valgrind, but there was no additional output
that I saw.
I will try pulling in the changes you've pushed up, and will see if that
fixes anything.
Thanks,
Adam
On Fri, Aug 25, 2023 at 9:57 AM Andrew Rybchenko <
andrew.rybchenko@oktetlabs.ru> wrote:
> Hello Adam,
>
> On 8/24/23 23:54, Andrew Rybchenko wrote:
>
> I'd like to try to repeat the problem locally. Which Linux distro is
> running on test engine and agents?
>
> In fact I know of one problem with Debian 12 and Fedora 38, and we have a
> patch in review to fix it. However, the behaviour is different in
> this case, so it is unlikely to be the same problem.
>
>
> I've just published a new tag which fixes known test engine side problems
> on Debian 12 and Fedora 38.
>
>
> One more idea is to install valgrind on the test engine host and
> run with option --vg-rcf to check if something weird is happening.
>
> What I don't understand right now is why I see just one failed attempt
> to connect in your log.txt and then Logger shutdown after 9 minutes.
>
> Andrew.
>
> On 8/24/23 23:29, Adam Hassick wrote:
>
> > Is there any firewall in the network or on test hosts which could block
> > incoming TCP connections to port 23571 from the host where you
> > run test engine?
>
> Our test engine host and the testbed are on the same subnet. The
> connection does work sometimes.
>
> > If the behaviour is the same on the next try and you see that the test
> > agent is kept running, could you check using
> >
> > # netstat -tnlp
> >
> > that the Test Agent is listening on the port, and try to establish a TCP
> > connection from test agent using
> >
> > $ telnet iol-dts-tester.dpdklab.iol.unh.edu 23571
> >
> > and check if the TCP connection could be established.
>
> I was able to replicate the same behavior again, where it hangs while RCF
> is trying to start.
> Running this command, I see this in the output:
>
> tcp 0 0 0.0.0.0:23571 0.0.0.0:* LISTEN 18599/ta
>
> So it seems like it is listening on the correct port.
> Additionally, I was able to connect to the Tester machine from our Test
> Engine host using telnet. It printed the PID of the process once the
> connection was opened.
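
The telnet check described above can also be scripted, which helps when the
failure is intermittent. This is a minimal sketch using only the Python
standard library. The helper name `can_connect` is hypothetical and is not
part of the Test Engine; it only reports whether the TCP handshake succeeds.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Run from the Test Engine host, a call such as
`can_connect("iol-dts-tester.dpdklab.iol.unh.edu", 23571)` would exercise the
same path as the telnet command, using the host and port quoted in this thread.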
>
> I tried running the "ta" application manually on the command line, and it
> didn't print anything at all.
> Maybe the issue is something on the Test Engine side.
>
> On Thu, Aug 24, 2023 at 2:35 PM Andrew Rybchenko <
> andrew.rybchenko@oktetlabs.ru> wrote:
>
> Hi Adam,
>
> > On the tester host (which appears to be the Peer agent), there
> are four processes that I see running, which look like the test
> agent processes.
>
> Before the next try I'd recommend killing these processes.
>
> Is there any firewall in the network or on test hosts which could
> block incoming TCP connections to port 23571 from the host
> where you run test engine?
>
> If the behaviour is the same on the next try and you see that the test
> agent is kept running, could you check using
>
> # netstat -tnlp
>
> that the Test Agent is listening on the port, and try to establish a TCP
> connection from test agent using
>
> $ telnet iol-dts-tester.dpdklab.iol.unh.edu 23571
>
> and check if the TCP connection could be established.
>
> Another idea is to log in to the Tester as root, as testing does, get the
> TA start command from the log, and try it by hand without -n and
> with the extra escaping removed.
>
> # sudo PATH=${PATH}:/tmp/linux_x86_root_76872_1692885663_1
>
> LD_LIBRARY_PATH=${LD_LIBRARY_PATH}${LD_LIBRARY_PATH:+:}/tmp/linux_x86_root_76872_1692885663_1
> /tmp/linux_x86_root_76872_1692885663_1/ta Peer 23571
> host=iol-dts-tester.dpdklab.iol.unh.edu:
> port=23571:user=root:key=/opt/tsf/keys/id_ed25519:ssh_port=22:copy_timeout=15:kill_timeout=15:sudo=:shell=
>
> Hopefully in this case the test agent directory remains in /tmp and
> you don't need to copy it as testing does.
> Maybe the output could shed some light on what's going on.
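
The last argument of the command above looks like a colon-separated list of
key=value pairs. The sketch below splits such a string so the individual
settings (port, copy_timeout, and so on) are easy to inspect. The format is
inferred only from the one command line quoted here, and joining the wrapped
lines into a single string is an assumption. `parse_confstr` is a hypothetical
helper, not the Test Engine's own parser, and it does not handle escaping or
values that contain a colon.

```python
def parse_confstr(confstr: str) -> dict:
    """Split a colon-separated list of key=value pairs into a dict."""
    params = {}
    for item in confstr.split(":"):
        if not item:
            continue  # tolerate a trailing or doubled colon
        key, _, value = item.partition("=")
        params[key] = value  # empty values such as "sudo=" are kept as ""
    return params


# The last argument of the command above, with the wrapped lines joined
# (assumption: the line break in the email is only a wrap, not a space).
CONFSTR = (
    "host=iol-dts-tester.dpdklab.iol.unh.edu:port=23571:user=root:"
    "key=/opt/tsf/keys/id_ed25519:ssh_port=22:copy_timeout=15:"
    "kill_timeout=15:sudo=:shell="
)

params = parse_confstr(CONFSTR)
print(params["host"], params["port"], params["copy_timeout"])
```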
>
> Andrew.
>
> On 8/24/23 17:30, Adam Hassick wrote:
>
> Hi Andrew,
>
> This is the output that I see in the terminal when this failure
> occurs, after the test agent binaries build and the test engine
> starts:
>
> Platform default build - pass
> Simple RCF consistency check succeeded
> --->>> Starting Logger...done
> --->>> Starting RCF...rcf_net_engine_connect(): Connection timed
> out iol-dts-tester.dpdklab.iol.unh.edu:23571
>
> Then, it hangs here until I kill the "te_rcf" and "te_tee"
> processes. I let it hang for around 9 minutes.
>
> On the tester host (which appears to be the Peer agent), there are
> four processes that I see running, which look like the test agent
> processes.
>
> ta.Peer is an empty file. I've attached the log.txt from this run.
>
> - Adam
>
> On Thu, Aug 24, 2023 at 4:22 AM Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru> wrote:
>
> Hi Adam,
>
> Yes, TE_RCFUNIX_TIMEOUT is in seconds. I've double-checked
> that it goes to 'copy_timeout' in ts-conf/rcf.conf.
> The description in doc/sphinx/pages/group_te_engine_rcf.rst
> says that copy_timeout is in seconds, and the implementation in
> lib/rcfunix/rcfunix.c passes the value to select() tv_sec.
> Theoretically select() could be interrupted by signal, but I
> think it is unlikely here.
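
To illustrate the point about units, here is a small sketch of a select()
call with a timeout given in seconds. It uses Python's `select.select`, whose
fourth argument plays the role of the timeval passed to select(2) in C. It is
not taken from lib/rcfunix/rcfunix.c, and `wait_readable` is a hypothetical
helper. Python retries the call automatically when a signal interrupts it
(PEP 475), so the interruption case mentioned above is not reproduced.

```python
import select
import socket
import time


def wait_readable(sock: socket.socket, timeout_s: float) -> bool:
    """Wait up to `timeout_s` seconds for `sock` to become readable."""
    # The fourth argument of select.select() is a timeout in seconds.
    readable, _, _ = select.select([sock], [], [], timeout_s)
    return bool(readable)


# A connected pair of local sockets stands in for the real connection.
a, b = socket.socketpair()

start = time.monotonic()
timed_out = not wait_readable(a, 0.2)  # nothing was sent, so this times out
elapsed = time.monotonic() - start

b.send(b"x")
ready = wait_readable(a, 0.2)  # data is pending, so this returns promptly

a.close()
b.close()
print(timed_out, ready, round(elapsed, 1))
```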
>
> I'm not sure that I understand what you mean by RCF
> connection timeout. Does it happen on TE startup when RCF
> starts test agents? If so, TE_RCFUNIX_TIMEOUT could help. Or
> does it happen when tests are in progress, e.g. in the middle
> of a test? If so, TE_RCFUNIX_TIMEOUT is unrelated and most
> likely either the host with the test agent dies or the test agent
> itself crashes. It would be easier for me to classify it if you share
> the text log (log.txt, full or just the corresponding fragment with
> some context). Also the content of the ta.DPDK or ta.Peer file,
> depending on which agent has problems, could shed some light.
> These files contain the stdout/stderr of the test agents.
>
> Andrew.
>
> On 8/23/23 17:45, Adam Hassick wrote:
>
> Hi Andrew,
>
> I've set up a test rig repository here, and have created
> configurations for our development testbed based off of the
> examples.
> We've been able to get the test suite to run manually on
> Mellanox CX5 devices once.
> However, we are running into an issue where, when RCF starts,
> the RCF connection times out very frequently. We aren't sure
> why this is the case.
> It works sometimes, but most of the time when we try to run
> the test engine, it encounters this issue.
> I've tried changing the RCF port by setting
> "TE_RCF_PORT=<some port number>" and rebooting the testbed
> machines. Neither seems to fix the issue.
>
> It also seems like the timeout takes far longer than 60
> seconds, even when running "export TE_RCFUNIX_TIMEOUT=60"
> before I try to run the test suite.
> I assume the unit for this variable is seconds?
>
> Thanks,
> Adam
>
> On Mon, Aug 21, 2023 at 10:19 AM Adam Hassick
> <ahassick@iol.unh.edu> wrote:
>
> Hi Andrew,
>
> Thanks, I've cloned the example repository and will start
> setting up a configuration for our development testbed
> today. I'll let you know if I run into any difficulties
> or have any questions.
>
> - Adam
>
> On Sun, Aug 20, 2023 at 4:40 AM Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru> wrote:
>
> Hi Adam,
>
> I've published
> <https://github.com/ts-factory/ts-rigs-sample>.
> Hopefully it will help to define your test rigs and
> successfully run some tests manually. Feel free to
> ask any questions and I'll answer here and try to
> update documentation.
>
> Meanwhile I'll prepare missing bits for steps (2) and
> (3).
> Hopefully everything is in place for step (4), but we
> need to make steps (2) and (3) first.
>
> Andrew.
>
> On 8/18/23 21:40, Andrew Rybchenko wrote:
>
> Hi Adam,
>
> > I've conferred with the rest of the team, and we
> think it would be best to move forward with mainly
> option B.
>
> OK, I'll provide the sample on Monday for you. It is
> almost ready right now, but I need to double-check
> it before publishing.
>
> Regards,
> Andrew.
>
> On 8/17/23 20:03, Adam Hassick wrote:
>
> Hi Andrew,
>
> I'm adding the CI mailing list to this
> conversation. Others in the community might find
> this conversation valuable.
>
> We do want to run testing on a regular basis. The
> Jenkins integration will be very useful for us, as
> most of our CI is orchestrated by Jenkins.
> I've conferred with the rest of the team, and we
> think it would be best to move forward with mainly
> option B.
> If you would like to know anything about our
> testbeds that would help you with creating an
> example ts-rigs repo, I'd be happy to answer any
> questions you have.
>
> We have multiple test rigs (we call these
> "DUT-tester pairs") that we run our existing
> hardware testing on, with differing network
> hardware and CPU architecture. I figured this might
> be an important detail.
>
> Thanks,
> Adam
>
> On Thu, Aug 17, 2023 at 11:44 AM Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru> wrote:
>
> Greetings Adam,
>
> I'm happy to hear that you're trying to bring
> it up.
>
> As I understand it, the final goal is to run it on a
> regular basis, so we need to set it up properly
> from the very beginning.
> Bringing up all features consists of 4 steps:
>
> 1. Create site-specific repository (we call it
> ts-rigs) which contains information about test
> rigs and other site-specific information like
> where to send mails, where to store logs etc.
> It is required for manual execution as well,
> since test rigs description is essential. I'll
> return to the topic below.
>
> 2. Set up log storage for automated runs.
> Basically it is disk space plus an apache2 web
> server with a few CGI scripts which help a lot to
> save disk space.
>
> 3. Set up the Bublik web application, which provides
> a web interface to view testing results. Same as
> https://ts-factory.io/bublik
>
> 4. Set up Jenkins to run tests regularly,
> save logs in the log storage (2) and import them into
> Bublik (3).
>
> We spent the last few months on our homework to make
> it simpler to bring up automated execution
> using Jenkins -
> https://github.com/ts-factory/te-jenkins
> The corresponding bits in dpdk-ethdev-ts will be
> available tomorrow.
>
> Let's return to the step (1).
>
> Unfortunately there is no publicly available
> example of the ts-rigs repository since
> sensitive site-specific information is located
> there. But I'm ready to help you to create it
> for UNH. I see two options here:
>
> (A) I'll ask questions and based on your
> answers will create the first draft with my
> comments.
>
> (B) I'll make a template/example ts-rigs repo,
> publish it and you'll create UNH ts-rigs based
> on it.
>
> Of course, I'll help to debug and finally bring
> it up in any case.
>
> (A) is a bit simpler for me and you, but (B) is
> a bit more generic and will help other
> potential users to bring it up.
> We can combine (A)+(B). I.e. start from (A).
> What do you think?
>
> Thanks,
> Andrew.
>
> On 8/17/23 15:18, Konstantin Ushakov wrote:
>
> Greetings Adam,
>
>
> Thanks for contacting us. I've copied Andrew, who
> would be happy to help.
>
> Thanks,
> Konstantin
>
> On 16 Aug 2023, at 21:50, Adam Hassick
> <ahassick@iol.unh.edu> wrote:
>
>
> Greetings Konstantin,
>
> I am in the process of setting up the DPDK
> Poll Mode Driver test suite as an addition to
> our testing coverage for DPDK at the UNH lab.
>
> I have some questions about how to set the
> test suite arguments.
>
> I have been able to configure the Test Engine
> to connect to the hosts in the testbed. The
> RCF, Configurator, and Tester all begin to
> run, however the prelude of the test suite
> fails to run.
>
>
> https://ts-factory.io/doc/dpdk-ethdev-ts/index.html#test-parameters
>
> The documentation mentions that there are
> several test parameters for the test suite,
> like for the IUT test link MAC, etc. These
> seem like they would need to be set somewhere
> to run many of the tests.
>
> I see in the Test Engine documentation, there
> are instructions on how to create new
> parameters for test suites in the Tester
> configuration, but there is nothing in the
> user guide or in the Tester guide for how to
> set the arguments for the parameters when
> running the test suite that I can find. I'm
> not sure if I need to write my own Tester
> config, or if I should be setting these in
> some other way.
>
> How should these values be set?
>
> I'm also not sure what environment
> variables/arguments are strictly necessary or
> which are optional.
>
> Regards,
> Adam
>
> --
> *Adam Hassick*
> Senior Developer
> UNH InterOperability Lab
> ahassick@iol.unh.edu
> iol.unh.edu <https://www.iol.unh.edu/>
> +1 (603) 475-8248
--
*Adam Hassick*
Senior Developer
UNH InterOperability Lab
ahassick@iol.unh.edu
iol.unh.edu <https://www.iol.unh.edu/>
+1 (603) 475-8248