DPDK CI discussions
From: Adam Hassick <ahassick@iol.unh.edu>
To: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Cc: Patrick Robb <probb@iol.unh.edu>,
	Konstantin Ushakov <Konstantin.Ushakov@oktetlabs.ru>,
	ci@dpdk.org
Subject: Re: Setting up DPDK PMD Test Suite
Date: Fri, 25 Aug 2023 10:06:00 -0400
Message-ID: <CAC-YWqj28X-YhLzM-rChuKMdaoTrDS6fN68YxzS+rb1N-i5AkQ@mail.gmail.com>
In-Reply-To: <cc758ad7-2f40-7c6c-ffab-c574011ba770@oktetlabs.ru>

Hi Andrew,

Two of our systems (the Test Engine runner and the DUT host) are running
Ubuntu 20.04 LTS; however, this morning I noticed that the tester system
(the one having issues) is running Ubuntu 22.04 LTS.
This could be the source of the problem. I encountered a dependency issue
trying to run the Test Engine on 22.04 LTS, so I downgraded the Test Engine
host. Since the tester is also the host having connection issues, I will
try downgrading that system to 20.04 as well and see if that changes
anything.
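Comparing the distribution release on all three hosts can be scripted rather than checked by hand. The sketch below is a minimal example, assuming each host provides the standard /etc/os-release file (Ubuntu 20.04 and 22.04 both do); os_version_id is a hypothetical helper name, not part of the test suite.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<distro id> <version>" from an os-release file.
# /etc/os-release is a shell-compatible KEY=value file, so it can be sourced.
os_version_id() {
    local file="${1:-/etc/os-release}"
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
    # Source it in a subshell so ID/VERSION_ID do not leak into the caller.
    ( . "$file" && printf '%s %s\n' "${ID:-unknown}" "${VERSION_ID:-unknown}" )
}
```

Run on each host (for example over ssh), it should print `ubuntu 20.04` on all three once the tester has been downgraded.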

After installing valgrind, I tried passing the "--vg-rcf" option to the
test suite's run.sh script, but I did not see any additional output.

I will pull in the changes you've pushed and see whether they fix
anything.

Thanks,
Adam

On Fri, Aug 25, 2023 at 9:57 AM Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> wrote:

> Hello Adam,
>
> On 8/24/23 23:54, Andrew Rybchenko wrote:
>
> I'd like to try to reproduce the problem locally. Which Linux distribution
> is running on the Test Engine host and on the agent hosts?
>
> In fact, I know of one problem with Debian 12 and Fedora 38, and we have
> a patch in review to fix it. However, the behaviour is different in
> this case, so it is unlikely to be the same problem.
>
>
> I've just published a new tag which fixes the known Test Engine-side
> problems on Debian 12 and Fedora 38.
>
>
> One more idea is to install valgrind on the Test Engine host and
> run with the --vg-rcf option to check whether something weird is happening.
>
> What I don't understand right now is why your log.txt shows just one
> failed connection attempt, followed by the Logger shutting down 9 minutes
> later.
>
> Andrew.
>
> On 8/24/23 23:29, Adam Hassick wrote:
>
>  > Is there any firewall in the network or on the test hosts which could
>  > block incoming TCP connections to port 23571 from the host where you
>  > run the Test Engine?
>
> Our test engine host and the testbed are on the same subnet. The
> connection does work sometimes.
>
>  > If the behaviour is the same on the next try and you see that the test
>  > agent keeps running, could you check using
>  >
>  > # netstat -tnlp
>  >
>  > that the Test Agent is listening on the port, and try to establish a TCP
>  > connection to the Test Agent using
>  >
>  > $ telnet iol-dts-tester.dpdklab.iol.unh.edu 23571
>  >
>  > and check whether the TCP connection can be established.
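Where telnet is not installed, or a scripted check is preferable, a bounded connection attempt can be made from the Test Engine host using bash's /dev/tcp redirection and coreutils timeout(1). This is a minimal sketch, not part of TE: tcp_port_open is a hypothetical name, and a successful return only shows that the TCP handshake completed, not that the Test Agent answered on the RCF protocol.

```shell
#!/usr/bin/env bash
# Hypothetical helper: non-interactive alternative to "telnet HOST PORT".
# Tries to open a TCP connection and gives up after TIMEOUT seconds.
tcp_port_open() {
    local host="$1" port="$2" secs="${3:-5}"
    if [ -z "$host" ] || [ -z "$port" ]; then
        echo "usage: tcp_port_open HOST PORT [TIMEOUT_SECONDS]" >&2
        return 2
    fi
    # The inner bash opens fd 3 on /dev/tcp/HOST/PORT, which performs the
    # TCP handshake; it exits non-zero if the connection fails.
    # timeout(1) exits with status 124 if no answer arrives in time.
    timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

For example, `tcp_port_open iol-dts-tester.dpdklab.iol.unh.edu 23571 10; echo $?` prints 0 if the connection was accepted, 124 if nothing answered within 10 seconds, and another non-zero status otherwise (for example, when the connection is refused).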
>
> I was able to reproduce the same behavior again: it hangs while RCF
> is starting.
> Running netstat, I see this in the output:
>
> tcp        0      0 0.0.0.0:23571           0.0.0.0:*               LISTEN      18599/ta
>
> So it seems like it is listening on the correct port.
> Additionally, I was able to connect to the Tester machine from our Test
> Engine host using telnet. It printed the PID of the process once the
> connection was opened.
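The netstat check can also be made scriptable, which may help when repeating runs until the failure reappears. A minimal sketch, assuming the `netstat -tnlp` column layout shown above (Local Address in column 4, State in column 6, PID/Program name in column 7); `ss -tnlp` prints a different layout, so this will not work with its output. listening_on is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "netstat -tnlp" output on stdin and print the
# PID/Program name of every TCP socket in LISTEN state on the given port.
# Exit status is 0 if at least one listener was found, 1 otherwise.
listening_on() {
    local port="$1"
    awk -v port="$port" '
        # $4 = Local Address, $6 = State, $7 = PID/Program name
        $1 ~ /^tcp/ && $6 == "LISTEN" && $4 ~ (":" port "$") {
            print $7
            found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

For example, `sudo netstat -tnlp | listening_on 23571` prints `18599/ta` for the output above and exits with status 0.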
>
> I tried running the "ta" application manually on the command line, and it
> didn't print anything at all.
> Maybe the issue is something on the Test Engine side.
>
> On Thu, Aug 24, 2023 at 2:35 PM Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> wrote:
>
>     Hi Adam,
>
>      > On the tester host (which appears to be the Peer agent), there
>     are four processes that I see running, which look like the test
>     agent processes.
>
>     Before the next try, I'd recommend killing these processes.
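Finding and stopping leftover agents can be done with procps-ng pgrep/pkill. A minimal sketch, assuming the agents were started from /tmp/linux_x86_root_* directories (as in the TA start command from log.txt, /tmp/linux_x86_root_76872_1692885663_1/ta) and that it runs as root on the Tester, since the agents run as root; kill_stale_agents is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list and terminate leftover Test Agent processes.
# The default pattern (an extended regex matched against the full command
# line) assumes agents run from /tmp/linux_x86_root_*/ta.
kill_stale_agents() {
    local pattern="${1:-/tmp/linux_x86_root_[^ ]*/ta}"
    # Show PID and full command line of each match before killing anything.
    if ! pgrep -a -f -- "$pattern"; then
        echo "no matching processes" >&2
        return 0
    fi
    # SIGTERM only; check again and escalate to SIGKILL by hand if needed.
    pkill -TERM -f -- "$pattern"
}
```

Run it, check the list it printed, and run it again: if the same processes are still listed, escalate with `pkill -KILL -f` using the same pattern.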
>
>     Is there any firewall in the network or on the test hosts which could
>     block incoming TCP connections to port 23571 from the host
>     where you run the Test Engine?
>
>     If the behaviour is the same on the next try and you see that the test
>     agent keeps running, could you check using
>
>     # netstat -tnlp
>
>     that the Test Agent is listening on the port, and try to establish a
>     TCP connection to the Test Agent using
>
>     $ telnet iol-dts-tester.dpdklab.iol.unh.edu 23571
>
>     and check whether the TCP connection can be established.
>
>     Another idea is to log in to the Tester as root, as the testing does,
>     take the TA start command from the log, and run it by hand without -n
>     and with the extra escaping removed.
>
>     # sudo PATH=${PATH}:/tmp/linux_x86_root_76872_1692885663_1 \
>         LD_LIBRARY_PATH=${LD_LIBRARY_PATH}${LD_LIBRARY_PATH:+:}/tmp/linux_x86_root_76872_1692885663_1 \
>         /tmp/linux_x86_root_76872_1692885663_1/ta Peer 23571 \
>         host=iol-dts-tester.dpdklab.iol.unh.edu:port=23571:user=root:key=/opt/tsf/keys/id_ed25519:ssh_port=22:copy_timeout=15:kill_timeout=15:sudo=:shell=
>
>     Hopefully in this case the test agent directory still remains in /tmp,
>     so you don't need to copy it as the testing does.
>     Maybe the output will shed some light on what's going on.
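The `${LD_LIBRARY_PATH}${LD_LIBRARY_PATH:+:}` part of the command above appends the agent directory to the library path and inserts the separating colon only when the variable is already non-empty; otherwise the result would start with an empty entry, which the dynamic linker may interpret as the current directory. The same POSIX parameter expansion in isolation, with a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append DIR to a colon-separated search path.
# "${list:+:}" expands to ":" only if $list is set and non-empty, so an
# empty or unset list never yields a leading empty entry.
append_path() {
    local list="$1" dir="$2"
    printf '%s\n' "${list}${list:+:}${dir}"
}
```

For example, `LD_LIBRARY_PATH=$(append_path "$LD_LIBRARY_PATH" /tmp/linux_x86_root_76872_1692885663_1)` gives just the agent directory when LD_LIBRARY_PATH was empty.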
>
>     Andrew.
>
>     On 8/24/23 17:30, Adam Hassick wrote:
>
>     Hi Andrew,
>
>     This is the output that I see in the terminal when this failure
>     occurs, after the test agent binaries build and the test engine
>     starts:
>
>     Platform default build - pass
>     Simple RCF consistency check succeeded
>     --->>> Starting Logger...done
>     --->>> Starting RCF...rcf_net_engine_connect(): Connection timed out iol-dts-tester.dpdklab.iol.unh.edu:23571
>
>     Then, it hangs here until I kill the "te_rcf" and "te_tee"
>     processes. I let it hang for around 9 minutes.
>
>     On the tester host (which appears to be the Peer agent), there are
>     four processes that I see running, which look like the test agent
>     processes.
>
>     ta.Peer is an empty file. I've attached the log.txt from this run.
>
>      - Adam
>
>     On Thu, Aug 24, 2023 at 4:22 AM Andrew Rybchenko
>     <andrew.rybchenko@oktetlabs.ru> wrote:
>
>         Hi Adam,
>
>         Yes, TE_RCFUNIX_TIMEOUT is in seconds. I've double-checked
>         that it goes to 'copy_timeout' in ts-conf/rcf.conf.
>         The description in doc/sphinx/pages/group_te_engine_rcf.rst
>         says that copy_timeout is in seconds, and the implementation in
>         lib/rcfunix/rcfunix.c passes the value to select() as tv_sec.
>         In theory, select() could be interrupted by a signal, but I
>         think it is unlikely here.
>
>         I'm not sure I understand what you mean by RCF connection
>         timeout. Does it happen on TE startup, when RCF starts the test
>         agents? If so, TE_RCFUNIX_TIMEOUT could help. Or does it happen
>         while tests are in progress, e.g. in the middle of a test? If
>         so, TE_RCFUNIX_TIMEOUT is unrelated, and most likely either the
>         host with the test agent dies or the test agent itself crashes.
>         It would be easier for me to classify it if you share the text
>         log (log.txt, either full or just the corresponding fragment
>         with some context). The content of the ta.DPDK or ta.Peer file,
>         depending on which agent has problems, could also shed some
>         light: these files contain the stdout/stderr of the test agents.
>
>         Andrew.
>
>         On 8/23/23 17:45, Adam Hassick wrote:
>
>         Hi Andrew,
>
>         I've set up a test rig repository here, and have created
>         configurations for our development testbed based on the
>         examples.
>         We've been able to get the test suite to run manually on
>         Mellanox CX5 devices once.
>         However, we are running into an issue where, when RCF starts,
>         the RCF connection times out very frequently. We aren't sure
>         why this is the case.
>         It works sometimes, but most of the time when we try to run
>         the test engine, it encounters this issue.
>         I've tried changing the RCF port by setting
>         "TE_RCF_PORT=<some port number>" and rebooting the testbed
>         machines. Neither seems to fix the issue.
>
>         It also seems like the timeout takes far longer than 60
>         seconds, even when running "export TE_RCFUNIX_TIMEOUT=60"
>         before I try to run the test suite.
>         I assume the unit for this variable is seconds?
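To check how long startup really takes before failing, and compare it with the value exported in TE_RCFUNIX_TIMEOUT, the whole run can be wrapped in a timer. A minimal sketch with whole-second resolution; elapsed_of is a hypothetical helper, and it is meant for interactive use (a caller running under `set -e` would abort before the report on failure).

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, then report on stderr how many whole
# seconds it took and its exit status. The command's own output and exit
# status are passed through unchanged.
elapsed_of() {
    local start end status
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "elapsed: $((end - start)) s (exit status $status)" >&2
    return "$status"
}
```

For example, after `export TE_RCFUNIX_TIMEOUT=60`, run `elapsed_of ./run.sh <usual options>`, where `<usual options>` stands for whatever arguments are normally passed.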
>
>         Thanks,
>         Adam
>
>         On Mon, Aug 21, 2023 at 10:19 AM Adam Hassick
>         <ahassick@iol.unh.edu> wrote:
>
>             Hi Andrew,
>
>             Thanks, I've cloned the example repository and will start
>             setting up a configuration for our development testbed
>             today. I'll let you know if I run into any difficulties
>             or have any questions.
>
>              - Adam
>
>             On Sun, Aug 20, 2023 at 4:40 AM Andrew Rybchenko
>             <andrew.rybchenko@oktetlabs.ru> wrote:
>
>                 Hi Adam,
>
>                 I've published
>                 https://github.com/ts-factory/ts-rigs-sample.
>                 Hopefully it will help you to define your test rigs and
>                 successfully run some tests manually. Feel free to
>                 ask any questions; I'll answer here and try to
>                 update the documentation.
>
>                 Meanwhile, I'll prepare the missing bits for steps (2)
>                 and (3).
>                 Hopefully everything is in place for step (4), but we
>                 need to complete steps (2) and (3) first.
>
>                 Andrew.
>
>                 On 8/18/23 21:40, Andrew Rybchenko wrote:
>
>                 Hi Adam,
>
>                 > I've conferred with the rest of the team, and we
>                 think it would be best to move forward with mainly
>                 option B.
>
>                 OK, I'll provide the sample for you on Monday. It is
>                 almost ready right now, but I need to double-check
>                 it before publishing.
>
>                 Regards,
>                 Andrew.
>
>                 On 8/17/23 20:03, Adam Hassick wrote:
>
>                 Hi Andrew,
>
>                 I'm adding the CI mailing list to this
>                 conversation. Others in the community might find
>                 this conversation valuable.
>
>                 We do want to run testing on a regular basis. The
>                 Jenkins integration will be very useful for us, as
>                 most of our CI is orchestrated by Jenkins.
>                 I've conferred with the rest of the team, and we
>                 think it would be best to move forward with mainly
>                 option B.
>                 If you would like to know anything about our
>                 testbeds that would help you with creating an
>                 example ts-rigs repo, I'd be happy to answer any
>                 questions you have.
>
>                 We have multiple test rigs (we call these
>                 "DUT-tester pairs") that we run our existing
>                 hardware testing on, with differing network
>                 hardware and CPU architecture. I figured this might
>                 be an important detail.
>
>                 Thanks,
>                 Adam
>
>                 On Thu, Aug 17, 2023 at 11:44 AM Andrew Rybchenko
>                 <andrew.rybchenko@oktetlabs.ru> wrote:
>
>                     Greetings Adam,
>
>                     I'm happy to hear that you're trying to bring
>                     it up.
>
>                     As I understand it, the final goal is to run it on
>                     a regular basis, so we need to set it up properly
>                     from the very beginning.
>                     Bringing up all the features consists of 4 steps:
>
>                     1. Create a site-specific repository (we call it
>                     ts-rigs) which contains information about test
>                     rigs and other site-specific information, like
>                     where to send mail, where to store logs, etc.
>                     It is required for manual execution as well,
>                     since the test rig description is essential. I'll
>                     return to this topic below.
>
>                     2. Set up log storage for automated runs.
>                     Basically, it is disk space plus an apache2 web
>                     server with a few CGI scripts which help a lot to
>                     save disk space.
>
>                     3. Set up the Bublik web application, which
>                     provides a web interface to view testing results,
>                     the same as https://ts-factory.io/bublik
>
>                     4. Set up Jenkins to run tests regularly, save
>                     logs in the log storage (2), and import them into
>                     Bublik (3).
>
>                     We spent the last few months on our homework to
>                     make it simpler to bring up automated execution
>                     using Jenkins: https://github.com/ts-factory/te-jenkins
>                     The corresponding bits in dpdk-ethdev-ts will be
>                     available tomorrow.
>
>                     Let's return to step (1).
>
>                     Unfortunately, there is no publicly available
>                     example of a ts-rigs repository, since sensitive
>                     site-specific information is located there. But
>                     I'm ready to help you create one for UNH. I see
>                     two options here:
>
>                     (A) I'll ask questions and, based on your
>                     answers, create the first draft with my
>                     comments.
>
>                     (B) I'll make a template/example ts-rigs repo
>                     and publish it, and you'll create the UNH ts-rigs
>                     based on it.
>
>                     Of course, I'll help to debug and finally bring
>                     it up in any case.
>
>                     (A) is a bit simpler for me and you, but (B) is
>                     a bit more generic and will help other
>                     potential users to bring it up.
>                     We can also combine (A)+(B), i.e. start with (A).
>                     What do you think?
>
>                     Thanks,
>                     Andrew.
>
>                     On 8/17/23 15:18, Konstantin Ushakov wrote:
>
>                     Greetings Adam,
>
>
>                     Thanks for contacting us. I've copied Andrew, who
>                     will be happy to help.
>
>                     Thanks,
>                     Konstantin
>
>                     On 16 Aug 2023, at 21:50, Adam Hassick
>                     <ahassick@iol.unh.edu> wrote:
>
>                     Greetings Konstantin,
>
>                     I am in the process of setting up the DPDK
>                     Poll Mode Driver test suite as an addition to
>                     our testing coverage for DPDK at the UNH lab.
>
>                     I have some questions about how to set the
>                     test suite arguments.
>
>                     I have been able to configure the Test Engine
>                     to connect to the hosts in the testbed. The
>                     RCF, Configurator, and Tester all begin to
>                     run; however, the prelude of the test suite
>                     fails.
>
>
> https://ts-factory.io/doc/dpdk-ethdev-ts/index.html#test-parameters
>
>                     The documentation mentions that there are
>                     several test parameters for the test suite,
>                     like for the IUT test link MAC, etc. These
>                     seem like they would need to be set somewhere
>                     to run many of the tests.
>
>                     I see that the Test Engine documentation has
>                     instructions on how to create new parameters
>                     for test suites in the Tester configuration,
>                     but I can't find anything in the user guide or
>                     in the Tester guide on how to set values for
>                     these parameters when running the test suite.
>                     I'm not sure whether I need to write my own
>                     Tester config, or whether I should be setting
>                     these in some other way.
>
>                     How should these values be set?
>
>                     I'm also not sure which environment
>                     variables/arguments are strictly necessary and
>                     which are optional.
>
>                     Regards,
>                     Adam
>
>                     -- 
>                     *Adam Hassick*
>                     Senior Developer
>                     UNH InterOperability Lab
>                     ahassick@iol.unh.edu
>                     iol.unh.edu <https://www.iol.unh.edu/>
>                     +1 (603) 475-8248
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>


Thread overview: 51+ messages
     [not found] <CAC-YWqiQfH4Rx-Et1jGHhGK9i47d0AArKy-B2P77iYbbM+Lpig@mail.gmail.com>
     [not found] ` <C3B08390-DA6D-4BDC-BBD7-98561F92FE33@oktetlabs.ru>
     [not found]   ` <35340484-1d7e-7e5f-cad4-c965ba541397@oktetlabs.ru>
2023-08-17 17:03     ` Adam Hassick
2023-08-18 18:40       ` Andrew Rybchenko
2023-08-20  8:40         ` Andrew Rybchenko
2023-08-21 14:19           ` Adam Hassick
2023-08-23 14:45             ` Adam Hassick
2023-08-24  8:22               ` Andrew Rybchenko
2023-08-24 14:30                 ` Adam Hassick
2023-08-24 18:34                   ` Andrew Rybchenko
2023-08-24 20:29                     ` Adam Hassick
2023-08-24 20:54                       ` Andrew Rybchenko
2023-08-25 13:57                         ` Andrew Rybchenko
2023-08-25 14:06                           ` Adam Hassick [this message]
2023-08-25 14:41                             ` Andrew Rybchenko
2023-08-25 17:35                               ` Andrew Rybchenko
2023-08-28 15:02                                 ` Adam Hassick
2023-08-28 21:05                                   ` Andrew Rybchenko
2023-08-29 12:07                                     ` Andrew Rybchenko
2023-08-29 14:02                                       ` Adam Hassick
2023-08-29 20:43                                         ` Andrew Rybchenko
2023-08-31 19:38                                           ` Adam Hassick
2023-09-01  7:59                                             ` Andrew Rybchenko
2023-09-05 15:01                                               ` Adam Hassick
2023-09-06 11:36                                                 ` Andrew Rybchenko
2023-09-06 15:00                                                   ` Adam Hassick
2023-09-08 14:57                                                     ` Adam Hassick
2023-09-13 15:45                                                       ` Andrew Rybchenko
2023-09-18  6:15                                                         ` Andrew Rybchenko
2023-09-18  6:23                                                           ` Konstantin Ushakov
2023-09-18  6:26                                                             ` Andrew Rybchenko
2023-09-18 14:44                                                               ` Adam Hassick
2023-09-18 15:04                                                                 ` Andrew Rybchenko
2023-10-04 13:48                                                                   ` Adam Hassick
2023-10-05 10:25                                                                     ` Andrew Rybchenko
2023-10-10 14:09                                                                       ` Adam Hassick
2023-10-11 11:46                                                                         ` Andrew Rybchenko
2023-10-23 11:11                                                                         ` Andrew Rybchenko
2023-10-25 20:27                                                                           ` Adam Hassick
2023-10-26 12:19                                                                             ` Andrew Rybchenko
2023-10-26 17:44                                                                               ` Adam Hassick
2023-10-27  8:01                                                                                 ` Andrew Rybchenko
2023-10-27 19:13                                                                                 ` Andrew Rybchenko
2023-11-06 23:16                                                                                   ` Adam Hassick
2023-11-07 16:57                                                                                     ` Andrew Rybchenko
2023-11-07 20:30                                                                                       ` Adam Hassick
2023-11-08  7:20                                                                                         ` Andrew Rybchenko
2023-11-16 20:03                                                                                           ` Adam Hassick
2023-11-16 20:38                                                                                             ` DPDK Coverity test run Mcnamara, John
2023-11-16 20:43                                                                                               ` Patrick Robb
2023-11-16 20:56                                                                                                 ` Mcnamara, John
2023-11-20 17:18                                                                                             ` Setting up DPDK PMD Test Suite Andrew Rybchenko
2023-12-01 14:39                                                                                               ` Andrew Rybchenko
