From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
To: Adam Hassick <ahassick@iol.unh.edu>
Cc: Patrick Robb <probb@iol.unh.edu>,
Konstantin Ushakov <Konstantin.Ushakov@oktetlabs.ru>,
ci@dpdk.org
Subject: Re: Setting up DPDK PMD Test Suite
Date: Fri, 25 Aug 2023 16:57:33 +0300
Message-ID: <cc758ad7-2f40-7c6c-ffab-c574011ba770@oktetlabs.ru>
In-Reply-To: <873c7972-3e5a-9e82-9449-4d12b2c96032@oktetlabs.ru>
Hello Adam,
On 8/24/23 23:54, Andrew Rybchenko wrote:
> I'd like to try to reproduce the problem locally. Which Linux distro is
> running on the test engine and agents?
>
> In fact, I know of one problem with Debian 12 and Fedora 38, and we have
> a patch in review to fix it. However, the behaviour is different in
> this case, so it is unlikely to be the same problem.
I've just published a new tag which fixes known test engine side
problems on Debian 12 and Fedora 38.
>
> One more idea is to install valgrind on the test engine host and
> run with the --vg-rcf option to check if something weird is happening.
>
> What I don't understand right now is why I see just one failed attempt
> to connect in your log.txt and then a Logger shutdown after 9 minutes.
>
> Andrew.
>
> On 8/24/23 23:29, Adam Hassick wrote:
>> > Is there any firewall in the network or on test hosts which could
>> > block incoming TCP connections to port 23571 from the host where
>> > you run the test engine?
>>
>> Our test engine host and the testbed are on the same subnet. The
>> connection does work sometimes.
>>
>> > If the behaviour is the same on the next try and you see that the test
>> > agent is kept running, could you check using
>> >
>> > # netstat -tnlp
>> >
>> > that the Test Agent is listening on the port, and try to establish a TCP
>> > connection from test agent using
>> >
>> > $ telnet iol-dts-tester.dpdklab.iol.unh.edu 23571
>> >
>> > and check if the TCP connection could be established.
>>
>> I was able to replicate the same behavior again, where it hangs while
>> RCF is trying to start.
>> Running this command, I see this in the output:
>>
>> tcp 0 0 0.0.0.0:23571 0.0.0.0:* LISTEN 18599/ta
>>
>> So it seems like it is listening on the correct port.
>> Additionally, I was able to connect to the Tester machine from our
>> Test Engine host using telnet. It printed the PID of the process once
>> the connection was opened.
>>
>> I tried running the "ta" application manually on the command line,
>> and it didn't print anything at all.
>> Maybe the issue is something on the Test Engine side.
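
Since the failure is intermittent, it may help to script the telnet check so that it can be repeated many times from the test engine host. The following is only a minimal sketch in Python using the standard socket module; the helper name can_connect and the 5 second default timeout are my own choices for illustration and are not part of TE.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        # create_connection() resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For the agent discussed here the call would be can_connect("iol-dts-tester.dpdklab.iol.unh.edu", 23571). Running it in a loop while RCF is starting should show whether the port is reachable only some of the time.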
>>
>> On Thu, Aug 24, 2023 at 2:35 PM Andrew Rybchenko
>> <andrew.rybchenko@oktetlabs.ru> wrote:
>>
>> Hi Adam,
>>
>> > On the tester host (which appears to be the Peer agent), there
>> > are four processes that I see running, which look like the test
>> > agent processes.
>>
>> Before the next try, I'd recommend killing these processes.
>>
>> Is there any firewall in the network or on test hosts which could
>> block incoming TCP connections to port 23571 from the host
>> where you run the test engine?
>>
>> If the behaviour is the same on the next try and you see that the test
>> agent is kept running, could you check using
>>
>> # netstat -tnlp
>>
>> that the Test Agent is listening on the port, and try to establish a TCP
>> connection from test agent using
>>
>> $ telnet iol-dts-tester.dpdklab.iol.unh.edu 23571
>>
>> and check if the TCP connection could be established.
>>
>> Another idea is to log in to the Tester as root, as testing does, get
>> the TA start command from the log, and try it by hand without -n and
>> with the extra escaping removed.
>>
>> # sudo PATH=${PATH}:/tmp/linux_x86_root_76872_1692885663_1
>> LD_LIBRARY_PATH=${LD_LIBRARY_PATH}${LD_LIBRARY_PATH:+:}/tmp/linux_x86_root_76872_1692885663_1
>> /tmp/linux_x86_root_76872_1692885663_1/ta Peer 23571
>> host=iol-dts-tester.dpdklab.iol.unh.edu:port=23571:user=root:key=/opt/tsf/keys/id_ed25519:ssh_port=22:copy_timeout=15:kill_timeout=15:sudo=:shell=
>>
>> Hopefully in this case the test agent directory remains in /tmp and
>> you don't need to copy it as testing does.
>> Maybe the output could shed some light on what's going on.
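
When reading the last argument of the command above, it looks like a colon-separated list of key=value pairs. Purely as an illustration, and based only on my reading of this one example string (it is not the parser that rcfunix uses), it can be split in Python as below. The helper name parse_confstr is hypothetical.

```python
def parse_confstr(confstr: str) -> dict:
    """Split 'key=value:key=value' into a dict; empty values stay as ''."""
    params = {}
    for item in confstr.split(":"):
        if not item:
            continue  # tolerate a trailing or doubled colon
        key, _, value = item.partition("=")
        params[key] = value
    return params


# The string is copied from the TA start command quoted in this thread.
conf = parse_confstr(
    "host=iol-dts-tester.dpdklab.iol.unh.edu:port=23571:user=root:"
    "key=/opt/tsf/keys/id_ed25519:ssh_port=22:copy_timeout=15:"
    "kill_timeout=15:sudo=:shell="
)
print(conf["copy_timeout"], conf["kill_timeout"])
```

Listing the values this way makes it easy to compare what the agent was actually started with (for example copy_timeout) against what was exported in the environment before the run.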
>>
>> Andrew.
>>
>> On 8/24/23 17:30, Adam Hassick wrote:
>>> Hi Andrew,
>>>
>>> This is the output that I see in the terminal when this failure
>>> occurs, after the test agent binaries build and the test engine
>>> starts:
>>>
>>> Platform default build - pass
>>> Simple RCF consistency check succeeded
>>> --->>> Starting Logger...done
>>> --->>> Starting RCF...rcf_net_engine_connect(): Connection timed
>>> out iol-dts-tester.dpdklab.iol.unh.edu:23571
>>>
>>> Then, it hangs here until I kill the "te_rcf" and "te_tee"
>>> processes. I let it hang for around 9 minutes.
>>>
>>> On the tester host (which appears to be the Peer agent), there are
>>> four processes that I see running, which look like the test agent
>>> processes.
>>>
>>> ta.Peer is an empty file. I've attached the log.txt from this run.
>>>
>>> - Adam
>>>
>>> On Thu, Aug 24, 2023 at 4:22 AM Andrew Rybchenko
>>> <andrew.rybchenko@oktetlabs.ru> wrote:
>>>
>>> Hi Adam,
>>>
>>> Yes, TE_RCFUNIX_TIMEOUT is in seconds. I've double-checked
>>> that it goes to 'copy_timeout' in ts-conf/rcf.conf.
>>> The description in doc/sphinx/pages/group_te_engine_rcf.rst
>>> says that copy_timeout is in seconds, and the implementation in
>>> lib/rcfunix/rcfunix.c passes the value to select() as tv_sec.
>>> Theoretically select() could be interrupted by a signal, but I
>>> think it is unlikely here.
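
To illustrate the unit, here is a small sketch of waiting on a socket with select() and a timeout given in seconds. It is written in Python for brevity and is not the C code from rcfunix.c; the helper name wait_readable is hypothetical. Since Python 3.5 the interpreter retries select() after a signal interruption with the remaining time, which C code has to handle by hand.

```python
import select
import socket
import time


def wait_readable(sock: socket.socket, timeout_s: float) -> bool:
    """Return True if sock becomes readable within timeout_s seconds."""
    # The fourth argument of select() is a timeout expressed in seconds.
    readable, _, _ = select.select([sock], [], [], timeout_s)
    return bool(readable)


a, b = socket.socketpair()
start = time.monotonic()
print(wait_readable(a, 0.5))   # nothing has been sent yet, so the wait times out
print(time.monotonic() - start >= 0.4)
b.sendall(b"x")
print(wait_readable(a, 0.5))   # data is pending now, so select() returns at once
a.close()
b.close()
```

If a wait of this kind takes much longer than the configured number of seconds, the delay is probably coming from somewhere other than this timeout.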
>>>
>>> I'm not sure that I understand what you mean by RCF
>>> connection timeout. Does it happen on TE startup, when RCF
>>> starts test agents? If so, TE_RCFUNIX_TIMEOUT could help. Or
>>> does it happen when tests are in progress, e.g. in the middle
>>> of a test? If so, TE_RCFUNIX_TIMEOUT is unrelated and most
>>> likely either the host with the test agent dies or the test agent
>>> itself crashes. It would be easier for me to classify it if you share
>>> the text log (log.txt, full or just the corresponding fragment with
>>> some context). Also, the content of the ta.DPDK or ta.Peer file,
>>> depending on which agent has problems, could shed some light.
>>> These files contain the stdout/stderr of the test agents.
>>>
>>> Andrew.
>>>
>>> On 8/23/23 17:45, Adam Hassick wrote:
>>>> Hi Andrew,
>>>>
>>>> I've set up a test rig repository here, and have created
>>>> configurations for our development testbed based off of the
>>>> examples.
>>>> We've been able to get the test suite to run manually on
>>>> Mellanox CX5 devices once.
>>>> However, we are running into an issue where, when RCF starts,
>>>> the RCF connection times out very frequently. We aren't sure
>>>> why this is the case.
>>>> It works sometimes, but most of the time when we try to run
>>>> the test engine, it encounters this issue.
>>>> I've tried changing the RCF port by setting
>>>> "TE_RCF_PORT=<some port number>" and rebooting the testbed
>>>> machines. Neither seems to fix the issue.
>>>>
>>>> It also seems like the timeout takes far longer than 60
>>>> seconds, even when running "export TE_RCFUNIX_TIMEOUT=60"
>>>> before I try to run the test suite.
>>>> I assume the unit for this variable is seconds?
>>>>
>>>> Thanks,
>>>> Adam
>>>>
>>>> On Mon, Aug 21, 2023 at 10:19 AM Adam Hassick
>>>> <ahassick@iol.unh.edu> wrote:
>>>>
>>>> Hi Andrew,
>>>>
>>>> Thanks, I've cloned the example repository and will start
>>>> setting up a configuration for our development testbed
>>>> today. I'll let you know if I run into any difficulties
>>>> or have any questions.
>>>>
>>>> - Adam
>>>>
>>>> On Sun, Aug 20, 2023 at 4:40 AM Andrew Rybchenko
>>>> <andrew.rybchenko@oktetlabs.ru> wrote:
>>>>
>>>> Hi Adam,
>>>>
>>>> I've published
>>>> https://github.com/ts-factory/ts-rigs-sample.
>>>> Hopefully it will help you define your test rigs and
>>>> successfully run some tests manually. Feel free to
>>>> ask any questions and I'll answer here and try to
>>>> update the documentation.
>>>>
>>>> Meanwhile I'll prepare the missing bits for steps (2) and
>>>> (3).
>>>> Hopefully everything is in place for step (4), but we
>>>> need to do steps (2) and (3) first.
>>>>
>>>> Andrew.
>>>>
>>>> On 8/18/23 21:40, Andrew Rybchenko wrote:
>>>>> Hi Adam,
>>>>>
>>>>> > I've conferred with the rest of the team, and we
>>>>> > think it would be best to move forward with mainly
>>>>> > option B.
>>>>>
>>>>> OK, I'll provide the sample on Monday for you. It is
>>>>> almost ready right now, but I need to double-check
>>>>> it before publishing.
>>>>>
>>>>> Regards,
>>>>> Andrew.
>>>>>
>>>>> On 8/17/23 20:03, Adam Hassick wrote:
>>>>>> Hi Andrew,
>>>>>>
>>>>>> I'm adding the CI mailing list to this
>>>>>> conversation. Others in the community might find
>>>>>> this conversation valuable.
>>>>>>
>>>>>> We do want to run testing on a regular basis. The
>>>>>> Jenkins integration will be very useful for us, as
>>>>>> most of our CI is orchestrated by Jenkins.
>>>>>> I've conferred with the rest of the team, and we
>>>>>> think it would be best to move forward with mainly
>>>>>> option B.
>>>>>> If you would like to know anything about our
>>>>>> testbeds that would help you with creating an
>>>>>> example ts-rigs repo, I'd be happy to answer any
>>>>>> questions you have.
>>>>>>
>>>>>> We have multiple test rigs (we call these
>>>>>> "DUT-tester pairs") that we run our existing
>>>>>> hardware testing on, with differing network
>>>>>> hardware and CPU architecture. I figured this might
>>>>>> be an important detail.
>>>>>>
>>>>>> Thanks,
>>>>>> Adam
>>>>>>
>>>>>> On Thu, Aug 17, 2023 at 11:44 AM Andrew Rybchenko
>>>>>> <andrew.rybchenko@oktetlabs.ru> wrote:
>>>>>>
>>>>>> Greetings Adam,
>>>>>>
>>>>>> I'm happy to hear that you're trying to bring
>>>>>> it up.
>>>>>>
>>>>>> As I understand, the final goal is to run it on a
>>>>>> regular basis. So, we need to do it properly
>>>>>> from the very beginning.
>>>>>> Bringing up all features consists of 4 steps:
>>>>>>
>>>>>> 1. Create a site-specific repository (we call it
>>>>>> ts-rigs) which contains information about test
>>>>>> rigs and other site-specific information like
>>>>>> where to send mails, where to store logs etc.
>>>>>> It is required for manual execution as well,
>>>>>> since the test rigs description is essential. I'll
>>>>>> return to the topic below.
>>>>>>
>>>>>> 2. Set up log storage for automated runs.
>>>>>> Basically it is disk space plus an apache2 web
>>>>>> server with a few CGI scripts which help a lot to
>>>>>> save disk space.
>>>>>>
>>>>>> 3. Set up the Bublik web application, which provides
>>>>>> a web interface to view testing results. Same as
>>>>>> https://ts-factory.io/bublik
>>>>>>
>>>>>> 4. Set up Jenkins to run tests regularly,
>>>>>> save logs in the log storage (2) and import them into
>>>>>> Bublik (3).
>>>>>>
>>>>>> We spent the last few months on our homework to make
>>>>>> it simpler to bring up automated execution
>>>>>> using Jenkins -
>>>>>> https://github.com/ts-factory/te-jenkins
>>>>>> Corresponding bits in dpdk-ethdev-ts will be
>>>>>> available tomorrow.
>>>>>>
>>>>>> Let's return to step (1).
>>>>>>
>>>>>> Unfortunately there is no publicly available
>>>>>> example of the ts-rigs repository since
>>>>>> sensitive site-specific information is located
>>>>>> there. But I'm ready to help you to create it
>>>>>> for UNH. I see two options here:
>>>>>>
>>>>>> (A) I'll ask questions and based on your
>>>>>> answers will create the first draft with my
>>>>>> comments.
>>>>>>
>>>>>> (B) I'll make a template/example ts-rigs repo,
>>>>>> publish it and you'll create UNH ts-rigs based
>>>>>> on it.
>>>>>>
>>>>>> Of course, I'll help to debug and finally bring
>>>>>> it up in any case.
>>>>>>
>>>>>> (A) is a bit simpler for me and you, but (B) is
>>>>>> a bit more generic and will help other
>>>>>> potential users to bring it up.
>>>>>> We can combine (A)+(B). I.e. start from (A).
>>>>>> What do you think?
>>>>>>
>>>>>> Thanks,
>>>>>> Andrew.
>>>>>>
>>>>>> On 8/17/23 15:18, Konstantin Ushakov wrote:
>>>>>>> Greetings Adam,
>>>>>>>
>>>>>>>
>>>>>>> Thanks for contacting us. I'm copying Andrew, who
>>>>>>> would be happy to help.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Konstantin
>>>>>>>
>>>>>>>> On 16 Aug 2023, at 21:50, Adam Hassick
>>>>>>>> <ahassick@iol.unh.edu> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Greetings Konstantin,
>>>>>>>>
>>>>>>>> I am in the process of setting up the DPDK
>>>>>>>> Poll Mode Driver test suite as an addition to
>>>>>>>> our testing coverage for DPDK at the UNH lab.
>>>>>>>>
>>>>>>>> I have some questions about how to set the
>>>>>>>> test suite arguments.
>>>>>>>>
>>>>>>>> I have been able to configure the Test Engine
>>>>>>>> to connect to the hosts in the testbed. The
>>>>>>>> RCF, Configurator, and Tester all begin to
>>>>>>>> run, however the prelude of the test suite
>>>>>>>> fails to run.
>>>>>>>>
>>>>>>>> https://ts-factory.io/doc/dpdk-ethdev-ts/index.html#test-parameters
>>>>>>>>
>>>>>>>>
>>>>>>>> The documentation mentions that there are
>>>>>>>> several test parameters for the test suite,
>>>>>>>> like for the IUT test link MAC, etc. These
>>>>>>>> seem like they would need to be set somewhere
>>>>>>>> to run many of the tests.
>>>>>>>>
>>>>>>>> I see in the Test Engine documentation, there
>>>>>>>> are instructions on how to create new
>>>>>>>> parameters for test suites in the Tester
>>>>>>>> configuration, but there is nothing in the
>>>>>>>> user guide or in the Tester guide for how to
>>>>>>>> set the arguments for the parameters when
>>>>>>>> running the test suite that I can find. I'm
>>>>>>>> not sure if I need to write my own Tester
>>>>>>>> config, or if I should be setting these in
>>>>>>>> some other way.
>>>>>>>>
>>>>>>>> How should these values be set?
>>>>>>>>
>>>>>>>> I'm also not sure what environment
>>>>>>>> variables/arguments are strictly necessary or
>>>>>>>> which are optional.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Adam
>>>>>>>>
>>>>>>>> -- *Adam Hassick*
>>>>>>>> Senior Developer
>>>>>>>> UNH InterOperability Lab
>>>>>>>> ahassick@iol.unh.edu
>>>>>>>> <mailto:ahassick@iol.unh.edu>
>>>>>>>> iol.unh.edu <https://www.iol.unh.edu/>
>>>>>>>> +1 (603) 475-8248