DPDK CI discussions
From: Adam Hassick <ahassick@iol.unh.edu>
To: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Cc: Patrick Robb <probb@iol.unh.edu>,
	Konstantin Ushakov <Konstantin.Ushakov@oktetlabs.ru>,
	ci@dpdk.org
Subject: Re: Setting up DPDK PMD Test Suite
Date: Thu, 24 Aug 2023 10:30:09 -0400
Message-ID: <CAC-YWqgvgKRgffwWY3mqWbC8o-LJ_o0BkRuuBxDSqg8Pnj0h1Q@mail.gmail.com>
In-Reply-To: <7734826a-840d-d0d9-e7a5-91951223398c@oktetlabs.ru>



Hi Andrew,

This is the output that I see in the terminal when this failure occurs,
after the test agent binaries build and the test engine starts:

Platform default build - pass
Simple RCF consistency check succeeded
--->>> Starting Logger...done
--->>> Starting RCF...rcf_net_engine_connect(): Connection timed out
iol-dts-tester.dpdklab.iol.unh.edu:23571

Then, it hangs here until I kill the "te_rcf" and "te_tee" processes. I let
it hang for around 9 minutes.
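
One way to narrow down a hang like this is to probe the agent's TCP port directly from the host running the test engine. The following is a minimal sketch, not part of the Test Environment tooling: the helper name probe_tcp and the 5-second default are assumptions made for illustration, and only the Python standard library is used.

```python
import socket


def probe_tcp(host: str, port: int, timeout_s: float = 5.0) -> str:
    """Attempt a single TCP connection and classify the outcome.

    Hypothetical helper for illustration; it is not part of TE.
    """
    try:
        # create_connection() resolves the name and applies the timeout
        # to the connect() call itself.
        with socket.create_connection((host, port), timeout=timeout_s):
            return "open"
    except socket.timeout:
        # No answer at all within timeout_s (packets likely dropped).
        return "timeout"
    except ConnectionRefusedError:
        # Host answered, but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        # Name resolution failures, unreachable network, and so on.
        return f"error: {exc}"
```

Calling probe_tcp("iol-dts-tester.dpdklab.iol.unh.edu", 23571) while the agent processes are running would distinguish the cases: "timeout" would suggest that traffic to the port is being dropped somewhere on the path, while "refused" would suggest that the host is reachable but the agent is not listening on that port.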

On the tester host (which appears to be the Peer agent), there are four
processes that I see running, which look like the test agent processes.

ta.Peer is an empty file. I've attached the log.txt from this run.

 - Adam

On Thu, Aug 24, 2023 at 4:22 AM Andrew Rybchenko <
andrew.rybchenko@oktetlabs.ru> wrote:

> Hi Adam,
>
> Yes, TE_RCFUNIX_TIMEOUT is in seconds. I've double-checked that it goes to
> 'copy_timeout' in ts-conf/rcf.conf.
> The description in doc/sphinx/pages/group_te_engine_rcf.rst says that
> copy_timeout is in seconds, and the implementation in lib/rcfunix/rcfunix.c
> passes the value to select() as tv_sec. Theoretically, select() could be
> interrupted by a signal, but I think that is unlikely here.
>
> I'm not sure I understand what you mean by RCF connection timeout.
> Does it happen on TE startup, when RCF starts the test agents? If so,
> TE_RCFUNIX_TIMEOUT could help. Or does it happen while tests are in
> progress, e.g. in the middle of a test? If so, TE_RCFUNIX_TIMEOUT is
> unrelated, and most likely either the host with the test agent dies or the
> test agent itself crashes. It would be easier for me to classify it if you
> share the text log (log.txt, in full or just the corresponding fragment with
> some context). Also, the content of the ta.DPDK or ta.Peer file, depending on
> which agent has problems, could shed some light. These files contain the
> stdout/stderr of the test agents.
>
> Andrew.
>
> On 8/23/23 17:45, Adam Hassick wrote:
>
> Hi Andrew,
>
> I've set up a test rig repository here, and have created configurations
> for our development testbed based off of the examples.
> We've been able to get the test suite to run manually on Mellanox CX5
> devices once.
> However, we are running into an issue where, when RCF starts, the RCF
> connection times out very frequently. We aren't sure why this is the case.
> It works sometimes, but most of the time when we try to run the test
> engine, it encounters this issue.
> I've tried changing the RCF port by setting "TE_RCF_PORT=<some port
> number>" and rebooting the testbed machines. Neither seems to fix the issue.
>
> It also seems like the timeout takes far longer than 60 seconds, even when
> running "export TE_RCFUNIX_TIMEOUT=60" before I try to run the test suite.
> I assume the unit for this variable is seconds?
>
> Thanks,
> Adam
>
> On Mon, Aug 21, 2023 at 10:19 AM Adam Hassick <ahassick@iol.unh.edu>
> wrote:
>
>> Hi Andrew,
>>
>> Thanks, I've cloned the example repository and will start setting up a
>> configuration for our development testbed today. I'll let you know if I run
>> into any difficulties or have any questions.
>>
>>  - Adam
>>
>> On Sun, Aug 20, 2023 at 4:40 AM Andrew Rybchenko <
>> andrew.rybchenko@oktetlabs.ru> wrote:
>>
>>> Hi Adam,
>>>
>>> I've published https://github.com/ts-factory/ts-rigs-sample. Hopefully
>>> it will help to define your test rigs and successfully run some tests
>>> manually. Feel free to ask any questions and I'll answer here and try to
>>> update documentation.
>>>
>>> Meanwhile I'll prepare missing bits for steps (2) and (3).
>>> Hopefully everything is in place for step (4), but we need to make steps
>>> (2) and (3) first.
>>>
>>> Andrew.
>>>
>>> On 8/18/23 21:40, Andrew Rybchenko wrote:
>>>
>>> Hi Adam,
>>>
>>> > I've conferred with the rest of the team, and we think it would be
>>> > best to move forward with mainly option B.
>>>
>>> OK, I'll provide the sample on Monday for you. It is almost ready right
>>> now, but I need to double-check it before publishing.
>>>
>>> Regards,
>>> Andrew.
>>>
>>> On 8/17/23 20:03, Adam Hassick wrote:
>>>
>>> Hi Andrew,
>>>
>>> I'm adding the CI mailing list to this conversation. Others in the
>>> community might find this conversation valuable.
>>>
>>> We do want to run testing on a regular basis. The Jenkins integration
>>> will be very useful for us, as most of our CI is orchestrated by Jenkins.
>>> I've conferred with the rest of the team, and we think it would be best
>>> to move forward with mainly option B.
>>> If you would like to know anything about our testbeds that would help
>>> you with creating an example ts-rigs repo, I'd be happy to answer any
>>> questions you have.
>>>
>>> We have multiple test rigs (we call these "DUT-tester pairs") that we
>>> run our existing hardware testing on, with differing network hardware and
>>> CPU architecture. I figured this might be an important detail.
>>>
>>> Thanks,
>>> Adam
>>>
>>> On Thu, Aug 17, 2023 at 11:44 AM Andrew Rybchenko <
>>> andrew.rybchenko@oktetlabs.ru> wrote:
>>>
>>>> Greetings Adam,
>>>>
>>>> I'm happy to hear that you're trying to bring it up.
>>>>
>>>> As I understand it, the final goal is to run it on a regular basis, so we
>>>> need to set it up properly from the very beginning.
>>>> Bringing up all the features consists of 4 steps:
>>>>
>>>> 1. Create a site-specific repository (we call it ts-rigs) which contains
>>>> information about test rigs and other site-specific information, such as
>>>> where to send mails, where to store logs, etc. It is required for manual
>>>> execution as well, since the test rig description is essential. I'll return
>>>> to this topic below.
>>>>
>>>> 2. Set up log storage for automated runs. Basically, it is disk space
>>>> plus an apache2 web server with a few CGI scripts which help a lot to save
>>>> disk space.
>>>>
>>>> 3. Set up the Bublik web application, which provides a web interface to view
>>>> testing results. Same as https://ts-factory.io/bublik
>>>>
>>>> 4. Set up Jenkins to run tests regularly, save logs in the log storage
>>>> (2) and import them into Bublik (3).
>>>>
>>>> We spent the last few months on our homework to make it simpler to bring up
>>>> automated execution using Jenkins -
>>>> https://github.com/ts-factory/te-jenkins
>>>> The corresponding bits in dpdk-ethdev-ts will be available tomorrow.
>>>>
>>>> Let's return to step (1).
>>>>
>>>> Unfortunately there is no publicly available example of the ts-rigs
>>>> repository since sensitive site-specific information is located there. But
>>>> I'm ready to help you to create it for UNH. I see two options here:
>>>>
>>>> (A) I'll ask questions and based on your answers will create the first
>>>> draft with my comments.
>>>>
>>>> (B) I'll make a template/example ts-rigs repo, publish it and you'll
>>>> create UNH ts-rigs based on it.
>>>>
>>>> Of course, I'll help to debug and finally bring it up in any case.
>>>>
>>>> (A) is a bit simpler for me and you, but (B) is a bit more generic and
>>>> will help other potential users to bring it up.
>>>> We can combine (A)+(B). I.e. start from (A). What do you think?
>>>>
>>>> Thanks,
>>>> Andrew.
>>>>
>>>> On 8/17/23 15:18, Konstantin Ushakov wrote:
>>>>
>>>> Greetings Adam,
>>>>
>>>>
>>>> Thanks for contacting us. I've copied Andrew, who would be happy to help.
>>>>
>>>> Thanks,
>>>> Konstantin
>>>>
>>>> On 16 Aug 2023, at 21:50, Adam Hassick <ahassick@iol.unh.edu> wrote:
>>>>
>>>>
>>>> Greetings Konstantin,
>>>>
>>>> I am in the process of setting up the DPDK Poll Mode Driver test suite
>>>> as an addition to our testing coverage for DPDK at the UNH lab.
>>>>
>>>> I have some questions about how to set the test suite arguments.
>>>>
>>>> I have been able to configure the Test Engine to connect to the hosts
>>>> in the testbed. The RCF, Configurator, and Tester all begin to run, however
>>>> the prelude of the test suite fails to run.
>>>>
>>>> https://ts-factory.io/doc/dpdk-ethdev-ts/index.html#test-parameters
>>>>
>>>> The documentation mentions that there are several test parameters for
>>>> the test suite, like for the IUT test link MAC, etc. These seem like they
>>>> would need to be set somewhere to run many of the tests.
>>>>
>>>> I see in the Test Engine documentation, there are instructions on how
>>>> to create new parameters for test suites in the Tester configuration, but
>>>> there is nothing in the user guide or in the Tester guide for how to set
>>>> the arguments for the parameters when running the test suite that I can
>>>> find. I'm not sure if I need to write my own Tester config, or if I should
>>>> be setting these in some other way.
>>>>
>>>> How should these values be set?
>>>>
>>>> I'm also not sure what environment variables/arguments are strictly
>>>> necessary or which are optional.
>>>>
>>>> Regards,
>>>> Adam
>>>>
>>>> --
>>>> *Adam Hassick*
>>>> Senior Developer
>>>> UNH InterOperability Lab
>>>> ahassick@iol.unh.edu
>>>> iol.unh.edu <https://www.iol.unh.edu/>
>>>> +1 (603) 475-8248
>>>>
>>>>
>>>>

-- 
*Adam Hassick*
Senior Developer
UNH InterOperability Lab
ahassick@iol.unh.edu
iol.unh.edu <https://www.iol.unh.edu/>
+1 (603) 475-8248


[-- Attachment #2: log.txt --]
[-- Type: text/plain, Size: 3752 bytes --]

Log report
~~~~~~~~~~

RING  Dispatcher  Command-line options  13:55:53.387
--conf-dirs=/opt/tsf/dpdk-ethdev-ts/conf:/opt/tsf/ts-rigs:/opt/tsf/ts-conf --trc-db=/opt/tsf/dpdk-ethdev-ts/trc/top.xml --trc-comparison=normalised --trc-html=trc-brief.html --trc-no-expected --trc-no-total --trc-no-unspec --trc-keep-artifacts --opts=run/iol-dts-mcx5 --opts=opts.ts

RING  Dispatcher  Expanded command-line options  13:55:53.402
 --conf-dirs=/opt/tsf/dpdk-ethdev-ts/conf:/opt/tsf/ts-rigs:/opt/tsf/ts-conf --trc-db=/opt/tsf/dpdk-ethdev-ts/trc/top.xml --trc-comparison=normalised --trc-html=trc-brief.html --trc-no-expected --trc-no-total --trc-no-unspec --trc-keep-artifacts --script=env/iol-dts --script=env/mlx-cx5 --script=scripts/iut.h1 --script=scripts/iut.h1-mcx5 --conf-cs=cs/dpdk-pmd-ts.yml --script=scripts/ta-def --script=scripts/defaults --tester-script=scripts/dpdk-trc-tags --tester-script=scripts/os-trc-tags --script=scripts/net-modules --script=scripts/iut-net-driver-loaded --script=scripts/disable_unused_agts

RING  Dispatcher  Start  14:01:03.417
Starting TEN applications

RING  Dispatcher  Start  14:01:03.435
Start Logger:  /opt/tsf/ts-rigs/logger.conf

RING  Logger  Cfg file  14:01:03.461
Opening config file: /opt/tsf/ts-rigs/logger.conf

RING  Logger  Log streaming  14:01:03.462
Current listeners configuration:
Listeners:
Filters:

RING  Dispatcher  Start  14:01:03.481
Start RCF:  /opt/tsf/ts-conf/rcf.conf

RING  RCF  RCF Unix  14:01:03.487
Starting TA 'Peer' type 'linux_x86_64_linux_gnu__glibc2_35__kernel5_15_0_79__cpu_avx512bw__cpu_bmi2' conf_str 'host=iol-dts-tester.dpdklab.iol.unh.edu:port=23571:user=root:key=/opt/tsf/keys/id_ed25519:ssh_port=22:copy_timeout=15:kill_timeout=15:sudo=:shell='

RING  RCF  RCF Unix  14:01:03.487
CMD to copy: ssh -qxTn -o BatchMode=yes -p 22 -i /opt/tsf/keys/id_ed25519  -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no root@iol-dts-tester.dpdklab.iol.unh.edu "mkdir /tmp/linux_x86_root_76872_1692885663_1" && echo put /opt/tsf/dpdk-ethdev-ts/ts/inst/agents/linux_x86_64_linux_gnu__glibc2_35__kernel5_15_0_79__cpu_avx512bw__cpu_bmi2//. /tmp/linux_x86_root_76872_1692885663_1 | sftp -rpq -P 22 -i /opt/tsf/keys/id_ed25519  -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no root@iol-dts-tester.dpdklab.iol.unh.edu

RING  RCF  RCF Unix  14:01:05.063
Command to detect shell name: ssh -qxTn -o BatchMode=yes -p 22 -i /opt/tsf/keys/id_ed25519  -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no root@iol-dts-tester.dpdklab.iol.unh.edu "echo -n \$SHELL"

RING  RCF  RCF Unix  14:01:05.321
Shell is: /bin/bash

RING  RCF  RCF Unix  14:01:05.321
Command to start TA: ssh -qxTn -o BatchMode=yes -p 22 -i /opt/tsf/keys/id_ed25519  -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no root@iol-dts-tester.dpdklab.iol.unh.edu "sudo -n PATH=\${PATH}:/tmp/linux_x86_root_76872_1692885663_1 LD_LIBRARY_PATH=\${LD_LIBRARY_PATH}\${LD_LIBRARY_PATH:+:}/tmp/linux_x86_root_76872_1692885663_1 /tmp/linux_x86_root_76872_1692885663_1/ta Peer 23571 host=iol-dts-tester.dpdklab.iol.unh.edu:port=23571:user=root:key=/opt/tsf/keys/id_ed25519:ssh_port=22:copy_timeout=15:kill_timeout=15:sudo=:shell=" 2>&1 | te_tee RCF Peer 10 >ta.Peer 

WARN  RCF  RCF Unix  14:11:16.984
Connecting to TA Peer iol-dts-tester.dpdklab.iol.unh.edu:23571 failed (COMM-ETIMEDOUT) - connect again after delay

RING  RCF  RCF Unix  14:11:16.984
Sleeping 1 seconds

RING  Dispatcher  Start  14:20:25.622
Shutdown Logger

RING  Logger  Self  14:20:25.624
Logger shutdown ...

WARN  Logger  Self  14:20:25.624
Logger is shut down without polling of TAs

WARN  Logger  Log streaming  14:20:25.624
Not all messages in listener queue have been processed

RING  Logger  Self  14:20:25.624
Shutdown is completed
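
To see where the time goes in a text log like this one, the entry timestamps can be compared pairwise. The sketch below is an illustration only: the function name find_gaps and the header pattern (an upper-case level, then fields separated by two spaces, then an HH:MM:SS.mmm timestamp at the end of the line) are assumptions inferred from the entries above, not a documented log format. It also assumes the log does not cross midnight.

```python
import re
from datetime import datetime

# Assumed header shape, inferred from the entries in this log:
#   LEVEL  Entity  User  HH:MM:SS.mmm
HEADER_RE = re.compile(r"^([A-Z]+)  (.+)  (\d{2}:\d{2}:\d{2}\.\d{3})\s*$")


def find_gaps(log_text, threshold_s=60.0):
    """Return (gap_seconds, prev_timestamp, next_timestamp) tuples for
    consecutive log entries that are at least threshold_s apart."""
    stamps = []
    for line in log_text.splitlines():
        match = HEADER_RE.match(line)
        if match:
            stamps.append(match.group(3))

    gaps = []
    for prev, nxt in zip(stamps, stamps[1:]):
        t0 = datetime.strptime(prev, "%H:%M:%S.%f")
        t1 = datetime.strptime(nxt, "%H:%M:%S.%f")
        delta = (t1 - t0).total_seconds()
        if delta >= threshold_s:
            gaps.append((delta, prev, nxt))
    return gaps
```

Run over the log above with the default threshold, this should report three gaps: roughly 310 s between 13:55:53.402 and 14:01:03.417, roughly 612 s between 14:01:05.321 (the command to start the TA) and 14:11:16.984 (the first COMM-ETIMEDOUT warning), and roughly 549 s between 14:11:16.984 and the shutdown at 14:20:25.622. The conf_str in the same log shows copy_timeout=15, so the ten-minute wait before the first warning is far longer than that value.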
