From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <9d920676-485d-3b4d-ca20-2b5ea3a5b606@oktetlabs.ru>
Date: Thu, 24 Aug 2023 21:34:54 +0300
MIME-Version: 1.0
Subject: Re: Setting up DPDK PMD Test Suite
To: Adam Hassick
Cc: Patrick Robb, Konstantin Ushakov, ci@dpdk.org
References: <35340484-1d7e-7e5f-cad4-c965ba541397@oktetlabs.ru> <9ce9d7fd-4051-6d51-26bb-7e96e98c677e@oktetlabs.ru> <781ca146-955f-85af-5727-66015ae1d326@oktetlabs.ru> <7734826a-840d-d0d9-e7a5-91951223398c@oktetlabs.ru>
From: Andrew Rybchenko
List-Id: DPDK CI discussions
Content-Type: multipart/alternative; boundary="------------Z74Scc2fXu00OkJoVIboLM1k"

This is a multi-part message in MIME format.
--------------Z74Scc2fXu00OkJoVIboLM1k
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: 8bit

Hi Adam,

> On the tester host (which appears to be the Peer agent), there are four processes that I see running, which look like the test agent processes.

Before the next attempt, I'd recommend killing these processes.

Is there any firewall in the network or on the test hosts which could block incoming TCP connections to port 23571 from the host where you run the test engine?

If the behaviour is the same on the next try and you see that the test agent keeps running, could you check using

# netstat -tnlp

that the Test Agent is listening on the port, and try to establish a TCP connection from the test engine host using

$ telnet iol-dts-tester.dpdklab.iol.unh.edu 23571

and check whether the TCP connection can be established.
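
The reachability check described above can also be scripted. The following is a minimal Python sketch, not part of TE: the function name can_connect is made up for illustration, and the host and port in the usage comment are the ones from the log in this thread.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection() resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError as exc:
        # Covers refused connections, timeouts and name resolution failures.
        print(f"connect to {host}:{port} failed: {exc}")
        return False


# Example (run on the test engine host):
#   can_connect("iol-dts-tester.dpdklab.iol.unh.edu", 23571)
```

A refused connection usually means nothing is listening on the port, while a timeout is more typical of a firewall silently dropping packets.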

Another idea is to log in to the Tester as root, as the testing does, take the TA start command from the log, and run it by hand without -n and with the extra escaping removed.

# sudo PATH=${PATH}:/tmp/linux_x86_root_76872_1692885663_1 LD_LIBRARY_PATH=${LD_LIBRARY_PATH}${LD_LIBRARY_PATH:+:}/tmp/linux_x86_root_76872_1692885663_1 /tmp/linux_x86_root_76872_1692885663_1/ta Peer 23571 host=iol-dts-tester.dpdklab.iol.unh.edu:port=23571:user=root:key=/opt/tsf/keys/id_ed25519:ssh_port=22:copy_timeout=15:kill_timeout=15:sudo=:shell=

Hopefully, in this case the test agent directory is still in /tmp and you don't need to copy it as the testing does.
Maybe the output will shed some light on what's going on.
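
The last argument of the command above is a colon-separated list of key=value pairs. As a reading aid, this Python sketch splits it into a dictionary. It is not TE's own parser: parse_ta_params is a hypothetical helper, and it assumes that values contain no ':' characters.

```python
def parse_ta_params(confstr):
    """Split a 'key=value:key=value' string into a dict of strings."""
    params = {}
    for field in confstr.split(":"):
        if not field:
            continue  # tolerate a stray leading or trailing colon
        # partition() keeps empty values such as 'sudo=' as ''.
        key, _, value = field.partition("=")
        params[key] = value
    return params


confstr = (
    "host=iol-dts-tester.dpdklab.iol.unh.edu:port=23571:user=root:"
    "key=/opt/tsf/keys/id_ed25519:ssh_port=22:copy_timeout=15:"
    "kill_timeout=15:sudo=:shell="
)
for name, value in parse_ta_params(confstr).items():
    print(f"{name:13} {value!r}")
```

Printed this way, it is easy to see at a glance that copy_timeout and kill_timeout are both 15 and that sudo and shell are empty.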

Andrew.

On 8/24/23 17:30, Adam Hassick wrote:
Hi Andrew,

This is the output that I see in the terminal when this failure occurs, after the test agent binaries build and the test engine starts:

Platform default build - pass
Simple RCF consistency check succeeded
--->>> Starting Logger...done
--->>> Starting RCF...rcf_net_engine_connect(): Connection timed out iol-dts-tester.dpdklab.iol.unh.edu:23571

Then, it hangs here until I kill the "te_rcf" and "te_tee" processes. I let it hang for around 9 minutes.
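
When going through a saved console log, the timeout line in the output above can be picked out with a regular expression. A minimal Python sketch follows. The pattern is based only on the single line quoted in this message, so other TE versions may word it differently, and find_connect_timeouts is a hypothetical helper.

```python
import re

# Matches e.g. "rcf_net_engine_connect(): Connection timed out host:23571".
# \s+ also tolerates the host:port being wrapped onto the next line.
CONNECT_TIMEOUT_RE = re.compile(
    r"rcf_net_engine_connect\(\): Connection timed out\s+"
    r"(?P<host>[^\s:]+):(?P<port>\d+)"
)


def find_connect_timeouts(text):
    """Return a list of (host, port) pairs for every timeout line in text."""
    return [
        (m.group("host"), int(m.group("port")))
        for m in CONNECT_TIMEOUT_RE.finditer(text)
    ]
```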

On the tester host (which appears to be the Peer agent), there are four processes that I see running, which look like the test agent processes.

ta.Peer is an empty file. I've attached the log.txt from this run.

 - Adam

On Thu, Aug 24, 2023 at 4:22 AM Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> wrote:
Hi Adam,

Yes, TE_RCFUNIX_TIMEOUT is in seconds. I've double-checked that it goes to 'copy_timeout' in ts-conf/rcf.conf.
The description in doc/sphinx/pages/group_te_engine_rcf.rst says that copy_timeout is in seconds, and the implementation in lib/rcfunix/rcfunix.c passes the value to select() as tv_sec. Theoretically, select() could be interrupted by a signal, but I think that is unlikely here.
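
To illustrate how a timeout given in seconds interacts with select(), here is a small Python sketch. It is not the rcfunix.c code, and wait_readable is a hypothetical helper. It recomputes the remaining time on every pass, so an early return from select() neither shortens nor extends the overall wait.

```python
import select
import socket
import time


def wait_readable(sock, timeout_s):
    """Wait up to timeout_s seconds for sock to become readable."""
    deadline = time.monotonic() + timeout_s
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False  # the full timeout has elapsed
        # The last argument is the timeout in seconds (fractions allowed).
        readable, _, _ = select.select([sock], [], [], remaining)
        if readable:
            return True


a, b = socket.socketpair()
print(wait_readable(a, 0.2))  # nothing was sent yet
b.send(b"x")
print(wait_readable(a, 0.2))  # data is pending now
```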

I'm not sure I understand what you mean by RCF connection timeout. Does it happen on TE startup, when RCF starts the test agents? If so, TE_RCFUNIX_TIMEOUT could help. Or does it happen while tests are in progress, e.g. in the middle of a test? If so, TE_RCFUNIX_TIMEOUT is unrelated, and most likely either the host with the test agent dies or the test agent itself crashes. It would be easier for me to classify it if you shared the text log (log.txt, in full or just the corresponding fragment with some context). The content of the ta.DPDK or ta.Peer file, depending on which agent has problems, could also shed some light. These files contain the stdout/stderr of the test agents.

Andrew.

On 8/23/23 17:45, Adam Hassick wrote:
Hi Andrew,

I've set up a test rig repository here, and have created configurations for our development testbed based off of the examples.
We've been able to get the test suite to run manually on Mellanox CX5 devices once.
However, we are running into an issue where, when RCF starts, the RCF connection times out very frequently. We aren't sure why this is the case.
It works sometimes, but most of the time when we try to run the test engine, it encounters this issue.
I've tried changing the RCF port by setting "TE_RCF_PORT=<some port number>" and rebooting the testbed machines. Neither seems to fix the issue.

It also seems like the timeout takes far longer than 60 seconds, even when running "export TE_RCFUNIX_TIMEOUT=60" before I try to run the test suite.
I assume the unit for this variable is seconds?

Thanks,
Adam

On Mon, Aug 21, 2023 at 10:19 AM Adam Hassick <ahassick@iol.unh.edu> wrote:
Hi Andrew,

Thanks, I've cloned the example repository and will start setting up a configuration for our development testbed today. I'll let you know if I run into any difficulties or have any questions.

 - Adam

On Sun, Aug 20, 2023 at 4:40 AM Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> wrote:
Hi Adam,

I've published https://github.com/ts-factory/ts-rigs-sample. Hopefully it will help to define your test rigs and successfully run some tests manually. Feel free to ask any questions and I'll answer here and try to update documentation.

Meanwhile, I'll prepare the missing bits for steps (2) and (3).
Hopefully everything is in place for step (4), but we need to do steps (2) and (3) first.

Andrew.

On 8/18/23 21:40, Andrew Rybchenko wrote:
Hi Adam,

> I've conferred with the rest of the team, and we think it would be best to move forward with mainly option B.

OK, I'll provide the sample on Monday for you. It is almost ready right now, but I need to double-check it before publishing.

Regards,
Andrew.

On 8/17/23 20:03, Adam Hassick wrote:
Hi Andrew,

I'm adding the CI mailing list to this conversation. Others in the community might find this conversation valuable.

We do want to run testing on a regular basis. The Jenkins integration will be very useful for us, as most of our CI is orchestrated by Jenkins.
I've conferred with the rest of the team, and we think it would be best to move forward with mainly option B.
If you would like to know anything about our testbeds that would help you with creating an example ts-rigs repo, I'd be happy to answer any questions you have.

We have multiple test rigs (we call these "DUT-tester pairs") that we run our existing hardware testing on, with differing network hardware and CPU architecture. I figured this might be an important detail.

Thanks,
Adam

On Thu, Aug 17, 2023 at 11:44 AM Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru> wrote:
Greetings Adam,

I'm happy to hear that you're trying to bring it up.

As I understand it, the final goal is to run it on a regular basis. So, we need to set it up properly from the very beginning.
Bringing up all the features consists of 4 steps:

1. Create a site-specific repository (we call it ts-rigs) which contains information about the test rigs and other site-specific information, such as where to send mails, where to store logs, etc. It is required for manual execution as well, since the test rigs description is essential. I'll return to this topic below.

2. Set up log storage for automated runs. Basically, it is disk space plus an apache2 web server with a few CGI scripts which help a lot to save disk space.

3. Set up the Bublik web application, which provides a web interface to view testing results. Same as https://ts-factory.io/bublik

4. Set up Jenkins to run tests regularly, save logs in the log storage (2) and import them into Bublik (3).

We spent the last few months doing our homework to make it simpler to bring up automated execution using Jenkins: https://github.com/ts-factory/te-jenkins
The corresponding bits in dpdk-ethdev-ts will be available tomorrow.

Let's return to step (1).

Unfortunately, there is no publicly available example of a ts-rigs repository, since sensitive site-specific information is located there. But I'm ready to help you create one for UNH. I see two options here:

(A) I'll ask questions and based on your answers will create the first draft with my comments.

(B) I'll make a template/example ts-rigs repo, publish it and you'll create UNH ts-rigs based on it.

Of course, I'll help to debug and finally bring it up in any case.

(A) is a bit simpler for me and you, but (B) is a bit more generic and will help other potential users to bring it up.
We can combine (A)+(B). I.e. start from (A). What do you think?

Thanks,
Andrew.

On 8/17/23 15:18, Konstantin Ushakov wrote:
Greetings Adam,


Thanks for contacting us. I'm copying Andrew, who would be happy to help.

Thanks,
Konstantin

On 16 Aug 2023, at 21:50, Adam Hassick <ahassick@iol.unh.edu> wrote:


Greetings Konstantin,

I am in the process of setting up the DPDK Poll Mode Driver test suite as an addition to our testing coverage for DPDK at the UNH lab.

I have some questions about how to set the test suite arguments.

I have been able to configure the Test Engine to connect to the hosts in the testbed. The RCF, Configurator, and Tester all begin to run; however, the prelude of the test suite fails to run.

https://ts-factory.io/doc/dpdk-ethdev-ts/index.html#test-parameters

The documentation mentions that there are several test parameters for the test suite, such as the IUT test link MAC. These seem like they would need to be set somewhere to run many of the tests.

I see in the Test Engine documentation, there are instructions on how to create new parameters for test suites in the Tester configuration, but there is nothing in the user guide or in the Tester guide for how to set the arguments for the parameters when running the test suite that I can find. I'm not sure if I need to write my own Tester config, or if I should be setting these in some other way.

How should these values be set?

I'm also not sure what environment variables/arguments are strictly necessary or which are optional.

Regards,
Adam

--
Adam Hassick
Senior Developer
UNH InterOperability Lab
+1 (603) 475-8248




--------------Z74Scc2fXu00OkJoVIboLM1k--