August 31, 2023

#####################################################################
Attendees
1. Patrick Robb
2. Adam Hassick
3. Aaron Conole
4. Bruce Richardson
5. Juraj Linkeš
6. Paul Szczepanek
7. Ali Alnubani

#####################################################################
Agenda
1. General Announcements
2. CI Status
3. DTS Improvements & Test Development
4. Any other business

#####################################################################
Minutes

=====================================================================
General Announcements
* DPDK Summit is Sept 12-13
   * The next CI meeting will be rescheduled from September 14th to September 21st, to avoid a clash with DPDK summit travel. The following meeting on September 28th will return us to our normal cycle.
   * There are governing board and tech board sessions on September 11th, with some CI discussion in both meetings
* Unit test suites: Bruce’s patch for dynamically building the unit test suites has hit mainline
   * David’s patch fixing the memory leak from PCI device probing (arm64) is still pending
   * V3 of the patch for skipping specific tests based on an environment variable has been submitted
      * An environment variable was chosen over a command-line option because of concerns about command-line parsing
      * Patrick should ack this
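
   A conceptual sketch of the idea under discussion; the variable name TEST_SKip_LIST is not used, the variable name TEST_SKIP_LIST and the wrapper below are invented for illustration, and the real interface is whatever the patch defines:

      import os

      def tests_to_run(all_tests):
          """Drop any test whose name appears in a comma-separated skip variable."""
          skipped = {name.strip()
                     for name in os.environ.get("TEST_SKIP_LIST", "").split(",")
                     if name.strip()}
          return [t for t in all_tests if t not in skipped]

      # Hypothetical usage: TEST_SKIP_LIST="eal_flags_autotest,malloc_autotest"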
     
=====================================================================
CI Status

---------------------------------------------------------------------
UNH-IOL Community Lab
* Mellanox perf testing:
   * The CX5 is back online
   * Hardware Refresh
      * The CX6 NIC is running, but results are not being reported yet. It reaches line rate in performance test runs at frame sizes 256-1518B, but falls below line rate at 64B and 128B frames.
         * Is this due to packet overhead? (See the rough calculation at the end of this Mellanox item.)
         * Currently this NIC is in a PCIe Gen 3.0 x8 slot
         * Ali is going to remote onto the testbed soon to take a look
      * CX7: backordered
   * Running DTS within a VM as a security measure?
      * Would require PCI passthrough; it can't be done with virtio
      * Does this added level of complexity justify the benefits?
      * Connections to the host and the rest of the network should be blocked, and the VM should be disposed of after each run
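
   Rough numbers behind the packet-overhead question: each frame also costs 8B of preamble/SFD and 12B of inter-frame gap on the wire, so the required packet rate rises sharply at small frame sizes, and per-packet PCIe costs (descriptors, doorbells, TLP headers) grow with it. The sketch below is illustrative only; the minutes don't record the port speed, so the link speeds are assumptions:

      def line_rate_pps(frame_bytes, link_bps):
          """Theoretical max packets/s: each frame plus 8B preamble/SFD and 12B IFG."""
          return link_bps / ((frame_bytes + 20) * 8)

      PCIE_GEN3_X8_BPS = 8e9 * 8 * 128 / 130   # ~63 Gb/s raw, before TLP/descriptor overhead

      for link_gbps in (25, 100):              # assumed speeds; not stated in the minutes
          for size in (64, 128, 256, 1518):
              pps = line_rate_pps(size, link_gbps * 1e9)
              dma_bps = pps * size * 8          # packet payload DMA'd over PCIe
              print(f"{link_gbps}G {size:>4}B: {pps / 1e6:7.2f} Mpps, "
                    f"~{dma_bps / 1e9:5.1f} Gb/s payload over PCIe "
                    f"(Gen3 x8 raw ≈ {PCIE_GEN3_X8_BPS / 1e9:.0f} Gb/s)")

   Takeaway: 64B frames need roughly 18x the packet rate of 1518B frames on the same link, so a shortfall that only appears at 64B/128B usually points at per-packet limits rather than raw bandwidth; checking the slot and the NIC's packet-rate spec (as Ali plans to) is the right next step.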
* Intel 8970 QAT Accelerator card:
   * The custom patch doesn't apply cleanly to the currently running 5.4.0-155 kernel, so UNH would like to rebuild the kernel from 5.15 or 6.0, but will not proceed without the go-ahead from the Arm folks.
      * Juraj: Ubuntu versions should be uniform across the lab (so 22.04 for all systems)
* Retesting framework roadmap - UNH:
   * This is online, and an email explaining the process has been sent to the dev mailing list: https://inbox.dpdk.org/dev/CAC-YWqiXqBYyzPsc4UD7LbUHKha_Vb3=Aot+dQomuRLojy2hvA@mail.gmail.com/
   * We will add some basic instructions to the DPDK website: https://inbox.dpdk.org/web/20230831031834.9271-2-probb@iol.unh.edu/T/#u
   * We’ll also add something to the community lab dashboard's About page
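
   For reference, a retest request is a reply on the patch's mailing list thread containing a recheck line; the label below is illustrative, and the linked email and website patch are the authoritative description of the syntax and the accepted test names:

      Recheck-request: iol-unit-amd64-testing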
* TS-Factory: Using our dev testbed, Adam has attempted to run the ethdev test suite on a few NICs (Mellanox CX5, Intel X710, Intel E810). We have only gotten the test suite to run on our CX5s so far. OKTET Labs is communicating with us to resolve some issues we've run into and to provide guidance on how this can be used in CI.
   * Adam discovered a bug: the RCF (remote control) implementation in ts-factory required DNS to return IPv4 addresses to the test engine when initiating connections to the tester and DUT systems. Andrew at OKTET Labs hotfixed this on the branch we are using, and the fix will hit ts-factory mainline soon
   * New changes (as of a few days ago) to ts-factory are causing -Werror builds of the test suite to fail; this has been reported to OKTET Labs and they are working on it this week
   * How do we use this in a CI context?
      * The DPDK test suite has approximately 6,000 test cases, which presently take 9 hours to run, and none of the NICs supported by ts-factory will pass all of them. So, there is no expectation that all tests will pass. The “Bublik” tool used by the framework allows comparison of passes/failures between the previous run and the current run, but OKTET Labs says this output is aimed at human readability and is not designed for automation.
      * There is a flag for cutting the test suite down to “sanity check” test cases, which we could more reasonably expect to pass 100%. That would let us report CI results with a 100%-pass expectation and would also mean a much shorter runtime, but the concern is that this throws the baby out with the bathwater, as those sanity check results won't be very valuable.
      * Do we need to run this periodically, rather than on every patch, because of the test duration?
      * Should we find a way to report some kind of result based on which tests pass? It's not yet clear whether this is feasible. Or we can simply run it and store the human-readable artifact in an easy-to-find place on the dashboard. (A sketch of a result-diffing approach follows this item.)
      * The test suite does not compile on Arm, but we can reach out to OKTET Labs/Arm to resolve these issues
         * The test engine only has to compile on a non-worker node, and that node could be x86
         * We need to figure out exactly what has to be compiled on the (Arm) worker node and communicate with OKTET Labs if there are issues
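
      One way to make the human-oriented output usable in CI would be to diff raw verdicts between the previous and current run and report only regressions, rather than expecting 100% passes. A hypothetical sketch of that idea (test names and verdict strings are invented; this is not Bublik's or ts-factory's API):

         def regressions(previous, current):
             """Tests that passed in the previous run but do not pass in the current one."""
             return {
                 name: (previous[name], current.get(name, "MISSING"))
                 for name, verdict in previous.items()
                 if verdict == "PASSED" and current.get(name) != "PASSED"
             }

         prev = {"ethdev/rx_basic": "PASSED", "ethdev/vlan_strip": "FAILED"}
         curr = {"ethdev/rx_basic": "FAILED", "ethdev/vlan_strip": "FAILED"}
         print(regressions(prev, curr))   # {'ethdev/rx_basic': ('PASSED', 'FAILED')}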
* Patrick and Aaron talked about UNH possibly doing more redundancy testing for the Intel lab, i.e. running some of the test suites they run which we don't.
   * It looks like Intel is reporting results again (woohoo!)
   * Patrick will determine the coverage gap between UNH and the Intel lab before Dublin so that he can discuss it with any interested parties in person
* Last meeting Aaron asked about maintainers for next-* branches getting immediate CI runs on an “on-push” basis, like the LTS-staging branches. Nothing prevents this except that (a) there needs to be a GitHub mirror so we can use the GitHub API (as we do with LTS-staging), and (b) per Aaron's point from last time, we need to agree on a test-run frequency. (A sketch of the polling side follows this item.)
   * 2024 SOW item?
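
   A sketch of the polling side of such an on-push trigger, assuming a GitHub mirror existed (it does not yet, so the repository URL below is made up); the GitHub commits API itself is standard:

      import requests

      MIRROR = "https://api.github.com/repos/DPDK/dpdk-next-net"   # hypothetical mirror

      def latest_commit(branch):
          """SHA of the newest commit on the branch, via the GitHub REST API."""
          resp = requests.get(f"{MIRROR}/commits",
                              params={"sha": branch, "per_page": 1}, timeout=30)
          resp.raise_for_status()
          return resp.json()[0]["sha"]

      # A periodic job would remember the last SHA seen per next-* branch and
      # queue a test run whenever it changes, mirroring how LTS-staging is handled.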
   
---------------------------------------------------------------------
Intel Lab
* They are reporting results again

---------------------------------------------------------------------
Loongarch Lab
* none

---------------------------------------------------------------------
Github Actions
* Retest framework: currently testing this internally, and will soon submit it for review.
* Physical server move: downtime will occur at the end of the year or beginning of next year
   * We're figuring out whether there's a way to migrate the VM to another system to reduce the downtime; otherwise, it will be about a week of downtime

=====================================================================
DTS Improvements & Test Development
* DTS presentation (work in progress): https://docs.google.com/presentation/d/1fm8EtbzEQHrFyHoHiy0PNQz3MYY2NsQRatR_eC3SvHw/edit#slide=id.g260b440c69d_0_331
   * Honnappa, Juraj, and Patrick
* Paul Szczepanek will be working on DTS
   * He should be included in any conversations of the DTS improvement group
* Group met last week to discuss DTS roadmap for 23.11, and Honnappa is sending that out today
* Jeremy is porting over the scatter test suite and a packet module for packet comparison and other packet-related functions
* Juraj - Documentation
   * Which tool should be used to generate the API docs: Sphinx (developed for Python, so a natural choice) or Doxygen?
   * We need to agree on the format of the documentation
      * Juraj likes the Google docstring format (very readable); an example is shown after this item
   * Jeremy will review this patch
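
   For reference, the Google docstring style under discussion looks like the following (function and fields are invented for illustration, not taken from DTS); Sphinx can render this style via its napoleon extension:

      def send_packets(packets, count=1):
          """Send each packet from the traffic generator.

          Args:
              packets: The packets to transmit.
              count: How many times to send each packet.

          Returns:
              The number of packets actually transmitted.

          Raises:
              RuntimeError: If the traffic generator is not connected.
          """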
* DTS roadmap
   * 1) Documentation
   * 2) TG-related code (packet manipulation and verification module; support for TRex, which requires non-packet-capture method enhancements)
   * 3) Scatter test suite
   * 4) Merge pending patches
   
=====================================================================
Any other business
* Next meeting is September 21, 2023