August 17, 2023

#####################################################################
Attendees
1. Patrick Robb
2. Bruce Richardson
3. Honnappa Nagarahalli
4. Aaron Conole
5. David Marchand
6. Manit Mahajan
7. Lincoln Lavoie

#####################################################################
Agenda
1. General Announcements
2. CI Status
3. DTS Improvements & Test Development
4. Any other business

#####################################################################
Minutes

=====================================================================
General Announcements
* DPDK Summit is Sept 12-13
   * Should we hold the scheduled Sept 14th ci meeting? Which ci meeting regulars will be at the summit? Aaron and Patrick can’t make it, so let’s cancel this one.
      * Will suggest an off-schedule meeting to replace it
* Patchwork v3:
   * Bruce reported incorrect series titles and missing cover letters for some newer patchseries
   * Don’t CC the ci mailing list with dpdk patches, as it may get diverted to the ci project on patchwork.
      * A temporary workaround is to cc just relevant people in the ci community
      * Aaron is supposed to have admin permissions for the ci project on patchwork, but he needs to ping Ali about this
* Unit test suites: Bruce Richardson is working on a patch which eliminates the old approach of setting the DPDK unit test suites from /app/test/meson.build, and instead dynamically builds the lists based on build configuration
   * This raised one issue with UNH CI, which is that we are filtering out the eal_flags_file_prefix_autotest on our ARM containers, and this will no longer be possible following Bruce’s rework. Bruce has suggested implementing meson functionality allowing for disabling specific tests, which is a better version of how we filter currently.
      * Testlogs for the failure are now shared with David who is going to try and help
      * David has reproduced and has seen a leak when there is 1 (and only 1) device
      * UNH will send lspci output from the VMs which host the containers
         * ARM vms are on KVM, so that might be a reason
   * Bruce pushed a new patch where you can set an env variable of tests to be skipped
      * This is better in that the tests show up as skipped instead of just not showing up at all - in this way it is clear to people looking at the logs that the test was not run, but usually is run for this testsuite
   * For 32 bit aarch testing, which cannot run via meson test and is dependent on the dpdk-test binary path, we will have to coordinate the merge of this patchseries with Bruce so we can modify our script, or Bruce can put a symlink connecting the old path to the new
      * Cross compile a version of meson to run on that? Honnappa does not think this is needed, but he is going to talk to arm folks about it.  
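
The difference between filtering a test out and marking it as skipped, discussed above, can be illustrated with a small Python sketch. This is not Bruce's patch: the variable name TESTS_TO_SKIP, the comma-separated format, and both helper functions are assumptions made only for this example.

```python
import os


def parse_skip_list(env=None, var="TESTS_TO_SKIP"):
    """Return the set of test names from a comma-separated env variable.

    The variable name and format are hypothetical, chosen for illustration.
    """
    env = os.environ if env is None else env
    raw = env.get(var, "")
    return {name.strip() for name in raw.split(",") if name.strip()}


def plan_suite(tests, skip):
    """Map every test to RUN or SKIPPED so nothing silently disappears."""
    return {name: ("SKIPPED" if name in skip else "RUN") for name in tests}


if __name__ == "__main__":
    # The second test name is a placeholder for any other test in the suite.
    suite = ["eal_flags_file_prefix_autotest", "some_other_autotest"]
    skip = parse_skip_list({"TESTS_TO_SKIP": "eal_flags_file_prefix_autotest"})
    for name, status in plan_suite(suite, skip).items():
        print(f"{name}: {status}")
```

Because a skipped test still appears in the plan, someone reading the log can tell it was deliberately not run, which is the advantage noted above over dropping it from the list.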
     
=====================================================================
CI Status

---------------------------------------------------------------------
UNH-IOL Community Lab
* Mellanox perf testing:
   * The Mellanox DUT is upgraded to 22.04 and is running perf testing, but with no reporting until Mellanox people assess the new results. So far, results appear consistent with the performance seen when the DUT was on 18.04, and with an older version of the MLNX OFED driver.
      * Patrick will send a report to Ali today with the results from the previous 24 hours
   * Hardware Refresh
      * The CX6 NIC is installed. Ali created a TREX config for this NIC. Patrick will complete the remaining setup for perf testing. That will require a patch to DTS adding NIC info. CX7 is in the same boat. Ali shared an example from when the CX5 was added, which Patrick will use as a reference: b6b0a575d34f ("tests/nic_single_core_perf: add support for Mellanox's nics")
      * CX7: The SKU we ordered was discontinued, so our order has been canceled. We put in another order for a different model: MCX713106AC-CEAT
   * Running DTS within a VM as a security measure?
      * Will require PCI passthrough, can’t be done with virtio
      * Does this added level of complexity justify the benefits?
* Intel 8970 QAT Accelerator card:
   * We still need to rebuild the kernel with the patch Ruifeng shared: https://lkml.org/lkml/2022/6/17/328
   * Are there security issues or instability introduced by building the custom kernel and not changing it?
   * There may be an official PPA we can use with the latest official kernel
      * It should be in kernel 6.0 and newer, so it may be as simple as upgrading to the latest kernel
* Retesting framework roadmap - UNH:
   * 1. Database migrations - done
      * Storing retest request relations to patchseries tarballs, and the retest datetimes, to be used as parameters for future comment requests
      * Will limit retest requests to x per patchseries tarball (two seems reasonable, but open to a group decision)
         * Limit to 1 at first
   * 2. Script that makes api calls to pw for comments, parses contexts/labels, produces a list, and writes a json file - done.
      * This is not UNH specific and we can upstream this to the dpdk-ci repo if there’s interest from other labs
   * 3. Write jenkinsfile script for requesting comments and triggering testing pipelines according to retest requests - in progress:
      * What is an acceptable frequency for us to run this (which may involve many patchwork API calls)?
   * 4. Contact community with the rules and format
* Reporting: UNH rolled out “tail reporting”, which moves reporting from the result-aggregating stages at the end of our CI process to the individual jobs which run testing. This should decrease our time to delivery for reporting, and increase our reliability, as it essentially moves us to one source of truth for reporting.
* TS-Factory: We are running this on a Dev server, but need some help with setting the config files for the DPDK PMD testsuite. Adam has emailed Konstantin from Oktet Labs for clarification on this.
   * Consider test capacity
* If Intel lab is downsizing, UNH may need to reassess our goal of adding a framework for other labs to submit results and artifacts to the UNH dashboard, since that was an Intel lab request.
* Aaron is interested in giving maintainers their own branch, which they can push to in order to get test results
   * Similar to LTS-staging* branches
   * What policies would have to be set regarding how maintainers use this ability?
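
Steps 1 and 2 of the retesting roadmap above (a per-tarball request limit, and parsing comments into a list written out as json) could look roughly like the Python sketch below. The rules and format had not yet been shared with the community at the time of this meeting, so the "Retest-request:" prefix, the record field names, and the in-memory counter standing in for the database are all placeholders assumed for illustration. No patchwork API call is made.

```python
import json
import re
from collections import Counter

# Hypothetical comment format: "Retest-request: label-a, label-b"
REQUEST_RE = re.compile(r"^Retest-request:[ \t]*(.+)$", re.MULTILINE | re.IGNORECASE)
MAX_RETESTS_PER_TARBALL = 1  # the minutes suggest limiting to 1 at first


def parse_labels(comment_body):
    """Extract requested labels from one comment, in order, without duplicates."""
    labels = []
    for match in REQUEST_RE.finditer(comment_body):
        for label in match.group(1).split(","):
            label = label.strip()
            if label and label not in labels:
                labels.append(label)
    return labels


def select_requests(comments, used):
    """Return accepted retest requests while honoring the per-tarball limit.

    comments: iterable of (tarball_id, comment_body) pairs (assumed shape).
    used: Counter of retests already recorded per tarball, updated in place.
    """
    accepted = []
    for tarball_id, body in comments:
        labels = parse_labels(body)
        if not labels or used[tarball_id] >= MAX_RETESTS_PER_TARBALL:
            continue
        used[tarball_id] += 1
        accepted.append({"tarball": tarball_id, "labels": labels})
    return accepted


if __name__ == "__main__":
    comments = [
        (101, "Looks flaky.\nRetest-request: unit-tests, perf"),
        (101, "Retest-request: unit-tests"),  # over the limit, so ignored
        (102, "No request in this comment."),
    ]
    print(json.dumps(select_requests(comments, Counter()), indent=2))
```

Keeping the limit check separate from the parsing means the cap can be raised from 1 to 2 later by changing a single constant, if the group decides on that.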
   
---------------------------------------------------------------------
Intel Lab
* No reports since August 12th

---------------------------------------------------------------------
Loongarch Lab
* none

---------------------------------------------------------------------
Github Actions
* Robot’s poller will be offline for some time in September-October due to systems being physically moved
   * The poller is run in a VM - if this can be migrated, downtime may be a day or less, but it’s unclear whether we can
* Retesting framework: work on this is ongoing, and will be sent to the ci mailing list
* Going to be re-assessing some APIs on pw and GHA, and reverting old workaround for polling via GHA

=====================================================================
DTS Improvements & Test Development
* DTS presentation (work in progress): https://docs.google.com/presentation/d/1fm8EtbzEQHrFyHoHiy0PNQz3MYY2NsQRatR_eC3SvHw/edit#slide=id.g260b440c69d_0_331
   * Honnappa, Juraj, and Patrick
* Jeremy submitted an RFC for porting over the Scatter Testsuite
* 9AM eastern next week we will have a meeting about the DTS roadmap

=====================================================================
Any other business
* Next meeting is August 31, 2023