July 20, 2023

#####################################################################
Attendees
1. Patrick Robb
2. Alex Agerholm
3. Christian Muf
4. Juraj Linkeš
5. Danylo Vodopianov
6. Jeremy Spewock
7. M Kostenok
8. Lincoln Lavoie
9. Manit Mahajan
10. Alex Kholodnyi
11. Aaron Conole
12. Thomas Monjalon
13. Ali Alnubani

#####################################################################
Agenda
1. General Announcements
2. CI Status
3. DTS Improvements & Test Development
4. Any other business

#####################################################################
Minutes

=====================================================================
General Announcements
* Napatech is beginning the process of contributing their driver for DPDK, and intends to stand up a process for CI testing
   * 1. What is the preferred way for the community to work with custom HW? According to the documentation, sending hardware to the DPDK Lab at the UNH-IOL is one possible way. Could we set up our own lab? Where should the CI be located, and what access should be provided?
      * Making use of the Community Lab is one way to accomplish CI testing without having to set up your own CI testing infrastructure. But setting up your own internal lab and reporting your results is also good.
         * Governing board approval might be required for bringing a new vendor into the community lab
         * Is a specific membership tier required for hardware inclusion in the lab?
         * Napatech is wondering about any regular fee (per month, per year)
      * [Aaron] For running an internal lab, scripts for polling patchwork and DPDK CI testing scripts can be found at https://github.com/ovsrobot/pw-ci and at https://git.dpdk.org/tools/dpdk-ci/
      * For how long should an internal lab be available?
         * As long as possible - the DPDK project will continuously be changing, so if a driver is contributed, long term testing will be very helpful in verifying for the community that no functionality breaks months or years down the road. If testing results can be provided long term, the community can help keep the driver in a good state long term.
   * 2. Which DTS tests should be executed by NT team to prove that NT DPDK driver is okay to be upstreamed?
      * There are no specific DTS testsuite requirements for upstreaming, so it’s best to just extend use of DTS as much as possible according to the driver’s feature set
   * 3. We have implemented custom DTS tests which cover the functionality of the NIC. Can these tests be considered sufficient for merging the driver? If so, should they be upstreamed as well?
      * If you are going to report (to patchwork) results from these custom testsuites, they should be upstreamed to DTS. Lijuan is the maintainer. https://git.dpdk.org/tools/dts/
      * Ping Lijuan about schedule for development
      * Small-Medium changes are fine halfway through a release cycle, but the RFCs exist for a reason and major changes cannot come late in the process
   * 4. Is there documentation for setting up a CI Testing lab with DPDK Patchwork?
      * There is not much in terms of documentation, but for running an internal lab, scripts for polling patchwork and DPDK CI testing scripts can be found at https://github.com/ovsrobot/pw-ci and at https://git.dpdk.org/tools/dpdk-ci/
   * 5. In which format should this test report be?
      * [Aaron] Test reports can be as simple as: https://mails.dpdk.org/archives/test-report/2023-July/420253.html - Note that we need the subject line, Test-Label, Test-Status, patch url, and _test text_ to all be present in the form in that email.
   * 6. Which access level to the setup should we provide for the community?
      * [Aaron] If you run your own lab, the community generally doesn't need access - but you will need to support in the case your lab has issues.
   * 7. On which requests, and how often, do we have to execute tests and send reports?
      * [Aaron] We run the tests on every patch that comes via patchwork.
      * Ideally, the report will include the necessary output showing the failure and help the developer start digging on how to fix their patch
      * Currently Napatech has run testing periodically, but not per patch / per series
   * 8. Do test reports need to be saved locally?
      * Not a requirement, if enough information can be included in the test report
      * UNH does save some artifacts associated with the testrun, which can be downloaded from the dashboard. In some cases these logs are much more extensive than what is provided on the emailed test report.
         * The ideal is to hold these for at least a release cycle
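To illustrate the report format from item 5 above, here is a minimal sketch of assembling a report body. The helper name and example values are hypothetical; only the field names (Test-Label, Test-Status, patch URL, and the free-form test text) come from the example report linked above.

```python
# Hypothetical sketch: assemble a DPDK test-report email body carrying the
# fields named in item 5 above (Test-Label, Test-Status, patch URL, test text).
def build_report_body(label, status, patch_url, test_text):
    return "\n".join([
        f"Test-Label: {label}",
        f"Test-Status: {status}",
        patch_url,
        "",
        test_text,
    ])

body = build_report_body(
    "my-lab-compile-testing",                        # hypothetical lab label
    "SUCCESS",                                       # or FAILURE
    "https://patchwork.dpdk.org/patch/12345/",       # example patch URL
    "Compilation passed on Ubuntu 22.04 (gcc 11).",  # free-form test text
)
print(body)
```

The subject line would be added by whatever mails the report; the body above only covers the in-message fields.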
* Napatech has developed a PMD dpdk driver for their smartNIC, which they aim to upstream to DPDK
   * Has run their custom DTS tests internally
   * Looking for clarity on what is required for upstreaming
      * Is CI testing a requirement for submitting a driver? A: No, code review is sufficient, but CI which reports upstream will help both Napatech and the community protect against any performance or functional regressions or breakage
* Provide an RFC by Aug 12th, further review and development can come afterwards
* What meetings should Napatech attend for submitting their driver?
   * It would be typical for a presentation to be made to the tech board meeting. But first, the driver should be submitted via the mailing list. After that, someone will be invited to the tech board.
   * Ideally someone will check in with the CI meeting periodically
* If Napatech didn’t want to report per patch, could we key off of RC-* releases and have them simply report for those releases?

=====================================================================
CI Status

---------------------------------------------------------------------
UNH-IOL Community Lab
* Eal_misc_flags_autotest: As you may have noticed, the Community Lab CI is failing a fast-test unit test on all new patch series. The cause is a bnxt driver commit (https://git.dpdk.org/dpdk/commit/?id=97435d7906d7706e39e5c3dfefa5e09d7de7f733) which introduces static variables stored in the same address space requested by the misc_flags_autotest.
   * Ajit suggests simply changing the address space requested by the unit test - which is fine with me
      * Aaron has posted a patch making this change
         * This is simply changing from one random address to another random address, and hoping this issue never happens again. Is there a better solution?
   * How to handle in the short term? Modify meson fast-tests suite every time?
   * The most interesting part of this in terms of CI testing is: how did it get through CI testing and into DPDK, if it fails in our CI? The answer is that when the patch series hit patchwork, it was missing one of the patches, which was added to patchwork hours later/the next day. Unfortunately, our process does not currently account for this possibility (which is admittedly quite rare), so we simply tried to apply the series, failed because we were missing one of the patches, and didn’t run the patch series through our CI.
      * From Patchwork’s API, each “series” object has a “received_all” field. We will modify our process to account for this field: if it is False, skip the series and try again later, until it is True
      * Bnxt is going to avoid submitting huge patches in the future
   * They created some static globals which “randomly” are assigned to a specific memory area which overlaps with the unit test
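The planned fix can be sketched as a small check on the series object returned by Patchwork’s REST API. The “received_all” field is part of that API; the example series objects below are hypothetical.

```python
# Sketch of the planned fix: skip a series until Patchwork reports that
# all of its patches have arrived, then retry it on a later polling pass.
def series_complete(series):
    """'received_all' stays False while patches are still trickling in,
    which is exactly the window that caused the missed CI run above."""
    return series.get("received_all") is True

# Hypothetical series objects as returned when polling /api/series/:
pending = [
    {"id": 101, "received_all": True},
    {"id": 102, "received_all": False},  # defer; try again next pass
]
ready = [s["id"] for s in pending if series_complete(s)]
print(ready)  # only series 101 is ready to test
```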
* Apply failures and sanity check build failures are both reporting to patchwork:
   * Apply failure: https://patchwork.dpdk.org/project/dpdk/patch/20230712173050.599482-1-sivaprasad.tummala@amd.com/
   * Sanity check build: https://patchwork.dpdk.org/project/dpdk/patch/20230713102659.230606-2-maxime.coquelin@redhat.com/
* Compression-Perf sample app
   * We have dry-run this using the Zlib vdev for compression/decompression operations, which provides results on validity of the decompressed output (compared to the input file) and also provides performance metrics
      * For the zlib vdev, our thinking is for the testcase we should pass/fail based on the validity of the output. We will report the performance results, but these should not determine pass/fail for the vdev
      * We may have the opportunity to run the compression-perf sample app using hardware accelerators in the future. In that context, it would be appropriate to condition pass/fail on a performance variance threshold.
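The pass/fail policy above could be sketched roughly as follows. The function name, parameters, and 5% threshold are hypothetical illustrations, not the actual testcase code.

```python
from typing import Optional

# Hypothetical sketch of the policy above: validity of the decompressed
# output always gates pass/fail; a performance threshold only applies
# once a hardware-accelerator baseline exists.
def judge_compression_run(input_data: bytes, output_data: bytes,
                          throughput: Optional[float] = None,
                          baseline: Optional[float] = None,
                          max_drop: float = 0.05) -> bool:
    # Validity gate (zlib vdev case): decompressed output must match input.
    if output_data != input_data:
        return False
    # Performance gate (future hardware case): fail if throughput drops
    # more than max_drop (here 5%) below the recorded baseline.
    if throughput is not None and baseline is not None:
        return throughput >= baseline * (1 - max_drop)
    # zlib vdev case: performance is reported but never decides pass/fail.
    return True
```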
* Linux kmods dpdk compile test has been broken for a few weeks, due to a linux kernel API modification in v6.5. Ferruh submitted a patch fixing build behavior for v6.5: https://git.dpdk.org/dpdk/commit/?id=dd33d53b9a032d7376aa04a28a1235338e1fd78f
   * Seen in daily periodic testing of main: https://dpdkdashboard.iol.unh.edu/results/dashboard/tarballs/25540/
* NIC hardware refresh:
   * Marvell board has arrived - still need details from them about chassis, other components to pair with the board, and guidelines for how they want to test DPDK on the hardware. They did share an l2fwd sample app setup specific to the Marvell board, which we could set up and “test,” but presumably they will want to accomplish more on the board than this sample app run.
      * E810 (traffic generator) to pair with this board has arrived
      * Need to acquire DAC cabling for this, this needs to be a specific breakout cable (from discussions with Marvell), so we need to ensure the correct breakout is ordered.
         * Request an exact part number and UNH can probably order it
   * 2 Intel E810s are in, DAC cabling has arrived; this NIC will be installed in the to-be-donated Intel server
      * Legal review continues for Intel’s server donation
   * 1 of the 2 mlnx cx6 cards are in, the other is backordered
      * DAC cables are here
   * 2 mlnx cx7 cards are still on backorder
      * DAC cables are here
   * Intel QAT card for the arm server is ordered, but backordered right now
   * No news from Ericsson about the embedded snow board
   * Everything else is ordered
   
---------------------------------------------------------------------
Intel Lab
* Intel reps keep saying there is a risk of less testing in the near future
   * The lab may be decreasing its operations
   * What was done in Intel lab, which we can replicate
      * UNH should check the gaps between the testing they are doing and the testing Intel has been doing, and see if/where coverage can be bridged

---------------------------------------------------------------------
Loongarch Lab
* None

---------------------------------------------------------------------
Github Actions
* No news
* Will return to retesting framework stuff going forward

=====================================================================
DTS Improvements & Test Development
* Smoke tests: http://patches.dpdk.org/project/dpdk/patch/20230719201807.12708-3-jspewock@iol.unh.edu/
* traffic generator abstractions: http://patches.dpdk.org/project/dpdk/list/?series=28973
* In future releases, changes will not be accepted this late in the release cycle
* There is a slot for DTS in the DPDK Summit (at the end of the first day - 20 minutes)
   * Juraj will be presenting remotely, Patrick will be there in person to answer any questions
* Docstrings patch should be split into multiple parts. Agreement is still needed on the format to be used for documentation.
* Extending smoke tests is one possibility
* Dynamically skipping and selecting testsuites according to system info is one possible QOL feature
* It probably makes sense to record possible development in a spreadsheet and vote on priorities
* Are we ready to start porting over some of the functional tests? This should be technically possible, so we should explore this.
* Dockerfile patch for DTS will possibly be a part of 23.11 DTS roadmap - Thomas will try to review this in August
* Can more DTS patches be merged “mid release” to make the DTS development process more dynamic?
   * This should be possible - make sure to utilize slack and CC Thomas
   * Thomas will be the one to merge patches for DTS in the near future
      * Can this responsibility be deferred to someone else? Thomas is going to discuss this with the tech board at Dublin. In general Thomas would like to recruit more subtree maintainers.
     
=====================================================================
Any other business
* PW will be upgraded by Ali Alnubani on the first Sunday after the 23.07 release, which should be this Sunday
* Next meeting is August 3, 2023